SECaps: A Sequence Enhanced Capsule Model for Charge Prediction
Automatic charge prediction aims to predict the appropriate final charges from the fact description of a given criminal case. It plays an important role in helping judges and lawyers improve the efficiency of legal decisions, and has therefore received much attention. Nevertheless, most existing work on automatic charge prediction performs adequately on high-frequency charges but is not yet capable of predicting few-shot charges, for which only a limited number of cases are available. Meanwhile, several works have shown the benefits of the capsule network in tasks such as text classification. This motivates us to propose a Sequence Enhanced Capsule model, dubbed SECaps, to alleviate this problem. More specifically, we propose a new basic structure, the seq-caps layer, which enhances the capsule layer by taking sequence information into account, and we build the SECaps model on top of it. Compared with state-of-the-art methods, our SECaps model achieves F1 improvements of 4.5% and 6.4% on two real-world datasets, Criminal-S and Criminal-L, respectively. The experimental results consistently demonstrate the superiority and competitiveness of the proposed model.
Keywords: charge prediction, capsule network, attention mechanism, few-shot, focal loss
The task of automatic charge prediction is to help lawyers or judges determine the appropriate charges (e.g., fraud, robbery, or larceny) for a given case. It plays an important role in many legal intelligence scenarios, such as legal assistant systems and legal consulting. A legal assistant system can improve the efficiency of professionals, while legal consulting benefits people who are unfamiliar with the legal terminology of the cases they are interested in. Automatic charge prediction is therefore a valuable topic for many legal intelligence scenarios.
Most existing work on automatic charge prediction falls into three categories. Work in the first category is usually mathematical or quantitative [16, 29] and is restricted to small datasets with few labels. Work in the second category relies on substantial manual effort to design legal text features, which are then fed to machine learning algorithms or natural language processing methods. Liu et al. [21, 22] use word-level and phrase-level features with the k-Nearest Neighbor (KNN) method to predict charges. Liu et al. first use a Support Vector Machine (SVM) for preliminary article classification, and then re-rank the results using word-level features and the co-occurrence tendency among articles. Katz et al. extract effective features from case profiles (e.g., dates, locations, terms, and types). However, these hand-designed shallow textual features are labor-intensive and have limited ability to capture the semantic information of legal texts. The third category applies neural networks: owing to the success of deep neural networks on natural language processing tasks, several neural methods have recently been applied to automatic charge prediction [24, 9], with attractive performance. For example, Luo et al. propose an attention-based neural network for charge prediction that incorporates the relevant law articles, but it is not yet capable of predicting few-shot charges with limited cases. Hu et al. propose an attribute-attentive charge prediction model to alleviate the few-shot charge problem. Meanwhile, Zhao et al. apply the capsule network to text classification and achieve attractive performance.
Inspired by the above observations, in this paper we propose a Sequence Enhanced Capsule model, dubbed SECaps. SECaps is a deep neural network method, which allows it to better capture the semantic information of legal texts. Moreover, it can deal with the problem of few-shot charges.
To summarize, the main contributions of this paper are:
We propose a Sequence Enhanced Capsule model that not only better captures the prominent features and the semantic information of legal texts, but also achieves competitive performance on the few-shot charge problem.
Our Sequence Enhanced Capsule model introduces focal loss, which first appeared in object detection and is able to alleviate category imbalance to some extent.
Compared with state-of-the-art methods, our SECaps model achieves Macro F1 improvements of 4.5% and 6.4% on two real-world datasets, Criminal-S and Criminal-L, respectively. The experimental results consistently demonstrate the superiority and competitiveness of the proposed model.
The rest of the paper is organized as follows. Section 2 surveys related work. Section 3 describes the SECaps model in detail. Section 4 presents the performance evaluation. Section 5 concludes the paper.
2 Related Works
Automatic charge prediction plays an important role in the legal area and has thus received much attention. Researchers have proposed many methods for automatic charge prediction. In this paper, we classify these methods into three categories: (1) traditional methods, (2) machine learning methods, and (3) deep neural network methods.
Traditional methods are usually mathematical or quantitative. Kort makes an early attempt to apply quantitative methods to the prediction of human events. Nagel applies correlation analysis to case prediction. Keown introduces mathematical models (e.g., linear models and the nearest-neighbor scheme) for legal prediction. These traditional methods are effective in certain scenarios, but they are restricted to small datasets with few labels.
Owing to its success in many areas, researchers began to use machine learning to handle charge prediction. This type of work usually focuses on extracting features from case facts and then using machine learning algorithms to make predictions. Liu et al. [21, 22] use the k-Nearest Neighbor (KNN) method to classify criminal charges. Lin et al. extract 21 legal factor labels for case classification. Mackaay et al. extract N-gram features, which are created by clustering semantically similar N-grams. Sulea et al. propose an SVM-based system that uses the case description, time span, and ruling as features. However, these methods rely only on shallow text features or manual tags, which are difficult to collect for larger datasets. They therefore do not perform well when the amount of data is large.
Recently, owing to the success of deep neural networks in natural language processing (NLP), computer vision (CV), and speech, several works have begun to apply deep neural networks to charge prediction and have shown a large performance boost. Luo et al. propose a hierarchical attentional network that predicts charges and extracts relevant articles jointly. However, this work cannot handle the few-shot problem. Hu et al. propose an attention-based neural model that incorporates several discriminative legal attributes. The method proposed in this paper is also a deep neural network method, and the work most relevant to ours is that of Hu et al. Our work shares two features with theirs: (1) both are based on deep neural networks, and (2) both try to solve the few-shot problem in charge prediction. Nevertheless, our work differs from theirs in at least three respects: (1) the structure of the proposed model is different; (2) the way our model handles the few-shot problem is different; and (3) as far as we know, we achieve state-of-the-art performance for charge prediction. We propose a Sequence Enhanced Capsule model, dubbed SECaps, that better captures the prominent features and the semantic information of legal texts. Meanwhile, the SECaps model addresses the few-shot problem through two strategies: (1) the capsule-based seq-caps layer achieves attractive results on few-shot charge datasets, and (2) the model introduces focal loss, which alleviates category imbalance to some extent.
Another line of work, on zero-shot classification, is also related to ours. Most successful work on zero-shot classification is in computer vision (CV). Jayaraman et al. introduce a random forest approach that explicitly accounts for the unreliability of attribute predictions. Akata et al. propose a label-embedding framework that can transition smoothly from zero-shot learning to learning with large quantities of data. Lampert et al. introduce attribute-based classification, in which the attribute classifiers can be pre-learned independently; afterwards, new classes can be detected based on their attribute representation, without the need for a new training phase. Elhoseiny et al. make use of textual descriptions of the class labels for zero-shot learning of object categories. Wu et al. present a general framework for zero-shot high-level event detection. Zellers et al. model the visual and linguistic attributes of action verbs for large-scale zero-shot activity recognition, and also extend the approach to few-shot scenarios. Recently, Hu et al. introduce several discriminative attributes of charges that provide additional information for few-shot charges. Although our work also handles the few-shot problem, it differs from this line of work in that (1) the strategy we use to deal with the few-shot problem is different, and (2) as far as we know, we are the first to introduce capsules for charge prediction and to achieve state-of-the-art performance on it.
Our work is also related to text classification. Recently, various neural network (NN) architectures, such as Convolutional Neural Networks (CNN) [14, 39, 4] and Recurrent Neural Networks (RNN), have been used for text classification. Zhang et al. offer an empirical exploration of character-level convolutional networks for text classification. Zhao et al. explore capsule networks with dynamic routing for text classification. In its use of the capsule network, our work is related to that of Zhao et al., but it mainly differs in that we propose a new SECaps model based on the seq-caps layer, which can handle the few-shot problem and, as far as we know, achieves state-of-the-art performance for charge prediction.
In this section, we propose the Sequence Enhanced Capsule model, dubbed SECaps, for the charge prediction task. We mainly focus on the few-shot charge problem in charge prediction. Our SECaps model combines widely used deep learning methods for natural language processing with the recently proposed capsule network. In what follows, we first review the capsule network to ease understanding of the proposed model (Section 3.1). We then introduce the basic structure of the SECaps model, the seq-caps layer (Section 3.2). Finally, we give the detailed architecture of the proposed SECaps model (Section 3.3).
3.1 Capsule Network
The capsule network, which was initially designed for object recognition, shows competitive performance on the MNIST task. A capsule is a group of neurons whose activity vector represents the feature parameters of a specific type of entity, such as an object or an object part. We can view a capsule as a vector that describes a specific feature: the length of the capsule represents the probability that the feature exists, and its orientation represents the feature parameters.
A capsule is the basic element of a capsule network, just as a neuron is the basic element of a neural network. A neural network transforms lower-layer neurons into higher-layer neurons by means of an affine transformation followed by a non-linear activation function. In a capsule network, by contrast, a dynamic routing mechanism sends the predictions of lower-layer capsules to the higher-layer capsules that agree with them.
Define the lower-layer capsules as $u = \{u_1, \dots, u_n\}$ and the higher-layer capsules as $v = \{v_1, \dots, v_m\}$, where $u_i$ is the $i$-th capsule in the lower layer, $v_j$ is the $j$-th capsule in the higher layer, and $d$ is the dimension of the capsule. The dynamic routing mechanism has the following two steps:
Linear transformation. In this step, an intermediate feature vector $\hat{u}_{j|i}$ of $u_i$ is produced by multiplying the output $u_i$ by a weight matrix $W_{ij}$:
$$\hat{u}_{j|i} = W_{ij} u_i$$
There is one weight matrix $W_{ij}$ for each connection between a lower-layer capsule $u_i$ and a higher-layer capsule $v_j$, so there are $n \times m$ weight matrices between two capsule layers. Such a large number of parameters raises the risk of overfitting. To reduce the number of parameters, we introduce a weight-sharing mechanism similar to that of Zhao et al. With shared weights, the connections between all the lower-layer capsules and the $j$-th higher-layer capsule share a common weight matrix $W_j$, so the intermediate feature vector is computed as follows:
$$\hat{u}_{j|i} = W_j u_i$$
Clustering of lower-layer capsules. In this step, the dynamic routing mechanism minimizes an agglomerative fuzzy k-means clustering-like loss function as follows:
where $C$ is an $n$-by-$m$ partition matrix (for $n$ lower-layer and $m$ higher-layer capsules), $c_{ij}$ represents the degree of membership of the intermediate feature of the $i$-th lower-layer capsule in the $j$-th cluster $s_j$, and $S$ is the set of cluster centers. Then, similar to Hinton et al., we apply a non-linear “squashing” function to obtain the higher-layer capsule, ensuring that short vectors are shrunk to almost zero length and long vectors are shrunk to a length slightly below 1. Deriving the coordinate descent updates of $C$ and $S$, we obtain the updates in Algorithm 1.
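A clustering-like objective of this kind can be sketched as below. This is an assumption, following the form used in the optimization view of dynamic routing by Wang and Liu, and is not necessarily the exact loss intended here. The entropy weight $\lambda$ and the norm constraint on the cluster centers are likewise assumed, and $\hat{u}_{j|i}$ denotes the intermediate feature vector of the $i$-th lower-layer capsule for the $j$-th cluster.

```latex
\min_{C,\,S}\; -\sum_{i=1}^{n}\sum_{j=1}^{m} c_{ij}\,\langle \hat{u}_{j|i},\, s_j \rangle
\;+\; \lambda \sum_{i=1}^{n}\sum_{j=1}^{m} c_{ij}\log c_{ij}
\quad \text{s.t.}\quad c_{ij}\ge 0,\;\; \sum_{j=1}^{m} c_{ij}=1,\;\; \|s_j\|\le 1
```

Under this assumed form, minimizing over $C$ with $S$ fixed gives $c_{ij} \propto \exp(\langle \hat{u}_{j|i}, s_j \rangle / \lambda)$, a softmax over the agreement scores, which is the kind of coordinate descent update the text refers to.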
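The “squashing” function in this dynamic routing mechanism is
$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \, \frac{s_j}{\|s_j\|}$$
where $s_j$ is the $j$-th cluster center and $v_j$ is the resulting higher-layer capsule.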
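The two routing steps described above can be illustrated with a minimal NumPy sketch. It is not a reproduction of Algorithm 1: it assumes the routing-by-agreement updates and the squashing function of Sabour et al., uses one shared weight matrix per higher-layer capsule, and the function names, shapes, and the three routing iterations are illustrative choices.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Shrink short vectors toward zero and long vectors to a length just below 1."""
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_routing(u, W, num_iters=3):
    """Route n lower-layer capsules to m higher-layer capsules.

    u: (n, d_in) lower-layer capsules.
    W: (m, d_out, d_in), one shared weight matrix per higher-layer capsule.
    Returns v: (m, d_out) higher-layer capsules.
    """
    # Step 1, linear transformation with shared weights: u_hat[i, j] = W[j] @ u[i].
    u_hat = np.einsum("jok,ik->ijo", W, u)           # (n, m, d_out)
    b = np.zeros(u_hat.shape[:2])                     # routing logits, (n, m)
    for _ in range(num_iters):
        # Step 2, soft clustering: each lower capsule spreads its vote over the m clusters.
        c = softmax(b, axis=1)
        s = np.einsum("ij,ijo->jo", c, u_hat)         # weighted cluster centers, (m, d_out)
        v = squash(s)
        b = b + np.einsum("ijo,jo->ij", u_hat, v)     # raise logits where prediction and output agree
    return v

rng = np.random.default_rng(0)
u = rng.normal(size=(6, 8))                # 6 lower-layer capsules of dimension 8
W = rng.normal(scale=0.1, size=(4, 16, 8)) # 4 higher-layer capsules of dimension 16
v = dynamic_routing(u, W)
print(v.shape, np.linalg.norm(v, axis=1))
```

Because of the squashing step, every output capsule has length strictly below 1, so its length can be read as the probability that the corresponding feature is present.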
3.2 Seq-caps Layer: the basic structure of SECaps model
Because a capsule network treats a feature as an activity vector, it can be used in many natural language processing (NLP) tasks. The input of many NLP tasks is a sequence of words that represents a sentence or a text. Owing to the success of word embeddings [3, 28], each word in the sequence is usually transformed into its distributed representation. This distributed representation can be seen as an activity vector, so a sequence of words can be seen as a group of capsules. We can therefore apply a capsule network to these NLP tasks by setting its first layer to the distributed representations of the word sequence.
However, higher-layer capsules capture the key information of lower-layer capsules through fuzzy clustering, which causes them to lose the sequence information of the input word sequence. In many languages, a word is highly correlated with its context, so losing sequence information weakens the performance of the capsule network on NLP tasks. We therefore propose a new basic structure, named the seq-caps layer, which enhances the capsule layer by taking sequence information into account.
Suppose the input capsules of the seq-caps layer are $u = \{u_1, \dots, u_n\}$. The seq-caps layer has the following two components:
Sequence Information Encoder. It uses a Long Short-Term Memory (LSTM) encoder as a sublayer to restore the sequence information of the input capsules. In this step, we obtain the hidden layer $h = \{h_1, \dots, h_n\}$.
Dynamic Routing Transformer. It transforms the hidden layer $h$ into higher-layer capsules using the dynamic routing mechanism (Algorithm 1). In this step, we obtain the higher-layer capsules $v = \{v_1, \dots, v_m\}$.
Figure 1 shows the framework of our seq-caps layer.
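The two components can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the LSTM is a plain single-layer forward pass with random weights, the routing follows the routing-by-agreement updates of Sabour et al., and all names (`lstm_encode`, `seq_caps_layer`, the `params` keys) and sizes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def squash(s, eps=1e-9):
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def lstm_encode(u, Wx, Wh, bias):
    """Sequence Information Encoder: run an LSTM over u (n, d_in), return all hidden states (n, d_h)."""
    d_h = Wh.shape[0]
    h, c = np.zeros(d_h), np.zeros(d_h)
    states = []
    for u_t in u:
        z = u_t @ Wx + h @ Wh + bias                  # all four gate pre-activations, (4 * d_h,)
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update the cell state
        h = sigmoid(o) * np.tanh(c)                   # hidden state for this position
        states.append(h)
    return np.stack(states)

def dynamic_routing(h, W, num_iters=3):
    """Dynamic Routing Transformer: map n hidden states to m capsules with shared weights W (m, d_out, d_h)."""
    u_hat = np.einsum("jok,ik->ijo", W, h)            # (n, m, d_out)
    b = np.zeros(u_hat.shape[:2])
    for _ in range(num_iters):
        c = softmax(b, axis=1)
        v = squash(np.einsum("ij,ijo->jo", c, u_hat))
        b = b + np.einsum("ijo,jo->ij", u_hat, v)
    return v

def seq_caps_layer(u, params, num_iters=3):
    h = lstm_encode(u, params["Wx"], params["Wh"], params["b"])
    return dynamic_routing(h, params["W_route"], num_iters)

rng = np.random.default_rng(0)
n, d_in, d_h, m, d_out = 5, 8, 8, 3, 6
params = {
    "Wx": rng.normal(scale=0.5, size=(d_in, 4 * d_h)),
    "Wh": rng.normal(scale=0.5, size=(d_h, 4 * d_h)),
    "b": np.zeros(4 * d_h),
    "W_route": rng.normal(scale=0.5, size=(m, d_out, d_h)),
}
u = rng.normal(size=(n, d_in))
v = seq_caps_layer(u, params)
v_reversed = seq_caps_layer(u[::-1], params)          # same words, reversed order
```

Routing alone sums over the lower-layer capsules, so it gives the same output for any ordering of its inputs. With the LSTM in front, `v` and `v_reversed` differ, which is the sequence sensitivity the seq-caps layer is meant to add.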
3.3 SECaps model for charge prediction
In the charge prediction task, the fact description of a case can also be seen as a sequence of words $x = \{x_1, \dots, x_n\}$, where $n$ is the length of the fact description and $x_i$ is a word. Given the fact description $x$, the charge prediction task aims to predict a charge $y$ from a charge set $Y$.
In the real world, some charges (e.g., theft and intentional injury) have a large number of cases, while others, such as scalping relics or disrupting the order of the court, have only a few. This is the so-called few-shot problem. Traditional models pay most attention to the charges with many cases and thus neglect the few-shot charges. To mitigate the effect of the few-shot problem, our SECaps model combines the seq-caps layer with focal loss: the seq-caps layer better captures the prominent features and the semantic information of legal texts, and focal loss alleviates category imbalance to some extent.
Our proposed SECaps model includes four parts: Input, Multiple seq-caps layer, Attention, and Output. Figure 2 shows the architecture of our SECaps model.
Input. In this part, we treat the fact description of a case as a sequence of words, and each word in this sequence is transformed into a primary capsule.
Multiple seq-caps layer. This part consists of two seq-caps layers. We treat the word embeddings as primary capsules, which are transformed into higher-layer capsules. The seq-caps layers output features captured from the fact description of a case. Meanwhile, each seq-caps layer restores the sequence information of the fact description, which is a key factor for charge prediction.
Attention. The attention mechanism is an integral part of compelling sequence modeling and transduction models in various tasks, allowing the modeling of dependencies between the input sequence and the context vector. The Multiple seq-caps layer part can capture some prominent features of the fact description of a case, but cannot capture the global context information. To overcome this shortcoming, we introduce the Attention part, which aims to encode the global context information of the fact description. Suppose the primary capsules from the Input part are $u = \{u_1, \dots, u_n\}$; the global context information vector $g$ is computed as follows:
where $W$ is a weight matrix and $b$ is a bias.
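As an illustration of how a global context vector can be pooled from the primary capsules, the sketch below assumes a common single-layer attention: a score $e_i = \tanh(w^\top u_i + b)$ per capsule, softmax-normalized weights, and a weighted sum. This specific form is an assumption, not necessarily the attention intended here, and treating the weight as a vector `w` is a simplification chosen for the example.

```python
import numpy as np

def attention_pool(u, w, b):
    """Pool primary capsules into one global context vector.

    u: (n, d) primary capsules; w: (d,) weight vector (assumed form); b: scalar bias.
    Returns (g, a): the context vector (d,) and the attention weights (n,).
    """
    e = np.tanh(u @ w + b)        # one unnormalized score per capsule
    a = np.exp(e - e.max())
    a = a / a.sum()               # attention weights sum to 1
    g = a @ u                     # weighted sum of the capsules
    return g, a

rng = np.random.default_rng(0)
u = rng.normal(size=(7, 4))       # 7 words, embedding dimension 4
g, a = attention_pool(u, rng.normal(size=4), 0.0)
print(g.shape, a.sum())
```

With all-zero weights every capsule gets the same score, and the pooled vector reduces to the plain average of the capsules; learned weights let the model emphasize the words that matter for the charge.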
Output. To consider the prominent features and the global context information together, we first flatten all the feature vectors from the Multiple seq-caps layer and concatenate them with the global context vector $g$. We then use a fully connected network and the softmax function to generate the probability distribution $p \in \mathbb{R}^{k}$, where $k$ is the number of charges. Because this part considers both the prominent features of the fact description and the global context information, it makes our model more robust. As the loss function, we apply focal loss to the SECaps model. Focal loss was initially proposed for dense object detection; it addresses class imbalance, and hence the few-shot problem, by reshaping the standard cross-entropy loss so that it down-weights the loss assigned to well-classified examples. It is calculated as follows:
$$FL(p_t) = -\alpha (1 - p_t)^{\gamma} \log(p_t)$$
where $p_t$ is the $t$-th output of $p$, with $t$ the index of the ground-truth charge, $\alpha$ is the weighting factor, and $\gamma$ is the focusing parameter.
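A minimal NumPy sketch of the focal loss described above, assuming the standard form from Lin et al., $-\alpha (1 - p_t)^{\gamma} \log p_t$, applied to softmax outputs and averaged over a batch. The defaults of 0.25 and 2 for the weighting factor and focusing parameter are the commonly used values from that paper; the function name and the clipping constant are illustrative.

```python
import numpy as np

def focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-12):
    """Mean focal loss.

    probs: (batch, k) softmax outputs; targets: (batch,) integer class ids.
    """
    p_t = probs[np.arange(len(targets)), targets]      # probability of the true class
    p_t = np.clip(p_t, eps, 1.0)                       # avoid log(0)
    return float(np.mean(-alpha * (1.0 - p_t) ** gamma * np.log(p_t)))

easy = focal_loss(np.array([[0.9, 0.1]]), np.array([0]))   # well-classified example
hard = focal_loss(np.array([[0.1, 0.9]]), np.array([0]))   # misclassified example
print(easy, hard)
```

The factor $(1 - p_t)^{\gamma}$ shrinks the loss of the well-classified example far more than that of the misclassified one, so abundant easy cases from high-frequency charges contribute little to the gradient. Setting `gamma=0` and `alpha=1` recovers the ordinary cross-entropy loss.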
In order to evaluate the effectiveness of our model for charge prediction, we experiment on several real-world datasets and compare our model with several state-of-the-art baselines.
4.1 Dataset and Settings
In this subsection, we introduce the datasets, evaluation metrics, all baselines and experimental details. At first, we introduce the datasets, and then describe the scoring formula in detail. Next, we describe the baseline methods. Finally, we present the relevant experimental parameters in detail.
As in recent work on charge prediction, the proposed model is evaluated on the datasets of Hu et al., which were collected from criminal cases published by the Chinese government on China Judgments Online.
Following previous work on charge prediction [24, 9], we employ Accuracy (Acc.), Macro Precision (MP), Macro Recall (MR), and Macro F1 as our main evaluation metrics. Macro Precision, Macro Recall, and Macro F1 are among the most widely used performance measures for text classification. The macro strategy computes precision and recall by averaging the per-category precision and recall, which is preferred here because the categories are usually unbalanced and therefore more challenging for classifiers. Macro F1 is computed from the Macro Precision and Macro Recall scores:
$$F_1^{macro} = \frac{2 \cdot MP \cdot MR}{MP + MR}$$
where $MP$ is the Macro Precision and $MR$ is the Macro Recall.
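The macro metrics can be computed as in the following sketch, which assumes Macro F1 is the harmonic mean of Macro Precision and Macro Recall, as the text describes. The function name and the toy labels are illustrative.

```python
def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1) for two label sequences."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted_c = sum(1 for p in y_pred if p == c)
        actual_c = sum(1 for t in y_true if t == c)
        precisions.append(tp / predicted_c if predicted_c else 0.0)
        recalls.append(tp / actual_c if actual_c else 0.0)
    mp = sum(precisions) / len(labels)     # every category counts equally
    mr = sum(recalls) / len(labels)
    f1 = 2 * mp * mr / (mp + mr) if mp + mr else 0.0
    return mp, mr, f1

y_true = ["theft", "theft", "theft", "fraud"]
y_pred = ["theft", "theft", "fraud", "fraud"]
mp, mr, f1 = macro_scores(y_true, y_pred)
print(mp, mr, f1)
```

Because each category contributes equally to the averages, a rare charge weighs as much as a frequent one, which is why macro metrics are sensitive to few-shot performance. Note that this definition differs from averaging the per-category F1 scores, which some toolkits report under the same name.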
We select several representative text classification models and two of the best-performing recent charge prediction models as baselines:
TFIDF+SVM is a simple machine learning model based on a Support Vector Machine (SVM) with a linear kernel, which takes term frequency-inverse document frequency (TFIDF) text features as input. Two basic deep learning models are also compared with our proposed SECaps model: the first is a Convolutional Neural Network (CNN) that encodes fact descriptions with multiple filter widths, and the second employs a two-layer LSTM with a max-pooling layer as the fact encoder.
Moreover, to further illustrate the effectiveness of our model, we compare it with two recent models for the same task, the Fact-Law Attention Model and the Attribute-attentive Charge Prediction Model. The Fact-Law Attention Model is an attention-based neural network method for charge prediction, and the Attribute-attentive Charge Prediction Model, proposed by Hu et al., infers the attributes and charges simultaneously.
Since all the case documents have been employed THULAC
For our proposed SECaps model, we use the Adam optimization method to minimize the focal loss over the training data. For the hyperparameters of Adam and focal loss, we follow the original papers, which report good performance with these values: we set $\alpha = 0.25$ and $\gamma = 2$ for focal loss and the learning rate to 0.001. The model has two seq-caps layers, whose hyperparameters are shown in Table 2. We then use two fully connected layers, with sizes 1024 and 512 respectively, to map the learned distributed feature representation to the label space. Additionally, we use batch normalization to reduce overfitting in the fully connected layers during training. We train the model for a fixed number of epochs and monitor its performance on the validation set. Once training is finished, we choose the model with the best Accuracy on the validation set as our final model and evaluate its performance on the test set.
4.2 Results and Analysis
To illustrate the effectiveness of our proposed SECaps model, we compare it against classical text classification methods and two existing state-of-the-art charge prediction methods on three datasets. In addition, to validate our model on few-shot charge prediction, we run a set of experiments on charges with different frequencies. Finally, to assess the influence of hyperparameters on the SECaps model, we run a set of experiments with different parameter settings.
Table 3 shows the results of our model and the baselines on the test sets. Overall, the SECaps model outperforms all baselines by a significant margin on the three datasets. More specifically, compared with the previous state of the art in charge prediction, our model obtains considerable absolute improvements in Macro F1 across the three datasets, including 4.5% on Criminal-S and 6.4% on Criminal-L, which demonstrates that our model handles charge prediction effectively. Based on this observation, we conclude that our proposed SECaps model generally beats the baselines and obtains state-of-the-art performance.
We attribute this to the seq-caps layer, which supplies global sequence information instead of partial information. Moreover, by combining the Multiple seq-caps layer with the Attention mechanism, our model learns to capture the information that is directly useful for making decisions. Together, these components lead to more accurate predictions and thus better performance.
Few-shot Charges Comparison
Following Hu et al., to further illustrate the effectiveness of the proposed SECaps model in handling few-shot charges, we run a set of experiments on charges split by frequency. We divide the charges into three parts according to their frequencies: low-frequency, medium-frequency, and high-frequency. A charge is low-frequency if it appears at most 10 times in the datasets, high-frequency if it appears more than 100 times, and medium-frequency otherwise. Table 4 shows the Macro F1 of our model on the test set for the low-, medium-, and high-frequency charges. On low-frequency charges, our model improves Macro F1 over the LSTM-200 model and obtains a considerable improvement over the best baseline. The proposed SECaps model not only improves accuracy on few-shot charges but is also an end-to-end model, which reduces manual data mining. Specifically, the SECaps model combines strong vector representations with the ability to represent sequences, and focal loss performs well on imbalanced and hard-to-classify examples, which together relieve the difficulty of few-shot charge prediction.
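The frequency split described above can be sketched as follows: at most 10 occurrences is low-frequency, more than 100 is high-frequency, and anything in between is medium-frequency. The function names and the toy label list are hypothetical.

```python
from collections import Counter

def frequency_bucket(count, low_max=10, high_min=100):
    """Assign a charge to a frequency bucket from its number of cases."""
    if count <= low_max:       # 10 occurrences still counts as low-frequency
        return "low"
    if count > high_min:       # exactly 100 occurrences is not high-frequency
        return "high"
    return "medium"

def bucket_charges(labels):
    """Map each charge label to its bucket, based on how often it occurs in labels."""
    counts = Counter(labels)
    return {charge: frequency_bucket(n) for charge, n in counts.items()}

labels = ["theft"] * 150 + ["fraud"] * 40 + ["scalping relics"] * 3
print(bucket_charges(labels))
```

Macro F1 can then be reported separately over the charges in each bucket, which isolates performance on the few-shot charges from performance on the frequent ones.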
The Impact of Hyperparameter
The two hyperparameters that most influence the structure of our SECaps model are the number of capsules and the dimension of the capsules, so we run a set of experiments with different values of each. For the number of capsules, we vary it from 7 to 12 and keep the rest of the model unchanged. Adding more capsules allows SECaps to capture more vector representations, but may also introduce noise and consequently decrease accuracy, as shown in Figure 3. Starting from 7 capsules, the performance of SECaps improves slightly as capsules are added until 10 capsules are reached; adding further capsules does not improve the performance, which declines slightly. For the capsule dimension, we set it to 10, 12, 14, 16, 18, and 20, use 10 capsules, and keep the remaining parameters unchanged. Figure 4 shows the Macro Precision, Macro Recall, and Macro F1 when the model is trained with capsules of different dimensions. The results improve as the dimension increases up to 16.
More specifically, increasing the number of capsules enriches the vector representation of the text and provides more information, but it also introduces redundant information and increases the risk of overfitting, which degrades the model. Similarly, a higher dimension can reconstruct more information and abstract deeper semantic information, but when the dimension is too large, it may introduce noise and consequently hurt the model.
This paper explored the problem of few-shot charge prediction from the fact descriptions of criminal cases. To alleviate the problem, we introduced a Sequence Enhanced Capsule model for charge prediction. In particular, the seq-caps layer captures the characteristics of the sequence and abstracts deeper semantic features simultaneously, and it is combined with focal loss, which relieves the problem of severe category imbalance. This combination benefits the prediction of few-shot charges. Experiments on real-world datasets show that our proposed SECaps model surpasses existing state-of-the-art methods in Macro F1 by a considerable margin.
In the future, we plan to explore neural network models for real-world cases that involve multiple defendants and multiple charges. Additionally, we only utilize several simple attributes of charges, while there exist more complex essential conditions of charges. We plan to investigate the possibility of applying transfer learning to predict these more complex essential conditions with our proposed SECaps model.
| Layer | Number of capsules | Dimension of capsules | Routing | Length of GRU |
Fig.1 The framework of the seq-caps layer. The input is the lower-layer capsules and the output is the higher-layer capsules.
Fig.2 The architecture of SECaps model, including Input, Multiple seq-caps layer, Attention, and Output.
Fig.3 The number of capsule.
Fig.4 The dimension of capsule.
- IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017. IEEE Computer Society, 2017.
- Zeynep Akata, Florent Perronnin, Zaid Harchaoui, and Cordelia Schmid. Label-embedding for attribute-based classification. In IEEE Conference on Computer Vision and Pattern Recognition, pages 819–826, 2013.
- Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137–1155, 2003.
- Alexis Conneau, Holger Schwenk, Loïc Barrault, and Yann LeCun. Very deep convolutional networks for text classification. pages 1107–1116, 2016.
- Mohamed Elhoseiny, Babak Saleh, and Ahmed Elgammal. Write a classifier: Zero-shot learning using purely textual descriptions. In IEEE International Conference on Computer Vision, pages 2584–2591, 2014.
- Evgeniy Gabrilovich and Shaul Markovitch. Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In IJCAI, volume 7, pages 1606–1611, 2007.
- Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, editors. Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, 2017.
- Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
- Zikun Hu, Xiang Li, Cunchao Tu, Zhiyuan Liu, and Maosong Sun. Few-shot charge prediction with discriminative legal attributes. In Proceedings of the 27th International Conference on Computational Linguistics, pages 487–498, 2018.
- Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
- Dinesh Jayaraman and Kristen Grauman. Zero-shot recognition with unreliable attributes. In International Conference on Neural Information Processing Systems, pages 3464–3472, 2014.
- D. M. Katz, M. J. Bommarito II, and J. Blackman. A general approach for predicting the behavior of the Supreme Court of the United States. PLoS ONE, 12(4), 2014.
- R Keown. Mathematical models for legal prediction. Computer/l.j, 1980.
- Yoon Kim. Convolutional neural networks for sentence classification. Eprint Arxiv, 2014.
- Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Fred Kort. Predicting supreme court decisions mathematically: A quantitative analysis of the “right to counsel” cases. American Political Science Review, 51(1):1–12, 1957.
- C. H. Lampert, H Nickisch, and S Harmeling. Attribute-based classification for zero-shot visual object categorization. IEEE Transactions on Pattern Analysis & Machine Intelligence, 36(3):453–465, 2014.
- Yuquan Le, Zhi-Jie Wang, Zhe Quan, Jiawei He, and Bin Yao. Acv-tree: A new method for sentence similarity modeling. In IJCAI, pages 4137–4143, 2018.
- Tsung Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollar. Focal loss for dense object detection. IEEE Transactions on Pattern Analysis & Machine Intelligence, PP(99):2999–3007, 2017.
- Tsung-Yi Lin, Priyal Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. IEEE transactions on pattern analysis and machine intelligence, 2018.
- Chao Lin Liu, Cheng Tsung Chang, and Jim How Ho. Case instance generation and refinement for case-based criminal summary judgments in Chinese. Journal of Information Science & Engineering, 20(4):783–800, 2008.
- Chao Lin Liu and Chwen Dar Hsieh. Exploring phrase-based classification of judicial documents for criminal charges in chinese. In International Conference on Foundations of Intelligent Systems, pages 681–690, 2006.
- Yi Hung Liu, Yen Liang Chen, and Wu Liang Ho. Predicting associated statutes for legal problems. Information Processing & Management, 51(1):194–211, 2015.
- Bingfeng Luo, Yansong Feng, Jianbo Xu, Xiang Zhang, and Dongyan Zhao. Learning to predict charges for criminal cases with legal basis. pages 2727–2736, 2017.
- Ejan Mackaay and Pierre Robillard. Predicting judicial decisions: The nearest neighbour rule and visual representation of case patterns. Computer/l.j, pages 302–331, 1974.
- Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119, 2013.
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States., pages 3111–3119, 2013.
- Stuart S Nagel. Applying correlation analysis to case prediction. Tex.l.rev, 42(7):1006–1017, 1964.
- Octavia Maria Sulea, Marcos Zampieri, Mihaela Vela, and Josef van Genabith. Exploring the use of text classification in the legal domain. In Proceedings of ASAIL workshop, 2017.
- Sara Sabour, Nicholas Frosst, and Geoffrey E Hinton. Dynamic routing between capsules. 2017.
- Sara Sabour, Nicholas Frosst, and Geoffrey E. Hinton. Dynamic routing between capsules. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 3859–3869, 2017.
- Gerard Salton and Christopher Buckley. Term-weighting approaches in automatic text retrieval. Information processing & management, 24(5):513–523, 1988.
- Johan AK Suykens and Joos Vandewalle. Least squares support vector machine classifiers. Neural processing letters, 9(3):293–300, 1999.
- Wan-Chen Lin, Tsung-Ting Kuo, Tung-Jia Chang, Chueh-An Yen, Chao-Ju Chen, and Shou-de Lin. Exploiting machine learning models for Chinese legal documents labeling, case classification, and sentencing prediction. In Proceedings of ROCLING, 2014.
- Dilin Wang and Qiang Liu. An optimization view on dynamic routing between capsules. In 6th International Conference on Learning Representations 2018, 2018.
- Shuang Wu, Sravanthi Bondugula, Florian Luisier, Xiaodan Zhuang, and Pradeep Natarajan. Zero-shot event detection using multi-modal fusion of weakly supervised concepts. In Computer Vision and Pattern Recognition, pages 2665–2672, 2014.
- Rowan Zellers and Yejin Choi. Zero-shot activity recognition with verb attribute induction. In Conference on Empirical Methods in Natural Language Processing, pages 946–958, 2017.
- Xiang Zhang, Junbo Zhao, and Yann Lecun. Character-level convolutional networks for text classification. pages 649–657, 2015.
- Wei Zhao, Jianbo Ye, Min Yang, Zeyang Lei, Suofei Zhang, and Zhou Zhao. Investigating capsule networks with dynamic routing for text classification. 2018.