Understanding and Monitoring Human Trafficking via Social Sensors: A Sociological Approach
Abstract
Human trafficking is a serious social problem, and it is challenging mainly because of the difficulty of collecting and organizing related information. The increasing popularity of social media platforms provides a novel channel to tackle the problem of human trafficking by detecting and analyzing a large amount of human trafficking related information. Existing supervised learning methods cannot be directly applied to this problem due to the distinct characteristics of social media data. First, the short, noisy, and unstructured textual information makes traditional learning algorithms less effective in detecting human trafficking related tweets. Second, complex social interactions lead to a high-dimensional feature space and thus present great computational challenges. Meanwhile, social science theories such as homophily have been well established and have achieved success in various social media mining applications. Motivated by these sociological findings, in this paper we propose to investigate whether the Network Structure Information (NSI) could be helpful for the human trafficking problem. In particular, a novel mathematical optimization framework is proposed to integrate the network structure into content modeling. Experimental results on a real-world dataset demonstrate the effectiveness of our proposed framework in detecting human trafficking related information.
Introduction
Human trafficking [\citeauthoryearMcGill2003] is the trade in humans, and it is a crime because it violates the victims' freedom of movement through coercion. It includes, for example, forced labor, the extraction of organs or tissues, and forced marriage. It is also the fastest growing crime in the world and one of the largest sources of income for organized crime. Each year, 600,000-800,000 adults and children are trafficked across international borders [\citeauthoryearAndrijasevic2007]. In addition, human trafficking happens not only in many developing countries but also in many developed countries, such as the USA and European countries.
Though many Inter-Governmental Organizations (IGOs) and Non-Governmental Organizations (NGOs) have spent a great deal of time and effort on this problem, human trafficking remains very challenging for the following reasons. First, many organizations lack sufficient and timely data. The human trafficking related data held by these organizations is mainly collected manually from various sources, such as calls, emails, and web applications. Also, while many organizations collect data on their own, it is hard for them to share data with each other in a timely manner. Second, the data collected from multiple sources are unstructured and heterogeneous, which presents great challenges to existing computational methods. Thus, we propose to examine this problem from a novel perspective in our study.
The increasing popularity of social media platforms provides a great opportunity to develop new approaches that help address the human trafficking problem from the data perspective. Social networks carry data from both the victims of human trafficking and their family members. For example, 1) some people look for their missing daughters by posting information online; 2) many victims of human trafficking, once grown up, post tweets on social networks to find their parents. There are also many child beggars, who are victims of human trafficking, and social network users upload information about them. Hence, social networks bridge the gap from both sides. It is meaningful to first collect these tweets and then match them to help people find their family members. However, useful information can easily be buried in the extremely large volume of social media posts. This motivates our study of identifying human trafficking related information on social networks, which could enable a number of related applications in the future.
Identification of human trafficking related information can simply be modeled as a classification problem based on content information or network interactions. However, existing supervised learning methods cannot be directly applied for the following reasons. First, texts in social media are very short: Twitter allows users to post tweets with no more than 140 characters. Because of this short length, the performance of learning methods that depend on similarity measurement between texts is significantly affected. Second, textual features in social media are unstructured. Many short texts are terse and contain misspellings, and users describe the same thing in different and non-standard ways. For instance, some social network users use terms like "gotta", "luv" and "wot". Third, social media data involve complicated user interactions. In addition to content information, the interactions among users are essential but difficult to use in our study. Thus, it is challenging for existing learning methods to accurately identify human trafficking information in social media data, which are short, unstructured, and contain complex user interactions.
Meanwhile, intensive efforts have been made to study the nontrivial properties of social networks [\citeauthoryearNewman2010, \citeauthoryearZhou, Huang, and Schölkopf2005]. It is well established that, on social networks, users and features are correlated with each other and are no longer independent. Users and tweets always form a complex assortative network with strong community structure. In network theory, "homophily", i.e., the assortativity concept [\citeauthoryearZhou, Huang, and Schölkopf2005, \citeauthoryearChung2005], describes the tendency of a network's vertices to attach to others that are similar in some way. As shown in Figure 1, the vertices in a community tend to have similar labels, and vertices that connect with each other probably have the same labels, which is helpful for vertex classification. Motivated by the above studies, we propose to investigate how assortativity can help identify human trafficking related information.
In this paper, we study the problem of identifying and understanding human trafficking on social media. Essentially, through our study, we want to answer the following questions. 1) How to define the problem of human trafficking text identification? 2) How to extract and select the content/network structure information? 3) How to handle the curse of dimensionality of text and structure features? 4) How to integrate network structure and content information in a model? By answering the above questions, the main contributions of this paper can be summarized as follows:
- Formally define the problem of human trafficking text identification with network structure and content features.
- Employ sparse learning methods to automatically select features and learn a simple model.
- Propose a unified model to effectively integrate network structure and content information, and introduce a novel optimization framework.
- Evaluate the proposed NSI model on social network data and demonstrate the effectiveness of NSI.
Problem Description
The notations are introduced in this section, and then we define the problem we study. $\|X\|_{F}$ denotes the Frobenius norm of the matrix $X \in \mathbb{R}^{n \times m}$, where $n$ is the number of tweets and $m$ is the number of features, and it is defined as $\|X\|_{F} = \sqrt{\sum_{i=1}^{n}\sum_{j=1}^{m} X_{ij}^{2}}$. $\|w\|_{1} = \sum_{j=1}^{c} |w_{j}|$ is the $\ell_{1}$-norm of the weight vector $w$, where $c$ is the number of categories, and $\|W\|_{1} = \sum_{i}\sum_{j} |W_{ij}|$ is the $\ell_{1}$-norm of the weight matrix $W$. The trace of a matrix $B$ is $tr(B) = \sum_{i} B_{ii}$, and the transpose of a matrix $B$ is denoted as $B^{T}$.
Graph $G = (V, E)$ denotes the reply and retweet relationships among users, in which the vertices in $V$ represent users in social media, while the edges in $E$ represent the reply or retweet relationships among the users. To classify the tweets into two classes, i.e., whether or not a tweet is about human trafficking, the task is to classify the edges of the graph into two classes (positive or negative), which is an edge classification problem. However, methods that classify edges into two classes are not common. Hence, we convert the node adjacency matrix of $G$ into its edge adjacency matrix, as shown in Figure 2. The main idea of the conversion algorithm is that if two edges share a start or end node, there is a link between them.
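The conversion rule above can be sketched in a few lines (an illustrative implementation of the stated rule, not the authors' code; names are ours): two edges become adjacent in the converted matrix whenever they share an endpoint.

```python
from itertools import combinations

def edge_adjacency(edges):
    """Build the edge adjacency (line-graph) matrix of an undirected graph.

    Two edges are adjacent iff they share an endpoint, matching the
    conversion rule described in the text.
    """
    n = len(edges)
    adj = [[0] * n for _ in range(n)]
    for (i, e1), (j, e2) in combinations(enumerate(edges), 2):
        if set(e1) & set(e2):  # shared start or end node
            adj[i][j] = adj[j][i] = 1
    return adj

# Toy graph: a path a-b-c plus the edge c-d.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
A_edge = edge_adjacency(edges)
```

In the converted graph, each original edge (a retweet/reply interaction carrying a tweet) becomes a vertex, so the edge classification task turns into an ordinary vertex classification task.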
Here we formally define the problem we study as follows: given a set of microblogging tweets with network information $G$, content information $X$, and identity label information $Y$, our aim is to learn a classifier $W$ to automatically identify human trafficking related tweets among unknown tweets.
A Sociological Approach—NSI
In this section, we first model the social network structure information for tweet classification. Then we discuss the methods for modeling the content information. Finally, we integrate the social network structure information and the content information into a unified model and propose an optimization algorithm to solve the human trafficking tweet recognition problem.
Modeling Network Information
Network information plays an important role in solving practical problems, as it contains much useful information that cannot be mined from content information alone. Accordingly, many studies have employed network information in solving real-world problems: sentiment analysis, influential user identification [\citeauthoryearRabade, Mishra, and Sharma2014], recommendation [\citeauthoryearTang, Hu, and Liu2013] and topic detection [\citeauthoryearChen et al.2012]. The concepts of "homophily" and "community structure" in the social sciences indicate that the vertices within each community, and vertices connected with each other, probably have similar labels. Motivated by these theories, we employ homophily and community structure to help identify human trafficking tweets.
Existing work [\citeauthoryearPlatt, Cristianini, and ShaweTaylor1999, \citeauthoryearHansen, Dubayah, and DeFries1996] has been done to classify the vertices of a network into two classes. These methods all operate on directed networks; hence, we treat each edge in our undirected network as a bidirectional edge. Accordingly, the vertex $v$'s in-degree is defined as $d^{-}(v) = \sum_{u:(u,v)\in E} w(u,v)$, and the vertex $v$'s out-degree is defined as $d^{+}(v) = \sum_{u:(v,u)\in E} w(v,u)$. $P$ is defined as the transition probability matrix of the random walk on the graph, with $p(u,v) = w(u,v)/d^{+}(u)$. The stationary distribution $\pi$ of the random walk satisfies $\sum_{u} \pi(u)\, p(u,v) = \pi(v)$. The network information is used to smooth the unified model. The classification problem can be formulated as minimizing

$\Omega(f) = \frac{1}{2} \sum_{(u,v)\in E} \pi(u)\, p(u,v) \left( f(u) - f(v) \right)^{2} \qquad (1)$

where $f(u)$ and $f(v)$ are the predicted labels of users $u$ and $v$. $\mathcal{H}(V)$ is the function space, and $f \in \mathcal{H}(V)$ is the classification function, which assigns a label $\mathrm{sign}(f(v))$ to each vertex $v$. If two tweets $u$ and $v$ are close to each other but have different predicted labels, the above loss function incurs a penalty. To solve Equation (1), we introduce an operator $\Theta$:

$\Theta = \frac{\Pi P + P^{T} \Pi}{2} \qquad (2)$

It has been shown [\citeauthoryearHansen, Dubayah, and DeFries1996] that the objective function can be interpreted as $\Omega(f) = f^{T} L f$, where

$L = \Pi - \Theta = \Pi - \frac{\Pi P + P^{T} \Pi}{2} \qquad (3)$

Here $\Pi$ is a diagonal matrix with entries $\Pi_{vv} = \pi(v)$, $\pi$ denotes the (left) eigenvector of the transition probability matrix $P$, and $P^{T}$ is the transpose of $P$. If the original network is undirected, $L$ reduces (up to a constant factor) to $L = D - A$; $L$ is symmetric and positive semidefinite, where $D$ is the degree matrix and $A$ is the adjacency matrix of the graph.
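As a concrete illustration, the directed Laplacian $L = \Pi - (\Pi P + P^{T}\Pi)/2$ can be computed as below (a minimal numpy sketch assuming a strongly connected graph so that the stationary distribution exists; function and variable names are ours).

```python
import numpy as np

def directed_laplacian(A, tol=1e-12):
    """Laplacian of a (strongly connected) directed graph:
    L = Pi - (Pi P + P^T Pi) / 2, where P is the random-walk
    transition matrix and Pi = diag(pi) holds its stationary
    distribution (the left eigenvector of P)."""
    P = A / A.sum(axis=1, keepdims=True)   # row-stochastic transitions
    pi = np.ones(len(A)) / len(A)
    while True:                            # power iteration: pi P = pi
        nxt = pi @ P
        if np.abs(nxt - pi).max() < tol:
            break
        pi = nxt
    Pi = np.diag(pi)
    return Pi - (Pi @ P + P.T @ Pi) / 2

# For an undirected graph, L is proportional to D - A; for this
# triangle graph (all degrees 2, volume 6), L equals (D - A) / 6.
A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
L = directed_laplacian(A)
```

The resulting matrix is symmetric and positive semidefinite, which is what makes the smoothing term $f^{T} L f$ a valid penalty.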
Modeling Content Information
The most important task is to distinguish human trafficking tweets from other social media posts. Text information [\citeauthoryearHu and Liu2012] is necessary for this task.
One of the most widely used methods is Least Squares [\citeauthoryearLawson and Hanson1974], which is an efficient and interpretable model. The classifier can be learned by solving the following equation:

$\min_{W} \|XW - Y\|_{F}^{2} \qquad (4)$

where $X$ is the content feature matrix of the training data, and $Y$ is the label matrix. This formulation minimizes the learning error between the predicted values and the true values on the training data.
However, the high-dimensional feature space makes the computational task extremely difficult. Sparse learning has shown its effectiveness in handling high-dimensional feature spaces in many real-world applications such as [\citeauthoryearPu and Yang2006] and [\citeauthoryearTibshirani1996]. Hence, we propose to use sparse learning to select the most effective features. Sparse learning methods [\citeauthoryearLiu, Ji, and Ye2009b, \citeauthoryearChen et al.2009] are widely used in many areas, such as sparse latent semantic analysis and image retrieval, and they generate more efficient and interpretable models. A widely used regularized version of least squares is the lasso (least absolute shrinkage and selection operator) [\citeauthoryearTibshirani1996]. Hence, we can obtain a text classifier by solving the least squares problem with an $\ell_{1}$-norm penalty:

$\min_{W} \|XW - Y\|_{F}^{2} + \alpha \|W\|_{1} \qquad (5)$
In many real-world applications, the features are not independent. In our task, features with the same POS tag are related to each other, and these relations among features may have a positive effect on the classification. Hence, to analyze the complex features, we introduce the group sparsity norm $\sum_{i=1}^{g} \|W_{G_i}\|_{2}$ to the model, where $G_i$ is a group of features; this regularizer is considered one of the most popular ones due to its effectiveness, robustness, and efficiency. Another appealing property of this norm is that it encourages multiple predictors to share similar sparsity patterns [\citeauthoryearLiu and Ye2010]. Moreover, the resulting regularization can automatically select variables, perform continuous shrinkage, and select groups of correlated variables [\citeauthoryearFriedman, Hastie, and Tibshirani2010, \citeauthoryearEksioglu2014, \citeauthoryearChen, Gu, and Hero2010, \citeauthoryearBengio et al.2009]. Hence, we obtain a more stable model by solving the equation:

$\min_{W} \|XW - Y\|_{F}^{2} + \alpha \|W\|_{1} + \beta \sum_{i=1}^{g} \|W_{G_i}\|_{2} \qquad (6)$

where $\alpha$ and $\beta$ are positive regularization parameters that control the sparsity and robustness of the learned model.
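For clarity, the objective in Equation (6), a squared loss plus an $\ell_{1}$ term and a group-wise $\ell_{2}$ term, can be evaluated as follows (a sketch; the grouping here is arbitrary, whereas the paper groups features by POS tag).

```python
import numpy as np

def sgl_objective(X, Y, W, groups, alpha, beta):
    """Sparse-group-lasso objective:
    0.5 * ||XW - Y||_F^2 + alpha * ||W||_1 + beta * sum_i ||W_{G_i}||_2.

    `groups` is a list of row-index arrays partitioning the features;
    the grouping used here is illustrative, not the paper's."""
    loss = 0.5 * np.linalg.norm(X @ W - Y) ** 2
    l1 = alpha * np.abs(W).sum()
    grp = beta * sum(np.linalg.norm(W[g]) for g in groups)
    return loss + l1 + grp

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
Y = rng.standard_normal((6, 2))
W = np.zeros((4, 2))
groups = [np.array([0, 1]), np.array([2, 3])]
obj0 = sgl_objective(X, Y, W, groups, alpha=0.1, beta=0.1)
```

With $W = 0$ both penalties vanish and the objective reduces to half the squared norm of $Y$, a handy sanity check when implementing the solver.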
The Objective Function
Traditional text classification methods [\citeauthoryearLiu et al.2011, \citeauthoryearGlickman, Dagan, and Koppel2005] aim to add new features or propose effective classifiers to solve the problem. On the one hand, the dimension of the text features is always high, and traditional methods cannot handle high-dimensional features directly: they have to select features first and then learn a model to classify the texts. Sparse learning, which can automatically select features while learning a model, is a good choice for this problem. On the other hand, many traditional methods assume that the features of texts are independent, whereas some features are in fact correlated with others. It is of great importance to consider networked or grouped features in the learning task. Hence, we employ the sparse group lasso to analyze the relationships among different features.
Furthermore, network structure information plays an important role in identifying human trafficking tweets. Assortativity and community structure are used to model the behaviors among users, and this behavioral information contains much useful signal that the text information does not. Hence, we further integrate the two kinds of features.
We propose to consider both network and content information in a unified model. By considering both, the human trafficking tweet recognition problem can be formulated as the optimization problem:

$\min_{W} \|XW - Y\|_{F}^{2} + \alpha \|W\|_{1} + \beta \sum_{i=1}^{g} \|W_{G_i}\|_{2} + \lambda\, tr\big( (XW)^{T} L\, (XW) \big) \qquad (7)$
The optimal $W$ can be derived by solving the above equation. Then, we can use the following equation to predict the label of an unknown tweet with feature vector $x$:

$\hat{y} = \arg\max_{j} \, (x W)_{j} \qquad (8)$
An Optimization Algorithm
The optimization problem in Equation (7) is convex and nonsmooth. We reformulate this nonsmooth optimization problem to obtain an equivalent smooth convex optimization problem, following the main ideas of [\citeauthoryearLiu and Ye2010, \citeauthoryearNesterov2004].
By converting the $\ell_{1}$-norm into a constraint, we can reformulate Equation (7) as a constrained nonsmooth convex optimization problem:

$\min_{W} \|XW - Y\|_{F}^{2} + \beta \sum_{i=1}^{g} \|W_{G_i}\|_{2} + \lambda\, tr\big( (XW)^{T} L\, (XW) \big), \quad \mathrm{s.t.}\ \|W\|_{1} \le z \qquad (9)$

where $z \ge 0$ controls the sparsity of $W$.
The constraint $\|W\|_{1} \le z$ defines a closed and convex set. The objective function is convex but not differentiable: $\|w^{i}\|_{2}$ is differentiable everywhere except when $\|w^{i}\|_{2} = 0$, or equivalently when the row $w^{i}$ is the all-zero vector [\citeauthoryearLiu and Ye2010, \citeauthoryearEksioglu2014]. At the point $w^{i} = 0$, the subdifferential includes the all-zero vector as a legitimate subgradient, that is, $0 \in \partial \|w^{i}\|_{2}$ when $w^{i} = 0$. Motivated by the work of [\citeauthoryearNie et al.2010, \citeauthoryearChen, Gu, and Hero2010], we propose an iterative algorithm. By denoting $D$ as a diagonal matrix with $D_{ii} = \frac{1}{2\|w^{i}\|_{2}}$, Equation (9) can be reformulated as a convex and smooth optimization problem:

$\min_{W} \mathcal{J}(W) = \|XW - Y\|_{F}^{2} + \beta\, tr(W^{T} D W) + \lambda\, tr\big( (XW)^{T} L\, (XW) \big) \qquad (10)$
The derivative of $\mathcal{J}(W)$ is as follows:

$\frac{\partial \mathcal{J}}{\partial W} = 2 X^{T} (XW - Y) + 2 \beta D W + 2 \lambda X^{T} L X W \qquad (11)$
Setting the derivative of $\mathcal{J}(W)$ to zero yields

$W = \big( X^{T} X + \beta D + \lambda X^{T} L X \big)^{-1} X^{T} Y \qquad (12)$
The solution for $W$ depends on $D$, which in turn depends on $W$. Therefore, $W$ cannot be obtained directly. Instead, we propose an algorithm that optimizes the two alternately in Algorithm 1, following [\citeauthoryearNesterov2004, \citeauthoryearLiu, Ji, and Ye2009a].
The basic idea of the proposed algorithm is to reformulate the nonsmooth optimization problem as an equivalent smooth convex optimization problem. In the algorithm, we use $D$ to continually update $W$ from lines 1 to 6. The positive semidefinite (PSD) constraint need not be enforced at each iterative step; for efficiency, we perform the PSD projection only on the final weight matrix at lines 7-8. The tolerance $\epsilon$ is set in the projection function.
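The alternating scheme behind Algorithm 1 can be sketched as follows, assuming the reformulation above: with $D$ fixed, $W$ has the closed form of Equation (12); with $W$ fixed, $D$ is recomputed from the row norms. The $\ell_{1}$ and PSD projection steps are omitted here, and the names and stopping rule are ours.

```python
import numpy as np

def nsi_fit(X, Y, L, beta=0.1, lam=0.1, n_iter=50, eps=1e-8):
    """Alternate between the closed-form W of Eq. (12) and the
    diagonal reweighting matrix D with D_ii = 1 / (2 ||w^i||_2)
    (Nie et al.-style trick). A sketch, not the authors' code."""
    n, d = X.shape
    D = np.eye(d)
    M = X.T @ X + lam * X.T @ L @ X        # fixed part of Eq. (12)
    for _ in range(n_iter):
        W = np.linalg.solve(M + beta * D, X.T @ Y)
        row_norms = np.sqrt((W ** 2).sum(axis=1)) + eps  # eps guards zero rows
        D = np.diag(1.0 / (2.0 * row_norms))
    return W

def predict(X, W):
    return np.sign(X @ W)

# Toy run: labels depend only on the first two of five features,
# with the network term switched off (L = 0, lam = 0).
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 5))
w_true = np.array([[1.0], [-1.0], [0.0], [0.0], [0.0]])
Y = np.sign(X @ w_true)
L = np.zeros((20, 20))
W = nsi_fit(X, Y, L, beta=0.01, lam=0.0)
```

The `eps` term keeps $D_{ii}$ finite when a row of $W$ shrinks to zero, which is exactly the nondifferentiable point discussed above.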
Time complexity analysis
It takes $O(n^{2})$ time to calculate the Laplacian matrix before the iterative procedure, where $n$ is the number of samples. Updating $D$ and inverting the matrix in Equation (12) take $O(m^{3})$, where $m$ is the number of features; the term $\beta D$, being a diagonal matrix, makes the inverse more stable. Obtaining $W$ additionally requires the matrix products, i.e., $O(n m^{2})$. Hence, each iterative step costs $O(n m^{2} + m^{3})$. Suppose the optimization algorithm takes $t$ iterations; the overall time complexity is then $O(n^{2} + t\,(n m^{2} + m^{3}))$.
Experiments
In this section, we evaluate the effectiveness of the proposed method and analyze the contributions of the network structure and content information. The experiments focus on answering the following questions:
- How effective is the proposed method compared with the baseline methods?
- What are the effects of the network structure and content information?
We begin by introducing the data set and experimental setup, then compare the performance of different traditional machine learning methods. Finally, we study the effects of the network and content features on the proposed method.
Data set
The real-world Weibo data set was crawled from September 2014 to October 2014. Weibo is a Chinese microblogging website akin to Twitter. We sample 14,151 tweets that contain the keywords "human trafficking" and "missing people". Then, 5 students annotate the 14,151 tweets, yielding 1,404 positive samples. All tweets are marked as positive or negative according to whether they are related to human trafficking. Each tweet is retweeted or replied to 16.8 times on average. The retweet/reply frequencies follow a power-law distribution, which indicates that a few tweets draw much attention while most tweets are neglected.
We investigate these tweets and propose general and domain-specific features:

- Word-based features: unigrams, unigrams+bigrams, and POS-colored unigrams+bigrams. POS tagging is done with the Jieba package. When the corpus is large, the dimensions of the unigram and unigram+bigram features are too high for a PC to handle. Hence, we choose the POS-colored unigrams+bigrams feature.
- Tag-based features: most of the human trafficking tweets have tags, and having a tag in a tweet may help more users reach the information.
- Morphological features: the frequencies of
  - numbers in the sentence,
  - question marks in the sentence,
  - exclamation marks in the sentence,
  - quantifiers in the sentence.
- NER features: most human trafficking related tweets contain a name, location, organization, and time.
- Tweet features: the length of the tweet.
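As an illustration, the count-based morphological and tweet features above can be extracted as follows (a sketch; the quantifier and NER features would require a Chinese NLP toolkit such as Jieba and are omitted).

```python
import re

def morphological_features(tweet):
    """Count-based features from the list above: digits, question marks,
    exclamation marks, plus the tweet length. Both ASCII and full-width
    Chinese punctuation are counted."""
    return {
        "num_count": len(re.findall(r"\d", tweet)),
        "question_count": tweet.count("?") + tweet.count("？"),
        "exclaim_count": tweet.count("!") + tweet.count("！"),
        "length": len(tweet),
    }

feats = morphological_features("Missing girl, age 8, last seen 10am! Please help!")
```

Concatenating these counts with the word-based and tag-based features yields the content feature vector for one tweet.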
Performance Evaluation
Precision, recall, and F1-measure are used as the performance metrics; the F1-measure is the harmonic mean of precision and recall. To answer the first question, we compare our proposed method with the baseline methods: SVM [\citeauthoryearSuykens and Vandewalle1999], Logistic Regression (LR) [\citeauthoryearHosmer Jr and Lemeshow2004], Gaussian Naive Bayes (GNB) [\citeauthoryearJohn and Langley1995], SGD [\citeauthoryearXiao2009], Decision Tree (DT) [\citeauthoryearFriedl and Brodley1997] and Random Forests (RF) [\citeauthoryearLiaw and Wiener2002]. All the methods utilize both the social network structure and the content information; the combined feature is the linear combination of the social network structure feature and the content feature.
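For reference, the three metrics can be computed from raw predictions as follows (a standard implementation, not specific to this paper).

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and their harmonic mean (F1) for one class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# 3 true positives, 1 false positive, 1 false negative.
p, r, f1 = precision_recall_f1([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 1, 0])
```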
(1) According to the results in Table 1 and Figure 3, we can conclude that our model NSI outperforms the other methods in precision and F1-measure. Two-sample one-tailed t-tests are applied to compare NSI with the other methods in Section IV, which indicates the effectiveness of our model.
(2) As shown in Figure 3, the NSI model, decision tree, and random forests have relatively balanced precision and recall, while most of the baseline methods have high recall but low precision. The performance of SGD is not as good as that of SVM; one reason is that the poor performance of L2 regularization is linked to its rotational invariance.
Methods   F1     Precision  Recall
SVM       0.783  0.729      0.845
LR        0.815  0.758      0.880
GNB       0.581  0.706      0.493
SGD       0.719  0.681      0.761
DT        0.797  0.807      0.788
RF        0.785  0.786      0.784
NSI       0.871  0.872      0.870
None of the methods solves the classification problem perfectly, for the following reasons:
1) Though we provide instructions for every annotator, some tweets are so ambiguous that the annotators cannot determine their class.
2) Some tweets search for a charming stranger whom the user met by chance in the real world. These are "people-searching" tweets, and their features are similar to those of human trafficking related tweets.
Case study
Table 2 shows examples of human trafficking tweets. The first column is the label of the tweet, and the second column is the tweet content. The names of missing people and HTTP links are replaced with the hashtags #Name# and #http#, respectively. The 1st tweet in Table 2 is a typical human trafficking tweet: it contains detailed information about the missing person, including name, age, height, and clothing. The 3rd tweet is also a human trafficking tweet, posted by a victim of human trafficking. We intend to identify these tweets first and then match them. Almost all machine learning methods can correctly label the 1st, 3rd, 4th, and 5th tweets, while traditional machine learning methods label the 2nd tweet as a negative sample because it does not contain any detailed information. However, our model can classify this tweet correctly, as it is retweeted by many users who have also retweeted many other human trafficking tweets. These retweet/reply behaviors are incorporated into the Laplacian matrix $L$ in our model. By introducing this matrix, the model smooths the weights $W$, which increases the precision of the model but decreases the recall to some extent. That is why our model achieves good performance with relatively balanced precision and recall.
Label  Tweet
1
1      #Name# from Hubei was last seen at 10am #http#
1
0
0      Missing people that don't miss me.
Effects of Network and Content Information
In this subsection, we study the contributions of the network structure and content information by comparing our proposed method with the following two groups of methods: 1) content feature: the traditional methods are employed to classify human trafficking tweets based on content information only; 2) network feature: the traditional methods are applied to the network information only.
(1) Considering the content feature only, most of the methods cannot balance precision and recall, as shown in Table 3. The precision of SGD reaches 97%, while its recall is 34.8%. SVM, logistic regression, and Gaussian Naive Bayes have higher recall, while the other methods achieve high precision. Our NSI model, which achieves relatively balanced precision and recall, has the highest F1-value.
(2) Considering the network feature only, almost all methods fail to classify the tweets into the right category. The precision of GNB is 79.3%, the highest among the methods, while its recall is just 29.6%. In contrast, the recall of SVM is 93.8%, but its precision is only 56.9%.
(3) In conclusion, with the integrated features, the recall of all methods rises except for SVM, Gaussian Naive Bayes, and SGD, while the precision of all methods rises except for SVM, logistic regression, and SGD. All the methods achieve relatively balanced precision and recall. The reason is that the network structure features successfully smooth the weights of the content features; hence, these features play an important role in predicting the true label of a new tweet.
Methods   Content feature              Network feature
          F1     Precision  Recall     F1     Precision  Recall
SVM       0.834  0.744      0.948      0.708  0.569      0.938
LR        0.823  0.780      0.871      0.694  0.571      0.885
GNB       0.755  0.705      0.812      0.431  0.793      0.296
SGD       0.512  0.970      0.348      0.512  0.538      0.489
DT        0.764  0.793      0.738      0.427  0.648      0.318
RF        0.769  0.769      0.770      0.331  0.348      0.315
NSI       0.855  0.856      0.854      0.409  0.695      0.290
Parameter Analysis
There are three positive parameters involved in the experiments, $\alpha$, $\beta$, and $\lambda$, as shown in Algorithm 1. $\alpha$ controls the sparsity of the learned model, $\beta$ controls the group sparsity, and $\lambda$ controls the contribution of the network information. As is common practice, all the parameters can be tuned via cross-validation with validation data, and in the experiments we empirically set $\alpha$, $\beta$, and $\lambda$ for general experiment purposes. In this section, we further explore the relation between $\alpha$ and $\beta$.
The performance of the NSI model improves as $\alpha$ and $\beta$ increase, the F1-value reaches a peak, and with further increases of $\alpha$ and $\beta$ the F1-value begins to drop. This indicates that the NSI model achieves good performance at moderate values of $\alpha$ and $\beta$.
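The cross-validation tuning mentioned above amounts to a grid search over the parameter values (a generic sketch; `score_fn` stands in for training NSI on a fold with a given $(\alpha, \beta)$ and returning the validation F1-value).

```python
from itertools import product

def grid_search(score_fn, alphas, betas):
    """Return the (alpha, beta) pair with the best validation score.
    `score_fn` is a stand-in for fitting the model on a training fold
    and scoring it on a validation fold: any callable (a, b) -> float."""
    return max(product(alphas, betas), key=lambda ab: score_fn(*ab))

# Toy score surface peaking at alpha=0.1, beta=0.01 (illustrative only,
# not the paper's measured F1 surface).
score = lambda a, b: -((a - 0.1) ** 2 + (b - 0.01) ** 2)
best = grid_search(score, [0.001, 0.01, 0.1, 1.0], [0.001, 0.01, 0.1, 1.0])
```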
Conclusion and Future Work
The emergence of social networks provides a great opportunity to recognize and understand human trafficking tweets. In this paper, we exploit the social network structure information to perform effective human trafficking tweet recognition. In particular, the proposed NSI models the network and content information in a unified way, and an efficient algorithm is proposed to solve the resulting nonsmooth convex optimization problem. Experimental results on a real Weibo data set indicate that our model NSI can effectively detect human trafficking tweets and outperforms alternative supervised learning methods.
There still exist open questions in tackling the problem of human trafficking, and this work can be extended in several directions. Other sparse learning methods can be introduced to analyze the complex structure of textual features. Many human trafficking tweets contain images, so face/gender/age detection algorithms on these images are of great importance. It is also meaningful to apply semantic analysis across social media sites to better understand the characteristics of trafficking victims from text features. Finally, how to utilize distributed systems to analyze the large data and solve the optimization problem is a promising direction.
References
 [\citeauthoryearAndrijasevic2007] Andrijasevic, R. 2007. Beautiful dead bodies: gender, migration and representation in anti-trafficking campaigns. Feminist Review 86(1):24–44.
 [\citeauthoryearBengio et al.2009] Bengio, S.; Pereira, F.; Singer, Y.; and Strelow, D. 2009. Group sparse coding. In Advances in Neural Information Processing Systems, 82–89.
 [\citeauthoryearChen et al.2009] Chen, X.; Pan, W.; Kwok, J. T.; and Carbonell, J. G. 2009. Accelerated gradient method for multi-task sparse learning problem. In Data Mining, 2009. ICDM'09. Ninth IEEE International Conference on, 746–751. IEEE.
 [\citeauthoryearChen et al.2012] Chen, Y.; Li, Z.; Nie, L.; Hu, X.; Wang, X.; Chua, T.-S.; and Zhang, X. 2012. A semi-supervised Bayesian network model for microblog topic classification. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012).
 [\citeauthoryearChen, Gu, and Hero2010] Chen, Y.; Gu, Y.; and Hero, A. O. 2010. Regularized least-mean-square algorithms. arXiv preprint arXiv:1012.5066.
 [\citeauthoryearChung2005] Chung, F. 2005. Laplacians and the Cheeger inequality for directed graphs. Annals of Combinatorics 9(1):1–19.
 [\citeauthoryearEksioglu2014] Eksioglu, E. M. 2014. Group sparse RLS algorithms. International Journal of Adaptive Control and Signal Processing 28(12):1398–1412.
 [\citeauthoryearFriedl and Brodley1997] Friedl, M. A., and Brodley, C. E. 1997. Decision tree classification of land cover from remotely sensed data. Remote sensing of environment 61(3):399–409.
 [\citeauthoryearFriedman, Hastie, and Tibshirani2010] Friedman, J.; Hastie, T.; and Tibshirani, R. 2010. A note on the group lasso and a sparse group lasso. arXiv preprint arXiv:1001.0736.
 [\citeauthoryearGlickman, Dagan, and Koppel2005] Glickman, O.; Dagan, I.; and Koppel, M. 2005. A probabilistic classification approach for lexical textual entailment. In Twentieth National Conference on Artificial Intelligence (AAAI).
 [\citeauthoryearHansen, Dubayah, and DeFries1996] Hansen, M.; Dubayah, R.; and DeFries, R. 1996. Classification trees: an alternative to traditional land cover classifiers. International journal of remote sensing 17(5):1075–1081.
 [\citeauthoryearHosmer Jr and Lemeshow2004] Hosmer Jr, D. W., and Lemeshow, S. 2004. Applied logistic regression. John Wiley & Sons.
 [\citeauthoryearHu and Liu2012] Hu, X., and Liu, H. 2012. Text analytics in social media. In Mining text data. Springer. 385–414.
 [\citeauthoryearHu et al.2013] Hu, X.; Tang, J.; Zhang, Y.; and Liu, H. 2013. Social spammer detection in microblogging. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2633–2639. AAAI Press.
 [\citeauthoryearJohn and Langley1995] John, G. H., and Langley, P. 1995. Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, 338–345. Morgan Kaufmann Publishers Inc.
 [\citeauthoryearLawson and Hanson1974] Lawson, C. L., and Hanson, R. J. 1974. Solving least squares problems, volume 161. SIAM.
 [\citeauthoryearLiaw and Wiener2002] Liaw, A., and Wiener, M. 2002. Classification and regression by randomForest. R News 2(3):18–22.
 [\citeauthoryearLiu and Ye2010] Liu, J., and Ye, J. 2010. Fast overlapping group lasso. arXiv preprint arXiv:1009.0306.
 [\citeauthoryearLiu et al.2011] Liu, T.; Du, X.; Xu, Y.; Li, M.; and Wang, X. 2011. Partially supervised text classification with multi-level examples. In Twenty-Fifth AAAI Conference on Artificial Intelligence.
 [\citeauthoryearLiu, Ji, and Ye2009a] Liu, J.; Ji, S.; and Ye, J. 2009a. Multi-task feature learning via efficient ℓ2,1-norm minimization. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, 339–348. AUAI Press.
 [\citeauthoryearLiu, Ji, and Ye2009b] Liu, J.; Ji, S.; and Ye, J. 2009b. SLEP: Sparse learning with efficient projections. Arizona State University 6:491.
 [\citeauthoryearMcGill2003] McGill, C. 2003. Human traffic sex, slaves and immigration. Vision Paperbacks.
 [\citeauthoryearNesterov2004] Nesterov, Y. 2004. Introductory lectures on convex optimization, volume 87. Springer Science & Business Media.
 [\citeauthoryearNewman2010] Newman, M. 2010. Networks: an introduction. Oxford University Press.
 [\citeauthoryearNie et al.2010] Nie, F.; Huang, H.; Cai, X.; and Ding, C. H. 2010. Efficient and robust feature selection via joint ℓ2,1-norms minimization. In Advances in Neural Information Processing Systems, 1813–1821.
 [\citeauthoryearPlatt, Cristianini, and ShaweTaylor1999] Platt, J. C.; Cristianini, N.; and Shawe-Taylor, J. 1999. Large margin DAGs for multiclass classification. In NIPS, volume 12, 547–553.
 [\citeauthoryearPu and Yang2006] Pu, Q., and Yang, G.-W. 2006. Short-text classification based on ICA and LSA. In Advances in Neural Networks - ISNN 2006. Springer. 265–270.
 [\citeauthoryearRabade, Mishra, and Sharma2014] Rabade, R.; Mishra, N.; and Sharma, S. 2014. Survey of influential user identification techniques in online social networks. In Recent Advances in Intelligent Informatics. Springer. 359–370.
 [\citeauthoryearSuykens and Vandewalle1999] Suykens, J. A., and Vandewalle, J. 1999. Least squares support vector machine classifiers. Neural processing letters 9(3):293–300.
 [\citeauthoryearTang, Hu, and Liu2013] Tang, J.; Hu, X.; and Liu, H. 2013. Social recommendation: a review. Social Network Analysis and Mining 3(4):1113–1133.
 [\citeauthoryearTibshirani1996] Tibshirani, R. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological) 267–288.
 [\citeauthoryearXiao2009] Xiao, L. 2009. Dual averaging method for regularized stochastic learning and online optimization. In Advances in Neural Information Processing Systems, 2116–2124.
 [\citeauthoryearZhou, Huang, and Schölkopf2005] Zhou, D.; Huang, J.; and Schölkopf, B. 2005. Learning from labeled and unlabeled data on a directed graph. In Proceedings of the 22nd international conference on Machine learning, 1036–1043. ACM.