Knowledge-Enhanced Attentive Learning for Answer Selection in Community Question Answering Systems
In community question answering (CQA) systems, the answer selection task aims to identify the best answer for a specific question, and thus plays a key role in enhancing service quality by recommending appropriate answers for new questions. Recent advances in CQA answer selection focus on enhancing performance by incorporating community information, particularly the expertise (previous answers) and authority (position in the social network) of an answerer. However, existing approaches for incorporating such information are limited in (a) considering either the expertise or the authority, but not both; (b) ignoring the domain knowledge needed to differentiate the topics of previous answers; and (c) simply using the authority information to adjust the similarity score, instead of fully utilizing it in the process of measuring the similarity between segments of the question and the answer. We propose the Knowledge-enhanced Attentive Answer Selection (KAAS) model, which enhances performance by (a) considering both the expertise and the authority of the answerer; (b) utilizing the human-labeled tags, the taxonomy of the tags, and the votes as domain knowledge to infer the expertise of the answerer; and (c) using matrix decomposition of the social network (formed by the following relationship) to infer the authority of the answerer and incorporating such information in the process of evaluating the similarity between segments. In addition, for vertical communities, we incorporate an external knowledge graph to capture more professional information. We then adopt the attention mechanism to integrate the analysis of the text of questions and answers with the aforementioned community information. Experiments with both vertical and general CQA sites demonstrate the superior performance of the proposed KAAS model.
Community question answering (CQA) systems can utilize the expertise of the community to provide timely and personalized service to Web users, and thus have emerged as a key information acquisition platform for both general (e.g. Quora
Answer selection in CQA involves knowledge management and machine learning techniques, with the primary focus on natural language processing (NLP) and knowledge graphs (e.g. question classification and measuring semantic similarity [7, 38]), because such online communities usually do not reveal the identity and detailed demographic information of users [8, 9, 10, 11, 12]. Typically, a long short-term memory (LSTM) framework is employed to learn the text representations and extract features [13, 14, 15, 16]. The attention mechanism has emerged as a common framework for this task due to its capability to capture the interrelations between different segments of the question and the answer [17, 18, 19]. To further enhance performance, recent advances in CQA answer selection [2, 20, 21, 22, 23, 24] go beyond pure NLP by incorporating community information, particularly the expertise (previous answers) and authority (position in the social network) of an answerer. Existing methods mainly adopt a two-phase approach, in which certain statistics are calculated and then imported into the downstream answer selection task. For instance, the count of an answerer’s followers indicates his or her authority; the count of votes/agrees/thanks received by an answerer indicates the quality of his or her previous answers [2, 23]. More recent studies utilize the text and tags of an answerer’s previous answers to infer the domain expertise of the answerer [20, 21, 22, 24]. Despite being effective, existing methods for incorporating community information are limited:
(a) They only consider either the expertise or the authority, but not both. In practice, both types of information may contribute to predicting the quality and relevance of an answer.
(b) Domain knowledge is not fully utilized to differentiate topics of previous answers. To be specific, previous studies [21, 22] depict the expertise of an answerer through analyzing the textual information of all the answerer’s previous answers. This approach is appropriate for vertical CQA communities since the answerers (e.g. a clinical doctor) are not likely to answer questions that are irrelevant to their expertise. However, this approach is limited for general CQA communities, in which users often answer various questions. If the full text of an answerer’s previous answers is used, we may include irrelevant expertise information. For example, a user’s previous answer about Chinese history does not provide any information for a question about a machine learning algorithm. In addition, using the full text of all previous answers requires huge computational resources.
(c) The expertise and authority information (e.g. the count of followers and the expertise of the followers) is only used to adjust the final similarity score (e.g. [22, 24]), instead of being fully utilized in the process of measuring the similarity between segments of the question and the answer.
More specifically, existing approaches assume that an answerer’s authority can be represented by a linear combination of the followers’ specialties. The similarity between the question and an answerer is first derived by a model and then adjusted by the answerer’s authority. This is problematic because different specialties are not entirely independent of each other; there could be interrelations among two or more specialties. In addition, such expertise and authority information could be fully utilized in the NLP process that evaluates the similarities among segments of questions and answers.
To fill the aforementioned gaps, we propose the Knowledge-enhanced Attentive Answer Selection (KAAS) model (the framework is shown in Figure 1). First, we introduce an expertise matrix and an authority matrix to capture the expertise information from historical answers and the authority information from the answerer’s followers in the social network, respectively. The two matrices share the same (human-labeled) tag dimension, which represents the predefined topic/specialty structure. To address the sparsity problem caused by the large number of tags, we utilize the taxonomy of tags to group tags that are semantically similar to each other. Second, we extract the latent feature matrices for expertise and authority by decomposing the two corresponding matrices. Third, we adopt an attention mechanism to examine the similarity among segments of questions and answers. The answer’s attention representation is adjusted by the extracted expertise and authority features of the answerer. Finally, the similarity between the question and a candidate answer is generated by taking the inner product of the attentive representation of the question and the adjusted attentive representation of the answer. For the vertical community (i.e., HealthTap, a medical CQA site), we additionally introduce an external knowledge graph (i.e., a health knowledge graph) and embed it before attentive learning. Experiments with both vertical and general CQA sites demonstrate the superior performance of the proposed KAAS model.
There are three main contributions of our paper. First, KAAS incorporates both the expertise and the authority of the answerer to enhance performance. Second, we utilize the human-labeled tags and votes as domain knowledge to infer the expertise of the answerer. Third, we propose a matrix decomposition-based method to infer the authority of the answerer and incorporate such information in the process of evaluating the similarity between segments.
2 Related Work
2.1 The general neural network framework for answer selection
To capture the sequential contextual features in free text, LSTM, particularly bidirectional LSTM (biLSTM), has been the basic modeling framework for answer selection. Figure 2a presents this general framework. First, an embedding method (e.g. word2vec) is used to encode the text. Second, we use biLSTM to generate the question feature matrix Q and answer feature matrix A, respectively. Third, column-wise max pooling (or another pooling method) is used to transform the feature matrices into the representing vectors of the question and answer. Last, the similarity score is obtained by calculating the cosine similarity between the representing vectors.
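The pooling and scoring steps of this general framework can be illustrated with a minimal sketch. The toy matrices below are hypothetical stand-ins for biLSTM outputs, and the sketch assumes each row is one time step, so pooling takes the maximum over time for every hidden dimension.

```python
import math

def column_max_pool(features):
    """Max over time steps for each hidden dimension.

    `features` is a list of time steps; each time step is a list of
    hidden-unit values (assumed layout: rows = time, columns = units).
    """
    return [max(column) for column in zip(*features)]

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

# Hypothetical feature matrices: 3 question steps and 2 answer steps, 3 hidden units.
Q = [[0.1, 0.9, 0.0], [0.4, 0.2, 0.3], [0.0, 0.5, 0.8]]
A = [[0.3, 0.7, 0.1], [0.2, 0.9, 0.6]]

q_vec = column_max_pool(Q)
a_vec = column_max_pool(A)
score = cosine_similarity(q_vec, a_vec)
```

Because max pooling removes the time dimension, questions and answers of different lengths map to vectors of the same size, which is what makes the cosine comparison possible.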
There are a number of variants that improve over the base biLSTM model, including replacing biLSTM with a convolutional neural network (CNN) as the sentence model, using both CNN and biLSTM to jointly learn feature matrices, and using multiple biLSTM components to learn the similarity from the feature matrices/vectors.
2.2 The attentive framework for answer selection
Attentive pooling is a method that enables the pooling layer to be aware of the input pair. With an attention mechanism, the information of each input item directly influences the calculation of the other’s representation, which enhances the capability to evaluate the similarity between the two inputs. The attentive pooling method (shown in Figure 2b) has recently been adopted as a standard for the answer selection task, in which the two inputs naturally represent the question matrix and the answer matrix (from the biLSTM component) [32, 33]. A recent study further extends the attention matrix to a 3rd-order tensor to consider the relationships among segments within questions or answers.
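As a rough illustration of two-way attentive pooling, the sketch below follows the commonly used formulation in which a soft alignment matrix tanh(Qᵀ U A) is max-pooled along each axis and normalized with a softmax. The matrix `U`, the sizes, and the random inputs are assumptions made for the example, not values from this paper.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

def attentive_pooling(Q, A, U):
    """Q: (c, M) question features, A: (c, L) answer features, U: (c, c)."""
    G = np.tanh(Q.T @ U @ A)       # (M, L) soft alignment between segments
    sigma_q = softmax(G.max(axis=1))  # importance of each question segment
    sigma_a = softmax(G.max(axis=0))  # importance of each answer segment
    r_q = Q @ sigma_q              # (c,) attended question representation
    r_a = A @ sigma_a              # (c,) attended answer representation
    cos = float(r_q @ r_a / (np.linalg.norm(r_q) * np.linalg.norm(r_a)))
    return cos, sigma_q, sigma_a

rng = np.random.default_rng(0)
c, M, L = 4, 5, 7                   # hypothetical hidden size and lengths
Q = rng.standard_normal((c, M))
A = rng.standard_normal((c, L))
U = rng.standard_normal((c, c))     # learned in practice; random here
score, sigma_q, sigma_a = attentive_pooling(Q, A, U)
```

The attention weights of the question depend on the answer and vice versa, which is the sense in which the pooling layer becomes aware of the input pair.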
2.3 Recent advances in incorporating community information
Recent advances [2, 20, 21, 22, 23, 24] in CQA answer selection go beyond pure text mining and incorporate community information. In particular, we can evaluate the answerer’s expertise by analyzing his or her previous answers, and estimate the authority of an answerer by examining his or her topological position in the community. Zhao et al. (2017) propose the Asymmetric Multi-Faceted Ranking Network Learning (AMRNL) model, which uses the count of an answerer’s followers to indicate the authority and adjusts the similarity score by introducing the question-authority matching score. Sha et al. (2018) borrow the idea of residual networks and propose the Multi-View Fusion Neural Network (MVFNN) model to take the topics of the question into consideration. Wen et al. (2019) propose the Hybrid Attentive with Deep Users (UIA-LSTM) model, which combines the text of the candidate answer and the text of the corresponding answerer’s previous answers for the subsequent attentive pooling procedure [21, 22]. This approach is effective for vertical CQA communities, where users seldom answer questions outside their domain expertise. However, it might introduce noise for the answer selection task in general CQA communities, because users may answer questions in quite different domains.
3.1 Text representation
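First, we perform the word embedding of the original text. Word2vec is used to train the word vectors, although other word embedding methods may be used instead. Next, following prior work, we use the biLSTM to represent the question and the answer. More specifically, we represent a given sentence as $x = \{x(1), x(2), \ldots, x(n)\}$, in which $x(t)$ is the $E$-dimension embedded vector for the $t$-th word. The hidden vector $h(t)$ at time step $t$ in the LSTM component is updated as follows:
$$i_t = \sigma(W_i x(t) + U_i h(t-1) + b_i) \tag{1}$$
$$f_t = \sigma(W_f x(t) + U_f h(t-1) + b_f) \tag{2}$$
$$o_t = \sigma(W_o x(t) + U_o h(t-1) + b_o) \tag{3}$$
$$\tilde{C}_t = \tanh(W_c x(t) + U_c h(t-1) + b_c) \tag{4}$$
$$C_t = i_t * \tilde{C}_t + f_t * C_{t-1} \tag{5}$$
$$h(t) = o_t * \tanh(C_t) \tag{6}$$
where $\sigma$ is the sigmoid activation function, $i_t$ represents the input gate, $f_t$ represents the forget gate, $o_t$ represents the output gate, $C_t$ denotes the cell memory, and $W$, $U$, $b$ are network parameters. Formula (4) is the input transformation, and formula (5) updates the cell state.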
The standard LSTM only uses the information of the past. The biLSTM, on the other hand, utilizes both the previous and the future context by processing the sequence in two directions, and generates two independent sequences of LSTM output vectors. Because the biLSTM models the context information for each word, the biLSTM-based representation is usually more accurate than the LSTM-based one in the answer selection task. In our model, the biLSTM output at each time step is the concatenation of the two output vectors from both directions, i.e., $h(t) = [\overrightarrow{h}(t); \overleftarrow{h}(t)]$.
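To make the gate updates and the bidirectional concatenation concrete, the following is a minimal NumPy sketch of a standard LSTM cell run in both directions. The parameter names (`W_i`, `U_i`, `b_i`, and so on), the sizes, and the random initialization are assumptions for illustration; the actual model is trained in TensorFlow.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(E, H, rng):
    """Hypothetical parameter set: one (W, U, b) triple per gate i, f, o, c."""
    p = {}
    for g in "ifoc":
        p["W_" + g] = rng.standard_normal((H, E)) * 0.1
        p["U_" + g] = rng.standard_normal((H, H)) * 0.1
        p["b_" + g] = np.zeros(H)
    return p

def lstm_step(x_t, h_prev, c_prev, p):
    """One standard LSTM update for input x_t."""
    i = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])  # input gate
    f = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])  # forget gate
    o = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])  # output gate
    c_tilde = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])
    c = i * c_tilde + f * c_prev      # cell state update
    h = o * np.tanh(c)                # hidden state
    return h, c

def run_lstm(xs, p, H):
    h, c = np.zeros(H), np.zeros(H)
    outputs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, p)
        outputs.append(h)
    return outputs

def bilstm(xs, p_fwd, p_bwd, H):
    """Concatenate forward and backward hidden states at each time step."""
    fwd = run_lstm(xs, p_fwd, H)
    bwd = run_lstm(xs[::-1], p_bwd, H)[::-1]  # re-align to original order
    return np.stack([np.concatenate([f, b]) for f, b in zip(fwd, bwd)])

rng = np.random.default_rng(0)
E, H, T = 3, 4, 5                      # toy embedding size, hidden size, length
xs = [rng.standard_normal(E) for _ in range(T)]
states = bilstm(xs, init_params(E, H, rng), init_params(E, H, rng), H)
```

Each row of `states` has length 2H because it joins the forward and backward hidden vectors for the same word.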
3.2 Expertise representation based on previous answers
Previous answers provided by an answerer can be used to represent the expertise of the answerer. In particular, a CQA site usually has a tag function that helps users manually label the questions and answers according to a predefined categorical system of the domain knowledge. Such rich information can help us model the answerers’ expertise with a high resolution (as compared to only using the text in the answers). We adopt a collaborative filtering [34, 35] approach to capture the relationship between a tag and an answerer based on the relationships among all tags and previous answers. Because of the large number of tags, there is a sparsity problem: we do not have sufficient data to learn the representation of each tag. Therefore, we make use of the taxonomy of the tags to group semantically similar tags into a single higher-level tag. We illustrate the grouping procedure in Figure 3 using the two datasets used in this study.
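The grouping step can be pictured as a lookup from each low-level tag to its parent in the taxonomy, followed by counting. The taxonomy fragment below is hypothetical (only the depression/anxiety to psychiatry mapping echoes an example used later in the paper), and keeping unknown tags unchanged is an assumption of this sketch.

```python
from collections import Counter

# Hypothetical taxonomy fragment: low-level tag -> higher-level tag.
TAXONOMY = {
    "depression": "psychiatry",
    "anxiety": "psychiatry",
    "insomnia": "psychiatry",
    "asthma": "pulmonology",
    "bronchitis": "pulmonology",
}

def group_tags(tags, taxonomy):
    """Map tags to their higher-level group and count occurrences.

    Tags missing from the taxonomy are kept as their own group
    (an assumption made for this sketch).
    """
    return Counter(taxonomy.get(tag, tag) for tag in tags)

grouped = group_tags(["depression", "anxiety", "asthma"], TAXONOMY)
```

After grouping, far fewer columns are needed in the tag dimension, so each remaining column is supported by more observations.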
After the tag grouping process, we define a weight to represent answerer ’s expertise in tag as expressed by answer . is measured by the product of the frequency of a certain tag in an answer and the vote measure for this answer as follows
where denotes the answerer, denotes the frequency of tag in the previous answer , and denotes the vote measure for answer . For brevity, we omit the answerer identifier in the rest of the paper since we focus on modeling the answers and follower of a single answerer (no interactions between competing answerers). For the CQA community (HealthTap) that exhibits the actual count of up-votes, the vote measure is the count. For the CQA community (QatarLiving
where represents the feature matrix for the answerer’s previous answers, represents the scaling matrix, and represents the feature matrix for the tags.
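A minimal sketch of this construction, under the description above that each weight is the tag frequency in an answer multiplied by the vote measure of that answer, followed by an SVD of the resulting answer-tag matrix. The variable names and the toy numbers are hypothetical.

```python
import numpy as np

def expertise_matrix(tag_freq, votes):
    """Entry (i, j) = frequency of tag j in answer i * vote measure of answer i."""
    tag_freq = np.asarray(tag_freq, dtype=float)
    votes = np.asarray(votes, dtype=float)
    return tag_freq * votes[:, None]

# Hypothetical answerer with 4 previous answers and 3 grouped tags.
tag_freq = [[2, 0, 1],
            [0, 1, 0],
            [1, 1, 0],
            [0, 0, 3]]
votes = [5, 2, 0, 1]

W = expertise_matrix(tag_freq, votes)

# Thin SVD: U holds answer-side features, s the scaling (singular values),
# and the rows of Vt the tag-side features.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
```

Note that an answer with zero votes contributes an all-zero row, so it has no influence on the learned tag features in this sketch.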
3.3 Authority representation based on social networks
The topological position of an answerer in the social network formed by the following relationship represents the authority of the answerer. In addition, the interests/specialties of an answerer’s followers (expressed by tags) can be used to further enrich the representation of the answerer’s authority. The existing method of inferring the answerer’s authority is a linear combination of the followers’ tags [34, 35]. However, this approach makes the strong assumption that the tags are independent. Instead, as with the expertise representation, we employ the SVD to capture the relationship between an answerer and his or her followers (and their tags) based on the relationships among all tags and followers. Semantically similar tags are also grouped into a single higher-level tag (as shown in Figure 3).
After the tag grouping process, we define a weight to represent the answerer’s authority in tag as inferred from the answerer’s follower . is measured by the frequency of a certain tag of a follower as follows
where denotes the frequency of tag of the answerer’s follower . Note that this tag is labeled by the follower . Before tag grouping, the frequency for each tag is either 1 or 0. After tag grouping, the frequency refers to the number of lower-level tags labeled by . For example, if the follower is labeled with "depression" and "anxiety," and both are then grouped into "psychiatry," the frequency of "psychiatry" for is 2. For the follower , we construct a follower-tag quality matrix . is the number of the answerer’s followers and is the total number of tags that are shared by all. Then, we employ the SVD to decompose S to obtain the feature matrices for the answerer’s social network and the tags as follows
where represents the feature matrix for the answerer’s social network (authority), represents the scaling matrix, and represents the feature matrix for the tags which is set equal to .
As for the tag matrix, the SVD is not unique, and a small portion of the top singular values can approximately represent the whole matrix. We therefore first approximate matrix , and then set the and equal. The SVD setting is chosen based on numerical experiments.
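The remark that a few top singular values can approximately represent the whole matrix can be checked with a truncated SVD. The follower-tag counts below are hypothetical, and the choice of rank is left open here, as in the text.

```python
import numpy as np

def truncated_svd(S, k):
    """Keep only the k largest singular values and their vectors."""
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def relative_error(S, U_k, s_k, Vt_k):
    """Frobenius-norm error of the rank-k reconstruction, relative to S."""
    approx = (U_k * s_k) @ Vt_k
    return np.linalg.norm(S - approx) / np.linalg.norm(S)

# Hypothetical follower-tag counts: 5 followers x 4 grouped tags.
S = np.array([[2, 0, 1, 0],
              [1, 0, 1, 0],
              [0, 3, 0, 1],
              [0, 2, 0, 1],
              [2, 0, 2, 0]], dtype=float)

errors = [relative_error(S, *truncated_svd(S, k)) for k in range(1, 5)]
```

The error shrinks as more singular values are retained, so the rank can be picked as the smallest value whose error is acceptable on held-out experiments.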
3.4 Knowledge graph representation
In vertical community question answering systems, there is much professional knowledge. To make full use of this kind of knowledge, we incorporate an external knowledge graph. For example, in this paper, for the HealthTap site, we introduce a health knowledge graph (as shown in Figure 4), which is derived from electronic medical records (EMR) and can include more professional medical knowledge. There are high-quality knowledge bases linking diseases and symptoms, and diseases can similarly be grouped into single higher-level tags. Then we define a weight to represent the candidate answer’s total weights in tag in terms of the answer’s symptom concept . is measured by the frequency of a certain tag of symptoms as follows
where denotes the total weights of tag in terms of the answer’s symptom concept . For the symptom , we construct a symptom-tag quality matrix . is the number of the answer’s symptom concepts and is the total number of tags that are shared by all. Then, we employ the SVD to decompose KG to obtain the feature matrices for the answer’s symptom concepts and the tags as follows
where represents the feature matrix for the answer’s symptom concepts (relationship knowledge graph), represents the scaling matrix, and represents the feature matrix for the tags which is set equal to and .
3.5 Question-answer pair concatenation
To capture the relationship between each segment in the question and each segment in the answer, we follow prior work to construct a concatenation tensor G as follows
where and is the sigmoid activation function, is the transformation matrix, is the -th hidden state of biLSTM question representations, is the -th hidden state of biLSTM answer representations, and is the bias vector. is the length of the question and is the length of the answer.
Following the standard attentive pooling scheme, we use row-wise pooling to obtain an interaction matrix , which captures the relationship between each segment in the question and all segments in the answer, and use column-wise pooling to obtain an interaction matrix , which captures the relationship between each segment in the answer and all segments in the question.
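One plausible reading of the concatenation tensor and the two pooling steps is sketched below: every question state is concatenated with every answer state, passed through a shared affine map and a sigmoid, and the resulting tensor is pooled along each axis. The exact functional form, the use of max pooling, and all sizes are assumptions of this sketch rather than details confirmed by the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def interaction_tensor(Hq, Ha, W, b):
    """G[i, j] = sigmoid(W @ concat(Hq[i], Ha[j]) + b).

    Hq: (m, d) question hidden states, Ha: (n, d) answer hidden states,
    W: (k, 2d) transformation matrix, b: (k,) bias vector.
    """
    m, n = Hq.shape[0], Ha.shape[0]
    pairs = np.concatenate(
        [np.repeat(Hq[:, None, :], n, axis=1),   # (m, n, d)
         np.repeat(Ha[None, :, :], m, axis=0)],  # (m, n, d)
        axis=-1)                                  # (m, n, 2d)
    return sigmoid(pairs @ W.T + b)               # (m, n, k)

rng = np.random.default_rng(0)
m, n, d, k = 3, 4, 2, 5                 # hypothetical lengths and sizes
Hq = rng.standard_normal((m, d))
Ha = rng.standard_normal((n, d))
W = rng.standard_normal((k, 2 * d))
b = np.zeros(k)

G = interaction_tensor(Hq, Ha, W, b)
I_q = G.max(axis=1)   # pool over answer segments: one row per question segment
I_a = G.max(axis=0)   # pool over question segments: one row per answer segment
```

Unlike a 2-D alignment matrix, each (i, j) cell here is a k-dimensional vector, so pooling yields a matrix rather than a vector for each side.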
3.6 Attention calculation and similarity
For each candidate answer, we model each answerer’s expertise and authority using and , respectively. Then we combine them with as: , where is the unnormalized attention of the -th segment in the answer, , and are transformation matrices, and is the bias vector. Then, the formal attention of each segment in an answer is calculated by
The final representation of the answer is calculated as the weighted summation of the interactions
The final representation of the question is directly derived from the attentive pooling as follows
where is the attention of each segment in the question.
Eventually, the similarity score of the question and the answer is defined as follows
where is a parameter matrix.
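The attention and scoring steps can be sketched as follows. Since the exact combination formula is not recoverable from the text, the additive form with a tanh nonlinearity and the projection vector `v` are assumptions of this sketch, as are all names and sizes; only the softmax normalization, the weighted sum of interactions, and the bilinear score with a parameter matrix follow the description directly.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def answer_attention(I_a, expertise, authority, W_a, W_e, W_s, b, v):
    """Attention over answer segments, shifted by answerer features.

    I_a: (n, k) answer interaction matrix; expertise: (p,); authority: (p,).
    W_a: (h, k), W_e: (h, p), W_s: (h, p), b: (h,), v: (h,) (all hypothetical).
    """
    hidden = np.tanh(I_a @ W_a.T + W_e @ expertise + W_s @ authority + b)
    return softmax(hidden @ v)            # (n,) normalized attention

def attended_representation(I, weights):
    """Weighted sum of interaction rows."""
    return weights @ I                    # (k,)

def similarity(r_q, r_a, M):
    """Bilinear score r_q^T M r_a with parameter matrix M."""
    return float(r_q @ M @ r_a)

rng = np.random.default_rng(0)
m, n, k, h, p = 3, 4, 5, 6, 2
I_q = rng.random((m, k))
I_a = rng.random((n, k))
expertise = rng.standard_normal(p)
authority = rng.standard_normal(p)
W_a = rng.standard_normal((h, k))
W_e = rng.standard_normal((h, p))
W_s = rng.standard_normal((h, p))
b, v = np.zeros(h), rng.standard_normal(h)

alpha = answer_attention(I_a, expertise, authority, W_a, W_e, W_s, b, v)
beta = np.full(m, 1.0 / m)   # placeholder question attention (uniform)
r_a = attended_representation(I_a, alpha)
r_q = attended_representation(I_q, beta)
score = similarity(r_q, r_a, np.eye(k))
```

The point of the design is that the answerer features enter before the softmax, so they change which answer segments are emphasized instead of only rescaling the final score.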
For each question, we identify two question-answer pairs, and , where denotes the best answer and denotes the worst answer. We formulate a WARP loss function to optimize the top of the answer ranking.
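A minimal sketch of a WARP-style loss for one question is given below. It follows the general WARP recipe (sample negatives until one violates the margin, estimate the rank from the number of trials, and weight the hinge violation by a harmonic sum); how this is wired into the KAAS training loop, and the function names used here, are assumptions.

```python
import random

def warp_weight(rank):
    """Rank-to-weight map L(k) = sum_{i=1..k} 1/i used by WARP."""
    return sum(1.0 / i for i in range(1, rank + 1))

def warp_loss(pos_score, neg_scores, margin=0.05, rng=None):
    """WARP-style loss for one positive answer against sampled negatives.

    Negatives are visited in random order until one violates the margin.
    The rank of the positive is then estimated as
    floor(number_of_negatives / number_of_trials).
    """
    rng = rng or random.Random(0)
    order = list(range(len(neg_scores)))
    rng.shuffle(order)
    for trials, idx in enumerate(order, start=1):
        violation = margin - pos_score + neg_scores[idx]
        if violation > 0:
            est_rank = len(neg_scores) // trials
            return warp_weight(est_rank) * violation
    return 0.0  # the positive beats every negative by at least the margin

loss_ok = warp_loss(0.9, [0.1, 0.2, 0.3])
loss_bad = warp_loss(0.2, [0.5])
```

A violation found on an early trial implies a poor estimated rank and therefore a larger weight, which is how the loss concentrates on the top of the ranking.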
4.1 Data and model setup
HealthTap is a medical (vertical) CQA site with over 111,000 certified clinical doctors who have answered over 6.5 billion questions by 2018. We adopt the benchmark HealthTap data provided by . We sample all the 4,781 questions that have at least two answers and a vote difference of at least two between the best and worst answers. 500 questions (around 10%) are randomly selected as the test set. For consistency, we follow a commonly adopted approach and fix the number of candidate answers for each question at 20. If a question has more than 20 answers, we select the top 20. If a question has fewer than 20 answers, we randomly choose answers from the answer pool to make it 20. We perform five-fold cross-validation to avoid the overfitting issue. QatarLiving is a general CQA site about various topics of living in Qatar. It is a commonly used benchmark dataset for answer selection. There are 5,450 questions and 41,908 answers. Following the benchmark rules, 10.48% are selected as the test set.
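The fixed-size candidate list can be built as in the sketch below. Ranking by vote count to pick the "top 20", excluding a question's own answers from the random fillers, and the function name are assumptions of this illustration.

```python
import random

def fix_candidate_count(answers, answer_pool, size=20, rng=None):
    """Return exactly `size` answer ids for one question.

    answers: list of (answer_id, votes) belonging to the question.
    answer_pool: ids of answers from the whole corpus, used as fillers.
    Assumes the pool holds enough other answers; otherwise
    random.sample raises ValueError.
    """
    rng = rng or random.Random(0)
    ranked = [aid for aid, _ in sorted(answers, key=lambda a: a[1], reverse=True)]
    if len(ranked) >= size:
        return ranked[:size]                 # keep the most-voted answers
    own = set(ranked)
    fillers = [aid for aid in answer_pool if aid not in own]
    return ranked + rng.sample(fillers, size - len(ranked))

pool = ["p%d" % i for i in range(30)] + ["a1"]
short_list = fix_candidate_count([("a1", 5), ("a2", 9)], pool)
long_list = fix_candidate_count([("x%d" % i, i) for i in range(25)], pool)
```

Fixing the list length keeps every question equally weighted when P@K is averaged over the test set.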
The KAAS model is implemented with the TensorFlow (>=1.8) framework in Python 2.7. Word2vec is used for training the word vectors with a vector dimension of 100. The maximum lengths for the question and the answer are 40 and 80, respectively. The optimization method is stochastic gradient descent (SGD) with a batch size of 256 and a learning rate of 0.01. The margin is set to 0.05. The hidden size of the biLSTM is 128. We use the top-1 precision (P@1), top-2 precision (P@2), MAP, accuracy, and F1-score to evaluate the performance of KAAS. P@K stands for the proportion of the selected answers in the top K that are true. Users usually pay most attention to the top-1 answer, sometimes to the top-2 answer, and rarely to answers beyond the second. As a result, we consider P@1 and P@2. For the HealthTap (HT) dataset, the ground truth is the number of votes, which ranks the answers from the most-voted to the least-voted, so we use P@1 and P@2 to evaluate the performance of our model. For the QatarLiving (QL) dataset, the ground truth is the human-labeled group (i.e., "Good", "Potentially Useful" and "Bad"), which provides a quality classification of the answers; this is why MAP, accuracy, and F1-score are used for the QL dataset. We run the experiments on a Linux server with two E5-2630 v4 2.2GHz CPUs and 64GB RAM. The source code is released
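For reference, the ranking metrics can be computed as in the following sketch, which assumes each ranked list is encoded as 0/1 relevance flags in the order produced by the model. The function names are illustrative and not taken from the released code.

```python
def precision_at_k(ranked_relevance, k):
    """Share of the top-k ranked answers that are relevant (1) vs. not (0)."""
    return sum(ranked_relevance[:k]) / k

def average_precision(ranked_relevance):
    """Mean of the precision values at each rank where a relevant answer occurs."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(rankings):
    """MAP over a collection of questions."""
    return sum(average_precision(r) for r in rankings) / len(rankings)

# Hypothetical model output for two questions.
rankings = [[1, 0, 1], [0, 1, 0]]
p_at_1 = [precision_at_k(r, 1) for r in rankings]
map_score = mean_average_precision(rankings)
```

With a single relevant answer per question, P@1 averaged over questions is simply the fraction of questions whose best answer is ranked first.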
4.2 Baseline models
We compare the performance of KAAS with that of the following state-of-the-art models. PLANE is a non-neural-network method based on statistical NLP feature extraction. It has an offline learning component and an online search component. LSTM is the basic biLSTM model without an attentive component. AP-LSTM has a similar biLSTM architecture with the attentive pooling component. AI-CNN takes the interaction of the sentence pair into consideration, resulting in a 3D tensor that captures the relationship among the segments. AI-CNN-F computes the similarity by adding additional community information (received thanks and agrees). MVFNN models the answer selection task with a multi-view fusion neural network based on the idea of residual networks. AMRNL uses a linear combination of followers’ tags to represent the authority information, and adjusts the final matching score.
Tables 1 and 2 present the performance on the HealthTap and QatarLiving datasets. In general, the proposed KAAS model consistently outperforms the state-of-the-art baseline models. More specifically, for the HealthTap dataset, we use P@1 and P@2 to measure the accuracy. P@K is the frequency of successfully predicting the best answer. As the only non-neural-network model, PLANE has the lowest accuracy, but it has the advantage of computational efficiency. From LSTM to AP-LSTM and to AI-CNN/AI-CNN-F, the performance improves with the addition of the attentive pooling framework and community information. MVFNN’s performance is similar to that of AI-CNN-F because it also utilizes simple community information. AMRNL, on the other hand, only achieves performance similar to that of AP-LSTM, indicating that the linear combination of tags is less effective than expected, probably due to the sparsity problem in the tag distributions.
Because the QatarLiving dataset provides a categorical evaluation (instead of a vote count) of each answer, we adopt MAP, accuracy, and F1-score to evaluate the performance. The findings are similar: the attentive pooling framework and the inclusion of community information improve the performance. The proposed KAAS model consistently performs the best.
We further analyze the sensitivity of the KAAS model in terms of the size of the biLSTM hidden layer. As shown in Table 3 and Figure 5, the size of the hidden layer influences the performance of the model. There is a trade-off between model complexity and performance. In particular, when the hidden layer size is small, we can improve the performance by increasing the size of the hidden layer. However, when the hidden layer size is larger than a change point, the performance declines, which could be due to overfitting and the lack of sufficient data to fit the additional parameters. The size of 128 in the previous experiments is set based on this sensitivity analysis. Finally, we conduct ablation studies, as shown in Table 4. Among the three parts (i.e., authority, expertise, and knowledge graph), expertise information is the most significant. In the future, we plan to optimize the matrix decomposition part to further enhance the full model’s performance.
|KAAS Knowledge Graph||39.73%|
|KAAS Expertise & Authority||40.32%|
|KAAS Authority & Knowledge Graph||40.07%|
|KAAS Knowledge Graph & Expertise||40.11%|
In this paper, we propose the KAAS model for the CQA answer selection task. KAAS is based on a biLSTM neural network with an attentive pooling mechanism. It incorporates both the expertise and authority information learned from the answerer’s previous answers, the tags of the answers, the followers of the answerer, and the tags of the followers. For vertical communities, an external knowledge graph is also utilized, which can capture semantic information between questions and answers [41, 42, 43]. Experiments with both general and vertical CQA datasets show that the KAAS model outperforms state-of-the-art answer selection models.
The novelty of our model comes from the incorporation of community information and domain knowledge. We combine existing techniques within an efficient modeling framework. The model presents a generic framework that incorporates community information as well as external knowledge, which does not necessarily have to be in the same format across datasets. We can easily modify the SVD and attention components to incorporate different types of community information that can be extracted from other datasets, or other types of side information. For CQA sites where no information about the authors is available, we can still incorporate external knowledge graphs or crawl information from websites. Hence, our proposed model is both general and novel, and it can be applied rather easily to other recommendation problems in CQA.
In conclusion, this paper sheds light on the efficacy of community information in inferring the expertise and authority of answerers, and could inform future research to better mine and utilize the domain knowledge hidden in the CQA community. One possible future direction is to optimize the decomposition process. Because its influence on the downstream learning task must be considered, this optimization is difficult but worthwhile. Another opportunity is to generate a knowledge graph from the CQA community itself, which might be more suitable for revealing semantic information between the community questions and answers, and is thus a rewarding though very challenging direction for future work.
 Yuan, S., Zhang, Y., Tang, J., Hall, W. & Cabotà, J. B. (2019). Expert finding in community question answering: a review. Artificial Intelligence Review, in press.
 Nie, L., Wei, X., Zhang, D., Wang, X., Gao, Z. & Yang, Y. (2017). Data-driven answer selection in community QA systems. IEEE transactions on knowledge and data engineering 29(6), 1186-1198.
 Nie, L., Wang, M., Zhang, L., Yan, S., Zhang, B. & Chua, T. S. (2015). Disease inference from health-related questions via sparse deep learning. IEEE Transactions on knowledge and Data Engineering, 27(8), 2107-2119.
 Sun, R., Cui, H., Li, K., Kan, M. Y. & Chua, T. S. (2005). Dependency relation matching for answer selection. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 651-652). ACM.
 Zhang, D. & Lee, W. S. (2003). Question classification using support vector machines. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 26-32). ACM.
 Manning, C. D. & Schütze, H. (1999). Foundations of statistical natural language processing. Cambridge, MA: MIT Press.
 Moschitti, A., Quarteroni, S., Basili, R. & Manandhar, S. (2007). Exploiting syntactic and shallow semantic kernels for question answer classification. In Proceedings of the 45th annual meeting of the association of computational linguistics (pp. 776-783). ACL.
 Heilman, M. & Smith, N. A. (2010). Tree edit models for recognizing textual entailments, paraphrases, and answers to questions. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 1011-1019). ACL.
 Xue, X., Jeon, J. & Croft, W. B. (2008). Retrieval models for question and answer archives. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval (pp. 475-482). ACM.
 Rao, J., He, H. & Lin, J. (2017). Experiments with convolutional neural network models for answer selection. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1217-1220). ACM.
 Qiu, X. & Huang, X. (2015). Convolutional neural tensor network architecture for community-based question answering. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (pp. 1305-1311). IJCAI.
 Yang, X., Khabsa, M., Wang, M., Wang, W., Awadallah, A., Kifer, D. & Giles, C. L. (2018). Adversarial training for community question answer selection based on multi-scale matching. arXiv preprint arXiv:1804.08058.
 Wang, D. & Nyberg, E. (2015). A long short-term memory model for answer sentence selection in question answering. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Vol. 2, pp. 707-712). ACL.
 Tan, M., Santos, C. D., Xiang, B. & Zhou, B. (2015). Lstm-based deep learning models for non-factoid answer selection. arXiv preprint arXiv:1511.04108.
 Hao, Y., Liu, X., Wu, J. & Lv, P. (2019). Exploiting Sentence Embedding for Medical Question Answering. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence. AAAI.
 Wu, F., Duan, X., Xiao, J., Zhao, Z., Tang, S., Zhang, Y. & Zhuang, Y. (2017). Temporal interaction and causal influence in community-based question answering. IEEE Transactions on Knowledge and Data Engineering 29(10), 2304-2317.
 Sha, L., Zhang, X., Qian, F., Chang, B. & Sui, Z. (2018). A multi-view fusion neural network for answer selection. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (pp. 5422-5429). AAAI.
 Santos, C. D., Tan, M., Xiang, B. & Zhou, B. (2016). Attentive pooling networks. arXiv preprint arXiv:1602.03609.
 Huang, H., Wei, X., Nie, L., Mao, X. & Xu, X. S. (2018). From Question to Text: Question-Oriented Feature Attention for Answer Selection. ACM Transactions on Information Systems (TOIS) 37(1), 6.
 Zhao, Z., Lu, H., Zheng, V. W., Cai, D., He, X. & Zhuang, Y. (2017). Community-based question answering via asymmetric multi-faceted ranking network learning. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (pp. 3532-3538). AAAI.
 Wen, J., Ma, J., Feng, Y. & Zhong, M. (2018). Hybrid Attentive Answer Selection in CQA With Deep Users Modelling. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (pp. 2556-2563). AAAI.
 Wen, J., Tu, H., Cheng, X., Xie, R. & Yin, W. (2019). Joint modeling of users, questions and answers for answer selection in CQA. Expert Systems with Applications 118, 563-572.
 Zhang, X., Li, S., Sha, L. & Wang, H. (2017). Attentive interactive neural networks for answer selection in community question answering. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (pp. 3525-3531). AAAI.
 Zhao, Z., Zhang, L., He, X. & Ng, W. (2014). Expert finding for question answering via graph regularized matrix completion. IEEE Transactions on Knowledge and Data Engineering 27(4), 993-1004.
 Hochreiter, S. & Schmidhuber, J. (1997). Long short-term memory. Neural computation 9(8), 1735-1780.
 Graves, A. & Schmidhuber, J. (2005). Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks 18(5-6), 602-610.
 Graves, A. (2013). Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850.
 Mikolov, T., Chen, K., Corrado, G. & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
 Hu, B., Lu, Z., Li, H. & Chen, Q. (2014). Convolutional neural network architectures for matching natural language sentences. In Advances in neural information processing systems (pp. 2042-2050). NIPS.
 Tan, M., Dos Santos, C., Xiang, B. & Zhou, B. (2016). Improved representation learning for question answer matching. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 464-473). ACL.
 Yin, W., Yu, M., Xiang, B., Zhou, B. & Schütze, H. (2016). Simple question answering by attentive convolutional neural network. arXiv preprint arXiv:1606.03391.
 Bian, W., Li, S., Yang, Z., Chen, G. & Lin, Z. (2017). A compare-aggregate model with dynamic-clip attention for answer selection. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (pp. 1987-1990). ACM.
 Xiang, Y., Chen, Q., Wang, X. & Qin, Y. (2017). Answer selection in community question answering via attentive neural networks. IEEE Signal Processing Letters 24(4), 505-509.
 Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P. & Riedl, J. (1994). GroupLens: an open architecture for collaborative filtering of netnews. In Proceedings of the 1994 ACM conference on Computer supported cooperative work (pp. 175-186). ACM.
 Marlin, B. M. (2004). Modeling user rating profiles for collaborative filtering. In Advances in neural information processing systems (pp. 627-634). NIPS.
 Nakatsuji, M., Fujiwara, Y., Uchiyama, T. & Toda, H. (2012). Collaborative filtering by analyzing dynamic user interests modeled by taxonomy. In International Semantic Web Conference (pp. 361-377). ISWC.
 Golub, G. H. & Reinsch, C. (1971). Singular value decomposition and least squares solutions. In Linear Algebra (pp. 134-151). Berlin: Springer.
 Weston, J., Bengio, S. & Usunier, N. (2011). Wsabie: Scaling up to large vocabulary image annotation. In The 22nd International Joint Conference on Artificial Intelligence (pp. 2764-2770). IJCAI.
 Deng, Y., Xie, Y., Li, Y., Yang, M., Du, N., Fan, W. & Shen, Y. (2018). Multi-Task Learning with Multi-View Attention for Answer Selection and Knowledge Base Question Answering. arXiv preprint arXiv:1812.02354.
 Nakov, P., Hoogeveen, D., Màrquez, L., Moschitti, A., Mubarak, H., Baldwin, T. & Verspoor, K. (2017). SemEval-2017 task 3: Community question answering. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017) (pp. 27-48). ACL.
 Wei, X., Huang, H., Nie, L., Zhang, H., Mao, X. L. & Chua, T. S. (2016). I know what you want to express: sentence element inference by incorporating external knowledge base. IEEE Transactions on Knowledge and Data Engineering 29(2), 344-358.
 Huang, X., Zhang, J., Li, D. & Li, P. (2019). Knowledge graph embedding based question answering. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (pp. 105-113). ACM.
 Rotmensch, M., Halpern, Y., Tlimat, A., Horng, S. & Sontag, D. A. (2017). Learning a health knowledge graph from electronic medical records. Scientific Reports, 7(1).