
Promotion of Answer Value Measurement with Domain Effects in Community
Question Answering Systems

Binbin Jin, Enhong Chen, Hongke Zhao, Zhenya Huang, Qi Liu, Hengshu Zhu, and Shui Yu. This research was partially supported by grants from the National Key Research and Development Program of China (No. 2016YFB1000904), the National Natural Science Foundation of China (Grants No. U1605251, 61727809 and 61672483), and the Youth Innovation Promotion Association of CAS (No. 2014299). (Corresponding author: Enhong Chen.) B. Jin, E. Chen, Z. Huang, and Q. Liu are with the School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, 230026, China. E-mail: bb0725@mail.ustc.edu.cn, cheneh@ustc.edu.cn, huangzhy@mail.ustc.edu.cn, qiliuql@ustc.edu.cn. H. Zhao is with the College of Management and Economics, Tianjin University, Tianjin, 300072, China. E-mail: hongke@tju.edu.cn. H. Zhu is with the Baidu Talent Intelligence Center, Baidu Inc., Haidian District, Beijing, 100085, China. E-mail: zhuhengshu@baidu.com. S. Yu is with the School of Computer Science, University of Technology Sydney, Australia. E-mail: Shui.yu@uts.edu.au.
Abstract

In the area of community question answering (CQA), answer selection and answer ranking are two tasks applied to help users quickly access valuable answers. Existing solutions mainly exploit the syntactic or semantic correlation between a question and its related answers (Q&A), while the multi-facet domain effects in CQA are still underexplored. In this paper, we propose a unified model, Enhanced Attentive Recurrent Neural Network (EARNN), for both answer selection and answer ranking by taking full advantage of both Q&A semantics and multi-facet domain effects (i.e., topic effects and timeliness). Specifically, we develop a serialized LSTM to learn the unified representations of Q&A, where two attention mechanisms at the sentence level and word level are designed to capture the deep effects of topics. Meanwhile, the emphasis of Q&A can be automatically distinguished. Furthermore, we design a time-sensitive ranking function to model timeliness in CQA. To effectively train EARNN, a question-dependent pairwise learning strategy is also developed. Finally, we conduct extensive experiments on a real-world dataset from Quora. Experimental results validate the effectiveness and interpretability of our proposed EARNN model.

Deep learning, community question answering, answer selection/ranking, topic effects, timeliness

I Introduction

With the prevalence of community question answering (CQA) sites, e.g., Quora (https://www.quora.com/) and Yahoo Answers (https://answers.yahoo.com/), more and more users are active on these communities. They are willing to post questions or share their experience, so massive numbers of questions with rapidly increasing answers are accumulated. In CQA websites, after questions are posted, many answerers continuously contribute to the ones they are interested in. In particular, some attractive questions appeal to even hundreds of answers, and some answers may make users lose interest due to their long paragraphs [1]. Thus, it is hard for askers and attracted visitors to quickly find valuable answers, especially among those answers whose views and upvotes are not yet stable. Therefore, answer selection and answer ranking, which can be applied to similar application scenarios, have become two effective solutions. Both of them focus on measuring the semantic matching between the question and its related answers (Q&A). The difference is that answer selection aims to select valuable answers from the candidate list, while answer ranking aims to put valuable answers ahead in the answer list [2]. However, as hot research topics in the area of CQA, answer selection and ranking still face challenges.

Fig. 1: An example of Q&A from Quora with a question and three comparable answers.

These challenges mainly come from two aspects. First, over the past few years, researchers have proposed deep learning based models with attention mechanisms to exploit the semantics between Q&A [3, 4, 5]. These studies mainly use the text of Q&A to measure their similarity. In recent years, with the mechanism improvements of CQA sites, almost all platforms provide topics for each question (see the blue rectangle in Fig. 1) so that visitors can quickly find questions of interest by filtering on topics. However, exploiting the benefits of these topics for answer selection or ranking remains an open problem.

Second, some studies mainly focus on bridging the lexical gap between Q&A on large-scale datasets, such as Yahoo Answers or Quora [6, 7, 8]. However, they ignore the fact that different answers are posted at different times, so their value is highly related to time [9]. Therefore, it is necessary to eliminate the bias induced by time.

To address these challenges, we make a deep observation and exploration of CQA and identify multi-facet domain effects. Fig. 1 (the question and corresponding answers are posted at https://www.quora.com/Which-are-some-of-the-best-places-to-visit) explains the significance of two domain effects related to our motivations with a snapshot from Quora. The figure shows a popular question which indeed receives more than 100 answers (green rectangle); we only show three comparable answers for better illustration. From this instance, besides "Details of Q&A (red)", we can also see "Topics (blue)", "Answer Time (orange)" and "Views and Upvotes (purple)" in rectangles. Obviously, the three sample answers have different value (here, value means the attraction to readers, which is often reflected in their stable numbers of views and upvotes): 14.2k views and 128 upvotes versus 1.7k views and 40 upvotes versus 162 views and 2 upvotes. According to our observations, there are at least two aspects which have great impacts on measuring answer value.

Topic effects. The value of an answer depends not only on its description but also on the topics. Different from some platforms (e.g., Yahoo Answers) which only provide predefined topics, Quora allows askers to create any topic for their questions. As a consequence, askers use more exact topics that are relevant to their questions. There are usually several key words shared by the topics and the Q&A, and these key words, or the emphasis, can help visitors quickly understand the implicit intent of Q&A [10]. For example, the first and second answers in Fig. 1 both refer to some places, which is the intent of the asker, whereas only the first one receives more upvotes. To explain this difference, we find that the first (good) answer mentions more words that are similar to "beautiful" in the topics, e.g., "beauty", "surreal", etc. Especially the last sentence ("Surreal, otherworldly, simply amazing.") is likely to attract more people. On the contrary, even though the second one involves "Pangong Lake", its description is bland and does not appeal to readers. Thus, with the help of topics, we can better understand the deep semantics and distinguish the emphasis of Q&A.

Timeliness of answers. As mentioned, in CQA, the number of upvotes depends not only on the quality of answers but also on the time period since the answers were posted. We call the latter factor timeliness. Intuitively, a question attracts more readers during the early period after it is posted. Consequently, the corresponding early-coming answers are likely to serve more readers and receive more upvotes. In Fig. 1, the first and the third answers are both good ones with high quality, but the timestamp of the first answer is much earlier than the third one's, so the former has more chances to be viewed and upvoted than the latter. Therefore, when measuring answer value, it is important to take timeliness into account.

Considering both topic effects and timeliness of answers, it is necessary to find an approach that can well model these special domain effects when understanding the deep semantics of Q&A and evaluating answer value. Moreover, to conveniently digest Q&A, especially long answers, it is important to distinguish the emphasis of Q&A with the help of topics.

To that end, in this paper, we present a focused study on answer selection and ranking by taking full advantage of both Q&A and two specific domain effects (i.e., topic effects and timeliness). Specifically, we propose a unified model, Enhanced Attentive Recurrent Neural Network (EARNN), to exploit the impacts of topics and timeliness on evaluating answer value. Particularly, we first follow the question answering process and develop a serialized LSTM (i.e., Long Short-Term Memory) with two enhanced attention mechanisms to capture the deep effects of topics. Benefiting from our attention mechanisms, we can easily find the important regions which are related to the intent of the asker (i.e., topics). After that, the unified representations of Q&A are learned and the emphasis of Q&A can be automatically distinguished at the sentence and word levels. Moreover, considering the timeliness that answers with earlier timestamps are supposed to be preferred, we develop a time-sensitive ranking function to eliminate the bias induced by time (i.e., timeliness). Furthermore, since the visitors of different questions may vary a lot, it is usually unreasonable to compare the value of answers to different questions. Thus, we adopt a question-dependent pairwise learning strategy to facilitate the training process of our model. Finally, we conduct extensive experiments on a real-world dataset. The experimental results validate the effectiveness and interpretability of EARNN. The contributions of this paper can be summarized as follows.

  • We conduct a focused study on answer selection and ranking problems in CQA. We further propose a unified model (i.e., EARNN) for both tasks to effectively measure answer value.

  • Through deep observation, we identify topic effects, which have an impact on measuring the semantic relations between Q&A. Therefore, we propose two enhanced attention mechanisms to capture the deep effects of topics. Benefiting from them, the emphasis of Q&A can be automatically distinguished at the sentence and word levels.

  • To eliminate the bias induced by time, we develop a time-sensitive ranking function to model the timeliness of answers.

  • We collect large-scale real-world data from Quora. With this data, we conduct extensive experiments whose results demonstrate the effectiveness of EARNN.

II Related Work

In CQA, the related work can be grouped into two categories. One comprises traditional methods, which mainly depend on heavy manual work. The other comprises neural network based methods, which avoid feature engineering and whose performance has been validated.

II-A Feature Based Approaches

In the early stage, researchers designed various non-textual features to predict answer quality, including answer length, the answerer's activity level, question-answer overlap and so on [11, 12, 13, 14, 15, 9]. Then, with the development of natural language processing (NLP) [16, 17], many lexical and syntactic approaches were applied to analyze the structure of sentences and the relations between Q&A. Yih et al. [18] paid attention to improving lexical semantics based on word relations including synonymy/antonymy, hypernymy/hyponymy and general semantic word similarity. For syntactic approaches, dependency trees [19, 20, 21] or quasi-synchronous grammar [22, 23] was used to analyze the structure of sentences and extract effective syntactic features. Besides, Cai et al. [24] and Ji et al. [25] adopted topic models to extract topic distributions as contextual features under the assumption that the question and answer should share a similar topic distribution. After feature engineering, logistic regression (LR) [14], support vector machines (SVM) [21], conditional random fields (CRF) [26, 27] and other machine learning methods [28] were employed to measure answer quality. Since Jeon et al. [29] proposed a word-based translation model for question retrieval, researchers have worked to bridge the lexical gap between Q&A. Xue et al. [30] proposed a word-based translation language model for question retrieval, and Lee et al. [31] tried to improve the translation probabilities based on question-answer pairs by selecting the most important terms to build compact translation models. Furthermore, phrase-based models [32, 33] and lexical word-based translation models [34] were proposed in succession for better measuring the similarities of Q&A. In summary, these feature engineering methods depend on much manual work, which is time consuming. The translation-based methods suffer from informal words or phrases in Q&A archives and generalize poorly to new domains.

II-B Neural Network Based Approaches

Recently, with the development of deep learning, scholars have proposed neural network based models to explore the semantic relations of Q&A texts [35, 36], which have achieved great success in measuring answer quality [37, 3, 4]. Although deep belief networks (DBN) [38] and recursive neural networks [39] have shown some nonlinear fitting capability, the great success of convolutional neural networks (CNN) [40, 41, 42] and recurrent neural networks (RNN) [43, 44, 45] on various tasks completely changed the research direction. For example, Severyn et al. [3] employed a CNN to generate a representation for each sentence and then used a similarity matrix to compute a relevance score. Wang et al. [36] proposed a method using a stacked bidirectional LSTM to read the sentence word-by-word, and then integrated all outputs to predict the quality score. Wan et al. [46] combined LSTM and CNN to capture both local and contextual information for determining the importance of local keywords from the whole-sentence view. In a word, CNN based models use position-shared weights with local perspective filters to learn spatial regularities in Q&A, while RNN based models pay more attention to the regularities of the word sequence.

Moreover, in order to deeply exploit the semantic relevance between Q&A, some researchers attempted to integrate attention mechanisms into CNN/RNN based models [47, 5, 48, 49, 50]. These mechanisms adjust the model's attention on different regions of texts so that the relations of Q&A can be better understood. For instance, Yin et al. [47] described three architectures where attention mechanisms were combined with a CNN for general sentence pair modeling tasks. Liu et al. [5] modeled strong interactions of two texts through two inter- and intra-dependent LSTMs. However, to the best of our knowledge, almost all neural network models only focus on modeling the similarities of Q&A, and few works directly model multi-facet domain effects such as topic effects or timeliness.

Different from previous studies, our work aims at modeling two specific domain effects in CQA, i.e., topic effects and timeliness, which are potentially beneficial for measuring answer value. Specifically, we develop a serialized LSTM with two enhanced attention mechanisms (i.e., sentence-level and word-level) to mine the deep effects of topics and further recognize the emphasis of Q&A. Besides, we design a time-sensitive ranking function to integrate timestamps into our proposed EARNN model and capture the timeliness in CQA.

III Methodology

In this section, we first formally introduce the answer selection and answer ranking problems. Then we introduce the technical details of our Enhanced Attentive Recurrent Neural Network (EARNN), including the architecture of the neural network and the training method. For better illustration, Table I lists some important mathematical notations.

Notations Type Description
scalar the number of words in a sentence
scalar the number of sentences in an answer
vector the number of words in each topic
matrix the representation of a sentence in Q&A
tensor the representation of topics
matrix the outputs of LSTM_Q
tensor the outputs of LSTM_A
vector the unified representation of the question
vector the unified representation of the answer
vector the unified representation of topics
scalar the matching score of the answer
scalar the time period since the answer is posted
scalar the ranking score of the answer
TABLE I: Several important mathematical notations.

III-A Problem Overview

Generally, answer selection targets choosing valuable answers from candidates, while answer ranking targets sorting answers by their value to a specific question. Specifically, each question is followed by a list of answers. In addition, each question has several topics, and each answer has a timestamp and a ground truth. Then, the two tasks have the following formulations.

Task 1 (Answer Selection). Given a question with topics and a list of answers with a series of timestamps, our goal is to integrate all information to train a model (i.e., EARNN), which can be used to label all candidate answers with 0/1 (1 for a valuable answer; 0 otherwise), s.t. the labels are consistent with the ground truth, such as manual annotations.

Task 2 (Answer Ranking). Given a question with topics and a list of unsorted answers with a series of timestamps, our goal is to integrate all information to train a model (i.e., EARNN), which can be used to rank all candidate answers, s.t. the order of the sorted answers is consistent with their ground truth, such as the stable upvotes.

Fig. 2: Graphical representation of EARNN.

III-B Technical Details of EARNN

In this subsection, we introduce EARNN in detail. As shown in Fig. 2, EARNN contains three parts, namely the Input Layer, Representation Layer and Evaluation Layer. Particularly, the multi-facet domain effects (i.e., topic effects and timeliness) are modeled in the Representation Layer and Evaluation Layer, respectively. To effectively train EARNN, a question-dependent pairwise learning strategy is also proposed at the end of this subsection.

Input Layer aims to represent the words in Q&A in a continuous space. As mentioned in Section III-A, the textual inputs to EARNN include three parts (i.e., a question, an answer and topics). For simplicity, we assume the question contains only one intent of the asker, so it is treated as one sentence. Differently, answers usually settle a question from various perspectives, thus an answer is formalized as a sequence of sentences. Each sentence in Q&A is formalized as a sequence of words. Besides, the topics consist of phrases and each phrase contains several words. Then, we replace each word with a pre-trained K-dimensional word embedding [51]. After that, as shown in Fig. 2, the inputs are composed of one or more matrices, each of which represents a sentence or a phrase. Note that the numbers of sentences and words are not fixed, depending on the instance, and the pre-trained word embeddings only capture syntactic information rather than semantic information.

Representation Layer develops a serialized LSTM with two enhanced attention mechanisms so that topic effects are modeled and unified representations of Q&A are learned. Meanwhile, the emphasis of Q&A is automatically distinguished.

In our study, since the attraction of answers partly depends on the question, answer representations should be adjusted according to the question. Therefore, we develop a serialized LSTM built from two LSTM models (i.e., LSTM_Q and LSTM_A), as shown in Fig. 2. Specifically, given the question and the answer, LSTM_Q reads the word embeddings of the question one by one, while LSTM_A (with parameters different from LSTM_Q) reads the word embeddings of the answer in a more sophisticated way. First, we initialize the memory cell of LSTM_A with the final memory cell of LSTM_Q to model the relations of Q&A. Then, since sentences in the answer may express different meanings, they are independently modeled by LSTM_A. Therefore, each word in a sentence can learn a semantic word embedding by combining the question and its adjacent words.

With respect to the basic RNN cell, we utilize the implementation of LSTM proposed by Graves [52]. Given the current word embedding and the previous cell vector of a specific sentence, the hidden vector can be computed as:

(1)

Then, all word representations of Q&A can be constructed from the hidden vector sequences generated by LSTM_Q and LSTM_A.
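To make the serialization concrete, the following is a minimal NumPy sketch (not the authors' implementation) of the idea: LSTM_A starts every answer sentence from LSTM_Q's final memory cell, so each sentence is modeled independently but conditioned on the question. The plain single-gate-matrix LSTM formulation and all names here are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(input_dim, hidden_dim, rng, scale=0.1):
    # One (weight, bias) pair per gate over the concatenated [input, hidden] vector.
    return {g: (rng.normal(0, scale, (hidden_dim, input_dim + hidden_dim)),
                np.zeros(hidden_dim))
            for g in ("i", "f", "o", "c")}

def lstm_step(params, x, h_prev, c_prev):
    z = np.concatenate([x, h_prev])
    i = sigmoid(params["i"][0] @ z + params["i"][1])   # input gate
    f = sigmoid(params["f"][0] @ z + params["f"][1])   # forget gate
    o = sigmoid(params["o"][0] @ z + params["o"][1])   # output gate
    g = np.tanh(params["c"][0] @ z + params["c"][1])   # candidate cell
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

def run_lstm(params, embeddings, h0, c0):
    h, c, outputs = h0, c0, []
    for x in embeddings:                               # one word embedding per step
        h, c = lstm_step(params, x, h, c)
        outputs.append(h)
    return np.stack(outputs), h, c

rng = np.random.default_rng(0)
K, D = 50, 50                                          # embedding size, hidden size
params_q = init_params(K, D, rng)                      # LSTM_Q
params_a = init_params(K, D, rng)                      # LSTM_A (different parameters)

question = rng.normal(size=(7, K))                     # 7 question-word embeddings
answer = [rng.normal(size=(n, K)) for n in (6, 9, 4)]  # an answer with 3 sentences

H_q, h_q, c_q = run_lstm(params_q, question, np.zeros(D), np.zeros(D))
# Serialization: every answer sentence is read starting from LSTM_Q's final memory
# cell, so each sentence is modeled independently but conditioned on the question.
H_a = [run_lstm(params_a, sent, np.zeros(D), c_q)[0] for sent in answer]
```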

Fig. 3: Graphical representations of two attention mechanisms. (a) Sentence-level attention. (b) Word-level attention.

After learning the word representations of Q&A, we aim to learn the unified representations of Q&A combined with topic effects. In Fig. 1, the first (good) answer mentions "Baikal Lake" and uses "beauty" and "surreal" to describe a beautiful scenery, which are semantically relevant to words in the question (e.g., "place") and topics (e.g., "beautiful"). But the second answer only refers to "Pangong Lake" with its location and best visiting time. Its description is bland and boring, so it loses much attraction from askers and visitors even though it was posted earlier than the first one. Based on this evidence, we first design a sentence-level attention to capture the relations of Q&A. Then, on the basis of it, we further design a word-level attention together with topics to locate the significant regions of Q&A which are semantically relevant to the topics. Formally, given the question, answer and topics, with the help of these two attentions, the unified representations of Q&A are learned and each word or sentence is assigned an attention score denoting its importance.

Specifically, for the question, the sentence-level attention (Fig. 3) applies an average pooling to summarize all words into a fixed-length vector which denotes the semantic summary of the question. Thus, the final question representation is implemented as follows:

(2)

Similarly, for the answer, the sentence-level attention first puts each sentence into the average pooling and gets its semantic representation. Then, a distance function is used to compute the attention score of each sentence in the answer with respect to the question. Finally, the answer representation is modeled as a weighted sum over all sentences:

(3)

where the distance function is instantiated as cosine similarity in this paper.
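Below is a hedged sketch of this sentence-level attention: average-pool the question words, average-pool each answer sentence, score each sentence against the question by cosine similarity, and take a weighted sum. The softmax normalization of the scores and all names are our assumptions; the text only specifies a distance function and a weighted sum.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)

def sentence_level_attention(word_reps_q, word_reps_per_sentence):
    q_vec = word_reps_q.mean(axis=0)                       # question summary, as in (2)
    sent_vecs = [H.mean(axis=0) for H in word_reps_per_sentence]
    scores = np.array([cosine(q_vec, s) for s in sent_vecs])
    weights = np.exp(scores) / np.exp(scores).sum()        # assumed softmax normalization
    a_vec = np.sum([w * s for w, s in zip(weights, sent_vecs)], axis=0)  # weighted sum, as in (3)
    return q_vec, a_vec, weights                           # weights mark important sentences

rng = np.random.default_rng(0)
D = 50
q_vec, a_vec, sent_weights = sentence_level_attention(
    rng.normal(size=(7, D)),                               # 7 question-word representations
    [rng.normal(size=(n, D)) for n in (6, 9, 4)])          # 3 answer-sentence representations
```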

Differently, the word-level attention (Fig. 3) can not only distinguish the important regions of the answer at the sentence level, but also focus on word-level regions with the help of the topics. Given the topics, each of which is a phrase of words, the word-level attention applies an average pooling to summarize them into a fixed-length vector:

(4)

After the computation of the topic embedding, the word-level attention uses it to measure the attention score of each word in Q&A. Regarding the question, since it contains a set of semantic representations whereas the topic embedding is a syntactic representation, a translation matrix is adopted to measure the distance between each word and the topics. Then, a softmax operation follows to compute the attention score of each word in the question. Finally, the final question representation is modeled as a weighted sum:

(5)

where the bilinear term measures the distance between a word and the topics in different spaces, and the translation matrix is optimized by the network. As for the answer, each sentence vector can be computed in the same way; then we use (3) to obtain the final answer representation.
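A sketch of this word-level attention is given below: topic phrases are average-pooled into one topic vector, and a learnable translation matrix M scores every word against that vector before a softmax and a weighted sum. The exact bilinear form (word representation times M times topic vector) and the variable names are our assumptions.

```python
import numpy as np

def word_level_attention(H_words, topic_phrases, M):
    # H_words: (L, D) word representations of one sentence (question or answer sentence)
    # topic_phrases: list of (length, K) word-embedding matrices, one per topic phrase
    t = np.concatenate(topic_phrases, axis=0).mean(axis=0)  # topic embedding, as in (4)
    logits = H_words @ M @ t                                 # distance across the two spaces
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                                 # softmax over words, as in (5)
    return H_words.T @ weights, weights                      # weighted-sum representation

rng = np.random.default_rng(1)
D, K = 50, 50
M = rng.normal(0, 0.1, (D, K))                               # translation matrix, learned
q_vec, word_scores = word_level_attention(
    rng.normal(size=(7, D)),                                 # 7 question-word representations
    [rng.normal(size=(2, K)), rng.normal(size=(1, K))],      # two topic phrases
    M)
```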

In summary, the sentence-level attention utilizes the question and answer representations to distinguish important sentences of the answer. The word-level attention utilizes the extra topic representation to capture the deep semantic relevance between topics and Q&A. Besides, according to the attention scores in (3) and (5), both important words and sentences can be recognized, which will be demonstrated in the Experiments. After this layer, the unified representations of Q&A are obtained in a deep semantic space.

Evaluation Layer outputs the answer value by combining the semantic matching of Q&A and the timeliness of answers. The timeliness means that the value of questions or answers is reduced as time goes on. That is, a question attracts more readers in the early period after it is posted, so the corresponding early-coming answers may serve more readers and receive more upvotes. For example, in Fig. 1, the first and third answers both mention several beautiful places and should have received the same attention. However, the first one receives more views and upvotes than the third one because it was posted much earlier. According to this observation, we design a time-sensitive ranking function to model the biased value of answers. Specifically, given the question embedding, answer embedding and timestamp, we first measure the deep semantic matching score of the answer and then take the timeliness into account to obtain the final ranking score.

For measuring the deep semantic matching score, we first concatenate the question and answer representations. Then, a fully connected network is used to learn the overall relevance representation. Finally, a logistic function is applied to predict the deep semantic matching score:

(6)

where the output nonlinearity is the logistic function; the concatenation operation joins the question and answer representations; and the weight matrices and bias vectors of the two layers are model parameters optimized by the network.
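As a hedged illustration of (6), the sketch below concatenates the two representations, applies one fully connected layer, and squashes the result with a logistic function. The tanh hidden activation, the layer sizes and the parameter names are assumptions.

```python
import numpy as np

def matching_score(q_vec, a_vec, W1, b1, w2, b2):
    x = np.concatenate([q_vec, a_vec])             # concatenation of the two representations
    r = np.tanh(W1 @ x + b1)                       # overall relevance representation (assumed tanh)
    return 1.0 / (1.0 + np.exp(-(w2 @ r + b2)))    # logistic matching score in (0, 1)

rng = np.random.default_rng(0)
D = 50
W1, b1 = rng.normal(0, 0.1, (D, 2 * D)), np.zeros(D)
w2, b2 = rng.normal(0, 0.1, D), 0.0
print(matching_score(rng.normal(size=D), rng.normal(size=D), W1, b1, w2, b2))
```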

With respect to the timeliness of answers, we assume that an answer with an earlier timestamp is more valuable and attractive, so it is supposed to be ranked higher than others. Therefore, we measure the revised ranking score of the answer by jointly exploiting the relations between the deep semantic matching and timeliness:

(7)

where the reference time is the timestamp of the first-coming answer and H is a hyper-parameter. In (7), the first multiplier is a decay factor which becomes smaller as time goes on. Particularly, it equals 1 for the first answer.
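A small sketch of such a time-sensitive ranking function is given below. The exponential form of the decay factor is our assumption; the text only requires a factor that equals 1 for the earliest answer, shrinks as the posting gap grows, and approaches 1 as H grows large.

```python
import numpy as np

def ranking_score(match_score, t_answer, t_first, H):
    decay = np.exp(-(t_answer - t_first) / H)   # equals 1.0 for the first-coming answer
    return decay * match_score                  # revised ranking score in (7)

# e.g., an answer posted 30 days after the first one, with H = 1e7 seconds (assumed value)
print(ranking_score(0.8, 30 * 86400, 0.0, 1e7))
```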

Model Training. Since askers can post their questions at any time and the visitors of different questions may vary a lot, it is usually unreasonable to compare the value of answers to different questions. In other words, we assume that only the value of answers to the same question is comparable. Thus, we adopt a question-dependent pairwise learning strategy with a large-margin objective to optimize all parameters. First, for each question, we construct several triples of the question and two of its answers, where the former answer is more valuable than the latter. Then, we minimize the following objective function:

(8)

where the parameters of EARNN are optimized; the ranking score is the one illustrated in (7); and the margin is a hyper-parameter. Given a triple, we compute the difference between the ranking scores of the two answers. If the difference already exceeds the margin, we skip this triple. Otherwise, we use stochastic gradient descent (SGD) [53] to update the model parameters with the back propagation through time algorithm.
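The following sketch illustrates this question-dependent pairwise strategy with a standard large-margin hinge loss over triples built only from answers to the same question; the numeric margin and the toy scores are assumptions.

```python
def pairwise_hinge_loss(score_pos, score_neg, margin):
    # Zero (triple skipped) when the better answer already wins by at least the margin.
    return max(0.0, margin - (score_pos - score_neg))

def build_triples(question_id, answers_sorted_by_value):
    # Answers are comparable only within the same question.
    triples = []
    for i, better in enumerate(answers_sorted_by_value):
        for worse in answers_sorted_by_value[i + 1:]:
            triples.append((question_id, better, worse))
    return triples

# toy usage: ranking scores for three answers to one question
scores = {"a1": 0.9, "a2": 0.6, "a3": 0.2}
margin = 0.5
for _, better, worse in build_triples("q1", ["a1", "a2", "a3"]):
    loss = pairwise_hinge_loss(scores[better], scores[worse], margin)
    print(better, worse, loss)   # a non-zero loss would trigger an SGD update
```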

Statistics Value
Total number of questions 9,353
Total number of answers 218,965
Average number of answers per question 23.4
Average number of words per question 35.6
Average number of words per answer 108.3
Average number of sentences per answer 6.1
Average number of topics per question 4.7
Average number of triples per question 190.7
TABLE II: The statistics of the dataset.
Data split The ratio of questions in training sets
90% 80% 70% 60%
Training Question 8,417 7,482 6,547 5,611
Answer 197,068 175,857 155,183 130,493
Triples 1,605,454 1,433,493 1,270,750 1,009,476
Testing Question 936 1,871 2,806 3,742
Answer 21,897 43,108 63,782 88,472
TABLE III: The statistics of the training and testing dataset.

IV Experiments

In this section, we first introduce the dataset and show some basic statistics. Then, we illustrate the experimental setup including the embedding size, initialization, etc. Afterwards, all benchmark methods and evaluation metrics are introduced. Finally, we report the experimental results from four aspects: evidence of topic effects and timeliness, performance comparisons, parameter sensitiveness and two case studies for attention visualizations.

IV-A Dataset Preparation

We collected a dataset including 372,818 questions and 1,739,222 answers associated with topics, upvotes, timestamps, etc. from Quora, using the approach described in [1]. For training a robust model, we only reserved the questions with more than 10 answers whose maximum number of upvotes exceeds 20. Then, we removed questions and answers with fewer than 10 words, as well as questions without topics. After that, we removed the questions whose views and upvotes were not yet stable (i.e., the posting time of the Q&A was less than one month before the data collection time). After the data cleaning, 9,353 questions and 218,965 answers remain. Table II shows the detailed statistics of the dataset after preprocessing. Finally, we created two kinds of ground truth for answer selection and answer ranking, respectively. For answer selection, we sorted the answers to each question by their stable upvotes. Then, answers with more than 10 upvotes were treated as good answers and the rest were treated as bad ones (i.e., the ground truth of a good answer equals 1 and 0 otherwise). For answer ranking, we sorted the answers of each question by their final upvotes and treated that ranking as the ground truth [1, 7].
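The cleaning rules above can be summarized with the following minimal sketch; the thresholds come from the text, while the record field names (e.g., "upvotes", "post_time") are assumed.

```python
ONE_MONTH = 30 * 86400  # seconds

def clean_question(q, collect_time):
    # Drop too-short answers first (fewer than 10 words).
    answers = [a for a in q["answers"] if len(a["text"].split()) >= 10]
    keep = (len(answers) > 10                                           # more than 10 answers
            and max((a["upvotes"] for a in answers), default=0) > 20    # max upvotes over 20
            and len(q["text"].split()) >= 10                            # question long enough
            and len(q["topics"]) > 0                                    # has topics
            and collect_time - q["post_time"] >= ONE_MONTH)             # views/upvotes stable
    return answers if keep else None

def selection_label(answer):
    # Answer-selection ground truth: 1 for "good" answers, 0 otherwise.
    return 1 if answer["upvotes"] > 10 else 0
```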

Moreover, we also analyze the distributions of the numbers of sentences and words. Fig. 4 shows that about 40% of answers contain more than 5 sentences and 50% of questions (sentences in answers) contain more than 33 (15) words. That is to say, it is exhausting for the asker and visitors to read dozens of long answers to one question, so it is necessary to rank answers by their attraction to readers. In the following experiments, to observe how the models behave under different data sparsity, we randomly select 90%, 80%, 70% and 60% of instances as training sets and the rest as testing sets, respectively. More details are shown in Table III.

Fig. 4: Distribution of sentences and words. (a) Sentence distribution. (b) Word distribution.

IV-B Experimental Setup

Embedding Setting. Word embeddings in the Input Layer are pre-trained on the Q&A corpus of the whole collected data. We use the public word2vec library (Gensim, http://radimrehurek.com/gensim/) to assign every word a 50-dimensional vector (i.e., K = 50), which is further tuned in training. Particularly, words which appear fewer than 5 times are assigned the same randomly initialized vector. Besides, we empirically set the sizes of the hidden vectors in (1) and the relevance representation in (6) to 50.

Training Setting. We follow [54] and randomly initialize all parameters in EARNN with a uniform distribution whose range is determined by the sizes of the layers before and after the weight matrix. During the training process, all parameters are tuned. Moreover, we use dropout with probability 0.2 to prevent overfitting. Unless otherwise specified, the hyper-parameter H in (7) (with unit second) and the margin in (8) are set empirically.
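A sketch of such a fan-in/fan-out dependent initialization is given below. The exact bounds are not preserved in the text; the common sqrt(6 / (fan_in + fan_out)) range attributed to [54] is our assumption.

```python
import numpy as np

def init_uniform(fan_in, fan_out, rng=np.random.default_rng(0)):
    limit = np.sqrt(6.0 / (fan_in + fan_out))   # assumed bound depending on layer sizes
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

W = init_uniform(100, 50)   # e.g., a weight matrix between layers of size 100 and 50
```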

Benchmark Methods. In the experiments, EARNN represents our complete solution with word-level attention and the time-sensitive ranking function. In order to illustrate the effectiveness of the sentence-level and word-level attention mechanisms and the effects of timeliness, we construct two variant models, denoted by EARNN_s and EARNN_w. Specifically, EARNN_s only utilizes the sentence-level attention, whose inputs are the details of Q&A. EARNN_w only utilizes the word-level attention, which treats the extra topics as an additional input. Both variants treat the deep semantic matching score as the ranking score of the answer without considering the timeliness of answers.

Besides, we compare our approaches against six popular models for two tasks.

  • BM25 [55] is a popular model in information retrieval. Correspondingly, the text of the question is treated as the query in BM25 and the answers are the documents to be ranked.

  • TRLM [29] is a translation based model to calculate the similarity between two texts (i.e., the details of the question and answers).

  • rankSVM [56] is a support vector machine for ranking. Each question and answer is represented by a vector where each dimension denotes a word and its value equals the word frequency.

  • NBOW is a neural bag-of-words model where Q&A embeddings are syntactic-level representations and a multi-layer perceptron (MLP) [57] is used to measure the relevance.

  • CNTN [37] is a convolutional neural tensor network architecture which encodes Q&A and models their relevance with a tensor layer.

  • WEC_CNN [6] is a similarity-matrix based architecture to model the deep interactions between Q&A.

Among them, BM25, TRLM and rankSVM are traditional methods using discrete word representations. The other deep learning based methods use distributed vectors to model sentences. Particularly, NBOW and CNTN use syntactic-level and semantic-level representations, respectively, and WEC_CNN is one of the deep interaction methods for matching Q&A.

In the experiments, all methods are implemented by ourselves following the related references, and all hyperparameters are tuned carefully so that their performance is the best on the Quora dataset. All results are obtained on a Linux system (4 Intel Core i5-6500 CPUs, 8 GB RAM). We use TensorFlow to implement the deep learning methods (NBOW, CNTN, WEC_CNN, EARNN_s, EARNN_w, EARNN). Excluding the parameter-free models (i.e., BM25, TRLM), we also test the training time of EARNN, WEC_CNN, CNTN, NBOW and rankSVM with the 90%-10% data partition. It takes 7,986 seconds, 960 seconds, 1,242 seconds, 224 seconds and 321 seconds, respectively, for them to converge.

Fig. 5: The evidences of our motivations. (a) Topic effects. (b) Timeliness.
P@5 P@10 MAP MRR
BM25 0.287 0.252 0.420 0.529
TRLM 0.277 0.223 0.428 0.582
rankSVM 0.338 0.267 0.505 0.662
NBOW 0.338 0.266 0.495 0.640
CNTN 0.350 0.270 0.513 0.650
WEC_CNN 0.355 0.273 0.514 0.656
EARNN_s 0.361 0.274 0.532 0.673
EARNN_w 0.365 0.275 0.539 0.688
EARNN 0.404 0.303 0.587 0.739
P@5 P@10 MAP MRR
BM25 0.283 0.245 0.417 0.527
TRLM 0.278 0.229 0.419 0.576
rankSVM 0.332 0.264 0.498 0.648
NBOW 0.336 0.270 0.505 0.638
CNTN 0.344 0.270 0.509 0.647
WEC_CNN 0.352 0.272 0.514 0.653
EARNN_s 0.354 0.273 0.525 0.669
EARNN_w 0.352 0.275 0.533 0.677
EARNN 0.390 0.295 0.573 0.724
P@5 P@10 MAP MRR
BM25 0.285 0.247 0.422 0.527
TRLM 0.270 0.224 0.417 0.570
rankSVM 0.330 0.262 0.495 0.645
NBOW 0.336 0.269 0.502 0.635
CNTN 0.348 0.272 0.510 0.643
WEC_CNN 0.350 0.274 0.508 0.646
EARNN_s 0.353 0.274 0.522 0.664
EARNN_w 0.358 0.276 0.531 0.675
EARNN 0.386 0.294 0.573 0.723
P@5 P@10 MAP MRR
BM25 0.281 0.246 0.420 0.526
TRLM 0.273 0.222 0.414 0.567
rankSVM 0.339 0.272 0.502 0.647
NBOW 0.347 0.276 0.498 0.631
CNTN 0.350 0.272 0.504 0.634
WEC_CNN 0.356 0.281 0.513 0.653
EARNN_s 0.356 0.280 0.515 0.655
EARNN_w 0.363 0.283 0.527 0.665
EARNN 0.393 0.303 0.569 0.720
TABLE IV: Performance of answer selection on four metrics. Results are divided into four parts according to the ratio of questions in the training sets (90%, 80%, 70% and 60%).
NDCG@1 NDCG@5 NDCG@10 DOA
BM25 0.556 0.623 0.711 0.560
TRLM 0.611 0.629 0.700 0.523
rankSVM 0.675 0.684 0.763 0.619
NBOW 0.644 0.686 0.756 0.623
CNTN 0.650 0.695 0.763 0.625
WEC_CNN 0.652 0.693 0.762 0.628
EARNN_s 0.677 0.707 0.772 0.636
EARNN_w 0.680 0.711 0.775 0.640
EARNN 0.720 0.731 0.793 0.651
NDCG@1 NDCG@5 NDCG@10 DOA
BM25 0.545 0.621 0.712 0.560
TRLM 0.601 0.621 0.697 0.512
rankSVM 0.661 0.684 0.755 0.608
NBOW 0.639 0.685 0.756 0.618
CNTN 0.652 0.693 0.763 0.624
WEC_CNN 0.650 0.691 0.761 0.626
EARNN_s 0.670 0.700 0.768 0.629
EARNN_w 0.676 0.706 0.770 0.631
EARNN 0.705 0.729 0.792 0.651
NDCG@1 NDCG@5 NDCG@10 DOA
BM25 0.544 0.620 0.710 0.557
TRLM 0.603 0.623 0.698 0.514
rankSVM 0.657 0.682 0.753 0.601
NBOW 0.629 0.683 0.755 0.614
CNTN 0.648 0.695 0.766 0.625
WEC_CNN 0.631 0.688 0.761 0.623
EARNN_s 0.652 0.698 0.766 0.630
EARNN_w 0.666 0.704 0.772 0.632
EARNN 0.695 0.726 0.791 0.648
NDCG@1 NDCG@5 NDCG@10 DOA
BM25 0.543 0.617 0.709 0.558
TRLM 0.606 0.623 0.696 0.514
rankSVM 0.659 0.681 0.752 0.603
NBOW 0.628 0.680 0.752 0.607
CNTN 0.647 0.694 0.765 0.624
WEC_CNN 0.629 0.688 0.760 0.619
EARNN_s 0.660 0.700 0.769 0.629
EARNN_w 0.664 0.703 0.771 0.631
EARNN 0.687 0.721 0.788 0.645
TABLE V: Performance of answer ranking on four metrics. Results are divided into four parts according to the ratio of questions in the training sets (90%, 80%, 70% and 60%).

Evaluation Metrics. For evaluation, we rank answers by their predicted scores and compare the ranked list with the ground truth (0/1 for the answer selection task; the ranking by real upvotes for the answer ranking task) to compute the metrics.

Answer Selection. Since each answer is only labeled as "good" or "bad" (i.e., 1 or 0), we adopt three types of metrics widely used in information retrieval: precision at k (P@k), mean average precision (MAP) and mean reciprocal rank (MRR). Specifically, given the ranked list, P@k measures the precision of the top-k answers, MAP is the mean of the average precision scores, and MRR is the mean of the reciprocal rank of the first good answer in the candidate list. Formally, for each question, P@k, MAP and MRR are defined as:

(9)

where the binary indicator equals 1 when the corresponding answer in the ranked list is good and 0 otherwise; the remaining notations are the number of good answers, the number of all answers, and the position of the first good answer in the ranked list. These three metrics range from 0 to 1 and the larger the better. In our study, we choose P@5, P@10, MAP and MRR for evaluation.
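For concreteness, the following sketch computes the per-question quantities using the standard definitions that match the prose (the body of equation (9) is not preserved in the extracted text); MAP and MRR average these values over all test questions.

```python
def precision_at_k(rels, k):
    # rels: 0/1 relevance labels in the model's predicted order for one question
    return sum(rels[:k]) / k

def average_precision(rels):
    hits, total = 0, 0.0
    for i, r in enumerate(rels, start=1):
        if r:
            hits += 1
            total += hits / i
    return total / max(1, sum(rels))

def reciprocal_rank(rels):
    for i, r in enumerate(rels, start=1):
        if r:
            return 1.0 / i   # reciprocal of the position of the first good answer
    return 0.0

rels = [1, 0, 1, 0, 0, 1]   # e.g., good/bad labels in predicted order for one question
print(precision_at_k(rels, 5), average_precision(rels), reciprocal_rank(rels))
```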

Answer Ranking. For this task, we adopt two widely used ranking metrics. One is the Normalized Discounted Cumulative Gain (NDCG@k) [58], where k represents the top-k ranked answers. Formally, for each question, NDCG@k is defined as follows:

(10)

where iDCG is the ideal DCG and DCG is defined as

(11)

where the gain of each answer equals its rating.

Considering that NDCG@k only measures the ranking quality of the top-k answers, we use another metric named Degree of Agreement (DOA) [59], which measures the quality of an entire ranking list. Specifically, for a list of answers to a question, each answer has an observed rank (from the ground truth) and an evaluated rank (from the model). A pair of answers is said to be a correct order pair if their evaluated ranks preserve their observed order. Then, for each question, DOA is defined as:

(12)

where the numerator counts the correct order pairs and n is the number of candidate answers. Both NDCG@k and DOA range from 0 to 1 and the larger the better. In our study, we choose NDCG@1, NDCG@5, NDCG@10 and DOA for evaluation.
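A short sketch of both ranking metrics is given below. Taking the DCG gain directly as the rating (as the text states) and the log2 position discount are assumptions, since the bodies of (10)-(12) are not preserved; DOA counts the fraction of answer pairs whose predicted order agrees with the observed order.

```python
import math
from itertools import combinations

def dcg_at_k(ratings, k):
    return sum(r / math.log2(i + 1) for i, r in enumerate(ratings[:k], start=1))

def ndcg_at_k(ratings_in_predicted_order, k):
    ideal = dcg_at_k(sorted(ratings_in_predicted_order, reverse=True), k)
    return dcg_at_k(ratings_in_predicted_order, k) / ideal if ideal > 0 else 0.0

def doa(observed_rank, predicted_rank):
    # ranks are dicts: answer id -> rank position (smaller = better)
    pairs = list(combinations(observed_rank, 2))
    correct = sum((observed_rank[i] < observed_rank[j]) == (predicted_rank[i] < predicted_rank[j])
                  for i, j in pairs)
    return correct / len(pairs)

print(ndcg_at_k([3, 1, 2, 0], k=3))
print(doa({"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 3, "c": 2}))
```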

Fig. 6: Effects of hyper-parameter H on four metrics used in answer selection. (a) P@5 (b) P@10 (c) MAP (d) MRR
Fig. 7: Effects of hyper-parameter H on four metrics used in answer ranking. (a) NDCG@1 (b) NDCG@5 (c) NDCG@10 (d) DOA

IV-C Experimental Results

IV-C1 Analysis of topic effects and timeliness

Firstly, we report some evidence from Quora to strengthen our motivations. Generally, in CQA websites, when posting a question, the asker can choose topics for it, and these topics help readers quickly find their favorite questions. Compared with those CQA sites whose topics are predefined, such as Yahoo Answers, Quora provides an open way where the asker can create topics for a specific question, as shown in Fig. 1 (blue rectangle). Therefore, the asker usually extracts some key words from the question as topics, which contain some intent of the asker. In Fig. 5, we plot the number of new topics per month as blue bars and the number of all topics as a red line. (Based on the statistics from https://neilpatel.com/blog/quora/, Quora experienced an estimated 150% growth in the number of unique visitors between Dec. 2010 and Jan. 2011, i.e., the 16th month in Fig. 5. Accordingly, the number of topics also increased rapidly around that month.) According to our statistics, since the establishment of Quora, the number of topics has increased linearly at an average rate of 1,600 per week, reaching 70 thousand over 41 weeks. With the rapid growth in the number of topics, there is a wealth of semantic information in them, from coarse ones (e.g., "Movies", "Startups") to fine ones (e.g., "Who Are the Best Professors at X"). Through the analysis of these topics, we can easily find the emphasis and hidden intention of the question. Then, the question can be better understood, which is beneficial for measuring the quality of answers.

Besides the topic effects, we also analyze the timeliness in CQA. In a CQA website, after a question is answered, the answer can be seen by all readers and visitors. Intuitively, as more and more users see it over time, more upvotes are received even if the answer is not perfect. Therefore, when comparing two similar answers to one question, the earlier posted answer will receive more upvotes. For better illustration, we collect all upvotes of answers in a specific period and put them into different buckets according to the number of weeks since they were posted (i.e., from one week to five weeks). Since most answers obtain only a few upvotes and they have a bad effect on the analysis, we only keep answers with more than 10 upvotes. Then, we draw a box plot for each bucket as shown in Fig. 5. For each box plot, the top bar is the maximum observation, the lower bar is the minimum observation, the top of the box is the third quartile, the bottom of the box is the first quartile, the middle bar is the median value and the red crosses are possible outliers. We find that the longer answers have been posted, the larger the maximum upvotes. Similar phenomena occur in the third quartile and the median value, which indicate that the analysis of timeliness is correct and worthwhile.

IV-C2 Performance Comparisons

Secondly, we show the performance comparisons among all models on answer selection and answer ranking. For the former task, we list the results on P@5, P@10, MAP and MRR in Table IV, while for the latter task, we list the results on NDCG@1, NDCG@5, NDCG@10 and DOA in Table V. As indicated in the two tables, since both tasks focus on measuring the semantic matching between Q&A, their performance shows similar patterns. Our proposed models (i.e., EARNN, EARNN_w and EARNN_s) outperform the baselines in most cases, indicating the effectiveness of our models in exploring topic effects and timeliness. Among our three models, EARNN performs best and EARNN_w ranks second, followed by EARNN_s. Particularly, EARNN_w always performs better than EARNN_s, indicating the topic effects on answer ranking and that our word-level attention mechanism succeeds in capturing the deep semantic relations between topics and Q&A. Except for EARNN, all models mainly consider the similarities of Q&A, and EARNN_w beats the best baseline (i.e., WEC_CNN) with improvements of 1.8%, 0.8%, 4.0%, 3.7% on P@5, P@10, MAP, MRR and 4.9%, 2.3%, 1.4%, 1.5% on NDCG@1, NDCG@5, NDCG@10, DOA. This suggests our models are better at measuring answer quality on our dataset. Besides, compared with EARNN_w, EARNN further increases by 9.4%, 7.8%, 8.1%, 7.4% on the four metrics in answer selection and by 4.5%, 2.9%, 2.5%, 2.4% on the four metrics in answer ranking, so we conclude that timeliness exists in CQA and our time-sensitive ranking function is effective in modeling this phenomenon.

Among the baselines, the experimental results reveal the following points. First, in most cases, the performance of NBOW, CNTN and WEC_CNN is better than that of BM25 and TRLM, indicating the strength of deep learning models. However, as a conventional approach, rankSVM is still competitive with simple neural network methods, e.g., NBOW. Second, the performance of NBOW is not good enough compared with the other neural network models, which demonstrates that CNN or RNN models can truly capture the semantic information in Q&A. Third, the observation that WEC_CNN performs quite well among the baselines shows that the interaction among words is effective for evaluating the relevance between sentences.

Fig. 8: Attention visualizations of the question and two answers. Each sentence begins with a number in red circle which denotes the current sentence. There is a stripe in blue beneath each word. The deeper the color, the more important the word is. The thicker the stripe, the more important the sentence is. (a) The visualization case of Q&A in Fig. 1. (b) Another visualization case answered by Andrew Ng.

IV-C3 Parameter Sensitiveness

Here, we evaluate the sensitivity of the hyper-parameter H, which adjusts the weight of the decay factor in (7). As mentioned in Section III-B, the decay factor is used to model the timeliness. As time goes on, the decay factor becomes smaller so that the value of the corresponding answer becomes lower and lower. Fig. 6 shows the performance on answer selection, while Fig. 7 shows the performance on answer ranking. Since we intend to examine the impact of the decay factor, we compare the performance of EARNN (blue curve) and EARNN_w (red curve), which differ only in the prediction. In this part, we test performance on the 90%-10% data partition. Since the decay factor is close to 0 when H is too small, we vary H from a small value to larger ones. From the results, we notice that EARNN performs better than EARNN_w in most cases. As H varies, the results first go up and then decline. The performance on different metrics peaks at different points, but the best value of H always falls within a narrow range. After H becomes large enough, the decay factor is close to 1 so that EARNN degenerates into EARNN_w.

IV-C4 Case Study

Here, we illustrate one outstanding ability of EARNN: distinguishing the emphasis of Q&A, i.e., using attention scores generated by our word-level attention mechanism. Fig. 8 shows the sentence scores in (3) and the word scores in (5) of Q&A for two cases. One is the motivating example in Fig. 1 and the other is a suggestion from Andrew Ng for a newcomer to machine learning (Andrew Ng's answer is posted at https://www.quora.com/I-do-not-have-strong-mathematics-background-what-should-I-learn-in-mathematics-to-be-able-to-master-Machine-Learning-and-AI). Specifically, for clear visualization, we classify all words (sentences) into several levels according to their word (sentence) scores and draw a horizontal stripe beneath each word. The thickness of the stripe represents the importance of the sentence and the depth of color represents the importance of the word. Intuitively, the thicker the stripe and the deeper the color, the more important it is. In particular, the thickness of the stripe in questions carries no meaning. From Fig. 8, we can easily find that words which appeal to travellers are highlighted, such as "beauty", "amazing" and many scenic spots. On the contrary, most prepositions and adverbs are assigned lower attention scores, such as "in", "on", "most", etc. Compared with the second answer A2, the first answer A1 describes a surreal and amazing lake (i.e., the sixth sentence) and the third answer A3 involves multiple scenic spots (i.e., the fifth sentence). These two answers are much better in terms of content.

We also illustrate an answer from Andrew Ng for a newcomer who is asking for help in mastering machine learning and AI. From the machine's understanding, it is convincing that the asker focuses on words like "mathematics", "machine learning" and "AI". In Andrew Ng's answer, the model finds that the first and the ninth sentences are helpful in solving the question. Especially, the first sentence involves several mathematical subjects such as "linear algebra", "probability and statistics", "calculus" and "optimization". Although Andrew Ng refers to some opinions about machine learning (i.e., the fifth and the sixth sentences) and his own experience (i.e., the seventh sentence), the model does not treat them as important. These two visualizations illustrate that our two attention mechanisms can clearly capture the emphasis of Q&A, which is beneficial for the development of CQA.

V Summary and Future Work

In this paper, we comprehensively studied the answer selection and ranking problems by taking full advantage of both Q&A and multi-facet domain effects. Particularly, we developed a serialized LSTM together with two enhanced attention mechanisms to model topic effects. Meanwhile, the emphasis of Q&A was automatically distinguished. We also designed a time-sensitive ranking function to establish the relations between Q&A and timeliness. We evaluated the performance of EARNN on a dataset from Quora, and extensive experimental results clearly validated the effectiveness and interpretability of EARNN. In the future, we plan to generalize our model to CQA systems whose topics are predefined, such as Yahoo Answers. We would also like to exploit and model more domain effects based on our findings in CQA.

References

  • [1] G. Wang, K. Gill, M. Mohanlal, H. Zheng, and B. Y. Zhao, “Wisdom in the social crowd: an analysis of quora,” in Proceedings of the 22nd international conference on World Wide Web.   ACM, 2013, pp. 1341–1352.
  • [2] I. Srba and M. Bielikova, “A comprehensive survey and classification of approaches for community question answering,” ACM Transactions on the Web (TWEB), vol. 10, no. 3, p. 18, 2016.
  • [3] A. Severyn and A. Moschitti, “Learning to rank short text pairs with convolutional deep neural networks,” in Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval.   ACM, 2015, pp. 373–382.
  • [4] L. Yang, Q. Ai, J. Guo, and W. B. Croft, “Anmm: Ranking short answer texts with attention-based neural matching model,” in Proceedings of the 25th ACM International on Conference on Information and Knowledge Management.   ACM, 2016, pp. 287–296.
  • [5] P. Liu, X. Qiu, J. Chen, and X. Huang, “Deep fusion lstms for text semantic matching,” in Proceedings of Annual Meeting of the Association for Computational Linguistics, 2016.
  • [6] Y. Shen, W. Rong, N. Jiang, B. Peng, J. Tang, and Z. Xiong, “Word embedding based correlation model for question/answer matching.” in AAAI, 2017, pp. 3511–3517.
  • [7] Z. Zhao, H. Lu, V. W. Zheng, D. Cai, X. He, and Y. Zhuang, “Community-based question answering via asymmetric multi-faceted ranking network learning.” in AAAI, 2017, pp. 3532–3539.
  • [8] L. Nie, X. Wei, D. Zhang, X. Wang, Z. Gao, and Y. Yang, “Data-driven answer selection in community qa systems.” IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 6, pp. 1186–1198, 2017.
  • [9] T. P. Sahu, N. K. Nagwani, and S. Verma, “Selecting best answer: An empirical analysis on community question answering sites,” IEEE Access, vol. 4, pp. 4797–4808, 2016.
  • [10] Z. Zhang, Q. Li, and D. Zeng, “Mining evolutionary topic patterns in community question answering systems,” IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, vol. 41, no. 5, pp. 828–833, 2011.
  • [11] J. Jeon, W. B. Croft, J. H. Lee, and S. Park, “A framework to predict the quality of answers with non-textual features,” in Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval.   ACM, 2006, pp. 228–235.
  • [12] J. Bian, Y. Liu, E. Agichtein, and H. Zha, “Finding the right facts in the crowd: factoid question answering over social media,” in Proceedings of the 17th international conference on World Wide Web.   ACM, 2008, pp. 467–476.
  • [13] Z. Zhu, D. Bernhard, and I. Gurevych, “A multi-dimensional model for assessing the quality of answers in social q&a sites,” in International Conference on Information Quality, Iciq 2009, Hasso Plattner Institute, University of Potsdam, Germany, November, 2009, pp. 264–265.
  • [14] C. Shah and J. Pomerantz, “Evaluating and predicting answer quality in community qa,” in Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval.   ACM, 2010, pp. 411–418.
  • [15] D. H. Dalip, M. A. Gonçalves, M. Cristo, and P. Calado, “Exploiting user feedback to learn to rank answers in q&a forums: a case study with stack overflow,” in Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval.   ACM, 2013, pp. 543–552.
  • [16] P. Rosso, L.-F. Hurtado, E. Segarra, and E. Sanchis, “On the voice-activated question answering,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 42, no. 1, pp. 75–85, 2012.
  • [17] M. T. Mills and N. G. Bourbakis, “Graph-based methods for natural language processing and understanding—a survey and analysis,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 44, no. 1, pp. 59–71, 2014.
  • [18] W.-t. Yih, M.-W. Chang, C. Meek, and A. Pastusiak, “Question answering using enhanced lexical semantic models,” in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, 2013, pp. 1744–1753.
  • [19] H. Cui, R. Sun, K. Li, M.-Y. Kan, and T.-S. Chua, “Question answering passage retrieval using dependency relations,” in Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval.   ACM, 2005, pp. 400–407.
  • [20] M. Heilman and N. A. Smith, “Tree edit models for recognizing textual entailments, paraphrases, and answers to questions,” in Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics.   Association for Computational Linguistics, 2010, pp. 1011–1019.
  • [21] A. Severyn and A. Moschitti, “Automatic feature engineering for answer selection and extraction,” in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013, pp. 458–467.
  • [22] D. A. Smith and J. Eisner, “Quasi-synchronous grammars: Alignment by soft projection of syntactic dependencies,” in Proceedings of the Workshop on Statistical Machine Translation.   Association for Computational Linguistics, 2006, pp. 23–30.
  • [23] M. Wang, N. A. Smith, and T. Mitamura, “What is the jeopardy model? a quasi-synchronous grammar for qa,” in Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 2007.
  • [24] L. Cai, G. Zhou, K. Liu, and J. Zhao, “Learning the latent topics for question retrieval in community qa,” in Proceedings of 5th international joint conference on Natural Language Processing, 2011, pp. 273–281.
  • [25] Z. Ji, F. Xu, B. Wang, and B. He, “Question-answer topic model for question retrieval in community question answering,” in Proceedings of the 21st ACM international conference on Information and knowledge management.   ACM, 2012, pp. 2471–2474.
  • [26] S. Ding, G. Cong, C.-Y. Lin, and X. Zhu, “Using conditional random fields to extract contexts and answers of questions from online forums,” Proceedings of ACL-08: HLT, pp. 710–718, 2008.
  • [27] X. Yao, B. Van Durme, C. Callison-Burch, and P. Clark, “Answer extraction as sequence tagging with tree edit distance,” in Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2013, pp. 858–867.
  • [28] Y. Yao, H. Tong, F. Xu, and J. Lu, “Scalable algorithms for cqa post voting prediction,” IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 8, pp. 1723–1736, 2017.
  • [29] J. Jeon, W. B. Croft, and J. H. Lee, “Finding similar questions in large question and answer archives,” in Proceedings of the 14th ACM international conference on Information and knowledge management.   ACM, 2005, pp. 84–90.
  • [30] X. Xue, J. Jeon, and W. B. Croft, “Retrieval models for question and answer archives,” in Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval.   ACM, 2008, pp. 475–482.
  • [31] J.-T. Lee, S.-B. Kim, Y.-I. Song, and H.-C. Rim, “Bridging lexical gaps between queries and questions on large online q&a collections with compact translation models,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing.   Association for Computational Linguistics, 2008, pp. 410–418.
  • [32] S. Riezler, A. Vasserman, I. Tsochantaridis, V. Mittal, and Y. Liu, “Statistical machine translation for query expansion in answer retrieval,” in Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, 2007, pp. 464–471.
  • [33] G. Zhou, L. Cai, J. Zhao, and K. Liu, “Phrase-based translation model for question retrieval in community question answer archives,” in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1.   Association for Computational Linguistics, 2011, pp. 653–662.
  • [34] A. Singh, “Entity based Q&A retrieval,” in Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning.   Association for Computational Linguistics, 2012, pp. 1266–1277.
  • [35] B. Hu, Z. Lu, H. Li, and Q. Chen, “Convolutional neural network architectures for matching natural language sentences,” in Advances in neural information processing systems, 2014, pp. 2042–2050.
  • [36] D. Wang and E. Nyberg, “A long short-term memory model for answer sentence selection in question answering,” in Meeting of the Association for Computational Linguistics and the International Joint Conference on Natural Language Processing, 2015, pp. 707–712.
  • [37] X. Qiu and X. Huang, “Convolutional neural tensor network architecture for community-based question answering,” in IJCAI, 2015, pp. 1305–1311.
  • [38] B. Wang, X. Wang, C. Sun, B. Liu, and L. Sun, “Modeling semantic relevance for question-answer pairs in web social communities,” in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics.   Association for Computational Linguistics, 2010, pp. 1230–1238.
  • [39] M. Iyyer, J. Boyd-Graber, L. Claudino, R. Socher, and H. Daumé III, “A neural network for factoid question answering over paragraphs,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 633–644.
  • [40] A. Kamel, B. Sheng, P. Yang, P. Li, R. Shen, and D. D. Feng, “Deep convolutional neural networks for human action recognition using depth maps and postures,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, no. 99, 2018.
  • [41] B. Pourbabaee, M. J. Roshtkhari, and K. Khorasani, “Deep convolutional neural networks and learning ECG features for screening paroxysmal atrial fibrillation patients,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, no. 99, pp. 1–10, 2017.
  • [42] H. Zhao, B. Jin, Q. Liu, Y. Ge, E. Chen, X. Zhang, and T. Xu, “Voice of charity: Prospecting the donation recurrence & donor retention in crowdfunding,” IEEE Transactions on Knowledge and Data Engineering, 2019.
  • [43] C.-M. Lin and E.-A. Boldbaatar, “Autolanding control using recurrent wavelet elman neural network,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 45, no. 9, pp. 1281–1291, 2015.
  • [44] P. Liu, Z. Zeng, and J. Wang, “Multistability of recurrent neural networks with nonmonotonic activation functions and mixed time delays,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 46, no. 4, pp. 512–523, 2016.
  • [45] B. Jin, H. Zhao, E. Chen, Q. Liu, and Y. Ge, “Estimating the days to success of campaigns in crowdfunding: A deep survival perspective,” in Thirty-Third AAAI Conference on Artificial Intelligence, 2019.
  • [46] S. Wan, Y. Lan, J. Guo, J. Xu, L. Pang, and X. Cheng, “A deep architecture for semantic matching with multiple positional sentence representations.” in AAAI, 2016, pp. 2835–2841.
  • [47] W. Yin, H. Schütze, B. Xiang, and B. Zhou, “ABCNN: Attention-based convolutional neural network for modeling sentence pairs,” Transactions of the Association for Computational Linguistics, vol. 4, pp. 259–272, 2016.
  • [48] Z. Huang, Q. Liu, E. Chen, H. Zhao, M. Gao, S. Wei, Y. Su, and G. Hu, “Question difficulty prediction for reading problems in standard tests,” in Thirty-First AAAI Conference on Artificial Intelligence, 2017.
  • [49] Z. Li, H. Zhao, Q. Liu, Z. Huang, T. Mei, and E. Chen, “Learning from history and present: Next-item recommendation via discriminatively exploiting user behaviors,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.   ACM, 2018, pp. 1734–1743.
  • [50] H. Tao, S. Tong, H. Zhao, T. Xu, B. Jin, and Q. Liu, “A radical-aware attention-based model for Chinese text classification,” in Thirty-Third AAAI Conference on Artificial Intelligence, 2019.
  • [51] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Advances in neural information processing systems, 2013, pp. 3111–3119.
  • [52] A. Graves, A.-r. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).   IEEE, 2013, pp. 6645–6649.
  • [53] R. Hecht-Nielsen, “Theory of the backpropagation neural network,” in Neural networks for perception.   Elsevier, 1992, pp. 65–93.
  • [54] G. B. Orr and K.-R. Müller, Neural networks: tricks of the trade.   Springer, 2003.
  • [55] S. E. Robertson and S. Walker, “On relevance weights with little relevance information,” in ACM SIGIR Forum, vol. 31, no. SI.   ACM, 1997, pp. 16–24.
  • [56] T. Joachims, “Training linear SVMs in linear time,” in Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining.   ACM, 2006, pp. 217–226.
  • [57] Y. Bengio et al., “Learning deep architectures for AI,” Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
  • [58] K. Järvelin and J. Kekäläinen, “Cumulated gain-based evaluation of IR techniques,” ACM Transactions on Information Systems (TOIS), vol. 20, no. 4, pp. 422–446, 2002.
  • [59] Q. Liu, E. Chen, H. Xiong, C. H. Ding, and J. Chen, “Enhancing collaborative filtering by user interest expansion via personalized ranking,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 42, no. 1, pp. 218–233, 2012.

Binbin Jin received the B.E. degree in computer science from the University of Science and Technology of China (USTC), Hefei, China, in 2016. He is currently working toward the Ph.D. degree in the School of Computer Science and Technology at USTC under the supervision of Professor Enhong Chen. His research interests include data mining, deep learning, and applications in community question answering (CQA) and Internet finance-based websites (such as crowdfunding). He has published several papers in refereed journals and conference proceedings, such as IEEE Transactions on Knowledge and Data Engineering and AAAI.

Enhong Chen (SM’07) is a professor and vice dean of the School of Computer Science at the University of Science and Technology of China (USTC). He received the Ph.D. degree from USTC. His general areas of research include data mining and machine learning, social network analysis, and recommender systems. He has published more than 100 papers in refereed conferences and journals, including IEEE TKDE, IEEE TMC, KDD, ICDM, NIPS, and CIKM. He has served on the program committees of numerous conferences, including KDD, ICDM, and SDM. He received the Best Application Paper Award at KDD 2008, the Best Research Paper Award at ICDM 2011, and the Best of SDM 2015 Award. His research is supported by the National Science Foundation for Distinguished Young Scholars of China. He is a senior member of the IEEE.

Hongke Zhao received the Ph.D. degree from the University of Science and Technology of China. He is currently a faculty member of the College of Management and Economics, Tianjin University. His research interests include data mining, knowledge management, and Internet finance-based applications such as crowdfunding and P2P lending. He has published more than 20 papers in refereed journals and conference proceedings, such as ACM Transactions on Intelligent Systems and Technology, IEEE Transactions on Systems, Man, and Cybernetics: Systems, IEEE Transactions on Big Data, Information Sciences, ACM SIGKDD, IJCAI, AAAI, and IEEE ICDM. He has served as a program committee member at ACM SIGKDD, AAAI, and other conferences.

Zhenya Huang received the B.E. degree in software engineering from Shandong University (SDU), China, in 2014. He is currently working toward the Ph.D. degree in the School of Computer Science and Technology at the University of Science and Technology of China (USTC). His main research interests include data mining and knowledge discovery, recommender systems, and intelligent education systems. He has published several papers in refereed conference proceedings, such as AAAI, CIKM, DASFAA, and SIGKDD.

Qi Liu is an associate professor at the University of Science and Technology of China (USTC). He received the Ph.D. degree in computer science from USTC. His general area of research is data mining and knowledge discovery. He has published prolifically in refereed journals and conference proceedings, e.g., TKDE, TOIS, TKDD, TIST, KDD, IJCAI, AAAI, ICDM, SDM, and CIKM. He has served regularly on the program committees of a number of conferences and is a reviewer for leading academic journals in his fields. He is a member of ACM and IEEE. Dr. Liu is the recipient of the ICDM 2011 Best Research Paper Award and the Best of SDM 2015 Award.

Hengshu Zhu (M’14) received the Ph.D. degree in 2014 and the B.E. degree in 2009, both in computer science, from the University of Science and Technology of China (USTC). He is currently a senior data scientist at Baidu Inc. His general area of research is data mining and machine learning, with a focus on developing advanced data analysis techniques for emerging applied business research. He has published prolifically in refereed journals and conference proceedings, including IEEE TKDE, IEEE TMC, ACM TKDD, KDD, IJCAI, and AAAI. He has regularly served on the program committees of numerous conferences and as a reviewer for many top journals in relevant fields.

Shui Yu is a Professor in the School of Computer Science, University of Technology Sydney, Australia. Dr. Yu’s research interests include security and privacy, networking, big data, and mathematical modelling. He has published two monographs, edited two books, and authored more than 280 technical papers, including papers in top journals and conferences. Dr. Yu initiated the research field of networking for big data in 2013. His h-index is 37. He currently serves on the editorial boards of IEEE Communications Surveys and Tutorials (Area Editor), IEEE Communications Magazine, IEEE Internet of Things Journal, IEEE Communications Letters, IEEE Access, and IEEE Transactions on Computational Social Systems. He is a Senior Member of IEEE, a member of AAAS and ACM, and a Distinguished Lecturer of the IEEE Communications Society.
