DeepMood: Modeling Mobile Phone Typing Dynamics for Mood Detection
Abstract.
The increasing use of electronic forms of communication presents new opportunities in the study of mental health, including the ability to investigate the manifestations of psychiatric diseases unobtrusively and in the setting of patients’ daily lives. A pilot study to explore the possible connections between bipolar affective disorder and mobile phone usage was conducted. In this study, participants were provided a mobile phone to use as their primary phone. This phone was loaded with a custom keyboard that collected metadata consisting of keypress entry time and accelerometer movement. Individual character data, with the exceptions of the backspace key and space bar, were not collected due to privacy concerns. We propose an end-to-end deep architecture based on late fusion, named DeepMood, to model the multi-view metadata for the prediction of mood scores. Experimental results show that 90.31% prediction accuracy on the depression score can be achieved based on session-level mobile phone typing dynamics, where a session is typically less than one minute long. This demonstrates the feasibility of using mobile phone metadata to infer mood disturbance and severity.
1. Introduction
Mobile phones, in particular “smartphones,” have become nearly ubiquitous, with 2 billion smartphone users worldwide. This presents new opportunities in the study and treatment of psychiatric illness, including the ability to study the manifestations of psychiatric illness in the setting of patients’ daily lives in an unobtrusive manner and at a level of detail that was not previously possible. Continuous real-time monitoring in naturalistic settings and collection of automatically generated smartphone data that reflect illness activity could facilitate early intervention and have a potential use as objective outcome measures in efficacy trials (Ankers and Jones, 2009; Bopp et al., 2010; Faurholt-Jepsen et al., 2016).
While mobile phones are used for a variety of tasks, the most widely and frequently used feature is text messaging. To the best of our knowledge, no previous studies (American Psychiatric Association, 2013; Puiatti et al., 2011; Frost et al., 2013; Gruenerbl et al., 2014; Schleusing et al., 2011; Valenza et al., 2014) have investigated the relationship between mobile phone typing dynamics and mood states. In this work, we aim to determine the feasibility of inferring mood disturbance and severity from such data. In particular, we seek to investigate the relationship between digital footprints and mood in bipolar affective disorder, which has been deemed the most expensive behavioral health care diagnosis (Peele et al., 2003), costing more than twice as much as depression per affected individual (Laxman et al., 2008). For every dollar allocated to outpatient care for people with bipolar disorder, $1.80 is spent on inpatient care, suggesting that early intervention and improved prevention management could decrease the financial impact of this illness (Peele et al., 2003).
We study the mobile phone typing dynamics metadata at the session level. A session is defined as beginning with a keypress that occurs after 5 or more seconds have elapsed since the last keypress and continuing until 5 or more seconds elapse between keypresses.
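As a minimal sketch of this session definition, the snippet below groups a list of keypress timestamps into sessions. The function name `split_sessions`, the constant `SESSION_GAP_S`, and the sample timestamps are hypothetical and are not taken from the authors' code.

```python
from typing import List

SESSION_GAP_S = 5.0  # threshold from the text; the authors note it is tunable


def split_sessions(timestamps: List[float], gap: float = SESSION_GAP_S) -> List[List[float]]:
    """Group ascending keypress timestamps (in seconds) into sessions.

    A keypress opens a new session when `gap` or more seconds have
    elapsed since the previous keypress.
    """
    sessions: List[List[float]] = []
    for t in timestamps:
        if sessions and t - sessions[-1][-1] < gap:
            sessions[-1].append(t)   # still inside the current session
        else:
            sessions.append([t])     # first keypress, or the gap was long enough
    return sessions


# Hypothetical keypress times, in seconds.
keypresses = [0.0, 0.4, 1.1, 7.0, 7.3, 20.0]
sessions = split_sessions(keypresses)
print(len(sessions), [len(s) for s in sessions])
```

Because every keypress belongs to exactly one session, each participant contributes one sample per burst of typing.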

Unaligned views: An intuitive idea for fusing the multi-view time series is to align them with each unique timestamp. However, features defined in one view would be missing for data points collected in another view. For example, a data point in special characters has no acceleration in accelerometer values or distance from last key in alphanumeric characters.

Dominant views: One may also attempt to do the fusion by concatenating the multi-view time series per session. However, the views usually have different densities in a session, because the metadata are collected from different sources or sensors. For example, character-related metadata collected following a person’s typing behaviours are much sparser than accelerometer values collected in the background, which have 16 times more data points in our dataset. Dense views could dominate a concatenated feature space and potentially override the effects of sparse but important views.

View interactions: The multi-view time series from typing dynamics contains complementary information reflecting a person’s mental health. The relationship between the digital footprints and mood states can be highly nonlinear. An effective fusion strategy is needed to explore feature interactions across different views.
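The first two challenges can be made concrete with a toy example. The sketch below aligns two invented views on their timestamps and counts how many aligned rows contain a missing value; all numbers and variable names are made up for illustration.

```python
# Toy per-view records as {timestamp_in_seconds: value}; all numbers are invented.
alph = {0.00: 0.085, 0.38: 0.092}                      # keypress duration per keypress
accel = {round(0.06 * i, 2): 9.7 for i in range(7)}    # one reading every 60 ms

# Early fusion by alignment: one row per unique timestamp across both views.
timeline = sorted(set(alph) | set(accel))
aligned = [(t, alph.get(t), accel.get(t)) for t in timeline]

# Unaligned views: rows where at least one view has no value.
rows_with_gaps = sum(1 for _, a, b in aligned if a is None or b is None)

# Dominant views: the dense view contributes far more data points.
density_ratio = len(accel) / len(alph)

print(len(aligned), rows_with_gaps, density_ratio)
```

Even in this tiny case, most aligned rows are incomplete, and the accelerometer view outnumbers the keypress view several times over.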
In this paper, we propose a deep architecture based on late fusion, named DeepMood, to model mobile phone typing dynamics, as illustrated in Figure 2. The contributions of this work are threefold:

Data analysis (Section 2): We obtain interesting insights related to the digital footprints on mobile phones by analyzing the correlation between patterns of typing dynamics metadata and mood in bipolar affective disorder.

A novel fusion strategy in a deep framework (Section 3): Motivated by the aforementioned challenges, namely that early fusion strategies (i.e., aligning views with timestamps or concatenating views per session) would lead to the problems of unaligned or dominant views, we propose a two-stage late fusion approach for modeling the multi-view time series data. In the first stage, each view of the time series is separately modeled by a Recurrent Neural Network (RNN) (Mikolov et al., 2010; Sutskever et al., 2011). The multi-view metadata are then fused in the second stage by exploring interactions across the output vectors from each view, where three alternative approaches are developed following the idea of Multi-view Machines (Cao et al., 2016), Factorization Machines (Rendle, 2012), or in a fully connected fashion.

Empirical evaluations (Section 4): We conduct experiments showing that 90.31% prediction accuracy on the depression score can be achieved based on session-level typing dynamics, which reveals the potential of using mobile phone metadata to predict mood disturbance and severity. Our code is open-sourced at https://www.cs.uic.edu/~bcao1/code/DeepMood.py.
2. Data
The data used in this work were collected from the BiAffect
In this work, we study the collected metadata for participants, including bipolar subjects and normal controls, who had provided at least one week of metadata. There are 7 participants with bipolar I disorder, which involves periods of severe mood episodes from mania to depression; 5 participants with bipolar II disorder, which is a milder form of mood elevation, involving milder episodes of hypomania that alternate with periods of severe depression; and 8 participants with no diagnosis per DSM-IV-TR criteria (Kessler et al., 2005).
Participants were administered the Hamilton Depression Rating Scale (HDRS) (Williams, 1988) and Young Mania Rating Scale (YMRS) (Young et al., 1978) once a week. These scales are used as the gold standard to assess the level of depressive and manic symptoms in bipolar disorder. However, the use of these clinical rating scales requires a face-to-face patient-clinician encounter, and the level of affective symptoms is assessed during a clinical evaluation. Study findings may be unreliable when using rating scales as outcome measures due to methodological issues such as unblinding of raters and patients, differences in rater experiences and missing visits for outcome assessments (Demitrack et al., 1998; Psaty and Prentice, 2010; Faurholt-Jepsen et al., 2016). This motivates us to explore more objective methods with real-time data for assessing affective symptoms.
2.1. Alphanumeric Characters
Due to privacy reasons, we only collected metadata for keypresses on alphanumeric characters, including duration of a keypress, time since last keypress, and distance from last key along two axes. First, we aim to assess the correlation between duration of a keypress and mood states. The complementary cumulative distribution functions (CCDFs) of duration of a keypress are displayed in Figure 3. Data points with different scores are colored differently, and the range of mood scores corresponds to the colorbar. In general, the higher the score, the darker the color and the more severe the depressive or manic symptoms. According to the two-sample Kolmogorov-Smirnov test, for all the pairs of distributions, we can reject the null hypothesis that two samples are drawn from the same distribution with significance level . As expected, we are dealing with a heavy-tailed distribution: (1) most keypresses are very fast, with a median of 85ms, but (2) a non-negligible number have a longer duration, with 5% lasting more than 155ms. Interestingly, samples with mild depression tend to have shorter durations than normal ones, while those with severe depression fall in the middle. Samples with manic symptoms tend to hold a key longer than normal ones.
Next we ask how the time since last keypress correlates with mood states. We show the CCDFs of time since last keypress in Figure 4. Based on the Kolmogorov-Smirnov test, for 98.06% in HDRS and 99.52% in YMRS of the distribution pairs, we can reject the null hypothesis that two samples are drawn from the same distribution with significance level . Not surprisingly, this distribution is heavily skewed, with most time intervals being very short (median 380ms). However, there is a significant fraction of keypresses with much longer intervals: 5% are longer than 1.422s. We can observe that the values of time since last keypress from the normal group (light blue/red) approximate a uniform distribution on the log scale in the range from 0.1s to 2.0s. In contrast, this metric from samples with mood disturbance (dark blue/red) shows a more skewed distribution, with a few values on the two tails and the majority centered between 0.4s and 0.8s. In other words, healthy people show a good range of reactivity that is lost in mood disturbance, where the range is more restricted.
Figure 5 shows the CCDFs of distance from last key along two axes, which can be considered a very rough proxy for the semantic content of people’s typing. No distinction can be observed across different mood states, because there are no dramatic differences in the manner in which depressive or manic people type compared to controls.
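To illustrate the two quantities used in this analysis, the sketch below computes an empirical CCDF and the two-sample Kolmogorov-Smirnov statistic with the standard library only. The sample values are invented, and the sketch stops at the test statistic; turning it into a p-value is left to a statistics package such as SciPy (`scipy.stats.ks_2samp`).

```python
from bisect import bisect_right
from typing import Sequence


def ccdf(sample: Sequence[float], x: float) -> float:
    """Empirical complementary CDF: fraction of the sample strictly above x."""
    ordered = sorted(sample)
    return 1.0 - bisect_right(ordered, x) / len(ordered)


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    gap = 0.0
    for x in sa + sb:  # the maximum is attained at an observed value
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        gap = max(gap, abs(fa - fb))
    return gap


# Invented keypress durations (ms) for two hypothetical mood-score groups.
group_a = [70, 80, 85, 90, 160]
group_b = [60, 65, 75, 80, 100]
print(ccdf(group_a, 85), ks_statistic(group_a, group_b))
```

A larger statistic means the two empirical distributions differ more; identical samples give a statistic of zero.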
2.2. Special Characters
In this view, we use one-hot encoding for typing behaviors other than alphanumeric characters, including autocorrect, backspace, space, suggestion, switching-keyboard and other. They are usually sparser than alphanumeric characters. Figure 6 shows the scatter plot between rates of these special characters as well as alphanumeric ones in a session, where the color of a dot/line corresponds to the HDRS score. Although no obvious distinction can be found between mood states, we can observe some interesting patterns: the rate of alphanumeric keys is negatively correlated with the rate of backspace (from the subfigure at the 2nd row, 7th column), while the rate of switching-keyboard is positively correlated with the rate of other keys (from the subfigure at the 5th row, 6th column). On the diagonal there are kernel density estimations. They show that the rate of alphanumeric characters is generally high in a session, followed by autocorrect, space, backspace, etc. Similar patterns can be found in the plot for YMRS, which is omitted here.
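A minimal sketch of the one-hot encoding for this view follows. The six category names come from the text, but their order, the function name `one_hot`, and the sample events are assumptions made for illustration.

```python
from typing import List

# Category order is an assumption made for this sketch.
SPECIAL_KEYS = ["autocorrect", "backspace", "space", "suggestion", "switching-keyboard", "other"]


def one_hot(key: str) -> List[int]:
    """Return a 6-dimensional indicator vector for one special-key event."""
    vec = [0] * len(SPECIAL_KEYS)
    vec[SPECIAL_KEYS.index(key)] = 1  # raises ValueError for unknown labels
    return vec


# A hypothetical session's special-key events, in typing order.
events = ["space", "backspace", "backspace", "autocorrect"]
sequence = [one_hot(e) for e in events]
print(sequence[1])
```

Each session in this view then becomes a sequence of such vectors, one per special-key event.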
2.3. Accelerometer Values
Accelerometer values are recorded every 60ms in the background during an active session regardless of a person’s typing speed, thereby making them much denser than alphanumeric characters. The CCDFs of absolute accelerometer values along three axes are displayed in Figure 7. Data points with different mood scores are colored differently, and the higher the score, the more severe the depressive or manic symptoms. According to the two-sample Kolmogorov-Smirnov test, for all the pairs of distributions, we can reject the null hypothesis that two samples are drawn from the same distribution with significance level . Note that the vertical axis of the non-zoomed plots is on a log scale. We observe a heavy-tailed distribution for all three axes and for both HDRS and YMRS, with more than 99% of data points being less than 7.45, 9.97 and 10.56 along the X, Y and Z axes, respectively. By zooming into data points at the “head” of the distribution on a regular scale, we can see different patterns in the absolute acceleration along different axes. There is a nearly uniform distribution of absolute acceleration along the Y axis in the range from 0 to 10, while the majority along the X axis lie between 0 and 2, and the majority along the Z axis lie between 6 and 10. An interesting observation is that, compared with normal ones, samples with mood disturbance tend to have larger accelerations along the Z axis and smaller accelerations along the Y axis. Hence, we suspect that people in a normal mood state prefer to hold their phone toward themselves, while people with depressive or manic symptoms are more likely to lay their phone at an angle toward the horizon, given that data were collected only when the phone was in a portrait position.
See Table 1 for more information about the statistics of the dataset. Note that the length of a sequence is measured in terms of the number of data points in a sample rather than the duration in time.
| Statistics | Alph. | Spec. | Accel. |
|---|---|---|---|
| # data points | 836,027 | 538,520 | 14,237,503 |
| # sessions | 34,993 | 33,385 | 37,647 |
| mean length | 24 | 16 | 378 |
| median length | 14 | 9 | 259 |
| maximum length | 538 | 437 | 90,193 |
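As a quick consistency check on these statistics, the mean sequence length of each view should equal its number of data points divided by its number of sessions. The short calculation below uses only the figures from the table; the variable names are illustrative.

```python
# (number of data points, number of sessions) per view, copied from Table 1.
table1 = {
    "Alph.": (836_027, 34_993),
    "Spec.": (538_520, 33_385),
    "Accel.": (14_237_503, 37_647),
}

# Mean sequence length = data points per session, rounded to the nearest integer.
mean_length = {view: round(points / sessions) for view, (points, sessions) in table1.items()}

# How many accelerometer readings there are per alphanumeric keypress.
accel_to_alph = table1["Accel."][0] / table1["Alph."][0]

print(mean_length)
print(round(accel_to_alph, 1))
```

The rounded quotients reproduce the reported mean lengths, and the accelerometer view holds roughly 17 times as many data points as the alphanumeric view, i.e. about 16 times more.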
3. DeepMood Architecture based on Late Fusion
In this paper, we propose an end-to-end deep architecture, named DeepMood, to model mobile phone typing dynamics. Specifically, DeepMood provides a late fusion framework. It first models each view of the time series data separately using a Gated Recurrent Unit (GRU) (Cho et al., 2014), a simplified version of Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997). It then fuses the output of the GRU from each view. As the GRU extracts a latent feature representation out of each time series, where the notions of sequence length and sampling time points are removed from the latent space, this avoids the problem of dealing directly with the heterogeneity of the time series from each view. Following the idea of Multi-view Machines (Cao et al., 2016), Factorization Machines (Rendle, 2012), or in a conventional fully connected fashion, three alternative fusion layers are designed to integrate the complementary information in the multi-view time series to produce a prediction on the mood score. The architecture is illustrated in Figure 2.
3.1. Modeling One View
Each view in the metadata is essentially a time series whose length can vary widely across sessions, depending largely on the duration of the session. In order to model the dynamic sequential correlations in each time series, we adopt the RNN architecture (Mikolov et al., 2010; Sutskever et al., 2011), which keeps hidden states over a sequence of elements and updates the hidden state $h_t$ from the current input $x_t$ and the previous hidden state $h_{t-1}$, where $t > 1$, with a recurrent function $f$:
$$h_t = f(x_t, h_{t-1}) \tag{1}$$
The simplest form of an RNN is as follows:
$$h_t = \phi(W x_t + U h_{t-1}) \tag{2}$$
where $W \in \mathbb{R}^{d_h \times d_x}$ and $U \in \mathbb{R}^{d_h \times d_h}$ are model parameters that need to be learned, and $d_x$ and $d_h$ are the input dimension and the number of recurrent units, respectively. $\phi(\cdot)$ is a nonlinear transformation function such as tanh, sigmoid, or the rectified linear unit (ReLU). Since RNNs in such a form fail to learn long-term dependencies due to the exploding and vanishing gradient problems (Bengio et al., 1994; Hochreiter, 1998), they are not suitable for learning dependencies from a long input sequence in practice.
To make the learning procedure more effective over long sequences, the GRU (Cho et al., 2014) was proposed as a variation of the LSTM unit (Hochreiter and Schmidhuber, 1997). The GRU has attracted considerable attention because it alleviates the vanishing gradient problem in traditional RNNs and is more efficient than the LSTM in some tasks (Chung et al., 2014). The GRU is designed to learn from previous timestamps with long time lags of unknown size between important timestamps via memory units that enable the network to learn to both update and forget hidden states based on new inputs.
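A typical GRU is formulated as:
$$\begin{aligned} r_t &= \mathrm{sigmoid}(W_r x_t + U_r h_{t-1}) \\ z_t &= \mathrm{sigmoid}(W_z x_t + U_z h_{t-1}) \\ \tilde{h}_t &= \tanh(W x_t + U (r_t \circ h_{t-1})) \\ h_t &= z_t \circ \tilde{h}_t + (1 - z_t) \circ h_{t-1} \end{aligned} \tag{3}$$
where $\circ$ is the element-wise multiplication operator, a reset gate $r_t$ allows the GRU to forget the previously computed state $h_{t-1}$, and an update gate $z_t$ balances between the previous state $h_{t-1}$ and the candidate state $\tilde{h}_t$. The hidden state $h_t$ can be considered as a compact representation of the input sequence from $x_1$ to $x_t$.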
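The following dependency-free sketch shows one way a GRU can turn a variable-length session into a fixed-size vector. It assumes the common formulation in which the update gate weights the candidate state and its complement weights the previous state (some libraries swap these roles); the helper names, the random initialisation, and the toy inputs are all illustrative and are not the authors' Keras implementation.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x: Vector, h_prev: Vector, p: Dict[str, Matrix]) -> Vector:
    """One GRU update; `p` holds the matrices W_r, U_r, W_z, U_z, W and U."""
    r = [sigmoid(a + b) for a, b in zip(matvec(p["W_r"], x), matvec(p["U_r"], h_prev))]
    z = [sigmoid(a + b) for a, b in zip(matvec(p["W_z"], x), matvec(p["U_z"], h_prev))]
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]  # reset gate applied to old state
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["W"], x), matvec(p["U"], reset_h))]
    # Update gate mixes the candidate state with the previous state.
    return [zi * ci + (1.0 - zi) * hi for zi, ci, hi in zip(z, cand, h_prev)]


def encode(sequence: List[Vector], p: Dict[str, Matrix], d_h: int) -> Vector:
    """Run the GRU over a whole session and return the final hidden state."""
    h = [0.0] * d_h
    for x in sequence:
        h = gru_step(x, h, p)
    return h


def random_params(d_x: int, d_h: int, seed: int = 0) -> Dict[str, Matrix]:
    """Small random weights, for demonstration only (no training happens here)."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"W_r": mat(d_h, d_x), "U_r": mat(d_h, d_h),
            "W_z": mat(d_h, d_x), "U_z": mat(d_h, d_h),
            "W": mat(d_h, d_x), "U": mat(d_h, d_h)}


params = random_params(d_x=3, d_h=4)
short_session = [[0.1, 0.2, 0.3]] * 5
long_session = [[0.1, 0.2, 0.3]] * 50
print(len(encode(short_session, params, 4)), len(encode(long_session, params, 4)))
```

Both sessions map to a vector of the same size regardless of their length, which is the property the late fusion stage relies on.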
3.2. Late Fusion on Multiple Views
Here we pursue a late fusion strategy to integrate the output vectors of the GRU units on these time series data from different views. This avoids the issues of alignment and diverse frequencies among the time series under different views when performing early fusion directly on the input data.
In the following we study alternative methods for performing late fusion. These include not only the straightforward approach based on adding a fully connected layer to concatenate the features from different views, but also novel approaches to capture interactions among the features across multiple views by exploring the concept of Factorization Machines (Rendle, 2012) to capture the second-order interactions as well as the concept of Multi-view Machines (Cao et al., 2016) to capture higher-order interactions, as shown in Figure 8.
We denote the output vector at the end of a sequence from the $p$-th view as $h^{(p)}$. We can consider $\{h^{(p)}\}_{p=1}^{m}$ as multi-view data, where $m$ is the number of views.
Fully connected layer. In order to generate a prediction on the mood score, a straightforward idea is to first concatenate features from multiple views together, i.e., $h = [h^{(1)}; h^{(2)}; \cdots; h^{(m)}] \in \mathbb{R}^{d}$, where $d$ is the total number of multi-view features, and typically $d = m d_h$ for one-directional RNNs and $d = 2 m d_h$ for bidirectional RNNs. We then feed $h$ forward into one or several fully connected neural network layers with a nonlinear function $\sigma(\cdot)$ in between.
$$q = \sigma(W^{(1)} [h; 1]), \qquad \hat{y} = W^{(2)} q \tag{4}$$
where $W^{(1)} \in \mathbb{R}^{k \times (d+1)}$, $W^{(2)} \in \mathbb{R}^{c \times k}$, $k$ is the number of hidden units, $c$ is the number of classes, and the constant signal “1” models the global bias. Note that here we consider only one hidden layer between the input layer and the final output layer, as shown in Figure 8(a).
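A minimal sketch of this fusion step is given below, assuming one hidden layer, a ReLU nonlinearity, and a constant 1 appended to the concatenated vector for the bias. The function name `fc_fusion` and the tiny hand-picked weights are illustrative only.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def fc_fusion(views: List[Vector], w1: Matrix, w2: Matrix) -> Vector:
    """Concatenate per-view vectors, apply one hidden layer, return class scores."""
    h = [x for view in views for x in view] + [1.0]  # concatenation plus bias signal
    q = [max(0.0, a) for a in matvec(w1, h)]         # ReLU is an assumed nonlinearity
    return matvec(w2, q)


# Two toy views with 2 and 1 features, 2 hidden units, 2 output scores.
views = [[1.0, -2.0], [0.5]]
w1 = [[1.0, 0.0, 0.0, 0.0],
      [0.0, 1.0, 0.0, 1.0]]
w2 = [[2.0, 3.0],
      [-1.0, 4.0]]
scores = fc_fusion(views, w1, w2)
print(scores)
```

Here the second hidden unit is switched off by the ReLU, so only the first hidden unit contributes to the two output scores.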
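Factorization Machine layer. Rather than capturing nonlinearity through the transformation function, we consider explicitly modeling feature interactions between input units, as shown in Figure 8(b).
$$q_a = U_a h, \qquad b_a = w_a^{\top} [h; 1], \qquad \hat{y}_a = \mathrm{sum}([q_a \circ q_a;\, b_a]) \tag{5}$$
where $U_a \in \mathbb{R}^{k \times d}$, $w_a \in \mathbb{R}^{d+1}$, $k$ is the number of factor units, and $a$ denotes the $a$-th class. Writing $U_a(f, i)$ for the $(f, i)$-th entry of $U_a$ and $h(i)$ for the $i$-th entry of $h$, we can rewrite the decision function of $\hat{y}_a$ in Eq. (5) as follows:
$$\hat{y}_a = \sum_{f=1}^{k} \Big( \sum_{i=1}^{d} U_a(f, i)\, h(i) \Big)^{2} + \sum_{i=1}^{d} w_a(i)\, h(i) + w_a(d+1) = \sum_{i=1}^{d} \sum_{j=1}^{d} \langle U_a(:, i), U_a(:, j) \rangle\, h(i)\, h(j) + \sum_{i=1}^{d} w_a(i)\, h(i) + w_a(d+1) \tag{6}$$
One can easily see that this is similar to the two-way Factorization Machines (Rendle, 2012), except that the subscript $j$ ranges from $i+1$ to $d$ in the original form.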
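The sketch below illustrates a Factorization Machine style fusion score for a single class, under the assumption that the layer sums the squared factor projections of the concatenated vector and adds a linear term with a bias. It also checks numerically that this equals the explicit double sum over pairwise interactions. All names and numbers are illustrative.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def fm_score(h: Vector, u: Matrix, w: Vector) -> float:
    """Score for one class: sum of squared factor projections plus a linear term."""
    q = matvec(u, h)                                        # k factor units
    linear = sum(wi * hi for wi, hi in zip(w, h + [1.0]))   # last weight acts as bias
    return sum(x * x for x in q) + linear


def fm_score_pairwise(h: Vector, u: Matrix, w: Vector) -> float:
    """Same score written as an explicit sum over all feature pairs (i, j)."""
    d = len(h)
    pairwise = 0.0
    for i in range(d):
        for j in range(d):
            dot = sum(row[i] * row[j] for row in u)         # inner product of columns i, j
            pairwise += dot * h[i] * h[j]
    linear = sum(wi * hi for wi, hi in zip(w, h + [1.0]))
    return pairwise + linear


h = [1.0, 2.0]                      # toy concatenated feature vector (d = 2)
u = [[1.0, 0.5], [0.0, -1.0]]       # k = 2 factor units
w = [0.1, 0.2, 0.3]                 # d linear weights plus one bias weight
print(fm_score(h, u, w), fm_score_pairwise(h, u, w))
```

The factored form costs on the order of k times d operations, whereas the explicit pairwise form costs on the order of k times d squared, which is why the factored form is the one worth implementing.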
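Multi-view Machine layer. In contrast to modeling up to the second-order feature interactions between all input units as in the Factorization Machine layer, we could further explore all feature interactions up to the $m$th order between inputs from $m$ views, as shown in Figure 8(c).
$$q_a^{(p)} = U_a^{(p)} [h^{(p)}; 1], \qquad \hat{y}_a = \mathrm{sum}([q_a^{(1)} \circ q_a^{(2)} \circ \cdots \circ q_a^{(m)}]) \tag{7}$$
where $h^{(p)} \in \mathbb{R}^{d_h}$ is the GRU output of the $p$-th view and $U_a^{(p)} \in \mathbb{R}^{k \times (d_h+1)}$ is the factor matrix of the $p$-th view for the $a$-th class. By denoting $\tilde{h}^{(p)} = [h^{(p)}; 1]$, we can verify that Eq. (7) is equivalent to Multi-view Machines (Cao et al., 2016).
$$\hat{y}_a = \sum_{f=1}^{k} \prod_{p=1}^{m} \Big( \sum_{i_p=1}^{d_h+1} U_a^{(p)}(f, i_p)\, \tilde{h}^{(p)}(i_p) \Big) = \sum_{i_1=1}^{d_h+1} \cdots \sum_{i_m=1}^{d_h+1} \Big( \prod_{p=1}^{m} \tilde{h}^{(p)}(i_p) \Big) \Big( \sum_{f=1}^{k} \prod_{p=1}^{m} U_a^{(p)}(f, i_p) \Big) \tag{8}$$
As shown in Figure 2, the full-order feature interactions across multiple views are modeled in a tensor, and they are factorized in a collective manner.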
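To make the factorized full-order interaction concrete, the sketch below computes a Multi-view Machine style score for one class in two ways: through per-view factor projections multiplied across views, and through an explicit sum over every combination of one (bias-augmented) feature per view. It assumes each view is augmented with a constant 1; names and numbers are illustrative.

```python
from itertools import product
from math import prod
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mvm_score(views: List[Vector], factors: List[Matrix]) -> float:
    """Factorized form: for each factor unit, multiply the per-view projections."""
    k = len(factors[0])
    total = 0.0
    for f in range(k):
        term = 1.0
        for h, u in zip(views, factors):
            term *= sum(w * x for w, x in zip(u[f], h + [1.0]))  # view augmented with 1
        total += term
    return total


def mvm_score_expanded(views: List[Vector], factors: List[Matrix]) -> float:
    """Expanded form: sum over every combination of one feature index per view."""
    aug = [h + [1.0] for h in views]
    k = len(factors[0])
    total = 0.0
    for idx in product(*(range(len(a)) for a in aug)):
        feature = prod(a[i] for a, i in zip(aug, idx))
        weight = sum(prod(u[f][i] for u, i in zip(factors, idx)) for f in range(k))
        total += feature * weight
    return total


views = [[1.0], [2.0]]                      # two toy views with one feature each
factors = [[[1.0, 0.5], [2.0, -1.0]],       # factor matrix of view 1 (k = 2)
           [[0.5, 1.0], [1.0, 0.0]]]        # factor matrix of view 2 (k = 2)
print(mvm_score(views, factors), mvm_score_expanded(views, factors))
```

The expanded form grows exponentially with the number of views, while the factorized form stays linear in the total number of features, so the expansion is useful only as a correctness check.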
Note that a dropout layer (Hinton et al., 2012) is applied to the output of the GRU before it is fed to the fusion layer. Dropout is a regularization method designed to prevent co-adaptation of feature detectors in deep neural networks. It randomly sets each unit to zero with a certain probability. The dropped units contribute to neither the feedforward process nor the backpropagation process.
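A minimal sketch of dropout on a vector of GRU outputs follows. It uses the "inverted" variant that rescales surviving units by 1/(1 - rate) during training, which is how the Keras `Dropout` layer behaves; the text does not say whether rescaling is applied, so treat that detail as an assumption. The function name and inputs are illustrative.

```python
import random
from typing import List


def dropout(units: List[float], rate: float, rng: random.Random,
            training: bool = True) -> List[float]:
    """Zero each unit with probability `rate`; rescale survivors by 1 / (1 - rate)."""
    if not training or rate <= 0.0:
        return list(units)  # dropout is a no-op at inference time
    keep = 1.0 - rate
    return [u / keep if rng.random() < keep else 0.0 for u in units]


rng = random.Random(0)
h = [0.2, -0.4, 0.6, 0.8]
print(dropout(h, rate=0.1, rng=rng))
print(dropout(h, rate=0.1, rng=rng, training=False))
```

With the rescaling, the expected value of each unit during training matches its value at inference time, so no further adjustment is needed when the model is deployed.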
Following the computational graph, it is straightforward to compute gradients for model parameters in both the Factorization Machine layer and the Multi-view Machine layer, as we do for the conventional fully connected layer. Therefore, the error messages generated from the loss function on the final mood score can be backpropagated through these fusion layers all the way to the very beginning, i.e., $W_r$, $U_r$, $W_z$, $U_z$, $W$, $U$ in the GRU for each input view. In this manner, we can say that DeepMood is an end-to-end learning framework for mood detection.
4. Experiments
We investigate a session-level prediction problem. That is to say, we use features of alphanumeric characters, special characters and accelerometer values in a session to predict the mood score of the associated participant.
4.1. Experimental Setup
The implementation is completed using Keras (Chollet, 2015) with TensorFlow (Abadi et al., 2015) as the backend. The code has been made available at the author’s homepage.
Experiments on the depression score HDRS are conducted as a binary classification task where $c = 2$. We consider sessions with an HDRS score between 0 and 7 (inclusive) as negative samples (normal) and those with HDRS greater than or equal to 8 as positive samples (from mild depression to severe depression). On the other hand, the mania score YMRS is more complicated, without a widely adopted threshold. Therefore, YMRS is directly used as the label for a regression task where $c = 1$. Accuracy and F-score are used to evaluate the classification task, and root-mean-square error (RMSE) is used for the regression task.
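The labelling rule and the three metrics can be sketched in a few lines. The HDRS cutoff follows the text; interpreting "F-score" as the F1 score of the positive class is an assumption, and the function names and toy predictions are illustrative.

```python
import math
from typing import List

HDRS_CUTOFF = 8  # scores 0-7 are negative (normal), 8 and above are positive


def hdrs_label(score: float) -> int:
    return int(score >= HDRS_CUTOFF)


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1 of the positive class (assumed meaning of 'F-score')."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy example: four sessions with invented HDRS scores and predictions.
y_true = [hdrs_label(s) for s in [0, 7, 8, 20]]
y_pred = [0, 1, 1, 1]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))
print(rmse([1.0, 7.0], [4.0, 3.0]))
```

Note that every session inherits the weekly score of its participant, so the label is shared by all sessions recorded around the same assessment.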
| Parameter | Value |
|---|---|
| # recurrent units ($d_h$) | 4, 8, 16 |
| # factor units ($k$) | 4, 8, 16 |
| # epochs | 500 |
| batch size | 256 |
| learning rate | 0.001 |
| dropout fraction | 0.1 |
| maximum sequence length | 100 |
| minimum sequence length | 10 |
4.2. Compared Methods
The compared methods are summarized as follows:

DMVM: The proposed DeepMood architecture with a Multi-view Machine layer for data fusion.

DFM: The proposed DeepMood architecture with a Factorization Machine layer for data fusion.

DNN: The proposed DeepMood architecture with a conventional fully connected layer for data fusion.

XGB: The implementation of a tree boosting system from XGBoost (Chen and Guestrin, 2016) is used. We concatenate the sequence data of each feature, with a maximum length of 100 (shorter sequences are padded with 0), as the input.

SVM and LR: These are two linear models. With the same input setting as XGB, the implementations of Linear Support Vector Classification/Regression and Logistic/Ridge Regression from scikit-learn are used for the classification/regression tasks.
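The early-fusion input used by these baselines can be sketched as follows: each feature sequence is brought to length 100 and the results are concatenated into one flat row. Keeping the first 100 points of a longer sequence, the fixed alphabetical feature order, and the feature names are assumptions made for this illustration.

```python
from typing import Dict, List

MAX_LEN = 100  # maximum sequence length used for the baselines in the text


def pad_or_truncate(seq: List[float], max_len: int = MAX_LEN, pad: float = 0.0) -> List[float]:
    """Keep the first `max_len` points (an assumption) and right-pad with zeros."""
    clipped = list(seq[:max_len])
    return clipped + [pad] * (max_len - len(clipped))


def flatten_session(features: Dict[str, List[float]]) -> List[float]:
    """Concatenate all padded feature sequences into one fixed-length row."""
    row: List[float] = []
    for name in sorted(features):  # fixed feature order so rows are comparable
        row.extend(pad_or_truncate(features[name]))
    return row


# Hypothetical session with two sparse features and one dense feature.
session = {
    "duration": [0.08, 0.09, 0.11],
    "time_since_last": [0.0, 0.38, 0.41],
    "accel_x": [0.1] * 250,
}
row = flatten_session(session)
print(len(row))
```

Flattening in this way discards the relative timing between views, which is one reason such inputs fall under early fusion in the comparison.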
In general, DMVM, DFM and DNN can be categorized as late fusion approaches, while XGB, SVM and LR are early fusion strategies for the sequence prediction problem on multiview time series. Note that the number of model parameters for fusing multiview data in DMVM and DFM is and , respectively, thereby leading to approximately the same model complexity due to . For DNN, the number of model parameters for fusion is . For a fair comparison, we need to control the model complexity of the compared methods at the same level. Therefore, in all experiments, we always set .
4.3. Prediction Performance
Experimental results are shown in Table 3. We can see that the late fusion based DeepMood methods perform best on the prediction of the dichotomized HDRS scores, especially DMVM and DFM with 90.31% and 90.21% accuracy, respectively. This demonstrates the feasibility of using passive typing dynamics from mobile phone metadata to predict the disturbance and severity of mood states. In addition, we find that SVM and LR are not a good fit for this task, or for sequence prediction in general. XGB performs reasonably well as an ensemble method, but DMVM still outperforms it by a significant margin of 5.56%, 5.93% and 10.02% in terms of accuracy, F-score and RMSE, respectively. Among the DeepMood variations, the improvement of DMVM and DFM over DNN reveals the potential of replacing a conventional fully connected layer with a Multi-view Machine layer or Factorization Machine layer for data fusion in a deep framework. This is because DMVM and DFM explicitly capture higher-order interactions among features, while DNN does not model feature interactions explicitly.
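| Method | Accuracy (classification) | F-score (classification) | RMSE (regression) |
|---|---|---|---|
| DMVM | 0.9031 | 0.9070 | 3.5664 |
| DFM | 0.9021 | 0.9029 | 3.6767 |
| DNN | 0.8868 | 0.8929 | 3.7874 |
| XGB | 0.8555 | 0.8562 | 3.9634 |
| SVM | 0.7323 | 0.7237 | 4.1257 |
| LR | 0.7293 | 0.7172 | 4.1822 |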
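The reported margins of DMVM over XGB are relative improvements, which can be reproduced from the table with a short calculation. The helper name `relative_gain` is illustrative; for RMSE, lower is better, so the reduction is measured relative to the XGB value.

```python
# Values copied from Table 3.
dmvm = {"accuracy": 0.9031, "f_score": 0.9070, "rmse": 3.5664}
xgb = {"accuracy": 0.8555, "f_score": 0.8562, "rmse": 3.9634}


def relative_gain(ours: float, baseline: float, higher_is_better: bool) -> float:
    """Improvement over the baseline, as a percentage of the baseline value."""
    diff = ours - baseline if higher_is_better else baseline - ours
    return 100.0 * diff / baseline


gains = {
    "accuracy": relative_gain(dmvm["accuracy"], xgb["accuracy"], True),
    "f_score": relative_gain(dmvm["f_score"], xgb["f_score"], True),
    "rmse": relative_gain(dmvm["rmse"], xgb["rmse"], False),
}
print({name: round(value, 2) for name, value in gains.items()})
```

The rounded results match the 5.56%, 5.93% and 10.02% margins quoted in the text.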
In practice, it is important to understand how the model works for each individual when monitoring her mood states. Therefore, we investigate the prediction performance of DMVM on each of the 20 participants in our dataset. Results are shown in Figure 9 where each dot represents a participant with the number of her contributed sessions in the training set and the corresponding prediction accuracy. We can see that the proposed model can steadily produce accurate predictions (87%) of a participant’s mood states when she provides more than 400 valid typing sessions in the training phase. Note that the prediction we make in this work is per session which is typically less than one minute. We can expect more accurate results on the daily level by ensembling sessions occurring during a day.
4.4. Convergence Efficiency
In this section, we show more details about the learning procedure of the proposed DeepMood architecture with different fusion layers and that of XGB. Figure 10 illustrates how the accuracy on the validation set changes over epochs. We observe that different fusion layers have different convergence performance in the first 300 epochs, and afterwards they steadily outperform XGB. Among the DeepMood methods, DMVM and DFM converge more efficiently than DNN in the first 300 epochs, and they reach a better local minimum of the loss function at the end. This again shows the importance of the fusion layer in a deep framework. It is also interesting to see the convergence process of XGB, considering its popularity and success on many tasks in practice. We found that the generalizability of XGB on the sequence prediction task is limited, although its training error could converge to 0 at an early stage.
4.5. Importance of Different Views
To better understand the role that different views play in mood detection by DeepMood, we examine separate models trained with or without each view. Since DMVM is designed for heterogeneous data fusion, i.e., data with at least two views, we train DMVM on every pair of views. Moreover, we train DFM on every single view. Experimental results are shown in Table 4. First, we observe that special characters (Spec.) are poor predictors of mood states. Alphanumeric characters (Alph.) and accelerometer values (Accel.) have significantly better predictive performance, and Alph. is the best individual predictor of mood states. This supports a high correlation between mood disturbance and typing patterns, including duration of a keypress, time interval since the last keypress, and accelerometer values.
| Method | Accuracy (classification) | F-score (classification) | RMSE (regression) |
|---|---|---|---|
| DMVM w/o Alph. | 0.8125 | 0.8164 | 3.9833 |
| DMVM w/o Spec. | 0.9008 | 0.9034 | 3.8166 |
| DMVM w/o Accel. | 0.8318 | 0.8253 | 3.9499 |
| DMVM w/ all | 0.9031 | 0.9070 | 3.5664 |
| DFM w/ Alph. | 0.8322 | 0.8224 | 3.9515 |
| DFM w/ Spec. | 0.6260 | 0.5676 | 4.1040 |
| DFM w/ Accel. | 0.8015 | 0.8089 | 3.9722 |
| DFM w/ all | 0.9021 | 0.9011 | 3.6767 |
5. Related Work
This work is studied in the context of supervised sequence prediction. Xing et al. provide a brief survey on the sequence prediction problem where sequence data are categorized into five subtypes: simple symbolic sequences, complex symbolic sequences, simple time series, multivariate time series, and complex event sequences (Xing et al., 2010). Sequence classification methods are grouped into three subtypes: feature based methods, sequence distance based methods, and model based methods. Feature based methods first transform a sequence into a feature vector and then apply conventional classification models (Lesh et al., 1999; Aggarwal, 2002; Leslie and Kuang, 2004; Ji et al., 2007; Ye and Keogh, 2009). Distance based methods include K nearest neighbor classifier (Keogh and Pazzani, 2000; Keogh and Kasetty, 2003; Ratanamahatana and Keogh, 2004; Wei and Keogh, 2006; Xi et al., 2006; Ding et al., 2008) and SVM with local alignment kernel (Lodhi et al., 2002; She et al., 2003; Sonnenburg et al., 2005) by measuring the similarity between a pair of sequences. Model based methods assume that sequences in a class are generated by an underlying probability distribution, including Naive Bayes (Cheng et al., 2005), Markov Model (Yakhnenko et al., 2005) and Hidden Markov Model (Srivastava et al., 2007).
However, most of the works focus on simple symbolic sequences and simple time series, with a few on complex symbolic sequences and multivariate time series. The problem of classifying complex event sequence data (a combination of multiple numerical measurements and categorical fields) still needs further investigation which motivates this work. Furthermore, most of the methods are devoted to shallow models with feature engineering. Inspired by the great success of deep RNNs in the applications of other sequence tasks, including speech recognition (Graves et al., 2013) and natural language processing (Mikolov et al., 2010; Bahdanau et al., 2014), in this work, we propose a deep architecture to model complex event sequences of mobile phone typing dynamics.
On multi-view learning, Cao et al. propose to fuse multi-view data through the operation of tensor product and assume that the effects of feature interactions across views have a low rank (Cao et al., 2014, 2016). Lu et al. extend it to multi-task learning (Lu et al., 2017). Zhang et al. use Factorization Machines to initialize the bias terms and embedding vectors for multi-field categorical data at the bottom layer of a deep architecture (Zhang et al., 2016b). There is also some work incorporating multiple views into the process of subgraph mining (Cao et al., 2015) and deep learning (Zhang et al., 2016a) to help identify meaningful patterns from data.
6. Conclusion
It appears that mobile phone metadata could be used to predict the presence of mood disorders. The proposed DeepMood architecture is able to achieve 90.31% prediction accuracy on the depression score; late fusion is more effective than early fusion, and a more sophisticated fusion layer also helps. The ability to passively collect data that can be used to infer the presence and severity of mood disturbances may enable providers to provide interventions to more patients earlier in their mood episodes. Models such as the one presented here may also lead to a deeper understanding of the effects of mood disturbances on the daily activities of people with mood disorders.
7. Acknowledgements
This work is supported in part by NSF through grants IIS-1526499 and CNS-1626432, and by NSFC 61672313.
Footnotes
 journalyear: 2017
 conference: KDD’17, August 13–17, 2017, Halifax, NS, Canada
 doi: 10.1145/3097983.3098086
 isbn: 9781450348874/17/08
 The 5-second threshold is an arbitrary choice that can be changed and tuned easily.
 This is for privacy concerns, because a malicious person may be able to unscramble and recover the texts using such information.
 http://www.biaffect.com
 http://www.moodchallenge.com
 https://www.cs.uic.edu/~bcao1/code/DeepMood.py
 https://github.com/dmlc/xgboost
 http://scikit-learn.org
References
 Charu C Aggarwal. 2002. On effective classification of strings with wavelets. In KDD. ACM, 163–172.
 David Ankers and Steven H Jones. 2009. Objective assessment of circadian activity and sleep patterns in individuals at behavioural risk of hypomania. Journal of clinical psychology 65, 10 (2009), 1071–1086.
 American Psychiatric Association and others. 2013. Diagnostic and statistical manual of mental disorders (DSM-5®). American Psychiatric Pub.
 Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
 Yoshua Bengio, Patrice Simard, and Paolo Frasconi. 1994. Learning longterm dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 5, 2 (1994), 157–166.
 Jedediah M Bopp, David J Miklowitz, Guy M Goodwin, Will Stevens, Jennifer M Rendell, and John R Geddes. 2010. The longitudinal course of bipolar disorder as revealed through weekly text messaging: a feasibility study. Bipolar disorders 12, 3 (2010), 327–334.
 Bokai Cao, Lifang He, Xiangnan Kong, Philip S Yu, Zhifeng Hao, and Ann B Ragin. 2014. Tensorbased Multiview Feature Selection with Applications to Brain Diseases. In ICDM.
 Bokai Cao, Xiangnan Kong, Jingyuan Zhang, Philip S Yu, and Ann B Ragin. 2015. Mining Brain Networks using Multiple Side Views for Neurological Disorder Identification. In ICDM.
 Bokai Cao, Hucheng Zhou, Guoqiang Li, and Philip S Yu. 2016. Multiview Machines. In WSDM. ACM, 427–436.
 Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In KDD. ACM.
 Betty Yee Man Cheng, Jaime G Carbonell, and Judith KleinSeetharaman. 2005. Protein classification based on text document classification techniques. Proteins: Structure, Function, and Bioinformatics 58, 4 (2005), 955–970.
 Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoderdecoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014).
 François Chollet. 2015. Keras. https://github.com/fchollet/keras. (2015).
 Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014).
 Mark A Demitrack, Doug Faries, John M Herrera, David J DeBrota, and William Z Potter. 1998. The problem of measurement error in multisite clinical trials. Psychopharmacology bulletin 34, 1 (1998), 19.
 Hui Ding, Goce Trajcevski, Peter Scheuermann, Xiaoyue Wang, and Eamonn Keogh. 2008. Querying and mining of time series data: experimental comparison of representations and distance measures. VLDB 1, 2 (2008), 1542–1552.
 Martín Abadi et al. 2015. TensorFlow: LargeScale Machine Learning on Heterogeneous Systems. (2015). http://tensorflow.org/ Software available from tensorflow.org.
 Maria FaurholtJepsen, Maj Vinberg, Mads Frost, Sune Debel, Ellen Margrethe Christensen, Jakob E Bardram, and Lars Vedel Kessing. 2016. Behavioral activities collected through smartphones and the association with illness activity in bipolar disorder. International journal of methods in psychiatric research 25, 4 (2016), 309–323.
 Mads Frost, Afsaneh Doryab, Maria FaurholtJepsen, Lars Vedel Kessing, and Jakob E Bardram. 2013. Supporting disease insight through data analysis: refinements of the monarca selfassessment system. In UBICOMP. ACM, 133–142.
 Alex Graves, Abdelrahman Mohamed, and Geoffrey Hinton. 2013. Speech recognition with deep recurrent neural networks. In ICASSP. IEEE, 6645–6649.
 Agnes Gruenerbl, Venet Osmani, Gernot Bahle, Jose C Carrasco, Stefan Oehler, Oscar Mayora, Christian Haring, and Paul Lukowicz. 2014. Using smart phone mobility traces for the diagnosis of depressive and manic episodes in bipolar patients. In AH. ACM, 38.
 Geoffrey E Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R Salakhutdinov. 2012. Improving neural networks by preventing coadaptation of feature detectors. arXiv preprint arXiv:1207.0580 (2012).
 Sepp Hochreiter. 1998. The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and KnowledgeBased Systems 6, 02 (1998), 107–116.
 Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long shortterm memory. Neural computation 9, 8 (1997), 1735–1780.
 Xiaonan Ji, James Bailey, and Guozhu Dong. 2007. Mining minimal distinguishing subsequence patterns with gap constraints. Knowledge and Information Systems 11, 3 (2007), 259–286.
 Eamonn Keogh and Shruti Kasetty. 2003. On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Mining and Knowledge Discovery 7, 4 (2003), 349–371.
 Eamonn J Keogh and Michael J Pazzani. 2000. Scaling up dynamic time warping for datamining applications. In KDD. ACM, 285–289.
 Ronald C Kessler, Patricia Berglund, Olga Demler, Robert Jin, Kathleen R Merikangas, and Ellen E Walters. 2005. Lifetime prevalence and ageofonset distributions of DSMIV disorders in the National Comorbidity Survey Replication. Archives of general psychiatry 62, 6 (2005), 593–602.
 Kiran E Laxman, Kate S Lovibond, and Mariam K Hassan. 2008. Impact of bipolar disorder in employed populations. The American journal of managed care 14, 11 (2008), 757–764.
 Neal Lesh, Mohammed J Zaki, and Mitsunori Ogihara. 1999. Mining features for sequence classification. In KDD. ACM, 342–346.
 Christina Leslie and Rui Kuang. 2004. Fast string kernels using inexact matching for protein sequences. Journal of Machine Learning Research 5, Nov (2004), 1435–1455.
 Huma Lodhi, Craig Saunders, John ShaweTaylor, Nello Cristianini, and Chris Watkins. 2002. Text classification using string kernels. Journal of Machine Learning Research 2, Feb (2002), 419–444.
 ChunTa Lu, Lifang He, Weixiang Shao, Bokai Cao, and Philip S Yu. 2017. Multilinear Factorization Machines for MultiTask MultiView Learning. In WSDM.
 Tomas Mikolov, Martin Karafiát, Lukas Burget, Jan Cernockỳ, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Interspeech, Vol. 2. 3.
 Pamela B Peele, Ying Xu, and David J Kupfer. 2003. Insurance expenditures on bipolar disorder: clinical and parity implications. American Journal of Psychiatry 160, 7 (2003), 1286–1290.
 Bruce M Psaty and Ross L Prentice. 2010. Minimizing bias in randomized trials: the importance of blinding. Jama 304, 7 (2010), 793–794.
 Alessandro Puiatti, Steven Mudda, Silvia Giordano, and Oscar Mayora. 2011. Smartphonecentred wearable sensors network for monitoring patients with bipolar disorder. In EMBC. IEEE, 3644–3647.
 Chotirat Ann Ratanamahatana and Eamonn Keogh. 2004. Making Timeseries Classification More Accurate Using Learned Constraints. In SDM. SIAM, 11.
 Steffen Rendle. 2012. Factorization machines with libfm. ACM Transactions on Intelligent Systems and Technology 3, 3 (2012), 57.
 O Schleusing, Ph Renevey, M Bertschi, JM Koller, R Paradiso, and others. 2011. Monitoring physiological and behavioral signals to detect mood changes of bipolar patients. In ISMICT. IEEE, 130–134.
 Rong She, Fei Chen, Ke Wang, Martin Ester, Jennifer L Gardy, and Fiona SL Brinkman. 2003. Frequentsubsequencebased prediction of outer membrane proteins. In KDD. ACM, 436–445.
 Sören Sonnenburg, Gunnar Rätsch, and Bernhard Schölkopf. 2005. Large scale genomic sequence SVM classifiers. In ICML. ACM, 848–855.
 Prashant K Srivastava, Dhwani K Desai, Soumyadeep Nandi, and Andrew M Lynn. 2007. HMMModE–Improved classification using profile hidden Markov models by optimising the discrimination threshold and modifying emission probabilities with negative training sequences. BMC bioinformatics 8, 1 (2007), 1.
 Ilya Sutskever, James Martens, and Geoffrey E Hinton. 2011. Generating text with recurrent neural networks. In ICML. 1017–1024.
 Tijmen Tieleman and Geoffrey Hinton. 2012. Lecture 6.5rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning 4, 2 (2012).
 Gaetano Valenza, Mimma Nardelli, Antonio Lanata, Claudio Gentili, Gilles Bertschy, Rita Paradiso, and Enzo Pasquale Scilingo. 2014. Wearable monitoring for mood recognition in bipolar disorder based on historydependent longterm heart rate variability analysis. IEEE Journal of Biomedical and Health Informatics 18, 5 (2014), 1625–1635.
 Li Wei and Eamonn Keogh. 2006. Semisupervised time series classification. In KDD. ACM, 748–753.
 Janet BW Williams. 1988. A structured interview guide for the Hamilton Depression Rating Scale. Archives of general psychiatry 45, 8 (1988), 742–747.
 Xiaopeng Xi, Eamonn Keogh, Christian Shelton, Li Wei, and Chotirat Ann Ratanamahatana. 2006. Fast time series classification using numerosity reduction. In ICML. ACM, 1033–1040.
 Zhengzheng Xing, Jian Pei, and Eamonn Keogh. 2010. A brief survey on sequence classification. ACM SIGKDD Explorations Newsletter 12, 1 (2010), 40–48.
 Oksana Yakhnenko, Adrian Silvescu, and Vasant Honavar. 2005. Discriminatively trained markov model for sequence classification. In ICDM. IEEE, 8–pp.
 Lexiang Ye and Eamonn Keogh. 2009. Time series shapelets: a new primitive for data mining. In KDD. ACM, 947–956.
 RC Young, JT Biggs, VE Ziegler, and DA Meyer. 1978. A rating scale for mania: reliability, validity and sensitivity. The British Journal of Psychiatry 133, 5 (1978), 429–435.
 Jingyuan Zhang, Bokai Cao, Sihong Xie, ChunTa Lu, Philip S Yu, and Ann B Ragin. 2016a. Identifying Connectivity Patterns for Brain Diseases via Multisideview Guided Deep Architectures. In SDM.
 Weinan Zhang, Tianming Du, and Jun Wang. 2016b. Deep Learning over Multifield Categorical Data. In ECIR. Springer, 45–57.