TASTE: Temporal and Static Tensor Factorization for Phenotyping Electronic Health Records

Taste: Temporal and Static Tensor Factorization for Phenotyping Electronic Health Records

Ardavan Afshar, Ioakeim Perros, Haesun Park
Christopher deFilippi, Xiaowei Yan, Walter Stewart, Joyce Ho, Jimeng Sun
Georgia Institute of Technology, ,Health at Scale INOVA,
Sutter Health, Emory University

Phenotyping electronic health records (EHR) focuses on defining meaningful patient groups (e.g., heart failure group and diabetes group) and identifying the temporal evolution of patients in those groups. Tensor factorization has been an effective tool for phenotyping. Most of the existing works assume either a static patient representation with aggregate data or only model temporal data. However, real EHR data contain both temporal (e.g., longitudinal clinical visits) and static information (e.g., patient demographics), which are difficult to model simultaneously. In this paper, we propose Temporal And Static TEnsor factorization (TASTE) that jointly models both static and temporal information to extract phenotypes. TASTE combines the PARAFAC2 model with non-negative matrix factorization to model a temporal and a static tensor. To fit the proposed model, we transform the original problem into simpler ones which are optimally solved in an alternating fashion. For each of the sub-problems, our proposed mathematical re-formulations lead to efficient sub-problem solvers. Comprehensive experiments on large EHR data from a heart failure (HF) study confirmed that TASTE is up to faster than several baselines and the resulting phenotypes were confirmed to be clinically meaningful by a cardiologist. Using 80 phenotypes extracted by TASTE, a simple logistic regression can achieve the same level of area under the curve (AUC) for HF prediction compared to a deep learning model using recurrent neural networks (RNN) with 345 features.

1 Introduction

Phenotyping is about identifying patient groups sharing similar clinically-meaningful characteristics and is essential for treatment development and management [richesson2016clinical]. However, the complexity and heterogeneity of the underlying patient information render a manual phenotyping impractical for large populations or complex conditions. Unsupervised EHR-based phenotyping based on tensor factorization, e.g.,  [ho2014marble, Perros2018kdd, YQCFP19aaai], provides effective alternatives. However, existing unsupervised phenotyping methods are unable to handle both static and dynamically-evolving information, which is the focus of this work.

Figure 1: TASTE applied on dynamically-evolving structured EHR data and static patient information. Each represents the medical features recorded for different clinical visits for patient k. Matrix includes the static information (e.g., race, gender) of patients. TASTE decomposes into three parts: , , and . Static matrix is decomposed into two parts: and . Note that (personalized phenotype scores) is shared between static and dynamically-evolving features.

Traditional tensor factorization models [carroll1970analysis, hitchcock1927expression, harshman1970foundations] assume the same dimensionality along with each tensor mode. However, in practice one mode such as time can be irregular, e.g., different patients may vary by the number of clinical visits over time. To handle such longitudinal datasets, [Perros2017-dh] and [afshar2018copa] propose algorithms fitting the PARAFAC2 model [Hars1972b] which are faster and more scalable for handling irregular and sparse data. However, these PARAFAC2 approaches only focus on modeling the dynamically-evolving features for every patient (e.g., the structured codes recorded for every visit). Static features (such as race and gender) which do not evolve are completely neglected; yet, they are among important information in phenotyping analyses (e.g., some diseases have the higher prevalence in a certain race).

To address this problem, we propose a scalable method called TASTE which is able to jointly model both temporal and static features by combining the PARAFAC2 model with non-negative matrix factorization as shown in Figure 1. We reformulate the original non-convex problem into simpler sub-problems (i.e., orthogonal Procrustes, least square and non-negativity constrained least square) and solve each of them efficiently by avoiding unnecessary computations (e.g., expensive Khatri-Rao products).

We summarize our contributions below:

  • Temporal and Static Tensor Factorization: We formulate a new technique which jointly models static and dynamic features from EHR data as nonnegative factor matrices.

  • Fast and Accurate Algorithm: Our proposed fitting algorithm is up to faster than the state of the art baseline. At the same time, TASTE preserves model constraints which promote model uniqueness better than baselines while maintaining interpretability.

  • Case Study on Heart Failure Phenotyping: We demonstrate the practical impact of TASTE through a case study on heart failure (HF) phenotyping. We identified clinically-meaningful phenotypes confirmed by a cardiologist. Using phenotypes extracted by TASTE, simple logistic regression can achieve comparable predictive accuracy with deep learning techniques such as RNNs.

2 Background & Related Work

Table 1 summarizes the notations used in this paper.



* Element-wise Multiplication
Khatri Rao Product
matrix, vector
the -th row of
the -th column of
element (i,r) of
Feature matrix of patient
Extract the diagonal of matrix
Vectorizing matrix
Singular value decomposition on
Frobenius Norm
max operator replaces negative values in with 0
All elements in are non-negative
Table 1: Notations

2.1 PARAFAC2 Model

The PARAFAC2 model [kiers1999parafac2], has the following objective function:

subject to

where is the input matrix, factor matrix , diagonal matrix , and factor matrix are output matrices. Factor matrix is an orthogonal matrix, and where . SPARTan [Perros2017-dh] is a method to fit this model for sparse datasets, as well as COPA [afshar2018copa] incorporates different constraints such as temporal smoothness and sparsity to the model factors to produce more meaningful results.

Uniqueness property ensures that a decomposition is pursuing the true latent factors, rather than an arbitrary rotation of them. The unconstrained version of PARAFAC2 in (1) (without constraints = and = ) is not unique. Assume as an invertible matrix and as diagonal matrices. Then, we can transform as:

which is another valid solution achieving the same approximation error [kiers1999parafac2] - this is problematic in terms of the interpretability of the result. To promote uniqueness, Harshman [Hars1972b] introduced the cross-product invariance constraint, which dictates that should be constant . To achieve that, the following constraint is added: where , so that: .

2.2 Unsupervised Computational Phenotyping

A wide range of approaches applies tensor factorization techniques to extract phenotypes. [ho2014limestone, ho2014marble, henderson2017granite, Perros2018kdd, zhao2019detecting, zhao2019using, jiang2019combining] incorporate various constraints (e.g., sparsity, non-negativity, integer) into regular tensor factorization to produce more clinically-meaningful phenotypes. [Perros2017-dh, afshar2018copa] identify phenotypes and their temporal trends by using irregular tensor factorization based on PARAFAC2 [Hars1972b]; yet, those approaches cannot model both dynamic and static features for meaningful phenotype extraction. As part of our experimental evaluation, we demonstrate that naively adjusting such PARAFAC2-based approaches to incorporate static information results in biased and less interpretable phenotypes. The authors of [YQCFP19aaai] proposed a collective non-negative tensor factorization for phenotyping purposes. However, the method is not able to jointly incorporate static information such as demographics with temporal features. Also they do not employ the orthogonality constraint on the temporal dimension, a strategy known to produce non-unique solutions [Hars1972b, kiers1999parafac2].

3 The Taste framework

3.1 Intuition

We will explain the intuition of TASTE in the context of phenotyping application.

Input data include both temporal and static features for all patients:

  • Temporal features (); For patient , we record the medical features for different clinical visits in matrix where is the number of clinical visits and is the total number of medical features. Note that can be different for different patients.

  • Static features (): The static features like gender, race, body mass index (BMI), smoking status 111Although BMI and smoking status can change over time, in our data set these values for each patient are constant over time. are recorded in where is the total number of patients and is the number of static features. In particular, is the static features for patient.

Phenotyping process maps input data into a set of phenotypes, which involves the definition of phenotypes and temporal evolution. Figure 1 illustrates the following model interpretation. First, phenotype definitions are shared by factor matrices and for temporal and static features, respectively. In particular, or are the column of factor matrix or which indicates the participation of temporal or static features in the phenotype. Second, personalized phenotype scores for patient are provided in the diagonal matrix where its diagonal element indicates the overall importance of the phenotype for patient . Finally, temporal phenotype evolution for patient is specified in factor matrix where its column indicates the temporal evolution of phenotype over all clinical visits of patient .

3.2 Objective function and challenges

We introduce the following objective:

subject to

Objective function has three main parts as follows:

  1. [leftmargin=*]

  2. The first part is related to fitting a PARAFAC2 model that factorizes a set of temporal feature matrices into , diagonal matrix , and .

  3. The second part is for optimizing the static feature matrix where , and . also is the weight parameter. Common factor matrices are shared between static and temporal features by setting =diag().

  4. The third part enforces both non-negativity of the factor and also minimizes its difference to . Due to the constraint , minimizing is encouraging that is constant over K subjects, which is a desirable property from PARAFAC2 that promotes uniqueness, and thus enhancing interpretability [kiers1999parafac2].

  5. and are weighting parameters which all set by user. For simplicity, we set .

The challenge in solving the above optimization problem lies in: 1) addressing all the non-negative constraints especially on , 2) trying to make constant over subjects by making non-negative as close as possible to while can contain negative values, and 3) estimating all factor matrices in order to best approximate both temporal and static input matrices.

3.3 Algorithm

To optimize the objective function (2), we need to update and iteratively. Although the original problem in Equation 2 is non-convex, our algorithm utilizes the Block Coordinate Descent framework [kim2014algorithms] to mathematically reformulate the objective function (2) into simpler sub-problems. In each iteration, we update based on Orthogonal Procrustes problem [schonemann1966generalized] which ensures an orthogonal solution for each . Factor matrix can be solved efficiently by least square solvers. For factor matrices we reformulate objective function (2) so that the factor matrices are instances of the non-negativity constrained least squares (NNLS) problem. Each NNLS sub-problem is a convex problem and the optimal solution can be found easily. We use block principal pivoting method [kim2011fast] to solve each NNLS sub-problem. The block principal pivoting method achieved state-of-the-art performance on NNLS problems compared to other optimization techniques [kim2011fast]. We provide the details about NNLS problems in the supplementary material section. We also exploit structure in the underlying computations (e.g., involving Khatri-Rao products) so that each one of the sub-problems is solved efficiently. Next, we summarize the solution for each factor matrix.

3.3.1 Solution for factor matrix

We can reformulate objective function 2 with respect to as follows :

subject to

More mathematical details about converting Equation 2 to 3 are provided in supplementary material section. The optimal value of can be computed via Orthogonal Procrustes problem [schonemann1966generalized] which has the closed form solution where and are the right and left singular vectors of . Note that each can contain negative values.

3.3.2 Solution for factor matrix

The objective function with respect to can be rewritten as:


without any constraint. The details about updating factor matrix are provided in supplementary material section. The value of that minimizes Equation 4 also minimizes Equation 2. The update rule for factor matrix has the following form:

3.3.3 Solution for phenotype evolution matrix

After updating the factor matrices , , we focus on solving for . In classic PARAFAC2 [Hars1972b, kiers1999parafac2], this factor is retrieved through the simple multiplication . However, due to interpretability concern, we prefer temporal factor matrix to be non-negative because the temporal phenotype evolution for patient k () should never be negative. As shown in the next section, a naive enforcement of non-negativity () violates the important uniqueness property of PARAFAC2 (the model constraints). Therefore, we consider as an additional factor matrix, constrain it to be non-negative and minimize its difference to .

The objective function with respect to can be combined into the following NNLS form:

subject to

As we mentioned earlier, we update factor matrix based on block principal pivoting method.

3.3.4 Solution for temporal phenotype definition

Factor matrix defines the participation of temporal features in different phenotypes. By considering Equation 2, factor matrix participates in the PARAFAC2 part with non-negativity constraint. Therefore, the objective function for factor matrix have the following form:

subject to

In order to update based on block principal pivoting, we need to calculate and for all samples which can be calculated in parallel.

3.3.5 Solution for factor matrix or }

The objective function with respect to yields the following format:

subject to

As we mentioned earlier, factor matrices is shared between PARAFAC2 input and matrix where =diag(). By knowing , Equation 7 can be rewritten as:

subject to

where denotes Khatri-Rao product. We can solve the rows of factor matrix ( or diag()) separately and in parallel. To update each factor matrix we need to compute two time-consuming operations: 1) and 2). The first operation can be replaced with where * denotes element-wise(hadamard) product [van2000ubiquitous]. The second operation also can be replaced with [van2000ubiquitous]. Therefore, the time-consuming Khatri-Rao Product doesn’t need to be explicitly formed. Each row of W can be efficiently updated via block principal pivoting.

3.3.6 Solution for static phenotype definition

Finally, factor matrix represents the participation of static features for the phenotypes. The objective function for factor matrix have the following form:

subject to

which can be easily updated via block principal pivoting.

3.4 Phenotype inference on new data

Given phenotype definition () and factor matrix for some training set, TASTE can project data of new unseen patients into the existing low-rank space. This is useful because healthcare provider may want to fix the phenotype definition while score new patients with those existing definitions. Moreover, such a methodology enables using the low-rank representation of patients such as () as feature vectors for a predictive modeling task.

Suppose, represents the temporal information of unseen patients and indicated their static information. TASTE is able to project the new patient’s information into the existing low-rank space (, , and ) by optimizing {}, and for the following objective function:

subject to

Updating factor matrices is based on Equation 3. can be minimized based on Equation 5. Finally, can be updated based on Equation 8 where . Factor matrix represents the personalized phenotype scores of all patients.

4 Experimental Results

We focus on answering the following:

  1. Does TASTE preserve accuracy and the uniqueness-promoting constraint, while being fast to compute?

  2. How does TASTE scale for increasing number of patients ()?

  3. Does the static information added in TASTE improve predictive performance for detecting heart failure?

  4. Are the heart failure phenotypes produced by TASTE meaningful to an expert cardiologist?

4.1 Data Set Description

Table 2 summarizes the statistics of data sets.

Dataset # Patients # Temporal Features Mean() # Static Features
Sutter 59,480 1164 29 22
CMS 151,349 284 50 30
Table 2: Summary statistics of two real data sets.

Sutter: The data set contains the EHRs for patients with new onset of heart failure and matched controls (matched by encounter time, and age). It includes 5912 cases and 59300 controls. For all patients, encounter features (e.g., medication orders, diagnosis) were extracted from the electronic health records. We use standard medical concept groupers to convert the available ICD-9 or ICD-10 codes of diagnosis to Clinical Classification Software (CCS level 3)  [slee1978international]. We also group the normalized drug names based on unique therapeutic sub-classes using the Anatomical Therapeutic Chemical (ATC) Classification System. Static information of patients includes their gender, age, race, smoking status, alcohol status and BMI.

Centers for Medicare and Medicaid (CMS):222https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/DE_Syn_PUF.html The next data set is CMS 2008-2010 Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF). The goal of CMS data set is to provide a set of realistic data by protecting the privacy of Medicare beneficiaries by using 5% of real data to synthetically construct the whole dataset. We extract the ICD-9 diagnosis codes and convert them to CCS diagnostic categories as in the case of Sutter dataset.

4.2 Evaluation metrics:

  • RMSE: Accuracy is evaluated as the Root Mean Square Error (RMSE) which is a standard measure used in coupled matrix-tensor factorization literature [choi2017fast, beutel2014flexifact]. Given input collection of matrices and static input matrix , we define


    denotes the element of input matrix and its approximation through a model’s factors (the element of the product in the case of TASTE). Similarly, is the element of input matrix and is its approximation (in TASTE, this is the element of ).

  • Cross-Product Invariance (CPI): We use CPI to assess the solution’s uniqueness, since this is the core constraint promoting it [kiers1999parafac2]. In particular we check whether is close to constant () ). The cross-product invariance measure is defined as:

    The range of cross-product invariance is between , with values close to 1 indicating unique solutions ( is close to constant).

  • Area Under the ROC Curve (AUC): Examines classification model’s performance when the data is imbalance by comparing the actual and estimated labels. We use AUC on the test set to evaluate predictive model performance.

4.3 Q1. Taste is fast, accurate and preserves uniqueness-promoting constraints

4.3.1 Baseline Approaches:

In this section, we compare TASTE with methods that incorporate non-negativity constraint on all factor matrices. Note that SPARTan [Perros2017-dh] and COPA [afshar2018copa] are not able to incorporate non-negativity constraint on factor matrices .

  • Cohen+ [cohen2018nonnegative]: Cohen et al. proposed a PARAFAC2 framework which imposes non-negativity constraints on all factor matrices based on non-negative least squares algorithm [kim2014algorithms]. We modified this method to handle the situation where static matrix is coupled with PARAFAC2 input based on Figure 1. To do so, we add to their objective function and updated both factor matrices and in an Alternating Least Squares manner, similar to how the rest of the factors are updated in [cohen2018nonnegative].

  • COPA+: One simple and fast way to enforce non-negativity constraint on factor matrix is to compute as: , where is taken element-wise to ensure non-negative results. Therefore, we modify the implementation in [afshar2018copa] to handle both the PARAFAC2 input and the static matrix and then apply the simple heuristic to make non-negative. We will show in the experimental results section that how this heuristic method no longer guarantees unique solutions (violates model constraints).

We provide the details about tuning hyperparameters in the supplementary material section.

4.3.2 Results:

Apart from purely evaluating the RMSE and the computational time achieved, we assess to what extent the cross-product invariance constraint is satisfied  [kiers1999parafac2]. Therefore, in Figure 2 we present the average and standard deviation of RMSE, CPI, and the computational time for the approaches under comparison for Sutter and CMS data sets for four different target ranks (). In Figures  1(a), 1(b), we compare the RMSE for all three methods. We remark that all methods achieve comparable RMSE values for two different data sets. On the other hand, Figures 1(c), 1(d) show the cross-product invariance (CPI) for Sutter and CMS respectively. COPA+ achieves poor values of CPI for both data sets. This indicates that the output factors violate model constraints and would not satisfy uniqueness properties [kiers1999parafac2]. Also TASTE outperforms Cohen significantly on CPI in Figures 1(c) and 1(d). Finally, Figures 1(e), 1(f) show the running time comparison for all three methods on where TASTE is up to and faster than Cohen on Sutter and CMS data sets. Therefore, our approach is the only one that achieves a fast and accurate solution (in terms of RMSE) and preserves model uniqueness (in terms of CPI).

(a) RMSE for Sutter
(b) RMSE for CMS
(c) CPI for Sutter
(d) CPI for CMS
(e) Running Time for Sutter
(f) Running Time for CMS
Figure 2: The average and standard deviation of RMSE (lower is better), CPI (higher is better), and total running time (in seconds) (lower is better) for different approaches and for different target ranks () related to 5 different random initialization for Sutter and CMS data sets.

4.4 Q2. Taste is scalable

Apart from assessing the time needed for increasing values of target rank (i.e., number of phenotypes), we evaluate the approaches by comparison in terms of time needed for increasing load of input patients. Each method runs for times and the convergence threshold is set to for all of them. Figure 3 compares the average and standard deviation of total running time for 125K, 250K, 500K, and 1 Million patients for . TASTE is up to faster than Cohen’s baseline for . While COPA+ is a fast approach, this baseline suffers from not satisfying model constraints which promote uniqueness.

Figure 3: The average and standard deviation of running time (in seconds) for and for 5 random initialization by varying number of patients from 125K to 1 million for CMS data set. TASTE is upto faster than Cohen.

4.5 Q3. Static features in Taste improve predictive power

We measure the performance of TASTE indirectly on the performance of classification. So we predict whether a patient will be diagnosed with heart failure (HF) or not. Our objective is to assess whether static features in the way handled by TASTE boost predictive performance by using personalized phenotype scores for all patients () as features.

4.5.1 Cohort Construction:

After applying the preprocessing steps (i.e. removing sparse features and eliminating patients with less than 5 clinical visits), we create a data set with 35113 patients where 3244 of them are cases and 31869 related to controls (9.2 % are cases). For case patients, we know the date that they are diagnosed with heart failure (HF dx). Control patients also have the same index dates as their corresponding cases. We extract 145 medications and 178 diagnosis codes from a 2-year observation window and set prediction window length to 6 months. Figure 5 in the supplementary material section depicts the observation and prediction windows in more detail.

4.5.2 Baselines:

We assess the performance of TASTE with 6 different baselines.

  • RNN-regularized CNTF: CNTF [YQCFP19aaai] feeds the temporal phenotype evolution matrices () into an LSTM model for HF prediction. This baseline only uses temporal medical features.

  • RNN Baseline: We use the GRU model for HF prediction implemented in [choi2016using]. The one-hot vector format is used to represent all dynamic and static features for different clinical visits.

  • Logistic regression with raw dynamic: We create a binary matrix where the rows are the number of patients and columns are the total number of medical features (323). Row of this matrix is created by aggregating over all clinical visits of matrix .

  • Logistic regression with raw static+dynamic: Same as the previous approach, we create a binary matrix where the rows are number of patients and columns are the total number of temporal and static features (345) by appending matrix to raw dynamic baseline matrix.

  • COPA Personalized Score Matrix: We use the implementation of pure PARAFAC2 from [afshar2018copa] which learn the low-rank representation of phenotypes () from training set and then projecting all the new patients onto the learned phenotypic low-rank space.

  • COPA (+static) Personalized Score Matrix: This is same as previous baseline, however, we incorporate the static features into PARAFAC2 matrix by repeating the value of static features of a particular patient for all encounter visits.

Figure 4: The average of AUC score for varying number of phenotypes (R) for TASTE and 3 other tensor baselines on the test set. The AUC score for a baseline with raw dynamic features (323) is 0.7426, for the raw dynamic+static baseline (345) is 0.7498 and for RNN baseline is 0.7547.

Figure 4 shows the average of AUC for all baselines and TASTE. For COPA, COPA(+static), CNTF and TASTE we report the AUC score for different values of R ({5,10,20,40,60,80,100}). TASTE improves the AUC score over a simple non-negative PARAFAC2 model (COPA and COPA(+static)) and CNTF which suggests: 1) incorporating static features with dynamic ones will increase the predictive power (comparison of TASTE with COPA and CNTF). 2) incorporating static features in the way we do in TASTE improves predictive power (comparison of TASTE and COPA(+static)). TASTE with R=80 (AUC=) performs slightly better than the RNN model. Training details are provided in the supplementary material section.

4.6 Q4. Heart Failure Phenotype Discovery

Heart failure (HF) is a complex, heterogeneous disease and is the leading cause of hospitalization in people older than 65 333https://www.webmd.com/heart-disease/guide/diseases-cardiovascular#1-4. However, there are no well-defined phenotypes other than the simple categorization of ejection fraction of the heart (i.e., preserved or reduced ejection fraction). With the comprehensive collection of available longitudinal EHR data, now we have the opportunity to computationally tackle the challenge of phenotyping HF patients.

4.6.1 Cohort Construction:

We select the patients diagnosed with HF from the EHRs in Sutter dataset. We extract 145 medications and 178 diagnosis codes from a 2-year observation window which ends 6 months before the heart failure diagnosis date (HFdx)444Figure 5 in the supplementary material section presents the observation window in more detail.. The total number of patients (K) is 3,244 (the HF case patients of Sutter dataset).

4.6.2 Findings of HF Phenotypes

TASTE extracted 5 phenotypes which are all confirmed and annotated by an expert cardiologist 555More detail about the strategy for finding the optimal number of phenotypes is provided in the supplementary material section.. The clinical description of all phenotype are provided by the cardiologist:

  1. Atrial Fibrillation (AF): This phenotype represents patients with irregular heartbeat and AF predisposes to HF. Medications are related to managing AF and preventing strokes. This phenotype is usually more prevalent in male and old patients (i.e. 80 years or older).

  2. Hypertensive Heart Failure: This is a classic and dominant heart failure phenotype, representing a subgroup of patients with long history of hypertension, and cardiac performance declines over time. Anti-hypertensive medications are spelled out as to indicate the treatment to hypertension.

  3. Obese Induced Heart Failure: This phenotype is featured by severe obesity (BMI>35) and obesity induced orthopedic condition.

  4. Cardiometablic Driving Heart Failure: This phenotype is featured by diabetes and cardiometabolic conditions (i.e. hyperlipidemia, hypertension). Diabetes is a well known risk factor for cardiovascular complications (i.e. stroke, myocardial infaction, etc.), and increases the risk for heart failure.

  5. Coronary Heart Disease Phenotype: This phenotype is associated with a greater deterioration of left ventricle function and a worse prognosis. This phenotype is also more prevalent in male and white population.

More detail about all phenotypes is provided in the supplementary material section.

5 Conclusions

TASTE jointly models temporal and static information from electronic health records to extract clinically meaningful phenotypes. We demonstrate the computational efficiency of our model on extensive experiments that showcase its ability to preserve important properties underpinning the model’s uniqueness, while maintaining interpretability. TASTE not only identifies clinically meaningful heart failure phenotypes validated by a cardiologist but the phenotypes also retain predictive power for predicting heart failure.

6 Acknowledgements

This work was in part supported by the National Science Foundation award IIS-1418511, CCF-1533768 and IIS-1838042, the National Institute of Health award 1R01MD011682-01 and R56HL138415.


7 Supplementary Material

7.1 Non-Negativity constrained Least Squares (NNLS)

NNLS problem has the following form:

subject to

Here, , and where . NNLS is a convex problem and the optimal solution of 12 can be easily found. In this paper, we use the block principal pivoting method [kim2011fast] to solve NNLS problems. Authors showed block principal pivoting method has the state-of-the-art performance.

7.2 More details on Taste framework

7.2.1 Updating factor matrix

We can rewrite objective function 2 with respect to based on trace properties [petersen2008matrix] as:

subject to

After removing the constant terms and applying , we have:

subject to

7.2.2 Updating factor matrix

is a rectangular orthogonal matrix (). We also introduce where and . Now is a square orthogonal matrix as follows:


Now we multiply with equation LABEL:H_equation as follow:


where is a constant and independent of the parameter under minimization. Therefore, the value of that minimizes also minimizes

7.3 Implementation details

TASTE is implemented in MATLAB. To enable reproducibility of our work, we attached a zip file containing TASTE code. All the approaches (including the baselines) are evaluated on MatlabR2017b. We utilize the capabilities of Parallel Computing Toolbox of Matlab by activating parallel pool for all methods. For Sutter and CMS dataset, we used 12 workers. For the prediction task, we use the implementation of regularized logistic regression from Scikit-learn machine learning library of python 3.6.

P1. Atrial Fibrillation (AF) Weight
Dx_Cardiac dysrhythmias [106.] 1
Rx_Coumarin Anticoagulants 0.79597
Dx_Heart valve disorders [96.] 0.57526
Dx_Coronary atherosclerosis and other heart disease [101.] 0.465406
Rx_Beta Blockers Cardio-Selective 0.348005
Dx_Conduction disorders [105.] 0.246467
Static_white 0.270443
Static_age_greater_80 0.245424
Static_Non_Hispanic 0.245373
Static_male 0.200464
Static_Alchohol_yes 0.197431
Static_Smk_Quit 0.167838
P2. Hypertensive Heart Failure: Weight
Dx_Essential hypertension [98.] 0.705933
Rx_Calcium Channel Blockers 0.684211
Rx_ACE Inhibitors 0.578426
Rx_Beta Blockers Cardio-Selective 0.564413
Rx_Angiotensin II Receptor Antagonists 0.250883
Dx_Chronic kidney disease [158.] 0.14528
Rx_Thiazides and Thiazide-Like Diuretics 0.134521
Static_Non_Hispanic 0.536399
Static_female 0.390343
Static_Smk_NO 0.363068
Static_Alchohol_No 0.360158
Static_white 0.345942
Static_Alchohol_yes 0.264664
Static_age_between_70_79 0.26237
Static_Severely_obese 0.252981
P3. Obese Induced Heart Failure: Weight
Dx_Other back problems 0.438879
Rx_Opioid Combinations 0.414518
Dx_Other connective tissue disease [211.] 0.304155
Dx_Other non-traumatic joint disorders [204.] 0.269159
Dx_Osteoarthritis [203.] 0.190075
Rx_Opioid Agonists 0.185262
Dx_Intervertebral disc disorders 0.156421
Rx_Benzodiazepines 0.150865
Static_female 0.548758
Static_Non_Hispanic 0.52183
Static_white 0.474545
Static_Alchohol_No 0.404191
Static_Smk_Quit 0.273576
Static_Severely_obese 0.226485
Static_age_between_70_79 0.180683
Static_age_between_60_69 0.180006
P4. Cardiometablic Driving Heart Failure: Weight
Dx_Diabetes mellitus without complication [49.] 0.712628
Dx_Disorders of lipid metabolism [53.] 0.507946
Rx_Biguanides 0.48821
Dx_Immunizations and screening for infectious disease [10.] 0.358349
Rx_Diagnostic Tests 0.353441
Static_Alchohol_No 0.247165
Static_Non_Hispanic 0.190547
Static_Smk_NO 0.184782
Static_male 0.164795
Static_white 0.15693
Static_age_between_60_69 0.131118
P5. Coronary Heart Disease Phenotype: Weight
Dx_Disorders of lipid metabolism [53.] 0.984987
Rx_HMG CoA Reductase Inhibitors 0.442637
Dx_Coronary atherosclerosis and other heart disease [101.] 0.265414
Dx_Diabetes with renal manifestations 0.229549
Dx_Chronic kidney disease [158.] 0.203386
Dx_Other thyroid disorders 0.101615
Static_Non_Hispanic 1
Static_white 0.998604
Static_male 0.957132
Static_Smk_Quit 0.773904
Static_Alchohol_yes 0.604641
Table 3: TASTE extracted 5 phenotypes from the HF dataset. Red indicates the static features; indicates diagnoses; indicates medication; The phenotype names are provided by the cardiologist.

7.4 More details on Q1. Taste is fast, accurate and preserves uniqueness-promoting constraints

7.4.1 Setting hyper-parameters:

We perform a grid search for and for TASTE and Cohen+ for different target ranks () and we run each method with specific parameter for 5 random initialization and pick the best values of and based on the lowest average RMSE value. For COPA+, we just search for the best value of since it does not require parameter.

7.5 More details on Q3. Static features in Taste improve predictive power

7.5.1 Training Details:

We train a Lasso Logistic Regression for all the baselines under comparison. We divide the patient records into training (), validation (), and test sets () and use them to evaluate all baselines. Lasso Logistic Regression has regularization parameter (). For all 6 baselines, we just need to tune parameter . However, for TASTE we need to perform a 3-D grid search over and and . We first train TASTE on training set and learn the phenotype low-rank representation (, ) and then train Lasso Logistic Regression classifier based on patient low-rank representation () and corresponding labels. Then we project the patients in validation set on phenotype low-rank representation and learn their personalized phenotype scores . Next, we feed into a Lasso Logistic Regression and calculate the AUC score. Finally, we pick the best parameters based on the highest AUC score on validation set and report the AUC on test data. The training phase for COPA, COPA(+static) and CNTF is same as TASTE.

7.6 More details on Q4. Heart Failure Phenotype Discovery

Figure 5: The arrow represents the encounter visits of a patient. We extract diagnosis and medications from a 2-year observation window by setting prediction window length to 6 months.

7.6.1 Hyper-parameter search:

The optimal number of phenotypes (R), , and need to be tuned. So, We perform a 3-D grid search for , and based on the stability-driven metric provided in [wu2016stability]. Stability-driven metric calculates the dissimilarity between a pair of factor matrices from two different initialization (,’) using the cross-correlation matrix (C) as follows: where diss(,’) denotes the dissimilarity between factor matrices ,’. Also cross-correlation matrix () computes the cosine similarity between all columns of ,’. We run TASTE model for 10 different random initialization and compute the average dissimilarity for each value of , , and . Figure 6 shows the best value of stability-driven metric for different values for R. The best solution corresponding to were chosen as the one achieving the lowest average dissimilarity. Then, for fixed we present the phenotypes of the model achieving the lowest RMSE among the 10 different runs.

Figure 6: Stability-driven metric for different values of phenotypes(). The optimal value of phenotypes corresponds to the minimum value of stability-driven metric ().

7.6.2 Phenotype definition:

We provide the details of all 5 phenotypes discovered by TASTE in Table 3.

7.6.3 Pure PARAFAC2 cannot handle static feature integration.

In this Section, we illustrate the fact that a naive way of incorporating static feature information into a simpler PARAFAC2-based framework [afshar2018copa] would result in poor, less interpretable phenotypes. We incorporate the static features into PARAFAC2 input by repeating the value of static features on all clinical visits of the patients same as what we did for COPA(+static). For instance, if the male feature of patient k has value 1, we repeat the value 1 for all the clinical visits of that patient. Then we compare the phenotype definitions discovered by TASTE (matrices , ) and by COPA (matrix ). Table 4 contains two sample phenotypes discovered by this baseline, using the same truncation threshold that we use throughout this work (we only consider features with values greater than ). We observe that the static features introduce a significant amount of bias into the resulting phenotypes: the phenotype definitions are essentially dominated by static features, while the values of weights corresponding to dynamic features are closer to . This suggests that pure PARAFAC2-based models such as the work in [afshar2018copa] are unable to produce meaningful phenotypes that handle both static and dynamic features. Such a conclusion extends to other PARAFAC2-based work which does not explicitly model side information [kiers1999parafac2, Perros2017-dh, cohen2018nonnegative].

Phenotype 1 weight
Static_Alcohol_yes 0.3860
Static_White 0.2160
Static_Non_Hispanic 0.2064
Static_Smk_Quit 0.1743
Static_male 0.1508
Static_moderately_obese 0.1025
Phenotype 2 weight
Static_age_between_70_79 1
Static_Non_Hispanic 0.8233
Static_White 0.7502
Static_Alcohol_No 0.6905
Static_moderately_obese 0.2098
Static_male 0.2026
Static_Smk_No 0.1614
Table 4: Two sample phenotypes discovered by COPA(+static) baseline by naively integrating static features into a simpler PARAFAC2-based model [afshar2018copa] .

7.7 Recovery of true factor matrices

In this section, we assess to what extent the original factor matrices can be recovered through synthetic data experiments 666The reason that we are working with synthetic data here is that we do not know the original factor matrices in real data sets.. We demonstrate that: a) TASTE recovers the true latent factors more accurately than baselines for noisy data; and b) the baseline (COPA+) which does not preserve a high CPI measure (which is known to be theoretically linked to uniqueness [kiers1999parafac2] 777A discussion on uniqueness can be found in Background and & Related Work Section.) fails to match TASTE in terms of latent factor recovery, despite achieving similar RMSE.

7.7.1 Evaluation Metric:

Similarity between two factor matrices: We define the cosine similarity between two vectors , as =. Then the similarity between two factor matrices , can be computed as (similar to [kiers1999parafac2]):

The range of is between [0,1] and the values near 1 indicate higher similarity.

7.7.2 Synthetic Data Construction:

We construct the ground-truth factor matrices , , , by drawing a number uniformly at random between (0,1) to each element of each matrix. For each factor matrix , we create a binary non-negative matrix such that and then compute =. After constructing all factor matrices, we compute the input based on and . We set , and . We then add Gaussian normal noise to varying percentages of randomly-drawn elements () of and input matrices.

7.7.3 Results:

For all approaches under comparison, all three methods achieve same value for RMSE, therefore, we skip ploting RMSE vs different noise levels. We assess the similarity measure between each ground truth latent factor and its corresponding estimated one (e.g., ), and consider the average measure across all output factors as shown in Figure 6(a). We also measure CPI and provide the results in 6(b) for different levels of noise . We observe that despite achieving comparable RMSE, COPA+ scores the lowest on the similarity between the true and the estimated factors. On the other hand, TASTE achieves the highest amount of recovery, in accordance to the fact that it achieves the highest CPI among all approaches. Overall, we demonstrate how promoting uniqueness (by enforcing the CPI measure to be preserved [kiers1999parafac2]) leads to more accurate parameter recovery, as suggested by prior work [kiers1999parafac2, williams2018unsupervised].

(a) Average Similarity vs Noise Level
(b) CPI vs Noise Level
Figure 7: Figure 6(a) provides total average similarity between the estimated and the true factor matrices for different noise levels () on synthetic data. Figure 6(b) provides the CPI of three methods for different levels of noise for a synthetic data with K=100, J=30, P=20, , R=4. All points in the figures is computed as an average of 5 random initialization. All three algorithms achieve similar values for RMSE.
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
Add comment
Loading ...
This is a comment super asjknd jkasnjk adsnkj
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test description