Taste: Temporal and Static Tensor Factorization for Phenotyping Electronic Health Records
Abstract
Phenotyping electronic health records (EHR) focuses on defining meaningful patient groups (e.g., heart failure group and diabetes group) and identifying the temporal evolution of patients in those groups. Tensor factorization has been an effective tool for phenotyping. Most of the existing works assume either a static patient representation with aggregate data or only model temporal data. However, real EHR data contain both temporal (e.g., longitudinal clinical visits) and static information (e.g., patient demographics), which are difficult to model simultaneously. In this paper, we propose Temporal And Static TEnsor factorization (TASTE) that jointly models both static and temporal information to extract phenotypes. TASTE combines the PARAFAC2 model with nonnegative matrix factorization to model a temporal and a static tensor. To fit the proposed model, we transform the original problem into simpler ones which are optimally solved in an alternating fashion. For each of the subproblems, our proposed mathematical reformulations lead to efficient subproblem solvers. Comprehensive experiments on large EHR data from a heart failure (HF) study confirmed that TASTE is up to faster than several baselines and the resulting phenotypes were confirmed to be clinically meaningful by a cardiologist. Using 80 phenotypes extracted by TASTE, a simple logistic regression can achieve the same level of area under the curve (AUC) for HF prediction compared to a deep learning model using recurrent neural networks (RNN) with 345 features.
1 Introduction
Phenotyping is about identifying patient groups sharing similar clinicallymeaningful characteristics and is essential for treatment development and management [richesson2016clinical]. However, the complexity and heterogeneity of the underlying patient information render a manual phenotyping impractical for large populations or complex conditions. Unsupervised EHRbased phenotyping based on tensor factorization, e.g., [ho2014marble, Perros2018kdd, YQCFP19aaai], provides effective alternatives. However, existing unsupervised phenotyping methods are unable to handle both static and dynamicallyevolving information, which is the focus of this work.
Traditional tensor factorization models [carroll1970analysis, hitchcock1927expression, harshman1970foundations] assume the same dimensionality along with each tensor mode. However, in practice one mode such as time can be irregular, e.g., different patients may vary by the number of clinical visits over time. To handle such longitudinal datasets, [Perros2017dh] and [afshar2018copa] propose algorithms fitting the PARAFAC2 model [Hars1972b] which are faster and more scalable for handling irregular and sparse data. However, these PARAFAC2 approaches only focus on modeling the dynamicallyevolving features for every patient (e.g., the structured codes recorded for every visit). Static features (such as race and gender) which do not evolve are completely neglected; yet, they are among important information in phenotyping analyses (e.g., some diseases have the higher prevalence in a certain race).
To address this problem, we propose a scalable method called TASTE which is able to jointly model both temporal and static features by combining the PARAFAC2 model with nonnegative matrix factorization as shown in Figure 1. We reformulate the original nonconvex problem into simpler subproblems (i.e., orthogonal Procrustes, least square and nonnegativity constrained least square) and solve each of them efficiently by avoiding unnecessary computations (e.g., expensive KhatriRao products).
We summarize our contributions below:

Temporal and Static Tensor Factorization: We formulate a new technique which jointly models static and dynamic features from EHR data as nonnegative factor matrices.

Fast and Accurate Algorithm: Our proposed fitting algorithm is up to faster than the state of the art baseline. At the same time, TASTE preserves model constraints which promote model uniqueness better than baselines while maintaining interpretability.

Case Study on Heart Failure Phenotyping: We demonstrate the practical impact of TASTE through a case study on heart failure (HF) phenotyping. We identified clinicallymeaningful phenotypes confirmed by a cardiologist. Using phenotypes extracted by TASTE, simple logistic regression can achieve comparable predictive accuracy with deep learning techniques such as RNNs.
2 Background & Related Work
Table 1 summarizes the notations used in this paper.
Symbol 
Definition 

*  Elementwise Multiplication 
Khatri Rao Product  
matrix, vector  
the th row of  
the th column of  
element (i,r) of  
Feature matrix of patient  
Extract the diagonal of matrix  
Vectorizing matrix  
Singular value decomposition on  
Frobenius Norm  
max operator replaces negative values in with 0  
All elements in are nonnegative  
2.1 PARAFAC2 Model
The PARAFAC2 model [kiers1999parafac2], has the following objective function:
(1)  
subject to 
where is the input matrix, factor matrix , diagonal matrix , and factor matrix are output matrices. Factor matrix is an orthogonal matrix, and where . SPARTan [Perros2017dh] is a method to fit this model for sparse datasets, as well as COPA [afshar2018copa] incorporates different constraints such as temporal smoothness and sparsity to the model factors to produce more meaningful results.
Uniqueness property ensures that a decomposition is pursuing the true latent factors, rather than an arbitrary rotation of them. The unconstrained version of PARAFAC2 in (1) (without constraints = and = ) is not unique. Assume as an invertible matrix and as diagonal matrices. Then, we can transform as:
which is another valid solution achieving the same approximation error [kiers1999parafac2]  this is problematic in terms of the interpretability of the result. To promote uniqueness, Harshman [Hars1972b] introduced the crossproduct invariance constraint, which dictates that should be constant . To achieve that, the following constraint is added: where , so that: .
2.2 Unsupervised Computational Phenotyping
A wide range of approaches applies tensor factorization techniques to extract phenotypes. [ho2014limestone, ho2014marble, henderson2017granite, Perros2018kdd, zhao2019detecting, zhao2019using, jiang2019combining] incorporate various constraints (e.g., sparsity, nonnegativity, integer) into regular tensor factorization to produce more clinicallymeaningful phenotypes. [Perros2017dh, afshar2018copa] identify phenotypes and their temporal trends by using irregular tensor factorization based on PARAFAC2 [Hars1972b]; yet, those approaches cannot model both dynamic and static features for meaningful phenotype extraction. As part of our experimental evaluation, we demonstrate that naively adjusting such PARAFAC2based approaches to incorporate static information results in biased and less interpretable phenotypes. The authors of [YQCFP19aaai] proposed a collective nonnegative tensor factorization for phenotyping purposes. However, the method is not able to jointly incorporate static information such as demographics with temporal features. Also they do not employ the orthogonality constraint on the temporal dimension, a strategy known to produce nonunique solutions [Hars1972b, kiers1999parafac2].
3 The Taste framework
3.1 Intuition
We will explain the intuition of TASTE in the context of phenotyping application.
Input data include both temporal and static features for all patients:

Temporal features (); For patient , we record the medical features for different clinical visits in matrix where is the number of clinical visits and is the total number of medical features. Note that can be different for different patients.

Static features (): The static features like gender, race, body mass index (BMI), smoking status ^{1}^{1}1Although BMI and smoking status can change over time, in our data set these values for each patient are constant over time. are recorded in where is the total number of patients and is the number of static features. In particular, is the static features for patient.
Phenotyping process maps input data into a set of phenotypes, which involves the definition of phenotypes and temporal evolution. Figure 1 illustrates the following model interpretation. First, phenotype definitions are shared by factor matrices and for temporal and static features, respectively. In particular, or are the column of factor matrix or which indicates the participation of temporal or static features in the phenotype. Second, personalized phenotype scores for patient are provided in the diagonal matrix where its diagonal element indicates the overall importance of the phenotype for patient . Finally, temporal phenotype evolution for patient is specified in factor matrix where its column indicates the temporal evolution of phenotype over all clinical visits of patient .
3.2 Objective function and challenges
We introduce the following objective:
(2)  
subject to  
Objective function has three main parts as follows:

[leftmargin=*]

The first part is related to fitting a PARAFAC2 model that factorizes a set of temporal feature matrices into , diagonal matrix , and .

The second part is for optimizing the static feature matrix where , and . also is the weight parameter. Common factor matrices are shared between static and temporal features by setting =diag().

The third part enforces both nonnegativity of the factor and also minimizes its difference to . Due to the constraint , minimizing is encouraging that is constant over K subjects, which is a desirable property from PARAFAC2 that promotes uniqueness, and thus enhancing interpretability [kiers1999parafac2].

and are weighting parameters which all set by user. For simplicity, we set .
The challenge in solving the above optimization problem lies in: 1) addressing all the nonnegative constraints especially on , 2) trying to make constant over subjects by making nonnegative as close as possible to while can contain negative values, and 3) estimating all factor matrices in order to best approximate both temporal and static input matrices.
3.3 Algorithm
To optimize the objective function (2), we need to update and iteratively. Although the original problem in Equation 2 is nonconvex, our algorithm utilizes the Block Coordinate Descent framework [kim2014algorithms] to mathematically reformulate the objective function (2) into simpler subproblems. In each iteration, we update based on Orthogonal Procrustes problem [schonemann1966generalized] which ensures an orthogonal solution for each . Factor matrix can be solved efficiently by least square solvers. For factor matrices we reformulate objective function (2) so that the factor matrices are instances of the nonnegativity constrained least squares (NNLS) problem. Each NNLS subproblem is a convex problem and the optimal solution can be found easily. We use block principal pivoting method [kim2011fast] to solve each NNLS subproblem. The block principal pivoting method achieved stateoftheart performance on NNLS problems compared to other optimization techniques [kim2011fast]. We provide the details about NNLS problems in the supplementary material section. We also exploit structure in the underlying computations (e.g., involving KhatriRao products) so that each one of the subproblems is solved efficiently. Next, we summarize the solution for each factor matrix.
3.3.1 Solution for factor matrix
We can reformulate objective function 2 with respect to as follows :
(3)  
subject to 
More mathematical details about converting Equation 2 to 3 are provided in supplementary material section. The optimal value of can be computed via Orthogonal Procrustes problem [schonemann1966generalized] which has the closed form solution where and are the right and left singular vectors of . Note that each can contain negative values.
3.3.2 Solution for factor matrix
The objective function with respect to can be rewritten as:
3.3.3 Solution for phenotype evolution matrix
After updating the factor matrices , , we focus on solving for . In classic PARAFAC2 [Hars1972b, kiers1999parafac2], this factor is retrieved through the simple multiplication . However, due to interpretability concern, we prefer temporal factor matrix to be nonnegative because the temporal phenotype evolution for patient k () should never be negative. As shown in the next section, a naive enforcement of nonnegativity () violates the important uniqueness property of PARAFAC2 (the model constraints). Therefore, we consider as an additional factor matrix, constrain it to be nonnegative and minimize its difference to .
The objective function with respect to can be combined into the following NNLS form:
(5)  
subject to 
As we mentioned earlier, we update factor matrix based on block principal pivoting method.
3.3.4 Solution for temporal phenotype definition
Factor matrix defines the participation of temporal features in different phenotypes. By considering Equation 2, factor matrix participates in the PARAFAC2 part with nonnegativity constraint. Therefore, the objective function for factor matrix have the following form:
(6)  
subject to 
In order to update based on block principal pivoting, we need to calculate and for all samples which can be calculated in parallel.
3.3.5 Solution for factor matrix or }
The objective function with respect to yields the following format:
(7)  
subject to  
As we mentioned earlier, factor matrices is shared between PARAFAC2 input and matrix where =diag(). By knowing , Equation 7 can be rewritten as:
(8)  
subject to 
where denotes KhatriRao product. We can solve the rows of factor matrix ( or diag()) separately and in parallel. To update each factor matrix we need to compute two timeconsuming operations: 1) and 2). The first operation can be replaced with where * denotes elementwise(hadamard) product [van2000ubiquitous]. The second operation also can be replaced with [van2000ubiquitous]. Therefore, the timeconsuming KhatriRao Product doesn’t need to be explicitly formed. Each row of W can be efficiently updated via block principal pivoting.
3.3.6 Solution for static phenotype definition
Finally, factor matrix represents the participation of static features for the phenotypes. The objective function for factor matrix have the following form:
(9)  
subject to 
which can be easily updated via block principal pivoting.
3.4 Phenotype inference on new data
Given phenotype definition () and factor matrix for some training set, TASTE can project data of new unseen patients into the existing lowrank space. This is useful because healthcare provider may want to fix the phenotype definition while score new patients with those existing definitions. Moreover, such a methodology enables using the lowrank representation of patients such as () as feature vectors for a predictive modeling task.
Suppose, represents the temporal information of unseen patients and indicated their static information. TASTE is able to project the new patient’s information into the existing lowrank space (, , and ) by optimizing {}, and for the following objective function:
(10)  
subject to  
4 Experimental Results
We focus on answering the following:

Does TASTE preserve accuracy and the uniquenesspromoting constraint, while being fast to compute?

How does TASTE scale for increasing number of patients ()?

Does the static information added in TASTE improve predictive performance for detecting heart failure?

Are the heart failure phenotypes produced by TASTE meaningful to an expert cardiologist?
4.1 Data Set Description
Table 2 summarizes the statistics of data sets.
Dataset  # Patients  # Temporal Features  Mean()  # Static Features 

Sutter  59,480  1164  29  22 
CMS  151,349  284  50  30 
Sutter: The data set contains the EHRs for patients with new onset of heart failure and matched controls (matched by encounter time, and age). It includes 5912 cases and 59300 controls. For all patients, encounter features (e.g., medication orders, diagnosis) were extracted from the electronic health records. We use standard medical concept groupers to convert the available ICD9 or ICD10 codes of diagnosis to Clinical Classification Software (CCS level 3) [slee1978international]. We also group the normalized drug names based on unique therapeutic subclasses using the Anatomical Therapeutic Chemical (ATC) Classification System. Static information of patients includes their gender, age, race, smoking status, alcohol status and BMI.
Centers for Medicare and Medicaid (CMS):^{2}^{2}2https://www.cms.gov/ResearchStatisticsDataandSystems/DownloadablePublicUseFiles/SynPUFs/DE_Syn_PUF.html The next data set is CMS 20082010 Data Entrepreneurs’ Synthetic Public Use File (DESynPUF). The goal of CMS data set is to provide a set of realistic data by protecting the privacy of Medicare beneficiaries by using 5% of real data to synthetically construct the whole dataset. We extract the ICD9 diagnosis codes and convert them to CCS diagnostic categories as in the case of Sutter dataset.
4.2 Evaluation metrics:

RMSE: Accuracy is evaluated as the Root Mean Square Error (RMSE) which is a standard measure used in coupled matrixtensor factorization literature [choi2017fast, beutel2014flexifact]. Given input collection of matrices and static input matrix , we define
(11) denotes the element of input matrix and its approximation through a model’s factors (the element of the product in the case of TASTE). Similarly, is the element of input matrix and is its approximation (in TASTE, this is the element of ).

CrossProduct Invariance (CPI): We use CPI to assess the solution’s uniqueness, since this is the core constraint promoting it [kiers1999parafac2]. In particular we check whether is close to constant () ). The crossproduct invariance measure is defined as:
The range of crossproduct invariance is between , with values close to 1 indicating unique solutions ( is close to constant).

Area Under the ROC Curve (AUC): Examines classification model’s performance when the data is imbalance by comparing the actual and estimated labels. We use AUC on the test set to evaluate predictive model performance.
4.3 Q1. Taste is fast, accurate and preserves uniquenesspromoting constraints
4.3.1 Baseline Approaches:
In this section, we compare TASTE with methods that incorporate nonnegativity constraint on all factor matrices. Note that SPARTan [Perros2017dh] and COPA [afshar2018copa] are not able to incorporate nonnegativity constraint on factor matrices .

Cohen+ [cohen2018nonnegative]: Cohen et al. proposed a PARAFAC2 framework which imposes nonnegativity constraints on all factor matrices based on nonnegative least squares algorithm [kim2014algorithms]. We modified this method to handle the situation where static matrix is coupled with PARAFAC2 input based on Figure 1. To do so, we add to their objective function and updated both factor matrices and in an Alternating Least Squares manner, similar to how the rest of the factors are updated in [cohen2018nonnegative].

COPA+: One simple and fast way to enforce nonnegativity constraint on factor matrix is to compute as: , where is taken elementwise to ensure nonnegative results. Therefore, we modify the implementation in [afshar2018copa] to handle both the PARAFAC2 input and the static matrix and then apply the simple heuristic to make nonnegative. We will show in the experimental results section that how this heuristic method no longer guarantees unique solutions (violates model constraints).
We provide the details about tuning hyperparameters in the supplementary material section.
4.3.2 Results:
Apart from purely evaluating the RMSE and the computational time achieved, we assess to what extent the crossproduct invariance constraint is satisfied [kiers1999parafac2]. Therefore, in Figure 2 we present the average and standard deviation of RMSE, CPI, and the computational time for the approaches under comparison for Sutter and CMS data sets for four different target ranks (). In Figures 1(a), 1(b), we compare the RMSE for all three methods. We remark that all methods achieve comparable RMSE values for two different data sets. On the other hand, Figures 1(c), 1(d) show the crossproduct invariance (CPI) for Sutter and CMS respectively. COPA+ achieves poor values of CPI for both data sets. This indicates that the output factors violate model constraints and would not satisfy uniqueness properties [kiers1999parafac2]. Also TASTE outperforms Cohen significantly on CPI in Figures 1(c) and 1(d). Finally, Figures 1(e), 1(f) show the running time comparison for all three methods on where TASTE is up to and faster than Cohen on Sutter and CMS data sets. Therefore, our approach is the only one that achieves a fast and accurate solution (in terms of RMSE) and preserves model uniqueness (in terms of CPI).
4.4 Q2. Taste is scalable
Apart from assessing the time needed for increasing values of target rank (i.e., number of phenotypes), we evaluate the approaches by comparison in terms of time needed for increasing load of input patients. Each method runs for times and the convergence threshold is set to for all of them. Figure 3 compares the average and standard deviation of total running time for 125K, 250K, 500K, and 1 Million patients for . TASTE is up to faster than Cohen’s baseline for . While COPA+ is a fast approach, this baseline suffers from not satisfying model constraints which promote uniqueness.
4.5 Q3. Static features in Taste improve predictive power
We measure the performance of TASTE indirectly on the performance of classification. So we predict whether a patient will be diagnosed with heart failure (HF) or not. Our objective is to assess whether static features in the way handled by TASTE boost predictive performance by using personalized phenotype scores for all patients () as features.
4.5.1 Cohort Construction:
After applying the preprocessing steps (i.e. removing sparse features and eliminating patients with less than 5 clinical visits), we create a data set with 35113 patients where 3244 of them are cases and 31869 related to controls (9.2 % are cases). For case patients, we know the date that they are diagnosed with heart failure (HF dx). Control patients also have the same index dates as their corresponding cases. We extract 145 medications and 178 diagnosis codes from a 2year observation window and set prediction window length to 6 months. Figure 5 in the supplementary material section depicts the observation and prediction windows in more detail.
4.5.2 Baselines:
We assess the performance of TASTE with 6 different baselines.

RNNregularized CNTF: CNTF [YQCFP19aaai] feeds the temporal phenotype evolution matrices () into an LSTM model for HF prediction. This baseline only uses temporal medical features.

RNN Baseline: We use the GRU model for HF prediction implemented in [choi2016using]. The onehot vector format is used to represent all dynamic and static features for different clinical visits.

Logistic regression with raw dynamic: We create a binary matrix where the rows are the number of patients and columns are the total number of medical features (323). Row of this matrix is created by aggregating over all clinical visits of matrix .

Logistic regression with raw static+dynamic: Same as the previous approach, we create a binary matrix where the rows are number of patients and columns are the total number of temporal and static features (345) by appending matrix to raw dynamic baseline matrix.

COPA Personalized Score Matrix: We use the implementation of pure PARAFAC2 from [afshar2018copa] which learn the lowrank representation of phenotypes () from training set and then projecting all the new patients onto the learned phenotypic lowrank space.

COPA (+static) Personalized Score Matrix: This is same as previous baseline, however, we incorporate the static features into PARAFAC2 matrix by repeating the value of static features of a particular patient for all encounter visits.
Results:
Figure 4 shows the average of AUC for all baselines and TASTE. For COPA, COPA(+static), CNTF and TASTE we report the AUC score for different values of R ({5,10,20,40,60,80,100}). TASTE improves the AUC score over a simple nonnegative PARAFAC2 model (COPA and COPA(+static)) and CNTF which suggests: 1) incorporating static features with dynamic ones will increase the predictive power (comparison of TASTE with COPA and CNTF). 2) incorporating static features in the way we do in TASTE improves predictive power (comparison of TASTE and COPA(+static)). TASTE with R=80 (AUC=) performs slightly better than the RNN model. Training details are provided in the supplementary material section.
4.6 Q4. Heart Failure Phenotype Discovery
Heart failure (HF) is a complex, heterogeneous disease and is the leading cause of hospitalization in people older than 65 ^{3}^{3}3https://www.webmd.com/heartdisease/guide/diseasescardiovascular#14. However, there are no welldefined phenotypes other than the simple categorization of ejection fraction of the heart (i.e., preserved or reduced ejection fraction). With the comprehensive collection of available longitudinal EHR data, now we have the opportunity to computationally tackle the challenge of phenotyping HF patients.
4.6.1 Cohort Construction:
We select the patients diagnosed with HF from the EHRs in Sutter dataset. We extract 145 medications and 178 diagnosis codes from a 2year observation window which ends 6 months before the heart failure diagnosis date (HFdx)^{4}^{4}4Figure 5 in the supplementary material section presents the observation window in more detail.. The total number of patients (K) is 3,244 (the HF case patients of Sutter dataset).
4.6.2 Findings of HF Phenotypes
TASTE extracted 5 phenotypes which are all confirmed and annotated by an expert cardiologist ^{5}^{5}5More detail about the strategy for finding the optimal number of phenotypes is provided in the supplementary material section.. The clinical description of all phenotype are provided by the cardiologist:

Atrial Fibrillation (AF): This phenotype represents patients with irregular heartbeat and AF predisposes to HF. Medications are related to managing AF and preventing strokes. This phenotype is usually more prevalent in male and old patients (i.e. 80 years or older).

Hypertensive Heart Failure: This is a classic and dominant heart failure phenotype, representing a subgroup of patients with long history of hypertension, and cardiac performance declines over time. Antihypertensive medications are spelled out as to indicate the treatment to hypertension.

Obese Induced Heart Failure: This phenotype is featured by severe obesity (BMI>35) and obesity induced orthopedic condition.

Cardiometablic Driving Heart Failure: This phenotype is featured by diabetes and cardiometabolic conditions (i.e. hyperlipidemia, hypertension). Diabetes is a well known risk factor for cardiovascular complications (i.e. stroke, myocardial infaction, etc.), and increases the risk for heart failure.

Coronary Heart Disease Phenotype: This phenotype is associated with a greater deterioration of left ventricle function and a worse prognosis. This phenotype is also more prevalent in male and white population.
More detail about all phenotypes is provided in the supplementary material section.
5 Conclusions
TASTE jointly models temporal and static information from electronic health records to extract clinically meaningful phenotypes. We demonstrate the computational efficiency of our model on extensive experiments that showcase its ability to preserve important properties underpinning the model’s uniqueness, while maintaining interpretability. TASTE not only identifies clinically meaningful heart failure phenotypes validated by a cardiologist but the phenotypes also retain predictive power for predicting heart failure.
6 Acknowledgements
This work was in part supported by the National Science Foundation award IIS1418511, CCF1533768 and IIS1838042, the National Institute of Health award 1R01MD01168201 and R56HL138415.
References
7 Supplementary Material
7.1 NonNegativity constrained Least Squares (NNLS)
NNLS problem has the following form:
(12)  
subject to 
Here, , and where . NNLS is a convex problem and the optimal solution of 12 can be easily found. In this paper, we use the block principal pivoting method [kim2011fast] to solve NNLS problems. Authors showed block principal pivoting method has the stateoftheart performance.
7.2 More details on Taste framework
7.2.1 Updating factor matrix
We can rewrite objective function 2 with respect to based on trace properties [petersen2008matrix] as:
(13)  
subject to 
After removing the constant terms and applying , we have:
(14)  
subject to 
7.2.2 Updating factor matrix
is a rectangular orthogonal matrix (). We also introduce where and . Now is a square orthogonal matrix as follows:
(15)  
Now we multiply with equation LABEL:H_equation as follow:
(16)  
where is a constant and independent of the parameter under minimization. Therefore, the value of that minimizes also minimizes
7.3 Implementation details
TASTE is implemented in MATLAB. To enable reproducibility of our work, we attached a zip file containing TASTE code. All the approaches (including the baselines) are evaluated on MatlabR2017b. We utilize the capabilities of Parallel Computing Toolbox of Matlab by activating parallel pool for all methods. For Sutter and CMS dataset, we used 12 workers. For the prediction task, we use the implementation of regularized logistic regression from Scikitlearn machine learning library of python 3.6.
P1. Atrial Fibrillation (AF)  Weight 

Dx_Cardiac dysrhythmias [106.]  1 
Rx_Coumarin Anticoagulants  0.79597 
Dx_Heart valve disorders [96.]  0.57526 
Dx_Coronary atherosclerosis and other heart disease [101.]  0.465406 
Rx_Beta Blockers CardioSelective  0.348005 
Dx_Conduction disorders [105.]  0.246467 
Static_white  0.270443 
Static_age_greater_80  0.245424 
Static_Non_Hispanic  0.245373 
Static_male  0.200464 
Static_Alchohol_yes  0.197431 
Static_Smk_Quit  0.167838 
P2. Hypertensive Heart Failure:  Weight 
Dx_Essential hypertension [98.]  0.705933 
Rx_Calcium Channel Blockers  0.684211 
Rx_ACE Inhibitors  0.578426 
Rx_Beta Blockers CardioSelective  0.564413 
Rx_Angiotensin II Receptor Antagonists  0.250883 
Dx_Chronic kidney disease [158.]  0.14528 
Rx_Thiazides and ThiazideLike Diuretics  0.134521 
Static_Non_Hispanic  0.536399 
Static_female  0.390343 
Static_Smk_NO  0.363068 
Static_Alchohol_No  0.360158 
Static_white  0.345942 
Static_Alchohol_yes  0.264664 
Static_age_between_70_79  0.26237 
Static_Severely_obese  0.252981 
P3. Obese Induced Heart Failure:  Weight 
Dx_Other back problems  0.438879 
Rx_Opioid Combinations  0.414518 
Dx_Other connective tissue disease [211.]  0.304155 
Dx_Other nontraumatic joint disorders [204.]  0.269159 
Dx_Osteoarthritis [203.]  0.190075 
Rx_Opioid Agonists  0.185262 
Dx_Intervertebral disc disorders  0.156421 
Rx_Benzodiazepines  0.150865 
Static_female  0.548758 
Static_Non_Hispanic  0.52183 
Static_white  0.474545 
Static_Alchohol_No  0.404191 
Static_Smk_Quit  0.273576 
Static_Severely_obese  0.226485 
Static_age_between_70_79  0.180683 
Static_age_between_60_69  0.180006 
P4. Cardiometablic Driving Heart Failure:  Weight 
Dx_Diabetes mellitus without complication [49.]  0.712628 
Dx_Disorders of lipid metabolism [53.]  0.507946 
Rx_Biguanides  0.48821 
Dx_Immunizations and screening for infectious disease [10.]  0.358349 
Rx_Diagnostic Tests  0.353441 
Static_Alchohol_No  0.247165 
Static_Non_Hispanic  0.190547 
Static_Smk_NO  0.184782 
Static_male  0.164795 
Static_white  0.15693 
Static_age_between_60_69  0.131118 
P5. Coronary Heart Disease Phenotype:  Weight 
Dx_Disorders of lipid metabolism [53.]  0.984987 
Rx_HMG CoA Reductase Inhibitors  0.442637 
Dx_Coronary atherosclerosis and other heart disease [101.]  0.265414 
Dx_Diabetes with renal manifestations  0.229549 
Dx_Chronic kidney disease [158.]  0.203386 
Dx_Other thyroid disorders  0.101615 
Static_Non_Hispanic  1 
Static_white  0.998604 
Static_male  0.957132 
Static_Smk_Quit  0.773904 
Static_Alchohol_yes  0.604641 
7.4 More details on Q1. Taste is fast, accurate and preserves uniquenesspromoting constraints
7.4.1 Setting hyperparameters:
We perform a grid search for and for TASTE and Cohen+ for different target ranks () and we run each method with specific parameter for 5 random initialization and pick the best values of and based on the lowest average RMSE value. For COPA+, we just search for the best value of since it does not require parameter.
7.5 More details on Q3. Static features in Taste improve predictive power
7.5.1 Training Details:
We train a Lasso Logistic Regression for all the baselines under comparison. We divide the patient records into training (), validation (), and test sets () and use them to evaluate all baselines. Lasso Logistic Regression has regularization parameter (). For all 6 baselines, we just need to tune parameter . However, for TASTE we need to perform a 3D grid search over and and . We first train TASTE on training set and learn the phenotype lowrank representation (, ) and then train Lasso Logistic Regression classifier based on patient lowrank representation () and corresponding labels. Then we project the patients in validation set on phenotype lowrank representation and learn their personalized phenotype scores . Next, we feed into a Lasso Logistic Regression and calculate the AUC score. Finally, we pick the best parameters based on the highest AUC score on validation set and report the AUC on test data. The training phase for COPA, COPA(+static) and CNTF is same as TASTE.
7.6 More details on Q4. Heart Failure Phenotype Discovery
7.6.1 Hyperparameter search:
The optimal number of phenotypes (R), , and need to be tuned. So, We perform a 3D grid search for , and based on the stabilitydriven metric provided in [wu2016stability]. Stabilitydriven metric calculates the dissimilarity between a pair of factor matrices from two different initialization (,’) using the crosscorrelation matrix (C) as follows: where diss(,’) denotes the dissimilarity between factor matrices ,’. Also crosscorrelation matrix () computes the cosine similarity between all columns of ,’. We run TASTE model for 10 different random initialization and compute the average dissimilarity for each value of , , and . Figure 6 shows the best value of stabilitydriven metric for different values for R. The best solution corresponding to were chosen as the one achieving the lowest average dissimilarity. Then, for fixed we present the phenotypes of the model achieving the lowest RMSE among the 10 different runs.
7.6.2 Phenotype definition:
We provide the details of all 5 phenotypes discovered by TASTE in Table 3.
7.6.3 Pure PARAFAC2 cannot handle static feature integration.
In this Section, we illustrate the fact that a naive way of incorporating static feature information into a simpler PARAFAC2based framework [afshar2018copa] would result in poor, less interpretable phenotypes. We incorporate the static features into PARAFAC2 input by repeating the value of static features on all clinical visits of the patients same as what we did for COPA(+static). For instance, if the male feature of patient k has value 1, we repeat the value 1 for all the clinical visits of that patient. Then we compare the phenotype definitions discovered by TASTE (matrices , ) and by COPA (matrix ). Table 4 contains two sample phenotypes discovered by this baseline, using the same truncation threshold that we use throughout this work (we only consider features with values greater than ). We observe that the static features introduce a significant amount of bias into the resulting phenotypes: the phenotype definitions are essentially dominated by static features, while the values of weights corresponding to dynamic features are closer to . This suggests that pure PARAFAC2based models such as the work in [afshar2018copa] are unable to produce meaningful phenotypes that handle both static and dynamic features. Such a conclusion extends to other PARAFAC2based work which does not explicitly model side information [kiers1999parafac2, Perros2017dh, cohen2018nonnegative].
Phenotype 1  weight 

Static_Alcohol_yes  0.3860 
Static_White  0.2160 
Static_Non_Hispanic  0.2064 
Static_Smk_Quit  0.1743 
Static_male  0.1508 
Static_moderately_obese  0.1025 
Phenotype 2  weight 
Static_age_between_70_79  1 
Static_Non_Hispanic  0.8233 
Static_White  0.7502 
Static_Alcohol_No  0.6905 
Static_moderately_obese  0.2098 
Static_male  0.2026 
Static_Smk_No  0.1614 
7.7 Recovery of true factor matrices
In this section, we assess to what extent the original factor matrices can be recovered through synthetic data experiments ^{6}^{6}6The reason that we are working with synthetic data here is that we do not know the original factor matrices in real data sets.. We demonstrate that: a) TASTE recovers the true latent factors more accurately than baselines for noisy data; and b) the baseline (COPA+) which does not preserve a high CPI measure (which is known to be theoretically linked to uniqueness [kiers1999parafac2] ^{7}^{7}7A discussion on uniqueness can be found in Background and & Related Work Section.) fails to match TASTE in terms of latent factor recovery, despite achieving similar RMSE.
7.7.1 Evaluation Metric:
Similarity between two factor matrices: We define the cosine similarity between two vectors , as =. Then the similarity between two factor matrices , can be computed as (similar to [kiers1999parafac2]):
The range of is between [0,1] and the values near 1 indicate higher similarity.
7.7.2 Synthetic Data Construction:
We construct the groundtruth factor matrices , , , by drawing a number uniformly at random between (0,1) to each element of each matrix. For each factor matrix , we create a binary nonnegative matrix such that and then compute =. After constructing all factor matrices, we compute the input based on and . We set , and . We then add Gaussian normal noise to varying percentages of randomlydrawn elements () of and input matrices.
7.7.3 Results:
For all approaches under comparison, all three methods achieve same value for RMSE, therefore, we skip ploting RMSE vs different noise levels. We assess the similarity measure between each ground truth latent factor and its corresponding estimated one (e.g., ), and consider the average measure across all output factors as shown in Figure 6(a). We also measure CPI and provide the results in 6(b) for different levels of noise . We observe that despite achieving comparable RMSE, COPA+ scores the lowest on the similarity between the true and the estimated factors. On the other hand, TASTE achieves the highest amount of recovery, in accordance to the fact that it achieves the highest CPI among all approaches. Overall, we demonstrate how promoting uniqueness (by enforcing the CPI measure to be preserved [kiers1999parafac2]) leads to more accurate parameter recovery, as suggested by prior work [kiers1999parafac2, williams2018unsupervised].