Financial Forecasting and Analysis for Low-Wage Workers

Financial Forecasting and Analysis for Low-Wage Workers

Wenyu Zhang Cornell UniversityIthacaNew York Raya Horesh IBM Research AIYorktown HeightsNew York Karthikeyan Natesan Ramamurthy IBM Research AIYorktown HeightsNew York Lingfei Wu IBM Research AIYorktown HeightsNew York Jinfeng Yi IBM Research AIYorktown HeightsNew York Kryn Anderson Neighborhood Trust Financial PartnersNew YorkNew York  and  Kush R. Varshney IBM Research AIYorktown HeightsNew York43017-6221

Despite the plethora of financial services and products on the market nowadays, there is a lack of such services and products designed especially for the low-wage population. Approximately 30% of the U.S. working population engage in low-wage work, and many of them lead a paycheck-to-paycheck lifestyle. Financial planning advice needs to explicitly address their financial instability.

We propose a system of data mining techniques on small-scale transactions data to improve automatic and personalized financial planning advice to low-wage workers. We propose robust methods for accurate prediction of bank account balances and automatic extraction of recurring transactions and unexpected large expenses. We formulate a hybrid method consisting of historical data averaging and a regularized regression framework for prediction. To uncover recurring transactions, we use a heuristic approach that capitalizes on transaction descriptions. Our methods achieve higher performance compared to conventional approaches and state-of-the-art predictive methods in real financial transactions data.

In collaboration with Neighborhood Trust Financial Partners, the proposed methods will upgrade the functionalities in WageGoal, Neighborhood Trust Financial Partners’ web-based application that provides budgeting and cash flow management services to a user base comprising mostly low-income individuals. The proposed methods will therefore have a direct impact on the individuals who are or will be connected to the product.

copyright: rightsretained

1. Background and Motivation

Low-wage workers make up a large portion of the working population, and need the most help in financial planning. The U.S. Bureau of Labor Statistics defines low-wage work in three ways (Fusaro and Shaefer, 2016):

  • Wage ($9.25/hr) lifting a family of two above poverty line,

  • Wage ($10.75/hr) lifting a family of three above poverty line,

  • Wage ($13.50/hr) lifting a family of three to 125% of the poverty line,

where corresponding wages in the U.S. in 2013 are in parenthesis. Approximately 15,000,000 U.S. workers earned up to $9.25/hr and 36,000,000 earned up to $13.50/hr in 2013. These are equivalent to 12.63% and 29.54% of the U.S. working population respectively. To put the wages into perspective, $13.50/hr amounts to a monthly income of just over $2000, while the median rent for one-bedroom apartments across 50 major U.S. cities is already $1200 (Insider, 2017b).

Many low-income households live paycheck-to-paycheck, at risk of financial instability in the event of medical, job or other unforseen emergencies when they are forced to take up loans and in the process incur additional charges. Reference (Pew Charitable Trusts, 2016) reports that the largest U.S. banks charged $11.6 billion in overdraft and insufficient fund fees in 2015, of which a significant portion is attributed to the poor. To make matters worse, the average debit charge triggering such fees is $24, while the typical overdraft fee is $35.

The poor are affected by obstacles including high cost of personalized financial services, lack of suitable banking services (Barr, 2004), and hassle of seeking financial services (Mullainathan and Shafir, 2010). We hope to alleviate these obstacles through data-driven solutions. In this paper, we propose a system of data mining techniques that can be used to improve automatic and personalized financial planning advice to low-wage workers. We work with anonymized checking, savings, and credit card account transactions data. User identification information such as age, gender, and size of household is not used. We also work in the small data scenario where low-income individuals may not have a long banking history due to the aforementioned obstacles.

This work is motivated by use cases for Neighborhood Trust Financial Partners’ WageGoal app which caters primarily to low-income individuals (Neighborhood Trust Financial Partners, 2017). Figure 1 contains screenshots illustrating some current functionalities such as giving a cash flow snapshot of the user’s income, bills and expenses, and helping with cash flow management. We aim to improve the accuracy of the forecast and information extraction functionalities. As illustrated in Figure 2, our main contributions are to propose methods for:

  • Short- and long-term prediction of bank account balances with improved accuracy;

  • Automatic extraction of recurring transactions and unexpected large expenses.

Figure 1. Some current functionalities of WageGoal.
Figure 2. Proposed functionalities for WageGoal.

The functionalities inform users about their possible future spending behavior so that they have enough time to adjust and to save up for emergencies. The main difficulties of these tasks originate from the multifaceted nature of transactions data. A user’s spending depends on individual needs and historical spending, but can also exhibit patterns similar to other users. Moreover, salary, bills and other recurring transactions provide characteristic features of a user’s spending behavior, but these cyclic patterns can be noisy and inconsistent at times. Our methods address these difficulties by both effectively mining a user’s recurring transactions and borrowing strength from the spending patterns of other users. We tested on two real financial transactions datasets, one is a smaller dataset from actual WageGoal users, and the other is a larger publicly available dataset from PKDD’99 Discovery Challenge. Our methods achieve higher performance compared to conventional approaches and state-of-the-art prediction methods on both datasets.

We differentiate the proposed functionalities from those already offered by personal finance apps on the market (Insider, 2017a). The apps mainly track cash flow and provide simple budgeting tools, such as calculating an average daily “spendable” amount based on a user’s income and saving goal. For low-wage workers whose bank balance can hover dangerously around zero, more accurate estimates and finer-grained financial analysis are necessary.

2. Transactions Data: Overview

The WageGoal app collects users’ transactions through authorized accounts using a third-party service Plaid (Plaid, 2018). Each transaction is associated with an account ID, date, description or merchant name, amount and category. Table 1 shows a sample snippet of the transactions captured. There are a total of 11 categories, namely Bank Fees, Cash Advance, Community, Food and Drink, Healthcare, Interest, Payment, Recreation, Service, Shops, and Travel. Uncategorized transactions are labeled NA. In the WageGoal Dataset, 28% of the transactions are uncategorized. The category labels are provided through Plaid and we do not attempt to address the problem of missing labels in this paper. Daily account balances are retrospectively calculated from the current balance.

Date Description $ Category
1 6/22/2016 IKEA 20 Shops
2 6/22/2016 Target 10 NA
1 6/24/2016 Starbucks 15 Food & Drink
3 6/24/2016 Interest -0.01 Interest
3 6/24/2016 Direct Deposit -1000 Transfer
Table 1. WageGoal Dataset: Examples of transactions data.

2.1. Initial Cluster Analysis

The WageGoal dataset, further described in Section 5.1.1, consists 19 users with approximately one year of transactions. Using Dynamic Time Warping (DTW) distance, we cluster the overall balance sequences for the last available month by hierarchical agglomerative clustering. Balance sequences are set to zero-mean since the mean does not affect the spending pattern. We use window = 2 for DTW to allow for slight misalignments in the time series.

Setting the number of clusters to five, two clusters consist one user each. For the three other clusters, we plot their average category-wise spending in Figure 3, which shows a clear distinction between the less and more well-off users. The least well-off (blue) group spent less and also incurred more bank fees compared to the moderate (green) and the most well-off (yellow) groups. In the four categories not shown, the three groups spent similarly.

(a) Transfer (negative spending = income)
(b) Selected categories
Figure 3. WageGoal Dataset: Average category-wise spending for each cluster of users: blue, green and yellow.

Temporal patterns in balance sequences can help to distinguish the users’ financial status. Hence, we devise our prediction method to borrow strength from the data of similar users.

3. Related Work

Most works on transactions data make use of RFM (Recency, Frequency and Monetary value) to define and analyze customer value (Birant, 2011; Khajvand et al., 2011; Chang and Tsai, 2011). However, these summary statistics mask too many details for our purpose. Instead, we devise data mining and time series techniques to extract more information from the raw data and also make use of similarities amongst users’ spending behaviors to effectively improve prediction.

3.1. Prediction

Two-part models are popular in modeling household finances, medical expenditure and other data with nonnegative values (Brown et al., 2015; Min and Agresti, 2002; Mullahy, 1998; Neelon et al., 2011). These models can be formulated to cluster the subjects such that each cluster receives different parameter values. In our application, we need to consider both positive and negative spending (i.e., income). Moreover, two-part models often use covariates as part of the binary and continuous component representations. Covariates include time-related variables and also class-membership variables, such as gender and employee vs. dependency status in (Neelon et al., 2011). Such covariates need extra manual effort to construct or are simply not available when we work with anonymized data.

A traditional time series model is the ARMA (autoregressive moving average) which regresses the current variable value on past values and error terms (Shumway and Stoffer, 2006). Seasonal ARMA models have been used on periodic data such as traffic flow (Williams and A. Hoel, 2003). Since transactions data contain multiple seasonal components and these components may not have fixed periodicities as we explain in Section 3.2, a plain seasonal ARMA method is not suitable. Zhang (Zhang, 2003) uses a neural network to model residuals from the ARMA model, while the authors in (Gonzaga Baca Ruiz et al., 2016) directly use neural networks for energy consumption prediction. However, these approaches require large quantities of data to provide good predictions.

In other data-driven time series analysis literature, Taylor and Letham (Taylor and Letham, 2017) implements a regression-based method that is fast and includes holiday effects, but it models only weekly seasonality and requires handcrafted variables to indicate holiday effects. Approaches reported in (Alvarez et al., 2011; L. Scott and R. Varian, 2014; Zhang et al., 2013) extract similar sequences in historical data to incorporate more diverse patterns. Authors of (Alvarez et al., 2011) and (Zhang et al., 2013) use KNN regression and provide predictions by taking unweighted or weighted averages of the samples immediately after the matched sequences, but these may not be robust enough against anomalous data. Such “anomalies” or spikes in expenditure are not uncommon in financial transactions. To induce sparsity on regression coefficients, (L. Scott and R. Varian, 2014) uses a spike-and-slab prior. Markov Chain Monte Carlo methods are used to estimate the parameters in the full Bayesian model, but they take significant computation time to iteratively forecast multiple future days, one day at a time.

While the aforementioned methods are suitable for general “well-behaved” time series, we incorporate design features suitable for transactions data, while working under the framework that no further data annotations should be required and that data may be limited at the early stages of user enrollment. We recognize that different bank account types and prediction horizons require different treatment, and hence propose a hybrid approach to address the different scenarios. One aspect of our approach works on the level of individual transactions and relies on extracted recurring transactions, while the other works from the holistic point-of-view of the overall balance series, where we extract similar sequences and use a computationally efficient regularized regression scheme to penalize anomalous sequences. Another key innovation is that we align the extracted sequences using landmark transaction events before regressing, which is observed to increase prediction accuracy.

3.2. Finding recurring transactions

For time series applications, it is common practice to identify periodicities by transforming the series into the frequency domain. A Lomb-Scargle periodogram approach to treat missing values and unevenly-spaced time points in finding periodicity in gene expression patterns is proposed in (Glynn et al., 2006). There are also methods such as (Elfeky et al., 2005) which address the problem directly in the time domain. Although some of these methods are capable of detecting multiple periodicities, they require the period of each cyclic component to be consistent across time. In transactions data, there can be significant jitter in the periodicity for recurring transactions such as pays due to differences in the number of days each month and the presence of holidays. Moreover, just using numerical values is insufficient to identify recurring transactions since there is substantial noise, and this is especially so for recurring transactions with small dollar amounts. Manually constructing and maintaining a complete biller’s list for each user’s recurring transactions is ideal but tedious. Hence in this paper, we propose a procedure that automatically identifies possible recurring transactions. The procedure takes into account the inconsistent nature of transactions and better distinguish between transactions through their textual descriptions.

4. Methods and Technical Solutions

We propose methods that use transactions data as described in Section 2. The main challenges of working with transactions data versus conventional numerical time series are the presence of:

  • Text description for each transaction;

  • Multiple noisy and inconsistent periodic patterns;

  • Spikes in spending.

4.1. Prediction of account balances

We predict account balances up to 31 days ahead. This forecast horizon encompasses two semimonthly paydays to give users sufficient time and information to plan their finances. We propose a historical data averaging method HistAvg which is more suited for short-term prediction and accounts with minimal transactions, and a regularized least squares method SubseqLS which is more suited for long-term prediction of accounts with distinct cyclic patterns. To effectively address the nuances in modeling different types of accounts, we further propose a hybrid method HistAvg-SubseqLS where the first days are predicted by HistAvg and the next days are predicted by SubseqLS.

4.1.1. HistAvg

HistAvg predicts daily spending and is adapted from the current implementation in WageGoal. The original version uses a biller’s list to extract bills, while we use recurring transactions found by our proposed procedure in Section 4.2. Predicting spending using past three months’ transactions is performed by:

  1. Removing recurring transactions;

  2. Removing top 10% of transactions;

  3. Calculating daily basic spending as the average amount spent daily according to the remaining transactions;

  4. Estimating future spending as the sum of daily basic spending and any recurring transaction to fall on that day.

Top 10% of transactions are excluded from the calculation of the daily basic spending since these are typically one-time or rare purchases. Finally, the account balance is estimated by summing the previous day’s balance and the estimated spending on that day.

4.1.2. SubseqLS

We assume that a similar balance history implies a similar future save for some “anomalies”. This motivates us to make use of all available historical data across users for prediction.

SubseqLS predicts one-day ahead by first setting the target account’s balance sequence from the immediate past as the length query vector . It then selects balance sequences that are similar to in its first values from historical balances of all users. Finally, it determines the best weights for the sequences to match . We combine the M sequences linearly for simplicity of the model, but more flexible combinations are also possible in principle. Essentially, for account of user , we consider

for some transformation and estimate the coefficients for a good prediction performance. Weights determined in traditional KNN regression methods tend to overfit to the query and are not robust to “anomalies”, so we regularize the estimation to avoid this.

Let be the current date, and be the number of days to predict. We set so that is sufficiently long to capture most recurring transactions. To recap notations, is the query for some account of some user , and is the -day ahead balance to be predicted. Each selected sequence is in length. We use DTW (Dynamic Time Warping) distance with window = 2 to measure sequence similarity to allow slight misalignments, and iteratively find each using the fast search in (Rakthanmanon et al., 2012). We then locally expand or contract the sequences to adjust for temporal variations through function , which aligns each to a template of payday events, since paydays mark the start of cyclic spending patterns. Payday estimation is described in Section 4.2. Denoting the aligned sequence as , the first subsequence with length is used to match , and the second with length is used for prediction. Note that we let the subscripts of match those of for ease of notation, which means that is not necessarily an observation at time but it is just matched to .

We outline SubseqLS algorithm for one-day ahead prediction below. All sequences mentioned are standardized.

  1. Set query vector consisting user U, account A’s daily balance from to .

  2. Find sequences of length most similar to in DTW distance in the first values.

  3. Create a template of indicators marking user U’s paycheck deposits into account A between and , with magnitude being the paycheck values. Set to be the function that aligns any sequence to this template by DTW.

  4. Align each such that .

  5. Estimate and nonnegative coefficients which minimize the following objective

    where , and obtain and .

  6. Estimate .

(a) Original sequences
(b) Aligned sequences
Figure 4. Effect of aligning to template of payday landmarks: black solid line is query , colored dotted lines are first values of matched sequences.

Aligning in Step 4 is an essential adjustment for temporal variations of matched sequences. Figure 4 shows how this preserves the exact cyclic pattern of .

We set in Step 2 so that similarity matching is done on a sufficiently long sequence that captures at least one semimonthly pay period. We let in the objective in Step 5 to eliminate sequences in which do not consistently match . Matrix also penalizes based on anomalous predictions of each . For the weights , we use 1 if , 5 if and 10 if . This choice emphasizes more accurate estimation of the tail of since that is the closest to the start of prediction, and is found to work well empirically. SubsegLS forecasts the entire future period by iteratively predicting for one day at a time.

4.2. Extraction of recurring transactions and unexpected large expenses

We propose a procedure to automatically extract recurring transactions in each account. The identified recurring transactions are used to improve prediction accuracy as in Section 4.1, and are also directly displayed to remind users of upcoming bills they need to pay to avoid penalty charges.

Recurring transactions include bills and other periodic behavior such as semimonthly salary, monthly recurring transfer between accounts, and weekly grocery shopping. Recurring transactions are split into monthly, semimonthly, biweekly and weekly frequency. For a spending category C, for each transaction and frequency in question, we look for transactions with similar descriptions in the past few dates satisfying the frequency constraint. We illustrate the procedure for extracting monthly charges in Figure 5.

Figure 5. Procedure to extract monthly charges.

In this example, we start with a window of 7 days, obtain all transactions within, and call this . Then, we backtrack by 31 days and retrieve all the transactions 31 days before (in a window of 7 days) that have descriptions similar to those in . We repeat this procedure multiple times to ensure that the remaining transactions indeed have the desired frequency. In our application, we repeat this procedure till we obtain 4 windows of transactions. The transactions remaining in the last window are identified as recurring charges.

For monthly and semimonthly charges, we use a 7-day window to accommodate differing lengths of months, and use a 2-day window for weekly and biweekly charges to accommodate small shifts in spending due to holidays, etc. To compare transaction descriptions, we use the difflib module (dif, 2017) in Python which iteratively finds the longest contiguous matching subsequences excluding junk elements, with a similarity threshold ratio of to accommodate insignificant differences such as dates and reference numbers. Additional bills may be found through a biller’s list. In practice, we use both our method and the biller’s list to complement each other.

To predict the next occurrence of a monthly transaction, we estimate the date as the last observed date of the transaction plus one month, and the amount as the average of historical transactions. We do similarly for semimonthly, biweekly and weekly transactions.

We further use the recurring transactions identified to determine unexpected or anomalous large expenses. Such expenses can be displayed to each user to prepare them for otherwise unforeseen spending. We do the following steps on each user’s transactions:

  1. Remove all recurring transactions;

  2. Retain top 10% of remaining transactions;

  3. Remove transactions with similar descriptions.

The extracted expenses are pooled across all users, and the list of expenses displayed to a user can be personalized depending on their characteristics (e.g., car owner, a person with children, etc.).

5. Empirical Evaluation

5.1. Data Description

We use data collected through WageGoal for evaluation. Since WageGoal is a relatively new app, the dataset is limited in terms of the number of users and the length of usage. We include an additional financial dataset from the PKDD’99 Discovery Challenge for the prediction task to show the performance of our proposed methods on both small and large datasets. We note that the PKDD’99 Dataset is not specific to low-wage workers.

5.1.1. WageGoal Dataset

This dataset is collected from 19 individuals, with approximately one year’s worth of financial transactions from June 21, 2016, to June 16, 2017, in checking, savings, and credit card accounts. There are a total of 52 accounts of which 16 are checking, 19 are savings, and 17 are other accounts including credit cards. Each line item in the data includes date, description, amount, category and final account balance as described in Section 2. The one year’s worth of data is split into a training period of nine months and a testing period of three months. In this dataset, all users have semimonthly income.

5.1.2. PKDD’99 Dataset

This is a publicly available dataset containing real anonymized bank transactions dating from January 1, 1993 to December 31, 1998 (Berka and Sochorova, 2018). We want to test our methods in the scenario where we have access to long historical data, and hence retains the 2263 accounts in this dataset which have at least four years of data. Accounts have an average of 52.367 transactions per year, with the minimum and maximum at 52.310 and 52.479 respectively. Due to the sparsity of transactions, we consider weekly instead of daily balances. The six years’ worth of data is split into a training period of four-and-a-half years and a testing period of one-and-a-half years. Each line item in the data includes date and amount. No information on description and category of transaction are provided, and hence this dataset is not used to evaluate any other task besides prediction.

5.2. Prediction of account balances

From the test set, 25 length-31 sequences are randomly chosen for prediction. We compute two different error measures for evaluation:

  • MAE (Mean Absolute Error), the average absolute difference between the actual and the predicted account balance;

  • Average difference in dollar amounts between the true and predicted balances when the true balance becomes negative.

The point in time at which balances become negative is of special interest because accounts will be charged penalty fees from that point onwards. We use only the first error measure on the PKDD’99 dataset. The second error measure is not applicable because true account balances in the PKDD’99 dataset are unknown and we arbitrarily set all account balances to start at 0. To calculate the error measures, we scale all accounts to have variance equal to 100 so that they contribute approximately equally.

We compared the performance of the individual methods HistAvg and SubseqLS, the hybrid HistAvg-SubseqLS, as well as Prophet, ARMA, NearestNeighbor and KNN averaging. Prophet is a state-of-the-art forecasting method from Facebook (Taylor and Letham, 2017) that uses a regression model to fit a linear trend, and incorporates weekly seasonality and holiday effects by marking them through indicator variables. We used paydays in lieu of holiday effects. ARMA is a well-established model for stationary stochastic processes (Shumway and Stoffer, 2006), and we implemented ARMA with parameters found through the statsmodels module using default arguments (sta, 2017). K-Nearest Neighbor is a popular nonparametric approach that is flexible and applied widely in diverse domains such as traffic flow and energy (Alvarez et al., 2011; Zhang et al., 2013). In NearestNeighbor, only the top matched sequence to the query is used by directly taking the value immediately after the match as the one-day-ahead prediction, and in KNN averaging, the top 10 matched sequences are used and the one-day-ahead prediction is the average of the values immediately after all 10 matches.

5.2.1. Prediction with WageGoal Dataset

The WageGoal Dataset is split into two types of accounts, those with paycheck income transactions (20 accounts), and those without (32 accounts). The former demonstrates more pronounced cyclic pattern as in Figure 7 and 7. Hence, the two types of accounts need different treatments.

(a) Example balance sequences
(b) Balance standard deviations
(a) Example balance sequences
(b) Balance standard deviations
Figure 6. WageGoal Dataset: Paycheck accounts
Figure 7. WageGoal Dataset: Non-paycheck accounts
Figure 6. WageGoal Dataset: Paycheck accounts

The training set is used to optimize the number of matches and penalty parameter in SubseqLS by cross-validation and also to determine the switching parameter for HistAvg-SubseqLS. Search range for is in multiples of 5 between 5 and 25, values are between 0 and 10, and values are integers between 0 and . For paycheck accounts, and . For non-paycheck accounts, and , meaning HistAvg-SubseqLS reduces to HistAvg. Parameter is determined individually for each account and hence not reported here.

Table 2 shows the test results. HistAvg-SubseqLS almost always performs the best, and is otherwise a close second. Figure 8 plots the average absolute difference between the actual and predicted account balance across time.

Balances in paycheck accounts tend to have pronounced semimonthly patterns starting with a sharp increase at payday followed by a decrease to approximately pre-payday levels. These cyclic patterns sometimes perpetuate through historical data and are shared across users, which make sequence-matching approaches suitable. SubseqLS benefited from regularization in this small dataset because not all top matches were close matches. KNN, in contrast, does not make this distinction and had average prediction error at least 50 times higher than all other methods. As seen from Figure 7(a), SubseqLS maintained almost consistent error across the entire 31-day prediction period, while other methods had higher errors predicting further ahead. Due to the regression formulation, SubseqLS focuses on modeling the overall trend of the query vector instead of next-day prediction. Weights shift some of that focus to short-term predictions, but it is difficult to tune exactly. As in Figure 7(a), HistAvg had the best short-term predictions since its next-day prediction is designed to be close to the current day observation unless it has knowledge of an impending recurring charge. The switching parameter let HistAvg-SubseqLS use HistAvg for the first 3 days before switching to SubseqLS, thereby attaining the lowest errors in both short- and long-term.

Non-paycheck accounts are mostly used for savings and occasional purchases. Common transactions include spare change saved through banks’ “keep the change” programs. The lack of prominent structures in the balance sequences resulted in poor matches found for sequence-matching approaches such as NearestNeighbor, KNN and SubseqLS. As in Figure 7(b), HistAvg performed the best by making conservative predictions using a basic daily spending and recurring transactions. The switching parameter correctly picked to use HistAvg throughout the prediction period, such that HistAvg-SubseqLS shared the same good performance as HistAvg.

Methods MAE Error in amount
HistAvg 11.417 7.460
SubseqLS 7.005 5.774
HistAvg-SubseqLS 6.790 5.099
Prophet 9.534 7.508
ARMA 7.941 6.983
NearestNeighbor 19.126 11.851
KNN 1105.719 88.193
(a) Paycheck accounts
Methods MAE Error in amount
HistAvg 6.876 6.863
SubseqLS 18.474 7.226
HistAvg-SubseqLS 6.876 6.863
Prophet 8.794 11.944
ARMA 6.565 6.981
NearestNeighbor 15.439 6.832
KNN 7212.774 370.220
(b) Non-paycheck accounts
Table 2. WageGoal Dataset: Error measures for account balance prediction averaged across 25 one-month test samples.
(a) Paycheck accounts
(b) Non-paycheck accounts
Figure 8. WageGoal Dataset: Average account balance prediction error over the prediction period across 25 test samples. KNN is excluded due to large magniture of error.

5.2.2. Prediction with PKDD’99 Dataset

Balances exhibit cyclic patterns as in Figure 9, just like those in the WageGoal paycheck accounts. We apply the same SubseqLS parameter , and determine by cross-validation. Since transaction descriptions and categories are not available, we cannot extract any recurring transaction for HistAvg, SubseqLS, and Prophet. We also do not implement the hybrid HistAvg-SubseqLS, since HistAvg, being reliant on good estimates of recurring transactions, is heavily handicapped in this setup and is not expected to benefit the hybrid approach.

(a) Example balance sequences
(b) Balance standard deviations
Figure 9. PKDD’99 Dataset: Properties of accounts

In each iteration of the experiment, we randomly select 52 out of the total 2263 accounts to perform prediction. We present test results across eight iterations in Table 3. Figure 10 plots the average absolute difference between the actual and predicted account balance across time. Solid lines are the mean across all iterations, and the shaded regions are the to percentile.

Results mirror those in Section 5.2.1 to conclude that SubseqLS performs the best in long-term predictions. The quality of SubseqLS predictions is consistent across the different scenarios and dataset sizes as presented by the two datasets. Furthermore, in this larger dataset, SubseqLS achieved low errors in short-term predictions as well. We note that larger dataset sizes benefit sequence-matching methods in general since more close matches can be found across time and users. For instance, KNN’s performance for the PKDD’99 Dataset is also much higher than that for the WageGoal Dataset.

Methods Mean of MAE Standard deviation of MAE
HistAvg 12.245 0.578
SubseqLS 7.017 0.277
Prophet 7.794 0.311
ARMA 7.967 0.455
NearestNeighbor 9.243 0.674
KNN 8.012 0.751
Table 3. PKDD’99 Dataset: Error measures for account balance prediction averaged across 25 length-31 test samples.
Figure 10. PKDD’99 Dataset: Average account balance prediction error over the prediction period across 25 test samples.

5.3. Extraction of recurring transactions and unexpected large expenses

From the test set of the WageGoal Dataset, 25 dates are randomly picked. For each date, we extract recurring transactions prior to the date, and predict the dates for their next occurrence. A transaction is correctly extracted if the prediction is within 5 days of the true date. We evaluate the quality of the extraction procedure by:

  • Average number of true recurring transactions extracted per user;

  • Precision, i.e., proportion of recurring transactions extracted that is true;

  • Average error in days for the predicted dates at which recurring transactions next occur.

We compare the performance of our proposed procedure with an extraction method utilizing transaction descriptions and category labels. The competing method flags a transaction as recurring if the description contains the word ‘recurring’ or if the category label contains the following keywords: ‘bill pay’, ‘payroll’, ‘service - insurance’ and ‘service - subscription’. These rules are manually formulated based on close inspection of the dataset.

As seen in Table 4, the proposed method identified a larger number of true recurring transactions and with higher precision. Despite extracting more than double the number of true recurring transactions, the proposed method was only half a day worse on average in predicting the next transaction date. We combine the recurring transactions found by both methods above and use them to obtain a list of unexpected large expenses. Some examples of their approximate costs are shown in Table 5. We provided only a single value for each cost, but given observations of the expense from more users, a range or empirical distribution would be appropriate.

Method # extracted Precision
Error in
Proposed 4.633 0.647 1.465
Using labels 2.161 0.311 1.043
Table 4. WageGoal Dataset: Performance in extraction of recurring transactions across 25 test samples.
Description Cost ($)
House cleaning service 350
Car repair 500
Student loan 5000
Roofing 8000
Table 5. WageGoal Dataset: Examples of unexpected large expenses.

6. Significance and Impact

Our system will upgrade and replace existing models in WageGoal, and therefore have a direct and near-immediate impact on the mostly low-income individuals currently connected to the product. The enhancements will help users manage their volatile cash flow, capitalize on opportunities for debt reduction and savings, obtain greater overall financial health and stay out of poverty. Strategic partnerships will help Neighborhood Trust further penetrate relevant markets in the coming years, eventually reaching many tens of thousands of clients nation-wide through its technology platforms.

Robust tracking systems are in place to measure the expected outcomes. Results will be shared as appropriate via Neighborhood Trust’s network of financial empowerment providers and other interested stakeholders, thereby increasing the potential visibility and scale of this promising approach.

7. Conclusion

We proposed a system of data mining techniques to predict and analyze spending behaviors in a small data scenario. Future work includes improving the predictive models by incorporating additional individual- and group-level information, providing early and enhanced visibility for users into their financial health, and automatically generating personalized recommendations for improving financial stability. Depending on the feedback from the deployment, we will also consider other improvements as necessary.


  • (1)
  • sta (2017) accessed Jan 28, 2017. StatsModels: Statistics in Python. (accessed Jan 28, 2017).
  • dif (2017) accessed Oct 9, 2017. difflib: Helpers for computing deltas. (accessed Oct 9, 2017).
  • Alvarez et al. (2011) F. Martinez Alvarez, A. Troncoso, J. C. Riquelme, and J. S. Aguilar Ruiz. 2011. Energy Time Series Forecasting Based on Pattern Sequence Similarity. IEEE TKDE 23, 8 (Aug 2011), 1230–1243.
  • Barr (2004) Michael S. Barr. 2004. Banking the Poor: Policies to Bring Low-Income Americans Into the Financial Mainstream. In Research Brief. The Brookings Institution.
  • Berka and Sochorova (2018) Petr Berka and Marta Sochorova. 1999 (accessed Jan 28, 2018). PKDD’99 Discovery Challenge - Guide to the Financial Data Set.
  • Birant (2011) Derya Birant. 2011. Data Mining Using RFM Analysis. In Knowledge-Oriented Applications in Data Mining, Kimito Funatsu (Ed.). InTech, Chapter 6, 91–208.
  • Brown et al. (2015) Sarah Brown, Pulak Ghosh, Li Su, and Karl Taylor. 2015. Modelling household finances: A Bayesian approach to a multivariate two-part model. Journal of Empirical Finance 33, C (2015), 190–207.
  • Chang and Tsai (2011) Hui-Chu Chang and Hsiao-Ping Tsai. 2011. Group RFM analysis as a novel framework to discover better customer consumption behavior. Expert Systems with Applications 38, 12 (2011), 14499 – 14513.
  • Elfeky et al. (2005) Mohamed G. Elfeky, Walid G. Aref, and Ahmed K. Elmagarmid. 2005. Periodicity Detection in Time Series Databases. IEEE TKDE 17, 7 (July 2005), 875–887.
  • Fusaro and Shaefer (2016) Vincent Fusaro and H. Shaefer. 2016. How should we define “low-wage” work? An analysis using the Current Population Survey. Monthly Labor Review, U.S. Bureau of Labor Statistics (10 2016).
  • Glynn et al. (2006) Earl F. Glynn, Jie Chen, and Arcady R. Mushegian. 2006. Detecting Periodic Patterns in Unevenly Spaced Gene Expression Time Series Using Lomb–Scargle Periodograms. Bioinformatics 22, 3 (Feb. 2006), 310–316.
  • Gonzaga Baca Ruiz et al. (2016) Luis Gonzaga Baca Ruiz, Manuel Cuéllar, Miguel Calvo-Flores, and Maria del Carmen Pegalajar Jiménez. 2016. An Application of Non-Linear Autoregressive Neural Networks to Predict Energy Consumption in Public Buildings. Energies 9 (08 2016), 684.
  • Insider (2017b) Business Insider. 2016 (accessed Oct 4, 2017)b. Here’s what the typical one-bedroom apartment costs in 50 US cities.
  • Insider (2017a) Business Insider. 2017 (accessed Oct 4, 2017)a. The 5 best apps to help you manage your money.
  • Khajvand et al. (2011) Mahboubeh Khajvand, Kiyana Zolfaghar, Sarah Ashoori, and Somayeh Alizadeh. 2011. Estimating customer lifetime value based on RFM analysis of customer purchase behavior: Case study. Procedia Computer Science 3 (2011), 57 – 63. World Conference on Information Technology.
  • L. Scott and R. Varian (2014) Steven L. Scott and Hal R. Varian. 2014. Predicting the Present with Bayesian Structural Time Series. International Journal of Mathematical Modelling and Numerical Optimisation 5 (01 2014), 4 – 23.
  • Min and Agresti (2002) Yongyi Min and Alan Agresti. 2002. Modeling Nonnegative Data with Clumping at Zero: A Survey. Journal of the Iranian Statistical Society 1 (2002).
  • Mullahy (1998) John Mullahy. 1998. Much ado about two: reconsidering retransformation and the two-part model in health econometrics. Journal of Health Economics 17, 3 (1998).
  • Mullainathan and Shafir (2010) Sendhil Mullainathan and Eldar Shafir. 2010. Savings Policy and Decisionmaking in Low-Income Households. In National Poverty Center Policy Briefs. University of Michigan, Chapter 24.
  • Neelon et al. (2011) Brian Neelon, A. James O’Malley, and Sharon-Lise T. Normand. 2011. A Bayesian Two-Part Latent Class Model for Longitudinal Medical Expenditure Data: Assessing the Impact of Mental Health and Substance Abuse Parity. Biometrics 67, 1 (2011), 280–289.
  • Neighborhood Trust Financial Partners (2017) Neighborhood Trust Financial Partners. 2016 (accessed Sep 16, 2017). Neighborhood Trust Financial Partners And FlexWage Solutions Announce Partnership to Develop WageGoal.
  • Pew Charitable Trusts (2016) Pew Charitable Trusts. 2016. Consumers Need Protection From Excessive Overdraft Costs. A brief from The Pew Charitable Trusts (12 2016).
  • Plaid (2018) Plaid. 2018 (accessed Feb 7, 2018).
  • Rakthanmanon et al. (2012) Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen, Gustavo Batista, Brandon Westover, Qiang Zhu, Jesin Zakaria, and Eamonn Keogh. 2012. Searching and Mining Trillions of Time Series Subsequences Under Dynamic Time Warping. In 18th ACM SIGKDD. 262–270.
  • Shumway and Stoffer (2006) Robert Shumway and David Stoffer. 2006. Time Series Analysis and Its Applications - With R Examples (2 ed.). Springer, New York.
  • Taylor and Letham (2017) Sean J Taylor and Benjamin Letham. 2017. Forecasting at Scale. PeerJ Preprints (2017).
  • Williams and A. Hoel (2003) Billy Williams and Lester A. Hoel. 2003. Modeling and Forecasting Vehicular Traffic Flow as a Seasonal ARIMA Process: Theoretical Basis and Empirical Results. Journal of Transportation Engineering 129 (11 2003), 664–672.
  • Zhang et al. (2013) Lun Zhang, Qiuchen Liu, Wenchen Yang, Nai Wei, and Decun Dong. 2013. An Improved K-nearest Neighbor Model for Short-term Traffic Flow Prediction. Procedia - Social and Behavioral Sciences 96, Supplement C (2013), 653 – 662.
  • Zhang (2003) Peter Zhang. 2003. Time Series Forecasting Using a Hybrid ARIMA and Neural Network Model. Neurocomputing. Neurocomputing 50 (01 2003), 159–175.
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
Add comment
Loading ...
This is a comment super asjknd jkasnjk adsnkj
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test description