Proximity-Based Active Learning on Streaming Data: A Personalized Eating Moment Recognition



Detecting when eating occurs is an essential step toward automatic dietary monitoring, medication adherence assessment, and diet-related health interventions. Wearable technologies play a central role in designing unobtrusive diet monitoring solutions by leveraging machine learning algorithms that work on time-series sensor data to detect eating moments. While much research has been done on developing activity recognition and eating moment detection algorithms, the performance of the detection algorithms drops substantially when a model trained with one user is used by a new user. To facilitate the development of personalized models, we propose PALS, Proximity-based Active Learning on Streaming data, a novel proximity-based model for recognizing eating gestures with the goal of significantly decreasing the need for labeled data with new users. In particular, we formulate an optimization problem to perform active learning under a limited query budget by leveraging unlabeled data. Our extensive analysis on data collected in both controlled and uncontrolled settings indicates that the F-score of PALS ranges from 22% to 39% for a budget that varies from 10 to 60 queries. Furthermore, compared to the state-of-the-art approaches, off-line PALS, on average, achieves up to 40% higher recall and 12% higher f-score in detecting eating gestures.

Machine learning, mobile health, eating detection, active learning, optimization, wearable computing.

I Introduction

Eating habits are highly correlated with human health and wellbeing [9]. It is not only what people eat that contributes to their health but also when and how often the eating events occur [15]. An automatic health monitoring system can help with monitoring eating habits. These systems can also accommodate users with special health conditions such as diabetes [10] and those who need to take their medication at certain times during the day, such as after or between meals, or assist users who need to follow a special dietary plan [3]. Detecting when eating happens is a key challenge in automatic health monitoring.

Most current approaches for eating moment recognition require multiple on-body sensors or specialized devices [1, 17, 2], which make these solutions impractical for everyday living scenarios. The aim of this research is to design a machine learning model that uses easy-to-wear and prevalent devices such as smartwatches for eating moment detection.

However, different people perform the same activity differently. As a result, a model trained on data collected from one or a few subjects will not provide the desired accuracy when used with new subjects. A major challenge in customizing machine learning algorithms is that retraining the model requires large amounts of labeled training data. Collecting enough labeled data is a time-consuming, labor-intensive, and expensive process. The problem becomes even more challenging because users perform activities differently in real-life scenarios than in in-lab settings. A potential approach to collecting ground truth labels in real-life scenarios is to continuously record the user’s activities using body-worn cameras. However, deploying cameras in uncontrolled settings raises serious privacy concerns. Therefore, it is critical to develop strategies that allow for collecting ground truth labels outside laboratory settings.

Active learning is potentially a feasible approach to query sensor data for ground truth labels in end-user settings. Such an approach allows us to query a small subset of sensor data based on an informativeness measurement [18] and yet achieve an acceptable accuracy level. However, in mobile health and streaming data situations, the sensors are sampled in real-time and a decision about querying or skipping a data segment needs to be made instantaneously. This is an area of research that has remained unexplored by the community. To address the problem of active learning with streaming sensor data, we propose PALS as a proximity-based active learning approach for eating moment recognition. To the best of our knowledge, PALS is the first attempt to develop a practical approach for eating moment detection using an active learning framework for human-in-the-loop learning on streaming sensor data.

II Related Work

Our work in this article spans two areas of research including (1) diet monitoring; and (2) active learning. In this section, we discuss the state-of-the-art research in each area.

II-A Diet Monitoring

The pervasive nature of new technologies such as smartwatches and light-weight wearable devices with embedded inertial sensors (e.g., accelerometer and gyroscope) has resulted in the development of eating moment detection algorithms. One example of such research is the work by Thomaz et al., in which the authors introduced an eating episode detection approach using accelerometer data from an off-the-shelf smartwatch [20]. In another study, Tauhidur et al. presented BodyBeat, a custom-built microphone designed to detect non-speech body sounds such as food intake by capturing skin vibration [17]. Bedri et al. explored the use of inertial, optical, and acoustic sensing modalities for eating moment detection [1]. Furthermore, Dong et al. utilized a watch-like configuration of sensors to track hand motion [8]. Proposing the idea that eating episodes tend to be preceded and succeeded by vigorous hand movements, the authors used signal energy for classification of the activities [8]. Yatani et al. presented BodyScope, a wearable acoustic-sensor-based system that uses neck-worn sensor data for diet monitoring [22]. Cheng et al. also explored the use of a neckband to recognize different eating activities [6].

II-B Active Learning

A challenging task in diet monitoring is to collect sufficient amounts of labeled data in uncontrolled environments for algorithm training. One approach to collecting labeled data is to use active learning techniques that query the user to label sensor data in real-time. In general, active learning has shown promising results in achieving a higher accuracy level using fewer labeled instances [18]. Active learning has been studied in two major scenarios: pool-based and stream-based [18]. In the pool-based scenario [12], a large pool of unlabeled examples is given and an oracle can provide the true label for instances in this pool. A major challenge in stream-based active learning is that the learner does not have access to future instances. The learner therefore needs to decide about the informativeness of the instances in real-time and in the absence of forthcoming data. Consequently, in the stream-based scenario [7], upon receiving a new instance, the learner decides whether to query for the true label and update the classifier or to ignore the current instance.

While a fixed uncertainty sampling method has been used in the past to label instances within a batch of data from the data stream [25], Žliobaitė et al. designed a dynamic allocation strategy of labeling with a randomized search space without considering batches [27]. In addition to utilizing an evolving model [19], ensemble classifiers could be used to decide about the informativeness of instances [25, 21, 26] by training a number of classifiers on different portions of data stream. While many of these approaches have been proposed to address a concept drift in highly dynamic environments such as Twitter, our approach considers the personalization of the model for its current user in real-time running on a resource limited device such as a smartphone or smartwatch.

Nonetheless, the utility of active learning in diet monitoring with wearable sensors in general, and in scenarios with streaming data in particular, has not been investigated to date. We introduce a proximity-based active learning approach to improve the performance of the model with less labeled data while leveraging unlabeled data for model training. Inspired by graph-based semi-supervised learning research [24, 23], our approach utilizes unlabeled data to improve the quality of the model.

III Problem Statement

Let $\mathcal{X}$ denote a large set of collected sensor data. An observation made by a wearable sensor at time $t$ can be represented as a $d$-dimensional feature vector, $X_t = \{x_{t1}, x_{t2}, \dots, x_{td}\}$. Each feature is computed from a given time window, and $P(X)$ is a marginal probability distribution over all possible feature values. The activity recognition task is composed of a label space $\mathcal{Y} = \{y_1, y_2, \dots, y_m\}$ consisting of the set of labels for activities of interest, and a conditional probability distribution $P(\mathcal{Y} \mid \mathcal{X})$, which is the probability of assigning a label $y \in \mathcal{Y}$ given an observed instance $X_t$. Subsequently, the final predicted label $\hat{y}_t$ for observation $X_t$ is defined as

$$\hat{y}_t = \arg\max_{y \in \mathcal{Y}} P(y \mid X_t) \tag{1}$$


Given the growing ubiquity of Internet-of-Things (IoT) sensors, collecting a large pool of unlabeled sensor data is attainable. However, labeling such a huge amount of data using human supervision is time-consuming, burdensome, and expensive. Therefore, it is important to devise an efficient approach for selecting informative instances, taking into account the constraint of a limited budget to query an expert for ground truth labels. Furthermore, because the sensors are sampled continuously as the user performs various daily activities, the active learning algorithm needs to select sensor data for query in real-time. The reason for such a constraint is that expecting the user/expert to provide true labels for activities that occurred in the past is subject to human memory and bias errors. Therefore, it is desirable to decide if a query needs to be issued for the currently occurring activity. In this section, we formally define active learning as an optimization problem.

III-A Limited Budget Training

To approach the problem of active learning given both budget and real-time decision making constraints, we first relax the second constraint by assuming that a human expert can label a pool of sensor data collected in the past by either remembering the activities or watching a video recording of the activities. This allows us to develop a basic pool-based active learning algorithm that selects most informative instances from a large pool of the collected sensor data. In the next step, we show how the pool-based algorithm can be modified for realizing real-time active learning scenarios where a decision about querying the expert is made instantaneously. In the following, we formulate each of the problems and present our solution to solve those problems. Problem 1 formally defines the limited budget active learning problem.

Problem 1 (Limited Budget Training (LBT))

Assume an active learning algorithm splits the instances in $\mathcal{X}$ into two disjoint subsets $L$ and $U$, where the instances in $L$ are used to query the oracle to obtain their true labels and those in $U$ remain unlabeled. The Limited Budget Training (LBT) problem is to efficiently construct the small subset $L$ and train a classifier $f$ such that the error of classifying instances in $U$ is minimized and the size of $L$ is bounded by a given query budget of $B$.

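The LBT problem described in Problem 1 can be formulated as follows.

$$\min \sum_{X_i \in U} \mathbb{1}\left[f(X_i) \neq y_i\right] \tag{2}$$

subject to

$$|L| \leq B \tag{3}$$

$$L \cup U = \mathcal{X} \tag{4}$$

$$L \cap U = \emptyset \tag{5}$$

Here $f(X_i)$ is the label predicted by the classifier and $y_i$ is the true label of $X_i$.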


The objective function in (2) aims to minimize the amount of misclassification error given the budget constraint in (3). The constraints in (4) and (5) are based on the definition where $L$ and $U$ are considered a perfect partitioning of set $\mathcal{X}$.

As described in Problem 1, due to limited budget constraint, designing an efficient method to cherry pick instances to feed the training process is essential. Here, Definition 1 formally defines the instance selector function.

Definition 1 (Instance Selector)

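An instance selector is a function $s: \mathcal{X} \rightarrow \{0, 1\}$ such that

$$s(X_i) = \begin{cases} 1 & \text{if } X_i \in L \\ 0 & \text{otherwise} \end{cases} \tag{6}$$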


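where $L$ refers to the instances selected for query. Considering that the active learning algorithm uses the instance selector $s$, Problem 1 can be reformulated as an Integer Linear Programming problem as follows.

$$\min \sum_{X_i \in \mathcal{X}} \left(1 - s(X_i)\right) \, \mathbb{1}\left[f(X_i) \neq y_i\right] \tag{7}$$

subject to

$$\sum_{X_i \in \mathcal{X}} s(X_i) \leq B \tag{8}$$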


The objective function in (7) aims to minimize the amount of misclassification error on unknown instances while (8) states the budget constraint.

A major limitation of the LBT problem described above is that it assumes perfect memory retention for the oracle. That is, the oracle is able to remember past events reliably. In reality, however, mobile health technologies monitor end users continuously and the user may not remember past events. Therefore, it is more realistic to design an active learning approach for streaming sensor data. In the following, we reformulate Problem 1 taking into account that the oracle provides labels only for the current activity. Problem 2 formally defines the problem of training with a limited budget on a stream of data.

Problem 2 (Limited Budget Training on Data Stream (LBTS))

Let $S = [X_1, X_2, \dots, X_t, \dots, X_n]$ be a sequence of sensor instances that are being produced during time frame $T = \{1, \dots, n\}$. An active learning algorithm on stream $S$ splits the instances in $S$ into two disjoint subsequences $S_L$ and $S_U$, where the instances in $S_L$ are used to query the oracle to obtain their true labels and update the model as they become available in real-time, while those in $S_U$ remain unlabeled. The Limited Budget Training on Stream (LBTS) problem is to efficiently decide whether to query the true label for the instance $X_t$ at time $t$ and update the classifier as it becomes available in real-time such that the error of classifying instances in $S_U$ is minimized.

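Using the Linear Programming framework in (7), the problem of limited budget training on a data stream can be formulated as follows.

$$\min \sum_{t=1}^{n} \left(1 - s(X_t)\right) \, \mathbb{1}\left[f_t(X_t) \neq y_t\right] \tag{9}$$

subject to

$$\sum_{t=1}^{n} s(X_t) \leq B \tag{10}$$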


where $f_t$ is the classification function at time $t$. The objective function in (9) aims to minimize the amount of misclassification error given the budget constraint in (10).

IV PALS Framework Design

The PALS framework focuses on two characteristics of everyday living situations: (1) the ubiquity of data and the ability to obtain huge amounts of unlabeled data with mobile devices and wearable sensors; and (2) the realistic assumption that the user/expert has a limited capability or interest in providing ground truth labels for the massive amounts of data that are being collected in continuous health monitoring applications. Therefore, the general goal of the PALS framework is to leverage the unlabeled data to construct an efficient model while choosing a small subset of instances of the unlabeled data to query the user/expert for a label/annotation. In the following, we describe our approach for leveraging unlabeled data through a proximity graph model and selecting informative data instances in preparation to query the expert.

IV-A Proximity-Based Modeling

Inspired by graph-based semi-supervised learning research, we propose to construct a proximity-based model to quantify similarity among data instances. The intuition behind proximity-based modeling and label inference is the smoothness assumption, which suggests that instances that are close in the feature space should have similar labels [24]. The process of constructing a proximity-based model includes two phases. The first phase aims to build a proximity graph using both labeled and unlabeled data. Leveraging unlabeled data could potentially improve the model. As suggested by prior research [24], in the absence of sufficient labeled data, using both labeled and unlabeled data can lead to a more accurate decision boundary for the learned model. The second phase is label inference, which focuses on generating labels for unlabeled instances through an iterative label propagation method.
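The label inference phase can be illustrated with a minimal sketch of iterative label propagation over a precomputed weight matrix. The function name `propagate_labels`, the dense list-of-lists weight matrix, the clamping of labeled nodes, and the fixed iteration count are assumptions made for illustration; the text does not specify the exact propagation rule.

```python
def propagate_labels(weights, labels, n_classes, n_iter=50):
    """Iteratively spread label distributions over a proximity graph.

    weights: symmetric n x n list of non-negative edge weights (assumed input).
    labels:  list holding a class index for labeled nodes and None otherwise.
    Returns a list of per-node probability distributions over the classes.
    """
    n = len(weights)
    # Labeled nodes start as one-hot vectors; unlabeled nodes start uniform.
    dist = [
        [1.0 if c == labels[i] else 0.0 for c in range(n_classes)]
        if labels[i] is not None
        else [1.0 / n_classes] * n_classes
        for i in range(n)
    ]
    for _ in range(n_iter):
        new_dist = []
        for i in range(n):
            total = sum(weights[i])
            if labels[i] is not None or total == 0:
                # Clamp known labels; isolated nodes keep their prior.
                new_dist.append(dist[i])
                continue
            # Weighted average of the neighbors' current distributions.
            new_dist.append([
                sum(weights[i][j] * dist[j][c] for j in range(n)) / total
                for c in range(n_classes)
            ])
        dist = new_dist
    return dist


# Chain graph 0 - 1 - 2 - 3: node 0 is labeled class 0, node 3 class 1.
W = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
result = propagate_labels(W, [0, None, None, 1], n_classes=2)
```

In this toy chain, the unlabeled node next to the class-0 node leans toward class 0 and the one next to the class-1 node leans toward class 1, which is the behavior the smoothness assumption predicts.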

Definition 2 (Proximity Graph)

A proximity graph is a weighted graph $G = (V, E)$ where each node in $V$ represents an instance in $\mathcal{X}$. Each node in the graph maintains a vector of its own feature values and the probability distribution of its labels. An edge $e_{ij} \in E$ represents the amount of similarity between instances $X_i$ and $X_j$.

We denote the similarity between $X_i$ and $X_j$ by $w_{ij}$ and compute its value from their Euclidean distance:

$$\|X_i - X_j\|_2 = \sqrt{\sum_{l=1}^{d} \left(x_{il} - x_{jl}\right)^2}$$


To avoid confusion caused by far-away instances, we build the similarity graph using a $k$-NN schema, which is one of the most popular approaches in similarity graph construction [13]. Therefore, we measure edge weights in the similarity graph using the following equation:

$$e_{ij} = \begin{cases} w_{ij} & \text{if } X_j \in N_k(X_i) \\ 0 & \text{otherwise} \end{cases}$$


where $N_k(X_i)$ is the set of $k$-nearest-neighbors of instance $X_i$ based on the defined similarity function.
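A small sketch of the $k$-NN graph construction follows. The helper names, the `exp(-distance)` mapping from Euclidean distance to similarity, and the symmetrization step are assumptions chosen for illustration; the exact similarity function used by the authors is not recoverable from the text.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_graph(points, k, similarity=lambda d: math.exp(-d)):
    """Return an n x n weight matrix that keeps only k-nearest-neighbor edges.

    The default exp(-distance) similarity is an assumption of this sketch.
    """
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Rank every other node by distance and keep the k closest ones.
        others = sorted(
            (euclidean(points[i], points[j]), j) for j in range(n) if j != i
        )
        for dist, j in others[:k]:
            w = similarity(dist)
            weights[i][j] = w
            weights[j][i] = w  # symmetrize so the graph is undirected
    return weights


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
W = knn_graph(points, k=1)
```

With `k=1`, the two tight clusters are connected internally while the edge between distant points stays at zero, so far-away instances cannot influence each other during label inference.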

In practice, we will show that using the $k$-NN schema improves the performance of the trained model in detecting eating moments.

IV-B Instance Selector

To maximize the labeling accuracy while taking into account the constraint in (3), we need an effective instance selector function to select the most informative instances from the unlabeled pool $U$ to add to the training data used to learn a final model. To quantify the informativeness of the instances, in this article, we use an entropy-based method, which generates a score for a given instance based on the Information Gain ($IG$) from that instance. Recall that entropy indicates the certainty of the model in classifying an instance. An entropy of zero means pure certainty, with one of the classes receiving a probability of one. Therefore, low values of entropy suggest that the model is confident about how to classify the input instance. The instance selector sorts the instances by their information gain and selects the instance with the highest information gain to add to the labeled pool $L$.
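The entropy-based ranking can be sketched as follows. Using the Shannon entropy of the predicted class distribution directly as the informativeness score is an assumption of this sketch, and the names `entropy` and `select_most_informative` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_most_informative(distributions, candidates, n_select=1):
    """Return the candidate indices whose predictions are most uncertain."""
    ranked = sorted(
        candidates, key=lambda i: entropy(distributions[i]), reverse=True
    )
    return ranked[:n_select]


# Predicted (non-eating, eating) probabilities for three unlabeled instances.
dists = [[0.99, 0.01], [0.5, 0.5], [0.8, 0.2]]
picked = select_most_informative(dists, candidates=[0, 1, 2], n_select=2)
```

The instance with the 50/50 prediction is ranked first because the model is least certain about it, while the near-certain prediction is left in the unlabeled pool.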

IV-C Off-line PALS

As described previously in Section III, in the off-line version of PALS, we assume that a pool of unlabeled sensor instances is available to the oracle. The oracle is then able to label any of the instances in the unlabeled pool $U$ and assign the correct activity label upon request. In this off-line approach, we assume that the provided label is correct. This assumption is based on the fact that either the oracle’s memory is perfect, so that they can remember the past events, or there is a video recording of the activities that the oracle can navigate to find the correct label for a queried activity.

Fig. 1: Overall architecture of PALS for off-line active learning.

Fig 1 shows the overall architecture of our off-line proximity-based active learning approach. Initially, among all of the recorded activities, there is no or only a small set of labeled instances $L$ along with a large pool of unlabeled instances $U$. Our algorithm constructs a proximity-based graph on the entire dataset using both $L$ and $U$. Following the graph construction phase, the model aims to infer the actual label of the instances in $U$ in multiple iterations of the label propagation procedure. In the next step, the instance selector searches through the unlabeled instances to find the most informative instance in $U$, to date, to request a label for. The process concludes by adding the labeled instance to the model.

Input: labeled data $L$, unlabeled pool $U$, number of iterations, budget $B$
Output: proximity-based model
procedure Offline PALS
    construct the proximity-based model on $L \cup U$
    while the query budget $B$ is not exhausted do
        infer labels on $U$ using the model
        select the instances in $U$ with the highest $IG$
        obtain labels provided by the oracle for the selected instances
        move the newly labeled instances from $U$ to $L$
        update the model with the new instances in $L$
Algorithm 1: Off-line PALS

As illustrated in Fig 1, the process continues iteratively by obtaining new labeled instances and adding them to the labeled set $L$. The model is then updated and the processes of label inference and instance selection are repeated. The algorithm finishes when all the allowed queries are exhausted (i.e., the number of issued queries reaches the budget $B$). Algorithm 1 shows the off-line active learning approach in PALS.
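The pool-based loop can be sketched in a few lines. This is not the authors' implementation: the distance-weighted vote in `predict_proba` is a simple stand-in for the proximity model, instances are 1-D floats for brevity, and `oracle`, `batch_size`, and all function names are hypothetical.

```python
import math


def predict_proba(x, labeled, n_classes=2):
    """Stand-in proximity model: distance-weighted vote over labeled points."""
    scores = [1e-9] * n_classes  # small prior avoids division by zero
    for point, label in labeled:
        scores[label] += math.exp(-abs(x - point))
    total = sum(scores)
    return [s / total for s in scores]


def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)


def offline_active_learning(labeled, pool, oracle, budget, batch_size=1):
    """Query the most uncertain pool instances until the budget is spent."""
    labeled, pool = list(labeled), list(pool)
    queries = 0
    while queries < budget and pool:
        # Rank unlabeled instances by the entropy of the current prediction.
        pool.sort(key=lambda x: entropy(predict_proba(x, labeled)), reverse=True)
        n_take = min(batch_size, budget - queries)
        batch, pool = pool[:n_take], pool[n_take:]
        # The oracle supplies true labels; the "model" is updated implicitly
        # because predict_proba reads the grown labeled set.
        labeled.extend((x, oracle(x)) for x in batch)
        queries += len(batch)
    return labeled, pool


oracle = lambda x: int(x > 5.0)  # hypothetical ground-truth labeler
seed = [(0.0, 0), (10.0, 1)]
labeled, remaining = offline_active_learning(
    seed, [1.0, 4.9, 5.2, 9.0], oracle, budget=2
)
```

The first query goes to the instance nearest the decision boundary between the two seed points, and the loop stops as soon as two labels have been requested.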

IV-D Real-time PALS

To realize real-time active learning on streaming data, we develop real-time PALS. Development of real-time PALS is motivated by the fact that both non-stop video recording of the user’s activities in naturalistic settings and assuming perfect memory for the user to accurately remember all activities performed in a given time-frame in the past are unrealistic for activity recognition in free-living situations. Therefore, to develop a personalized model in real-life scenarios, we cannot solely rely on pool-based active learning. Yet, we develop our real-time PALS algorithms based on the foundations established in our off-line PALS.

The main challenge in real-time active learning is to decide whether or not to query each sensor instance as it becomes available in real-time. In particular, because the model does not have access to future instances, it needs to determine whether the current instance is informative enough to request a label for. Our general approach to making such a determination in real-time is to define a threshold on the informativeness of a given instance. Such a threshold, if defined appropriately, will allow us to make real-time active learning decisions.

Definition 3 (Informativeness Threshold)

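Let $S$ be the entire stream sorted by the informativeness score given by $IG$. An informativeness threshold $\theta$ is a value such that the number of instances in $S$ whose informativeness score exceeds $\theta$ equals $B$, where $B$ is the query budget.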

Fig. 2: Overall architecture of PALS for real-time active learning on streaming sensor data.

As shown in Fig 2, real-time PALS assumes that the user can provide labels only for the current or very recent activities. In this approach, each instance is evaluated only once. As a result of this evaluation, the instance is either discarded from further analysis or used to query the oracle. If the system receives a label from the oracle, the next step is to update the model with the new instance in an effort to obtain a more personalized model. This is accomplished by adding the newly labeled instance to the labeled pool.

Input: current model, new instance $X_t$, threshold $\theta$, budget $B$
procedure Real-time PALS
    make a prediction on $X_t$ using the model
    calculate the entropy of the prediction
    if the entropy exceeds $\theta$ and the budget $B$ is not exhausted then
        query the oracle to provide the true label for $X_t$
        update the model with the newly labeled instance
Algorithm 2: Real-time PALS
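The single-pass decision rule can be sketched as below. The callables `predict_proba`, `oracle`, and `update` are hypothetical hooks standing in for the proximity model, the user prompt, and the model update, and the toy model at the bottom is invented purely to exercise the loop.

```python
import math


def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)


def process_stream(stream, predict_proba, oracle, update, threshold, budget):
    """Single pass over the stream: query only uncertain instances while
    budget remains; every instance is evaluated exactly once."""
    queried = []
    for x in stream:
        probs = predict_proba(x)
        if budget > 0 and entropy(probs) > threshold:
            label = oracle(x)   # ask the user about the current activity
            update(x, label)    # personalize the model with the new label
            queried.append((x, label))
            budget -= 1
        # Otherwise the instance is discarded and never revisited.
    return queried


# Toy setup: a fixed model that is uncertain only for inputs in [4, 6].
model_memory = []
predict = lambda x: [0.5, 0.5] if 4 <= x <= 6 else [0.95, 0.05]
queried = process_stream(
    stream=[1.0, 5.0, 4.5, 9.0, 5.5],
    predict_proba=predict,
    oracle=lambda x: int(x > 5.0),
    update=lambda x, y: model_memory.append((x, y)),
    threshold=0.9,
    budget=2,
)
```

The last uncertain instance (5.5) is skipped because the budget was already spent on the first two, which illustrates why a fixed threshold can exhaust queries early in the stream.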

We need an algorithm to adjust the value of the informativeness threshold $\theta$ to balance labeling over the instance space. To obtain effective performance, the algorithm needs to avoid both high and low values of $\theta$. High values of $\theta$ translate into a highly conservative approach where a very small number of questions are asked. Therefore, the algorithm can fail to personalize the model for the current user due to a lack of sufficient input from the user. On the other hand, low values of $\theta$ result in the algorithm exhausting the budget very quickly rather than generating queries that are distributed in time. Therefore, we need an adaptive algorithm to adjust the value of $\theta$ to create a balance between prompting time and query budget.

IV-E Adaptive Threshold Setting

An adaptive algorithm for adjusting $\theta$ needs to address the concerns of when and how to update $\theta$ to achieve effective performance. Our strategy is to update $\theta$ after receiving a new instance to a value that ensures a uniform distribution of queries over a given time interval. Suppose $N$ denotes the number of instances over a given time interval. Also assume that we have seen $n$ instances so far. To uniformly distribute queries over the time interval, we need to adjust $\theta$ taking into account the fact that a percentage of the budget has already been exhausted. Here we describe how $\theta$ can be adjusted for a stream of data to ensure a uniform distribution of queries over a given time interval $T$.

Let $T$ denote a given time interval over which the active learning process is expected to execute. Also, let $S_t$ represent the data stream generated up to time $t$, sorted in non-decreasing order by the informativeness score given by $IG$. An informativeness threshold at time $t$, denoted by $\theta_t$, is a value chosen to ensure a uniform distribution of queries over the time interval $T$. This process for obtaining an adaptive $\theta$ is shown in Algorithm 3.

Input: current model, new instance $X_t$, time $t$, time interval $T$, entropy of instances up to current time $t$, budget $B$
procedure Adaptive $\theta$
    use the model to make predictions about $X_t$
    calculate the entropy of the prediction
Algorithm 3: Adaptive adjustment of the informativeness threshold $\theta$
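One way to realize such an adaptive threshold is sketched below. This is an assumption-laden illustration rather than the authors' update rule: it assumes future entropies are distributed like those already seen, queries an instance when its entropy is at or above the threshold, and uses the hypothetical function name `adaptive_threshold`.

```python
def adaptive_threshold(seen_entropies, seen_count, total_count, budget_left):
    """Pick a threshold so that the share of seen instances at or above it
    matches the share of remaining instances the budget can still cover."""
    remaining = total_count - seen_count
    if budget_left <= 0 or remaining <= 0 or not seen_entropies:
        return float("inf")  # nothing left to spend: never query
    query_rate = min(1.0, budget_left / remaining)
    ordered = sorted(seen_entropies)  # non-decreasing informativeness
    # Index of the empirical (1 - query_rate) quantile of seen entropies.
    cut = min(len(ordered) - 1, int((1.0 - query_rate) * len(ordered)))
    return ordered[cut]


history = [0.3, 0.8, 0.1, 0.6, 0.2, 0.7, 0.4, 0.5]
theta = adaptive_threshold(
    history, seen_count=8, total_count=108, budget_left=25
)
```

With 25 queries left for 100 remaining instances, the threshold lands at the value that the top quarter of previously seen entropies reach, so queries are spread across the rest of the interval instead of being spent immediately.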

V Validation Approach

Our goal is to evaluate the performance of PALS using data collected from real subjects performing different activities in both semi-controlled lab settings and free-living environments. In this section, we discuss the datasets, data pre-processing, and performance metrics used for validation of our active learning algorithms.

V-A Data Collection

We designed an experiment to collect wearable motion sensor data during eating sessions. The data collection took place between February and April 2017. Institutional Review Board (IRB) approval was obtained prior to data collection. Overall, the dataset contains sessions of eating data collected with four participants. Each data collection session took about 20 minutes and participants were continuously video recorded. In each session, the participant was asked to eat a meal while performing other related activities such as drinking, talking, working with a laptop, and texting. In data collection sessions, participants were asked to wear a Samsung smartwatch on their dominant hand. The smartwatch used in our experiment was equipped with a 3D accelerometer and a 3D gyroscope. We developed an Android application to sample inertial sensors at Hz. About % of the obtained dataset includes eating-related activities. The recorded data for each participant along with labels are available for public use. In the rest of this paper, we refer to this dataset as SW6S, referring to the smartwatch dataset with 6 axes of inertial sensor data collected in semi-controlled settings.

V-B Publicly Available Dataset

To assess the generalization of our approach, we also evaluate the performance of PALS on two publicly available datasets [20]. Both datasets contain 3D accelerometer data collected from a wrist-band worn on the dominant hand. The first dataset was collected in a semi-controlled lab setting with participants performing different activities including eating, watching a movie trailer, chatting, taking a walk, placing a phone call, brushing teeth, and combing hair. In the rest of this paper, we refer to this dataset as SW3S, indicating the smartwatch dataset with 3-axis sensor data collected in semi-controlled settings. The second dataset was collected in free-living settings with seven participants. The participants in this study wore the wrist-band for an average of hours and minutes while performing various daily activities such as talking, commuting, reading, walking, working with a computer, and eating. This dataset includes approximately 6.7% of eating activity [20]. In the rest of this paper, we refer to this dataset as SW3U, indicating the smartwatch dataset with 3-axis sensor data collected in uncontrolled settings.

V-C Data Processing Pipeline

In this section we explain the data processing pipeline and challenges associated with utilizing the sensor data for algorithm development in the context of our active learning research.

Our data processing pipeline consists of four phases including pre-processing, segmentation, feature extraction, and feature selection. In the pre-processing phase, we pass the raw signal through a low-pass filter to reduce the instrumental noise that generates high-frequency components in the signal.

The next phase is segmentation which is intended to identify ‘start’ and ‘end’ points of the activity being examined for classification. During the segmentation phase, we use a sliding window with % overlap to split the continuous signal into segments. The window size is an important parameter because it needs to be long enough to capture an entire food intake gesture. According to previous research, a window length of 6 seconds with % overlap is a proper segmentation strategy for eating recognition task [20].
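The segmentation step can be sketched with a simple sliding-window helper. The 6-second window comes from the text; the 50% overlap default and the 10 Hz sampling rate in the example are placeholder assumptions, since the actual values are not recoverable here, and `sliding_windows` is a hypothetical name.

```python
def sliding_windows(samples, sampling_rate_hz, window_seconds=6.0, overlap=0.5):
    """Split a 1-D signal into fixed-length windows with fractional overlap."""
    size = int(window_seconds * sampling_rate_hz)
    step = max(1, int(size * (1.0 - overlap)))
    # Only full-length windows are kept; a trailing partial window is dropped.
    return [
        samples[start:start + size]
        for start in range(0, len(samples) - size + 1, step)
    ]


signal = list(range(100))  # 10 seconds of samples at a hypothetical 10 Hz
segments = sliding_windows(signal, sampling_rate_hz=10)
```

Each window here holds 60 samples and starts 30 samples after the previous one, so consecutive segments share half of their content.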

The next step in the pipeline is feature extraction, where we extract features from each signal segment for each axis of sensor data (e.g., 45 features for SW3S and SW3U and 90 features for SW6S). Potentially, there are many different features that can be extracted from human activity signals. However, as shown in Table I, our extracted features capture both morphology and statistical attributes of the signals. For example, while features such as median and mean capture the intensity of the signal, variance and zero crossings are intended to capture the morphology of the signal.

To maximize the generalizability of the model and reduce the risk of overfitting, we need to control the complexity of the hypothesis. To this end, we perform feature selection to identify the best set of features. In particular, we use the $\chi^2$ feature selection method. Similar to the statistics domain, where the $\chi^2$ test is used to test the independence of two events, we use this test in our feature selection process to determine whether a specific feature and the occurrence of a particular activity are independent. This feature selection approach eliminates irrelevant features from the final feature set.
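The independence test behind this feature selection step can be illustrated with a chi-square statistic on a 2x2 contingency table. The sketch assumes each feature has already been binarized (for example, above or below its median), which is a simplification introduced here; `chi_square_score` is a hypothetical helper.

```python
def chi_square_score(feature_flags, labels):
    """Chi-square statistic of a 2x2 table between a binary feature
    and a binary activity label (both encoded as 0/1)."""
    n = len(labels)
    observed = [[0, 0], [0, 0]]
    for f, y in zip(feature_flags, labels):
        observed[f][y] += 1
    score = 0.0
    for f in (0, 1):
        for y in (0, 1):
            row = sum(observed[f])
            col = observed[0][y] + observed[1][y]
            expected = row * col / n  # count expected under independence
            if expected > 0:
                score += (observed[f][y] - expected) ** 2 / expected
    return score


labels = [0, 0, 0, 0, 1, 1, 1, 1]
dependent = chi_square_score([0, 0, 0, 0, 1, 1, 1, 1], labels)
independent = chi_square_score([0, 1, 0, 1, 0, 1, 0, 1], labels)
```

A feature that tracks the label perfectly receives a high score, while a feature that is unrelated to the label scores zero and would be a candidate for removal.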

| Feature Name | Description |
|---|---|
| median | Median value |
| mean | Mean value |
| max | Maximum value |
| min | Minimum value |
| p2p | Peak-to-peak amplitude |
| skew | Skewness of signal segment |
| kurtosis | Kurtosis of signal segment |
| variance | Variance of signal segment |
| peaks count | Number of peaks |
| mean peaks amplitude | Mean of peaks amplitude |
| max peaks amplitude | Maximum peak amplitude |
| mean peaks distance | Mean of peaks distance |
| min peaks distance | Minimum of peaks distance |
| std peaks distance | Standard deviation of peaks distance |
| zero crossings | Number of zero crossings |

TABLE I: Features extracted from each signal segment.
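A subset of the Table I features can be computed for one axis of one segment as sketched below. The peak definition (a sample strictly greater than both neighbors) and the mean-centering before counting zero crossings are assumptions of this sketch, not details confirmed by the text.

```python
import statistics


def extract_features(segment):
    """Compute a subset of the Table I features for one signal segment."""
    mean = statistics.fmean(segment)
    centered = [v - mean for v in segment]
    # A zero crossing is a sign change between consecutive centered samples.
    zero_crossings = sum(
        1 for a, b in zip(centered, centered[1:]) if a * b < 0
    )
    # A peak is a sample strictly greater than both of its neighbors.
    peaks = [
        segment[i]
        for i in range(1, len(segment) - 1)
        if segment[i - 1] < segment[i] > segment[i + 1]
    ]
    return {
        "median": statistics.median(segment),
        "mean": mean,
        "max": max(segment),
        "min": min(segment),
        "p2p": max(segment) - min(segment),
        "variance": statistics.pvariance(segment),
        "peaks count": len(peaks),
        "zero crossings": zero_crossings,
    }


features = extract_features([0.0, 2.0, 0.0, -2.0, 0.0, 2.0, 0.0])
```

Applying the same function to every axis of every segment and concatenating the results would give one feature vector per window.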

V-D Learning from Skewed Data

In real daily life settings, the durations of different activities are not equal. This leads to an unbalanced dataset where the number of instances varies across different classes. In particular, in the publicly available dataset used in this study, a small portion of data points corresponds to the eating event while the majority of activities are non-eating. If trained on this skewed distribution, the classifier may learn to predict all the activities as non-eating and achieve a high accuracy level because a majority of the instances are non-eating. To handle the skewed nature of the dataset during training, we use an up-sampling technique. Specifically, in each iteration of active learning, after choosing the most informative instances, we up-sample the minority class among those selected instances by synthesizing new samples and adding a balanced set of instances to the labeled pool. For generating the synthesized instances, our offline PALS uses the Synthetic Minority Over-sampling Technique (SMOTE) [4]. In particular, for each example of the minority class, SMOTE introduces synthetic examples along the line segments joining it to its nearest minority-class neighbors.
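The interpolation idea behind SMOTE can be sketched in a few lines. This is a simplified illustration, not the reference implementation: base points are drawn at random rather than visited in turn, and the function name `smote`, the default `k`, and the fixed seed are assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority points by interpolating between a minority
    example and one of its k nearest minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        target = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to target
        synthetic.append(
            tuple(b + gap * (t - b) for b, t in zip(base, target))
        )
    return synthetic


minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
new_points = smote(minority, n_synthetic=4)
```

Every synthetic point lies on a segment between two real eating-class examples, so the minority region is filled in without duplicating existing instances.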

V-E Performance Metrics

As discussed previously, because of the skewed nature of the dataset, a naive classifier tends to classify all instances as the majority class (i.e., non-eating), which is usually a less important class, and achieves a high accuracy. On the other hand, we cannot use the up-sampling technique for the test dataset because the test data should be a representation of the real data and needs to remain unmodified. Therefore, to avoid the disadvantage of reporting the accuracy of a naive classifier, we need to consider performance metrics other than accuracy to effectively evaluate the performance of the classifier [11]. For this specific problem, we aim at detecting ‘eating moment’ as the event of interest. Since the event of interest (i.e., class=‘1’) is the activity with the minority of instances, traditional classifiers tend to have a poor recall by ignoring these important instances and predicting almost everything as non-eating (i.e., class=‘0’). Therefore, the binary Recall, defined below, is an important metric in the evaluation of the trained models.

$$\text{Recall} = \frac{TP}{TP + FN}$$

where $TP$ and $FN$ denote the numbers of true positives and false negatives for the eating class.


However, Recall alone is not enough for comparing the learned models. In particular, a poor classifier can maximize Recall simply by predicting every instance as ‘eating moment’. Therefore, while optimizing Recall, we should ensure that the Precision of the model also remains acceptable. Precision is defined as follows.

$$\text{Precision} = \frac{TP}{TP + FP}$$

where $TP$ and $FP$ are the numbers of instances correctly and incorrectly classified as eating, respectively.


In this paper, we report the f-score, which measures the quality of the model by balancing Precision and Recall. The f-score is traditionally defined as

$$\text{f-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
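To make the argument about skewed data concrete, the following sketch computes accuracy, Recall, Precision, and f-score for two degenerate classifiers on a toy label set. The 90/10 class split and the helper name `binary_metrics` are assumptions chosen for illustration, not values from our datasets.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, recall, precision, and f-score for the positive (eating) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f_score": f_score}

# Toy ground truth: 90 non-eating (0) and 10 eating (1) instances
y_true = [0] * 90 + [1] * 10

# A classifier that always predicts non-eating looks good on accuracy only
naive = binary_metrics(y_true, [0] * 100)

# A classifier that always predicts eating maximizes recall at the cost of precision
always_eating = binary_metrics(y_true, [1] * 100)
```

The first classifier reaches 90% accuracy with zero Recall, and the second reaches full Recall with 10% Precision; the f-score penalizes both.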


VI Results

This section presents experimental results for offline PALS on the SW6S dataset (described in Section V-A) and the SW3S dataset (described in Section V-B), as well as for real-time PALS on the SW3U dataset (described in Section V-B).

VI-A Algorithm Design Choices

Before evaluating the performance of our algorithms, we discuss algorithm design choices and trade-offs. Particularly, because we use a proximity graph to construct the classification model, here we discuss our approach for similarity assessment (i.e., similarity kernel). We also discuss the effectiveness of using entropy of classification as a metric for selecting instances to query during active learning.

Similarity Kernel

As described in Section IV-A, we use a $k$-NN scheme for constructing our similarity graph. Since the choice of similarity kernel affects the performance of the model, in this section we examine our choice of kernel. In particular, we compare our $k$-NN scheme with the Radial Basis Function (RBF) (i.e., Gaussian) kernel [14]. RBF kernels are popular in the field of semi-supervised learning, and the similarity function between two instances $x_i$ and $x_j$ is defined by:

$$s(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right)$$

where $\sigma$ is a free parameter that determines the width of the Gaussian kernel.
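The structural difference between the two kernels can be seen in a small sketch. The symmetric (either-direction) $k$-NN rule, the unit edge weights, and the toy points below are assumptions for illustration; the exact graph construction used in PALS is the one given in Section IV-A.

```python
import math

def rbf_similarity(x, y, sigma=1.0):
    """Gaussian similarity; positive for every pair, so the graph is dense."""
    return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))

def knn_graph(points, k):
    """Symmetric k-NN adjacency matrix: i and j are connected when either
    one is among the k nearest neighbors of the other."""
    n = len(points)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in order[:k]:
            adj[i][j] = adj[j][i] = 1
    return adj

# Two well-separated pairs of hypothetical feature vectors
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
adj = knn_graph(points, k=1)
far_weight = rbf_similarity(points[0], points[2])
```

With `k=1`, the $k$-NN graph links only the two nearby pairs, whereas the RBF kernel still assigns a small positive weight to distant pairs, and that weight depends strongly on the choice of the kernel width.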

We conducted an experiment comparing the performance of PALS with each of these two widely used similarity kernels.

As Fig 3 and Fig 4 illustrate, the $k$-NN kernel outperforms the RBF kernel in our application and achieves a higher f-score in detecting eating moments in all learning iterations. In particular, on the SW3S dataset, models with both kernels start with a binary f-score of around . However, the model with the $k$-NN kernel reaches an f-score of more than after only iterations, while the model with the RBF kernel does not improve with more iterations. On the SW6S dataset, the model trained with the $k$-NN kernel significantly outperforms the model trained with the RBF kernel, as it starts with a f-score and reaches a f-score after only 10 iterations. However, the best f-score achieved by the model with the RBF kernel is only on the SW6S dataset.

Fig. 3: Performance of RBF kernel vs. $k$-NN kernel on SW3S dataset.
Fig. 4: Performance of RBF kernel vs. $k$-NN kernel on SW6S dataset.

Evaluation of Instance Selection

An important design choice in PALS is the use of the entropy of the classification of unlabeled instances to quantify how informative an instance is. Here, we show the effectiveness of this instance selection approach. We conducted an experiment comparing it with an approach that chooses instances uniformly at random. As Fig 5 shows, using entropy as the criterion for choosing instances results in a classifier with a higher f-score. In contrast, when instances are sampled from a uniform distribution, the f-score does not improve beyond a few iterations. One explanation for this observation is the skewed nature of the dataset: uniform sampling tends to select mostly non-eating instances for the active learning step.
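The two selection strategies compared here can be sketched as follows. The predicted probabilities are made-up values and the function names are hypothetical; the sketch only illustrates ranking by binary entropy versus uniform random sampling, not the full PALS query procedure.

```python
import math
import random

def binary_entropy(p):
    """Entropy (in bits) of a binary prediction with P(eating) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def select_by_entropy(probs, n_queries):
    """Indices of the n_queries most uncertain unlabeled instances."""
    ranked = sorted(range(len(probs)),
                    key=lambda i: binary_entropy(probs[i]),
                    reverse=True)
    return ranked[:n_queries]

def select_uniform(probs, n_queries, seed=0):
    """Baseline: indices drawn uniformly at random, ignoring the predictions."""
    return random.Random(seed).sample(range(len(probs)), n_queries)

# Hypothetical predicted probabilities of the eating class for six unlabeled instances
probs = [0.02, 0.50, 0.97, 0.45, 0.10, 0.60]
entropy_picks = select_by_entropy(probs, 2)
uniform_picks = select_uniform(probs, 2)
```

Entropy-based selection concentrates the queries on instances near the decision boundary, while the uniform baseline is as likely to pick confidently classified non-eating instances as uncertain ones.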

Fig. 5: Entropy-based vs. uniform sampling on SW3S dataset (top) and SW6S dataset (bottom).

VI-B Performance of Offline PALS

In this section, we compare our algorithm with prior research in the area of eating moment detection as well as with state-of-the-art machine learning methods. Before presenting the results, we describe our comparative evaluation approach.

Research on eating gesture detection using wrist-worn inertial sensors is new. To the best of our knowledge, the most recent successful approach is the one presented by Thomaz et al. [20], which tackles a food intake gesture recognition problem similar to the one in this paper. The classifier built in [20] is a Random Forest, implemented with the Scikit-learn Python package [16], with the number of trees in the forest set to 185. For the rest of this paper, we call their approach RFA.

Additionally, we compare our algorithm to a classifier built using the XGBoost learning algorithm [5]. XGBoost has recently dominated applied machine learning and has been used to win Kaggle competitions in recent years. Furthermore, XGBoost was used by every top-10 winning team in KDDCup 2015 [5]. XGBoost is an optimized and distributed implementation of gradient boosting. It provides a parallel tree boosting method to solve machine learning problems effectively at industrial scale. For this experiment, we used the open source implementation of XGBoost.

To conduct the offline comparison, we suppose that each algorithm has access to % of the in-lab data as its training set, and we use the remaining % of the data as the test set to validate the performance of the algorithm. As shown in Table II, offline PALS outperforms both other approaches in correctly classifying eating moments. Specifically, offline PALS achieves f-scores of 41% and 48% on the SW3S and SW6S datasets, respectively, which improves on XGBoost (35% and 40%) and RFA (34% and 18%). A low recall for the eating class indicates that a classifier is strongly biased toward labeling instances as non-eating. This again emphasizes the importance of selecting appropriate metrics when working with skewed datasets. As presented in Table II, offline PALS achieves 62% and 64% recall on the SW3S and SW6S datasets, respectively, whereas the recall of RFA and XGBoost does not exceed 32%.

Dataset   Method         Recall   Binary f-score
SW3S      Offline PALS   0.62     0.41
SW3S      XGBoost        0.25     0.35
SW3S      RFA            0.22     0.34
SW6S      Offline PALS   0.64     0.48
SW6S      XGBoost        0.32     0.40
SW6S      RFA            0.10     0.18
TABLE II: Performance of offline PALS vs. other approaches.
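As a rough cross-check of the average gains quoted in the abstract and conclusion, the sketch below averages the Table II values. The averaging procedure (mean of offline PALS over both datasets minus the mean of both baselines over both datasets) is an assumption; the text does not state how the averages were computed.

```python
# (recall, binary f-score) values copied from Table II
table = {
    "SW3S": {"PALS": (0.62, 0.41), "XGBoost": (0.25, 0.35), "RFA": (0.22, 0.34)},
    "SW6S": {"PALS": (0.64, 0.48), "XGBoost": (0.32, 0.40), "RFA": (0.10, 0.18)},
}

def mean(values):
    return sum(values) / len(values)

def average_gain(metric_index):
    """Mean PALS score minus mean baseline score for one metric column."""
    pals = mean([rows["PALS"][metric_index] for rows in table.values()])
    baselines = mean([scores[metric_index]
                      for rows in table.values()
                      for name, scores in rows.items() if name != "PALS"])
    return pals - baselines

recall_gain = average_gain(0)   # column 0: recall
f_score_gain = average_gain(1)  # column 1: binary f-score
```

Under this assumption, the gaps come out at roughly 0.41 in recall and 0.13 in f-score, close to the reported 40% and 12%.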

VI-C Performance of Real-Time PALS

To the best of our knowledge, there is no prior algorithm for real-time training of eating-moment recognition in real-life settings. Therefore, for the purpose of evaluation, we conducted two experiments highlighting the effect of query budget and decision threshold estimation on the performance of the real-time PALS algorithm.

Query Budget

In this experiment, we assess the effect of the query budget on the performance of the real-time PALS approach in classifying eating moments. To this end, we examined twelve different values of the hourly query budget for different subjects in the in-the-wild setting of the SW3U dataset. Fig 6 shows the f-score averaged over all subjects at the end of the training cycles.

As Fig 6 illustrates, increasing the budget helps in obtaining a more personalized classifier for each participant and leads to higher performance. In particular, the average f-score starts at around 22% with 10 queries per hour and reaches 39% when the budget is increased to 60 queries per hour.

With a very limited budget for querying the user in real time, the model cannot adapt from the lab setting to the real-world setting, and the f-score for eating moments is less than %. In other words, when relying only on the model trained on in-lab data, the model tends to classify all activities as non-eating. One reason is that the distribution of eating vs. non-eating activities differs considerably between the lab setting and the real-world setting. It also suggests that in real-world settings, without any constraints, people tend to eat very differently from how they eat in the lab. This result again highlights the importance of designing adaptive models for real-world settings.

Furthermore, even a small query budget yields a significant gain. In particular, increasing the query budget to queries raises the average f-score of detecting eating moments by around %. Model performance improves steadily as the query budget increases. On average over all subjects, the model reaches an f-score of % when the maximum query budget is set to queries. While still improving, the rate of performance improvement decreases for query budgets of more than , and the model achieves an f-score of % with a query budget of .

There is always a trade-off between the query budget and user convenience. Increasing the query budget improves the performance of the model, but it may also increase the risk of inconveniencing the user.

Fig. 6: Performance of the learned model in terms of f-score as a function of query budget on SW3U dataset.

Comparison of Thresholding Methods

As described in Section IV-E, the classifier uses the decision threshold to determine whether the current activity instance is valuable enough to query the user for its label. To verify the effectiveness of our approach for updating the decision threshold, $\lambda$, we designed two alternative methods for governing its value. In the first method, the value of $\lambda$ is learned from the in-lab training data and is derived from the ratio of the budget to the size of the dataset. Since this value is extracted from the in-lab data and remains unchanged during real-time training, we refer to the decision threshold obtained with this approach as static $\lambda$. The second method uses knowledge of the best possible value of the threshold in a time interval to select the most informative instances based on the entropy of the classification decision. This method provides an experimental upper bound for adaptive $\lambda$, because it has unlimited access to future data and can extract the most accurate value of $\lambda$, which the adaptive $\lambda$ algorithm attempts to estimate. We refer to the decision threshold obtained with this approach as best $\lambda$.
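The two reference thresholds can be illustrated with a short sketch. It assumes that a budget-derived threshold is the entropy value reached by exactly `budget` instances in a given sample, which is one plausible reading of "the ratio of budget to the size of the dataset"; the function names and entropy values are hypothetical, and the adaptive update of Section IV-E is not reproduced here.

```python
def budget_threshold(entropies, budget):
    """Entropy value such that `budget` instances in the sample reach it
    (the budget-th largest entropy). Returns infinity when nothing can be queried."""
    if budget <= 0 or not entropies:
        return float("inf")
    ranked = sorted(entropies, reverse=True)
    return ranked[min(budget, len(ranked)) - 1]

def run_queries(stream, lam, budget):
    """Positions in the stream that are queried: entropy >= lam while budget remains."""
    queried = []
    for t, entropy in enumerate(stream):
        if len(queried) < budget and entropy >= lam:
            queried.append(t)
    return queried

# Hypothetical classification entropies: in-lab data vs. one in-the-wild interval
lab_entropies = [0.10, 0.20, 0.90, 0.80, 0.30, 0.95]
stream_entropies = [0.50, 0.60, 0.40, 0.70, 0.65, 0.30]
budget = 2

static_lam = budget_threshold(lab_entropies, budget)    # fixed before deployment
best_lam = budget_threshold(stream_entropies, budget)   # oracle: sees the whole interval

static_queries = run_queries(stream_entropies, static_lam, budget)
best_queries = run_queries(stream_entropies, best_lam, budget)
```

In this toy case the static threshold, tuned to the in-lab entropy distribution, is never reached in the wild and the budget goes unused, while the oracle threshold spends the budget on the two most uncertain instances of the interval.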

In this experiment, we compared the performance of the eating moment detection models trained on real-time data from the SW3U dataset using best $\lambda$, adaptive $\lambda$, and static $\lambda$. In Fig 7, the x-axis refers to the different subjects and the y-axis shows the binary f-score for classifying eating class instances. The query budget is set to queries per hour for this experiment. As Fig 7 shows, the adaptive $\lambda$ algorithm achieves performance close to that of best $\lambda$, while using a static value for $\lambda$ performs poorly across subjects. Specifically, on average, adaptive $\lambda$ achieves a 7% lower f-score than best $\lambda$ and a 12% higher f-score than static $\lambda$. In the extreme cases, adaptive $\lambda$ achieves a 13% lower f-score than best $\lambda$ for subject 5, while the gap is smaller for the other subjects. Furthermore, in the worst case, adaptive $\lambda$ performs slightly better than static $\lambda$, with a 1.6% higher f-score for subject 6, and it outperforms static $\lambda$ by larger margins for the other subjects, most notably subject 7, with a 28% higher f-score. In summary, best $\lambda$, adaptive $\lambda$, and static $\lambda$ provide average f-scores of 47%, 39%, and 28%, respectively, over all subjects of the SW3U dataset.

Fig. 7: Comparison of best $\lambda$, adaptive $\lambda$, and static $\lambda$ approaches for the decision threshold in terms of f-score on SW3U dataset.

VII Discussion and Future Work

We developed a novel proximity-based approach for recognizing eating gestures with the goal of significantly decreasing the need for obtaining labeled data from users. One limitation of the proposed approach is that it assumes that the pattern of the minority class (i.e., eating) remains unchanged over time. Therefore, if the user’s activity pattern changes because of a change in lifestyle or because the user takes up a new type of food, the method may not easily detect the new eating patterns. On the other hand, continuous learning at the same rate might not be feasible, since the user needs a stable model after the training phase. Facilitating a trade-off between exploitation of the learned model and exploration of new activity patterns is an interesting direction for future research. A potential approach is to examine how the reinforcement learning paradigm can be leveraged to handle the exploration/exploitation trade-off.

In this study, we used an off-the-shelf smartwatch to detect ‘eating moment’ activities. Our future research also involves studying the utility of other wearable and non-wearable sensory devices for eating moment detection through active learning.

While the focus of this study was on eating moment detection, we expect that the methodologies developed in this project can be applied to a broader class of activity recognition applications. In the future, we plan to study the effectiveness of PALS in devising personalized activity recognition algorithms. In this way, one can integrate diet monitoring capabilities with the ability to recognize daily activities and develop a smart-health coach.


This work was supported in part by the United States National Science Foundation, under grant CNS-1932346. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding organizations.

VIII Conclusion

Most current approaches to detecting eating moments require multiple on-body sensors or specialized devices, such as neck collars for swallow detection, that are impractical for everyday use. The goal of this research was to design a practical solution for eating moment detection. We used an off-the-shelf smartwatch that records inertial sensor data to design a non-intrusive detection system with a machine learning algorithm personalized for the end user.

Because people perform the same activity in different manners, relying on a model trained on in-lab data collected from different subjects leads to a significant performance drop. In this paper, we proposed PALS (Proximity-Based Active Learning on Streaming Data), a novel proximity-based model for recognizing eating gestures. We showed that PALS significantly decreases the need for labeled data from new users by leveraging active learning under a limited query budget while utilizing unlabeled data. Our extensive analysis of data collected from real subjects showed that, compared to state-of-the-art approaches, PALS achieves, on average, 40% higher recall and 12% higher f-score in detecting eating events. Furthermore, we showed the effectiveness of our adaptive thresholding method and how the online PALS algorithm can be adapted to real-world settings with only a limited query budget.


  1. Software code for PALS is available online at


  1. A. Bedri, R. Li, M. Haynes, R. P. Kosaraju, I. Grover, T. Prioleau, M. Y. Beh, M. Goel, T. Starner and G. Abowd (2017) EarBit: using wearable sensors to detect eating episodes in unconstrained environments. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1 (3), pp. 37.
  2. A. Bedri, A. Verlekar, E. Thomaz, V. Avva and T. Starner (2015) Detecting mastication: a wearable approach. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pp. 247–250.
  3. S. Chatterjee and A. Price (2009) Healthy living with persuasive technologies: framework, issues, and challenges. Journal of the American Medical Informatics Association 16 (2), pp. 171–178.
  4. N. V. Chawla, K. W. Bowyer, L. O. Hall and W. P. Kegelmeyer (2002) SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16, pp. 321–357.
  5. T. Chen and C. Guestrin (2016) XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794.
  6. J. Cheng, B. Zhou, K. Kunze, C. C. Rheinländer, S. Wille, N. Wehn, J. Weppner and P. Lukowicz (2013) Activity recognition and nutrition monitoring in every day situations with a textile capacitive neckband. In Proceedings of the 2013 ACM Conference on Pervasive and Ubiquitous Computing Adjunct Publication, pp. 155–158.
  7. D. Cohn, L. Atlas and R. Ladner (1994) Improving generalization with active learning. Machine Learning 15 (2), pp. 201–221.
  8. Y. Dong, J. Scisco, M. Wilson, E. Muth and A. Hoover (2014) Detecting periods of eating during free-living by tracking wrist motion. IEEE Journal of Biomedical and Health Informatics 18 (4), pp. 1253–1260.
  9. K. M. Flegal, M. D. Carroll, C. L. Ogden and C. L. Johnson (2002) Prevalence and trends in obesity among US adults, 1999-2000. JAMA 288 (14), pp. 1723–1727.
  10. A. Helal, D. J. Cook and M. Schmalz (2009) Smart home-based health platform for behavioral monitoring and alteration of diabetes patients. Journal of Diabetes Science and Technology 3 (1), pp. 141–148.
  11. L. A. Jeni, J. F. Cohn and F. De La Torre (2013) Facing imbalanced data–recommendations for the use of performance metrics. In 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, pp. 245–251.
  12. D. D. Lewis and W. A. Gale (1994) A sequential algorithm for training text classifiers. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 3–12.
  13. M. Maier, M. Hein and U. Von Luxburg (2007) Cluster identification in nearest-neighbor graphs. In Algorithmic Learning Theory, pp. 196–210.
  14. N. M. Nasrabadi (2007) Pattern recognition and machine learning. Journal of Electronic Imaging 16 (4), pp. 049901.
  15. T. A. Nicklas, T. Baranowski, K. W. Cullen and G. Berenson (2001) Eating patterns, dietary quality and obesity. Journal of the American College of Nutrition 20 (6), pp. 599–608.
  16. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss and V. Dubourg (2011) Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12 (Oct), pp. 2825–2830.
  17. T. Rahman, A. T. Adams, M. Zhang, E. Cherry, B. Zhou, H. Peng and T. Choudhury (2014) BodyBeat: a mobile system for sensing non-speech body sounds. In MobiSys, Vol. 14, pp. 2–13.
  18. B. Settles (2012) Active learning. Synthesis Lectures on Artificial Intelligence and Machine Learning 6 (1), pp. 1–114.
  19. J. Smailović, M. Grčar, N. Lavrač and M. Žnidaršič (2014) Stream-based active learning for sentiment analysis in the financial domain. Information Sciences 285, pp. 181–203.
  20. E. Thomaz, I. Essa and G. D. Abowd (2015) A practical approach for recognizing eating moments with wrist-mounted inertial sensing. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pp. 1029–1040.
  21. P. Wang, P. Zhang and L. Guo (2012) Mining multi-label data streams using ensemble-based active learning. In Proceedings of the 2012 SIAM International Conference on Data Mining, pp. 1131–1140.
  22. K. Yatani and K. N. Truong (2012) BodyScope: a wearable acoustic sensor for activity recognition. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing, pp. 341–350.
  23. X. Zhu, Z. Ghahramani and J. D. Lafferty (2003) Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 912–919.
  24. X. Zhu (2006) Semi-supervised learning literature survey. Computer Science, University of Wisconsin-Madison 2 (3), pp. 4.
  25. X. Zhu, P. Zhang, X. Lin and Y. Shi (2007) Active learning from data streams. In Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on, pp. 757–762.
  26. X. Zhu, P. Zhang, X. Lin and Y. Shi (2010) Active learning from stream data using optimal weight classifier ensemble. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 40 (6), pp. 1607–1621.
  27. I. Žliobaitė, A. Bifet, B. Pfahringer and G. Holmes (2011) Active learning with evolving streaming data. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 597–612.