An Accelerometer Based Calculator for Visually Impaired People Using Mobile Devices

Abstract

The recent trend of touch-screen devices creates an accessibility barrier for visually impaired people. On the other hand, these devices come with sensors such as an accelerometer. This calls for new approaches to the human-computer interface (HCI). In this study, our aim is to find an alternative approach that classifies 20 different hand gestures captured by the iPhone 3GS's built-in accelerometer and achieves high accuracy on user-independent classification using Dynamic Time Warping (DTW) with dynamic warping window sizes. For the 20 gestures, 1,100 gesture data are collected from 15 sighted people. This data set is used for training. Experiment-1, based on this data set, produced an accuracy rate of 96.7 %. In order for visually impaired people to use the system, a gesture recognition based “talking” calculator is implemented. In Experiment-2, 4 visually impaired end-users used the calculator and obtained a 95.5 % accuracy rate on 17 gestures with 720 gesture data in total. The contributions of the individual techniques to the end result are also investigated. The dynamic warping window size is found to be the most effective one. The data and the code are available.

Accessibility, Visually impaired, Gesture recognition, Accelerometer, Dynamic Time Warping (DTW), Mobile phones, Ubiquitous computing

I Introduction

With the popularity of new devices such as accelerometer-based game controllers and touch-screen smartphones, the need for new human-computer interfaces has emerged. This is especially true in the area of accessibility, although some mobile devices with just touch-screens come with features such as text-to-speech, speech-to-text, and a magnifier for handicapped people. Several research works on accelerometer-based gesture recognition systems and on the usage of accelerometer-based devices in the medical area have pioneered new interfaces for accessibility. For example, the Nintendo Wii controller is used with some specific games for patients recovering from strokes, broken bones, surgery and even combat injuries [8]; [13]; [24].

There are few research works on such interfaces for visually impaired people on mobile devices. Text editing on a touch-screen device is one of the major issues. Clearly, text-to-speech and speech-to-text systems would be the ultimate solutions. Handwriting on the screen is another one. On the other hand, accelerometer-based systems are also a potential candidate, at least for some domains. This work aims to present a solution to this problem for the limited domain of arithmetic calculations.

Several methods with different approaches have been suggested for accelerometer-based gesture recognition, most of which use Hidden Markov Models (HMMs) [18]; [17]; [10]. Some of them are applied on mobile devices; for example, Pylvanainen proposed a gesture recognition system based on continuous HMMs [21]. Prekopcsak uses HMMs and Support Vector Machines (SVMs) to classify gestures captured by the built-in accelerometer of a mobile phone, namely the Sony-Ericsson W910i [20]. In addition, Klingmann uses HMMs with the iPhone built-in accelerometer [12]. As an alternative to HMMs, Wu et al. propose an acceleration-based gesture recognition approach using SVMs with a Nintendo Wii controller [26]. Besides HMMs and other probabilistic approaches, some studies use Dynamic Time Warping (DTW) with template adaptation. For example, uWave combines quantization of accelerometer readings, DTW and template adaptation on a mobile device [16]; [15]. Leong et al. use DTW with a Nintendo Wii controller [14]. Akl and Valaee use DTW as well as affinity propagation with a Nintendo Wii controller [5]; [4].

In this work, a reliable, fast and simple gesture recognition model and its implementation as a new interface are developed. The model is based on the technique originally proposed in Ratanamahatana and Keogh’s work [22]; [9]. As a proof of concept, a simple “talking” calculator is implemented. The main contributions of this work are a new interface for writing text by capturing accelerometer data of hand gestures on touch-screen smartphones with a built-in accelerometer, and a detailed analysis of the contributions of the techniques that are used. The proposed system is capable of classifying 20 different gestures with high reliability. The system has been tested by visually impaired end-users with the implemented application on an iPhone 3GS.

II Method

An accelerometer based gesture recognition system is proposed which consists of three parts, namely, data collection, training and classification. Hand gesture data from participants are collected in the data collection part using an iPhone. Then all captured data are processed, and a classifier is trained and validated in the training part using a desktop computer. Finally, the trained classifier is tested by visually impaired participants in the classification part on an iPhone.

ii.1 Data Collection

Gesture Set

Figure 1: Gesture set with 20 gestures. The gesture ID is given in the lower left white box of the gesture. Gestures 1-4 are in 1D, 5-17 in 2D, and 18-20 in a plane different from that of 5-17. The interpretation of each gesture in the calculator is given in the lower right grey box. Gestures 1 and 4 both correspond to digit 1. All the other symbols have exactly one corresponding gesture. Gestures 10, 2, 8, 7, 11 and 3 are mapped to the four arithmetic operators, “=” and “delete-the-last-entry”, respectively. Gestures 18, 19 and 20 are not used in the calculator.

20 gestures, given in Fig. 1, are designed. There are two design criteria: (i) the gestures should be intuitive so that they can be remembered easily; (ii) while doing a gesture, no visual clues should be necessary, so that visually impaired users can perform it. Since a calculator is in mind, gestures corresponding to digits and arithmetic operations are necessary. Gestures very similar to the shapes of the digits are used. The gestures for some operators are also similar to their shapes in mathematics. The gesture for “delete-the-last-entry” resembles erasing. On the other hand, the gestures for the remaining symbols are not that intuitive.

Note that the gestures are in different dimensions. Gestures 1-4 are in 1D only. Gestures 5-17 are in 2D. The remaining three gestures, 18-20, are in a plane different from that of 5-17.

Device and Data Representation

The iPhone 3GS is used as the device for data collection. Its built-in accelerometer measures the proper acceleration, which is the sum of the accelerations due to gravitation and the gesture motion. The unit of measurement is $g$, the gravitational acceleration due to the Earth. The sensor has a range of $\pm 2g$ per axis and a sensitivity of approximately $0.018g$ [25]. If the phone is lying on its back on a horizontal surface, the acceleration values (in 3D) will be approximately $(0, 0, -1)$, all in $g$.

The accelerometer, which is configured to capture data at 60 Hz, produces four time series: one for each of the three axes and one for the time [25]. A sequence of acceleration vectors sampled at discrete times is represented as

$A = (a_1, a_2, \ldots, a_N),$

where $a_t$ is a 3D column vector at time step $t$. Note that $A$ is a 3D signal. The 1D signals $a^{(1)}$, $a^{(2)}$, $a^{(3)}$ are called channels.
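
As a minimal illustration (not from the paper's code base), a gesture data can be held as an $N \times 3$ array whose columns are the three channels; the numbers below are made-up sample values:

```python
import numpy as np

# A gesture recording: N samples of 3-axis acceleration at 60 Hz.
# Each row is one acceleration vector a_t; each column is one channel.
gesture = np.array([
    [0.02, -0.01, -0.98],   # phone roughly at rest, screen facing up
    [0.10,  0.05, -0.95],
    [0.35,  0.20, -0.80],
    # ... further samples while the gesture is performed
])

x_channel, y_channel, z_channel = gesture[:, 0], gesture[:, 1], gesture[:, 2]
print(gesture.shape)  # (N, 3)
```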

Mobile Applications

One iPhone application with multiple views is developed. The data acquisition view is used to collect acceleration data while the user performs gestures. The user is asked to make the gesture while the phone is facing her. She presses a finger on the screen to start data collection. Data is collected as long as the finger is on the screen, and collection stops when the finger is removed from the screen.

The talking calculator view is a simple “talking” 4-operation integer calculator, which is used for testing our approach by visually impaired users who need audio feedback. A 4-operation calculator requires 16 different symbols (10 for digits, 4 for operations, one for “=” and one for “delete-the-last-entry”). Based on familiarity with the symbols, 17 gestures from Fig. 1 are selected for the calculator. Note that digit 1 has two corresponding gestures, namely 1 and 4. Gestures 18, 19 and 20 are not used in the calculator. The text-to-speech library “Flite” [7] and its wrapper by Sam Foster [1] are used to “speak” the gesture that is entered. The code is available at [3].

Users and Data Acquisition

The gesture data set is collected from 15 users. The users are undergraduate students, mainly freshmen and sophomores, of our department. It is necessary to point out that, since they are Computer Engineering students, they may be more familiar with this than an average person.

We want users to be in their everyday environment. There was no particular time or place for data collection. We asked students to participate during the breaks between courses. No problems with the usage of the application or the gesture set were reported.

We show a user how we place our finger on the screen and do the gesture. This is done once and no further training is given. Then we give the phone to the user and she makes the gestures in Fig. 1.

Each user is asked to do the 20 gestures multiple times, so on average 55 gesture data are collected for each gesture. This makes 1,100 gesture data in total, out of which 10 recordings are found to be faulty. These 10 recordings were too short to process. Hence 1,090 gesture data are used in this study. The data is available at [3].

ii.2 Training

After the data set is generated in data collection, the training process takes place. Training is a computationally intensive task, which is done on a desktop computer. Once the system learns to recognize the gestures, the trained system is moved to the mobile device.

The overall system is given in Fig. 2 as a block diagram, which will be considered in detail in Sec. III. Flow-1 produces the gesture templates. In Flow-1, the training raw data, which contains 1,090 gesture data, is processed by the validation, low-pass filtering, mean and variance adjustment, down sampling, and template generation operations.

In Flow-2 in Fig. 2, the training raw data is processed by means of validation, low-pass filtering, down sampling, warping window size generation, and threshold value generation. The warping windows and threshold values of the corresponding gesture representatives are obtained as a result of Flow-2.

Figure 2: Flow-1 produces the templates and the lower and upper LBK bounds of each gesture class. Flow-2 then generates the warping window sizes and the threshold values. Flow-3 represents the classification flow, which uses the values generated in the first two flows.

ii.3 Classification

The gesture done by the user needs to be classified on a mobile device, in our case an iPhone. Flow-3 in Fig. 2 is the classification flow, which passes through the following processes: validation, low-pass filtering, mean and variance adjustment, down sampling, template matching, and threshold control. The system then gives the classification result as output. The gesture is classified using a template matching algorithm, and the closest valid gesture representative is given as the classification result.

III Processing Blocks

The 3D raw gesture data collected from the user is passed through a number of processing blocks, schematically given in Fig. 2.

iii.1 Validation

Clearly, every user has her own pace of doing a gesture. Some do the gesture fast, some do it slowly. Similarly, some users do the same gesture on a small scale, some on a large scale. That affects the duration of the gesture data. We discard gesture data that is too short (shorter than 0.5 s, i.e., 30 samples at 60 Hz) or too long (longer than 3.5 s, i.e., 210 samples) in duration, as discussed in the Parameters subsection.

The second validation is related to the amplitude. It is expected that the amplitude of the signal changes as the user draws the gestures. We restrict the average amplitude to an acceptable range. Since our gesture data is in 3D, the average amplitude of a gesture is defined as the mean of $\|a_t\|$ over the gesture, where $\|a_t\|$ is the magnitude of $a_t$. Data sets whose average amplitude is too small or too large are also discarded; the bounds are derived from the accelerometer properties as discussed in the Parameters subsection.

Out of 1,090 data sets, 24 due to duration and 4 due to amplitude, in total 28 are discarded and we end up with 1,062 gesture data for 20 gesture classes.
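
A rough sketch of the two validation checks is given below; the duration bounds follow the Parameters subsection, while the amplitude bounds and the function name are placeholders, since the paper's exact values are not reproduced in this extract:

```python
import numpy as np

def validate_gesture(gesture, fs=60.0,
                     t_min=0.5, t_max=3.5,       # duration bounds in seconds
                     amp_min=0.9, amp_max=2.0):  # amplitude bounds in g (illustrative)
    """Return True if the recording passes the duration and amplitude checks."""
    duration = len(gesture) / fs
    if duration < t_min or duration > t_max:
        return False
    # Average amplitude: mean magnitude of the 3D acceleration vectors.
    avg_amplitude = np.linalg.norm(gesture, axis=1).mean()
    return amp_min <= avg_amplitude <= amp_max
```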

iii.2 Filtering

The high-frequency components in each channel are removed by means of a first-order low-pass filter $y_t = \alpha x_t + (1 - \alpha) y_{t-1}$, where $x$ and $y$ are the input and output signals of the filter, respectively, and $\alpha$ is the smoothing factor.
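
The following sketch applies a first-order low-pass filter of this form channel by channel; the smoothing factor value is an illustrative assumption, not the paper's:

```python
import numpy as np

def low_pass(x, alpha=0.3):
    """First-order low-pass (exponential smoothing) applied to one channel."""
    y = np.empty(len(x), dtype=float)
    y[0] = x[0]
    for t in range(1, len(x)):
        y[t] = alpha * x[t] + (1.0 - alpha) * y[t - 1]
    return y

def low_pass_3d(gesture, alpha=0.3):
    """Filter each of the three channels of an (N, 3) gesture independently."""
    return np.column_stack([low_pass(gesture[:, k], alpha) for k in range(3)])
```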

iii.3 Adjustment of Mean and Variance

Every gesture class is different, hence it has different characteristics. After filtering, we adjust the mean and variance of the data for each gesture class individually, so that all recordings of a gesture class share that class's own average and variance.

Since our gesture data is in 3D, we apply the mean and variance adjustments to every channel individually. We obtain a zero-average channel signal by subtracting the average of the channel. Then we get the mean-adjusted channel by adding the average of that channel over all the signals of gesture $g$.

For the variance adjustment, we obtain the variance of each channel. Then we compute the average of these variances over all the signals of gesture $g$. Finally, each gesture data of $g$ is adjusted so that all its channels share the class mean and the class variance of the gesture.
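
A possible reading of this adjustment is sketched below: every recording of one class is shifted and scaled, channel by channel, towards the class-average mean and variance. Function and variable names are illustrative:

```python
import numpy as np

def adjust_mean_variance(recordings):
    """Adjust all recordings of one gesture class so that their channels share
    the class-average mean and variance (a sketch of the adjustment step).

    recordings: list of (N_i, 3) arrays belonging to a single gesture class.
    """
    class_mean = np.mean([r.mean(axis=0) for r in recordings], axis=0)  # per channel
    class_var = np.mean([r.var(axis=0) for r in recordings], axis=0)    # per channel

    adjusted = []
    for r in recordings:
        centered = r - r.mean(axis=0)                       # zero-mean channels
        scale = np.sqrt(class_var / (r.var(axis=0) + 1e-12))
        adjusted.append(centered * scale + class_mean)      # class mean and variance
    return adjusted
```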

iii.4 Down Sampling

So far each gesture data has a different duration. We down sample each gesture data in such a way that they all have the same duration of $M$ samples. We use $M = 30$, which is the acceptable minimum duration.

If the mean- and variance-adjusted gesture data has $N$ data points, we need to use a downsampling factor of $N/M$. That is, we represent every $N/M$ consecutive data points with one data point, which is obtained by averaging them.
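
A block-averaging down-sampler along these lines could look as follows; the handling of lengths that are not exact multiples of $M$ is an implementation choice not specified in the text:

```python
import numpy as np

def downsample(gesture, m=30):
    """Down-sample an (N, 3) gesture to exactly m samples by block averaging.

    Every group of roughly N/m consecutive samples is replaced by its mean.
    """
    n = len(gesture)
    # Boundaries of m (nearly) equal-sized blocks over the N samples.
    edges = np.linspace(0, n, m + 1).astype(int)
    return np.array([gesture[edges[i]:edges[i + 1]].mean(axis=0) for i in range(m)])
```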

iii.5 Templates

For each gesture $g$, we want to generate a template $T_g$ so that a given gesture data $A$ is classified to class $g$ if $A$ is closest to $T_g$ with respect to a distance metric. We simply take the average of all the gesture data of the gesture as its template.

DTW is used as the distance metric. In order to speed up classification, the LBK technique is employed, which requires a lower bound $L_g$ and an upper bound $U_g$ for each gesture class $g$. Template generation also produces $L_g$ and $U_g$ in two steps: (i) the lower and upper bounds of each gesture data in the gesture set of $g$ are calculated for each channel individually as given in [11] using the LBK parameter; (ii) then the bounds for gesture $g$ are obtained by averaging the lower and upper bounds of the gesture data obtained in step (i).
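
The sketch below builds a class template by averaging equal-length recordings and computes the averaged LBK envelopes channel by channel; the reach parameter value is a placeholder, since the paper's LBK parameter is not reproduced here:

```python
import numpy as np

def build_template(recordings):
    """Template of a gesture class: element-wise average of its recordings,
    all already down-sampled to the same length M."""
    return np.mean(np.stack(recordings), axis=0)          # shape (M, 3)

def lbk_envelope(seq_1d, r=3):
    """LB_Keogh envelope of one channel with reach r (sliding min/max)."""
    n = len(seq_1d)
    lower = np.array([seq_1d[max(0, t - r):t + r + 1].min() for t in range(n)])
    upper = np.array([seq_1d[max(0, t - r):t + r + 1].max() for t in range(n)])
    return lower, upper

def class_bounds(recordings, r=3):
    """Per-class bounds: average the per-recording envelopes channel by channel."""
    lowers, uppers = [], []
    for rec in recordings:
        env = [lbk_envelope(rec[:, k], r) for k in range(3)]
        lowers.append(np.column_stack([e[0] for e in env]))
        uppers.append(np.column_stack([e[1] for e in env]))
    return np.mean(lowers, axis=0), np.mean(uppers, axis=0)
```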

iii.6 Warping Window Size

For each gesture class $g$, a specific sequence of warping window sizes $w_g = (w_{g,1}, \ldots, w_{g,M})$ is generated, where $w_{g,t}$ is the window size at time step $t$. Warping window size generation is based on Ratanamahatana and Keogh’s work [11]. At each step $t$, the window size $w_{g,t}$ is chosen to minimize the quality metric $Q$ of [22], described in the appendix.

iii.7 Threshold Values

The distance of a gesture data $A$ of gesture $g$ to the template $T_g$ is given as $DTW_{w_g}(A, T_g)$. We want to control this distance by the minimum and maximum of the distances observed in the training set, which are given as

$d_g^{min} = (1 - c)\, \min_{A \in S_g} DTW_{w_g}(A, T_g)$

and

$d_g^{max} = (1 + c)\, \max_{A \in S_g} DTW_{w_g}(A, T_g),$

respectively, where $S_g$ is the set of all gesture data for $g$, and $c$ is a safety constant.
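
Assuming the safety constant widens the observed minimum and maximum multiplicatively (an interpretation, not a quote of the paper), the thresholds could be computed as:

```python
def class_thresholds(recordings, template, dist, c=0.05):
    """Distance thresholds for one gesture class (a sketch).

    dist(a, b) is the DTW-based distance; c is a safety constant assumed to
    relax the observed minimum and maximum multiplicatively.
    """
    dists = [dist(rec, template) for rec in recordings]
    d_min = (1.0 - c) * min(dists)
    d_max = (1.0 + c) * max(dists)
    return d_min, d_max
```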

iii.8 DTW Template Matching

A gesture data $A$ is classified to the gesture class $g^*$ whose template distance $DTW_{w_g}(A, T_g)$, $g \in G$, is the smallest. That is,

$g^* = \arg\min_{g \in G} DTW_{w_g}(A, T_g).$

This calls for repeated evaluation of $DTW_{w_g}(A, T_g)$ for each $g$. The evaluation is sped up by means of the LBK technique using the $L_g$ and $U_g$ generated in the template generation.
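
A sketch of the resulting classification loop with LBK pruning is shown below; dtw_dist and lb_keogh stand for the distance and lower-bound routines sketched in the appendix, and are assumed to handle the 3D case by summing over channels:

```python
def classify(query, templates, lower, upper, windows, dtw_dist, lb_keogh):
    """Template matching with LB_Keogh pruning (a sketch).

    templates[g], lower[g], upper[g], windows[g] come from training;
    dtw_dist and lb_keogh are assumed helper routines.
    """
    best_g, best_d = None, float("inf")
    for g in templates:
        lb = lb_keogh(query, lower[g], upper[g])   # cheap lower bound first
        if lb >= best_d:
            continue                               # cannot beat the best so far
        d = dtw_dist(query, templates[g], windows[g])
        if d < best_d:
            best_g, best_d = g, d
    return best_g, best_d
```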

iii.9 Threshold Control

The threshold values generated previously for the winning gesture class $g^*$ are used to validate the classification result. If $d_{g^*}^{min} \le DTW_{w_{g^*}}(A, T_{g^*}) \le d_{g^*}^{max}$, then $g^*$ is the valid classification result. Otherwise, $g^*$ is discarded and the classification result is invalid.

IV Experiments and Results

There are two experiments in this study.

Figure 3: Recognition accuracy rate for each gesture class obtained in Experiment-1.

iv.1 Experiment-1

Experiment-1 is the system validation test. The data collected from sighted users for template generation is used in Experiment-1. The system is trained with the training data set and then validated with the collected data using Flow-2 given in Fig. 2. There are 1,090 gesture data in the validation set. The average classification accuracy is 96.7 %. In addition, the recognition accuracy for each gesture class is given in Fig. 3.

iv.2 Experiment-2

Table 1: All 40 calculations used in Experiment-2.

Once the templates are generated using training data from sighted users, the performance of the method for the actual target users is investigated. Experiment-2 is the end-user test: visually impaired users use the calculator to perform calculations.

For Experiment-2, a test set of 40 calculations, given in Table 1, is designed. In order to do all the calculations, the user has to enter 180 symbols in total. Note that each row of the table contains 10 digits, 4 operators and 4 “=” symbols. Therefore each symbol, except “=”, has to be entered 10 times during a test.

Experiment-2 is performed by 4 visually impaired participants who did not take part in the gesture data collection. A demo video of the application used by a visually impaired participant is available on the web [2].

We want to investigate not only one-time usage but also the adaptation of users to the system. The test spans an adaptation period of 7 days. On the first day, each participant received 10 minutes of training about the system, the gesture movements and their meanings, the phone holding position, and the voice feedback. Then, each participant did the 40 calculations once a day for 7 days. The average daily recognition accuracy is given in Fig. 4. The average recognition accuracy increases day by day, reaching 95.5 % on the seventh day.

Figure 4: Daily average recognition accuracy rate obtained from Experiment-2.

V Discussion

v.1 Comparisons

There are several research works related to the proposed method. In terms of user-independent results, uWave gives 75.4 % for 8 gestures [15], Leong et al. report 72 % for 10 gestures [14], and Akl and Valaee report 90 % for 18 gestures (among users not included in training) [5]. Note that the users in Experiment-2 did not take part in the data collection and are visually impaired. Based on the user-independent classification accuracy rates, the number of gestures, and the end-user experiment, our proposed method seems to be one of the best among previous works. There are a number of possible reasons for the high accuracy: (i) we ask users to keep the phone facing them as much as possible while they are doing the gesture, which may reduce noise; (ii) the user starts and stops the data collection by pressing a finger to the screen, so we get nothing but the gesture data; (iii) the subjects of Experiment-1 are Computer Engineering students, who may be more suited to such tests than a general audience.

Figure 5: Effects of some data processing blocks on classification accuracy. If a processing block is “off”, the corresponding box is white; if it is “on”, the box is gray.

v.2 Contribution of the Blocks

We investigate the contributions of the processing blocks of Sec. III in various combinations. One expects that each technique used contributes differently to the classification accuracy. The techniques are grouped into 5 data processing blocks: B1 filtering, B2 mean adjustment, B3 variance adjustment, B4 threshold control, and B5 use of the warping window size, which includes template generation. Note that template matching and warping window size generation are considered as one block, whereas mean and variance adjustment are considered as two separate blocks.

Fig. 5 provides the accuracy obtained by all 32 possible combinations of these 5 blocks using the data of Experiment-1. The combinations are ordered by the accuracy they achieve.

One expects that adding a new block increases the accuracy, but that is not the case. The pattern is quite complex. In some combinations, adding a new block degrades the performance, while the very same block improves the performance when added to some other combination. Block B3 “variance adjustment” is one of them. Out of the 16 possible configurations of the other four blocks, adding B3 increases the performance in only 5 of them; in the remaining 11, it decreases it. See configuration pairs 2-6 and 18-22 for performance increases, and pairs 4-0 and 13-9 for degradation.

It is noted that the effect of using block B5 “warping window” is dramatic. It is the primary reason for the step jump from the first 16 combinations on the left (including 11) to the remaining 16 combinations on the right (starting with 20) in Fig. 5. Interestingly, just by itself, it produces close to 95 % accuracy, observed at combination 16.

Block B1 “filtering” has a big impact, too. In the first 16 combinations, those without B5, the highest 6 (from 1 through 11) include B1. In the second 16 combinations, those with B5, the highest 8 combinations (from 17 through 31) use B1. Without any other blocks, B1 “filtering” and B5 “warping window” together, i.e., combination 17, manage to obtain almost 95 % accuracy.

v.3 Parameters

A number of parameters are used in the proposed system. They are generally decided empirically. Firstly, we decided to use the iPhone built-in accelerometer at 60 Hz. Then we assume that a user makes a gesture movement in 0.5 to 3.5 seconds, which equals 30 to 210 sample points at the given sampling rate. We check the minimum and maximum lengths of our data set and determine the duration bounds accordingly. If the iPhone is held in a stable position, its built-in accelerometer measures an amplitude of $1g$. A gesture is an accelerating movement whose start and stop amplitudes are therefore close to $1g$, while the range of the built-in accelerometer bounds the achievable amplitude per axis. After considering these conditions and an additional 5 % margin for the threshold, we restrict the acceptable average amplitude of a gesture to a fixed range. We use the minimum gesture length, 30 samples, as the down sampling size. Finally, in threshold value generation we use a small safety constant.

v.4 Future work

Lastly, one needs to consider points of improvement. (i) The main limitation is that the system does not work as an instant, continuous recognition system; the user triggers the start and the stop of the gesture. Turning the proposed system into a continuous gesture recognition system, which segments the data on the fly and recognizes the gesture, may be a good goal for future work. (ii) Our gesture set is selected from previous works with some new additions. This calls for research on the design of gesture sets, which could improve the accuracy rate as well as the usability of the system for the target end-users. (iii) In order to improve accuracy, additional sensors, such as a gyroscope, could be added to the system. (iv) Our proposed system generates gesture templates directly from the training set. The quality of the training set directly affects the quality of the system. How to measure and improve the quality of the training set is an open issue yet to be investigated. In our case the training set is collected from people with normal vision, but the end-user tests are done by visually impaired people. It would be interesting to see the performance of the system when the training set is also collected from the end-users, which we could not do due to lack of access to a sufficient number of visually impaired people.

VI Conclusion

The recent trend of touch-screen devices creates a barrier for visually impaired people. This calls for new human-computer interfaces. An optimized accelerometer-based gesture recognition system is introduced, which hopefully contributes to the integration of the visually impaired into society. The system is designed on a touch-screen smartphone with a built-in accelerometer, namely the iPhone 3GS. The proposed method gives 96.7 % accuracy on the training set using 20 gestures. As a proof of concept, a gesture-based simple calculator is implemented. The end-user test, done by 4 visually impaired people who did not take part in the data collection, using the calculator with 17 gestures, obtains 95.5 % accuracy. In summary, our proposed gesture recognition system provides outstanding performance in terms of user-independent recognition accuracy, end-user experimental results and the variety of the gesture set when compared to other systems in the literature.

A number of processing blocks are used and their contributions are investigated. Interestingly, one block outperforms all the others: the warping window size has the largest impact on the end result. No other individual block or combination of blocks approaches its effect.

Acknowledgements.
This work is partially supported by the Turkish State Planning Organization (DPT) under the TAM Project, 2007K120610, and by the Bogazici University Research Fund Project, BAP-2011-6017.

VII Appendix

A more formal description is given in this appendix.

VIII A1. Background

Any meaningful motion of the hand is called a hand gesture. The data captured during a gesture is called gesture data. In this study, gesture data is captured by means of an accelerometer while the user is doing her hand gesture. Therefore, the gesture data is a sequence of acceleration vectors in 3D.

viii.1 Notation

Index $k$ is exclusively used for the first, second and third dimensions of 3D. The $k$-th component of a 3D column vector $v$ is denoted by $v^{(k)}$.

An acceleration vector $a \triangleq (a^{(1)}, a^{(2)}, a^{(3)})'$ is a 3D column vector, where $\triangleq$ is used for definitions and $'$ is the transpose operator. An acceleration vector sampled at discrete time $t$ is represented as $a_t$. A sequence of acceleration vectors sampled at discrete times $t = 1, \ldots, N$ is represented as

$A \triangleq (a_1, a_2, \ldots, a_N)$

and called a gesture data. Note that $A$ is a 3D sequence.

Let $G$ be the set of gestures. Index $g$ is exclusively used to represent a gesture. $S_g$ represents the set of all gesture data for gesture $g$. Then, $S \triangleq \bigcup_{g \in G} S_g$ is the set of all gesture data.

Let $A \in S_g$ be a gesture data of gesture $g$. The average of $A$ is defined as

$\bar{a} \triangleq \frac{1}{N} \sum_{t=1}^{N} a_t .$

Note that $\bar{a}^{(k)}$ is the average of $A$ in dimension $k$. Similarly, the average of all the gesture data in $S_g$ is defined as

$\bar{a}_g \triangleq \frac{1}{|S_g|} \sum_{A \in S_g} \bar{a} ,$

where $|S_g|$ is the number of elements in $S_g$. Note that both $\bar{a}$ and $\bar{a}_g$ are 3D vectors.

Let $A$ be a gesture data to be classified. The true class of $A$ is denoted by $g_A$, whereas $\hat{g}_A$ is the class that the classifier assigns to $A$.

viii.2 Template Matching Classification

Consider $K$ classes of 1-D sequences. Let the sequence $T_g$, called a template, be the representative of class $g$. We use a template matching classifier, which classifies a sequence to the class whose template is at the minimum distance to it [6].

In this work, the dynamic time warping cost is used as the distance, which calls for a warping window [22]. The quality of the classifier is improved by choosing the warping window separately for each class rather than one for all. A final remark is needed before the dynamic time warping technique used in this study is elaborated: dynamic time warping is defined for 1-D signals, and we extend it to 3D by simple summation of the distances in each dimension, as given in Eq. 3.

viii.3 Dynamic Time Warping (DTW)

Dynamic Time Warping (DTW) is an algorithm for finding the optimal match between two given time series, which may vary in time or speed, subject to certain criteria [23]; [19]. DTW is also used for measuring the similarity distance between two sequences after finding the optimal match. Essentially, DTW operates on 1-D sequences.

Matching Cost

Let $X = (x_1, \ldots, x_N)$ and $Y = (y_1, \ldots, y_M)$ be two sequences of real numbers with lengths $N$ and $M$, respectively. Note that $X$ and $Y$ are time series in 1-D and may have different lengths. Then the DTW distance of $X$ and $Y$ is

$DTW(X, Y) \triangleq D(N, M),$

where $D(i, j)$ is the $(i, j)$ entry of an $N \times M$ matching cost matrix $D$. The entries of $D$ are recursively defined as

$D(i, j) = d(x_i, y_j) + \min\{D(i-1, j),\ D(i, j-1),\ D(i-1, j-1)\}$   (1)

for $1 \le i \le N$ and $1 \le j \le M$, where $d(x_i, y_j)$ is the local cost of matching $x_i$ to $y_j$, with boundary conditions $D(0, 0) = 0$ and $D(i, 0) = D(0, j) = \infty$ for $i, j > 0$.
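
For reference, a direct implementation of this recursion for two 1-D sequences follows; the squared local cost is an assumption, as the extract does not spell out $d(x_i, y_j)$:

```python
import numpy as np

def dtw_1d(x, y):
    """DTW distance between two 1-D sequences via the recursion in Eq. 1."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2       # assumed local cost
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```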

Lower-Bound Keogh (LBK)

The time and space complexity of DTW is $O(NM)$, where $N$ and $M$ are the lengths of $X$ and $Y$, respectively. In order to avoid unnecessary DTW calculations, a lower bounding technique called Lower-Bound Keogh (LBK), proposed by Keogh [11], is used. In the template matching classifier, we need to find the template closest to $X$. The minimum distance is found by iteratively calculating the distances to the templates and keeping the minimum. Suppose $d_{best}$ is the shortest distance so far. Then, instead of calculating the DTW distance to the next template $T_g$, we first calculate the much faster lower bound $LB(X, T_g)$. Since $LB(X, T_g) \le DTW(X, T_g)$, there is no need to calculate $DTW(X, T_g)$ as long as $LB(X, T_g) \ge d_{best}$. Only for the cases $LB(X, T_g) < d_{best}$ is $DTW(X, T_g)$ calculated. If $DTW(X, T_g) < d_{best}$, then $d_{best}$ is updated with the new lower value, that is, $d_{best} \leftarrow DTW(X, T_g)$.
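
A sketch of the LB_Keogh computation between a 1-D query and a precomputed template envelope; only points falling outside the envelope contribute to the bound, and the squared contribution mirrors the assumed local cost above:

```python
import numpy as np

def lb_keogh_1d(query, lower, upper):
    """LB_Keogh lower bound of the DTW distance between a 1-D query and a
    template whose envelope (lower, upper) was precomputed [11]."""
    q = np.asarray(query, dtype=float)
    above = np.maximum(q - upper, 0.0)   # how far the query rises above the envelope
    below = np.maximum(lower - q, 0.0)   # how far the query falls below the envelope
    return np.sum((above + below) ** 2)
```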

Warping Window

Given $X$ and $Y$, DTW produces a distance, but this may not be an acceptable match. It is possible that the expanding or shrinking goes too far, which corresponds to cases where the matching path is too far away from the diagonal [23]. This is a well studied problem. In order to avoid matching paths going too far away from the diagonal, restrictions in the form of a warping window are introduced [23]; [22]. Let $w$ be an adjustment window. Then the points more than $w$ away from the diagonal should not be used, that is, Eq. 1 is restricted to the cells with $|i - j| \le w$, and $D(i, j) = \infty$ otherwise.

The distance of $X$ to $Y$ using warping window $w$ is then denoted by $DTW_w(X, Y)$.
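
The windowed recursion can be sketched as below; a constant $w$ gives the classic Sakoe-Chiba band, while a per-index $w$ (indexed by the position in the first sequence) mimics the learned window sizes used in this work:

```python
import numpy as np

def dtw_windowed(x, y, w):
    """DTW restricted by a warping window: cell (i, j) is allowed only if
    |i - j| <= w (constant) or |i - j| <= w[i-1] (per-index window)."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        wi = int(w[i - 1]) if hasattr(w, "__len__") else int(w)
        for j in range(max(1, i - wi), min(m, i + wi) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2       # assumed local cost
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```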

Ratanamahatana-Keogh Band (RK-Band)

The shape of the window needs to be decided according to the application. One way to decide on $w$ is the Ratanamahatana-Keogh Band (RK-Band) proposed in [22]. In the RK-Band, $w$ is iteratively changed to optimize some criterion.

In our case the criterion is a metric of classification quality defined as follows. Let $S$ be a set of gestures whose elements $A$ are classified. The total distances from $A$ to the templates for correct and incorrect classifications are defined as

$D_c \triangleq \sum_{A \in S:\ \hat{g}_A = g_A} DTW_w(A, T_{\hat{g}_A})$

and

$D_i \triangleq \sum_{A \in S:\ \hat{g}_A \ne g_A} DTW_w(A, T_{\hat{g}_A}),$

respectively. The numbers of correct and incorrect classifications are

$n_c \triangleq |\{A \in S : \hat{g}_A = g_A\}| \qquad \text{and} \qquad n_i \triangleq |\{A \in S : \hat{g}_A \ne g_A\}|,$

respectively. These quantities are combined into the quality metric $Q$ of [22]: the value of $Q$ increases with a wrong classification and decreases with a correct classification and, more than that, each classification is weighted with its distance.

IX A2. Data Processing Blocks

The raw gesture data collected from the users is passed through a number of operations, represented as processing blocks. Since the gesture data is in 3D, the 1-D techniques given in Section VIII need to be extended to 3D. For example, the DTW distance in 3D is defined to be the sum of the individual DTW distances in each dimension $k$, that is,

$DTW_w(A, B) \triangleq \sum_{k=1}^{3} DTW_w(A^{(k)}, B^{(k)}),$   (3)

where $A^{(k)}$ denotes the $k$-th channel of $A$.
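
A one-line sketch of Eq. 3, reusing the windowed 1-D DTW sketch from the previous subsection:

```python
def dtw_3d(A, B, w):
    """3D DTW as in Eq. 3: the sum of 1-D windowed DTW distances computed
    independently for each of the three channels (columns of A and B).

    Assumes the dtw_windowed sketch defined above is available."""
    return sum(dtw_windowed(A[:, k], B[:, k], w) for k in range(3))
```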

ix.1 Validation

Clearly, every user has her own pace of doing a gesture. Some do the gesture fast, some do it slowly. Similarly, some users do the same gesture on a small scale, some on a large scale. We discard gesture data that is too short or too long in duration. The average amplitude of a gesture data $A$ is defined as $\frac{1}{N}\sum_{t=1}^{N} \|a_t\|$, where $\|a_t\|$ is the magnitude of $a_t$. Data sets whose average amplitude is too small or too big are also discarded. Out of 1,090 data sets, 24 due to duration and 4 due to amplitude, in total 28 are discarded, and we end up with 1,062 gesture data for 20 gesture classes.

ix.2 Low-pass Filter

The high-frequency components are removed by means of the low-pass filter $y_t = \alpha x_t + (1 - \alpha) y_{t-1}$, where $x$ and $y$ are the input and output signals of the filter, respectively, and $\alpha$ is the smoothing factor. From now on, $A$ denotes the low-pass filtered version of the raw gesture data.

ix.3 Adjustment of Mean and Variance

We adjust the mean and variance of all low-pass filtered gestures in the set $S_g$ of gesture data for each class $g$ individually. The gesture data with adjusted mean is obtained as

$\tilde{a}_t \triangleq a_t - \bar{a} + \bar{a}_g$

with $t = 1, \ldots, N$. Let $\sigma^2$ be the variance vector of $A$, where

$\sigma^{2(k)} \triangleq \frac{1}{N} \sum_{t=1}^{N} \left(a_t^{(k)} - \bar{a}^{(k)}\right)^2 .$

Then the average variance of $S_g$ is

$\sigma_g^2 \triangleq \frac{1}{|S_g|} \sum_{A \in S_g} \sigma^2 .$

Finally, transform all gesture data in $S_g$ to both mean and variance modified ones, represented by $\hat{a}_t$, where

$\hat{a}_t^{(k)} \triangleq \frac{\sigma_g^{(k)}}{\sigma^{(k)}} \left(a_t^{(k)} - \bar{a}^{(k)}\right) + \bar{a}_g^{(k)} .$

ix.4 Down Sampling

So far each gesture data has a different duration. We down sample each gesture data so that they all have the same duration of $M$ samples, which is the acceptable minimum duration. Let $\hat{A} = (\hat{a}_1, \ldots, \hat{a}_N)$ be the mean and variance adjusted gesture data with duration $N$. Then the down sampled gesture data $\check{A} = (\check{a}_1, \ldots, \check{a}_M)$ is obtained by

$\check{a}_s \triangleq \frac{1}{r} \sum_{t=(s-1)r+1}^{sr} \hat{a}_t$

for $s = 1, \ldots, M$, where $r = N/M$ is the downsampling factor.

ix.5 Templates

For each gesture $g$, we want to generate a template $T_g$ so that a given gesture data $A$ is classified to class $g$ if $A$ is closest to $T_g$ with respect to a distance metric. The set of templates is denoted by

$T \triangleq \{T_g : g \in G\}.$

The template of class $g$ is obtained by averaging all the gesture data of the gesture as

$T_g \triangleq \frac{1}{|S_g|} \sum_{A \in S_g} A .$

Besides the templates $T_g$, template generation also produces lower and upper bounds $L_g$ and $U_g$ for each gesture $g$. During classification, DTW is used as the distance metric. In order to speed it up, the LBK technique is employed, which requires $L_g$ and $U_g$ of each gesture class $g$. $L_g$ and $U_g$ are calculated in two steps: (i) the lower bound $L_A$ and upper bound $U_A$ of each $A$ in the gesture set $S_g$ are calculated for each dimension individually as given in [11] using the LBK parameter; (ii) then the lower bound of the gesture set is obtained by averaging, that is,

$L_g \triangleq \frac{1}{|S_g|} \sum_{A \in S_g} L_A .$

The upper bounds $U_g$ are defined similarly. Note that the elements of $L_g$ and $U_g$ are 3D vectors.

ix.6 Warping Window Size

For each gesture class $g$, a specific sequence of warping window sizes $w_g = (w_{g,1}, \ldots, w_{g,M})$ is generated, where $w_{g,t}$ is the window size at time step $t$. Warping window size generation is based on Ratanamahatana and Keogh’s work [11]. At each step $t$, the window size $w_{g,t}$ is chosen to minimize the quality metric $Q$ given in Section VIII.

ix.7 Threshold Values

Consider the distances of all $A \in S_g$ to the template $T_g$. The minimum and maximum of these distances give the thresholds

$d_g^{min} \triangleq (1 - c)\, \min_{A \in S_g} DTW_{w_g}(A, T_g)$

and

$d_g^{max} \triangleq (1 + c)\, \max_{A \in S_g} DTW_{w_g}(A, T_g),$

respectively, where $c$ is a safety constant.

ix.8 DTW Template Matching

A gesture data $A$ is classified to the gesture class $g^*$ whose template distance $DTW_{w_g}(A, T_g)$, $g \in G$, is the smallest. That is,

$\hat{g}_A = g^* \triangleq \arg\min_{g \in G} DTW_{w_g}(A, T_g).$

This calls for repeated evaluation of $DTW_{w_g}(A, T_g)$ for each $g$. The evaluation is sped up by means of the LBK technique using the $L_g$ and $U_g$ generated in the template generation.

ix.9 Threshold Control

The threshold values generated previously for the winning gesture class $g^*$ are used to validate the classification result. If $d_{g^*}^{min} \le DTW_{w_{g^*}}(A, T_{g^*}) \le d_{g^*}^{max}$, then $\hat{g}_A = g^*$ is the valid classification result. Otherwise, $g^*$ is discarded and the classification result is invalid.

References

  1. https://bitbucket.org/sfoster/iphone-tts/.
  2. https://vimeo.com/26196932.
  3. https://github.com/ereneld/accelerometerbasedcalculatorios, 2016.
  4. A. Akl, C. Feng, and S. Valaee. A novel accelerometer-based gesture recognition system. IEEE Transactions on Signal Processing, 59(12):6197–6205, 2011.
  5. A. Akl and S. Valaee. Accelerometer-Based Gesture Recognition Via Dynamic-Time Warping, Affinity Propagation, & Compressive Sensing. In Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pages 2270–2273. IEEE, 2010.
  6. E. Alpaydin. Introduction to machine learning. The MIT Press, 2 edition, 2010.
  7. A. Black. Flite: a small fast run-time synthesis engine. Workshop (ITRW) on Speech Synthesis, pages 4–9, 2001.
  8. J. Deutsch, M. Borbely, J. Filler, K. Huhn, and P. Guarrera-Bowlby. Use of a Low-Cost, Commercially Available Gaming Console (Wii) for Rehabilitation of an Adolescent With Cerebral Palsy. Physical Therapy, 88(10):1196–1207, 2008.
  9. D. Erenel. Accelerometer Based Calculator For Visually-Impaired People Using Mobile Devices. Master’s thesis, Bogazici University, 2011.
  10. J. Kela, P. Korpipää, J. Mäntyjärvi, S. Kallio, G. Savino, L. Jozzo, and D. Marca. Accelerometer-based gesture control for a design environment. Personal and Ubiquitous Computing, 10(5):285–299, Aug. 2006.
  11. E. Keogh and C. A. Ratanamahatana. Exact indexing of dynamic time warping. Knowledge and Information Systems, 7(3):358–386, May 2004.
  12. M. Klingmann. Accelerometer-Based Gesture Recognition with the iPhone. Ms thesis, Goldsmiths University of London, 2009.
  13. L. Kratz and M. Smith. Wiizards: 3d gesture recognition for game play input. Future Play 2007, pages 209–212, 2007.
  14. T. Leong, J. Lai, J. Panza, P. Pong, and J. Hong. Wii Want to Write: An Accelerometer Based Gesture Recognition System. In International Conference on Recent and Emerging Advanced Technologies in Engineering, pages 4–7, 2009.
  15. J. Liu, Z. Wang, L. Zhong, J. Wickramasuriya, and V. Vasudevan. uWave: Accelerometer-based personalized gesture recognition and its applications. In 2009 IEEE International Conference on Pervasive Computing and Communications, pages 1–9. IEEE, Mar. 2009.
  16. J. Liu, L. Zhong, J. Wickramasuriya, and V. Vasudevan. uWave: Accelerometer-based personalized gesture recognition and its applications. Pervasive and Mobile Computing, 5:657–675, Mar. 2009.
  17. J. Mäntyjärvi, J. Kela, P. Korpipää, and S. Kallio. Enabling fast and effortless customisation in accelerometer based gesture interaction. In Proceedings of the 3rd international conference on Mobile and ubiquitous multimedia - MUM ’04, pages 25–31, New York, New York, USA, Oct. 2004. ACM Press.
  18. V.-M. Mantyla, J. Mantyjarvi, T. Seppanen, and E. Tuulari. Hand gesture recognition of a mobile device user. In Multimedia and Expo, 2000. ICME 2000. 2000 IEEE International Conference on, volume 1, pages 281–284. Ieee, 2000.
  19. C. Myers and L. Rabiner. A comparative study of several dynamic time-warping algorithms for connected-word recognition. The Bell System Technical Journal, 60(7):1389–1409, 1981.
  20. Z. Prekopcsák. Accelerometer based real-time gesture recognition. In Proceedings of the 12th International Student Conference on Electrical Engineering, Prague, Czech Republic. Citeseer, 2008.
  21. T. Pylvänäinen. Accelerometer based gesture recognition using continuous HMMs. Pattern Recognition and Image Analysis, 3522:413–430, 2005.
  22. C. Ratanamahatana and E. Keogh. Making time-series classification more accurate using learned constraints. In Proceedings of SIAM International Conference on Data Mining, pages 11–22. Lake Buena Vista, Florida, 2004.
  23. H. Sakoe and S. Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1):43–49, Feb. 1978.
  24. T. Schlömer, B. Poppinga, N. Henze, and S. Boll. Gesture recognition with a Wii controller. Proceedings of the 2nd international conference on Tangible and embedded interaction - TEI ’08, page 11, 2008.
  25. STMicroelectronics. LIS302DL MEMS motion sensor datasheet, 2008.
  26. J. Wu, G. Pan, D. Zhang, G. Qi, and S. Li. Gesture Recognition with a 3-D Accelerometer. Ubiquitous Intelligence and Computing, pages 25–38, 2009.