|Work||Sensor(s)||Identified items||Access|
|PIN Skimmer ||Camera, Mic||PINs||in-app|
|PIN Skimming ||Light||PINs||in-app|
|Keylogging by Mic ||Mic||Keyboard, PINs||in-app|
|ACCessory ||Acc||Keyboard, Area taps||in-app|
|Tapprints ||Acc, Gyr||Keyboard, Icon taps||in-app|
|Acc side channel ||Acc||PINs, Patterns||in-app|
|Motion side channel ||Acc, Gyr||Keyboard, PINs||in-app|
|TapLogger ||Acc, Ori||PINs||in-app|
|TouchSignatures||Motion, Ori||Touch actions, PINs||in-browser|
1.1 Access to mobile sensors within app
Sensor-rich mobile devices are becoming ubiquitous. Smartphones and tablets are increasingly equipped with a multitude of sensors such as GPS, gyroscope, compass, and accelerometer. Data provided by such sensors, combined with the growing computational capabilities of modern mobile devices, enable richer, more personalised, and more usable apps on such devices. On the other hand, access to the sensor streams provides an app running in the background with a side channel. Listening to mobile sensor data via a background process, whether for improving user security [13, 11, 6, 19, 12, 20, 7, 23] or attacking it [8, 24, 15, 18, 15, 14], has long been of interest to researchers.
Listening to the sensor data through a malicious background process may enable the app to compromise user security. In Table 1, we briefly summarise the existing in-app sensor-based password/PIN identifiers. Some of the works in Table 1 identify PINs and passwords by using sensors such as light, camera, and microphone [22, 21, 17]. In this paper, we are interested in the use of the accelerometer and gyroscope sensors as a side channel to learn users' PINs and passwords [15, 18, 5, 8, 24].
1.2 Access to mobile sensors within browser
W3C specifications discuss security and privacy issues for some mobile sensors such as GPS and light. For example, the working draft on ambient light events explicitly discusses security and privacy considerations as follows: “The event defined in this specification is only fired in the top-level browsing context to avoid the privacy risk of sharing the information defined in this specification with contexts unfamiliar to the user. For example, a mobile device will only fire the event on the active tab, and not on the background tabs or within iframes”. The geolocation API, on the other hand, requires explicit user permission to grant access to the web app due to security and privacy considerations.
1.3 Access to mobile sensors within app vs. browser
The in-browser sensor data access that the W3C specification allows is heavily restricted in multiple ways. First, the access is restricted to only two types of streams: the device orientation, which supplies the physical orientation of the device, and the device motion, which represents the acceleration of the device. Motion data include sequences from the accelerometer, the accelerometer including gravity, and the rotation rate. The orientation sensor, on the other hand, derives its data by processing the raw sensor data from the accelerometer and the geomagnetic field sensor (see http://developer.android.com/guide/topics/sensors/sensors_position.html#sensors-pos-orient).
More importantly, access is also restricted to low-rate streams which provide data with slower frequencies as compared to those provided in-app. Here, we present two tables (Tables 2 and 3) on sampling frequencies on different platforms and popular browsers.
|Device/OS||Motion Freq. (Hz)||Orientation Freq. (Hz)|
|Nexus 5/Android 5.0.1||200||200|
|iPhone 5/iOS 8.2||100||100|
As can be seen in Table 2, iOS and Android limit the mentioned sensors' maximum sampling rates to 100 Hz and 200 Hz, respectively. However, the hardware is capable of sampling the sensor signals at much higher frequencies (up to thousands of Hz). This limit is imposed to reduce power consumption. Moreover, according to the results of our tests in Table 3, we found that all currently available versions of the different mobile browsers reduce the sampling rate even further, by a factor of 3 to 5, regardless of the engine they use (WebKit, Blink, Gecko, etc.). Our observations on the sampling rates of different mobile browsers are mostly consistent with previously reported results.
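The in-browser rates in Table 3 can be checked on any device by timing the sensor events themselves. A minimal sketch (the `estimateRateHz` helper is our own, not part of any browser API):

```javascript
// Estimate the sampling rate (Hz) from a list of event timestamps (ms).
function estimateRateHz(timestampsMs) {
  if (timestampsMs.length < 2) return 0;
  const spanMs = timestampsMs[timestampsMs.length - 1] - timestampsMs[0];
  return (timestampsMs.length - 1) / (spanMs / 1000);
}

// In a browser, timestamps would come from 'devicemotion' events:
//   const ts = [];
//   window.addEventListener('devicemotion', () => ts.push(performance.now()));
//   setTimeout(() => console.log(estimateRateHz(ts).toFixed(1), 'Hz'), 2000);
```

Running this in each browser of Table 3 reproduces the measurement methodology described above.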
|Browser||Motion Freq. (Hz)||Orientation Freq. (Hz)|
In this work, we initiate the first study on the possibility of attacks compromising user security via web content, and demonstrate weaknesses in W3C standards, and also mobile OS and browser policies which leave the door open for such exploits. In particular, the main contributions of this work are as follows:
2 Examining mobile browsers
Table 4 shows the results of our tests as to whether each browser provides access to device motion and orientation sensor data under different conditions. The column(s) list the device, mobile OS (mOS), and browser combination under which the test was carried out. In the case of multiple versions of the same browser, as for Opera and Opera Mini, we list them all as one family in a single bundle since we found that they behave similarly in terms of granting access to the sensor data with which we are concerned. The “yes” indications under “active/same” show that all browsers provide access to the mentioned sensor data if the browser is active and the user is working in the same tab as the one in which the code listening to the sensor data resides. This represents the situation in which there is perhaps a common understanding that the code should have access to the sensor data. In all other cases, as we discuss below, access to the sensor data provides a possible leakage vector through which attacks can be mounted against user security. In the following we give more details on these results.
Browser-active iframe access
Through experiments, we found that all the browsers under test provided access to the sensor data streams in this case. The findings are listed in the column under “active/iframe” in Table 4 indicating such an access.
Browser-active different-tab access
We emphasise that none of the tested browsers (on Android or iOS) asked for any user permissions to access the sensor data when we installed them or while performing the experiments.
The above findings suggest possible attack vectors through which malicious web content may gather information about user activities and hence breach user security. In particular, browser-active iframe access enables active web content embedded in HTML frames, e.g. posing as an advertisement banner, to discreetly record the sensor data and determine how the user is interacting with other segments of the host page. Browser-active different-tab access enables active web content that was opened previously and remains in an inactive tab to eavesdrop on the sensor data describing how the user is interacting with the web content in other tabs. Browser-in-background and screen-locked access enable active web content that remains open in a minimised browser to eavesdrop on the sensor data describing how the user is interacting with other apps, and on the user's actions while carrying the device.
To show the feasibility of our security attack, in the following sections, we will demonstrate that, with advanced machine learning techniques, we are able to distinguish the user’s touch actions and PINs with high accuracy when the user is working with a mobile phone.
Each user touch action, such as clicking, scrolling, and holding, and even tapping characters on the mobile soft keyboard, induces device orientation and motion traces that are potentially distinguishable from those of other touch actions. Identification of such touch actions may reveal a range of activities about user’s interaction with other webpages or apps, and ultimately their PINs. A user’s touch actions may reveal what type of web service the user is using as the patterns of user interaction are different for different web services: e.g., users tend to mostly scroll on a news web site, while they tend to mostly type on an email client. On known web pages, a user’s touch actions might reveal which part of the page the user is more interested in. Combined with identifying the position of the click on a page, which is possible through different signatures produced by clicking different parts of the screen, the user’s input characters could become identifiable. This in turn reveals what the user is typing on a page by leveraging the redundancy in human languages, or it may dramatically decrease the size of the search space to identify user passwords.
3.2 In-browser sensor data detail
The sensor data streams available as per the W3C specifications, i.e., device motion and orientation, are as follows:
device orientation which provides the physical orientation of the device, expressed as three rotation angles: alpha, beta, and gamma, in the device’s local coordinate frame,
device acceleration which provides the physical acceleration of the device, expressed in Cartesian coordinates: x, y, and z, in the device’s local coordinate frame,
device acceleration-including-gravity which is similar to acceleration except that it includes gravity as well,
device rotation rate which provides the rotation rate of the device about the local coordinate frame, expressed as three rotation angles: alpha, beta, and gamma, and
interval which provides the constant rate with which motion-related sensor readings are provided, expressed in milliseconds.
The device’s local coordinate frame is defined with reference to the screen in its portrait orientation: x is horizontal in the plane of the screen from left of the screen towards right; y is vertical in the plane of the screen from the bottom of the screen towards up; and z is perpendicular to the plane of the screen from inside the screen towards outside. Alpha indicates device rotation around the z axis, beta around the x axis, and gamma around the y axis, all in degrees.
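The streams above are delivered through the standard `devicemotion` and `deviceorientation` events. The sketch below (buffer layout and function names are our own) collects the 12 sequences described in this section; the handlers only read the standard event fields, so they can be exercised with plain objects:

```javascript
// Buffers for the 12 captured sequences (3 axes for each of the 4 measurements).
const traces = {
  acc:  { x: [], y: [], z: [] },            // acceleration
  accG: { x: [], y: [], z: [] },            // acceleration including gravity
  rot:  { alpha: [], beta: [], gamma: [] }, // rotation rate
  ori:  { alpha: [], beta: [], gamma: [] }  // orientation angles
};

function recordMotion(e) {
  for (const axis of ['x', 'y', 'z']) {
    traces.acc[axis].push(e.acceleration[axis]);
    traces.accG[axis].push(e.accelerationIncludingGravity[axis]);
  }
  for (const angle of ['alpha', 'beta', 'gamma']) {
    traces.rot[angle].push(e.rotationRate[angle]);
  }
}

function recordOrientation(e) {
  for (const angle of ['alpha', 'beta', 'gamma']) {
    traces.ori[angle].push(e[angle]);
  }
}

// In a browser this would be wired up as:
//   window.addEventListener('devicemotion', recordMotion);
//   window.addEventListener('deviceorientation', recordOrientation);
```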
3.3 Application Implementation
3.4 Feature extraction
In this section, we discuss the features we extract to construct the feature vector which subsequently will be used as the input to the classifier. We consider both time domain and frequency domain features. The captured data include 12 sequences: acceleration, acceleration-including-gravity, orientation, and rotation rate, with three sequences for each sensor measurement. Before extracting features, to cancel out the effect of the initial position and orientation of the device, we subtract the initial value in each sequence from subsequent values in the sequence.
Time domain features
In the time domain, we consider both the raw captured sequences and the (first order) derivative of each sequence. The rationale is that each sequence and its derivative include complementary information on the touch action. To calculate the derivative, since we have low-frequency sequences, we employ the basic method of subtracting each value from the value appearing immediately afterwards in the sequence. That is, if the sequence values are represented by $v_1, v_2, \ldots, v_n$, the derivative sequence is defined as $v'_i = v_{i+1} - v_i$ for $i = 1, \ldots, n-1$.
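The preprocessing step from Section 3.4 (subtracting the initial reading) and the derivative can be transcribed directly (function names are ours):

```javascript
// Subtract the first reading so each sequence starts at zero, cancelling
// the effect of the device's initial position and orientation.
function subtractInitial(seq) {
  const v0 = seq[0];
  return seq.map(v => v - v0);
}

// First-order derivative: d[i] = v[i+1] - v[i].
function derivative(seq) {
  const d = [];
  for (let i = 0; i + 1 < seq.length; i++) d.push(seq[i + 1] - seq[i]);
  return d;
}
```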
For the device acceleration sequences, we furthermore consider the Euclidean distance between consecutive readings as a representation of the change in device acceleration. This is simply calculated as the following sequence: $a_i = \sqrt{(x_{i+1}-x_i)^2 + (y_{i+1}-y_i)^2 + (z_{i+1}-z_i)^2}$ for $i = 1, \ldots, n-1$, where $(x_i, y_i, z_i)$ is the $i$-th acceleration reading.
This gives us a sequence which we call the device acceleration change sequence, or DAC sequence for short.
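The DAC sequence computation can be sketched as follows (function name ours):

```javascript
// DAC sequence: Euclidean distance between consecutive acceleration
// readings along the x, y and z axes.
function dacSequence(x, y, z) {
  const dac = [];
  for (let i = 0; i + 1 < x.length; i++) {
    dac.push(Math.hypot(x[i + 1] - x[i], y[i + 1] - y[i], z[i + 1] - z[i]));
  }
  return dac;
}
```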
First we consider basic statistical features for all sequences, their derivatives, and the DAC sequence. These features include the maximum, minimum, and mean (average) of each sequence and its derivative, plus those of the DAC sequence. We also consider the total energy of each sequence and its derivative, plus that of the DAC sequence, calculated as the sum of the squared sequence values, i.e., $E = \sum_i v_i^2$. Here, in total we get 102 features for each sensor reading in the time domain. Later we will add a few more features to the input of the first phase (touch actions) in Section 4.3.
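The per-sequence statistics can be computed in a few lines (function name ours):

```javascript
// Basic time-domain features of one sequence: max, min, mean and total
// energy (sum of squared values), as described in Section 3.4.
function timeFeatures(seq) {
  return {
    max: Math.max(...seq),
    min: Math.min(...seq),
    mean: seq.reduce((s, v) => s + v, 0) / seq.length,
    energy: seq.reduce((s, v) => s + v * v, 0)
  };
}
```

Applying this to each of the 12 sequences, their derivatives, and the DAC sequence yields the time-domain part of the feature vector.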
Frequency domain features
To distinguish between sequences with different frequency contents, we apply the Fast Fourier transform (FFT) to the sequences. We calculate the maximum, minimum, mean, and energy of the FFT of each sequence and consider them as our frequency domain features, i.e., a total of 48 frequency domain features.
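A sketch of the spectral features follows; for clarity it uses a naive O(n²) discrete Fourier transform rather than a real FFT library, which is adequate for the short traces involved (function names ours):

```javascript
// Magnitudes of the discrete Fourier transform of a real-valued sequence.
function dftMagnitudes(seq) {
  const n = seq.length, mags = [];
  for (let k = 0; k < n; k++) {
    let re = 0, im = 0;
    for (let t = 0; t < n; t++) {
      const ang = (-2 * Math.PI * k * t) / n;
      re += seq[t] * Math.cos(ang);
      im += seq[t] * Math.sin(ang);
    }
    mags.push(Math.hypot(re, im));
  }
  return mags;
}

// Frequency-domain features: max, min, mean and energy of the spectrum.
function freqFeatures(seq) {
  const m = dftMagnitudes(seq);
  return {
    max: Math.max(...m),
    min: Math.min(...m),
    mean: m.reduce((s, v) => s + v, 0) / m.length,
    energy: m.reduce((s, v) => s + v * v, 0)
  };
}
```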
3.5 Classification method
To decide which classification method to apply to our data, we implemented various classification algorithms to assess their efficiency. Our test classifiers included discriminant analysis, naive Bayes, classification tree, kNN, and ANN. Different classifiers work better in the different phases of TouchSignatures (touch actions and PINs). The chosen classifiers in each phase are presented in Sections 4.3 and 5.3. In both phases, we consider a generic approach and train our algorithms with the data collected from all users. Hence, our results are user-independent.
4 Phase 1: Identifying user touch actions
In this section we present the first phase of TouchSignatures, which is able to distinguish user touch actions given access to the device orientation and motion sensor data provided by a mobile browser.
4.1 Touch actions set
We consider a set of the 8 most common touch actions through which users interact with mobile devices. These actions include: click, scroll (up, down, right, left), zoom (in, out), and hold. They are presented in Table 5 along with their corresponding descriptions. Our experiments show that by applying machine learning techniques these actions are recognisable from their associated sensor measurements.
|Click||Touching an item momentarily with one finger|
|Scroll||Touching continuously and simultaneously sliding|
|– up, down, right, left||in the corresponding direction|
|Zoom||Placing 2 fingers on the screen and sliding them|
|– in, out||apart or toward each other, respectively|
|Hold||Touching continuously for a while with one finger|
We collected touch action samples from 11 users (university staff and students) using Google Chrome on an iPhone 5. We presented each user with a brief description of the project as well as the instruction to perform each of the 8 touch actions. The users were provided with the opportunity of trials before the experiment to get comfortable using the web browser on the mobile phone. They also could ask any question before and during the experiments. We asked the user to remain sitting on a chair in an office environment while performing the tasks. The provided GUI instructed the user to perform a single touch action in each step, collecting 5 samples for each touch action in successive steps with a three-second wait between steps. During the experiment, the user was notified of her progress in completing the expected tasks by the count of touch actions in an overall progress bar, as shown in Figure 3 (left).
Data were collected from each user in two settings: one-hand mode and two-hand mode. In the one-hand mode, we asked the users to hold the phone in one hand, and use the same hand’s thumb for touching the screen. In the two-hand mode, we asked them to use both hands to perform the touch actions. With these two settings, we made sure that our collected data set is a combination of different modes of phone usage. Note that zoom in/out actions can only be performed in the two-hand mode. Still, we distinguish two postures: 1) when a user holds the phone using one hand and performs zoom in/out actions by using the thumb of that hand and any finger of the other hand, and 2) when a user holds the phone using both hands and performs zoom in/out by using the thumbs of both hands. We collected data for both postures.
We had 10 samples of each of the following actions: click, hold, scroll down, scroll up, scroll right, and scroll left. Five samples were collected in the one-hand mode and 5 in the two-hand mode. In addition, we collected 10 samples for each of the following two actions: zoom in and zoom out. All 10 samples were collected in the two-hand mode, with half for each of the two postures. Each user’s output was a set of 80 samples. With 11 users, we ended up with 880 samples for our set of touch actions. The experiment took each user on average about 45 minutes to complete. Each user received a £10 Amazon voucher for their contribution to the work.
4.3 Classification algorithm
Before discussing the algorithms used in this phase, we add another 14 features to the TouchSignatures time domain features. To differentiate between touch actions with a longer “footprint” and those with a shorter footprint, we consider a feature which represents the length (i.e., number of readings) of each dimension of the acceleration and acceleration-including-gravity sequences that contains 70% of the total energy of the sequence. To calculate this length, we first find the “centre of energy” of the sequence as $c = \frac{1}{E}\sum_i i\,v_i^2$, where $E$ is the total energy as calculated before. We then consider intervals centred at $c$ and find the shortest interval containing 70% of the total energy of the sequence. Therefore, considering both time domain and frequency domain features from Section 3.4 in addition to the new ones, the TouchSignatures final vector for phase one has 164 features in total.
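One way to compute the footprint-length feature is sketched below; the symmetric expansion around the centre of energy is our reading of the description above, and the function names are ours:

```javascript
// Centre of energy of a sequence: c = (sum_i i*v_i^2) / (sum_i v_i^2).
function centreOfEnergy(seq) {
  let e = 0, weighted = 0;
  seq.forEach((v, i) => { e += v * v; weighted += i * v * v; });
  return weighted / e;
}

// Length of the shortest interval centred at c that contains `frac` of
// the total energy (the 70% "footprint" feature).
function energyFootprint(seq, frac = 0.7) {
  const e = seq.reduce((s, v) => s + v * v, 0);
  const c = Math.round(centreOfEnergy(seq));
  for (let r = 0; r < seq.length; r++) {
    const lo = Math.max(0, c - r), hi = Math.min(seq.length - 1, c + r);
    let part = 0;
    for (let i = lo; i <= hi; i++) part += seq[i] * seq[i];
    if (part >= frac * e) return hi - lo + 1;
  }
  return seq.length;
}
```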
Our evaluations show that the $k$-nearest neighbour ($k$-NN) algorithm gives the best overall identification rate for our data. $k$-NN is a type of lazy learning in which each object is assigned to the class to which the majority of its $k$ nearest neighbours belong, i.e., each feature vector is assigned the label of the majority of the $k$ nearest training feature vectors. A distance function is used to decide the nearest neighbours. The most common distance function is the Euclidean distance, but there are other distance functions such as the city block distance (a.k.a. Manhattan or taxicab distance). For two given feature vectors $u = (u_1, \ldots, u_n)$ and $v = (v_1, \ldots, v_n)$, the Euclidean distance is defined as $\sqrt{\sum_{i=1}^{n}(u_i - v_i)^2}$ and the city block distance as $\sum_{i=1}^{n}|u_i - v_i|$.
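The two distance functions transcribe directly (function names ours):

```javascript
// Euclidean distance between two feature vectors of equal length.
function euclidean(u, v) {
  return Math.sqrt(u.reduce((s, ui, i) => s + (ui - v[i]) ** 2, 0));
}

// City block (Manhattan) distance between two feature vectors.
function cityBlock(u, v) {
  return u.reduce((s, ui, i) => s + Math.abs(ui - v[i]), 0);
}
```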
Based on the results of our evaluations, we decide to use two classifiers in two stages. In the first stage, the data is fed to the first classifier which is a 1-NN classifier using Euclidean distance. This classifier is responsible for classification of the input data into 5 categories: click, hold, zoom in, zoom out, and scroll. In the second stage, if the output of the first stage is scroll, then the data is fed into the second classifier which is a 1-NN classifier using city block distance. This classifier is responsible for classification of a scroll into one of the 4 categories: scroll up, scroll down, scroll right, and scroll left. We used a 10-fold cross validation approach for all the experiments.
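The two-stage decision described above can be sketched as follows; the training sets and 2-D feature vectors below are toy placeholders for illustration, not our actual 164-feature vectors:

```javascript
// 1-NN: return the label of the single nearest training vector under `dist`.
function nearestLabel(train, x, dist) {
  let best = Infinity, label = null;
  for (const { vec, label: l } of train) {
    const d = dist(vec, x);
    if (d < best) { best = d; label = l; }
  }
  return label;
}

const euclid = (u, v) => Math.sqrt(u.reduce((s, ui, i) => s + (ui - v[i]) ** 2, 0));
const cityBlk = (u, v) => u.reduce((s, ui, i) => s + Math.abs(ui - v[i]), 0);

// Stage 1 (Euclidean 1-NN) separates {click, hold, zoom in, zoom out, scroll};
// stage 2 (city block 1-NN) refines 'scroll' into its four directions.
function classifyTouchAction(stage1Train, stage2Train, x) {
  const coarse = nearestLabel(stage1Train, x, euclid);
  return coarse === 'scroll' ? nearestLabel(stage2Train, x, cityBlk) : coarse;
}
```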
In this section we show the results obtained from the cross validation of the collected user data by presenting the identification rates and confusion matrices for both classifiers.
Considering all scrolls (up, down, right, and left) in one category, the overall identification rate is 87.39%.
Table 6 shows the confusion matrix for our first classifier. In each cell, the matrix lists the probability that the classifier correctly labels or mislabels a sample in a category. The actual and classified categories are listed in the columns and rows of the table, respectively. As shown in Table 6, the worst results are for the pair of click and hold (10.9% and 5.45%), and the pair of zoom in and zoom out (25.45% and 20.9%). This is expected since click and hold are very similar actions: hold is basically equivalent to a long click. Zoom in and zoom out also require the user to perform similar gestures. Another significant value is the classifier's confusion between click and scroll (7.27%, 2.73%), which again is not surprising since scroll involves a gesture similar to a click. Apart from the mentioned cases, the rest of the confusion probabilities are nearly negligible.
Table 7 shows the identification rates and confusion matrix for our second classifier. Overall, our second classifier is able to correctly identify the scroll type with a success rate of 61.59%. The classifier mostly mislabels within the pairs (down, up) and (right, left), which is somewhat expected since they involve similar gestures.
The obtained results show that attacks on user privacy and security by eavesdropping on sensor data through web content are feasible and can achieve accurate results. Further security risks would arise if the attack tried to identify which character has been pressed on the touch screen. In phase 2 of TouchSignatures, we show that such an attack can indeed succeed, by identifying the digits entered for the user's PINs.
5 Phase 2: Identifying user PINs
|Attribute||iPhone 5||Nexus 5|
|screen.width||320 pixels||360 pixels|
|screen.height||568 pixels||640 pixels|
5.1 Digits set
In this work, we consider a numerical keypad and leave the attack on the full keyboard as future work. A numerical keypad includes a set of 10 digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9, and a few more characters such as -, ., and #, depending on the mobile OS. For example, Figure 3 (centre) shows a numerical keypad on an Android device. The idea is to identify the pressed digits in a PIN. Hence, at a high level, once the first phase of TouchSignatures detects that the user is “clicking” digits on a soft keyboard, the second phase is started in order to obtain the entered digits.
Similar to the first experiment, we asked a group of 12 users (university students and staff) to participate in our experiment, which had two parts. The first part was on an iPhone 5 and the second part was on a Nexus 5, both using Chrome. After giving a brief description of the study to the users, we presented them with a simple GUI (Figure 3, centre) asking them to enter 25 random 4-digit PINs on both devices. The 4-digit PINs were designed in a way that each digit was repeated exactly 10 times in total. After entering each 4-digit PIN, the user could press a next button to go to the next PIN. They also could keep track of their progress, as the number of PINs they had entered so far was shown on the page.
In this experiment, we asked the users to remain sitting on a chair and hold the phone in the way that they felt comfortable. The collected data contained a mixture of one-hand mode and two-hand mode records. In the one-hand mode, the user pressed the digits with one of the fingers of the same hand with which they were holding the phone. In the two-hand mode, they pressed the digits with either the free hand, or both hands. We had 10 samples of each digit for each user. Since we had 10 digits, each user’s output was a set of 100 samples for each device. With 12 users, the input of our classifiers was 1200 records for iPhone 5 and 1200 records for Nexus 5. It took each user 2 minutes on average to complete each part of the experiment with preparation and explanations. It took each user less than 10 minutes to finish the whole experiment.
5.3 Classification algorithm
Among different classification methods, we observed that ANN (Artificial Neural Network) works significantly better than other classifiers on our dataset. A neural network system for recognition is defined by a set of input neurons (nodes) which can be activated by the information of the intended object to be classified. The input can be either raw data, or pre-processed data from the samples. In our case, we have preprocessed our samples by building a feature vector as described in Section 3.4. Therefore, as input, TouchSignatures’ ANN system receives a set of 150 features for each sample.
A neural network can have multiple layers and a number of nodes in each layer. Once the first layer of the nodes receives the input, ANN weights and transfers the data to the next layer until it reaches the output layer which is the set of the labels in a classification problem. For better performance and to stop training before over-fitting, a common practice is to divide the samples into three sets: training, validation, and test sets.
We trained a neural network with 70% of our data, validated it with 15% of the records and tested it with the remaining 15% of our data set. We trained our data by using pattern recognition/classifying network with one hidden layer and 10,000 nodes. Pattern recognition/classifying networks normally use a scaled conjugate gradient (SCG) back-propagation algorithm for updating weight and bias values in training. SCG  is a fast supervised learning algorithm based on conjugate directions. The results of the second phase of TouchSignatures are obtained according to these settings.
|Nexus 5 (Ave. iden. rate: 70%)||iPhone 5 (Ave. iden. rate: 56%)|
Here, we present the output of the suggested ANN for the Nexus 5 and iPhone 5 separately. Table 9 shows the accuracy of the ANN in classifying the digits, presented in two parts for the two devices. The average identification rates for the Nexus 5 and iPhone 5 are 70% and 56%, respectively. In general, the resolution of the data sequences on Android was higher than on iOS. We recorded about 37 motion and 20 orientation measurements for a typical digit on Android, while there were only 15 for each sequence on iOS. This can explain the better performance of TouchSignatures on Android. It is worth mentioning that the attack on the iPhone 5 in fact operates at the lowest sampling rates we observed in Table 3 (20 Hz for both motion and orientation). Interestingly, even with readings at the lowest available sampling rate, the attack is still possible.
In Tables 10 and 11, we show the identification results for each digit (bold in each cell), as well as the confusion matrices, for both devices. The general forms of the tables follow the Android and iOS numpads. As demonstrated, each digit is presented with all the digits it can be misclassified as. As can be observed, most misclassified cases are either in the same row or column as, or in the neighbourhood of, the expected digit.
Note that the probability of success in finding the actual digit will significantly improve with more tries at guessing the digit. In fact, while the chance of the attack succeeding is relatively good on the first guess, it increases on further guesses as shown in Tables 12 and 13. Figure 4 shows the average identification rates based on the number of guesses in Nexus 5 and iPhone 5 compared to random guessing. As shown on the figure, TouchSignatures can predict the correct touched digits on average in almost 90% of the cases on Nexus 5 and 80% of the cases on iPhone 5 in the third guess.
5.5 Comparison with related works
|TapLogger ||Acc, Orientation||36.4%||in-app|
In this section we compare the second phase of TouchSignatures, the identification of PIN digits, with previous in-app sensor-based PIN identifiers. Among the works described in Table 1, we choose to compare TouchSignatures with TouchLogger , and TapLogger , since they use similar sensors for identifying digits on soft numerical keyboards.
TapLogger performs its experiments on Android devices and identifies 36.4% of the digit positions in the first attempt by using accelerometer and orientation sensors. TouchLogger, on the other hand, is able to identify the digits with 71.5% accuracy on an Android device by using device orientation.
TouchLogger collects around 30 samples per digit from one user, while TapLogger has the input of one user for 20 random 16-digit sequences in 60 rounds. However, we noticed that in these works the data has been collected from only one user. In general, data obtained from a single user are more consistent than those collected from a diverse group of users. To verify this, we performed another experiment simulating the same test conditions as described above on the Android device (Nexus 5), and asked a single user to repeat the experiment 3 times. We collected 30 samples for each digit. The results are presented in Table 14. As expected, the identification rate of TouchSignatures increased to 77% in this setting, which is better than the results reported for TapLogger and TouchLogger.
6 Possible solutions
To be able to suggest appropriate countermeasures, we first need to identify the exact entity responsible for the access control policy in each situation. The mobile OS access control policy decides whether the browser gets access to the device motion and orientation sensor data in the first place, no matter whether the browser is active or not. If access is provided, the mobile browser access control policy then decides whether a web app gets access to the sensor data, no matter whether the web app is open in the same tab and in the same segment, in the same tab but in a different segment, or in a different tab. Hence any effective countermeasure must involve changes to both mobile OS and browser policies with respect to access to such sensor data.
One approach to protecting user security would be to require the mobile OS to deny the browser access altogether when the browser is not active, and to require the browser to deny web content access altogether when it is running in an inactive tab or in a segment of the page with a different web origin. However, this approach may be considered too restrictive, as it would disallow many potential web applications such as activity monitoring for health and gaming.
A more flexible approach would be to notify the user when a web page is requesting access to such sensor data, and provide control mechanisms through which the user is able to set their preferences with respect to such requests. This is the approach currently taken by both the mobile operating systems and browsers with respect to providing access to the device location (i.e., GPS sensor data ) when a web page requests such access. We believe similar measures for device motion and orientation would be necessary in order to achieve a suitable balance between usability and security. Possible (mock-up) interfaces for this countermeasure, based on existing solutions for GPS sensor data, are presented in Figure 5. In particular, we think the user should have three options: either allow access to the browser (in the mobile OS setting) or web pages (in the browser setting) indefinitely, or allow access only when the user is working on the browser (in the mobile OS settings) or interacting with the web page (in the browser settings), or deny access indefinitely. These three options provided to the user seem to be neither too few to render the access control ineffective, nor too many to exhaust the user attention span.
Furthermore, we believe raising this issue in the W3C specification would help the browsers to consider it in a more systematic and consistent way. Our suggestion for the new version of the specification is to include a section for security and privacy considerations and discuss these issues in that section properly.
7 Industry feedback
We reported the results of this research to the W3C community and mobile browser vendors including Mozilla, Opera, Chromium and Apple. We discussed the identified issues with them and received positive feedback as summarized below.
W3C. After we disclosed the identified problems to the W3C community, the community acknowledged the attack vectors introduced in this paper and stated that: “This would be an issue to address for any future iterations on this document [i.e. W3C specification on mobile orientation and motion]”. A security issue has been recorded to be taken into account by W3C in this regard (github.com/w3c/deviceorientation/issues/13). The community discussed this issue in their latest meeting and suggested adding a security section to the specification in response to the findings of our work (w3.org/2015/10/26-geolocation-minutes.html#item03).
Our results highlight major shortcomings in the access control policies of W3C standards, mobile operating systems, and browsers with respect to user security. As a countermeasure which strikes a balance between security and usability, we suggest that device orientation and motion data be treated similarly to GPS sensor data. Effective user notification and control mechanisms for access to such sensor data should be implemented both in mobile operating systems and in mobile browsers. The positive industry feedback confirms that serious damage could be caused by exploiting the introduced attack vectors. In fact, some of the browser vendors such as Mozilla and Apple have already started working on the mitigations suggested in this paper.
Acknowledgments
We would like to thank the volunteers who contributed to the user studies of this project. We also thank the anonymous reviewers of this journal paper and of its preliminary version at ASIACCS’15, as well as the W3C Geolocation Working Group and the mobile browser vendors, including Mozilla, Apple, Google, and Opera, for their quick responses and constructive communications. The last three authors are supported by ERC Starting Grant No. 306994.
References
[1] W3C Geolocation API Specification. http://dev.w3.org/geo/api/spec-source.html.
[2] W3C Working Draft Document on Ambient Light Events. http://www.w3.org/TR/ambient-light/.
[3] W3C Working Draft Document on Device Orientation Event. http://www.w3.org/TR/orientation-event/.
[4] W3C Working Draft Document on Media Capture and Streams. http://w3c.github.io/mediacapture-main/getusermedia.html.
[5] A. J. Aviv, B. Sapp, M. Blaze, and J. M. Smith. Practicality of accelerometer side channels on smartphones. In Proceedings of the 28th Annual Computer Security Applications Conference, pages 41–50. ACM, 2012.
[6] C. Bo, L. Zhang, X.-Y. Li, Q. Huang, and Y. Wang. SilentSense: Silent user identification via touch and movement behavioral biometrics. In Proceedings of the 19th Annual International Conference on Mobile Computing and Networking, MobiCom ’13, pages 187–190, New York, NY, USA, 2013. ACM.
[7] H. Bojinov, Y. Michalevsky, G. Nakibly, and D. Boneh. Mobile device identification via sensor fingerprinting. CoRR, abs/1408.1416, 2014.
[8] L. Cai and H. Chen. TouchLogger: Inferring keystrokes on touch screen from smartphone motion. In HotSec, 2011.
[9] L. Cai and H. Chen. On the practicality of motion based keystroke inference attack. In S. Katzenbeisser, E. Weippl, L. Camp, M. Volkamer, M. Reiter, and X. Zhang, editors, Trust and Trustworthy Computing, volume 7344 of Lecture Notes in Computer Science, pages 273–290. Springer Berlin Heidelberg, 2012.
[10] T. Cover and P. Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1):21–27, 1967.
[11] A. De Luca, A. Hang, F. Brudy, C. Lindner, and H. Hussmann. Touch me once and I know it’s you!: Implicit authentication based on touch screen patterns. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’12, pages 987–996, New York, NY, USA, 2012. ACM.
[12] T. Halevi, D. Ma, N. Saxena, and T. Xiang. Secure proximity detection for NFC devices based on ambient sensor data. In Computer Security – ESORICS 2012, pages 379–396. Springer, 2012.
[13] H. Li, D. Ma, N. Saxena, B. Shrestha, and Y. Zhu. Tap-Wave-Rub: Lightweight malware prevention for smartphones using intuitive human gestures. In Proceedings of the Sixth ACM Conference on Security and Privacy in Wireless and Mobile Networks, WiSec ’13, pages 25–30, New York, NY, USA, 2013. ACM.
[14] Y. Michalevsky, D. Boneh, and G. Nakibly. Gyrophone: Recognizing speech from gyroscope signals. In Proceedings of the 23rd USENIX Security Symposium, 2014.
[15] E. Miluzzo, A. Varshavsky, S. Balakrishnan, and R. R. Choudhury. TapPrints: Your finger taps have fingerprints. In Proceedings of the 10th International Conference on Mobile Systems, Applications, and Services, pages 323–336. ACM, 2012.
[16] M. F. Moller. A scaled conjugate gradient algorithm for fast supervised learning. Neural Networks, 6(4):525–533, 1993.
[17] S. Narain, A. Sanatinia, and G. Noubir. Single-stroke language-agnostic keylogging using stereo-microphones and domain specific machine learning. In Proceedings of the 2014 ACM Conference on Security and Privacy in Wireless & Mobile Networks, WiSec ’14, pages 201–212, New York, NY, USA, 2014. ACM.
[18] E. Owusu, J. Han, S. Das, A. Perrig, and J. Zhang. ACCessory: Password inference using accelerometers on smartphones. In Proceedings of the Twelfth Workshop on Mobile Computing Systems & Applications, page 9. ACM, 2012.
[19] O. Riva, C. Qin, K. Strauss, and D. Lymberopoulos. Progressive authentication: Deciding when to authenticate on mobile phones. In Proceedings of the 21st USENIX Security Symposium, 2012.
[20] B. Shrestha, N. Saxena, H. T. T. Truong, and N. Asokan. Drone to the rescue: Relay-resilient authentication using ambient multi-sensing. In Proceedings of the Eighteenth International Conference on Financial Cryptography and Data Security, 2014.
[21] L. Simon and R. Anderson. PIN Skimmer: Inferring PINs through the camera and microphone. In Proceedings of the Third ACM Workshop on Security and Privacy in Smartphones & Mobile Devices, SPSM ’13, pages 67–78, New York, NY, USA, 2013. ACM.
[22] R. Spreitzer. PIN Skimming: Exploiting the ambient-light sensor in mobile devices. In Proceedings of the 4th ACM Workshop on Security and Privacy in Smartphones & Mobile Devices, SPSM ’14, pages 51–62, New York, NY, USA, 2014. ACM.
[23] M. Velten, P. Schneider, S. Wessel, and C. Eckert. User identity verification based on touchscreen interaction analysis in web contexts. Lecture Notes in Computer Science, 9065:268–282, 2015.
[24] Z. Xu, K. Bai, and S. Zhu. TapLogger: Inferring user inputs on smartphone touchscreens using on-board motion sensors. In Proceedings of the Fifth ACM Conference on Security and Privacy in Wireless and Mobile Networks, pages 113–124. ACM, 2012.
Appendix A Popular Browsers
Table 15: Popular mobile browsers on Android and their install counts on the Google Play Store.
|Browser||Version||Installs|
|Opera Mini Fast Browser||7.6.40234||100,000,000+|
|Opera browser for Android||20.0.1656.87080||50,000,000+|
|UC Browser for Android||10.1.0.527||50,000,000+|
|UC Browser Mini for Android||18.104.22.1680||10,000,000+|
|UC Browser HD||22.214.171.1242||10,000,000+|
|Baidu Browser (fast and secure)||126.96.36.199||10,000,000+|
|CM Browser Fast & Secure||5.1.44||10,000,000+|
|Mobile Classic (Opera-based)||N/A||10,000,000+|
|Photon Flash Player & Browser||4.8||10,000,000+|
|Maxthon Browser Fast||188.8.131.520||5,000,000+|
|Boat Browser for Android||8.2.1||5,000,000+|
|Next Browser for Android||1.17||5,000,000+|
We tested several browsers, including three major browsers on Android (Chrome, Firefox, and Opera) and three major browsers on iOS (Safari, Chrome, and Opera). Other Android browsers were included in the study due to their high download counts on the Google Play Store; the full list of tested Android browsers and their download counts is given in Table 15. A number of browsers have high download counts but limited capabilities, e.g., specialised search-engine browsers or email-based browsers; since these do not support features such as multi-tab browsing, we excluded them from our study. The iOS App Store does not report download counts, so we used a combination of user ratings, iTunes Charts, and the availability of the listed Android browsers on iOS to select a list of popular iOS browsers. On both platforms, we only considered browsers available free of charge from the official app stores.