Interaction Methods for Smart Glasses
Since the launch of Google Glass in 2014, smart glasses have mainly been designed to support micro-interactions. The ultimate goal for them to become an augmented reality interface has not yet been attained due to an encumbrance of controls. Augmented reality involves superimposing interactive computer graphics images onto physical objects in the real world. This survey reviews current research issues in the area of human-computer interaction for smart glasses. The survey first studies the smart glasses available in the market and afterwards investigates the interaction methods proposed in the wide body of literature. The interaction methods can be classified into hand-held, touch, and touchless input. This paper mainly focuses on the touch and touchless input. Touch input can be further divided into on-device and on-body, while touchless input can be classified into hands-free and freehand. Next, we summarize the existing research efforts and trends, in which touch and touchless input are evaluated by a total of eight interaction goals. Finally, we discuss several key design challenges and the possibility of multi-modal input for smart glasses.
In recent years, smart glasses have been released into the market. Smart glasses are equipped with a see-through optical display, which is positioned in the eye-line of the user. The user can view both the real-world environment and the virtual content shown in the display, which is regarded as the concept of augmented reality (Poupyrev et al., 2002). Currently, augmented reality on mobile devices is dominated by smartphones. For example, Apple Inc., one of the biggest smartphone manufacturers, has launched its augmented reality toolkit, namely ARKit ([11, 2017a). The shift in mobile devices from smartphones to smart glasses will happen over the next decade. It is projected that smart glasses will become the next leading mobile device after the smartphone, according to market research conducted by Digi-Capital ([11, 2017c). Thus, smart glasses have great potential to become the major platform for augmented reality.
According to the figures forecast by Digi-Capital ([11, 2017c), the market value of augmented reality will hit 90 billion US dollars by 2020, in which no less than 45 % of the market share will be generated by the hardware for augmented reality. In the report by CCS Insight ([11, 2017b), it is estimated that around 14 million virtual and augmented reality headsets will be sold by 2020 with a market value of 14.5 billion US dollars. One of the challenges that device manufacturers encounter, before their smart glasses become widespread in the market, is usability. The interaction between the user and smart glasses is still encumbered and problematic. That is, the virtual content on the optical display is not touchable and thus direct manipulation becomes a fatiguing and error-prone task. Additionally, compared with smartphones, smart glasses face more challenging issues such as reduced display size, small input interface, limited computational power, and short battery life (Ok et al., 2015).
Google Glass (, 2017) is the first of its kind in the market. Due to its small form factor, only swipe gestures are available for user input and thus the operating system is designed as a series of pixel cards, namely the Timeline. Users can swipe over the pixel cards and select the target pixel card. However, this design has potential pitfalls such as limitations in micro-interaction, long search times when the number of pixel cards is large, and so on. Similar to desktop computers and smartphones, the successors of Google Glass have adopted the traditional WIMP (Windows, Icons, Menus, Pointers) paradigm in their interfaces. However, the default interaction methods available on smart glasses, such as touch pad and button inputs, are far from satisfactory. Users may find it difficult to accomplish their tasks in a WIMP interface with these default interaction methods, which lead to, for instance, long task completion times and high error rates in item selection. Nevertheless, there exist no other standard and mature methods for the interaction between smart glasses and human users.
To tackle this problem, we explore various gestural interaction approaches supported by either the peripheral sensors on additional devices, or embedded sensors in the smart glasses. Gestural input refers to the capturing of the body movements of human users that instructs the smart glasses to execute specific commands. The sensors on the additional devices (e.g. wrist band) or embedded sensors in smart glasses can capture the user’s gestures such as drawing a stroke, circle, square, or triangle (Groenewald et al., 2016). The captured gestures are then converted into input commands according to the gesture library. For example, the possible input commands can be to select a character on the keyboard in the virtual interface shown on the optical display of smart glasses, choosing an app icon on the main menu of the starting page, as well as moving a 3D object from one location to another in the augmented reality environment.
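The conversion from a captured gesture to an input command can be pictured as a lookup in a gesture library. The following is a minimal sketch under stated assumptions: the gesture labels follow the shapes named above, whereas the command names and the `dispatch` helper are hypothetical and are not taken from any of the surveyed systems.

```python
from typing import Optional

# Hypothetical gesture library: recognized gesture label -> input command.
# The labels follow the shapes named in the text; the commands are assumptions.
GESTURE_LIBRARY = {
    "stroke": "select_character",
    "circle": "open_app_icon",
    "square": "grab_3d_object",
    "triangle": "release_3d_object",
}


def dispatch(gesture: str) -> Optional[str]:
    """Return the command bound to a recognized gesture, or None if unbound."""
    return GESTURE_LIBRARY.get(gesture.strip().lower())
```

In such a design, an unrecognized label simply yields no command, so a recognizer that is unsure about a gesture can fail safely instead of triggering an unintended action.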
In accordance with the above problem, this survey mainly focuses on research issues related to interaction methods between smart glasses and human users. Equivalently, we focus on the needs of human users in operating the smart glasses. We compare the gestural interaction methods using touch or touchless techniques. We also present the opportunities for using multi-modal methods for the hybrid user interface in augmented reality. In summary, the framework of this survey covers the following areas.
Introduction to Smart Glasses and issues with human-smart glasses interactions (Section 2). Google Glass is the first example of smart glasses in the market, which offers new opportunities for user interaction and poses challenges in interaction design to researchers. We evaluate a number of popular smart glasses on the market, together with their sensors and the corresponding interaction methods.
Touch-based interaction approaches (Section 3.1). The user-friendliness of smart glasses is crucial, which makes the design of easy-to-use and robust interaction techniques an important issue. We present various approaches of touch input to operate smart glasses with external devices or additional sensors.
Touchless interaction methods (Section 3.2). Apart from the touch-based techniques with external devices, a number of touchless techniques exist in the literature. We review two primary techniques (hands-free and freehand interactions) that enable smart glasses users to perform input on smart glasses.
Existing research efforts and trends (Section 4). We summarize touch and touchless inputs into four categories and compare them against a total of eight interaction goals. Their potential and research trends are discussed accordingly.
Challenges of interactions on smart glasses (Section 5). All the interaction methods using additional devices or embedded sensors share a common goal. That is, users can perform fast and natural interaction with augmented reality on smart glasses. We present the key challenges of interaction methods to the hybrid user interface in augmented reality.
2. Preliminary: the introduction of smart glasses and their sensors
Smart glasses are head-worn mobile computing devices, which contain multiple sensors, processing capabilities and optical head-mounted displays (OHMDs). With the processing capabilities and the OHMD, the users of smart glasses can view augmented information that is overlaid on the physical world. These capabilities provide great potential to achieve real-time and enriched interaction between the smart glasses user and the physical world with augmented information. Equivalently, the smart glasses wearer can interact with the augmented reality environments. In order to achieve the two-way interactions between the user and the smart glasses, two important requirements should be fulfilled.
First, smart glasses can provide a clear and stable output on the OHMD to the smart glasses wearer. The smart glasses wearer finds it very difficult to see the content in augmented reality if the output on the OHMD is too small or unclear in some illuminated conditions such as outdoor environments. However, this is highly related to the technical specifications of the smart glasses and thus not the focal point of the survey paper. Second, smart glasses should offer an easy and effortless manner with which to operate them under appropriate ergonomic considerations. The smart glasses user can perform inputs through various actions (e.g. head movement, hand gesture, voice input, etc.) and the sensors embedded into the smart glasses identify the actions of the user. The input of the wearer can be processed into instructions for user interaction with virtual content superimposed onto the physical world.
This section first includes several significant examples of smart glasses, ranging from the very first prototype (head-worn computer) proposed in the lab to the recent commercial products (smart glasses) available in the market. We can see that smart glasses have advanced from a bulky and cumbersome backpack to the current lightweight wearables. Next, we depict the sensors available on today’s commercial smart glasses. The usage of sensors for the corresponding input methods is briefly explained in this section, while the details of the input methods in the wide body of literature are discussed in the next section.
2.1. Examples of smart glasses
The Touring Machine The first and historically significant example of smart glasses can be traced back to the Touring Machine (et al., 1993), which was proposed in 1997 by Feiner et al. The Touring Machine is a prototype machine designed for urban exploration. In the demonstration, it is used for the navigation of the campus area. The machine consists of a wearable see-through display with built-in orientation detector, a stylus and a trackpad on a handheld computer, a GPS receiver, peripherals for the internet connection, and a desktop computer stowed in the backpack of the user. Figure 1 shows the appearance of the Touring Machine.
In the campus navigation demonstration, through the see-through display, the user can see information about the surrounding buildings in the campus (for instance, names of buildings and corresponding departments) as well as a number of choices on a virtual menu such as finding the user location, showing the department information, removing the digital overlay, and so on. The user can access the digital overlaid menu with the trackpad. Also, the orientation detector guides the user towards the destination building. The compass system presents a compass pointer on the see-through display. The color of the pointer changes from green to red if the user deviates from the target building by more than 90 degrees.
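The compass rule described above can be illustrated with a short sketch. It assumes that the user heading and the bearing to the target building are both available in degrees, and it takes the 90-degree limit from the description; the function names are hypothetical and the code is not the Touring Machine implementation.

```python
def angular_deviation(heading_deg: float, bearing_deg: float) -> float:
    """Smallest absolute angle in degrees between the heading and the bearing."""
    diff = (heading_deg - bearing_deg) % 360.0
    return min(diff, 360.0 - diff)


def pointer_color(heading_deg: float, bearing_deg: float,
                  threshold_deg: float = 90.0) -> str:
    """Return 'red' when the user deviates by more than the threshold."""
    deviation = angular_deviation(heading_deg, bearing_deg)
    return "red" if deviation > threshold_deg else "green"
```

The modulo step handles the wrap-around at north, so a heading of 10 degrees and a bearing of 350 degrees count as a deviation of 20 degrees rather than 340.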
Even though the system is cumbersome and heavyweight, in comparison to today’s smart glasses, it is a well-defined example of the early development of augmented reality on mobile devices, where the features of Touring Machine are driven by a GPS. This is basically the same as today’s GPS-driven mobile applications. Also, it presents a rudimentary approach to the interaction with a digital overlaid menu in augmented reality by using trackpad and stylus.
Weavy Weavy (Kurata et al., 2002) is a lightweight head-worn mobile wearable, which is comprised of a single-eyed head-mounted display with the capabilities of wireless connection (Figure 1). All the frames captured from the camera on the device are transmitted to the back-end server that handles the offloading of computer-vision tasks. Compared with the Touring Machine, Weavy demonstrates a working prototype, which is closest to today’s smart glasses. However, due to the limitations of computing power in 2002, the image frames are processed in the back-end server.
WUW and BrainyHand WUW (Mistry et al., 2009) and BrainyHand (et al., 2010) can be regarded as variations of Weavy. These wearables, as shown in Figure 2, have a similar purpose to smart glasses but the key difference is that a laser projector substitutes the optical output on the head-mounted display. The wearables can project the augmented information onto the user’s palm or a nearby surface such as an interior wall or a newspaper. As these devices are miniature in size, trackpads or buttons are not available for user input. The user input of these wearables is mainly supported by hand gestures. In WUW, the wearers have to stick colorful markers on their hand for the recognition of hand gesture input, while BrainyHand is able to detect simple hand gestures, such as zooming in and out, by calculating the distance between the skin surface of the user’s hand and the head-mounted camera.
Transcend HUD Before the launch of Google Glass, Transcend (, 2017), launched in 2010, was the first example of commercial smart glasses. They are ski goggles equipped with a Heads-Up Display (Figure 3). The data is displayed on a small screen at the outer edge of the skier’s peripheral vision. With the assistance of location-aware features driven by a built-in GPS, the smart glasses can notify the skier about real-time performance such as speed, elevation, airtime and navigation.
Google Glass and Sony SmartEyeGlass Google Glass (, 2017), which was released in 2014, is a lightweight and self-contained head-mounted computer with a set of sensors such as accelerometer, gyroscope and magnetometer (Figure 4). Compared with the previous example, the virtual content is visible in a see-through optical display, which is made of liquid crystal on silicon (LCoS) with LED illumination. Thus, the smart glasses are capable of superimposing virtual content such as text and images onto the user’s field of view (FOV). It allows the wearer to perform micro-interactions with the smart glasses such as map navigation, photo or video capturing, and receiving notifications or messages. Regarding the input method, voice command (speech recognition) is the major method of operating Google Glass. Similar to Weavy, the task of natural language processing is offloaded to Google’s cloud server, which analyzes the user’s input, due to the limited computational capabilities of Google Glass. Sony (, 2017) released its first smart glasses in 2015, as shown in Figure 4. It is a similar product to Google Glass with a considerably larger optical display. Also, its input method relies on a touch-sensitive external controller that enables the user to operate a mouse cursor in the WIMP interface.
Microsoft Hololens Microsoft Hololens (, 2017) are recently launched smart glasses that are equipped with powerful computer chipsets and a state-of-the-art display with a wider FOV than the aforementioned examples (Figure 4). The chipsets create a more immersive environment, which allows the user to pin holograms onto the surrounding physical environment. The holograms can be classified into 2D interfaces and 3D objects. 2D interfaces can be virtual windows or menus, written notes, galleries, and videos, while 3D objects can be spheres, cubes, animals, planets, etc. Also, it supports multi-modal input including a head gesture for cursor movement, two simple hand gestures (tap and bloom), and voice command. Although they are the most powerful self-contained smart glasses on the market, they are considered obtrusive due to their bulky design and lack mobility in outdoor environments. It is suggested that a wearable interface must be ready for mobility or in-situ use (Ens et al., 2016).
Summary Today’s smart glasses are regarded as the beginning of augmented reality on mobile devices. However, they are considered rudimentary products because major constraints, such as weak processors, short battery life, and small screen size, have not yet been solved. Considering the focal point of this paper, the input methods for smart glasses are not well-defined. Even though the projection for smart glasses is promising, it is not clear whether smart glasses will be adopted by users for daily usage in the same way as today’s smartphones, as the issues of battery life and input methods remain problematic. However, it seems that smart glasses will first serve as specialized task-oriented devices, for instance, industrial glasses, smart helmets, sport-activity coaching devices, and the like (Rauschnabel et al., 2015).
2.2. Sensors on the smart glasses and the input methods
Figure 5 shows the sensors on the smart glasses available in the market that support various input methods in practice and in the literature. The only exception is the optical display in the final column, which is the standard component for optical output. The sensors are briefly explained as follows.
Camera It is an optical instrument for recording or capturing images, which may be individual still photographs or sequences of images constituting videos or movies (, 2017). Cameras are one of the standard components on smart glasses. Among the available smart glasses, the majority (9 out of 12) are equipped with an RGB camera that is only designed for monocular vision. This is mainly due to constraints on product size, as depth cameras and infrared cameras are bulky and heavyweight. Only the remaining three smart glasses support depth measurement or infrared data: Microsoft Hololens and ODG have depth cameras, and META supports infrared vision. The cameras on the glasses can support various computer-vision tasks and their capabilities are subject to the type of camera. In the domain of user input, the camera is usually used for capturing the wearer’s gestures, in particular hand gestures.
Microphone It is a transducer that converts sound into an electrical signal ([10, 2017). The electrical signal can be further processed by speech recognition, and the recognized speech is used as input to the smart glasses. All smart glasses have microphones embedded into their circuit board. This implies that today’s smart glasses support voice input from users. One of the reasons is that the recent advancements in speech recognition make voice input accurate and responsive.
Global Positioning System (GPS) It is a global navigation satellite system that provides geo-location and time information to a GPS receiver anywhere on the Earth ([11, 2017d). The GPS enables the smart glasses to support various geo-location based applications. For instance, the smart glasses can tell the wearer their current position or provide driving directions. From the results, 11 out of 12 smart glasses have GPS, which makes them ready for GPS-based augmented reality applications.
Accelerometer The sensor is designed for measuring proper acceleration, which is defined as the rate of change of velocity of a body in its own instantaneous rest frame ([12, 2017). All smart glasses have an accelerometer. Smart glasses can measure the acceleration force along the x, y, and z axes, as well as the force of gravity. This allows the smart glasses to record the motion input from the wearer, for instance, understanding the status and activities of the wearer like being stationary, walking, running, and so on. In addition, knowing the status and activities of the user can help in designing user input in a more precise and subtle manner (Tung and Shin, 2016). For example, when the wearer performs head gestural input to the smart glasses, the accuracy of the gesture recognition can be influenced by other simultaneous motions, for instance, the walking of the wearer. Thus, the effect of unwanted motion from walking can be alleviated with the measurements taken by the accelerometer.
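As an illustration of how accelerometer readings could indicate the wearer’s status, the sketch below classifies a short window of samples by the spread of the acceleration magnitude. This is a deliberately simple heuristic assumed for illustration, not a method from the cited work; the thresholds, the units (m/s²) and the function names are hypothetical.

```python
import math
from statistics import pstdev
from typing import Sequence, Tuple

Sample = Tuple[float, float, float]  # acceleration along x, y, z in m/s^2


def magnitude(sample: Sample) -> float:
    """Euclidean norm of one three-axis accelerometer sample."""
    x, y, z = sample
    return math.sqrt(x * x + y * y + z * z)


def classify_activity(window: Sequence[Sample],
                      walk_threshold: float = 0.5,
                      run_threshold: float = 3.0) -> str:
    """Label a window of samples by the spread of its acceleration magnitude.

    The two thresholds are illustrative assumptions, not calibrated values.
    """
    spread = pstdev([magnitude(s) for s in window])
    if spread < walk_threshold:
        return "stationary"
    if spread < run_threshold:
        return "walking"
    return "running"
```

A head gesture recognizer could, for example, raise its detection threshold whenever the label is not "stationary", which is one way to reduce the influence of walking on recognition accuracy.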
Gyroscope The sensor is a device which measures the orientation of the wearer on the basis of the principle of angular momentum ([14, 2017). The rate of rotation around the x, y, and z axes is measured by the device. Like the accelerometer, the gyroscope exists in all smart glasses, as gyroscopes and accelerometers are commonly integrated together in today’s manufacturing standards. Regarding the input approach of smart glasses, a gyroscope can measure the angular velocity of the wearer’s head. Therefore, smart glasses can measure the head movement of the wearer and hence support head gestural input.
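To make the link between angular velocity and head gestural input concrete, the following sketch integrates gyroscope rates over time and distinguishes a nod from a shake by the axis with the larger angular swing. The axis convention (pitch, yaw, roll), the sampling interval and the 15-degree threshold are assumptions made for illustration only.

```python
from typing import Sequence, Tuple

Rates = Tuple[float, float, float]  # (pitch, yaw, roll) rates in deg/s


def detect_head_gesture(samples: Sequence[Rates], dt: float,
                        min_angle_deg: float = 15.0) -> str:
    """Classify a window of gyroscope samples as 'nod', 'shake' or 'none'.

    Angular velocity is integrated with a fixed time step dt (seconds) and
    the swing (max minus min angle) around each axis is compared.
    """
    pitch = yaw = 0.0
    pitch_min = pitch_max = yaw_min = yaw_max = 0.0
    for pitch_rate, yaw_rate, _roll_rate in samples:
        pitch += pitch_rate * dt
        yaw += yaw_rate * dt
        pitch_min, pitch_max = min(pitch_min, pitch), max(pitch_max, pitch)
        yaw_min, yaw_max = min(yaw_min, yaw), max(yaw_max, yaw)
    pitch_swing = pitch_max - pitch_min
    yaw_swing = yaw_max - yaw_min
    if max(pitch_swing, yaw_swing) < min_angle_deg:
        return "none"
    return "nod" if pitch_swing >= yaw_swing else "shake"
```

Plain integration drifts over long periods, so a sketch like this is only meaningful over short windows; practical systems typically combine the gyroscope with the accelerometer and magnetometer.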
Magnetometer The sensor is an instrument that measures the strength and direction of magnetic fields ([15, 2017). Many smartphones have magnetometers, which serve as compasses in various mobile applications, especially for navigation and maps. Similarly, all smart glasses have magnetometers as they have inherited the requirements for mobile applications on smartphones. When combined with the accelerometer and gyroscope, smart glasses are expected to have the same potential as today’s smartphones to measure the wearer’s mobility and support various mobile applications.
Light sensor The sensor is a detector of light or other electromagnetic energy ([16, 2017). On a smartphone, the touchscreen display adjusts its brightness subject to the ambient light. Likewise, the optical display of smart glasses adjusts its brightness if the ambient light affects the readability of content. Thus, the light sensor provides smart glasses with the capability of automatically adjusting the brightness of the display in various light conditions. From the results, only two of the surveyed smart glasses, META and Mad Gaze X5, are not equipped with light sensors. META is currently designed for augmented reality in indoor environments. However, Mad Gaze X5 is designed for both indoor and outdoor environments, so the lack of a light sensor affects the readability of content in outdoor environments.
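The automatic brightness adjustment can be sketched as a mapping from an ambient light reading to a display brightness level. The logarithmic mapping, the lux range and the brightness bounds below are illustrative assumptions; they do not describe the behavior of any particular product.

```python
import math


def display_brightness(lux: float,
                       min_lux: float = 10.0,
                       max_lux: float = 10000.0,
                       min_level: float = 0.1,
                       max_level: float = 1.0) -> float:
    """Map ambient light (lux) to a brightness level between the two bounds.

    The reading is clamped to [min_lux, max_lux] and mapped on a log scale,
    since perceived brightness grows roughly with the logarithm of luminance.
    All default values are hypothetical.
    """
    clamped = min(max(lux, min_lux), max_lux)
    fraction = math.log10(clamped / min_lux) / math.log10(max_lux / min_lux)
    return min_level + fraction * (max_level - min_level)
```

With these assumed defaults, a dim room maps to the lower bound and direct sunlight saturates at full brightness, which is the outdoor case where the absence of a light sensor is most noticeable.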
Tangible interface This category refers to the use of an external controller, trackpad and button, which allow the wearer to interact with the digital interface on the optical display of smart glasses (Figure 6). The external controller provides more efficient and easier control than the trackpad and button located on the body of the smart glasses. However, the external controller is cumbersome when the wearer’s hands are occupied, and thus it is not convenient for the wearer to perform other tasks simultaneously in augmented reality. The trackpad and button do not suffer from this problem, but operations on their small touch surfaces cause two major problems (Chew et al., 2010). First, muscle fatigue is the main issue, as wearers need to raise their hands to touch the button and trackpad, so prolonged use is not favorable for the wearer (Hincapié-Ramos et al., 2014). Second, the small surface of the button and trackpad requires subtle finger movements and therefore deteriorates task performance. Nevertheless, the above input methods are commonly used for smart glasses. Six out of the 12 smart glasses provide buttons and trackpads for manipulating the items and objects on the smart glasses interface, while 3 out of 12 have external controllers that enable users to control a cursor on the interface. The remaining three smart glasses rely on gestural input supported by various types of cameras. META utilizes an infrared camera to detect hand gestural input from the wearer, and Microsoft Hololens and ODG R9 utilize depth cameras to capture hand gestural input. The results show that hand gestural input is an alternative to the tangible interface because of advantages such as intuitiveness and naturalness (Norman and Nielsen, 2010).
Eye tracker It is a device for measuring eye positions and eye movement. Nowadays, it is mainly applied in virtual reality headsets such as FOVE ([18, 2017). Unfortunately, none of the surveyed smart glasses supports eye tracking. The following example shows the potential of a multi-modal input approach that combines eye tracking with a physical interface. When only the tangible interface is accessible, it is difficult for users to select one small object in cluttered environments. Eye-tracking technology, driven by eye movement, can be used to quickly spot and locate the object that the user intends to select (Toyama et al., 2012). Afterwards, the user can manipulate the object with a tangible interface such as a button or an external controller. Alternatively, the combination of eye tracking and hand gestures can achieve object location and selection (Slambekova et al., 2012).
Summary To conclude, today’s smart glasses have evolved from a bulky and heavy machine located in the user’s backpack to lightweight wearables. The ways of showing virtual content have converged from head-mounted displays and projections onto nearby surfaces to see-through optical displays. The twelve surveyed smart glasses are equipped with cameras, microphones, accelerometers, gyroscopes, and magnetometers, which are widely available in many smartphones. GPS receivers and light sensors are important in helping smart glasses adapt to mobile applications in outdoor environments. Eye tracking is gaining popularity in the field of virtual reality, but none of the smart glasses manufacturers have integrated eye trackers into their commercial products, and eye tracking technology for augmented reality smart glasses is in its infancy, even though a few lower cost add-on components for eye tracking on head-worn computers have been proposed (Stengel et al., 2015; Shimizu and Chernyshov, 2016). Not surprisingly, hand gestural interaction has been the first to move from research into commercial products. Many other approaches have been proposed in the literature, ranging from head gestures and gaze interaction to touch interfaces on different parts of the human body. In the next section, we investigate various interaction approaches for smart glasses in the literature, which are supported by the embedded sensors introduced in this section and other additional sensors.
3. Interaction approaches for smart glasses
Touchscreen input is the primary interaction modality for today’s smart devices, whose touchscreens range in size from those of smart wristbands to those of smartphones. As for smart wearables such as smart glasses, speech recognition is the major input of choice because these wearable devices do not have a touchscreen display that serves as the input device. Despite the fact that touchscreens are popular in smartphones and smart watches, touchscreen interfaces have not moved into small-sized smart devices for the following reasons (Colaço et al., 2013). A touchscreen interface does not fully take advantage of human dexterity. It requires the user to touch a small screen on the device repetitively and constantly, and touching the screen for input occludes the user’s sight of the display. This turns simple tasks like menu navigation into repetitive and tedious actions. Therefore, studies in the literature have proposed numerous approaches to interact with small-sized smart wearables including smart glasses. Offering smart glasses better input approaches makes the interaction experience more intuitive and efficient, which enables users to handle more complicated and visually demanding tasks. In other words, the enhanced interaction experience brings smart glasses from their limited usage of micro-interactions to daily usage as seen in today’s smartphones. In this section, we focus solely on the interaction approaches for smart glasses.
There are multiple dimensions for classifying interaction approaches, for instance, vision-based and non-vision-based, or gesture-based and non-gesture-based (M., 2014). An alternative is to divide the interaction approaches into three classes, which are handheld, touch, and touchless (Tung et al., 2015). First, handheld refers to the input type that makes use of handheld controllers, such as smartphones and the wired trackpads linked with Sony’s SmartEyeglass and Epson’s Moverio glasses. Second, touch refers to non-handheld touch input, such as gestures and tapping on body surfaces, touch-sensing wearable devices (e.g. smart rings, smart wristbands, watches, and the spectacle frame of smart glasses), as well as touch interfaces on the user’s body. This class is characterized by the presence of tactile feedback. Third, touchless refers to non-handheld, non-touch input, such as mid-air hand gestures, head and body movements, gaze interaction, and voice recognition. In contrast with the second class, this class does not involve tactile feedback from touch, but tactile feedback can be augmented by devices (e.g. haptic feedback from a haptic glove (Hsieh et al., 2014) or a head-worn computer (Kangas et al., 2014)). The first class has been briefly explained with the tangible interface in Section 2. The remaining classes (touch and touchless) are discussed in this section. Figure 7 depicts the classification of interaction approaches proposed in this survey.
3.1. Touch inputs
On-device interaction means that users perform gestural input on a touch-sensitive surface of a device, such as the body of the smart glasses or the peripheral sensors on external devices, which serves as an augmented touch surface for user input.
Touch interface on smart glasses Google Glass has a touchable spectacle frame, on which swipe gestures can be performed. Researchers have proposed swipe-based gestures for text entry (Yu et al., 2016; Grossman et al., 2015). In the work of Yu et al. (Yu et al., 2016), a unistroke gesture system is proposed (Figure 8). Each character is represented by a set of two-dimensional unistrokes. These stroke sets are designed for easy memorization. For example, the character ‘a’ is comprised of three swipes of ’down-up-down’ that mimic the strokes of handwriting. In SwipeZone (Grossman et al., 2015), the touchable spectacle frame on Google Glass is divided into three zones (back, middle and front). A character can be quickly chosen by two swipes on these zones (Figure 8). The first swipe selects a character block consisting of 3 characters. The second swipe chooses the target character inside the block. On the other hand, other works focus on the optimal use of the external controller wired to smart glasses to achieve faster text entry. The external controller allows users to operate a pointing device, that is, the cursor, and select keys on a virtual on-screen keyboard. Various arrangements of the text input interface are considered in the literature, such as Dasher (Ward et al., 2000), as well as AZERTY and QWERTY keyboards (McCall et al., 2015).
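The two-swipe selection over three zones can be sketched as a pair of table lookups. The layout below is hypothetical: it assumes that the first swipe is identified by its zone and one of three directions (giving nine blocks of three characters), and that the zone of the second swipe picks the character within the block. The actual SwipeZone layout may differ.

```python
from typing import Dict, Tuple

ZONES = ("back", "middle", "front")
DIRECTIONS = ("up", "forward", "down")       # assumed swipe directions
ALPHABET = "abcdefghijklmnopqrstuvwxyz "     # 27 symbols: 9 blocks of 3

# Hypothetical mapping: (zone, direction) of the first swipe -> block.
BLOCKS: Dict[Tuple[str, str], str] = {
    (zone, direction): ALPHABET[3 * i:3 * i + 3]
    for i, (zone, direction) in enumerate(
        (z, d) for z in ZONES for d in DIRECTIONS
    )
}


def decode(first_swipe: Tuple[str, str], second_zone: str) -> str:
    """Resolve two swipes into one character under the assumed layout."""
    block = BLOCKS[first_swipe]                # first swipe picks the block
    return block[ZONES.index(second_zone)]     # second swipe picks the symbol
```

Under this assumed layout, `decode(("back", "up"), "front")` resolves to the third character of the first block. Every character costs exactly two swipes, which keeps the gesture set small at the price of a fixed per-character overhead.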
Physical forms of external devices As smart glasses have a reduced form size and weight, the need for complementary interaction methods is evolving. External devices can be made in various physical forms such as rings (Kienzle and Hinckley, 2014; Yang et al., 2012; Ogata et al., 2012; Ashbrook et al., 2011), wristbands (Rekimoto, 2001; Ham et al., 2014), sleeves (Schneegass and Voit, 2016), and belts (Dobbelstein et al., 2015). Instrumented gloves are excluded from this category because they are intended for mid-air interaction (Dipietro et al., 2008). On-device interactions are precise and responsive. That is, the spatial mapping between the touch-sensitive interface on external devices and the smart glasses’ virtual interface allows accurate input and fast repetition. However, the major drawbacks are the existence of the device itself and the time required for putting on the device (Ens et al., 2016).
Finger-worn device Finger-worn devices (Figure 9) have gained a lot of attention in recent years, as these devices encourage small, discreet, and single-handed movements (Shilkrot et al., 2015). LightRing (Kienzle and Hinckley, 2014) consists of a gyroscope and an infrared emitter positioned on the second phalanx of the index finger, while Magic Finger (Yang et al., 2012) has an optical sensor positioned on the fingertip. These types of hardware enable stroke-based gestures on any surface. In LightRing, the infrared emitter and gyroscope detect changes in distance and orientation that constitute trajectories on touch surfaces. The miniature optical sensor on Magic Finger detects the direct touch of the fingertip on any solid surface. In contrast, iRing (Ogata et al., 2012) and Nenya (Ashbrook et al., 2011) provide a touch surface on the ring. Users can touch these ring surfaces for pointing and flipping gestures. In addition, iRing can detect both the touch on the ring surface and the bending of the finger muscle, which enriches the set of gesture combinations. The photoreflector in the ring can detect the changes in pressure from touch and finger bending. Nenya has a magnetometer in the bracelet sensing the absolute orientation of finger touch. Ens et al. (Ens et al., 2016) and Nirjon et al. (Nirjon et al., 2015) attempt to further extend the capability of ring-form devices. As ring-form devices have a relatively small sensing surface, their usage is commonly proposed for tap and swipe gestures. Nirjon et al. (Nirjon et al., 2015) propose a finger-worn text entry system for a virtual QWERTY keyboard. The keys on a QWERTY keyboard are divided into multiple zones in which every zone contains a sequence of 3 consecutive keys. Choosing a key requires two steps. In the first step, users select the target zone by moving the hand horizontally and vertically on a surface. In the second step, the user locates the target key by finger movement, as the ring mounted on the middle finger can detect the user’s finger movements (middle, index, and ring fingers).
Another ring proposed by Ens et al. (Ens et al., 2016) contains an inertial measurement unit and a touch surface. This hardware configuration supports tap and swipe gestures during hand gestural input. A depth camera mounted on the smart glasses detects the hand gestures for fast and coarse selection of a window. The user can use a fingertip to point at a virtual object and afterwards interact with the chosen object through the tap and swipe gestures powered by the ring.
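The two-step key selection of Nirjon et al. described above can be illustrated with a minimal sketch. The row layout, the zone indexing, and the assignment of the index, middle, and ring fingers to the first, second, and third key of a zone are assumptions made for illustration only; they are not taken from the original system.

```python
# Hypothetical sketch of two-step, zone-based key selection on a QWERTY layout.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ZONE_SIZE = 3  # every zone holds a sequence of 3 consecutive keys

# Assumed mapping from the moving finger to the key position inside a zone.
FINGER_INDEX = {"index": 0, "middle": 1, "ring": 2}


def build_zones(rows, zone_size=ZONE_SIZE):
    """Split each keyboard row into zones of consecutive keys."""
    zones = {}
    for r, row in enumerate(rows):
        for start in range(0, len(row), zone_size):
            zones[(r, start // zone_size)] = row[start:start + zone_size]
    return zones


def select_key(zones, zone_pos, finger):
    """Step 1: zone_pos (row, column) chosen by hand movement on a surface.
    Step 2: the finger movement picks one key inside that zone."""
    zone = zones[zone_pos]
    idx = FINGER_INDEX[finger]
    # The last zone of a row may hold fewer than three keys.
    return zone[idx] if idx < len(zone) else None


zones = build_zones(ROWS)
print(select_key(zones, (0, 1), "middle"))  # zone "rty", second key
```

With ten, nine, and seven keys per row, the last zone of the first and third rows is shorter than three keys, which is why the sketch returns None for an unavailable position.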
Arm-worn device These devices have a relatively larger surface than finger-worn devices, with the touch surface located on the wristband (Figure 10). Muscle tension (Rekimoto, 2001) and arm movement (Ham et al., 2014) (e.g. wrist rotation) are detected by capacitive sensors and an inertial measurement unit (IMU), respectively. Gesture Sleeve (Schneegass and Voit, 2016) is a variation of the wristband that covers the entire area of the forearm with a touch-enabled textile supporting tap and stroke-based gestures.
Touch-belt device Dobbelstein et al. (Dobbelstein et al., 2015) have proposed a touch-sensitive belt for smart glasses input. The belt-shaped prototype is intended to provide users with a larger input surface than the spectacle frame on Google Glass. The touch-sensitive area on the belt (red circuit boards as shown in Figure 10) supports swipe gestures to manipulate the pixel cards on the optical display. The approach is claimed to be unobtrusive, as the user does not need to lift the arm and only subtle interaction with the belt is involved. However, this work only considers swipe gestures for Google Glass, while the pointing technique in the WIMP paradigm (Jacob et al., 2008) is neglected.
Many studies have used human skin as the interaction surface. The prominent feature of on-body interaction is that it leverages human proprioception as an additional feedback mechanism. That is, a human user can sense the tactile cue when interaction is exerted on the skin’s surface. Owing to this tactile cue, on-body interaction has higher performance than touchless input, especially mid-air input. Users no longer rely on visual cues to accomplish their tasks when the tactile cue can help them to locate their touch (Gustafson et al., 2013). In other words, the tactile cue can free visual attention and achieve eyes-free input, which is useful in actions with lower cognitive/physical effort or in lack-of-attention scenarios (Yi et al., 2012). For instance, when users are walking (i.e., in mobile scenarios), eyes-free input through on-body interaction allows them to pay attention to the surroundings without devoting much attention to the input interface, which could reduce distraction and danger (Fuentes and Bastian, 2010). Besides, users can immerse themselves in augmented reality without switching their attention between the input interface and the virtual contents on the optical display of smart glasses.
A recent work by Wagner et al. (Wagner et al., 2013) investigates the body-centric design space to understand multi-surface and on-body interactions. Three guidelines for designing on-body interactions are proposed accordingly. Task difficulty, body balance, and interaction effects should be considered together for on-body interaction. In particular, on-body interaction should be placed on stable body parts, such as the upper limbs, especially when tasks require precise or highly coordinated movements. In another study conducted by Weigel et al. (Weigel et al., 2014), on-skin input at various positions of the upper limbs is studied thoroughly. The user preference shows that the forearm is perceived as the easiest and most comfortable location (50%), followed by the back of the hand (18.9%), the palm (17.8%), the finger (7.3%) and others (6%). However, the above studies have not considered touch on the facial area. Facial touch has high potential because smart glasses are positioned on the user’s head and the face is close to the smart glasses, so facial touch can serve as an extension of the touch interface on smart glasses, in addition to offering benefits such as intuitive and natural interaction (Mahmoud and Robinson, 2011).
Prior work on on-body interaction has proposed various parts of the human body for touch input, such as the palm (Gustafson et al., 2013; Harrison et al., 2011a, b; Weigel et al., 2015; Wang et al., 2015a; Wang et al., 2015b), the forearm (combined with the back of the hand) (Azai et al., 2017; Ogata et al., 2013; Lin et al., 2011), the finger (Huang et al., 2016; Weigel et al., 2015; Yoon et al., 2015), the face (Serrano et al., 2014), and the ear (Lissermann et al., 2013), as described in the following.
Palm as surface The projection-based techniques are first adaptable to smart glasses. OmniTouch (Harrison et al., 2011a) is a shoulder-worn wearable proof-of-concept system equipped with a depth sensor and a projector. Users can perform multi-touch interaction on their own bodies, including the palm. In addition, the projection of virtual contents can be applied to any flat surface. The user can receive tactile feedback from the finger when active touch (Gustafson et al., 2013) is applied to these surfaces. Skinput (Harrison et al., 2011b) is arm-worn wearable hardware with a projector and vibration sensors. Instead of using a depth sensor to detect a touch event on a user’s skin, an array of tuned mechanical vibration sensors is used to capture wave propagation along the arm’s skeletal structure when a finger presses on the skin.
PalmType (Wang et al., 2015a) is a palm-based keyboard for text entry. Instead of using the projection proposed in OmniTouch (Harrison et al., 2011a), a virtual QWERTY keyboard appears on the optical display of smart glasses (Figure 11). A number of infrared sensors located on the wrist of the user’s forearm detect the touch acting on the palm keyboard. Three types of text entry methods are compared in the evaluation: a touchpad on the external controller wired to Epson Moverio glasses, a squared QWERTY keyboard on the palm, and an optimized QWERTY keyboard that matches the shape of the user’s palm. The results show that PalmType with the optimized layout achieved 10 words per minute, which was 41% faster than the touchpad and 29% faster than PalmType with the squared layout. These results suggest that the mapping of virtual interfaces onto the body surface can influence task performance. The palm should be treated not only as a writing board surface but also as a dynamic interface on the body surface.
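As a rough worked example, the baseline entry rates implied by the PalmType figures above can be recovered from the reported speed-ups. The sketch assumes that "X% faster" denotes a ratio of entry rates and that the 10 words per minute figure is exact; both are assumptions, so the derived values are only indicative and are not figures reported in the cited study.

```python
def baseline_from_speedup(speed_wpm, percent_faster):
    """Entry rate of the slower method, assuming
    speed_wpm = baseline * (1 + percent_faster / 100)."""
    return speed_wpm / (1 + percent_faster / 100)


optimized_wpm = 10.0  # PalmType with the optimized layout, as quoted above
touchpad_wpm = baseline_from_speedup(optimized_wpm, 41)  # roughly 7.1 wpm
squared_wpm = baseline_from_speedup(optimized_wpm, 29)   # roughly 7.8 wpm

print(f"touchpad: {touchpad_wpm:.2f} wpm, squared layout: {squared_wpm:.2f} wpm")
```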
In the above examples, visual clues exist in the form of image projection or virtual images on the palm. The surprising fact is that visual feedback is optional to palm-based interaction. Gustafson et al. (Gustafson et al., 2013) investigate the possibility of palm-based imaginary interfaces. That is, no visual cue appears on the user’s palm. Alternatively, an audio system announces instructions to the user rubbing across their palms. In the studies, two experiments have been conducted. According to the first experiment, palm-based imaginary interfaces allow people to interact effectively on the palm without visual feedback. Audio instructions assist users rubbing across their palms. In the second experiment, most of the participants agreed that the tactile sensing on the palm is more important than the tactile cue on the pointing finger. In other words, users rely on the tactile sense on the palm to orient themselves to the targeted item in the imaginary interface.
A brief description of the two experiments is as follows. Four scenarios are designed in the first experiment (Figure 11): 1) the palm or a fake phone is in sight, 2) the participant is blindfolded, which blocks the sight of the hands, 3) a fake phone surface on which a grid is drawn to guide the participant to the target item, and 4) the palm. When the participants are blindfolded, the experiment demonstrates that touching on the palm is no worse than touching on the fake phone; this implies that the tactile feedback on an imaginary palm interface can achieve a performance similar to that obtained with the visual cue on the touchscreen of a smartphone. The result supports the hypothesis that tactile feedback improves task performance. Having shown that the tactile cue is relevant to task performance, the next important question concerns the importance of the tactile sources, that is, active touch and passive touch (Gustafson et al., 2013).
The second experiment considers three scenarios (Figure 11). 1) Palm, 2) Fake Palm, 3) Palm with finger cover. The fake palm is used for evaluating the performance when passive touch is removed, while the finger cover is to discover the effects of active touch on the fingertip. The results show that browsing on the fake palm is significantly slower than on a real palm, while in contrast there is no significant performance gap between touching the real palm with or without a finger cover (tactile sense exists or not). Consequently, the experiment gives evidence that the tactile cue comes from the passive tactile sense (from the palm), instead of the active one (on the fingertip).
PalmGesture (Wang et al., 2015b) is an example of eyes-free interaction using the palm as an interaction surface. It is an implementation based on the findings of an imaginary interface. The interaction highly depends on the tactile cue on the palm of one hand (passive touch), while a finger of another hand acts as the stylus (active touch). The finger performs stroke gestures on the palm, and the user does not require any visual attention on the palm. The proof-of-concept system consists of an infrared camera mounted on the user’s wrist, which detects touch events on the palm. The user can enter text by drawing single-stroke Graffiti characters, as well as trigger an email list by drawing an envelope symbol on the palm.
Forearm as surface The forearm interface, analogous to the finger-to-palm interaction, can be divided into two approaches. First, widgets or menus are projected onto the surface of the forearm as a visual clue, and the user touches the forearm and obtains a tactile clue. Another approach is eyes-free interaction that solely depends on a tactile clue. The forearm serves as a ’trackpad’ and the user rubs across the forearm.
The finger-to-forearm interaction requires either optical or vibration sensors mounted on the arm. As mentioned, Skinput (Harrison et al., 2011b) can be applied in finger-to-forearm interaction as long as the projected virtual interface is located on the forearm. Azai et al. (Azai et al., 2017) design a menu widget on the forearm for smart glasses (Figure 12). As recent augmented reality smart glasses such as Microsoft Hololens have a bigger field of view (FOV), the widgets can be fully displayed on the forearm. Four types of interaction are designed for forearm widgets (Azai et al., 2017): Touch, Drag, Slide, and Rotation. The interactions on the forearm are detected by infrared sensors mounted on top of the head-worn computer. Touch and Drag are suitable for item selection and for controlling a scroll bar. Slide means that one hand slides from the wrist to the elbow of the other arm, and the menu switches accordingly. Rotation is designed for adjusting parameters on the widget, such as increasing the volume of a music player. In SenSkin (Ogata et al., 2013), photo-sensitive sensors can sense any force exerted on the forearm, such as pull, push and pinch on the skin (Figure 12). PUB (Lin et al., 2011) converts the user’s forearm into a touch interface by using ultrasonic sensors. SenSkin (Ogata et al., 2013) and PUB (Lin et al., 2011) allow eyes-free interaction and are mainly driven by tactile cues, while Skinput (Harrison et al., 2011b) and the forearm widget (Azai et al., 2017) offer both visual and tactile cues on the forearm.
Finger as surface The finger can be viewed as a part of palm-based interaction. We treat it separately for the following reason. This sub-section discusses thumb-to-fingers interaction, that is, the subtle movement of the thumb on the index and middle fingers (Huang et al., 2016), whereas finger-to-palm interaction has been discussed thoroughly in the previous sub-section.
The space (Kuo et al.) and coordination (Li and Tang, 2007) between the thumb and the other fingers are crucial to the design of thumb-to-fingers interaction. Huang et al. (Huang et al., 2016) study the possibility of designing button (tap gesture) and touch (stroke gesture) widgets under the scenario of thumb-to-fingers interaction. The comfortable reach between the thumb and the other fingers is investigated in their study. The results (Figure 13) are as follows. Regarding the button widget, participants prefer to touch the 1st and 2nd phalanx of the index, middle, and ring fingers, as well as the 1st phalanx of the little finger. As for the touch widget, only the 1st and 2nd phalanx of the index finger and middle finger are within comfortable reach. Their study also indicates that participants prefer stroke movements because larger movements improve physical comfort. The above findings suggest that the 1st and 2nd phalanx of the index and middle fingers are the ideal area for thumb-to-fingers interaction.
Several implementations of thumb-to-finger interaction have been proposed. TiMMi (Yoon et al., 2015) is a flexible surface enclosing the index finger that forms a ring-like device (Figure 13). It provides multi-modal sensing areas between the thumb and index finger. The surface can capture gestures when the thumb exerts force on the index finger. As the surface is slim and flexible, a press on the surface gives a tactile cue to the user. TiMMi is a rudimentary prototype, while iSkin (Weigel et al., 2015) is a mature prototype ready for commercialization. Likewise, iSkin is a thin, flexible and stretchable overlay on the user’s skin. It encloses the index finger and senses the touch from the thumb. The very thin layer enables the user to receive tactile feedback. Additionally, a remarkable feature of iSkin is that its appearance is customizable and aesthetically pleasing, and hence it achieves higher social acceptance. According to the indicative examples of iSkin, the layers can be extended to other body surfaces such as the forearm, palm, and face. FingerPad (Chan et al., 2013) allows the user’s thumb to perform pinch gestures on the 1st phalanx of the index finger, in which magnetic sensors are positioned on the nail of the index finger.
Face as surface Serrano et al. (Serrano et al., 2014) propose hand-to-face input for interacting with head-worn displays, including smart glasses. The face is well suited for natural interaction for the following reasons. First, the facial area is touched frequently, 15.7 times per hour in an observational experiment (Mahmoud and Robinson, 2011). Users feel at ease performing subtle interactions on their faces. The frequent touch on the face means that the gesture could be less intrusive and therefore shows a higher level of social acceptance. Second, the facial area offers enough space for various gestural interactions, including panning, pinch zooming, rotation zooming, and cyclic zooming (Figure 14). For example, as shown in a user study (Mahmoud and Robinson, 2011), browsing a webpage requires a lot of panning and zooming. Third, as with other on-skin interactions, tactile feedback from the facial area can orient the user. When tactile feedback is available, eyes-free interaction is also facilitated (Yi et al., 2012), which minimizes the waiting time for visual feedback (Wagner et al., 2013). Last, the moment of positioning the user’s hand on the facial area can serve as a gesture delimiter that informs the gesture system to record a new gesture and thus avoids unintentional activation.
Regarding the ideal facial area, the lower region of the face is suggested, and the areas in front of the eyes and mouth should be avoided because gestural input in front of these areas obstructs the user’s view (Figure 14). The cheek is highly preferred by the participants (Serrano et al., 2014) because it imitates the large area of the touchpad on smart glasses and is regarded as an extension of the touch surface from the body of the smart glasses. However, task performance is subject to arm-shoulder fatigue, especially during prolonged use, because hand-to-face actions require lifting the user’s arm. Also, some participants do not accept hand-to-face interaction because excessive touching could smudge their makeup or leave finger skin oil on their face.
Ear as surface In this survey, the ear is distinguished from the facial area, as the description of hand-to-face input is limited to the cheek. Lissermann et al. (Lissermann et al., 2013) propose a hardware prototype, namely EarPut (Figure 14), which instruments the ear as an interactive surface for touch-based interaction. The user can touch the ear and accordingly trigger the arc-shaped capacitive touch sensor at the back of the ear for smart glasses input. Similar to hand-to-face input, hand-to-ear input has four advantages: proprioception, natural tactile feedback, eyes-free interaction, and easy access.
In comparison with hand-to-face input, the surface area of the ear is relatively small and not flat, e.g. the ear helix. The participants prefer to divide the ear into a maximum of four areas. This means that relying solely on touch or tap is not enough to support a variety of interactions. Proposed gestures include touch gestures (such as single tap, slide on the ear, and multi-touch on the ear), grasp interactions (e.g. bending the ear, pulling the ear lobe, and covering the entire ear), as well as mid-air gestures. However, the social acceptance of the proposed gestures is not evaluated in their work. The touch gestures on the ear can be considered analogous to hand-to-face input and are thus suitable for use in a public area. However, the acceptance of grasp interactions, especially bending the ear, and of mid-air gestures next to the ear is still questionable.
3.2. Touchless inputs
Regarding the touchless inputs, smart glasses users make gestural input mid-air and receive visual clues from the optical display on the smart glasses. The touchless input can be classified into two categories: Hands-free and Freehand interactions. Hands-free interaction can be made by the movements of the head, gaze, voice and tongue, while freehand interaction focuses on mid-air hand movements for gestural input.
Hands-free input is one of the most popular categories in the domain of interaction techniques. It enables users to perform hands-free operations on smart glasses. That is, interaction between users and smart glasses involves no hand control. In the wide body of literature, hands-free interaction techniques include voice recognition, head gestures, and eye tracking. In addition, tongue gestures have been studied in recent years.
Voice recognition This technology has been deployed in smart glasses and has become the major input method for Google Glass and Microsoft Hololens. However, it might be inappropriate in shared or noisy environments: it can cause disturbance and obtrusion (Yi et al., 2016), it disadvantages mute individuals, it can be accidentally activated by environmental noise, and it is less preferred than input through body gestures and handheld devices (Kollee et al., 2014).
Head movement Head-tilt gestures are mainly driven by the built-in accelerometers and gyroscopes in smart glasses. This technology is applicable to text input (Jones et al., 2010), user authentication (Yi et al., 2016), and game control (Wahl et al., 2015). Glass Gesture (Yi et al., 2016) utilizes both the accelerometer and the gyroscope in smart glasses to achieve high input accuracy, in which a sequence of head movements is regarded as authentication input. In (Wahl et al., 2015), users can control the movement (up, right, left, down) of the characters in a Pac-Man game by head movement. However, head movements cannot be considered the major input source because of the ergonomic restriction on users moving their heads for long periods of gaming.
Gaze movement Gaze movement can drive the cursor for pointing tasks (Ware and Mikaelian, 1987), for instance, choosing an object with eye gaze (Slambekova et al., 2012; Toyama et al., 2014; Bâce et al., 2016), text input based on the Dasher writing system (Tuisku et al., 2008), and recognizing objects with eye gaze in augmented reality (Toyama et al., 2012). Gaze interactions have been proposed for head-mounted displays (Shimizu and Chernyshov, 2016; Schuchert et al., 2012) and smart glasses (Wahl et al., 2015). Slambekova et al. (Slambekova et al., 2012) have designed a multi-modal system for fast manipulation of virtual objects. Gaze input acts as a mouse cursor that chooses objects, while hand gestures simultaneously perform object manipulation such as translation, rotation and scaling. The system makes good use of the characteristics of the eye and the hand. Gaze interaction can acquire the target object quickly, and human hands have a high degree of freedom (DOF) that enables manipulating objects in diverse manners. Toyama et al. (Toyama et al., 2014) utilize gaze movement to select the targeted text for translation on the optical display of smart glasses. In UbiGaze (Bâce et al., 2016), users can embed visible messages into any real-world object and retrieve such messages from those objects with the assistance of gaze direction, which indicates where the users are looking in the surrounding physical environment.
Eye movement is a natural and fast input channel in which only slight muscle movement is involved, but it has major drawbacks: it is error-prone, it requires excessive calibration, and eye-tracking hardware is not available in smart glasses (Bulling et al., 2012). The performance of gaze input can be further improved by considering haptic feedback. Kangas et al. (Kangas et al., 2014) study the effect of vibro-tactile feedback from a mobile device as a confirmation of gaze interaction. The results show that the task completion time is shortened when the vibro-tactile feedback is available, and the participants feel comfortable due to reduced uncertainty. Nonetheless, eye-tracking technology for smart glasses will not be popular for the next several years because the price of the tracker is no less than a few hundred dollars.
Tongue movement The tongue machine interface is usually proposed for people with paralyzing injuries or medical conditions who retain the use of their cranial nerves (Saponas et al., 2009). The locations of sensors can be either intrusive (Saponas et al., 2009; Zhang et al., 2014) or non-intrusive (Goel et al., 2015). Saponas et al. (Saponas et al., 2009) place infrared optical sensors inside the user’s mouth to detect the tongue movement. Four simple gestures (back, front, left, right) are achieved with 90% accuracy. Zhang et al. (Zhang et al., 2014) place electromyography sensors on the user’s chin to detect the muscle changes driven by tongue gestures. Two additional gestures (protrude and rest) are designed in their work, with 94.17% accuracy. Tongue-in-Cheek (Goel et al., 2015) is a system that uses 10 GHz wireless signals to detect different facial gestures in four directions (up, right, left, down) and two modes (tap and hold). It detects the facial movement on the cheeks driven by moving different parts of the mouth: touching the tongue against the inside of the cheeks, puffing the cheeks, and moving the jaw. In total, the 8 gesture combinations achieve 94.30% accuracy. Even though the tongue interface can achieve highly accurate detection, it has not been designed for complicated interfaces. Only simple gestures are demonstrated in testing scenarios such as Tetris (Saponas et al., 2009). It is very likely that the current works on the tongue machine interface are not ready for interactions in augmented reality.
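The eight Tongue-in-Cheek gesture combinations mentioned above follow from crossing the four directions with the two modes. The short sketch below enumerates them; the identifier format is a hypothetical naming choice for illustration and is not taken from the cited system.

```python
from itertools import product

DIRECTIONS = ("up", "right", "left", "down")
MODES = ("tap", "hold")

# Each gesture is one (mode, direction) pair, e.g. "tap-up" or "hold-left".
GESTURES = [f"{mode}-{direction}" for direction, mode in product(DIRECTIONS, MODES)]

print(len(GESTURES))  # 4 directions x 2 modes
print(GESTURES)
```

A vocabulary of this size is enough for directional control in a simple game, but it is small compared with what a full augmented reality interface would require, which is consistent with the limitation noted above.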
Although various hands-free techniques are proposed in the literature, there is no evidence showing that hands-free input with smart glasses outperforms other interaction techniques such as freehand interaction. As reported by Zheng et al. (Zheng et al., 2015), human beings are good at adapting to various conditions whether or not their hands are occupied by instruments or tasks. In other words, hands-free operation may not be a necessary condition in the design of interaction techniques, and thus freehand interaction involving hand gestures is not inferior to hands-free input. A usability study (Tung et al., 2015) also found that gestural input is preferable to on-body gestures and handheld devices, especially in an interactive environment.
Freehand interaction refers to the human-smart glasses interaction driven by hand gestures. Hand gestures can be classified into 8 types (et al., 2012): Pointing, Semaphoric-Static, Semaphoric-Dynamic, Semaphoric-Stroke, Pantomimic, Iconic-Static, Iconic-Dynamic, and Manipulation, as shown in Figure 15. The following is a brief explanation of the listed hand gesture types.
Pointing: Used to select an object or to specify a direction. Pointing can be represented by index finger, multiple fingers, or a flat palm.
Semaphoric-Static: Derived meaning from social symbols such as thumbs-up as ’Like’ and forward-facing flat palm as ’Stop’. The symbols can be carried out with one or both hands and be directed to the camera without movement.
Semaphoric-Dynamic: Added temporal aspect on the Semaphoric-static. Clock-wise rotation motion means ’Time is running out’.
Semaphoric-Stroke: Similar to Semaphoric-dynamic, but an additional constraint of a single dedicated stroke is considered. Examples can be ’Next/Previous Page’.
Pantomimic: Mimics a single action, as a mime actor would, to illustrate a task, for example, grabbing an object, as well as moving and dropping an object.
Iconic-Static: Pertaining to an icon, for instance, making an oval by cupping two hands together.
Iconic-Dynamic: Added temporal aspect on Iconic-Static. An example is a continuous circular hand movement (i.e. drawing a circle).
Manipulation: The above gesture types require a pre-defined time interval to recognize the hand gesture. This type instead refers to executing a task as soon as the user performs a particular gesture. Consider moving a virtual 3D object: no delay should exist once the mid-air touch on the virtual object is executed, and the object’s location should be updated instantly and continuously.
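The taxonomy above can be captured in a small data structure. The sketch below is only an illustrative encoding: the enum names and the helper function are hypothetical, and the single distinction it models is the one stated in the list, namely that Manipulation is executed immediately while the other seven types need a recognition interval.

```python
from enum import Enum


class GestureType(Enum):
    POINTING = "Pointing"
    SEMAPHORIC_STATIC = "Semaphoric-Static"
    SEMAPHORIC_DYNAMIC = "Semaphoric-Dynamic"
    SEMAPHORIC_STROKE = "Semaphoric-Stroke"
    PANTOMIMIC = "Pantomimic"
    ICONIC_STATIC = "Iconic-Static"
    ICONIC_DYNAMIC = "Iconic-Dynamic"
    MANIPULATION = "Manipulation"


def needs_recognition_interval(gesture):
    """Manipulation acts continuously with no delay; every other type is
    recognized only after a pre-defined time interval has elapsed."""
    return gesture is not GestureType.MANIPULATION


for gesture in GestureType:
    print(f"{gesture.value}: interval needed = {needs_recognition_interval(gesture)}")
```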
Sensors are necessary for capturing the dynamic movements and static postures of a user’s hand. Glove and camera are commonly used for freehand interaction with smart glasses. In this sub-section, we focus on the recent works on smart glasses.
Glove The device commonly comprises sensors and inertial measurement units to detect hand gestures and postures. A comprehensive review of the history and advancement of glove-based systems can be found in (Dipietro et al., 2008). In general, glove-based interaction is applied to hand gestures of the pointing kind (Haque et al., 2015; Hsieh et al., 2014) and text input (Rosenberg and Slater, 1999). Vulture (Markussen et al., 2014) is a mid-air hand gesture interaction technique for text entry, where the instrumented gloves track hand and finger positions. Myopoint (Haque et al., 2015) contains electromyography and inertial motion sensors that detect arm muscle movement, and achieves pointing and clicking through muscle contraction and relaxation. Recent works related to smart glasses interaction are mainly designed for particular considerations such as enhancing the social acceptance of gestural input (Hsieh et al., 2016), ubiquitous gaming in mixed reality (Martins et al., 2008), designing a rested posture for long-term use (Guinness et al., 2015), and supporting tangible augmented reality on physical objects (Simon et al., 2012).
Camera Multiple types of camera, such as RGB, depth, infrared, and thermal cameras, enable vision-based approaches for freehand interaction. Recent works have applied RGB cameras (Chan et al., 2015; Huang et al., 2015; Lee and Hollerer, 2007; Bailly et al., 2012; et al., 2004) and depth cameras (Rekimoto, 2001; Guimbretière and Nguyen, 2012; Ha et al., 2014), whose processing involves image processing, tracking, and gesture recognition. Tracking and recognition can be achieved by two main approaches: model-based or appearance-based. The forearm, hand and fingers are the target objects in gesture recognition (Moeslund and Nørgaard, 2003). A systematic literature review of the development of mid-air hand gestures can be found in (Groenewald et al., 2016).
Regarding vision-based freehand interaction with smart glasses, there have been a number of gestural interfaces with diverse purposes. In the early works, hand gestures are applied as a mouse cursor that enables interaction with a 2D interface in the optical display (Guimbretière and Nguyen, 2012; et al., 2004). As a prominent feature of augmented reality is the integration of virtual contents with the physical environment, hand gestures are intuitive and convenient in such an environment (van Krevelen and Poelman, 2010). Huang et al. (Huang et al., 2015) propose a hand gesture system that facilitates interaction with 2D contents overlaid on physical objects in an office environment. In addition, Heun et al. (Heun et al., 2013) enhance the capability of simple physical objects, such as knobs and buttons, by augmenting a 2D tangible interface on a tangible surface or on top of a physical object. On the other hand, hand gesture systems are also designed for manipulating virtual 3D objects in augmented reality. An early work uses the human hand as a substitute for a fiducial marker (Lee and Hollerer, 2007), and a more recent work enables barehanded manipulation of virtual 3D objects in augmented reality (Ha et al., 2014).
Ubii (Huang et al., 2015) is a gestural interface in which users can perform in-situ interactions with physical objects from a distance, including computers, projector screens, printers, and architecture partitions in an office environment. With the assistance of fiducial markers, users can simply apply hand gestures to complete tasks such as document copying, printing, sharing, and projection display. Kölsch et al. (et al., 2004) propose an ego-centric view interface that enables users to perform pointing gestures in a 2D interface. In addition, some Iconic-Static gestures are included in their work; for instance, if two open hands are held still for five seconds, the head-mounted camera takes a snapshot. Similarly, Guimbretière and Nguyen (Guimbretière and Nguyen, 2012) have proposed a multi-finger pinching system that simulates multi-button mouse interaction under a depth camera; for instance, pinching gestures with the index finger or the middle finger invoke left and right clicks, respectively. A marker-less camera tracking system for 3D interfaces, namely Handy AR (Lee and Hollerer, 2007), uses a hand pose model as a substitute for the fiducial marker for 3D object tracking and manipulation in augmented reality. By transforming the palm and fingers of the outstretched hand into the hand pose model, users can manipulate the 3D object by hand rotation and movement in augmented reality. In WeARHand (Ha et al., 2014), users can select and manipulate virtual 3D objects with their own bare hands in a wearable AR environment, for instance, moving a virtual 3D object from one location to another.
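Two of the mappings described above, pinch-to-click and the five-second open-hands snapshot, are simple enough to sketch. The event names, function names, and timestamp convention below are hypothetical; the sketch assumes that an upstream hand tracker already reports which finger is pinching and when the two-open-hands posture was first observed.

```python
# Assumed mapping from the pinching finger to a simulated mouse button.
PINCH_TO_CLICK = {"index": "left_click", "middle": "right_click"}


def pinch_event(finger):
    """Return the simulated mouse event for a pinch, or None if unmapped."""
    return PINCH_TO_CLICK.get(finger)


def should_take_snapshot(hold_started_at, now, hold_seconds=5.0):
    """True once the two-open-hands posture has been held long enough.

    hold_started_at is the time (in seconds) at which the posture was first
    seen, or None if the posture is not currently being held."""
    if hold_started_at is None:
        return False
    return now - hold_started_at >= hold_seconds


print(pinch_event("index"))                        # left_click
print(should_take_snapshot(10.0, 15.2))            # held for 5.2 s
```

The dwell check also shows why static gestures are slow: no command can fire before the hold interval has elapsed.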
The above works commonly uses a head-worn camera or a camera embedded in smart glasses. Cameras can also be positioned on arms (Rekimoto, 2001; et al, 2010), fingers (Chan et al., 2015), shoes (Bailly et al., 2012), chests and belts (Gustafson et al., 2010). These approaches using wearable cameras aim to provide subtle interactions for higher social acceptance (Chan et al., 2015; Rekimoto, 2001) and free body movement (Bailly et al., 2012; Gustafson et al., 2010) that prevents gorilla arm (Hincapié-Ramos et al., 2014). Pinchwatch (et al, 2010) has a wrist-worn depth-camera to capture the thumb-to-palm and thumb-to-fingers interactions. CyclopsRing (Chan et al., 2015) detects the webbing of fingers by a fisheye RGB-camera in which the segmentation of skin color on fingers can produce a 2D silhouette for gesture recognition. Shoe-Sense (Bailly et al., 2012) has an upward-oriented optical sensor installed on a shoe. Users can make various two-armed poses in triangular form and the sensor can read the triangular arm gestures. Gustafson et al. (Gustafson et al., 2010) proposes an imaginary mid-air interface for wearables without touchscreens. The camera on the user’s chest owns a wide perspective that captures the user’s hand movement and accordingly allows input such as graffiti characters, symbol and curves.
While hand gestural interaction has compelling features such as natural and intuitive interaction, mouse and touch interaction outperform it for fast repetitive tasks. An exploratory study (Sambrooks and Wilkinson, 2013) compares gestural, touch, and mouse interaction in the WIMP paradigm using Fitts’s law (MacKenzie, 1992). The results indicate that gestural interaction suffers from inaccurate recognition (the hit-to-miss ratio is 1:3), poor performance time due to potential unfamiliarity with the hand gesture library, and muscle fatigue. Another study aligns with these findings (Pino et al., 2013). Additionally, gestural interaction requires a relatively long dwelling time compared with mouse or touch interaction, and it is consequently not appropriate for intensive tasks. The user needs to hold the posture for a period of time; this problem is regarded as the Midas problem (Istance et al., 2008), in which guessing the initiation and termination of a gesture is time-consuming and error-prone (Chen et al., 2016). It is concluded that gestural interaction is slower and harder to use than direct pointing interaction in a 2D interface.
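To make the comparison metric concrete, the following minimal sketch computes the quantities usually derived from Fitts’s law. It assumes the Shannon formulation, ID = log2(D/W + 1), that is commonly associated with MacKenzie (1992); the regression coefficients a and b are hypothetical placeholders and are not values reported by any of the cited studies.

```python
import math


def index_of_difficulty(distance: float, width: float) -> float:
    """Index of difficulty in bits, Shannon formulation: log2(D / W + 1)."""
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    return math.log2(distance / width + 1.0)


def predicted_movement_time(distance: float, width: float,
                            a: float, b: float) -> float:
    """Predicted movement time MT = a + b * ID.

    a (seconds) and b (seconds per bit) are device-specific constants
    obtained by regression; the caller supplies them (hypothetical here).
    """
    return a + b * index_of_difficulty(distance, width)


def throughput(distance: float, width: float, movement_time: float) -> float:
    """Throughput in bits per second for one observed pointing movement."""
    if movement_time <= 0:
        raise ValueError("movement_time must be > 0")
    return index_of_difficulty(distance, width) / movement_time
```

Under this model, an input method with a larger slope b (for example, a mid-air gesture that needs dwell time) is predicted to be slower than one with a smaller slope for the same target distance and width.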
A midpoint on the spectrum between direct pointing and Semaphoric gestures should be taken into consideration. Some gesture types for 2D interfaces, such as Pantomimic and Iconic gestures, are less than ideal, as discussed. Therefore, gestures leaning towards barehanded direct pointing (Ren and O’Neill, 2013) are a potentially fruitful direction for 2D interface interaction on smart glasses. Moreover, direct pointing and manipulation are analogous to the touch interface on a smartphone, that is, the touchscreen, whereas mouse interaction is not available on smart glasses and the visual content is no longer touchable. Pointing gestures therefore become a viable option for 2D interfaces; for instance, Heo et al. (Heo et al., 2015) propose a vision-based pointing gesture system that detects the number of fingertips instead of identifying the silhouette of the hand posture. More importantly, virtual 3D objects are always involved in augmented reality. Gesture types such as Semaphoric-stroke should be considered for them because of their natural and intuitive interaction (et al., 2013).
4. Existing research efforts and trends
In this section, we evaluate the interaction approaches, including those based on the spectacle frame of smart glasses, rings, wristbands, belts, body surfaces, body movements, gloves, and cameras, from the perspective of interaction goals. Their input abilities are discussed on the basis of the proposed classification system for touch and touchless input. According to the characteristics and features of the works identified in the previous section, we choose and compare more than 30 recent research works relevant to smart glasses interaction. These works represent their categories (TOD: Touch-on-device, TOB: Touch-on-body, HFI: Hands-free interaction, and FHI: Freehand interaction) and are designed for various interaction goals, including manipulating an item or a scroll bar inside a 2D interface, selecting a key on a virtual keyboard, writing graffiti words and unistrokes in text entry systems, manipulating 3D objects, and interacting with a physical object in augmented reality, as shown in Figure 16. In Figure 16, the interaction goals are summarized into eight types, as follows. TAP: single-tap gestures for operating items (e.g. selecting and dragging a button or a menu), including single-tap and tap-and-hold; TRA: single-finger gestures that produce a trajectory for stroke inputs (e.g. swiping to switch between pages or to scroll up/down, drawing a circle or an envelope); MFT: multi-finger touch gestures such as zooming in/out and cyclic gestures; KEY: selecting keys on a virtual keyboard and other non-stylus-based text entry techniques; GUT: stylus-based inputs (e.g. graffiti or unistroke) for text entry systems; GES: hand gestural commands, a total of eight types as discussed in Section 3.2.2; DMO: direct manipulation of virtual three-dimensional objects (e.g. rotation, translation); PHY: interacting with a physical environment in augmented reality.
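The classification above can be read as a small data model: each work belongs to one of four categories and covers a subset of the eight interaction goals. The sketch below is one possible encoding, written under the assumption that a work is adequately described by a single category and a set of goals. The entries in `EXAMPLE_WORKS` are hypothetical placeholders and do not reproduce the contents of Figure 16.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet, Iterable, Set


class Category(Enum):
    TOD = "Touch-on-device"
    TOB = "Touch-on-body"
    HFI = "Hands-free interaction"
    FHI = "Freehand interaction"


class Goal(Enum):
    TAP = "single-tap and tap-and-hold"
    TRA = "single-finger trajectory"
    MFT = "multi-finger touch"
    KEY = "virtual keyboard text entry"
    GUT = "graffiti or unistroke text entry"
    GES = "hand gestural commands"
    DMO = "direct manipulation of 3D objects"
    PHY = "interaction with the physical environment"


@dataclass(frozen=True)
class Work:
    name: str
    category: Category
    goals: FrozenSet[Goal]


def coverage(works: Iterable[Work]) -> Dict[Category, Set[Goal]]:
    """Goals covered by at least one work in each category."""
    result: Dict[Category, Set[Goal]] = {c: set() for c in Category}
    for work in works:
        result[work.category] |= work.goals
    return result


def uncovered(works: Iterable[Work]) -> Set[Goal]:
    """Goals that no work in the collection covers."""
    covered: Set[Goal] = set()
    for work in works:
        covered |= work.goals
    return set(Goal) - covered


# Hypothetical placeholder entries, for illustration only.
EXAMPLE_WORKS = [
    Work("example-forearm-skin", Category.TOB,
         frozenset({Goal.TAP, Goal.TRA, Goal.MFT})),
    Work("example-ring", Category.TOD, frozenset({Goal.TAP, Goal.TRA})),
    Work("example-head-camera", Category.FHI,
         frozenset({Goal.TAP, Goal.DMO, Goal.PHY})),
]
```

Calling `uncovered(EXAMPLE_WORKS)` on such a collection lists the goals that would need an additional input method, which is the kind of gap analysis the rest of this section carries out informally.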
From the results shown in Figure 16, it is easy to recognize that touch input is mainly designed for tap and swipe gestures (TAP and TRA), as well as text entry systems (KEY and GUT). Tap and swipe gestures are what smartphone interfaces commonly use to support a multitude of tasks. The on-device and on-body touch inputs aim to provide alternative input approaches for smart glasses. Except for those research works focusing solely on text input (Grossman et al., 2015; Yu et al., 2016; Nirjon et al., 2015; Wang et al., 2015), single-tap and stroke-based gestures are commonly available on touch inputs. When a larger surface is available on an external device or on the skin, multi-finger touch gestures (MFT) are proposed accordingly. For example, the wristband proposed by Ham et al. (Ham et al., 2014) has a phone-sized touch interface, and the pioneering work on finger-to-face interaction proposed by Serrano et al. (Serrano et al., 2014) utilizes the considerably large surface of the cheeks. Additionally, we observe that touch interfaces are responsive and accurate designs that sufficiently support various tasks. Consequently, touch input does not necessarily need to support all eight types of gestural input (GES). Interestingly, gesture types such as Semaphoric-Stroke and Iconic-Static are also convenient on a 2D touch-sensitive surface. Instead of performing a mid-air hand swipe, the user can draw a stroke on a touch-sensitive interface (Schneegass and Voit, 2016). Similarly, the user can draw an envelope icon to trigger an email application (Wang et al., 2015b).
Four noticeable trends are identified in the existing works on touch input, as follows. First, research studies of on-body interaction focus on the upper limbs (Weigel et al., 2014) and facial areas (Serrano et al., 2014). On-body interaction requires sensor channels that detect either infrared light (Wang et al., 2015b) or vibration exerted on skin surfaces (Lin et al., 2011). The additional sensor arrays in the external device merely serve as a detector of on-skin interaction. Inertial measurement units could be integrated with the external device to allow more variations of gestural input. Second, the research efforts have considered various forms and sizes of external devices and a wide range of sensing capabilities. Taking the finger-worn device as an example, the device can be made in the form of a traditional ring, distal addendum, whole-finger addendum, fingernail addendum, finger sleeve, or thumb addendum (Shilkrot et al., 2015). In addition, the capabilities of the sensors can influence the gesture library on the ring; for instance, detecting the degree of bending of the finger muscle can enrich the number of gestures available on the limited touch-sensitive surface (Ogata et al., 2012). Third, there is no evident restriction on the recommended size of the touch surface. The selection of surface size is commonly justified by functionality and social acceptance. The common norm is that a larger surface, as in wristbands and finger-to-forearm, finger-to-face, and finger-to-palm interactions, can support more comprehensive gestural inputs. For example, SenSkin (Ogata et al., 2013) supports single tap (TAP), drawing trajectory (TRA), and multi-finger gesture (MFT). In contrast, a smaller surface, as in finger-worn devices and thumb-to-finger interaction, usually supports simpler gestural inputs (TAP and TRA).
In addition, the smaller surface supports subtle and inattentive one-handed interaction, which is more favorable in terms of social acceptance. The larger surface requires two-handed interaction and discernible body movement, which can attract unfavorable attention from bystanders. From some recent works, we observe that the smaller surface is expanding its functionality. Text entry, for instance, is commonly proposed on a larger surface such as finger-to-palm interaction (Wang et al., 2015a), yet a finger-worn device has demonstrated interaction potential beyond tap and swipe gestures. In TypingRing (Nirjon et al., 2015), the finger-worn device achieves a typing speed of 6.26 to 10.44 words per minute, while finger-to-palm interaction such as PalmType (Wang et al., 2015a) achieves only 4.7 words per minute. Another example is text entry using graffiti words. Both finger-to-thumb interaction (Chan et al., 2013) and Palm Gesture (Wang et al., 2015b) allow users to write graffiti words on skin surfaces. These works show that a smaller surface size does not necessarily come at the cost of comprehensive functionality. Last, eyes-free interaction is an important feature of touch interaction, which is supported by the existence of tactile cues. The benefits are discussed in the previous section.
On the other hand, touchless input demonstrates input characteristics that are distinct from those of touch input. Hands-free interaction such as head and gaze movement provides very limited functions; that is, it is designed for micro-interactions such as short authentication inputs (Yi et al., 2016) and locating a few items in augmented reality (Bâce et al., 2016). Freehand interaction enabled by gloves and electromyography (EMG) wristbands can achieve item selection (TAP) on a virtual interface, and mid-air text input by sensing finger movement (KEY). Vision-based freehand interaction focuses primarily on the manipulation of 3D objects (TAP and DMO) and of the physical environment in augmented reality (PHY). These works emphasize the unique characteristics of hand gestures, i.e. the intuitiveness and naturalness of directly manipulating virtual 3D objects and the physical environment in augmented reality. They also support single taps and mid-air trajectory drawing, which enable manipulating virtual icons and switching pages in a virtual 2D interface, even though touch input performs better in terms of accuracy, speed, and repetitiveness (Sambrooks and Wilkinson, 2013). Interestingly, we find that vision-based approaches to text input are commonly framed as sign language recognition (Groenewald et al., 2016), which is deliberately designed for people with special needs. However, iconic-static sign language is not appropriate for intensive text entry because it suffers from the long dwelling time needed to recognize every single hand sign (Istance et al., 2008), and hence from unproductive input speed. In addition, the mid-air tap on a virtual keyboard often appears in the usage examples of these works, but it also suffers from the accumulated dwelling time of recognizing tap gestures on the keys of the virtual keyboard.
From the above research works, we find that vision-based freehand interaction is a distinctive asset for directly manipulating virtual 3D objects and the physical environment; however, its shortcomings in text entry mean that it cannot serve as an all-round approach.
Multi-modal input is one of the prominent trends in the existing work on freehand interaction for smart glasses. First, there exist several pioneering research works that combine the benefits of touch and touchless input. Ens et al. have applied a finger-worn device to support subtle movements on small items in a 2D interface, as freehand interaction on small items lacks precision and is fatigue-prone (Ens et al., 2016). In their system design, hand gestures are assigned to locating large items, such as windows and menus, while finger-to-ring interaction is responsible for subtle operations on the large items, such as relocating a window and changing parameters in a scroll bar. Instead of having an external touch interface on a ring device, Bai et al. (Bai et al., 2014) utilize the touch interface on the spectacle frame of smart glasses. Second, multi-modal input has been considered as a way to alleviate the issue of dwelling time. Chen et al. (Chen et al., 2016) have exploited the electromyography (EMG) sensor on a commercial smart wristband to minimize the idle time of detecting the initiation and termination of intended gestures. Another trend is designing low-power hand gesture systems. MIME (Colaço et al., 2013) applies hybrid processing of image information captured from both RGB and depth cameras. An optimized arrangement of the image sources can achieve both accurate and low-power gesture detection. The depth channel operates intermittently to enhance the performance of color-based hand gesture detection while avoiding intensive use of the power-consuming depth channel.
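The idea of using muscle activity to mark where a gesture starts and ends, rather than waiting for a dwell timeout, can be illustrated with a generic envelope-threshold sketch. This is not the method of Chen et al. (2016); it is a minimal illustration that assumes a single EMG channel, a moving root-mean-square envelope, and two hypothetical hysteresis thresholds.

```python
import math
from typing import List, Sequence, Tuple


def moving_rms(samples: Sequence[float], window: int) -> List[float]:
    """Root-mean-square envelope over a trailing window of samples."""
    if window < 1:
        raise ValueError("window must be >= 1")
    envelope = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1): i + 1]
        envelope.append(math.sqrt(sum(x * x for x in chunk) / len(chunk)))
    return envelope


def segment_gestures(samples: Sequence[float], window: int = 4,
                     on_threshold: float = 0.5,
                     off_threshold: float = 0.3) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of active segments, end exclusive.

    A segment opens when the envelope reaches on_threshold and closes when
    it drops below off_threshold. Both thresholds are hypothetical and
    would need per-user calibration in practice.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    for i, level in enumerate(moving_rms(samples, window)):
        if start is None and level >= on_threshold:
            start = i
        elif start is not None and level < off_threshold:
            segments.append((start, i))
            start = None
    if start is not None:  # signal ended while a gesture was still active
        segments.append((start, len(samples)))
    return segments
```

Only the samples inside a returned segment would then be passed to a gesture recognizer, so the recognizer does not have to guess gesture boundaries from hand pose alone.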
In conclusion, touch input and freehand interaction are the most popular research topics in smart glasses interaction. Figure 16 shows the coverage of interaction goals by the four proposed categories: touch input (TOD and TOB) shows promising interaction capabilities in 2D interfaces, and vision-based freehand interaction (FHI) demonstrates intuitive and natural interaction with virtual 3D objects and the physical environment in augmented reality. We envision that combining touch and touchless inputs has great potential for smart glasses interaction and that, meanwhile, the boundary between freehand interaction and touch input will become blurred.
5. Interaction challenges on smart glasses
So far we have discussed four categories of interaction approaches that are important to smart glasses interaction. It is essential to note that these categories are research areas that need considerable further exploration. We have also provided an overview of research efforts that readers can use to investigate and fill the performance gaps among the interaction approaches. In this section, we highlight a number of challenging problems in smart glasses interaction. The reader may consider the challenges below as design directions and guidelines for devising new interaction approaches on smart glasses.
Hybrid user interface on smart glasses. Smart glasses are mobile devices whose goal is to deliver an augmented reality interface to users. Augmented reality involves superimposing interactive computer graphics images onto physical objects in the real world (Poupyrev et al., 2002). The virtual contents on the optical display can be classified into 2D and 3D objects. This combination of virtual 2D and 3D contents can be regarded as a hybrid user interface (van Krevelen and Poelman, 2010). Interaction with the virtual contents in a hybrid user interface creates a more intricate scenario than what we have seen on smartphones. The virtual 2D contents involve operations on icons, menus, and windows in 2D interfaces, for instance, selecting an object (Kienzle and Hinckley, 2014), drawing a trajectory (Ens et al., 2016), and drawing a symbolic icon in two-dimensional space (Schneegass and Voit, 2016). The 3D contents involve direct manipulation of virtual 3D objects and of augmented information superimposed on physical objects, such as translating and rotating 3D objects (Ha et al., 2014) and instructing a printer to carry out printing jobs (Huang et al., 2015). In the works we surveyed, the virtual 2D and 3D contents can be matched to the eight types of interaction goals mentioned in Section 4. In general, virtual 2D contents can be effectively managed by the types TAP, TRA, MFT, KEY, and GUT, while virtual 3D contents can be handled by the remaining types GES, DMO, and PHY. Figure 16 clearly shows that no existing work provides full coverage of the eight interaction goals. This gap represents an immense opportunity for researchers to develop comprehensive approaches for interaction in the hybrid user interface on smart glasses.
Towards higher coverage of interaction goals. From the results in Figure 16, touch input mainly aids interaction with virtual 2D contents (TAP, TRA, and MFT) and text entry (KEY and GUT). Touchless input dominates interaction with virtual 3D contents (GES, DMO, and PHY). The reasons are as follows. First, interaction with a 2D interface usually needs fast repetition and accurate input, for which the high dexterity of fingers on finger-to-device or finger-to-body interfaces is more advantageous than the mid-air movement of larger body parts (e.g. hand and head gestures), which is criticized for its lack of precision and proneness to fatigue (Ens et al., 2016; Hincapié-Ramos et al., 2014). As a result, touch input has demonstrated higher input performance than touchless input for interaction with 2D interfaces and text entry systems (MacKenzie, 1992; Pino et al., 2013). On the other hand, among the touchless inputs, freehand interaction has exhibited its capability in 3D interfaces. Most users prefer interacting with 3D objects through hand gestures rather than touch-based approaches because they agree that performing gestures in front of the face is natural and straightforward (Tung et al., 2015). The results can also be explained by the intuitiveness of hand gestures. Hand gestures enable users to directly manipulate virtual 3D contents; for instance, rotation and translation can be done by simply rotating the wrist and swiping the hand, respectively. In comparison, touch input is less straightforward. For instance, the user first rubs on a touch surface to locate the targeted 3D object, and afterwards draws a circle on the touch surface to rotate it. In order to achieve a higher coverage of interaction goals in the hybrid user interface, it is worthwhile to judiciously consider exploiting both touch-based and touchless gestures.
Building all-rounded interaction approaches. In order to devise interaction approaches on smart glasses that fulfill the aforementioned interaction goals, one possible solution is to have touch and touchless inputs each tackle their own interaction challenges: touch input would need to provide more intuitive gestures for interaction with virtual 3D contents, while touchless input would have to fill its gap in tasks requiring fast repetition. Another possible solution is to combine touch and touchless inputs. We envision this assortment of input methods as similar to the multi-modal input found on touchscreen computers, e.g. Microsoft Surface. As discussed, exploiting the combination of touch and touchless inputs can gain the benefits of both, as follows. Hand gestures are ideal for fast, coarse, and convenient manipulation of virtual 3D objects, while operations on virtual 2D interfaces can be fulfilled by touch surfaces that are suitable for precise and prolonged usage, such as surfing in a web browser, selecting items in a widget menu, and inputting text.
According to the surveyed works, we anticipate that augmented reality on smart glasses will consist of a number of large virtual contents including menus, widgets, windows, and 3D objects (Ha et al., 2014). Inside the large contents, there exist small contents such as buttons, icons, and scroll bars in menus and widgets, as well as adjustable parameters of 3D objects (Huang et al., 2015). Under this circumstance, users could first locate the large contents with fast and coarse hand gestures, and subsequently manipulate the small contents with subtle and repetitive touch inputs (Ens et al., 2016). We here outline possible configurations for building comprehensive interaction approaches. The touch interface can be designed as a companion device that complements touchless input. Two illustrative configurations are: 1) a touch interface on a finger-worn device combined with vision-based freehand interaction, and 2) a haptic glove equipped with touch-sensitive textile for touch input and with embedded sensors (in the glove) supporting freehand interaction. Building multi-modal inputs with companion devices may circumvent the obstacles of interaction with smart glasses. To conclude, each input approach has its strengths and weaknesses, and a variety of interaction potentials can be achieved by considering various combinations of input approaches. These combinations aim at supporting natural and fast interaction for augmented reality on smart glasses. Multi-modal input on smart glasses would be one of the most exciting research areas for further investigation. In the rest of this section, several key design factors for multi-modal input on smart glasses are highlighted.
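The division of labor described above, coarse hand gestures to pick a large content and fine touch input to adjust the small contents inside it, can be sketched as a tiny event router. The class and method names below are hypothetical and chosen for illustration; the sketch assumes that gesture and touch recognizers already exist and deliver events with a target name or a control name.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class LargeContent:
    """A window, menu, widget, or 3D object holding small controls."""
    name: str
    # Small contents, e.g. scroll bars or parameters, normalized to [0, 1].
    controls: Dict[str, float] = field(default_factory=dict)


class MultiModalRouter:
    """Routes coarse gesture events and fine touch events (hypothetical API)."""

    def __init__(self, contents: Iterable[LargeContent]) -> None:
        self._contents = {c.name: c for c in contents}
        self.focused: Optional[LargeContent] = None

    def on_hand_gesture(self, target_name: str) -> bool:
        """Coarse mid-air pointing selects which large content has focus."""
        content = self._contents.get(target_name)
        if content is None:
            return False
        self.focused = content
        return True

    def on_touch_swipe(self, control: str, delta: float) -> bool:
        """Fine touch input adjusts a small control of the focused content."""
        if self.focused is None or control not in self.focused.controls:
            return False
        value = self.focused.controls[control] + delta
        self.focused.controls[control] = min(1.0, max(0.0, value))
        return True
```

In this arrangement a touch event is ignored until a hand gesture has given some content the focus, which mirrors the "locate first, then manipulate" order suggested in the text.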
Form size for wide-ranging coverage. When multi-modal inputs are considered, the choice of inputs can influence how comprehensively the interaction goals are covered (Kienzle and Hinckley, 2014; Dobbelstein et al., 2015; Schneegass and Voit, 2016; Wagner et al., 2013). For example, the coverage of interaction goals can be influenced by the size of the touch-sensitive area on the touch input device. The skin surfaces of forearms and palms, as well as wristbands, are considered large interaction areas, which can be regarded as a full-sized trackpad for various tasks (e.g. drawing trajectories and text entry). In comparison, thumb-to-finger interaction and finger-worn devices have very limited space, which is used as an off-hand controller for click and swipe gestures or other simple interactions. Additionally, these small surfaces are only considered an off-hand substitute for the tangible interface (trackpad or button) on the spectacle frame of smart glasses. As the small surfaces are not well suited to complicated tasks like text entry, an additional input approach, e.g. speech recognition, is necessary to fill the gap in the coverage.
Considering temporal factor in interaction design. The timing of switching between multiple input modalities is another crucial consideration. Vernier and Nigay (Vernier and Nigay, 2001) propose a framework that describes five temporal possibilities in input modalities (order, succession, intersection, inclusion, and simultaneity). The key characteristic of the model is that it describes the temporal relationship between two or more input approaches. Considering the combination of touch input and freehand interaction, the switching point from touch-based input to mid-air hand gestures can be the appearance of a 3D object. For example, a scenario may require manipulating a virtual 3D object after selecting an application in a 2D interface, i.e. succession. An illustrative example of inclusion is combining voice recognition with a small-sized touch surface for text entry, as a finger-worn device cannot support efficient text entry.
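As a rough illustration of how the five temporal possibilities could be told apart in software, the sketch below classifies the relationship between the time intervals of two input events. The interval-based definitions used here, including the `gap_tolerance` parameter, are our own assumed reading of the five terms and should not be taken as the formal definitions given by Vernier and Nigay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Time span of one input event, in seconds."""
    start: float
    end: float

    def __post_init__(self) -> None:
        if self.end < self.start:
            raise ValueError("end must not precede start")


def temporal_relation(a: Interval, b: Interval,
                      gap_tolerance: float = 0.5) -> str:
    """Classify two input intervals under the assumed definitions below.

    simultaneity: identical start and end
    succession:   one follows the other within gap_tolerance seconds
    order:        one follows the other after a longer gap
    inclusion:    one lies entirely within the other
    intersection: the two overlap only partially
    """
    first, second = (a, b) if a.start <= b.start else (b, a)
    if first.start == second.start and first.end == second.end:
        return "simultaneity"
    if second.start >= first.end:
        gap = second.start - first.end
        return "succession" if gap <= gap_tolerance else "order"
    if second.end <= first.end or first.start == second.start:
        return "inclusion"
    return "intersection"
```

For example, a touch selection that ends just before a mid-air manipulation begins would be labeled succession, while a short touch confirmation issued during a longer voice dictation would be labeled inclusion.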
Social acceptance and appealing design. Among the surveyed papers, social acceptance is regularly included in the evaluation sections. Designing an unobtrusive interaction technique for smart glasses can encourage people to use smart glasses in public areas (van Krevelen and Poelman, 2010). As discussed in Section 3, speech recognition has poor social acceptance because it causes disturbance and is obtrusive. In contrast, touch-based input has considerably good social acceptance. People nowadays readily accept wristbands, rings, and armbands. We can view touch-sensitive external devices as fashionable-traditional gadgets (Shilkrot et al., 2015; Mulling and Sathiyanarayanan, 2015). Regarding on-skin interaction, finger-to-palm, thumb-to-finger, and forearm are the most popular touch interfaces (Tung et al., 2015). However, the acceptance of touch interaction on the facial area is uncertain because repetitive touches on the face would affect the user’s appearance, for instance by removing make-up or bringing dust onto the face. In addition, one-handed inputs (finger-to-ring and thumb-to-finger interactions) need only subtle movement and thus avoid awkward interactions in public areas. As for freehand interaction, gloves or body-worn cameras are preferable to head-worn cameras. A study considering social acceptance suggested that hand gestures should be performed off-face (Hsieh et al., 2016). The study reported the comment from participants that ‘in-air hand gesture performed in front of the face is weird’. Gloves and body-worn cameras as form factors might raise the question of why an extra device is being worn. We note that wearable devices gain a foothold on the market when their outfit designs are attractive. Researchers therefore have to give their proposed input devices an aesthetically pleasing appearance to achieve higher social acceptance (Weigel et al., 2015).
Energy consumption on smart glasses. Smart glasses have very limited battery life, and good utilization of energy can facilitate their everyday use (Ok et al., 2015). Thus, energy consumption is an additional fundamental factor that should be considered. The energy consumption of the interaction approaches varies from case to case. Inputs that use external devices or have a separate energy supply (e.g. touch-based and glove-based inputs) are preferred choices. In contrast, vision-based approaches using cameras embedded in smart glasses are energy-consuming. It is expected that the energy consumption issue can be alleviated if multi-modal inputs are appropriately designed. For example, vision-based freehand interaction can be triggered only in particular scenarios, such as when interaction with virtual 3D objects is unavoidable, or the cameras can be switched on only when inertial measurement units inside a finger-worn wearable recognize the forearm movements of hand gestures, to name but a few.
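The last example, switching the camera on only when a wearable's inertial sensor detects arm movement, can be sketched as a simple gate. The sketch assumes that the wearable streams three-axis accelerometer readings in m/s² with timestamps in seconds; the class name, the motion threshold, and the idle timeout are hypothetical values chosen for illustration.

```python
import math
from typing import Optional, Tuple

GRAVITY = 9.81  # m/s^2, magnitude of the accelerometer reading at rest


class CameraGate:
    """Keeps the camera off until the wearable reports noticeable motion."""

    def __init__(self, motion_threshold: float = 2.0,
                 idle_timeout: float = 3.0) -> None:
        self.motion_threshold = motion_threshold  # deviation from gravity
        self.idle_timeout = idle_timeout          # seconds without motion
        self.camera_on = False
        self._last_motion_time: Optional[float] = None

    def update(self, timestamp: float,
               accel: Tuple[float, float, float]) -> bool:
        """Feed one accelerometer sample; return whether the camera is on."""
        magnitude = math.sqrt(sum(v * v for v in accel))
        moving = abs(magnitude - GRAVITY) > self.motion_threshold
        if moving:
            self._last_motion_time = timestamp
            self.camera_on = True
        elif (self.camera_on and self._last_motion_time is not None
              and timestamp - self._last_motion_time >= self.idle_timeout):
            self.camera_on = False  # no motion for a while: power down
        return self.camera_on
```

A real system would likely use a richer motion classifier than a magnitude threshold, but the structure stays the same: the low-power sensor runs continuously and the power-hungry camera runs only while a gesture is plausible.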
In this survey, we studied the smart glasses available on the market and gave a detailed overview of the related literature. We first presented the research efforts in the field, more specifically in the context of on-device touch input, on-body touch input, hands-free input, and freehand input. We grouped all of these under the more abstract terms of touch input and touchless input. We created a classification framework that distinguishes interaction methods for smart glasses on the basis of their key characteristics: input modality, form factor, existence of tactile feedback, and interaction areas. After that, we categorized and presented the existing research efforts and the interaction challenges on smart glasses. We also saw that several works have applied multiple input modalities (touch and mid-air gestures) to enhance input capabilities, ease of use, or input accuracy. We believe it is important to further study the trend of multi-modal input for smart glasses.
Although the future of interaction on smart glasses is highly uncertain, the current works on touch and touchless input give some important clues to the field. Both 3D natural hand gestures and touch-based gestures are important to smart glasses interaction with the hybrid user interface comprising 2D and 3D objects. While there has been significant research on interaction methods using natural hand and touch gestures on platforms such as large-screen displays and touchscreens, very few works combining both hand and touch gestures have considered the scenario of augmented reality on mobile devices. This opens research opportunities for overcoming the hurdle of encumbered interaction with miniature smart glasses. We propose a potential research direction of creating multi-modal input by combining the various input approaches mentioned in the literature.
- journal: CSUR
- journalvolume: 1
- journalnumber: 0
- article: 1
- journalyear: 2017
- publicationmonth: 0
- copyright: acmlicensed
- doi: 0000001.0000001
- ccs: Human-centered computing Interaction paradigms
- ccs: Human-centered computing Interaction devices
- ccs: Human-centered computing Interaction techniques
- 2017a. ARKit. (2017). Retrieved from https://developer.apple.com/arkit/.
- 2017b. CCS-insight. (2017). Retrieved from http://www.ccsinsight.com/press/company-news/2251-augmented-and-virtual-reality-devices-to-become-a-4-billion-plus-business-in-three-years.
- 2017c. Digi-Capital. (2017). Retrieved from http://www.digi-capital.com/news/2017/01/after-mixed-year-mobile-ar-to-drive-108-billion-vrar-market-by-2021/.
- 2017. FOVE. (2017). Retrieved from https://www.getfove.com/.
- 2017. Google Glass Project. (2017). Retrieved from https://en.wikipedia.org/wiki/Google-Glass/.
- 2017. Microsoft Hololens. (2017). Retrieved from https://www.microsoft.com/en-us/hololens.
- 2017. Recon Transcend. (2017). Retrieved from https://www.reconinstruments.com/2010/10/worlds-first-gps-goggles-head-mounted-display-available-now/.
- 2017. Sony SmartEyeglasses. (2017). Retrieved from https://developer.sony.com/devices/mobile-accessories/smarteyeglass/.
- 2017. Wikipedia:Accelerometer. (2017). Retrieved from https://en.wikipedia.org/wiki/Accelerometer.
- 2017. Wikipedia:Camera. (2017). Retrieved from https://en.wikipedia.org/wiki/Camera.
- 2017d. Wikipedia:GPS. (2017). Retrieved from https://en.wikipedia.org/wiki/Global-Positioning-System.
- 2017. Wikipedia:Gyroscope. (2017). Retrieved from https://en.wikipedia.org/wiki/Gyroscope.
- 2017. Wikipedia:Magnetometer. (2017). Retrieved from https://en.wikipedia.org/wiki/Magnetometer.
- 2017. Wikipedia:Microphone. (2017). Retrieved from https://en.wikipedia.org/wiki/Microphone.
- 2017. Wikipedia:Photodetector. (2017). Retrieved from https://en.wikipedia.org/wiki/Photodetector.
- Daniel Ashbrook, Patrick Baudisch, and Sean White. 2011. Nenya: Subtle and Eyes-free Mobile Input with a Magnetically-tracked Finger Ring. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’11). ACM, New York, NY, USA, 2043–2046. https://doi.org/10.1145/1978942.1979238
- Takumi Azai, Shuhei Ogawa, Mai Otsuki, Fumihisa Shibata, and Asako Kimura. 2017. Selection and Manipulation Methods for a Menu Widget on the Human Forearm. In Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems (CHI EA ’17). ACM, New York, NY, USA, 357–360. https://doi.org/10.1145/3027063.3052959
- Mihai Bâce, Teemu Leppänen, David Gil de Gomez, and Argenis Ramirez Gomez. 2016. ubiGaze: Ubiquitous Augmented Reality Messaging Using Gaze Gestures. In SIGGRAPH ASIA 2016 Mobile Graphics and Interactive Applications (SA ’16). ACM, New York, NY, USA, Article 11, 5 pages. https://doi.org/10.1145/2999508.2999530
- Huidong Bai, Gun Lee, and Mark Billinghurst. 2014. Using 3D Hand Gestures and Touch Input for Wearable AR Interaction. In CHI ’14 Extended Abstracts on Human Factors in Computing Systems (CHI EA ’14). ACM, New York, NY, USA, 1321–1326. https://doi.org/10.1145/2559206.2581371
- Gilles Bailly, Jörg Müller, Michael Rohs, Daniel Wigdor, and Sven Kratz. 2012. ShoeSense: A New Perspective on Gestural Interaction and Wearable Applications. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’12). ACM, New York, NY, USA, 1239–1248. https://doi.org/10.1145/2207676.2208576
- Andreas Bulling, Raimund Dachselt, Andrew Duchowski, Robert Jacob, Sophie Stellmach, and Veronica Sundstedt. 2012. Gaze Interaction in the post-WIMP World. In CHI ’12 Extended Abstracts on Human Factors in Computing Systems (CHI EA ’12). ACM, New York, NY, USA, 1221–1224. https://doi.org/10.1145/2212776.2212428
- Liwei Chan, Yi-Ling Chen, Chi-Hao Hsieh, Rong-Hao Liang, and Bing-Yu Chen. 2015. CyclopsRing: Enabling Whole-Hand and Context-Aware Interactions Through a Fisheye Ring. In Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology (UIST ’15). ACM, New York, NY, USA, 549–556. https://doi.org/10.1145/2807442.2807450
- Liwei Chan, Rong-Hao Liang, Ming-Chang Tsai, Kai-Yin Cheng, Chao-Huai Su, Mike Y. Chen, Wen-Huang Cheng, and Bing-Yu Chen. 2013. FingerPad: Private and Subtle Interaction Using Fingertips. In Proceedings of the 26th Annual ACM Symposium on User Interface Software and Technology (UIST ’13). ACM, New York, NY, USA, 255–260. https://doi.org/10.1145/2501988.2502016
- Yineng Chen, Xiaojun Su, Feng Tian, Jin Huang, Xiaolong (Luke) Zhang, Guozhong Dai, and Hongan Wang. 2016. Pactolus: A Method for Mid-Air Gesture Segmentation Within EMG. In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems (CHI EA ’16). ACM, New York, NY, USA, 1760–1765. https://doi.org/10.1145/2851581.2892492
- Boon Chew, Jennifer A. Rode, and Abigail Sellen. 2010. Understanding the Everyday Use of Images on the Web. In Proceedings of the 6th Nordic Conference on Human-Computer Interaction: Extending Boundaries (NordiCHI ’10). ACM, New York, NY, USA, 102–111. https://doi.org/10.1145/1868914.1868930
- Andrea Colaço, Ahmed Kirmani, Hye Soo Yang, Nan-Wei Gong, Chris Schmandt, and Vivek K. Goyal. 2013. Mime: Compact, Low Power 3D Gesture Sensing for Interaction with Head Mounted Displays. In Proceedings of the 26th Annual ACM Symposium on User Interface Software and Technology (UIST ’13). ACM, New York, NY, USA, 227–236. https://doi.org/10.1145/2501988.2502042
- L. Dipietro, A. M. Sabatini, and P. Dario. 2008. A Survey of Glove-Based Systems and Their Applications. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 38, 4 (July 2008), 461–482. https://doi.org/10.1109/TSMCC.2008.923862
- David Dobbelstein, Philipp Hock, and Enrico Rukzio. 2015. Belt: An Unobtrusive Touch Input Device for Head-worn Displays. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI ’15). ACM, New York, NY, USA, 2135–2138. https://doi.org/10.1145/2702123.2702450
- Barrett Ens, Ahmad Byagowi, Teng Han, Juan David Hincapié-Ramos, and Pourang Irani. 2016. Combining Ring Input with Hand Tracking for Precise, Natural Interaction with Spatial Analytic Interfaces. In Proceedings of the 2016 Symposium on Spatial User Interaction (SUI ’16). ACM, New York, NY, USA, 99–102. https://doi.org/10.1145/2983310.2985757
- Aigner et al. 2012. Understanding Mid-Air Hand Gestures: A Study of Human Preferences in Usage of Gesture Types for HCI. Technical Report. https://www.microsoft.com/en-us/research/publication/understanding-mid-air-hand-gestures-a-study-of-human-preferences-in-usage-of-gesture-types-for-hci/
- Gang Ren et al. 2013. 3D selection with freehand gesture. (2013), 101–120.
- Loclair et al. 2010. PinchWatch: A Wearable Device for One-Handed Microinteractions. In Proceedings of the MobileHCI ’10 Workshop on Ensembles of On-Body Devices (MobileHCI ’10).
- M. Arango et al. 1993. The Touring Machine System. Commun. ACM 36, 1 (Jan. 1993), 69–77. https://doi.org/10.1145/151233.151239
- Tamaki et al. 2010. BrainyHand: A Wearable Computing Device Without HMD and It’s Interaction Techniques. In Proceedings of the International Conference on Advanced Visual Interfaces (AVI ’10). ACM, New York, NY, USA, 387–388. https://doi.org/10.1145/1842993.1843070
- Tobias Höllerer et al. 2004. Vision-Based Interfaces for Mobility. Mobile and Ubiquitous Systems, Annual International Conference on 00 (2004), 86–94. https://doi.org/10.1109/MOBIQ.2004.1331713
- Christina T. Fuentes and Amy J. Bastian. 2010. Where Is Your Arm? Variations in Proprioception Across Space and Tasks. (Jan. 2010), 164–171. Issue 1.
- Mayank Goel, Chen Zhao, Ruth Vinisha, and Shwetak N. Patel. 2015. Tongue-in-Cheek: Using Wireless Signals to Enable Non-Intrusive and Flexible Facial Gestures Detection. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI ’15). ACM, New York, NY, USA, 255–258. https://doi.org/10.1145/2702123.2702591
- Celeste Groenewald, Craig Anslow, Junayed Islam, Chris Rooney, Peter Passmore, and William Wong. 2016. Understanding 3D Mid-air Hand Gestures with Interactive Surfaces and Displays: A Systematic Literature Review. In Proceedings of the 30th International BCS Human Computer Interaction Conference: Fusion! (HCI ’16). BCS Learning & Development Ltd., Swindon, UK, Article 43, 13 pages. https://doi.org/10.14236/ewic/HCI2016.43
- Tovi Grossman, Xiang Anthony Chen, and George Fitzmaurice. 2015. Typing on Glasses: Adapting Text Entry to Smart Eyewear. In Proceedings of the 17th International Conference on Human-Computer Interaction with Mobile Devices and Services (MobileHCI ’15). ACM, New York, NY, USA, 144–152. https://doi.org/10.1145/2785830.2785867
- François Guimbretière and Chau Nguyen. 2012. Bimanual Marking Menu for Near Surface Interactions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’12). ACM, New York, NY, USA, 825–828. https://doi.org/10.1145/2207676.2208521
- Darren Guinness, Alvin Jude, G. Michael Poor, and Ashley Dover. 2015. Models for Rested Touchless Gestural Interaction. In Proceedings of the 3rd ACM Symposium on Spatial User Interaction (SUI ’15). ACM, New York, NY, USA, 34–43. https://doi.org/10.1145/2788940.2788948
- Sean Gustafson, Daniel Bierwirth, and Patrick Baudisch. 2010. Imaginary Interfaces: Spatial Interaction with Empty Hands and Without Visual Feedback. In Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology (UIST ’10). ACM, New York, NY, USA, 3–12. https://doi.org/10.1145/1866029.1866033
- Sean G. Gustafson, Bernhard Rabe, and Patrick M. Baudisch. 2013. Understanding Palm-based Imaginary Interfaces: The Role of Visual and Tactile Cues when Browsing. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’13). ACM, New York, NY, USA, 889–898. https://doi.org/10.1145/2470654.2466114
- T. Ha, S. Feiner, and W. Woo. 2014. WeARHand: Head-worn, RGB-D camera-based, bare-hand user interface with visually enhanced depth perception. In 2014 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). 219–228. https://doi.org/10.1109/ISMAR.2014.6948431
- Jooyeun Ham, Jonggi Hong, Youngkyoon Jang, Seung Hwan Ko, and Woontack Woo. 2014. Smart Wristband: Touch-and-Motion–Tracking Wearable 3D Input Device for Smart Glasses. Springer International Publishing, Cham, 109–118. https://doi.org/10.1007/978-3-319-07788-8_11
- Faizan Haque, Mathieu Nancel, and Daniel Vogel. 2015. Myopoint: Pointing and Clicking Using Forearm Mounted Electromyography and Inertial Motion Sensors. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI ’15). ACM, New York, NY, USA, 3653–3656. https://doi.org/10.1145/2702123.2702133
- Chris Harrison, Hrvoje Benko, and Andrew D. Wilson. 2011a. OmniTouch: Wearable Multitouch Interaction Everywhere. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology (UIST ’11). ACM, New York, NY, USA, 441–450. https://doi.org/10.1145/2047196.2047255
- Chris Harrison, Desney Tan, and Dan Morris. 2011b. Skinput: Appropriating the Skin As an Interactive Canvas. Commun. ACM 54, 8 (Aug. 2011), 111–118. https://doi.org/10.1145/1978542.1978564
- G. Heo, D. W. Lee, H. C. Shin, Hyun-Tae Jeong, and Tae-Woong Yoo. 2015. Hand segmentation and fingertip detection for interfacing of stereo vision-based smart glasses. In 2015 IEEE International Conference on Consumer Electronics (ICCE). 585–586. https://doi.org/10.1109/ICCE.2015.7066537
- Valentin Heun, Shunichi Kasahara, and Pattie Maes. 2013. Smarter Objects: Using AR Technology to Program Physical Objects and Their Interactions. In CHI ’13 Extended Abstracts on Human Factors in Computing Systems (CHI EA ’13). ACM, New York, NY, USA, 2939–2942. https://doi.org/10.1145/2468356.2479579
- Juan David Hincapié-Ramos, Xiang Guo, and Pourang Irani. 2014. The Consumed Endurance Workbench: A Tool to Assess Arm Fatigue During Mid-air Interactions. In Proceedings of the 2014 Companion Publication on Designing Interactive Systems (DIS Companion ’14). ACM, New York, NY, USA, 109–112. https://doi.org/10.1145/2598784.2602795
- Yi-Ta Hsieh, Antti Jylhä, and Giulio Jacucci. 2014. Pointing and Selecting with Tactile Glove in 3D Environment. Springer International Publishing, Cham, 133–137. https://doi.org/10.1007/978-3-319-13500-7_12
- Yi-Ta Hsieh, Antti Jylhä, Valeria Orso, Luciano Gamberini, and Giulio Jacucci. 2016. Designing a Willing-to-Use-in-Public Hand Gestural Interaction Technique for Smart Glasses. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI ’16). ACM, New York, NY, USA, 4203–4215. https://doi.org/10.1145/2858036.2858436
- Da-Yuan Huang, Liwei Chan, Shuo Yang, Fan Wang, Rong-Hao Liang, De-Nian Yang, Yi-Ping Hung, and Bing-Yu Chen. 2016. DigitSpace: Designing Thumb-to-Fingers Touch Interfaces for One-Handed and Eyes-Free Interactions. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI ’16). ACM, New York, NY, USA, 1526–1537. https://doi.org/10.1145/2858036.2858483
- Zhanpeng Huang, Weikai Li, and Pan Hui. 2015. Ubii: Towards Seamless Interaction Between Digital and Physical Worlds. In Proceedings of the 23rd ACM International Conference on Multimedia (MM ’15). ACM, New York, NY, USA, 341–350. https://doi.org/10.1145/2733373.2806266
- Howell Istance, Richard Bates, Aulikki Hyrskykari, and Stephen Vickers. 2008. Snap Clutch, a Moded Approach to Solving the Midas Touch Problem. In Proceedings of the 2008 Symposium on Eye Tracking Research & Applications (ETRA ’08). ACM, New York, NY, USA, 221–228. https://doi.org/10.1145/1344471.1344523
- Robert J.K. Jacob, Audrey Girouard, Leanne M. Hirshfield, Michael S. Horn, Orit Shaer, Erin Treacy Solovey, and Jamie Zigelbaum. 2008. Reality-based Interaction: A Framework for post-WIMP Interfaces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’08). ACM, New York, NY, USA, 201–210. https://doi.org/10.1145/1357054.1357089
- Eleanor Jones, Jason Alexander, Andreas Andreou, Pourang Irani, and Sriram Subramanian. 2010. GesText: Accelerometer-based Gestural Text-entry Systems. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’10). ACM, New York, NY, USA, 2173–2182. https://doi.org/10.1145/1753326.1753655
- Jari Kangas, Deepak Akkil, Jussi Rantala, Poika Isokoski, Päivi Majaranta, and Roope Raisamo. 2014. Gaze Gestures and Haptic Feedback in Mobile Devices. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’14). ACM, New York, NY, USA, 435–438. https://doi.org/10.1145/2556288.2557040
- Wolf Kienzle and Ken Hinckley. 2014. LightRing: Always-available 2D Input on Any Surface. In Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology (UIST ’14). ACM, New York, NY, USA, 157–160. https://doi.org/10.1145/2642918.2647376
- Barry Kollee, Sven Kratz, and Anthony Dunnigan. 2014. Exploring Gestural Interaction in Smart Spaces Using Head Mounted Devices with Ego-centric Sensing. In Proceedings of the 2nd ACM Symposium on Spatial User Interaction (SUI ’14). ACM, New York, NY, USA, 40–49. https://doi.org/10.1145/2659766.2659781
- Li-Chieh Kuo, Haw-Yen Chiu, Cheung-Wen Chang, Hsiu-Yun Hsu, and Yun-Nien Sun. Functional workspace for precision manipulation between thumb and fingers in normal hands. Journal of Electromyography and Kinesiology 19, 5 (????), 829–839. https://doi.org/10.1016/j.jelekin.2008.07.008
- Takeshi Kurata, Masakatsu Kourogi, Takekazu Kato, Takashi Okuma, and Ken Endo. 2002. A Functionally-Distributed Hand Tracking Method for Wearable Visual Interfaces and Its Applications. Correspondences on Human Interface 4, 5 (2002), 5–31–5–36. https://doi.org/10.11184/hisrm.4.5-31
- T. Lee and T. Hollerer. 2007. Handy AR: Markerless Inspection of Augmented Reality Objects Using Fingertip Tracking. In 2007 11th IEEE International Symposium on Wearable Computers. 83–90. https://doi.org/10.1109/ISWC.2007.4373785
- Zong-Ming Li and Jie Tang. 2007. Coordination of thumb joints during opposition. Journal of Biomechanics 40, 3 (2007), 502–510. https://doi.org/10.1016/j.jbiomech.2006.02.019
- Shu-Yang Lin, Chao-Huai Su, Kai-Yin Cheng, Rong-Hao Liang, Tzu-Hao Kuo, and Bing-Yu Chen. 2011. Pub - Point Upon Body: Exploring Eyes-free Interaction and Methods on an Arm. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology (UIST ’11). ACM, New York, NY, USA, 481–488. https://doi.org/10.1145/2047196.2047259
- Roman Lissermann, Jochen Huber, Aristotelis Hadjakos, and Max Mühlhäuser. 2013. EarPut: Augmenting Behind-the-ear Devices for Ear-based Interaction. In CHI ’13 Extended Abstracts on Human Factors in Computing Systems (CHI EA ’13). ACM, New York, NY, USA, 1323–1328. https://doi.org/10.1145/2468356.2468592
- Marica Bertarini. 2014. Smart glasses: Interaction, privacy and social implications. (2014). https://www.vs.inf.ethz.ch/edu/FS2014/UCS/reports/MaricaBertarini-SmartGlasses-report.pdf
- I. Scott MacKenzie. 1992. Fitts’ Law As a Research and Design Tool in Human-computer Interaction. Hum.-Comput. Interact. 7, 1 (March 1992), 91–139. https://doi.org/10.1207/s15327051hci0701_3
- Marwa Mahmoud and Peter Robinson. 2011. Interpreting Hand-over-face Gestures. In Proceedings of the 4th International Conference on Affective Computing and Intelligent Interaction - Volume Part II (ACII’11). Springer-Verlag, Berlin, Heidelberg, 248–255. http://dl.acm.org/citation.cfm?id=2062850.2062879
- Anders Markussen, Mikkel Rønne Jakobsen, and Kasper Hornbæk. 2014. Vulture: A Mid-air Word-gesture Keyboard. In Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems (CHI ’14). ACM, New York, NY, USA, 1073–1082. https://doi.org/10.1145/2556288.2556964
- Tiago Martins, Christa Sommerer, Laurent Mignonneau, and Nuno Correia. 2008. Gauntlet: A Wearable Interface for Ubiquitous Gaming. In Proceedings of the 10th International Conference on Human Computer Interaction with Mobile Devices and Services (MobileHCI ’08). ACM, New York, NY, USA, 367–370. https://doi.org/10.1145/1409240.1409290
- R. McCall, B. Martin, A. Popleteev, N. Louveton, and T. Engel. 2015. Text entry on smart glasses. In 2015 8th International Conference on Human System Interaction (HSI). 195–200. https://doi.org/10.1109/HSI.2015.7170665
- Pranav Mistry, Pattie Maes, and Liyan Chang. 2009. WUW - Wear Ur World: A Wearable Gestural Interface. In CHI ’09 Extended Abstracts on Human Factors in Computing Systems (CHI EA ’09). ACM, New York, NY, USA, 4111–4116. https://doi.org/10.1145/1520340.1520626
- Thomas B. Moeslund and Lau Nørgaard. 2003. A Brief Overview of Hand Gestures Used in Wearable Human Computer Interfaces. Technical Report.
- Tobias Mulling and Mithileysh Sathiyanarayanan. 2015. Characteristics of Hand Gesture Navigation: A Case Study Using a Wearable Device (MYO). In Proceedings of the 2015 British HCI Conference (British HCI ’15). ACM, New York, NY, USA, 283–284. https://doi.org/10.1145/2783446.2783612
- Shahriar Nirjon, Jeremy Gummeson, Dan Gelb, and Kyu-Han Kim. 2015. TypingRing: A Wearable Ring Platform for Text Input. In Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys ’15). ACM, New York, NY, USA, 227–239. https://doi.org/10.1145/2742647.2742665
- Donald A. Norman and Jakob Nielsen. 2010. Gestural Interfaces: A Step Backward in Usability. interactions 17, 5 (Sept. 2010), 46–49. https://doi.org/10.1145/1836216.1836228
- Masa Ogata, Yuta Sugiura, Yasutoshi Makino, Masahiko Inami, and Michita Imai. 2013. SenSkin: Adapting Skin As a Soft Interface. In Proceedings of the 26th Annual ACM Symposium on User Interface Software and Technology (UIST ’13). ACM, New York, NY, USA, 539–544. https://doi.org/10.1145/2501988.2502039
- Masa Ogata, Yuta Sugiura, Hirotaka Osawa, and Michita Imai. 2012. iRing: Intelligent Ring Using Infrared Reflection. In Proceedings of the 25th Annual ACM Symposium on User Interface Software and Technology (UIST ’12). ACM, New York, NY, USA, 131–136. https://doi.org/10.1145/2380116.2380135
- A. E. Ok, N. A. Basoglu, and T. Daim. 2015. Exploring the design factors of smart glasses. In 2015 Portland International Conference on Management of Engineering and Technology (PICMET). 1657–1664. https://doi.org/10.1109/PICMET.2015.7273236
- Alexandros Pino, Evangelos Tzemis, Nikolaos Ioannou, and Georgios Kouroupetroglou. 2013. Using Kinect for 2D and 3D Pointing Tasks: Performance Evaluation. In Proceedings of the 15th International Conference on Human-Computer Interaction: Interaction Modalities and Techniques - Volume Part IV (HCI’13). Springer-Verlag, Berlin, Heidelberg, 358–367. https://doi.org/10.1007/978-3-642-39330-3_38
- Ivan Poupyrev, Desney S. Tan, Mark Billinghurst, Hirokazu Kato, Holger Regenbrecht, and Nobuji Tetsutani. 2002. Developing a Generic Augmented-Reality Interface. Computer 35, 3 (March 2002), 44–50. https://doi.org/10.1109/2.989929
- Philipp A. Rauschnabel, Alexander Brem, and Bjoern S. Ivens. 2015. Who will buy smart glasses? Empirical results of two pre-market-entry studies on the role of personality in individual awareness and intended adoption of Google Glass wearables. Computers in Human Behavior 49 (2015), 635–647. https://doi.org/10.1016/j.chb.2015.03.003
- Jun Rekimoto. 2001. GestureWrist and GesturePad: Unobtrusive Wearable Interaction Devices. In Proceedings of the 5th IEEE International Symposium on Wearable Computers (ISWC ’01). IEEE Computer Society, Washington, DC, USA, 21–. http://dl.acm.org/citation.cfm?id=580581.856565
- Gang Ren and Eamonn O’Neill. 2013. Freehand Gestural Text Entry for Interactive TV. In Proceedings of the 11th European Conference on Interactive TV and Video (EuroITV ’13). ACM, New York, NY, USA, 121–130. https://doi.org/10.1145/2465958.2465966
- R. Rosenberg and M. Slater. 1999. The Chording Glove: A Glove-based Text Input Device. Trans. Sys. Man Cyber Part C 29, 2 (May 1999), 186–191. https://doi.org/10.1109/5326.760563
- Lawrence Sambrooks and Brett Wilkinson. 2013. Comparison of Gestural, Touch, and Mouse Interaction with Fitts’ Law. In Proceedings of the 25th Australian Computer-Human Interaction Conference: Augmentation, Application, Innovation, Collaboration (OzCHI ’13). ACM, New York, NY, USA, 119–122. https://doi.org/10.1145/2541016.2541066
- T. Scott Saponas, Daniel Kelly, Babak A. Parviz, and Desney S. Tan. 2009. Optically Sensing Tongue Gestures for Computer Input. In Proceedings of the 22nd Annual ACM Symposium on User Interface Software and Technology (UIST ’09). ACM, New York, NY, USA, 177–180. https://doi.org/10.1145/1622176.1622209
- Stefan Schneegass and Alexandra Voit. 2016. GestureSleeve: Using Touch Sensitive Fabrics for Gestural Input on the Forearm for Controlling Smartwatches. In Proceedings of the 2016 ACM International Symposium on Wearable Computers (ISWC ’16). ACM, New York, NY, USA, 108–115. https://doi.org/10.1145/2971763.2971797
- Tobias Schuchert, Sascha Voth, and Judith Baumgarten. 2012. Sensing Visual Attention Using an Interactive Bidirectional HMD. In Proceedings of the 4th Workshop on Eye Gaze in Intelligent Human Machine Interaction (Gaze-In ’12). ACM, New York, NY, USA, Article 16, 3 pages. https://doi.org/10.1145/2401836.2401852
- Marcos Serrano, Barrett M. Ens, and Pourang P. Irani. 2014. Exploring the Use of Hand-to-face Input for Interacting with Head-worn Displays. In Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems (CHI ’14). ACM, New York, NY, USA, 3181–3190. https://doi.org/10.1145/2556288.2556984
- Roy Shilkrot, Jochen Huber, Jürgen Steimle, Suranga Nanayakkara, and Pattie Maes. 2015. Digital Digits: A Comprehensive Survey of Finger Augmentation Devices. ACM Comput. Surv. 48, 2, Article 30 (Nov. 2015), 29 pages. https://doi.org/10.1145/2828993
- Junichi Shimizu and George Chernyshov. 2016. Eye Movement Interactions in Google Cardboard Using a Low Cost EOG Setup. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct (UbiComp ’16). ACM, New York, NY, USA, 1773–1776. https://doi.org/10.1145/2968219.2968274
- Tim M. Simon, Ross T. Smith, Bruce Thomas, Stewart Von Itzstein, Mark Smith, Joonsuk Park, and Jun Park. 2012. Merging Tangible Buttons and Spatial Augmented Reality to Support Ubiquitous Prototype Designs. In Proceedings of the Thirteenth Australasian User Interface Conference - Volume 126 (AUIC ’12). Australian Computer Society, Inc., Darlinghurst, Australia, 29–38. http://dl.acm.org/citation.cfm?id=2512125.2512130
- Dana Slambekova, Reynold Bailey, and Joe Geigel. 2012. Gaze and Gesture Based Object Manipulation in Virtual Worlds. In Proceedings of the 18th ACM Symposium on Virtual Reality Software and Technology (VRST ’12). ACM, New York, NY, USA, 203–204. https://doi.org/10.1145/2407336.2407380
- Michael Stengel, Steve Grogorick, Martin Eisemann, Elmar Eisemann, and Marcus A. Magnor. 2015. An Affordable Solution for Binocular Eye Tracking and Calibration in Head-mounted Displays. In Proceedings of the 23rd ACM International Conference on Multimedia (MM ’15). ACM, New York, NY, USA, 15–24. https://doi.org/10.1145/2733373.2806265
- Takumi Toyama, Thomas Kieninger, Faisal Shafait, and Andreas Dengel. 2012. Gaze Guided Object Recognition Using a Head-mounted Eye Tracker. In Proceedings of the Symposium on Eye Tracking Research and Applications (ETRA ’12). ACM, New York, NY, USA, 91–98. https://doi.org/10.1145/2168556.2168570
- Takumi Toyama, Daniel Sonntag, Andreas Dengel, Takahiro Matsuda, Masakazu Iwamura, and Koichi Kise. 2014. A Mixed Reality Head-mounted Text Translation System Using Eye Gaze Input. In Proceedings of the 19th International Conference on Intelligent User Interfaces (IUI ’14). ACM, New York, NY, USA, 329–334. https://doi.org/10.1145/2557500.2557528
- Outi Tuisku, Päivi Majaranta, Poika Isokoski, and Kari-Jouko Räihä. 2008. Now Dasher! Dash Away!: Longitudinal Study of Fast Text Entry by Eye Gaze. In Proceedings of the 2008 Symposium on Eye Tracking Research & Applications (ETRA ’08). ACM, New York, NY, USA, 19–26. https://doi.org/10.1145/1344471.1344476
- Ying-Chao Tung, Chun-Yen Hsu, Han-Yu Wang, Silvia Chyou, Jhe-Wei Lin, Pei-Jung Wu, Andries Valstar, and Mike Y. Chen. 2015. User-Defined Game Input for Smart Glasses in Public Space. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI ’15). ACM, New York, NY, USA, 3327–3336. https://doi.org/10.1145/2702123.2702214
- Yu-Chih Tung and Kang G. Shin. 2016. Expansion of Human-Phone Interface By Sensing Structure-Borne Sound Propagation. In Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys ’16). ACM, New York, NY, USA, 277–289. https://doi.org/10.1145/2906388.2906394
- D. W. F. van Krevelen and R. Poelman. 2010. A Survey of Augmented Reality Technologies, Applications and Limitations. The International Journal of Virtual Reality 9, 2 (June 2010), 1–20.
- F. Vernier and L. Nigay. 2001. A Framework for the Combination and Characterization of Output Modalities. In Proceedings of the 7th International Conference on Design, Specification, and Verification of Interactive Systems (DSV-IS’00). Springer-Verlag, Berlin, Heidelberg, 35–50. http://dl.acm.org/citation.cfm?id=1756227.1756232
- Julie Wagner, Mathieu Nancel, Sean G. Gustafson, Stephane Huot, and Wendy E. Mackay. 2013. Body-centric Design Space for Multi-surface Interaction. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’13). ACM, New York, NY, USA, 1299–1308. https://doi.org/10.1145/2470654.2466170
- Florian Wahl, Martin Freund, and Oliver Amft. 2015. Using Smart Eyeglasses As a Wearable Game Controller. In Adjunct Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2015 ACM International Symposium on Wearable Computers (UbiComp/ISWC’15 Adjunct). ACM, New York, NY, USA, 377–380. https://doi.org/10.1145/2800835.2800914
- Cheng-Yao Wang, Wei-Chen Chu, Po-Tsung Chiu, Min-Chieh Hsiu, Yih-Harn Chiang, and Mike Y. Chen. 2015a. PalmType: Using Palms As Keyboards for Smart Glasses. In Proceedings of the 17th International Conference on Human-Computer Interaction with Mobile Devices and Services (MobileHCI ’15). ACM, New York, NY, USA, 153–160. https://doi.org/10.1145/2785830.2785886
- Cheng-Yao Wang, Min-Chieh Hsiu, Po-Tsung Chiu, Chiao-Hui Chang, Liwei Chan, Bing-Yu Chen, and Mike Y. Chen. 2015b. PalmGesture: Using Palms As Gesture Interfaces for Eyes-free Input. In Proceedings of the 17th International Conference on Human-Computer Interaction with Mobile Devices and Services (MobileHCI ’15). ACM, New York, NY, USA, 217–226. https://doi.org/10.1145/2785830.2785885
- David J. Ward, Alan F. Blackwell, and David J. C. MacKay. 2000. Dasher - a Data Entry Interface Using Continuous Gestures and Language Models. In Proceedings of the 13th Annual ACM Symposium on User Interface Software and Technology (UIST ’00). ACM, New York, NY, USA, 129–137. https://doi.org/10.1145/354401.354427
- Colin Ware and Harutune H. Mikaelian. 1987. An Evaluation of an Eye Tracker As a Device for Computer Input. In Proceedings of the SIGCHI/GI Conference on Human Factors in Computing Systems and Graphics Interface (CHI ’87). ACM, New York, NY, USA, 183–188. https://doi.org/10.1145/29933.275627
- Martin Weigel, Tong Lu, Gilles Bailly, Antti Oulasvirta, Carmel Majidi, and Jürgen Steimle. 2015. iSkin: Flexible, Stretchable and Visually Customizable On-Body Touch Sensors for Mobile Computing. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI ’15). ACM, New York, NY, USA, 2991–3000. https://doi.org/10.1145/2702123.2702391
- Martin Weigel, Vikram Mehta, and Jürgen Steimle. 2014. More Than Touch: Understanding How People Use Skin As an Input Surface for Mobile Computing. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’14). ACM, New York, NY, USA, 179–188. https://doi.org/10.1145/2556288.2557239
- Xing-Dong Yang, Tovi Grossman, Daniel Wigdor, and George Fitzmaurice. 2012. Magic Finger: Always-available Input Through Finger Instrumentation. In Proceedings of the 25th Annual ACM Symposium on User Interface Software and Technology (UIST ’12). ACM, New York, NY, USA, 147–156. https://doi.org/10.1145/2380116.2380137
- Bo Yi, Xiang Cao, Morten Fjeld, and Shengdong Zhao. 2012. Exploring User Motivations for Eyes-free Interaction on Mobile Devices. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’12). ACM, New York, NY, USA, 2789–2792. https://doi.org/10.1145/2207676.2208678
- S. Yi, Z. Qin, E. Novak, Y. Yin, and Q. Li. 2016. GlassGesture: Exploring head gesture interface of smart glasses. In IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications. 1–9. https://doi.org/10.1109/INFOCOM.2016.7524542
- Sang Ho Yoon, Ke Huo, Vinh P. Nguyen, and Karthik Ramani. 2015. TIMMi: Finger-worn Textile Input Device with Multimodal Sensing in Mobile Interaction. In Proceedings of the Ninth International Conference on Tangible, Embedded, and Embodied Interaction (TEI ’15). ACM, New York, NY, USA, 269–272. https://doi.org/10.1145/2677199.2680560
- Chun Yu, Ke Sun, Mingyuan Zhong, Xincheng Li, Peijun Zhao, and Yuanchun Shi. 2016. One-Dimensional Handwriting: Inputting Letters and Words on Smart Glasses. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI ’16). ACM, New York, NY, USA, 71–82. https://doi.org/10.1145/2858036.2858542
- Qiao Zhang, Shyamnath Gollakota, Ben Taskar, and Raj P.N. Rao. 2014. Non-intrusive Tongue Machine Interface. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’14). ACM, New York, NY, USA, 2555–2558. https://doi.org/10.1145/2556288.2556981
- Xianjun Sam Zheng, Cedric Foucault, Patrik Matos da Silva, Siddharth Dasari, Tao Yang, and Stuart Goose. 2015. Eye-Wearable Technology for Machine Maintenance: Effects of Display Position and Hands-free Operation. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI ’15). ACM, New York, NY, USA, 2125–2134. https://doi.org/10.1145/2702123.2702305