Towards a Domain Independent Framework for Input Device Testing and Comparison in Gaming
Western Washington University Computer Science
Augmented and virtual reality hardware has not reached its true potential because of our limited understanding of users’ experience with these devices. To bridge this gap, we propose a framework for examining input devices for use with augmented and virtual reality hardware. Our contribution is in three parts. First, we provide a categorization of the types of user interfaces and content that one can encounter in virtual and augmented applications. Second, we propose the design of an experimental software testbed for evaluating an input device’s potential for use in virtual and augmented reality applications. Finally, we propose an experimental protocol, which was informally tested in a pilot study.
While a variety of commercial input devices are available to users today, none are as predominant as the keyboard and mouse on personal computers, gamepads on home consoles, and touchscreens on mobile devices. These input devices have emerged as the most effective way to interact with their respective platforms for common tasks such as navigation, text input, and gaming. As consumer virtual and augmented reality devices become available in the near future, users will face the challenging task of familiarizing themselves with interacting in depth in immersive 3D experiences. However, little is known about the efficacy of the input devices that will require users to make the transition from 2D screens to 3D headsets. The novel interactions may or may not be embraced by users because of the unfamiliarity of interaction in a 3D environment. Alternatively, we may discover that alternative input methods, such as remotes or gesture detection, prove more effective than the currently ubiquitous methods, or perhaps that entirely new input devices will need to be developed to satisfy user needs.
In our research we aim to learn about users’ experience with augmented reality headsets. By conducting an informal pilot study, we discovered that traditional game engine IDEs such as Unity make it difficult to interchange input devices for the purpose of testing augmented content. Too often these systems are designed to abstract input across personal computers, consoles, and mobile devices, which renders them inadequate for use with non-standard input devices. This inadequacy led to the development of a software testbed that can map a variety of input devices to an action and render 2d, 3d, virtual, and augmented content, among other uncommon features. With a proper testbed we can empirically study the relative strengths and weaknesses of the available input devices for virtual and augmented reality environments.
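The idea of mapping several input devices to one action can be sketched as follows. This is only a minimal illustration, not the testbed's actual design: the class `ActionMap`, its methods, and the device and control names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


# Hypothetical names throughout: this is not the authors' testbed API.
@dataclass
class ActionMap:
    """Routes (device, control) pairs to named game actions."""
    bindings: Dict[Tuple[str, str], str] = field(default_factory=dict)
    handlers: Dict[str, Callable[[float], None]] = field(default_factory=dict)

    def bind(self, device: str, control: str, action: str) -> None:
        """Declare that a control on a device drives the given action."""
        self.bindings[(device, control)] = action

    def on(self, action: str, handler: Callable[[float], None]) -> None:
        """Register the game-side callback for an action."""
        self.handlers[action] = handler

    def dispatch(self, device: str, control: str, value: float) -> bool:
        """Forward a raw input value to its action; return False if unbound."""
        action = self.bindings.get((device, control))
        if action is None or action not in self.handlers:
            return False
        self.handlers[action](value)
        return True


log = []
actions = ActionMap()
actions.on("steer", lambda v: log.append(("steer", v)))
# Two very different devices drive the same action.
actions.bind("gamepad", "left_stick_x", "steer")
actions.bind("leap_motion", "palm_roll", "steer")

actions.dispatch("gamepad", "left_stick_x", -0.5)
actions.dispatch("leap_motion", "palm_roll", 0.25)
```

Because the game only sees the action name, swapping the device under test would not require touching game code in a design like this.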
In addition, to learn about user experience with these input devices, we developed an experimental protocol focused on understanding users’ comfort and aptitude with an input device while performing gamified tasks. These tasks reflect the types of interactions a user may have in 2d, 3d, virtual, and augmented environments. We conducted a preliminary study with a small group of users ranging from novice to proficient in various computer and gaming tasks. Findings from this pilot study will help us prepare for an in-depth study with a larger sample size, which will enable us to guide the design of next-generation interaction techniques.
While a wealth of work has been created since Steve Mann’s seminal contribution to wearable computing, advances in user input, sensor fidelity, rendering technology, and content generation have placed the technology on the cusp of consumer adoption. Because of the tremendous technological undertaking of creating the latter, there has yet to be a dominant new user input device for augmented reality. Understanding these other technologies is integral to designing an AR input device.
Our pilot study consisted of 11 participants (3 female) testing a range of input devices across multiple domains to cover the above-mentioned categories. All participants were asked to complete a test set of game skills using keyboard and mouse, the Xbox gamepad, the Steam Controller, and the Leap Motion, our augmented reality stand-in device. Our test set included a block world navigated in the first and third person, a racing simulator, a fighter plane simulator, and an on-rails flying game in the vein of the Nintendo classic Star Fox. Participants were asked to complete the four games with each of the input devices and to answer a semi-structured questionnaire about their experience.
For each device we asked users questions regarding the “dimensionality” of the input device. For instance, a D-pad would be effective for playing a racing game but ineffective for piloting a plane, where roll, pitch, and yaw require an additional axis of input to control. Users were asked about their ability to navigate their avatar within the test game they were playing, and also how effectively they believed the device would generalize to other tasks, such as text input or menu navigation. Finally, we asked users to rate the haptic feedback of the device, including the feel of the controller (using a control stick vs. using a mouse) and any servo-driven vibration or resistance (“rumble”) they noticed. The question of haptics was of particular importance to us, as we postulate that a lack of touch feedback is a barrier to creating immersive AR experiences. To differentiate feedback between participants, we also collected data on participants’ socioeconomic status, age, and experience with gaming.
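The notion of “dimensionality” can be made concrete by comparing the number of independent axes a device offers with the number a task needs. The sketch below is illustrative only: the axis counts and the names `DEVICE_AXES`, `TASK_AXES`, and `missing_axes` are assumptions, not values measured in the study.

```python
# Axis counts are illustrative assumptions, not measurements from the study.
DEVICE_AXES = {
    "d_pad": 2,               # left/right and up/down
    "dual_stick_gamepad": 4,  # two sticks, two axes each
}

TASK_AXES = {
    "racing": ("steer", "throttle"),
    "flight": ("roll", "pitch", "yaw"),
}


def missing_axes(device: str, task: str) -> int:
    """Number of task axes the device cannot drive independently."""
    return max(0, len(TASK_AXES[task]) - DEVICE_AXES[device])


racing_gap = missing_axes("d_pad", "racing")
flight_gap = missing_axes("d_pad", "flight")
```

Under these assumptions the D-pad covers the racing task but is one axis short for flight, which matches the example in the text.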
Matching demographic expectations, men were more likely to respond yes to “I am a gamer” or “I am relatively familiar” when asked about their gaming experience. Similarly, men reported being more proficient with an Xbox controller or keyboard and mouse than their female counterparts. More interesting results were found with the Steam Controller and Leap Motion. While the Steam Controller resembles a traditional gamepad, its haptic touchpads, which replace dual control sticks, proved foreign to the more avid gamers, and female users performed better than their male counterparts. Even more interesting were our observations of the participants’ use of the Leap Motion controller: without the traditional haptic feedback of the controllers, the more experienced, “gamer”-identifying men found the Leap Motion device more frustrating and performed worse at the on-rails piloting game, taking longer to complete it successfully and achieving a lower score.
We postulate from our pilot study that the transition to using next generation input devices on virtual and augmented reality headsets will be more challenging for those with prior gaming experience than for those without. As these new devices will rely on more natural input methods such as hand gestures, skilled controller users (with traditional input devices) will have to unlearn the mental mappings they have created to be proficient with earlier devices. Conversely, those who spent less time focusing on mastering game controllers will find it easier to transition into using these next generation input devices.
While our pilot study offered insight into how users with varying proficiencies interact with familiar and foreign devices, an in-depth study is necessary to get a better understanding of how demographics and personal experiences affect one’s ability to master the new generation of input devices. Additionally, while collecting data for our study, we were stymied by the brittleness of input handling in different game engines, which made it extremely challenging to generalize across console, PC, and mobile input. Currently, we are working on an input testbed that will aid in collecting data about a variety of input devices and can be used to examine user experience with traditional and novel input devices.
We describe four basic categories of experiences, distinguished by the type of content and the way it is experienced by the user. We give their general definitions below.
Classic 2d experience focuses on two dimensional content - images and text - displayed on a flat surface like a computer screen. Examples include window managers, terminals, or 2d games.
Classic 3d experience focuses on three dimensional content - meshes and interactive worlds - displayed on a flat surface, such as a computer screen. Examples include 3d games, interactive globes, and CAD (Computer Aided Design) systems. Interaction is often necessary for experiencing the 3d content.
2d content displayed using a stereoscopic system can be placed as a virtual screen. Images can be viewed from the sides or from behind. Examples include HUDs (heads-up displays) and virtual screens.
3d content displayed using a stereoscopic system can become immersive. Examples include immersive games, telepresence (viewing or projecting), and interactive virtual models.
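The four categories above can be read as the product of two choices: the content type and the display type. A minimal sketch of that reading follows. The enum names and the labels “virtual screen” and “immersive 3d” are our own shorthand for the last two categories, which the text does not name explicitly.

```python
from enum import Enum


class Content(Enum):
    FLAT = "2d"      # images and text
    SPATIAL = "3d"   # meshes and interactive worlds


class Display(Enum):
    SCREEN = "flat surface"
    STEREOSCOPIC = "stereoscopic system"


# Labels for the two stereoscopic categories are assumed shorthand.
CATEGORIES = {
    (Content.FLAT, Display.SCREEN): "classic 2d",
    (Content.SPATIAL, Display.SCREEN): "classic 3d",
    (Content.FLAT, Display.STEREOSCOPIC): "virtual screen",
    (Content.SPATIAL, Display.STEREOSCOPIC): "immersive 3d",
}


def categorize(content: Content, display: Display) -> str:
    """Look up the experience category for a content/display pair."""
    return CATEGORIES[(content, display)]


cad_on_monitor = categorize(Content.SPATIAL, Display.SCREEN)
hud_in_headset = categorize(Content.FLAT, Display.STEREOSCOPIC)
```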
With the exception of trivial HUD AR experiences à la Google Glass, sensor telemetry is integral to augmented reality on two fronts. First, precise measurements of the position and orientation of the head-mounted display must be reconciled with the position and orientation of a virtual camera in a 3D renderer in order for augmented content to appear accurate when overlaid on the real world. Second, the position and orientation of physical objects within a scene must be known in order to provide context to augmented content. Finally, the degrees of freedom with which a user can navigate an augmented world determine the types of interactions a user can experience. DOF in AR is inherently bounded by sensor technology.
As an AR simulation runs continuously, sensors must be resistant to error accumulation. For example, the inertial measurement unit (IMU) in a mobile device is quite effective at detecting rotation events (such as switching the device from portrait to landscape mode), but prolonged use of the accelerometer (even over a one-second interval!) would lead to an inaccurate measurement of position. Algorithmic compensation for error is the domain of signal processing, which employs techniques such as the Kalman filter.
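To illustrate why accelerometer-derived position degrades so quickly, the sketch below double-integrates the readings of a stationary device whose accelerometer has a small constant bias. The bias of 0.1 m/s² and the 120 Hz rate are assumed values chosen for illustration, and `integrate_position` is a hypothetical helper, not part of any sensor API.

```python
def integrate_position(accel_samples, dt):
    """Double-integrate acceleration samples (m/s^2) into a position (m)."""
    velocity = 0.0
    position = 0.0
    for a in accel_samples:
        velocity += a * dt         # first integration: acceleration -> velocity
        position += velocity * dt  # second integration: velocity -> position
    return position


# Assumed values: the device is at rest, so every reading is pure bias.
bias = 0.1       # m/s^2
rate_hz = 120
dt = 1.0 / rate_hz

drift_1s = integrate_position([bias] * rate_hz, dt)
drift_10s = integrate_position([bias] * rate_hz * 10, dt)
```

Under these assumptions the position estimate is off by about 5 cm after one second and about 5 m after ten, since the error grows with the square of elapsed time. A filter such as the Kalman filter limits this growth by blending in an independent position measurement.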
Sensor fidelity is governed by sampling rate and resolution. The ideal sampling rate would be twice the rendering rate of the simulation (one sample roughly every 8.3 ms at 60 FPS) so as to always have a new sample for each frame. The ideal sensor resolution would be sub-pixel at a given rendering distance so as to avoid visual artifacts. For example, IMUs that sample at 120 Hz are sufficient for current rendering technology (pre-submillisecond render technologies like DX12/Vulkan), whereas MIT’s Chronos, which uses Wi-Fi time of flight to locate a mobile device to within 4 cm, would produce visual artifacts when rendering short-range content.
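Both criteria can be checked with a little arithmetic. The sketch below computes the sampling interval for a given frame rate and converts a positional error into on-screen pixels. The 90° field of view and 1080-pixel width are assumed display parameters, and the pixel footprint uses a simple pinhole approximation at the center of the view.

```python
import math


def sample_interval_ms(fps, samples_per_frame=2):
    """Time between sensor samples when sampling at a multiple of the frame rate."""
    return 1000.0 / (fps * samples_per_frame)


def pixel_footprint_m(distance_m, fov_deg, width_px):
    """Approximate width in meters that one pixel covers at a given distance."""
    view_width_m = 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)
    return view_width_m / width_px


def error_in_pixels(error_m, distance_m, fov_deg, width_px):
    """How many pixels a positional error spans at a given rendering distance."""
    return error_m / pixel_footprint_m(distance_m, fov_deg, width_px)


# Assumed display: 90 degree horizontal field of view, 1080 pixels wide.
interval = sample_interval_ms(60)
near_error = error_in_pixels(0.04, 0.5, 90, 1080)
far_error = error_in_pixels(0.04, 25.0, 90, 1080)
```

With these assumptions, sampling at twice a 60 FPS frame rate means one sample about every 8.3 ms, and a 4 cm error spans roughly 43 pixels for content half a meter away but less than one pixel at 25 m.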
Sensors can be classified by whether they give measurements in reference to themselves, to their scene, or to the set of all scenes. For example, an IMU may give the local rotation of a head-mounted device, a beacon may give location relative to a pre-mapped room, and GPS would give position relative to the earth (where all successful AR simulations have taken place). How these measurements are interpreted and combined in the simulation layer affects the types of AR content that can be created.
Sensors in augmented reality are either device mounted, giving position and orientation relative to scene features, or scene mounted, providing the position of the HMD relative to fixed room features. Both techniques have advantages and drawbacks, and combining sensors will lead to more immersive AR experiences. The following table is an extensive, though not exhaustive, list of sensors for use in augmented reality.
|Sensor|Mounting|Advantages|Drawbacks|
|---|---|---|---|
|Inertial Measurement Unit|Head Mounted|Accurate orientation|Inaccurate position|
|RGB Camera|Head Mounted|Accurate orientation and position|Tracking locked to marker/feature visibility; computation scales quadratically with precision|
|Depth Camera|Head Mounted|Feature detection: surface extraction|Range limitations|
|Lidar|Head Mounted|Feature detection: surface extraction|High cost|
|IR Pylons|Scene Mounted|Accurate position|Limited movement|
When considering user experiences in augmented reality, it is important to consider the degrees of freedom with which a user can explore the augmented scene. A perfect system would of course allow augmented content to be placed anywhere, and viewed from any angle, that physical content could exist, creating a perfectly immersive experience. While obviously impossible, this optimal system sets the criteria by which users will judge augmented experiences.
With no position and no orientation for our virtual camera, we can only create augmented HUDs, like those used in Google Glass. While this allows users to keep their vision focused on their task while viewing contextual information, HUDs represent only the tip of the iceberg for augmented reality. It is also important to note that there is experimental evidence that HUDs can actually hinder one’s ability to focus on tasks such as piloting a vehicle.
With the exception of specialized devices like Google Glass or DAQRI’s Smart Helmet, the vast majority of deployed AR applications use smartphones with a monocular camera. These applications work by superimposing 3D content over a marker or feature detected by the camera. With camera marker tracking, a user’s position is limited to the half sphere in which the camera can see the marker. This half sphere’s radius is the maximum distance at which the camera can detect the marker or feature, measured with a 480p camera and ARToolKit to be between 1 and 2.5 meters. Increasing the resolution of the camera can increase the radius of this half sphere, but the cost of computation grows quadratically.
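The half-sphere constraint can be expressed as a simple geometric test. The sketch below places the marker at the origin and checks whether a camera position is within detection range and on the visible side of the marker. The function names are hypothetical, and `resolution_tradeoff` assumes, for illustration only, that detection range grows linearly with linear camera resolution while the pixel count to process grows with its square.

```python
import math


def in_tracking_volume(camera_pos, marker_normal, max_range_m):
    """True if the camera lies in the half sphere in front of a marker at the origin."""
    dist = math.sqrt(sum(c * c for c in camera_pos))
    # Positive dot product: the camera is on the side the marker faces.
    facing = sum(c * n for c, n in zip(camera_pos, marker_normal))
    return 0.0 < dist <= max_range_m and facing > 0.0


def resolution_tradeoff(base_range_m, scale):
    """Assumed model: range scales linearly, pixel count quadratically."""
    return base_range_m * scale, scale ** 2


normal = (0.0, 0.0, 1.0)  # marker faces along +z
in_front = in_tracking_volume((0.0, 0.0, 1.0), normal, 2.5)
too_far = in_tracking_volume((0.0, 0.0, 3.0), normal, 2.5)
behind = in_tracking_volume((0.0, 0.0, -1.0), normal, 2.5)
new_range, relative_cost = resolution_tradeoff(2.5, 2.0)
```

Under the assumed model, doubling the linear resolution doubles the usable radius but quadruples the number of pixels to process per frame.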
As discussed earlier, inertial measurement units can accurately measure orientation but not position. This technique has been used by the original Oculus development kits and by mobile VR experiences like Google Cardboard. IMUs provide the inverse of the camera tracking experience: the user’s position is locked to the center of a sphere, while their orientation is unlocked, allowing them to examine any point on the inside of the sphere. 360 video and VR movie theater applications use IMUs effectively, but the inability to move limits the experience’s immersion.
Combining IMUs and camera tracking provides an interesting hybrid case of augmented reality. A user in this system can translate their position while maintaining camera tracking of their marker and then “unlock” their orientation by looking away from the marker. This technique creates a topology of overlapping spheres that the user can navigate. The experience would feel quite constricting, however, as humans are used to moving and looking at the same time, and doing so would cause tracking artifacts: AR content bound to the marker or feature would appear off-center until the user reacquired the marker.
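The hybrid behavior described above amounts to a small state machine: position updates only while the marker is tracked, and orientation always follows the IMU. The following sketch is one possible reading of that scheme, with a hypothetical `HybridTracker` class and simplified tuple poses.

```python
class HybridTracker:
    """Hypothetical pose source: camera gives position, IMU gives orientation."""

    def __init__(self, start_position=(0.0, 0.0, 0.0)):
        self.position = start_position
        self.orientation = (0.0, 0.0, 0.0)  # yaw, pitch, roll in degrees
        self.mode = "imu_only"

    def update(self, imu_orientation, marker_position=None):
        """Apply one frame of sensor data and return the current pose."""
        self.orientation = imu_orientation
        if marker_position is not None:
            # Marker visible: the camera can update position.
            self.position = marker_position
            self.mode = "marker"
        else:
            # Marker lost: position stays frozen at the last tracked value.
            self.mode = "imu_only"
        return self.position, self.orientation


tracker = HybridTracker()
tracker.update((0.0, 0.0, 0.0), marker_position=(1.0, 0.0, 2.0))
frozen_pose = tracker.update((90.0, 0.0, 0.0))  # user looks away
mode_when_lost = tracker.mode
reacquired_pose = tracker.update((10.0, 0.0, 0.0), marker_position=(1.5, 0.0, 2.0))
```

Any real movement made while the marker is out of view goes unrecorded until the marker is seen again, which is the source of the off-center artifacts the text describes.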
As an alternative to, or in conjunction with, tracking via a head-mounted sensor, a user’s position and orientation can be mapped by static sensors in a room, as implemented by Valve and HTC’s Vive. In this system the user has freedom of orientation and freedom of movement, provided that they remain within the sensor boundaries. This creates a potentially unpleasant user experience: unlike in the case of IMU or camera tracking, the user believes that they have freedom of movement and will meet an abrupt loss of tracking when they cross the sensor boundary.
This limitation defines the type of content that can be used with scene mapping. For example, a video game in which the user walks through an open area would create a suboptimal user experience: users would constantly run into the sensor boundary. In contrast, a virtualized workspace that attaches AR content to physical content in the room could provide a more immersive simulation.
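The sensor boundary discussed above can be modeled, as a rough sketch, by an axis-aligned rectangle on the floor. The function below reports how far the user is from the nearest edge, with a negative value meaning the user is outside the tracked area. The rectangular shape, the 3 m by 3 m room, and the name `boundary_margin_m` are assumptions for illustration and do not describe any particular tracking system.

```python
def boundary_margin_m(position_xz, room_size_xz):
    """Distance to the nearest edge of a tracked rectangle with a corner at the origin.

    Positive inside the tracked area, negative once the user has left it.
    """
    x, z = position_xz
    width, depth = room_size_xz
    return min(x, width - x, z, depth - z)


room = (3.0, 3.0)  # assumed 3 m by 3 m tracked area
center_margin = boundary_margin_m((1.5, 1.0), room)
edge_margin = boundary_margin_m((2.9, 1.5), room)
outside_margin = boundary_margin_m((3.5, 1.0), room)
```

Content designed for such a space could consult a value like this to keep interactions well inside the tracked area rather than letting the user walk into the boundary.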