Eye gaze & the next wave of user interfaces
February 8, 2024
Will eyes win the future of input?
Apple has made a huge bet on our eyes. The Apple Vision Pro uses eye tracking as its primary input method. Apple is not the first to include eye tracking in its devices. The Quest Pro, Microsoft's HoloLens 2, Magic Leap 1 and 2, Varjo XR-4, and Pico 4 Pro all have eye tracking built in. However, none of their platforms or operating systems use eye tracking as the primary way to target objects and interfaces. Many interaction designers (myself included) have experimented with eye tracking and recommended it as the main input method for new hardware platforms. But Apple went for it, shipped it, and there's no turning back now. Will gaze & pinch be the new touch screen for this next wave of computing? We think yes, but not by itself. It needs to work seamlessly with hands and other input devices.
Do I really need to understand how my eyes work?
Like iOS did for touch input before, Apple has made it simple to design native visionOS apps using their premade controls. If you want to port your iPad app to visionOS, you can simply transfer your app using Xcode and things will work pretty well. There are some gotchas that Apple helps you with in their Human Interface Guidelines. But it's pretty seamless to get a 2D iPad app floating in space. If that's what you're doing, you don't need to keep reading. But if you're looking to make an authentic spatial application that utilizes eye gaze input, you'll need a deeper understanding. And if you have a Unity application you are porting to the Vision Pro, you'll be rolling a lot of these controls yourself.
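For context, here's roughly what the "free" path looks like: a bare-bones SwiftUI scene where standard controls like Button get the system's gaze hover highlight automatically, so you never touch raw gaze data. The app and view names here are made up for illustration.

```swift
import SwiftUI

// A minimal, hypothetical visionOS app. The system draws the gaze hover
// highlight on the standard Button for you; your code only sees the tap,
// which the user performs by looking at the button and pinching.
@main
struct EggTimerApp: App {
    var body: some Scene {
        WindowGroup {
            VStack(spacing: 16) {
                Text("Egg Timer")
                    .font(.largeTitle)
                Button("Start") {
                    print("Timer started")   // fires on gaze + pinch or a direct poke
                }
            }
            .padding()
        }
    }
}
```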
Understanding eye gaze as an input method
1. We don’t control our eyes, so don’t make the user think about them
Believe it or not, we don't control how our eyes move. When you're reading this you might think that your eyes look at one letter at a time and take in these beautiful words sequentially. But in reality, your eyes jump around in quick bursts called saccades, skipping ahead and doubling back as they scan each line.
Our eyes are extensions of our brains, and like our brains, most of what they do is outside our conscious control. They jump around, constantly scanning the world. If you make your users conscious of how their eyes are moving, they won't be able to focus on anything else. The worst possible thing to do is show a cursor exactly where the eyes are looking. Invariably, there will be a slight offset, and the user can get into a feedback loop. You see this if you stare at the little floaters in your eyes. Lock in on one and it looks like it's moving, but in fact, it's just not in the center of your vision, and your eyes keep chasing it like a dog chasing its tail.
Tip: Don’t ever show a cursor locked to the user’s gaze.
2. Balance responsiveness and attention
We shouldn't make the user conscious of their exact gaze motion, but our app should be clear about how it will interpret gaze input. In typical input systems, you want an interface to respond as quickly as possible. Touching an iPhone 5 back in the day got a response within 55 milliseconds, and that was a big differentiator for the iPhone at the time. But eyes are different. If you give instant hover response to your eyes, you break the first rule: don't make the user think about their eyes. Have you ever glanced at someone, only to have them immediately stare back at you? You become self-conscious.
So you have to play it cool with gaze response: make it feel like the system is responding to your gaze without making the user self-conscious. Slowly fade in the hover feedback. 230 milliseconds is a good starting point. Much more than that and it will feel like you have to stare at something for a long time to activate anything, leading to eye strain. You can experiment with different duration and brightness combinations in ShapesXR using the gaze & pinch interactivity.
Tip: Don’t change major states on gaze. Fade in gaze feedback gradually over 230ms.
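Concretely, here's one way to structure that feedback: a small, engine-agnostic sketch in Swift. The class and the fade durations are ours (230 ms in, a slightly faster fade out), and it assumes your platform can tell you each frame whether the gaze ray is over the target.

```swift
import Foundation

/// A sketch of gradual gaze hover feedback. Nothing activates here;
/// the highlight value only drives subtle visuals (e.g. brightness),
/// and activation still waits for a pinch.
final class GazeHoverFeedback {
    private let fadeInDuration: TimeInterval = 0.23   // ~230 ms to full highlight
    private let fadeOutDuration: TimeInterval = 0.15  // fade out a bit faster

    /// 0 = no highlight, 1 = fully highlighted.
    private(set) var highlight: Double = 0

    /// Call once per frame with whether the gaze ray currently hits the target.
    func update(isGazed: Bool, deltaTime: TimeInterval) {
        if isGazed {
            highlight = min(1, highlight + deltaTime / fadeInDuration)
        } else {
            highlight = max(0, highlight - deltaTime / fadeOutDuration)
        }
    }
}
```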
3. Eyes set context, hands manipulate, voice summons
A researcher named Bill Buxton once said, "Everything is best for something and worst for something else." And it's up to us to map each input to what it's best suited for.
Eyes are incredibly fast at targeting things. When eye tracking is accurate enough it can be much faster than controllers or hands to target virtual objects. But eyes are horrible at moving things around.
Hands are incredible at manipulating things. An object like an apple can be moved along three axes and rotated around those same three axes. Our hands have evolved to manipulate the apple in these six degrees of freedom with wonderful dexterity. However, some objects can be manipulated in more than six degrees of freedom. Scissors are a 7-DOF (degrees of freedom) object: move, rotate, and open and shut. Our hands are capable of much more. Believe it or not, each human hand is capable of 27 degrees of freedom, with each finger independently moving up/down, left/right, and curling open and shut. So take advantage of this dexterity when designing how content is manipulated in your spatial experiences.
Voice: How many words do you know? Our voices are incredible at accessing a massive amount of random data quickly. The average American adult's working vocabulary is around 20,000 words. In a way, that is 20K degrees of freedom, slightly more than the 48 of our hands and arms combined. Yes, it is not always socially acceptable to be talking to your devices. But when it is, a typical person can talk at 130 words per minute. Put together in the right way, these inputs unlock a massive amount of communication between the human and the computer.
This is a classic example showing the power of voice mixed with pointing via hands. I still love the end.
Tip: Use eyes to target objects and set context. Use hands to activate and manipulate objects. Use voice to modify what is in context.
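One way to keep those roles from blurring together is to route every input through a single place that only lets each modality do its own job. Here's a rough sketch of that idea; all the names are illustrative, not any platform's API.

```swift
import Foundation

/// Each modality produces one kind of intent: eyes choose, hands act, voice modifies.
enum Intent {
    case setContext(targetID: String)   // from a gaze hit
    case activate                       // from a pinch
    case modify(command: String)        // from a recognized voice phrase
}

final class InteractionRouter {
    private(set) var contextTargetID: String?

    func handle(_ intent: Intent) {
        switch intent {
        case .setContext(let id):
            // Eyes only ever decide *what* we're talking about.
            contextTargetID = id
        case .activate:
            // Hands confirm and manipulate whatever the eyes chose.
            if let id = contextTargetID { print("Activate \(id)") }
        case .modify(let command):
            // Voice ("make it blue", "put that there") acts on the current context.
            if let id = contextTargetID { print("Apply '\(command)' to \(id)") }
        }
    }
}
```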
4. Hands override eyes
We aren't in full conscious control of our eyes, but we are in control of our hands. So when the hands conflict with the eyes, we should listen to the hands. There are some weird examples of this.
Hands lag behind your eyes: Imagine you are typing on a keyboard in the air by gazing at the keys, then pinching your fingers to activate the key you are looking at. You would think that every time you pinch, you are looking at the key you want to activate. But in fact, most of the time you will already be looking at the next key by the time the signal gets down to your fingers to pinch. So as a designer, you need to delay the selection or make targets sticky for a short time so the user's intent is interpreted correctly.
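A common way to handle this is to keep a short history of gaze hits and resolve the pinch against slightly older data. Here's a rough sketch; the 200 ms look-back window is our assumption to tune per device, not a published value.

```swift
import Foundation

/// A sketch of "sticky" gaze targeting for pinch selection. We remember what
/// the user was looking at just before the pinch landed, so late fingers
/// still select the intended target.
final class StickyGazeSelector {
    private var history: [(time: TimeInterval, targetID: String?)] = []
    private let lookBack: TimeInterval = 0.2   // assumed look-back window

    /// Record the current gaze hit every frame (nil if gazing at nothing).
    func recordGaze(targetID: String?, at time: TimeInterval) {
        history.append((time, targetID))
        history.removeAll { time - $0.time > 1.0 }   // keep roughly a second of samples
    }

    /// When a pinch is detected, resolve it against the slightly older gaze data.
    func resolvePinch(at pinchTime: TimeInterval) -> String? {
        let cutoff = pinchTime - lookBack
        return history.last(where: { $0.time <= cutoff && $0.targetID != nil })?.targetID
            ?? history.last(where: { $0.targetID != nil })?.targetID
    }
}
```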
Ignore eyes while touching something: When you are touching an object directly, your hands are a much better signal of your intention than your gaze. Much of the time your eyes will be darting back and forth between the object in hand and where you want to put it. It's best to use hand and arm movement to manipulate the object rather than trying to move it with your eyes. Moving things with gaze has been tried many times because it sounds so cool, but don't do it. It will make users aware of their eyes.
Tips: When hands are directly touching something, ignore gaze. Don’t move things with gaze, only with hands.
5. Infallible eyes
Imagine you are looking at two eggs and you need to look at the right one to break it. But the left one keeps highlighting no matter how hard you look at the one on the right. What do you do? Look further to the right? Look harder? When eye gaze for targeting works well, it is magical. When it's off, there's no way to recover, because our eyes are infallible: if you're looking at something, there's no way to look at it more. It's not like a mouse, where you can overshoot and then adjust your hand to acquire the target.
In order to avoid unrecoverable situations like this one, you need to make your hit targets big enough for the eyes. Each headset has different tolerances and accuracy for eye tracking. Apple uses its point system for targets: 60 points is their minimum target size for eyes. This equates to around 3 degrees per target from the eye's perspective, or around an inch (2.5 cm) at arm's reach. That's a pretty BIG minimum target. And if you're designing for Magic Leap or HoloLens, it needs to be bigger still.
You can use the ShapesXR template to help get your Figma UI to the correct initial size. Then use our DMM import system to keep targets consistent regardless of distance to the user. Learn more in our Choosing the Right Size and Distance for UI article.
Tips: Keep targets no smaller than 3 degrees from the user’s perspective. Dynamically scale your UI by distance so users can always target it with their eyes.
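If you're rolling this yourself (for example in a Unity port), the underlying math is simple trigonometry: a target with angular size θ at distance d needs a world-space width of 2 · d · tan(θ / 2), and keeping that angular size constant just means scaling linearly with distance. A quick sketch, with illustrative function names:

```swift
import Foundation

/// Minimum world-space width (in meters) for a target of a given angular size.
func minimumTargetWidth(atDistance distance: Double, degrees: Double = 3) -> Double {
    let theta = degrees * .pi / 180
    return 2 * distance * tan(theta / 2)
}

let atArmsReach = minimumTargetWidth(atDistance: 0.5)   // ~0.026 m, about an inch
let acrossTheRoom = minimumTargetWidth(atDistance: 2.0) // ~0.105 m

/// Scale factor that keeps a panel's angular size constant as it moves away
/// from the distance it was authored at.
func scaleFactor(forDistance distance: Double, authoredAtDistance authored: Double = 1.0) -> Double {
    distance / authored
}
```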
Dig deeper
Start by watching Apple's Design for Spatial Input video. Dig deeper into gaze with Ken Pfeuffer's article on Design Principles & Issues for Gaze and Pinch Interaction. John LePore's thread is also helpful for a foundational understanding of gaze.