Eye gaze & the next wave of user interfaces
February 8, 2024
Will eyes win the future of input?
Apple has made a huge bet on our eyes. The Apple Vision Pro uses eye tracking as its primary input method. Apple is not the first to include eye tracking in its devices. The Quest Pro, Microsoft's HoloLens 2, Magic Leap 1 and 2, Varjo XR-4, and Pico 4 Pro all have eye tracking built in. However, none of their platforms or operating systems use eye tracking as the primary way to target objects and interfaces. Many interaction designers (myself included) have experimented with eye tracking and recommended it as the main input method for new hardware platforms. But Apple went for it, shipped it, and there's no turning back now. Will gaze & pinch be the new touch screen for this next wave of computing? We think yes, but not by itself. It needs to work seamlessly with hands and other input devices.
Do I really need to understand how my eyes work?
Like iOS did for touch input before, Apple has made it simple to design native visionOS apps using their premade controls. If you want to port your iPad app to visionOS, you can simply transfer your app using Xcode and things will work pretty well. There are some gotchas that Apple helps you with in their Human Interface Guidelines. But it's pretty seamless to get a 2D iPad app floating in space. If that's what you're doing, you don't need to keep reading. But if you're looking to make an authentic spatial application that utilizes eye gaze input, you'll need a deeper understanding. And if you have a Unity application you are porting to the Vision Pro, you'll be rolling a lot of these controls yourself.
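For context, here's roughly what the "free" path looks like: a bare-bones SwiftUI scene where standard controls like Button get the system's gaze hover highlight automatically, so you never touch raw gaze data. The app and view names here are made up for illustration.

```swift
import SwiftUI

// A minimal, hypothetical visionOS app. The system draws the gaze hover
// highlight on the standard Button for you; your code only sees the tap,
// which the user performs by looking at the button and pinching.
@main
struct EggTimerApp: App {
    var body: some Scene {
        WindowGroup {
            VStack(spacing: 16) {
                Text("Egg Timer")
                    .font(.largeTitle)
                Button("Start") {
                    print("Timer started")   // fires on gaze + pinch or a direct poke
                }
            }
            .padding()
        }
    }
}
```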
Understanding eye gaze as an input method
1. We don’t control our eyes, so don’t make the user think about them
Believe it or not, we don't control how our eyes move. When you're reading this you might think that your eyes look at one letter at a time and take in these beautiful words sequentially. But in reality, your eyes jump around in quick bursts called saccades, skipping ahead and doubling back as they scan each line.
Our eyes are extensions of our brains, and like our brains, most of what they do is outside our conscious control. They jump around, constantly scanning the world. If you make your users conscious of how their eyes are moving, they won't be able to focus on anything else. The worst possible thing to do is show a cursor exactly where the eyes are looking. Invariably, there will be a slight offset, and the user can get into a feedback loop. You see this if you stare at the little floaters in your eyes. Lock in on one and it looks like it's moving, but in fact, it's just not in the center of your vision, and your eyes keep chasing it like a dog chasing its tail.
Tip: Don’t ever show a cursor locked to the user’s gaze.
2. Balance responsiveness and attention
We shouldn't make the user conscious of their exact gaze motion, but our app should be clear about how it will interpret gaze input. In typical input systems, you want an interface to respond as quickly as possible. Touching an iPhone 5 back in the day got a response within 55 milliseconds, and that was a big differentiator for the iPhone at the time. But eyes are different. If you give instant hover response to your eyes, you break the first rule: don't make the user think about their eyes. Have you ever glanced at someone, only to have them immediately stare back at you? You become self-conscious.
So you have to play it cool with gaze response: make it feel like the system is responding to your gaze without making the user self-conscious. Slowly fade in the hover feedback. 230 milliseconds is a good starting point. Much more than that and it will feel like you have to stare at something for a long time to activate anything, leading to eye strain. You can experiment with different duration and brightness combinations in ShapesXR using the gaze & pinch interactivity.
Tip: Don’t change major states on gaze. Fade in gaze feedback gradually over 230ms.
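Concretely, here's one way to structure that feedback: a small, engine-agnostic sketch in Swift. The class and the fade durations are ours (230 ms in, a slightly faster fade out), and it assumes your platform can tell you each frame whether the gaze ray is over the target.

```swift
import Foundation

/// A sketch of gradual gaze hover feedback. Nothing activates here;
/// the highlight value only drives subtle visuals (e.g. brightness),
/// and activation still waits for a pinch.
final class GazeHoverFeedback {
    private let fadeInDuration: TimeInterval = 0.23   // ~230 ms to full highlight
    private let fadeOutDuration: TimeInterval = 0.15  // fade out a bit faster

    /// 0 = no highlight, 1 = fully highlighted.
    private(set) var highlight: Double = 0

    /// Call once per frame with whether the gaze ray currently hits the target.
    func update(isGazed: Bool, deltaTime: TimeInterval) {
        if isGazed {
            highlight = min(1, highlight + deltaTime / fadeInDuration)
        } else {
            highlight = max(0, highlight - deltaTime / fadeOutDuration)
        }
    }
}
```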
3. Eyes set context, hands manipulate, voice summons
A researcher named Bill Buxton once said, "Everything is best for something and worst for something else." And it's up to us to map each input to what it's best suited for.
Eyes are incredibly fast at targeting things. When eye tracking is accurate enough it can be much faster than controllers or hands to target virtual objects. But eyes are horrible at moving things around.
Hands are incredible at manipulating things. An object like an apple can be moved along three axes and rotated around those same three axes. Our hands have evolved to manipulate the apple in these six degrees of freedom with wonderful dexterity. However, some objects can be manipulated in more than six degrees of freedom. Scissors are a 7-DOF (degrees of freedom) object: move, rotate, and open and shut. Our hands are capable of much more. Believe it or not, each human hand is capable of 27 degrees of freedom, with each finger independently moving up/down, left/right, and curling open and shut. So take advantage of this dexterity when designing how content is manipulated in your spatial experiences.
Voice: How many words do you know? Our voices are incredible at accessing a massive amount of random data quickly. The average American adult's working vocabulary is around 20,000 words. In a way, that is 20K degrees of freedom, slightly more than the 48 of our hands and arms combined. Yes, it is not always socially acceptable to be talking to your devices. But when it is, a typical person can talk at 130 words per minute. Put together in the right way, these inputs unlock a massive amount of communication between the human and the computer.
This is a classic example showing the power of voice mixed with pointing via hands. I still love the end.
Tip: Use eyes to target objects and set context. Use hands to activate and manipulate objects. Use voice to modify what is in context.
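One way to keep those roles from blurring together is to route every input through a single place that only lets each modality do its own job. Here's a rough sketch of that idea; all the names are illustrative, not any platform's API.

```swift
import Foundation

/// Each modality produces one kind of intent: eyes choose, hands act, voice modifies.
enum Intent {
    case setContext(targetID: String)   // from a gaze hit
    case activate                       // from a pinch
    case modify(command: String)        // from a recognized voice phrase
}

final class InteractionRouter {
    private(set) var contextTargetID: String?

    func handle(_ intent: Intent) {
        switch intent {
        case .setContext(let id):
            // Eyes only ever decide *what* we're talking about.
            contextTargetID = id
        case .activate:
            // Hands confirm and manipulate whatever the eyes chose.
            if let id = contextTargetID { print("Activate \(id)") }
        case .modify(let command):
            // Voice ("make it blue", "put that there") acts on the current context.
            if let id = contextTargetID { print("Apply '\(command)' to \(id)") }
        }
    }
}
```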
4. Hands override eyes
We aren't in full conscious control of our eyes, but we are in control of our hands. So when the hands conflict with the eyes, we should listen to the hands. There are some weird examples of this.
Hands lag behind your eyes: Imagine you are typing on a keyboard in the air by gazing at the keys, then pinching your fingers to activate the key you are looking at. You would think that every time you pinch, you are looking at the key you want to activate. But in fact, most of the time you will already be looking at the next key by the time the signal gets down to your fingers to pinch. So as a designer, you need to delay the selection or make targets sticky for a short time so the user's intent is interpreted correctly.
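A common way to handle this is to keep a short history of gaze hits and resolve the pinch against slightly older data. Here's a rough sketch; the 200 ms look-back window is our assumption to tune per device, not a published value.

```swift
import Foundation

/// A sketch of "sticky" gaze targeting for pinch selection. We remember what
/// the user was looking at just before the pinch landed, so late fingers
/// still select the intended target.
final class StickyGazeSelector {
    private var history: [(time: TimeInterval, targetID: String?)] = []
    private let lookBack: TimeInterval = 0.2   // assumed look-back window

    /// Record the current gaze hit every frame (nil if gazing at nothing).
    func recordGaze(targetID: String?, at time: TimeInterval) {
        history.append((time, targetID))
        history.removeAll { time - $0.time > 1.0 }   // keep roughly a second of samples
    }

    /// When a pinch is detected, resolve it against the slightly older gaze data.
    func resolvePinch(at pinchTime: TimeInterval) -> String? {
        let cutoff = pinchTime - lookBack
        return history.last(where: { $0.time <= cutoff && $0.targetID != nil })?.targetID
            ?? history.last(where: { $0.targetID != nil })?.targetID
    }
}
```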
Ignore eyes while touching something: When you are touching an object directly, your hands are a much better signal of your intention than your gaze. Much of the time your eyes will be darting back and forth between the object in hand and where you want to put it. It's best to use hand and arm movement to manipulate the object rather than trying to move it with your eyes. Moving things with gaze has been tried many times because it sounds so cool, but don't do it. It will make users aware of their eyes.
Tips: When hands are directly touching something, ignore gaze. Don’t move things with gaze, only with hands.
5. Infallible eyes
Imagine you are looking at two eggs and you need to look at the right one to break it. But the left one keeps highlighting no matter how hard you look at the one on the right. What do you do? Look further to the right? Look harder? When eye gaze for targeting works well, it is magical. When it's off, there's no way to recover, because our eyes are infallible: if you're looking at something, there's no way to look at it more. It's not like a mouse, where you can overshoot and then adjust your hand to acquire the target.
In order to avoid unrecoverable situations like this one, you need to make your hit targets big enough for the eyes. Each headset has different tolerances and accuracy for eye tracking. Apple uses its point system for targets: 60 points is their minimum target size for eyes. This equates to around 3 degrees per target from the eye's perspective, or around an inch (2.5 cm) at arm's reach. That's a pretty BIG minimum target. And if you're designing for Magic Leap or HoloLens, it needs to be bigger still.
You can use the ShapesXR template to help get your Figma UI to the correct initial size. Then use our DMM import system to keep targets consistent regardless of distance to the user. Learn more in our Choosing the Right Size and Distance for UI article.
Tips: Keep targets no smaller than 3 degrees from the user’s perspective. Dynamically scale your UI by distance so users can always target it with their eyes.
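If you're rolling this yourself (for example in a Unity port), the underlying math is simple trigonometry: a target with angular size θ at distance d needs a world-space width of 2 · d · tan(θ / 2), and keeping that angular size constant just means scaling linearly with distance. A quick sketch, with illustrative function names:

```swift
import Foundation

/// Minimum world-space width (in meters) for a target of a given angular size.
func minimumTargetWidth(atDistance distance: Double, degrees: Double = 3) -> Double {
    let theta = degrees * .pi / 180
    return 2 * distance * tan(theta / 2)
}

let atArmsReach = minimumTargetWidth(atDistance: 0.5)   // ~0.026 m, about an inch
let acrossTheRoom = minimumTargetWidth(atDistance: 2.0) // ~0.105 m

/// Scale factor that keeps a panel's angular size constant as it moves away
/// from the distance it was authored at.
func scaleFactor(forDistance distance: Double, authoredAtDistance authored: Double = 1.0) -> Double {
    distance / authored
}
```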
Dig deeper
Start by watching Apple's Design for Spatial Input video. Dig deeper into gaze with Ken Pfeuffer's article on Design Principles & Issues for Gaze and Pinch Interaction. John LePore's thread is also helpful for a foundational understanding of gaze.