Computer Vision and Eye Tracking

Computer Vision is a field of Artificial Intelligence (AI) that aims to enable computers to see, recognize, and analyze things from visual inputs such as digital images and videos. In my previous article, I introduced Computer Vision (CV) and discussed how CV methods are becoming very useful in traditionally non-image-based tasks such as audio classification and cyber threat detection. If you are new to CV or curious about its applications, it would be great to read that article first. In this article, I will explore how we can use CV in Eye Tracking.

Let's get into it!

Technology has been improving rapidly over the past years. As computing power and capabilities increase, researchers have been interested in finding ways to enhance how we interact with computing devices. You have probably watched movies where people use their hands to control virtual screens and objects, like in the picture below.

Role of Augmented Reality in Science
Credit: Gorodenkoff/Shutterstock.com

This is quite an interesting technology, made possible by tracking hand movements and how the hands interact with the displayed virtual objects. We use our hands a lot when we interact with computing devices, whether virtually or physically. But we don't always want to use our hands, and some people do not have the option to use their hands because of disabilities.

Imagine being able to interact with objects on a computer using not your hands but your eyes. For example, while shopping online, you look at a box of chocolates and blink twice; the system then knows that you want the chocolates and adds them to your cart. This is probably not the best example, but you can think of other cool ways to use this technology. It would make certain tasks easier and more accessible, especially for people with disabilities. But how does this work? It works by using Eye Tracking.

Eye Tracking

Eye Tracking (ET) is the process of recording the movements and positions of the eyes. Devices that do this are called eye trackers. They have tiny cameras facing a person's eyes, and they record data such as the x and y coordinates the eyes are looking at and the time at which the eyes are on those coordinates. Some eye trackers are head-mounted, some are in the form of eyeglasses, and some are static. The task to be done determines the eye tracker to be used.
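To make this concrete, here is a minimal sketch in Python of what eye tracker output might look like. The exact format and field names vary by vendor, so everything below is an illustrative assumption:

```python
from dataclasses import dataclass

@dataclass
class GazeSample:
    timestamp_ms: float  # when the sample was recorded, in milliseconds
    x: float             # horizontal gaze coordinate, e.g., in scene-camera pixels
    y: float             # vertical gaze coordinate

# A recording is a time-ordered list of samples (values here are made up):
recording = [
    GazeSample(timestamp_ms=0.0, x=512.3, y=300.1),
    GazeSample(timestamp_ms=16.7, x=514.0, y=301.5),  # ~60 Hz sampling
    GazeSample(timestamp_ms=33.3, x=515.2, y=302.0),
]
```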

Computer Vision in Eye Tracking (use case 1)

For ET to work, eye trackers need to be able to detect and recognize eyes. This can be done using different methods, one of which is eye detection using CV. CV models trained for eye detection are included as part of the software inside an eye tracker. As a person's eyes move while using an eye tracker, the tiny cameras facing the eyes record them; the eye detection models detect the eyes and the position they are looking at, and the results are recorded.
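To illustrate the idea, here is a minimal sketch using OpenCV's classic pre-trained Haar cascade for eyes. Commercial trackers use far more sophisticated models, and the image path below is hypothetical:

```python
import cv2

# Load OpenCV's bundled pre-trained Haar cascade for eye detection.
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml"
)

frame = cv2.imread("face.jpg")  # hypothetical frame from an eye-facing camera
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Each detection is an (x, y, width, height) box around an eye.
eyes = eye_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in eyes:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```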

Applications of Eye Tracking

ET is used in many tasks. It can be used for gaze-based computer interaction, as discussed above, or to analyze people's visual attention during a particular activity.

The eyes are the mirror of the soul and reflect everything that seems to be hidden; and like a mirror, they also reflect the person looking into them. – Paulo Coelho

The assumption is that a person will look at objects of interest or importance to them. By analyzing what a person looks at, conclusions can be drawn about their focus, interests, or items of importance.

Check out some cool examples of ET.

·       What Does a Soccer Player See?

·       Juggling expert vs novice.

·       How eye tracking enhances karate referee training

The results of such analyses can be helpful in tasks such as training people using skills from experts, or simply understanding what goes on during a particular activity. Weren't you curious about how expert jugglers manage to follow the items they juggle? If you haven't checked out the links above, then spoiler alert - they don't follow the items! Supermarkets may also be interested in what shoppers focus on before they select a product to buy, and ET studies can help them understand this.

Implementing Eye Tracking

Mobile eye trackers such as the Tobii Glasses have multiple functions that aid ET tasks. After performing an ET task, it is possible to play back the scene through a recorded video. Helpfully, you can also see the gaze points as the video plays, and ET management software lets you do many other things, such as changing how the eye movement is displayed on the video.

For a proper analysis in ET, it is important to define a region for each viewed object. We need to give each object X an area, such as a box, so that if a person's gaze falls into the box, we can say the person is looking at X. One way to select regions for easy mapping when using eye trackers is to use markers, such as the black and white boxes in the image below.

Region with Markers
Markers mapping region in playback video
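The region test itself is simple once a box is defined. A minimal sketch in Python, with made-up coordinates:

```python
def gaze_in_region(gx, gy, region):
    """Return True if the gaze point (gx, gy) falls inside a rectangular region.

    region is (x_min, y_min, x_max, y_max) in the same coordinate
    system as the gaze data (e.g., scene-camera pixels).
    """
    x_min, y_min, x_max, y_max = region
    return x_min <= gx <= x_max and y_min <= gy <= y_max

region_x = (100, 150, 300, 350)  # hypothetical box around object X
print(gaze_in_region(210.5, 240.0, region_x))  # True
```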

But this won't work in every situation. Imagine we are working with a recording of students in a classroom. We would need to find a creative way of placing markers on each student, which would not be feasible.

Credit: https://chris.gunawardena.id.au/software-development/webgl/optical-position-tracking-with-opencv-and-aruco-markers/
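Markers like these can be detected automatically. Here is a hedged sketch using OpenCV's ArUco module (it requires opencv-contrib-python, and the code follows the OpenCV 4.7+ API):

```python
import cv2
import numpy as np

# ArUco markers are black-and-white squares like the ones shown above.
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary)

frame = cv2.imread("scene.jpg")  # hypothetical scene-camera frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# corners: one 4x2 array of corner coordinates per marker; ids: marker IDs
corners, ids, rejected = detector.detectMarkers(gray)

if ids is not None:
    # Use the bounding box around all detected markers as the region.
    points = np.concatenate([c.reshape(-1, 2) for c in corners])
    x_min, y_min = points.min(axis=0)
    x_max, y_max = points.max(axis=0)
    region = (x_min, y_min, x_max, y_max)  # usable with gaze_in_region() above
```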

Computer Vision in Eye Tracking (use case 2)

In such cases, it is better to use Computer Vision methods to detect people. There are many object detectors nowadays. Some detect multiple classes, including people (e.g., YOLO object detectors), and some are specific to one task (e.g., face detectors such as RetinaFace). Object detectors return the x and y coordinates of a bounding box for each detected object.

Object detection | TensorFlow Lite
Credit: https://www.tensorflow.org/lite/examples/object_detection/overview
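As a sketch of what this looks like in practice, here is how a pre-trained YOLO model could be queried using the ultralytics package. The model file and image path are assumptions for illustration:

```python
from ultralytics import YOLO  # assumes the ultralytics package is installed

model = YOLO("yolov8n.pt")     # small pre-trained model; downloads on first use
results = model("fruits.jpg")  # hypothetical image, like the one above

# Print each detection's class label and bounding box coordinates.
for box in results[0].boxes:
    x_min, y_min, x_max, y_max = box.xyxy[0].tolist()
    label = model.names[int(box.cls)]
    print(label, (x_min, y_min, x_max, y_max))
```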

We could let a person look at the above image while wearing ET glasses and record their eye movement. After the recording, we get data containing the gaze coordinates and a timestamp for each gaze point. We then determine how long a person must look at a fruit for that to count as a fixation point, i.e., a focus point. For example, if the gaze stays on the apple's box continuously for 1 second, we can say that the person was looking at the apple.
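The 1-second rule can be expressed in a few lines. A minimal sketch, with a made-up apple box and gaze samples in the (timestamp_ms, x, y) form described earlier:

```python
def fixated_on(samples, region, threshold_ms=1000.0):
    """Return True if the gaze stays inside region continuously
    for at least threshold_ms milliseconds.

    samples: time-ordered (timestamp_ms, x, y) tuples.
    region:  (x_min, y_min, x_max, y_max) bounding box.
    """
    x_min, y_min, x_max, y_max = region
    run_start = None
    for t, gx, gy in samples:
        if x_min <= gx <= x_max and y_min <= gy <= y_max:
            if run_start is None:
                run_start = t           # start of a continuous in-region run
            if t - run_start >= threshold_ms:
                return True             # dwelled long enough: a fixation
        else:
            run_start = None            # gaze left the box: reset the run
    return False

apple_box = (40, 60, 180, 200)  # hypothetical coordinates from a detector
samples = [(0.0, 100, 120), (500.0, 105, 125), (1000.0, 103, 122)]
print(fixated_on(samples, apple_box))  # True: 1 s continuously on the apple
```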

This seems like an easy task when we look at single images, but it becomes quite challenging with videos, which are collections of images over time. If we were to move that picture around while performing eye tracking, we would also have to keep track of the picture's new locations and the new bounding boxes of each object. It may sound difficult, but this is a fun task to work on.
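One naive approach for video is to re-run the detector on every frame and map each frame's gaze sample onto that frame's boxes. A rough sketch, reusing the assumed ultralytics package; the video path and gaze points are made up, and a real recording needs proper time alignment between video and gaze data:

```python
import cv2
from ultralytics import YOLO

def label_at_gaze(gx, gy, result):
    """Return the label of the first detected box containing the gaze point."""
    for box in result.boxes:
        x_min, y_min, x_max, y_max = box.xyxy[0].tolist()
        if x_min <= gx <= x_max and y_min <= gy <= y_max:
            return result.names[int(box.cls)]
    return None

model = YOLO("yolov8n.pt")
cap = cv2.VideoCapture("scene_recording.mp4")      # hypothetical scene video
gaze_per_frame = [(512.3, 300.1), (514.0, 301.5)]  # toy gaze, one per frame

frame_idx = 0
while frame_idx < len(gaze_per_frame):
    ok, frame = cap.read()
    if not ok:
        break
    # Re-run detection each frame so the boxes follow moving objects.
    result = model(frame, verbose=False)[0]
    gx, gy = gaze_per_frame[frame_idx]
    print(frame_idx, label_at_gaze(gx, gy, result))
    frame_idx += 1

cap.release()
```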

In my next article, we will get into a tutorial with code (hopefully using videos). For now, you can look at the following eye gaze and video datasets and develop different projects you might want to work on using the datasets.

1. DR(eye)VE: https://aimagelab.ing.unimore.it/imagelab/page.asp?IdPage=8

2. EgoMon Gaze & Video Dataset (NB: very large, about 21 GB)

Description - http://imatge-upc.github.io/egocentric-2016-saliency/

Download - https://imatge.upc.edu/web/sites/default/files/resources/1720/saliency/2016-egomon/egomon.tar.gz

You can also watch this video (shown below) to see my experience and the activities we worked on when I took an eye-tracking class.

Image Credits

Head-mounted eye tracker: https://www.tobii.com/products/eye-trackers/wearables/tobii-pro-glasses-3

Eye tracking glasses: https://www.asterics.eu/plugins/sensors/Eyetracker.html

Static eye tracker: https://www.semanticscholar.org/paper/Saccadic-Vector-Optokinetic-Perimetry-(SVOP)%3A-A-for-Murray-Perperidis/bd941f1b0d5432a531bb47a9d08af1dcc86ef8ff