Applying Computer Vision to non-image based ML problems

The field of computer vision has made tremendous advances in recent years. Its innovations have found many applications, perhaps the most significant being in medicine, where it is used for cancer and tumour detection, cell classification, disease progression scoring and, more recently, diagnosing COVID-19. Revolutionary, right? Well, systems in this field still struggle with some tasks that look simple to humans, such as telling puppies apart from muffins (well, even humans can struggle with this one 😂).

Obtained from Chihuahua or muffin? My search for the best computer vision API by Yao Maria

Luckily, however, the technologies used in this field have proven to be useful for other non-image based tasks in machine learning. But, before we talk about other tasks in machine learning, let’s talk about what computer vision is and how it works.

So far I have assumed that everyone reading this knows that computer vision is a field of Artificial Intelligence (AI) that uses machine learning and deep learning to allow computers to see, recognize and analyze visual inputs such as digital images and videos. But how is this possible?

Images are made up of cells, called pixels. Each pixel is a point in an image that represents the colour or intensity value of that point. In computers, the combinations of pixels in an image are represented as matrices of numbers. We can think about it this way.

A very small part of that image can be represented using the matrix shown. The whole image is an even larger matrix. These matrices differ in dimensions: a grayscale image is a 2-D matrix (height × width), while a colour image is a 3-D matrix (height × width × colour channels, typically red, green and blue). For computer vision tasks, the matrices are fed into an algorithm that learns the features of the images and produces a model that can be used for a particular task. Applications of computer vision include medical imaging, navigation systems, surveillance systems, optical character recognition, biometric systems and motion capture.
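To make the matrix idea concrete, here is a minimal sketch using NumPy. The pixel values are made up for illustration and do not come from any image in this article.

```python
import numpy as np

# A hypothetical 4x4 grayscale patch: one intensity value (0-255) per pixel.
gray_patch = np.array(
    [
        [0, 50, 100, 150],
        [50, 100, 150, 200],
        [100, 150, 200, 255],
        [150, 200, 255, 255],
    ],
    dtype=np.uint8,
)

# A colour version of the patch: three values (red, green, blue) per pixel.
# The green and blue channels are invented just to get a third dimension.
colour_patch = np.stack(
    [gray_patch, gray_patch // 2, np.zeros_like(gray_patch)], axis=-1
)

print(gray_patch.shape)    # (4, 4)    -> height x width
print(colour_patch.shape)  # (4, 4, 3) -> height x width x channels
print(colour_patch[0, 1])  # the three channel values of one pixel
```

The only difference between the two arrays is the extra channel axis, which is why colour images are described as 3-D.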

The most common machine learning algorithm for computer vision is the convolutional neural network (CNN). It can be explained simply using the image below. It starts with an input image, which is read as a matrix. In the picture below, the matrix is 3-D, as you can see from the stacked layers of the image. The image goes through convolution, where image features are extracted, and pooling, where the matrix dimensions are reduced. Finally comes flattening, where the resulting matrix is converted to a vector that becomes the input to a fully connected neural network layer. The result is the classification of the image into a category, depending on the task at hand.

CNN Architecture
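The steps above (convolution, pooling, flattening, fully connected layer) can be sketched in plain NumPy. This is only an illustration under several assumptions: the image is random, the filter is a hand-picked edge detector, a ReLU activation is added after the convolution (common in practice, though not covered above), and the final weights are random rather than trained, so the output probabilities mean nothing. Real projects would use a deep learning library instead.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products.
    Like most deep learning libraries, this does not flip the kernel."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

rng = np.random.default_rng(0)
image = rng.random((8, 8))                  # stand-in grayscale image
kernel = np.array([[1.0, 0.0, -1.0]] * 3)   # simple 3x3 vertical-edge filter

features = np.maximum(convolve2d(image, kernel), 0)  # convolution + ReLU
pooled = max_pool(features)                          # pooling halves each side
flat = pooled.flatten()                              # flattening to a vector

weights = rng.normal(size=(2, flat.size))   # untrained weights for 2 classes
probabilities = softmax(weights @ flat)     # fully connected layer + softmax

print(features.shape, pooled.shape, flat.shape)  # (6, 6) (3, 3) (9,)
print(probabilities)
```

Notice how the data shrinks at each stage: an 8×8 image becomes a 6×6 feature map, then a 3×3 pooled map, then a vector of 9 numbers.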

With the increasing success of computer vision algorithms, and the fact that there is no judgement in research, some people came up with the idea of using these algorithms on other inputs that can be represented as images. I’m sure that in another environment, if someone had said they wanted to convert audio recordings of spoken words into pictures and then use computer vision to classify the words, someone would have judged them. However, research is very accommodating, and this is why we have advances in technology.

There are various types of inputs that can be converted to images. A common one is audio. Audio files are read into computers as arrays of numbers (amplitude samples over time), much like the matrices used for images. These arrays can be used to create an image representation of the audio, called a spectrogram. A spectrogram is a visual representation of how strongly each frequency is present in a signal over time. It’s like a heat map of signal frequencies. When plotted, an audio file looks like the image on the left below and when converted to a spectrogram, it looks like the image on the right.

These spectrogram images can be fed into a computer vision algorithm, such as a CNN, and used to classify audio clips. I once worked on a project that does exactly this. Feel free to check it out here.
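For the curious, here is a minimal sketch of how a spectrogram can be computed with NumPy alone. It assumes a synthetic 440 Hz tone instead of a real recording, and the frame size and hop length are arbitrary choices. Audio libraries offer ready-made, more refined versions of this.

```python
import numpy as np

def spectrogram(signal, frame_size=256, hop=128):
    """Return a (frequency bins x time frames) matrix of magnitudes in decibels."""
    window = np.hanning(frame_size)  # tapers each frame to reduce edge effects
    starts = range(0, len(signal) - frame_size + 1, hop)
    frames = np.array([signal[s:s + frame_size] * window for s in starts])
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))  # frequency content per frame
    return 20 * np.log10(magnitudes + 1e-10).T        # small offset avoids log(0)

sample_rate = 8000                          # samples per second (assumed)
t = np.arange(sample_rate) / sample_rate    # one second of time stamps
signal = np.sin(2 * np.pi * 440 * t)        # synthetic 440 Hz tone

spec = spectrogram(signal)
print(spec.shape)  # (129, 61): 129 frequency bins, 61 time frames

# Each row covers sample_rate / frame_size = 31.25 Hz, so the tone should
# light up the row closest to 440 Hz in every time frame.
loudest_bin = int(spec[:, 0].argmax())
print(loudest_bin * sample_rate / 256)
```

The resulting 2-D array has the same form as a grayscale image, which is what makes it a suitable input for a CNN.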

Another interesting application is in cyber threat detection. Using binary visualization, files can be transformed into images that are colour coded. Some researchers have shown that when malicious files are visualized using this method, there are patterns that emerge, and these show that there is a difference between malicious and safe files. The following images show an example.

The research states that malicious files have a tendency to result in colourful images, while safe files have cleaner and less colourful images. The researchers used an image classification neural network for the classification of these files. This really sounds interesting and would be a good project for machine learning enthusiasts.
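As a rough illustration of binary visualization, the sketch below turns the bytes of a file into a small colour image. Both the colour scheme (one colour per class of byte) and the row-by-row layout are assumptions made for this example, and the researchers' tool may map and arrange the bytes differently. The names `byte_colour` and `bytes_to_image` are hypothetical helpers, not part of any library.

```python
import numpy as np

def byte_colour(value):
    """Map one byte (0-255) to an RGB colour based on the kind of byte it is."""
    if value == 0x00:
        return (0, 0, 0)          # null byte -> black
    if value == 0xFF:
        return (255, 255, 255)    # 0xFF -> white
    if 0x20 <= value <= 0x7E:
        return (0, 0, 255)        # printable ASCII -> blue
    if value < 0x20 or value == 0x7F:
        return (0, 255, 0)        # control characters -> green
    return (255, 0, 0)            # extended (non-ASCII) bytes -> red

def bytes_to_image(data, width=16):
    """Lay the bytes out row by row as a (height, width, 3) colour image.
    Unused cells in the last row stay black."""
    height = -(-len(data) // width)  # ceiling division
    image = np.zeros((height, width, 3), dtype=np.uint8)
    for index, value in enumerate(data):
        image[index // width, index % width] = byte_colour(value)
    return image

# Stand-in for file contents; for a real file you could use
# open(path, "rb").read() instead.
data = b"Hello, world!\n" + bytes([0x00, 0xFF, 0x90, 0xC3])
image = bytes_to_image(data, width=6)
print(image.shape)  # (3, 6, 3)
```

An array like this can then be passed to an image classifier in the same way as any other picture.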

There are other applications of computer vision techniques in other domains, but I will mention just one more: an idea I came up with a few days ago. I was facing a difficult text classification problem and thought that if only it were possible to convert the text into images or spectrogram-like heat maps, I’d probably obtain a better classification model using a CNN. I haven’t found a way to do this, so if anyone does, please contact me and let me know. Overall, computer vision techniques are really interesting and useful for other domains. Do you have a machine learning problem with input that can be converted to images? If yes, then try computer vision algorithms and see how that works.

Did you enjoy the article and want to learn more? Here are some helpful resources:

IBM. What is Computer Vision? Available at https://www.ibm.com/topics/computer-vision#:~:text=Computer%20vision%20is%20a%20field,recommendations%20based%20on%20that%20information.

Yao Maria. Chihuahua or muffin? My search for the best computer vision API. Available at https://www.freecodecamp.org/news/chihuahua-or-muffin-my-search-for-the-best-computer-vision-api-cbda4d6b425d/

Baptista, I., Shiaeles, S. and Kolokotronis, N., 2019, May. A novel malware detection system based on machine learning and binary visualization. In 2019 IEEE International Conference on Communications Workshops (ICC Workshops) (pp. 1-6). IEEE. Available at https://arxiv.org/abs/1904.00859

Dickson Ben. Computer vision and deep learning provide new ways to detect cyber threats. Available at https://bdtechtalks.com/2021/09/10/computer-vision-deep-learning-threat-detection/