Custom vs. General Vision AI Services

Table of Contents

Understand the Difference in Image Recognition Platforms

AI is on fire, and so are services delivering different forms of artificial intelligence. In this post, I would like to focus on the Visual segment and compare two different approaches: general vision and custom vision.

What is the difference, and what works better for different visual tasks?

General Vision Platforms

Here are some examples of general vision platforms:

Google Cloud Vision API
Amazon Rekognition
Microsoft Computer Vision
Clarifai
IBM Visual Recognition Watson

These platforms are built for understanding everyday objects (dogs, aeroplanes, faces, tables…). They mostly provide photo tagging. This means understanding as many objects and abstracts in the image as possible.

The main goal of general platforms is a human-level understanding of images. They are reaching for more than object understanding. The idea is to understand the abstract interaction between objects, moods, and contexts. In the video, they are trying to understand the action, its impact and time continuity. These are a very complex task and need a lot of labelled data to learn.

General vision providers gain millions of images from different sources. Users upload images into Google Photos, OneDrive and Google index every image on the web for Google images.

How do users then benefit from these never-ending data sources?

Machine learning requires a lot of data for training. In this case, users don’t have to provide any data. General models learned thousands of everyday objects, face emotions, landmarks, car types… We can start using this treasure right away with no pain of gathering relevant training data.

This is an amazing benefit, and general models learn more and more categories. We can set up many cool apps by implementing general models because our apps often look at everyday objects.

Another benefit of general solutions is other functionalities they provide out of the box. Generate a thumbnail, read a text, or find a celebrity name. All with no training data and for a reasonable price.

How to choose?

The most important factor we are trying to maximise is accuracy on our specific task. Each provider has a different number of training images, and different deep learning architecture and provides different tasks. These are company secret data we will never reach.

Generally, in AI, we want to find all the providers who offer the functionality we need and test them out to find the best-performing solution. For complex tasks, it is common to mix a few providers with the best results.

Who is this for?

General vision suits the best to the application that needs to recognise everyday objects. Robots reading human faces, e-shop image captioning for better SEO performance and helping blind people to understand new environments. These are the great examples of general vision. When reaching for vision solutions, the first question should be: Is this something I could find online? If yes, then it is worth trying general models.

Services like Google Vision provide the power of millions of images to everyone.

But what happens when we come out of every day’s space? What if we have scientific data available only to a few universities? Here comes the Custom vision.

Custom Vision Solutions

One example is Ximilar solutions. These custom vision solutions are continually evolving to meet the dynamic needs of the rapidly growing sectors such as e-commerce, manufacturing, healthcare, and more. Most of these solutions are ready for deployment via API with just a click, requiring no knowledge of coding or machine learning techniques. They are highly customizable and can be easily combined in a modular fashion to suit various applications. I call this a win.

And some more:

Microsoft custom vision AI
Clarifai custom image recognition
Imaga computer vision

In custom computer vision, users create their own rules to sort images.

Rather than asking the type of the flower, you may want to know if there is a sun shining on the flower. Sometimes you want to be alert when your security camera spots humans, but your neighbour on a mower tractor is all right.

You could make sure all the product thumbnails you display show the unboxed products on a white background. Someone wants to make sure that the product on the end of the line is not damaged. This is something that off-shelf solutions are not built for.

About a year ago, there was only one solution. Hire an AI team to deliver an expensive on-premiss solution. Custom vision solution opens up whole new possibilities in visual AI. Compared to general platforms, there is an infinite number of tasks we can solve by defining custom objects. We can also detect different states of one object or environment. Custom vision is a little machine learning lab where everyone can test his idea. As a result, we can automate boring human tasks and save some time.

Custom vision goal is not general image understanding but a 100% accurate understanding of the specific task. This is very close to the market, but it comes with one disadvantage. User have to gather their own training data. This can be a pain and time-consuming, but it is a competitive advantage too.

At this moment, all the custom services offer image classification tasks. This means sorting the images into classes while looking at the whole image. The task can be as simple as deciding “ok” or “broken” or it can consist of many classes (e.g. several terrain appearances).

Custom vision can also come in handy when we need high accuracy on a smaller set of categories. We don’t always need to recognise thousands of categories. We would like to find 10 that are interesting for us. Custom vision can often supply better accuracy for general tasks.

The key for the user is the number of images they need for training. This is very hard to estimate in general but can be as little as 20 images per class. Read more about custom datasets in this post. The smaller the visual difference is, the higher the number of images we need.

How to choose?

We are looking for a solution that is easy to use, provides a simple user interface and the best accuracy for our task. We should test all the available solutions before we start using one. Before we start testing our idea I would also recommend saving some time discussing the project with the support team.

Who is this for?

Custom vision suits the application that needs to recognise very specific images or object states. It also fits images that are not available online or are not mass-produced by web users. It can solve many scientific, industrial, medical, and laboratory tasks. General models are often made for the needs of the online business. Custom vision can help in a variety of industries. Agriculture, production lines, security, and many others.

Custom vision opens new possibilities in visual automation. All made simple for users with no technical background.

Summary

Machine learning is a technology that saves people a lot of time. Vision is one of the human abilities that is now possible to automate. There is no universal approach for vision tasks, so we have to decide what type of task we are facing.

General vision is here to organize and structure images that are available on the web. It is very simple to use and needs no training data.

Custom vision makes sense in images that are very specific to the task or not available online. It needs some effort to gather training data, but it provides very accurate results for vision tasks.