In recent weeks, we have been hearing a lot about DALL-E 2, an artificial-intelligence system capable of generating realistic images from text. We can ask it for an image of an astronaut riding a unicorn, and it will produce one convincing enough to pass for a real photograph.
On Twitter there are already many threads from people who have tried it, and although it is not publicly available, from time to time there are sessions in which prompts are collected from the audience to see what DALL-E 2 is capable of creating (I don't say drawing, because they are not exactly drawings).
This recent release from OpenAI uses advanced deep learning techniques that improve the quality and resolution of the generated images, and it can be used to generate data sets, with the aim of tackling some of the biggest challenges in computer vision.
It is important to remember that many computer-vision applications of artificial intelligence are responsible for analyzing medical scans to find tumors, or for improving the capabilities of autonomous cars, and they do so thanks to training carried out with millions of real images.
A good image classification system must be trained on some 300 million images with more than 375 million labels, and that requires collecting those images and feeding them to the program for the appropriate training.
The current problem with computer-vision AI applications
Imagine that we are trying to train an AI system to recognize a beach umbrella, and all the photos we have given it contain a lot of blue and yellow, from the sky, the sea, and the sand. Trained this way, the system may conclude that those colors are essential for recognizing an umbrella, and if we show it one in the middle of a city, against a red and green background for example, it will fail to recognize it.
The solution DALL-E 2 could offer
That problem could be solved by giving the computer millions of photos of artificially generated umbrellas that are not on the beach but in other environments, and DALL-E 2 could produce them easily.
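The idea above can be sketched in a few lines: to break the spurious "beach colors mean umbrella" correlation, we systematically vary the background and color scheme in the text prompts fed to an image generator. This is a minimal illustration, not OpenAI's actual API; the backgrounds, colors, and prompt template here are hypothetical choices for the example.

```python
from itertools import product

# Hypothetical variations chosen for illustration; a real synthetic
# data set would use many more backgrounds and color schemes.
BACKGROUNDS = ["a city street", "a mountain trail", "a snowy park", "a rooftop terrace"]
COLORS = ["red and green", "black and white", "purple and orange"]

def umbrella_prompts():
    """Build text prompts that vary the umbrella's surroundings, so a
    model trained on the generated images cannot rely on beach colors."""
    return [
        f"a photo of a {color} beach umbrella on {background}"
        for background, color in product(BACKGROUNDS, COLORS)
    ]

prompts = umbrella_prompts()
print(len(prompts))  # 4 backgrounds x 3 colors = 12 prompts
```

Each prompt would then be sent to the image generator, and the resulting images added to the training set with the "umbrella" label already known, since it comes from the prompt itself.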
Image-generation techniques like this have existed for quite some time, but DALL-E 2 offers high resolution (1024 × 1024), and because it is conditioned on text, it captures the relationships between the different objects in a given image more accurately.
Naturally, human reviewers will still be needed to check the randomly generated samples and verify their validity, but the work will be greatly sped up.
You can read more about this topic in Sahar Mor’s article on venturebeat.com