Date: 2022-05-24

Artificial intelligence systems have already shown that they can generate images from a text description. OpenAI surprised us with DALL-E, a system that was later followed by an improved edition: DALL-E 2.

Google, through its research division, claims to have achieved a more powerful system. It was recently presented under the name Imagen.

Imagen, a generator based on textual descriptions

Google Research, the Internet giant’s research division, unveiled Imagen, an AI system that creates photorealistic images from text input.

To encode the text it receives, Imagen uses an encoder called T5-XXL. From that encoding, the system first produces a small 64 × 64 pixel sketch of the image. Using diffusion mechanisms, the AI then upscales the result to 256 × 256 pixels and then to 1024 × 1024 pixels, retouching details at each enlargement step to produce sharp, realistic-looking results.
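The three resolution stages described above can be illustrated with a toy sketch. It is only an assumption-laden stand-in: nearest-neighbour upscaling replaces the diffusion-based enlargement, and `make_base`, `upscale_nearest` and `run_cascade` are hypothetical names invented for this example, not part of any Google code.

```python
# Toy sketch of a cascaded pipeline: 64x64 base -> 256x256 -> 1024x1024.
# Nearest-neighbour upscaling stands in for the diffusion-based enlargement
# stages; it does NOT reproduce Imagen's models or add any detail.

STAGE_SIZES = [64, 256, 1024]  # resolutions named in the article


def upscale_nearest(image, factor):
    """Repeat every pixel `factor` times along both axes."""
    out = []
    for row in image:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def run_cascade(base_image, sizes=STAGE_SIZES):
    """Upscale base_image through each stage, returning every intermediate."""
    first = sizes[0]
    if len(base_image) != first or any(len(r) != first for r in base_image):
        raise ValueError("base image must be %dx%d" % (first, first))
    stages = [base_image]
    for prev, nxt in zip(sizes, sizes[1:]):
        factor = nxt // prev  # 4x per stage for 64 -> 256 -> 1024
        stages.append(upscale_nearest(stages[-1], factor))
    return stages


def make_base(size=64):
    """Hypothetical placeholder for the text-conditioned 64x64 output."""
    return [[(x + y) % 256 for x in range(size)] for y in range(size)]


if __name__ == "__main__":
    for stage in run_cascade(make_base()):
        print(len(stage), "x", len(stage[0]))
```

In the real system each enlargement step would also be conditioned on the encoded text and would add detail rather than simply repeat pixels; the sketch only shows how the output of one stage becomes the input of the next.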

An interesting aspect of this system is its capacity for self-correction. As the image is refined during enlargement, the AI can assess how well the applied retouches keep the result consistent with the reference phrase.

According to Google Research, this is an unprecedented development. Among its achievements, the team highlights optimizations in the text encoder; a new thresholding technique for the diffusion process, to obtain higher resolution images; optimized memory usage on the computers running the system; and positive evaluations of how well the generated images match their reference texts.

For now, the demos are limited to the examples shared by Google. To avoid the risk of misuse, the tool has not yet been released. “At this time, we have decided not to release the code or a public demo. In future work, we will explore a framework for responsible externalization that balances the value of external auditing with the risks of unrestricted open access,” says the Imagen team on the project’s website.

Another detail highlighted by the Imagen team is that, for future work, they will need to refine the data sample with which they train this system, to avoid bias, offense or other social problems or distortions of reality. In this first instance, the focus was on the development of the most technical aspects of the system, working with a set of data extracted from the web without filtering.

A complete technical description of this project, together with examples that illustrate the tool’s potential, is available on the Google Research website.