The DALL·E AI system expands the frame: Outpainting paints pictures beyond the edge
A new DALL·E feature extends images beyond their original borders – outpainting enables new elements, style combinations and large-format works.
OpenAI is extending its text-to-image generator DALL·E 2 with a major capability: Outpainting lets users play with the size of the images they create. With the new feature, a scene can be continued beyond the original edge of the image. According to the blog post, there should be no limits to creativity: users can now add visual elements in the same style or take the scene in a new direction. With outpainting, original images can be expanded in DALL·E into large-format versions with any aspect ratio.
Users of the image generator could already make changes within images. The corresponding function is called Inpainting, and it makes no distinction between images generated in the system and images that users upload themselves. The new Outpainting feature works in a similar way and is designed to recognize the visual elements present in an image as it expands its dimensions. Shadows, textures, coloring, style, mood, technique and reflections are also taken into account so that the context of the original image is seamlessly preserved, if desired.
Naming: DALL·E / DALL·E 2
One detail is noteworthy: in the Outpainting announcement, the team at the Californian AI company, as it has done repeatedly of late, omits the version number 2 and simply writes “DALL·E”. OpenAI has not yet commented on this publicly, at least not to the editors' knowledge.
In this report, “DALL·E” refers to the current AI image synthesis system in general (from DALL·E 2 onward), not to its predecessor of the same name. The version now called “DALL·E 1”, released in early 2021, carried no version number in external communication.
Text-to-image: Over a million people use DALL·E
According to OpenAI, more than a million people are already using the AI system to create images and works of art from text instructions. At the end of July 2022, DALL·E moved into a paid beta phase. Since then, its range of functions has been expanded regularly, for example with outpainting. Text-to-image systems such as DALL·E generate images from text prompts, i.e. they translate descriptions of the desired image content into pictures.
Users describe the image they want in written natural language (as a text prompt), and the system takes into account requests for a particular style, artist, textures, material properties, lighting, epochs, scenes and image techniques. The spectrum is wide, ranging from abstract art, new works in the manner of old masters and all kinds of style fusions to the photorealistic synthesis of creatures and mythical beings that have never existed in this form.
Containing problems – countering bias in images
DALL·E (2) is celebrated on the internet as a creative tool. At the same time, it regularly fuels controversy, both through its potential for creating deepfakes and malicious content and through the issue of bias and how OpenAI deals with it. In the summer of 2022, for example, the company took steps at the system level to achieve more diversity in images of people.
The problem of potential misuse and manipulation applies equally to image- and text-generating AI systems, some of which are only released publicly after a long closed phase and usually do not reach the public unfiltered. It was only in November 2021 that OpenAI made its large language model GPT-3 available without a waiting list, i.e. without vetting prospective users.
A few weeks later, OpenAI put a filter over GPT-3 that was meant to tame wild output (InstructGPT). Reining in DALL·E (2) is largely considered successful, although critics say it came at the cost of quality, and OpenAI reduced the output from six to a maximum of four images per prompt.
“Black”/“Female”: DALL·E 2 probably modified user prompts
In readjusting the system, the company may have overshot the mark at times. Over the summer, some users apparently received random image elements that did not match their text instructions, and some of those affected shared pictures with surprising content. In July 2022, Richard Zhang of Adobe Research used targeted prompts to infer that OpenAI had probably changed something at the level of the text prompts, i.e. on the input side.
Zhang used DALL·E to create images of people holding signs. By leaving the text on the signs unspecified, he tricked the system, in his words, into making the stored text visible. According to him, phrases such as “Black”, “Female” and “Black Male” appeared on the signs. Although the phenomenon did not occur in every attempt, Zhang assumes that, since he had not typed these words himself, DALL·E added them to the prompts automatically at the system level (source: New Scientist).
DALL·E is closed source, so researchers cannot look inside it directly and have to resort to tricks; the results, and the conclusions drawn from them, should be treated with due caution. At present, image generators cannot reliably render arbitrary text as part of an image. Images containing text tend to show illegible gibberish that visually resembles lettering but makes no sense.
Clearly legible words on the signs are therefore an indication that the text was present internally in the system as part of the prompt. To what extent OpenAI has modified users' text prompts (without disclosing this), and to what extent the problem of surprising outputs has since been resolved, remain open questions.
The family of AI image generators is growing
DALL·E is currently setting new standards in text-driven image creation. With outpainting, the San Francisco AI company has once again achieved a technical breakthrough that underpins its market leadership in image generation. However, DALL·E is not the only text-to-image system. Others include Midjourney, Craiyon (formerly DALL·E mini – despite its name, not from OpenAI), Disco Diffusion, Google Imagen (no public demo yet) and the newly released Stable Diffusion.
AI freely available: Stable Diffusion
The freely available text-to-image generator Stable Diffusion has been causing a stir since August 2022. AI-supported image generation does not stop at static images, however; it extends to the production of moving images and films. New tools and techniques for filmmaking are emerging, and more and more text-to-video productions can be expected. According to tweets, the team behind Stable Diffusion, among others, is working on developing its own model further for such applications. Andrej Karpathy, former head of AI at Tesla, presented a video created with Stable Diffusion and has published the code for it on GitHub.
Stable Diffusion was developed by a consortium of research teams around a computer vision group from the universities of Heidelberg and Munich; the LAION community and the EleutherAI group also support the project. The grassroots movement EleutherAI had already produced open-source alternatives, namely GPT-J and GPT-Neo, during GPT-3's closed phase, with significant contributions from employees of the Heidelberg start-up Aleph Alpha. In addition, Stability AI, a smaller private AI company, backs Stable Diffusion and funds the project. The company's motto is simply: “AI to augment the potential of humanity.”
State of play in large language models
Microsoft, for its part, has backed OpenAI, the provider of the proprietary systems GPT-3 and DALL·E, as a financier: after investing one billion US dollars in the company, the group secured exclusive rights to GPT-3 in September 2020, among other things to expand the capabilities of its cloud products. Meanwhile, Microsoft partnered with Nvidia to introduce the Megatron-Turing Natural Language Generation model (MT-NLG), and behind the scenes OpenAI is working on GPT-4. Up to the release of GPT-2, OpenAI offered deeper insight into its research; since then, the company's research department has kept a low profile. So far, only hints and speculation about the development status of GPT-4 have been circulating.
As a European alternative, the German company Aleph Alpha offers large language models with multimodal capabilities (Luminous with MAGMA), and in France the BLOOM model was created with Hugging Face on the initiative of the state (user feedback on Twitter was somewhat muted). A consortium around the German AI Association (KI-Bundesverband) has drafted a major project called LEAM, estimated at 400 million euros, whose financing apparently remains uncertain. Given the high hardware requirements, it is evidently not easy to hold one's own in this rapidly growing, cost-intensive field and build something independent, rather than renting access to other providers' large models, or the computing power needed to develop and operate one's own models, from one of the hyperscalers.
Stable Diffusion has caused a stir since its release to researchers and the public in August. Unlike OpenAI's products, it is widely available and freely accessible. It is released under an open license that permits commercial use, and the rights to an image remain with the person who creates it (though questions arise here too, for example when prompts are reused by third parties).
The results are of high quality, so OpenAI now faces competition in image generation, if only because of the tool's free accessibility and potential reach. The Lexica search engine already indexes more than five million images and prompts generated with Stable Diffusion. Since its release a few weeks ago, a wave of new startups and projects has been building graphical user interfaces (GUIs), web APIs and services around the free model, and the ecosystem is growing rapidly. The team behind Stable Diffusion is apparently already considering an outpainting function, as Stability AI founder Emad Mostaque has indicated in a tweet.
The fact that alternatives are emerging (especially as open source) stimulates the market and is good news for science, business and society. It is important to continue to monitor this development.
Illustrations with outpainting
On a positive note, OpenAI names some of the artists whose works illustrate the Outpainting blog post. In the earlier test phase, at least, OpenAI retained the rights to the images created by users, which subsequently circulated on the internet without attribution to their creators.
A selection of images created with DALL·E Outpainting can be found in the announcement blog post. Further examples can be found on the Twitter profile of OpenAI employee David Schnurr, among others.
(her)