Intel Labs, Intel’s research and development arm for new computing technologies, announced on Wednesday (21) the first artificial intelligence model capable of generating images with depth maps from text prompts. Called Latent Diffusion Model for 3D (LDM3D), the new generative AI model works in a similar way to OpenAI’s DALL-E, which creates a rendering from a text command, but Intel’s proposal stands out for its ability to generate 3D images.

LDM3D allows the creation of images that can take the viewer into a realistic and immersive virtual environment. According to the hardware giant, the model has the potential to revolutionize the metaverse industry. The company projects a future in which professionals in the graphics, architecture and game development sectors will directly benefit from AI.

[Images: examples of renderings created using LDM3D]

Trained on a dataset of 10,000 samples drawn from LAION-400M, which contains more than 400 million text-image pairs, the generative AI model uses the Dense Prediction Transformer (DPT), which provides an accurate relative depth estimate for each pixel in the generated images.

“Unlike existing models, LDM3D allows users to generate an image and depth map from a given text prompt using almost the same number of parameters. It provides more depth accuracy for each pixel in an image compared to standard post-processing methods and saves developers significant time when creating scenes,” said Vasudev Lal, AI and Machine Learning Researcher at Intel Labs.
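To illustrate why a per-pixel depth map matters, the sketch below shows one common way an image and its depth map can be lifted into 3D points: back-projection with a simple pinhole camera model. This is not Intel's method or code. The function name `rgbd_to_points`, the `focal_length` parameter and the tiny sample data are hypothetical, and because a relative depth map has no real-world scale, the resulting coordinates are only meaningful up to an unknown scale factor.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]
Point = Tuple[float, float, float, Color]


def rgbd_to_points(
    rgb: List[List[Color]],
    depth: List[List[float]],
    focal_length: float = 1.0,  # assumed value, in pixel units
) -> List[Point]:
    """Back-project every pixel to (x, y, z, color) using a pinhole model.

    Illustrative only: the principal point is assumed to sit at the image
    center, and depth values are treated as distances along the camera axis.
    """
    height = len(depth)
    width = len(depth[0])
    cx = (width - 1) / 2.0   # assumed principal point (image center)
    cy = (height - 1) / 2.0

    points: List[Point] = []
    for v in range(height):          # row index
        for u in range(width):       # column index
            z = depth[v][u]
            x = (u - cx) * z / focal_length
            y = (v - cy) * z / focal_length
            points.append((x, y, z, rgb[v][u]))
    return points


if __name__ == "__main__":
    # Hypothetical 2x2 image with a flat depth map.
    sample_rgb = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
    sample_depth = [[2.0, 2.0], [2.0, 2.0]]
    for point in rgbd_to_points(sample_rgb, sample_depth):
        print(point)
```

Each pixel keeps its color but gains a position in space, which is what lets a viewer move around a scene instead of looking at a flat picture.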

Intel points out that the model was trained on one of its supercomputers, powered by Xeon processors and Habana Gaudi accelerators developed by Habana Labs. To demonstrate the capabilities of LDM3D, Intel partnered with Blockade researchers to develop DepthFusion, an application that creates 360° views of scenes from conventional 2D images.