The capabilities of artificial intelligence in the field of imaging continue to amaze. This time the protagonist is Point-E, an open-source project from OpenAI capable of generating 3D models of objects from a natural-language text description.
3D models take just two minutes to generate with Point-E
Following in the footsteps of tools such as DALL-E, which generates images from a text description, Point-E produces a 3D model of the described object that can be rotated in any direction, and it does so in a very short time: just one or two minutes on a single Nvidia V100 GPU, so the hardware requirements are not stratospheric either.
One of the main differences between the 3D generation approach used by Point-E and other tools is that it represents volumes with discrete sets of points, point clouds, which give shape to the object being represented. This is also the origin of its name: the "E" in Point-E stands for "efficiency", which, combined with the word "point", describes how it operates.
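In practice, a point cloud is little more than a list of points, each with a position and, in Point-E's case, a color. The minimal sketch below builds such a structure as a plain NumPy array; the point count and the random values are illustrative assumptions, not actual Point-E output:

```python
import numpy as np

# A point cloud as a plain array: one row per point.
# Columns: x, y, z position plus an RGB color per point.
num_points = 4096  # illustrative count, not tied to Point-E's exact output size
positions = np.random.uniform(-1.0, 1.0, size=(num_points, 3))  # x, y, z
colors = np.random.uniform(0.0, 1.0, size=(num_points, 3))      # r, g, b

point_cloud = np.concatenate([positions, colors], axis=1)
print(point_cloud.shape)  # (4096, 6)
```

Storing a few thousand six-value rows is far cheaper than storing a dense mesh or voxel grid, which is where the "efficiency" in the name comes from.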
From a computational point of view, point clouds are easier to work with, although (and here lies Point-E's great limitation, for the moment) the surface of the object is not precisely defined: it looks like what it really is, a cluster of small spheres. This can also have another undesirable consequence, namely that a small portion of the 3D object is sometimes missing or appears distorted.
Behind Point-E there is a combination of two models: one that translates text into a two-dimensional image, and another that converts that 2D image into a 3D model of the object. Both were trained on the same principle: the first on images labeled with text so it can make the reverse journey from text to image, and the second on three-dimensional objects paired with their two-dimensional renderings so it can make that other reverse transition as well.
The process, therefore, works as follows (a minimal code sketch follows the list):
- Reception of a text description: "orange and white traffic signal cone".
- Creation of a 2D image of the described cone.
- Generation of a point cloud that represents the cone.
- Production of a 3D model of the traffic signal cone initially described.
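To make those steps concrete, here is a minimal Python sketch following the text-to-point-cloud example published in the openai/point-e repository. The module paths, model names ("base40M-textvec", "upsample") and sampler parameters are taken from that example and may change between versions, so treat this as an illustration rather than a guaranteed API:

```python
import torch
from point_e.diffusion.configs import DIFFUSION_CONFIGS, diffusion_from_config
from point_e.diffusion.sampler import PointCloudSampler
from point_e.models.configs import MODEL_CONFIGS, model_from_config
from point_e.models.download import load_checkpoint
from point_e.util.plotting import plot_point_cloud

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Base model: text prompt -> coarse point cloud.
base_name = "base40M-textvec"
base_model = model_from_config(MODEL_CONFIGS[base_name], device)
base_model.eval()
base_model.load_state_dict(load_checkpoint(base_name, device))
base_diffusion = diffusion_from_config(DIFFUSION_CONFIGS[base_name])

# Upsampler: coarse point cloud -> denser point cloud.
upsampler_model = model_from_config(MODEL_CONFIGS["upsample"], device)
upsampler_model.eval()
upsampler_model.load_state_dict(load_checkpoint("upsample", device))
upsampler_diffusion = diffusion_from_config(DIFFUSION_CONFIGS["upsample"])

sampler = PointCloudSampler(
    device=device,
    models=[base_model, upsampler_model],
    diffusions=[base_diffusion, upsampler_diffusion],
    num_points=[1024, 4096 - 1024],
    aux_channels=["R", "G", "B"],
    guidance_scale=[3.0, 0.0],
    model_kwargs_key_filter=("texts", ""),  # only the base model sees the prompt
)

# The text description from the steps above.
prompt = "orange and white traffic signal cone"

samples = None
for x in sampler.sample_batch_progressive(batch_size=1, model_kwargs=dict(texts=[prompt])):
    samples = x  # keep only the output of the final sampling step

pc = sampler.output_to_point_clouds(samples)[0]
plot_point_cloud(pc, grid_size=3)
```

On a machine with a single GPU of the class mentioned above, a run of this kind is what takes on the order of one to two minutes from prompt to point cloud.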