On the same day that OpenAI announced that ChatGPT's voice feature was available to everyone, Stability AI announced the arrival of Stable Video Diffusion, its solution for generating video with AI. The company was practically unknown a year ago, but it has since made a name for itself with its AI image generation.
Alongside ChatGPT, AI image generation has been one of the biggest technological innovations so far this year, with different players entering the scene to show off their solutions and try to crown themselves as the go-to AI for creating images, which still have their errors and shortcomings.
AI video generation is another open front, although Stable Diffusion, Adobe and Midjourney have so far been more focused on image creation. That changes now with Stability AI's announcement: an artificial intelligence model that generates videos by animating images.
The intended uses are education, creativity, design and other artistic processes, not the real or partial representation of people or events. Stable Video Diffusion comes in two models: SVD and SVD-XT. The first transforms images into video at a resolution of 576 x 1024 with 14 frames. SVD-XT uses the same architecture but raises the frame count to 24.
Both models can generate video at frame rates between 3 and 30 frames per second. According to the announcement on the company's website, these two Stable Video Diffusion models (the white paper has all the details) were initially trained on a dataset of millions of videos and then fine-tuned on a smaller dataset of hundreds of thousands to a million video clips.
The next question is the source of all those videos. The information is not very clear, but the company does imply that many of them come from public sources, so it is hard to know whether or not they are under copyright. Both models can generate videos of up to four seconds, and in terms of quality they are on par with Meta's video generation model, Google's examples, and those produced by the startups Runway and Pika Labs.
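To make those numbers concrete, here is a minimal sketch (plain Python; the frame counts and frame-rate range are taken from the figures quoted in this article, and the function name is our own) showing how a model's frame count and the chosen playback rate determine how long a clip lasts:

```python
def clip_duration(num_frames: int, fps: int) -> float:
    """Duration in seconds of a clip with num_frames played back at fps."""
    return num_frames / fps

# SVD produces 14-frame clips; SVD-XT produces 24-frame clips (per this article).
# The playback frame rate can be set between 3 and 30 fps.
print(clip_duration(14, 7))   # 2.0 seconds
print(clip_duration(24, 6))   # 4.0 seconds -> the "up to four seconds" figure
```

Played faster, the same clips get shorter: at 30 fps, even a 24-frame clip lasts well under a second, which is why the frame rate chosen matters as much as the frame count.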
A technology with its limits
If image generation has already had its stumbles, such as how hard it is to render two expressive, detailed hands, video generation is going down the same path or a worse one. Here are some of Stable Video Diffusion's current limits:
- The models sometimes produce videos with no motion, or only very slow camera pans.
- They cannot be controlled via text.
- They cannot render legible text.
- For now, they cannot generate faces or people properly.
The next steps, according to the company, are to build a variety of models on top of the current SVD and SVD-XT, as well as a text-to-video tool that will bring prompt input to the models on the web.
The big objective is to commercialize the tool and take it to fields such as advertising, education, entertainment and more. And, as reported by Semafor and Forbes, Stability is looking for a breakthrough to start generating profits, since its investors are currently pressuring it over its near-literal burning of capital without yet seeing results in economic terms.
For now, we will have to wait for the web tool to launch, since this is a preview that shows how the generative AI technology for video creation works. The company also launched Stable Audio, its music generation tool, so it holds some of today's most disruptive technologies.