How to separate people, animals, shadows and smoke in a video


When editing a video, we often want to remove a person together with their dog, or a car along with the smoke it generates, elements that are very difficult to separate in this format.

There are tools that can separate people, animals and objects, but separating their shadows and smoke as well is much more complex, so any progress in this area is good news.

The work was published by Google on its Artificial Intelligence blog. There they present Omnimatte, which associates objects with their effects in video: a new approach that takes advantage of layered neural representations to decompose a video into layers called omnimattes, which include not only the subjects but also all the effects related to them in the scene.

The study, available in this PDF, shows how omnimattes can capture soft, partially transparent effects such as reflections, splashes or tire smoke.

There are similarities to traditional mattes, since omnimattes are RGBA images that can be manipulated with standard editing tools. Once separated, we can use them, for example, to insert text into a video beneath a smoke trail, as shown with the logo placed next to the skidding car in the image above.
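To give an idea of why RGBA omnimattes are convenient for editing, here is a minimal sketch of the underlying idea: standard "over" alpha compositing, placing a text layer between the background and a smoke omnimatte so the text appears behind the smoke. This is not the tool Google uses, just generic NumPy/PIL compositing, and the file names and text layer are hypothetical.

```python
import numpy as np
from PIL import Image, ImageDraw

def over(top_rgba, bottom_rgba):
    """Standard 'over' alpha compositing of two uint8 RGBA images."""
    a_top = top_rgba[..., 3:4] / 255.0
    a_bot = bottom_rgba[..., 3:4] / 255.0
    a_out = a_top + a_bot * (1 - a_top)
    rgb = (top_rgba[..., :3] * a_top + bottom_rgba[..., :3] * a_bot * (1 - a_top)) / np.maximum(a_out, 1e-6)
    return np.dstack([rgb, a_out * 255]).astype(np.uint8)

# Hypothetical inputs: the background plate and the smoke omnimatte for one frame.
background = np.array(Image.open("background_frame.png").convert("RGBA"))
smoke = np.array(Image.open("smoke_omnimatte.png").convert("RGBA"))

# Hypothetical text layer rendered as RGBA over a transparent background.
text_layer = Image.new("RGBA", (background.shape[1], background.shape[0]), (0, 0, 0, 0))
ImageDraw.Draw(text_layer).text((50, 50), "MY LOGO", fill=(255, 255, 255, 255))
text_layer = np.array(text_layer)

# Composite back to front: background, then text, then the smoke omnimatte on top,
# so the smoke partially covers the text.
frame = over(text_layer, background)
frame = over(smoke, frame)
Image.fromarray(frame).save("composited_frame.png")
```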

How omnimattes are generated

To achieve this, they divide the video into several layers, one for each moving element and another for the non-moving background. This decomposition should capture the correct effects in each layer, so the person's layer should include their shadow, and the car's, the smoke from its tires.


They achieve this by training a convolutional neural network (CNN) to map the subject's segmentation mask and an image of background noise to an omnimatte. The CNN learns correlations between effects in the image, so that if a subject moves and a shadow moves in a synchronized way, the network understands that they belong to the same element.
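The blog post contains no code, but the setup can be sketched conceptually: a small CNN takes the mask plus a fixed noise image and outputs an RGBA layer, which is composited over the background and asked to reproduce the original frame. The following PyTorch snippet is a simplified illustration under those assumptions, not the authors' actual architecture or loss; every tensor and layer here is illustrative.

```python
import torch
import torch.nn as nn

class OmnimatteNet(nn.Module):
    """Toy CNN: maps a subject mask plus a noise image to an RGBA layer.
    Illustrative stand-in, not the architecture from the paper."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1 + 3, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 4, 3, padding=1),  # RGB + alpha
        )

    def forward(self, mask, noise):
        out = self.net(torch.cat([mask, noise], dim=1))
        return torch.sigmoid(out[:, :3]), torch.sigmoid(out[:, 3:4])

# Illustrative tensors for one frame: binary mask, fixed noise, target frame, static background.
mask = torch.rand(1, 1, 128, 128).round()
noise = torch.rand(1, 3, 128, 128)
frame = torch.rand(1, 3, 128, 128)
background = torch.rand(1, 3, 128, 128)

model = OmnimatteNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    rgb, alpha = model(mask, noise)
    # Composite the predicted layer over the background and compare with the real frame,
    # so shadows or smoke correlated with the subject end up in its layer.
    recon = alpha * rgb + (1 - alpha) * background
    loss = ((recon - frame) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because the network is optimized against the frames of one specific video, a new one has to be trained for each clip, which explains the processing cost mentioned below.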

The problem is that you have to train a new rendering network for each video, and that requires significant processing power.

What is it for

On a day-to-day basis, we can use this technique to remove objects from videos more naturally, to duplicate objects by repeating their layer in the composition, and to manipulate the speed of subjects (making a car move faster in the video than it did in real life, for example).

This third point is important in movies, where retiming is currently handled by taking separate shots of each subject in a controlled filming environment. If we can separate the objects in a video into layers, we can control the playback speed of each layer more easily. A gif in the post shows how they managed to accelerate the movement of a child using this technique.
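As a rough illustration of what per-layer retiming means, here is a minimal NumPy sketch that plays one subject's omnimatte at double speed while the background keeps its original timing, compositing the retimed layer over each background frame. The frame data is random placeholder content and the compositing is simplified (opaque background assumed); it only illustrates the per-layer time index, not Google's implementation.

```python
import numpy as np

def over(top, bottom):
    """Simple 'over' compositing assuming an opaque bottom layer (float RGBA in [0, 1])."""
    a = top[..., 3:4]
    rgb = top[..., :3] * a + bottom[..., :3] * (1 - a)
    return np.concatenate([rgb, np.ones_like(a)], axis=-1)

def retime_composite(background_frames, layer_frames, speed=2.0):
    """Composite a subject layer played back `speed` times faster over the background."""
    output = []
    for i, bg in enumerate(background_frames):
        # Index into the subject layer at an accelerated rate, clamping at the last frame.
        j = min(int(round(i * speed)), len(layer_frames) - 1)
        output.append(over(layer_frames[j], bg))
    return output

# Placeholder data: 60 random RGBA frames for the background and the subject layer.
bg_frames = [np.random.rand(72, 128, 4) for _ in range(60)]
subject_frames = [np.random.rand(72, 128, 4) for _ in range(60)]
fast_video = retime_composite(bg_frames, subject_frames, speed=2.0)
```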

The next step is to make these techniques accessible to the general public within popular video editors. The theory is in place and the method has been demonstrated; what is still missing is the application that will let us perform these tricks.