Years ago a friend told me, "I don't trust people who don't look me in the eye when they talk to me." Video calls would give him a problem, because nobody looks into the other person's eyes, or rather, into the webcam: we always seem to be looking away because, in fact, we are looking at the screen.
Now the NVIDIA Maxine platform has a solution. It proposes several improvements for videoconferencing, and we were able to test one that is especially striking: the so-called 'Eye Contact' feature, which makes your eyes seem to look directly into the webcam even when you are looking elsewhere. It is as if you were looking at the person you are talking to, and the feature works remarkably well.
Look at me when I talk to you
You join a video call and the same thing always happens: the person talking to you is not looking you in the eye, which makes the conversation feel less direct. We would have to stare at the webcam the whole time to avoid this, but it usually sits above the monitor, away from where the conversation actually appears on screen.
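To get a sense of how large that mismatch is, a quick back-of-the-envelope calculation helps. The distances below are illustrative assumptions, not measurements from the demo:

```python
import math

# Illustrative geometry: the webcam sits some distance above the point on
# screen we are actually looking at, so our gaze appears tilted downward.
camera_offset_m = 0.15   # assumed: webcam ~15 cm above the video window
viewing_dist_m = 0.50    # assumed: face ~50 cm from the screen

angle = math.degrees(math.atan2(camera_offset_m, viewing_dist_m))
print(f"apparent downward gaze offset: {angle:.1f} degrees")
# -> apparent downward gaze offset: 16.7 degrees
```

Even an offset of that size is enough for the other person to read our gaze as evasive.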
Communication thus becomes somewhat artificial and indirect, but NVIDIA Maxine proposes using artificial intelligence to fix it. The company has been surprising us for some time with experiments that paint pictures as we speak or generate faces impossible to distinguish from real ones.
In EuroXlive we were able to enjoy an exclusive demo of the technology. It used a small internal application running on a laptop with a built-in webcam, although an external webcam can be used without problems.
The application shows a split screen: the left side captures our face at all times, how we move it and, above all, where we look. From there, artificial intelligence does the work of "creating", via augmented reality, eyes that always look at the camera.
The effect is amazing. It is not perfect, of course, and at times it is clear that the eyes do not fit the image precisely, but in a normal call it would probably be hard to notice.
NVIDIA's technology can emulate blinks and "relocate" the eyes and gaze when we look at other areas, and even when we turn our face slightly.
In one of our tests we pushed the system a bit and deliberately looked off-screen to see how it behaved. Most of the time it adapted without problems, although occasionally there was a small glitch.
The overall behavior was nonetheless fantastic, especially since what we constantly do in a videoconference is look at the screen, not at the webcam.
When we did exactly that in our tests, our eyes did not appear to be looking slightly up or down (as they would without this technology); instead, they always appeared to be looking straight ahead, as if directly into the eyes of the other person.
We were also able to test one of the small applications used alongside this development, which generated a mesh of our face: a mask on the right-hand side of the screen that mimicked our gestures and movements.
The tracking was also remarkable, and it showed that this kind of technology can help make videoconferences, which are increasingly present in our lives, more "immersive".
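NVIDIA's internal demo application is not public, but a comparable split-view face mesh can be sketched with the open-source MediaPipe library; this is an illustrative stand-in, not the software we tested:

```python
import cv2
import numpy as np
import mediapipe as mp

mp_face_mesh = mp.solutions.face_mesh
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)  # built-in or external webcam
with mp_face_mesh.FaceMesh(max_num_faces=1, refine_landmarks=True) as mesh:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        mask = np.zeros_like(frame)  # right-hand "mask" panel
        if results.multi_face_landmarks:
            # Draw the tracked mesh so the mask mimics the face's movements
            mp_drawing.draw_landmarks(
                mask, results.multi_face_landmarks[0],
                mp_face_mesh.FACEMESH_TESSELATION)
        cv2.imshow("face | mesh", cv2.hconcat([frame, mask]))
        if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
            break
cap.release()
cv2.destroyAllWindows()
```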
Maxine and the future of video conferencing
This technical demo made it clear that this type of technology, which we first covered in October 2020, is genuinely promising and offers real advantages for the future of videoconferencing.
The platform's proposed improvements come in the form of SDKs. There is no specific NVIDIA application that offers these options; instead, the company provides three SDKs with different effects so that developers can take advantage of them (a minimal pipeline sketch follows the list). They are the following:
- Audio Effects: artificial intelligence algorithms designed to improve audio quality, for example by removing background noise or room echo.
- Video Effects: starting from the webcam feed, deep learning is used to increase the video's resolution (super resolution) or to create high-quality virtual backgrounds.
- Augmented Reality Effects: this is where face, gesture and pose tracking comes into play, recognizing the movements we make with our mouth or eyes. Among the features offered are the creation of a precise mesh of our face and the 'Eye Contact' feature we were able to test.
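The SDKs themselves are native libraries and their exact APIs are beyond the scope of this article, but the developer-facing idea, composable effect stages applied to each frame, can be sketched in a few lines. Everything below (the class names, the stand-in upscaling) is a hypothetical illustration, not Maxine's actual API:

```python
import numpy as np

class SuperResolution:
    """Stand-in for a Video Effects stage: a 2x nearest-neighbour upscale
    instead of the real deep-learning super-resolution model."""
    def apply(self, frame: np.ndarray) -> np.ndarray:
        return frame.repeat(2, axis=0).repeat(2, axis=1)

class EyeContact:
    """Stand-in for the AR Effects gaze-redirection stage (a no-op here)."""
    def apply(self, frame: np.ndarray) -> np.ndarray:
        return frame  # the real SDK would re-render the eye regions

def run_pipeline(frame: np.ndarray, effects) -> np.ndarray:
    """Apply each effect stage in order, as a video app would per frame."""
    for effect in effects:
        frame = effect.apply(frame)
    return frame

dummy_frame = np.zeros((360, 640, 3), dtype=np.uint8)  # stand-in webcam frame
out = run_pipeline(dummy_frame, [SuperResolution(), EyeContact()])
print(out.shape)  # (720, 1280, 3)
```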
The question, of course, is when we will see these improvements in our day-to-day lives. After the technical demo we took part in a question-and-answer session with Alex Qi, one of the people in charge of the Artificial Intelligence Software group at NVIDIA.
As Qi explained, the SDKs are already ready for companies and developers to use. It therefore remains to be seen whether platforms such as Zoom, Teams or Skype will integrate them into their services, which would be the first key step toward being able to enjoy these options.
The other requirement, apart from a conventional webcam, is an NVIDIA RTX graphics card to run these artificial intelligence algorithms. In reality, even that part would not be strictly necessary: the work can be delegated to data centers that process the video signal, so that 'Eye Contact' and the rest of the features could be used on any computer, no matter how modest.
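A client for that offloaded setup could be as simple as sending each frame to a GPU service and showing what comes back. The endpoint URL and the idea of a per-frame HTTP API are assumptions for illustration; NVIDIA has not published its data-center deployment details here:

```python
import cv2
import numpy as np
import requests

PROCESS_URL = "https://example.com/maxine/eye-contact"  # hypothetical endpoint

def process_frame_remotely(frame: np.ndarray) -> np.ndarray:
    """Send one webcam frame to a remote GPU service, return the processed result."""
    ok, jpeg = cv2.imencode(".jpg", frame)
    if not ok:
        return frame  # encoding failed; fall back to the original frame
    resp = requests.post(PROCESS_URL, data=jpeg.tobytes(),
                         headers={"Content-Type": "image/jpeg"}, timeout=5)
    resp.raise_for_status()
    return cv2.imdecode(np.frombuffer(resp.content, np.uint8), cv2.IMREAD_COLOR)
```

The local machine only encodes and decodes JPEGs, which is why even a modest computer could use the feature.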
Qi also explained that they are still working on improving the Eye Contact feature, which certainly faces challenges: eye color, hair covering part of the face and eyes, and lighting conditions can all be problematic in some scenarios.
Still, the software is capable of working in all those conditions, and what remains is a matter of refining the algorithms. They work without problems if the user wears glasses, for example, although reflections on the lenses, when present, can pose yet another challenge.
The truth is that the state of this technology makes us want it to be available as soon as possible, but there are no estimated dates yet. That friend of mine, by the way, would be delighted.