Facebook publishes a new study of hyperrealistic virtual avatars

Facebook Reality Labs has published a detailed study of its method for creating hyperrealistic virtual avatars in real time, building on earlier work on what the company calls “Codec Avatars”.


Facebook Reality Labs has designed a system that animates virtual avatars in real time with unprecedented precision using compact equipment. With only three standard cameras mounted on the headset to capture the user’s eyes and mouth, the system can reproduce the complex expressions of a particular person more accurately than previous methods.



The significance of this research lies not in attaching cameras to a headset, but in the “technical magic” of turning the incoming images into a virtual likeness of the user.


The solution relies heavily on machine learning and computer vision. “Our system works in real time and across a wide range of expressions, including puffed cheeks, bitten lips, tongue movement, and subtler details such as wrinkles, which are difficult to reproduce accurately with previous methods,” says one of the authors.


Facebook Reality Labs has posted a technical video summarizing the work, which was presented at SIGGRAPH 2019:



The group also published the full paper, which goes deeper into the system’s methodology. Titled “VR Facial Animation via Multiview Image Translation”, it appeared in ACM Transactions on Graphics. The authors are Shih-En Wei, Jason Saragih, Tomas Simon, Adam W. Harley, Stephen Lombardi, Michal Perdoch, Alexander Hypes, Dawei Wang, Hernan Badino, and Yaser Sheikh.


(a) The Training headset with nine cameras. (b) The Tracking headset with three cameras; the corresponding camera positions on the Training headset are circled in red.




The paper explains that the project involved building two separate prototype headsets: a Training headset and a Tracking headset.

The Training headset is bulkier and uses nine cameras, which allow it to capture a wider range of views of the subject’s face and eyes. This eases the task of finding a correspondence between the input images and a previously captured digital scan of the user (deciding which parts of the input images represent which parts of the avatar). The paper states that this correspondence “is found automatically through self-supervised multiview image translation, which does not require manual annotation or one-to-one correspondence between domains”.
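The correspondence step can be pictured, in drastically simplified form, as learning a map from multi-camera headset imagery to the parameters that drive the pre-scanned avatar. The sketch below is a toy stand-in only: the real system uses self-supervised image translation on raw headset images, whereas here each camera is reduced to a small synthetic feature vector and the map is fit by plain least squares. All names, shapes, and the linear model are assumptions for illustration, not taken from the paper.

```python
import numpy as np

# Toy stand-in for the correspondence step: map features from the nine
# Training-headset cameras to avatar expression parameters.
rng = np.random.default_rng(0)

N_FRAMES = 500        # frames captured during the training session
N_CAMERAS = 9         # the Training headset has nine cameras
FEATS_PER_CAM = 16    # pretend per-camera image descriptor length
N_AVATAR_PARAMS = 32  # expression code driving the pre-built avatar

# Simulated per-camera features, stacked into one vector per frame.
X = rng.normal(size=(N_FRAMES, N_CAMERAS * FEATS_PER_CAM))

# Simulated avatar parameters for each frame. In the real pipeline this
# correspondence is exactly what the self-supervised translation has to
# discover; here we simply generate it from a hidden linear map.
W_true = rng.normal(size=(N_CAMERAS * FEATS_PER_CAM, N_AVATAR_PARAMS))
Y = X @ W_true + 0.01 * rng.normal(size=(N_FRAMES, N_AVATAR_PARAMS))

# Least-squares "correspondence": which input features drive which
# avatar parameters.
W_fit, *_ = np.linalg.lstsq(X, Y, rcond=None)

err = np.abs(X @ W_fit - Y).mean()
print(f"mean reconstruction error: {err:.4f}")
```

The point of the toy is only the shape of the problem: many camera views in, one compact set of avatar parameters out, with the mapping learned rather than hand-annotated.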


Once the correspondence is established, the more compact Tracking headset can be used. The positions of its three cameras match three of the nine cameras on the Training headset. What these three cameras see is better interpreted thanks to the data collected with the Training headset, allowing their input to precisely drive the avatar’s animation.
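The relationship between the two headsets can be sketched as follows: the Tracking headset keeps only a subset of the Training headset’s camera positions, and the real-time model must drive the avatar from that subset alone. Again this is an illustrative toy, not the paper’s architecture; the camera indices, feature sizes, and the linear model are all assumptions.

```python
import numpy as np

# Sketch of the Tracking-headset idea: fit a reduced model that sees
# only three of the nine Training-headset camera views.
rng = np.random.default_rng(1)

N_FRAMES = 500
N_CAMERAS = 9              # Training headset
TRACKING_CAMS = [0, 4, 8]  # hypothetical positions kept on the Tracking headset
FEATS_PER_CAM = 16
N_AVATAR_PARAMS = 32

# Per-frame, per-camera features from the Training-headset session.
feats = rng.normal(size=(N_FRAMES, N_CAMERAS, FEATS_PER_CAM))

# Avatar parameters as established during the Training-headset session.
W_full = rng.normal(size=(N_CAMERAS * FEATS_PER_CAM, N_AVATAR_PARAMS))
params = feats.reshape(N_FRAMES, -1) @ W_full

# Fit the reduced model on only the three Tracking-headset views; this
# is the model that would run in real time on the compact headset.
X_track = feats[:, TRACKING_CAMS, :].reshape(N_FRAMES, -1)
W_track, *_ = np.linalg.lstsq(X_track, params, rcond=None)

print("tracking model input size:", X_track.shape[1])
```

The design choice the sketch mirrors is that the expensive nine-camera capture happens once, while the deployed model only ever needs the three shared camera positions.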


The paper emphasizes the system’s accuracy. Previous methods produce realistic results, but fidelity to the user’s actual face breaks down in key areas, especially with extreme expressions and with the coordination between what the eyes are doing and what the mouth is doing.



As impressive as this approach is, it still faces serious obstacles to becoming a consumer technology. Its reliance both on a detailed pre-scan and on an initial session with the Training headset would require something like “scanning centers” where users could go to have their avatar captured and trained. Until VR becomes a significant part of how society socializes, such centers are unlikely to be viable. However, advancing sensor technology and continued improvements in the automatic correspondence construction at the heart of this work could eventually make the process feasible at home. The question is when.


