Now it's possible to reconstruct a person's face from their voice

MIT researchers published a paper on reconstructing a person's face from their voice.
Here is a brief explanation of how it works.


The main model is trained on YouTube videos uploaded by people speaking on camera.
The model analyzes the speech signal together with the speakers' faces.
First, a spectrogram is computed from the audio and passed to a voice encoder. In parallel, faces are detected in the video frames and grouped by identity. Finally, a pre-trained face decoder, combined with the trainable voice encoder, produces a face image that resembles the actual speaker.
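The spectrogram step above can be sketched in a few lines. This is a minimal illustration using SciPy's short-time Fourier transform; the window and hop sizes here are placeholder values, not the exact preprocessing settings used in the paper.

```python
import numpy as np
from scipy.signal import stft

# Placeholder parameters: 16 kHz audio, 512-sample window, 50% overlap.
# The actual Speech2Face preprocessing settings are given in the paper.
sample_rate = 16000
waveform = np.random.randn(sample_rate * 6)  # 6 seconds of dummy audio

# The STFT yields a complex spectrogram: each time-frequency bin holds
# both magnitude and phase, which the voice encoder consumes as input.
freqs, times, spec = stft(waveform, fs=sample_rate, nperseg=512, noverlap=256)
print(spec.shape)  # (frequency bins, time frames), complex-valued
```

In practice the real and imaginary parts (or magnitude and phase) are stacked as input channels before being fed to the encoder network.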






The research paper is available for download in PDF format.



Speech2Face (S2F) Model 
The large variability in facial expressions, head poses, occlusions, and lighting conditions in natural face images makes the design and training of a Speech2Face model non-trivial. For example, a straightforward approach of regressing from input speech to image pixels does not work; such a model has to learn to factor out many irrelevant variations in the data and to implicitly extract a meaningful internal representation of faces—a challenging task by itself. To sidestep these challenges, we train our model to regress to a low-dimensional intermediate representation of the face. More specifically, we utilize the VGG-Face model, a face recognition model pre-trained on a large-scale face dataset, and extract a 4096-D face feature from the penultimate layer (fc7) of the network. These face features were shown to contain enough information to reconstruct the corresponding face images while being robust to many of the aforementioned variations. Our Speech2Face pipeline, illustrated in Fig. 2, consists of two main components:
 1) a voice encoder, which takes a complex spectrogram of speech as input, and predicts a low-dimensional face feature that would correspond to the associated face; and
 2) a face decoder, which takes as input the face feature and produces an image of the face in a canonical form (frontal-facing and with neutral expression).
 During training, the face decoder is fixed, and we train only the voice encoder that predicts the face feature. The voice encoder is a model we designed and trained, while we used a face decoder model proposed by Cole et al. We now describe both models in detail.
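The training setup described above can be sketched as follows. This is a toy PyTorch illustration, not the paper's actual architecture: the `VoiceEncoder` layers are placeholders, and the loss here is a plain L1 regression to the 4096-D fc7 target (the paper uses a more elaborate combination of feature losses). The face decoder is frozen and not needed during this step, since supervision comes from the target face features themselves.

```python
import torch
import torch.nn as nn

class VoiceEncoder(nn.Module):
    """Toy stand-in for the voice encoder: convolutions over the
    complex spectrogram, then a projection to a 4096-D face feature."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, 4096)

    def forward(self, spec):
        # spec: (batch, 2, freq, time) — real/imaginary parts as channels
        return self.fc(self.conv(spec).flatten(1))

encoder = VoiceEncoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

# One training step: regress the predicted feature toward the speaker's
# VGG-Face fc7 feature (dummy tensors here stand in for real data).
spec = torch.randn(4, 2, 257, 100)   # batch of spectrograms
target_feat = torch.randn(4, 4096)   # fc7 features, extracted offline
pred_feat = encoder(spec)
loss = nn.functional.l1_loss(pred_feat, target_feat)
loss.backward()
optimizer.step()
print(pred_feat.shape)  # torch.Size([4, 4096])
```

At inference time, the predicted 4096-D feature would be passed through the fixed face decoder to produce the canonical frontal face image.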



