Budapest University of Technology and Economics - Faculty of Electrical Engineering and Informatics

List of topics

VID2SPEECH: deep learning-based speech generation from silent video
Speechreading is a difficult task for humans. However, recent deep learning methods make it possible to build lip-to-speech systems that convert silent lip motion into intelligible, audible speech. The student's task is to study recent deep learning methods (e.g. convolutional and recurrent neural networks) and to develop new solutions for lip-to-speech conversion. Suggested programming language: Python. The research will be conducted in collaboration with the MTA-ELTE Lingual Articulation Research Group (Momentum grant, http://lingart.elte.hu).