Olindias

Monastir

10 Subject 4: AI Model for Real-Time Video Translation with Live Lip Sync (PFE)

Time Series Modeling & Machine Learning • Computer Vision & NLP • Natural Language Processing

Published 8 months ago

Internship

⏱️ 3-6 months

💼 Hybrid

📅 Expired 7 months ago

Job Description

Project Overview

Develop an AI system that translates spoken language in real time within video streams while keeping the speaker's lip movements naturally synchronized with the target-language audio in the output video.
The system should combine automatic speech recognition (ASR), neural machine translation (NMT), text-to-speech (TTS) or voice conversion, and visual lip-sync generation to produce low-latency, high-quality translated video suitable for live or near-live scenarios.

Objectives & Key Tasks

Design and implement a pipeline that performs: speech capture → ASR → translation → speech generation → lip-sync-driven video rendering, minimizing end-to-end latency.
Research and integrate state-of-the-art models for real-time ASR, low-latency NMT, and lip-sync synthesis (e.g., audio-driven facial animation / viseme alignment), and evaluate trade-offs between quality and speed.
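As an illustration of the pipeline described above, here is a minimal Python sketch that chains the stages and records how long each one takes on a chunk of input. It is only a skeleton under stated assumptions: the `Pipeline` class, `build_pipeline`, and the `fake_*` stage functions are hypothetical placeholders invented for this example, not part of any library or of the actual project. In a real prototype each placeholder would wrap an actual ASR, NMT, TTS, or lip-sync model.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Pipeline:
    """Chains named stages and records the wall-clock time of each one."""
    stages: List[Tuple[str, Callable[[Any], Any]]] = field(default_factory=list)

    def add(self, name: str, fn: Callable[[Any], Any]) -> "Pipeline":
        self.stages.append((name, fn))
        return self

    def run(self, chunk: Any) -> Tuple[Any, Dict[str, float]]:
        timings: Dict[str, float] = {}
        data = chunk
        for name, fn in self.stages:
            start = time.perf_counter()
            data = fn(data)  # output of one stage feeds the next
            timings[name] = time.perf_counter() - start
        return data, timings


# Hypothetical stand-ins for real models; each returns canned data.
def fake_asr(audio: bytes) -> str:
    return "bonjour tout le monde"


def fake_nmt(text: str) -> str:
    lookup = {"bonjour tout le monde": "hello everyone"}
    return lookup.get(text, text)


def fake_tts(text: str) -> dict:
    # Pretend each word yields 160 bytes of synthesized audio.
    return {"text": text, "audio": b"\x00" * 160 * len(text.split())}


def fake_lipsync(speech: dict) -> dict:
    # Pretend one video frame is rendered per 160 bytes of audio.
    return {**speech, "frames": len(speech["audio"]) // 160}


def build_pipeline() -> Pipeline:
    return (Pipeline()
            .add("asr", fake_asr)
            .add("nmt", fake_nmt)
            .add("tts", fake_tts)
            .add("lipsync", fake_lipsync))


if __name__ == "__main__":
    output, timings = build_pipeline().run(b"\x00" * 320)
    end_to_end = sum(timings.values())
    print(output["text"], output["frames"], f"{end_to_end * 1000:.3f} ms")
```

Timing each stage separately makes it easier to see which component dominates end-to-end latency before deciding where to optimize.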

Technical Requirements & Skills

Strong knowledge of machine learning, in particular deep learning with PyTorch or TensorFlow, and experience with sequence-to-sequence models.
Experience in computer vision (video processing, facial landmark detection), speech and NLP (ASR, NMT, TTS), and real-time inference optimization (quantization, model pruning, batching strategies).

Deliverables & Evaluation

A working prototype demonstrating live or near-real-time translation with synchronized lip movements on output video, plus qualitative and quantitative evaluation (latency, translation accuracy, lip-sync accuracy, perceptual quality).
Documentation, source code, and a short demo video showcasing typical use-cases and performance metrics.
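For the latency part of the quantitative evaluation, one possible way to report results is to summarize per-chunk end-to-end latencies with a mean and percentiles. The sketch below uses only the Python standard library. The function name `summarize_latency` and the choice of statistics (mean, median, 95th percentile, maximum) are assumptions made for illustration; the posting does not prescribe a reporting format.

```python
import statistics
from typing import Dict, Sequence


def summarize_latency(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize end-to-end latency samples given in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two latency samples")
    ordered = sorted(samples_ms)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(ordered),
        "p50": statistics.median(ordered),
        "p95": cuts[94],
        "max": ordered[-1],
    }


if __name__ == "__main__":
    report = summarize_latency([100, 120, 110, 400, 130])
    for name, value in report.items():
        print(f"{name}: {value:.1f} ms")
```

Reporting a high percentile alongside the mean matters for live use, since occasional slow chunks are what viewers notice as audio or lip-sync drift.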

Tools & Environment

Development in Python with common ML libraries (PyTorch/TensorFlow, OpenCV, Hugging Face Transformers, Kaldi/ESPnet or similar for ASR, TTS toolkits).
Optional deployment targets: desktop GPU, edge device, or cloud inference; include notes on scalability and latency optimization.

How to Apply

To apply for this PFE internship, send your CV, a brief cover letter describing relevant projects, and links to any demos or repositories.
Use the subject line: "PFE Application - 10 Subject 4: AI Model for Real-Time Video Translation with Live Lip Sync" and send to career@olindias.com.