Lanterns Studios
Tunisia

1 Image-Driven Conversational Agent Framework for Web PFE

Computer Vision (CLIP/BLIP) · Mobile & Web Development · AI / Machine Learning

Posted 9 days ago

Internship
⏱️3-6 months
💼Hybrid
📅Expires in 5 days
LinkedIn / CV consistency verified.

Job Description

Overview

  • This project focuses on developing a lightweight framework that generates a conversational AI agent from a single 2D image. The framework is designed specifically for web environments.
  • The system must produce a responsive on-screen persona capable of real-time interaction through text and optional speech, while prioritizing fast loading and minimal computational overhead.

Key Features / Objectives

  • Generate an interactive AI persona from a single static 2D image, animated with lightweight facial reactions or expression cues rather than 3D rendering.
  • Provide real-time conversational capabilities (text and optional voice) and prompt-based configuration for personality, tone, and behavior.
  • Optimize for browser performance on low-spec devices and enable simple integration into existing web applications.
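Prompt-based configuration for personality, tone, and behavior could be as simple as a typed configuration object compiled into a system prompt for the chat model. The sketch below is only an illustration under that assumption: the `PersonaConfig` fields and the `buildSystemPrompt` function are hypothetical and do not come from the project brief or from any existing library.

```typescript
// Hypothetical shape of a persona configuration; field names are illustrative,
// not part of any existing framework.
interface PersonaConfig {
  name: string;
  personality: string;
  tone: "formal" | "friendly" | "playful";
  behaviorRules: string[];
  maxReplySentences?: number;
}

// Turns a persona configuration into a system prompt string that could be
// sent to whichever chat model the framework ends up using.
function buildSystemPrompt(config: PersonaConfig): string {
  const lines: string[] = [
    `You are ${config.name}, an on-screen persona shown as a single 2D image.`,
    `Personality: ${config.personality}.`,
    `Tone: ${config.tone}.`,
  ];
  if (config.behaviorRules.length > 0) {
    lines.push("Rules:");
    for (const rule of config.behaviorRules) {
      lines.push(`- ${rule}`);
    }
  }
  // Short replies help keep text-to-speech latency low on slow devices.
  const limit = config.maxReplySentences ?? 3;
  lines.push(`Keep each reply to at most ${limit} sentences.`);
  return lines.join("\n");
}

const systemPrompt = buildSystemPrompt({
  name: "Nour",
  personality: "curious and helpful museum guide",
  tone: "friendly",
  behaviorRules: ["Answer in the user's language", "Never invent opening hours"],
});
console.log(systemPrompt);
```

Keeping the configuration as data rather than free-form text would let a host web application validate it (the `tone` union type rejects unknown values at compile time) before any model call is made.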

Technical Stack & Responsibilities

  • Implement using JavaScript and WebAssembly, leveraging ONNX Runtime Web or TensorFlow.js for in-browser model inference.
  • Integrate Speech-to-Text and Text-to-Speech APIs for optional voice interaction and use lightweight vision models for facial cue generation.
  • Responsibilities include designing the framework architecture, model selection/tuning for web inference, performance optimization, and creating integration examples/demos.
  • Expected deliverables: a working web prototype, performance benchmarks on low-spec devices, an integration guide, and documentation for prompt-based configuration.
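One very cheap facial cue, usable while Text-to-Speech audio plays, is mouth movement driven by audio loudness. Note that this swaps in a simple amplitude heuristic, not the lightweight vision models the brief mentions; it is a sketch of how a cue can be produced without 3D rendering. The function names, `noiseFloor`, `gain`, and `alpha` are hypothetical tuning choices. In a browser, a frame of samples could be read from the Web Audio API with `AnalyserNode.getFloatTimeDomainData`.

```typescript
// Root-mean-square loudness of one audio frame with samples in [-1, 1].
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) {
    sumSquares += frame[i] * frame[i];
  }
  return Math.sqrt(sumSquares / frame.length);
}

// Hypothetical mapping from loudness to a 0..1 mouth-openness value.
// noiseFloor and gain are illustrative defaults, not values from the brief.
function mouthOpenness(frame: Float32Array, noiseFloor = 0.02, gain = 4): number {
  const level = rms(frame) - noiseFloor;
  return Math.min(1, Math.max(0, level * gain));
}

// Exponential smoothing so the 2D sprite does not flicker between frames.
function smooth(previous: number, target: number, alpha = 0.3): number {
  return previous + alpha * (target - previous);
}

const silence = new Float32Array(512);        // all zeros
const loud = new Float32Array(512).fill(0.5); // constant amplitude 0.5
let openness = 0;
openness = smooth(openness, mouthOpenness(loud));
console.log(mouthOpenness(silence), mouthOpenness(loud), openness.toFixed(2));
```

The resulting openness value could drive a CSS transform or switch between a few pre-drawn mouth frames layered over the source image, which keeps per-frame cost to a short loop over the audio buffer.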

How to Apply

  • To apply, send your CV and a brief motivation letter referencing this project to recruitment@lanterns-studios.com.
  • You can also include links to relevant demos or repositories that demonstrate experience with Web ML, JavaScript, or lightweight vision models.