Posted about 15 hours ago
Object or person re-identification (Re-ID) is the task of determining whether two images, often taken by different cameras, depict the same entity. The task is carried out without facial recognition, for ethical or technical reasons. Traditional approaches rely on semantic spaces and similarity measures, but they struggle with occlusions and pose variations. Multimodal large language models (LLMs) such as GPT, Gemini, Claude, LLaVA/LLaMA, and Pixtral have shown promise in understanding visual scenes through reasoning and contextual interpretation.
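As a rough illustration of the traditional similarity-based approach mentioned above, the sketch below ranks a small gallery of identities against a query by cosine similarity. It assumes each image has already been mapped to a feature vector by some encoder; the vectors, the identity labels, and the helper names `cosine_similarity` and `rank_gallery` are hypothetical and chosen only for this example. Cosine similarity is one possible measure, not necessarily the one a given baseline would use.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_gallery(query, gallery):
    """Return (identity, score) pairs sorted from most to least similar to the query."""
    scored = [(identity, cosine_similarity(query, emb)) for identity, emb in gallery.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for the output of an image encoder (made-up values).
query = [0.9, 0.1, 0.3]
gallery = {
    "person_A": [0.8, 0.2, 0.4],
    "person_B": [0.1, 0.9, 0.2],
    "person_C": [0.3, 0.3, 0.9],
}

ranking = rank_gallery(query, gallery)
for identity, score in ranking:
    print(f"{identity}: {score:.3f}")
```

In this setup, an occluded or unusually posed subject would shift the query vector and can change the ranking, which is the kind of limitation the paragraph above refers to.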
This research internship explores how multimodal LLMs can enhance or complement traditional re-identification methods. The work plan covers the following stages:
State of the Art and Baseline Implementation
Exploration of Image→Text Capabilities of Multimodal LLMs
Join us in this exciting research internship to push the boundaries of object and person re-identification using cutting-edge multimodal language models.