CL-08 - AI Agent harness

Looyas•Tunisie

ai/mlagentic-aiBenchmarking/EvaluationPython/PyTorch

Publié il y a environ 1 mois

Stage

⏱️4-6 mois

💼Présentiel

📅Expiré il y a 27 jours

Intègre les mots-clés de l’offre.

Description du poste

Design and implement an evaluation harness for AI agents. Define and instrument key metrics (task completion, tool-call correctness, hallucination detection, silent failure identification), run agents against controlled scenarios, and score their behavior systematically.

Technologies: Python, LangChain/LangGraph, RAGAS, DeepEval, guardrails, context & memory management, Git, Docker.

Envoyer ma candidature

Sauvegarder

Partager le stage