Looyas
Looyas
Tunisie

CL-08 - AI Agent harness

ai/mlagentic-aiBenchmarking/EvaluationPython/PyTorch

Publié il y a environ 7 heures

Stage
⏱️4-6 mois
💼Présentiel
📅Expire dans 14 jours
Utilise l’IA pour gagner du temps.

Description du poste

Design and implement an evaluation harness for AI agents. Define and instrument key metrics (task completion, tool-call correctness, hallucination detection, silent failure identification), run agents against controlled scenarios, and score their behavior systematically.

Technologies: Python, LangChain/LangGraph, RAGAS, DeepEval, guardrails, context & memory management, Git, Docker.

Looyas - CL-08 - AI Agent harness | Hi Interns | Hi Interns