Looyas
Looyas
Tunisia

CL-08 - AI Agent harness

ai/mlagentic-aiBenchmarking/EvaluationPython/PyTorch

Published 3 days ago

Internship
⏱️4-6 months
💼On-site
📅Expires in 11 days
Visa: start with the document checklist.

Job description

Design and implement an evaluation harness for AI agents. Define and instrument key metrics (task completion, tool-call correctness, hallucination detection, silent failure identification), run agents against controlled scenarios, and score their behavior systematically.

Technologies: Python, LangChain/LangGraph, RAGAS, DeepEval, guardrails, context & memory management, Git, Docker.