CL-08 - AI Agent harness

Looyas•Tunisia

ai/mlagentic-aiBenchmarking/EvaluationPython/PyTorch

Published about 1 month ago

Internship

⏱️4-6 months

💼On-site

📅Expired 29 days ago

Remove what you cannot explain.

Job description

Design and implement an evaluation harness for AI agents. Define and instrument key metrics (task completion, tool-call correctness, hallucination detection, silent failure identification), run agents against controlled scenarios, and score their behavior systematically.

Technologies: Python, LangChain/LangGraph, RAGAS, DeepEval, guardrails, context & memory management, Git, Docker.

Send my application

Save

Share internship