Overview
- Project: design and implement an automated quality evaluation framework for production chatbot and mailbot systems, with the goal of improving their reliability and answer quality.
- The project is part of the Care Technology team and evaluates production systems against real customer interactions.
Responsibilities / What you will do
- Build a maintainable benchmark test dataset from real customer interactions, and implement a multi-layer evaluation engine (rule-based, embedding-based, LLM-based); a sketch of such a layered engine appears after this list.
- Implement false-positive controls using intent-scoped rules, multi-reference answers, and weighted scoring; add automated regression detection and integrate it into the CI/CD pipeline as a quality gate (see the gate sketch after this list).
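To make the layered engine concrete, here is a minimal Python sketch of the escalation pattern such an engine might follow: a cheap rule layer and an embedding-similarity layer each either decide or defer to the more expensive LLM-judge layer. Everything here is an assumption for illustration, not the team's actual design: the `Verdict` and layer names and the 0.8/0.3 thresholds are invented, and the bag-of-words cosine is a stand-in for a real embedding model.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Verdict:
    passed: bool
    layer: str   # which layer decided
    score: float


def rule_layer(answer: str, required_phrases: list[str]) -> Verdict | None:
    """Cheapest layer: deterministic checks scoped to the test case's intent."""
    if all(p.lower() in answer.lower() for p in required_phrases):
        return Verdict(True, "rule", 1.0)
    return None  # inconclusive: defer to the embedding layer


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def embedding_layer(answer: str, references: list[str]) -> Verdict | None:
    """Middle layer: similarity against every reference answer (the
    multi-reference idea), keeping the best match. Bag-of-words cosine
    is a placeholder for a real embedding model."""
    best = max(_cosine(Counter(answer.lower().split()),
                       Counter(r.lower().split())) for r in references)
    if best >= 0.8:       # clearly equivalent to a reference answer
        return Verdict(True, "embedding", best)
    if best < 0.3:        # clearly off-topic: no LLM call needed
        return Verdict(False, "embedding", best)
    return None  # ambiguous middle band: defer to the LLM judge


def llm_judge_layer(answer: str, references: list[str]) -> Verdict:
    """Most expensive layer: an LLM grading call, left unimplemented here."""
    raise NotImplementedError("wire this to your LLM provider")


def evaluate(answer: str, references: list[str],
             required_phrases: list[str]) -> Verdict:
    """Walk the layers from cheapest to most expensive."""
    verdict = rule_layer(answer, required_phrases)
    if verdict is None:
        verdict = embedding_layer(answer, references)
    if verdict is None:
        verdict = llm_judge_layer(answer, references)
    return verdict
```

The ordering is the cost control: most cases should be settled by the first two layers, so the LLM judge only sees the ambiguous middle band.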
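The CI/CD quality gate from the second bullet can start as a small script that runs after the benchmark suite and fails the pipeline via its exit code. The file names, JSON layout, and the 2-point tolerance below are illustrative assumptions, not part of the posting.

```python
import json
import sys
from pathlib import Path

# Illustrative quality gate: compare the current benchmark pass rate
# against a committed baseline and fail the CI job on regression.
BASELINE_FILE = Path("eval/baseline.json")
RESULTS_FILE = Path("eval/latest_results.json")
TOLERANCE = 0.02  # tolerate a 2-point dip before the gate trips


def pass_rate(results: dict) -> float:
    cases = results["cases"]
    return sum(1 for c in cases if c["passed"]) / len(cases)


def main() -> int:
    baseline = json.loads(BASELINE_FILE.read_text())
    current = json.loads(RESULTS_FILE.read_text())
    base, cur = pass_rate(baseline), pass_rate(current)
    print(f"baseline pass rate={base:.3f}  current={cur:.3f}")
    if cur + TOLERANCE < base:
        print("Regression detected: failing the quality gate.")
        return 1  # non-zero exit fails the CI step
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

In GitHub Actions this would be a single step (for example, `python quality_gate.py`) whose non-zero exit blocks the merge.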
Technical environment
- Programming Language: Python; AI Stack: LLM APIs, Retrieval-Augmented Generation (RAG), Embeddings; APIs: OpenAI or similar LLM services.
- DevOps: Git, CI/CD (GitHub Actions); Cloud: AWS; Visualization: Streamlit / Grafana / Web dashboards; Cost control: token consumption tracking, tiered evaluation strategy (see the cost-ledger sketch after this list).
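Token consumption tracking can be as simple as a ledger that every LLM call reports to, combined with a hard per-run budget so an evaluation run cannot silently overspend. The per-1K-token prices and the budget below are made-up illustrative numbers, not provider pricing.

```python
import time
from dataclasses import dataclass, field

# Illustrative prices per 1K tokens; real numbers depend on the
# provider and model actually used.
PRICE_PER_1K = {"input": 0.0005, "output": 0.0015}


@dataclass
class TokenLedger:
    budget_usd: float
    spent_usd: float = 0.0
    calls: list = field(default_factory=list)

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Log one LLM call and enforce the run budget."""
        cost = (input_tokens * PRICE_PER_1K["input"]
                + output_tokens * PRICE_PER_1K["output"]) / 1000
        self.spent_usd += cost
        self.calls.append((time.time(), input_tokens, output_tokens, cost))
        if self.spent_usd > self.budget_usd:
            raise RuntimeError(
                f"Evaluation budget exceeded: "
                f"${self.spent_usd:.4f} > ${self.budget_usd:.2f}")


ledger = TokenLedger(budget_usd=5.00)
ledger.record(input_tokens=1200, output_tokens=300)  # one LLM-judge call
print(f"spent so far: ${ledger.spent_usd:.4f}")
```

The ledger and the tiered strategy reinforce each other: the cheaper layers exist precisely so that `record()` calls stay rare.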
Design & Maintenance
- Design a scalable test maintenance system (config-driven tests, versioning, human-in-the-loop review) and implement cost-aware evaluation and optimization (tiered test execution, controlled LLM usage); a sketch of a config-driven test case follows this list.
- Deliver complete technical documentation and ensure the framework is maintainable and can be integrated into existing pipelines.
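One plausible shape for config-driven tests is to keep cases in versioned config files that a small loader turns into typed objects, so reviewers add or retire cases through data changes rather than code changes. The JSON layout, field names, and the "status" convention here are assumptions for illustration (YAML would work the same way).

```python
import json
from dataclasses import dataclass

# Illustrative config file contents: cases are versioned data, reviewed
# by humans, and retired cases stay in the file for history.
CONFIG = """
{
  "version": "2024-06-01",
  "cases": [
    {
      "id": "refund-policy-001",
      "intent": "refund_request",
      "question": "How do I get a refund?",
      "references": ["You can request a refund within 30 days."],
      "required_phrases": ["30 days"],
      "status": "active"
    }
  ]
}
"""


@dataclass
class TestCase:
    id: str
    intent: str
    question: str
    references: list
    required_phrases: list
    status: str


def load_cases(raw: str) -> list:
    data = json.loads(raw)
    # Retired cases remain in the file but are skipped at run time.
    return [TestCase(**c) for c in data["cases"] if c["status"] == "active"]


for case in load_cases(CONFIG):
    print(case.id, case.intent)
```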
Candidate profile / Qualifications
- Final-year student in Software Engineering with strong foundations in software development, REST APIs, and web technologies (HTTP, JSON).
- Strong Python programming skills; experience with, or interest in, software testing and automated testing of AI systems; interest in NLP and LLMs.
- Comfortable with Git and basic cloud concepts; analytical mindset and strong problem-solving skills; good documentation and communication skills in English; able to work autonomously on a complex end-to-end technical project.
How to apply
- Apply via BambooHR or the link shared in the post (see application_link).