Mission
Make Phoenix™ & Mythik™ reliable, observable, and production-grade for long‑running AI workflows.
What you’ll do
- Implement distributed tracing across APIs, agents, and tools
- Design metrics, structured logs, dashboards & alerts
- Apply reliability patterns (timeouts, retries, idempotency)
- Support incident readiness & system debugging
What you’ll learn
- How production systems fail—and how to fix them
- Observability best practices used in real platforms
- Designing for long‑running workflows
- Engineering discipline beyond “it works on my machine”
Profile
- Final‑year engineering student (Computer Science, Software, AI, or equivalent)
- Strong Python backend foundations (FastAPI, async, typing)
- Solid understanding of APIs, debugging, and system behavior
- Curious about reliability, performance, and production systems
- Comfortable reading documentation and learning fast