Brief: Design representative application-level test suites for real HPC workloads (CUDA, OpenMPI, PyTorch, TensorFlow, VASP, Quantum ESPRESSO) and integrate them into CI/CD to improve the reliability and efficiency of the HPC software ecosystem.
Goals and responsibilities:
- Build application- and library-level test cases that validate correctness and basic performance on CPUs and GPUs.
- Integrate the tests into CI with ReFrame and Jenkins for automated, periodic validation of the software stack.
- Ensure the tests scale and remain portable across architectures, compilers, and software configurations managed with EasyBuild/Spack and environment modules.
Required skills:
- Familiarity with the Linux command line and shell; solid development skills in C/C++ or Python.
- Problem‑solving mindset; good English, strong organizational skills, and experience with project-management (PM) tools.
Planned training:
- Linux fundamentals (Udemy course), introduction to HPC and parallel programming, EasyBuild and Spack, and 1:1 mentorship with ReDX engineers.
Other details:
- Recommended duration: 2–3 months.
- Compensation: monthly stipend, with a possible performance bonus at the end of the internship.
- Opportunity to work on real HPC systems and interact with end users.