Build an agentic AI module that transforms a full software repository into an executable HPC workload on Slurm and outputs a Workload Execution Requirements (WER) profile for CloudLift.
Responsibilities
- Analyze repositories (source files, notebooks, scripts, configs, build/dependency files, docs)
- Extract software requirements (Python packages, system libs, compilers, MPI/OpenMP/CUDA, modules, containers, external data)
- Infer build/run instructions from artifacts (README, requirements.txt, pyproject.toml, setup.py, Makefile, CMakeLists.txt, Dockerfile, shell scripts, notebooks, CI/CD workflows)
- Produce a safe action plan: clone, check out, install, build, test, execute, collect logs, and extract workload requirements
- Generate Slurm scripts tailored to workload type (CPU/GPU, memory, time, modules, env setup, command)
- Execute jobs safely and monitor them; collect stdout/stderr, scheduler metadata, and runtime/memory/GPU metrics
- Diagnose failures (missing dependencies, wrong paths, failed builds, missing datasets, permission errors, invalid Slurm options, CPU/GPU mismatches)
- Support codebase Q&A for users (what the code does, how to run it, its dependencies and GPU needs, why a run failed)
- Produce the final WER profile and convert it to the CloudLift input format
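Two of the responsibilities above, extracting software requirements and generating a Slurm script tailored to the workload, can be sketched in Python. This is a minimal sketch, assuming `requirements.txt` is the only dependency source and using a hypothetical keyword heuristic (`GPU_HINTS`) to decide whether to request a GPU. The names `ResourceRequest`, `parse_requirements`, `infer_resources` and `render_sbatch`, the default resource values, and the `cuda` module name are illustrative placeholders, not part of any existing tool or site configuration; only the `#SBATCH` directives themselves are standard sbatch options.

```python
import re
from dataclasses import dataclass, field

# Hypothetical heuristic: these packages usually imply a GPU workload.
GPU_HINTS = {"torch", "tensorflow", "jax", "cupy", "pycuda"}


@dataclass
class ResourceRequest:
    # Placeholder defaults for illustration, not sizing recommendations.
    cpus: int = 4
    mem_gb: int = 16
    time: str = "01:00:00"
    gpus: int = 0
    modules: list[str] = field(default_factory=list)


def parse_requirements(text: str) -> list[str]:
    """Return lower-cased package names from requirements.txt content.

    Skips comments, blank lines, pip options (-r, -e, --index-url, ...) and URLs.
    """
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-") or "://" in line:
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0).lower())
    return names


def infer_resources(packages: list[str]) -> ResourceRequest:
    """Request one GPU (and a CUDA module) if any GPU-related package is present."""
    request = ResourceRequest()
    if GPU_HINTS & set(packages):
        request.gpus = 1
        request.modules.append("cuda")  # module name is site-specific; placeholder
    return request


def render_sbatch(job_name: str, command: str, request: ResourceRequest) -> str:
    """Render a batch script using standard sbatch directives."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={request.cpus}",
        f"#SBATCH --mem={request.mem_gb}G",
        f"#SBATCH --time={request.time}",
        "#SBATCH --output=%x-%j.out",  # %x = job name, %j = job ID
        "#SBATCH --error=%x-%j.err",
    ]
    if request.gpus:
        lines.append(f"#SBATCH --gres=gpu:{request.gpus}")
    lines.append("")
    lines += [f"module load {name}" for name in request.modules]
    lines.append(command)
    return "\n".join(lines) + "\n"
```

For example, `render_sbatch("demo", "python train.py", infer_resources(parse_requirements(text)))` returns the script text, which the agent could write to disk and submit with `sbatch`. A keyword heuristic is only a starting point; a real module would combine more signals (Dockerfiles, CMake options, README instructions) and let the LLM refine the estimate.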
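Failure diagnosis can start with simple pattern matching on a job's stderr before anything ambiguous is handed to the LLM. This is a minimal sketch: the rule table, the category names and the `diagnose` function are hypothetical, and the patterns cover only a few common messages (Python import errors, the dynamic loader, make, PyTorch/CUDA driver checks, sbatch) whose exact wording can vary between tools and versions.

```python
import re

# Hypothetical rule table: (regex over stderr, failure category).
# Patterns are illustrative; exact messages vary by tool and version.
RULES = [
    (r"ModuleNotFoundError: No module named '([^']+)'", "missing_dependency"),
    (r"error while loading shared libraries: ([^:\s]+)", "missing_system_library"),
    (r"No such file or directory", "wrong_path_or_missing_data"),
    (r"Permission denied", "permissions"),
    (r"make(?:\[\d+\])?: \*\*\* .*Error \d+", "build_failure"),
    (r"no CUDA-capable device is detected|Found no NVIDIA driver", "cpu_gpu_mismatch"),
    (r"sbatch: error: .*", "invalid_slurm_options"),
]


def diagnose(stderr: str) -> list[dict]:
    """Return one finding per regex match, in rule order."""
    findings = []
    for pattern, category in RULES:
        for match in re.finditer(pattern, stderr):
            findings.append({
                "category": category,
                "evidence": match.group(0),
                # First capture group, if the rule defines one (e.g. the module name).
                "detail": match.group(1) if match.groups() else None,
            })
    return findings
```

Several rules can fire on the same log (a missing shared library also prints "No such file or directory"), so the findings are best treated as evidence for the agent to reason over rather than a final verdict.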
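Scheduler metadata collected after a run can be folded into a WER profile. This sketch assumes the accounting data comes from `sacct -j <jobid> -P -n --format=JobID,State,Elapsed,MaxRSS,AllocTRES` for a single job. The `WERProfile` fields, `wer_from_sacct` and the helper parsers are a hypothetical schema and illustrative code; this description does not specify the actual WER or CloudLift input format, so converting to it would be a separate step.

```python
import json
from dataclasses import asdict, dataclass

# Binary multipliers for the K/M/G/T suffixes sacct uses for memory fields.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_elapsed(value: str) -> int:
    """Convert a Slurm duration like 'D-HH:MM:SS', 'HH:MM:SS' or 'MM:SS' to seconds."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in value.split(":")]
    while len(parts) < 3:  # pad missing hours (and minutes)
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def parse_mem(value: str) -> int:
    """Convert a sacct memory value like '2048K' or '1.5G' to bytes; empty means 0."""
    if not value:
        return 0
    if value[-1] in UNITS:
        return int(float(value[:-1]) * UNITS[value[-1]])
    return int(value)


def parse_tres(value: str) -> dict[str, str]:
    """Split an AllocTRES string like 'cpu=4,gres/gpu=1,mem=16G' into a dict."""
    return dict(item.split("=", 1) for item in value.split(",") if "=" in item)


@dataclass
class WERProfile:
    # Hypothetical schema for illustration only; not the real WER/CloudLift format.
    job_id: str
    state: str
    wall_seconds: int
    peak_mem_bytes: int
    cpus: int
    gpus: int


def wer_from_sacct(text: str) -> WERProfile:
    """Build a profile from parsable (-P), header-less (-n) sacct output for one job."""
    rows = [line.split("|") for line in text.strip().splitlines()]
    job_id, state, elapsed, _, tres = rows[0]  # first row is the job allocation
    alloc = parse_tres(tres)
    return WERProfile(
        job_id=job_id,
        state=state,
        wall_seconds=parse_elapsed(elapsed),
        peak_mem_bytes=max(parse_mem(row[3]) for row in rows),  # MaxRSS is per step
        cpus=int(alloc.get("cpu", 0)),
        gpus=int(alloc.get("gres/gpu", 0)),
    )


def to_json(profile: WERProfile) -> str:
    return json.dumps(asdict(profile), indent=2)
```

Taking the maximum MaxRSS over all steps is a deliberate choice: the allocation row usually leaves MaxRSS empty, and the peak across steps is what a sizing recommendation needs.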
Required skills
- Strong Python; Linux/Git/shell familiarity
- Basics of pip/conda/make/CMake/Docker
- Interest in HPC, Slurm, and workload execution, and in agentic AI (LLM tool use, structured reasoning, validation)
- Comfortable with JSON/YAML and able to write clear technical documentation
Expected outcome
- A working prototype linking repository understanding with HPC execution and CloudLift recommendation.