ReDX Technologies
Tunisia

Project 9 - Agentic Repository Execution Layer for HPC Codex

HPC, Agentic AI, LLM tools, IT Consulting / Software Engineering, Automation / SCADA, Cloud infrastructure / DevOps, Slurm, Python/PyTorch, Cloud computing (GCP)

Published about 20 hours ago

Internship
⏱️ 3 months
💼 On-site
📅 Expires in 13 days

Job description

Build an agentic AI module that transforms a full software repository into an executable HPC workload on Slurm and outputs a Workload Execution Requirements (WER) profile for CloudLift.
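To make the target output concrete, here is a minimal sketch of what a WER profile could look like as a serializable record. The posting does not define the WER schema or the CloudLift input format, so the class name and every field name below are hypothetical placeholders.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class WERProfile:
    """Hypothetical WER record; the real schema is not given in the posting."""

    repository: str       # URL or path of the analyzed repository
    workload_type: str    # assumed values: "cpu" or "gpu"
    cpus: int             # CPU cores requested for the run
    memory_gb: float      # peak memory observed or requested
    gpus: int             # number of GPUs, 0 for CPU-only workloads
    walltime_s: int       # measured runtime in seconds
    software: list[str]   # packages, modules, or containers required

    def to_json(self) -> str:
        # Sorted keys keep the output stable for diffing and validation.
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

A later conversion step would map such a record onto whatever input format CloudLift actually expects.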

Responsibilities

  • Analyze repositories (source files, notebooks, scripts, configs, build/dependency files, docs)
  • Extract software requirements (Python packages, system libs, compilers, MPI/OpenMP/CUDA, modules, containers, external data)
  • Infer build/run instructions from artifacts (README, requirements.txt, pyproject.toml, setup.py, Makefile, CMakeLists.txt, Dockerfile, shell scripts, notebooks, CI/CD workflows)
  • Produce a safe action plan: clone, checkout, install, build, test, execute, collect logs, extract workload requirements
  • Generate Slurm scripts tailored to workload type (CPU/GPU, memory, time, modules, env setup, command)
  • Execute jobs safely, monitor them, and collect stdout/stderr, scheduler metadata, and runtime/memory/GPU metrics
  • Diagnose failures (missing deps, wrong paths, failed builds, missing datasets, permissions, invalid Slurm options, CPU/GPU mismatches)
  • Enable codebase Q&A for users (what it does, how to run, deps, GPU needs, failure reasons)
  • Produce final WER profile and convert to CloudLift input format
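
Two of the steps above, inferring build instructions from repository artifacts and generating a Slurm script, can be sketched as follows. This is an illustration under stated assumptions, not the project's design: `ARTIFACT_HINTS`, `detect_build_steps`, `JobSpec`, and `render_sbatch` are hypothetical names, and the default resource values are arbitrary. The `#SBATCH` options used are standard `sbatch` directives.

```python
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical mapping from well-known artifact names to a build/install hint.
ARTIFACT_HINTS = {
    "requirements.txt": "pip install -r requirements.txt",
    "pyproject.toml": "pip install .",
    "setup.py": "pip install .",
    "CMakeLists.txt": "cmake -S . -B build && cmake --build build",
    "Makefile": "make",
}


def detect_build_steps(repo: Path) -> list[str]:
    """Return de-duplicated build hints for artifacts found at the repo root."""
    steps: list[str] = []
    for name, hint in ARTIFACT_HINTS.items():
        if (repo / name).is_file() and hint not in steps:
            steps.append(hint)
    return steps


@dataclass
class JobSpec:
    """Hypothetical description of one Slurm job."""

    job_name: str
    command: str
    cpus_per_task: int = 1
    mem: str = "4G"
    time_limit: str = "00:30:00"
    gpus: int = 0
    modules: list[str] = field(default_factory=list)
    setup: list[str] = field(default_factory=list)


def render_sbatch(spec: JobSpec) -> str:
    """Render a batch script; all #SBATCH lines precede the first command."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={spec.job_name}",
        f"#SBATCH --cpus-per-task={spec.cpus_per_task}",
        f"#SBATCH --mem={spec.mem}",
        f"#SBATCH --time={spec.time_limit}",
        "#SBATCH --output=%x-%j.out",  # %x = job name, %j = job ID
        "#SBATCH --error=%x-%j.err",
    ]
    if spec.gpus > 0:
        # Only request GPUs for GPU workloads.
        lines.append(f"#SBATCH --gres=gpu:{spec.gpus}")
    lines.append("set -e")  # stop at the first failing command
    lines += [f"module load {m}" for m in spec.modules]
    lines += spec.setup
    lines.append(spec.command)
    return "\n".join(lines) + "\n"
```

For example, `render_sbatch(JobSpec("train", "python train.py", gpus=1, setup=detect_build_steps(Path("."))))` yields a script that can be written to a file and submitted with `sbatch`. A real agent would also need to validate the inferred commands before running them, which this sketch leaves out.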

Required skills

  • Strong Python; Linux/Git/shell familiarity
  • Basics of pip/conda/make/CMake/Docker
  • Interest in HPC, Slurm, and workload execution, as well as agentic AI (LLM tool use, structured reasoning, validation)
  • Ability to work with JSON/YAML and write clear technical documentation

Expected outcome

  • A working prototype linking repository understanding with HPC execution and CloudLift recommendation.