Proxym Group
Tunisia

32 Automated System for Kubernetes Disaster Recovery PFE (end-of-studies project)

.NET Development / DevOps, Kubernetes, Cloud Infrastructure

Published 6 months ago

Internship
⏱️ 3-6 months
💼 Hybrid
📅 Expired 6 months ago

Job description

Project overview

  • Build an automated system that performs a full disaster-recovery workflow for a running Kubernetes application: backup, cluster destruction, reprovisioning, and full restore with a single click.
  • The system must automatically back up Kubernetes resources and persistent data, store backups securely, verify integrity, then destroy the original cluster in a controlled manner and provision an identical cluster using Infrastructure-as-Code.
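
The workflow described above (backup, verify, destroy, provision, restore) can be pictured as an ordered pipeline that stops at the first failing step, so the cluster is never destroyed after a failed backup or integrity check. The following is a minimal sketch under that assumption; the step names and the placeholder actions are hypothetical and do not come from the posting.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Each step is a (name, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[], bool]]


@dataclass
class RunReport:
    completed: List[str] = field(default_factory=list)
    failed: Optional[str] = None


def run_disaster_recovery(steps: List[Step]) -> RunReport:
    """Run the steps in order and stop at the first one that fails."""
    report = RunReport()
    for name, action in steps:
        if not action():
            report.failed = name
            break
        report.completed.append(name)
    return report


# Placeholder actions (hypothetical); real ones would call the backup
# tool, the IaC provisioning pipeline, and so on.
demo_steps: List[Step] = [
    ("backup", lambda: True),
    ("verify", lambda: True),
    ("destroy", lambda: True),
    ("provision", lambda: True),
    ("restore", lambda: True),
]
report = run_disaster_recovery(demo_steps)
```

Placing "verify" before "destroy" is the design choice that matters here: a failed verification leaves the original cluster untouched.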

Objectives & key features

  • Automatic backups of Kubernetes resources (manifests, custom resources) and persistent volumes, with verification of backup integrity.
  • One-click workflow to delete the existing cluster, recreate identical infrastructure, and fully restore the application state.
  • Use Infrastructure-as-Code tooling to ensure the newly provisioned cluster is identical to the destroyed one (networking, node sizing, addons, RBAC, storage classes).
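
One possible way to approach the integrity verification listed above is to record a SHA-256 checksum for every backup file at backup time and recompute the checksums before teardown. This is only a sketch: it assumes the backup artifacts are reachable as files under a local directory, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(backup_dir: Path) -> Dict[str, str]:
    """Record a checksum for every file under the backup directory."""
    return {
        p.relative_to(backup_dir).as_posix(): sha256_of(p)
        for p in sorted(backup_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(backup_dir: Path, manifest: Dict[str, str]) -> List[str]:
    """Return relative paths that are missing or whose checksum changed."""
    bad = []
    for rel, expected in manifest.items():
        target = backup_dir / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return bad
```

A teardown step would then proceed only when verify_manifest returns an empty list.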

Responsibilities / Tasks

  • Design and implement backup orchestration for Kubernetes resources and persistent data (scheduling, retention, integrity checks).
  • Implement a controlled cluster teardown process and an automated provisioning pipeline to recreate the cluster and restore backups end-to-end.
  • Integrate IaC tooling for reproducible cluster provisioning and automate the end-to-end restore flow.
  • Build a simple web or CLI interface to trigger and monitor the one-click disaster recovery process.
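
The retention part of the backup orchestration task can be illustrated with a small, tool-agnostic policy: expire backups older than a maximum age, but always spare the most recent few. The sketch below assumes backups are identified by name and creation time; the function name and parameters are hypothetical and not part of the posting.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def select_expired(backups: Dict[str, datetime], now: datetime,
                   max_age: timedelta, keep_min: int) -> List[str]:
    """Return backups older than max_age, always sparing the keep_min newest."""
    newest_first = sorted(backups, key=backups.get, reverse=True)
    protected = set(newest_first[:keep_min])
    return [
        name for name in newest_first
        if name not in protected and now - backups[name] > max_age
    ]


# Illustrative data only.
now = datetime(2026, 1, 31)
backups = {
    "b1": datetime(2026, 1, 1),
    "b2": datetime(2026, 1, 20),
    "b3": datetime(2026, 1, 30),
}
expired = select_expired(backups, now, timedelta(days=7), keep_min=1)
```

Keeping a minimum number of backups regardless of age guards against a stalled schedule silently pruning the last usable restore point.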

Technologies & Tools

  • Use Velero for backup/restore of Kubernetes resources and persistent volumes.
  • Use Ansible and Kubespray (or similar IaC tools) to provision and configure Kubernetes clusters reproducibly.
  • Use Python for orchestration/automation logic and React JS for any frontend interface required.
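
As a rough illustration of how Python could drive the tools listed above, the sketch below builds command lines for the Velero CLI and for Kubespray's playbooks (cluster.yml to provision, reset.yml to tear down) and runs them with subprocess. The backup name, namespace, TTL value, and inventory path are assumptions made for the example, not values from the posting.

```python
import subprocess
from typing import List


def velero_backup_cmd(name: str, namespace: str, ttl: str = "720h") -> List[str]:
    """Command line for a Velero backup of one namespace, waiting for completion."""
    return ["velero", "backup", "create", name,
            "--include-namespaces", namespace, "--ttl", ttl, "--wait"]


def velero_restore_cmd(backup_name: str) -> List[str]:
    """Command line for a Velero restore from an existing backup."""
    return ["velero", "restore", "create", "--from-backup", backup_name, "--wait"]


def kubespray_cmd(inventory: str, playbook: str = "cluster.yml") -> List[str]:
    """Command line for a Kubespray playbook run from a Kubespray checkout."""
    return ["ansible-playbook", "-i", inventory, "--become", playbook]


def run(cmd: List[str]) -> None:
    """Execute a command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)


# Hypothetical names and paths, shown without executing anything.
backup_cmd = velero_backup_cmd("dr-demo", "shop")
provision_cmd = kubespray_cmd("inventory/mycluster/hosts.yaml")
```

Building the argument lists separately from executing them keeps the orchestration logic easy to unit-test without a live cluster.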

Desired profile

  • Engineer profile: Trainee (1 position).
  • Skills: Kubernetes administration, DevOps automation, experience with IaC (Ansible/Kubespray), backup/restore concepts, Python development; frontend React JS is a plus.

How to apply

  • Apply via the trainees platform: https://trainees-platform.proxym-group.net
  • Email subject to use when applying: "PRX-2026-17 - Automated System for Kubernetes Disaster Recovery"