32 Automated System for Kubernetes Disaster Recovery PFE
Proxym Group • Tunisia
.NET Development / DevOps • Kubernetes • Cloud Infrastructure
Published 6 months ago
Internship
⏱️ 3-6 months
💼 Hybrid
📅 Expired 6 months ago
Job description
Project overview
Build an automated system that, with a single click, runs a full disaster-recovery workflow for a running Kubernetes application: backup, cluster destruction, reprovisioning, and full restore.
The system must automatically back up Kubernetes resources and persistent data, store backups securely, verify integrity, then destroy the original cluster in a controlled manner and provision an identical cluster using Infrastructure-as-Code.
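The ordering described above (back up, verify, then destroy, provision, restore) could be sketched as a small runner that stops at the first failing step, so that the cluster is never destroyed when the backup or its verification fails. This is only an illustrative sketch: the step names, `RunReport`, and `run_workflow` are hypothetical and not part of the project specification, and the step callables stand in for the real backup, teardown, and provisioning logic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical step names, in the order given in the project overview.
STEP_ORDER = ["backup", "verify", "destroy", "provision", "restore"]


@dataclass
class RunReport:
    """Outcome of one disaster-recovery run."""
    completed: List[str] = field(default_factory=list)
    failed: Optional[str] = None
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.failed is None


def run_workflow(steps: Dict[str, Callable[[], None]]) -> RunReport:
    """Run the steps in STEP_ORDER and stop at the first one that raises.

    A missing step counts as a failure too, so "destroy" is only reached
    after "backup" and "verify" have both succeeded.
    """
    report = RunReport()
    for name in STEP_ORDER:
        try:
            steps[name]()
        except Exception as exc:
            report.failed = name
            report.error = str(exc)
            break
        report.completed.append(name)
    return report
```

A web or CLI trigger could call `run_workflow` with real step functions and display `report.completed` and `report.failed` as progress information.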
Objectives & key features
Automatic backups of Kubernetes resources (manifests, custom resources) and persistent volumes, with verification of backup integrity.
One-click workflow to delete the existing cluster, recreate identical infrastructure, and fully restore the application state.
Use Infrastructure-as-Code tooling to ensure the newly provisioned cluster is identical to the destroyed one (networking, node sizing, addons, RBAC, storage classes).
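One possible way to approach the integrity verification mentioned above is a SHA-256 manifest recorded at backup time and rechecked before the cluster is destroyed. The sketch below assumes the backup artifacts are available as files in a local directory, which is an assumption for illustration (Velero normally writes to object storage); the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large volume archives are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(backup_dir: Path) -> Dict[str, str]:
    """Map each file's path (relative to backup_dir) to its SHA-256 digest."""
    return {
        p.relative_to(backup_dir).as_posix(): sha256_of(p)
        for p in sorted(backup_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(backup_dir: Path, manifest: Dict[str, str]) -> List[str]:
    """Return the relative paths that are missing or whose digest changed."""
    problems = []
    for rel, expected in manifest.items():
        target = backup_dir / rel
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel)
    return problems
```

An empty list from `verify_manifest` would be the signal that it is safe to continue to the teardown step.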
Responsibilities / Tasks
Design and implement backup orchestration for Kubernetes resources and persistent data (scheduling, retention, integrity checks).
Implement a controlled cluster teardown process and an automated provisioning pipeline to recreate the cluster and restore backups end-to-end.
Integrate IaC tooling for reproducible cluster provisioning and automate the end-to-end restore flow.
Build a simple web or CLI interface to trigger and monitor the one-click disaster recovery process.
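The retention part of the backup orchestration task could look something like the following sketch. It assumes a policy that is not stated in the posting: backups older than a maximum age are pruned, except that the newest few are always kept. The function name `select_expired` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def select_expired(
    backups: Dict[str, datetime],
    now: datetime,
    max_age: timedelta,
    keep_last: int = 3,
) -> List[str]:
    """Return names of backups to prune, newest first.

    backups maps a backup name to its creation time. The keep_last most
    recent backups are never pruned, even if they exceed max_age.
    """
    newest_first = sorted(backups, key=backups.get, reverse=True)
    protected = set(newest_first[:keep_last])
    return [
        name
        for name in newest_first
        if name not in protected and now - backups[name] > max_age
    ]
```

Keeping a minimum number of backups regardless of age avoids ending up with no restorable backup if scheduled runs have been failing for a while.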
Technologies & Tools
Use Velero for backup/restore of Kubernetes resources and persistent volumes.
Use Ansible and Kubespray (or similar IaC tools) to provision and configure Kubernetes clusters reproducibly.
Use Python for orchestration/automation logic and React JS for any frontend interface required.
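To illustrate how the Python orchestration layer might drive the tools listed above, the sketch below builds command lines for the Velero CLI and for Kubespray's `cluster.yml` and `reset.yml` playbooks without executing them. The helper names, the default TTL, and the inventory path used in the usage note are assumptions for illustration only.

```python
from typing import List, Sequence


def velero_backup_cmd(name: str, namespaces: Sequence[str],
                      ttl: str = "72h0m0s") -> List[str]:
    """Build a `velero backup create` command that waits for completion."""
    return [
        "velero", "backup", "create", name,
        "--include-namespaces", ",".join(namespaces),
        "--ttl", ttl,
        "--wait",
    ]


def velero_restore_cmd(name: str, backup_name: str) -> List[str]:
    """Build a `velero restore create` command from an existing backup."""
    return [
        "velero", "restore", "create", name,
        "--from-backup", backup_name,
        "--wait",
    ]


def kubespray_cmd(inventory: str, playbook: str) -> List[str]:
    """Build an ansible-playbook call for a Kubespray playbook.

    playbook is expected to be "cluster.yml" (provision) or "reset.yml"
    (teardown); the inventory path is supplied by the caller.
    """
    if playbook not in ("cluster.yml", "reset.yml"):
        raise ValueError(f"unexpected playbook: {playbook}")
    return ["ansible-playbook", "-i", inventory, "--become", playbook]
```

Each list can be passed to `subprocess.run(cmd, check=True)`, for example `kubespray_cmd("inventory/mycluster/hosts.yaml", "cluster.yml")` with a hypothetical inventory path. Kubespray's reset playbook normally asks for interactive confirmation, so an unattended teardown would likely need that answer supplied as an extra variable; this should be checked against the Kubespray version in use.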
Desired profile
Engineer profile: Trainee (1 position).
Skills: Kubernetes administration, DevOps automation, experience with IaC (Ansible/Kubespray), backup/restore concepts, Python development; frontend React JS is a plus.
How to apply
Apply via the trainees platform: https://trainees-platform.proxym-group.net
Email subject to use when applying: "PRX-2026-17 - Automated System for Kubernetes Disaster Recovery"