Lead DevOps Engineer (#4926)

Referral bonus: $1000
Location: Europe, LATAM
Work type: Office/Remote
Technical level: Leader
Job category: Software Development
Project: Leading platform for electronic agreements

While stateless applications are easily replaced, StatefulSets are the bedrock of our data integrity and service uptime. We are seeking a specialist to architect and manage the entire lifecycle of stateful workloads within our Azure-based MSF (Microservices Framework).

Your mission is to ensure that databases, message brokers, and persistent storage layers are architected for 99.99% availability. You will move us away from "snowflake" configurations toward a fully automated, self-healing stateful infrastructure where manual intervention is a relic of the past.

Working hours: 15:00-23:00 CET.

Responsibilities:

  • Lifecycle Orchestration: Automate the end-to-end lifecycle of StatefulSets, from provisioning and seamless volume expansion to graceful termination and automated re-attachment during node failures.
  • High Availability & Uptime: Implement advanced scheduling logic (Pod Topology Spread Constraints, Anti-affinity) to ensure stateful workloads survive zonal outages and maintenance windows.
  • Storage Performance & Tuning: Optimize Azure Disk (Premium/Ultra) and Azure NetApp Files integration via CSI drivers to minimize IOPS bottlenecks and latency.
  • Disaster Recovery Automation: Develop and test automated "Snapshot-to-Restore" pipelines. Ensure that the Actual State of data volumes can be recovered to the Goal State in minutes, not hours.
  • Infrastructure as Code: Utilize Terraform to provision the hardened Azure foundation (Disk Encryption Sets, Proximity Placement Groups, and Networking) required for high-performance stateful clusters.

Basic Qualifications:

  • Kubernetes Internals Mastery: Expert-level understanding of StatefulSet controllers, Persistent Volume Claims (PVCs), and the Container Storage Interface (CSI).
  • Azure AKS Specialist: Deep experience with Azure Kubernetes Service, specifically around persistent storage integration and Azure-specific networking constraints.
  • Automation & Scripting: Proficient in Go or Python/Bash for writing custom controllers or maintenance hooks (PreStop/PostStart) that ensure data consistency during updates.
  • Reliability Engineering: Proven track record of managing production databases or distributed systems (e.g., Postgres, ClickHouse, Elasticsearch) on Kubernetes.

We offer*:

  • Flexible working format: remote, office-based, or hybrid
  • A competitive salary and good compensation package
  • Personalized career growth
  • Professional development tools (mentorship program, tech talks and training sessions, centers of excellence, and more)
  • Active tech communities with regular knowledge sharing
  • Education reimbursement
  • Memorable anniversary presents
  • Corporate events and team-building activities
  • Other location-specific benefits

*not applicable for freelancers
