N-iX is a global software development company founded in 2002, connecting 2,400+ tech professionals across 40+ countries. We deliver innovative technology solutions in cloud computing, data analytics, AI, embedded software, IoT, and more to global industry leaders and Fortune 500 companies. Join us to create technology that drives real change for businesses and people across the world.
About the Team
We are the AI Platform Team, building and operating highly available, scalable, and automated infrastructure that supports global machine learning workloads. We are seeking a Site Reliability / DevOps Engineer with a solid background in Java development who thrives on solving complex infrastructure challenges and driving platform automation. In this role, you will ensure the reliability, scalability, and efficiency of our AI platform systems through automation, Java-based service optimization, and SRE best practices. You’ll collaborate closely with development, infrastructure, and research teams to deliver production-grade, self-healing, and performance-optimized services.
Key Responsibilities:
SRE / DevOps Focus (~40%)
- Design, implement, and maintain CI/CD pipelines for platform services.
- Manage and optimize Kubernetes clusters, Docker containers, and cloud infrastructure.
- Ensure high availability (99.999%), system reliability, and operational security.
- Automate infrastructure tasks, monitoring, and service deployments.
- Troubleshoot production incidents, perform root cause analysis, and implement preventive solutions.
- Drive observability improvements using Prometheus, Grafana, and log aggregation tools.
- Collaborate with developers to define operational standards and DevOps best practices.
Java / Platform Development (~60%)
- Design, develop, and optimize Java / Spring-based microservices.
- Contribute to service discovery, orchestration, and API development.
- Improve system performance, scalability, and resilience through code and infrastructure enhancements.
- Integrate application services into automated build and deployment pipelines.
- Work with both SQL and NoSQL databases to support scalable platform components.
Requirements
- 3–5 years of combined experience in SRE / DevOps and Java development.
- Strong proficiency in Java / Spring Framework (2–4 years of hands-on experience).
- Experience with Kubernetes, Docker, and Linux systems.
- Proven experience with CI/CD tools (e.g. Jenkins, GitHub Actions, GitLab CI).
- Understanding of cloud environments (AWS, GCP, or Azure).
- Familiarity with monitoring and observability tools (Prometheus, Grafana, ELK stack).
- Good understanding of JVM tuning, profiling, and debugging.
- Excellent problem-solving, communication, and collaboration skills.
- Exposure to MLOps tools such as Ray (Ray.io).
- Fluent English (spoken and written).
We offer*:
- Flexible working format: remote, office-based, or flexible
- A competitive salary and good compensation package
- Personalized career growth
- Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
- Active tech communities with regular knowledge sharing
- Education reimbursement
- Memorable anniversary presents
- Corporate events and team buildings
- Other location-specific benefits
*Not applicable to freelancers