N-iX is a global software development service company that helps businesses across the world develop successful software products. Founded in 2002, N-iX has come a long way, expanding its presence across Europe, the US, and Latin America. Today, we are a strong community of 2,000+ professionals and a reliable partner for global industry leaders and Fortune 500 companies.
Our client is a global commerce leader where you can influence how the world buys, sells, and gives. You’ll be part of a work culture that’s been genuinely committed to diversity and inclusion since its founding over twenty-five years ago. Here, you can be yourself, do your best work along with a team of professionals, and have a meaningful impact on people across the globe. We seek people with drive, ideas, and a passion for helping small businesses succeed.
We are looking for an SRE/MLOps engineer with Python and ML framework experience to drive operational excellence, automation, and platform reliability. You will join an AI Platform Team, providing highly available, scalable, and automated machine learning infrastructure for researchers and data scientists globally.
Responsibilities:
- Support the AI Platform and next-generation AI architecture for research and engineering teams
- Partner with vendors and infrastructure teams to ensure security and service availability
- Diagnose, triage, and fix production issues, including performance and functional problems
- Provide technical support to researchers, data scientists, and engineering teams
- Write support documentation and prepare reports on customer issues
- Identify opportunities for automation in the problem management process
- Collaborate with AI platform developers to implement CI/CD pipelines for automated deployment and configuration
- Develop and follow operational standards for tools, automation, versioning, and source control
- Ensure high availability of services (zero downtime, 99.999% availability)
Requirements:
- Main coding language: Python
- Infrastructure vs. coding split: 30:70
- Kubernetes / Docker level: Proficient / Experienced
- Hands-on experience with ML frameworks such as PyTorch, TensorFlow, Triton
- Familiarity with vLLM for large language models is a plus
- Debugging and triaging skills
- Experience with AI/ML model training and inferencing platforms is a plus
- Familiarity with DevOps practices and continuous testing
- DevOps pipelines and automation: application deployment/configuration and performance monitoring
- Test automation, Jenkins CI/CD
- Excellent communication, presentation, and collaboration skills
- Good English speaking, reading, and writing skills
We offer*:
- Flexible working format - remote, office-based or flexible
- A competitive salary and good compensation package
- Personalized career growth
- Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
- Active tech communities with regular knowledge sharing
- Education reimbursement
- Memorable anniversary presents
- Corporate events and team-building activities
- Other location-specific benefits
*not applicable for freelancers