SRE/MLOps, ML Framework (#3950)

India

Work type:

Office/Remote

Technical Level:

Senior

Job Category:

Software Development

Project:

American multinational e-commerce company

N-iX is a global software development service company that helps businesses across the world develop successful software products. Founded in 2002, N-iX has come a long way, expanding its presence across Europe, the US, and Latin America. Today, we are a strong community of 2,000+ professionals and a reliable partner for global industry leaders and Fortune 500 companies.

Our client is a global commerce leader where you can influence how the world buys, sells, and gives. You’ll be part of a work culture that’s been genuinely committed to diversity and inclusion since its founding over twenty-five years ago. Here, you can be yourself, do your best work along with a team of professionals, and have a meaningful impact on people across the globe. We seek people with drive, ideas, and a passion for helping small businesses succeed.

We are looking for an SRE/MLOps engineer with Python and ML framework experience to drive operational excellence, automation, and platform reliability. You will join an AI Platform Team, providing highly available, scalable, and automated machine learning infrastructure for researchers and data scientists globally.

Responsibilities:

Support the AI Platform and next-generation AI architecture for research and engineering teams
Partner with vendors and infrastructure teams to ensure security and service availability
Diagnose, triage, and fix production issues, including performance and functional problems
Provide technical support to researchers, data scientists, and engineering teams
Write support documentation and prepare reports on customer issues
Identify opportunities for automation in the problem management process
Collaborate with AI platform developers to implement CI/CD pipelines for automated deployment and configuration
Develop and follow operational standards for tools, automation, versioning, and source control
Ensure high availability of services (zero-downtime, 99.999%)
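
To put the 99.999% availability target above in concrete terms, the short Python sketch below converts an availability percentage into a yearly downtime budget. The function name `downtime_budget_seconds` and the 365-day year are illustrative assumptions, not part of the client's tooling or SLA definition.

```python
# Illustrative sketch: downtime budget implied by an availability target.
# The function name and the 365-day year are assumptions for this example.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_budget_seconds(availability_percent: float,
                            period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Return the maximum downtime, in seconds, allowed over the period."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_seconds * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    budget = downtime_budget_seconds(99.999)
    # Five nines leaves roughly 5.26 minutes of downtime per year.
    print(f"{budget / 60:.2f} minutes per year")
```

Under these assumptions, a "five nines" target leaves only a few minutes of unplanned downtime per year, which is why automated deployment and monitoring feature so heavily in the role.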

Requirements:

Main coding language: Python
Infra vs. coding requirements: 30:70
Kubernetes / Docker level: Proficient / Experienced
Hands-on experience with ML frameworks such as PyTorch, TensorFlow, Triton
Familiarity with vLLM for large language models is a plus
Debugging and triaging skills
Experience with AI/ML model training and inference platforms is a plus
Familiarity with DevOps practices and continuous testing
DevOps pipelines and automation: app deployment/configuration and performance monitoring
Test automation, Jenkins CI/CD
Excellent communication, presentation, and collaboration skills
Good spoken, written, and reading proficiency in English

We offer*:

Flexible working format: remote, office-based, or flexible
A competitive salary and good compensation package
Personalized career growth
Professional development tools (mentorship program, tech talks and training sessions, centers of excellence, and more)
Active tech communities with regular knowledge sharing
Education reimbursement
Memorable anniversary presents
Corporate events and team-building activities
Other location-specific benefits

*not applicable for freelancers

