Senior MLOps Engineer (Ray.io) (#3953)

Bengaluru
Work type:
Office/Remote
Technical Level:
Senior
Job Category:
Software Development
Project:
American multinational e-commerce company

Our client is a global commerce leader where you can influence how the world buys, sells, and gives. You’ll be part of a work culture that has been genuinely committed to diversity and inclusion since its founding over twenty-five years ago. Here, you can be yourself, do your best work alongside a team of professionals, and have a meaningful impact on people across the globe. We seek people with drive, ideas, and a passion for helping small businesses succeed to help

About the team:
We are the AI Platform Team! We are looking for a highly motivated, self-reliant, experienced SRE and customer support engineer who is passionate about driving major transformation within the AI organization.
This role will support the AI Platform and the services that provide machine learning infrastructure to researchers and data scientists across the company. You'll be expected to stay current with the latest technology developments, drive the implementation of DevOps practices across the organization, and provide customer support.

Job Functions

  • You will be a member of our AI Platform Team, supporting the next generation AI architecture for various research and engineering teams within the organization.
  • You'll partner with vendors and the infrastructure engineering team for security and service availability.
  • You'll work with engineering teams, researchers, and data scientists to fix production issues, including performance and functional issues.
  • Diagnose and solve customer technical problems.
  • Participate in training customers and prepare reports on customer issues.
  • Be responsible for customer service improvements and recommend product improvements.
  • Write support documentation.
  • You'll design and implement zero-downtime deployment and monitoring to achieve a highly available service (99.999%).
  • As a support engineer, find opportunities to automate as part of the problem management process, creating automation to avoid issues.
  • Define engineering excellence for operational maturity.
  • You'll work together with AI platform developers to provide the CI/CD model to deploy and configure the production system automatically.
  • Develop and follow standard operational processes for tools and automation development, including style guides, versioning practices, source control, and branching and merging patterns, and advise other engineers on development standards.
  • Deliver solutions that accelerate the work engineers would otherwise perform, through automation, deep domain expertise, and knowledge sharing.

Required Skills

  • Demonstrated ability in designing, building, refactoring, and releasing software written in Python and C++.
  • Hands-on experience with Ray.io, including workload management, cluster deployment, distributed task scheduling, and troubleshooting.
  • Ability to use Ray Dashboard and CLI tools for monitoring, resource tracking, debugging distributed jobs, and resolving production issues.
  • Knowledge of Ray ecosystem libraries such as Ray Train, Ray Tune, Ray Serve, and Ray Data is a big plus.
  • Experience integrating Ray with tools such as Airflow, MLflow, Dask, and DeepSpeed is a big plus.
  • Debugging and triaging skills.
  • Cloud technologies such as Kubernetes and Docker, plus Linux fundamentals.
  • Familiarity with DevOps practices and continuous testing.
  • DevOps pipelines and automation: app deployment/configuration and performance monitoring.
  • Test automation and Jenkins CI/CD.
  • Excellent communication, presentation, and leadership skills to be able to work and collaborate with partners, customers and engineering teams.
  • Well organized and able to manage multiple projects in a fast-paced and demanding environment.
  • Good spoken and written English, including reading comprehension.

We offer*:

  • Flexible working format - remote, office-based or flexible
  • A competitive salary and good compensation package
  • Personalized career growth
  • Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
  • Active tech communities with regular knowledge sharing
  • Education reimbursement
  • Memorable anniversary presents
  • Corporate events and team buildings
  • Other location-specific benefits

*not applicable for freelancers
