About the Role:
We are looking for an experienced Data Service Module Engineer to develop and deploy the data service module for the HPC modeling project. This role focuses on implementing high-performance data storage and retrieval systems, using HDF5 or similar technologies, with support for parallel and concurrent I/O. The ideal candidate will have expertise in designing scalable data services optimized for HPC or distributed workflows, ensuring low latency and high throughput.
Key Responsibilities:
- Design and implement the data service module using HDF5 for efficient data storage and retrieval.
- Develop parallel and concurrent I/O mechanisms to optimize performance for large-scale datasets.
- Ensure the module is tightly integrated with HPC and visualization workflows.
- Optimize I/O operations for CPU/GPU-based workflows to minimize bottlenecks.
- Implement caching, compression, and other strategies to enhance performance.
- Design data structures and schemas suitable for storing 3D grid data and other simulation outputs.
- Ensure data integrity and consistency during concurrent read/write operations.
- Develop and execute test cases to validate module performance and reliability under various load conditions.
- Conduct benchmarking to ensure scalability across different hardware configurations.
- Document the architecture, APIs, and usage guidelines for the data service module.
- Provide technical support to the development and visualization teams for data integration.
Requirements:
- Bachelor’s or Master’s degree in Computer Science, Software Engineering, or related fields.
- 3+ years of experience in developing and deploying data services for HPC or similar systems.
- Proven expertise with HDF5 or similar formats in parallel I/O operations; equivalent experience with distributed systems is also acceptable.
- Programming: Strong proficiency in at least one of C++, Python, Go, or Fortran.
- HDF5 Expertise: In-depth knowledge of HDF5 APIs and advanced features like parallel HDF5.
- Parallel I/O: Experience with MPI I/O, POSIX I/O, or similar frameworks for concurrent/parallel data access.
- Performance Optimization: Skills in profiling and optimizing I/O operations for large datasets.
- Proficiency in SQL and experience with any RDBMS.
- A plus: knowledge of at least one workflow orchestration and scheduling tool, such as Airflow, Prefect, or Dagster.
- Strong problem-solving skills and ability to work in a multidisciplinary team.
- Excellent communication skills for cross-team collaboration and documentation.
Preferred Qualifications:
- Familiarity with data formats used in scientific computing, 3D visualization, and simulation workflows.
We offer:
- Flexible working format: remote, office-based, or a mix of both
- A competitive salary and good compensation package
- Personalized career growth
- Professional development tools (mentorship program, tech talks and training sessions, centers of excellence, and more)
- Active tech communities with regular knowledge sharing
- Education reimbursement
- Memorable anniversary presents
- Corporate events and team buildings
- Other location-specific benefits