Exciting Opportunity at MSK: HPC Engineer
Research Technology Services, High-Performance Computing (HPC), is seeking an experienced Research Solution Engineer to complement our growing HPC team. You would be supporting Memorial Sloan Kettering Cancer Center as it strives to achieve its singular mission: ending cancer for life. In this role, you will join a team of experienced engineers, architects, and application specialists working toward this challenging goal. You will have access to state-of-the-art equipment and support from world-leading researchers. Under the supervision of the Senior Director of Research Technology, the HPC Engineer will provide support for a complex multi-datacenter high-performance computing system.
Position Overview:
- Collaborate with world-class researchers to support their computing needs.
- Gain an in-depth understanding of the workflows of individual research groups.
- Work closely with HPC architects and engineers to ensure that research needs are met.
- Create documentation and provide HPC training for researchers.
- Assist HPC engineers and architects with day-to-day operations and ticket management.
Key Responsibilities:
- Design and support scalable and fault-tolerant HPC systems, including network design and resource allocation.
- Leverage accelerators like GPUs to optimize HPC workloads.
- Perform hands-on system administration for HPC environments.
- Support and troubleshoot job scheduling and data management within the HPC environment.
- Compile, install, and debug scientific applications tailored for research needs.
- Develop and deliver training materials and sessions to educate researchers on HPC resources.
Key Qualifications:
- Proven experience in an HPC environment, including job scheduling, data management, and system administration.
- Ability to work on the infrastructure side of HPC systems, including designing scalable solutions and resource allocation.
- Hands-on experience using accelerators such as GPUs to speed up HPC workloads.
- Proficiency in scripting/programming languages such as Bash and Python.
- Understanding of scientific workflows and life sciences research.
Core Skills:
- Background in designing HPC systems with fault-tolerant network architecture.
- Experience in life sciences research or supporting computational biology workflows.
- Knowledge of data transfer protocols and large-scale storage solutions.
- Proven ability to collaborate effectively with researchers and other stakeholders.
Additional Information:
Schedule: 2 days a week on-site, 3 days remote. Monday – Friday, 9:00 AM – 5:00 PM EST
Location: Zuckerman Research Center, NYC
Reporting to: Associate Director, Infrastructure
Pay Range: $121,400.00-$200,400.00