High Performance Computing (HPC) Engineer
Palo Alto, CA
Engineering / Full Time / On-site
Submit your application
Resume/CV ✱
Full name ✱
Email ✱
Phone ✱
Current location ✱
Current company ✱
Links
LinkedIn URL ✱
GitHub URL
Portfolio URL
Twitter URL
Genbio
Work Permit
Are you authorized to work in the country where this position is based?
Yes
No
If you do not have work authorization, will you require sponsorship?
Yes
No
Relocation
Are you willing to relocate if you do not live close to the office for this position?
Yes
No
How soon can you relocate to the office area?
Within 1 month after the offer is signed
1-3 months after the offer is signed
3-6 months after the offer is signed
Short Answer
Briefly describe the most relevant project you have worked on. Be sure to outline your specific contributions.
HPC
What is your experience level with managing and optimizing GPU clusters for large-scale ML workloads?
Advanced – I have independently deployed and managed GPU clusters, including installation, resource scheduling (e.g., SLURM), monitoring, and performance tuning for distributed ML workloads.
Intermediate – I’ve helped configure or maintain GPU clusters and can monitor and troubleshoot jobs, but haven’t independently built or optimized clusters.
Beginner – I’ve run jobs on existing clusters (e.g., via SLURM or cloud platforms) but have not configured or managed them.
No experience – I have not worked with GPU clusters.
What is your experience level with distributed deep learning and parallel training of large models (e.g., with PyTorch, DeepSpeed, Megatron-LM)?
Advanced – I have implemented distributed training pipelines across multiple nodes/GPUs using frameworks like DeepSpeed, FSDP, or Megatron-LM, and have tuned synchronization strategies and batch scheduling for scale.
Intermediate – I’ve run or adapted distributed training scripts using tools like PyTorch DDP or HuggingFace Accelerate but didn’t build or optimize them myself.
Beginner – I’ve used standard training scripts or single-GPU setups, but have not worked on parallel or multi-node training.
No experience – I have not worked on distributed model training.
What is your experience level with resource scheduling and containerized orchestration (e.g., SLURM, Kubernetes) for ML or HPC environments?
Advanced – I have designed and managed resource scheduling or orchestration workflows using SLURM, Kubernetes, or similar for large-scale ML/HPC workloads, including autoscaling and multi-user environments.
Intermediate – I have worked with job schedulers or Kubernetes in preconfigured environments but did not configure them myself.
Beginner – I have basic familiarity or followed tutorials using schedulers or containers for small-scale experiments.
No experience – I have not worked with scheduling or container orchestration systems.
Additional information