CUDA Kernels Engineer
Palo Alto, CA
Engineering /
Full Time /
On-site
Submit your application
Resume/CV
✱
Full name
✱
Email
✱
Phone
✱
Current location
✱
Current company
✱
Links
LinkedIn URL
✱
GitHub URL
Portfolio URL
Twitter URL
Genbio
Work Permit
Are you authorized to work in the country where this position is based?
Yes
No
If you don't have work authorization, will you require work authorization sponsorship?
Yes
No
Relocation
Are you willing to relocate if you do not live close to the local office?
Yes
No
How soon can you move to the area of the local office?
Within 1 month after the offer is signed
1-3 months after the offer is signed
3-6 months after the offer is signed
Short Answer
Briefly describe the most relevant project you have worked on. Be sure to outline your specific contributions.
Performance Engineering
What is your experience level with writing and optimizing GPU kernels using CUDA or similar low-level programming frameworks (e.g., Triton, OpenCL)?
Advanced – I have independently written and optimized custom CUDA or Triton kernels for performance-critical applications, and understand warp-level programming, memory hierarchy, and performance profiling.
Intermediate – I have written or modified CUDA/Triton kernels and used them in ML or HPC workflows, but optimization and debugging were supported by others.
Beginner – I’ve experimented with CUDA or similar frameworks in tutorials or coursework but haven’t deployed anything in a real system.
No experience – I have never written or optimized GPU kernels.
What is your experience level with AI accelerators or GPU/CPU hardware architecture and performance optimization?
Advanced – I deeply understand GPU/CPU architecture (e.g., memory bandwidth, SIMD, registers, cache), and have optimized software to maximize hardware utilization in ML or HPC workloads.
Intermediate – I have a solid understanding of GPU or accelerator performance characteristics and have used tools like Nsight, perf, or VTune for optimization guidance.
Beginner – I’m familiar with general GPU concepts and performance tuning ideas but haven’t optimized software for specific hardware architectures.
No experience – I haven’t worked on performance optimization with awareness of hardware internals.
What is your experience level with foundation model architectures and training infrastructure (e.g., Transformers, LLMs)?
Advanced – I have worked closely with training infrastructure and optimization for LLMs or other foundation models and understand architecture-level tradeoffs that affect training efficiency.
Intermediate – I’ve worked with Transformer-based models or training pipelines but not in a performance or systems optimization role.
Beginner – I have implemented or fine-tuned Transformer models but haven’t explored the system-level or architectural aspects.
No experience – I haven’t worked with LLMs or foundation models.
Additional information
Submit application