Machine Learning Engineer
Las Vegas, Nevada
Engineering / Full Time / Hybrid
At TensorWave, we’re leading the charge in AI compute, building a versatile cloud platform that’s driving the next generation of AI innovation. We’re focused on creating a foundation that empowers cutting-edge advancements in intelligent computing, pushing the boundaries of what’s possible in the AI landscape.
Job Description:
TensorWave is seeking a driven Machine Learning Engineer with expertise in training and fine-tuning, broad knowledge of open-source AI libraries, exposure to kernel development, and a passion for pushing the boundaries of GPU acceleration. In this pivotal role, you will empower our customers by supporting cutting-edge tools and techniques for fine-tuning and training deep learning models on AMD GPUs. Your work will directly contribute to the growth of the ROCm ecosystem and the advancement of PyTorch on AMD hardware, enabling users to harness the full potential of our AI cloud services.
Responsibilities
- Contribute to open-source deep learning libraries like PyTorch, advocating for and implementing ROCm support and enhancements.
- Develop in-house frameworks and tools that simplify and streamline model fine-tuning and training for our customers.
- Debug and identify compatibility issues with libraries, and collaborate with internal and external teams to resolve them.
- Design and develop optimization strategies to accelerate fine-tuning and training of deep learning models on AMD GPUs.
- Conduct in-depth research and performance analysis to identify and address bottlenecks in the AMD GPU acceleration pipeline.
- Stay at the forefront of advancements in deep learning, GPU acceleration, and model optimization techniques, particularly those related to ROCm and AMD hardware.
Essential Skills & Qualifications
- Bachelor's degree, or equivalent, in Computer Science, Artificial Intelligence, or a related field.
- 3+ years of hands-on experience training and fine-tuning deep learning models with PyTorch.
- Strong understanding of GPU architecture, memory management, and optimization techniques.
- Proficiency in Python and C/C++ for implementing high-performance deep learning models.
- Extensive experience with LLM / transformer architectures and a deep understanding of their internals.
- Experience with GPU kernel development (CUDA or ROCm) for deep learning applications.
- Excellent communication and collaboration skills, with the ability to effectively engage with both technical and non-technical audiences.
Preferred Qualifications
- Experience with Triton or other model deployment frameworks.
- Experience with distributed training across GPU clusters.
- Familiarity with networking protocols and technologies, especially in the context of HPC and AI.
- Contributions to open-source deep learning projects, particularly those focused on training / fine-tuning / optimizing LLMs.
- Familiarity with Python profiling and benchmarking tools.
- Familiarity with ROCm and its ecosystem, including libraries like hipBLAS and MIOpen.
- Experience with debugging tools such as GDB, along with benchmarking, profiling, and stack trace analysis.
We’re looking for resilient, adaptable people to join our team: folks who enjoy collaborating and tackling tough challenges. We’re all about offering real opportunities for growth, letting you dive into complex problems and make a meaningful impact through creative solutions. If you're a driven contributor, we encourage you to explore opportunities to make an impact at TensorWave. Join us as we redefine the possibilities of intelligent computing.
What We Bring:
In addition to a competitive salary, we offer a variety of benefits to support your needs, including:
- Stock Options
- 100% paid Medical, Dental, and Vision insurance
- Life and Voluntary Supplemental Insurance
- Short Term Disability Insurance
- Flexible Spending Account
- 401(k)
- Flexible PTO
- Paid Holidays
- Parental Leave
- Mental Health Benefits through Spring Health