Data Parallel Accelerator Post-Silicon Performance Lead
Santa Clara, CA; Austin, TX
Engineering – Silicon Engineering / Full-time / Hybrid
Join a well-funded, innovative hardware startup in Silicon Valley as the Post-Silicon and Emulation Performance Lead Engineer.
About the Role: As a key technical leader, you will drive silicon performance analysis and optimization across software, firmware, architecture, power, and system design. Your work will ensure our silicon consistently achieves industry-leading efficiency and performance standards. This role offers a rare opportunity to shape future architectural directions by executing and analyzing end-to-end workloads in advanced post-silicon environments. You will champion best-in-class performance for both single-socket and scale-up/scale-out systems.
Our Mission: We are reimagining silicon to build accelerated computing platforms that will transform the industry. You’ll collaborate with some of the world’s most talented engineers to push boundaries in performance, energy efficiency, programmability, and scalability. Our environment encourages exploration across the full hardware-software stack, from ISA design and compiler optimization to RTL correlation, verification, and power/area analysis. We offer a creative, collaborative, and flexible workplace where you can contribute to our vision of hardware-software co-design and continually expand your expertise.
Key Responsibilities
- Lead cross-functional performance validation: Analyze workloads and microbenchmarks in emulation and post-silicon environments, ensuring strong correlation with cycle-accurate models and RTL
- System-level performance optimization: Measure and tune workloads (Generative AI, data analytics) for optimal performance per watt
- Collaborate across teams: Work closely with design, architecture, systems, and software groups to enable enterprise use-case performance measurements
- Power and performance correlation: Integrate silicon power measurements with simulation and full-chip projections to drive hardware/software tuning
- Performance infrastructure automation: Develop and automate tools for performance measurement, debug, and reporting
- Debug and tuning: Conduct system-level power and performance debugging, including silicon register tuning to meet aggressive performance targets
- Drive innovation: Influence architectural decisions and validation methodologies to ensure our platforms remain at the forefront of the industry
Required Qualifications
- Deep expertise in GPGPU architecture and microarchitecture
- Strong programming skills in C/C++ and Python
- Solid understanding of ML/DL workloads and benchmarks; experience optimizing LLMs at the system level is a significant plus
- Familiarity with SIMT processing, cache, and memory hierarchies
- Hands-on experience with performance counters and profiling techniques
- Knowledge of performance improvement concepts: bottleneck analysis, latency hiding, speculative execution, resource scheduling, buffer sizing, replacement policies
- Experience with embedded systems (bare-metal testing/debugging is a plus)
- Excellent teamwork, ownership, and communication skills; ability to thrive under aggressive schedules and adapt quickly
Education and Experience
- Bachelor’s degree with 12+ years of experience in a relevant field, or
- Master’s degree with 10+ years of experience in a relevant field, or
- PhD with 5+ years of experience in a relevant field
Why Join Us?
- Shape the future of silicon and accelerated computing
- Work alongside world-class engineers
- Explore research and engineering across the hardware/software stack
- Flexible, creative, and collaborative environment
- Opportunity to drive architectural innovation and industry impact
If you are passionate about performance leadership in advanced silicon systems and eager to make a mark on the next generation of computing, we want to hear from you.