Data Parallel Accelerator Post-Silicon Performance Lead

Santa Clara, CA; Austin, TX

Engineering – Silicon Engineering / Full-time / Hybrid

Join a well-funded, innovative hardware startup in Silicon Valley as the Post-Silicon and Emulation Performance Lead Engineer.

About the Role: As a key technical leader, you will drive silicon performance analysis and optimization across software, firmware, architecture, power, and system design. Your work will ensure our silicon consistently achieves industry-leading efficiency and performance standards. This role offers a rare opportunity to shape future architectural directions by executing and analyzing end-to-end workloads in advanced post-silicon environments. You will champion best-in-class performance for both single-socket and scale-up/scale-out systems.

Our Mission: We are reimagining silicon to build accelerated computing platforms that will transform the industry. You’ll collaborate with some of the world’s most talented engineers to push boundaries in performance, energy efficiency, programmability, and scalability. Our environment encourages exploration across the full hardware-software stack, from ISA design and compiler optimization to RTL correlation, verification, and power/area analysis. We offer a creative, collaborative, and flexible workplace where you can contribute to our vision of hardware-software co-design and continually expand your expertise.

Key Responsibilities

Lead cross-functional performance validation: Analyze workloads and microbenchmarks in emulation and post-silicon environments, ensuring strong correlation with cycle-accurate models and RTL
System-level performance optimization: Measure and tune workloads (Generative AI, data analytics) for optimal performance per watt
Collaborate across teams: Work closely with design, architecture, systems, and software groups to enable enterprise use-case performance measurements
Power and performance correlation: Integrate silicon power measurements with simulation and full-chip projections to drive hardware/software tuning
Performance infrastructure automation: Develop and automate tools for performance measurement, debug, and reporting
Debug and tuning: Conduct system-level power and performance debugging, including silicon register tuning to meet aggressive performance targets
Drive innovation: Influence architectural decisions and validation methodologies to ensure our platforms remain at the forefront of the industry

Required Qualifications

Deep expertise in GPGPU architecture and microarchitecture
Strong programming skills in C/C++ and Python
Solid understanding of ML/DL workloads and benchmarks; experience optimizing LLMs at the system level is a significant plus
Familiarity with SIMT processing, cache, and memory hierarchies
Hands-on experience with performance counters and profiling techniques
Knowledge of performance improvement concepts: bottleneck analysis, latency hiding, speculative execution, resource scheduling, buffer sizing, replacement policies
Experience with embedded systems (bare-metal testing/debugging is a plus)
Excellent teamwork, ownership, and communication skills; ability to thrive under aggressive schedules and adapt quickly

Education and Experience

Bachelor’s degree with 12+ years of experience in a relevant field
Master’s degree with 10+ years of experience in a relevant field
PhD with 5+ years of experience in a relevant field

Why Join Us?

Shape the future of silicon and accelerated computing
Work alongside world-class engineers
Explore research and engineering across the hardware/software stack
Flexible, creative, and collaborative environment
Opportunity to drive architectural innovation and industry impact

If you are passionate about performance leadership in advanced silicon systems and eager to make a mark on the next generation of computing, we want to hear from you.
