Senior Site Reliability Engineer (Heretic Stealth PortCo)

San Francisco / Stealth Mode Portfolio Company / Full-Time / Hybrid

Overview of Role

Heretic Ventures is seeking an experienced Site Reliability Engineer to join an early-stage generative AI business that the studio is launching.

The ideal candidate has built and operationalized cloud infrastructure from the ground up, including monitoring, alerting, and deployment orchestration. You are entrepreneurial and adapt to a fast-changing environment with limited time and resources. You get excited about designing and taking full ownership of our SaaS architecture. In addition, you have worked with complex systems at scale. You understand how to plan for growth and traffic patterns, and you know how to implement the right safety checks to mitigate the unexpected. Working with both web app and AI engineering teams, you will help define and implement the engineering tooling and processes needed to ensure our platform is performant, stable, and scalable.

This is a unique opportunity to help build a billion-dollar company from the ground up while learning from successful repeat entrepreneurs and a team of powerful and experienced mentors and advisors. 

This is a hybrid role with the expectation of partial in-person work in our sunny Presidio, SF office. The position is compensated with salary, benefits, and equity.

About Heretic 

Heretic Ventures is a San Francisco-based venture studio ideating and launching new businesses in the creator economy, including those that capitalize on AI/ML technology. Heretic is run by Managing Partner Mariam Naficy, who founded and built the pioneering internet companies Minted and Eve.com. Heretic is backed by household names in Silicon Valley (investors and entrepreneurs), who serve as the studio’s advisors, helping both to select companies and to guide them.

Responsibilities

    • Build and extend tooling for end-to-end ML model deployment and lifecycle management 
    • Set up, configure, and connect cloud infrastructure services to serve as the foundation of our platform
    • Automate deployment orchestration, building a fast and maintainable CI/CD pipeline for our web applications
    • Set up real-time monitoring and alerting for all parts of the web platform, enabling engineering teams to respond quickly to incidents
    • Build and maintain an analytics pipeline, connecting data sources to the data warehouse, then the data warehouse to the reporting platform and back to model training
    • Collaborate with cross-functional teams to deploy and maintain AI models in production environments, ensuring scalability, reliability, efficiency and robustness
    • Orchestrate model serving to accommodate our unique infrastructure in a scalable manner
    • Configure and maintain Kubernetes clusters on Ubuntu
    • Maintain backend planning and continuously optimize GPU capacity

Qualifications

    • Bachelor's or Master's degree in Computer Science, a related field, or equivalent work experience
    • 5+ years of professional experience as a DevOps, TechOps, or SRE engineer
    • Extensive experience with setting up IaaS cloud platforms (GCP preferred)
    • Experience scaling infrastructure for consumer-facing web applications
    • Proven experience working with and scaling GPUs
    • Proficiency in containerization technologies, especially Docker and Kubernetes
    • Proficiency in Python, including writing scripts to automate pipelines and processes
    • Extensive Linux troubleshooting experience
    • Excellent problem-solving and analytical thinking skills, with a strong attention to detail
    • Effective verbal and written communication skills
    • Comfortable working in a dynamic, fast-paced, and collaborative environment

Nice to Haves

    • Marketplace and/or e-commerce experience
    • Experience deploying AI models in cloud-based environments (diffusion models preferred)
    • Experience managing Triton Inference Server deployments
    • Experience with popular machine learning libraries (e.g., TensorFlow, PyTorch, Spark)

Applicants must be authorized to work for ANY employer in the U.S. We are unable to sponsor or take over sponsorship of an employment visa at this time.