Site Reliability Engineer

San Francisco / Engineering / Full-time / Hybrid
Ready to shape the future of AI infrastructure and build systems that power the most advanced unstructured data pipelines in the world?
At Unstructured, we’re building the backbone of generative AI—enabling companies to transform PDFs, HTML, Word docs, images, and more into high-performance data pipelines that scale. Our tools are already used by half of the Fortune 500, and our open-source package has been downloaded 26+ million times. Now we’re entering our next chapter—and we’re hiring a Site Reliability Engineer to help scale our systems and safeguard our infrastructure.

If you’re energized by reliability, love solving infrastructure challenges at scale, and want to help define how modern AI systems run in production, this is your moment. You’ll work closely with Engineering, Product, and Customer teams to build scalable systems, streamline CI/CD, and make reliability a first-class citizen across everything we deploy.

🏢 This role is hybrid in San Francisco: join us in the office three days a week for deep collaboration, whiteboard sessions, and hands-on impact.

🔧 What You’ll Own & Drive

🛠 Scale & Stability at the Core
Design and implement highly available, observable, and scalable infrastructure across cloud environments
Build resilient systems that meet the demands of enterprise-grade, production AI workloads

⚙️ Automate Everything
Develop Infrastructure-as-Code using Terraform, Pulumi, and similar tools
Own CI/CD automation and build reusable pipelines with GitHub Actions and modern DevOps tooling

🚀 Own Kubernetes & Orchestration
Manage and optimize our Kubernetes clusters and containerized environments
Tune Helm charts, service mesh configs, and orchestration systems for performance and security

📊 Obsess Over Observability
Implement and maintain monitoring, logging, and alerting with tools like Prometheus, Grafana, Datadog, and Elastic
Ensure we can see, understand, and respond to system behavior in real time

🧪 Drive Production Readiness
Partner with engineering to prepare features and systems for production rollouts
Contribute to capacity planning, deployment strategies, and fault-tolerant system design

🔥 Lead Incident Response
Support and lead incident response processes, postmortems, and root cause analysis
Champion a culture of blameless retrospectives and continuous improvement

💻 Accelerate Engineering Velocity
Improve developer experience through tooling, automation, and streamlined feedback loops
Help teams move faster without sacrificing quality or uptime

🧬 What You Bring
- 4+ years in SRE, DevOps, or Infrastructure Engineering roles supporting high-scale production environments
- Deep experience with cloud platforms like AWS, GCP, or Azure
- Expertise in Kubernetes, Docker, and container orchestration at scale
- Strong Linux systems and networking fundamentals
- Scripting and automation skills (Python, Bash, or Go preferred)
- Proficiency with Infrastructure-as-Code (Terraform, Pulumi, Ansible, or similar)
- Solid understanding of monitoring and observability best practices
- A calm, systems-thinking approach to incident response and reliability

💎 Bonus Points
- Experience supporting ML infrastructure or real-time data pipelines
- Exposure to serverless or event-driven architectures
- Contributions to open-source DevOps projects or communities
- Familiarity with security and compliance in cloud-native environments

🌟 Why You’ll Love It Here
Impact That Matters: Own the core infrastructure behind AI systems used by the Fortune 500
Big Technical Challenges: Solve hard, meaningful problems at the cutting edge of cloud and data
Elite Team: Join a sharp, humble group of engineers who value execution and impact
SF Office Vibes: Collaborate live with real whiteboards and real humans (not just Slack threads)
Flexible Culture: Hybrid structure with async-friendly, low-ego collaboration
$190,000 - $250,000 a year
This role's salary is benchmarked against San Francisco market rates so that it stays competitive for top-tier talent in high-cost-of-living regions. Final compensation may vary based on experience, skill set, and location.