Site Reliability Engineer
R&D – DevOps
Unbabel’s “Translation as a Service” platform allows modern enterprises to understand and be understood by their customers in dozens of languages.
Powered by AI and refined by a global community of tens of thousands of human linguists, Unbabel delivers professional-grade content at the scale required by modern enterprises like Facebook, Microsoft, Under Armour, Pinterest and Expedia.
Backed by Scale Venture Partners, Notion, Microsoft Ventures, Salesforce Ventures, Samsung NEXT and Y Combinator, Unbabel is accelerating the shift to a world without language barriers.
We are a diverse team, working every day to build an outstanding organisational culture based on strong values of transparency, team spirit and continuous learning, with a fast-paced Silicon Valley atmosphere in the beautiful city of Lisbon, Portugal.
At Unbabel, all engineers participate in maintaining and operating our production services. Site Reliability Engineers are experts who help raise the bar on designing and operating highly reliable systems that take very little effort to operate and maintain.
We're looking for a software or systems engineer who sees the challenge and joy in designing, implementing and operating sharp foundational tools such as automated deployment systems, monitoring and logging services, load balancers, caching services, databases, message queue systems, deep health-checkers, security scanners, you name it.
We use AWS, Kubernetes, Terraform, Python, Ansible and a few other things.
- Participate in the on-call process and help improve incident management practices.
- Develop tools and procedures that allow engineers to ship their code faster and more reliably.
- Participate in the design and validation of complex distributed systems, focussing on their reliability and fault tolerance.
- Mentor and assist other engineers in maintaining and operating production services.
- Raise the bar on monitoring and alarming practices applied to all services of Unbabel.
- Troubleshoot and diagnose exotic issues in complex distributed systems.
- Perform deep dives and root cause analyses, and write post-mortems.
- Contribute to a culture of self-improvement and high standards of quality.
- Lead and grow the Site Reliability culture and team at Unbabel.
- Excellent communication skills and command of the English language.
- Excellent knowledge of either AWS infrastructure or Kubernetes.
- Strong software development skills in one of: Python, Ruby, Go, Java, C/C++.
- Experience in deploying and operating systems with alarming and monitoring.
- Experience operating complex distributed system deployments.
- Solid knowledge of Linux and container internals.
- MSc in Engineering or equivalent.
- We have positions open for different levels of professional experience.
- Competitive salary at one of Europe’s leading tech startups
- Stimulating startup environment committed to diversity and inclusion
- Individual budget for training and conferences
- Individual budget to set up your workstation (mechanical keyboard, mouse, etc.)
- Stock options
- Health Insurance
- MacBook and external monitor
- Yearly company retreat
- Healthy food (fruit, dairy & snacks) in the office
- English, Portuguese and Japanese language courses
- Surf trips every Thursday morning before work
- Team lunch every Tuesday
- Drinks and snacks every Friday
At Unbabel, we're now facing a new frontier of challenges as we grow quickly from a start-up to a scale-up. Join this dynamic, diverse team and help raise the bar by building reliable systems designed for failure and for larger scale.