Sr. Site Reliability Engineer

Sevilla, Spain or Remote
Engineering
Full-time

This position is specifically a remote role or based out of our Sevilla office. Want more insight on what it's like to be a part of a distributed team at Bitnami? Take a look at this article, written by our own Director of Engineering, Victor Tuson Palau.

Bitnami is at the forefront of innovation that scales up to the largest production clouds as well as down to laptop development environments. Millions of applications are launched every month with Bitnami technologies.

Our Site Reliability Engineering (SRE) team deploys microservices to clouds using modern practices such as containers, Kubernetes and immutable infrastructure. The SRE team is responsible for the availability and performance of the production infrastructure, and partners with the other engineering teams to successfully build, deploy and manage Bitnami’s services. We are all about tools and automation, not toil and firefighting. If you are a cloud and container-savvy automation and instrumentation zealot, you should join our mission to bring awesome software to everyone.

The principles that drive how we approach SRE at Bitnami are:

- If it’s repeatable, it can be automated; quality of life matters and must not be subsumed by toil
- If it’s monitored, it can alert its owners; failures detected by humans ahead of systems are second-degree failures
- If it's backed up, it can be restored; disasters must be recoverable
- If it's measured, it can be improved; when it fails, it’s a learning opportunity for that improvement

You must bring an understanding of the IT business (typically gained by having built or worked extensively with a private or public cloud); a broad perspective of the cloud industry and where it is headed; and experience in building solutions that scale. You will be collaborating with engineers around the world to bring cutting-edge solutions to market. Working with all of the significant cloud providers and container infrastructures will provide you with challenges and opportunities rarely found elsewhere.

Responsibilities

    • Create and/or provision reliable tools and infrastructure that enable rapid iteration among the product, research and development teams
    • Automate All The Things by eating, sleeping and breathing Infrastructure as Code
    • Monitor, measure and troubleshoot infrastructure and services
    • Participate in the 24x7 follow-the-sun (US/Europe) on-call rotation to ensure service SLAs are met
    • Optimize business continuity capabilities and drive down incident recovery times
    • Plan and manage capacity

Requirements

    • At least 5 years of experience deploying, monitoring and troubleshooting multi-tier SOA applications, Rails, Node.js and distributed systems at scale
    • Software development with any or all of these programming languages: Ruby, Go, Java, JavaScript and Python
    • A passion for automated provisioning (Ansible, Puppet, Chef, etc.) and instrumentation for status and trend monitoring (CloudWatch, DataDog, Icinga, Nagios, Graphite, Kibana, etc.)
    • Experience with modern application system log management (Syslog, SumoLogic, Loggly, Splunk, etc.)
    • Highly developed cloud literacy with strong knowledge of AWS, GCE and Azure
    • Broad experience with Linux kernel and shell, TCP/IP and HTTP
    • Designing networks and systems for security, encryption, performance and agility
    • Backup and restoration automation, business continuity planning and testing

Nice to Haves

    • Database administration with MySQL replication and high availability
    • Networking and security best practices with software defined networks
    • Container orchestration with Kubernetes, Docker Swarm, and/or Mesos
    • Big data, streaming and search systems like Cassandra, Hadoop, Spark, Kafka and Elasticsearch

Benefits/Perks

    • Competitive salary and stock options
    • Flexible time off policy; we believe everyone needs to recharge
    • Sweet set-up; huge monitor and your choice of operating system and hardware
    • Annual trips to Spain (if working remotely)
    • Benefits vary based on location

Bitnami is a distributed engineering company with offices in San Francisco, USA and Seville, Spain. With team members also in Australia, Vietnam, UK, Italy, Washington DC, Uruguay and more, we've created an incredibly enjoyable and productive distributed environment.

We are bootstrapped, profitable and growing. Bitnami was also part of Y Combinator's Winter 2013 batch.

Learn more about our team and what it's like to work at Bitnami by visiting the About Us and Careers pages on our website.

Bitnami is an equal opportunity employer.