Site Reliability Engineer - Cloud Platform (Remote 🇮🇪)
Dublin, Ireland
R&D – Cloud Platform
This role's primary accountability is designing, implementing, and operating Couchbase’s Cloud platforms. Golang knowledge is a huge plus! The team operates with a “run what you write” philosophy: each engineer is responsible for deploying and operating the code they write.
A successful candidate must have demonstrable experience in at least one programming language (preferably Go) and previous experience in SaaS application development and operations. You will work closely with the Support and Development teams on the architecture and configuration of our AWS-hosted infrastructure. You will be responsible for ensuring the environment is built, deployed, configured, managed, and monitored correctly to support the business. You will drive decisions on the correct usage of cloud resources, troubleshoot performance issues, and ensure the highest level of reliability for the platform by tuning the environment for maximum scalability, cost efficiency, and security. Candidates must have experience developing and maintaining applications running on large public cloud platforms, ideally AWS, Azure, or GCP.
This role is also open to remote work (USA, UK, India & Canada) as our teams are globally distributed. We are a remote-first team. Prior experience working remotely is not required; however, we are looking for team members who perform well with a high level of independence and autonomy and who will establish a cadence of on-time delivery with high-quality work.
- Design, deploy, and maintain a large-scale cloud platform with a focus on the key pillars of the cloud: Reliability, Operational Excellence, Security, Performance, and Cost Optimization
- Own best-practice use of our cloud ecosystem, from the cloud infrastructure through to the use of our application
- Passionate about automating everything and proficient in at least one of the following languages (Golang, Python, Ruby)
- Understand why using infrastructure as code to efficiently provision infrastructure and services is the only way to build and maintain a large-scale cloud platform
- Develop comprehensive monitoring solutions that provide full visibility into the different platform components, using tools and services such as Kubernetes, Prometheus, Grafana, ELK, Datadog, New Relic, and similar
- Experience working within an Agile/Scrum SDLC
- Integrate different components and develop new services with a focus on open source, allowing low-friction developer interaction with the platform and application services
- Identify and troubleshoot availability and performance issues at multiple layers of the deployment, including hardware, operating environment, network, and application
- Evaluate performance trends and expected changes in demand and capacity, and establish the appropriate scalability plans
- Troubleshoot and solve customer issues on production deployments
- Ensure that SLAs are met when executing operational tasks
- Collaborate with other engineers to implement operational solutions while defining and adhering to industry best practices
- Experience building and managing virtualized systems (KVM, OVM, containers/Docker), and the ability to read and understand source code
- Systematic problem-solving approach, combined with a strong sense of ownership and drive
- Conduct periodic on-call duties
- Working knowledge of information security issues
- Working knowledge of web and network protocols and standards (HTTP, TLS, DNS, etc.)
- 5+ years of related professional experience
- 2 to 5 years as a cloud administrator supporting enterprise computing platforms and systems
- Public cloud provider certifications are great to have
- Strong experience with Infrastructure as Code and Configuration Management tools, preferably Terraform
- Demonstrable experience promoting the correct use of cloud platforms with multiple layers of abstraction and responsibility
- Experience with Prometheus/Grafana for metrics aggregation/visualization
- Experience configuring CI/CD pipelines, preferably with Spinnaker
- Experience using Kubernetes
- Experience with automation tools/platforms
- Experience with alerting and monitoring tools
- Experience working with NoSQL databases is a plus
- Experience working in a highly distributed company is a plus
- Experience writing backend applications is not required but definitely a plus
- Align a portion of your day with the business hours of the Pacific Time Zone (UTC-8)
What does success in this role look like?
- In three months, you have taken ownership, as cloud administrator, of overall site availability, security, latency, system health, customer accounts, and billing. You’ll have taken on independent code review responsibilities and are collaborating on the design of new features
- In six months, you have earned the trust of the team, are delivering tasks through the entire SDLC, from design through development, with minimal guidance, and are effectively mentoring new engineers joining the team
- In twelve months, you have established a cadence of predictable, on-time delivery without cutting corners
Couchbase's mission is to be the platform that accelerates application innovation. To make this possible, Couchbase created an enterprise-class, multi-cloud NoSQL database architected on top of an open source foundation. Couchbase is the only database that combines the best of NoSQL with the power and familiarity of SQL, all in a single, elegant platform spanning from any cloud to the edge.
Couchbase has become pervasive in our everyday lives; our customers include industry leaders Amadeus, AT&T, BD (Becton, Dickinson and Company), Carrefour, Comcast, Disney, DreamWorks Animation, eBay, Marriott, Neiman Marcus, Tesco, Tommy Hilfiger, United, Verizon, Wells Fargo, as well as hundreds of other household names.
Couchbase’s HQ is conveniently located in Santa Clara, CA with additional offices throughout the globe. We’re committed to a work environment where you can be happy and thrive, in and out of the office.
At Couchbase, you’ll get:
* A fantastic culture
* A focused, energetic team with aligned goals
* True collaboration with everyone playing their positions
* Great market opportunity and growth potential
* Time off when you need it
* Regular team lunches and fully stocked kitchens
* Open, collaborative spaces
* Competitive benefits and pre-tax commuter perks
Whether you’re a new grad or a proven expert, you’ll have the opportunity to learn new skills, grow your career, and work with the smartest, most passionate people in the industry.
Revolutionizing an industry requires a top-notch team. Become a part of ours today. Bring your big ideas and we'll take on the next great challenge together.
Want to learn more? Check out our blog: https://blog.couchbase.com/
Couchbase is proud to be an equal opportunity workplace. Individuals seeking employment at Couchbase are considered without regard to age, ancestry, color, gender (including pregnancy, childbirth, or related medical conditions), gender identity or expression, genetic information, marital status, medical condition, mental or physical disability, national origin, protected family care or medical leave status, race, religion (including beliefs and practices or the absence thereof), sexual orientation, military or veteran status, or any other characteristic protected by federal, state, or local laws.