Site Reliability Engineer - Remote (North America)

Engineering / Full-Time / Remote (North America)

As a Site Reliability Engineer, you will be responsible for maintaining high availability of production and non-production environments. You will also automate the manual tasks involved in developing and deploying code and data, and implement continuous integration and continuous deployment frameworks.

Responsibilities

Our product is a cloud-hosted workspace application for business intelligence that enables users and organizations to create, edit, and share data tables and dashboards powered by our Druid database engine.

As such, your responsibilities would include:

Creating state-of-the-art technical architectures with automation to make complex product delivery easy (or at least easier)
Setting up monitoring systems that cover the “three pillars of observability”: metrics, tracing, and logs
Implementing cloud architecture in GCP, AWS, and Azure
Maintaining and managing the whole deployment stack

Qualifications

Maintaining a streaming analytics pipeline service requires an impossible breadth of knowledge and experience. No individual will have all of it, so the successful candidate will be expected to learn some of it on the job.

Recognizing this, here are some of the key qualifications we seek in a successful candidate for this role:

5+ years of experience, ideally for an Enterprise SaaS company in the infrastructure, analytics, and/or data space
The ability to think through requirements to determine high-impact solutions to problems
Knowledge of cloud infrastructure (AWS, GCP, Azure) and cloud data warehouses (BigQuery and Snowflake), sufficient to deliver end-to-end cloud infrastructure engagements that include assessment, design, deployment, and migration
Experience building and maintaining high-scale distributed systems in a service-oriented architecture, ideally with tools such as GCP, Pulumi, Kubernetes, and Docker
Experience working with infrastructure technologies such as Kubernetes, Terraform, and Helm
Experience with continuous integration/continuous delivery systems using tools such as GitHub Actions, Travis, Argo, or Jenkins
Experience with one or more general-purpose programming languages such as Python or Golang
Experience setting up observability and monitoring tools such as Datadog, Prometheus, and Grafana
Hands-on experience hardening infrastructure for security, performance, compliance, and regulatory requirements

About Rill

Rill makes it easy to create and consume metrics by combining a SQL-based data modeler, real-time database, and metrics dashboard into a single product: a simple alternative to complex BI stacks. Our thousands of users love Rill for the "magical" experience of our real-time, interactive (and easy-to-use!) dashboards. Founded at the start of Covid, Rill is a remote-first company that values human connection. Our team is truly global, with co-founders in the Bay Area and India and team members spread across the US, Europe, and Asia.

We believe that having a team of diverse backgrounds and voices working together will enable us to create innovative products that improve the way people live and communicate. We are proud to be an equal opportunity employer, and committed to providing employment opportunities regardless of race, religious creed, color, national origin, ancestry, physical disability, mental disability, medical condition, genetic information, marital status, sex, gender, gender identity, gender expression, pregnancy, childbirth and breastfeeding, age, sexual orientation, military or veteran status, or any other protected classification, in accordance with applicable federal, state, and local laws. If you have a disability or special need that requires accommodation, please let us know.
