Software Engineer, DevOps / Reliability

San Francisco / Alchemy / Full-Time
At Alchemy, we're building the fundamental developer platform to bring the benefits of blockchain to the world. Today, Alchemy, the leading blockchain developer platform, powers over $7.5 billion in transactions for millions of people in 99% of countries globally. Our mission is to provide developers with the fundamental building blocks they need to create the future of technology.

The Alchemy team draws from decades of deep expertise in massively scalable infrastructure, AI, and blockchain from leading institutions including Google, Microsoft, Facebook, Stanford, and MIT.

Backed by Stanford University, Coinbase, the Chairman of Google, Charles Schwab, and the executives of global organizations, Alchemy has been featured in Bloomberg, TechCrunch, Wired, and numerous other media outlets.

We're growing rapidly and hiring across many roles, so please reach out and say hi if you're interested in joining our awesome family and mission!

The Opportunity

As an engineer focused on DevOps and reliability at Alchemy, you'll work with the wider engineering team on the design, deployment, and continuous improvement of the infrastructure that supports our globally used developer platform. You'll focus on improving developer productivity and product reliability as our product and team scale.

Responsibilities

    • Maintain a dual focus on developer productivity and product reliability
    • Improve important infrastructure and systems from an operational standpoint (e.g. deployment, logging, monitoring, alerting)
    • Develop and own best practices for managing production infrastructure: provisioning, application scaling, configuration management, capacity planning, monitoring, etc.
    • Develop and own best practices for developer processes: CI/CD, dev and staging environments, etc.
    • Provide input into long-term platform requirements and operational guidelines with a focus on reliability
    • Continuously raise our standard of engineering excellence by implementing best practices for coding, testing, and deployment
    • Build and maintain documentation around process and workflows

What We're Looking For

    • 4-8 years of experience as a DevOps or Site Reliability Engineer
    • Experience designing and operating large-scale, multi-region production systems
    • Experience working with AWS and cloud infrastructures in general
    • Experience building deployment pipelines leveraging common CI/CD tools
    • Experience with Infrastructure-as-Code and configuration management tools (e.g. Terraform, CloudFormation, Chef, Puppet)
    • Experience configuring and managing VPC networks
    • Experience with container schedulers and runtimes such as Docker and Kubernetes
    • Experience with MySQL and Redis
    • An understanding of security best practices
    • (Preferred) Experience with TypeScript and Python
    • (Preferred) Experience with streaming infrastructure (e.g. Kinesis, Kafka)
    • (Preferred) Experience with real-time telemetry and tracing tools like Prometheus and Datadog
    • (Preferred) Good understanding of web applications and microservice architecture