Senior Site Reliability Engineer

Europe
Engineering – Platform /
Full-Time /
Remote
About Airalo
Alo! Airalo is the world’s first eSIM store, helping people connect in 200+ countries and regions across the globe. We are building the next digital service that revolutionizes the telecom industry. We are a travel-tech company and an equal-opportunity environment that values and practices diversity, inclusion, and equity. Our team is spread across 50+ countries and six continents. What glues us together is our commitment to changing the way you connect.

About you
We hope that you care deeply about the quality of your work, the intrinsic worth of tasks, and the success of your team. You are self-disciplined and do not require micromanagement in terms of your skillset and work ethic. You do your best to flourish as an individual every day while working hard to foster a collaborative team environment. You believe in the importance of being — and staying — authentic, honest, positive, and kind. You are a good interlocutor with clear and concise communication. You are able to manage multiple projects, have an analytical mind, pay keen attention to detail, and love to get your hands dirty. You are cognizant, tolerant, and welcoming of vulnerabilities and cultural differences.

About the Role
Position: Full-time / Employee
Location: Remote-first
Benefits: Health Insurance, work-from-anywhere stipend, annual wellness & learning credits, annual all-expenses-paid company retreat in a gorgeous destination & other benefits

We are looking for an experienced Site Reliability Engineer to join our growing engineering team. We are a company that values SRE principles and practices. We believe in empowering our SREs to make data-driven decisions, automate operational tasks, and continuously improve the reliability of our systems. We foster a blameless culture where everyone is encouraged to learn from mistakes and share knowledge. If you are passionate about building and maintaining highly reliable systems, we would love to hear from you!

Responsibilities include, but are not limited to:

    • Develop and maintain reliable, scalable, and efficient systems.
    • Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure and improve system reliability.
    • Conduct blameless post-incident reviews to identify root causes and implement preventive measures.
    • Drive automation of operational tasks and incident response.
    • Develop and maintain runbooks and playbooks for common operational tasks and incident response.
    • Mitigate operational risks.
    • Work with software engineers to design systems for reliability, scalability, and maintainability.
    • Continuously evaluate and optimize system performance, capacity, and cost.
    • Participate in on-call rotation and be available to troubleshoot and resolve critical issues.

Must-haves:

    • Bachelor’s degree in Computer Engineering or a similar discipline.
    • 5+ years of experience as a Site Reliability Engineer or in a similar role.
    • 3+ years of experience with AWS services, including strong knowledge of container orchestration.
    • 2+ years of Kubernetes experience.
    • Deep understanding of observability principles and tools (logging, monitoring, tracing).
    • Experience with incident management and postmortem analysis.
    • Experience with and interest in the infrastructure-as-code approach (Terraform).
    • Experience with chaos engineering and other techniques for testing system resilience.
    • Experience with CI/CD tools such as GitHub Actions.
    • Proficiency in at least one programming language (Python, Go, Java, etc.) for automation and tooling.
    • Comfortable with messaging systems (SNS, SQS, etc.).
    • Ability to work independently and collaboratively in a fast-paced environment.
    • Team player and open to new ideas.
    • Good communication skills and fluency in English.

Good-to-haves:

    • Prior experience with Scrum and other agile methods.
    • Certification in relevant areas such as AWS Certified DevOps Engineer, Certified Kubernetes Administrator (CKA), or similar.
    • Experience with AI-driven SRE tools for anomaly detection and reliability improvements.
    • Contributions to open-source SRE projects or communities.
    • Prior work experience in telecommunications.
    • Knowledge of eSIM and GSMA-related technologies and services.

If you are interested in this position, please apply via the link.

We sincerely thank all applicants in advance for submitting their interest in this opportunity with Airalo.