Senior Site Reliability Engineer

LATAM
TECH – Production Engineering / Full time / Remote
As a Senior Site Reliability Engineer at Catena, you will play a crucial role in maintaining optimal system performance and upholding high standards of availability, security, and resilience. Working at the intersection of software development and operations, you will collaborate closely with cross-functional teams to deliver high-quality services to our visitors.

YOUR CHALLENGE:

    • Proactively monitor system health, performance, and reliability metrics.
    • Design, implement, and maintain automation tools and infrastructure to streamline operations tasks.
    • Conduct capacity planning and scalability assessments to accommodate growing demands.
    • Collaborate with software development teams to improve system reliability, performance, and efficiency.
    • Participate in incident response activities, diagnosing and resolving issues to minimize downtime and service disruptions.
    • Conduct post-incident reviews and implement recommendations to prevent recurrence.
    • Contribute to the evolution of best practices and standards for reliability engineering within the organization.
    • Stay abreast of industry trends and emerging technologies to drive continuous improvement in our systems and processes.

TO DO IT, YOU WILL NEED:

    • Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent work experience).
    • 4+ years of experience in a similar role, with a proven track record of improving system reliability and performance.
    • Proficiency in Linux, NGINX, PHP, Git, Docker, and containerized applications.
    • Hands-on experience with CI/CD pipelines and configuring monitoring tools like Datadog.
    • Strong scripting skills in Bash and/or Python.
    • Knowledge of databases such as MySQL and MongoDB.
    • Familiarity with edge and cloud computing services (e.g., Cloudflare, AWS).
    • Ability to prioritize tasks and thrive under pressure.
    • Exceptional stakeholder management skills in both technical and non-technical environments.
    • Proficiency in spoken and written English, and strong interpersonal skills.
    • Excellent team player, able to collaborate effectively with colleagues located in different regions of the globe.