Senior AI Site Reliability Engineer

Buenos Aires
Engineering
Contractor
Hybrid
WHO WE ARE

SQUIRE is the leading business management system designed for the needs of barbers, shop owners, and their communities. We believe the pursuit of artistry and autonomy should not be restricted by the complexities of running a business. With SQUIRE, we provide custom-branded tools, resources, and guidance to help barbers of all stages and experience levels attract and retain more customers, efficiently manage their shop operations, and increase their revenue.

Founded in 2015, SQUIRE is trusted by barbers in 4,000+ shops in more than a thousand cities around the globe. From streamlined booking and opening new shops to real-time earning dashboards and building lasting customer relationships, SQUIRE supports shop owners in seamlessly bridging the gap between their personal craft and business goals. SQUIRE enables barbers everywhere to unlock their full potential both as artists and as entrepreneurs.

For more information, please visit getsquire.com or download the SQUIRE app from the App or Play Store.
 

SUMMARY

As a Senior AI Site Reliability Engineer, you will bring an AI-first mindset to solving classic reliability challenges. You’ll design, prototype, and deploy intelligent automation that improves observability, incident response, performance tuning, and operational efficiency across SQUIRE’s platform. This role is highly cross-functional: you’ll collaborate with engineering, infrastructure, and product teams to identify where AI can create leverage, then build and scale those solutions into production.

REPORTS TO

    • Senior Director, Platform Engineering

JOB DUTIES AND RESPONSIBILITIES

    • Develop and deploy AI/ML-driven solutions for monitoring, anomaly detection, and predictive alerting to improve system reliability and reduce MTTR.
    • Use AI techniques to optimize capacity planning, autoscaling, and resource utilization across distributed systems.
    • Automate repetitive operational tasks with intelligent agents and large-scale data analysis.
    • Integrate LLMs and generative AI into incident response, post-mortem analysis, and business continuity.
    • Partner with platform and product engineering teams to embed AI-based observability into services from the ground up.
    • Continuously evaluate new AI/ML methods and tools to expand SQUIRE’s AI-driven SRE capabilities.
    • Drive a culture of experimentation: build prototypes, run pilots, measure results, and productionize what works.
    • Mentor engineers on applying AI approaches to reliability problems; help establish standards and best practices.

    • The duties and responsibilities outlined above are not a comprehensive list, and additional tasks may be assigned from time to time based on business needs.

REQUIREMENTS AND QUALIFICATIONS

    • 5+ years of experience in Site Reliability Engineering, DevOps, or related roles.
    • Proven experience using AI/ML (supervised learning, anomaly detection, LLMs, etc.) to solve operational or reliability problems.
    • Strong background in distributed systems, cloud infrastructure (AWS preferred), and container orchestration (Docker, ECS, Elastic Beanstalk).
    • Proficiency with observability stacks (Datadog, Sentry, Prometheus, etc.).
    • Solid programming/scripting skills in Python, Go, or similar, with experience integrating ML/AI libraries and APIs.
    • Hands-on experience with automation frameworks and infrastructure as code (Terraform, CloudFormation, etc.).
    • Excellent analytical and problem-solving skills, with the ability to innovate in operational domains.
    • Strong communication and collaboration skills across technical and non-technical stakeholders.
    • English proficiency is a must. It's important that you can communicate your ideas clearly, as you will be interacting with English-speaking coworkers.
    • Must be based in Buenos Aires.
    • Availability to work on-site in our office in CABA two days a week (Tuesdays and Thursdays).

NICE TO HAVE

    • Familiarity with generative AI/LLM deployment (e.g., for operational assistants, automated runbooks).
    • Experience with predictive scaling, proactive fault detection, or automated incident management systems.
    • Contributions to AIOps/MLOps tooling or open-source reliability projects.
    • Background in applying AI to security operations or compliance monitoring.

Interview Accommodations
SQUIRE is committed to working with and providing reasonable assistance to individuals with physical and mental disabilities. If you are an individual with a disability requiring an accommodation to apply for an open position, please email your request to recruiting@getsquire.com and someone on our team will respond to your request.

Equal Employment Opportunity
SQUIRE provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.

This applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training.

Pay Transparency Nondiscrimination Provision
SQUIRE will not discharge or in any other manner discriminate against employees or applicants because they have inquired about, discussed, or disclosed their own pay or the pay of another employee or applicant. However, employees who have access to the compensation information of other employees or applicants as a part of their essential job functions cannot disclose the pay of other employees or applicants to individuals who do not otherwise have access to compensation information, unless the disclosure is (a) in response to a formal complaint or charge, (b) in furtherance of an investigation, proceeding, hearing, or action, including an investigation conducted by the employer, or (c) consistent with the contractor’s legal duty to furnish information.