Senior Site Reliability Engineer (SRE) / Dynatrace - GP

Remote, Colombia / Remote, Costa Rica
Practice – Cloud Engineering
Full-Time / Salary
Remote
Gorilla Logic provides nearshore Agile teams to Fortune 500 companies and SMBs, bringing unparalleled expertise in the delivery of full-stack web, mobile, and enterprise applications. Our highly collaborative Agile Gorillas are uniquely qualified to implement complex software initiatives. With offices in the United States, Costa Rica, Colombia, and Mexico, Gorilla Logic helps clients gain competitive advantages and achieve results faster.

Senior Site Reliability Engineer (SRE)

Gorilla Logic is looking for a Senior Site Reliability Engineer (SRE) with deep expertise in observability and monitoring systems, including Dynatrace, to lead the design, implementation, and governance of monitoring frameworks. This role is focused on enabling product and operations teams through scalable observability tooling, enforcing best practices, and driving the migration of applications' monitoring components from on-premises environments to SaaS Dynatrace. You will act as a strategic partner to internal teams and a subject matter expert in monitoring strategy and execution.

Responsibilities

* Serve as the primary technical lead for the design and delivery of monitoring stacks tailored to team-specific requirements.
* Meet with Product Owners and Operations/Security teams to understand observability needs and translate them into reusable monitoring patterns.
* Develop initial dashboards and alerting frameworks, enabling teams to customize and maintain them going forward.
* Provide governance by ensuring engineering teams define and maintain accurate SLOs and SLIs.
* Act as the monitoring liaison for assigned teams, promoting a self-service observability culture.
* Lead the migration of over 200 applications' observability stacks from on-premises to SaaS Dynatrace, with a goal of completion within six months.
* Use Infrastructure as Code (IaC) tools like Terraform to automate deployment of dashboards, alerts, and metrics configurations.
* Import and manage Terraform state from existing on-premises Dynatrace setups and re-implement them in the SaaS environment.
* Advocate for observability best practices across teams and enforce consistency in implementation.
* Ensure alignment with organizational SLAs, incident response practices, and performance optimization goals.

Technical Requirements

* Bachelor's degree in Computer Science or Engineering, or equivalent experience.
* 5+ years of experience in site reliability engineering or observability-focused roles.
* Proven hands-on experience with Dynatrace administration, including user/role management, data sources, configuration, alerting, and dashboarding.
* Demonstrated expertise in application observability migration, especially from on-premises environments to SaaS platforms (preferably Dynatrace).
* Expertise in observability concepts, including SLIs, SLOs, error budgets, and alert tuning.
* Proficiency with Terraform and other IaC tools for managing observability infrastructure.
* Experience managing large-scale observability environments, including configuration governance and multi-team enablement.
* Familiarity with Agile development methodologies and cross-functional collaboration.
* Strong communication and mentoring skills; ability to train engineers in observability principles.
* Detail-oriented with a proactive mindset for process improvement and monitoring automation.

Bonus Skills

* Experience with Kubernetes, Docker, or other container and orchestration platforms.
* Familiarity with additional monitoring or logging tools like Datadog, Sumo Logic, Prometheus, or Grafana.
* Experience with CI/CD platforms and deployment automation tools.
* Networking and security knowledge related to observability and telemetry.
* Experience with scripting or programming languages such as Python, Bash, or Go.