Senior DevOps Engineer

São Paulo, SP
Software Development – Infrastructure Engineering / Full-time / Remote
Engineering at TRACTIAN

The Engineering team at TRACTIAN builds and operates the cloud-native backbone that powers our industrial IoT platform. We design for massive scale, high reliability, and security across AWS, Azure AKS, and Oracle Cloud (OCI) Kubernetes clusters.

What you'll do

- Own end-to-end delivery pipelines—from GitHub commit to production—running on GitHub Actions, ECS Fargate, AKS, and OCI Kubernetes.
- Evolve our multi-cloud, multi-cluster architecture (AWS + OCI) with zero-trust networking.
- Write and maintain IaC (Terraform + Terragrunt), Helm charts, and Kubernetes operators to automate everything.
- Optimize observability: build dashboards and alerts using the Grafana OSS stack, Prometheus, Loki, Tempo, and Datadog.
- Troubleshoot complex incidents involving microservices, monoliths in containers, and AI workloads on GPU nodes.
- Improve security posture—harden images, manage secrets, enforce policies, and audit compliance.
- Mentor other engineers on DevOps best practices and drive continuous improvement.

Responsibilities

    • Apply DevOps practices to increase deployment speed, security, and quality.
    • Architect and run CI/CD workflows in GitHub Actions (matrix builds, reusable workflows, OIDC federation).
    • Design, build, and maintain Terraform/Terragrunt modules for VPCs, subnets, security groups, site-to-site VPNs, and private links.
    • Manage container orchestration on ECS Fargate and Kubernetes (AWS & OCI) with Helm and KEDA.
    • Implement autoscaling, blue-green and canary releases, and cost optimization for GPU and CPU workloads.
    • Diagnose performance bottlenecks across network, compute, storage, and application layers.
    • Maintain high-quality documentation.

Requirements

    • B.S. in Computer Engineering, Information Systems, or equivalent experience.
    • Strong scripting skills (Python, Bash); Go or Rust a plus.
    • Hands-on CI/CD with GitHub Actions and experience running production workloads on:
        • AWS: ECS Fargate, S3, RDS, CloudWatch, VPC networking.
        • Kubernetes: OCI OKE, Helm, Istio, KEDA.
    • IaC expertise with Terraform and Terragrunt in multi-account/multi-cloud setups.
    • Solid networking foundations: VPC design, subnets, routing, VPN/IPSec tunnels, security groups, load balancers.
    • Observability stack experience (Grafana, Prometheus, Loki, Tempo, Datadog).
    • Familiarity with container security, SBOMs, image scanning, secret management, and least-privilege IAM.
    • Excellent problem-solving skills, ownership mindset, and ability to work autonomously within a distributed team.