Senior DevOps Engineer
São Paulo, SP
Software Development – Infrastructure Engineering / Full-time / Remote
Engineering at TRACTIAN
The Engineering team at TRACTIAN builds and operates the cloud-native backbone that powers our industrial IoT platform. We design for massive scale, high reliability, and security across AWS, Azure AKS, and Oracle Cloud (OCI) Kubernetes clusters.
What you'll do
- Own end-to-end delivery pipelines—from GitHub commit to production—running on GitHub Actions, ECS Fargate, AKS, and OCI Kubernetes.
- Evolve our multi-cloud, multi-cluster architecture (AWS + OCI) with zero-trust networking.
- Write and maintain IaC (Terraform + Terragrunt), Helm charts, and Kubernetes operators to automate everything.
- Improve observability: build dashboards and alerts using the Grafana OSS stack, Prometheus, Loki, Tempo, and Datadog.
- Troubleshoot complex incidents involving microservices, monoliths in containers, and AI workloads on GPU nodes.
- Improve security posture—harden images, manage secrets, enforce policies, and audit compliance.
- Mentor other engineers on DevOps best practices and drive continuous improvement.
Responsibilities
- Apply DevOps practices to increase deployment speed, security, and quality.
- Architect and run CI/CD workflows in GitHub Actions (matrix builds, reusable workflows, OIDC federation).
- Design, build, and maintain Terraform/Terragrunt modules for VPCs, subnets, security groups, site-to-site VPNs, and private links.
- Manage container orchestration on ECS Fargate and Kubernetes (AWS & OCI) with Helm and KEDA.
- Implement autoscaling, blue-green and canary releases, and cost optimization for GPU and CPU workloads.
- Diagnose performance bottlenecks across network, compute, storage, and application layers.
- Maintain high-quality documentation.
Requirements
- B.S. in Computer Engineering, Information Systems, or equivalent experience.
- Strong scripting skills (Python, Bash); Go or Rust a plus.
- Hands-on CI/CD with GitHub Actions and experience running production workloads on:
  - AWS: ECS Fargate, S3, RDS, CloudWatch, VPC networking.
  - Kubernetes: OCI OKE, Helm, Istio, KEDA.
- IaC expertise with Terraform and Terragrunt in multi-account/multi-cloud setups.
- Solid networking foundations: VPC design, subnets, routing, VPN/IPsec tunnels, security groups, load balancers.
- Observability stack experience (Grafana, Prometheus, Loki, Tempo, Datadog).
- Familiarity with container security, SBOMs, image scanning, secret management, and least-privilege IAM.
- Excellent problem-solving skills, ownership mindset, and ability to work autonomously within a distributed team.