Senior Software Engineer, DevOps

Bengaluru
Saviynt LABS – DevOps /
Full-Time /
Hybrid
As a Senior Software DevOps Engineer, you will lead the design, implementation, and evolution of telemetry pipelines and DevOps automation that enable next-generation observability for distributed systems. You will blend a deep understanding of OpenTelemetry architecture with strong DevOps practices to build a reliable, high-performance, self-service observability platform across hybrid cloud environments (AWS and Azure). Your mission: empower engineering teams with actionable insights through rich metrics, logs, and traces, while championing automation and innovation at every layer.

WHAT YOU WILL BE DOING


    • Observability Strategy & Implementation
    • Architect and manage scalable observability solutions using OpenTelemetry (OTel), encompassing:
    • Collectors: Design and deploy OTel Collectors (agent/gateway modes) for ingesting and exporting telemetry across services.
    • Instrumentation: Guide teams on auto/manual instrumentation for services (metrics, traces, and logs).
    • Export Pipelines: Build telemetry pipelines to route data to backends like Grafana, Prometheus, Loki, New Relic, and Azure Monitor.
    • Processors & Extensions: Leverage OTel processors (batching, filtering, resource detection) and extensions for advanced enrichment and routing.

    • DevOps Automation & Platform Reliability
    • Own the CI/CD experience using GitLab Pipelines, integrating infrastructure automation with Terraform, Docker, and scripting in Bash and Python.
    • Build resilient and reusable infrastructure-as-code modules across AWS and Azure ecosystems.
    • Manage containerized workloads, registries, secrets, and secure cloud-native deployments with best practices.

    • Cloud-Native Enablement
    • Develop observability blueprints for cloud-native apps across AWS (ECS, EC2, VPC, IAM, CloudWatch) and Azure (AKS, App Services, Monitor).
    • Optimize cost and performance of telemetry pipelines while ensuring SLA/SLO adherence for observability services.

    • Monitoring, Dashboards, and Alerting
    • Build and maintain intuitive, role-based dashboards in tools such as Grafana and New Relic, enabling real-time visibility into service health, business KPIs, and SLOs.
    • Implement alerting best practices (noise reduction, deduplication, alert grouping) integrated with incident management systems.

    • Innovation & Technical Leadership
    • Drive cross-team observability initiatives that reduce MTTR and elevate engineering velocity.
    • Champion innovation projects, including self-service observability onboarding, log/metric reduction strategies, and AI-assisted root cause detection.
    • Mentor engineering teams on instrumentation, telemetry standards, and operational excellence.

WHAT YOU BRING

    • 6+ years of experience in DevOps, Site Reliability Engineering, or Observability roles.
    • Deep expertise with OpenTelemetry, including Collector configurations, receivers/exporters (OTLP, HTTP, Prometheus, Loki), and semantic conventions.
    • Proficient in GitLab CI/CD, Terraform, Docker, and scripting (Python, Bash, Go).
    • Strong hands-on experience with AWS and Azure services, cloud automation, and cost optimization.
    • Proficiency with observability backends: Grafana, New Relic, Prometheus, Loki, or equivalent APM/log platforms.
    • Passion for building automated, resilient, and scalable telemetry pipelines.
    • Excellent documentation and communication skills to drive adoption and influence engineering culture.

NICE TO HAVE

    • Certifications in AWS, Azure, or Terraform.
    • Experience with OpenTelemetry SDKs in Go, Java, or Node.js.
    • Familiarity with SLO management, error budgets, and observability-as-code approaches.
    • Exposure to event streaming (Kafka, RabbitMQ), Elasticsearch, Vault, and Consul.