Site Reliability Engineer / Observability Engineer III (Fixed Night Shift Role)
India - Remote
Public Cloud - Offerings and Delivery – Cloud Infra Services
Full-Time
Sr. Site Reliability Engineer III - Job Description
As a Site Reliability Engineer, you will play a key role in ensuring our systems remain reliable, available, and performant for both our customers and internal teams. Your expertise will directly impact our users' experience and the success of our business.
In this role, you'll collaborate closely with our product development and platform engineering teams to build scalable systems and create robust automation that supports our company's goals. Your day-to-day work will make a meaningful difference in how efficiently and effectively our technology operates.
We're looking for someone with hands-on experience in technologies such as AWS, CDNs, Terraform, Packer, and Splunk. Strong troubleshooting skills are essential, as you will identify and solve complex issues in the critical applications our customers rely on daily.
The ideal candidate thrives on learning new technologies and approaches challenges with enthusiasm. You'll be joining a collaborative environment where your problem-solving skills will shine as you work across multiple teams. If you're self-motivated, passionate about quality, and ready to make an impact, we want to hear from you!
Responsibilities:
- Collaborate with development teams to implement and deploy new features that meet high standards for reliability, security, and performance.
- Partner with cross-functional teams to establish and enhance enterprise standards and best practices.
- Develop and maintain effective monitoring tools, alerts, and dashboards that provide clear visibility into system health and performance.
- Analyze metrics and logs to proactively detect anomalies, optimize performance, plan capacity, and isolate issues before customer impact occurs.
- Identify innovative solutions to complex problems and implement corrective actions decisively.
- Mentor junior team members while documenting and sharing solutions to build team knowledge.
Qualifications:
- Minimum of 5 years' experience in roles such as SRE, DevOps, or CloudOps.
- Advanced proficiency with Terraform for infrastructure as code implementation (required).
- Extensive experience with AWS technologies and services, including EC2, S3, RDS, and IAM (required).
- Comprehensive understanding of the HTTP protocol and web server technologies, including how to troubleshoot them.
- Strong experience with load balancing solutions such as AWS ELB, NGINX, or HAProxy.
- Practical knowledge of caching technologies and CDN implementations.
- Working experience with Redis for in-memory data storage and caching.
- Demonstrated ability to implement and optimize CDN solutions for global content delivery (preferred).
- Expertise in monitoring and troubleshooting web application performance and availability.
- Practical experience with observability solutions such as Splunk, Datadog, or similar.
- Proficiency in one or more languages such as Java, Go, Python, or Linux shell scripting.
- Proven experience operating effectively in an agile software development environment.
- Strong understanding of AWS pricing/cost models across compute, storage, and database offerings.
- Experience implementing and maintaining CI/CD pipelines.
- Ability to multitask and adapt to changing priorities in a fast-paced, 24x7 environment.
- Collaborative approach to working with cross-functional teams of both technical and business professionals.
- Excellent communication, problem-solving, and customer service skills.
- Bachelor's degree in computer science, science, or engineering, or equivalent technical certifications, preferred.