[Job-15181] Senior SRE Engineer, Brazil

Brazil
North America – Proxima / Full Time / Remote
Position Summary:

The Site Reliability Engineer is responsible for the overall operational uptime of the client's environments, including applications and services in Azure cloud, on-premises, and AKS. As part of the Site Reliability team, this individual fulfills an operational role and is responsible for implementing configuration changes, executing upgrades, handling traditional on-call tasks, and maintaining a wide range of systems.

Duties and Responsibilities:

Handle on-call duties and major incident management to restore stability as quickly as possible.
Respond to alerts and proactively manage issues.
Participate in release management for the deployment of cloud infrastructure and services.
Perform regular build image updates for remediation of identified security vulnerabilities.
Ensure a standard platform is available, current, and extensible for all in-scope environments.
Ensure provisioning practices and documentation are current and maintained.
Identify opportunities in automation, participate in related activities, and manage content in revision control.
Manage implementation of configuration management for both server platform and service configurations.
Troubleshoot services and applications on the AKS platform.
Deploy services to AKS using CI/CD pipelines and automation.
Support AKS infrastructure and upgrades.

Qualifications:

Bachelor’s degree in computer science or equivalent experience
7+ years of production application support experience in a high-uptime environment
7+ years of UNIX administration experience, including diagnosis of performance issues, package management, load estimation, kernel tuning, and network configuration
Excellent troubleshooting and analytical skills
Excellent scripting skills in languages such as Bash and Python
Working knowledge of Kubernetes, including how to interact with the Kubernetes API using tools such as kubectl and K9s
Experience with Azure Kubernetes Service (AKS)
Thorough knowledge of, and hands-on experience with, tools such as Terraform, Ansible, and Puppet
Thorough understanding of YAML and other data formats such as XML and JSON
Ability to work independently on large, complex projects with minimal guidance
Strong oral and written communication skills
Ability to create systematic and manual operations procedures in both technical and user-friendly language
Familiarity with process and efficiency improvements
Extensive knowledge of industry-standard development methodologies and technologies
