Site Reliability Engineer
Do you want to shape the future of enterprise software?
At Aera Technology, we apply Internet-scale technology to the challenges facing enterprise businesses. Think of the self-driving car: connected, always-on, thinking, and autonomous. Our mission is to enable companies in the same way.
Once an Aera application is built by our developers, it is handed over to the SRE/DevOps team, which supports the operations of our applications and services. The team manages all environments, including production, sandbox, sales, and implementation. It pushes new code to our existing customers, monitors the health, performance, and reliability of the Aera stack, and in general "keeps the lights on" with 24/7 coverage.
In this role, you will use your background as an operations generalist to work closely with our development teams, from the early stages of design through identifying and resolving infrastructure-related production issues. You will help protect Aera assets and customer data and serve as an escalation point that others can consult and trust. Primary responsibilities include:
- Designing, building, running and monitoring Aera's production infrastructure
- Responding to production incidents and determining how we can prevent them in the future
- Triaging and troubleshooting complex production issues to ensure reliability and performance
- Identifying and automating manual processes
- Continuously evolving our monitoring tools and platform
- Promoting and applying best practices for building scalable and reliable services across engineering
- Developing and maintaining technical documentation, runbooks, and procedures
- Supporting a 24/7 online environment as part of an on-call rotation
Required Skills & Education
- 7+ years of SRE/DevOps/infrastructure experience
- 7+ years of experience deploying, operating and debugging server software on Linux at scale
- Unwavering commitment to identifying the root cause of infrastructure issues and resolving them
- Experience automating and running large-scale production Java/Tomcat services in AWS (EC2, ECS, KMS, Kinesis, RDS) or other cloud providers
- Advanced experience with configuration management and orchestration tools (Ansible, Terraform)
- Experience with the use, maintenance and configuration of monitoring, metrics and logging infrastructure (Datadog, Sensu, New Relic, Icinga/Nagios, etc.)
- Aptitude for automation and streamlining of tasks in an SRE/Operations engineering context (Python, Go, Bash, Ruby, etc.)
- Experience writing infrastructure as code using tools such as Chef and Terraform
- Comfortable working with modern databases and big data platforms (SQL, etc.); MySQL automation is a big plus
At Aera, we're on a mission to solve the biggest, most intractable challenges of enterprise software. We envision the rise of the Self-Driving Enterprise: a more autonomously functioning business with a central operating system that connects and orchestrates business operations. Our platform is increasingly used by the world's largest companies to identify and respond to market opportunities faster.
If you share our passion for building the next generation of enterprise software and implementing it for the most sophisticated customers in the world, you’ve met your match. Headquartered in Mountain View, California, we're growing fast, with teams in Mountain View and San Francisco (California), Bucharest and Cluj-Napoca (Romania), Paris (France), Munich (Germany), London (UK), Pune (India), and Sydney (Australia). So join us, and let’s build this!