Site Reliability Engineer
Bucharest, Romania
Do you want to shape the future of enterprise software?
At Aera, we deliver the cognitive technology that enables the Self-Driving Enterprise™: a Cognitive Operating System™ that connects you with your business and autonomously orchestrates your operations. Aera's Cognitive OS leverages the best of artificial intelligence, machine learning, natural language processing, big data, and enterprise domain expertise to deliver Cognitive Automation at scale for some of the world's largest companies.
Once an Aera application is built by our developers, it is handed over to the SRE/DevOps team, which supports the operations of our applications and services. The team manages all environments, including production, sandbox, sales, and implementation. It pushes new code to our existing customers, monitors the health, performance, and reliability of the Aera stack, and in general "keeps the lights on" with 24/7 coverage.
In this role, you will use your background as an operations generalist to work closely with our development teams, from the early stages of design through identifying and resolving infrastructure-related production issues. You will help protect Aera assets and customer data and serve as an escalation point that others can consult and trust.
- Designing, building, running and monitoring Aera's production infrastructure
- Responding to production incidents and determining how we can prevent them in the future
- Triaging and troubleshooting complex production issues to ensure reliability and performance
- Identifying and automating manual processes
- Continuously evolving our monitoring tools and platform
- Promoting and applying best practices for building scalable and reliable services across engineering
- Developing and maintaining technical documentation, runbooks, and procedures
- Supporting a 24/7 online environment as part of an on-call rotation
- 3-5 years of SRE/DevOps/infrastructure experience
- 3-5 years of experience deploying, operating and debugging server software on Linux at scale
- Unwavering commitment to identifying the root cause of infrastructure issues and resolving them
- Experience automating and running large scale production Java/Tomcat services in AWS (EC2, ECS, KMS, Kinesis, RDS) or other cloud providers
- Advanced experience with configuration management and orchestration tools (Ansible, Terraform)
- Experience with the use, maintenance and configuration of monitoring, metrics and logging infrastructure (Datadog, Sensu, New Relic, Icinga/Nagios, etc.)
- Aptitude for automation and streamlining of tasks in an SRE/Operations engineering context (Python, Go, Bash, Ruby, etc.)
- Experience writing infrastructure as code using tools such as Chef and Terraform
- Comfortable working with modern databases and big data platforms (SQL, etc.); MySQL automation is a big plus
At Aera, we're on a mission to solve the biggest, most intractable challenges in the world of enterprise software. We envision the rise of the Self-Driving Enterprise: a more autonomously functioning business with a central operating system that connects and orchestrates business operations. Our Cognitive Operating System is increasingly used by the world's largest companies to fundamentally transform their organizations and how work is done.
If you share our passion for building the next generation of enterprise software, and deploying it for the most sophisticated customers in the world, you’ve met your match. Headquartered in Mountain View, California, we're growing fast, with teams in Mountain View and San Francisco (California), Bucharest and Cluj-Napoca (Romania), Paris (France), Munich (Germany), London (UK), Pune and Bangalore (India), Sydney (Australia) and Singapore. So join us, and let’s build the future of work together!