Site Reliability Engineer
Engineering – TE2 - Development
TE2 - The Experience Engine™ Inc, a division of accesso, is the leader in experience-driven, personalized advertisement and content delivery for connected consumers, bridging the physical and digital brand experience across mobile, wearables and other digital technologies. TE2 is designed for industries where an in-person experience is a critical engagement opportunity, including hospitality, resorts, theme parks, food, travel, education and healthcare. For more information about TE2, please visit www.theexperienceengine.com.
The Site Reliability Engineer's objective is, in essence, to "make things scale." This includes building software that automates experiences, developing utilities that provide insights and metrics, and providing instrumentation that helps the Engineering teams scale the TE2 platform's performance more efficiently.
Red Hat, Docker, Kubernetes, AWS, Jenkins, and Ansible make up the main internal tech stack you will be working with.
Challenges that you may tackle include:
- Instrumentation and metrics collection from AWS Lambda (FaaS) or otherwise immutable containers
- Minimizing and hardening the attack surface of microservices and public-facing API gateways
- Continuous delivery using tools such as Jenkins pipelines, Docker, Kubernetes
- Observability, capacity planning, system and service performance analysis and tuning
- Orchestration of AWS VPC resources using tools such as Terraform, boto, and Consul
Some of the technologies you will be working with:
- Configuration management: Ansible, aws-cli, git
- Operating systems: mostly Red Hat-derived Linux
- Containerization and virtualization technologies: Docker Enterprise, Kubernetes
- Metrics and monitoring: statsd, ELK, PagerDuty, Slack chatops
- Messaging: Kafka, RabbitMQ
- Microservices patterns: Eureka, Ribbon, Hystrix, nginx
- Databases: Couchbase (NoSQL, N1QL), memcached, Elasticsearch, PostgreSQL, Oracle
- L2-L7 frame/packet/session inspection: netflow, WAF, pcap
Requirements:
- 5+ years of site reliability engineering or systems administration in highly available or high-volume environments
- 3+ years of infrastructure automation, configuration management or container orchestration
- Strong proficiency in one or more languages (Go, Python, Java, Ruby, Perl, or Bash) and in git
- BA/BS in Computer Science or a related technical field (preferred, but not necessary)
- Periodic participation in an after-hours on-call rotation supporting production environments 24x7
- Willingness to embrace an agile DevOps culture
We're seeking deep expertise in one or more of the following:
- Deploying, configuring, scaling, debugging, and maintaining Kafka message brokers and ZooKeeper clusters. In-depth knowledge of Kafka and ZooKeeper internals is a strong plus.
- Managing Couchbase database clusters, including provisioning, scaling, monitoring, and debugging. Expertise in optimizing indexes and queries is desired, as is experience with backup and recovery.
- Container orchestration in Docker Enterprise and/or Kubernetes environments, including managing, deploying, and configuring clusters running Swarm or Kubernetes, diagnosing networking issues, and planning and implementing cluster upgrades.
What We Offer:
- Competitive compensation package including discretionary annual bonus opportunity;
- 26 days of paid annual leave for employees (paid leave increases with tenure);
- 8 hours of paid Volunteer Time Off to give back to the organizations and groups you feel most passionate about;
- Robust health insurance scheme with the opportunity to participate in a private medical scheme after satisfactory performance;
- Matching pension scheme (up to 8%);
- Unlimited access to Udemy for Business for continued learning and career development;
- A flexible work schedule around our core business hours.
Please note:
- Eligibility to work in the US is required.
- accesso is a drug-free company.
If you are interested in joining a team that values Passion, Commitment, Teamwork, Innovation and Integrity, and what we’ve described above is YOU, then apply today and let’s talk!