Infrastructure/Site Reliability Engineer
Engineering – Backend Engineering
Who we are
TFG is the largest mobile game company in Latin America, and one of the largest in the world. In 8 years, we have released over 70 games, including hits such as Sniper 3D, the leading FPS game on the App Store and Google Play Store, and Colorfy, the world's most popular coloring app. Our games have been downloaded 1 billion times in 125 countries. The team started with two brothers, and now there are around 320 of us – and counting. To build the very best mobile games, we gather exceptional talent in software engineering, art and animation, product design and management, marketing, and data science.
About the Team
An engineer on our team works with global-scale infrastructure and has a great impact on millions of players. To guarantee the best possible experience, we rely on several interconnected Kubernetes clusters spread around the world. We are at the cutting edge of open-source infrastructure technology: we adopted Kubernetes in production shortly after the project was launched, and today we use technologies such as eBPF and Cilium in our network stack.
We handle billions of logs daily and run hundreds of nodes and thousands of containers to serve more than 1 million requests per minute. We know this number will only grow, and we're looking for engineers who can help with the challenges of provisioning and operating infrastructure at large scale.
About the Role
TFG Co is searching for infrastructure/site reliability engineers to join our team. We seek an engineer with solid programming, networking, and operating systems knowledge. Since we are always looking for new tools and technologies that better solve our problems, we value professionals who like to learn new things, work autonomously, and are proactive in proposing and implementing their ideas.
We'll need you to understand our system flows, diagnose problems in the production environment, identify opportunities for improvement and automation, and guarantee that we have the necessary infrastructure to create the best games in the world.
More about you
- Player focused. We are player oriented, and infrastructure has a great impact on their experience. You have empathy for our players and focus on ensuring they have an amazing experience. You aim for top-level infrastructure, guaranteeing the highest availability possible.
- Automation is key to scaling. We look for engineers who have a history of designing and executing automation projects that eliminate manual, repetitive tasks.
- Calm and pragmatic. When everything seems to be falling apart around you, you have a plan and keep calm.
- Bleeding edge. You are curious and like to study new technologies, test new solutions, and measure the impact of changes. We want to ensure we are using the best stack possible.
What you’ll do
- Develop, monitor and optimize infrastructure clusters (Kubernetes, Elasticsearch, MongoDB, Kafka, etc.).
- Define monitoring and observability patterns.
- Troubleshoot and manage incidents in production.
- Automate and improve infrastructure provisioning (Infrastructure as Code).
What you'll need
- Bachelor's degree in Computer Science, Computer Engineering or equivalent experience.
- Linux knowledge. You should be able to discuss in detail what happens under the hood (OS, kernel, network).
- Solid knowledge of at least one programming language. We work mostly with Go and Python.
- Experience with large scale production systems and technologies.
- Experience with Kubernetes.
- Experience with monitoring systems (e.g., Datadog, StatsD, Grafana).
- Experience with infrastructure as code tools (e.g., Ansible, Terraform).
- Experience with messaging systems such as Kafka and Emqtt.
- Experience with database management (Postgres, MongoDB, Cassandra, Redis, Elasticsearch).
- Experience with CI/CD pipelines (e.g., Jenkins, Travis).
We welcome people from all backgrounds who seek the opportunity to help build the best gaming company, where everyone thrives.