Senior Platform Engineer

Boston, MA
Engineering
Full-time
Overview
indico is a venture-backed startup making the application of deep learning practical in the enterprise. We focus on automating tedious back-office tasks and improving the efficiency of labor-intensive, document-based workflows. The core technology we use to achieve this is transfer learning, which allows us to train machine learning models with orders of magnitude less data than traditional techniques require. Our work places a strong emphasis on NLP and text processing.

About the position
We are looking for an experienced Senior Platform Engineer to help us solve challenging IaaS product deployments of our container services. Our IaaS supports multiple deployment environments, including multi-tenant, private cloud (AWS, GCP, Azure), and bare metal. The person taking on this role will be responsible for our production-related deployments, monitoring, and environments.

About you
You are an engineer who hates doing something more than once. You hate having to be heroic to be the hero. You believe that the only success is measured success.

Experience: 3+ years of professional experience

Responsibilities

    • Implement a cloud-agnostic deployment system that allows indico and our customers to deploy our system anywhere
    • Implement or develop systems for managing and monitoring a highly scalable and highly available distributed system
    • Implement or develop system and software monitoring services, investigate bottlenecks, and prescribe or implement solutions
    • Develop ML-specific monitoring and diagnostic tools to catch issues before they become problems

Requirements

    • Experience working with Linux container systems, e.g. Docker, Docker Compose, Kubernetes
    • Exposure to cloud infrastructures, such as AWS, GCE or Azure
    • Experience running scalable services in production
    • Experience with micro-service architectures
    • Experience deploying to customer premises

Bonus

    • Experience coding in Python
    • Experience in managing a Prometheus monitoring system
    • Exposure to operating distributed systems at scale
    • Experience building internal developer tools
    • Build automation experience
    • Experience working with test-driven development
    • Experience working in an Agile, continuous-deployment environment
    • Experience troubleshooting large micro-service instances