Platform Observability - Elixir/Erlang
Eleos Technologies is a growing 10-year-old company building communication software for truck drivers and field workers.
We’re helping a diverse mix of customers—from mom and pop operations to thousand-truck fleets—improve how they communicate with their employees by tackling information overload, reducing phone calls, and eliminating obsolete technologies.
Our engineering team has been 100% distributed since day one. This means we’re tooled up to handle product, architectural, technical, and ops work all from home or remote. You won’t miss out on office conversations, and you won’t have to learn to drive a telepresence robot.
We began life as a bootstrapped startup, which allowed us to focus on building products our customers love. Today, we're part of the Knight-Swift family, which has allowed us to keep our focus and the freedom to build the best products we can for the transportation industry. We offer robust healthcare (dental, medical, vision), paid time off, and a generous equipment budget.
About the role
Thanks to a lot of hard work and careful engineering, our systems run extremely quietly on a day-to-day basis, and this role is about helping keep them that way as we grow and gain customers, increase traffic volumes, expand the engineering team, and refine our service level goals. By anticipating future performance work, watching for early indicators of problems, and applying an eye to "ease of production operation" to new features, frameworks, and upgrades, you'll be an absolutely essential part of making the next decade of operations at Eleos exciting for the right reasons.
We have nearly a decade of experience operating OTP-based production systems, and we're excited to teach you what we know while learning from you too. We'll also continue to work on new features and fixes alongside you as we've done up until this point, but we're ready for a specialist who can bring their skills to bear on hotspots in application logic, the database, the network, and anywhere slowness or flakiness creeps in.
We take "you build it, you run it" very seriously—you can't expect to build reliable systems if an ops team is trying to keep up with modifications made by a disjoint product team. Along with the rest of the backend engineers, you'll participate in an on-call rotation to respond to and resolve (rare ⭐) major incidents and outages affecting our stack. That said, we take our evening, weekend, and vacation time really seriously—no email, no pings in Slack. We have automated rolling deploys, the ability to selectively enable changes and features behind deterministic feature flags, and a bunch of other techniques for de-risking even huge server-side refactors. We also try not to build or operate services outside our core competency—we happily pay other vendors to run Postgres, CI, and other systems so we can focus on solving unique problems instead of commoditized ones.
Because our core backend has no direct user interface besides a robust HTTP API, you’ll work closely with our iOS, Android, and frontend web engineers to make sure we have a complete picture of actual end-user experience—as you well know, it does no good if the backend is up but the mobile app is crashing, or vice versa. While the HTTP APIs provide a simple surface area, the application logic powering them ranges from weighted GIS queries to proprietary routing logic, each with their own unique set of runtime constraints, failure modes, and possible optimization wins.
You’ll have opportunities to flex your skills around uptime and performance monitoring, optimization, tracing, visibility, and error telemetry, and you'll get to learn from others in a collaborative environment. In time, you’ll also help onboard and mentor junior engineers as we grow our team.
This is a full-time, W2 position open to candidates located in the United States who are able to work Eastern Standard Time ± 3 hours, to facilitate realtime collaboration when needed. This role is permanently remote, although in non-COVID times we typically get together once a year to review and celebrate what we've accomplished.
We’re looking for a senior-level engineer—you won't know everything (nobody does!) but you should have a solid set of experiences in seeing systems work well, seeing systems you've built fail in various unexpected ways, fixing them, and striking a balance of engineering trade-offs to maximize the former and avoid the latter as you go.
You should be comfortable with a variety of approaches to debugging and tracing live systems as well as those "on the bench," including step debuggers, packet captures, and tracing, and you should have a sense of when to use each. You should have at least a solid theoretical mental model of modern program execution, including the stack, garbage collection, contention, threading, and sockets. At a higher level, you should feel confident instrumenting and developing automated monitoring for systems that may well wake you or a co-worker in the night.
Most importantly, you'll be joining a team of about 6 other full-time engineers, so you'll be able to work directly with the mobile and frontend teams to implement improvements all the way through the stack, from tracing headers to error tracking. Although we tend to release new mobile apps with larger features on a semi-monthly cadence, we ship our backend (thanks to automated tests) once a week or more as fixes and smaller changes land, meaning you can move fast without the stress of pages and frequent rollbacks.
- 4+ years professional experience operating production systems on a major cloud platform (AWS, GCP/GAE, Azure)
- 1+ years professional experience working with Erlang or Elixir (and thus the BEAM runtime and OTP)
- A computer science-based educational background (can be a bachelor's degree or self-study!)
- Experience building SLO-based monitoring, setting and tracking error budgets, and automated or repeatable load testing
- A willingness to mentor other engineers and to share your knowledge to help them grow
- The ability to quickly convey ideas, opinions, and technical details in written English
The following aren’t requirements, but be sure to mention them if they apply to you:
- Experience with Docker or other common containerization platforms
- Experience with Postgres, Cassandra, DynamoDB, or other sharded data stores
- A passion for other functional programming languages like Clojure or Haskell
- You've ever held all 0.2 lbs of a Motorola Advisor Gold pager
We take a “reasoned opinions, weakly held” approach to tools, and resist letting those tools define our identity. That said, we’ve had positive experiences with (and made significant investments in) the following, which does mean you’ll be working with them at least in the medium term:
- Elixir, plus Erlang/OTP
- Amazon Web Services
If this sounds like a fun challenge and your kind of environment, drop us a line and let’s talk!
This role is not available to individuals located in or working from Colorado.