Linux Kernel Engineer

Remote / Engineering
Flowmill analyzes operating-system telemetry collected using eBPF, providing deep visibility into cloud deployments -- no code changes required. The roadmap has some exciting projects, and we are looking for a strong Linux kernel engineer to drive eBPF instrumentation.

At Flowmill, we are maniacally focused on eliminating cloud application failure by building tools that quickly and automatically pinpoint service disruption -- whether caused externally by faults in cloud infrastructure and API providers, or internally by bugs and configuration errors. The underlying tech (developed at MIT) is unique in its extremely low-overhead collection and analysis, its full coverage, and its ability to be deployed in minutes with no code changes or configuration. Together, these properties allow Flowmill to provide SREs and DevOps engineers with smart alerts and a complete, easy-to-read picture of their deployment -- dramatically accelerating fault resolution.

This is a chance to join a small rockstar team with backgrounds at Facebook, Google, and VMware, and to change the way engineers achieve high availability and performance in their production applications.

What You’ll Do

    • Instrument the kernel using eBPF to capture bad interactions between applications and the scheduler (throttling, frequent context switches, etc.), adverse competition for NIC resources ("noisy neighbor"), and more.
    • Design and build high-performance event collectors for userspace events (using uprobes and/or ptrace), for application data (for example, mechanisms that support low-overhead TCP collection in userspace), and for perf events (e.g., sampled stack traces for distributed profiling).

Qualifications

    • 6+ years of systems software engineering (kernel or userspace), including 2+ years of kernel development experience in the past 4 years
    • Ability to interact with a C++ codebase
    • (Highly Desired) Experience with eBPF, perf, perf rings, or DTrace; experience with performance engineering, the USE method, profiling, the book "Systems Performance: Enterprise and the Cloud", etc.
    • (Optional) Familiarity with AWS-based infrastructure, Kafka, Prometheus, and distributed systems