Software Engineer [Senior, Staff, Principal]

Mission Bay, San Francisco / Engineering / Full-time / On-site
About Us 

LatchBio is building a cloud platform for biology that includes data storage, analysis, and visualization. Our end-to-end bioinformatics solution gives biologists direct access to computational tools without the need to deal with data infrastructure, command line interfaces, or code. Computational staff also benefit, as they no longer have to execute analyses manually or set up and maintain infrastructure. Teams can aim higher, easily set up user-friendly interfaces, and access infinitely scalable compute resources, something that previously would have required an ad hoc solution, a large time investment, and likely the hiring of expensive new software engineering staff.

On a higher level, our centralized, easy-to-adopt, openly accessible platform solves the need to reinvent similar infrastructure at each biology company. By dedicating a team of product and software experts to the problem, we make software tooling a top priority, something not possible at companies making science their first priority. We bring the entire industry forward to the level of quality expected of modern software and beyond what even the biggest players can achieve on their own.

TL;DR: the convergence of laboratory automation, high-throughput assays, and machine learning is moving the medium of biological discovery to silicon. At LatchBio, we are building the ubiquitous cloud platform to store, visualize, and analyze data from biological experiments.
- Data in biopharma will 20x in 7 years. $200m to $4b, 54% CAGR
- The growing volume of file data and the compute needed for standard biological analyses will demand software innovation
- We are positioned to meet these software infrastructure needs

You can learn more about us at any of these places:
- About Us
- Product Page
- Customer Use Cases
- Twitter Page
- Blog

About The Team

Our engineering team is small, ambitious, and fast. We place extremely high emphasis on code quality and professional development. Most of our engineers joined before graduating from college and were immediately able to perform at the equivalent of Senior level. We have experience investing in early-career hires and helping them become stellar engineers.

Engineering Practices
- Ubiquitous in-depth code review
- Static typing wherever possible
- Very few meetings: daily 15-minute stand-up + ~2 hours of company-wide sync on Saturdays
- High trust environment & zero blame culture
- High ownership & responsibility for all members of the team regardless of level
- No code owners. Every engineer works on every product
- End-to-end development. No separation into frontend/backend/devops/DBA
- Close collaboration with the design, product, business, and customer success teams
- Emphasis on learning through weekly reading groups and occasional research projects
- Engineers are responsible for Quality Assurance/testing on their own features
- Aiming for code that is self-documenting, simple, and robust, without the need for excessive comments or automated tests

About The Role 

This role requires both an understanding of low-level system concepts and the ability to ship high-level frontend/scripting code at high velocity. The following is a list of technologies and major concepts and their role within Latch. You do not need existing experience with any of these topics, but you will end up working with all of them at some point. The ideal candidate will be interested in every topic and have in-depth familiarity with at least a few.

- PostgreSQL—main database, also used as a queue system, message broker, permissioning engine, etc. We do not use any ORMs.
 Concepts: transaction isolation, explicit locking, stored procedures, Postgres database administration, row-level security
- TypeScript, Python—main programming languages on the frontend and backend respectively
 Concepts: static typing, asyncio
- React—UI framework
 Concepts: writing hooks
- Distributed systems—most of our code is designed to scale horizontally. We maintain a distributed file system on AWS
 Concepts: FUSE, distributed systems, state machines
- Cloud environment
 Concepts: Terraform, cloud administration, cloud cost optimization
- Kubernetes—cluster software
 Concepts: k8s administration, kustomize, cluster networking
- Linux, Docker, systemd—server system software
 Concepts: containerization, cgroups, writing systemd units

Requirements

- Demonstrable in-depth understanding of some complex technology. See the Interview Process section for example topics
- Ability and interest to work with technology regardless of its complexity or role
- Motivation to learn and improve continuously
- Ability to work well with minimal management
- Desire to create things and make them your own, contributing at every stage including requirement discovery, design, implementation, testing, and maintenance

Salary and Equity

- $160k to $260k
- Equity: roughly 0.1% to 0.5%

Benefits

- O-1/H-1B visa sponsorship
- Best vision, dental, and health insurance available to companies of our size, via Gusto
- 12+ meals delivered from restaurants each week
- $3,000 annual office technology stipend
- Biannual company-sponsored conference trips, textbooks, and other professional development
- Internal reading groups
- Company-wide 2-week Christmas vacation
- Annual company retreat
- 401(k) via Guideline
- Unlimited PTO
- Competitive cash compensation
- Large equity grants

Interview Process

Latch engineers typically benefit from knowledge, interest, and experience in engineering complex systems technology. At the bottom of this document is a list of example topics and sample problems that we find interesting and indicative of a candidate's level.

To judge your fit we follow a 3-step process:
1: Introductory call. Be prepared to walk through your resume and talk in detail about past experience, going in-depth on your technical contributions.
2: Technical interview. This is a conversation on a technical topic to judge the depth of your knowledge. Be prepared to discuss a technology from your past experience that is similar in complexity to one of the example topics. In case that discussion ends up too short, also be prepared to discuss example topics of your choosing.
3: 1-week paid on-site contracting period. This is the time for us to meet each other and directly experience what it would be like to work together. Expect a short onboarding followed by a real-world production task from our backlog (though we try to pick an interesting one). At the end of the week, you will give a 1-hour presentation/explanation of your work and answer questions.

Example Topics and Problems

Databases

- Transaction isolation levels. SQL Repeatable Read and Serializable transactions. Implementing transaction isolation for all levels and Serializable transactions in particular.
- Locking. Using explicit locking to solve concurrency issues with Read Committed isolation.
- **Specific problem:** a User may create and delete "Teams". These are generic entities that are "owned" by the User that created them. How do we reliably limit the total number of Teams each User owns? Specify the tables, constraints, triggers, stored procedures, etc. The User is adversarial and can send any valid SQL requests directly to the database and manipulate the order in which statements in transactions complete (i.e. exploit race conditions or any other concurrency bugs).
- Please use PostgreSQL for any implementation specifics or examples

JavaScript

- React. Rules of hooks. What is the purpose of `useCallback` and `useMemo`? Why should arrow functions and `.bind` be avoided in component props?
- **Specific problem:** `eslint-plugin-react-hooks` warns against using `async` functions in `useEffect`. Why? What similar problems can arise when using `async` in `useCallback`? Design a `useAsync` hook that works around these issues. There is a technical side to this question (things that React doesn't like) and a semantic side (things that behave weirdly in certain edge cases).
- TypeScript. Difference between `any` and `unknown`. Covariant and contravariant types.
- **Specific problems:** What is the most specific type for `(xs, cls) => xs.filter(x => x instanceof cls)`? (Note that TypeScript will not accept `filter` here, but an explicit for-loop would type-check.) Why are function types contravariant in the parameter type?
- Immutable data structures (e.g. Immutable.js). Benefits and drawbacks vs mutable ones. What is the basic idea behind how these are implemented?
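The basic idea behind persistent (immutable) data structures is structural sharing: an "update" returns a new version that copies only the path from the root to the changed node and reuses every other node, so old versions stay valid and cheap to keep. A minimal sketch of that idea, written in Python and using an unbalanced binary search tree for brevity (an assumption; production libraries typically use wide, shallow tries instead), with hypothetical `Node`, `insert`, and `lookup` names:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """One node of an immutable (persistent) binary search tree."""
    key: int
    value: str
    left: Optional[Node] = None
    right: Optional[Node] = None


def insert(node: Optional[Node], key: int, value: str) -> Node:
    """Return a new tree containing key -> value; the input tree is never mutated."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        # Path copying: copy this node, rebuild the left path, share the right subtree as-is.
        return Node(node.key, node.value, insert(node.left, key, value), node.right)
    if key > node.key:
        return Node(node.key, node.value, node.left, insert(node.right, key, value))
    # Same key: replace the value and share both subtrees.
    return Node(key, value, node.left, node.right)


def lookup(node: Optional[Node], key: int) -> Optional[str]:
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None


# Usage: v2 is a new version; v1 still sees the old contents, and both share unchanged nodes.
v1: Optional[Node] = None
for k, v in [(5, "a"), (2, "b"), (8, "c")]:
    v1 = insert(v1, k, v)
v2 = insert(v1, 1, "d")
```

An update allocates one node per level on the path it touches (O(depth)), which is the main cost compared to in-place mutation; the benefit is that equality checks between versions can often short-circuit on reference identity.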

Python

- Threading. GIL. Multiprocessing using `concurrent.futures`.
- **Specific problem:** `tqdm` is a popular library for creating CLI progress bars. Design the simplest possible multiprocessing-compatible wrapper around a `tqdm` progress bar using the standard library. It should allow a number of processes to all add progress to a single progress bar. Hint: the intended solution simply uses a standard library class.
- `asyncio`. How to run multiple tasks at the same time.
- **Specific problem:** `aiohttp` is an async HTTP client library. It benefits greatly from reusing a single connection pool across requests. Design a way of using an `aiohttp.ClientSession` to share a connection pool between multiple threads using the standard library. The consumer threads do not run `asyncio` loops.
- Static typing. How to properly add types to a function decorator.

GraphQL

- The N+1 problem. How GraphQL servers solve it.
- Subscriptions.
- Apollo cache. Data normalization. Avoiding additional requests in child components.
- **Specific problem:** A UI displays a set of delivery orders fetched from a GraphQL API. `<OrdersPage/>` is the main page React component, which contains `<Order/>` children. How should the queries be set up to make only one request? How should information be passed to the `<Order/>` children to take advantage of `React.memo` components (i.e. avoid re-rendering each child when the parent renders)? How should subscriptions be set up to reliably update component state?
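One common server-side answer to the N+1 problem is to batch the per-object lookups that child resolvers make: each resolver asks for a single key, and all keys requested during the same event-loop tick are fetched with one query. This is the pattern popularized by the JavaScript `DataLoader` library. A minimal asyncio sketch, where `BatchLoader` and its `batch_fn` contract (one value per key, in the same order) are assumptions for illustration rather than any library's API:

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable, Hashable, Sequence
from typing import Generic, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class BatchLoader(Generic[K, V]):
    """Coalesce load(key) calls made in the same event-loop tick into one batch fetch."""

    def __init__(self, batch_fn: Callable[[list[K]], Awaitable[Sequence[V]]]) -> None:
        self._batch_fn = batch_fn  # must return one value per key, in the same order
        self._pending: dict[K, asyncio.Future[V]] = {}
        self._scheduled = False
        self._tasks: set[asyncio.Task[None]] = set()  # keep strong refs to running tasks

    def load(self, key: K) -> asyncio.Future[V]:
        existing = self._pending.get(key)
        if existing is not None:  # duplicate key in the same batch: share the future
            return existing
        loop = asyncio.get_running_loop()
        fut: asyncio.Future[V] = loop.create_future()
        self._pending[key] = fut
        if not self._scheduled:
            # The task first runs only after the current coroutine yields,
            # so every load() issued before that point joins this batch.
            self._scheduled = True
            task = loop.create_task(self._dispatch())
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)
        return fut

    async def _dispatch(self) -> None:
        pending, self._pending = self._pending, {}
        self._scheduled = False
        keys = list(pending)
        try:
            values = await self._batch_fn(keys)  # e.g. one SELECT ... WHERE id = ANY(...)
        except Exception as exc:
            for fut in pending.values():
                if not fut.done():
                    fut.set_exception(exc)
            return
        for key, value in zip(keys, values):
            if not pending[key].done():
                pending[key].set_result(value)
```

Unlike `DataLoader`, this sketch deduplicates keys only within a batch and keeps no cross-batch cache.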

Containers

- Application isolation techniques. Traditional (file permissions, non-root users, kernel capabilities) vs `chroot` vs containers vs VMs vs AppArmor+seccomp. Limitations, pros and cons.
- Image building. Purpose and implementation of layers. Purpose and implementation of multi-stage builds. Reducing image size.
- Process supervisors/init processes. Purpose and different options (e.g. `tini`, `s6`, `supervisord`, `systemd`).
- Logging/monitoring solutions. Pros and cons of different approaches. Stdout/stderr, log files, fluentd/fluentbit, syslog (+ various backends e.g. rsyslog, journald), application-level logging instrumentation (e.g. in-app log file rotation/upload), distributed tracing.
- Solutions to Docker-in-Docker or running system-level software in container-like environments. gVisor, Sysbox, Firecracker.
- Extra: modern [Dockerfile frontend](https://docs.docker.com/build/dockerfile/frontend/#dockerfile-frontend)
- Extra: purpose and implementation of VM hypervisors. Virtualization vs para-virtualization. `virtio`.

Kubernetes

- Basic internals. `etcd`, k8s resource definitions, controllers.
- Basic built-in resources. Nodes, Pods, Deployments, DaemonSets, Services, Ingresses, ConfigMaps, Secrets, Jobs, CronJobs, Horizontal Pod Autoscalers, Persistent Volume Claims, Storage Classes.
- Purpose of init containers, sidecar containers, ephemeral containers.
- Networking. In-cluster DNS. Network Policies. Purpose of service meshes. Load balancer setup and configuration. Reverse proxies.
- Autoscaling and node assignment. Node selectors and affinity, preemption, QoS, priorities. Scaling to and from 0 (any special configuration of cloud resources etc.).
- Extra: purpose and implementation of "node shells".
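Kubernetes controllers turn resource definitions into reality: each one repeatedly compares the desired state recorded through the API server (and persisted in `etcd`) with the state it observes, and acts to close the gap. A toy, single-process sketch of one reconcile pass for a ReplicaSet-like resource; `ToyCluster`, `reconcile`, and the pod naming scheme are hypothetical, and this is not the Kubernetes client API:

```python
import itertools
from collections.abc import Iterator
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Observed state: names of the pods that currently exist (stand-in for the API server)."""
    pods: set[str] = field(default_factory=set)


def reconcile(desired_replicas: int, cluster: ToyCluster, names: Iterator[str]) -> None:
    """One pass of a level-triggered control loop: compare desired vs observed and close the gap."""
    diff = desired_replicas - len(cluster.pods)
    if diff > 0:
        for _ in range(diff):
            cluster.pods.add(next(names))  # scale up: create the missing pods
    elif diff < 0:
        for pod in sorted(cluster.pods)[:-diff]:
            cluster.pods.discard(pod)  # scale down: delete the surplus pods


# Usage: the same pass creates pods and repairs deletions made outside the controller.
names = (f"web-{i}" for i in itertools.count())
cluster = ToyCluster()
reconcile(3, cluster, names)   # creates web-0, web-1, web-2
cluster.pods.discard("web-0")  # simulate a pod dying outside the controller's control
reconcile(3, cluster, names)   # recreates one pod (web-3)
```

Because the loop is level-triggered (it looks at the whole current state rather than at individual events), rerunning it when nothing changed is a no-op, and a missed event is repaired on the next pass.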

Database Implementation

- Basic data structures. B-Trees, Log-Structured Merge Trees. Tradeoffs.
- Crash recovery. Write-ahead log. Undo vs redo logging vs both.
- Replication. Unidirectional vs bidirectional. Physical vs logical. Single master vs multi-master. Role in system availability, upgrades, and backups.
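Redo logging with a write-ahead log can be illustrated with a toy key-value store: every change is appended to the log before it takes effect, the data is updated only after commit (a "no-steal" policy, assumed here so that undo records are unnecessary), and crash recovery simply replays the writes of committed transactions. The record format and the `KVStore`/`recover` names are hypothetical, and an in-memory list stands in for the log file:

```python
# Hypothetical log record format for this sketch:
#   ("set", txid, key, value)  a write made by transaction txid
#   ("commit", txid)           txid committed; its writes must survive a crash


def recover(log: list[tuple]) -> dict[str, str]:
    """Redo recovery: rebuild state by replaying committed transactions' writes in log order."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    state: dict[str, str] = {}
    for rec in log:
        if rec[0] == "set" and rec[1] in committed:
            _, _, key, value = rec
            state[key] = value
    return state


class KVStore:
    def __init__(self, log: list[tuple]) -> None:
        self.log = log            # stands in for the durable log file
        self.data = recover(log)  # volatile state, rebuilt from the log on startup
        self._uncommitted: dict[int, dict[str, str]] = {}

    def set(self, txid: int, key: str, value: str) -> None:
        self.log.append(("set", txid, key, value))  # log first ("write-ahead")
        self._uncommitted.setdefault(txid, {})[key] = value

    def commit(self, txid: int) -> None:
        self.log.append(("commit", txid))  # a real system would fsync the log here
        self.data.update(self._uncommitted.pop(txid, {}))
```

Real engines usually allow uncommitted changes to reach data pages ("steal") and therefore also log undo information, which is where the undo vs redo vs both distinction comes from.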

Distributed Systems

- Consistency guarantees (the C in CAP, which differs from the C in ACID). "Eventual" vs "strong" consistency (+ why these terms are imprecise). Distributed system vs single-host database guarantees (e.g. linearizability vs serializability). Impact on Availability. Byzantine failures.
- Consensus protocols. 2-phase commit, BFT, Paxos + notable variants, Raft.
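Two-phase commit is the simplest of these protocols and is easy to sketch as an in-process simulation: a coordinator asks every participant to prepare, then broadcasts commit only if all votes were yes, and abort otherwise. The classes below are hypothetical and omit networking, timeouts, and durable logging, all of which a real implementation needs:

```python
class Participant:
    """A resource manager taking part in a distributed transaction (in-process toy)."""

    def __init__(self, name: str, vote_yes: bool = True) -> None:
        self.name = name
        self.vote_yes = vote_yes
        self.state = "working"

    def prepare(self) -> bool:
        # Phase 1: a "yes" vote is a promise to commit if told to, so a real
        # participant would durably record "prepared" before replying.
        self.state = "prepared" if self.vote_yes else "aborted"
        return self.vote_yes

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    """Coordinator: commit only if every participant votes yes in the prepare phase."""
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # A real coordinator durably logs the decision here; if it crashes before phase 2,
    # prepared participants stay blocked until it recovers (2PC's main weakness).
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision
```

That blocking behavior on coordinator failure is a large part of what consensus protocols such as Paxos and Raft are designed to avoid, since they can keep making progress as long as a majority of nodes is reachable.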

UNIX-like Operating Systems

- Process state: current working directory, environment variables, parent process, kernel capabilities, signal masks, etc. Setting up and creating subprocesses using `fork(2)`.
- Inter-process communication. Stdio. Pipes. Shared memory. Sockets. System buses (e.g. D-Bus).
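Creating a subprocess with `fork(2)` and talking to it over a pipe can be sketched with the thin wrappers in Python's `os` module (POSIX-only; the `run_child` helper is hypothetical). The child starts as a copy of the parent's process state, including open file descriptors, working directory, environment, and signal mask:

```python
import os


def run_child(message: bytes, exit_code: int) -> tuple[bytes, int]:
    """Fork a child that sends `message` through a pipe; return (data read, child exit code)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: inherited both pipe ends; keep only the write end.
        try:
            os.close(read_fd)
            os.write(write_fd, message)  # assumes a short message; large writes may need a loop
        finally:
            os._exit(exit_code)  # exit immediately; never fall through into the parent's code
    # Parent: close its copy of the write end so the read loop sees EOF once the child exits.
    os.close(write_fd)
    chunks = []
    while chunk := os.read(read_fd, 4096):
        chunks.append(chunk)
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return b"".join(chunks), os.waitstatus_to_exitcode(status)
```

Closing the unused pipe ends in each process matters: if the parent kept `write_fd` open, `os.read` would never return end-of-file and the loop would hang.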