Research Engineer

San Francisco / ML / Full-time / Hybrid

Submit your application

Links

You at Magic

  • Tell us about (or post links to) cool things you've built
  • Why do you want to work at Magic?
  • How did you hear about Magic?

Core ML team

  • In Transformers, the KV cache is a bottleneck during inference. We once tried setting k=v to save 50% of the memory, hoping performance would remain unaffected. Think of this as self_attn(q=q, k=k, v=k) instead of self_attn(q=q, k=k, v=v); a minimal sketch of this setup follows this list. About an hour later, it became clear this was a pretty dumb idea for a theoretical reason (not just an experimental one). What could that theoretical reason have been?
  • Explain in simple terms what you think might be happening inside large language models that makes them work so well. Nobody knows the full answer to this question; we welcome wild ideas and hypotheses.
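
For concreteness, here is a minimal sketch of the experiment the first question describes. The self_attn helper, the projection names, and the tensor shapes are illustrative assumptions, not Magic's actual code; the only point is the v=k substitution.

```python
import torch
import torch.nn.functional as F

def self_attn(q, k, v):
    # Standard scaled dot-product attention: softmax(q @ k^T / sqrt(d)) @ v
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    return F.softmax(scores, dim=-1) @ v

# Hypothetical shapes for illustration: (batch, seq_len, d_model)
x = torch.randn(1, 8, 64)
w_q, w_k, w_v = (torch.nn.Linear(64, 64) for _ in range(3))
q, k, v = w_q(x), w_k(x), w_v(x)

out_normal = self_attn(q=q, k=k, v=v)  # usual setup: cache stores both k and v
out_tied = self_attn(q=q, k=k, v=k)    # the experiment: reuse k as v, halving the KV cache
```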