Staff Backend Engineer – AI Algorithm Platform

Israel / R&D / Full-time / Hybrid
Cloudinary empowers companies to deliver exceptional digital experiences by managing the entire media lifecycle at scale. Within Cloudinary's R&D, the Research Group leads the development of cutting-edge algorithms for media understanding, generation, and optimization.

We are seeking an experienced Staff Backend Engineer to lead the engineering efforts behind our homegrown platform for serving and operating production-grade AI models and AI-based algorithms. This is a mission-critical role for someone passionate about building highly scalable, GPU-aware, cloud-native systems that act as the connective tissue between algorithm research and product innovation. You will play a pivotal part in redesigning and evolving the platform, while supporting both research and application teams across the organization and contributing to MLOps initiatives.

Key Responsibilities

Platform Ownership
    • Own the architecture, stability, scalability, and performance of the system.
    • Design and implement platform features that support both synchronous low-latency and asynchronous compute-heavy algorithm execution.
    • Enhance GPU management, scheduling, and resource allocation for optimal performance and cost-efficiency.
    • Ensure robust Kubernetes-based deployment and observability for a highly dynamic system.

Cross-Team Collaboration
    • Act as the technical bridge between Research and Application teams by translating requirements into scalable system designs.
    • Collaborate closely with algorithm developers to streamline model deployment processes.
    • Partner with backend engineers (primarily working in Ruby and Go) to integrate the research group's algorithms into Cloudinary services.

Engineering Excellence
    • Advocate for high standards in code quality, observability, testing, and security.
    • Guide integration efforts for engineering teams consuming the platform's APIs.
    • Provide mentorship, support, and best practices to other engineers interacting with the platform.
    • Take part in general R&D efforts in support of the broader production environment.

Platform Extension and MLOps
    • Contribute to the evolution of the MMS platform to support a wider range of algorithmic workloads and model types.
    • Help shape tooling and infrastructure for model versioning, rollout, monitoring, and testing.
    • Collaborate with DevOps and Infrastructure teams to maintain operational excellence, system observability, and robust infrastructure support.

Your Qualifications

    • 8+ years of experience in software engineering, with 3+ years working on infrastructure/platforms involving ML/AI, GPU, or data-heavy systems.
    • Proficiency in Python and familiarity with backend languages such as Ruby and/or Go.
    • Strong understanding of Kubernetes internals and experience running GPU workloads in production environments.
    • In-depth knowledge of AWS services.
    • Experience architecting systems that support both real-time and asynchronous processing pipelines.
    • Familiarity with the ML lifecycle and MLOps practices, including CI/CD for models, monitoring, and rollback strategies.

Bonus Qualifications

    • Experience working in research-driven environments or alongside data scientists, algorithm researchers, and ML engineers.
    • Contributions to open-source projects related to model serving, Kubernetes operators, or ML platforms.
    • Experience supporting systems with diverse user groups across engineering and research disciplines.

Why Join Us?

    • Opportunity to build and scale a one-of-a-kind platform powering state-of-the-art media algorithms.
    • Collaborate with world-class research, engineering, and product teams.
    • Have a direct impact on product experiences used by millions of developers and end-users.
    • Be part of a culture that values creativity, autonomy, and continuous improvement.
#LI-SL1