LLM Ops Engineer
Pune, India
Pattern Corporate – Engineering / Full-time / Hybrid
Monitor, evaluate, and optimize AI/LLM workflows in production environments. Ensure reliable, efficient, and high-quality AI system performance by building out a self-serve LLM Ops platform for the engineering and data science departments.
Key Responsibilities:
- Collaborate with data scientists and software engineers to integrate an LLM Ops platform (Opik by CometML) into existing AI workflows
- Identify valuable performance metrics (accuracy, quality, etc.) for AI workflows and build ongoing, sampling-based evaluation processes in the LLM Ops platform that trigger alerts when metrics fall below defined thresholds
- Collaborate across teams to create datasets and benchmarks for new AI workflows
- Run experiments on datasets and optimize performance via model changes and prompt adjustments
- Debug and troubleshoot AI workflow issues
- Optimize inference costs and latency while maintaining accuracy and quality
- Develop automations for LLM Ops platform integration so that data scientists and software engineers can self-serve integration with the AI workflows they build
Requirements:
- Strong Python programming skills
- Experience with generative AI models and tools (OpenAI, Anthropic, Bedrock, etc.)
- Knowledge of fundamental statistical and data science concepts and tools, such as heuristic and non-heuristic NLP measurements (BLEU, WER, sentiment analysis, LLM-as-judge, etc.), standard deviation, and sampling rate, plus a high-level understanding of how modern AI models work (knowledge cutoffs, context windows, temperature, etc.)
- Familiarity with AWS
- Understanding of prompt engineering concepts
- People skills: you will collaborate frequently with other teams to help them perfect their AI workflows
- Experience level: 3-5 years in LLM/AI Ops, MLOps, data science, or machine learning engineering (MLE)