Data QA Engineer - Contract Role
Bangalore
Analytics – Data Collection / Full-time / On-site
Sanas is revolutionizing the way we communicate with the world’s first real-time algorithm designed to modulate accents, eliminate background noise, and magnify speech clarity. Pioneered by seasoned startup founders with a proven track record of creating and steering multiple unicorn companies, our groundbreaking GDP-shifting technology sets a gold standard.
Sanas is a 200-strong team, established in 2020. In this short span, we’ve successfully secured over $100 million in funding. Our innovation has been supported by the industry’s leading investors, including Insight Partners, Google Ventures, Quadrille Capital, General Catalyst, and Quiet Capital. Our reputation is further solidified by collaborations with numerous Fortune 100 companies. With Sanas, you’re not just adopting a product; you’re investing in the future of communication.
As a Data QA Engineer, you will validate and curate the audio and transcription datasets used to train and evaluate AI models. You will review real and synthetic audio for issues like clipping, background noise, and transcription errors, and help identify, reproduce, and document data anomalies. Working closely with research and engineering teams, you will support data quality improvements, enhance validation tools, and contribute to scalable QA workflows, all while ensuring high standards of data hygiene and consistency.
Key Responsibilities:
- Conduct thorough validation of datasets used in model training and evaluation, focusing on transcription accuracy, metadata integrity, and audio quality.
- Review real customer calls and synthetic audio to detect data anomalies such as clipping, silence, incorrect speaker tags, or transcription mismatches.
- Reproduce and document data issues that impact model quality, enabling effective debugging and iteration by research teams.
- Curate, clean, and manage high-quality datasets from a variety of sources including customer calls, synthetic pipelines, and open-source corpora.
- Annotate and label audio with quality issues such as background noise, gender mismatches, speech overlap, silence, or segmentation errors.
- Collaborate with research and engineering teams to enhance data validation tools and scale automation within QA workflows.
- Ensure high standards of data hygiene, consistency, and reproducibility across all Data QA processes.
- Support data-related workflows such as data mining, extraction, transformation, and manipulation.
Must-have qualifications:
- 2+ years of experience in Data QA, audio/transcription QA, or related quality assurance fields.
- Exceptional attention to detail with the ability to identify subtle inconsistencies and data quality issues.
- Hands-on experience with audio inspection tools like Audacity, Praat, or similar platforms.
- Familiarity with audio quality issues such as clipping, background noise, and channel imbalance, or a strong willingness to learn them.
- Proficiency in handling structured data in formats like CSV and JSON, using tools like Excel and Google Sheets and basic scripting in Python or Bash.
- Strong written communication skills for producing clear, actionable QA documentation and feedback.
- Knowledge of database languages (e.g., SQL) and experience working with DBMS tools like PostgreSQL.
- Demonstrated ability to collaborate effectively with ML researchers, product managers, and customer-facing teams.
Joining us means contributing to the world’s first real-time speech understanding platform revolutionizing Contact Centers and Enterprises alike.
Our technology empowers agents, transforms customer experiences, and drives measurable growth. But this is just the beginning. You'll be part of a team exploring the vast potential of an increasingly sonic future.