Data Engineer (2021 Q4)

Austin, TX; Toronto, ON; or Washington, DC /
Data Science /
Full-time
About Cerebri AI
 
Cerebri AI is the creator of CVX, an end-to-end platform that automates AI processes, from data engineering to the next best actions for multiple key performance indicators (KPIs), and thus achieves Continuous Intelligence (CI). The CVX 3 platform delivers the fastest time to market with production-ready BI and AI insights without forgoing the quality that every enterprise or network operator expects from its analytics initiatives. Key to this performance is the fact that all processing on CVX is event-based.
 
The CVX 3 platform implements CI for classification, recommendation, and next best actions. It is designed to scale horizontally with data sources, event rates from those sources, KPI management, BI insights, and AI insights. CI is the ability to integrate raw data, compute engineered datasets in real time, score KPIs, and generate insights seamlessly at scale. It is essential to time-series processing and the sine qua non for time-sensitive management of multiple KPIs.
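To make "event-based" concrete, here is a minimal sketch (ours, not CVX code) of the general shape of event-driven CI in Python: every incoming event updates an engineered feature in place and immediately rescores a KPI. The Event shape, the rolling-spend feature, and the churn formula are all hypothetical illustrations.

    from collections import defaultdict, deque
    from dataclasses import dataclass

    @dataclass
    class Event:
        customer_id: str
        kind: str        # e.g., "purchase", "support_call"
        amount: float
        ts: float        # event time, epoch seconds

    WINDOW = 30 * 24 * 3600        # 30-day rolling window (illustrative)
    spend = defaultdict(deque)     # customer_id -> deque of (ts, amount)

    def on_event(ev: Event) -> float:
        """Fold one raw event into engineered state and rescore the KPI."""
        q = spend[ev.customer_id]
        if ev.kind == "purchase":
            q.append((ev.ts, ev.amount))
        while q and q[0][0] < ev.ts - WINDOW:   # evict expired events
            q.popleft()
        rolling_spend = sum(a for _, a in q)    # real-time engineered feature
        return 1.0 / (1.0 + rolling_spend)      # toy churn-risk KPI score

    print(on_event(Event("c1", "purchase", 120.0, 1_000_000.0)))

The point of the sketch is that scoring happens per event rather than per batch, which is what makes time-sensitive KPI management possible.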
 
CI enables a slew of positive business outcomes and use cases:
·      Dynamic personalization: Serve content, propose products, promote services, or execute actions
·      Dynamic customer cohort creation: Determine cohorts with similar behavior or tuned to specific KPIs
·      Scalable actions: Execute actions across segments of customers
·      Usage-based/behavior-based pricing models: Price products such as insurance based on behavior
·      Abnormality and fraud detection: Identify and prevent unauthorized activity
·      Security and remediation: Detect issues and alert responders far faster than traditional security tooling through intelligent analyses
·      Network performance: Monitor and respond to network performance issues faster
·      IoT analytics: Unify disparate data sources to reduce costs and improve performance
·      IoT TCO: Reduce the cost of installation by reducing tuning and maintenance
 
How do we do this? We hire the best data scientists, mathematicians, and software developers and work as a cross-disciplinary team/gang/clan. We work hard, laugh hard, and impress our peers and clients. Because we can. And because we want to. To learn more, visit cerebriai.com. In the meantime, if you think you have what it takes, give us a spin and upload your resume.
 
"Cerebri AI was recognized as 2019 Cool Vendor for Customer Journey Analytics by Gartner."

Cerebri AI is looking for a full-time Data Engineer with experience building data integrations to join the Data Engineering & Pipeline team, where we operate in a world of real-time data streams. If you have experience in the customer engagement/experience world, have a passion for bringing new concepts to life, and love using the latest open-source technologies, then join us!

You will be responsible for building efficient data pipelines that transform streams of raw data into formats usable by downstream applications serving both analytical and operational use cases. You will work closely with the Software and Data Science teams to meet the data requirements of various initiatives at Cerebri AI.

Responsibilities

    • Apply knowledge of programming and data modeling to build data streaming platforms
    • Transition existing production processes from batch to streaming
    • Ingest, prepare, and aggregate data at scale to build feature banks for model training/scoring
    • Collaborate with data scientists, ask appropriate questions to gain a deep understanding of client data, and configure directed acyclic graphs (DAGs) for the streaming system (see the sketch after this list)
    • Ensure all solutions comply with the highest levels of security, privacy, and data governance requirements, including data anonymization, encryption, and security in transit and at rest
    • Maintain automated test coverage and code comments for all code produced
    • Evaluate and improve the efficiency and effectiveness of operations
    • Solve complex data issues and perform root cause analysis to proactively resolve product and operational issues
    • Own the data pipeline stack and provide support in the event of system failures
    • Learn about the latest and greatest advancements in machine learning and data engineering while looking for opportunities to apply them in our products
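As referenced in the DAG item above, here is a minimal sketch of what configuring streaming transforms as a directed acyclic graph can look like. The Node class, the stage names, and the chaining API are illustrative assumptions for this posting, not Cerebri AI's actual interface:

    class Node:
        """One streaming transform in the DAG: apply fn, then fan out."""
        def __init__(self, name, fn):
            self.name, self.fn, self.children = name, fn, []

        def to(self, child):
            """Declare a directed edge; returns child so edges chain."""
            self.children.append(child)
            return child

        def push(self, record):
            """Propagate a single event through this node's subgraph."""
            out = self.fn(record)
            if out is not None:
                for child in self.children:
                    child.push(out)

    # A toy pipeline shape: ingest -> anonymize -> featurize -> score.
    ingest    = Node("ingest",    lambda r: r)
    anonymize = Node("anonymize", lambda r: {**r, "email": None})
    featurize = Node("featurize", lambda r: {**r, "spend_bucket": int(r["amount"] // 100)})
    score     = Node("score",     lambda r: print("scored:", r))

    ingest.to(anonymize).to(featurize).to(score)
    ingest.push({"email": "a@b.c", "amount": 250.0})

Note how privacy requirements (the anonymize stage) sit in the same graph as feature engineering and scoring, which is why the security and DAG responsibilities above go hand in hand.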

Qualifications

    • Bachelor's Degree in Statistics, Mathematics, Data Science, Computer Science, Engineering, or a related field
    • Experience building ETL solutions using data pipeline products
    • Experience engineering complex, high-volume data pipelines
    • Experience with relational databases and data manipulation languages such as SQL
    • Experience with at least one modern OOP language (e.g., Python, Scala, Java)
    • Experience with code hosting platforms for version control and collaboration, such as GitHub
    • Experience writing production-quality code
    • Familiarity with OOP concepts, data structures, and algorithms 
    • Familiarity with cloud computing
    • Familiarity with scalable computing (e.g., Spark)
    • Familiarity with machine learning concepts (e.g., supervised vs. unsupervised, tree-based algorithms)
    • Good verbal and written communication skills with both technical and non-technical stakeholders
    • Team player

Nice to Haves

    • Master's degree or higher in a relevant quantitative subject
    • Familiarity with publish-subscribe paradigm and data streaming systems
    • Familiarity with Apache Kafka/Pulsar
    • Familiarity with Hadoop or AWS EMR
    • Familiarity with other distributed data stores (e.g., Elasticsearch, Apache Druid)
    • Familiarity with the Atlassian suite (Jira, Confluence, Bitbucket) and agile development concepts


Please specify your location preference in case we move away from fully remote work.