Data Engineer Intern
A.I. & Research
Reports to: Data Engineer Team Lead
About Dathena Science
Dathena is a deep-tech company that brings a new paradigm to data privacy and security solutions. In a world of ever-growing information, regulation, and consumer privacy expectations, enterprises around the globe rely on Dathena to identify, classify and control sensitive data, reduce risks, and enhance their data protection frameworks.
Leveraging the power of modern AI technologies, Dathena delivers breakthrough, petabyte-scale solutions with unprecedented accuracy, efficiency and speed that build consumer trust in a digital world and ensure the “privacy and data security protection journey.”
Founded in 2016, Dathena continues to grow with its latest round of funding. With offices in Singapore, Bangkok, Geneva, Lausanne, Paris, and New York City, Dathena employs more than 70 people, including the world’s top data scientists and information risk experts. For more information, go to www.dathena.io/.
- As a Data Engineer, you will work in the AI research team on designing, building and maintaining data pipelines that help solve challenging issues for Machine Learning algorithms.
- Your primary focus will be managing the data repositories, as well as working on the Label Management platform of Dathena.
- Good communication with your team lead will be required for planning and synchronizing with the R&D and Engineering departments. Dataset guidelines will be defined by you and adjusted based on both the R&D and the Engineering teams’ requirements. The position also requires a good understanding of database organization and a willingness to design and create efficient tools that are sustainable in the long term.
- Designing how data is stored, consumed, integrated, and managed in Dathena
- Building pipelines for collecting data from different sources (databases, web scraping)
- Writing and enforcing guidelines for dataset creation and management
- Investigating, deploying and optimizing new data management tools (organization and visualization)
- Modularizing access to datasets for research purposes
- Providing reports, statistics and visualization on datasets
- Processing, cleaning and verifying the integrity of data used for analysis
- Learning and staying abreast of new technologies and advancements in data science
- Managing the current labelling pipeline
Skills & Qualifications
- Undergraduate, Bachelor’s, or Master’s degree in Computer Science, Engineering, or a related technical field
- A good Computer Science foundation in data structures and algorithms, and modern software engineering practices
- Proficiency in data processing using Python
- Scripting experience in Bash
- Experience in querying databases (SQL, Hadoop/Spark environment, NoSQL)
- Willingness to take part in the design phase, and good organizational skills
- Docker/Kubernetes knowledge is a big plus
- Machine Learning experience is a big plus
- Solid written and verbal communication skills in English
- Ability to work in a fast-paced startup environment adopting agile methodologies
- Good oral and written communication skills
- Time management
- Interpersonal skills
- Critical thinking
- Proactive and interested in the area of data security and governance
This temporary position may be converted into a full-time job.
As a Data Engineer Intern, you will be part of a highly qualified (PhD-level) and dynamic team where you will be able to learn many skills and grow. You must fully embrace the team spirit of a young and innovative start-up and be able to adapt to a multicultural environment.
Location: Singapore R&D Office
Please note that only shortlisted candidates will be notified.