Senior Data Platform Engineer
Poland / Colombia / Argentina / Brazil / Mexico
Cloud Solutions – Data Platform
Remote
Solvd is an AI-first advisory and digital engineering firm on a mission to redefine how AI transforms business. Working at the intersection of strategy and execution, we help clients move from experimentation to real ROI through the industry’s most advanced strategic advisory, data and platform engineering, and AI integration services. We’re an AI-native firm with over 12 years of experience, supported by offices in the USA, Brazil, Mexico, Ukraine, Poland, Argentina and Georgia.
We are looking for an experienced Senior Data Platform Engineer to join our growing team.
Responsibilities:
- Design & Optimization: Build and fine-tune data clusters to support both batch and streaming workloads, ensuring optimal performance and reliability.
- Platform Development: Build and expand our data ecosystem (Spark, Hadoop, Kubernetes, Trino, Delta Lake, and Druid) to meet evolving business needs, adding new integrations, data ingestion, and data transforms as needed.
- Innovation: Introduce and scale new data platform solutions, iterating on our OLAP platforms and exploring next-generation data formats.
- Collaboration: Work closely with cross-functional teams, including infrastructure engineers, to align platform capabilities with organizational goals.
Required qualifications:
- Distributed Systems Expertise: Proven experience in scaling and tuning large deployments of Spark-on-Kubernetes and Spark-on-Hadoop.
- Object Storage Solutions: Knowledge of open-source S3 alternatives, including Ceph and MinIO.
- Storage Systems Knowledge: In-depth understanding of Hadoop and the HDFS protocol.
- Performance Tuning: Skilled in designing and optimizing shuffle-heavy systems, utilizing YARN or Kubernetes with remote shuffle services.
- Lakehouse Technologies: Hands-on experience with at least one lakehouse table format, such as Delta Lake, Apache Iceberg, or Apache Hudi.
- OLAP Systems: Familiarity with OLAP technologies, including ClickHouse, Apache Druid, Apache Pinot, or Apache Doris.
- Communication Skills: Strong ability to collaborate with diverse stakeholders and effectively communicate complex technical concepts.
- Problem-Solving: Proven track record of troubleshooting and resolving issues in large-scale production environments.
Preferred qualifications:
- Advanced Data Formats: Experience with next-generation and multi-modal data formats, such as Lance (the format used by LanceDB).
- Self-Service Platforms: Background in building self-service stateful platforms.
- Accelerated Runtimes: Familiarity with native or accelerated runtimes for Spark, such as Apache DataFusion Comet, Apache Gluten, or NVIDIA RAPIDS.