We are looking for a Data Engineer (Data Platform) to build and operate the company's entire data platform. The role focuses on designing, implementing, and optimizing the infrastructure serving ETL/ELT, the Data Warehouse, Data Governance, Streaming, Analytics, and the ML Platform. The Data Engineer will develop services, frameworks, and tools that enable ETL Engineers, Data Analysts, Data Scientists, and application systems to use data in a stable, efficient, and secure manner.
Key Responsibilities
- Design and build Data Platform including compute, storage, orchestration, metadata, observability, and security.
- Deploy, configure, and optimize systems such as: Airflow, dbt, ClickHouse OSS, Trino, S3/Lakehouse, Cube.js OSS, Kafka, OpenMetadata, OpenLineage, and related components.
- Build or customize services/platform components to meet requirements for ETL/ELT, data governance, lineage, data quality, metric layer, and company-specific needs.
- Build ingestion & streaming systems to synchronize data between application systems and the data platform in both directions (CDC, event streaming, API pipelines, etc.).
- Design and operate Data Warehouse / Lakehouse platform: partitioning, storage layout, performance tuning, cost optimization.
- Build observability & monitoring mechanisms: logs, metrics, alerting, data quality, lineage, SLA/SLO for all pipelines.
- Manage system lifecycle and operations: backup/restore, scaling, upgrade, security, data access control.
- Build or support building ML Platform: feature store, model registry, batch/stream inference, training pipelines.
- Collaborate with ETL Engineers, Data Analysts, Data Scientists, and backend teams to ensure data is transferred, stored, and consumed according to the platform architecture.
- Propose system improvements to increase stability, performance, and scalability.
Requirements
- Experience designing & operating large-scale data systems: DWH/Lakehouse, streaming, orchestration.
- Proficient with several of the following tools/platforms: Kafka, Airflow, dbt, ClickHouse OSS, Doris, Spark, Flink, Cube.js, S3/lake storage, OpenMetadata, OpenLineage, Grafana/Prometheus, or equivalent.
- Experience customizing or building services in Python/Go/Java: REST/gRPC, message queues, caching, containerization.
- Deep knowledge of data architecture: ingestion, modeling, storage layout, metadata, data quality, governance.
- Understanding of cloud infrastructure, Kubernetes, and IaC (Terraform, Helm, etc.).
- Skilled in performance optimization and in designing stable, fault-tolerant, observable systems.
- Strong technical reasoning, quick adaptation, and the ability to design frameworks used by multiple teams.
Preferred Qualifications
- Experience building Lakehouse or Data Platform from scratch.
- Deep understanding of distributed systems, file formats (Parquet/ORC), object storage, compute engines.
- Experience building ML Platform or workflows for ML Ops.
- Understanding of data security: IAM, encryption, data masking, audit logging.
Benefits
- Directly build the company's core data platform and ML Platform.
- Work with a modern, open-source, and flexible technology stack.
- Opportunity to advance to Senior/Principal Data Engineer or Data Platform Architect.
- Work alongside Data Engineers, ETL Engineers, and DS/ML teams in a high-tech environment.