Data Engineer
Obol
Responsibilities
- Ingest & model Beacon‑chain data (blocks, attestations, sync‑committee aggregates, deposits, and slashings) into ClickHouse and MongoDB at multi‑TB scale.
- Develop scalable ETL/ELT pipelines in Apache Spark (PySpark/Scala), orchestrated via GitHub Actions workflows and containerized CI/CD.
- Implement columnar schemas & partition strategies to achieve sub‑second analytical queries and reduce storage footprint.
- Expose clean, version‑controlled datasets & metrics to internal stakeholders through APIs, dashboards, and notebooks.
- Collaborate with Protocol & DevOps teams to surface validator health, slash‑risk events, and protocol‑level anomalies in real time.
- Own data quality, lineage, testing, and documentation across the stack; champion best practices and continuous improvement.
- Contribute to open‑source tooling around consensus‑layer data, distributed‑validator monitoring, and Ethereum research.
Requirements
- 2+ years of professional experience in data engineering or high‑performance backend roles.
- Production expertise with ClickHouse and Apache Spark on multi‑terabyte datasets.
- Hands‑on experience operating MongoDB for semi‑structured/operational workloads.
- Proficiency in Python (pandas/PySpark) and/or Scala; solid Git and CI/CD habits (GitHub Actions or similar).
- Deep understanding of the Ethereum consensus layer (Beacon chain architecture, validator lifecycle, slashing conditions, and client diversity across Lighthouse, Prysm, Teku, etc.).
- Comfortable working in a remote, asynchronous startup environment with high ownership and autonomy.
Nice to have
- Familiarity with Ethereum execution‑layer JSON‑RPC, MEV‑Boost, and block‑building economics.
- Experience operating distributed systems on Kubernetes, Nomad, or similar orchestrators.
- Fluency in Python.
- Exposure to data‑quality and orchestration tooling (dbt, Great Expectations, Dagster) and time‑series monitoring (Prometheus/Grafana).
- Prior contributions to web3 or other open‑source projects.
About the team: how we work
- Async‑first: proposals & design docs precede meetings.
- Small, senior team: high trust & ownership.
- Open‑source by default: most code and discussions are public.
Our Values
- Synergistic
- Secure
- Innovative
- Reliable
Compensation
- Competitive salary in USD
- Fully remote company: work from wherever you want
- Opportunity to attend relevant conferences
- Two recharge weeks at the end of the year
- Equipment budget