A Scala API for Google Cloud Dataflow (github.com)
A bit of background: Spark and Flink are both frameworks with their own execution engines. Scalding is tightly coupled to Cascading + Hadoop as its execution engine (Tez support is also WIP). The Dataflow Java SDK/Apache Beam, on the other hand, is designed to be a simple abstraction with pluggable engines, and the Cloud Dataflow service is just one of the many possible runners.
Right now there are:
- local runner
- Dataflow runner, fully managed service in GCP
- Spark runner
- Flink runner
Scio wraps the Dataflow Java SDK (Apache Beam) and can potentially leverage any available runner.
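To make the "pluggable engines" idea concrete, here is a toy sketch in plain Scala. It is not the Beam or Scio API: `Pipeline`, `Runner`, `InMemoryRunner` and `ChunkedRunner` are hypothetical names made up for illustration. The point is only that the pipeline is a description, and each runner decides how to execute it.

```scala
// Toy model of "one pipeline description, many runners".
// Every name below is hypothetical; none of these are Beam or Scio classes.

// A pipeline only describes the work: an input and an element-wise transform.
final case class Pipeline[A](source: Seq[String], perElement: String => Seq[A])

// A runner is anything that knows how to execute such a description.
trait Runner {
  def name: String
  def run[A](p: Pipeline[A]): Seq[A]
}

// Simplest possible engine: apply the transform to the whole input at once.
object InMemoryRunner extends Runner {
  val name = "in-memory"
  def run[A](p: Pipeline[A]): Seq[A] = p.source.flatMap(p.perElement).toList
}

// Stand-in for a different engine: processes the input in chunks of two
// elements and concatenates the per-chunk results.
object ChunkedRunner extends Runner {
  val name = "chunked"
  def run[A](p: Pipeline[A]): Seq[A] =
    p.source.grouped(2).flatMap(chunk => chunk.flatMap(p.perElement)).toList
}

object Demo {
  // The same pipeline value is handed to every runner unchanged.
  val words: Pipeline[String] =
    Pipeline(Seq("a b", "c", "d e f"), line => line.split(" ").toList)

  def main(args: Array[String]): Unit = {
    val runners: Seq[Runner] = Seq(InMemoryRunner, ChunkedRunner)
    runners.foreach(r => println(s"${r.name}: ${r.run(words).mkString(",")}"))
  }
}
```

Both runners produce the same result for the same pipeline, which is the property that lets a wrapper like Scio stay independent of the engine underneath. The real SDK is of course far richer (windowing, shuffles, I/O connectors), so treat this purely as an illustration of the separation.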