How we run Spark and Sqoop in production (thumbtack.com)
You can define dependencies between jobs based on their output files, which allows you to re-run only part of your pipeline.
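To make the output-file idea concrete, here is a minimal sketch in plain Python. It does not use any scheduler's real API: the `Task` class, the `build` function, and the file names are all hypothetical, and it assumes a task counts as done exactly when its output file exists.

```python
import os
import tempfile


class Task:
    """Hypothetical task that is considered complete when its output file exists."""

    def __init__(self, name, output_path, requires=()):
        self.name = name
        self.output_path = output_path
        self.requires = list(requires)

    def complete(self):
        return os.path.exists(self.output_path)

    def run(self):
        # Stand-in for real work: write a marker file as the output.
        with open(self.output_path, "w") as f:
            f.write(self.name + " done\n")


def build(task, executed):
    """Run upstream tasks first, skipping any task whose output already exists."""
    if task.complete():
        return
    for dep in task.requires:
        build(dep, executed)
    task.run()
    executed.append(task.name)


def demo():
    workdir = tempfile.mkdtemp()
    extract = Task("extract", os.path.join(workdir, "extract.txt"))
    transform = Task("transform", os.path.join(workdir, "transform.txt"), [extract])
    load = Task("load", os.path.join(workdir, "load.txt"), [transform])

    first = []
    build(load, first)  # no outputs exist yet, so every task runs

    # Simulate invalidating the last two stages by deleting their outputs.
    os.remove(transform.output_path)
    os.remove(load.output_path)

    second = []
    build(load, second)  # extract's output is still there, so it is skipped
    return first, second
```

Calling `demo()` runs the three-stage pipeline twice. Deleting the outputs of the later stages is what forces them to re-run, while the untouched upstream output lets that stage be skipped.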
Airflow is very similar to Luigi; we've been using it in production to schedule all of our workflows for ~4 months now, and it's worked out really well for us.