Benchmarking Streaming Computation Engines at Yahoo (yahooeng.tumblr.com)
With the Spark direct stream, Kafka partitions map 1:1 to Spark partitions, which means that without a shuffle at most half of the workers would be doing any work.
Seems like a pretty basic oversight that should be addressed.
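To make the arithmetic concrete, here is a rough sketch in plain Python (no Spark involved). The numbers are assumptions chosen only to match the "half" above: 5 Kafka partitions and 10 workers. The helper names `busy_workers` and `repartition` are made up for illustration; the second one just mimics what a shuffle such as `repartition` on a DStream would do to the task count.

```python
def busy_workers(num_partitions, num_workers):
    """Count workers that get at least one task when each partition is one task."""
    assigned = set()
    for partition in range(num_partitions):
        # Simplified scheduling assumption: hand tasks out round-robin.
        assigned.add(partition % num_workers)
    return len(assigned)


def repartition(partitions, new_count):
    """Redistribute all records round-robin into new_count partitions (a toy shuffle)."""
    out = [[] for _ in range(new_count)]
    i = 0
    for part in partitions:
        for record in part:
            out[i % new_count].append(record)
            i += 1
    return out


# Hypothetical setup: 5 Kafka partitions with 4 records each, 10 workers.
kafka_partitions = [[f"p{p}-r{r}" for r in range(4)] for p in range(5)]
num_workers = 10

# Without a shuffle: one task per Kafka partition.
before = busy_workers(len(kafka_partitions), num_workers)

# With a shuffle into as many partitions as there are workers.
shuffled = repartition(kafka_partitions, num_workers)
after = busy_workers(len(shuffled), num_workers)

print(before, after)
```

The shuffle is not free, of course: it moves every record across the network once, so whether it pays off depends on how heavy the per-record work is.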