Owen on software

The Backpressure Smackdown in review

14 October 2019

So our three contenders have run the backpressure assault course. It’s been brutal. There were winners and there was definitely a loser. Let’s review how Storm, Spark and Flink got on and what this does and doesn’t mean.

What went down?

In short, Flink went down in flames, Spark wavered but stood firm, and Storm largely breezed through. If we have to call out a winner then it has to be Storm, followed by Spark, with Flink a distant third.

...

No-one was more surprised by that outcome than me. Had I been asked to predict the results beforehand, I would have called the exact reverse.

The numbers

The numbers fall out as follows, with throughput given as a percentage of that for the minimal straggler scenario [1]:

Engine             Slight straggler   Bad straggler   Constant straggler
Storm              100%               100%            100%
Spark DStreams     82.5%              85%             82.5%
Spark Structured   100%               90-95%          99%
Spark Continuous   96%                99.5%           90%
Flink              80.5% [2]          45%             12%

Points of note

  • Spark DStreams incurred a 15% throughput drop from the outset due to over-partitioning.
  • Spark Structured Streaming drops off in the bad straggler scenario. This seems related to how well the over-partitioning factor happens to fit the delay caused by the straggler.
  • Spark Continuous Processing struggled a little with the constant straggler scenario.
  • We over-partitioned for the Spark scenarios by a factor of ten [3] (see the sketch below).
  • And Flink? Well, we didn’t really need the numbers given the charts we had already seen.
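
For the record, “over-partitioning” here just means splitting the stream into many more partitions than there are executor cores, so that work destined for a slow slot is diluted across many small tasks. A minimal sketch of what that looks like in Spark; the factor, core count, and names are illustrative assumptions, not the exact smackdown configuration:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: over-partition by roughly 10x the executor cores so
// that a straggling slot holds up a tenth of a core's work rather than
// all of it. The core count and factor are assumptions for this sketch.
val spark = SparkSession.builder()
  .appName("over-partitioning-sketch")
  // Structured Streaming: this controls the task count after a shuffle;
  // "160" is ~10x an assumed 16 total executor cores.
  .config("spark.sql.shuffle.partitions", "160")
  .getOrCreate()

// The DStreams / RDD equivalent is an explicit repartition, e.g.:
//   val repartitioned = inputStream.repartition(160)
```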

The smackdown scenarios simulated varying degrees of a straggler node, with a dataflow that did not include repartitioning by key. This is the kind of situation that you might encounter in a multi-tenant environment. How common this is in the wild is up for debate; it is certainly something that I have encountered in production.
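
To make that setup concrete: the essence of a simulated straggler is a map stage in which tasks landing on one designated slow node pay an extra per-record delay; vary the size or frequency of that delay and you get the slight, bad, and constant variants. A hypothetical sketch, not the harness’s actual code; the host name and delay value are assumptions:

```scala
import java.net.InetAddress

// Hypothetical straggler simulation: records processed on one designated
// "slow" worker pay a fixed per-record delay. The host name and the 5 ms
// delay are illustrative assumptions. Because the dataflow never
// repartitions by key, records assigned to the slow node stay there, and
// the engine's backpressure has to absorb the skew.
def stragglerMap(record: String): String = {
  if (InetAddress.getLocalHost.getHostName == "worker-3") // the slow node
    Thread.sleep(5) // constant straggler: fixed 5 ms per record
  record // pass the record through unchanged
}

// Usage, e.g. with Spark DStreams: stream.map(stragglerMap)
// Varying the sleep, or applying it probabilistically, yields the
// slight / bad / constant variants.
```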

I discussed this with a staff engineer from Ververica, the company founded by the creators of Flink, on the Flink user mailing list. He was of the opinion that it was a very niche scenario. Personally, I think it is likely more common than you might expect, especially in on-premise Hadoop and multi-tenant environments.

But I guess this underlines an important point. All stream processing jobs are different to some degree. Your challenges and constraints might not be mine, and vice versa. And it’s important to understand this when selecting technology …

Flink is an outstanding engineering effort. This is why Alibaba, the Chinese Amazon, bought Ververica. That is as sincere a form of flattery as you can get.

In fact, Storm, Spark, Flink, and the others are all amazing. That is why I think we need to be more generous with our words when talking about them. They represent hundreds of thousands of hours of combined effort from many people who have poured passion into making them what they are.

Should you use one of them over the others? Quite possibly, depending on your requirements. Does that mean that someone else, with different requirements, should use the same technology? Possibly not. I think we need to be more data-driven and open-minded when selecting technology.

That is what this smackdown has accidentally demonstrated. I anticipated Flink performing the best; it performed the worst. If your operating environment overlaps with these smackdown scenarios then Flink, at least in its current version, might not be the best option for you.

However, clearly, Alibaba and many other companies are deriving massive benefit from deploying Flink. Statements like “X is better than Y”, “X is broken”, and “you should never use X” are unhelpful, and do not reflect the nuanced reality of the world we live in.

The truth is that many of these streaming platforms have areas where they excel and areas where they are weaker, and some of those areas overlap across platforms.

What can we conclude?

In a way, given my original motivations for these experiments, I really should have written a single blog post rather than run the smackdown. However, as it turned out, I accidentally demonstrated the points I had wanted to make, in a very practical way:

Storm is not past it and can still really shine. It might not be ‘next-generation’. However, it could be a great fit for your next streaming project. It is definitely worth considering.

Always be technically de-risking. We can be very wrong in our assumptions. I never expected Flink to fail. Yet it did. De-risk early, and your projects won’t be horribly gouged by the iceberg of unmitigated risk.

Let’s be open-minded and data-driven in our technology selection rather than hype-driven. Let’s cut out the trash-talking of technologies we are barely familiar with. There is space for all of them.

  1. Note that this is the throughput for the non-straggler tasks. 

  2. This figure is from the extended run, where things started going sideways after 30 minutes. 

  3. If we had followed common best-practice advice and over-partitioned by a factor of 2-3, without tuning for our specific scenario, Spark’s throughput would have been considerably reduced. 

Tags: Performance Flink Spark Storm Smackdown Big-data Data-Engineering

