Why Are People into Event Sourcing?

Why Are People into Event Sourcing? (adaptechsolutions.net)

140 points by adymitruk 9 years ago | 87 comments

btown 9 years ago |

Event sourcing isn't nearly as common knowledge among new programmers as the CRUD-one-row-per-entity pattern, and it really should be. I liken it to introducing version control for your data; when immutable updates are your canonical source, no matter how much the system behind them changes, or the business requirements change, and no matter how many teams are deriving different things from them in parallel, they can all work off of the same data and "merge" their efforts together.

The one downside is that shifting your business logic to read-time means that you need to have very efficient ways of accessing and memoizing derived data. For some applications, this can be as simple as having the correct database indices over your WhateverUpdates tables, fetching all updates into memory and merging on each request. For others, you'll need to have a real-time stream processing pipeline to preemptively get your derived data into the right shape and into a cache. And those are more moving parts than your typical monolith app, but the

One benefit to actually using event sourcing with a stream processing system is that, in many cases, it can be the most effective way to scale both traffic capacity and organizational bandwidth, much in the same way that individually scalable microservices can (and fully compatible with that approach!). Martin Kleppmann at Confluent (a LinkedIn spinoff creating and consulting on stream processing systems) writes some great and highly approachable articles about this. Highly recommended reading.

http://www.confluent.io/blog/making-sense-of-stream-processi...

http://www.confluent.io/blog/turning-the-database-inside-out...
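The "fetch all updates into memory and merge on each request" approach described above can be sketched roughly as follows. This is only an illustration: `ProfileUpdate`, `apply_update`, and `current_profile` are hypothetical names, and a plain list stands in for a WhateverUpdates table.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class ProfileUpdate:
    """One immutable update row (hypothetical shape, for illustration only)."""
    user_id: int
    field: str
    value: str


def apply_update(state: dict, update: ProfileUpdate) -> dict:
    # Merge a single update into the derived state without mutating the input.
    return {**state, update.field: update.value}


def current_profile(updates, user_id):
    # Derive the current state at read time by folding over every update
    # for this user, oldest first.
    relevant = (u for u in updates if u.user_id == user_id)
    return reduce(apply_update, relevant, {})


# Stand-in for an append-only updates table, in insertion order.
updates = [
    ProfileUpdate(1, "email", "a@example.com"),
    ProfileUpdate(2, "email", "b@example.com"),
    ProfileUpdate(1, "name", "Ada"),
    ProfileUpdate(1, "email", "ada@example.com"),
]

print(current_profile(updates, 1))  # {'email': 'ada@example.com', 'name': 'Ada'}
```

Nothing is ever overwritten in `updates`; the "current" row is just the result of the fold, which is why efficient access and memoization start to matter as the list grows.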

blowski 9 years ago | |

The CRUD one-row-per-entity pattern is common because it's enough for most projects. It works well with ORMs so you can build quickly and securely. And most of the time, performance isn't an issue and having a history of an entity is unnecessary.

I'm worried that event sourcing is going to become this year's over-applied design pattern with libraries in every language for every database with blog posts that recommend it be used on every project.

It's a good idea, very useful - in the right hands on the right projects. But it makes sense that junior devs normally use CRUD because that's normally the right solution. At least until better tools come along.

hcarvalhoalves 9 years ago | | |

> The CRUD one-row-per-entity pattern is common because it's enough for most projects. It works well with ORMs so you can build quickly and securely.

If by "works well", you mean it works until someone asks for historical data - then the IT guy has to say with a straight face "we lost it". This is unacceptable considering the value of data and the strategic leverage it can have today.

Considering immutable facts tables are the most stable data model; companies often have to re-invent it (poorly) on top of relational at some point; that storage is often not a problem; and that having clean historical data is crucial for data science; there are increasingly fewer excuses to not adopt a sane data model from day one.

I agree partially w.r.t. tooling - few implementations aid adopting this pattern, but I believe the value of historical data, over time, outweighs not being able to slap some quick Rails CRUD together and then being stuck at a local minimum.

haimez 9 years ago | | |

Could happen, but I think event sourcing (and CQRS generally) carries enough implementation overhead in the amount of code required that it's less likely to be adopted in situations where it isn't appropriate.

That isn't to say it won't happen, but I think it's more likely that teams would miss an opportunity to leverage it than leverage it inappropriately.

EdSharkey 9 years ago |

Here's the term I wish was unfashionable with the kids: reshaping.

Did you spot all those command-to-query-to-event-to-log-to-storage data type conversions in those pretty diagrams? That's a whole bunch of needless reshaping of data as it flows through the system.

For each one of those data transformations to be successful, there has to be accurate communications between people and bug free code written in the data conversion and routing of messages through the system. All those moving parts make changing the system extremely painful, lotsa ripple effects - and every time you have to make a change to your events, you'd have a data migration project for any running event streams.

Naming things is hard too, and there's a lot more naming of entities needed in a CQRS-ES system.

I like all the promised benefits of CQRS and ES, but I can't imagine a case where I'd take the risk of attempting it on anything but a toy project. Perhaps if I was on the version 5 rewrite project for an insanely profitable system where the requirements and design are completely understood up-front. I would need to grok some canonical example of a large, well-architected, well-implemented representative system before I would ever attempt to implement one.

Are there any non-toy examples of successful CQRS-ES with open source available to read? Did those projects go over-budget, and by how much? Would the authors of those examples still recommend the architecture now that they've gone through the experience?

rreppel 9 years ago | |

Open sourced ones? The largest example I'm aware of is https://github.com/MicrosoftArchive/cqrs-journey. There's a pretty extensive write-up of their experiences too. https://msdn.microsoft.com/en-us/library/jj554200.aspx

EdSharkey 9 years ago | | |

I can't tell if this is a toy experiment or not.

kasey_junk 9 years ago | |

Does postgres count?

taeric 9 years ago |

As someone that has fallen for the "event sourcing" promise before, the article does a decent job explaining the promise. Not sure if it will be the next article, but the actual task of delivering on this work is where things break. Hard.

The vast majority of the things you will ever program are pretty much guaranteed from one statement to the next. Hard boundaries, where things can fail, are often decently understood and actually quite visible in the code.

Moving everything to be an event completely throws this out the window. You can take a naive view, where you pretend that getting from one event to the next is always safe. However, building up the system to cope when that is not the case makes for a complicated system, in areas that are decidedly not related to your business domain. (Well, for most of us.)

Maybe some day there will be a system that helps with this. Until then, my main advice is to make sure you have solved your system with a naive solution before you move on.

barrkel 9 years ago |

Architecting around events has several ramifications.

For building up a picture of the world, it's pretty good. It's very nice to be able to replay a log of events and recreate a view of the way things are expected to be; if there's a bug in your code, you can fix it and repeat the replay to get back into a good state (with caveats, sometimes later actions creating events may be dependent on an invalid intermediate state). Whereas mutating updates erase history, perhaps with some ad-hoc logging on the side that is more often than not worthless for machine consumption.
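To make the fix-and-replay idea concrete, here is a minimal sketch. The event shapes and the `buggy_projection`, `fixed_projection`, and `replay` names are assumptions made up for this example, not taken from any particular framework.

```python
# Hypothetical event log for a stock view; events are plain dicts.
events = [
    {"type": "ItemAdded", "sku": "A", "qty": 2},
    {"type": "ItemAdded", "sku": "B", "qty": 1},
    {"type": "ItemRemoved", "sku": "A", "qty": 1},
]


def buggy_projection(view, event):
    # Bug: removals are silently ignored, so the view overstates stock.
    if event["type"] == "ItemAdded":
        view[event["sku"]] = view.get(event["sku"], 0) + event["qty"]
    return view


def fixed_projection(view, event):
    # Fixed version: additions increase the count, removals decrease it.
    if event["type"] == "ItemAdded":
        view[event["sku"]] = view.get(event["sku"], 0) + event["qty"]
    elif event["type"] == "ItemRemoved":
        view[event["sku"]] = view.get(event["sku"], 0) - event["qty"]
    return view


def replay(log, projection):
    # Rebuild the view from scratch by applying every event in order.
    view = {}
    for event in log:
        view = projection(view, event)
    return view


stale_view = replay(events, buggy_projection)     # {'A': 2, 'B': 1}
repaired_view = replay(events, fixed_projection)  # {'A': 1, 'B': 1}
print(stale_view, repaired_view)
```

Because the log itself was never modified, deploying the corrected projection and replaying is enough to repair the view. The caveat above still applies: if later events were emitted based on the wrong intermediate state, replay alone will not undo them.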

For decoupled related action, it's not too bad. If you have some subsystem that needs to twiddle some bits or trigger an action when it sees an event go by, it just needs to plug into the event stream, appropriately filtered.

For coordinated action OTOH, e.g. a high-level application business-logic algorithm, you need to start thinking in terms of explicit state machines and, in the worst case, COMEFROM-oriented programming[1]. Depending on how the events are represented, published and subscribed to, navigating control flow involves repeated whole-repo text searching.

It's best if your application logic is not very complicated and inherently suitable to loose coupling, IMO.

[1] https://en.wikipedia.org/wiki/COMEFROM

sanderjd 9 years ago |

FYI in case the author reads this, since this seems to be intended as an intro for people who aren't already familiar with this stuff: I didn't see "CQRS" defined anywhere in this article or in the two or three links I followed from it; they all begin with an assumption that you know the acronym, and delve straight into details. It might be good to define some terms in the front matter (unless I've misunderstood the target audience).

rreppel 9 years ago | |

Always a problem with techie - acronymania. :) Thanks, noted. I'll do an edit.

SEJeff 9 years ago |

Two must-read documents for those who want to learn more about this method of building reactive applications:

https://engineering.linkedin.com/distributed-systems/log-wha...

http://martinfowler.com/eaaDev/EventSourcing.html

Note that Martin's blog is what inspired the event bus in https://home-assistant.io, an open source home automation project I occasionally contribute to.

kazagistar 9 years ago |

I've tried working out how to move to an event sourcing system, but I always struggle with locking behavior. Do you just have to invent your own locking mechanisms on top of event sourcing?

PallarelCoedr 9 years ago | |

The stream is the consistency (locking) boundary. Your first step is to get your model aligned with such boundaries. For example, your Amazon shopping basket is independent of my basket. Then you choose your concurrency model - append to a stream with an expected version (optimistic concurrency) or just append anyway (no expected version). Your Amazon basket may be the latter; your Amazon payment and shipping checkout may be the former.

Locking across streams is an anti-pattern / smell. It can be done (as can anything) but it usually points to a modelling problem. Example: cancelling an Amazon order is a _request_ that is in a race with the fulfillment system (boundary); it may or may not be successful.
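A rough sketch of the two append modes mentioned above, using a toy in-memory store. `InMemoryEventStore` and `ConcurrencyError` are hypothetical names for this example, not the API of any real event store, and the class is single-threaded for simplicity.

```python
class ConcurrencyError(Exception):
    """Raised when a stream is not at the version the writer expected."""


class InMemoryEventStore:
    """Toy store: one list of events per stream; version = number of events."""

    def __init__(self):
        self._streams = {}

    def append(self, stream_id, event, expected_version=None):
        stream = self._streams.setdefault(stream_id, [])
        # With an expected version, the write is rejected if anyone else
        # appended since the caller last read the stream.
        if expected_version is not None and expected_version != len(stream):
            raise ConcurrencyError(
                f"{stream_id}: expected version {expected_version}, found {len(stream)}"
            )
        stream.append(event)
        return len(stream)  # new stream version

    def read(self, stream_id):
        return list(self._streams.get(stream_id, []))


store = InMemoryEventStore()

# Basket: append anyway, no expected version.
store.append("basket-42", "ItemAdded")
store.append("basket-42", "ItemAdded")

# Checkout: two writers both read version 0; only the first append succeeds.
store.append("checkout-42", "PaymentAuthorized", expected_version=0)
try:
    store.append("checkout-42", "PaymentAuthorized", expected_version=0)
except ConcurrencyError as err:
    print("rejected:", err)
```

The losing writer would typically re-read the stream, re-run its decision against the new state, and retry or give up. Each stream is checked on its own, which is what makes it the consistency boundary.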

jnbiche 9 years ago | |

Read about how LMAX achieved 6 million transactions per second using a ring buffer-based concurrency architecture called disruptor, all on a single thread and without locks. Event sourcing plays a big role in their architecture [0].

0. http://martinfowler.com/articles/lmax.html

biot 9 years ago | |

Combining this with the actor model (using Akka or similar) gives you guaranteed "one message at a time" processing and you don't have to deal with locks.

taeric 9 years ago | | |

I question any amount of guarantees around "one message" anything. There might be this guarantee per actor, but you have no such guarantee per system. And, assuming a real system, this will be a problem.

So, you get to pick, "at most once" or "at least once." And then you need to build your system to act accordingly.
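One common way to "act accordingly" under at-least-once delivery is to make the handler idempotent by remembering which message IDs it has already applied. The sketch below assumes every message carries a unique `id`; `IdempotentConsumer` and the message shape are made up for illustration.

```python
class IdempotentConsumer:
    """Applies each message at most once, even if it is delivered repeatedly."""

    def __init__(self):
        self.seen_ids = set()
        self.balance = 0

    def handle(self, message):
        # A redelivered message is recognised by its ID and skipped.
        if message["id"] in self.seen_ids:
            return False
        self.balance += message["amount"]
        self.seen_ids.add(message["id"])
        return True


# At-least-once delivery: message m1 arrives twice.
deliveries = [
    {"id": "m1", "amount": 10},
    {"id": "m2", "amount": 5},
    {"id": "m1", "amount": 10},  # redelivery
]

consumer = IdempotentConsumer()
results = [consumer.handle(m) for m in deliveries]
print(results, consumer.balance)  # [True, True, False] 15
```

In a real system the set of seen IDs and the state change would need to be persisted together atomically; otherwise a crash between the two brings the original problem back.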

jessaustin 9 years ago | |

ISTM event sourcing actually avoids many locking problems, since it's essentially "write-only". Of course every event write should be atomic, but that seems easier than making updates atomic?

kazagistar 9 years ago | | |

When a certain set of events occurs (the files arrive etc.) I want to kick off one and only one batch processor task. This is accomplished with a transaction and a write lock in an SQL database, but when trying to use event sourcing it ends up requiring a two-step "intent to run" event before running or some out-of-band synchronization.

tofflos 9 years ago |

Axon Framework http://www.axonframework.org is a great place to start if you're into Java and want to get a feeling for how event sourcing works.

There's also a great presentation by the developer, Allard Buijze, at https://www.youtube.com/watch?v=s2zH7BsqtAk.

grandalf 9 years ago |

There is a lot that could be done to make event sourcing easier to work with...

Imagine tooling that allowed an event stream to be used to create state for testing modules, crudlike helpers to allow crud-familiar developers to think that way at first, and workflows based on snapshots, rewind, etc.
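As a rough idea of what snapshot and rewind helpers might look like, here is a sketch under simplifying assumptions: events are integer deltas to a single counter, and `snapshots` maps a version number to the state after that many events. All names are hypothetical.

```python
def apply_event(state, event):
    # In this toy model an event is just an integer delta to a counter.
    return state + event


def state_at(events, version, snapshots=None):
    """Return the state after the first `version` events.

    Starts from the newest snapshot at or before `version`, if any, and
    replays only the events recorded after that snapshot.
    """
    snapshots = snapshots or {}
    usable = [v for v in snapshots if v <= version]
    start = max(usable, default=0)
    state = snapshots.get(start, 0)  # 0 is the initial state
    for event in events[start:version]:
        state = apply_event(state, event)
    return state


events = [5, -2, 7, 1, -4]
snapshots = {3: 10}  # state after the first three events: 5 - 2 + 7

print(state_at(events, 5, snapshots))  # 7, replaying only the last two events
print(state_at(events, 2, snapshots))  # 3, a rewind to before the snapshot
print(state_at(events, 4))             # 11, full replay without snapshots
```

The same function could seed test fixtures: pick a version, call `state_at`, and hand the resulting state to the module under test.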

I think a model that used events that correlated to graph deltas rather than crud deltas would be the cat's ass, and many queries about the near-current state could be handled efficiently using ephemeral subgraphs as indexes located at the network's edges.

If anyone wants to discuss and possibly build some of this stuff, let me know :)

impostervt 9 years ago |

I was looking into event sourcing for a system I built recently, and the tooling just doesn't seem to be that widespread yet. How do you read out of the entire event stream to figure out the current state? While there are tools, they seem to be .NET focused. There just didn't seem to be a "standard" answer yet.

We ended up going with microservices that pub/sub events into Kafka, but maintain their own databases. There's another microservice that lets you query past events for statistics.

mamcx 9 years ago |

For some months now I have been trying to build a small test case for an invoice app. I want a good sync strategy, and using ES sounds good. However, I have yet to find out how to replicate the functionality of a normal app with it: for example, how to avoid duplicates and, in general, how to do pre-save validation. Also, I still need RDBMS tables to hold the current data, and RDBMSs don't have a good story for streaming results back.
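One way pre-save validation is often handled is to derive just enough state from the existing events to check the incoming command, and only then append a new event. The sketch below assumes a single writer and an in-memory list as the log; `create_invoice`, `invoice_numbers`, and `DuplicateInvoice` are hypothetical names.

```python
class DuplicateInvoice(Exception):
    """Raised when an invoice number has already been used."""


def invoice_numbers(events):
    # Derive only the state needed for validation: the set of used numbers.
    return {e["number"] for e in events if e["type"] == "InvoiceCreated"}


def create_invoice(events, number, total):
    """Validate the command against current state, then return the new log."""
    if total <= 0:
        raise ValueError("total must be positive")
    if number in invoice_numbers(events):
        raise DuplicateInvoice(number)
    new_event = {"type": "InvoiceCreated", "number": number, "total": total}
    return events + [new_event]


log = create_invoice([], "INV-1", 100)
try:
    log = create_invoice(log, "INV-1", 250)
except DuplicateInvoice as err:
    print("rejected duplicate:", err)

print(len(log))  # 1
```

With more than one writer, the check and the append would have to happen atomically, for example by appending with an expected stream version, or the duplicate check could race.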

zarkov99 9 years ago |

I have been working with these sorts of patterns for a while but I have yet to find good texts exploring the topic. Does anyone have book or paper recommendations for event sourcing? The stuff I have seen is mostly programmers reporting on something that worked in their particular domain. I am looking for something more rigorous and comprehensive.

karmajunkie 9 years ago | |

Lurk on the CQRS/DDD list [1], lots of good info there. I'm not aware of any textbooks on ES per se but there are a few good books on areas that overlap. [2] [3] [4]

[1] https://groups.google.com/forum/#!forum/dddcqrs

[2] https://www.amazon.com/Enterprise-Integration-Patterns-Desig...

[3] https://www.amazon.com/Implementing-Domain-Driven-Design-Vau...

[4] https://www.amazon.com/Domain-Driven-Design-Tackling-Complex...

freditup 9 years ago |

As an interesting comparison, some people see the Redux/Flux pattern as a front-end parallel to event sourcing.

[0]: https://github.com/reactjs/redux/issues/891#issuecomment-158...

avodonosov 9 years ago |

How strange, just today I heard the name Event Sourcing and thought I didn't know what it was. (Turns out it is an old idea I knew under various different names.) And on the same day I hear about Event Sourcing on HN. What's the buzz?

karmajunkie 9 years ago | |

It's been slowly building steam (under that name) for about ten years, first in .NET and now filtering out to other ecosystems. I think it's kind of inevitable given the recent popularity of functional programming models.

willvarfar 9 years ago |

Very curious: if you have multiple datastores, how do you ensure they are consistent? If you scale sideways, how do you ensure nothing gets lost if there's a partition? Etc?

PallarelCoedr 9 years ago | |

Embrace eventual consistency. A good deal of collaborative domains (things involving human decisions) are naturally eventually consistent. Meat computers appear to be particularly good at resolving conflicts and compensating.

GundersenM 9 years ago |

Having been part of a project to rewrite a monolith e-commerce site into an event-sourced, domain driven, CQRS system, let me tell you in which situation that is not possible: when you already have data. Remember that in a DDD, ES, CQRS system, the event store is the single source of truth. If you already have data in a relational database, then the existing data is the source of truth. You can't have two sources of truth, that completely defeats the purpose. So it's not actually possible to migrate to an event sourced system, you can only create one from scratch, with no existing data.