How Akka Cluster Works: Actors Living in a Cluster(lightbend.com) |
How Akka Cluster Works: Actors Living in a Cluster(lightbend.com) |
Before I went this route, I tried to make Akka Cluster work with RabbitMQ however I realized (like another poster here) that you're essentially duplicating concerns since Akka itself is a message queue. There's also a ton of logistics with Rabbit around binding queues, architecting your route patterns, etc that add extra cognitive overhead.
I'm creating a highly distributed chat application where each user has their own persistent actor and each chatroom has their own persistent actor. At this point, it doesn't matter where the user or chatroom are in the cluster it literally "just works".
All I need to do is emit a message to the cluster from a user to chatroom or vice versa, even in a cluster of hundreds of nodes, and things just work. Now there's some extra care you need to take at the edge (split-brain via multi-az, multi-datacenter) but those are things you worry about at scale.
Akka is the real fucking deal and it's one of the most pleasurable application frameworks I've ever had the pleasure of using in my career.
edit: The only reason I'd ever want to use Rabbit again is if I had external clients that needed to hook up to our message bus. If you're creating an entirely internal system, Akka Cluster is absolutely the way to go.
I worked for a company that made a real-time auction system, based on Akka. It's been frustrating in the years since to program on less powerful foundations.
If I were building a system that combined interactive and autonomous processes, I would absolutely reach for Akka again. The one thing I'd love to see is if they could build something like https://temporal.io on top of Akka. I think it would be complementary to the state machine style model of typed actors and the pipeline model of Akka Streams.
Akka embraces the "let it fail" mentality where as nodes go down (just as pods go down in Kuberentes), you don't have to worry about where your processes are running...they just are...somewhere.
- if you roll your cluster membership a lot the dotted version vectors which are created by Akka distributed data grow unbounded. Eventually they will start making gossip messages exceed the default maximum size (a few kB IIRC) and fail to send.
- in the presence of heavy GC Akka cluster has a really bad time. Members will flip flop in marking each other unavailable. Eventually this will render the leader unable to perform its duties and you will struggle to (for example) allow a previously downed member to rejoin the cluster.
- orderly actor system shutdown will also fail under high GC, which is problematic as sometimes you need to restart your actor system.
- split-brain resolution is really really hard to get right. The Akka team have recently made theirs open source I believe which is good, but back when we were building with Akka cluster it required a Lightbend subscription.
- If you aren’t all in on Actors, the integration point between Akka and the rest of your codebase can be a little odd. You often feel like you should reach for `Patterns.ask` (a way of sending a message to an actor and then getting a Future back which will complete on a particular response) but then people tell you that’s an Anti pattern.
————
Having said all the above, if you’re able to go all in on the Actor pattern and you’re unlikely to hit high GC then you should give Akka cluster a try. The problems it tackles are genuinely hard and you should build on their hard work if you can. In particular they offer (in distributed-data) the most robust/complete set of CRDTs I’ve yet come across. Many other CRDT libraries expect you to bring your own gossip protocol and transport layer.
What made you choose Akka
If you don't need behavior, you can use things like distributed databases or caches, and if you don't need to scale out, there are other pure fp solutions. But for this kind of distributed behavior, it still seems to me that Akka Cluster is the killer app.
The real problem with Akka is that, at least until very recently, you had to abandon any semblance of type safety if you used it. That was very frustrating to work with. I can take or leave pure FP, but you can pry my strong static typing from my cold dead hands.
There's of course the legit downside of needing to use one language and one framework for all actors, which is a problem Kubernetes and a message queues don't have.
Kubernetes deals with the OS/node level failures, the actor system deals with the application level failures.
It’s actually amazing how complementary they are.
That said, most work places I've been at, leadership has -wanted- to use one language. Even with containers and other decoupling technologies. So I don't know how much of a negative effect that downside has.
That's not inherent in the actor model, it's a potential artifact of some implementations. Though I think one VM is more common. E.g., BEAM instead of just Erlang, or JVM for Akka.
In it, he claims that Godel's Incompleteness Theorem is not true, and that the actor model is more general than the Turing machine. I am open to entertaining the idea.
I've seen that his ideas have been discredited elsewhere on HN. I would be interested to know people's opinions on this, as a lot of the paper went over my head.
He is certainly wrong about the incompleteness theorems. And it’s entirely possible to create a model of computation that is more general than Turings. The question is whether it better represents what’s computable, the abstract mentions computations involving an “infinite number of computations” between steps...
What more general systems of computation are there than Turing? Are any in use? Are they really more general?
The claims of the generality of Actors seem to rely on continuous time and non-determinism. Actors, determinism, non-determinism, concurrency and the completeness axiom are models which we can use to express computation and our surroundings, and nothing more.
One man says lambda, another says actor. Given our models of physics; given Planck and Heisenberg, are they really different? If so, how? Measure theory rests on the completeness axiom, but it is just a very useful axiom.
Am I missing something?
One of the problems I encountered was that to finding actors in remote actor systems. i.e.: you have a unique actor responsible for X, living somewhere in your cluster and you need to know its name as well as the IP address of the actor system where it is running.
A message queue solves this problem, but that was not the approach I took.
My solution was to implement an actor discovery system on top of Zookeeper. Using that, I could have a cluster-wide unique actors.
Maybe real Akka is better than Akka.NET when it comes to Akka Cluster?
Say you want to have multiple actors (one per user / customer or whatever) and you get HTTP requests and want that exactly this actor handles them (to guarantee consistency), then you can't really do this with RabbitMQ.
I mean, you can make the machine that receives the request push it to the queue and keep the http connection alive, have the machine that is responsible for the user read it from a queue and then somehow tell the first machine how to respond the http request... but then you pretty much re-implemented Akka Cluster in a worse way.
Persistent queues and Akka Cluster solve different usecases.
Nope. The two companies I have since worked have mature products that aren't built on the JVM or Scala. But I'd absolutely consider it if I were to build a system with similar constraints again.
There was a big learning curve, but the end result was a system that was built largely correctly, on a tight timeline, solving some tricky technical and business challenges.
In fact, unhandled exceptions are encouraged (i.e.: "let it crash" approach to fault tolerance).
So, in the system I worked with, individual applications were Akka powered and cross-process/cross-vm communication was done through queues
The actor just won't exist elsewhere, but another microservice that happens to use actors might exist. It would be sent a web request or a message in queue or similar.
In my scenario, multiple machines are used and necessary for the same service (otherwise there is no point in using Akka Cluster).
Example: online games. You have room/game being created on the fly and it is destroyed an hour later. There are many of these rooms and they are distributed over multiple machines. You can't really use rabbitMQ here, it's not performant enough.
And even if you did, you would pretty much reimplement what Akka Cluster does for you: instance synchronization, different strategies for handling split brain scenarios, dead letter handling, direct message forwarding, persistence (if needed) and so on
But I wonder if you will suffer from random GC pauses, inability to carefully isolate different behaviors into different resource clusters, resulting in uncontrolled blast radius etc.
If you are anyway doing persistence (because you care to not lose game progress), and whenever a cluster node dies you need to resurrect game state from persistence, I wonder if you will get the game state restored within a bounded latency.
If this happens frequently enough (to affect say 5% of your users – enough to kill your game experience), is the benefits of latency gain from in-memory object reads wiped out?
> If this happens frequently enough (to affect say 5% of your users – enough to kill your game experience), is the benefits of latency gain from in-memory object reads wiped out?
For this specific use-case, I don't think there is really an alternative, except for specifically a hand-crafted system (or non-scale, such as everyone hosts and manages their own server).
If you're trying to manage the concurrent state of 10s of thousands of players in a game, all server-side, and you want specific actors to handle that single player and you don't want a globally persistent state, then I suppose this makes sense.
I've never worked with anything of that scale though.
I am talking about bigger scala here of course. But not necessarily what you describe.
Take games like League of Legends or Counterstrike as examples. Having one game per actor seems like a sensible design to me.
But yeah, I think that traditional techniques still get you very far. I heard that Slack was running just on (multiple) postgres for the longest time.
I think the GA version of Project Orléans is now called Service Fabric, although I never had the pleasure to try it.
Service Fabric has something called Reliable Actors which are heavily inspired by Orleans.
Source: I'm the project lead for Orleans
JVM is an awesome piece of technology. And you can do robotic control systems to high-frequency trading systems with it with careful programming.
But I've seen a lot of Java code running in production suffering from latency jitters and needing continuous profiling and optimization by a small group of performance engineers while the majority of application engineers keep adding to GC load.
> For this specific use-case, I don't think there is really an alternative, except for specifically a hand-crafted system
Yes, but I think the handcrafted system doesn't need to be very complex. It can be quite simple and easy to understand and tame to your needs as your scale and complexity grows.
At the same time, I think (while not being an expert) that in the majority of the cases, the GC will not be a problem and that the time you can save from using Akka Cluster allows you to optimize your system more than enough to make up for any GC latency problems, in almost every system.
The only technology that might be better/comparable here is the erlang VM, but I have never used it myself.