The wire protocol seems more complicated -- why distinct protocols for producers and consumers? Linebreak-delimited headers are error-prone and ought to be banished. Why transmit the hostname with sub requests -- is another system requesting jobs on behalf of workers?
Then the topic/channel distinction seems artificial when wildcards would suffice (and provide much more flexibility), e.g., topic/+/channels or topic/channels/# in MQTT parlance. MQTT also has more fine-grained delivery guarantees via its QoS levels, all with a fixed header that can be as small as two bytes.
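For reference, MQTT writes its single-level wildcard as `+` and its multi-level wildcard as `#`. The following Go sketch shows the matching rule under those semantics. `matchTopic` is a hypothetical helper, and it skips MQTT details such as `$`-prefixed system topics and filter validation.

```go
package main

import (
	"fmt"
	"strings"
)

// matchTopic reports whether an MQTT-style topic filter matches a topic name.
// '+' matches exactly one level; '#' (last level only) matches the parent level
// and any number of child levels. $-prefixed topics and filter validation are ignored.
func matchTopic(filter, topic string) bool {
	f := strings.Split(filter, "/")
	t := strings.Split(topic, "/")
	for i, level := range f {
		if level == "#" {
			return i == len(f)-1 // '#' anywhere but last is an invalid filter
		}
		if i >= len(t) {
			return false // filter is longer than the topic
		}
		if level != "+" && level != t[i] {
			return false
		}
	}
	return len(f) == len(t) // no trailing topic levels left unmatched
}

func main() {
	for _, c := range [][2]string{
		{"topic/+/channels", "topic/clicks/channels"},
		{"topic/channels/#", "topic/channels/a/b"},
		{"topic/+", "topic/a/b"},
	} {
		fmt.Println(c[0], c[1], matchTopic(c[0], c[1]))
	}
}
```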
edit: While MQTT is pubsub, as long as the system is under your control, you're free to change the semantics at the broker side from "broadcast messages to all consumers" to "rotate messages among consumers."
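The difference between the two broker-side semantics fits in a few lines of Go. `Consumer` and `Dispatcher` are hypothetical in-memory types. A real broker would also have to handle consumers joining and leaving, acknowledgements, and redelivery.

```go
package main

import "fmt"

// Consumer is a hypothetical stand-in for one connected subscriber.
type Consumer struct {
	Name  string
	Inbox []string
}

// Dispatcher shows the two broker-side delivery modes from the comment.
type Dispatcher struct {
	consumers []*Consumer
	next      int // index of the consumer that receives the next rotated message
}

// Broadcast gives every consumer its own copy (classic pub/sub).
func (d *Dispatcher) Broadcast(msg string) {
	for _, c := range d.consumers {
		c.Inbox = append(c.Inbox, msg)
	}
}

// Rotate gives each message to exactly one consumer, round-robin.
func (d *Dispatcher) Rotate(msg string) {
	if len(d.consumers) == 0 {
		return
	}
	i := d.next % len(d.consumers)
	d.consumers[i].Inbox = append(d.consumers[i].Inbox, msg)
	d.next = (i + 1) % len(d.consumers)
}

func main() {
	d := &Dispatcher{consumers: []*Consumer{{Name: "w1"}, {Name: "w2"}, {Name: "w3"}}}
	for _, m := range []string{"m1", "m2", "m3", "m4"} {
		d.Rotate(m)
	}
	for _, c := range d.consumers {
		fmt.Println(c.Name, c.Inbox)
	}
}
```

With three consumers and four rotated messages, w1 gets m1 and m4, while w2 and w3 get one message each. `Broadcast` would have given every consumer all four.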
The protocols that exist in NSQ now are designed to be the simplest implementation that worked.
You've correctly pointed out some of the issues. At this stage, the separate producer protocol exists mostly so that you can publish at all without going through the HTTP interface. For our use cases, in particular to take advantage of the /mput endpoint, we aren't even using the TCP-based publishing protocol.
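For readers who haven't used it, the following Go sketch builds a multi-publish call. It assumes the early nsqd `/mput` convention: a `topic` query parameter plus newline-separated message bodies. It also assumes `127.0.0.1:4151` as the nsqd HTTP address. Check the nsqd docs for the version you run. `newMputRequest` is a hypothetical helper that only builds the request. The body is itself newline-delimited, so the sketch rejects messages containing `\n`.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/url"
)

// newMputRequest builds (but does not send) a multi-publish request.
// Assumption: the endpoint takes the topic as a query parameter and the
// messages as newline-separated raw bytes in the POST body, so a message
// that itself contains '\n' cannot be sent this way.
func newMputRequest(baseURL, topic string, msgs [][]byte) (*http.Request, error) {
	for _, m := range msgs {
		if bytes.IndexByte(m, '\n') >= 0 {
			return nil, fmt.Errorf("message contains a newline: %q", m)
		}
	}
	body := bytes.Join(msgs, []byte("\n"))
	u := baseURL + "/mput?" + url.Values{"topic": {topic}}.Encode()
	return http.NewRequest(http.MethodPost, u, bytes.NewReader(body))
}

func main() {
	// 127.0.0.1:4151 is an assumed nsqd HTTP address.
	req, err := newMputRequest("http://127.0.0.1:4151", "clicks",
		[][]byte{[]byte(`{"id":1}`), []byte(`{"id":2}`)})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL, req.ContentLength)
	// Sending it would be: resp, err := http.DefaultClient.Do(req)
}
```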
Re: your point on sending metadata with SUB commands, I agree it's a bit ugly. We intend to improve that by instead sending the data in an IDENTIFY-type command upon initial connection. That information is used in the various administrative UIs and endpoints.
I'm going to do a bit more reading on MQTT, thanks for the link.
DISCLAIMER: I wrote the RabbitMQ in Action book.
It would be cool to see some stats on throughput and latency relative to # of producers / consumers and amount of data currently accumulated in the producers (since there is no middleman).
I don't understand how you can call it a message delivery "guarantee" when you're susceptible to losing messages when a node dies.
One solution is to stand up redundant nsqd pairs (on separate hosts) that receive copies of the same portion of messages.
OK, so delivery is only guaranteed if I run multiple independent sets of nsqd and write messages to both.
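The redundant-pair approach can be sketched in Go as a publisher that writes each message to every node and fails unless enough copies were accepted. `Publisher`, `memNode`, and the `minCopies` policy are hypothetical. When both copies are delivered, consumers see duplicates, so they still need de-duplication or idempotent handling.

```go
package main

import "fmt"

// Publisher is a hypothetical interface over one nsqd's publish path.
type Publisher interface {
	Publish(topic string, body []byte) error
}

// memNode is an in-memory stand-in used only for this example.
type memNode struct {
	name string
	down bool
	msgs [][]byte
}

func (n *memNode) Publish(topic string, body []byte) error {
	if n.down {
		return fmt.Errorf("%s unavailable", n.name)
	}
	n.msgs = append(n.msgs, body)
	return nil
}

// publishRedundant writes the same message to every node and returns how many
// copies were accepted, or an error if fewer than minCopies succeeded.
func publishRedundant(nodes []Publisher, topic string, body []byte, minCopies int) (int, error) {
	var errs []error
	accepted := 0
	for _, n := range nodes {
		if err := n.Publish(topic, body); err != nil {
			errs = append(errs, err)
			continue
		}
		accepted++
	}
	if accepted < minCopies {
		return accepted, fmt.Errorf("only %d of %d required copies written: %v", accepted, minCopies, errs)
	}
	return accepted, nil
}

func main() {
	a := &memNode{name: "nsqd-a"}
	b := &memNode{name: "nsqd-b", down: true}
	n, err := publishRedundant([]Publisher{a, b}, "clicks", []byte("hello"), 2)
	fmt.Println(n, err)
}
```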
Regardless, it looks like an interesting project.
I'd love to see some numbers, reasons for decisions made, and suggested best practices for this solution other than "manual de-dupe."
As for manual de-duping, we strive internally for idempotent message processing, so it's mostly irrelevant. To handle the cases where it does matter, all of our messages have unique IDs added to them outside of NSQ.
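Here is a minimal Go sketch of ID-based de-duplication on the consumer side. `Message` and `Deduper` are hypothetical. The unbounded in-memory map is a simplification. A real deployment would need expiry or shared storage, plus locking if `Handle` is called from several goroutines.

```go
package main

import "fmt"

// Message carries an application-assigned unique ID, added outside the queue.
type Message struct {
	ID   string
	Body string
}

// Deduper remembers IDs it has already processed (in memory, not goroutine-safe).
type Deduper struct {
	seen map[string]struct{}
}

func NewDeduper() *Deduper { return &Deduper{seen: make(map[string]struct{})} }

// Handle runs process only the first time a given ID is seen and reports
// whether it did so. The ID is recorded after process returns.
func (d *Deduper) Handle(m Message, process func(Message)) bool {
	if _, dup := d.seen[m.ID]; dup {
		return false
	}
	process(m)
	d.seen[m.ID] = struct{}{}
	return true
}

func main() {
	d := NewDeduper()
	show := func(m Message) { fmt.Println("processing", m.ID, m.Body) }
	for _, m := range []Message{{"1", "a"}, {"1", "a"}, {"2", "b"}} {
		if !d.Handle(m, show) {
			fmt.Println("skipping duplicate", m.ID)
		}
	}
}
```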
In practice, messages are handled more than once only when a client disappears during message processing (a hard restart) or exceeds the allowed time window to respond, so the message is given to another client.
If there are any specific numbers you are curious about, please ask.
We do talk about the evolution of our infrastructure and the genesis of NSQ in our blog post, http://word.bitly.com/post/33232969144/nsq.
Thanks for sharing your work.
In the meantime, installing Go is pretty easy (assuming you are on OSX, you can use brew or the official package from http://golang.org/doc/install), and we've tried to leave clear steps for building NSQ here: https://github.com/bitly/nsq/blob/master/INSTALLING.md (there really are very few dependencies other than Go itself).
If you can produce binaries for UNIX (Linux, BSD, etc), OSX and Windows, I'll be impressed. That's something I was interested in doing with Go (is it possible to cross-compile?) but never managed to learn.
Some of the goals of NSQ transcended just replacing our specific daemon that buffered and delivered messages (most importantly the interactions with the lookup service). Because of that, we felt that owning that piece would make it easier to achieve those goals.
Additionally, one of the most important properties of nsqd (the queue component of NSQ) is that data is pushed to the client rather than polled (like in beanstalkd).
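The push model can be sketched in Go with a channel. The queue sends each message to the subscriber as soon as it is published, and the consumer ranges over what it receives instead of issuing repeated fetch requests. `pushQueue` is hypothetical, and using the channel buffer as flow control is a simplification. Over a network, the client would need an explicit way to limit how much is pushed at it.

```go
package main

import (
	"fmt"
	"sync"
)

// pushQueue delivers each published message straight to the subscriber's
// channel instead of waiting for the subscriber to ask (poll) for work.
// The buffer size stands in for flow control: when the consumer falls
// behind and the buffer is full, Publish blocks.
type pushQueue struct {
	out chan string
}

func newPushQueue(buffer int) *pushQueue {
	return &pushQueue{out: make(chan string, buffer)}
}

func (q *pushQueue) Publish(msg string) { q.out <- msg }
func (q *pushQueue) Close()             { close(q.out) }

// Messages is what the consumer ranges over; it never issues a "get" request.
func (q *pushQueue) Messages() <-chan string { return q.out }

func main() {
	q := newPushQueue(2)
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for m := range q.Messages() {
			fmt.Println("handled", m)
		}
	}()
	for _, m := range []string{"a", "b", "c"} {
		q.Publish(m)
	}
	q.Close()
	wg.Wait()
}
```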
Go compilers, being derived from Plan 9, are always cross-compiling. To cross-compile, you just set and export GOOS and GOARCH for your target, for example:
GOOS=windows GOARCH=386 CGO_ENABLED=0 go build foo.bar/baz
This would build foo.bar/baz for 32-bit Windows from any system that has Go and has run make.bash --no-clean with those variables set. More interesting is building for ARM:
GOOS=linux GOARCH=arm CGO_ENABLED=0 go build foo.bar/baz
The ZeroMQ documentation is fantastic as well. It inspired and helped shape some of the design choices we made.
The further we got from design to implementation, the more obvious it became that it would be important to "own" the socket. Generally speaking, this is exactly what ZeroMQ prevents you from doing (and rightly so, it aims to abstract all of that away).
The choice to use Go had an impact here as well. Language features like channels and the breadth of the standard library made it really easy to translate our NSQ design into working code, offsetting the benefit of ZeroMQ's abstractions.
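As an illustration of how naturally the topic/channel design maps onto Go channels, here is a sketch of a topic that copies every message to each of its channels. The `topic` type, channel names, and buffer size are illustrative assumptions, not NSQ's actual implementation.

```go
package main

import "fmt"

// topic fans every published message out to all of its channels; each Go
// channel would then be shared by that NSQ channel's consumers.
// The buffer size of 16 is illustrative: run blocks if a channel fills up.
type topic struct {
	incoming chan string
	channels map[string]chan string
}

func newTopic(names ...string) *topic {
	t := &topic{incoming: make(chan string), channels: make(map[string]chan string)}
	for _, n := range names {
		t.channels[n] = make(chan string, 16)
	}
	return t
}

// run copies each incoming message to every channel until incoming is closed,
// then closes the channels so their readers can finish.
func (t *topic) run() {
	for msg := range t.incoming {
		for _, ch := range t.channels {
			ch <- msg
		}
	}
	for _, ch := range t.channels {
		close(ch)
	}
}

func main() {
	t := newTopic("archive", "metrics")
	done := make(chan struct{})
	go func() { t.run(); close(done) }()
	t.incoming <- "click-1"
	t.incoming <- "click-2"
	close(t.incoming)
	<-done
	for name, ch := range t.channels {
		for msg := range ch {
			fmt.Println(name, msg)
		}
	}
}
```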
We wanted a brokerless, decentralized, and distributed system. NSQ has no broker or single point of failure (SPOF), so it is able to route around failures.
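Without a broker, routing around a failed node can be as simple as the client trying another nsqd. The Go sketch below assumes a hypothetical `node` type and a plain in-order retry. It does not describe how any particular NSQ client library chooses nodes.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical handle on one nsqd; publish fails while it is down.
type node struct {
	addr string
	up   bool
}

func (n *node) publish(msg string) error {
	if !n.up {
		return fmt.Errorf("%s: connection refused", n.addr)
	}
	return nil
}

// publishAny tries each node in turn and returns the address that accepted
// the message. With no central broker, losing one node only removes one
// candidate from the list.
func publishAny(nodes []*node, msg string) (string, error) {
	for _, n := range nodes {
		if err := n.publish(msg); err == nil {
			return n.addr, nil
		}
	}
	return "", errors.New("no nsqd accepted the message")
}

func main() {
	nodes := []*node{{addr: "10.0.0.1:4150", up: false}, {addr: "10.0.0.2:4150", up: true}}
	addr, err := publishAny(nodes, "hello")
	fmt.Println(addr, err)
}
```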
That said, I think RabbitMQ is a good tool depending on your requirements. I can imagine a broker proving useful in situations where you want strict ordering or no duplication. Those were tradeoffs we were willing to make for fault tolerance and availability.
Also, given the fact that we were already operating infrastructure built on simplequeue (which is also distributed and brokerless) we found it more appealing to evolve rather than replace.
There is a bit of an impedance mismatch since now you have to go out-of-band to fetch information. Not saying it's a bad thing; that's essentially also the role of DNS and LDAP, among others, so it is a fairly common pattern.
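The out-of-band lookup pattern can be sketched in Go as a registry. nsqd instances register with it, and consumers query it before connecting to those instances directly, much as a DNS lookup precedes connecting to a host. `registry` and its methods are hypothetical in-memory stand-ins for a lookup service.

```go
package main

import (
	"fmt"
	"sort"
)

// registry is a hypothetical in-memory lookup service: nsqd instances announce
// which topics they carry, and consumers ask where a topic lives.
type registry struct {
	producers map[string]map[string]bool // topic -> set of addresses
}

func newRegistry() *registry { return &registry{producers: make(map[string]map[string]bool)} }

func (r *registry) Register(topic, addr string) {
	if r.producers[topic] == nil {
		r.producers[topic] = make(map[string]bool)
	}
	r.producers[topic][addr] = true
}

func (r *registry) Unregister(topic, addr string) {
	delete(r.producers[topic], addr) // no-op if the topic or address is unknown
}

// Lookup returns the addresses carrying topic, sorted for stable output.
func (r *registry) Lookup(topic string) []string {
	addrs := make([]string, 0, len(r.producers[topic]))
	for a := range r.producers[topic] {
		addrs = append(addrs, a)
	}
	sort.Strings(addrs)
	return addrs
}

func main() {
	r := newRegistry()
	r.Register("clicks", "10.0.0.2:4150")
	r.Register("clicks", "10.0.0.1:4150")
	// A consumer would now connect to each returned address directly.
	fmt.Println(r.Lookup("clicks"))
}
```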
You bring up an important point though... What is important is the topology we're promoting and the platform we've built to support it. The actual queue is less important and it made sense for us to own that piece to achieve our goals.