Reaching 200K events/sec (aphyr.com)
(streams
  (rate 5 (comp prn float :metric)))
Which means: for each five-second interval, sum all the metrics of the events flowing through this stream, divide by the time elapsed, and print that rate to the console.
I'm using this setup to put the heaviest load possible on Riemann's client and TCP server while I optimize those layers--it's not meant to stress the internal stream library, the state index, or pubsub. When I start optimizing those components, I'll have more "real-world" numbers to report.
I should also explain that this particular post explores the high-throughput, high-latency range of the client spectrum. End-to-end TCP latencies (not counting wire time) for single-event messages are roughly 100 microseconds to 1 ms, with occasional spikes to ~30 ms depending on JVM GC behavior.
The principal bottleneck in this particular test is actually the client--Riemann itself spends ~94% of its time waiting on epoll in this test. The client is a total hack using Java OIO, calling flush() on every message, with Nagle's algorithm disabled for low per-msg latencies... it's a wreck, haha. Part of next week's optimization push is replacing it with a Netty client and tuning TCP options for various workloads.
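To make the arithmetic behind that rate concrete, here is a minimal sketch in plain Clojure. It is not Riemann's implementation of rate; window-rate is a hypothetical helper that handles a single window of events already collected in memory, and the sample data is made up.

```clojure
;; Hypothetical helper, NOT Riemann's `rate` stream: sums the :metric of
;; every event seen in one window and divides by the window length.
(defn window-rate
  "Returns the summed :metric of events per second over interval-seconds.
  Events without a :metric are skipped."
  [interval-seconds events]
  (/ (reduce + 0 (keep :metric events))
     (double interval-seconds)))

;; One million events with :metric 1 arriving within a five-second window
;; work out to 200,000 events/sec.
(window-rate 5 (repeat 1000000 {:metric 1}))
;; => 200000.0
```

Since every benchmark event carries :metric 1, the summed metric per second is the same thing as events per second.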
1.5M locally or over a network? What is the size of each message?
So in reality it is ~2k messages/sec. That is rather poor throughput: even off-the-shelf generic web servers (e.g. nginx) manage throughput an order of magnitude higher, and proprietary systems can reach 500k messages/sec over the network.
Riemann is not an HTTP server, or anything analogous. It's an event processor, and reacts to incoming events by running them through an arbitrary set of functions. Events are the logical "requests" against the system, if you're thinking in HTTP terms. Messages are just a bundle of events for synchronous transport, and events can be repackaged in varying bundles of messages depending on latency/throughput requirements. The clients can do this for you.
For instance, the code which generated this benchmark looks like:
(send client
      {:host "test"
       :service "drop tcp"
       :state "ok"
       :description "a benchmark"
       :metric 1
       :ttl 1
       :tags ["bench"]})
which is a synchronous call, returning when the event is acknowledged by the server. It's making that call 200,000 times a second (in various threads). The clients are doing all sorts of internal buffering and pipelining to make that possible--this particular test uses a batch size of 100 events/msg.
Take a look at http://riemann.io for more. :)
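The relationship between events and messages can be sketched in a few lines of Clojure. This is only an illustration of the batching arithmetic, not the client's actual buffering code; batch-events and messages-per-second are hypothetical names, and the batch size of 100 is the one quoted for this test.

```clojure
;; Batch size used in this particular benchmark, per the comment above.
(def batch-size 100)

(defn batch-events
  "Groups a sequence of events into messages of at most batch-size events.
  Hypothetical helper; the real client also pipelines and buffers."
  [events]
  (partition-all batch-size events))

(defn messages-per-second
  "Messages on the wire per second for a given event rate, assuming
  every message is a full batch."
  [events-per-second]
  (/ events-per-second batch-size))

;; 250 events become three messages: 100, 100, and 50 events.
(map count (batch-events (repeat 250 {:service "drop tcp" :metric 1})))
;; => (100 100 50)

;; 200,000 events/sec at 100 events/msg is 2,000 messages/sec.
(messages-per-second 200000)
;; => 2000
```

That is where the "~2k messages/sec" figure comes from: it counts transport bundles, while the 200K figure counts the events inside them.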
Ten layers (and probably buffers) traveled through until your data hits the wire. Layer x decides to change its IO model and your throughput takes a dive. It's exactly why there was a post recently about building an operating system just to run some network daemon.
(defn execution-handler
  "Creates a new netty execution handler."
  []
  (ExecutionHandler.
    (OrderedMemoryAwareThreadPoolExecutor.
      16       ; Core pool size
      1048576  ; 1MB per channel queued
      10485760 ; 10MB total queued
      )))
It is a farce, isn't it?))
After finally digging into how execution handlers worked, the puzzle started to unravel. Netty's docs are pretty good, but you have to understand what all the names mean before you can understand, well, any one part of the system. Bit tough to piece together, at least for my little brain. ;-)
I'm also not sure that JVM itself is a good idea, especially for serving content, but I do respect people who are trying nevertheless.)
The test was run with "SET key value" which in the redis protocol is something like 15-20 bytes / message.
(defn where-partition-clauses
  "Given expressions like (a (else b) c (else d)), returns [[a c] [b d]]"
  [exprs]
  (map vec
       ((juxt remove
              (comp (partial mapcat rest) filter))
        (fn [expr]
          (when (list? expr)
            (= 'else (first expr))))
        exprs)))
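For readers who don't read Clojure fluently, a short usage sketch may help. The function is repeated here only so the example runs on its own at a REPL; the input is the one named in its docstring.

```clojure
;; Copied from the snippet above so this block is self-contained.
(defn where-partition-clauses
  "Given expressions like (a (else b) c (else d)), returns [[a c] [b d]]"
  [exprs]
  (map vec
       ((juxt remove                               ; everything that is NOT an (else ...) form
              (comp (partial mapcat rest) filter)) ; bodies of the (else ...) forms
        (fn [expr]
          (when (list? expr)
            (= 'else (first expr))))
        exprs)))

;; The predicate is applied twice via juxt: `remove` keeps the plain
;; clauses, while `filter` followed by `mapcat rest` unwraps the else bodies.
(where-partition-clauses '(a (else b) c (else d)))
;; => ([a c] [b d])
```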
I wouldn't categorize Clojure as a "pure lisp"--it relies heavily on the JVM type system, for starters--but idiomatic Clojure feels closer to Lisp than Java, to me.
Second... I guess I can reiterate. Riemann is not an HTTP server. It's an event-stream driven monitoring system. The protocol uses existing standards (e.g. protobufs), is simple to implement, and the community has written clients for many languages: http://riemann.io/clients.html.
As an aside, I do plan on adding an HTTP interface to Riemann, but HTTP processing (and using JSON for serialization) comes with certain unavoidable costs in bandwidth, memory and latency. It'll fill a complementary space to the existing TCP and UDP interfaces.
I completely agree that for specific uses Riemann is great. Your post was, though, about the performance/throughput, and so my comment was about the performance/throughput. Streaming messages/events over the network is an old problem, with well-known limitations, and this was what my comment was about.
And if a system designer wants to send a stream of "events" to another system to be acted upon, and if this designer cares about throughput (which is assumed here, given the title of the post), then this designer is likely to choose a faster messaging system, especially one built on a protocol (e.g. HTTP) that is more flexible than a custom protocol thanks to its ubiquity and universal support.
If you are trying to make sense of Riemann in HTTP terms, sending an event to Riemann might look like POST /streams, with a body containing a single JSON object. There's no notion of GET, PUT, or DELETE though--the state inside Riemann streams has no name or external representation.
There are other components in Riemann which can be expressed as HTTP resources--the index, which is used for tracking the most recent event for a given host and service, and the pubsub system for example. Those have HTTP APIs for making a query (GET /index?q=service = "www" and state = "critical"), and a websocket variant which streams down updates for that query to you.
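As a rough sketch of what issuing that index query involves on the client side, the query text has to be percent-encoded before it goes into the URL. Only the /index path and the q parameter come from the comment above; index-query-url is a hypothetical helper, and the host and port are placeholders to be replaced with wherever your ws-server actually listens.

```clojure
(require '[clojure.string :as str])
(import 'java.net.URLEncoder)

(defn index-query-url
  "Builds a GET URL for an index query. base-url is an assumed
  placeholder, not a documented default."
  [base-url query]
  (let [form-encoded (URLEncoder/encode query "UTF-8")]
    ;; URLEncoder emits '+' for spaces (form encoding); swap in %20,
    ;; which is valid in any URL query string. A literal '+' in the
    ;; query is already encoded as %2B, so this replacement is safe.
    (str base-url "/index?q=" (str/replace form-encoded "+" "%20"))))

(index-query-url "http://riemann.example:5556"
                 "service = \"www\" and state = \"critical\"")
;; => "http://riemann.example:5556/index?q=service%20%3D%20%22www%22%20and%20state%20%3D%20%22critical%22"
```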
But as far as a general replacement, I'd say no, it doesn't make any sense. This is more akin to... a slow, insanely flexible, less complete version of Esper than an HTTP server.
Coincidentally, I was just working on a tracing system to dump data to Riemann while eating HTTP logs and/or handling live requests from browsers. It seems to be just what we need to aggregate, monitor and graph our trace data. Thanks!
BTW, you're not the first to wonder about streaming events directly to Riemann from client browsers. I... don't recommend it, just because I don't have the time to appropriately guarantee Riemann's performance and security characteristics as an internet-facing service (yet), but adding an HTTP POST path to (ws-server) is definitely on my list. Even if the HTTP+JSON interface is much slower than the TCP/UDP interfaces, I think it'll be plenty useful for many deployments, especially those making requests from JS.