One approach I can think of is to run a distributed document database and use that for storage. I don't need most features of such database products in the foreseeable future, so I fear that they will add operational overhead for not much benefit.
Another approach I can think of is to run my processing nodes against a network file system, and rely on that to do the replication.
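With this approach the application-side code could stay quite small. A minimal sketch, assuming the shared file system is mounted at an ordinary path on every node and that rename is atomic on it (true for local POSIX file systems and generally for NFS, but worth verifying for the specific product); the helper name `atomic_write` is hypothetical:

```python
import os
import tempfile


def atomic_write(path, data):
    """Write bytes to `path` so that readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the final rename
    # stays on the same file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes out before renaming
        os.replace(tmp_path, path)  # swap the finished file into place
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Writing to a temporary name first means a node that reads the directory mid-write never picks up a half-written file; replication itself is still left entirely to the file system.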
Yet another approach I am considering is to use something like CopyCat to implement the file replication in my application code. Is this a good use case for CopyCat?
For your requirement, can you use S3 or something similar?
I am aware of Ceph but have not tried installing it to see how easy or hard it is to set up. Also, although this is not a hard requirement, I'd like to be able to support Windows; from what I have read so far, Ceph does not support Windows.
This API is more S3-like, operating directly on "objects" in the Ceph cluster. I wrote a system for storing many small (10-20kB) files using librados & Ceph, but the performance wasn't as good as I had hoped. Possibly I did not configure the Ceph cluster in the optimal manner - the cluster setup is quite complex.
I had a very similar problem and ended up writing a Python script that copies the files to Cassandra, with Nginx + Lua serving them via lua-resty-cassandra. The read/write throughput is still not as good as I was expecting, though.
Having seen in a sibling comment that you're talking about very small files, I'd recommend ZooKeeper - it's mature and has pretty low admin overhead, IME.
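A rough sketch of what storing small files as znodes could look like. It assumes a client object with the same `exists`/`create`/`set`/`get` methods as the kazoo Python library's `KazooClient`; the `/files` root and the helper names are made up for illustration. ZooKeeper caps znode payloads at just under 1 MB by default (`jute.maxbuffer`), so the sketch rejects anything close to that.

```python
# Stay a little below ZooKeeper's default ~1 MB znode limit (assumed margin).
MAX_ZNODE_BYTES = 1024 * 1024 - 1024


def put_file(zk, name, data, root="/files"):
    """Store `data` (bytes) at root/name, creating or overwriting the znode."""
    if len(data) > MAX_ZNODE_BYTES:
        raise ValueError(f"{name}: {len(data)} bytes is too large for a znode")
    path = f"{root}/{name}"
    # Note: exists-then-create is racy if several writers target the same
    # name; a real implementation would handle the "node exists" error.
    if zk.exists(path):
        zk.set(path, data)
    else:
        zk.create(path, data, makepath=True)
    return path


def get_file(zk, name, root="/files"):
    """Return the bytes stored at root/name."""
    data, _stat = zk.get(f"{root}/{name}")
    return data
```

With kazoo, `zk` would be a `KazooClient(hosts="127.0.0.1:2181")` on which `start()` has been called; the host string here is a placeholder. ZooKeeper keeps all data in memory on every server, so this only makes sense while the total volume stays modest.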