To work around this limit, programs have resorted to various unnecessarily complex solutions, like starting extra threads for the sole purpose of waiting, refactoring the logic, or replacing events with posted I/O completion packets.
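To put a number on the extra-threads workaround, here is a rough sketch. It assumes the limit in question is the 64-handle cap (`MAXIMUM_WAIT_OBJECTS`) of `WaitForMultipleObjects`. The helper name `waiter_threads_needed` and the `reserved` parameter are made up for illustration.

```c
#include <stddef.h>

/* WaitForMultipleObjects accepts at most MAXIMUM_WAIT_OBJECTS (64) handles
   per call; the value is repeated here so the sketch compiles anywhere. */
#define WAIT_LIMIT 64u

/* Hypothetical helper: how many dedicated waiter threads are needed to cover
   `events` handles, if every thread keeps `reserved` of its 64 slots for its
   own use (for example one wake-up event to signal that its handle set
   changed). Returns 0 when no slot would be left for real events. */
static size_t waiter_threads_needed(size_t events, size_t reserved)
{
    if (reserved >= WAIT_LIMIT)
        return 0;
    size_t per_thread = WAIT_LIMIT - reserved;
    return (events + per_thread - 1) / per_thread;   /* ceiling division */
}
```

For 2000 events this gives 32 threads, whether or not each thread reserves one slot for a wake-up event.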
In fact, if the application waits through the Vista+ Thread Pool, the pool itself used the first approach: it started as many threads as needed to wait for all the events. Since Windows 8, however, all thread pool waits can be handled by a single thread. This works through a new capability of associating an event with an I/O Completion Port, to which a completion packet is enqueued when the event becomes signalled. But this capability was not exposed through the Win32 API to regular programmers.
It was exposed, though, by a barely documented NT API, NtAssociateWaitCompletionPacket, which, it seems, nobody uses except a few rare high-performance libraries, the Rust runtime, and, um, security researchers.
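For the curious, this is roughly what calling the NT API directly can look like. Treat it as a sketch, not as the code from my repository. The function prototypes are not in the official SDK headers and are written here as they are commonly reverse-documented, so they are an assumption. The helpers `associate_event_with_iocp` and `demo_roundtrip` are made-up names. The `_WIN32` guard is only there so that pasting the file into a cross-platform build does no harm.

```c
#ifdef _WIN32
#include <windows.h>
#include <winternl.h>   /* NTSTATUS, POBJECT_ATTRIBUTES */

/* Assumed prototypes; resolved from ntdll.dll at run time. */
typedef NTSTATUS (NTAPI *NtCreateWaitCompletionPacket_t)(
    PHANDLE WaitCompletionPacketHandle, ACCESS_MASK DesiredAccess,
    POBJECT_ATTRIBUTES ObjectAttributes);
typedef NTSTATUS (NTAPI *NtAssociateWaitCompletionPacket_t)(
    HANDLE WaitCompletionPacketHandle, HANDLE IoCompletionHandle,
    HANDLE TargetObjectHandle, PVOID KeyContext, PVOID ApcContext,
    NTSTATUS IoStatus, ULONG_PTR IoStatusInformation,
    PBOOLEAN AlreadySignaled);

/* Hypothetical helper: asks the kernel to post a completion with `key`
   to `iocp` once `event` becomes signalled. Returns the wait packet
   handle (release it with CloseHandle) or NULL on failure. */
static HANDLE associate_event_with_iocp(HANDLE iocp, HANDLE event, ULONG_PTR key)
{
    HMODULE ntdll = GetModuleHandleW(L"ntdll.dll");
    if (!ntdll) return NULL;

    NtCreateWaitCompletionPacket_t create = (NtCreateWaitCompletionPacket_t)
        GetProcAddress(ntdll, "NtCreateWaitCompletionPacket");
    NtAssociateWaitCompletionPacket_t associate = (NtAssociateWaitCompletionPacket_t)
        GetProcAddress(ntdll, "NtAssociateWaitCompletionPacket");
    if (!create || !associate) return NULL;   /* older than Windows 8 */

    HANDLE packet = NULL;
    if (create(&packet, GENERIC_ALL, NULL) < 0) return NULL;

    /* KeyContext comes back as the completion key, ApcContext as the
       OVERLAPPED pointer, IoStatusInformation as the byte count. */
    if (associate(packet, iocp, event, (PVOID)key, NULL, 0, 0, NULL) < 0) {
        CloseHandle(packet);
        return NULL;
    }
    return packet;
}

/* Signals one event and checks that its key arrives through the IOCP.
   Returns 1 on success, 0 otherwise. */
static int demo_roundtrip(void)
{
    int ok = 0;
    HANDLE iocp = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 1);
    HANDLE event = CreateEventW(NULL, FALSE, FALSE, NULL);
    HANDLE packet = (iocp && event)
                  ? associate_event_with_iocp(iocp, event, 42) : NULL;
    if (packet) {
        DWORD bytes = 0;
        ULONG_PTR key = 0;
        LPOVERLAPPED overlapped = NULL;
        SetEvent(event);
        if (GetQueuedCompletionStatus(iocp, &bytes, &key, &overlapped, 1000))
            ok = (key == 42);
        CloseHandle(packet);
    }
    if (event) CloseHandle(event);
    if (iocp) CloseHandle(iocp);
    return ok;
}
#endif /* _WIN32 */
```

As far as I can tell, the association is one-shot: once the packet has been delivered, the event has to be associated again to receive the next notification.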
So I took the liberty of abstracting away the details and implementing what a simple Win32 call could look like. In the following example I wait for 2000 events on a single thread, through a single IOCP.
> https://github.com/tringi/win32-iocp-events
Of course, for larger systems, the Thread Pool API is the right way to go. But if your program already uses IOCPs, is single-threaded and you don't have the resources to deal with locking and concurrency, or you are just thread-pooling your own way, this may be the ideal solution to reduce thread count, complexity and resource requirements.