Ask HN: Stream processing engine that joins to historical data in Snowflake?

Hey, I'm looking for advice on stream processing engines that allow me to do the following:

1. Required: Write a query that joins an event stream with a historical table in Snowflake
2. Required: Executes in near-real time (< 5s) even if a query involves 300M rows
3. Highly desired: Gives me a way of doing dbt-like DAGs, where I can execute a DAG of actions (including external API calls) based on results of the query
4. Highly desired: Allows me to write queries in standard SQL
5. Desired: True real time (big queries executing w/ subsecond latency)

What are the best options out there? It seems like Apache Flink enables this, but there also seem to be a number of other projects out there that may enable some or all of what I'm describing, including:

- kSQL
- Arroyo
- Proton
- Kafka Streams
- Snowflake's Snowpipe Streaming
- Benthos
- RisingWave
- Spark Streaming
- Apache Beam
- Timely Dataflow and derivatives (Materialize, Bytewax, etc.)

Any recommendations on the best tool for the job? Are there interesting alternatives that I haven't named?