This is the first explanation I've seen that directly links disk latency to data architecture decisions. I always felt it intuitively but never did the math.
If we take the extreme case and compare ClickHouse (the most thoroughly optimized column-oriented DBMS) with Postgres, the performance difference will be more than 100 times on average.
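Here is a back-of-envelope sketch of one reason a gap this large is plausible. It assumes uncompressed, fixed-width 8-byte columns and a query that aggregates one column out of 100. A row store has to pull every column of every row off disk, while a column store reads only the column the query references. The table sizes and function names are hypothetical. Real systems add compression, indexes, and CPU-side effects that this ignores.

```python
# A minimal sketch, assuming fixed-width 8-byte integer columns and no
# compression: it counts the bytes a scan must touch to compute SUM over
# one column in a row-oriented vs a column-oriented layout.

BYTES_PER_VALUE = 8  # hypothetical: every column is a 64-bit integer


def bytes_scanned_row_store(num_rows: int, num_columns: int) -> int:
    # A row store keeps whole rows together, so reading one column still
    # drags every other column of each row off disk.
    return num_rows * num_columns * BYTES_PER_VALUE


def bytes_scanned_column_store(num_rows: int, columns_read: int) -> int:
    # A column store keeps each column contiguous, so only the columns
    # the query actually references are read.
    return num_rows * columns_read * BYTES_PER_VALUE


if __name__ == "__main__":
    rows, cols = 10_000_000, 100  # hypothetical table shape
    row_bytes = bytes_scanned_row_store(rows, cols)
    col_bytes = bytes_scanned_column_store(rows, 1)
    print(f"row store:    {row_bytes / 1e9:.1f} GB")
    print(f"column store: {col_bytes / 1e9:.2f} GB")
    print(f"ratio:        {row_bytes // col_bytes}x")
```

With one column read out of 100, the I/O ratio is simply the column count. Compression typically widens the gap further, because a single column of similar values compresses better than mixed rows.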
I find it interesting how vectorized processing engines such as DuckDB and Databricks' Photon engine try to combine the strengths of row-oriented and column-oriented designs.
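One common way to describe that combination is this: keep the pipelined, operator-by-operator execution model of row engines, but pass batches of column values between operators instead of one row at a time. Below is a minimal sketch contrasting the two styles. The schema, function names, and `BATCH_SIZE` are hypothetical, and this is not how DuckDB or Photon are actually implemented. In pure Python the batched version is not faster. The point is the shape of the loop, which compiled engines turn into tight, SIMD-friendly code.

```python
# A minimal sketch of vectorized (batch-at-a-time) execution versus
# tuple-at-a-time execution. Names, schema and BATCH_SIZE are hypothetical.
from typing import Iterator, List, Tuple

Row = Tuple[int, int]  # (customer_id, amount): hypothetical schema

BATCH_SIZE = 1024  # hypothetical; chosen only for illustration


def scan_rows(rows: List[Row]) -> Iterator[Row]:
    # Tuple-at-a-time: one call per row flows through every operator.
    for row in rows:
        yield row


def sum_amounts_row_at_a_time(rows: List[Row], min_amount: int) -> int:
    total = 0
    for _customer_id, amount in scan_rows(rows):
        if amount >= min_amount:
            total += amount
    return total


def scan_batches(amounts: List[int]) -> Iterator[List[int]]:
    # Vectorized: the scan hands out a slice of one column at a time, so
    # per-call overhead is paid once per batch instead of once per row.
    for start in range(0, len(amounts), BATCH_SIZE):
        yield amounts[start:start + BATCH_SIZE]


def sum_amounts_vectorized(amounts: List[int], min_amount: int) -> int:
    total = 0
    for batch in scan_batches(amounts):
        # Filter and aggregate run as a tight loop over one column's values.
        total += sum(a for a in batch if a >= min_amount)
    return total


if __name__ == "__main__":
    rows = [(i % 100, i % 50) for i in range(10_000)]
    amounts = [amount for _, amount in rows]  # columnar copy of one field
    assert sum_amounts_row_at_a_time(rows, 25) == sum_amounts_vectorized(amounts, 25)
    print(sum_amounts_vectorized(amounts, 25))
```

Both functions compute the same answer. The difference is only in how data moves between operators, which is where the row-engine pipelining and the columnar memory layout meet.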
> When to use a data warehouse
>
> Data warehouses are good for OLAP (online analytical processing) workloads such as the following:
> A small number of users, each of which may execute heavy analytics workloads
Not necessarily. ClickHouse is used for user-facing analytics, where each query may take on the order of 10 ms while the system serves many concurrent queries.
> Downtime is permitted – generally not used as the one-and-only operational system
Not necessarily. ClickHouse is used in high-availability (HA) setups replicated across multiple regions, where the service has to survive the outage of an entire region.