How to Optimize MongoDB Query Performance with Indexes(visualeaf.com) |
How to Optimize MongoDB Query Performance with Indexes(visualeaf.com) |
1: export the data out of MongoDB
2: import the data in PostgreSQL, ClickHouse, DuckDB, whatever
3: create the indexes there
4: uninstall MongoDB
not sure if this was fixed in recent versions, but if you have index on array of strings/ints, and then use `addToSet` operator, the index is not used for the `addToSet` operation, and the disk is hammered for each update to verify if the content of the array has unique values. with pretty high volume, you can easily saturate the disk queue, wondering what is happening and how you can fix that
after few years of working with few TB collections (tables) in mongo, I can say that this DB is good if you want to insert data, but don't need to read the data that much often. if you need json and read heavy DB, consider something else eg elastic (with all it's drawbacks), because at some point the limiting factor for you will be indexes in mongo, that are the most weak part of this DB (even more weak than the infamous data loss)
That's also why I decided to concentrate on the entire workflow rather than on the index itself in the article: Find the slow query -> Analyze Explain -> Analyze the ratio between scanned and returned documents -> Create an index -> Test the same query again...
In this particular case, a compound index was selected based on filters and a sort clause. I completely agree that using index intersections in a real workload could be quite dangerous, especially if MongoDB chooses to use another index instead.
The `$addToSet` example is also very valid; however, for me, that is more like a schema problem than a regular indexing problem in a read query. Still, the array could become extremely big regardless of the presence of indexes, and it might require a dedicated note in the article.
If a system is working fine, the expense and risk of moving may end up being far greater than any optimization efforts on the existing database. This is also why I believe profiling tools, explain plans, and indexes are all useful; there are plenty of cases where MongoDB is sufficient but still needs optimization...