RC1 ArangoDB 3.4 – What’s new? (arangodb.com)
I was wondering if this release allows FULLTEXT indexes when the backend is RocksDB (now that it is the default storage engine)? The new ArangoSearch features look cool, but honestly a bit daunting vs the simple setup of a FULLTEXT index.
By the way, the ArangoSearch tutorial casually talks about "ArangoDB views of type 'arangosearch'", but I haven't come across the concept of views before in the documentation. Are there other types of views?
More beef to be added to this until the GA release.
The benefits of distributed COLLECT will come into play for queries that can push the aggregate operations onto the shards. Previous versions of ArangoDB shipped all documents from the database servers to the coordinator, so the coordinator would do the central aggregation of the results from all shards to produce the result.
With distributed COLLECT we now create an additional shard-local COLLECT operation that performs part of the aggregation on the shards already. This allows sending just the aggregated per-shard results to the coordinator, so the coordinator can finally perform an aggregation of the per-shard aggregates.
This will be beneficial in many cases when the per-shard aggregated result is much smaller than the non-aggregated per-shard result.
Following is a very simple example. Let's say you have a collection "test" with 5 shards and 500k simple documents that have just one numeric attribute (plus the three system attributes "_key", "_id" and "_rev"):
    db._create("test", { numberOfShards: 5 });
    for (let i = 0; i < 500000; ++i) {
      db.test.insert({ value: i });
    }
Running a query that will calculate the minimum and maximum values in the "value" attribute can make use of the distributed COLLECT:
    FOR doc IN test
      COLLECT AGGREGATE min = MIN(doc.value), max = MAX(doc.value)
      RETURN { min, max }
The database servers can compute the per-shard minimum and maximum values, so they will each only send two numeric values back to the coordinator. Without the optimization, the database servers will send either the entire documents or a projection of each document (containing just the document's "value" attribute) back to the coordinator. Even with the projection, each shard would still have to send 100k values on average.
With a local cluster that has 2 database servers and runs them on the same host as the coordinator, this simple query is sped up by a factor of 2 to 3 when the optimization is applied. In a "real" setup the speedup will be even higher because then there will be additional network roundtrips between the cluster nodes. And in reality documents tend to contain more data and collections tend to have more documents, which increases the speedup further.
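To make the two-phase idea concrete, here is a rough simulation in plain JavaScript. It does not talk to ArangoDB at all: the function names (splitIntoShards, aggregateShard, mergeOnCoordinator) are made up for illustration, and the round-robin sharding is an assumption chosen for simplicity, not how ArangoDB actually distributes documents.

```javascript
// Plain JavaScript sketch of a two-phase MIN/MAX aggregation.
// Nothing here is ArangoDB API; all names are hypothetical.

// Assumption: distribute values round-robin over the shards.
function splitIntoShards(values, numberOfShards) {
  const shards = Array.from({ length: numberOfShards }, () => []);
  values.forEach((value, i) => shards[i % numberOfShards].push(value));
  return shards;
}

// Phase 1 (shard-local): each shard reduces its own values to one { min, max } pair.
function aggregateShard(shard) {
  let min = Infinity;
  let max = -Infinity;
  for (const value of shard) {
    if (value < min) min = value;
    if (value > max) max = value;
  }
  return { min, max };
}

// Phase 2 (coordinator): combine the per-shard aggregates into the final result.
function mergeOnCoordinator(partials) {
  return partials.reduce(
    (acc, p) => ({ min: Math.min(acc.min, p.min), max: Math.max(acc.max, p.max) }),
    { min: Infinity, max: -Infinity }
  );
}

// Same shape as the example above: 500k values spread over 5 shards.
const values = Array.from({ length: 500000 }, (_, i) => i);
const shards = splitIntoShards(values, 5);
const partials = shards.map(aggregateShard);
const result = mergeOnCoordinator(partials);

// Each shard ships 2 numbers instead of its ~100k values.
console.log(result, "numbers sent to coordinator:", partials.length * 2);
```

This only works so neatly because MIN and MAX can be merged from partial results; the sketch says nothing about which other aggregates the optimizer can push down to the shards.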
So you'd never have any value (an object, a key, or a string, int, float, or reference value) without associated metadata typing it elsewhere in the graph, and you'd be unlikely to operate on that data without referring to those properties.
See https://neo4j.com/docs/developer-manual/current/drivers/cyph...
You can of course use datetime (https://docs.arangodb.com/3.3/AQL/Functions/Date.html) and decimals with a precision of 10E38 in ArangoDB, but it is not as precise as in a relational database. If we wanted to be as precise as a relational DB, we would have to say goodbye to JSON.
Not sure if I'm a fan of the syntax, but some kind of open, sane standard would be nice.