Show HN: Simple-graph – a graph database in SQLite (github.com)

237 points by dpapathanasiou 5 years ago | 42 comments

rklaehn 5 years ago |

Interesting. SQLite is awesome.

I did something similar recently, a block store for a rust implementation of ipfs, which models a directed acyclic graph of content-addressed nodes.

https://github.com/actyx/ipfs-sqlite-block-store

I found that performance is pretty decent if you do almost everything inside SQLite using WITH RECURSIVE.

The documentation has some really great examples for WITH RECURSIVE. https://sqlite.org/lang_with.html
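For readers who have not used it, a minimal sketch of the "do it inside SQLite" approach might look like the following. The `links(parent, child)` table and the `descendants` helper are hypothetical names chosen for illustration; this is not the schema of ipfs-sqlite-block-store.

```python
import sqlite3

# Hypothetical schema: one row per directed link between two node ids.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (parent TEXT NOT NULL, child TEXT NOT NULL)")
conn.execute("CREATE INDEX links_parent ON links (parent)")
conn.executemany(
    "INSERT INTO links VALUES (?, ?)",
    [("root", "a"), ("root", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("x", "y")],
)


def descendants(conn, start):
    """Return the start node plus every node reachable from it."""
    rows = conn.execute(
        """
        WITH RECURSIVE closure(id) AS (
            SELECT ?
            UNION  -- UNION (not UNION ALL) drops rows already produced
            SELECT links.child
            FROM links JOIN closure ON links.parent = closure.id
        )
        SELECT id FROM closure
        """,
        (start,),
    )
    return {row[0] for row in rows}


reachable = descendants(conn, "root")
```

The whole traversal runs in one statement, so there is no round trip per node between the application and the database.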

abetlen 5 years ago | |

The issue I found with WITH RECURSIVE queries is that they're incredibly inefficient for anything but trees. I've looked around and there doesn't seem to be any way to store a global list of visited nodes. This means that when performing a traversal of the graph the recursive query will follow all paths between two nodes.
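A small experiment can make the cost concrete. The sketch below assumes a hypothetical `edges(src, dst)` table holding a chain of ten "diamonds" (each hub fans out to two nodes that rejoin at the next hub) and counts how many rows the recursive query produces. One caveat worth stating: SQLite does discard rows identical to ones already produced when the recursive part is joined with `UNION`, so the blow-up shows up with `UNION ALL`, or whenever extra columns such as a path make rows for the same node differ.

```python
import sqlite3


def diamond_chain(k):
    """Edges for k diamonds in a row: hub -> (left, right) -> next hub."""
    edges = []
    for i in range(k):
        hub, left, right, nxt = 3 * i, 3 * i + 1, 3 * i + 2, 3 * i + 3
        edges += [(hub, left), (hub, right), (left, nxt), (right, nxt)]
    return edges


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER NOT NULL, dst INTEGER NOT NULL)")
conn.execute("CREATE INDEX edges_src ON edges (src)")
conn.executemany("INSERT INTO edges VALUES (?, ?)", diamond_chain(10))


def rows_visited(conn, operator):
    """Count rows produced by a traversal from node 0 using the given operator."""
    assert operator in ("UNION", "UNION ALL")
    sql = f"""
        WITH RECURSIVE reach(node) AS (
            SELECT 0
            {operator}
            SELECT e.dst FROM edges e JOIN reach r ON e.src = r.node
        )
        SELECT count(*) FROM reach
    """
    return conn.execute(sql).fetchone()[0]


union_rows = rows_visited(conn, "UNION")          # one row per distinct node
union_all_rows = rows_visited(conn, "UNION ALL")  # one row per path to a node
```

On this 31-node graph the `UNION` form produces 31 rows, while the `UNION ALL` form produces 4093, because the number of paths doubles at every diamond.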

hansvm 5 years ago | | |

I'm pretty sure you could also maintain a temp table and use some kind of "insert...where...returning" construct to squeeze that into a recursive query.

At a moderate overhead you could also definitely return all seen nodes and a flag to identify them as such as part of your intermediate data at each recursive step.
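The second idea, carrying the seen nodes along as part of each intermediate row, could be sketched roughly as below. The `edges(src, dst)` table, the `simple_paths` helper, and the comma-delimited path encoding are assumptions made for illustration. Note that this tracks visited nodes per path rather than globally: it stops cycles, but it does not stop the query from exploring several distinct paths to the same node.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER NOT NULL, dst INTEGER NOT NULL)")
# A three-node cycle: 1 -> 2 -> 3 -> 1
conn.executemany("INSERT INTO edges VALUES (?, ?)", [(1, 2), (2, 3), (3, 1)])


def simple_paths(conn, start):
    """Return (node, path) rows for every cycle-free walk from start."""
    rows = conn.execute(
        """
        WITH RECURSIVE walk(node, path) AS (
            SELECT ?, ',' || ? || ','
            UNION ALL
            SELECT e.dst, w.path || e.dst || ','
            FROM edges e JOIN walk w ON e.src = w.node
            -- skip any edge whose target already appears in this row's path
            WHERE instr(w.path, ',' || e.dst || ',') = 0
        )
        SELECT node, path FROM walk ORDER BY length(path)
        """,
        (start, start),
    )
    return rows.fetchall()


paths = simple_paths(conn, 1)
```

The leading and trailing commas in the path keep a check for node 1 from matching inside an id such as 11.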

The postgres query optimizer struggles with recursive queries even when well suited to the problem though. Are they actually efficient in sqlite even for trees?

rekwah 5 years ago |

Tangentially related to graph dbs, but if you're looking for more hierarchical support, SQLite does have a transitive closure extension[0] that might be of some assistance. I leveraged this back in 2014 to write our framework-agnostic result storage on AWS Device Farm.

[0] - https://www.sqlite.org/cgi/src/artifact/636024302cde41b2bf0c...

[1] - https://charlesleifer.com/blog/querying-tree-structures-in-s...

abetlen 5 years ago | |

I've actually been working on an extension to perform breadth first search queries in SQLite on general graphs [0]. The extension is actually based off of the transitive closure extension. You can use it on any existing SQLite database as long as you can wrangle your edges into a single table (real or virtual) and the node ids are integers (I'm planning on removing this constraint in the future).

[0]: https://github.com/abetlen/sqlite3-bfsvtab-ext

kevas 5 years ago |

Why not add that functionality directly to SQLite via stored procs?*

https://www.amazon.com/Hierarchies-Smarties-Kaufmann-Managem...

*https://github.com/wolfch/sqlite-3.7.3.p1

pstuart 5 years ago | |

A more recent effort: https://cgsql.dev/

claytongulick 5 years ago | | |

I just read through the docs of this, what an amazing project.

I'm considering doing a js template string implementation for node.. cql`...` type thing with an internal compilation cache.

kevas 5 years ago | | |

Thanks

loxias 5 years ago |

I really like this, OP. I'm a member of the clan "why are you creating your own XDR, just use sqlite!!" and have oft jumped to that in technical discussions, so appreciate it.

However, what's lacking from something like this is a detailed bill of the cost. I'd love to see some, any benchmark on a DB with > 10^6 edges to see how it goes. That's the other hand of the equation "just use sqlite and be happy" -- the expectation that performance will actually be reasonable.

ptrik 5 years ago |

Similar project: https://github.com/CodyKochmann/graphdb

bjornsing 5 years ago |

How does this perform compared to a “native” graph database like Neo4J?

lolive 5 years ago |

I wonder if there are ways, in SQLite, to build indices for s,p,o/s,p/p,o/ and maybe more subtle ones... That would be uber nice, given the fact that most graph databases have their own indexing strategies, and you cannot craft your own.
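In plain SQLite this seems doable with ordinary composite indexes. A minimal sketch, assuming a hypothetical `triples(s, p, o)` table and index names of my own choosing, with `EXPLAIN QUERY PLAN` used to check which index the planner picks:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE triples (s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL);
    -- Three column orders, so any bound prefix (s, s+p, p, p+o, o, o+s) is indexed.
    CREATE UNIQUE INDEX idx_spo ON triples (s, p, o);
    CREATE INDEX idx_pos ON triples (p, o, s);
    CREATE INDEX idx_osp ON triples (o, s, p);
    """
)
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("alice", "knows", "bob"),
        ("carol", "knows", "bob"),
        ("alice", "likes", "sqlite"),
    ],
)


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(row[-1] for row in rows)


# Predicate and object bound: expected to be served by the (p, o, s) index.
plan_po = query_plan(
    conn, "SELECT s FROM triples WHERE p = ? AND o = ?", ("knows", "bob")
)
# Only the object bound: expected to be served by the (o, s, p) index.
plan_o = query_plan(conn, "SELECT s, p FROM triples WHERE o = ?", ("bob",))

subjects = [
    row[0]
    for row in conn.execute(
        "SELECT s FROM triples WHERE p = ? AND o = ? ORDER BY s", ("knows", "bob")
    )
]
```

The trade-off is the usual one: each extra index speeds up one access pattern and adds write and storage cost, but the choice of which patterns to index stays in your hands.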

JimmyRuska 5 years ago | |

I saw this lecture some time back on the topic of implementation and tradeoffs https://www.youtube.com/watch?v=Dxwo9DYWV_c

westurner 5 years ago | |

rdflib-sqlalchemy is a SQLAlchemy rdflib graph store backend: https://github.com/RDFLib/rdflib-sqlalchemy

It also persists namespace mappings so that e.g. schema:Thing expands to http://schema.org/Thing

The table schema and indices are defined in rdflib_sqlalchemy/tables.py: https://github.com/RDFLib/rdflib-sqlalchemy/blob/develop/rdf...

You can execute SPARQL queries against SQL, but most native triplestores will have a better query plan and/or better performance.

Apache Rya, for example:

> indexes SPO, POS, and OSP.

mark_l_watson 5 years ago | | |

Thanks for your comment. I use rdflib frequently but have never tried the SQLAlchemy back end. Now I will. That said, Jena or Fuseki, or the commercial RDF stores like GraphDB, Stardog, and AllegroGraph are so much more efficient.

roland_nilsson 5 years ago |

Isn't the whole point of graph databases that they can traverse graph edges efficiently by following pointers to nodes, which relational databases can't do? Then it seems a bit strange to implement a graph database on top of a relational database like SQLite?

tpoacher 5 years ago | |

Tongue in cheek answer, but: No. That is the whole point of "efficient" graph databases. The point of a "graph database" in the more general sense is simply to be a database that uses a graph paradigm.

This is a silly pedantic point to make, but it is not necessarily trivial. E.g. it may be the case that a particular use-case scenario does not require massive efficiency, and has a lot to gain from the simplicity of sqlite. In which case this kind of project is an amazing thing to exist.

And if there is a way to get a valid benchmark comparison against a more traditional "efficient" graph database, then informed decisions can be made.

As a personal anecdote, a friend and I based a graph-based project on neo4j and were very happy ... until it was time to deploy. We then realised the installations involved were highly complex, rarely supported on traditional webhosts, and costs involved for adopting 'formal' commercial solutions were highly prohibitive. Had we known about this project at the time we would have definitely used it instead (at least as a proof of concept; you can always switch to a more efficient database later if you really have to)

joshspankit 5 years ago | | |

Just a quick side question: Why not deploy with Docker?

My latest API+multiple frontends application uses Neo4j as the only database and we deployed with Docker (compose) with great success. With the config in git we were able to do the traditional test-new-versions-on-a-branch-before-deploy and everything is solid.

brian_herman 5 years ago |

Awesome! And you can write SQL queries on the data. Amazing, SQLite is the best database.