SQL AntiPatterns Presentation from MySQLConf 2009(slideshare.net) |
SQL AntiPatterns Presentation from MySQLConf 2009(slideshare.net) |
The author lists a drawback of the first antipattern ("metadata tribbles"): a table split into multiples to keep its size down requires tricky querying, specifically, it requires a union across all splits (slide 10). It also requires extra effort to keep table structures synchronized (slide 11). Then, in a solution to his second antipattern ("entity-attribute-value"), he suggests doing something quite similar to his first antipattern: creating multiple tables with nearly identical columns and using a union to query across them (slides 27 and 28).
Hello? So to resolve antipattern 2 you just apply antipattern 1? This is why precisely why relational databases for most problems look like square pegs being hammered into round holes.
[EDIT] Not sure if it's worth finishing reading this presentation. Slide 56, a "solution" to the problem of storing a hierarchy in a relational database. Breadcrumbs. Not bad, except that the query now requires a "like" clause. Whoops, can't use an index on that column anymore! (Feel free to correct me if there exists a RDBMS which can use an index on a "like.") Hope the author doesn't mind a nice table scan on that query. He doesn't even mention this problem.
A big part of why the modelling of data that you don't know the shape of in advance in a relational database is hard is because reasoning about data you don't know in advance is hard, and the model exposes that difficulty up front.
(Also, if like has its wildcard at the end of a string not the beginning, indexes can help you as they can look into the sorted trees.)
huh? mysql does that for 'like' queries which don't start with a wildcard (that's exactly what you need for breadcrumbed trees).
also (i didn't read TFA, though) you don't have to use 'like'. you can use BETWEEN or just "path >= 'a' and path < 'b'", which, i suppose, uses index in even the most stupid RDBMS in the world).
prooflink log from my live real production database with a breadcrumbed tree: http://pastie.org/541611
where col like 'str%'
However, doing where col like '%str'
Means it cannot use any index. So in the example given, it would be able to use an index because there is a leading 'path' before the %, assuming MySQL can do the same thing.In Oracle you can do hierarchy query's using the 'connect by' syntax, which means you can structure your table with id, parent_id and get all the rows out efficiently in a single piece of SQL, which is nice!
Comments:
/-----------------------------\
| ID | ParentID | etc |
|-----------------------------|
| 1 | Null | etc |
| 2 | Null | etc |
| 3 | 1 | etc |
| 4 | 1 | etc |
| 5 | 4 | etc |
\-----------------------------/
You then use a Common Table Expressions query, like this: WITH CommentTree (ParentID, ID, Level)
AS
(
SELECT ParentID, ID, 0 as [Level],
FROM Comments
WHERE ParentID = [ROOT COMMENT ID] OR ParentID IS NULL
UNION ALL
SELECT c.ParentID, c.ID, ct.Level + 1
FROM Comments c
JOIN CommentTree ct ON ct.ID = c.ParentID
)
-- Now actually query the data from the CTE Expression
SELECT ParentID, ID, Level FROM CommentTree
ORDER BY ParentID, Level, ID
This returns a table with a calculated column called Level. /---------------------------\
| ParentID | Level | ID |
|---------------------------|
| Null | 0 | 1 |
| Null | 0 | 2 |
| 1 | 1 | 3 |
| 1 | 1 | 4 |
| 4 | 2 | 5 |
\---------------------------/
Which of course, would represent this tree: Root
/ \
2 1
/ \
3 4
\
5This "path" method is straight out of SQL for Smarties - Hierarchies & Trees" and though the nested set method is probably better for most uses, the path column has its uses.
RDF would have been great if the W3C wasn't full of idiot blowhards.
There are some application domains that have not lived up to their potential because of RDBMS abuse where EAV would fit perfectly -- Desktop Search for example. All of [Spotlight, Google Desktop, Beagle, Gnome Tracker] use RDBMSs with a pre-determined set of 'attributes' baked in, crippling any creative use. Spotlight at least has the decency to let you add your own non-indexed properties, but it's not very useful. The contents of people's filesystems and metadata about them will never fit into some spec-writer's schema.
Compared to RDBMSs I think sugared EAV is lot more amenable to being abused in the "when all you have is a hammer" way -- they start a bit slow but scale linearly with more resources without getting any slower.