Joins In Steps(zindlerb.com) |
Joins In Steps(zindlerb.com) |
And yet, I've got not idea why would anybody need right join.
Have you guys ever had a case when you'd need a right join? I've been to the field for 15 years and yet to see people using right join in the wild.
Like the last example in this link - why would you do that? Most probably your business logic focuses on dogs, something like "find dogs with no owner" or something, In this case it is much more readable and straight forward to go with left join or even with sub-select where you'd have something like 'select * from dogs where owner_id not in (select id from owners)'.
Have you used right joins and if you have can you explain the use case?
Think of left join as a join from x to y, and a right join as a join from y to x, where x is the data we keep all of and y is the data we keep only when there's a match.
Then, in R, I often use right joins when my "y" data requires preprocessing, resulting in lines of code that are like: (y %>% preprocess1() %>% preprocess2() %>% right_join(x)). I could of course write this as "y_preprocessed = y %>% preprocess1() %>% ...; x %>% left_join(y_preprocessed)" but I think the former is actually a little syntactically clearer.
x %>% left_join(y %>% preprocess1() %>% preprocess2())
But yes, my understanding is that right and left joins are the exact same function, just with the order of input switched.I believe I used one once, exactly once ever, about a year ago. That was because I was fighting an almighty 5 page CTE charlie fox of a query and had to turn an inner join into an outer join to get data otherwise excluded. It just so happened the data missing was on the right so I made it a right join. Had it not been so awful I'd have flipped it into a left join, my usual style, but I couldn't bear rewriting it.
(background: it was complex but could not be broken down or it lost efficiency. Part of the length was long explicit lists of fields 'select a, b, c, d, ...' which were unavoidable, but it was just plain complex too. The inner join suddenly became an outer join because we found data missing - the DB was a commercial one written by some clueless people with little consistency and absolutely no documentation. I hope I never see that again).
The example shows a case where you'd want foreign keys to enforce referential integrity, though FKs aren't fashionable these days. You shouldn't be able to have an entry for owner_id 8 in the Dogs table without a corresponding Owner.
I just checked some code, saw 357 left joins and 1 right join.
SELECT p.password, u.id
FROM password p
RIGHT JOIN "user" u ON p.user_id = u.idselect * from dogs where owner_id not in (select id from owners)
You wouldn't actually recommend something like this would you? The query plans for sub-selects vs. left-joins are very different, with lots of implementations having limits on the "in" clause.
Kind of funny that the way you'd feel if you saw a right join is similar to how I feel about sub-selects!
I haven't seen anybody hitting "in" clause limits in query like mine but we did hit such a limit once when we were passing a hardcoded list of IDS from client side. Something like "where blah in (id1, id2, id3.... idn)". I remember that one failing on MS SQL.
I thought it made sense to start the SELECT from one table. From there, I could left join to another table, and from that table I could only right join to get what I needed.
I'd love to know, is there another way around that? I know I could have started from a different table, but I think the query makes more sense starting from where I did.
Having that, and helping people learn that, would make it much harder to not “get” how joins and the like end up working
>The left right is the table named in the outer join part of the query.
select * from a inner join b on mycustomfunction(a, b);
And if you do this, it will be incredibly slow because you cant use any indexes in this case.
[0] https://www.postgresql.org/docs/current/indexes-expressional...
[1] https://dev.mysql.com/doc/refman/8.0/en/create-index.html#cr...
Also it's downright hard to make a good query plan for a nontrivial query, and it gets exponentially harder.
Honestly the optimiser is in a better position to do the work.
On re-reading I think you're talking only about pedagogy in which case disregard the list-of-cons in my post. And I think it would be a damn good idea! I really like it.
I would appreciate a way to encode execution efficiency parameters in the semantics of the query itself. Specifying which indexes to use, or the runtime complexity of certain operations. This would also help me figure out which indexes I need to create, in the first place. Today I can add a clause in an order which makes using an index impossible, rendering a previously efficient query suddenly extremely expensive. The failure mode is very obscure: it all works, the DB just works overtime. I might not have a large enough DB yet to actually notice this, until one day I see these slow queries and have to backtrack all the way to that one commit that caused it, six months ago.
I have encountered this scenario a few odd times. It is not always obvious to someone changing a query that it is meant to use an index, and it is very hard for someone writing it to specify so in a unit test.
Flexibility for the scheduler to choose efficient strategies cuts both ways. I'd appreciate the ability to leave more explicit annotations about expected runtime characteristics, than mere comments offer.
Often you can just think a bit harder about your set based logic and make an index that's reusable by other queries.
In other SQL's you can also make a materialized view, which may be able to persist some of these things, its just like keeping n+1 copies of your data to satisfy a different set of relations/projections.
Could this be a rare case of the mythical sufficiently smart compiler having more information at runtime than the programmer has at coding time, and that if you could hint which index to use, the trade off would be that you'd instead be troubleshooting queries which end up slower because of it?
Mythical? Isn't this the core concept behind optimizing JITs?
That's a mistake. You can in MSSQL but you'd likely regret it (this from experience). Using an index can be a mistake; sometimes a table scan can be more efficient. And which, full scan or index, is more efficient is down to the data in the table and the query predicate.
(Edit, so which to do, scan or index, can be different for exactly the same query depending on the test:
select *
from people
where surname = ?
can be either depending on whether surname is 'smith' (vey common) or 'hepsibah' (very rare) ).> I'd appreciate the ability to leave more explicit annotations about expected runtime characteristics, than mere comments offer.
Interesting. Can you elaborate?