How I Write SQL, Part 1: Naming Conventions (2014)(launchbylunch.com) |
How I Write SQL, Part 1: Naming Conventions (2014)(launchbylunch.com) |
select * from person join team_member using (person_id)
The other reason is person_id now unambiguously refers to the same field regardless if we're looking at the PK or a FK. It's always person_id.The advantages of using person_id are even more obvious in multiple joins, such as a star schema where you can using(person_id) all the things, reducing both the typing and the cognitive load.
I suspect that if this convention was more pervasive, programmers would be a little bit less afraid of diving into SQL.
Well, at least we're settling with some convention, so it's not all bad.
EDIT: typo
SELECT * FROM Person JOIN TeamMember on PersonId = Person.Id
SELECT * FROM Person as P INNER JOIN TeamMember as TM on TM.PersonId = P.Id
I have:
- Aliased each table and prefixed every field names with their table alias in my join conditions.
- Explicited the JOIN type.
The above:
- Reduces mistakes due to ambiguities that tend to generate unwanted duplicates rows in SQL.
- Increases the likelihood of getting an error at parse time, instead of run-time or analysis-time, thanks to added scoping.
- Works in any schema, no matter what naming conventions are followed.
- Keeps working as the query becomes more complex with multiples table aliases or self-joins, and similar field names appearing in the set.
- Better expresses intent. Sure JOIN defaults to INNER JOIN, but writing "INNER JOIN" shows that you genuinely expect any row not matching your condition to be removed from the result set.
There are cons, of course. This is a matter that divides people and when working with other people's projects you have to ignore your own preference and follow the "local" convention.
As well, fewer keywords are plural, compared to singular, so there's less chance of accidentally using a keyword if you use plurals
Haven't yet seen an "octopus" table in production...
https://social.msdn.microsoft.com/Forums/vstudio/en-US/d5f2f...
> Yes, this is the same version as I found, but the closest thing I could find to addressing table names in the paper itself was an "Object Class name", something like an OOP Class or something you'd find in a UML diagram, but not really the same as a table name, and in any case all the examples were singular.
> Was actually kinda hoping Celko would deign to comment on this himself as he seems to be the chief proponent of the "collective identifiers as specified by ISO 11179" meme.
Page 10, SQL For Smarties (Celko), 5th Ed
If Celko says it's right, it's right
I just kept repeating, over and over, that I expected the coding-style to be consistent. It was totally over his head, and he totally didn't even bother looking to find a code formatting utility to do a One-Shot style change.
The other devs' eyes glaze over when I say things like 'stored procedures' and 'trigger functions.' Bah.
> Mixed case identifier names means that every usage of the identifier will need to be quoted in double quotes
I've used quite a few RDBMS engines, including most mentioned by the author, and I've never had to quote mixed-case identifier names. They work just the same as all lower-cased names or as any other case-sensitive language.
Most of the programming languages I use typically have the convention of using PascalCase for classes and public fields/properties so I prefer to use that convention for tables and columns (and then everything else for consistency). When doing operations between the application and the database, the name is exactly the same without the need for translation.
Otherwise, I think it's a good list.
I haven't used Postgres enough to notice this, it's almost a deal breaker.
I might be tempted to mandate that all identifiers be quoted than deal with half the possible characters for names. Although more likely all code-generation would happen on the application side with DB migrations so the database wouldn't the source of truth for identifier names, anyway.
create table foo (
Bar integer,
"Foobar" integer
);
Referring to Bar, BAR, bar, and "Foobar" will work, but foobar and Foobar will not.I though the canonical way of doing this was to write KEYWORDS in caps and use camel case for Variables.
Also never really brought into adding the type as part of a name - your type is already defined in your schema.
And then there's a11y: accessibility. This is all about making user interfaces accessible for people with low or no vision, low or no hearing, difficulty typing, and so on.
There are generally applicable laws requiring G11N and A11Y, and these fall heavily on OS vendors, which is why people who've worked on OSes tend to know these acronyms.
I18N -> dealing with Unicode in general, codeset conversions, font issues, ...
L10N -> dealing with translating system/application messages to the users' preferred languages (and how to even know they preferences) (think locales)
G11N -> I18N and L10N.
Localization is damned difficult. There's all sort of little bothersome things, like how to format numbers (which varies quite a lot) and dates (can't we all just use ISO-8601?!). And translating printf-like format strings is often non-trivial, especially when the coder doesn't stop to think about just how hard they might be to a translator as they write their code.
That's a new one to me, but makes sense. I've been lucky enough to have heard of i19n and l10n for years (almost decades, and this point) but not had to deal with it much beyond tracking down a string in some open source webapp I was patching before deploying.
> can't we all just use ISO-8601?!
Preach on. I sometimes find myself filling out date fields in paper forms in YYYY-MM-DD without thinking. The elementary school my kids attend probably thinks I'm a weirdo. I know my wife does...
T15X --> Tyrannosaurus Rex
D11S --> Dilophosaurus
B11S --> Brachiosaurus
T9S --> Triceratops
S9S --> Stegosaurus
The fact is, you're probably going to be issuing more SQL via abstractions like ORMs or querying libraries than raw SQL. If you need to work against the grain of those libraries to map your model, what upside are you getting?
If most of your data is queried via ActiveRecord, for example, you should use plural table names.
I've spent weeks of my life tuning SQL, to the point of writing SQL generation libraries to effectively override the database optimizer when it consistently makes poor decisions in specific use cases. But I don't expect the rest of the team to know SQL as well as I do.
When I'm writing or generating SQL, I don't really care much what the naming convention is. If it's consistent, then SQL is easier to write. Consistency is more important than the specifics of any conventions.
Glad you cleared this up for the rest of us.
FWIW, naming conventions are like opinions. Everyone has them, and they usually differ from person to person. The best naming convention is a consistent naming convention. Also, naming conventions differ greatly by environment. A group of SQL Server engineers are going to have different standards than those of people working on mysql.
The only generic rule is "be consistent". Whatever convention/style you choose, it should be consistent.
The few advantages of singular:
1. It's not always clear what the plural of something should be.
2. Chances are the singular maps better to your application layer (Person class <=> Person table).
I don't think one standard is necessarily better than another, but the important thing is to have rules. Over the years I've adopted similar rules for myself, and just internal consistency is so much better. I have old projects with table names including: "logs", "log_requests", "log_users", "game_logs", etc., not to mention mixing of plural and singular, to the point where I need to `show tables` before writing any query just to remember what I even called the table I need.
It starts off with Yes. Beware of the heathens. Plural in the table names are a sure sign of someone who has not read any of the standard materials and has no knowledge of database theory. I thought the author was being flippant, but it became increasingly obvious that this is a true reflection of their dogmatic view with regard to this topic.
Even if this is the same view I would settle on with all the knowledge, being presented with what is obviously a single perspective with no acknowledgement whatsoever of any positive aspects of alternatives causes me to instinctively distrust quite a bit of the reasoning presented.
We're always learning.
those tools are wrong (and I know roughly which ones those are).
> Similarly using Person.PersonID instead of Person.ID gives consistency in diagrams and foreign key naming.
it would be: person.id and the foreign key column that refers to it person_thing.person_id. This is much preferable to person.person_id and person_thing.person_person_id.
I assume you mean you just use PersonID as the foreign key. This oftentimes introduces ambiguity into what the relationship actually is. I prefer names that describe the actual relationship (e.g. author, owner, approver, etc) rather than letting other people guess what it is.
Otherwise you're actually introducing ambiguity imo.
So now you have to join with the Person table on Person.PersonID from your local column PersonID. I much prefer the other way around: Table.ID with foreign keys being "TableID".
Do not do this in Postgres, it will be a pain in the ass since you will have to use quotes around everything.
At least PG tries really hard to not add new reserved keywords, which means you mostly don't have to worry about your schema element names possibly conflicting with new keywords in future releases.
Use an ORM when appropriate; when using one, follow its conventions. Don't use an ORM if it's not appropriate. This is much better advice.
My experience has been that transactional code is usually best written with an ORM, and complex reporting code winds up better with SQL.
SELECT * FROM table1 INNER JOIN table2 USING (table1_id);
vs SELECT * FROM table1 INNER JOIN table2 ON (table2.table1_id = table1.id);https://www.ftb.ca.gov/aboutFTB/Projects/ITSP/Part_5_Naming_...
The word "collective" doesn't show up in the document.
The word "plural" shows up twice, both times in item #a of "Lexical rules":
> a) Nouns are used in singular form only. Verbs (if any) are in the present tense.
> NOTE In Japanese, this rule shall not be applied because of no plural form of nouns and no distinction of verb tense.
The only reference on Wikipedia to the ISO 11179 standard making that recommendation has since been deleted for being unsubstantiated:
https://en.wikipedia.org/w/index.php?title=Data_element_name...
edit: The apparently inaccurate content is talked about on the "Data element name" Talk page: https://en.wikipedia.org/wiki/Talk:Data_element_name
maybe we should have a fizbuzz for SQL as a filter :-(
It's a small thing. You'll likely know enough about the tables to be able to know this information anyways. Then again, knowing which "id" field you want is pretty obvious too. In the end, there are pros and cons to both, and it's mostly preference.
That said, once it leaves the database, I much prefer my records have short id fields, which likely influences by schema design to some degree.
I used to just use ID as the name for surrogate primary keys, but find being more explicit to be helpful for clarity. It is one of a number of habits I used to have in the name of being concise that I now prefer not to do these days in the name of being descriptive.
[1] which I do for keys themselves and other table supporting objects[2], the PK for a table called Thing is explicitly named pkThing rather than letting SQL Server pick a name [2] for instance an index on Thing covering col1 and col2 is ix_Thing_col1_col2 (unless the index exists for a fairly specific reason that is somehow unclear from what it covers/includes in which case that reason is in the index's name)
Personally I write very little SQL anymore (the ORM does that for me unless I need performance), but the only time I use aliasing is when the same table is joined multiple times.
Like:
SELECT Mother.Name, Father.Name, Child.* FROM Person Child JOIN Person Mother ON Mother.Id = Child.MotherId JOIN Person Father ON Father.Id = Child.FatherId
That said, this naming convention also obscures what Person.FatherId points to without looking at the FKeys. So Take that with a grain of salt.
SELECT * FROM person, team WHERE person.team_id=team.id
(no unnecessary aliasing, no weird JOIN phrasing)SELECT * from person CROSS JOIN Team WHERE person.team_id=team.id
SELECT * from person INNER JOIN Team ON person.team_id=team.id
That they happen produce the same result in a query is practically just luck. Changing the join from INNER to LEFT OUTER is also much easier than managing (+)'s in the WHERE clause, once you're used to it.
It's harder to see in your example, likely because you tried to keep it similar to the example presented, but once there's more than a couple joins, having the join conditions close to the join is essential for keeping track of what's going on. e.g.
SELECT * FROM person, team, role, group, person AS lead
WHERE person.team_id=team.id
AND person.role_id=role.id
AND person.group_id=group.id
AND group.lead_id=lead.id
Compared to: SELECT * from person
INNER JOIN team ON person.team.id=team.id
INNER JOIN role ON person.role_id=role.id
INNER JOIN group ON person.group_id=group.id
INNER JOIN person AS lead ON group.lead_id=lead.id
And let's be clear, we know these should be left joins, because the chance some person doesn't have a team, role, group, or a group ends up without a lead is high when sampled over time. And simulating left joins with the non-ANSI joins quickly gets unwieldy.For example:
JOIN team_member ON team_member.person_id = person.id
It's completely self evident what is being joined without the need to trace back to the table aliases.
Pretty much all of your advice is premature optimization in my eyes. You can/should do those things when they are needed, but there is no reason to automatically write every single SQL query that explicitly.
I don't name the primary key field id because I'm denoting a type, I call it id because it holds the identifier for the current record. I don't call the foreign relation identifier person_id because it holds a "person" and is an id, I call it person_id because it holds the identifier for a person record, and that's a valid description for what the field holds. The constraint tells how the relation is defined.
> the PK for a table called Thing is explicitly named pkThing rather than letting SQL Server pick a name [2] for instance an index on Thing covering col1 and col2 is ix_Thing_col1_col2
So, do you call those fields that index works on col1_ix1 and cal2_ix1 or something? If not, why do you denote the primary key with PK, and not indexed columns? Both have explicit definitions in the schema that specify exactly how they are defined, one a primary key, the other an index over multiple columns.
To me this sounds a bit like how Hungarian Notation[1] had it's original purpose lost as it shifted over time to encompass a larger, less well defined set of behavior which didn't always deliver an actual benefits.[2] To me, if the system is enforcing the values, and you can introspect the system, that's good enough.
That said, I did spend a while reading an IDEF1X explanation and HOWTO last night, and there are some interesting points and arguments there, so I'm not entirely set in stone with these views.
1: https://en.wikipedia.org/wiki/Hungarian_notation#Systems_vs....
2: TL;DR "Apps Hungarian" prefixed variable names with info about what they were supposed to contain at a high level, such as rwFoo containing the row of Foo. Systems Hungarian came along later and they used the prefix to encode the type, such that iFoo might mean an integer Foo. The benefit of this in a language such as C or C++ that requires explicitly typing the variable anyways is debatable.
3: http://www.softwaregems.com.au/Documents/Documentary%20Examp...
In that sense, I'd argue that an ORM is only appropriate if your understanding of SQL and database design are strong enough that you can understand what's happening under the hood if/when everything goes to shit!! :)
An ORM is an abstraction of a concrete system. For very simple use cases, that’s fine. For anything even slightly complex, the ORM becomes a tool that you can use to enhance code readability, or reliability, or modularity. But it’s essential to know what is going on underneath before using that abstraction, much like most other systems. There are too many times I’ve witnessed a less experienced developer build a shockingly expensive n+1 query using an ORM to doubt that :)
It's just as vague. If you know of a good article that goes over the details, that'd be cool :^)