Python: Just Write SQL(joaodlf.com) |
Python: Just Write SQL(joaodlf.com) |
- The ORM is an externally maintained open-source project with a plurality of contributors; "just write SQL" is not
- The ORM is designed to support the full lifecycle of the application including migrations; "just write SQL" is not
- The ORM is documented to be legible to newcomers; "just write SQL" is not (for all but the simplest of applications)
- The ORM is composable and extensible with opinionated and customizable interfaces for doing so (I've lost track of the number of times I've had my mind blown by how elegant and smart Django and SQLAlchemy's query management tooling is)
- The ORM has a security posture that allows you to both reason about your application's security and receive security updates when bugs are found
- The ORM is a platform for many other modules responsible for different layers of the application (DRF, OpenAPI, django-admin, testing utilities, etc. etc.) to plug into and allow the application to grow sustainably
I now try to guide people to a middle ground. Yes, both Django's and SQLAlchemy's ORMs can be annoying, have performance issues, etc. But for large applications maintained by multiple people over time, their benefits usually outweigh the drawbacks. Both have extensible architectures that allow customization and opinionated restriction of the interface that the ORM presents. If you're unhappy with your organization's ORM, I suggest you try that route first.
https://github.com/sqlc-dev/sqlc
It's written in Go and it converts your sql migrations and queries into typesafe code that you use access your database.
It currently has a plugin for Python that's in Beta, but what essentially does something similar to what this post is saying.
https://github.com/sqlc-dev/sqlc-gen-python
You write your migrations, and queries and a config file and it does the rest.
Then I discovered peewee. I am happy now.
I have used several Python ORMs over the years, both for SQL and NoSQL. SQLAlchemy is the most powerful way of interacting with a relational database I have experienced.
I also write Go, and when I do, I do not use an ORM. But when it comes to Python I know my solution won't be better than SQLAlchemy, so why bother rolling out my own?
So what I do is write SQL commands, but keep all inside a specific file or module of the project. So that I can decide later to refactor it into an ORM.
I think ORMs are great if you write libraries that target more than one database. Or situations, where you have more than one database and need a proper migration part.
If you don't need migration, but in the worst case can start with a fresh, empty database, then write SQL.
But for production system, the no/manual migration might get old quickly. Writing migration code that just adds fields, indexes or tables is easy. But writing code that changes fields or table structures? Not do much.
Still, you don't need an ORM at the beginning of a project, just don't put SQL everywhere.
As a Django developer it's straightforward to go from one Django project to another, which isn't the case with other stuff as you don't know where everything is going to be.
If your app can work well with static queries then you should not add an ORM.
Never use an (active record) ORM. Ever. For any purpose. They are a disastrous idea that should be un-invented.
I have this discussion with a lot of people who claim they cannot imagine working without an ORM and that "surely" using SQL is so much more work blah blah blah. Yet they have never tried! And aren't willing to try!
You should try it.
PugSQL is a simple Python interface for using parameterized SQL, in files, with any SQLAlchemy-supported database.- immutable record type for each table in the database (could be a data class)
- functions for manipulating the tables that accept or return the immutable record types and directly use SQL
- that's it
you can generate the functions from the database schema if its too much boilerplate.
not readable at all, not easy to change, which is a terrible development experience.
I guess if it is simple CRUD, it does not give too much problem, but it will definitely work in a complex case.
Anecdotally, I haven't seen a place where Go is used without something like gorm or sqlc.
... oh wait, I thought you said migranes.
...
Proceeds to create an ad-hoc ORM.
Rob Pike's messed up ideology made people think abstraction is bad.
As this argument comes up over, and over, and over, and over again, writers of the "bah ORM" club continuously thinking, well I'm not sure, that ORMs are just going to go "poof" one day? I wrote some years back the "SQL is Just As Easy as an ORM Challenge" which demonstrates maybe a few little things that ORMs do for you besides "write SQL", like persisting and loading data between classes and tables that are joined in various very common ways to represent associations between classes:
https://gist.github.com/zzzeek/5f58d007698c4a0c372edd95ab8e0...
this is why whenever someone writes one of these "just write SQL" comments, or wow here a whole blog post! wow. I just shake my head. Because this is not at all what the ORM is really getting you. Plenty of ORMs let you write raw SQL or something very close to it. The SQL is not really the point. It's about the rows and objects, moving the data from the objects to the INSERT statement, moving the data from the rows you SELECTed back to the objects. Not to mention abstraction over all the other messy things the database drivers do like dealing with datatypes and stuff like that.
It looks like in this blog post, they actually implemented their own nano-ORM that stores one row and queries one table. Well great, now scale that approach up and see how much fun it is to write the same boilerplate XYZRepository / XYZPostgresqlRepository code with the same INSERT / SELECT statement over, and over again. I'd sure want to automate all that tedium. I'd want a one-to-many collection too maybe.
You can use SQLAlchemy (which I wrote) and write all the SQL 100% yourself as literal strings, and still use the ORM, and still be using an enormous amount of automation to deal with the database drivers and moving data between your objects and rows. But why would anyone really want to, writing SQL for CRUD is really repetitive and tedious. Computer can do that for you.
Yes, since they are an notoriously bad abstraction, every ORM will give you an escape hatch for the minority of tasks that it can't abstract at all. That escape hatch is not in any way a defining feature of the system.
Now, about that extra freedom that you get from dropping the requirement that your connector is an ORM... well, neither your data at rest, the presentation to the user, the abstraction for a service client, nor your validation rules benefit from OOP. Proof of that is that OOP interfaces from all of those things precede the current popular ones, and all of them were abandoned due to fundamental dissonance between the model and the data.
The rationale for an ORM is that, even though none of the interfaces you actually want for your data is best done in OOP, somehow OOP is still the best way to integrate them so that you can reuse data and metadata. This thought is not completely without merit, but there is very little empirical motivation for it, and the tiny amount that exists is more than completely explained by the OOP hype that only started to die around a decade ago.
EDIT: Oh I saw you wrote SQL Alchemy! First, thank you for that great piece of software.
Now, SQL Alchemy does provide a lot of useful ways to postpone the object representation or even to map your values into dumb data. My comment up there is on theoretical limitations, but on practice, I do think it's the best option available on Python. (And maybe about the best option afforded, since the language is intrinsically biased into OOP.)
> The SQL is not really the point. It's about the rows and objects, moving the data from the objects to the INSERT statement, moving the data from the rows you SELECTed back to the objects.
I think that that is the problem: mapping a relational dataset to a hierarchical dataset is the digital equivalent of pounding a square peg into a round hole.
I know you've read Ted Neward's "The Vietnam of Computer Science." (it's a short read, so search for it), and he articulated the same thoughts I had each time I had to work with an ORM in a mainstream language.
There is an impedance mismatch of sorts between relational data and hierarchical data. In cases where there is no hierarchy, you can do away with the ORM completely and turn out much easier to read code, for example with `sqlc` and `Go`.[1]
The problem with the ORM is the programming language - one in which hierarchical objects are the idiomatic way to code (all of the mainstream OO languages) is always going to require some wrangling.
> still be using an enormous amount of automation to deal with the database drivers and moving data between your objects and rows.
Unless you're using `sqlc` and Go, where the automation is provided
[1] Now you may argue that sqlc is technically an ORM, but then where do we draw the line for calling something an ORM? Is it the case that any method for "generating boilerplate for mapping relational datasets to in-program structures" is going to be called an ORM? Because to my mind, the result is so different for in-program representation that they aren't the same thing.
I mean, that was really an important point in time when there really werent ORMs that were easy to work with, there was Hibernate in a very early stage and there were overly simplistic things for Perl, so I thought it was important that this "better way" I had in mind could be put out there, before the idea that "yeah let's all avoid ORMs unconditionally" could take hold.
Yeah if your app has one table and two queries, write the SQL and marshall the data yourself. I think everyone should do that approach anyway for awhile so they know what's actually going on. But if you are writing for 1200 tables, it's just not practical. if the Go thing is also automating that and generating SQL / data marshalling boilerplate code for 1200 classes, yes that's kind of ORMish. that's a totally valid way ORMs are written and if I had multiple lives I'd probably write a stored procedure ORM that does something like that too.
I swear the majority of these opinion posts against ORMs are from people who must have worked in a badly implemented project that left them with a bad experience and they blamed the pattern rather than the technology.
One of the best bits about an ORM is making it consistent for non-db users on the team to simply grok and work with, without creating monstrous and hard to debug joins everywhere. But when badly set up it can lead to a lot of debugging spaghetti. Which is the same as can happen with SQL but I suppose people think that at least there’s one less layer to debug while ignoring the problem is actually how they got where they are and not the technology
I switched a project from manually written sql to sql alchemy on a project that’s used by multiple Oscar nominated films daily for reviews. The SQL version was gross and impossible for the team to manage because it had bloated over the years, with no nice way to detangle the statement generations and joins. SQL Alchemy made it so any one of the technical directors on the team could step in and add new functionality, without serious performance footguns. Instead of me having to clean up bad sql every year (projects would fork per film and merge at the end) to keep performance up, I could trust the ORM to do that for me.
At its worst, it was way too easy for TDs to get the raw SQL to be tens of minutes per review session by structuring their logic incorrectly, but it was so difficult to see. Switching to an ORM meant I could get the performance down to seconds per session and they couldn’t destroy the performance in subtle ways.
Fair. But you then proceed to detail a personal experience on the other side of the spectrum: Badly written code, without an ORM, and how it was fixed by introducing an ORM.
I think we can all agree that you can write bad code, with or without an ORM :). Not that this is entirely relevant to my post, I am simply advocating for writing more SQL, not how to write a good object mapper. That's a different beast, and I purposely kept that simple just to illustrate that it is possible to get started without too much pain.
"microservices" - https://news.ycombinator.com/item?id=37125636
That cleared things up, at least for me. I totally agree that if your job is to essentially write a custom, RESTful, fault tolerant, 99.999% uptime accessor for a table or two, basically you're hired to be the ORM, so you don't want to just suggest people to use a ORM framework and effectively make yourself redundant.
I've found I would generally only need the handling of database drivers and query building since I'm already writing a validation layer and to add data marshalling to that is pretty trivial.
Likewise practicing YAGNI, what is the chances I need multiple database drivers for different databases, it's extremely unlikely that I'm going to be chopping and changing between different databases so I'm really only writing that code once.
I would argue to start with writing the basic SQL queries and adding an ORM later when you know you actually need it.
It's much easier to onboard someone into pure python code + a db driver vs having to onboard someone to SQLAlchemy, since it is quite a complex piece of software, necessarily complex for what it is trying to achieve but if you don't need it it's not a good fit.
I think you missed the point of the parent comment, which is that ORM's are not about writing SQL queries (although they do that). But ORMs are about moving data around, transforming it from rows and columns into meaningful objects in the project's language and data model.
As the parent comment suggested, if you are dying to write your own SQL (which does often happen as queries get more complex and don't fit into ORM language model) then you can write raw SQL but still let the ORM do the heavy lifting so you can take advantage of those features, which is the majority justification for the ORM in the first place.
You are basically suggesting what the original post author is suggesting. The comment above provided a rebuttal to their argument and you replied by suggesting the same thing they originally rebutted to. Hence a circular argument.
This is why I have come to hate YAGNI. Nowadays when it's said what I hear is "I don't understand why I need it (yet)".
With that out of the way, despite ORMs doing much more than "just writing SQL", it is exactly on that point that I flinch: Most devs should be exposed to SQL. And if your project allows you to build around simple enough abstractions so that you aren't reinventing the wheel, you should definitely be writing SQL. Especially if you don't know SQL yet - which is the growing case of new devs coming into the job market.
You can achieve a lot with SQlAlchemy Core, a tool that I absolutely recommend, but my post is just a simple alternative to get developers to think about their approach. If that results in some devs reconsidering using "full fat" SQLAlchemy and to try SQLAlchemy Core, that's a win for me!
Your gist tries to highlight the difficulty of doing certain things without an ORM. Migrations (as just 1 example) doesn't need to be hard, simple tools like flyway, or migrate (https://github.com/golang-migrate/migrate) achieve a similar result (while also keeping you on the path of writing SQL!). Deep and complex relationships between objects also don't need to be hard - typically people approach this subject with a requirement to be very flexible in the way they want to build queries and objects, but that to me in a sign that maybe they should reconsider their business logic AND reconsider that, just maybe, their project doesn't require all that flexibility, it is fairly straightforward to extend objects and introduce some more complex representations as and when it is needed - will all of this make me write code faster? Absolutely not. That is why you have spent so much time perfecting SQLAlchemy, but then again, I am not advocating for devs to go and replace their usage of ORMs, just presenting an alternative that may or may not fit their needs for a new project + give devs the chance to learn something that the ORM might have taken away.
Agreed, and this is the primary reason that ORMs are a necessary tool. And possibly that it's easier to train a junior developer to use one than it is to get them to a basic level of proficiency (and security awareness!) in SQL.
That said, I think this is one of those areas where It's Complicated, because inefficient database calls due to ORM usage is one of the primary ways I see applications completely break down. For most types of apps I've seen, which are very read-heavy and which aren't doing intensive writes, roughly all your writes should probably be using the ORM (including such niceties as validations, default scopes, all that nice stuff), and if it's simple enough, your "show" actions (fetch and display one entity) may be fine as well, but every "dashboard" and "index" action (show many entities, basically things with joins) likely need to be written in SQL.
In my experience (Rails), the object instantiation cost is insane, much greater than the actual time talking to the DB, so not only do you need to write SQL, but you need to handle the data that comes back without instantiating ActiveRecord models.
This is much harder work (more specifically, it needs much greater skill and experience, and is easier to make a mess of) versus using the ORM and models, but only on apps with tiny amounts of data per request, or very low request volume, can the ORM be a serious exclusive option for this task. Unless you want to end up like an app I once was asked to help fix, where they were on the $9,000 per month Heroku postgres instance, and since that's the biggest one, they could "scale" no bigger. (Okay, this wasn't their only problem, their main one was not understanding that you don't sprinkle analytics DB writes all over the place because now the simple high-volume "read" pageviews can't be generated using just a read replica).
That's easy. It's a database mapping library. Write the query, give it an object and it fills it in or reads from it. You can do this with annotations, or struct tags in the Go world. There's no need to introduce abstractions over SQL joins which I find very off-putting, because the abstractions are never perfect and suddenly you have to learn invented concepts just because you didn't want to write a JOIN in an SQL query but rather have some clever framework introspect some classes and automagically write the JOINs for you.
And the mismatch has two sides. If one's answer is "just use SQL!" then you're going to have new problems dealing with the mismatch coming from the SQL side of things.
The pre-ORM solution to this was not simply to shove a bunch of SQL in your application logic (though this was done), but to have a much, much more complex set of data layer logic. I'm guessing most people today writing about eschewing ORMs in favor of pure SQL have never used: stored procedures, triggers, cascades etc. I personally do miss some of the features from that era of software, but there's a lot more complexity to the "just write SQL" approach than most people realize.
0. https://en.wikipedia.org/wiki/Object%E2%80%93relational_impe...
Your example gist is essentially just "ORMs excel at this one thing, anything else is worthless" and also effectively hides code ("the benefit of the library!!!") to make it a very much disingenuous comparison.
> But why would anyone really want to, writing SQL for CRUD is really repetitive and tedious. Computer can do that for you.
Performance.
I wanted to love SQLAlchemy but even in some relatively simple things it generates really asinine queries that take insane amounts of time. It's just usually not noticeably insane amounts of time at first. So you query for some data and wind up with a query that returns in 20ms when there's 20 rows in the database but falls apart when there's 200,000.
It's also bad at projecting analytics type queries onto a transactionally-normalized database, but then most ORMs are not great at that.
I had both of these issues enough times that I instead just opted to start handcrafting queries in those cases and using a single "ResultSet" type class that projects as both a SimpleNamespace and a dict.
Also my 40% / 60% breakdown is based on the SQLAlchemy source code itself regarding what parts of the code deal with generating SQL and what parts deal with all the rest. 40% is likely a large overestimate.
I still fail to see how anybody who actually knows sql and works with “Big Kid” datasets would use an ORM.
That said, I fully agree that the ORM isn't necessarily the problem. I point the blame at over eager data modelling that ignores the OLTP/OLAP divide. Or that ignores that row creation has a different contract than row editing, such that using the same object for both leads to confusion with missing fields. Heck, even different edits have different contracts, such that mapping a single "object" to a database row is often a mistake.
If you think of relational DBs as just CRUD machines, an ORM makes total sense, but that's the original mistake.
I'm kind of on the side of avoiding ORM unless you have a clear need for it. But I've seen projects where they are very valuable. If you mainly want to register, update and delete a bunch of structured data, it's not a bad idea to put it in a dabatase. And if you do those operations a lot in some part of the application, it's not a bad idea to use an ORM there. For analysis and reports and such I would probably suggest just writing SQL directly though.
Some of us do very intense compute in very large datasets, and ORM are not capable in those tasks. At all.
That's actually the problem.
I'm sure you're aware that ORMs have this fundamental problem called the ORM impedance mismatch problem. AFAIK it remains unsolved to this day.
> You can use SQLAlchemy (which I wrote) and write all the SQL 100% yourself as literal strings
This is true, but it's like saying you can disassemble a Boeing 737 to build chopsticks. It's true but completely unnecessary.
You don't actually need an ORM for that. And some of us (I know I'm a minority) actually want to have pure data objects. Some of us (talking about myself again) actually design schema-first - as in, you use DDL and your code just consumes the definition. Some of us don't care about code knowing about relationships which are an intrinsic part of the data model.
> You can use SQLAlchemy (which I wrote) and write all the SQL 100% yourself as literal strings, and still use the ORM
SqlAlchemy is an amazing piece of software, and it is relevant becaise it is actually part of the problem. You don't really have an ecosystem of database abstraction libraries in python, as you have in other languages. The options are SqlAlchemy or Django ORM. Golang? You have 50 different libraries that implement 100 ways of doing the same stuff. C#? You have probably a dozen libraries besides Entity framework that provide different level of abstractions? Java? You got low level stuff up to high level ORMS. PHP? The same.
Python? You have SqlAlchemy, Django or roll your own. Both SqlAlchemy and Django can do raw queries with little effort - but if you're doing this, arent't you discarding most of the functionality that comes with it - namely, code maintenance? If you have a big project with a bunch handwritten queries (because you can do it, and quite easily, I may add), aren't you just ignoring any of the advantages of the orm? If, when you edit a model, you need to also check handwritten sql queries, you shouldn't be using an orm in the first place.
And with SqlAlchemy and Django, there is no middle ground. You either are all in, code-first definitions, or you are in a world of pain.
That is why specialized applications tend to be rewritten in other languages. Because optimizing the data layer alone often boasts 10-100x performance increase, when compared to this kind of abstraction.
> writing SQL for CRUD is really repetitive and tedious
It is. But between that and SqlAlchemy - in most languages, there are options. Not in Python. Between orm and no orm, I'll take no orm any day of the week - at least I can understand the queries.
In other word I think there are some tooling left on the table that can assist in increasing SQL literacy and comprehension.
If anyone is looking for this "pure" ORM (no query builder API) in Node there is https://github.com/craigmichaelmartin/pure-orm
The number of people who know SQL well and still choose an ORM seems to be very, very low in my experience.
I totally understand people having issues with Django's ORM due to the query laziness making performance tricky to reason about (since an expression might or might not trigger a query depending on the context). In some specialized cases there are some real performance hits from the model creation. But Django is very good at avoiding weird SQL issues, does a lot of things correctly the first time around, and also includes wonderful things like a migration layer.
You might have a database that is _really_ unamenable to something like an ORM (like if most of your tables don't have ID rows), but I wonder how much of the wisdom around ORMs is due to people being burned by half-baked ORMs over the years.
I am curious as to what a larger codebase with "just SQL queries all over" ends up looking like. I have to imagine they all end up with some (granted, specialized) query builder pattern. But I think my bias is influenced by always working on software where there are just so many columns per table that it would be way too much busywork to not use something.
ORM queries compose. That's why [Python] programmers prefers them. You can create a QuerySet in Django, and then later add a filter, and then later another filter, and then later take a slice (pagination). This is hugely important for maintainable and composable code.
Another thing that's great about Djangos ORM is that it's THIN. Very thin in fact. The entire implementation is pretty tiny and super easy to read. You can just fork it by copying the entire thing into your DB if you want.
Firstly (1); "I want to use the ORM for everything (table definitions, indexes, and queries)"
Then second (2), on the other extreme: "I don't want an ORM, I want to do everything myself, all the SQL and reading the data into objects".
Then thirdly (3) the middle ground: "I want the ORM to do the boring reading/writing data between the database and the code's objects".
The problem with ORMs is that they are often designed to do number 1, and are used to do number 3. This means there's often 'magic' in the ORM, when really all someone wanted to do was generate the code to read/write data from the database. In my experience this pushes engineers to adopt number 2.
I'm a big fan of projects like sqlc[1] which will take SQL that you write, and generate the code for reading/writing that data/objects into and out of the database. It gives you number 3 without any of the magic from number 1.
Think a simple UI switch to sort some result either ascending or descending, which will require you format either an `ASC` or a `DESC` in your SQL string.
The moment you build SQL with string formatting is the moment you're rewriting the SQL formatter from an ORM, meeting plenty of opportunities to shoot yourself in the foot.
If you want to go full-blown SQL you can use something like PostGraphile, which allows you to create all of your business entities as tables or views and then write all your business logic as SQL stored procedures, which get automatically translated into a GraphQL API for your clients to consume, but once you move beyond basic CRUD operations and reporting it becomes incredibly difficult to work with since there aren't really any good IDEs that help you manage and navigate huge pure-SQL code bases.
If you're really dead set against using a powerful ORM, it's probably still a good idea to find and use a lightweight one--something that handles the tedious CRUD operations between your tables and objects, but lets you break out and write your own raw queries when you need to do something more complex. I think there's a sweet spot between writing every line of SQL your application executes and having an ORM take care of boilerplate for you that will probably be different in every case but will never be 100% at one end or the other.
I mean you can just write SQL instead of using the ORM if your project consists of a single table with no indexes that will never change, sure.
There are a multitude of extra things to consider, but none of those things are, in my opinion, imperative to having success with SQL in Python. Will it be hard to achieve the same level of convenience that modern ORMs provide? Absolutely. But there is always a cost.
I firmly believe that for most projects (especially in the age of "services"), an approach like this is very much good enough. Also, a great way to onboard new developers and present both SQL and simple abstractions that can be applied to many other areas of building software.
Python has built-in support for SQLite in the standard library: https://docs.python.org/3/library/sqlite3.html
If you don't know what you're doing on the DB-app interface, you're still better off with an ORM most of the time. If you don't know if you know, you don't know (especially if you think you know but details are fuzzy); please go read sqlalchemy docs, no, skimming doesn't count.
If you know what you're doing but are new to Python, use sqlalchemy.core.
PS. zzzeek is a low-key god-tier hacker.
I feel like Django ORM and SQLAlchemy are the de-facto ORMs for Python and have been around for over a decade. If anything I'd recommend juniors to pick one of these over hand rolling their own solution because it's so ubiquitous in the ecosystem.
I've been burned countless times by Hibernate (and consorts) and now I argue in favour of plain SQL wherever I can. I do not imply that Hibernate is in itself bad, I just have collected many years of observations about projects built upon it, and they all had similar problems regarding tech debt and difficult maintenance, and most of them sooner or later ran into situations where Hibernate had to be worked around in very ugly ways.
Yes, I can understand some of the arguments for ORMs, especially when you get a lot of functionality automagically à la Spring Boot repositories.
And since nowadays I have more influence, I do advocate for plain SQL or - the middle ground - projects like jOOQ, but without code generation, without magic, just for type safety. We've been quite happy with this approach for a very large rewrite that is now being used productively with success.
My vision of the "Just Write SQL" paradigm would be a "class" or equivalent that would take a SQL command and return the response from the server. Obviously the response has a few different forms but if you're "just writing SQL" then those responses are database responses and not models or collections of models.
(For the record I think simple ORM type functionality is actually quite useful as your use case moves past the scale of small utility scripts.
One major benefit of stored procedures, in addition to separation of concerns, is that you can declare them SECURITY DEFINER and give them access to tables the Python process doesn't (in a way reminiscent of setuid), thus improving the security posture dramatically.
One example: you could have a procedure authenticate(login, password) that has access to the table of users and (hashed) passwords, but if the Python app server is compromised it doesn't have access to even the hashed passwords or even the ability to enumerate users of the system.
If you're going to model your data this way... you might as well use an ORM.
A better way is to just just write SQL (or datalog) and model our data, from DB all the way up to the application, as relations rather than objects.
Rather than re-hash, this idea has previously been discussed on HN here: https://news.ycombinator.com/item?id=34948816
Coleifer does not get enough credit IMHO:
Peewee has been solid since I began using it a decade ago. Coleifer's stewardship is hard to see at once, but I've interacted with him numerous times back then and the software reflects the mindset of its creator.
Sometimes I find it impossible to believe that @coleifer is just one person. Peewee has 2300 issues and 500 PR's none of which are open and outstanding, and almost all of which he has personally responded too in a genuinely helpful way. He pipes up on Stackoverflow for peewee questions too.
Also, I'm not one of those people who dislike easy things (and will often whine about JS or Python existing). I'm all for ease and focusing on the business goals. It's just that ORMs and bad schema design will make things harder.
But projects (almost always) grow, and once you need to start conditionally adding filter clauses, conditionally adding joins, things start to get very weird very fast. And no, letting the database connector library do the quoting for you won't save you from SQL injection attacks unless you're just using it to substitute primitive values.
And once all your logic is having to spend more space dealing with conditional string formatting, the clarity of what the query is actually trying to do is long gone.
I'll refrain from digging up the piece of code where I was having to get the escaping correct for a field that was embedded SQL-in-SQL-in-SQL-in-go. And I could hear the echos of the original author's "YAGNI"s haunting me.
Currently 50% of my anxiety when it comes to my bot is related to database matters. There's no type checking so I gotta be careful when writing them, and stuff might blow up weirdly at runtime. Refactoring tables is also a major pain, or at least was until I (sort of) figured out a 'routine' of how to do it.
It's doable, and I assume that teaching myself some basic SQL and using it in production was a great learning experience, but once I'd reached the point where I was inventing database migration tooling from first principles and considering how to implement that, I realized that I probably just want to look at SQLAlchemy again.
Putting it all together is another blog post. And if you have colleagues: probably needs documentation. Which you also have to maintain yourself.
Maybe there's some professional trauma at work, as many of us have been traumatized by shitty databases and shitty code working with them alike, ORM or not. But ORMs do come and go, promise bliss, deliver diddly, and I'm reading the same stuff I've been reading in 2008, as if nothing ever changed since.
Django ORM is amazing for Schema management and migrations, but I dislike their query interfaces (using "filter" instead of "where"). I really like Rails AR way of lightly wrapping SQL, with a almost 1-1, and similar names, but does not have a migration manager - and there is always the chance that your schema and your code will diverge.
If I would get a Schema / migration manager, that would allow to do type checks and that would work well with a language server for autocompletes, but use SQL or a very very thin wrapper around SQL, that would be my Goldilock solution.
By far the best database toolkit (ORM, query builder, migration engine) I have seen for any programming language.
With this package, you write the SQL inside SQL files so you can benefit from syntax highlighting, auto formatting, static analysis, etc. At the difference of writing strings of SQL inside Python files. I'm surprised this is not more popular.
Unfortunately it depends on a specific version of SQLAlchemy and won't run with the latest version.
SELECT *
FROM shoes
WHERE (CASE WHEN :brand_id IS NOT NULL THEN brand_id = :brand_id ELSE TRUE END)
AND (CASE WHEN :size IS NOT NULL THEN size = :size ELSE TRUE END)
AND (CASE WHEN :style IS NOT NULL THEN style = :style ELSE TRUE END)I have seen people write their own custom crazy version of GraphQL ("I've created a JSONified way of fetching only some fields from an API call) over ego or just ignorance. It's never a good path.
Why bother moving away from SQLAlchemy, which will do all of that for you in a simplified, type-checked and portable way? SQLAlchemy is literally one of the best ORMs out there, it's ease of use and maintainability is insane.
People that complain about ORMs might have never really used SQLAlchemy. It is that good. I'm a fan and zzzeek is huge force behind why it is so good.
And as always, if you need an escape hatch, you can use raw sql for that ONE sql statement that the ORM is giving you grief for.
ORMs are, for the most part, good at the CRUD operations - that is to say, they easily translate SELECT, UPDATE, INSERT and DELETE operations between conventional class objects and database rows.
Things they usually aren't very good at are when you start trying to do things that require a lot of optimization - it's very easy to have an ORM accidentally retrieve way more data than you need or to have it access a bunch of foreignkey data too many times (in Django you can thankfully preload the latter by specifying it in a queryset). That's less an issue for basic CRUD, but is an issue if you're doing say, mass calculation and only need one column and none of the foreignkey data for speed reasons.
Basically - an ORM is good but don't let yourself feel suffocated by it. If it's not a good fit for an ORM, then don't do it in the ORM, either use SQL code to do it in the DB server or do a simpler SELECT (in the ORM) and do the complex operation in your regular application before INSERTing it back in the db (if that's a goal for the operation anyway). If it's outside of the CRUD types of DB access already, the extra maintenance overhead you get from having non-ORM database code (if you're doing the SQL approach) in the application would be there anyway, you'd just get a very slow application instead of a hard error, and the latter is easier to troubleshoot (and often fix), while with the former you need to start pulling up profiling tools.
I find this ends up being, like, 1 or 2% of queries. It's also very hard if not impossible to guess which queries will end up in that group.
You're better off building it with the ORM first and breaking out SQL later when you are trying to performance optimize.
There is also a small % of queries which use some feature of your database engine which the ORM won't support.
You end up writing a query per usecase, rather than writing generic queries that can be stitched together however in the business logic
It's infuriating that this is still not a thing you can disable (https://code.djangoproject.com/ticket/30874). Pretty much my only gripe with the Django ORM which I'm a huge fan of (and I also write lots of SQL).
Having written the sort of SQL-inline code the author talks about, then refactored the whole thing to use Django: Django's ORM solves waaaay more problems than it creates in this regard.
While this separation isn't common in the Django ecosystem, it is very common in enterprise application design (regardless of usage of an ORM). On complex applications, Django models are often a leaky abstraction (not only because the mentioned resource connection problem, but also issues like for relations they require the inclusion of the target model, it cannot be lazy-loaded; a good example is a nullable foreign key to an optional module that may or may not be loaded), and they actually behave like a variation of the ActiveRecord pattern, that mixes two different scopes - data and operation on data. In many cases this is ok, but in many others this is a problem.
I personally use a repository pattern, coupled with a query builder and something vaguely similar to dataclasses (its a bit more complex in the sense that the data object attribute name can be different from the database field name). It is basically an object mapper with a non-related repository class.
I somewhat disagree about your point on caching. If you're working with models (that, namely, are 1:1 with DB rows) stale object problems are a reality no matter what, and having the ID be put into a pure data object generates the same issues. But these are things that are not very interesting to discuss outside of specific contexts.
I am a bit of a functional programming nerd, but I've just found that for Python stuff in particular, swimming upstream is its own bug generator relative to writing concise stuff in a very imperative fashion. Using the fat models directly is a part of that calculus for me, but YMMV and every team has different strengths.
One successful pattern I’ve seen treats the database as its own service layer. Service code does not send arbitrary SQL to the database—instead, all of the SQL queries are set as stored procedures in the DB. People sometimes freak out when I say this, so I should clarify that the stored procedures, table schemas and other things of that nature were version controlled and deployed with some minimal build tooling we’d developed in house.
I really liked this pattern. I think anything that you need to talk to across a network should be treated as a service in itself, with a well defined interface. This simplifies testing and monitoring as well. The big risk with an ORM, architecturally, is that you end up treating your database as a sidekick to your service code, or even a dumb data store—some shameful implementation detail your service keeps locked in the basement—when they’re capable of much more than that.
Granted, this problem goes away if you are building a web app or a REST API, but if I just want an ORM for a command line application, I am using django's management command functionality which is OK, but it doesn't really scale easily.
For example, I'm working on an project now that long ago added a "sellable things" store that used plain sql. There are many, many stores like this one, but it did some logic to figure out what items are sellable and return the set. Easy, developer happy, ticket closed.
Some time later, it was needed to have "sellable items of a specific type." Well the "sellable things" store was too much to clone so the developer simply pulled all the sellables and filtered in memory. Hey it's Go so it's fast right?
This continued for a couple years and now I'm joining a project with a p99 of >15s. It would have been a natural fit to return a query set of sellable things and the other callers could further refine it however they wanted. Now I'm looking at a ball of logic that should have been in the database and it's beginning to break down at scale.
This article is just that pattern with syntax sugar. It will lead to sadness.
I guess this is a good thing, as "reimplementing" Django's ORM is the opposite of what I wanted to do here :)
> ORM queries compose. That's why [Python] programmers prefers them. You can create a QuerySet in Django, and then later add a filter, and then later another filter, and then later take a slice (pagination). This is hugely important for maintainable and composable code.
I don't really disagree, but there are many ways to skin a cat. You can absolutely write maintainable code taking this approach. In fact, I can build highly testable, unit, functional, code following a abstraction very similar to this. The idea that "maintainable and composable code" can only be achieved by having a very opinionated approach to interacting with a database, is flimsy. I offer a contrary point of view: With the Django ORM, you are completely locked in to Django. You build around the framework, the framework never bends to your will. My approach is flexible enough to be used in a Django project, a flask project, a non web dev project, any scenario really. I want complete isolation in my business logic, which is what I try to convey just before my conclusion.
> With the Django ORM, you are completely locked in to Django
Another bit of nonsense again. You have a dependency. Sure. Just like you have a dependency on Python. But it's an open source dependency, and the ORM part is a tiny part that you can just copy paste into your own code base if you want.
Also, worrying about being "locked into" something that you depend on is madness. Where does it end? Do you worry about being "locked into" Python? Of course not.
> You build around the framework, the framework never bends to your will.
You don't actually seem to understand Django at all. It's just a few tiny Python libraries grouped together: an ORM, request/response stuff, routing, templates, forms. That's it. You do NOT need to follow the conventions. You can put all your views in urls.py. You can not use urls.py at all.
You do NOT bend to the frameworks will. That's just false. You bend to it by your own accord, don't blame anyone else on your choice.
The "ORM" part of Django seems to be everything in `django.db.models.Model`, which seems to require you to declare your Models as subclasses of the aforementioned class. Looking into that code though, it seems like the implementation supporting all this is around ~20,000 lines of Python: https://github.com/django/django/tree/main/django/db/models
That doesn't strike me as a super lightweight. For comparison, all of Flask (a Python WSGI web app framework, but mostly a 10+ year old project to compare to, and excluding tests) is ~4,000 lines of Python.
Is there a small subsection of the code in `django/db/models/` that is all that's necessary to use the ORM part? Or maybe I'm missing something about the "core" of the ORM?
> For comparison, all of Flask
That's... not a reasonable comparison. Flask does basically nothing. Of course it's small. And it does something totally different so why compare?
I used to be pretty anti-ORM myself because I loathed the complexity of ActiveRecord (in the Rails world), but then I discovered arel, the nice composable relational query builder underneath, and saw the light. A composable layer of abstraction over SQL is critical in an application. (I still prefer raw SQL for analytical queries.)
Elaborations on this approach:
- https://news.ycombinator.com/item?id=34948816
- https://github.com/papers-we-love/papers-we-love/blob/main/d...
I was never more productive than when using Access (and dBase II before that). Why can't we have that?
My theory: Something was lost in the jump from workgroup computing to client/server.
Imagine if Access was rebuilt on top of a client/server stack. That's kind of what Riffle is trying to do.
I've been (slowly) working on the persistence stuff. It (mostly) moots the SQL vs ORM vs query builder slap-fight.
I've got some notions (and POCs) about UI, mediated via HTTP & HTML.
I'd love to have some CRDT-like smarts; learning more is on my TODO list.
I'm still thinking about the "reactive" part. I haven't imagined anything past Access VBA style programming. I'm struggling to envision a FRP-meets-CRUD future perfect world.
Using stored procedures and triggers as much as possible.
Yup. Use of HQL/JPQL is proof that you shouldn't be using an ORM. Just use SQL instead. And once you have some SQL, it's just easier (overall) to do all SQL.
I used to think this, but at my last company we ended up rewriting all these queries to use conditional string formatting as we found it much more readable. The key was having named parameter binding for that string, so you didn't have to worry about matching up position arguments. That along with JavaScripts template string interpolation actually made the string-formatted version pretty nice to work with.
For beginners, the Elixir language constructs can be a little clunky, but once you get it, it's so productive and I miss that productivity when doing more advanced queries in other languages.
[0]: https://hexdocs.pm/ecto/Ecto.Query.html [1]: https://hexdocs.pm/ecto/Ecto.Query.html#module-fragments [2]: https://hexdocs.pm/ecto_sql/Ecto.Adapters.SQL.html#query/4
Sadly, I'm not aware of any good solutions to this. SQLAlchemy Core can build an operates on an AST, but it doesn't parse raw SQL into a query (so one has to write their queries in Python, not SQL). Some parser libraries I've seen were able to parse the query but didn't have much in terms of manipulations and compiling it back.
This means that you will need extra tooling, and if you're DIYing you will need to write it yourself.
What I've experienced (unfortunately) across multiple projects is that people who understand databases will write SQL with a nice collection of helper and wrapper functions as needed, and the people that think that databases are mysterious black boxes will reach for ORM. I've seen the ORM-happy teams getting scared at the idea of a million (1,000,000!) rows in a table, and they always neglect to set up even basic indexes or to think through what their JOINs are really doing.
YMMV but that's the pattern I see again and again.
My preferred ORM is Doctrine and it provides all of these features. It has its own variant of SQL called DQL that lets you effectively write complex select queries as raw SQL with a bunch of object-specific conveniences built in, and get back an array of objects.
You're conflating ORM with people who know nothing about databases. Why?
my biggest gripe with sqlalchemy is that sometimes I know what I need to write in SQL (have a working prototype usually) and have trouble mapping the concept to sqlalchemy.core constructs, but that's mostly inexperience.
https://en.wikipedia.org/wiki/Active_record_pattern
It was fashionable for a while to say it was an anti-pattern because that was a contrary view and ActiveRecord is very tied into building Rails applications.
For that project I was able to put SQL DDL+DML+stored procedures in version control, create/run stored procedure (TDD even) unit/integration tests on mock data against other stored procedures, had pass-fail testing/deployment in my CICD tool right alongside native app code, and did some rollback support (although that was more trouble than I think it was worth), all using Liquibase change sets (+git+Jenkins). Flyway could also have worked but we were using Liquibase.
It did require some creative thinking to apply changesets and preconditions and post conditions and ability to have stored procedures execute and read results of other stored procedures.
Last I checked, the system had been used over many years to process/evaluate $50 billion in order transactions.
An app with 30 endpoints that makes 100s of queries might need it.
In this case, it's a matter of not knowing whether you need it or even what iteration of it you need. Different ORMs have different tradeoffs, you don't even know which tradeoffs you want to make.
It's hard to be sure that CRUD is all you need. Maybe you'll be fine, maybe you'll get stuck between keeping a slow DB or rewriting everything (I've seen this movie several times). It doesn't cost much to go proper relational from the start, especially cause that involves the least tooling/boilerplate. And if you're really sure you only want objects, NoSQL DBMSes are actually designed for that.
I think I’m also just cranky over the general trend of junior devs who know a “framework” but none of the underlying concepts or code and can’t cope when their “framework” doesn’t…work.
True, that's why I built mine as a library I can reuse in my projects :) I'm still eating my own dogfood, but doing it with a framework approach.
> If you're working with models (that, namely, are 1:1 with DB rows) stale object problems
One of my common patterns is to implement cache at the service layer, not the data layer - and all data operations are performed via services. This allows caching of actual business-domain computed values, not necessarily just db rows (in fact, more often than not, caching just db rows is just a waste of memory with little to no advantage). As a quick example, imagine a purchase order with a header, a list of products and a state associated with each line - it is trivial to cache the whole purchase order info, including runtime calculations such as lead time per product, and invalidate the cache at each update operation on the different tables that may represent this purchase order. Services would have methods manipulating "purchase order" (and keeping cache state) and not ad-hoc code messing with OrderHeaderModel, OrderDetailModel, OrderProductStatusModel, etc.
Nobody is saying only use an ORM, but the anti-ORM crowd seem to think that there’s only one true way to do things.
I don’t know why so many programmers think purely in binary “good or bad”. Different projects may have different requirements.
You find ORMs useful, I do not, they aren't mutually exclusive.
Also, An ORM is inappropriate for certain forms of inter-team collaboration. Data scientists often prepare SQL that we need to integrate into our services. Asking all at a company to learn a language specific ORM isn't practical. SQL is, for better and worse, an important common tool.
My most recent experience of needing to hand-craft queries because SQLAlchemy was doing something truly hilarious was on 1.4.40-something from 2022 (I think possibly .41 or .42?).
I don't have the specifics in front of me as it was for a prior job but rough outline of what I observed:
A simple select including several related tables with filters on each table resulted in an individual subquery for each *filter* being applied. The resulting SQL was incredibly large, selected data in a poor way, and when replaced with a simplified single query that let the DB do its thing sped up about 100x.
Those days may be gone now with 2.0 but I don't consider barely a year ago "long gone".
I think there's a great place for Create/Update/Delete, and generally when doing trivial Reads. But the lesson I've learned with basically every ORM is that once a third table becomes involved you probably want to write your own SQL because they'll break in some weird way.
This is essentially how I use SQLAlchemy in current projects and it's been great in that regard.
stmt = select(...)
stmt = stmt.where(stmt.c.foo == 'bar')
stmt = stmt.where(stmt.c.bar == 'xyz')
That above pattern is one I've seen people do even recently, using the "select().c" attribute which from very early versions of SQLAlchemy is defined as "the columns from a subquery of the SELECT" ; this usage began raising deprecation warnings in 1.4 and should be fully removed by 2.1 as it was a remnant of a much earlier version of SQLAlchemy. it will do exactly as you say, "make a subquery for each filter condition" and will produce disastrous queries. people have started tripping over it recently as a result of 1.4 / 2.0's move away from the ORM `Query` object back towards `select()`.the moment you see SQLAlchemy doing something you see that seems "asinine", send an example to https://github.com/sqlalchemy/sqlalchemy/discussions and I will clarify what's going on, correct the usage so that the query you have is what you expect, and quite often we will add new warnings or documentation when we see people doing things we didn't anticipate (such as in this case, I thought we had removed .c by 2.0 but apparently it's still just deprecated. it will be gone in 2.1).
I made my point about this but will repeat, I am generally writing validation for this, adding the code that converts this into known types isn't significantly more work, if I was never writing the validation this point would make sense. Also the ORM the parent comment is an author of doesn't do this, it generates a class with a ton of ancillary ORM specific functions that I don't care for and many ORMs do that.
I don't even agree with what the article states, the article is effectively creating a custom ORM, I'm saying write a function that converts the data from your database driver to the type of thing you want, write a function that validates that data, those two can be 1 function for simple data but likely will be 1 function which is a composition of other functions.
When you get to the point where that is tedious you now likely have a good idea of what you want in an ORM and are significantly better equipped to make an informed decision for your specific project, you might find you end up only wanting a query builder.
Regarding abstraction over database drivers, well how often do you need an abstraction over database drivers, are you really using two or more different databases in a single project?
My ideal workflow is: 1. Query exactly what I want 2. Make it valid data in the form I want
That valid data shouldn't have a ton of extra stuff attached to it, I want plain old data that is validated and the correct type.
Why would I add a dependency that is complex before I can make an informed decision that I would even need it?
Extending on this, you actually shouldn't even use different databases. There are bugs and quirks between different databases on things like string matching :) Having a multiple database project is often a nightmare, as you need to track all of these small differences and nuances that may affect the way your system works.
It's a fully featured ORM... Including migrations, query API (which is HIGHLY opinionated, it looks nothing like SQL), supports async (via asgiref!), custom model definition... It's almost the definition of opinionated. Not that you can build a fully featured ORM without being opinionated. That's not a dig at Django btw.
> Also, worrying about being "locked into" something that you depend on is madness. Where does it end? Do you worry about being "locked into" Python? Of course not.
Ermm, my premise is that you DON'T have to depend on it. It's not that crazy to not want lock in when it comes to the software that handles my database. Other programming language communities seem to handle that just fine.
The rest is a bit too ad hominem for my liking, so I'll pass.
In ORMS like Django, models are defined as code-first, not schema-first. Yeah, you can use inspectdb, but in any sufficiently complex application, odds are you need to add to the generated models any custom behaviors you already implemented, and verify all the names and whatnot, because data definition and operations on data are actually mixed in the same class. More often than not, if the change is profound (eg. imagine switching from reading a User model from the database to fetch it from an external service), you may have to refactor a large portion of your code due to the way it interacts with the model - eg. search operations won't be proxied via orm, but by using external service endpoints, etc. There is no free lunch. And don't even get me started on field names that differ on the database.
I've never used access and have 0 familiarity with it. Are there any examples I could look at?
It scales badly with table size, I think by design. That's why SQLAlchemy's and Hibernate's Data Mapper pattern is slightly more cumbersome to write, but works out much better.
I just can’t wrap my head around why you even *need* a database in those cases.
In my recent projects that's essentially what I do - I use SQLAlchemy for very basic CRUD and then anything analytical gets handcrafted.
My point is different than your takeaway. In my point I’m not blaming the technology, I’m blaming the people involved with using it in production.
I specifically point out that a well versed engineer can write a SQL based system well. An ORM just means I can diffuse that responsibility over multiple people more reliably.
I find it interesting that you say that; my takes is that it's the other way around!
SQL join statements look the same no matter what programming language the reader is used to, but each ORM differs in the way the join looks to the reader. The EF method of filtering your results set in C# looks very different indeed to how the ORM for Python would do it.
Every ORM looks different, which results in a very inconsistent experience for people, especially when you bring in a DB expert to figure out why there's a slowdown somewhere, and he cannot just visually inspect the EF/SQLAlchemy code and say "Well, here's what's wrong".
DB experts can usually very easily do that just by looking at the SQL, no matter what programming language was used.
Often those may be very competing goals , where nativeness to the primary language used by the team might be more important than the nativeness of the expert.
Funny how those anecdotal bits never get published, eh? Its always "I shaved my db requirements in half by actually understanding what was going on because my orm is magic". Odd, innit?
Have you ever managed 1200 "live" (in-development) tables on a SqlAlchemy app? That is impressive. Not only because of the amount of work that entices, but also because you can just read the schema and provide a dynamic object representation of it (think dataclasses, but based on a table schema). Also, if you're querying 1200 tables, python might not be such a good idea. Impedance mismatch at a language level.
> boilerplate code for 1200 classes, yes that's kind of ORMish
Yes, please do explain how a Python ORM can save you from having 1200 classes you need to maintain, plus the need for specialized queries.
I totally remember this time. It was around 2002~2005.
I started off with writing a discussion forum and a calendar booking webapp for my high school in PHP + MySQL. After working on it for a year or so, I realized I was basically re-typing a lot of SQL and boilerplate code, which repeated no matter whether I was working on a Post object, a User object, or a Event object. I almost wrote a half-assed ORM in PHP.
I then got an internship job, and they had a simple Java webapp project. Hibernate. In retrospect it was a huge pain, and we could probably have completed the project in 30% of the time if we used something more effective (but then, web frameworks weren't a thing in 2004), but still Hibernate was probably better than writing raw SQL for our cases.
Then later, at the same company, they had another webapp project in Perl. I don't remember the name of the ORM but it was kind of "magic" - as you said, overly simplistic, everything was implicit but seems to worked.
Shortly after I encountered Django, which seems to have solved all my problems with ORMs and I been happy with the way it works for the almost 20 years of using it.
To me the utility of ORMs is just obvious once you've been through writing CRUD without ORMs on a deadline, and tried anything that's not called Hibernate.
I'm not sure why there are so many people who apparently develops or maintains a codebase that deals with only a table or two ranting about ORMs. To me this isn't an issue with ORMs at all, but rather the curious phenomenon in the industry there is extreme specialization to the point that businesses can afford to put a person (or even a team), full time, working on a small part of the business only involving one table or two. (Can't say I'm not in this situation in my professional work today either!)
To those who say don't use ORMs -- try implementing HN in a week, without an ORM. It can probably be done either way, but with an ORM you end up with 50% less code and probably less sore fingers.
Not all of us work with prototypes someone will rewrite later on. Design a high traffic, low budget site with Django, and you will quickly understand the trade offs. It's not all about less coding effort; in most dev positions that matter, your work is cheap compared to the runtime cost - eg. optimizing something to take 10% less has the potential to shave multiple times your salary off the global cost pool.
I tend to use the ORM function CUD operations, and just write raw SQL for SELECTs unless they're super-simple.
> It's also very hard if not impossible to guess which queries will end up in that group. > You're better off building it with the ORM first and breaking out SQL later when you are trying to performance optimize.
I disagree on this. It's almost no extra work to just write these optimised in the first place (if you're not trying to squash everything into an ORM workflow). So it makes sense just to write them all optimised. Ditto for doing batch inserts and updates rather than looped inserts/updates (although you can usually use the ORM for this).
Not everyone is capable of quickly optimizing SQL, and I don't think it's an absolutely necessary skill to build a decent application. Junior devs can pick up on this skill over time and as long as they can manage to avoid any obvious footguns, using the ORM is fine most of the time.
Writing queries that are optimized in the first place just means now you have to maintain a bunch of SQL and you have to rely on anyone modifying that SQL later understanding those optimizations. Sometimes it's necessary, but if it's not, I think it's much nicer to stay in ORM-land even if the query might not be optimal.
I mean that's true, but equally not everyone is capable of using an ORM. I don't think SQL is inherently any harder to learn.
At my last job, I had juniors who had never used SQL at all productive in SQL within a couple of weeks, and using "complex" SQL like JSON aggregation and windows function with a few months. They were a little intimidated by it when they started, but didn't find it too hard to learn in the end.
Using a case-sensitive filter (default for Django) in a DB with case-sensitive collation (default in Postgres)? Django will helpfully cast the tuple and your query to UPPER to match it for you, and the former wrecks indexing.
Checking if a string ends with something else? Goodbye, index.
I _think_ the latter can be worked around in PG with a GIN index, but I’m not positive (I work with MySQL much more). And in any case, you’d have to know to create that, and I imagine most devs won’t.
Fixing seemingly tiny things like that have a massive impact on large table query speed.
What is it that you do here that can't be handled by, say, django's workhorses - filter and select_related?
If it's impossible to write 90% of your queries in an ORM my suspicion would be that you're either not using the ORM correctly or you're using a crappy ORM.
- Complex joins
- Complex WHERE clauses with mixes of AND and OR (with parentheses)
- JSON aggregation
- Window functions
tend to require quite heavyweight syntax in ORMs (e.g. nested lambda functions). Whereas the corresponding SQL tends to introduce much less noise.
It's basically just another case of a dedicated language being nicer to use than a DSL embedded into a general purpose language. Normally it's not worth creating a whole language just for nicer syntax, but in the case of SQL the language already exists! So why not use it.
That adds up, with almost no downside most of the time.
This approach is more bottom up. You end up with uni directional data flow, better separation of concerns, data coupling instead of object dependencies and better performance right out of the bat.
The cost? In my experience just some basic familiarity with SQL.
sortable_fields = ["name", "age", "gpa"]
selected_filter = sortable_fields[form.filterIndex]
if form.sortBy == "asc":
query += "ORDER BY {} ASC"
elif form.sortBy == "desc":
query += "ORDER BY {} DESC"
Doesn't have any opportunity for SQL injection unless you have rogue programmers able to change code running in prod.Nope, I'm generally interpolating an inline expression consisting entirely of string literals.
The added complexity has always been a non-issue for me. Even a simple `sql.py` next to `models.py` works really well, for a pretty straightforward solution.
I wasn't saying that you couldn't see the raw SQL, I'm saying that the ORM syntax does not result in a consistent experience for everyone.
If you’re a db expert that’s doing drive-by optimisation of queries in sqla then you can print them to see what’s going on - is that not broadly what you’re concerned with in your original comment?
ORMs by their nature tend to be built around a 1:1 mapping between fields on some object type and columns in some table. Bulk queries get you multiple objects corresponding to multiple rows. Relationships get you multiple objects with some of the fields being references to the other objects. Obviously I'm simplifying here and there have also been some attempts to do things in other ways but this is basically how most of the popular ORMs work today.
However in reality a lot of useful queries return a list of flat data structures or even just a single flat data structure with some subset of the columns of all of the relevant tables and maybe a few extra columns that are calculated on demand and not stored directly in any database table. If that's the data I've read then what I really want is something like a properly-typed dataclass with exactly those fields/columns and nothing else that might add confusion or ambiguity.
Unfortunately that doesn't really fit the classic ORM and OO model. Instead we often have to work with multiple objects with some form of nesting to follow the relationships, ambiguity about which fields have actually been read from the database and can safely be accessed, possibly some inaccuracy with the types such as nullable fields that have just been read from not null columns in the database, and a lottery to see what happens if we try to access fields on those objects that might not have been read by any previous database query anywhere in the system at any point since that particular ORM-backed object was created.
I find it's one of those things where the popular approach - using an ORM in this case - works for relatively simple needs and in practice a lot of work does only have relatively simple needs so that's OK. But when I start doing more complicated things it can become a pain to work with because the whole model fundamentally doesn't fit what I'm actually doing.
Don’t get me wrong, 1C products are regular enterprise crap on top of shitty language that stuck in the last century. The latest attempt to refit it as-is to www was a paradigmal disaster. But the platform (the runtime) itself may teach Django a volume or two about query integration. They do it since the '90s and 1C developers who traditionally weren’t even considered developers had no issues with programming these systems without any deep stack knowledge. There’s no stack basically, it’s all homogeneous once you learn the fundamentals.
Edit: yes, I find it very strange too. There’s no popular/generic and simple open source platform which could bind it all together into a nice runtime. The whole ORM vs SQL and pitfalls feels so strange, as it’s not something hard in my book.
Basically you call `.raw("...")` from some model's queryset, but there's no requirement you actually query that model's table at all.
class SomeModel:
name = models.CharField()
class OtherModel:
foobar = models.CharField()
SomeModel.objects.raw("select foobar as name from thisapp_othermodel")
will yield `SomeModel`s with `name`s that are actual `OtherModel` `foobar` values.Although the worse offenders by far are those which decide ORM = bad and bypass it at every opportunity.
Why stop there? Let's ban RDBMS. If you can't contemplate a binary file structure and index strategy perfectly tailored to your application's needs ahead of using an RDBMS, why should we deign to let you use an abstraction layer with clever query planning and algorithms?
It's all too easy to morally 'ban' people from technology and gatekeep it behind wishy-washy nonsense like "ORMs are bad for beginners."
This is the problem with not using an ORM. If you cut it out and move everything to parameterized SQL queries the SLOC explodes which massively inhibits readability as well as introducing bugs.
If your issue with ORMs is just that you're familiar with SQL and you don't like how ORMs look then I think the issue is just about becoming more familiar with a decent ORM.
My experience has been the opposite: that raw SQL queries end up much shorter (and consequently more readable) than the equivalent ORM code. The exception to that is INSERT/UPDATE queries, where I do tend to use some kind of ORM/query builder. I have used both, and I prefer raw SQL for anything beyond very simple queries.
Dog.objects.filter(owner__city__name = 'New York')
The double-underscore follows the foreign keys until the last one, which is a field on the last model.Also you need to return an "id" column since Django needs a primary key.
On the flipside, you can put that query in the database as a VIEW and point the model at it, also with "managed = False".
IMO:
.where('column_a', '=', 'value1')
.and(q => q.isNull('column_b').orWhere('column_b', '=', 'value2')))
is a lot less readable than: WHERE
column_a = 'value1'
AND (column_b IS NULL OR column_b = 'value2')And obviously you can use whatever indentations you like.
- No keyword arguments
- No operator overloading (so you can't override | to get the nice "or" syntax)
In JS, theoretically you could design an API like
.where({column_a: 1}, Q(column_b__isnull=True).or({column_b: 2}))
Which really isn't bad IMO
filter( Q(column_b__isnull=True)|Q(column_b='value2'),
column_a='value1',
)
Or just turn it into another Q .filter(Q(column_a='value1'), Q(column_b__isnull=True)|Q(column_b='value2')) .filter(Q(column_a='value1') & (Q(column_b__isnull=True) | Q(column_b='value2')))