PostgreSQL 9.5: UPSERT, Row Level Security, and Big Data(postgresql.org) |
PostgreSQL 9.5: UPSERT, Row Level Security, and Big Data(postgresql.org) |
the very first example points to BRIN indexes resulting in smaller index than btree but with much longer search time... so i guess the 5% time figure was very use-case specific?
We do over 20,000 queries per second on one of our production mysql DB's and I'm not sure I'd trust anything else with that: http://i.imgur.com/sLZzXhS.png
Just curious if I'm missing out on some new awesomeness that PostgreSQL has or if it's just marketing.
In a word: correctness.
Yes, MySQL has an UPSERT implementation. Like so many things MySQL rushed out the door, it's also buggy and unpredictable. Did you know UPSERTing into a MySQL table with multiple unique indexes can result in duplicate records? Did you know MySQL's ON CONFLICT IGNORE will insert records that violate other not-NULL constraints? [1]
I've used both MySQL and PostgreSQL for over a decade, and working around the many MANY misbehaviors and surprises in MySQL requires continuous dev effort. PostgreSQL on the other hand is correct, unsurprising, and just as performant these days.
MySQL is what happens when you build a database out of pure WAT [2].
[1] https://wiki.postgresql.org/wiki/UPSERT#MySQL.27s_INSERT_......
You say "upserting into a mysql table". Which storage engine? MyISAM? InnoDB? I find MySQL to be both reliable and incredible durable i.e. it handles yanking the power cord quite well. The performance also scales up linearly for InnoDB even for very high traffic and concurrency applications.
We use redis, memcached and other storage engines - by no means are we tied to mysql. But for what it does, it does it incredibly well.
I'm also completely open to using PostgreSQL and I was hoping someone could give me a compelling reason to switch to it or to use it.
People like myself use Postgres because it has a much richer feature set. See http://stackoverflow.com/a/5023936/270610 for some examples. Personally I find MySQL beyond frustrating due to its lack of… well almost all of those. Recursive CTEs in particular, but arrays and rich indexing are pretty core too.
Postgres's query optimizer is far more advanced too. MySQL doesn't even optimize across views, which discourages good coding practices.
The documentation is fantastic. Complete and well-written, covers the nuances of every command, expression, and type. MySQL's doesn't hold a candle to it.
Don't know what you mean about "unfriendly". Help is built into the command-line tool, and like I said, the documentation is fantastic. Maybe MySQL is a little more "hand-holdy", but I don't care for such things so I wouldn't know.
Back in the early 2000s, LAMP people on Slashdot were benchmarking MyISAM tables to Postgres' 6.5/7.x's fully transactional engine. Unfortunately, the reputation as being slow stuck among developers.
Postgres particularly shines on multicore systems, thanks to some clever internal design choices. Having a sophisticated cost-based query planner also helps.
As for unfriendly: Care to amplify? In my work, I've found the opposite to be true.
For example, the very first thing you tend to encounter as a new developer is "how to create a user". For MySQL, it turns out that using GRANT to grant a permission creates a user, which is counterintuitive; GRANT also sets the password, and promotes the use of cleartext passwords. By comparison, Postgres has "createuser", as well as a full-featured set of ALTER USER commands. The difference between "mysql" and "mysqladmin" is also completely unclear.
The almost complete lack of warts and legacy cruft in Postgres significantly removes the possibility of confusion, uncertainty and information overload. MySQL's manuals are littered with "if X is enabled then this behavior is different, and in versions since 5.7.3.5 this behavior has been changed slightly, and 5.7.3.6 has a bug that silently swallows errors", etc. MySQL's historical date and NULL handling alone is worth a chapter of any book.
Postgres also has a level of strictness above MySQL, which is in itself instructive. You know when you're doing something wrong. Postgres never accepts bad input. It always requires a strictly correct configuration setup.
Plus: Just type \h in psql. It has a complete reference of the entire SQL syntax.
It wasn't just the /. crowd. Back in the 3.5 days the MySQL devs were doing that too, and writing long discourses on why transaction safety was a bunch of crap and a crutch for bad application developers.
- CTE's
- Arrays/JSON type
- partial indexes
- transactional DDL
- NOTIFY
- Materialized views
- Schemas
- PostGIS
- Row level security (which is new in PG)- Arrays, particularly with GIN indexes. This makes things like tagging fantastic in Postgres. Instead of putting your tags in another table, you throw them in an array and you can do all kinds of things like set intersection-like queries.
- JSON. Postgres can store data as JSON and index and query the JSON. This essentially gives you MongoDB type queries.
I'm sure there's more but these are my favourite Postgres features.
Postgres has always taken a more 'solid' approach. One instance that made my jaw drop when I realized it: in the past (has this been fixed?), DDL (alter table, create table, etc...) were not transactional in Mysql. You could get 50% through a series of them, and find your database 100% fucked up.
That said, over the years Mysql has been improving too, for sure.
So honestly, at this point I do think some of it is impressions from the past which are no longer valid. But still, Mysqlhas done/does all sorts of things that defy the spec, convention or just common sense (I don't know if this has been fixed, but at least for many years, April 30th was treated as a valid date, and there was some profound weirdness of which I can't quite recall the details involving locale stuff).
Postgres generally takes the position that data should always be safe first and speedy sometime later. It also assumes the operator understands their tools. That second one means in comparison with Mysql, people think it is unfriendly. It isn't (if you want to see unfriendly, go work with Oracle), it just expects that its friends learn about it. Which is of course good advice when you're dealing with complicated software on which a lot of money tends to ride.
And as the PG devs have said for years, they don't compete with Mysql. They compete with Oracle. There's no reason to switch if you're happy with Mysql.
As far as why you should give up MySQL like I did four years ago: http://grimoire.ca/mysql/choose-something-else
Mmm. Assuming that your underlying disks don't lie, losing only the data that was in-flight to the WAL (or whatever is the equivalent in your DB of choice) is the absolute worst data loss you should see from a real SQL database in that situation. [0]
Think about the case where an error causes the DB software to crash... You really can't make good data robustness guarantees if an unexpected crash endangers more than just the data that's in-flight to the WAL.
I personally have had the disks that back a rather large (but relatively low-update-volume) Postgres DB drop out on multiple occasions and never lost any data at all. This shouldn't be unexpected behavior. :)
> You say "upserting into a mysql table". Which storage engine? MyISAM? InnoDB?
"In addition, an INSERT ... ON DUPLICATE KEY UPDATE statement against a table having more than one unique or primary key is also marked as unsafe. (Bug #11765650, Bug #58637)" [1]
Given that this is an unqualified assertion, straight from the MySQL docs, it seems safe to say that this is a weakness in MySQL's UPSERT-alike, rather than any limitation of a particular underlying storage engine. (Here are some bug reports that also make unqualified statements about the behavior at issue: [2][3])
> ...I was hoping someone could give me a compelling reason to switch to it or to use it.
shrug Both MySQL and Postgres are capable databases. Postgresql has better documentation, appears to have substantially better internal architecture, and -from what I remember of my early days with MySQL- has far fewer hidden sharp edges than MySQL.
I'm not here to tell you to change what DB you're currently using for your production stuff... I don't think anyone is. For my projects, I have had substantially better experiences with Postgres than MySQL.
EDIT: It occurred to me in the shower that I didn't make "loss of in-flight WAL data" sufficiently clear. Unless you explicitly ask for a mode where writes are acknowledged before the DB believes that they're safe on disk, [4] then the only "data loss" in the scenarios I described in the comment would be data that had been transmitted to the server, but not acknowledged as committed to the DB. So, correctly-written client programs would experience this "data loss" only as a transaction that failed to commit, maybe followed by DB unavailability. Sorry for the ambiguity. :(
[0] http://www.postgresql.org/docs/9.4/static/wal-reliability.ht...
[1] http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.h...
[2] http://bugs.mysql.com/bug.php?id=58637
[3] http://bugs.mysql.com/bug.php?id=72921
[4] I say "believes they're safe" because underlying storage can always lie, and there's not a goddamn thing you can reliably do about it if it does.
I should add that Postgres was the first database I ever used, and I literally learned pretty much everything I know about Postgres, SQL, as well as Relational and Set Logic from the official docs. And that was with no background in software development and an undergraduate business degree with Excel being my most technologically advanced toolset. That is a documentation success story.
With the steady addition of features, it's gotten much more complex, and there will come a time when using just the documentation won't be enough to learn how to use Postgresql to full advantage. With release of 9.5 we might be there now.
Perhaps the logical extension of the documentation is some form of coursework to enable users to learn the DB systematically. I haven't looked into it, this might already be offered.
- http://sequel.jeremyevans.net/documentation.html
- http://sequel.jeremyevans.net/rdoc/files/doc/postgresql_rdoc...
Sure if you already understand PostgreSQL. If you don't then it isn't particularly user friendly. It's awkward to navigate, doesn't explain the basics (how to actually install it) and tends to combine information that is more for advanced users on the same page as beginners.
Compare you're standard with mine (MongoDB):
Er, http://www.postgresql.org/download/ and/or http://www.postgresql.org/docs/current/static/installation.h... ?
> ...tends to combine information that is more for advanced users on the same page as beginners.
There's a tutorial for beginners: http://www.postgresql.org/docs/9.4/static/tutorial.html
Then there's reference material in the rest of the manual. The tutorial even suggests the order in which you should read the rest of the manual. :)
Dang/mods: Can we please please get something to avoid accidentally downvoting good stuff on mobile phones?
I also think that PLv8 should probably make it in the box in the next release or two. With the addition of JSON and Binary JSON data options, having a procedural interface that leverages JS in the box would be a huge win IMHO. Though I know some would be adamantly opposed to this idea.
Eeeh...a simple HA solution can be developed in about a week (I was able to do so on 9.3, and so far, it held it's ground). Also, now with 9.5's pg_rewind you can easily switch back and forth between nodes (http://www.postgresql.org/docs/9.5/static/app-pgrewind.html), simplifying things a great deal. Can't imagine that's 5-6 figures.
I agree that you don't get a Plug&Play-Solution out of the box, but from anecdotal evidence they often don't quite work as advertised anyway (remember 1995? And I'm sure your friendly DBA has some stories to share as well).
If you're going to be hosting your db on something like AWS EC2 anyways, then just buy a db product like AWS RDS, and pay for the HA option. Ends up around the same price as if you'd set up everything yourself (assuming you were going to host on AWS anyways, and not going with a low cost option), and is very easy.
col1 col2 count
---- ---- -----
a b 10
a null 5
null b 5
There's more to it than that obviously, but you can read about them here: http://www.postgresql.org/docs/devel/static/queries-table-ex... (7.2.4. GROUPING SETS, CUBE, and ROLLUP)From web server, connect to db as one user then SET ROLE to the database user. This gives you Column Security and easier auditing as well. See http://stackoverflow.com/questions/2998597/switch-role-after...
What particularly are you looking for in a "getting started" tutorial? Honestly, you should just plunge in, on some side project (or a mirror of whatever projects you've used MySQL on) and just compare.
This is a lot easier to say than do/live by, but I think you shouldn't invest in one tool choice when you haven't given the others a fair shake (once you have enough time to step back and think about your decision).
A few things you will want to look at that are different:
1. data types are much richer and more useful than in mysql
2. transactional DDL means migrations are atomic.
3. schemas are what mysql refers to as databases. Remember to set `search_path`.
4. roles and grants are somewhat more expressive and work differently than in mysql, but not that differently for the simpler use cases
5. database functions ( aka stored procedures ) are awesome as are extension languages.
But they are doing absolutely nothing about my biggest beef with PostgreSQL. Which is that there is absolutely no way to lock in good query plans. It always reserves the right to switch plans on you, and sometimes gives much, much, much worse ones. No other database does this to me. Even MySQL's stupid optimizer can be reliably channeled into specific query plans with the right use of temporary tables and indexes.
This is a problem because improvements don't matter if the query plan is "good enough". But they will care if you screw up. PostgreSQL usually does well, but sometimes screws up spectacularly.
The example that I have been struggling the most often with in the last few months is a logging table that I create summaries from. Normally I only query minutes to hours, but I set it up as a series of SQL statements so I first put the range in a table, and then have happened BETWEEN range_start AND range_end. PostgreSQL really, Really, REALLY wants to decide that the index on the timestamp is a bad idea, and wants to instead do a full table scan. Every time it does, summarization goes from under a second to taking hours.
Hopefully the new BRIN indexes will be understood by the optimizer in a way that makes it happier to use the index. But I'm not optimistic. And if I lean on it harder, I'm sure from past experience that I'll find something else that breaks.
It sounds like it's basically a BTree that stops branching at a certain threshold, but I'm almost certainly wrong.
Obviously single row accesses in a fully cached workload are going to be faster if done via a btree rather than such range maps, even if there's perfect clustering. But the price for having such an index is much lower, allowing you to have many more indexes. Additionally it can even be more efficient to access via BRIN if you access more than one row, due to fewer pages needing to be touched.
Slides are here: http://hlinnaka.iki.fi/presentations/Index-internals-Vienna2...
I'm really, really keen to use 9.5 jsonb with its insert/update changes.
My plan is to rely on these functions for now, and switch to the native implementations once 9.5 is production ready on RDS.
i want to look deeper into this, but didnt have the time but from the little i read, seems ms sql merge is more powerful
The reason PostgreSQL went with this syntax is that the goal was to create a good UPSERT and getting the concurrency considerations right with MERGE is hard (I am not sure of the current status, but when MERGE was new in MS SQL it was unusable for UPSERT) and even when you have done that it would still be cumbersome to use for UPSERT.
EDIT: The huge difference is that PostgreSQL's UPSERT always requires a unique constraint (or PK) to work, while MERGE does not. PostgreSQL relies on the unique constraint to implement the UPSERT logic.
I've used MERGE as an UPSERT using MATCHED/NOT MATCHED and SERIALIZABLE/HOLDLOCK since it was introduced in mssql 2008. It was one of the first features I upgraded my code to use, and it worked out of the box with no issues.
Here's a great post from Postgres team showing why they didn't just implement merge themselves:
http://www.postgresql.org/message-id/CAM3SWZRP0c3g6+aJ=YYDGY...
Edit: I meant apt.postgresql.org of course, not the official Ubuntu repo..
Has anyone come across a guide to using it for upgrades?
But perhaps I also have some practical advice to try.
I had a similar issue: I have a few tables with sensor data, 300-500 million rows, indexed among other things by event type. Some counting queries kept defaulting to full table scans. It turned out that this was because of limited statistics on the distribution of counts by event type.
The default_statistics_target config parameter sets how many entries Postgres keeps in the histogram of possible values per column, the default is 100 I think. Because my event types were not evenly distributed, the less frequent ones were missing from the statistics histogram altogether, and somehow this resulted in bad query plans.
As a fix, I upped the default_statistics_target to 1000, and to set it to 5000 for the biggest tables. Then after a vacuum analyze, the query planner started making sensible choices.
Another thing to try is perhaps reducing the random_page_cost config parameter from it's default of 4.0. On SSDs, random page costs are much closer to 1 than they are 4 (compared to long sequential reads).
My problem is not that PostgreSQL does not understand the distribution of my data. It does. The problem is that it comes up with a query plan without realizing that I'm only querying for a very small range of timestamps.
If this happens again, I'll have to try rewriting code to send it queries with hard-coded timestamps, cross fingers and pray. I find prayer quite essential with PostgreSQL sometimes because as ineffective as it is, at times I've got nothing else.
That'd not only make production scenarios more reliable, but it'd also make it much easier to test tweaks to the cost model in practice.
The problem is that that's a rather significant project, and it's hard to get funding for that kind of work. The postgres companies aren't that big, and it's not an all that sexy feature marketing wise.
I also don't think that query hints are a good way to do it. And I don't mind if the way to do it is somewhat cumbersome. This is very much a case where 20% of the work can give 99% of the benefit.
For example what about the following approach?
1. Add an option to EXPLAIN that will cause PostgreSQL's optimizer to spit out multiple plans it considered, with costs, and with a description of the plan that PostgreSQL can easily parse and fit to a query.
2. Add a PLAN command that can be applied to a prepared statement and will set its plan. It is an error to submit a plan that does not match the query.
And now in the rare case where I don't like a query's plan I can:
EXPLAIN PLANS=3 (query);
Pick my desired plan from the list (hopefully)Then in my code I:
PREPARE foo AS (query);
PLAN foo (selected plan);
EXECUTE foo;
And now if I notice that a query performs worse than I think it should, I can make it do what I want it to.The typical solution is to modify the autovacuum settings for that table to recalculate the statistics a lot more often, and maybe even with much higher resolution, depending on your case.
You can also convince it that indices are the way to go by changing more basic settings about costs of reading a random page on disk vs reading sequentially, making full table scans more expensive, but tuning those settings away from realistic costs might have negative side effects for you.
I was able to have great success running complex queries on 100M+ row tables that were insert-only using this kind of trick, but YMMV. If nothing else fails, really experienced people are more than willing to help in the performance mailing list. They sure helped me quite a few times.
Adjusting internal costs is promising, but I'd like to avoid going there exactly because of the possible negative side effects that you mention.
SET enable_seqscan = OFF;
We use these options a lot on tables that result in odd query plans to get them doing the best option..
What scares me is that this is unreliable, and according to the documentation, the optimizer is free to choose to ignore everything that I say whenever it wants. The fact that it already HAS done that to me does not provide me comfort.
"RLS implements true per-row and per-column data access control"
The reason is that the optimizer reorders operations. So, a tricky person can write the query in a way that, for example, throws a divide-by-zero error if someone's account balance is within a certain range, even if they don't have permission to see the balance. Then they can run a few queries to determine the exact balance.
RLS builds on top of something called a "security barrier view" which prevents certain kinds of optimizations that could cause this problem.
It also offers a nicer interface that's easier to manage.
I know applications can be compromised but now you can freely share your DB freely with other teams to analyze or play around
If PostgreSQL had gone the same route as MS SQL I would have expected a similar set of bugs. I suspect all of this have been fixed by now, but I do not follow MS SQL.
Little bobby tables doesn't need the ability to dump your users or credit cards, even if he does need access to all the blog posts.
I have to read up on the feature a bit more, but it sounds a potentially massive win if you build the support into an orm / framework.
Also, why would you sync pg users in ldap? pg can auth against ldap.
I figured out why: To pull roles/groups into the database from ldap.
Maybe you mean just in case they manage to bypass the software restrictions.
Suddently, it's not that scary to ask the intern to generate a rather complex report - he can't show anyone data they're not allowed to see by accident.
Which version did you reproduce that on? While the problem has not been generally addressed, the specifically bad case of looking up values at the "growing" end of a monotonically increasing data range has been improved a bit over the years (9.0 and then some incremental improvement in 9.0.4).
If it matters, it is an Amazon RDS instance.
The point is that every other database explains precisely the recommended approach for how to install it on every platform.
Hmm.
"The graphical installer for PostgreSQL includes the PostgreSQL server, pgAdmin III; a graphical tool for managing and developing your databases, and StackBuilder; a package manager that can be used to download and install additional PostgreSQL applications and drivers.
The installer is designed to be as straightforward as possible and the fastest way to get up and running with PostgreSQL on Windows." http://www.postgresql.org/download/windows/
"Binary packages for Solaris can be downloaded from the solaris subdirectory of the version you require from our file browser.
Packages for Solaris 10 and 11 are available for Sparc and i386 platforms.
Although produced by Oracle (previously Sun), these packages are not officially supported by them.
Solaris packages are installed by unpacking the compressed tar files directly into the install directory; see the README files for details." http://www.postgresql.org/download/solaris/
"These distributions all include PostgreSQL by default. To install PostgreSQL from these repositories, use the yum command:
yum install postgresql-server
Which version of PostgreSQL you get will depend on the version of the distribution: ..." http://www.postgresql.org/download/linux/redhat/
"Debian includes PostgreSQL by default. To install PostgreSQL on Debian, use the apt-get (or other apt-driving) command: apt-get install postgresql-9.4
The repository contains many different packages including third party addons. The most common and important packages are (substitute the version number as required): ..." http://www.postgresql.org/download/linux/debian/
"Ubuntu includes PostgreSQL by default. To install PostgreSQL on Ubuntu, use the apt-get (or other apt-driving) command: apt-get install postgresql-9.4
The repository contains many different packages including third party addons. The most common and important packages are (substitute the version number as required):" http://www.postgresql.org/download/linux/ubuntu/
"RPMs for SUSE Linux and openSUSE are available from the openSUSE Build Service in the project server:database:postgresql. Platform-specific RPM packages are available for PostgreSQL as well as a variety of related software. Use the search facility to find suitable packages. Documentation is also available there." http://www.postgresql.org/download/linux/suse/
"Note! These are the generic Linux download instructions. If you are using one of the major Linux distributions, you should read the distribution specific instructions: ... PostgreSQL is available integrated with the package management on most Linux platforms. When available, this is the recommended way to install PostgreSQL, since it provides proper integration with the operating system, including automatic patching and other management functionality.
Should packages not be available for your distribution, or there are issues with your package manager, there are graphical installers available.
Finally, most Linux systems make it easy to build from source. ... [Also, i]nstallers are available for 32 and 64-bit Linux distributions and include PostgreSQL, pgAdmin and the StackBuilder utility for installation of additional packages.
Download the installer from EnterpriseDB for all supported versions." http://www.postgresql.org/download/linux/
> The point is that every other database explains precisely the recommended approach for how to install it on every platform.
It really looks like Postgresql does that, too. Do you disagree?
It looks like it takes you to a link that lets you download a .dmg file. I assume those are basically one-click installers for Mac OS X programs?
However, the EnterpriseDB for OSX is basically broken. The good news is, that the user, who needs step by step instructions to download and install, will never find it out. On the other hand, user, who will find it out (because he is unable to build any pg extension with their pgxs, for example) does not need instructions at this level.
Due to the English language however there can be some debate as to what they meant. In this case "legacy" is most likely meant to describe the "MySQL application" not "MySQL" itself.
A meaning that adds something to a sentence is a more likely meaning than one that adds nothing to a sentence. If you take "legacy" to mean "being migrated from" then the sentence becomes
This feature also removes the last significant barrier to migrating being-migrated-from MySQL applications to PostgreSQL.
It's more likely that if "being migrated from" was the intended meaning, they would have simply left the word out.
The biggest problem with that approach is that the way query planning works isn't that a 100 different plans are fully built, cost evaluated, and then compared. Instead it's more like a dynamic programming approach where you iteratively build pieces of a query plan from ground up, and then combine those pieces to build the layer one up. Given the space of possible query plans, especially with several relations and indexes on each relation involved, such an approach is required to actually ever finish planning.
> Add a PLAN command that can be applied to a prepared statement and will set its plan. It is an error to submit a plan that does not match the query.
It's not easily, if at all in the generic case!, possible to prove that a specific plan matches a query. You could obviously try to build every possible plan and match against each of those, but that's computationally infeasible (we're talking factorial number of plans, depending on relations here).
So I think such an approach has no chance of working.
What's more realistic is a running queries in a "training" mode. That training mode would, matching on the specific parsetree, store the resulting plans in a table. Before exiting training mode you'd mark all these plans approved (after looking for bad cases obviously). After that preparing a new query still does the original query planning, but by default the query stored in the "approved plans" table would be used. The cost differential and the new plan would then be associated with the currently approved plan. Regularly the DBA (or whoever fulfills that role), checks the potential plans and approves new ones.
Based on a configuration option queries without approved plans would error out, raise a log message, or just work.
Now even that has significant problems because e.g. DDL will have the tendency to "invalidate" all the approved plans. But that's manageable in comparison to being woken up Friday night.
On EXPLAIN, if you've passed my example PLANS=3 it would first do the plan in its usual way, and then try optimizing several more times, with some chance of randomly making suboptimal decisions at various decision points. It would keep doing this until it either had enough plans or else was making essentially random decisions and still couldn't find more.
I can see this requiring a significant refactor of existing code, but stochastically exploring "pretty good" plans doesn't require facing the whole tree of possible plans.
The DBA hopefully can recognize the desired plan if it turns up.
As for the PLAN command, I do not see the problem. I can look at the query and the EXPLAIN output, and I can figure out exactly how and where that query's conditions are being incorporated. You might need extra output from EXPLAIN to make it always possible to do in an automated way, but it should be possible.
Put another way, you propose that the database has a way to look at the query and a plan stored in the table and figure out how to execute that plan for that query. What would that table contain that couldn't be represented as a chunk of text supplied with a PLAN command?
You'd not get anything useful by doing that. If you really consider this as a dynamic programming problem, at which place in the 'pyramid' of steps would you choose the worse plan? To quote the source:
/*
* We employ a simple "dynamic programming" algorithm: we first find all
* ways to build joins of two jointree items, then all ways to build joins
* of three items (from two-item joins and single items), then four-item
* joins, and so on until we have considered all ways to join all the
* items into one rel.
*
If you'd just make some random 'bad' decisions, you'll not have a significant likelihood of finding actually useful good plans.> As for the PLAN command, I do not see the problem. I can look at the query and the EXPLAIN output, and I can figure out exactly how and where that query's conditions are being incorporated. You might need extra output from EXPLAIN to make it always possible to do in an automated way, but it should be possible.
Good luck. Besides generating all possible plans postgres knows to generate, you very quickly essentially run into something equivalent to the halting problem.
> Put another way, you propose that the database has a way to look at the query and a plan stored in the table and figure out how to execute that plan for that query.
What I'm proposing is matching on the parse tree of the user supplied query. For each query there's exactly one parsetree the postgres parser will generate. We have a way (for the awesome pg_stat_statements module) of building a 'hash' of that parse tree, and thus can build a fairly efficient mapping of parsetrees to additional data. In contrast to that, for plans you can have a humongous number of plans for each user supplied query.
> What would that table contain that couldn't be represented as a chunk of text supplied with a PLAN command?
It would contain a, less ambiguous, version of the user supplied query. Now you could argue that you could add that to the PLAN command for matching purposes - but then we'd need guarantee that you could supply arbitrarily corrupt plan trees to postgres, without being able to cause harm. Something that'd cause significant slowdown during execution.
EDIT: Formatting
Those also obviously correspond to the Insert/update/select/delete SQL operations.
So hooking a RESTful interface direct to your db lets JavaScript on the client talk directly to the database via an http server with no business layer in between. It's obviously a nifty workflow for the "JavaScript all the things" developer.
That said, I think restfulness is a bit of a fad tied to the popularity of javascript-everything. But it has created a limited, simple standard for http services, so that's nice.
Postgrest does not adopt this convention. This decision is not arbitrary, however. Postgrest is conceptually a function that computes a REST API for a PostgreSQL database, and tables in a PostgreSQL can (for better or worse) have compound primary keys, which do not map well to a simple URL path. So Postgrest uses a more general URL syntax. The equivalent of the above is: /some_path?id=eq.123.
Postgrest represents some novel thinking about the use of certain HTTP verbs as well. A lot of this is discussed in the Postrest introduction video here: http://postgrest.com/
As to the algorithm described in the comment, it is actually able to search through all plans. The dynamic programming bit just means that you don't have to unnecessarily traverse all possible logic paths that a naive recursive algorithm might. But if there is a good plan, it can be discovered.
It is unclear from the comment whether the algorithm can only find plans which involve adding one table at a time. This would not be unreasonable - for example I know that Oracle circa a decade ago did that to avoid having to consider too many query plans on large joins. I have no idea whether that ever got changed.
As to the parse tree, that seems like overkill to me. You have a query. It has a list of tables, and a list of join conditions. A query plan includes tables, indexes, filters, and so on. It could easily be augmented by stating which query condition was applied where.
To validate it you have to see that the lists of tables match, the list of applied conditions match, and each plan step could actually have the effect of doing that condition. If it all ties out, it is a valid plan. The fact that the plan might be crap is something that is explicitly left in the hands of the user - that's the whole point of the feature. But at that point it is clearly valid.
Those are common API conventions, but they have nothing to do with REST; REST only requires unique URI's for resources, it has nothing to say about the format of those URI's. The Postgres version does not deviate from REST, it deviates from a popular API pattern, that's all.
First, my comment says "more likely" so it isn't assuming anything.
Second, if we change "more likely" to "definitely", the assumption is merely that the sentence in question is written with the same command of the English language as the rest of the announcement, i.e. no egregiously redundant words.
Maybe not a perfect word but very easy to understand in the context.
You definitely have a point, though. How the phrase "legacy X applications" is interpreted likely depends on whether the context is an announcement by a competitor to X.