Implement table partitioning

Implement table partitioning (git.postgresql.org)

190 points by rachbelaid 9 years ago | 56 comments

samcheng 9 years ago |

Any support for "rolling" partitions? E.g. a partition for data updated less than a day ago, another for data from 2-7 days ago, etc.

I miss this from Oracle; it allows nice index optimizations as the query patterns are different for recent data vs. historical data.

I think it could be set up with a mess of triggers and a cron job... but it would be nice to have a canonical way to do this.

jtc331 9 years ago | |

The fundamental issue here is that you'd actually have to move the rows between relations given that Postgres maintains separate storage etc. for each. There's no good way to do that.

willvarfar 9 years ago | |

Living with the cron jobs for a big mysql db, and wishing the DB understood this seemingly common use-case :(

Klathmon 9 years ago | | |

Honestly, I wouldn't call it "common". It's useful, and if it existed I could see it changing how I design a database, but it's not something I can say I've ever thought about needing before.

But then again, maybe I'm the outlier here.

lobster_johnson 9 years ago | |

How does this work in Oracle? Seeing as the partitioning constraint would be time-dependent, wouldn't it need to re-evaluate it at regular intervals in order to shuffle data around? Is the feature explicitly time-oriented?

mulmen 9 years ago | | |

I don't think Oracle can do this exactly, but the query planner does understand time-based partitions, so if you do something like:

   SELECT * FROM partitioned_table WHERE partition_date_key > SYSDATE - 1;

The query planner will only use the most recent partition. Combine this with Oracle's ability to merge partitions and you get "daily" partitions that become "weekly" partitions when the new week starts. Alternately you could wait a month and combine all the days of last month into a single partition and then even combine months into years.

The partition intervals are based on specific dates/times, not on the relative time from query execution.

Oracle also supports row movement which is the biggest missing feature here I believe.
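To make the pruning and merging idea above concrete, here is a minimal Python sketch. It is not Oracle's or PostgreSQL's implementation; the function names (`daily_partitions`, `prune`, `merge`) and the dates are hypothetical, chosen only to illustrate partitions with fixed bounds being filtered by a time predicate and later collapsed into a larger partition.

```python
from datetime import datetime, timedelta

def daily_partitions(first_day, days):
    """Build `days` consecutive partitions, each covering the half-open range [start, end)."""
    return [(first_day + timedelta(days=d), first_day + timedelta(days=d + 1))
            for d in range(days)]

def prune(partitions, cutoff):
    """Return only the partitions that could hold a row with key > cutoff."""
    return [(lo, hi) for lo, hi in partitions if hi > cutoff]

def merge(partitions):
    """Collapse adjacent partitions into a single one spanning all of them."""
    return (partitions[0][0], partitions[-1][1])

# Seven hypothetical daily partitions: Dec 1 through Dec 7.
parts = daily_partitions(datetime(2016, 12, 1), 7)

# `now` stands in for SYSDATE; the predicate is key > now - 1 day.
now = datetime(2016, 12, 7, 18, 0)
recent = prune(parts, now - timedelta(days=1))

# Once the week is over, the dailies can become one weekly partition.
weekly = merge(parts)
```

Note that the bounds never depend on the query time: only the cutoff moves, which is why a 24-hour window that starts mid-day can still touch two daily partitions.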

rachbelaid 9 years ago |

The conversations on the patches are really interesting: https://www.postgresql.org/message-id/flat/55D3093C.5010800@... https://www.postgresql.org/message-id/flat/ad16e2f5-fc7c-cc2...

aidos 9 years ago | |

I'm always amazed by the PG community - it seems like such a constructive place.

Those patches are absolutely insane. Makes you remember how much hard work goes into building the software you use on a day to day basis.

https://www.postgresql.org/message-id/attachment/45478/0001-...

MarHoff 9 years ago | | |

I've been professionally focused on PostgreSQL-based work for the last 5 years. At the highest point of the BigData hype I sometimes felt a little bit off-track, because I never got the time to investigate NoSQL solutions...

Only recently did I realize that being focused on actual data and how to process it inside PostgreSQL was maybe the best way I could spend my working time. I really can't say what's the best part of PostgreSQL, the hyperactive community, the rock solid and clear documentation or the constant roll-out of efficient, non-disruptive, user-focused features...

reactor 9 years ago | | |

I can see a good amount of quality engineering there, kudos.

egeozcan 9 years ago |

If you also didn't know what exactly partitioned tables are, here's a nice introduction from Microsoft:

https://technet.microsoft.com/en-us/library/ms190787(v=sql.1...

It is for SQL Server, but I assume it would be mostly relevant. Please correct me if I'm wrong.

ktopaz 9 years ago |

I don't get it. Table partitioning is already supported in PostgreSQL and has been for a long time (at least since 8.1). Where I work, we use table partitioning with PostgreSQL 9.4 in the product we're developing.

https://www.postgresql.org/docs/current/static/ddl-partition...

tajen 9 years ago |

About donations: I believe PostgreSQL now deserves more advertising and marketing to develop its adoption in major companies and, hence, get more funding. If I donate on the website, it says it will help conferences. Where should I donate?

bigato 9 years ago |

Suppose all partitions are on the same disk and you manage to index your data well enough for your usage that Postgres does not need to do full table scans. Are there any additional performance benefits to partitioning?

gdulli 9 years ago |

This message was confusing to me because I've been using/abusing Postgres inheritance for partitioning for so long that I forgot Postgres didn't technically have a feature called "partitioning".

What I'm looking forward to finding out is if I can take an arbitrary expression on a column and have it derive all the same benefits of range partitioning like constraint exclusion.

vincentdm 9 years ago |

I really like this addition. We store a lot of data for different customers, and most of our queries are only about data from a single customer. If I understand it correctly, if we partition by customer_id, then once the query planner is able to take advantage of this new feature, such queries will be much faster, as it won't have to wade through rows of data from other customers.

Another common use case is that we want to know an average number for all/some customers. To do this, we run a subquery grouped by customer, and then calculate the average in a surrounding query. I hope that the query planner will eventually become smart enough to use the GROUP BY clause to distribute this subquery to the different partitions.
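As a rough illustration of what distributing the grouped subquery would mean, here is a minimal Python sketch, not anything PostgreSQL does today. It assumes one partition per customer_id; the function names and sample rows are hypothetical. Each partition is aggregated on its own, and the outer query only combines the per-partition results.

```python
from collections import defaultdict

def partition_rows(rows):
    """Route (customer_id, value) rows into one partition per customer_id."""
    parts = defaultdict(list)
    for customer_id, value in rows:
        parts[customer_id].append(value)
    return parts

def per_customer_average(parts):
    """The inner GROUP BY: each partition is aggregated independently."""
    return {cid: sum(values) / len(values) for cid, values in parts.items()}

def average_of_averages(parts):
    """The outer query: average the per-customer results."""
    averages = per_customer_average(parts)
    return sum(averages.values()) / len(averages)

# Hypothetical sample data: (customer_id, value).
rows = [(1, 10.0), (1, 20.0), (2, 30.0), (3, 2.0), (3, 4.0)]
parts = partition_rows(rows)
overall = average_of_averages(parts)
```

Because the grouping key matches the partitioning key, no partition ever needs rows from another one to compute its group, which is what would make the work easy to split up.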

tda 9 years ago |

I just tried to implement table partitioning in PostgreSQL 9.6 this week. With some triggers and check constraints this seems to work quite nicely, but I was a bit disappointed that hash-based partitioning is currently not possible (at least not without extensions).

Will hash-based partitioning be included in PostgreSQL 10? The post notes

  A partitioning "column" can be an expression.

so I can assume it will be supported?

jtc331 9 years ago | |

As long as the expression being hashed doesn't change, then yes, you could make the expression a hashing function call. If the expression being hashed is mutable, there would be issues, since the feature doesn't currently support updates that result in rows moving between partitions.

amitlan 9 years ago | |

Not natively, as in there is no PARTITION BY HASH (<list-of-columns>). What limitations do you face when trying to roll-your-own hash partitioning using check constraints (in 9.6)?

tda 9 years ago | | |

I wanted to partition a table by the foreign key, as the table receives a few hundred rows per foreign key per hour (it is a timeseries db).

So I figured partitioning the table by foreign key would group all data together in a way that allows for faster access (typical access pattern would be select * where foreign_key = x). However, as the number of keys in the foreign table is unbounded and can be quite large, I wanted to partition the data to a limited number of tables, with

  mod(foreign_key, number_of_partitions)

If I understood correctly, check constraints can't operate on a calculated value.
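For what the modulo scheme above amounts to, here is a minimal Python sketch. It says nothing about how check constraints or the planner behave in 9.6; `NUM_PARTITIONS`, `partition_for`, `insert`, and `select_by_key` are hypothetical names used only to show an unbounded key space being mapped onto a fixed number of tables, with a lookup by key touching just one of them.

```python
NUM_PARTITIONS = 4  # hypothetical fixed number of child tables

def partition_for(foreign_key, n=NUM_PARTITIONS):
    """Map a foreign key onto one of n partitions, like mod(foreign_key, n)."""
    return foreign_key % n

def insert(partitions, row):
    """Route a row to the partition chosen by its foreign key."""
    partitions[partition_for(row["foreign_key"])].append(row)

def select_by_key(partitions, key):
    """Look up rows for one key by scanning only the partition that can hold it."""
    candidate = partitions[partition_for(key)]
    return [row for row in candidate if row["foreign_key"] == key]

# Ten hypothetical keys, three rows each, spread over four partitions.
partitions = [[] for _ in range(NUM_PARTITIONS)]
for key in range(10):
    for reading in range(3):
        insert(partitions, {"foreign_key": key, "reading": reading})

rows_for_six = select_by_key(partitions, 6)
```

The sketch also shows why a mutable key is awkward: changing a row's foreign_key from 6 to 7 would change `partition_for` from 2 to 3, so the row would have to move to a different table.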

amitlan 9 years ago |

...this is the beginning, not the end... https://www.postgresql.org/message-id/CA%2BTgmobTxn2%2B0x96h...

vemv 9 years ago |

While seemingly extensive, I don't quite like the commit message.

It doesn't say what TP is, or what its use cases would be. That's the first thing you should say; otherwise, how am I going to understand or keep interest in the rest of the text?

pilif 9 years ago | |

The commit is written by Postgres developers for Postgres developers. I would say that 90% of the intended audience of that commit message doesn't need an explanation of what table partitioning does.

For them this would be needless clutter that's not at all relevant to the commit.

Once we reach the 10.0 release, human-friendly release notes, additional manual chapters and sample code will be written for users to understand (in fact, the commit linked by this submission already contains quite a bit of additional documentation to be added to the manual).

Because table partitioning is less general than table inheritance, it is hoped that it will be easier to reason about properties of partitions, and therefore that this will serve as a better foundation for a variety of possible optimizations, including query planner optimizations.