Data Organization in Spreadsheets (2017) (tandfonline.com)
Data Organization in Spreadsheets - https://news.ycombinator.com/item?id=17790545 - Aug 2018 (27 comments)
I would start with different strategies for modeling data in tables. One problem I often see in pandas analyses is people treating the data like a web app database (many small, normalized tables) rather than joining it into a few big, denormalized tables. The latter makes it easier for people to answer their own questions, rather than relying on a bunch of tiny custom functions someone wrote!
* Hadley's tidy data paper: https://vita.had.co.nz/papers/tidy-data.pdf
* Normalizing data: https://en.wikipedia.org/wiki/Database_normalization
* Denormalized data: https://en.wikipedia.org/wiki/Denormalization
* Emily Riederer, column names as contracts: https://emilyriederer.netlify.app/post/column-name-contracts...
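To make the normalized vs. denormalized point concrete, here's a rough pandas sketch. The tables, column names, and values (`customers`, `products`, `orders`, `revenue`, etc.) are all made up for illustration, and this is just one way to do it: join the small tables once into a wide table, then answer questions with plain `groupby` calls.

```python
import pandas as pd

# Hypothetical normalized tables, roughly as a web app database might store them.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["east", "west", "east"],
})
products = pd.DataFrame({
    "product_id": [10, 11],
    "category": ["book", "toy"],
    "unit_price": [12.0, 8.0],
})
orders = pd.DataFrame({
    "order_id": [100, 101, 102, 103],
    "customer_id": [1, 2, 3, 1],
    "product_id": [10, 11, 10, 11],
    "quantity": [2, 1, 3, 5],
})

# Join once into a single wide (denormalized) table.
# validate="many_to_one" raises if a lookup table has duplicate keys,
# which would otherwise silently duplicate order rows.
wide = (
    orders
    .merge(customers, on="customer_id", how="left", validate="many_to_one")
    .merge(products, on="product_id", how="left", validate="many_to_one")
    .assign(revenue=lambda df: df["quantity"] * df["unit_price"])
)

# Ad hoc questions can now be answered directly from the wide table,
# without a custom helper function per question.
revenue_by_region = wide.groupby("region")["revenue"].sum()
revenue_by_category = wide.groupby("category")["revenue"].sum()

print(wide)
print(revenue_by_region)
print(revenue_by_category)
```

The tradeoff is the usual one: the wide table repeats `region` and `unit_price` on every order row, but anyone who can write a `groupby` can slice it however they want.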