Ask HN: Are there code standards for using Pandas DataFrames in production?

I'm working on a Python project where some modelling has been implemented using Pandas. I'm helping to add an API over the modelling logic, and when I see a function that accepts a dataframe (sometimes many dataframes), it isn't obvious what that function requires without reading through all of its code (e.g. which dataframe columns it requires, maybe even their types, etc.).

Accepting individual Series instead doesn't seem right either, because a function sometimes needs several columns whose rows are related to each other.

Is there an accepted way to define these sorts of functions that lets the caller easily understand which columns (or even types) are required? Or am I missing something obvious and this isn't a real problem?

I can think of a few ways to do it (mostly decorators), but it'd be awesome to hear what people are doing in the real world.
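For reference, here is a minimal sketch of what the decorator idea might look like. It assumes plain pandas only; `requires_columns`, `notional`, and the column names are hypothetical, made up for illustration, and not part of pandas or any other library. Each decorated function declares, per DataFrame parameter, the columns it needs and optionally their dtypes (as strings matching `str(df[col].dtype)`), and the wrapper checks them on every call.

```python
import functools
import inspect

import pandas as pd


def requires_columns(**schemas):
    """Hypothetical decorator (not part of pandas): declare which columns,
    and optionally which dtypes, each DataFrame argument must have.

    Each keyword maps a parameter name to {column_name: dtype_string_or_None}.
    None means "column must exist, any dtype"; otherwise str(df[col].dtype)
    must equal the given string (e.g. "int64", "float64").
    """
    def decorator(func):
        sig = inspect.signature(func)
        # Catch typos in the schema when the function is defined, not when called.
        unknown = set(schemas) - set(sig.parameters)
        if unknown:
            raise TypeError(f"{func.__name__} has no parameter(s) {sorted(unknown)}")

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Map positional/keyword arguments onto parameter names.
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            for param, columns in schemas.items():
                df = bound.arguments[param]
                if not isinstance(df, pd.DataFrame):
                    raise TypeError(f"{param!r} must be a DataFrame, got {type(df).__name__}")
                missing = [c for c in columns if c not in df.columns]
                if missing:
                    raise ValueError(f"{param!r} is missing column(s) {missing}")
                for col, dtype in columns.items():
                    if dtype is None:
                        continue  # presence is enough for this column
                    actual = str(df[col].dtype)
                    if actual != dtype:
                        raise TypeError(f"{param!r}[{col!r}] has dtype {actual}, expected {dtype}")
            return func(*args, **kwargs)

        # Expose the contract so callers, tests, or API docs can inspect it.
        wrapper.required_columns = schemas
        return wrapper

    return decorator


# Example usage; function and column names are hypothetical.
@requires_columns(trades={"ticker": None, "qty": "int64", "price": "float64"})
def notional(trades: pd.DataFrame) -> pd.Series:
    return trades["qty"] * trades["price"]
```

The appeal of this shape is that the requirements sit right next to the signature and are introspectable (`notional.required_columns`), so an API layer could surface them in its docs. The trade-offs are a small per-call cost and strict dtype matching (an `int32` column would be rejected where `int64` is declared), which may or may not be what you want.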