R and pandas and what I've learned about each(blog.yhathq.com) |
Nonetheless, I use R. ggplot2 is excellent, and its graphs look better than matplotlib's. I really like R Markdown for interleaving R code and Markdown in one document; I suppose IPython is the Python equivalent. R especially shines with its numerous libraries.
Anyway, for every project I debate whether to use R or Python. Perhaps I should look into rmagic/rpy2 for IPython as a go-between.
I found this link on porting ggplot2 styles into Python, but it doesn't focus on the grammar of graphics. http://tonysyu.github.com/mpltools/auto_examples/style/plot_...
% virtualenv --distribute --no-site-packages pandas_venv
[blahblah]
% . pandas_venv/bin/activate
(pandas_venv) % easy_install readline # Probably only needed on Mac OS X for IPython to behave
[blahblah]
(pandas_venv) % pip install ipython
[blah blah]
(pandas_venv) % pip install numpy
[lots of blahblah]
(pandas_venv) % pip install matplotlib
[quite a bit of blahblah]
(pandas_venv) % pip install pandas
[some more blah blah]
(pandas_venv) % pandas_venv/bin/ipython --no-banner
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: import pylab as pl
In [4]:
For most users, the easiest way to get set up is a complete Python distribution, like Anaconda, EPD or Python(x,y). See the SciPy Stack installation page:
>>> import numpy as np
>>> import scipy as sp
>>> import statsmodels as sm
>>> import matplotlib as mpl
>>> import pandas as pd
>>> import networkx as nx
>>> import sklearn as sk
>>> import nltk
Pylab is handy if you're just transitioning from Matlab, but otherwise there's no reason to use it. It's a gigantic namespace, and all but a couple of its functions come from numpy and matplotlib.pyplot.
Just do:
import matplotlib.pyplot as plt
Instead of: import pylab as pl
Of course, in the end it's personal preference. As long as you don't need to know where things come from, using pylab is fine.
What I miss the most:
- Matlab <--> Excel link (on Windows) - an Excel add-on that lets you send arrays back and forth very easily. You need a spreadsheet when you work with datasets, and interchanging data through files just isn't that convenient.
- Matlab's IDE features (debugging, documentation, publishing, variable inspection).
- ggplot2
I'd rather have R/data.table at the prompt and python/pandas in my script, but if you have to err on one side, the python/pandas "low magic" is the side to err on. Pandas does have its own strange corners, though. For example, it seems like it tries hard to stick similar-typed columns into contiguous matrices, which leads to some unexpected casting, and I have no idea what the supposed benefit is over just keeping distinct columns.
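For anyone wondering what "unexpected casting" can look like, here is a minimal sketch of two upcasts that are easy to reproduce. This is not necessarily what the commenter ran into; the column names and values are made up for illustration, and the behaviour assumes plain NumPy-backed integer and float columns.

```python
import pandas as pd

# Hypothetical frame: one integer column, one float column.
df = pd.DataFrame({"count": [1, 2, 3], "price": [0.5, 1.5, 2.5]})

# Column-wise, each column keeps its own dtype.
count_is_int = pd.api.types.is_integer_dtype(df["count"])

# Pulling out a single row has to fit both values into one Series,
# so the integer is upcast to float64.
row = df.iloc[0]

# Reindexing to a label that does not exist introduces NaN, and a
# NumPy-backed integer column cannot hold NaN, so it also becomes float64.
reindexed = df["count"].reindex([0, 1, 2, 3])

print(df.dtypes)
print(row.dtype, reindexed.dtype)
```

Checking `df.dtypes` after each step is usually the quickest way to find where a cast happened.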
Slightly OT:
I'm using in-memory sqlite3 with rtree to find objects within bounds in a 2D space. Is there a different library people would recommend for this in Python?
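For readers who haven't seen that setup, it might look roughly like the sketch below. It assumes the SQLite build behind Python's `sqlite3` module includes the R*Tree module; if it doesn't, the sketch falls back to an ordinary table with the same columns (a linear scan, so only the results match, not the speed). The table name, helper names, and sample boxes are all invented for illustration.

```python
import sqlite3


def build_index(boxes):
    """Load (id, minx, maxx, miny, maxy) tuples into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    try:
        # R*Tree virtual table: an id column plus a min/max pair per dimension.
        conn.execute(
            "CREATE VIRTUAL TABLE objects USING rtree(id, minx, maxx, miny, maxy)"
        )
    except sqlite3.OperationalError:
        # Assumption: R*Tree is not compiled in, so use a plain table instead.
        conn.execute(
            "CREATE TABLE objects (id INTEGER PRIMARY KEY, "
            "minx REAL, maxx REAL, miny REAL, maxy REAL)"
        )
    conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?, ?)", boxes)
    return conn


def contained_in(conn, minx, maxx, miny, maxy):
    """Ids of objects lying entirely inside the query rectangle."""
    rows = conn.execute(
        "SELECT id FROM objects "
        "WHERE minx >= ? AND maxx <= ? AND miny >= ? AND maxy <= ? ORDER BY id",
        (minx, maxx, miny, maxy),
    )
    return [r[0] for r in rows]


def overlapping(conn, minx, maxx, miny, maxy):
    """Ids of objects whose bounding box touches the query rectangle."""
    rows = conn.execute(
        "SELECT id FROM objects "
        "WHERE maxx >= ? AND minx <= ? AND maxy >= ? AND miny <= ? ORDER BY id",
        (minx, maxx, miny, maxy),
    )
    return [r[0] for r in rows]


conn = build_index([(1, 0, 2, 0, 2), (2, 5, 6, 5, 6), (3, 1, 9, 1, 9)])
inside = contained_in(conn, 0, 4, 0, 4)
touching = overlapping(conn, 0, 4, 0, 4)
print(inside, touching)
```

Note that the R*Tree module stores coordinates as 32-bit floats, so the sample uses small integers that round-trip exactly.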
I am used to Python syntax, and while R is another language to learn, my assumption is that for data analysis its greater age compared to pandas implies stability.
I could of course be wrong.
That said, if it causes unexpected behaviour, check to see whether it's a bug.