Let's Scrape the Web with Python 3

Let's Scrape the Web with Python 3 (codecr.am)

20 points by quakkels 13 years ago | 2 comments

ianb 13 years ago

This is a very poor approach to scraping. SAX parsers aren't good for much of anything, and they are especially bad for scraping HTML. You'll get lots of errors while parsing relatively normal pages, and your logic will become very challenging to follow. There's no good reason not to use a proper parser that parses into a document, like lxml or BeautifulSoup. lxml is also very fast, so there's not even a performance argument for SAX.
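To make the point above concrete, here is a minimal sketch and not the code from the linked post. The sample markup, `LinkHandler`, `sax_links`, and `soup_links` are hypothetical names made up for illustration. It feeds a small page with two common real-world habits (an unclosed `<br>` and an unquoted attribute value) to the standard library's strict `xml.sax` parser, then shows the same link extraction against a parsed document with BeautifulSoup, assuming the third-party `bs4` package is installed.

```python
import xml.sax

# Hypothetical sample page: an unclosed <br> and an unquoted href value,
# both common in real HTML but not well-formed XML.
HTML = b"<html><body><p>Hello<br>world</p><a href=/about>About</a></body></html>"


class LinkHandler(xml.sax.ContentHandler):
    """Event-driven handler: link state has to be tracked by hand."""

    def __init__(self):
        super().__init__()
        self.links = []

    def startElement(self, name, attrs):
        # Called once per opening tag; collect href values from <a> tags.
        if name == "a" and "href" in attrs:
            self.links.append(attrs["href"])


def sax_links(data):
    """Return (links, None) on success, or (None, error message) on failure."""
    handler = LinkHandler()
    try:
        xml.sax.parseString(data, handler)
    except xml.sax.SAXParseException as exc:
        return None, str(exc)
    return handler.links, None


def soup_links(data):
    """Parse into a document first, then query it for links."""
    # Imported here so the SAX half runs even without the third-party package.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(data, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]
```

Calling `sax_links(HTML)` returns no links plus a parse error, because the strict parser gives up at the first piece of malformed markup. `soup_links(HTML)` is expected to tolerate the same input and return the link, and the extraction logic stays a one-line query rather than a set of callbacks.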

jbackus 13 years ago

I used to love scraping with the Python, BeautifulSoup, and Mechanize family, and I wrote a lot of scripts like the one in this post. Lately I've been using CasperJS [1], though, and I don't think I'll go back.

[1] - http://casperjs.org/