This is a very poor approach to scraping. SAX parsers aren't good for much of anything, and they are especially bad for scraping HTML. You'll get lots of errors while parsing fairly ordinary pages, and because everything happens in event callbacks, you end up tracking state by hand and the logic quickly becomes hard to follow. There's no good reason not to use a proper parser that builds a document tree, like lxml or BeautifulSoup. lxml is also very fast, so there's not even a performance argument for SAX.
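For what it's worth, here is a minimal sketch of the document-tree approach, assuming lxml is installed. The markup in `PAGE` and the `extract_links` helper are made up for illustration; the point is only that sloppy HTML (note the unclosed `<li>` and `<p>` tags) still parses into a tree you can query in one line.

```python
from lxml import html

# Hypothetical page with the kind of sloppy markup real sites serve:
# the <li> and <p> tags are never closed.
PAGE = """
<html><body>
  <h1>Listings</h1>
  <ul>
    <li><a href="/a">First</a>
    <li><a href="/b">Second</a>
  </ul>
  <p>footer text
</body></html>
"""


def extract_links(markup):
    """Return (text, href) pairs for every link inside a <ul>."""
    # Parse the whole page into a tree; the parser repairs the missing tags.
    tree = html.fromstring(markup)
    # One XPath query replaces the hand-written state tracking that
    # event callbacks would need.
    return [
        (a.text_content().strip(), a.get("href"))
        for a in tree.xpath("//ul//a")
    ]


links = extract_links(PAGE)
print(links)
```

Doing the same with event callbacks would mean keeping flags for "am I inside a `<ul>`?" and "am I inside an `<a>`?" and collecting text across events yourself.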
I used to love scraping with the Python/BeautifulSoup/Mechanize family and wrote a lot of scripts like the one in this post. I've been using CasperJS[1], though, and I don't think I'll go back.