Ask HN: What is the best tech stack nowadays for mass scraping?

I'm in the design phase for a new project that will involve quite a bit of web scraping, formatting that data as appropriate, and saving it to a DB. The sources to be scraped are quite varied.

I've got a little bit of experience with Node/Express and Ruby/Rails. I'm more than happy to pick up Go, Python/Django, Elixir, or something else if those are more appropriate. My hesitation with going back to Node is that I slightly prefer statically typed languages, but I'm happy to use the best tool for the job.

My main concern is compute and bandwidth costs, since the various scrapers will be running and alternating quite frequently. I'm hoping you all could recommend a stack that makes it easy to run mass scheduled web scraping jobs with little overhead, in order to keep server costs down. Thanks!