An open source API for web scraping (github.com)
https://falkor-api.herokuapp.com/api/query?url=https://news....
Very interesting though. Just tried scraping Twitter and it works great: https://falkor-api.herokuapp.com/api/query?url=https://twitt...
Edit: works great as long as there are no quotes, hashtags, or links in the tweets. Is it possible to include sub-elements?
So basically this is a DOM API in JSON. Simple, but I like it.
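To make the "DOM API in JSON" idea concrete, including the sub-elements asked about above, here is a minimal sketch using only Python's standard library. It is not the project's actual implementation: the class `DomToDict`, the function `dom_to_json`, and the node shape (`tag`, `attrs`, `text`, `children`) are all assumptions made for illustration.

```python
import json
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class DomToDict(HTMLParser):
    """Builds a nested dict tree from HTML (hypothetical node shape)."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#root", "attrs": {}, "text": "", "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "text": "", "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray closers.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Text is attached to the innermost open element.
        self.stack[-1]["text"] += data


def dom_to_json(html):
    """Return the parsed element tree as a JSON string."""
    parser = DomToDict()
    parser.feed(html)
    parser.close()
    return json.dumps(parser.root["children"])
```

With this shape, a link or hashtag inside a tweet shows up as an entry in the parent's `children` list instead of being dropped, for example `dom_to_json('<p class="tweet">Hi <a href="/x">#tag</a></p>')`.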
Any plans to add JSONP support?
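For reference, JSONP just means wrapping the JSON body in a call to a function whose name the client supplies. A rough sketch of what that could look like on the server side, assuming a hypothetical helper `to_jsonp` (nothing here comes from the project itself); the callback name is validated so arbitrary script cannot be injected through it:

```python
import json
import re

# Allow plain or dotted JavaScript identifiers only, e.g. "cb" or "app.onData".
_CALLBACK_RE = re.compile(
    r"[A-Za-z_$][A-Za-z0-9_$]*(\.[A-Za-z_$][A-Za-z0-9_$]*)*"
)


def to_jsonp(payload, callback):
    """Wrap a JSON-serializable payload in a call to `callback`."""
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError("invalid JSONP callback name")
    return f"{callback}({json.dumps(payload)});"
```

The response would then be served as JavaScript rather than JSON, and the client loads it through a `<script>` tag.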
I only really started hacking around on the idea the other day, so it's early stages. I want to add filters so you can say "grab me only the text" or "grab me just the class names". Obviously another step would be to grab multiple elements in one request.
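One way the filters described here might work is as a post-processing step over the matched elements. This is only a sketch: the function `apply_filter`, the `only` parameter, and the element shape (a dict with `text` and `attrs` keys) are assumptions, not the API's real interface.

```python
def apply_filter(elements, only=None):
    """Reduce matched elements to just the requested part.

    `elements` is assumed to be a list of dicts such as
    {"tag": "a", "attrs": {"class": "title big"}, "text": "Hello"}.
    """
    if only is None:
        # No filter requested: return the full element records.
        return elements
    if only == "text":
        return [el.get("text", "") for el in elements]
    if only == "class":
        # A class attribute holds space-separated names; split them out.
        return [(el.get("attrs", {}).get("class") or "").split() for el in elements]
    raise ValueError(f"unknown filter: {only!r}")
```

A request could then carry the filter name as an extra query parameter alongside the URL and selector.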
Adding XPath support as well as CSS selectors would be a good addition.
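As an illustration of what an XPath query gives you, here is a small sketch using `xml.etree.ElementTree` from Python's standard library. Two caveats: ElementTree supports only a subset of XPath, and it parses XML, so this assumes well-formed markup; real-world HTML would need a more lenient parser. The function name `xpath_text` is made up for this example.

```python
import xml.etree.ElementTree as ET


def xpath_text(markup, path):
    """Return the text content of every element matching `path`.

    `path` must be in ElementTree's limited XPath dialect,
    e.g. ".//a[@class='story']".
    """
    root = ET.fromstring(markup)
    # itertext() gathers text from the element and all of its descendants.
    return ["".join(el.itertext()) for el in root.findall(path)]
```

For example, `xpath_text(doc, ".//a[@class='story']")` selects every `a` element whose `class` attribute is exactly `story`, at any depth below the root.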
https://falkor-api.herokuapp.com/api/query?url=http://digg.c...
https://web.archive.org/web/20140420162639/http://scrape.ly/
For example, if you wanted the profiles of the authors of today's stories:
http://scrape.ly/s/{http://news.combination.com}
{'ueoma87'}*{'next':'Next Page'}{'karma':'331',
'username':'ueoma87'}
That would've returned the profile of each story's author for today, yesterday, and so on.