Scrapinghub is a company focused on retrieving information and processing it afterwards,
and is deeply involved in developing and contributing to open source projects for
web crawling and data processing.
This year we are applying with three of our most renowned projects: Scrapy, Portia, and Splash.
You can learn more about these projects in their respective repositories on GitHub:
Scrapy is a very popular web crawling and
scraping framework for Python (10th among GitHub's most trending Python projects),
used to write spiders that crawl websites and extract data from them.
Check out the Scrapy ideas
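A Scrapy spider automates a crawl-and-extract cycle: fetch a page, pull structured data out of it, and queue the links it contains for further crawling. The sketch below illustrates that cycle with the Python standard library only; it is not Scrapy's API. The `crawl` and `PageParser` names, the rule that each `<h1>` becomes an item, and the `fetch` callable (here an in-memory dictionary instead of real HTTP requests) are hypothetical choices made for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects link targets and the text of <h1> elements from one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.titles = []
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "h1":
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1 and data.strip():
            self.titles.append(data.strip())


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl: fetch each page, extract items, follow new links.

    `fetch` is any callable mapping a URL to HTML text (or None if unavailable).
    """
    queue = deque([start_url])
    seen = {start_url}  # avoid requesting the same URL twice
    items = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()
        # Extraction step: one item per <h1> heading on the page.
        for title in parser.titles:
            items.append({"url": url, "title": title})
        # Crawling step: resolve relative links and queue unseen ones.
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return items


if __name__ == "__main__":
    site = {
        "http://example.com/": '<h1>Home</h1><a href="/a">A</a><a href="/b">B</a>',
        "http://example.com/a": '<h1>Page A</h1><a href="/">home</a>',
        "http://example.com/b": "<h1>Page B</h1>",
    }
    for item in crawl("http://example.com/", site.get):
        print(item)
```

In Scrapy itself, the equivalent spider is a `scrapy.Spider` subclass whose `parse` callback yields items and follow-up requests, while the framework takes care of scheduling, duplicate filtering, and concurrent downloads.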
Portia is a tool that allows you
to scrape websites visually, with no programming knowledge required.
Users annotate web pages to identify the data they wish to extract,
and based on these annotations Portia learns how to scrape data
from similar pages.
Check out the Portia ideas
Splash is a lightweight web browser with an HTTP API.
It can be used to get detailed information about crawled websites and to take
screenshots of them as they are seen in a browser.
Check out the Splash ideas
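Because Splash is driven over HTTP, any client can ask it for a rendered page or a screenshot. The following is a minimal sketch, assuming a Splash instance listening on `localhost:8050` (its usual default port) and using the `render.html` and `render.png` endpoints with their `url` and `wait` parameters; the helper names `splash_url`, `fetch_rendered_html`, and `save_screenshot` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a Splash instance is running locally on its default port.
SPLASH_BASE = "http://localhost:8050"


def splash_url(endpoint, target, **params):
    """Build a Splash API URL, e.g. endpoint="render.png" for a screenshot."""
    query = urlencode({"url": target, **params})
    return f"{SPLASH_BASE}/{endpoint}?{query}"


def fetch_rendered_html(target, wait=0.5):
    """Return the page HTML after Splash has loaded it (and run its JavaScript)."""
    with urlopen(splash_url("render.html", target, wait=wait)) as resp:
        charset = resp.headers.get_content_charset("utf-8")
        return resp.read().decode(charset)


def save_screenshot(target, path, wait=0.5):
    """Save a PNG screenshot of the page as Splash renders it."""
    with urlopen(splash_url("render.png", target, wait=wait)) as resp:
        with open(path, "wb") as fh:
            fh.write(resp.read())
```

The `wait` parameter gives the page time to finish running scripts before Splash returns the HTML or image, which is what makes the result match what a user would see in a browser.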