Data Ingestion and Big Data

Build a dataset from zero to solid

Mauro Pelucchi

Big Data, Data Science, Web Crawling, Python

Web scraping, crawling, and APIs are the first steps in retrieving information to use for analysis
or to start a new business.
In this tutorial I'll show you how to use Python to set up scraping and crawling processes,
how to simulate user navigation and browser behavior with a ghost (headless) browser, and how to connect to and use data APIs.
I will also try to explain the technical and ethical aspects that we have to consider when we approach these kinds of challenges.
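
As a rough illustration of the scraping step and of one ethical check (respecting robots.txt), here is a minimal sketch that uses only the Python standard library. It is not taken from the tutorial material: the names `LinkExtractor` and `allowed_links`, the sample HTML, and the `example.com` URLs are all hypothetical, and the page and robots.txt rules are in-memory strings rather than downloaded content.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def allowed_links(html, base_url, robots_lines, user_agent="*"):
    """Return the links found in html that the robots.txt rules permit."""
    rules = RobotFileParser()
    rules.parse(robots_lines)  # rules supplied as a list of text lines
    extractor = LinkExtractor(base_url)
    extractor.feed(html)
    return [url for url in extractor.links if rules.can_fetch(user_agent, url)]


# Hypothetical sample data standing in for a downloaded page and robots.txt.
SAMPLE_HTML = (
    '<a href="/jobs/1">Job 1</a> '
    '<a href="/private/admin">Admin</a> '
    '<a href="https://example.com/jobs/2">Job 2</a>'
)
SAMPLE_ROBOTS = ["User-agent: *", "Disallow: /private/"]

if __name__ == "__main__":
    for url in allowed_links(SAMPLE_HTML, "https://example.com/", SAMPLE_ROBOTS):
        print(url)
```

In a real crawler the HTML and the robots.txt file would be fetched over HTTP, and the allowed links would feed a queue of pages to visit next; rate limiting and the site's terms of use would also need to be considered.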

Type: Interactive (60 mins); Python level: Beginner; Domain level: Beginner


Mauro Pelucchi

Mauro Pelucchi is a senior data scientist and big data engineer
responsible for the design of the "Real-Time Labour Market Information System on Skill Requirements" for CEDEFOP.

He currently works for Burning-Glass Europe. His main tasks are related to machine learning modelling, labour market analyses, and the design of big data pipelines to process large datasets of online job vacancies.
In collaboration with the University of Milano-Bicocca, he has taken part in many research projects on labour market intelligence systems.
He lectures at the University of Milano-Bicocca in the Master's programme in Business Intelligence and Big Data Analytics, and at the University of Bergamo in Computer Engineering.