Tags: Case Study, Data Science, Databases, Natural Language Processing, Web
See in schedule: Wed, Jul 28, 10:30-11:15 CEST (45 min)

Wikipedia is the digital encyclopedia we turn to daily to look up facts and information. What could be better than being able to extract this wealth of crowd-sourced knowledge from Wikipedia without using traditional web scrapers? Several community-driven projects extract knowledge from Wikipedia and store it in structured form, retrievable with SPARQL, so it can be mined for a wide range of Data Science projects. In this talk, I will walk through the basics of the Open Web and show how to tap into this huge open database from Python.
The agenda includes the following:
•	Why Wikipedia?
•	Introduction to DBpedia and Wikidata
•	Introduction to Linked Data
•	How to query DBpedia/Wikidata
	o	Build a SPARQL query
	o	Use Python's SPARQLWrapper (sketched below)
•	Python code walkthrough to create
	o	A tabular dataset using SPARQL (sketched below)
	o	A corpus for language models using Wikipedia and BeautifulSoup (sketched below)
	o	A use case leveraging both SPARQLWrapper and Wikipedia to create a domain-specific corpus (sketched below)
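To give a flavour of the SPARQLWrapper step, here is a minimal sketch (not the exact code from the talk) that sends a hand-written SPARQL query to DBpedia's public endpoint and reads the JSON results; the choice of dbr:Berlin and dbo:abstract is purely illustrative:

    from SPARQLWrapper import SPARQLWrapper, JSON

    # DBpedia's public SPARQL endpoint
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")

    # Hand-written SPARQL query: fetch the English abstract of dbr:Berlin
    sparql.setQuery("""
        PREFIX dbr: <http://dbpedia.org/resource/>
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT ?abstract WHERE {
            dbr:Berlin dbo:abstract ?abstract .
            FILTER (lang(?abstract) = "en")
        }
    """)

    sparql.setReturnFormat(JSON)        # ask for JSON results
    results = sparql.query().convert()  # parsed into a Python dict

    for binding in results["results"]["bindings"]:
        print(binding["abstract"]["value"][:200])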
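For the tabular-dataset step, a sketch along these lines (assuming pandas is available, and using an illustrative query over dbo:Country, dbo:capital and dbo:populationTotal rather than the talk's actual example) flattens the SPARQL JSON bindings into a DataFrame:

    import pandas as pd
    from SPARQLWrapper import SPARQLWrapper, JSON

    # Illustrative query: countries, their capitals, and capital populations
    query = """
        PREFIX dbo: <http://dbpedia.org/ontology/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?country ?capital ?population WHERE {
            ?c a dbo:Country ;
               rdfs:label ?country ;
               dbo:capital ?cap .
            ?cap rdfs:label ?capital ;
                 dbo:populationTotal ?population .
            FILTER (lang(?country) = "en" && lang(?capital) = "en")
        }
        LIMIT 100
    """

    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    bindings = sparql.query().convert()["results"]["bindings"]

    # Each binding is a nested dict; keep only the plain value per variable
    rows = [{var: b[var]["value"] for var in b} for b in bindings]
    df = pd.DataFrame(rows)
    print(df.head())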
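For the corpus-building step, a minimal sketch (the article titles, the User-Agent string, and the div.mw-parser-output selector are illustrative assumptions and may need adjusting if Wikipedia's markup changes) collects paragraph text from a few Wikipedia pages:

    import requests
    from bs4 import BeautifulSoup

    def page_text(title):
        """Fetch one Wikipedia article and return its paragraph text."""
        url = f"https://en.wikipedia.org/wiki/{title}"
        html = requests.get(url, headers={"User-Agent": "corpus-demo"}).text
        soup = BeautifulSoup(html, "html.parser")
        # Keep only the paragraphs of the main article body
        paragraphs = soup.select("div.mw-parser-output > p")
        return "\n".join(p.get_text(" ", strip=True) for p in paragraphs)

    # Illustrative titles; a real corpus would use a larger, curated list
    titles = ["Natural_language_processing", "SPARQL", "Linked_data"]
    corpus = [page_text(t) for t in titles]
    print(len(corpus), "documents;", corpus[0][:200])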
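And for the final use case, the two pieces can be chained: a SPARQL query selects the Wikipedia pages of entities in one domain (here, hypothetically, programming languages via dbo:ProgrammingLanguage and foaf:isPrimaryTopicOf), and each page is then scraped into a domain-specific corpus. Again, a sketch rather than the talk's exact pipeline:

    import requests
    from bs4 import BeautifulSoup
    from SPARQLWrapper import SPARQLWrapper, JSON

    # Step 1: use DBpedia to list Wikipedia pages of entities in one domain
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery("""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT ?page WHERE {
            ?lang a dbo:ProgrammingLanguage ;
                  foaf:isPrimaryTopicOf ?page .
        }
        LIMIT 10
    """)
    sparql.setReturnFormat(JSON)
    urls = [b["page"]["value"] for b in sparql.query().convert()["results"]["bindings"]]

    # Step 2: scrape each article's paragraphs into one corpus document
    def article_text(url):
        html = requests.get(url, headers={"User-Agent": "corpus-demo"}).text
        soup = BeautifulSoup(html, "html.parser")
        return "\n".join(p.get_text(" ", strip=True)
                         for p in soup.select("div.mw-parser-output > p"))

    domain_corpus = [article_text(u) for u in urls]
    print(f"Collected {len(domain_corpus)} domain-specific documents")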
Prerequisites: Basic knowledge of Python programming, Natural Language Processing, and SQL
Type: Talk (45 mins); Python level: Intermediate; Domain level: Beginner
                
Data Scientist @ ACI Worldwide | Edu Co-Lead @ Women in AI Ireland | Python and Data Science Instructor @ WAIA | ❤ NLP