Leveraging Linked Data using Python and SPARQL

Nabanita Roy

Case Study, Data Science, Databases, Natural Language Processing, Web

See in schedule: Wed, Jul 28, 10:30-11:15 CEST (45 min)

Wikipedia is the digital encyclopedia we use daily to look up facts and information. What could be better than being able to extract this wealth of crowd-sourced knowledge without writing traditional web scrapers? Community-driven projects such as DBpedia and Wikidata extract knowledge from Wikipedia and store it in structured form that can be retrieved with SPARQL and mined for a wide range of Data Science projects. In this talk, I will walk through the basics of the Open Web and show how to tap into this huge open database from Python.
The agenda includes the following:
• Why Wikipedia?
• Introduction to DBpedia and Wikidata
• Introduction to Linked Data
• How to query DBpedia/Wikidata (a minimal query sketch follows this agenda)
o Build SPARQL Query
o Use Python’s SPARQLWrapper
• Python Code Walkthrough to create
o A Tabular Dataset using SPARQL
o A Corpus for Language Models using Wikipedia and BeautifulSoup (see the second sketch after this agenda)
o A Use Case leveraging both SPARQLWrapper and Wikipedia to Create a Domain-Specific Corpus
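
For illustration, here is a minimal sketch of the kind of walkthrough the talk covers: it sends a hand-written SPARQL query to the public DBpedia endpoint via SPARQLWrapper and loads the result bindings into a pandas DataFrame as a tabular dataset. The specific query (scientists with their labels and birth dates) and the column handling are assumptions made for this example, not the exact queries used in the talk.

```python
from SPARQLWrapper import SPARQLWrapper, JSON
import pandas as pd

# Public DBpedia endpoint; any SPARQL endpoint works the same way
sparql = SPARQLWrapper("https://dbpedia.org/sparql")

# Example query (an assumption for this sketch): ten scientists with
# their English labels and birth dates
sparql.setQuery("""
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?person ?name ?birthDate WHERE {
        ?person a dbo:Scientist ;
                rdfs:label ?name ;
                dbo:birthDate ?birthDate .
        FILTER (lang(?name) = "en")
    }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# Each binding maps variable names to {"type": ..., "value": ...} dicts;
# keep only the values to get a flat, tabular dataset
rows = [{var: b[var]["value"] for var in b}
        for b in results["results"]["bindings"]]
df = pd.DataFrame(rows)
print(df.head())
```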

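A companion sketch for the corpus-building step: it downloads a Wikipedia article with requests, parses it with BeautifulSoup, and keeps only the paragraph text. The article titles, the User-Agent string, and the CSS selector for the article body are assumptions for this example; in the use case from the agenda, the titles would instead come from the SPARQL results (e.g., via each resource's Wikipedia link), yielding a domain-specific corpus.

```python
import requests
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "corpus-builder-demo/0.1"}  # assumed identifier

def wikipedia_paragraphs(title):
    """Return the paragraph texts of one English Wikipedia article."""
    url = f"https://en.wikipedia.org/wiki/{title}"
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Article prose sits in <p> tags inside the parser-output container
    # (selector assumed from Wikipedia's current HTML layout)
    paragraphs = soup.select("div.mw-parser-output p")
    return [p.get_text(" ", strip=True) for p in paragraphs
            if p.get_text(strip=True)]

# Titles are hard-coded here; in practice they could be taken from the
# SPARQL results above to build a domain-specific corpus
corpus = {t: wikipedia_paragraphs(t) for t in ["SPARQL", "Linked_data"]}
print({t: len(paras) for t, paras in corpus.items()})
```
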
Prerequisites – Basic knowledge of Python programming, Natural Language Processing, and SQL

Type: Talk (45 mins); Python level: Intermediate; Domain level: Beginner


Nabanita Roy

ACI Worldwide

Data Scientist @ ACI Worldwide | Edu Co-Lead @ Women in AI Ireland | Python and Data Science Instructor @ WAIA | ❤ NLP