Case Study | Data Science | Databases | Natural Language Processing | Web
Schedule: Wed, Jul 28, 10:30-11:15 CEST (45 min)
Wikipedia is the digital encyclopedia we use daily to look up facts and information. What could be better than extracting this vast wealth of crowd-sourced knowledge from Wikipedia without writing traditional web scrapers? Several community-driven projects extract knowledge from Wikipedia and store it in structured form, retrievable via SPARQL, so it can be mined for a wide range of Data Science projects. In this talk, I will walk through the basics of the Open Web and show how to query these huge open databases with Python.
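To give a flavour of what "retrievable using SPARQL" means: DBpedia exposes a public SPARQL endpoint that answers HTTP requests. A minimal sketch using only the standard library (the query and the endpoint URL are illustrative assumptions, not taken from the talk; the talk itself uses SPARQLWrapper):

```python
import json
import urllib.parse
import urllib.request

# Illustrative SPARQL query: countries and their capitals,
# using DBpedia's ontology namespace.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?country ?capital WHERE {
  ?country a dbo:Country ;
           dbo:capital ?capital .
} LIMIT 5
"""

def fetch(endpoint="https://dbpedia.org/sparql"):
    """Send QUERY to a SPARQL endpoint and return the parsed JSON results."""
    params = urllib.parse.urlencode({"query": QUERY, "format": "json"})
    with urllib.request.urlopen(f"{endpoint}?{params}") as resp:
        return json.load(resp)

# Example (requires network access):
#   for b in fetch()["results"]["bindings"]:
#       print(b["country"]["value"], "->", b["capital"]["value"])
```

The same request can be issued by any HTTP client; SPARQLWrapper simply wraps this plumbing in a friendlier API.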
The agenda includes the following:
• Why Wikipedia?
• Introduction to DBpedia and Wikidata
• Introduction to Linked Data
• How to query DBpedia/Wikidata
o Build SPARQL Query
o Use Python’s SPARQLWrapper
• Python Code Walkthrough to create
o A Tabular Dataset using SPARQL
o A Corpus for Language Models using Wikipedia and BeautifulSoup
o A Use Case Leveraging Both SPARQLWrapper and Wikipedia to Create a Domain-Specific Corpus
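As a taste of the tabular-dataset step above: a SPARQL endpoint (queried via SPARQLWrapper with JSON as the return format) replies in the standard SPARQL 1.1 JSON results shape, and flattening that into rows is mechanical. A sketch using a hard-coded sample response (the bindings below are invented for illustration, not real query output):

```python
import csv
import io

# Sample response in the standard SPARQL 1.1 JSON results format,
# as you would get from SPARQLWrapper's query().convert().
sample = {
    "head": {"vars": ["name", "birthDate"]},
    "results": {"bindings": [
        {"name": {"type": "literal", "value": "Ada Lovelace"},
         "birthDate": {"type": "literal", "value": "1815-12-10"}},
        {"name": {"type": "literal", "value": "Alan Turing"},
         "birthDate": {"type": "literal", "value": "1912-06-23"}},
    ]},
}

def bindings_to_rows(result):
    """Flatten SPARQL JSON results into a header list plus data rows."""
    cols = result["head"]["vars"]
    rows = [[b.get(c, {}).get("value", "") for c in cols]
            for b in result["results"]["bindings"]]
    return cols, rows

cols, rows = bindings_to_rows(sample)

# Write the flattened table as CSV (to a string buffer here;
# a file path would work the same way).
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(cols)
writer.writerows(rows)
print(buf.getvalue())
```

From here the CSV (or the rows directly) can be loaded into pandas or any other tooling, which is the shape of dataset the walkthrough builds.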
Prerequisites – Basic knowledge of Python programming, Natural Language Processing, and SQL
Type: Talk (45 mins); Python level: Intermediate; Domain level: Beginner
Data Scientist @ ACI Worldwide | Edu Co-Lead @ Women in AI Ireland | Python and Data Science Instructor @ WAIA | ❤ NLP