Case Study · Data Science · Databases · Natural Language Processing · Web
See in schedule: Wed, Jul 28, 10:30-11:15 CEST (45 min) Download/View Slides

Wikipedia is the digital encyclopedia we use daily to look up facts and information. What could be better than being able to extract this immense wealth of crowd-sourced knowledge from Wikipedia without writing traditional web scrapers? Various community-driven projects extract knowledge from Wikipedia and store it in structured form, retrievable using SPARQL, and this data can be mined for a range of Data Science projects. In this talk, I will walk through the basics of the Open Web and how to use Python to query this huge open database.
The agenda includes the following:
• Why Wikipedia?
• Introduction to DBpedia and Wikidata
• Introduction to Linked Data
• How to query DBpedia/WikiData
o Build SPARQL Query
o Use Python’s SPARQLWrapper
• Python Code Walkthrough to create
o A Tabular Dataset using SPARQL
o A Corpus for Language Models using Wikipedia and BeautifulSoup
o A Use Case leveraging both SPARQLWrapper and Wikipedia to Create a Domain-Specific Corpus (see the sketches after this list)
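To give a flavour of the first walkthrough item, here is a minimal sketch of building a SPARQL query, running it against DBpedia with SPARQLWrapper, and flattening the JSON results into a table. The public endpoint URL is real, but the specific query (countries with their capitals and populations) and the pandas step are illustrative choices, not necessarily the examples used in the talk.

```python
from SPARQLWrapper import SPARQLWrapper, JSON
import pandas as pd

# Public DBpedia SPARQL endpoint
sparql = SPARQLWrapper("https://dbpedia.org/sparql")

# Illustrative query: countries with their capitals and total populations
sparql.setQuery("""
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?country ?capital ?population WHERE {
  ?country a dbo:Country ;
           dbo:capital ?capital ;
           dbo:populationTotal ?population .
}
LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# Each binding maps variable names to {"type": ..., "value": ...}; keep the values
rows = [{var: b[var]["value"] for var in b}
        for b in results["results"]["bindings"]]
df = pd.DataFrame(rows)  # a tabular dataset ready for analysis
print(df.head())
```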
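For the corpus-building item, a sketch of pulling raw article text with requests and BeautifulSoup. The article titles and the choice to keep only paragraph tags are assumptions made for illustration; the talk may use a different page set or the wikipedia package instead.

```python
import requests
from bs4 import BeautifulSoup

def page_text(title):
    """Fetch a Wikipedia article and return its paragraph text."""
    url = f"https://en.wikipedia.org/wiki/{title}"
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    # The article body lives mostly in <p> tags; drop empty paragraphs
    paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
    return "\n".join(p for p in paragraphs if p)

# Illustrative titles; any list of article titles would work
corpus = {t: page_text(t) for t in ["Python_(programming_language)", "SPARQL"]}
print(corpus["SPARQL"][:300])
```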
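And for the combined use case, a sketch that first uses SPARQL to select the entities of a domain (programming languages here, purely as an example) and then scrapes each entity's Wikipedia article to assemble a domain-specific corpus.

```python
from SPARQLWrapper import SPARQLWrapper, JSON
import requests
from bs4 import BeautifulSoup

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?page WHERE {
  ?lang a dbo:ProgrammingLanguage ;       # illustrative domain
        foaf:isPrimaryTopicOf ?page .     # link back to the Wikipedia article
}
LIMIT 20
""")
sparql.setReturnFormat(JSON)
urls = [b["page"]["value"]
        for b in sparql.query().convert()["results"]["bindings"]]

corpus = []
for url in urls:
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
    corpus.append(" ".join(p.get_text(" ", strip=True) for p in soup.find_all("p")))
print(len(corpus), "documents collected")
```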
Prerequisites – Basic knowledge of Python programming, Natural Language Processing, and SQL
Type: Talk (45 mins); Python level: Intermediate; Domain level: Beginner
Data Scientist @ ACI Worldwide | Edu Co-Lead @ Women in AI Ireland | Python and Data Science Instructor @ WAIA | ❤ NLP