Data Analysis with pandas - Workshop

A hands-on beginners' workshop on data analytics using the PyData stack

Marco Bonzanini

Data, Data Science

Schedule: Tue, Jul 27, 13:15-14:45 CEST (90 min)

This is a hands-on workshop to help the audience get familiar with pandas, one of the main Python libraries for data analytics.

pandas is a crucial data analysis library that provides the `DataFrame` structure, a tabular representation of your data that you can use to easily inspect, summarise, select, filter, transform, sort, aggregate and plot your data.

This workshop will showcase code examples using pandas to perform all of the above operations, and will provide exercises that you can use to consolidate your new knowledge.

Outline:

** pandas basics (demo by the trainer, approx 30-45 minutes)
- Loading data from CSV files
- Inspecting the data, summary statistics
- Data selection and filtering (e.g. boolean indexing, column selection)
- Data transformation (e.g. ``apply()``, ``map()``)
- Sorting values
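
As a rough preview of the topics listed above, here is a minimal sketch. It is not taken from the workshop material: the CSV content, the column names and the city-to-country mapping are made up for illustration, and the data is read from an in-memory string so the snippet runs without a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content, inlined so the sketch runs without a file on disk.
# With a real file you would call pd.read_csv("path/to/file.csv") instead.
csv_text = """name,city,age,salary
Alice,London,34,52000
Bob,Paris,45,48000
Carla,London,29,61000
Dan,Rome,52,39000
"""

# Loading data from CSV
df = pd.read_csv(io.StringIO(csv_text))

# Inspecting the data and summary statistics
print(df.head())
print(df.describe())

# Column selection and boolean indexing
ages = df["age"]
londoners = df[df["city"] == "London"]

# Data transformation with apply() and map()
df["salary_k"] = df["salary"].apply(lambda s: s / 1000)
df["country"] = df["city"].map(
    {"London": "UK", "Paris": "France", "Rome": "Italy"}
)

# Sorting values, oldest first
by_age = df.sort_values("age", ascending=False)
print(by_age)
```

For simple arithmetic like the ``salary_k`` column, a vectorised expression such as ``df["salary"] / 1000`` is usually preferred; ``apply()`` is used here only to show the call.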

** Exercises (work on your solutions, approx 45 minutes)

** Break

** pandas operations (demo by the trainer, approx 30-45 minutes)
- Data aggregation (``groupby()``)
- Joining ``DataFrame`` objects (``merge()``)
- Basics of data visualisation with pandas (``plot()``)
- Optional: more complex data visualisation with matplotlib or seaborn
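
The operations listed above could look roughly like the sketch below. The two small tables and their column names are hypothetical, not part of the workshop material. The plotting step assumes matplotlib is installed and is skipped otherwise.

```python
import pandas as pd

# Hypothetical data, made up for illustration only.
orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3, 2],
    "amount": [20.0, 35.0, 15.0, 50.0, 5.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Alice", "Bob", "Carla"],
})

# Data aggregation: total amount per customer
totals = orders.groupby("customer_id")["amount"].sum().reset_index()

# Joining DataFrame objects on a shared column
report = customers.merge(totals, on="customer_id", how="left")
print(report)

# Basic visualisation with pandas; plot() relies on matplotlib
try:
    import matplotlib

    # Non-interactive backend so the sketch also runs without a display;
    # this line is not needed inside a Jupyter notebook.
    matplotlib.use("Agg")
    ax = report.plot(x="name", y="amount", kind="bar")
    ax.set_ylabel("total amount")
except ImportError:
    print("matplotlib is not installed; skipping the plot")
```

A left join keeps every customer in the result even when no matching orders exist; in that case the ``amount`` value would be ``NaN``.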

** Exercises (work on your solutions, approx 45 minutes)

** Wrap up: suggestions for a capstone project to take home
- Brief discussion on how to apply your new knowledge to get useful insights from an interesting dataset

Intended audience:

This workshop is for students, developers, researchers, engineers, analysts and anybody who knows a little bit of Python and would like to start using it for data analytics. You should be somewhat familiar with basic Python syntax, but the assumption is that you are new to pandas. No expert knowledge is required.

Setup instructions:

This is a hands-on session so please bring your laptop and expect to write some code :)

The material is shared on this public repository:
https://github.com/bonzanini/data-analytics-workshop

We'll use Python 3.8, Jupyter, pandas and matplotlib (suggestion: have a fresh installation of Anaconda Python from anaconda.com/download).

If you're attending, please go through the setup instructions in the repository above before coming to the workshop.

Type: Training (180 mins); Python level: Beginner; Domain level: Beginner


Marco Bonzanini

Bonzanini Consulting Ltd

Data Science Consultant and Trainer based in London, UK.

Co-organiser of the PyData London meet-up, and co-chair of the PyData London conference.

Python publications:

- Mastering Social Media Mining with Python (book, PacktPub, 2016)
- Data Analysis with Python (video course, PacktPub, 2017)
- Practical Python Data Science Techniques (video course, PacktPub, 2017)