Python basic - Harvest data from the web In-Person

This course is aimed at students who are new to web scraping but already have an introductory understanding of Python, such as can be gained from our “Python for Absolute Beginners” courses, parts 1 and 2.
We expect you to be familiar with all the concepts described under "Python for Absolute Beginners" here:
https://kubdatalab.github.io/python

This course is about web scraping, also known as web harvesting or data extraction. A basic understanding of Python programming is recommended.

The course provides insight into how you can use Python to collect data from the web. We start by discussing HTML and examining HTML elements and attributes. Then, we try working with HTML in Python, and you are introduced to two libraries: Requests and BeautifulSoup. We attempt to locate data within the HTML structure using the methods .find and .find_all, and we read/extract data from the structure. We conclude with a mini-project that involves harvesting text data from a Wikipedia page.
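As a rough illustration of the kind of code the course works toward, the sketch below parses a small HTML string with BeautifulSoup and uses .find and .find_all to locate elements and extract their text. The HTML snippet, tag names, and variable names are made up for this example and are not taken from the course material. In practice the HTML would typically be fetched with Requests first, for example with requests.get(url).text.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration. In a real script this
# string might come from requests.get(url).text instead.
html = """
<html>
  <body>
    <h1 id="title">Example page</h1>
    <p class="intro">First paragraph.</p>
    <p>Second paragraph with a <a href="https://example.org">link</a>.</p>
  </body>
</html>
"""

# Parse the HTML with Python's built-in parser.
soup = BeautifulSoup(html, "html.parser")

# .find returns the first matching element (or None if nothing matches).
heading = soup.find("h1")
heading_text = heading.get_text()

# .find_all returns a list of every matching element.
paragraphs = [p.get_text() for p in soup.find_all("p")]

# Attributes are read with dictionary-style access on an element.
links = [a["href"] for a in soup.find_all("a")]

print(heading_text)
print(paragraphs)
print(links)
```

Here .find is used where a single element is expected (the heading), and .find_all where there may be several (paragraphs and links).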

Regardless of your academic background, the course will be relevant if you are interested in collecting material for your assignments or if you are simply interested in more advanced use of Python.

The course is based on material available here: Harvest data from the web

Before the course, please have Python installed on your computer, as well as either Jupyter Notebook or Jupyter Lab. The easiest way is to download and install the Anaconda package, as it provides everything at once. However, if you prefer not to do this, here is a guide on how to install Python first and then Jupyter.

Related LibGuide: KUB Datalab by Christian Knudsen

Date: 19/03/2025
Time: 9:30 - 11:30
Time Zone: Central European Time
Location: KUB South Campus (KUB Søndre Campus): teaching room (Undervisningslokalet), 1st floor
Campus: KUB South Campus, Karen Blixens Plads 7, 2300 København S
Audience: Level 2 - Basic
Categories: English, Harvesting, Python, Datalab

Registration has closed.

Event Organizer

Lars Kjær

lakj@kb.dk