Harvest data from the web with Python

In-Person

This course is about web scraping, also known as web harvesting or web data extraction. A basic understanding of Python programming is recommended.

The course provides insight into how you can use Python to collect data from the web. We start by discussing HTML and examining HTML elements and attributes. Then we work with HTML in Python, and you are introduced to two libraries: Requests and BeautifulSoup. We locate data within the HTML structure using the methods .find() and .find_all(), and we extract data from that structure. We conclude with a mini-project that involves harvesting text data from a Wikipedia page.
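
To give a feel for the workflow described above, here is a minimal sketch. It is not taken from the course material: the sample HTML and the helper names fetch_html and extract_text are made up for illustration. Requests downloads a page, BeautifulSoup parses it, and .find() and .find_all() locate elements in the parsed structure.

```python
import requests
from bs4 import BeautifulSoup

# A small, made-up HTML document so the example runs without network access.
SAMPLE_HTML = """
<html>
  <body>
    <h1 id="title">Web scraping</h1>
    <p class="intro">Web scraping is data extraction from websites.</p>
    <p>It is also called <a href="/wiki/Web_harvesting">web harvesting</a>.</p>
  </body>
</html>
"""


def fetch_html(url):
    """Download a page and return its HTML as text (requires network access)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an error on 4xx/5xx responses
    return response.text


def extract_text(html):
    """Parse HTML and pull out the first heading, all paragraphs, and all link targets."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1")        # .find() returns the first match, or None
    paragraphs = soup.find_all("p")  # .find_all() returns a list of all matches
    links = [a["href"] for a in soup.find_all("a", href=True)]
    return {
        "heading": heading.get_text().strip() if heading else None,
        "paragraphs": [p.get_text().strip() for p in paragraphs],
        "links": links,
    }


result = extract_text(SAMPLE_HTML)
print(result["heading"])
for paragraph in result["paragraphs"]:
    print(paragraph)
```

To try this on a live page, you could pass the output of fetch_html to extract_text, for example with the URL of a Wikipedia article. Real pages are usually more deeply nested than this sample, so the tag names you search for will likely need adjusting.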

Regardless of your academic background, the course will be relevant if you are interested in collecting material for your assignments or if you are simply interested in more advanced use of Python.

The course is based on the material "Harvest data from the web".

Before the course, please install Python on your computer, along with either Jupyter Notebook or JupyterLab. The easiest way is to download and install the Anaconda distribution, which provides everything at once. If you prefer not to use Anaconda, a guide is available on how to install Python first and then Jupyter.

Related LibGuide: KUB Datalab by Christian Knudsen

Date:
28/04/2025
Time:
9:30 - 11:30
Time Zone:
Central European Time
Location:
Library Lighthouse, zone 1
Campus:
KUB Nørre Campus, Nørre Allé 49, 2200 København N
Audience:
Level 3 - Advanced / Øvet
Categories:
English, Harvesting, Python, Datalab

Registration is required. There are 22 seats available.

Event Organizer

Lars Kjær