Harvest data from the web with Python (In-Person)

This course is about web scraping, also known as web harvesting or data extraction. A basic understanding of Python programming is recommended.

The course shows how you can use Python to collect data from the web. We start by discussing HTML and examining HTML elements and attributes. Then we work with HTML in Python, where you are introduced to two libraries: Requests and BeautifulSoup. We locate data within the HTML structure using the methods .find and .find_all, and we extract data from that structure. We conclude with a mini-project that involves harvesting text data from a Wikipedia page.
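To give a feel for the workflow described above, here is a minimal sketch, not the course's own material. The HTML snippet, the function names fetch_html and extract, and the tag and class names are all hypothetical and chosen for illustration. It assumes the beautifulsoup4 package is installed; the download step additionally assumes the requests package and network access, so it is defined but not called here.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet standing in for a downloaded page.
SAMPLE_HTML = """
<html>
  <body>
    <h1 id="title">Web scraping</h1>
    <p class="intro">Web scraping is data extraction from websites.</p>
    <p>See <a href="/wiki/HTML">HTML</a> and <a href="/wiki/Python">Python</a>.</p>
  </body>
</html>
"""


def fetch_html(url):
    """Download a page and return its HTML as text (needs network access)."""
    import requests  # imported here so the offline part runs without it

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an error on 4xx/5xx responses
    return response.text


def extract(html):
    """Pull a heading, an intro paragraph and all links out of an HTML string."""
    soup = BeautifulSoup(html, "html.parser")

    heading = soup.find("h1")              # .find returns the first match
    intro = soup.find("p", class_="intro")  # filter on an attribute value
    links = soup.find_all("a")             # .find_all returns every match

    return {
        "heading": heading.get_text(strip=True),
        "intro": intro.get_text(strip=True),
        "links": [(a.get_text(), a["href"]) for a in links],
    }


result = extract(SAMPLE_HTML)
print(result["heading"])
for text, href in result["links"]:
    print(text, href)
```

To try the same idea on a live page, you could pass the text returned by fetch_html to extract, adjusting the tag and class names to match that page's actual structure.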

Regardless of your academic background, the course will be relevant if you are interested in collecting material for your assignments or if you are simply interested in more advanced use of Python.

The course is based on material available here: Harvest data from the web

Before the course, please have Python installed on your computer, as well as either Jupyter Notebook or JupyterLab. The easiest way is to download and install the Anaconda distribution, as it provides everything at once. However, if you prefer not to do this, here is a guide on how to install Python first and then Jupyter.

Related LibGuide: KUB Datalab by Christian Knudsen

Date: 21/02/2025
Time: 9:30 - 11:30
Time Zone: Central European Time
Location: KUB Datalab - Samf
Campus: KUB City Campus, Gothersgade 140, 1123 København K
Audience: Level 3 - Intermediate / Øvet
Categories: English, Harvesting, Python, Datalab

Registration has closed.

Event Organizer

Lars Kjær

lakj@kb.dk