
Text Mining with Python

Text Mining with Python In-Person

In this course, you will learn about concepts such as text cleaning, lemmatization, part-of-speech (POS) tagging, term frequency, TF-IDF, and collocations. A basic understanding of Python programming is recommended.

We begin by downloading and extracting text from a digitized historical book from the Royal Danish Library. After that, we look at cleaning techniques and stopwords.  
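As a rough idea of what cleaning and stopword removal can look like, here is a minimal sketch using only the Python standard library. The `clean_text` helper, the tiny stopword list, and the sample sentence are assumptions made for illustration; they are not taken from the course material.

```python
import re

# A tiny, hand-picked stopword list for illustration only (an assumption);
# real projects typically use a fuller list.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "was"}


def clean_text(raw: str) -> list[str]:
    """Lowercase the text, keep only runs of letters, and drop stopwords."""
    lowered = raw.lower()
    # [^\W\d_]+ matches runs of Unicode letters, so Danish æ, ø, å survive,
    # while digits and punctuation are discarded.
    tokens = re.findall(r"[^\W\d_]+", lowered)
    return [t for t in tokens if t not in STOPWORDS]


# Made-up sample sentence
sample = "The King of Denmark was in København, and the year is 1848!"
tokens = clean_text(sample)
print(tokens)
```

Whether numbers and punctuation should be dropped depends on the question being asked of the text; this sketch simply discards them.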

We then explore the NLTK library and use it for lemmatization and POS tagging. Next, we build a term-frequency tool that visualizes the most common words in the text.
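The term-frequency idea can be sketched with the standard library alone. This example does not use NLTK or a plotting library; the `term_frequencies` helper and the token list are made up for illustration, and the text "bar chart" is only a stand-in for a real visualization.

```python
from collections import Counter


def term_frequencies(tokens, top_n=3):
    """Count how often each token occurs and return the top_n most common."""
    counts = Counter(tokens)
    return counts.most_common(top_n)


# Made-up, already-cleaned tokens
tokens = "ship sea ship storm sea ship harbour".split()
top_terms = term_frequencies(tokens)

# Print a simple text bar chart, one '#' per occurrence
for word, count in top_terms:
    print(f"{word:<8} {'#' * count} ({count})")
```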

Then we look at a method for finding words that characterize a text even though they are not its most common words. For that, we use the TF-IDF algorithm and get a high-level understanding of how it works. We conclude by building a collocations tool that finds words that often appear in the same context.
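To give a high-level feel for TF-IDF, here is a minimal sketch of one common textbook formulation: term frequency multiplied by the logarithm of the inverse document frequency. The `tf_idf` function and the three short "documents" are assumptions for illustration, and the sketch assumes every document contains at least one token. Libraries often use smoothed or normalized variants, so their scores may differ from these.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document.

    tf  = count of term in document / number of tokens in document
    idf = log(number of documents / number of documents containing term)
    """
    n_docs = len(documents)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Made-up, already-tokenized documents
docs = [
    "the ship left the harbour".split(),
    "the storm hit the coast".split(),
    "the king visited the coast".split(),
]
scores = tf_idf(docs)

# Print the terms of the first document, highest score first
for term, score in sorted(scores[0].items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term:<8} {score:.3f}")
```

In this sketch, "the" occurs in every document, so its inverse document frequency is log(1) = 0 and its score is zero even though it is the most frequent word. Words that occur in only one document receive the highest scores.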

If you have to hand in an assignment that must include text mining, this course is an ideal way to find inspiration.

The course is based on material available here:

Before the course, please have Python installed on your computer, as well as either Jupyter Notebook or Jupyter Lab. The easiest way is to download and install the Anaconda package, as it provides everything at once. However, if you prefer not to do this, here is a guide on how to install Python first and then Jupyter.

Related LibGuide: KUB Datalab by Christian Knudsen

Date:
28/04/2025
Time:
12:30 - 15:00
Time Zone:
Central European Time
Location:
Library Lighthouse, zone 1
Campus:
KUB Nørre Campus, Nørre Allé 49, 2200 København N
Audience:
  Level 3 - Advanced / Øvet  
Categories:
Analysis, Cleaning, English, Python, Visualisation, Datalab

Registration is required. There are 22 seats available.

Event Organizer

Lars Kjær