Python basic - Text Mining In-Person

This course is aimed at students who are new to Text Mining but already have an introductory understanding of Python, such as can be gained from our “Python for Absolute Beginners” courses - pt. 1 and 2.
We expect you to be familiar with all the concepts described under "Python for Absolute Beginners" here:
https://kubdatalab.github.io/python

In this course, you will learn about concepts such as text cleaning, lemmatization, part-of-speech (POS) tagging, term frequency, TF-IDF, and collocations. A basic understanding of Python programming is recommended.

We begin by downloading and extracting text from a digitized historical book from the Royal Danish Library. After that, we look at cleaning techniques and stopwords.

We then explore the NLTK library and use it for lemmatization and POS tagging. Next, we build a term-frequency tool that visualizes the most common words in the text.
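
To give a rough idea of what counting term frequencies involves, here is a minimal sketch that uses only the Python standard library. The sample text, the tiny stopword list, and the function name term_frequencies are made up for illustration. The course itself works with a historical book and NLTK, and its tool may look quite different.

```python
import re
from collections import Counter

# Hypothetical sample text and stopword list, for illustration only.
SAMPLE_TEXT = "The king rode to the castle. The castle was old, and the king was tired."
STOPWORDS = {"the", "to", "was", "and"}


def term_frequencies(text, stopwords):
    """Lowercase the text, split it into words, drop stopwords, and count the rest."""
    # Keep runs of letters only (including the Danish letters æ, ø and å).
    tokens = re.findall(r"[a-zæøå]+", text.lower())
    return Counter(token for token in tokens if token not in stopwords)


frequencies = term_frequencies(SAMPLE_TEXT, STOPWORDS)

# Show the two most frequent remaining words with their counts.
print(frequencies.most_common(2))
```

Removing stopwords before counting is what keeps words such as "the" from dominating the result.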

Then we look at a method for finding words that are distinctive for a text even though they are not the most common ones. For that, we use the TF-IDF algorithm and get a high-level understanding of how it works. We conclude by building a collocations tool that can find words that often appear in the same context.
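
The idea behind TF-IDF can be sketched in a few lines of plain Python. This is a simplified textbook variant (term frequency multiplied by the logarithm of the inverse document frequency), not necessarily the exact formula used in the course, and libraries often apply smoothed versions. The mini-corpus and the function name tf_idf are hypothetical.

```python
import math
from collections import Counter

# Hypothetical mini-corpus of already tokenized documents, for illustration only.
DOCUMENTS = {
    "doc_a": ["king", "castle", "king", "horse"],
    "doc_b": ["king", "ship", "sea"],
    "doc_c": ["king", "castle", "forest"],
}


def tf_idf(documents):
    """Return a TF-IDF score for every term in every document (unsmoothed variant)."""
    n_docs = len(documents)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(set(tokens))

    scores = {}
    for name, tokens in documents.items():
        counts = Counter(tokens)
        scores[name] = {
            # Relative frequency in this document times log of inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


scores = tf_idf(DOCUMENTS)

# List the terms of doc_a from most to least distinctive.
for term, score in sorted(scores["doc_a"].items(), key=lambda item: item[1], reverse=True):
    print(term, round(score, 3))
```

In this toy corpus "king" is the most frequent word in doc_a, but because it occurs in every document its score is zero, while the rarer "horse" scores highest.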

If you have an upcoming assignment that must include text mining, taking part in the course is a good way to find inspiration.

The course is based on material available here:

Before the course, please have Python installed on your computer, as well as either Jupyter Notebook or Jupyter Lab. The easiest way is to download and install the Anaconda package, as it provides everything at once. However, if you prefer not to do this, here is a guide on how to install Python first and then Jupyter.

Related LibGuide: KUB Datalab by Christian Knudsen