An alternative to Excel: OpenRefine can do really cool stuff

In-Person

MS Excel is a widely used tool for handling tabular data, but it can also be a source of frustration. Its quirks have even given rise to a number of "horror stories" about Excel-related mishaps. See The Turing Way for more "fun". To be fair, the best solution may very well be to use OpenRefine and Excel together, as each has its own strengths and weaknesses.

OpenRefine lets you handle datasets in a different way, one that avoids these kinds of mishaps. It can even do things that Excel cannot, e.g. make data entries consistent.

This introductory course will show you how to investigate and clean your dataset, for example how to:

  • find outliers
  • find inconsistencies (spelling errors, differences in abbreviations)
  • isolate specific values
  • check whether all values are numbers
  • check for blank fields
  • sort your data
  • check for duplicates
  • remove duplicates
  • remove unwanted white space
  • merge cells, or split cells based on a delimiter or on field length
  • and much more

During the two-hour course you will be shown how to use OpenRefine for the tasks above. In the last part of the course you will have time to try out the functions on the dataset I use in my demonstration. You are also welcome to bring your own data to work on.

Preparation for the course:

Please have OpenRefine installed before the course, as there will not be time to install it during the session. See here for how to install OpenRefine.

After the course:

After the introduction you can find the course manual and the dataset used in the course here, so you can repeat the steps we took at your own pace. NB: the material is not available until after the introduction :-)

Related LibGuide: KUB Datalab by Christian Knudsen

Date:
10/10/2022
Time:
10:00 - 12:00
Time Zone:
Central European Time
Campus:
SAMF: Det Samfundsvidenskabelige Fakultetsbibliotek, Gothersgade 140, 1123 København K
Categories:
  Cleaning, Datalab, English
Registration has closed.

Event Organizer

Asger Væring Larsen