Online introduction to OpenRefine – how to clean messy data

Data is often messy and needs cleaning before you can analyze and visualize it. For example, you may need to merge different spellings of the same value (e.g. ‘Denmark’ and ‘DK’) or delete blanks before a number. OpenRefine is a free, open source tool that can help you clean messy data.
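To make the two examples above concrete, here is a minimal sketch in Python. It is only an illustration of what "cleaning" means, not how OpenRefine works internally and not part of the course material. The mapping, function names, and sample rows are hypothetical.

```python
# Hypothetical mapping from spelling variants (lowercased) to one canonical form.
CANONICAL = {"dk": "Denmark", "denmark": "Denmark"}


def clean_country(value: str) -> str:
    """Trim surrounding blanks and merge known spelling variants."""
    trimmed = value.strip()
    # Unknown values are returned trimmed but otherwise unchanged.
    return CANONICAL.get(trimmed.lower(), trimmed)


def clean_number(value: str) -> int:
    """Remove blanks around a number stored as text, then convert it."""
    return int(value.strip())


# Made-up sample rows: (country, number stored as text).
rows = [("Denmark", " 42"), ("DK", "  7"), (" denmark ", "13")]
cleaned = [(clean_country(country), clean_number(number)) for country, number in rows]
print(cleaned)
```

In OpenRefine, comparable steps are done through the graphical user interface or with GREL expressions rather than Python.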
In the course you will:

  • Import an Excel file into OpenRefine.
  • Work with several data cleaning options in OpenRefine.
  • Edit the data via OpenRefine’s graphical user interface as well as via simple coding.
  • Export the cleaned data back to an Excel file.

You can work with your own data (an Excel file), or you can use the Excel file provided by the instructor. If you bring your own file, you will have the opportunity to test whether OpenRefine is the right tool for your data.

The course is a basic introduction. You are not expected to have worked with OpenRefine before the course.

Preparation before the course:

  • Install OpenRefine on your computer: https://openrefine.org/download.html
  • If possible, bring your own Excel file with messy data. If you have specific questions about how to clean your data, please send the question and the Excel file to kubdatalab@kb.dk at least two days before the course.

After the course you will have:

  • A basic understanding of OpenRefine: which tasks the tool is suited for and which tasks other tools handle better
  • Imported an Excel file into OpenRefine and exported it again
  • Worked with simple data cleaning operations via OpenRefine’s graphical user interface and via OpenRefine’s coding option (GREL)

Materials

  • Instructor’s presentation (to be uploaded)
  • Exercises for the course (we will work with these exercises during the course) (to be uploaded)
  • Excel file you can use if you don’t bring your own data (to be uploaded)
  • Introduction to OpenRefine (textbook) (to be uploaded)

Where

A Zoom link will be sent to attendees the day before the course.

Please also note: If any attendees do not speak Danish, the course will be held in English. If all attendees speak Danish, it will be held in Danish.

Related LibGuide: KUB Datalab by Halle Rashdan

Date:
22/03/2021
Time:
14:00 - 15:00
Time Zone:
Central European Time
Location:
Online on Zoom
Categories:
  Cleaning     Datalab     English  

Registration is required. There are 24 seats available.

Event Organizer

Marianne Gauffriau
Erik Schwägermann