OpenRefine for non-programmers: Data Cleaning, Mining, Transformations, and Text Normalization

Abstract

OpenRefine is a tool for working with semi-structured datasets. It lets you explore data, use facets to easily find patterns, detect inconsistencies, and apply quick clean-up and transformation options. OpenRefine is an often intuitive but powerful tool for normalizing data before importing the dataset into a presentation application (e.g., for mapping, charting, or analysis). In this hands-on class, we'll explore how Refine can help with common data cleaning challenges.

Publication Date

Series Dates: Fall of 2011 to the Spring of 2017

OpenRefine is an open source data science tool for data cleaning, transformation, clustering, editing, augmenting, and more. The web interface is easier to use than command-line interfaces or programming tools. However, OpenRefine is also extensible and programmable, making it a useful, reproducible data tool that can expand with your needs. I often use regular expressions inside OpenRefine’s GREL expressions to find or transform patterns. Additionally, OpenRefine can be used to orchestrate API tools for web scraping and normalizing content on the web.
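
To give a sense of the kind of pattern-based cleanup regular expressions make possible, here is a minimal sketch in Python rather than GREL. The function names, the sample values, and the choice of date format are assumptions made for illustration only; they are not taken from the class materials. In GREL, a comparable whitespace cleanup could be written roughly as value.replace(/\s+/, " ").trim().

```python
import re


def normalize_whitespace(value: str) -> str:
    """Collapse runs of whitespace to a single space and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


def reformat_us_date(value: str) -> str:
    """Rewrite MM/DD/YYYY as YYYY-MM-DD; leave non-matching values untouched."""
    match = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", value.strip())
    if match is None:
        return value
    month, day, year = match.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"


# Hypothetical messy column values, invented for this example.
rows = ["  New   York ", "3/7/2015", "not a date"]
cleaned = [reformat_us_date(normalize_whitespace(v)) for v in rows]
print(cleaned)
```

Values that do not match the date pattern pass through unchanged, which mirrors a cautious habit when transforming a column: change only the cells the pattern clearly describes, then review the rest with a facet.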