Reading 3
- Hadley Wickham, 2014, Tidy Data
- Broman & Woo, 2018, Data organization in spreadsheets
This reading focuses on the process of cleaning data, also known as data “munging” or data “wrangling”. We focus primarily on spreadsheet-style data, because it is likely to be some of the first data you encounter in future classes and it is ubiquitous in real-world scenarios. Tidy data also comes up when working with images, text, video, and other data types, but the skills and theory behind tidy data (as presented in these readings) are foundational and a prerequisite for wrangling those more complex data types. We will see an example of wrangling more complex data in the final weeks of class, during a talk on deep learning.
Additional Resources
Here is an example from Garrett Grolemund showing how a programming language such as R can be used to work with a dataset and quickly turn it into tidy data. The walk-through gives some insight into how data is wrangled, as well as some of the benefits an analyst gains from working with tidy data.
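To make the idea of tidying concrete, here is a minimal sketch of one common step: reshaping a "wide" table (one column per year) into a "long" table where each row is a single observation. It is written in Python using only the standard library, rather than R, and it is not taken from the walk-through above. The table, its column names (`site`, `year`, `count`), the values, and the helper `to_tidy` are all invented for illustration.

```python
# Hypothetical "wide" table: one row per site, one column per year.
# All names and numbers here are made up for illustration.
wide_rows = [
    {"site": "A", "2019": 12, "2020": 15},
    {"site": "B", "2019": 7, "2020": 9},
]


def to_tidy(rows, id_column, variable_name, value_name):
    """Reshape wide rows into long rows: one observation per row.

    Every column other than `id_column` is treated as a level of a
    single variable (here, the year) rather than as its own variable.
    """
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on each output row
            tidy.append({
                id_column: row[id_column],
                variable_name: column,
                value_name: value,
            })
    return tidy


tidy_rows = to_tidy(
    wide_rows, id_column="site", variable_name="year", value_name="count"
)

for row in tidy_rows:
    print(row)
```

In the long form, each variable (site, year, count) has its own column, so a question such as "what were the counts in 2020?" becomes a single filter on the `year` column instead of a lookup by column name. In practice this reshape is usually done with a library function, for example `pivot_longer` from R's tidyr package or `melt` in pandas, rather than a hand-written loop.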