A Beginner’s Guide to Taming Messy Ecological Data in R
read_csv() · pivot_longer() · ggplot2

All our data cleaning and wrangling is powered by the Tidyverse, a collection of R packages designed for clean, readable data workflows.
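To see how those three tools fit together before we open the real file, here is a minimal sketch of the read → reshape → plot pipeline. It assumes readr (2.0 or later, which reads literal text wrapped in I()), tidyr, and ggplot2 are installed. It also uses a few made-up rows in place of the workshop CSV, so the numbers are illustrative only and are not NASS values.

```r
# Load only the Tidyverse packages this sketch needs
library(readr)
library(tidyr)
library(ggplot2)

# Made-up rows (NOT real NASS figures) using two of the dataset's stressor columns
bees <- read_csv(I("Quarter,State,Varroa_Mites_Pct,Pesticides_Pct
Q1_2024,Alabama,20.5,3.1
Q1_2024,Georgia,10.0,0.0
Q2_2024,Alabama,35.2,6.4"), show_col_types = FALSE)

# Reshape wide -> long: one row per state, quarter, and stressor
bees_long <- pivot_longer(
  bees,
  cols = ends_with("_Pct"),   # every stressor column ends in _Pct
  names_to = "Stressor",      # old column names go here
  values_to = "Pct"           # their percentages go here
)

# Build (but don't yet draw) a grouped bar chart, one panel per state
p <- ggplot(bees_long, aes(x = Quarter, y = Pct, fill = Stressor)) +
  geom_col(position = "dodge") +
  facet_wrap(~ State) +
  labs(y = "% of colonies affected")
```

Printing p draws the chart. Reshaping to long format first is what lets a single fill = Stressor mapping color every stressor, rather than adding one geom per column.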
Q1: What’s your current familiarity with R?
Q2: How do you plan to use these skills?
Source: USDA National Agricultural Statistics Service (NASS)
esmis.nal.usda.gov/publication/honey-bee-colonies
Tracks the percentage of U.S. honey bee colonies affected by various stressors across 44 states and 6 quarters (Q1 2024 – Q2 2025).
| Variable | Type | Description |
|---|---|---|
| Quarter | character | Survey quarter (e.g. Q1_2024) |
| Period | character | Text label (e.g. Jan-Mar 2024) |
| State | character | U.S. state name; inconsistently cased in the raw data |
| Varroa_Mites_Pct | numeric | % of colonies affected by Varroa mites |
| Other_Pests_Parasites_Pct | numeric | % affected by other pests or parasites |
| Diseases_Pct | numeric | % affected by diseases |
| Pesticides_Pct | numeric | % affected by pesticide exposure |
| Other_Pct | numeric | % affected by other causes |
| Unknown_Pct | numeric | % affected by unknown causes |
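Because State is inconsistently cased in the raw data, the same state can appear under several spellings and be counted as several different states. One way to fix that, sketched here with made-up rows rather than the real file and assuming dplyr and stringr are installed, is to squish stray whitespace and convert every name to title case.

```r
library(readr)
library(dplyr)
library(stringr)

# Made-up rows that mimic the raw file's inconsistent State casing
raw <- read_csv(I("Quarter,State,Varroa_Mites_Pct
Q1_2024,alabama,20.5
Q2_2024,ALABAMA,35.2
Q1_2024,New york,12.0
Q2_2024,new YORK,18.4"), show_col_types = FALSE)

# Before cleaning, each casing variant counts as a separate "state"
n_distinct(raw$State)    # 4

# str_squish() trims extra spaces; str_to_title() normalises case ("new YORK" -> "New York")
clean <- raw |>
  mutate(State = str_to_title(str_squish(State)))

n_distinct(clean$State)  # 2
```

On the full messy file, the same n_distinct(State) check works as a quick sanity test: once every variant is caught, it should come out at the 44 states the source describes.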
To follow along, please download the workshop materials:
The folder includes:
- honeybee_stressors_messy.csv: the “messified” dataset we just looked at
- stressors.csv: the original dataset from the USDA
- workshop_template.qmd: the starter file we’ll code in together
- workshop_solution.qmd: the completed file to reference if you miss a step!

| Resource | Link |
|---|---|
| R for Data Science | r4ds.hadley.nz |
| ggplot2 Cheatsheet | rstudio.github.io/cheatsheets |
| Tidyverse Docs | tidyverse.org |
Thank you for joining!
Questions? Feel free to email me!
From Field to Figure Workshop