Housekeeping
Tidy Data
Next Week
Week 7: Advanced Data Wrangling, Part 2 (More Data Wrangling Functions and Writing Your Own Functions)
Week 8: Advanced Data Wrangling, Part 3 (Data Merging and Exporting Data)
Week 9: Advanced Data Viz, Part 1 (Highlighting and Decluttering)
Week 10: Catch-Up Week
Week 11: Advanced Data Viz, Part 2 (Explaining and Making Your Viz Sparkle)
Week 12: Advanced Quarto
Week 13: Wrap Up
# A tibble: 10 × 13
   country `1952` `1957` `1962` `1967` `1972` `1977` `1982` `1987` `1992` `1997`
   <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
 1 Afghan…   28.8   30.3   32.0   34.0   36.1   38.4   39.9   40.8   41.7   41.8
 2 Albania   55.2   59.3   64.8   66.2   67.7   68.9   70.4   72     71.6   73.0
 3 Algeria   43.1   45.7   48.3   51.4   54.5   58.0   61.4   65.8   67.7   69.2
 4 Angola    30.0   32.0   34     36.0   37.9   39.5   39.9   39.9   40.6   41.0
 5 Argent…   62.5   64.4   65.1   65.6   67.1   68.5   69.9   70.8   71.9   73.3
 6 Austra…   69.1   70.3   70.9   71.1   71.9   73.5   74.7   76.3   77.6   78.8
 7 Austria   66.8   67.5   69.5   70.1   70.6   72.2   73.2   74.9   76.0   77.5
 8 Bahrain   50.9   53.8   56.9   59.9   63.3   65.6   69.1   70.8   72.6   73.9
 9 Bangla…   37.5   39.3   41.2   43.5   45.3   46.9   50.0   52.8   56.0   59.4
10 Belgium   68     69.2   70.2   70.9   71.4   72.8   73.9   75.4   76.5   77.5
# ℹ 2 more variables: `2002` <dbl>, `2007` <dbl>
# A tibble: 120 × 3
   country     year  life_expectancy
   <chr>       <chr>           <dbl>
 1 Afghanistan 1952             28.8
 2 Afghanistan 1957             30.3
 3 Afghanistan 1962             32.0
 4 Afghanistan 1967             34.0
 5 Afghanistan 1972             36.1
 6 Afghanistan 1977             38.4
 7 Afghanistan 1982             39.9
 8 Afghanistan 1987             40.8
 9 Afghanistan 1992             41.7
10 Afghanistan 1997             41.8
# ℹ 110 more rows
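The step from the wide table (one column per year) to the long table (one row per country and year) can be sketched with `pivot_longer()` from tidyr. This is a minimal sketch, not necessarily the exact code behind the output above: `life_wide` is a hypothetical two-country, two-year excerpt typed in by hand, and the object names are assumptions.

```r
library(tidyr)
library(tibble)

# Hypothetical excerpt of the wide table: one column per year
life_wide <- tribble(
  ~country,      ~`1952`, ~`1957`,
  "Afghanistan",    28.8,    30.3,
  "Albania",        55.2,    59.3
)

# Gather every column except country into year / life_expectancy pairs
life_long <- pivot_longer(
  life_wide,
  cols = -country,
  names_to = "year",
  values_to = "life_expectancy"
)

life_long
```

Because the year values come from column names, `year` ends up as a character column, which matches the `<chr>` type shown in the long table.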
# A tibble: 3 × 2
  time_period address
  <chr>       <chr>
1 Childhood   537 Westdale Avenue, Swarthmore, PA, 19801
2 Childhood   690 Omar Circle, Yellow Springs, OH, 45387
3 Grad School 3809 Meade Avenue, San Diego, CA, 92116
# A tibble: 3 × 5
  time_period street              city           state zip_code
  <chr>       <chr>               <chr>          <chr> <chr>
1 Childhood   537 Westdale Avenue Swarthmore     PA    19801
2 Childhood   690 Omar Circle     Yellow Springs OH    45387
3 Grad School 3809 Meade Avenue   San Diego      CA    92116
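Splitting the single `address` column into `street`, `city`, `state`, and `zip_code` is one way to make each column hold exactly one variable. A minimal sketch, assuming tidyr 1.3.0 or later (where `separate_wider_delim()` is available) and a hand-typed copy of the data called `addresses`; the slides may have used a different function, such as the older `separate()`.

```r
library(tidyr)
library(tibble)

# Hand-typed copy of the untidy table (object name is an assumption)
addresses <- tribble(
  ~time_period,  ~address,
  "Childhood",   "537 Westdale Avenue, Swarthmore, PA, 19801",
  "Childhood",   "690 Omar Circle, Yellow Springs, OH, 45387",
  "Grad School", "3809 Meade Avenue, San Diego, CA, 92116"
)

# Split address on ", " into four new columns; the original column is dropped
addresses_tidy <- separate_wider_delim(
  addresses,
  cols = address,
  delim = ", ",
  names = c("street", "city", "state", "zip_code")
)

addresses_tidy
```

The ZIP codes stay as character values, which is usually what you want, since they are labels rather than numbers.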
# A tibble: 4 × 2
  name   favorite_sport
  <chr>  <chr>
1 David  Soccer, Basketball
2 Elias  Baseball, Soccer, Skiing
3 Leila  Aerial Dance, Roller Skating
4 Rachel Soccer, Baseball
# A tibble: 9 × 2
  name   favorite_sport
  <chr>  <chr>
1 David  Soccer
2 David  Basketball
3 Elias  Baseball
4 Elias  Soccer
5 Elias  Skiing
6 Leila  Aerial Dance
7 Leila  Roller Skating
8 Rachel Soccer
9 Rachel Baseball
When working with “select all that apply” variables in datasets where one row = one individual, I’ve typically converted each response option to its own column, with a 1 (“yes”) if that option was selected. This has worked well for my purposes, but I understand now that it violates Tidy Data Rule 1 because a single variable is spread across multiple columns. Am I understanding correctly that while the approach I’ve used in the past is not inherently better or worse than tidy format, the advantage of making the data tidy is that it will be easier to analyze using the tidyverse? Is this still true if the unit of analysis I’m interested in is the individual and not the activity (for example)?
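The two layouts in this question can be converted into each other, which may help when deciding which one suits a given analysis. A minimal sketch, assuming a hypothetical tidy table `sports_long` (one row per person and sport) and a helper column called `selected`: `pivot_wider()` produces the one-column-per-option layout with 1/0 indicators, while `count()` on the tidy version summarizes by sport.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical tidy data: one row per person and selected sport
sports_long <- tribble(
  ~name,    ~favorite_sport,
  "David",  "Soccer",
  "David",  "Basketball",
  "Rachel", "Soccer",
  "Rachel", "Baseball"
)

# One row per individual, one 1/0 indicator column per response option
sports_wide <- sports_long |>
  mutate(selected = 1) |>
  pivot_wider(
    names_from = favorite_sport,
    values_from = selected,
    values_fill = 0
  )

# With the tidy version, counting how many people picked each sport is one call
sport_counts <- count(sports_long, favorite_sport)

sports_wide
sport_counts
```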
Lessons on additional data wrangling functions and on writing your own functions
No project assignment (but there will be one in Week 8)