First R Project
This project was completed in the summer of 2022 and added to the site later.
Clean up the workspace environment
```r
rm(list = ls())
```
Packages used for the assignment
```r
library(tidyverse)
```
Chapter 12 Exercises
```r
library(dcData)
# BabyNames: one row per name, sex, and year, with the number of babies (count)
data(BabyNames)
```
Chapter 12
```r
summary(BabyNames)
```
| name | sex | count | year |
|---|---|---|---|
| Length:1792091 | Length:1792091 | Min. : 5.0 | Min. :1880 |
| Class :character | Class :character | 1st Qu.: 7.0 | 1st Qu.:1948 |
| Mode :character | Mode :character | Median : 12.0 | Median :1981 |
| | | Mean : 186.1 | Mean :1972 |
| | | 3rd Qu.: 32.0 | 3rd Qu.:2000 |
| | | Max. :99674.0 | Max. :2013 |
Problem 12.1
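```r
# Spread sex into separate F and M columns (one row per name and year),
# keep pairs with more than 5 babies of EACH sex so both counts exist,
# and score balance as |log(F / M)|: 0 when the two counts are equal
BothSexes <-
  BabyNames %>%
  pivot_wider(names_from = sex, values_from = count) %>%
  filter(M > 5 & F > 5) %>%
  mutate(log_ratio = abs(log(F / M)))

# Most balanced name-year pairs first
Balance <-
  BothSexes %>%
  arrange(log_ratio)
head(Balance, n = 10)
```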
name | year | F | M |
---|---|---|---|
Erie | 1880 | 6 | 6 |
Sammie | 1880 | 6 | 6 |
Theo | 1881 | 8 | 8 |
Bird | 1881 | 6 | 6 |
Augustine | 1882 | 8 | 8 |
Tommie | 1882 | 8 | 8 |
Orrie | 1883 | 7 | 7 |
Verne | 1884 | 8 | 8 |
Jewel | 1884 | 6 | 6 |
Tracy | 1885 | 12 | 12 |
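The |log(F / M)| score is symmetric: swapping the two counts gives the same value, and it is 0 only when the counts are equal. A minimal sketch on a tiny hand-made table (the names and counts are hypothetical, not taken from BabyNames) shows the reshape-then-score steps in isolation:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in the BabyNames layout (not real counts)
toy <- tibble(
  name  = c("Ash", "Ash", "Kim", "Kim", "Lee", "Lee"),
  sex   = c("F", "M", "F", "M", "F", "M"),
  count = c(10, 10, 40, 10, 10, 40),
  year  = 2000
)

scored <- toy %>%
  # One row per (name, year) with separate F and M columns
  pivot_wider(names_from = sex, values_from = count) %>%
  # 0 for an even split; the same score whichever sex is more common
  mutate(log_ratio = abs(log(F / M))) %>%
  arrange(log_ratio)

print(scored)
```

Kim (40 F, 10 M) and Lee (10 F, 40 M) get the same score, which is why the absolute value of the log is used rather than the raw ratio F / M.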
```r
# Among the balanced pairs, keep only names with more than 100 babies of either sex
Balance2 <-
  Balance %>%
  filter(F > 100 | M > 100)
head(Balance2, n = 10)
```
name | year | F | M |
---|---|---|---|
Lavern | 1953 | 103 | 103 |
Marion | 1977 | 229 | 229 |
Dusty | 1979 | 194 | 194 |
Justice | 2003 | 665 | 665 |
Baby | 2003 | 245 | 245 |
Tegan | 2006 | 145 | 145 |
Ryley | 2007 | 186 | 186 |
Rian | 2007 | 128 | 128 |
Jaylin | 2008 | 524 | 524 |
Leslie | 1946 | 2139 | 2139 |
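Many name-year pairs tie at a score of 0, so the order within the tie is simply the order the rows already had. One option, sketched here on a hypothetical table with the same columns as BothSexes, is to use the combined count as a secondary sort key so the most popular perfectly balanced names come first:

```r
library(dplyr)

# Hypothetical table with the same columns as BothSexes (name, year, F, M)
wide <- tibble(
  name = c("Ash", "Kim", "Lee", "Pat"),
  year = c(2000, 2000, 2001, 2001),
  F    = c(6, 300, 50, 120),
  M    = c(6, 300, 200, 120)
)

ranked <- wide %>%
  mutate(log_ratio = abs(log(F / M))) %>%
  # Most balanced first; within a tie, the largest combined count first
  arrange(log_ratio, desc(F + M))

print(ranked)
```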
Problem 12.2
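- Version One is BabyNames as provided: one row per name, sex, and year.
- Version Two is wider: the sexes become separate F and M columns, with one row per name and year.
```r
BothSexes <-
  BabyNames %>%
  pivot_wider(names_from = sex, values_from = count) %>%
  filter(M > 5 & F > 5)
```
- Version Three is wider still: each year becomes its own column.
```r
# Years become the columns, so there are no F or M columns to filter on here
YearWide <-
  BabyNames %>%
  pivot_wider(names_from = year, values_from = count)
```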
I would say that it is easiest to start from Version Two, since year is already a column and each sex has its own column. Version Two is also the best suited to finding the ratio of male to female counts, because both counts sit side by side in the same row.
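To compare the three layouts directly, here is a sketch on a hypothetical six-row table in the Version One layout (one name, three years). It builds Versions Two and Three with pivot_wider() and records each shape before computing the male-to-female ratio in Version Two:

```r
library(dplyr)
library(tidyr)

# Version One: hypothetical narrow data, one row per name, sex, and year
v1 <- tibble(
  name  = "Ash",
  sex   = c("F", "M", "F", "M", "F", "M"),
  count = c(10, 20, 30, 15, 8, 16),
  year  = c(2000, 2000, 2001, 2001, 2002, 2002)
)

# Version Two: one row per name and year, with F and M as columns
v2 <- v1 %>% pivot_wider(names_from = sex, values_from = count)

# Version Three: one row per name and sex, with one column per year
v3 <- v1 %>% pivot_wider(names_from = year, values_from = count)

# Each column of this matrix holds (rows, columns) for one version
shapes <- sapply(list(v1 = v1, v2 = v2, v3 = v3), dim)

# Only Version Two has both counts in the same row, so the ratio is one step
v2 <- v2 %>% mutate(ratio_MF = M / F)
print(shapes)
print(v2)
```

Here Version Two keeps four columns but halves the number of rows, because each row now holds both sexes. Version Three's width grows with the number of years; on BabyNames it would have one column for every year from 1880 to 2013.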
Problem 12.3
A is wider than B, and B is wider than C, so from widest to narrowest the order is A, B, C. Frame B would be the best for looking at the change from 2000 to 2001: I would subtract one year's count from the other. For the next part, Frame B would again be the best; I would just sum the values for each year.
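Frames A, B, and C are not reproduced here, so this sketch assumes a hypothetical layout with one row per name and one column per year. It shows the two operations described above: subtracting one year column from another, and summing each year column.

```r
library(dplyr)

# Hypothetical layout (an assumption): one row per name, one column per year
wide_years <- tibble(
  name   = c("Ash", "Kim", "Lee"),
  `2000` = c(10, 25, 40),
  `2001` = c(14, 20, 45)
)

# Change from 2000 to 2001 for each name: subtract one year column from the other
changes <- wide_years %>% mutate(change = `2001` - `2000`)

# Total for each year: sum down each year column
totals <- wide_years %>% summarise(across(c(`2000`, `2001`), sum))

print(changes)
print(totals)
```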
Problem 12.4
With this layout, you can’t easily compare a subject’s values before and after. I wouldn’t change the data table, though, because reshaping it would make it too narrow.
Chapter 13.2
Problem 2
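Calculating Top 100 Counts
```r
# For each year, keep the 100 most common name-sex rows (ties included)
# and add up their counts; slice_max() supersedes the older top_n()
Rankings <-
  BabyNames %>%
  group_by(year) %>%
  slice_max(count, n = 100) %>%
  summarise(total = sum(count))
Rankings$ranking <- "Top_100"
```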
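Calculating Counts Below the Top 100
```r
# Total births per year minus that year's top-100 total; joining on year
# (rather than subtracting vectors by position) matches each year correctly
Rankings2 <-
  BabyNames %>%
  group_by(year) %>%
  summarise(total = sum(count)) %>%
  inner_join(select(Rankings, year, top = total), by = "year") %>%
  mutate(total = total - top, ranking = "Below") %>%
  select(year, total, ranking)
```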
Combining the Two Counts and Reordering
```r
# Stack the top-100 and below-100 rows, then sort so each year's pair is adjacent
Rankings <-
  bind_rows(Rankings, Rankings2) %>%
  arrange(year)
```
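The three steps above (top-100 total, remainder, stack) can also be done in one grouped pass by flagging each row's rank within its year. This is a sketch on hypothetical data with a cutoff of 2 instead of 100 so the result is easy to check by hand; min_rank(desc(count)) <= k keeps tied rows at the cutoff, as top_n() and slice_max() do by default.

```r
library(dplyr)

# Hypothetical data in the BabyNames layout (name, sex, count, year)
babies <- tibble(
  name  = c("Ash", "Kim", "Lee", "Pat", "Ash", "Kim", "Lee"),
  sex   = "F",
  count = c(50, 30, 20, 10, 40, 35, 5),
  year  = c(2000, 2000, 2000, 2000, 2001, 2001, 2001)
)
k <- 2  # stands in for 100

ranking_totals <- babies %>%
  group_by(year) %>%
  # Within each year, flag rows whose count ranks in the top k (ties kept)
  mutate(ranking = if_else(min_rank(desc(count)) <= k, "Top", "Below")) %>%
  group_by(year, ranking) %>%
  summarise(total = sum(count), .groups = "drop") %>%
  # Show the "Top" row before the "Below" row within each year
  arrange(year, desc(ranking))

print(ranking_totals)
```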
Conclusion
This class used the tidyverse packages; although I am not a fan of them, they were required by the grading rubric.