First R Project

This project was completed in the summer of 2022 and later added to the site.

Clean up the workspace environment

rm(list = ls())

Load all packages used for the assignment

library(tidyverse)

Chapter 12 Exercises

library(dcData)
data(BabyNames)

Chapter 12

summary(BabyNames)
     name               sex                count              year
 Length:1792091     Length:1792091     Min.   :    5.0   Min.   :1880
 Class :character   Class :character   1st Qu.:    7.0   1st Qu.:1948
 Mode  :character   Mode  :character   Median :   12.0   Median :1981
                                       Mean   :  186.1   Mean   :1972
                                       3rd Qu.:   32.0   3rd Qu.:2000
                                       Max.   :99674.0   Max.   :2013

Problem 12.1

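BothSexes <-
  BabyNames %>%
  # one row per name and year, with the F and M counts as separate columns
  pivot_wider(names_from = sex, values_from = count) %>%
  filter(M > 5 | F > 5) %>%
  # balance score: 0 when F equals M; NA when a name has only one sex that year
  mutate(math = abs(log(F / M)))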

Balance <- 
  BothSexes %>%
  arrange(math)
  
head(Balance, n=10)
name        year    F    M
Erie        1880    6    6
Sammie      1880    6    6
Theo        1881    8    8
Bird        1881    6    6
Augustine   1882    8    8
Tommie      1882    8    8
Orrie       1883    7    7
Verne       1884    8    8
Jewel       1884    6    6
Tracy       1885   12   12

Balance2 <-
  Balance %>%
  filter(F > 100 | M > 100)

head(Balance2, n=10)
name      year      F      M
Lavern    1953    103    103
Marion    1977    229    229
Dusty     1979    194    194
Justice   2003    665    665
Baby      2003    245    245
Tegan     2006    145    145
Ryley     2007    186    186
Rian      2007    128    128
Jaylin    2008    524    524
Leslie    1946   2139   2139
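
The balance score used in Problem 12.1, abs(log(F / M)), can be checked on a few hand-made counts. This is only a minimal sketch in base R: the helper name `balance_score` and the counts are made up for illustration and are not part of the assignment.

```r
# Hypothetical helper: how far a name is from a perfectly even F/M split.
balance_score <- function(f, m) abs(log(f / m))

balance_score(8, 8)      # equal counts: log(1) is 0, so the score is 0
balance_score(200, 100)  # twice as many F as M
balance_score(100, 200)  # twice as many M as F: same score as the line above

# The score is symmetric because log(f / m) equals -log(m / f).
all.equal(balance_score(200, 100), balance_score(100, 200))
```

Taking the absolute value of the log ratio means a name that leans 2-to-1 female scores the same as one that leans 2-to-1 male, which is why sorting by this column puts the most evenly split names first.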

Problem 12.2

  1. Version One
  2. Version Two is wider:
BothSexes <-
  BabyNames %>%
  pivot_wider(names_from = sex, values_from = count) %>%
  filter(M > 5 | F > 5)
  3. Version Three is wider:
ByYear <-
  BabyNames %>%
  # sex stays a regular column here, so there are no M or F columns to filter on
  pivot_wider(names_from = year, values_from = count)

I would start from Version Two, since the year and both sexes are already columns. It is also better suited to finding the ratio of male to female counts, because M and F sit side by side in the same row.
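
To illustrate why the wide-by-sex layout makes the ratio easy, here is a small sketch on a made-up table with the same column names as BabyNames. The names `toy`, `toy_wide`, and `ratio`, and all of the values, are assumptions for the example only.

```r
library(dplyr)
library(tidyr)

# Toy narrow table (values are made up): one row per name, sex, and year
toy <- tibble(
  name  = c("Ash", "Ash", "Kim", "Kim"),
  sex   = c("F", "M", "F", "M"),
  count = c(10, 20, 30, 30),
  year  = 2000
)

# Wide by sex: one row per name and year, with F and M side by side,
# so the ratio is a single column operation
toy_wide <-
  toy %>%
  pivot_wider(names_from = sex, values_from = count) %>%
  mutate(ratio = M / F)

toy_wide
```

In the narrow layout the two counts for a name live in different rows, so the same ratio would need a grouped summary or a join first.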

Problem 12.3

A is wider than B, and B is wider than C, so A is also wider than C. Frame B would be the best for looking at the change from 2000 to 2001: I would subtract one year's count from the other's. Frame B would again be the best for the totals: I would just sum the values for each year.

Problem 12.4

You can’t easily compare a subject’s before and after values. I wouldn’t change the data table, though, because reshaping it would make it too narrow.

Chapter 13.2

Problem 2

Calculating Top 100 Counts

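Rankings <-
  BabyNames %>%
  group_by(year) %>%
  # keep the 100 largest counts in each year (ties included, as with top_n);
  # each row is a name-sex pair, so a name can appear once per sex
  slice_max(count, n = 100) %>%
  summarise(total = sum(count)) %>%
  mutate(ranking = "Top_100")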

Calculating Bottom Counts

Rankings2 <-
  BabyNames %>%
  group_by(year) %>%
  summarise(total = sum(count))
Rankings2$total <- Rankings2$total - Rankings$total
Rankings2$ranking <- "Below"

Merging the Two Counts and Reordering

Rankings <- rbind(Rankings, Rankings2)
Rankings <- 
  Rankings %>%
  arrange(year)
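
As a cross-check on the two-step approach above, the same split can be sketched in a single pipeline by labelling each row before summarising. This is an alternative under stated assumptions, not the method used for the assignment: the toy data, the name `toy_rankings`, and the cutoff of 2 (standing in for 100) are made up so the result is easy to verify by hand.

```r
library(dplyr)

# Toy data (made up): three names per year, keeping the top 2 instead of the top 100
toy <- tibble(
  year  = c(2000, 2000, 2000, 2001, 2001, 2001),
  name  = c("A", "B", "C", "A", "B", "C"),
  count = c(50, 30, 5, 40, 35, 10)
)

toy_rankings <-
  toy %>%
  group_by(year) %>%
  # min_rank(desc(count)) gives rank 1 to the largest count within each year
  mutate(ranking = if_else(min_rank(desc(count)) <= 2, "Top_2", "Below")) %>%
  group_by(year, ranking) %>%
  summarise(total = sum(count), .groups = "drop") %>%
  arrange(year)

toy_rankings
```

Because the label is assigned per row, the "Below" total never depends on two separately built tables having their years in the same order. Like top_n, min_rank keeps ties at the cutoff, so a year can contribute more rows than the cutoff to the top group.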

Conclusion

This class used the tidyverse packages. Although I am not a fan of them, they were required by the grading rubric.