Tibble manipulation
Third assignment
The goal of this assignment is to practice working with tables using the tidyverse. The tidyverse is a collection of R packages for data wrangling and visualization (among other things). A great resource for learning how to use it is R for Data Science.
For this assignment, I ask that you create both an R script to manipulate the table and a Quarto file to print and cross-reference it (combining what you learned in the second assignment).
1 Instructions
Create a branch for the assignment, e.g. tibble. You will work here and only move your changes to main if and when you want to submit.
Create an R script where you will insert the necessary code to do the following:
Load the appropriate libraries (tidyverse and mclm).
Read the Brown corpus.
Create an association scores table of the collocations of a word of your choosing.
Save the file.
OPTION A: Manipulate the table as done in class: turn it into a tibble with as_tibble(), modify columns, select some columns to show, filter the rows, rearrange the order. Then write it to a file with write_tsv().
OPTION B: Save the association scores object to a file with write_assoc().
Create a Quarto report where you will only load the {mclm} and {kableExtra} packages.
Read the association scores object [1]:
OPTION A: If you wrote it with write_tsv(), use read_tsv().
OPTION B: If you wrote it with write_assoc(), you can either use read_tsv() or read_assoc() followed by as_tibble().
If you haven't manipulated the table yet, this is the time to do so.
Print the table with {kableExtra}, editing it as well if you so wish. Don’t forget to add a caption!
Include some text cross-referencing the table and maybe commenting on the result.
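As an illustration of OPTION A, the sketch below runs the write-then-read round trip on a small made-up tibble standing in for a real association scores table. The column names (type, a, PMI, G_signed) and all values are assumptions chosen for the example; the actual columns depend on your own assoc_scores() output. A temporary file is used so that the example does not touch your project folder.

```r
library(tibble)
library(dplyr)
library(readr)

# Hypothetical stand-in for as_tibble(scores_object); the values are invented.
scores <- tibble(
  type = c("coffee", "tea", "water", "milk"),
  a = c(12, 8, 3, 5),
  PMI = c(4.21, 3.87, 0.52, 1.94),
  G_signed = c(35.2, 20.1, -1.3, 6.8)
)

# Script side: manipulate the table, then write it to a file.
# In the assignment this would be a path inside your repository.
scores_file <- tempfile(fileext = ".tsv")
scores |>
  filter(PMI > 1) |>
  select(type, freq = a, PMI) |>
  arrange(desc(PMI)) |>
  write_tsv(scores_file)

# Quarto side: read the file back without the column-type message.
scores_table <- read_tsv(scores_file, show_col_types = FALSE)
scores_table
```

Because the manipulation already happened in the script, the Quarto file only needs to read and print the result.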
2 Tips
2.1 Manipulating the table
Use mutate() to change the values of a column.
Use filter() to subset the rows based on values in the columns. You can also use the slice_ family of functions to subset with other criteria: slice_head(n = 3) to select the first three rows; slice_tail(n = 5) to select the last five rows; slice_sample(n = 10) to select ten random rows; slice_sample(prop = 0.5) to select a random 50% of the rows.
Use select() to subset the columns. You can also use rename() to rename columns without removing the rest.
Use arrange() to sort the tibble based on the values in a column.
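The verbs above can be tried out on a toy table before applying them to real scores. The tibble below is invented for the purpose (the column names type, a and PMI only mimic what an association scores table might contain), so treat it as a minimal sketch rather than the expected solution.

```r
library(tibble)
library(dplyr)

# Hypothetical scores table; the column names and values are made up.
scores <- tibble(
  type = c("coffee", "tea", "water", "milk", "juice", "wine"),
  a = c(12, 8, 3, 5, 2, 7),
  PMI = c(4.21, 3.87, 0.52, 1.94, 0.88, 2.65)
)

# mutate(): overwrite an existing column (or add a new one).
rounded <- mutate(scores, PMI = round(PMI, 1))

# filter(): keep the rows that satisfy a condition.
frequent <- filter(scores, a >= 5)

# slice_*(): subset rows by position or at random.
first_three <- slice_head(scores, n = 3)
last_two <- slice_tail(scores, n = 2)
random_half <- slice_sample(scores, prop = 0.5)

# select() keeps only the named columns; rename() keeps all of them.
two_columns <- select(scores, type, PMI)
renamed <- rename(scores, collocate = type, frequency = a)

# arrange(): sort the rows; desc() reverses the order.
by_pmi <- arrange(scores, desc(PMI))
by_pmi
```

Each call returns a new tibble and leaves scores untouched, which is why the results are assigned to separate names here; in practice the steps are usually chained with a pipe.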
2.2 Association scores
Use assoc_scores() after surf_cooc() to create an association scores object.
Use write_assoc(scores_object, filename) to save the object from your R script.
Use scores_object <- read_assoc(filename) to read the object in the Quarto file.
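To see the three steps together, here is a minimal sketch that runs them on a tiny invented corpus written to a temporary file. The corpus text, the node word "hot", the window size and min_freq = 1 are all assumptions made so that the example runs on its own; for the assignment you would pass the Brown corpus file names to surf_cooc() and choose settings appropriate for that corpus (for example, its tokenization).

```r
library(mclm)

# Tiny invented corpus in a temporary file, standing in for the Brown
# corpus files; replace it with your own vector of file names.
corpus_file <- tempfile(fileext = ".txt")
writeLines(
  rep(c(
    "she drinks hot coffee every morning",
    "he likes hot tea and cold milk",
    "the weather is hot today"
  ), times = 5),
  corpus_file
)

# Count co-occurrences within two tokens on either side of the node "hot".
cooc <- surf_cooc(corpus_file, re_node = "^hot$", w_left = 2, w_right = 2)

# Turn the co-occurrence counts into association scores.
# min_freq = 1 is only sensible for this toy corpus.
scores_object <- assoc_scores(cooc, min_freq = 1)

# Script side: save the object.
filename <- tempfile(fileext = ".tsv")
write_assoc(scores_object, filename)

# Quarto side: read it back and convert it to a tibble.
scores_object <- read_assoc(filename)
scores_table <- tibble::as_tibble(scores_object)
scores_table
```

In the assignment, filename would be a path inside your repository so that the Quarto file can find what the script saved.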
2.3 KableExtra
Check out the documentation for HTML or PDF output to learn about {kableExtra} features.
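As a starting point, the sketch below shows one common way to print a captioned, cross-referenceable table. It assumes the code sits inside an R chunk of your Quarto file: the two #| lines are chunk options (plain comments when run as a script), and the label must start with tbl- so that writing @tbl-scores in your text resolves to the table. The label, the caption and the data are placeholders made up for the example.

```r
#| label: tbl-scores
#| tbl-cap: "Collocates with the highest PMI (invented values)."

library(kableExtra)

# Hypothetical table standing in for the scores read from file.
scores_table <- data.frame(
  collocate = c("coffee", "tea", "milk"),
  frequency = c(12, 8, 5),
  PMI = c(4.21, 3.87, 1.94)
)

# kbl() builds the table; kable_styling() adjusts its appearance.
scores_kable <- scores_table |>
  kbl(digits = 2, col.names = c("Collocate", "Frequency", "PMI")) |>
  kable_styling(full_width = FALSE)
scores_kable
```

A sentence such as "The strongest collocates are listed in @tbl-scores." in the body of the report then produces the cross-reference.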
3 Git workflow
git status # check that you're on main, nothing to commit...
git branch tibble
git checkout tibble
# work on your .qmd file, render
git status # check everything is fine
git add .
git commit -m "practice with tibbles"
# you may also make several commits as you add a figure, a table...
git checkout main
git status # check everything is fine. New files should not be there
git merge tibble
# Now the .qmd file, the rendered file and the help files should be present
git push
# and send me a message!
Footnotes
[1] If you use read_tsv(), the show_col_types = FALSE argument will hide the printed output with the description of the column types, e.g. my_data <- read_tsv("filepath", show_col_types = FALSE).