Tibble manipulation
Third assignment
The goal of this assignment is to practice working with tables using tidyverse. Tidyverse is a collection of R packages for data wrangling and visualization (among other things). A great resource to learn how to use it is R for data Science.
For this assignment I ask that you create both a script to manipulate the table and then a Quarto file to print and cross-reference the table (combining what you learned in the second assignment).
1 Instructions
Create a branch for the assignment, e.g.
tibble. You will work here and only move your changes tomainif and when you want to submit.-
Create an R script where you will insert the necessary code to do the following:
Load the appropriate libraries (tidyverse and mclm).
Read the brown corpus.
Create an association scores table of the collocations of a word of your choosing.
-
Save the file.
OPTION A: Manipulate the table as done in class: turn it into a tibble with
as_tibble(), modify columns, select some columns to show, filter the rows, rearrange the order. Then write it to a file withwrite_tsv().OPTION B: Save the association scores object to a file with
write_assoc().
-
Create a Quarto report where you will only load the {mclm} and {kableExtra} packages.
-
Read the association scores object:1
OPTION A: If you wrote it with
write_tsv(), useread_tsv().OPTION B: If you wrote it with
write_assoc(), you can either useread_tsv()orread_assoc()followed byas_tibble().
If you hadn’t manipulated the table, this is the time to do so.
Print the table with {kableExtra}, editing it as well if you so wish. Don’t forget to add a caption!
Include some text cross-referencing the table and maybe commenting on the result.
-
If you try to use Latex in a table, for example r"($\chi^2$)" to obtain \(\chi^2\), you might notice that it prints well in the interactive session but not in the rendered HTML document. This is (I think) a bug somewhere in the rendering of tables, but there is a workaround:
Somewhere in your Quarto document, paste the following text (as normal text, not as R code):
<script type="text/x-mathjax-config">MathJax.Hub.Config({tex2jax: {inlineMath: [["$","$"]]}})</script>
<script async src="https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>This will activate parsing of Latex inside the HTML tables.
In PDF output instead you just have to add the option escape = FALSE to your kbl() call. Notice, however, that you should not have unescaped Latex characters in other elements of the table! (No underscores, for example).
2 Tips
2.1 Manipulating the table
Use
mutate()to change the values of a column.-
Use
filter()to subset the rows based on values in the columns. You can also use theslice_family of functions to subset with other criteria:slice_head(n = 3)to select the first three rows;slice_tail(n = 5)to select the last five rows.slice_sample(n = 10)to select ten random rows,slice_sample(prop = 0.5)to select a random 50% of the rows.
Use
select()to subset the columns. You can also userename()to rename columns without removing the rest.Use
arrange()to sort the tibble based on the values in a column.
An example is the code below, which starts with an assoc object (product of assoc_scores()) and ends with a tibble with a selection of rows and columns.
In line 2 we use
as_tibble()to turn theassocobject into atibbleto manipulate with {tidyverse} functions.In line 3 we use
filter()to subset the rows that have PMI larger than 1, G_signed larger than or equal to 5, and a type ending with “nn”, i.e. nouns.In line 4 we use
select()to subset the columnstype,a,PMIandORas well as those ending in “signed” and at the same time renameato “freq”.In lines 5 through 8 we use
mutate()to create a new columnlog_ORthat contains the logarithm of theORcolumn, and we modify thetypecolumn to remove the “/nn” ending from its elements.In line 1 we assign the whole operation, initially applied to
hot_assoc, to a variable calledsubsetted.
Each operation of the pipe acts on the output of the operation before it.
2.2 Association scores
Use
assoc_scores()aftersurf_cooc()to create an association scores object.Use
write_assoc(scores_object, filename)to save the object from your R script.Use
scores_object <- read_assoc(filename)to read the object in the Quarto file.
2.3 KableExtra
Check out the documentation for HTML or PDF output to learn about {kableExtra} features.
3 Git workflow
git status # check that you're on main, nothing to commit...
git branch tibble
git checkout tibble
# work on your .qmd file, render
git status # check everything is fine
git add .
git commit -m "practice with tibbles"
# you may also make several commits as you add a figure, a table...
git checkout main
git status # check everything is fine. New files should not be there
git merge tibble
# Now the .qmd file, the rendered file and the help files should be present
git push
# and send me a message!Footnotes
If you use
read_tsv(), theshow_col_types = FALSEargument will hide the printed output with the description of the column types, e.g.my_data <- read_tsv("filepath", show_col_types = FALSE).↩︎