Methods of Corpus Linguistics

Introduction to the course

Mariana Montes

Outline

  • Topics
  • Software to install
  • Format of the classes
  • Exam and submission procedure

Topics

Corpus linguistics

  • What is a corpus?

  • What is corpus linguistics?

  • Collocation and keyword analysis

  • Comparison of variants and of varieties

Statistical techniques

  • Association measures

Comparison of variants

  • Logistic regression

  • Conditional inference trees

Comparison of varieties

  • Correspondence Analysis

  • Factor Analysis

Software to install

Programs

  • R – the programming language

  • RStudio – IDE for R

  • Quarto – publishing system

  • TinyTeX – LaTeX distribution (I will show you how)

  • (Optionally) Git

Costs?

Everything open source and free!

R packages from CRAN

After installing R, you should install the following R packages.

  • tidyverse (a group of data manipulation packages)
  • ca (for Correspondence Analysis)
  • here (for file paths)
  • xml2 (to work with XML files)
  • easystats (for reporting stats)
  • ggeffects (to plot regression effects)

R development packages (from GitHub)

  • mclm (“masterclm/mclm”)
  • mclmtutorials (“masterclm/mclmtutorials”)
  • learnr (“rstudio/learnr”)
  • gradethis (“rstudio/gradethis”)
  • (Optionally) glossr (“montesmariana/glossr”) if you want to write interlinear glosses

How to install R packages

  • From CRAN: with install.packages("package")

    install.packages(c("tidyverse", "ca", "here", "xml2", "easystats", "ggeffects"))

  • Development packages: with remotes::install_github("user/repo")

    # install.packages("remotes")  # run once if remotes is not installed yet
    library(remotes)
    install_github("masterclm/mclm")
    install_github("masterclm/mclmtutorials")
    install_github("rstudio/learnr")
    install_github("rstudio/gradethis")
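
To confirm that the installation worked, a check along the following lines could be run afterwards. This is only a minimal sketch using base R: the names course_packages, missing_packages and still_missing are illustrative and not part of the course materials, and the optional glossr package is left out.

```r
# Packages listed on the slides above (CRAN and GitHub).
course_packages <- c(
  "tidyverse", "ca", "here", "xml2", "easystats", "ggeffects",
  "mclm", "mclmtutorials", "learnr", "gradethis"
)

# Return the packages from `pkgs` that are not found in the library.
# rownames(installed.packages()) gives the names of all installed packages.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

still_missing <- missing_packages(course_packages)

if (length(still_missing) == 0) {
  message("All course packages are installed.")
} else {
  message("Still to install: ", paste(still_missing, collapse = ", "))
}
```

Any package reported as missing can then be installed again with the matching command above.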

Format of the classes

Technical setup

  • Basics of R, RStudio, R projects

  • Basics of Quarto (to write your paper!)

  • Basics of Git (one option for submitting the paper)

Theoretical classes

Lectures on the different topics, with slides

  • Corpus linguistics

  • Association measures

  • Logistic regression and conditional inference trees

  • Correspondence Analysis

  • Factor Analysis

Case studies

Going through analyses, with notebooks and code you can copy and paste

  • Collocation and keyword analysis

  • Alternation studies: analysis of variants

  • Lectometry: analysis of varieties

  • Register analysis: analysis of varieties

Exam and submission procedure

Exam format

  • Paper with analysis

    • Choose a corpus and at least one technique

    • Define a research question that the technique can address

    • Small literature review

    • Perform analysis

  • Full project to be submitted: R code, paper written in Quarto, bibliography

Submission procedure

Option 1: Toledo

  • Turn your project folder into a zip file (excluding corpus) and submit via Toledo Assignments

Option 2: Git & GitHub

  • Follow the instructions to set up the Git repository and push your project

  • Optional intermediate assignments to get used to the tasks, Git and GitHub

Git submission - setup

  • Create an R project with a git repository

  • Stage and commit your changes, either in the Git tab or by typing git add . followed by git commit -m "some message" in the Git Bash Terminal

  • Add a remote to the repository by typing git remote add origin <url> in the Git Bash Terminal

  • Rename the current branch to “main” with git branch -M main if it has a different name.

  • Upload your changes with git push -u origin main

Git submission - later

  • Stage your changes with git add . (the . stages everything)

  • Take a snapshot of your work with git commit -m "some message"

  • When you want to submit, run git push

Basics of R