Skip to contents

We walk through the basics of litr with a simple example.

Suppose we want to create a package called rleastsquares that does least squares. To do so, we will create a .Rmd file called create-rleastsquares.Rmd from a litr template:

litr::draft("rleastsquares")

This generates create-rleastsquares.Rmd, and opening it you’ll notice some special lines in the yaml:

knit: litr::render
output: litr::litr_html_document
params:
  package_name: "rleastsquares" # <-- change this to your package name
  package_parent_dir: "." # <-- relative to this file's location

This is what will lead to an R package being created when you press “Knit”.

Package setup

Note: Every R package needs a DESCRIPTION file. We’ll start by filling in the relevant information.

usethis::create_package(
  path = ".",
  fields = list(
    Package = param$package_name,
    Version = "0.0.0.9000",
    Title = "Fit Least Squares",
    Description = "A package that fits least squares.",
    `Authors@R` = person(
      given = "First",
      family = "Last",
      email = "you@gmail.com",
      role = c("aut", "cre")
      )
  )
)
usethis::use_mit_license(copyright_holder = "F. Last")

Writing a function for the package

Since this is an R markdown file, we can use latex to explain our code, provide derivations, etc.

Suppose we have a response vector \(y\in\mathbb R^n\) and a data matrix \(X\in\mathbb R^{n\times p}\).

We want to find the solution to the problem

\[ \min_{\beta\in\mathbb R^p}\|y-X\beta\|^2 \]

We’ll assume that \(X\) is full rank with \(n > p\). We know that the solution is given by

\[ \hat\beta=(X^TX)^{-1}X^Ty. \]

We’ll write a function that does exactly that!

#' Get the OLS solution
#' 
#' @param y our response, which is an n-vector
#' @param X our data matrix, which is n by p
#' @export
do_least_squares <- function(y, X) {
  if(nrow(X) != length(y))
    stop("The number of rows of X must match the length of y.")
  as.numeric(solve(crossprod(X), crossprod(X, y)))
}

Note: Code chunks whose first line starts with #' are added to the package. If you’re not familiar with roxygen2, see here for more.

Now that we’ve defined do_least_squares(), let’s try it out!

set.seed(123)
n <- 100
p <- 1
x <- cbind(1, matrix(rnorm(n*p), n, p))
beta_star <- c(2, 0.5)
sigma <- 0.1
y <- x %*% beta_star + sigma * rnorm(n)

Note: This code chunk does not start with #', so it is not added to the package.

betahat <- do_least_squares(y, x)
plot(x[, 2], y)
abline(betahat[1], betahat[2], col = 2, lwd=2)

Let’s see how this compares to lm’s answer.

fit_lm <- lm(y ~ x[, 2])
fit_lm$coefficients
## (Intercept)      x[, 2] 
##   1.9897197   0.4947528

Compare that to…

betahat
## [1] 1.9897197 0.4947528

Ok, do_least_squares() appears to be working. Let’s define a formal unit test based on the example above.

testthat::test_that("do_least_squares() works", {
  set.seed(123)
  n <- 100
  p <- 1
  x <- cbind(1, matrix(rnorm(n*p), n, p))
  beta_star <- c(2, 0.5)
  sigma <- 0.1
  y <- x %*% beta_star + sigma * rnorm(n)
  fit_lm <- lm(y ~ x[, 2])
  # do lm and our function give the same coefficient vector?
  testthat::expect_equal(do_least_squares(y, x),
                         as.numeric(fit_lm$coefficients))
  # do we get the desired error when there is a length mismatch?
  testthat::expect_error(do_least_squares(y[-1], x), "must match")
})
## Test passed

Note: Code chunks that have one or more lines starting with test_that( or testthat::test_that( are added to the package as tests.

Some fancier features

Finer control over where in the package your code is sent

As noted above, litr detects whether to send a code chunk to the package based on whether it starts with #' or has test_that in it. However, sometimes you’ll want finer control. In this case you can override the behavior by explicitly specifying the target location. To do so, use a code chunk option of the form send_to="R/myfile.R". This will add your code to that particular file (either creating it if need be or else appending it). There are two primary use cases for this feature: (a) when you don’t want to use roxygen2 to document a function and (b) when you want several functions to appear together in the same .R file.

Using a function from a different package

Imagine we wanted to actually use a function from another package in our own. For example, perhaps we want to use lsfit() from the stats package:

#' Get the OLS solution using lsfit()
#' 
#' @param y our response, which is an n-vector
#' @param X our data matrix, which is n by p
#' @export
do_least_squares_with_lsfit <- function(y, X) {
  fit <- stats::lsfit(x = X, y = y, intercept = FALSE)
  return(as.numeric(fit$coefficients))
}

And then we also update the DESCRIPTION file with this package dependence:

usethis::use_package("stats")

To use a function from another package, simply use the pkg:: prefix when calling it and then add usethis::use_package("pkg") to include the package dependency.

Including a dataset in your package

There’s a template for that.

Using Rcpp in your package

There’s a template for that.

Including a README, vignettes, and pkgdown site

There’s a template for that.

Defining your package with bookdown

If you’re writing a large package, it may be convenient to define it across multiple .Rmd files. You can use bookdown for this, which leads to a nice looking online book with multiple chapters. There’s a template for that.

Documenting the package

At the end of a litr document, it is important to call litr::document(), which turns the royxgen2 into traditional documentation files in our R package.

litr::document() # <-- use instead of devtools::document()