# matrixStats: Optimized Subsetted Matrix Calculations

The matrixStats package provides highly optimized functions for computing common summaries over rows and columns of matrices. In a previous blog post, I showed that, instead of using `apply(X, MARGIN = 2, FUN = median)`, we can speed up calculations dramatically by using `colMedians(X)`. In the most recent release (version 0.50.0), matrixStats has been extended to also perform optimized calculations on a subset of rows and/or columns, specified via the new arguments `rows` and `cols`, e.
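As a rough illustration, the sketch below contrasts the base-R approach, which first materializes the subset `X[rows, cols]` and then calls `apply()`, with passing the subset through the `rows` and `cols` arguments of `colMedians()`. The matrix, the index vectors, and the `requireNamespace()` guard are our own assumptions. The matrixStats part only runs if the package (version 0.50.0 or later) is installed, and no timings are claimed here.

```r
# Example data: a 1000 x 50 numeric matrix (arbitrary sizes for illustration)
set.seed(1)
X <- matrix(rnorm(1000 * 50), nrow = 1000, ncol = 50)
rows <- 1:500          # hypothetical subset of rows
cols <- c(2, 10, 30)   # hypothetical subset of columns

# Base R: X[rows, cols] creates a temporary copy before the medians are computed
y_apply <- apply(X[rows, cols, drop = FALSE], MARGIN = 2, FUN = median)

# matrixStats (>= 0.50.0): the subset is given via arguments instead
if (requireNamespace("matrixStats", quietly = TRUE)) {
  y_ms <- matrixStats::colMedians(X, rows = rows, cols = cols)
  stopifnot(isTRUE(all.equal(y_apply, y_ms)))  # same medians either way
}
```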

# Milestone: 7000 Packages on CRAN

Another 1,000 packages were added to CRAN, which took less than 10 months. Today (August 12, 2015), the Comprehensive R Archive Network (CRAN) package page reports: “Currently, the CRAN package repository features 7002 available packages.” While the previous 1,000 packages took 355 days, going from 6,000 to 7,000 packages took 286 days, which means that a new CRAN package is now born on average every 6.9 hours (or 3.
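The per-package rates follow from simple arithmetic. The R snippet below reproduces them from the two day counts quoted above, and also expresses both durations in months using an average month length of 365.25/12 days (our own convention).

```r
# Days needed for each batch of 1,000 new packages (figures quoted in the text)
days <- c("5,000 -> 6,000" = 355, "6,000 -> 7,000" = 286)

# Average number of hours between two new packages
hours_per_package <- days * 24 / 1000
round(hours_per_package, 1)   # 8.5 and 6.9

# The same durations in (average-length) months
months <- days / (365.25 / 12)
round(months, 1)              # 11.7 and 9.4
```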

# Performance: Calling R_CheckUserInterrupt() Every 256 Iterations is Actually Faster than Every 1,000,000 Iterations

If your native code takes more than a few seconds to finish, it is a nice courtesy to the user to check for user interrupts (Ctrl-C) once in a while, say, every 1,000 or 1,000,000 iterations. The C-level API of R provides `R_CheckUserInterrupt()` for this (see ‘Writing R Extensions’ for more information on this function). Here’s what the code would typically look like:

```c
#include <R_ext/Utils.h>  /* declares R_CheckUserInterrupt() */

static void compute(int n) {  /* hypothetical wrapper function */
  for (int ii = 0; ii < n; ii++) {
    /* Some computationally expensive code */
    if (ii % 1000 == 0) R_CheckUserInterrupt();  /* every 1,000 iterations */
  }
}
```

This uses the modulo operator `%` and tests when it is zero, which happens every 1,000 iterations.
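One plausible reason why a power of two such as 256 can be the cheaper choice (our hedged reading, not a claim made in the excerpt): for non-negative integers, `ii % 256 == 0` holds exactly when the low eight bits of `ii` are zero, that is, when `(ii & 255) == 0`, which a C compiler can test with a single bitwise AND instead of an integer division. The R snippet below only checks that equivalence numerically, using base R's `%%` and `bitwAnd()`; it says nothing about timings.

```r
ii <- 0:100000                        # non-negative loop indices
by_modulo  <- (ii %% 256L) == 0L      # "every 256th iteration" via modulo
by_bitmask <- bitwAnd(ii, 255L) == 0L # same test via the low 8 bits
stopifnot(identical(by_modulo, by_bitmask))
sum(by_modulo)  # 391 iterations at which the check would fire
```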

# To Students: matrixStats for Google Summer of Code

We are pleased to announce our proposal ‘Subsetted and parallel computations in matrixStats’ for Google Summer of Code. The project is aimed at a student with experience in R and C; it runs for three months, and the student gets paid 5,500 USD by Google. Students from (almost) all over the world can apply. The application deadline is March 27, 2015. I, Henrik Bengtsson, and Héctor Corrada Bravo will be joint mentors.

# How to: Package Vignettes in Plain LaTeX

Ever wanted to include a plain-LaTeX vignette in your package and have it compiled into a PDF? The R.rsp package provides a four-line solution for this. But, first, what’s R.rsp? R.rsp is an R package that implements a compiler for the RSP markup language. RSP can be used to embed dynamic R code in any text-based source document to be compiled into a final document, e.g. RSP-embedded LaTeX into PDF, RSP-embedded Markdown into HTML, RSP-embedded HTML into HTML and so on.

# Package: matrixStats 0.13.1 - Methods that Apply to Rows and Columns of a Matrix (and Vectors)

A new release 0.13.1 of matrixStats is now on CRAN. The source code is available on GitHub. What does it do? The matrixStats package provides highly optimized functions for computing common summaries over rows and columns of matrices, e.g. `rowQuantiles()`. There are also functions that operate on vectors, e.g. `logSumExp()`. Their implementations strive to minimize both memory usage and processing time. They are often remarkably faster than good old `apply()` solutions.
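To illustrate the kind of computation a vector function such as `logSumExp()` performs, here is a plain-R sketch of the standard log-sum-exp trick, assuming matrixStats builds on the same idea in native code. The name `log_sum_exp()` is hypothetical and not part of matrixStats.

```r
# Hypothetical plain-R version of the log-sum-exp trick
log_sum_exp <- function(x) {
  m <- max(x)
  # If the max is -Inf (every term is zero on the linear scale), +Inf, or NA/NaN,
  # shifting by it is meaningless, so return it directly
  if (!is.finite(m)) return(m)
  # log(sum(exp(x))) == m + log(sum(exp(x - m))), but exp() can no longer overflow
  m + log(sum(exp(x - m)))
}

x <- c(1000, 1000, 999)
log(sum(exp(x)))   # Inf, because exp(1000) overflows a double
log_sum_exp(x)     # finite: 1000 + log(2 + exp(-1))
```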

# Milestone: 6000 Packages on CRAN

Another 1,000 packages were added to CRAN, and this time in less than 12 months. Today (2014-10-29), the Comprehensive R Archive Network (CRAN) package page reports: “Currently, the CRAN package repository features 6000 available packages.” Going from 5,000 to 6,000 packages took 355 days, which means that, on average, only ~8.5 hours passed between two newly added packages. New packages actually arrived even more frequently, since packages dropped from CRAN are not accounted for.

# Pitfall: Did You Really Mean to Use matrix(nrow, ncol)?

Are you a good R citizen who preallocates your matrices? If you are allocating a numeric matrix in one of the following two ways, then you are doing it the wrong way!

```r
x <- matrix(nrow = 500, ncol = 100)
```

or

```r
x <- matrix(NA, nrow = 500, ncol = 100)
```

Why? Because it is counterproductive. And why is that? In the above, `x` becomes a logical matrix, and not a numeric matrix as intended.
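A minimal sketch of the pitfall and one way around it. The fill value `NA_real_` is our choice; any double value such as `0` or `NaN` would also give a numeric (double) matrix.

```r
# The pitfall: without a data argument (or with a plain NA), the matrix is logical
x <- matrix(nrow = 500, ncol = 100)
typeof(x)   # "logical"

# Assigning a double into it makes R coerce the whole matrix to double,
# which allocates a new vector, defeating the purpose of preallocating
x[1, 1] <- 3.14
typeof(x)   # "double"

# Preallocate with a double-typed fill value instead
y <- matrix(NA_real_, nrow = 500, ncol = 100)
typeof(y)   # "double"
```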

# Performance: captureOutput() is Much Faster than capture.output()

The R function `capture.output()` can be used to “collect” the output of functions such as `cat()` and `print()` to strings. For example,

```r
> s <- capture.output({
+   cat("Hello\nworld!\n")
+   print(pi)
+ })
> s
[1] "Hello"        "world!"       "[1] 3.141593"
```

More precisely, it captures all output sent to the standard output and returns a character vector where each element corresponds to a line of output. By the way, it does not capture the output sent to the standard error, e.
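For comparison, here is a base-R sketch of capturing standard output into a raw connection instead of a text connection, which is our understanding of how R.utils' `captureOutput()` differs from `capture.output()`; treat that as an assumption rather than a description of either implementation. The helper name `capture_via_raw()` is hypothetical, and the sketch does not auto-print visible values the way `capture.output()` does.

```r
# Hypothetical helper: capture standard output into a raw connection
capture_via_raw <- function(expr) {
  con <- rawConnection(raw(0L), open = "w")
  on.exit(close(con))
  sink(con)                          # divert stdout to the raw connection
  tryCatch(expr, finally = sink())   # evaluate 'expr'; always undo the sink
  bfr <- rawToChar(rawConnectionValue(con))
  # One element per line; a trailing newline does not add an empty element
  strsplit(bfr, split = "\n", fixed = TRUE)[[1]]
}

s <- capture_via_raw({
  cat("Hello\nworld!\n")
  print(pi)
})
s   # the same three lines as capture.output() gives for this example
```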

# Speed Trick: Assigning a Large Object NULL is Much Faster than Using rm()!

When processing large data sets in R, you often also end up creating large temporary objects. In order to keep the memory footprint small, it is always good to remove those temporary objects as soon as possible. Once removed, objects are deallocated from memory (RAM) the next time the garbage collector runs.

Better: Use `rm(list = "x")` instead of `rm(x)`, if using `rm()`

To remove an object in R, one can use the `rm()` function (with alias `remove()`).
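The sketch below shows what the two approaches actually do; it does not reproduce the timing comparison. Note that assigning `NULL` drops the reference to the large object but keeps the variable itself, whereas `rm()` removes the variable. The object sizes are arbitrary.

```r
# Approach 1: assign NULL; the binding 'x' remains, but now refers to NULL
x <- numeric(1e7)
x <- NULL
exists("x")          # TRUE

# Approach 2: rm(list = ...); the binding 'y' itself is removed
y <- numeric(1e7)
rm(list = "y")
exists("y")          # FALSE

# Either way, the memory is reclaimed when the garbage collector runs
invisible(gc())
```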

# This Day in History (1997-04-01)

Today it is 16 years and 367,496 messages since Martin Mächler started the R-help (321,119 msgs), R-devel (45,830 msgs) and R-announce (547 msgs) mailing lists [1], a great benefit to all of us. Special thanks to Martin, and thanks also to everyone else contributing to these forums.

[1] https://stat.ethz.ch/pipermail/r-help/1997-April/001490.html

# Speed Trick: unlist(..., use.names=FALSE) is Heaps Faster!

Sometimes a minor change to your R code can make a big difference in processing time. Here is an example showing that if you don't care about the names attribute when `unlist()`:ing a list, specifying argument `use.names = FALSE` can speed up the processing a lot:

```r
> x <- split(sample(1000, size = 1e6, replace = TRUE), rep(1:1e5, times = 10))
> t1 <- system.time(y1 <- unlist(x))
> t2 <- system.time(y2 <- unlist(x, use.
```
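A complete, smaller-scale version of such a comparison is sketched below. The sizes and the seed are our own choices and no particular speedup is claimed; the point it verifies is that dropping the names leaves the values unchanged.

```r
set.seed(42)
# A list of 10,000 integer vectors with 10 elements each (smaller than above)
x <- split(sample(1000, size = 1e5, replace = TRUE), rep(1:1e4, times = 10))

t1 <- system.time(y1 <- unlist(x))                      # builds names like "11", "12", ...
t2 <- system.time(y2 <- unlist(x, use.names = FALSE))   # skips the names

# The values are identical; only the names attribute differs
stopifnot(identical(unname(y1), y2))
rbind(with_names = t1[1:3], without_names = t2[1:3])   # user, system, elapsed
```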

# Force R Help HTML Server to Always Use the Same URL Port

The code below shows how to configure the `help.ports` option in R such that the built-in R help server always uses the same URL port. Just add it to the `.Rprofile` file in your home directory (if missing, create it). For more details, see `help("Startup")`.

```r
# Force the URL of the help to http://127.0.0.1:21510
options(help.ports = 21510)
```

A slightly fancier version is to use an environment variable to set the port(s):
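The fancier variant itself is not part of this excerpt. As a sketch of the general idea only, the function below reads one or more comma-separated ports from an environment variable and passes them to `options(help.ports = ...)`. The variable name `R_HELP_PORTS` and the helper `set_help_ports_from_env()` are our own hypothetical choices, not necessarily what the original post uses.

```r
# Hypothetical environment variable R_HELP_PORTS, e.g. "21510" or "21510,21511"
set_help_ports_from_env <- function(envvar = "R_HELP_PORTS") {
  value <- Sys.getenv(envvar, unset = "")
  if (!nzchar(value)) return(invisible(NULL))  # not set: keep R's default behavior
  parts <- strsplit(value, split = ",", fixed = TRUE)[[1]]
  ports <- suppressWarnings(as.integer(parts))
  ports <- ports[!is.na(ports)]                # drop entries that are not integers
  if (length(ports) > 0L) options(help.ports = ports)
  invisible(ports)
}

set_help_ports_from_env()
```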

# Set Package Repositories at Startup

The code below shows how to configure the `repos` option in R such that `install.packages()` etc. will locate the packages without having to explicitly specify the repository. Just add it to the `.Rprofile` file in your home directory (if missing, create it). For more details, see `help("Startup")`.

```r
local({
  repos <- getOption("repos")
  # http://cran.r-project.org/
  # For a list of CRAN mirrors, see getCRANmirrors().
  repos["CRAN"] <- "http://cran.stat.ucla.edu"
  # http://www.stats.ox.ac.uk/pub/RWin/ReadMe
  if (.Platform$OS.type == "windows") {
    repos["CRANextra"] <- "http://www.
```

#### Henrik Bengtsson

MSc CS | PhD Math Stat | Associate Professor | R
