matrixStats: Consistent Support for Name Attributes via GSoC Project

August 23, 2021 in R

Author: Angelina Panagopoulou, GSoC student developer, undergraduate in the Department of Informatics & Telecommunications (DIT), University of Athens, Greece We are glad to announce recent CRAN releases of matrixStats with support for handling and returning name attributes. This feature is added to make matrixStats functions handle names in the same manner as the corresponding base R functions. In particular, the behavior of matrixStats functions is now the same as the apply() function in R, resolving previous lack of, or inconsistent, handling of row and column names.

Performance: Avoid Coercing Indices To Doubles

April 2, 2018 in R

x[idxs + 1] or x[idxs + 1L]? That is the question. Assume that we have a vector $x$ of $n = 100,000$ random values, e.g. > n <- 100000 > x <- rnorm(n) and that we wish to calculate the $n-1$ first-order differences $y=(y_1, y_2, …, y_{n-1})$ where $y_i=x_{i+1} - x_i$. In R, we can calculate this using the following vectorized form: > idxs <- seq_len(n - 1) > y <- x[idxs + 1] - x[idxs] We can certainly do better if we turn to native code, but is there a more efficient way to implement this using plain R code?

matrixStats: Optimized Subsetted Matrix Calculations

December 16, 2015 in R

The matrixStats package provides highly optimized functions for computing common summaries over rows and columns of matrices. In a previous blog post, I showed that, instead of using apply(X, MARGIN = 2, FUN = median), we can speed up calculations dramatically by using colMedians(X). In the most recent release (version 0.50.0), matrixStats has been extended to perform optimized calculations also on a subset of rows and/or columns specified via new arguments rows and cols, e.

Performance: Calling R_CheckUserInterrupt() Every 256 Iteration is Actually Faster than Every 1,000,000 Iteration

June 5, 2015 in R

If your native code takes more than a few seconds to finish, it is a nice courtesy to the user to check for user interrupts (Ctrl-C) once in a while, say, every 1,000 or 1,000,000 iteration. The C-level API of R provides R_CheckUserInterrupt() for this (see ‘Writing R Extensions’ for more information on this function). Here’s what the code would typically look like: for (int ii = 0; ii < n; ii++) { /* Some computational expensive code */ if (ii % 1000 == 0) R_CheckUserInterrupt() } This uses the modulo operator % and tests when it is zero, which happens every 1,000 iteration.

To Students: matrixStats for Google Summer of Code

March 12, 2015 in R

We are pleased to announce our proposal ‘Subsetted and parallel computations in matrixStats’ for Google Summer of Code. The project is aimed for a student with experience in R and C, it runs for three months, and the student gets paid 5500 USD by Google. Students from (almost) all over the world can apply. Application deadline is March 27, 2015. I, Henrik Bengtsson, and Héctor Corrada Bravo will be joint mentors.

Package: matrixStats 0.13.1 - Methods that Apply to Rows and Columns of a Matrix (and Vectors)

January 25, 2015 in R

A new release 0.13.1 of matrixStats is now on CRAN. The source code is available on GitHub. What does it do? The matrixStats package provides highly optimized functions for computing common summaries over rows and columns of matrices, e.g. rowQuantiles(). There are also functions that operate on vectors, e.g. logSumExp(). Their implementations strive to minimize both memory usage and processing time. They are often remarkably faster compared to good old apply() solutions.

Pitfall: Did You Really Mean to Use matrix(nrow, ncol)?

June 17, 2014 in R

Are you a good R citizen and preallocates your matrices? If you are allocating a numeric matrix in one of the following two ways, then you are doing it the wrong way! x <- matrix(nrow = 500, ncol = 100) or x <- matrix(NA, nrow = 500, ncol = 100) Why? Because it is counter productive. And why is that? In the above, x becomes a logical matrix, and not a numeric matrix as intended.

Performance: captureOutput() is Much Faster than capture.output()

May 26, 2014 in R

The R function capture.output() can be used to “collect” the output of functions such as cat() and print() to strings. For example, > s <- capture.output({ + cat("Hello\nworld!\n") + print(pi) + }) > s [1] "Hello" "world!" "[1] 3.141593" More precisely, it captures all output sent to the standard output and returns a character vector where each element correspond to a line of output. By the way, it does not capture the output sent to the standard error, e.

Speed Trick: Assigning Large Object NULL is Much Faster than using rm()!

May 25, 2013 in R

When processing large data sets in R you often also end up creating large temporary objects. In order to keep the memory footprint small, it is always good to remove those temporary objects as soon as possible. When done, removed objects will be deallocated from memory (RAM) the next time the garbage collection runs. Better: Use rm(list = "x") instead of rm(x), if using rm() To remove an object in R, one can use the rm() function (with alias remove()).

Speed Trick: unlist(..., use.names=FALSE) is Heaps Faster!

January 7, 2013 in R

Sometimes a minor change to your R code can make a big difference in processing time. Here is an example showing that if you’re don’t care about the names attribute when unlist():ing a list, specifying argument use.names = FALSE can speed up the processing lots! > x <- split(sample(1000, size = 1e6, rep = TRUE), rep(1:1e5, times = 10)) > t1 <- system.time(y1 <- unlist(x)) > t2 <- system.time(y2 <- unlist(x, use.