Performance: Avoid Coercing Indices To Doubles

April 2, 2018 in R

x[idxs + 1] or x[idxs + 1L]? That is the question. Assume that we have a vector $x$ of $n$ random values, e.g.

> n <- 100000
> x <- rnorm(n)

and that we wish to calculate the $n - 1$ first-order differences $y = (y_1, \ldots, y_{n-1})$, where $y_i = x_{i+1} - x_i$. In R, we can calculate this using the following vectorized form:

> idxs <- seq_len(n - 1)
> y <- x[idxs + 1] - x[idxs]

We can certainly do better if we turn to native code, but is there a more efficient way to implement this using plain R code?
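The title's question comes down to the type of the index vector. The following is a minimal sketch, not a benchmark from the post: it checks that adding the double constant 1 turns the integer indices into doubles, whereas adding the integer constant 1L keeps them as integers, and that both forms give the same differences. The names y_dbl and y_int and the fixed seed are illustrative assumptions.

```r
# Illustrative setup (the seed is an assumption, used only for reproducibility)
set.seed(42)
n <- 100000
x <- rnorm(n)
idxs <- seq_len(n - 1)

# seq_len() returns integers; adding a double constant coerces to double
typeof(idxs)       # "integer"
typeof(idxs + 1)   # "double"
typeof(idxs + 1L)  # "integer"

# Both forms compute the same first-order differences
y_dbl <- x[idxs + 1]  - x[idxs]
y_int <- x[idxs + 1L] - x[idxs]
stopifnot(identical(y_dbl, y_int))
stopifnot(identical(y_int, diff(x)))

# Rough timings; the numbers depend on machine and R version
system.time(for (i in 1:200) x[idxs + 1]  - x[idxs])
system.time(for (i in 1:200) x[idxs + 1L] - x[idxs])
```

Base R's diff(x) returns the same values and serves here only as a cross-check.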

matrixStats: Optimized Subsetted Matrix Calculations

December 16, 2015 in R

The matrixStats package provides highly optimized functions for computing common summaries over rows and columns of matrices. In a previous blog post, I showed that, instead of using apply(X, MARGIN = 2, FUN = median), we can speed up calculations dramatically by using colMedians(X). In the most recent release (version 0.50.0), matrixStats has been extended to perform optimized calculations also on a subset of rows and/or columns specified via new arguments rows and cols, e.
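As a rough illustration of the rows and cols arguments mentioned above, the sketch below computes column medians over a subset of a small matrix and compares the result with subsetting first. It assumes the matrixStats package is installed; the matrix X, the chosen indices, and the variable names are made up for the example.

```r
library(matrixStats)

# Illustrative data (the seed and dimensions are assumptions)
set.seed(1)
X <- matrix(rnorm(20 * 10), nrow = 20, ncol = 10)
rows <- 1:5
cols <- c(2L, 4L, 6L)

# Medians of the selected columns, using only the selected rows
m_sub <- colMedians(X, rows = rows, cols = cols)

# Reference results: subset explicitly, then summarize
m_ref   <- colMedians(X[rows, cols, drop = FALSE])
m_apply <- apply(X[rows, cols, drop = FALSE], MARGIN = 2, FUN = median)

stopifnot(isTRUE(all.equal(m_sub, m_ref)))
stopifnot(isTRUE(all.equal(m_sub, m_apply)))
```

Passing rows and cols directly lets the function handle the subsetting itself, so the caller does not have to build X[rows, cols] as a separate object first.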

Pitfall: Did You Really Mean to Use matrix(nrow, ncol)?

June 17, 2014 in R

Are you a good R citizen who preallocates your matrices? If you are allocating a numeric matrix in one of the following two ways, then you are doing it the wrong way!

x <- matrix(nrow = 500, ncol = 100)

or

x <- matrix(NA, nrow = 500, ncol = 100)

Why? Because it is counterproductive. And why is that? In the above, x becomes a logical matrix, and not a numeric matrix as intended.
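The pitfall is easy to check with typeof(). The sketch below confirms that both allocations produce a logical matrix, shows one possible alternative that allocates a numeric matrix directly by filling it with NA_real_, and shows that the first numeric assignment into the logical matrix changes its type. The variable names are illustrative, and the NA_real_ form is offered here as an assumption about a reasonable fix, not as a quotation from the post.

```r
# Both forms allocate a logical matrix, because NA is a logical constant
x_default <- matrix(nrow = 500, ncol = 100)
x_na      <- matrix(NA, nrow = 500, ncol = 100)
typeof(x_default)  # "logical"
typeof(x_na)       # "logical"

# Allocating with a numeric missing value yields a double matrix up front
x_num <- matrix(NA_real_, nrow = 500, ncol = 100)
typeof(x_num)      # "double"

# Assigning a numeric value into the logical matrix coerces it to double
x_default[1, 1] <- 3.14
typeof(x_default)  # "double"
```

The coercion in the last step means the logical matrix allocated at the start is converted to a double matrix on first assignment, which is the extra work that preallocation was meant to avoid.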

Henrik Bengtsson