A piece of an ice core - more pleasing to look at than yet another illustration of a CPU core (Image credit: Ludovic Brucker, NASA’s Goddard Space Flight Center)
parallelly 1.25.0 is on CRAN. It comes with two major improvements:
You can now use availableCores(omit = n) to ask for all but n CPU cores
makeClusterPSOCK() can finally use the built-in SSH client on MS Windows 10 to set up remote workers
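Here is a minimal sketch of the new omit argument. It assumes parallelly 1.25.0 or later is installed, and the variable names are only illustrative. Leaving one core free is a common choice because it keeps the machine responsive while the workers run.

```r
library(parallelly)

# All CPU cores that parallelly considers available to this R process
n_all <- availableCores()

# All but one of those cores; availableCores() never returns less than one
n_workers <- availableCores(omit = 1)

# Set up a local PSOCK cluster with that many workers and use it
cl <- makeClusterPSOCK(n_workers)
res <- parallel::parLapply(cl, 1:4, function(i) i * 10)
parallel::stopCluster(cl)
```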
This is a guest post by Chris Paciorek, Department of Statistics, University of California at Berkeley.
In this post, I’ll demonstrate that you can easily use the future package in R on a cluster of machines running in the cloud, specifically on a Kubernetes cluster.
This makes it easy to do parallel computing in R in the cloud. One advantage of doing this in the cloud is the ability to easily scale the number and type of (virtual) machines across which you run your parallel computation.
This is an announcement that future.BatchJobs - A Future API for Parallel and Distributed Processing using BatchJobs has been archived on CRAN. The package has been deprecated for years with a recommendation of using future.batchtools instead. The latter has been on CRAN since June 2017 and builds upon the batchtools package, which itself supersedes the BatchJobs package.
To wrap up the three-and-a-half year long life of future.
Luke Zappia's summary of the talk I presented Future: A Simple, Extendable, Generic Framework for Parallel Processing in R at the European Bioconductor Meeting 2020, which took place online during the week of December 14-18, 2020.
You’ll find my slides (39 slides + Q&A slides; 35 minutes) below:
Title & Abstract
HTML (Google Slides; requires online access)
PDF (flat slides)
Video (YouTube)
I want to thank the organizers for inviting me to this Bioconductor conference.
I presented Future: Simple, Friendly Parallel Processing for R (67 minutes; 59 slides + Q&A slides) at the New York Open Statistical Programming Meetup on November 9, 2020:
HTML (incremental Google Slides; requires online access)
PDF (flat slides)
Video (presentation starts at 0h10m30s, Q&A starts at 1h17m40s)
I’d like to thank everyone who attended and everyone who asked lots of brilliant questions during the Q&A. I’d also like to express my gratitude to Amada, Jared, and Noam for the invitation and for making this event possible.
future 1.20.1 is on CRAN. It adds some new features, deprecates old and unwanted behaviors, adds a couple of vignettes, and fixes a few bugs.
Interactive debugging
First out among the new features, and a long-standing feature request, is the addition of argument split to plan(), which allows us to split, or “tee”, any output produced by futures.
The default is split = FALSE for which standard output and conditions are captured by the future and only relayed after the future has been resolved, i.
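Here is a minimal sketch of turning on split. It assumes future 1.20.1 or later and that split is passed through plan() together with the sequential backend. It is worth checking ?plan in the installed version, because later releases may treat this argument differently.

```r
library(future)

# Ask futures to "tee" their output: relay it as it occurs and also
# capture it (argument 'split', as introduced in future 1.20.1)
plan(sequential, split = TRUE)

f <- future({
  cat("Hello from inside the future\n")
  message("A message from inside the future")
  42
})
v <- value(f)

# Restore the default behavior (split = FALSE)
plan(sequential)
```

With split = TRUE, the output should appear while the future is being evaluated rather than only after it is resolved. This is what makes the option useful for interactive debugging.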
parallelly (adverb)
par·al·lel·ly | \ ˈpa-rə-le(l)li \
Definition: in a parallel manner
future (noun)
fu·ture | \ ˈfyü-chər \
Definition: existing or occurring at a later time
I’ve cleaned up around the house - with the recent release of future 1.20.1, the package gained a dependency on the new parallelly package. Now, if you’re like me and concerned about bloating package dependencies, I’m sure you immediately wondered why I chose to introduce a new dependency.
Each time we use R to analyze data, we rely on the assumption that the functions we use produce correct results. If we can’t make this assumption, we have to spend a lot of time validating every nitty-gritty detail. Luckily, we don’t have to do this. There are many reasons why we can comfortably use R for our analyses, and some of them are unique to R. Here are some I could think of while writing this blog post - I’m sure I forgot something:
Parallel ‘Digital Rain’ by Jahobr
After two-and-a-half months, future 1.19.1 is now on CRAN. As usual, there are some bug fixes and minor improvements here and there (NEWS), including things needed by the next version of furrr. For those of you who use Slurm or LSF/OpenLava as a scheduler on your high-performance compute (HPC) cluster, future::availableCores() will now do a better job respecting the CPU resources that those schedulers allocate for your R jobs.
If you ever need to figure out if a function call in R generated a random number or not, here is a simple trick that you can use in an interactive R session. Add the following to your ~/.Rprofile(*):
if (interactive()) {
  invisible(addTaskCallback(local({
    last <- .GlobalEnv$.Random.seed
    function(...) {
      curr <- .GlobalEnv$.Random.seed
      if (!identical(curr, last)) {
        msg <- "TRACKER: .Random.seed changed"
        if (requireNamespace("crayon", quietly = TRUE)) msg <- crayon::blurred(msg)
        message(msg)
        last <<- curr
      }
      TRUE
    }
  }), name = "RNG tracker"))
}
It works by checking whether or not the state of the random number generator (RNG), that is, .Random.seed in the global environment, has changed since the previous top-level call.
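The task callback only reports changes during an interactive session. In a script or a test, the same check can be wrapped around a single call. This is a minimal sketch, and rng_changed() is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: TRUE if evaluating 'expr' changed the RNG state,
# that is, .Random.seed in the global environment
rng_changed <- function(expr) {
  get_seed <- function() {
    if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      get(".Random.seed", envir = globalenv(), inherits = FALSE)
    } else {
      NULL  # no random numbers drawn yet in this session
    }
  }
  before <- get_seed()
  force(expr)  # evaluate the lazily passed argument here
  after <- get_seed()
  !identical(before, after)
}

rng_changed(mean(1:10))  # FALSE: no random numbers drawn
rng_changed(runif(1))    # TRUE: the RNG state was updated
```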
There are new versions of future and future.apply - your friends in the parallelization business - on CRAN. These updates are mostly maintenance updates with bug fixes, some improvements, and preparations for upcoming changes. It’s been some time since I blogged about these packages, so here is a summary of the main updates so far since early 2020:
future:
values() for lists and other containers was renamed to value() to simplify the API [future 1.
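Here is a minimal sketch of calling value() on a list of futures. It assumes a version of future that includes this rename, and the squared numbers are arbitrary example data.

```r
library(future)
plan(sequential)

# Create a list of futures
fs <- lapply(1:3, function(i) future(i^2))

# value() accepts a list of futures and returns a list of their values
# (this was values() before the rename)
vs <- value(fs)
```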
Source: Wiktionary.org
I presented Progressr: An Inclusive, Unifying API for Progress Updates (15 minutes; 20 slides) at e-Rum 2020, on June 17, 2020:
Abstract
HTML (incremental Google Slides; requires online access)
PDF (flat slides)
Video (starts at 00h49m58s)
I am grateful to everyone involved who made e-Rum 2020 possible. I cannot imagine having to cancel the on-site Milano conference that had been planned for more than a year and then start over to re-organize and create a fabulous online experience for ~1,500 participants on such short notice.
Design: Dan LaBar
I presented Future: Simple Async, Parallel & Distributed Processing in R Why and What’s New? at rstudio::conf 2020 in San Francisco, USA, on January 29, 2020. Below are the slides for my talk (17 slides; ~18+2 minutes):
HTML (incremental Google Slides; requires online access)
PDF (flat slides)
Video with closed captions (official rstudio::conf recording)
First of all, a big thank you goes out to Dan LaBar (@embiggenData) for proposing and contributing the original design of the future hex sticker.
No dogs were harmed while making this release
future 1.15.0 is now on CRAN, accompanied by a recent, related update of future.callr 0.5.0. The main update is a change to the Future API:
resolved() will now also launch lazy futures
Although this change does not look like much to the world, I’d like to think of this as part of a young person slowly finding themselves. This change in behavior helps us in cases where we create lazy futures upfront;
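Here is a minimal sketch of that situation, assuming future 1.15.0 or later. Lazy futures are created upfront and then polled with resolved(). Because resolved() now launches a lazy future that has not been started, the polling loop should finish rather than wait on futures that never start. The two workers and the sleep durations are arbitrary choices.

```r
library(future)
plan(multisession, workers = 2)

# Create lazy futures upfront; none of them is started yet
fs <- lapply(1:4, function(i) {
  future({
    Sys.sleep(0.1)
    i
  }, lazy = TRUE)
})

# Poll until all futures are done; resolved() launches any lazy future
# that has not been started yet (future >= 1.15.0)
while (!all(vapply(fs, resolved, logical(1)))) {
  Sys.sleep(0.05)
}

vs <- value(fs)

# Shut down the background workers
plan(sequential)
```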
Below are the slides for my talk Future: Simple Parallel and Distributed Processing in R, which I presented at the useR! 2019 conference in Toulouse, France, on July 9-12, 2019.
My talk (25 slides; ~15+3 minutes):
Title: Future: Simple Parallel and Distributed Processing in R
HTML (incremental Google Slides; requires online access)
PDF (flat slides)
Video (official recording)
I want to send out a big thank you to everyone making the useR!
New release: startup 0.12.0 is now on CRAN. This version introduces support for processing some of the R startup files with a certain frequency, e.g. once per day, once per week, or once per month. See below for two examples.
startup::startup() is cross platform.
The startup package makes it easy to split up a long, complicated .Rprofile startup file into multiple, smaller files in a .Rprofile.d/ folder. For instance, setting R option repos in a separate file ~/.
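Here is a minimal sketch of such a setup. The call to startup::startup() in ~/.Rprofile is what processes the files in the ~/.Rprofile.d/ folder. The file name repos.R and the CRAN mirror URL are only examples. Wrapping the call in tryCatch() keeps an error in one of the startup files from aborting the whole R startup.

```r
## File: ~/.Rprofile
## Let the startup package process the files in ~/.Rprofile.d/
tryCatch(startup::startup(), error = function(ex) {
  message(".Rprofile error: ", conditionMessage(ex))
})

## File: ~/.Rprofile.d/repos.R  (hypothetical file name)
## Set the default CRAN mirror in a file of its own
options(repos = c(CRAN = "https://cloud.r-project.org"))
```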
A bit late, but here are my slides on Future: Friendly Parallel Processing in R for Everyone that I presented at the satRday LA 2019 conference in Los Angeles, CA, USA on April 6, 2019.
My talk (33 slides; ~45 minutes):
Title: Future: Friendly Parallel and Distributed Processing in R for Everyone
HTML (incremental slides; requires online access)
PDF (flat slides)
Video (44 min; YouTube; sorry, different page numbers)
Thank you all for making this a stellar satRday event.
Below are links to my slides from my talk on Future: Friendly Parallel Processing in R for Everyone that I presented last month at the satRday Paris 2019 conference in Paris, France (February 23, 2019).
My talk (32 slides; ~40 minutes):
Title: Future: Friendly Parallel Processing in R for Everyone
HTML (incremental slides; requires online access)
PDF (flat slides)
A big shout out to the organizers, all the volunteers, and everyone else for making it a great satRday.
A commonly asked question in the R community is:
How can I parallelize the following for-loop?
The answer almost always involves rewriting the for (...) { ... } loop into something that looks like a y <- lapply(...) call. If you can achieve that, you can parallelize it via, for instance, y <- future.apply::future_lapply(...) or y <- foreach::foreach(...) %dopar% { ... }.
For some for-loops it is straightforward to rewrite the code to make use of lapply() instead, whereas in other cases it can be a bit more complicated, especially if the for-loop updates multiple variables in each iteration.
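Here is a minimal sketch of the multi-variable case, assuming the future and future.apply packages are installed. Each iteration returns everything it computes as a list, and the per-variable results are extracted afterwards. The input data and the use of two workers are arbitrary choices.

```r
x <- c(4, 9, 16)

## A for-loop that updates two variables in each iteration
a <- numeric(length(x))
b <- numeric(length(x))
for (i in seq_along(x)) {
  a[i] <- sqrt(x[i])
  b[i] <- log(x[i])
}

## The same computation with lapply(): each iteration returns a list
## holding everything it computes; the variables are pulled out afterwards
res <- lapply(seq_along(x), function(i) {
  list(a = sqrt(x[i]), b = log(x[i]))
})
a2 <- vapply(res, function(r) r$a, numeric(1))
b2 <- vapply(res, function(r) r$b, numeric(1))

## Parallel version: replace lapply() with future.apply::future_lapply()
future::plan(future::multisession, workers = 2)
res3 <- future.apply::future_lapply(seq_along(x), function(i) {
  list(a = sqrt(x[i]), b = log(x[i]))
})
future::plan(future::sequential)  # shut down the background workers
a3 <- vapply(res3, function(r) r$a, numeric(1))
b3 <- vapply(res3, function(r) r$b, numeric(1))
```

With this design, each iteration is independent of the others, and that independence is what allows the lapply() call to be swapped for a parallel version.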
New versions of the following future backends are available on CRAN:
future.callr - parallelization via callr, i.e. on the local machine
future.batchtools - parallelization via batchtools, i.e. on a compute cluster with job schedulers (SLURM, SGE, Torque/PBS, etc.) but also on the local machine
future.BatchJobs - (maintained for legacy reasons) parallelization via BatchJobs, which is the predecessor of batchtools
These releases fix a few small bugs and inconsistencies that were identified with help of the future.
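Switching to one of these backends only requires changing the plan() call, and the code that creates futures stays the same. Here is a minimal sketch that uses future.callr::callr as the backend, assuming future.callr is installed. The process-ID check is only there to show that the future runs in a separate R session.

```r
library(future)

# Evaluate each future in a fresh background R session via callr
plan(future.callr::callr)

f <- future(Sys.getpid())
worker_pid <- value(f)

# Back to sequential processing in the current R session
plan(sequential)
```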