parallelly 1.25.0: availableCores(omit=n) and, Finally, Built-in SSH Support for MS Windows 10 Users

April 30, 2021 in R

A piece of an ice core - more pleasing to look at than yet another illustration of a CPU core (Image credit: Ludovic Brucker, NASA’s Goddard Space Flight Center)

parallelly 1.25.0 is on CRAN. It comes with two major improvements: you can now use availableCores(omit = n) to ask for all but n CPU cores, and makeClusterPSOCK() can finally use the built-in SSH client on MS Windows 10 to set up remote workers.
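As a rough illustration of what "all but n CPU cores" means, here is a minimal sketch. The helper cores_except() is hypothetical and not part of parallelly; it only approximates the idea with base R's parallel::detectCores() and assumes that at least one core should always be returned. The guarded call at the end uses parallelly's own availableCores(omit = n) when that package is installed.

```r
# Hypothetical helper (NOT part of parallelly): approximate "all but n cores"
# using only base R's 'parallel' package.
cores_except <- function(omit = 0L) {
  total <- parallel::detectCores()
  if (is.na(total)) total <- 1L      # detectCores() may return NA
  max(1L, total - as.integer(omit))  # assumption: never go below one core
}

n_workers <- cores_except(omit = 2L)

# With parallelly (>= 1.25.0) installed, prefer the function from the post,
# which also respects scheduler and option settings:
if (requireNamespace("parallelly", quietly = TRUE)) {
  n_workers <- parallelly::availableCores(omit = 2L)
}
```

The resulting n_workers can then be passed wherever a worker count is expected, for example when setting up a local cluster.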

Continue reading

Using Kubernetes and the Future Package to Easily Parallelize R in the Cloud

April 8, 2021 in R

This is a guest post by Chris Paciorek, Department of Statistics, University of California at Berkeley. In this post, I’ll demonstrate that you can easily use the future package in R on a cluster of machines running in the cloud, specifically on a Kubernetes cluster. This allows you to easily do parallel computing in R in the cloud. One advantage of doing this in the cloud is the ability to easily scale the number and type of (virtual) machines across which you run your parallel computation.

Continue reading

future.BatchJobs - End-of-Life Announcement

January 8, 2021 in R

This is an announcement that future.BatchJobs - A Future API for Parallel and Distributed Processing using BatchJobs has been archived on CRAN. The package has been deprecated for years with a recommendation of using future.batchtools instead. The latter has been on CRAN since June 2017 and builds upon the batchtools package, which itself supersedes the BatchJobs package. To wrap up the three-and-a-half year long life of future.

Continue reading

My Keynote 'Future' Presentation at the European Bioconductor Meeting 2020

December 19, 2020 in R

Luke Zappia's summary of the talk

I presented Future: A Simple, Extendable, Generic Framework for Parallel Processing in R at the European Bioconductor Meeting 2020, which took place online during the week of December 14-18, 2020. You’ll find my slides (39 slides + Q&A slides; 35 minutes) below:

Title & Abstract
HTML (Google Slides; requires online access)
PDF (flat slides)
Video (YouTube)

I want to thank the organizers for inviting me to this Bioconductor conference.

Continue reading

NYC R Meetup: Slides on Future

November 12, 2020 in R

I presented Future: Simple, Friendly Parallel Processing for R (67 minutes; 59 slides + Q&A slides) at New York Open Statistical Programming Meetup, on November 9, 2020:

HTML (incremental Google Slides; requires online access)
PDF (flat slides)
Video (presentation starts at 0h10m30s, Q&A starts at 1h17m40s)

I’d like to thank everyone who attended and everyone who asked lots of brilliant questions during the Q&A. I also want to express my gratitude to Amada, Jared, and Noam for the invitation and for making this event possible.

Continue reading

future 1.20.1 - The Future Just Got a Bit Brighter

November 6, 2020 in R

future 1.20.1 is on CRAN. It adds some new features, deprecates old and unwanted behaviors, adds a couple of vignettes, and fixes a few bugs.

Interactive debugging

First out among the new features, and a long-running feature request, is the addition of argument split to plan(), which allows us to split, or “tee”, any output produced by futures. The default is split = FALSE for which standard output and conditions are captured by the future and only relayed after the future has been resolved, i.

Continue reading

parallelly, future - Cleaning Up Around the House

November 4, 2020 in R

parallelly adverb par·al·lel·ly | \ ˈpa-rə-le(l)li \ Definition: in a parallel manner

future noun fu·ture | \ ˈfyü-chər \ Definition: existing or occurring at a later time

I’ve cleaned up around the house - with the recent release of future 1.20.1, the package gained a dependency on the new parallelly package. Now, if you’re like me and concerned about bloating package dependencies, I’m sure you immediately wondered why I chose to introduce a new dependency.

Continue reading

Trust the Future

November 4, 2020 in R

Each time we use R to analyze data, we rely on the assumption that the functions we use produce correct results. If we can’t make this assumption, we have to spend a lot of time validating every nitty detail. Luckily, we don’t have to do this. There are many reasons why we can comfortably use R for our analyses, and some of them are unique to R. Here are some I could think of while writing this blog post - I’m sure I forgot something:

Continue reading

future 1.19.1 - Making Sure Proper Random Numbers are Produced in Parallel Processing

September 22, 2020 in R

Parallel ‘Digital Rain’ by Jahobr

After two-and-a-half months, future 1.19.1 is now on CRAN. As usual, there are some bug fixes and minor improvements here and there (NEWS), including things needed by the next version of furrr. For those of you who use Slurm or LSF/OpenLava as a scheduler on your high-performance compute (HPC) cluster, future::availableCores() will now do a better job respecting the CPU resources that those schedulers allocate for your R jobs.

Continue reading

Detect When the Random Number Generator Was Used

September 21, 2020 in R

If you ever need to figure out if a function call in R generated a random number or not, here is a simple trick that you can use in an interactive R session. Add the following to your ~/.Rprofile(*):

    if (interactive()) {
      invisible(addTaskCallback(local({
        # RNG state as of the previous top-level call
        last <- .GlobalEnv$.Random.seed

        function(...) {
          curr <- .GlobalEnv$.Random.seed
          # Report and remember the new state if it differs from the last one seen
          if (!identical(curr, last)) {
            msg <- "TRACKER: .Random.seed changed"
            if (requireNamespace("crayon", quietly=TRUE)) msg <- crayon::blurred(msg)
            message(msg)
            last <<- curr
          }
          TRUE
        }
      }), name = "RNG tracker"))
    }

It works by checking whether or not the state of the random number generator (RNG), that is, .
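The same idea, comparing the RNG state before and after a call, can also be sketched as a standalone helper that works outside of a task callback. The function name rng_used() is hypothetical and not from the post; the sketch assumes that the RNG state lives in .Random.seed in the global environment, which does not exist until the RNG has been used or seeded.

```r
# Hypothetical helper: TRUE if evaluating 'expr' changed the RNG state
rng_used <- function(expr) {
  get_seed <- function() {
    # .Random.seed is absent in a fresh session; treat that as NULL
    if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      get(".Random.seed", envir = globalenv(), inherits = FALSE)
    } else {
      NULL
    }
  }
  before <- get_seed()
  force(expr)            # 'expr' is lazily evaluated, so it runs here
  after <- get_seed()
  !identical(before, after)
}

used_rng <- rng_used(rnorm(1))    # draws a random number
no_rng   <- rng_used(sum(1:10))   # purely deterministic
```

Because R evaluates arguments lazily, the expression is not run until force(expr), which is what lets the helper capture the state beforehand.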

Continue reading

future and future.apply - Some Recent Improvements

July 11, 2020 in R

There are new versions of future and future.apply - your friends in the parallelization business - on CRAN. These updates are mostly maintenance updates with bug fixes, some improvements, and preparations for upcoming changes. It’s been some time since I blogged about these packages, so here is a summary of the main updates thus far since early 2020:

future: values() for lists and other containers was renamed to value() to simplify the API [future 1.

Continue reading

e-Rum 2020 Slides on Progressr

July 4, 2020 in R

Source: Wiktionary.org

I presented Progressr: An Inclusive, Unifying API for Progress Updates (15 minutes; 20 slides) at e-Rum 2020, on June 17, 2020:

Abstract
HTML (incremental Google Slides; requires online access)
PDF (flat slides)
Video (starts at 00h49m58s)

I am grateful to everyone involved who made e-Rum 2020 possible. I cannot imagine having to cancel the on-site Milano conference that had been planned for more than a year and then start over to re-organize and create a fabulous online experience for ~1,500 participants on such short notice.

Continue reading

rstudio::conf 2020 Slides on Futures

February 1, 2020 in R

Design: Dan LaBar

I presented Future: Simple Async, Parallel & Distributed Processing in R - Why and What’s New? at rstudio::conf 2020 in San Francisco, USA, on January 29, 2020. Below are the slides for my talk (17 slides; ~18+2 minutes):

HTML (incremental Google Slides; requires online access)
PDF (flat slides)
Video with closed captions (official rstudio::conf recording)

First of all, a big thank you goes out to Dan LaBar (@embiggenData) for proposing and contributing the original design of the future hex sticker.

Continue reading

future 1.15.0 - Lazy Futures are Now Launched if Queried

November 9, 2019 in R

No dogs were harmed while making this release

future 1.15.0 is now on CRAN, accompanied by a recent, related update of future.callr 0.5.0. The main update is a change to the Future API: resolved() will now also launch lazy futures. Although this change does not look like much to the world, I’d like to think of this as part of a young person slowly finding themselves. This change in behavior helps us in cases where we create lazy futures upfront;

Continue reading

useR! 2019 Slides on Futures

July 12, 2019 in R

Below are the slides for my talk Future: Simple Parallel and Distributed Processing in R that I presented at the useR! 2019 conference in Toulouse, France on July 9-12, 2019. My talk (25 slides; ~15+3 minutes):

Title: Future: Simple Parallel and Distributed Processing in R
HTML (incremental Google Slides; requires online access)
PDF (flat slides)
Video (official recording)

I want to send out a big thank you to everyone making the useR!

Continue reading

startup - run R startup files once per hour, day, week, ...

May 26, 2019 in R

New release: startup 0.12.0 is now on CRAN. This version introduces support for processing some of the R startup files with a certain frequency, e.g. once per day, once per week, or once per month. See below for two examples. startup::startup() is cross platform. The startup package makes it easy to split up a long, complicated .Rprofile startup file into multiple, smaller files in a .Rprofile.d/ folder. For instance, setting R option repos in a separate file ~/.

Continue reading

SatRday LA 2019 Slides on Futures

May 16, 2019 in R

A bit late, but here are my slides on Future: Friendly Parallel Processing in R for Everyone that I presented at the satRday LA 2019 conference in Los Angeles, CA, USA on April 6, 2019. My talk (33 slides; ~45 minutes):

Title: Future: Friendly Parallel and Distributed Processing in R for Everyone
HTML (incremental slides; requires online access)
PDF (flat slides)
Video (44 min; YouTube; sorry, different page numbers)

Thank you all for making this a stellar satRday event.

Continue reading

SatRday Paris 2019 Slides on Futures

March 7, 2019 in R

Below are links to my slides from my talk on Future: Friendly Parallel Processing in R for Everyone that I presented last month at the satRday Paris 2019 conference in Paris, France (February 23, 2019). My talk (32 slides; ~40 minutes):

Title: Future: Friendly Parallel Processing in R for Everyone
HTML (incremental slides; requires online access)
PDF (flat slides)

A big shout out to the organizers, all the volunteers, and everyone else for making it a great satRday.

Continue reading

Parallelize a For-Loop by Rewriting it as an Lapply Call

January 11, 2019 in R

A commonly asked question in the R community is: How can I parallelize the following for-loop? The answer almost always involves rewriting the for (...) { ... } loop into something that looks like a y <- lapply(...) call. If you can achieve that, you can parallelize it via for instance y <- future.apply::future_lapply(...) or y <- foreach::foreach() %dopar% { ... }. For some for-loops it is straightforward to rewrite the code to make use of lapply() instead, whereas in other cases it can be a bit more complicated, especially if the for-loop updates multiple variables in each iteration.
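A minimal sketch of this rewrite, using a toy computation and only base R, might look as follows. The variable names are made up for illustration; the commented-out lines show how the lapply() call could then be swapped for future.apply::future_lapply(), assuming the future and future.apply packages are installed.

```r
# A for-loop that fills in a result vector one element at a time
xs <- 1:5
y_loop <- numeric(length(xs))
for (i in seq_along(xs)) {
  y_loop[i] <- xs[i]^2
}

# The same computation as an lapply() call: the loop body becomes a function
# of the iteration index that returns the value for that iteration
y_list  <- lapply(seq_along(xs), function(i) xs[i]^2)
y_apply <- unlist(y_list)

# Once in this form, parallelization is a drop-in replacement
# (assumes 'future' and 'future.apply' are installed):
# future::plan(future::multisession)
# y_list <- future.apply::future_lapply(seq_along(xs), function(i) xs[i]^2)
```

The key step is that each iteration returns its result instead of assigning into a shared variable, which is what makes the iterations independent of each other.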

Continue reading

Maintenance Updates of Future Backends and doFuture

January 7, 2019 in R

New versions of the following future backends are available on CRAN:

future.callr - parallelization via callr, i.e. on the local machine
future.batchtools - parallelization via batchtools, i.e. on a compute cluster with job schedulers (SLURM, SGE, Torque/PBS, etc.) but also on the local machine
future.BatchJobs - (maintained for legacy reasons) parallelization via BatchJobs, which is the predecessor of batchtools

These releases fix a few small bugs and inconsistencies that were identified with help of the future.

Continue reading


Henrik Bengtsson

MSc CS | PhD Math Stat | Associate Professor | R Foundation | R Consortium
