Happy New Year! I made some updates to the future framework during 2021 that involve overall improvements and essential preparations to go forward with some exciting new features that I’m keen to work on during 2022. The future framework makes it easy to parallelize existing R code - often with only a minor change of code. The goal is to lower the barriers so that anyone can quickly and safely speed up their existing R code in a worry-free manner.
parallelly 1.29.0 is on CRAN. The parallelly package enhances the parallel package - our built-in R package for parallel processing - by improving on existing features and by adding new ones. Somewhat simplified, parallelly provides the things that you would otherwise expect to find in the parallel package. The future package relies on the parallelly package internally for local and remote parallelization. Since my previous post on parallelly five months ago, the parallelly package has had some bugs fixed, and it has gained a few new features;
Author: Angelina Panagopoulou, GSoC student developer, undergraduate in the Department of Informatics & Telecommunications (DIT), University of Athens, Greece
We are glad to announce recent CRAN releases of matrixStats with support for handling and returning name attributes. This feature is added to make matrixStats functions handle names in the same manner as the corresponding base R functions. In particular, the behavior of matrixStats functions is now the same as the apply() function in R, resolving previous lack of, or inconsistent, handling of row and column names.
progressr 0.8.0 is on CRAN. It comes with some new features:
- A new ‘rstudio’ handler that reports on progress via the RStudio job interface in RStudio
- withProgressShiny() now updates the detail part, instead of the message part
- In addition to signalling relative amounts of progress, it’s now also possible to signal total amounts
If you’re curious what progressr is about, have a look at my e-Rum 2020 presentation.
parallelly 1.26.0 is on CRAN. It comes with one major improvement and one new function:
- The setup of parallel workers is now much faster, which comes from using a concurrent, instead of sequential, setup strategy
- The new freePort() can be used to find a TCP port that is currently available
Faster setup of local, parallel workers
In R 4.0.0, which was released in April 2020, parallel::makeCluster(n) gained the power of setting up the n local cluster nodes all at the same time, which greatly reduces the total setup time.
A piece of an ice core - more pleasing to look at than yet another illustration of a CPU core (Image credit: Ludovic Brucker, NASA’s Goddard Space Flight Center)
parallelly 1.25.0 is on CRAN. It comes with two major improvements:
- You can now use availableCores(omit = n) to ask for all but n CPU cores
- makeClusterPSOCK() can finally use the built-in SSH client on MS Windows 10 to set up remote workers
This is a guest post by Chris Paciorek, Department of Statistics, University of California at Berkeley. In this post, I’ll demonstrate that you can easily use the future package in R on a cluster of machines running in the cloud, specifically on a Kubernetes cluster. This allows you to easily do parallel computing in R in the cloud. One advantage of doing this in the cloud is the ability to easily scale the number and type of (virtual) machines across which you run your parallel computation.
This is an announcement that future.BatchJobs - A Future API for Parallel and Distributed Processing using BatchJobs has been archived on CRAN. The package has been deprecated for years with a recommendation of using future.batchtools instead. The latter has been on CRAN since June 2017 and builds upon the batchtools package, which itself supersedes the BatchJobs package. To wrap up the three-and-a-half year long life of future.
Luke Zappia's summary of the talk
I presented Future: A Simple, Extendable, Generic Framework for Parallel Processing in R at the European Bioconductor Meeting 2020, which took place online during the week of December 14-18, 2020. You’ll find my slides (39 slides + Q&A slides; 35 minutes) below:
- Title & Abstract
- HTML (Google Slides; requires online access)
- PDF (flat slides)
- Video (to be uploaded by the organizers)
I want to thank the organizers for inviting me to this Bioconductor conference.
I presented Future: Simple, Friendly Parallel Processing for R (67 minutes; 59 slides + Q&A slides) at the New York Open Statistical Programming Meetup, on November 9, 2020:
- HTML (incremental Google Slides; requires online access)
- PDF (flat slides)
- Video (presentation starts at 0h10m30s, Q&A starts at 1h17m40s)
I would like to thank everyone who attended and everyone who asked lots of brilliant questions during the Q&A. I also want to express my gratitude to Amada, Jared, and Noam for the invitation and for making this event possible.
future 1.20.1 is on CRAN. It adds some new features, deprecates old and unwanted behaviors, adds a couple of vignettes, and fixes a few bugs.
Interactive debugging
First out among the new features, and a long-running feature request, is the addition of argument split to plan(), which allows us to split, or “tee”, any output produced by futures. The default is split = FALSE for which standard output and conditions are captured by the future and only relayed after the future has been resolved, i.
parallelly (adverb) par·al·lel·ly | \ ˈpa-rə-le(l)li \
Definition: in a parallel manner
future (noun) fu·ture | \ ˈfyü-chər \
Definition: existing or occurring at a later time
I’ve cleaned up around the house - with the recent release of future 1.20.1, the package gained a dependency on the new parallelly package. Now, if you’re like me and concerned about bloating package dependencies, I’m sure you immediately wondered why I chose to introduce a new dependency.
Each time we use R to analyze data, we rely on the assumption that the functions we use produce correct results. If we can’t make this assumption, we have to spend a lot of time validating every nitty-gritty detail. Luckily, we don’t have to do this. There are many reasons why we can comfortably use R for our analyses, and some of them are unique to R. Here are some I could think of while writing this blog post - I’m sure I forgot something:
Parallel ‘Digital Rain’ by Jahobr
After two-and-a-half months, future 1.19.1 is now on CRAN. As usual, there are some bug fixes and minor improvements here and there (NEWS), including things needed by the next version of furrr. For those of you who use Slurm or LSF/OpenLava as a scheduler on your high-performance compute (HPC) cluster, future::availableCores() will now do a better job respecting the CPU resources that those schedulers allocate for your R jobs.
There are new versions of future and future.apply - your friends in the parallelization business - on CRAN. These updates are mostly maintenance updates with bug fixes, some improvements, and preparations for upcoming changes. It’s been some time since I blogged about these packages, so here is a summary of the main updates thus far since early 2020:
future:
- values() for lists and other containers was renamed to value() to simplify the API [future 1.
Source: Wiktionary.org
I presented Progressr: An Inclusive, Unifying API for Progress Updates (15 minutes; 20 slides) at e-Rum 2020, on June 17, 2020:
- Abstract
- HTML (incremental Google Slides; requires online access)
- PDF (flat slides)
- Video (starts at 00h49m58s)
I am grateful to everyone involved who made e-Rum 2020 possible. I cannot imagine having to cancel the on-site Milano conference that had been planned for more than a year and then start over to re-organize and create a fabulous online experience for ~1,500 participants on such short notice.
Design: Dan LaBar
I presented Future: Simple Async, Parallel & Distributed Processing in R Why and What’s New? at rstudio::conf 2020 in San Francisco, USA, on January 29, 2020. Below are the slides for my talk (17 slides; ~18+2 minutes):
- HTML (incremental Google Slides; requires online access)
- PDF (flat slides)
- Video with closed captions (official rstudio::conf recording)
First of all, a big thank you goes out to Dan LaBar (@embiggenData) for proposing and contributing the original design of the future hex sticker.
No dogs were harmed while making this release
future 1.15.0 is now on CRAN, accompanied by a recent, related update of future.callr 0.5.0. The main update is a change to the Future API:
- resolved() will now also launch lazy futures
Although this change does not look like much to the world, I’d like to think of this as part of a young person slowly finding themselves. This change in behavior helps us in cases where we create lazy futures upfront;
Below are the slides for my talk Future: Simple Parallel and Distributed Processing in R that I presented at the useR! 2019 conference in Toulouse, France on July 9-12, 2019. My talk (25 slides; ~15+3 minutes):
- Title: Future: Simple Parallel and Distributed Processing in R
- HTML (incremental Google Slides; requires online access)
- PDF (flat slides)
- Video (official recording)
I want to send out a big thank you to everyone making the useR!
New release: startup 0.12.0 is now on CRAN. This version introduces support for processing some of the R startup files with a certain frequency, e.g. once per day, once per week, or once per month. See below for two examples. startup::startup() is cross platform. The startup package makes it easy to split up a long, complicated .Rprofile startup file into multiple, smaller files in a .Rprofile.d/ folder. For instance, setting R option repos in a separate file ~/.