
Update Alert: PyTorch 1.11

March 14, 2022 • 7 min read

Major Update for PyTorch and PyTorch Libraries

PyTorch released updates to both PyTorch and its domain libraries on March 10, 2022. PyTorch 1.11 is a major release that includes a host of new features, enhancements, breaking changes, deprecations, and beta features.

The full release notes are available here.

Highlights include:

TorchData is a new library for common modular data loading primitives for easily constructing flexible and performant data pipelines. View it on GitHub.
functorch, a library that adds composable function transforms to PyTorch, is now available in beta. View it on GitHub.
Distributed Data Parallel (DDP) static graph optimizations are now available in stable.

Let's take a closer look at these highlights.

TorchData

TorchData, now available in beta, is a library of common modular data loading primitives for easily constructing flexible and performant data pipelines. It is meant to address problems with the existing DataLoader, which bundles too many features together, is difficult to extend, and often forces users to rewrite the same data loading utilities over and over again. The goal of TorchData is to enable composable data loading through Iterable-style and Map-style building blocks called "DataPipes" that work well out of the box with PyTorch's DataLoader.

A DataPipe takes in some access function over Python data structures, __iter__ for IterDataPipe and __getitem__ for MapDataPipe, and returns a new access function with a slight transformation applied. You can chain multiple DataPipes together to form a data pipeline that performs all the necessary data transformation.
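To make the chaining idea concrete, here is a minimal pure-Python sketch. It is not the TorchData implementation: the classes IterPipe, Mapper, Filter, and Batcher are hypothetical stand-ins written for this illustration, and only the general pattern (each pipe wraps another pipe's __iter__ and applies one small transformation) is taken from the description above.

```python
from typing import Callable, Iterable, Iterator


class IterPipe:
    """Hypothetical stand-in for an iterable-style DataPipe (not the torchdata class)."""

    def __init__(self, source: Iterable):
        self.source = source

    def __iter__(self) -> Iterator:
        # The "access function": the base pipe passes items through unchanged.
        return iter(self.source)

    # Chaining helpers: each one returns a new pipe that wraps this one.
    def map(self, fn: Callable) -> "IterPipe":
        return Mapper(self, fn)

    def filter(self, pred: Callable) -> "IterPipe":
        return Filter(self, pred)

    def batch(self, size: int) -> "IterPipe":
        return Batcher(self, size)


class Mapper(IterPipe):
    def __init__(self, source: Iterable, fn: Callable):
        super().__init__(source)
        self.fn = fn

    def __iter__(self) -> Iterator:
        for item in self.source:
            yield self.fn(item)


class Filter(IterPipe):
    def __init__(self, source: Iterable, pred: Callable):
        super().__init__(source)
        self.pred = pred

    def __iter__(self) -> Iterator:
        for item in self.source:
            if self.pred(item):
                yield item


class Batcher(IterPipe):
    def __init__(self, source: Iterable, size: int):
        super().__init__(source)
        self.size = size

    def __iter__(self) -> Iterator:
        batch = []
        for item in self.source:
            batch.append(item)
            if len(batch) == self.size:
                yield batch
                batch = []
        if batch:  # emit the final, possibly smaller, batch
            yield batch


# Keep even numbers, square them, then group them in pairs.
pipeline = (
    IterPipe(range(10))
    .filter(lambda x: x % 2 == 0)
    .map(lambda x: x * x)
    .batch(2)
)
batches = list(pipeline)
```

Because every stage only depends on the __iter__ of the stage before it, stages can be reordered or swapped without touching the rest of the pipeline, which is the property the DataPipe design is aiming for.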

PyTorch has implemented over 50 DataPipes that provide different core functionalities, such as opening files, parsing texts, transforming samples, caching, shuffling, and batching. For users who are interested in connecting to cloud providers (such as Google Drive or AWS S3), the fsspec and iopath DataPipes will allow you to do so. The documentation provides detailed explanations and usage examples of each IterDataPipe and MapDataPipe.

What's the future for DataLoader?

A new version of DataLoader is planned for the next release. At a high level, DataLoader V2 will only be responsible for multiprocessing, distributed, and similar functionality, not data processing logic. All data processing features, such as shuffling and batching, will move out of DataLoader and into DataPipes. The current version of DataLoader should remain available, and you can use DataPipes with it as well.

To install TorchData via pip, please first install PyTorch 1.11 and then run the following command:

pip install torchdata

The documentation for TorchData is now live. It contains a tutorial that covers how to use DataPipes, how to use them with DataLoader, and how to implement custom ones. FAQs and future plans related to DataLoader are described in the project's README file.

functorch

functorch, now available in beta, is a library that adds composable function transforms to PyTorch. Heavily inspired by Google JAX, it aims to provide composable vmap (vectorization) and autodiff transforms that work with PyTorch modules and PyTorch autograd with good eager-mode performance.

Composable function transforms can help with a number of use cases that are tricky to do in PyTorch today:

computing per-sample gradients (or other per-sample quantities)
running ensembles of models on a single machine
efficiently batching together tasks in the inner-loop of MAML
efficiently computing Jacobians and Hessians as well as batched ones

Composing vmap (vectorization), vjp (reverse-mode AD), and jvp (forward-mode AD) transforms allows us to effortlessly express the above without designing a separate library for each.
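The composition idea can be sketched in plain Python. This is a conceptual toy, not functorch: the grad and vmap functions below are hypothetical re-implementations that work on Python floats and lists, and grad uses a central finite difference as a stand-in for real automatic differentiation. The only thing borrowed from functorch is the shape of the API, where in_dims marks which arguments are batched (0) and which are shared (None).

```python
from typing import Callable, List, Optional, Sequence


def grad(f: Callable[..., float], eps: float = 1e-6) -> Callable[..., float]:
    """Return a function that approximates df/d(first argument).

    Assumption: a central finite difference stands in for autodiff here.
    """
    def df(x: float, *rest: float) -> float:
        return (f(x + eps, *rest) - f(x - eps, *rest)) / (2 * eps)
    return df


def vmap(f: Callable[..., float], in_dims: Sequence[Optional[int]]) -> Callable[..., List[float]]:
    """Apply f once per sample; arguments with in_dims None are shared across samples."""
    def batched(*args):
        size = next(len(a) for a, d in zip(args, in_dims) if d is not None)
        results = []
        for i in range(size):
            call_args = [a[i] if d is not None else a for a, d in zip(args, in_dims)]
            results.append(f(*call_args))
        return results
    return batched


def loss(w: float, x: float, y: float) -> float:
    # Squared error of a one-parameter linear model for a single sample.
    return (w * x - y) ** 2


# Per-sample gradients: differentiate with respect to w, then map over the samples.
per_sample_grad = vmap(grad(loss), in_dims=(None, 0, 0))

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
grads = per_sample_grad(0.5, xs, ys)  # one gradient per (x, y) pair
```

The point of the sketch is that per-sample gradients fall out of composing two generic transforms, with no per-sample-gradient-specific code. functorch applies the same idea to tensors and PyTorch autograd.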

To install functorch via pip, please first install PyTorch 1.11 and then run the following command:

pip install functorch

Distributed Training

Distributed Data Parallel (DDP) static graph

DDP static graph assumes that your model uses the same set of used and unused parameters in every iteration. After the first iteration, DDP can therefore know deterministically which hooks will fire, how many times they will fire, and the order in which gradients become ready.

Static graph caches these states in the first iteration, which lets it support features that DDP could not support in previous releases, e.g., multiple activation checkpoints on the same parameters, regardless of whether there are unused parameters. The static graph feature also applies performance optimizations when there are unused parameters, e.g., it avoids traversing the graph to search for unused parameters in every iteration, and it enables dynamic bucketing order.
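The caching idea can be illustrated with a small toy model. This is not how DDP is implemented: StaticGraphTracker and its methods are hypothetical names invented for this sketch, and parameters are represented as plain strings. It only shows why assuming a fixed set of used parameters lets the expensive search run once instead of every iteration.

```python
from typing import FrozenSet, Iterable, List, Optional


class StaticGraphTracker:
    """Toy model: find unused parameters in iteration 1, then reuse the result."""

    def __init__(self, all_params: List[str]):
        self.all_params = set(all_params)
        self.cached_unused: Optional[FrozenSet[str]] = None
        self.searches = 0  # counts how often the "expensive" search ran

    def _search_unused(self, used_params: Iterable[str]) -> FrozenSet[str]:
        # Stand-in for traversing the autograd graph.
        self.searches += 1
        return frozenset(self.all_params - set(used_params))

    def step(self, used_params: Iterable[str]) -> FrozenSet[str]:
        # Static graph assumption: the used/unused split never changes,
        # so the result of the first search is valid for every later step.
        if self.cached_unused is None:
            self.cached_unused = self._search_unused(used_params)
        return self.cached_unused


tracker = StaticGraphTracker(["encoder.w", "head_a.w", "head_b.w"])
for _ in range(5):
    unused = tracker.step(["encoder.w", "head_a.w"])
```

After five iterations the search has run only once. The flip side is the same as in DDP: if the set of used parameters did change between iterations, the cached answer would be wrong, which is why static graph is only appropriate when that assumption holds.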

These optimizations in the DDP static graph brought a 10% QPS gain for some recommendation models.

To enable static graph, set static_graph=True in the DDP API like this:

from torch.nn.parallel import DistributedDataParallel
ddp_model = DistributedDataParallel(model, static_graph=True)

Comparison between DataParallel and DistributedDataParallel

Despite the added complexity, there are several reasons to consider using DistributedDataParallel over DataParallel:

First, DataParallel is single-process, multi-thread, and only works on a single machine, while DistributedDataParallel is multi-process and works for both single- and multi-machine training. DataParallel is usually slower than DistributedDataParallel even on a single machine due to GIL contention across threads, the per-iteration replicated model, and the additional overhead introduced by scattering inputs and gathering outputs.
If your model is too large to fit on a single GPU, you must use model parallel to split it across multiple GPUs. DistributedDataParallel works with model parallel; DataParallel does not at this time. When DDP is combined with model parallel, each DDP process would use model parallel, and all processes collectively would use data parallel.
If your model needs to span multiple machines or if your use case does not fit into the data parallelism paradigm, please see the RPC API for more generic distributed training support.
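To picture how DDP and model parallel combine, the sketch below computes which GPUs each DDP process would own on one machine. The helpers devices_for_rank and layout are hypothetical names written for this illustration, and the contiguous assignment (rank r gets the next block of GPUs) is one reasonable convention, not something DDP enforces.

```python
from typing import Dict, List


def devices_for_rank(rank: int, gpus_per_replica: int) -> List[int]:
    """GPU indices used by the model-parallel replica owned by one DDP process."""
    start = rank * gpus_per_replica
    return list(range(start, start + gpus_per_replica))


def layout(total_gpus: int, gpus_per_replica: int) -> Dict[int, List[int]]:
    """Map each DDP rank to the GPUs its replica is split across."""
    if total_gpus % gpus_per_replica != 0:
        raise ValueError("total_gpus must be a multiple of gpus_per_replica")
    world_size = total_gpus // gpus_per_replica
    return {rank: devices_for_rank(rank, gpus_per_replica) for rank in range(world_size)}


# 8 GPUs, each model replica split across 2 GPUs: 4 DDP processes.
plan = layout(total_gpus=8, gpus_per_replica=2)
```

In this layout, each process places the layers of its own replica on its two GPUs (model parallel), while gradients are synchronized across the four processes (data parallel).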

Other Things to Note in PyTorch 1.11

PyTorch also introduced the beta release of TorchRec and a number of improvements to the current PyTorch domain libraries, alongside the PyTorch 1.11 release. These updates focus on developing common and extensible APIs across all domains to make it easier to build ecosystem projects on PyTorch. Highlights include:

TorchRec, a PyTorch domain library for Recommendation Systems, is available in beta. View it on GitHub.
TorchAudio - Added Emformer- and RNN-T-based models and recipes to support the full development lifecycle of a streaming ASR model. See the release notes here.
TorchText - Added beta support for RoBERTa and XLM-R models, byte-level BPE tokenizer, and text datasets backed by TorchData. See the release notes here.
TorchVision - Added 4 new model families and 14 new classification datasets, such as CLEVR, GTSRB, and FER2013. See the release notes here.

As we mentioned earlier, you can view the full release notes here.

Have any questions? Feel free to contact us about using PyTorch with our Deep Learning, Machine Learning and AI systems.
