This document is relevant for: Inf1

PyTorch Neuron (torch-neuron) release notes#

This document lists the release notes for the PyTorch Neuron package.

Known Issues and Limitations - Updated 03/21/2023#

Min & Max Accuracy#

The index outputs of the aten::argmin, aten::argmax, aten::min, and aten::max operator implementations are sensitive to precision. For models that contain these operators and have float32 inputs, we recommend using the --fp32-cast=matmult --fast-math no-fast-relayout compiler option to avoid numerical imprecision issues. Additionally, the aten::min and aten::max operator implementations do not currently support int64 inputs when dim=0. For more information on precision and performance-accuracy tuning, see Mixed precision and performance-accuracy tuning (neuron-cc).
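As a rough illustration of why index outputs are sensitive to precision, the sketch below rounds two nearly equal values to IEEE half precision and shows the arg-max index changing. Half precision is used here only as a stand-in for a reduced-precision cast; the format the compiler actually selects is not stated in these notes. The sketch also splits the recommended option string into a list. Passing such a list as compiler_args to the trace call is assumed from the keyword usage described under "Multiple NeuronCore support" in this document.

```python
import shlex
import struct


def to_half(x):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("e", struct.pack("e", x))[0]


def argmax(values):
    """Index of the maximum; in this sketch, ties resolve to the first index."""
    return max(range(len(values)), key=lambda i: values[i])


scores = [1000.1, 1000.2]
reduced = [to_half(v) for v in scores]

full_index = argmax(scores)      # the second element is larger at full precision
reduced_index = argmax(reduced)  # both values round to the same number, so the tie picks the first

# The option string recommended above, split into the list form used by compiler_args.
compiler_args = shlex.split("--fp32-cast=matmult --fast-math no-fast-relayout")
```

The values themselves differ by a tiny amount after rounding, but the returned index changes completely, which is why these operators are called out separately.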

Python 3.5#

If you attempt to import torch.neuron from Python 3.5, you will see the error below. Please use Python 3.6 or greater:

File "/tmp/install_test_env/lib/python3.5/site-packages/torch_neuron/", line 29
   f'Invalid dependency version torch=={torch.__version__}. '
SyntaxError: invalid syntax
  • Torchvision has dropped support for Python 3.5

  • HuggingFace transformers has dropped support for Python 3.5
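The SyntaxError above comes from an f-string, which Python 3.5 cannot parse. A script that wants to fail with a clearer message can check the interpreter version first. The following is a minimal sketch; the function names are hypothetical and are not part of torch-neuron.

```python
import sys

MINIMUM_PYTHON = (3, 6)


def python_is_supported(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True when the interpreter is at least `minimum` (major, minor)."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


def require_supported_python():
    """Raise a readable error on interpreters older than MINIMUM_PYTHON."""
    # str.format is used instead of an f-string so this file still parses on Python 3.5.
    if not python_is_supported():
        raise RuntimeError(
            "Python {}.{} is not supported; use Python {}.{} or greater".format(
                sys.version_info[0],
                sys.version_info[1],
                MINIMUM_PYTHON[0],
                MINIMUM_PYTHON[1],
            )
        )


require_supported_python()
```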


Mismatched versions of torchvision and torch can cause exceptions when compiling torchvision-based models. Specific versions of torchvision are built against each release of torch. For example:

  • torch==1.5.1 matches torchvision==0.6.1

  • torch==1.7.1 matches torchvision==0.8.2

  • etc.

Simultaneously installing both torch-neuron and torchvision is the recommended method of correctly resolving versions.
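A quick way to spot a mismatch before compiling is to compare the installed versions against a table of known pairs. The sketch below is an assumption-laden helper, not part of torch-neuron: the function name is hypothetical, and the table contains only the two pairs listed above.

```python
# Pairs taken from the examples above; other releases are intentionally not listed.
KNOWN_PAIRS = {
    "1.5.1": "0.6.1",
    "1.7.1": "0.8.2",
}


def check_pair(torch_version, torchvision_version, known_pairs=KNOWN_PAIRS):
    """Return 'match', 'mismatch', or 'unknown' for a torch/torchvision pair."""
    # Drop local build suffixes such as '+cpu' before comparing.
    torch_base = torch_version.split("+")[0]
    vision_base = torchvision_version.split("+")[0]
    expected = known_pairs.get(torch_base)
    if expected is None:
        return "unknown"
    return "match" if expected == vision_base else "mismatch"


status = check_pair("1.7.1", "0.6.1")
```

In a real environment the arguments would typically be torch.__version__ and torchvision.__version__.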

Dynamic Batching#

Dynamic batching does not work properly for some models that use the aten::size operator. When this issue occurs, the input batch sizes are not properly recorded at inference time, resulting in an error such as:

RuntimeError: The size of tensor a (X) must match the size of tensor b (Y) at non-singleton dimension 0.

This error typically occurs when aten::size operators are partitioned to CPU. We are investigating a fix for this issue.
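Conceptually, dynamic batching lets a model traced at one batch size accept a different batch size by processing the input in traced-size pieces. The plain-Python sketch below illustrates that idea only. It is not the torch-neuron implementation, and every name in it is hypothetical.

```python
def split_batch(batch, traced_batch_size):
    """Split a list of samples into consecutive chunks of the traced batch size."""
    return [
        batch[start:start + traced_batch_size]
        for start in range(0, len(batch), traced_batch_size)
    ]


def run_dynamic(model_fn, batch, traced_batch_size):
    """Apply model_fn chunk by chunk and concatenate the per-chunk outputs."""
    outputs = []
    for chunk in split_batch(batch, traced_batch_size):
        outputs.extend(model_fn(chunk))
    return outputs


def doubled(chunk):
    """Stand-in for a compiled model: doubles every sample."""
    return [2 * sample for sample in chunk]


result = run_dynamic(doubled, [1, 2, 3, 4, 5, 6], traced_batch_size=2)
```

Any operation that depends on the batch dimension has to see the size of the chunk being processed, which is where a size that is not recorded correctly leads to a mismatch like the one above.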

PyTorch Neuron release [package ver. 1.*.*., SDK ver. 2.13.0]#

Date: 08/28/2023

  • Added support for clamp_min/clamp_max ATEN operators.

PyTorch Neuron release []#

Date: 06/14/2023

New in this release#

  • Added support for Python 3.10

Bug fixes#

  • torch.pow now correctly handles a mismatch between the base and exponent data types

PyTorch Neuron release []#

Date: 05/01/2023

  • Minor updates.

PyTorch Neuron release []#

Date: 03/28/2023

New in this release#

  • Added support for torch==1.13.1

  • New releases of torch-neuron no longer include versions for torch==1.7 and torch==1.8

  • Added support for Neuron runtime 2.12

  • Added support for new operators:

    • aten::tensordot

    • aten::adaptive_avg_pool1d

    • aten::prelu

    • aten::reflection_pad2d

    • aten::baddbmm

    • aten::repeat

  • Added a separate_weights flag to torch_neuron.trace() to support models that are larger than 2GB

Bug fixes#

PyTorch Neuron release []#

Date: 11/23/2022

New in this release#

  • Added PyTorch 1.12 support

  • Added Python 3.8 support

  • Added new operators support. See PyTorch Neuron (torch-neuron) Supported operators

  • Added support for aten::lstm. See: Developer Guide - PyTorch Neuron (torch-neuron) LSTM Support

  • Improved logging:

    • Improved error messages for specific compilation failure modes, including out-of-memory errors

    • Added a warning to show the code location of prim::PythonOp operations

    • Removed overly-verbose tracing messages

    • Added improved error messages for neuron-cc and tensorflow dependency issues

    • Added more debug information when an invalid dynamic batching configuration is used

  • Added new beta explicit NeuronCore placement API. See: torch_neuron_core_placement_api

  • Added new guide for NeuronCore placement. See: PyTorch Neuron (torch-neuron) Core Placement

  • Improved torch_neuron.trace() performance when using large graphs

  • Reduced host memory usage of loaded models in

  • Added single_fusion_ratio_threshold argument to torch_neuron.trace() to give more fine-grained control of partitioned graphs

Bug fixes#

  • Improved handling of tensor mutations which previously caused accuracy issues on certain models (e.g. yolor, yolov5)

  • Fixed an issue where inf and -inf values would cause unexpected NaN values. This could occur with newer versions of transformers

  • Fixed an issue where torch.neuron.DataParallel() would not fully utilize all NeuronCores for specific batch sizes

  • Fixed and improved operators:

    • aten::upsample_bilinear2d: Improved error messages in cases where the operation cannot be supported

    • aten::_convolution: Added support for output_padding argument

    • aten::div: Added support for rounding_mode argument

    • aten::sum: Fixed to handle non-numeric data types

    • aten::expand: Fixed to handle scalar tensors

    • aten::permute: Fixed to handle negative indices

    • aten::min: Fixed to support more input types

    • aten::max: Fixed to support more input types

    • aten::max_pool2d: Fixed to support both 3-dimensional and 4-dimensional input tensors

    • aten::Int: Fixed an issue where long values would incorrectly lose precision

    • aten::constant_pad_nd: Fixed to correctly use non-0 padding values

    • aten::pow: Fixed to support more input types & values

    • aten::avg_pool2d: Added support for count_include_pad argument. Added support for ceil_mode argument if padding isn’t specified

    • aten::zero: Fixed to handle scalars correctly

    • prim::Constant: Fixed an issue where -inf was incorrectly handled

    • Improved handling of scalars in arithmetic operators
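For readers unfamiliar with the rounding_mode argument now supported for aten::div: in PyTorch's torch.div it accepts None (true division), "trunc" (round toward zero), and "floor" (round toward negative infinity). The following plain-Python sketch shows the three behaviors on scalars. It illustrates the semantics only and is not the Neuron implementation.

```python
import math


def div(a, b, rounding_mode=None):
    """Scalar sketch of the three rounding modes of division."""
    quotient = a / b
    if rounding_mode is None:
        return quotient
    if rounding_mode == "trunc":
        return float(math.trunc(quotient))  # round toward zero
    if rounding_mode == "floor":
        return float(math.floor(quotient))  # round toward negative infinity
    raise ValueError("rounding_mode must be None, 'trunc', or 'floor'")


true_result = div(-7, 2)
trunc_result = div(-7, 2, rounding_mode="trunc")
floor_result = div(-7, 2, rounding_mode="floor")
```

The modes only differ for negative quotients with a fractional part, which is the case worth testing when validating a compiled model.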

PyTorch Neuron release []#

Date: 04/29/2022

New in this release#

  • Added support for PyTorch 1.11.

  • Updated PyTorch 1.10 to version 1.10.2.

  • End of support for torch-neuron 1.5, see End of support for torch-neuron version 1.5.

  • Added support for new operators:

    • aten::masked_fill_

    • aten::new_zeros

    • aten::frobenius_norm

Bug fixes#

  • Improved aten::gelu accuracy

  • Updated aten::meshgrid to support optional indexing argument introduced in torch 1.10, see PyTorch issue 50276

PyTorch Neuron release []#

Date: 03/25/2022

New in this release#

  • Added full support for aten::max_pool2d_with_indices (previously supported only when indices were unused).

  • Added new torch-neuron packages compiled with -D_GLIBCXX_USE_CXX11_ABI=1, the new packages support PyTorch 1.8, PyTorch 1.9, and PyTorch 1.10. To install the additional packages compiled with -D_GLIBCXX_USE_CXX11_ABI=1 please change the package repo index to (

PyTorch Neuron release []#

Date: 01/20/2022

New in this release#

  • Added PyTorch 1.10 support

  • Added new operators support, see PyTorch Neuron (torch-neuron) Supported operators

  • Updated aten::_convolution to support 2d group convolution

  • Updated neuron::forward operators to allocate less dynamic memory. This can increase performance on models with many input & output tensors.

  • Updated neuron::forward to better handle batch sizes when dynamic_batch_size=True. This can increase performance at inference time when the input batch size is exactly equal to the traced model batch size.

Bug fixes#

  • Added the ability to torch.jit.trace a torch.nn.Module where a submodule has already been traced with torch_neuron.trace() on a CPU-type instance. Previously, if this had been executed on a CPU-type instance, an initialization exception would have been thrown.

  • Fixed aten::matmul behavior on 1-dimensional by n-dimensional multiplies. Previously, this would cause a validation error.

  • Fixed binary operator type promotion. Previously, in unusual situations, operators like aten::mul could produce incorrect results due to invalid casting.

  • Fixed aten::select when index was -1. Previously, this would cause a validation error.

  • Fixed aten::adaptive_avg_pool2d padding and striding behavior. Previously, this could generate incorrect results with specific configurations.

  • Fixed an issue where dictionary inputs could be incorrectly traced when the tensor values had gradients.
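Several fixes in these notes involve negative indices (aten::select with index -1 in this release, and aten::permute and aten::expand in other releases). The usual convention maps a negative index to index + size and rejects anything out of range. A minimal sketch with a hypothetical helper name:

```python
def normalize_index(index, size):
    """Map an index in [-size, size) to its non-negative equivalent."""
    if not -size <= index < size:
        raise IndexError(
            "index {} is out of range for dimension of size {}".format(index, size)
        )
    return index + size if index < 0 else index


last = normalize_index(-1, 4)
```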

PyTorch Neuron release [2.0.536.0]#

Date: 01/05/2022

New in this release#

PyTorch Neuron release [2.0.468.0]#

Date: 12/15/2021

New in this release#

  • Added support for aten::cumsum operation.

  • Fixed aten::expand to correctly handle adding new dimensions.

PyTorch Neuron release [2.0.392.0]#

Date: 11/05/2021

  • Updated Neuron Runtime (which is integrated within this package) to libnrt to fix a container issue that was preventing the use of containers when /dev/neuron0 was not present. For details, see neuron-runtime-release-notes.

PyTorch Neuron release [2.0.318.0]#

Date: 10/27/2021

New in this release#

Resolved Issues#

  • Fixed a performance issue when using both the dynamic_batch_size=True trace option and --neuron-core-pipeline compiler option. Dynamic batching now uses OpenMP to execute pipeline batches concurrently.

  • Fixed torch_neuron.trace issues:

    • Fixed a failure when the same submodule was traced with multiple inputs

    • Fixed a failure where some operations would fail to be called with the correct arguments

    • Fixed a failure where custom operators (torch plugins) would cause a trace failure

  • Fixed variants of aten::upsample_bilinear2d when scale_factor=1

  • Fixed variants of aten::expand using dim=-1

  • Fixed variants of aten::stack using multiple different input data types

  • Fixed variants of aten::max using indices outputs


Date: 08/12/2021


  • Minor updates.


Date: 07/02/2021


  • Added support for dictionary outputs using strict=False flag. See /neuron-guide/neuron-frameworks/pytorch-neuron/troubleshooting-guide.rst.

  • Updated aten::batch_norm to correctly implement the affine flag.

  • Added support for aten::erf and prim::DictConstruct. See PyTorch Neuron (torch-neuron) Supported operators.

  • Added dynamic batch support. See /neuron-guide/neuron-frameworks/pytorch-neuron/api-compilation-python-api.rst.


Date: 5/28/2021


  • Added support for PyTorch 1.8.1

    • Models compatibility

      • Models compiled with previous versions of PyTorch Neuron (<1.8.1) are compatible with PyTorch Neuron 1.8.1.

      • Models compiled with PyTorch Neuron 1.8.1 are not backward compatible with previous versions of PyTorch Neuron (<1.8.1).

    • Updated tutorials to use Hugging Face Transformers 4.6.0.

    • Added a new set of forward operators (forward_v2)

    • Host memory allocation when loading the same model on multiple NeuronCores is significantly reduced

    • Fixed an issue where models would not deallocate all memory within a python session after being garbage collected.

    • Fixed a TorchScript/C++ issue where loading the same model multiple times would not use multiple NeuronCores by default.

  • Fixed logging to no longer configure the root logger.

  • Informative messages that were previously emitted as warnings during compilation have been removed, significantly reducing the number of warnings.

  • Convolution operator support has been extended to include ConvTranspose2d variants.

  • Reduced host memory usage during inference.
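Leaving the root logger alone is the usual convention for Python libraries, because configuring it changes logging behavior for the whole application. A minimal sketch of that convention follows. The logger name "my_library" is hypothetical and is not the logger torch-neuron actually uses.

```python
import logging

# Remember the root logger's handlers so the effect of the library code is visible.
root_handlers_before = list(logging.getLogger().handlers)

# A library logs through its own named logger and attaches only a NullHandler,
# leaving handler and level configuration to the application.
logger = logging.getLogger("my_library")
logger.addHandler(logging.NullHandler())


def do_work():
    """Emit a record without touching the root logger's configuration."""
    logger.warning("compilation produced a warning")
    return "done"


outcome = do_work()
root_handlers_after = list(logging.getLogger().handlers)
```

An application that wants to see these messages can still attach its own handler to "my_library" or to the root logger.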


Date: 4/30/2021



Date: 3/4/2021


  • Minor enhancements.


Date: 2/24/2021


  • Fix for CVE-2021-3177.


Date: 1/30/2021



Date: 12/23/2020


  • We are dropping support for Python 3.5 in this release

  • torch.neuron.trace will now throw a RuntimeError if no operators are compiled for Neuron hardware

  • torch.neuron.trace will now display compilation progress indicators (dots) by default (neuron-cc must be updated to the December release or greater to see this feature)

  • Added new operator support. Please see PyTorch Neuron (torch-neuron) Supported operators for the complete list of operators.

  • Extended the BERT pretrained tutorial to demonstrate execution on multiple cores and batch modification, and updated the tutorial to accommodate changes in the Hugging Face Transformers code for version 4.0

  • Added a tutorial for torch-serve which extends the BERT tutorial

  • Added support for PyTorch 1.7
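Because the trace call now raises a RuntimeError when nothing is compiled for Neuron hardware, callers may want to catch that case explicitly. In the sketch below, trace_fn is a hypothetical stand-in for the trace function so that the example runs without Neuron hardware, and falling back to the original model is an assumed policy rather than a recommendation from these notes.

```python
def trace_or_fallback(trace_fn, model, example_inputs):
    """Return (model_to_use, compiled) where compiled is False after a fallback."""
    try:
        return trace_fn(model, example_inputs), True
    except RuntimeError as error:
        print("Compilation skipped, using the original model: {}".format(error))
        return model, False


def failing_trace(model, example_inputs):
    """Stand-in that mimics a trace in which no operators were compiled."""
    raise RuntimeError("no operators were compiled")


def passing_trace(model, example_inputs):
    """Stand-in that mimics a successful trace."""
    return ("compiled", model)


fallback_result = trace_or_fallback(failing_trace, "model", None)
compiled_result = trace_or_fallback(passing_trace, "model", None)
```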


Date: 11/17/2020


  • Fixed bugs in comparison operators, and added remaining variants (eq, ne, gt, ge, lt, le)

  • Added support for prim::PythonOp - note that this must be run on CPU and not Neuron. We recommend you replace this code with PyTorch operators if possible

  • Support for a series of new operators. Please see PyTorch Neuron (torch-neuron) Supported operators for the complete list of operators.

  • Performance improvements to the runtime library

  • Correction of a runtime library bug which caused models with large tensors to generate incorrect results in some cases


Date: 09/22/2020


  • Various minor improvements to the PyTorch autopartitioner feature

  • Support for the operators aten::constant_pad_nd, aten::meshgrid

  • Improved performance on various torchvision models. Of note are resnet50 and vgg16


Date: 08/08/2020


  • Various minor improvements to the PyTorch autopartitioner feature

  • Support for the aten::ones operator


Date: 08/05/2020


Various minor improvements.


Date: 07/16/2020


This release adds auto-partitioning, model analysis, and PyTorch 1.5.1 support, along with a number of new operators.

Major New Features#

  • Support for PyTorch 1.5.1

  • Introduced an automated operator device placement mechanism in torch.neuron.trace. Sub-graphs containing operators that the Neuron compiler does not support now run in native PyTorch. This mechanism is on by default and can be turned off by adding the argument fallback=False to the compiler arguments.

  • Model analysis to find supported and unsupported operators in a model
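The idea behind automated placement and model analysis can be sketched as sorting operators by whether the compiler supports them and grouping consecutive operators that land on the same device. This is a conceptual illustration only. The supported set below is made up for the example, the function names are hypothetical, and the real partitioner works on a graph rather than a flat list.

```python
from itertools import groupby


def analyze(operators, supported):
    """Return the sorted lists of supported and unsupported operator names."""
    present = set(operators)
    return sorted(present & supported), sorted(present - supported)


def partition(operators, supported):
    """Group consecutive operators into ('neuron' | 'cpu', [ops]) segments."""
    segments = []
    for on_neuron, run in groupby(operators, key=lambda op: op in supported):
        segments.append(("neuron" if on_neuron else "cpu", list(run)))
    return segments


# Hypothetical supported set and operator sequence, for illustration only.
supported_ops = {"aten::_convolution", "aten::relu", "aten::add"}
model_ops = ["aten::_convolution", "aten::relu", "aten::nonzero", "aten::add"]

supported_found, unsupported_found = analyze(model_ops, supported_ops)
segments = partition(model_ops, supported_ops)
```

A single unsupported operator in the middle of a model splits it into three segments here, which is why replacing such operators can noticeably change how much of a model runs on Neuron.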

Resolved Issues#


Date: 6/11/2020


Major New Features#

Resolved Issues#

Known Issues and Limitations#


Date: 5/11/2020


Additional PyTorch operator support and improved support for model saving and reloading.

Major New Features#

  • Added Neuron Compiler support for a number of previously unsupported PyTorch operators. Please see neuron-cc-ops-pytorch for the complete list of operators.

  • Add support for torch.neuron.trace on models which have previously been saved using and then reloaded.

Resolved Issues#

Known Issues and Limitations#


Date: 3/26/2020


Major New Features#

Resolved Issues#

Known Issues and limitations#


Date: 2/27/2020


Added Neuron Compiler support for a number of previously unsupported PyTorch operators. Please see PyTorch Neuron (torch-neuron) Supported operators for the complete list of operators.

Major new features#

  • None

Resolved issues#

  • None


Date: 1/27/2020


Major new features#

Resolved issues#

  • Python 3.5 and Python 3.7 are now supported.

Known issues and limitations#

Other Notes#


Date: 12/20/2019


This is the initial release of torch-neuron. It is not distributed on the DLAMI yet and needs to be installed from the neuron pip repository.

Note that we are currently using TensorFlow as an intermediate format to pass to our compiler. This does not affect any runtime execution from PyTorch to Neuron Runtime and Inferentia. This is why the neuron-cc installation must include [tensorflow] for PyTorch.

Major new features#

Resolved issues#

Known issues and limitations#

Models TESTED#

The following models have successfully run on neuron-inferentia systems:

  1. SqueezeNet

  2. ResNet50

  3. Wide ResNet50

PyTorch Serving#

In this initial version there is no specific serving support. Inference works correctly through Python on Inf1 instances using the Neuron runtime. Future releases will include support for production deployment and serving of models.

Profiler support#

Profiler support is not provided in this initial release and will be available in future releases.

Automated partitioning#

Automatic partitioning of graphs into supported and non-supported operations is not currently supported. A tutorial is available to provide guidance on how to manually partition a model graph. Please see pytorch-manual-partitioning-jn-tutorial.

PyTorch dependency#

Currently, PyTorch support depends on a Neuron-specific version of PyTorch v1.3.1. Future revisions will add support for 1.4 and later releases.

Trace behavior#

A model must be in evaluation mode in order to be traced. For examples, please see ResNet50 model for Inferentia.

Six pip package is required#

The Six package is required for the torch-neuron runtime, but it is not modeled in the package dependencies. This will be fixed in a future release.

Multiple NeuronCore support#

If the num-neuroncores option is used, the number of cores must also be set manually in an environment variable of the calling shell, for both compilation and inference.

For example, using the keyword argument compiler_args=['--num-neuroncores', '4'] in the trace call requires NEURONCORE_GROUP_SIZES=4 to be set in the environment at compile time and at runtime.
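One way to keep the two settings from drifting apart is to derive both from a single number. The helper below is a sketch with a hypothetical name. It writes to a dictionary passed in by the caller, or to os.environ by default. Setting the variable from Python rather than in the calling shell is an assumption; these notes only state that it must be set in the shell environment.

```python
import os


def neuroncore_settings(num_cores, environ=None):
    """Set NEURONCORE_GROUP_SIZES and return matching compiler arguments."""
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    if environ is None:
        environ = os.environ
    # The environment value and the compiler option are built from the same number.
    environ["NEURONCORE_GROUP_SIZES"] = str(num_cores)
    return ["--num-neuroncores", str(num_cores)]


example_env = {}
compiler_args = neuroncore_settings(4, environ=example_env)
```

The returned list is in the form expected by the compiler_args keyword argument of the trace call.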

CPU execution#

At compilation time, a constant output is generated for the purposes of tracing. Running inference on a non-Neuron instance will therefore generate incorrect results and must not be used. The following message is written to stderr:

Warning: Tensor output are ** NOT CALCULATED ** during CPU execution and only
indicate tensor shape

Other notes#

  • Python version(s) supported:

    • 3.6

  • Linux distribution supported:

    • DLAMI Ubuntu 18 and Amazon Linux 2 (using Python 3.6 Conda environments)

    • Other AMIs based on Ubuntu 18

    • For Amazon Linux 2 please install Conda and use Python 3.6 Conda environment
