This document is relevant for: Inf1
PyTorch Neuron (torch-neuron) release notes#
This document lists the release notes for the PyTorch Neuron (torch-neuron) package.
Known Issues and Limitations - Updated 03/21/2023#
Min & Max Accuracy#
The index outputs of the aten::argmin, aten::argmax, aten::min, and aten::max operator implementations are sensitive to precision. For models that contain these operators and have float32 inputs, we recommend using the --fp32-cast=matmult --fast-math no-fast-relayout compiler option to avoid numerical imprecision issues. Additionally, the aten::min and aten::max operator implementations do not currently support int64 inputs when dim=0. For more information on precision and performance-accuracy tuning, see Mixed precision and performance-accuracy tuning (neuron-cc).
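These flags can be forwarded to the compiler through the compiler_args argument of torch_neuron.trace(). A minimal sketch (the model and shapes here are hypothetical, and running it requires the torch-neuron package on an Inf1 instance):

```python
import torch
import torch_neuron  # requires the torch-neuron package


# Hypothetical float32 model that uses aten::argmax internally
class ArgMaxModel(torch.nn.Module):
    def forward(self, x):
        return torch.argmax(x, dim=1)


model = ArgMaxModel().eval()
example = torch.rand(1, 1000)  # float32 input

# Forward the recommended precision flags to neuron-cc
model_neuron = torch_neuron.trace(
    model,
    example,
    compiler_args=['--fp32-cast=matmult', '--fast-math', 'no-fast-relayout'],
)
```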
Python 3.5#
If you attempt to import torch.neuron from Python 3.5, you will see the following error in 1.1.7.0. Please use Python 3.6 or later:
File "/tmp/install_test_env/lib/python3.5/site-packages/torch_neuron/__init__.py", line 29
f'Invalid dependency version torch=={torch.__version__}. '
^
SyntaxError: invalid syntax
Torchvision has dropped support for Python 3.5
HuggingFace transformers has dropped support for Python 3.5
Torchvision#
When versions of torchvision and torch are mismatched, this can result in exceptions when compiling torchvision-based models. Specific versions of torchvision are built against each release of torch. For example:
torch==1.5.1 matches torchvision==0.6.1
torch==1.7.1 matches torchvision==0.8.2
etc.
Simultaneously installing both torch-neuron and torchvision is the recommended method of correctly resolving versions.
Dynamic Batching#
Dynamic batching does not work properly for some models that use the aten::size operator. When this issue occurs, the input batch sizes are not properly recorded at inference time, resulting in an error such as:
RuntimeError: The size of tensor a (X) must match the size of tensor b (Y) at non-singleton dimension 0.
This error typically occurs when aten::size operators are partitioned to CPU. We are investigating a fix for this issue.
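For context, dynamic batching is enabled at trace time. A minimal sketch of the intended usage (hypothetical model; requires the torch-neuron package on an Inf1 instance), tracing with batch size 1 and inferring with a larger batch:

```python
import torch
import torch_neuron  # requires the torch-neuron package

model = torch.nn.Sequential(torch.nn.Linear(32, 8)).eval()  # hypothetical model
example = torch.rand(1, 32)  # trace with batch size 1

# Enable dynamic batching; inputs are split into traced-size chunks at runtime
model_neuron = torch_neuron.trace(model, example, dynamic_batch_size=True)

# Works when batch sizes are recorded correctly; models affected by the
# aten::size issue above raise the RuntimeError instead
output = model_neuron(torch.rand(8, 32))
```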
PyTorch Neuron release [package ver. 1.*.*.2.11.6.0, SDK ver. 2.20.0]#
Date: 09/16/2024
Minor updates.
PyTorch Neuron release [package ver. 1.*.*.2.10.12.0, SDK ver. 2.19.0]#
Date: 07/03/2024
Minor updates.
PyTorch Neuron release [package ver. 1.*.*.2.9.74.0, SDK ver. 2.18.0]#
Date: 04/01/2024
Minor updates.
PyTorch Neuron release [package ver. 1.*.*.2.9.17.0, SDK ver. 2.16.0]#
Date: 12/21/2023
Minor updates.
PyTorch Neuron release [package ver. 1.*.*.2.9.6.0, SDK ver. 2.15.0]#
Date: 10/26/2023
Minor updates.
PyTorch Neuron release [package ver. 1.*.*.2.9.1.0, SDK ver. 2.13.0]#
Date: 08/28/2023
Added support for clamp_min/clamp_max ATEN operators.
PyTorch Neuron release [package ver. 1.*.*.2.8.9.0, SDK ver. 2.12.0]#
Date: 07/19/2023
Minor updates.
PyTorch Neuron release [2.7.10.0]#
Date: 06/14/2023
New in this release#
Added support for Python 3.10
Bug fixes#
torch.pow operation now correctly handles a mismatch between base and exponent data types
PyTorch Neuron release [2.7.1.0]#
Date: 05/1/2023
Minor updates.
PyTorch Neuron release [2.6.5.0]#
Date: 03/28/2023
New in this release#
Added support for torch==1.13.1
New releases of torch-neuron no longer include versions for torch==1.7 and torch==1.8
Added support for Neuron runtime 2.12
Added support for new operators:
aten::tensordot
aten::adaptive_avg_pool1d
aten::prelu
aten::reflection_pad2d
aten::baddbmm
aten::repeat
Added a separate_weights flag to torch_neuron.trace() to support models that are larger than 2GB
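A sketch of the new separate_weights flag (the model here is a small stand-in; running this requires torch-neuron 2.6.5.0 or later on an Inf1 instance):

```python
import torch
import torch_neuron  # requires torch-neuron >= 2.6.5.0

# Stand-in for a model whose serialized graph would exceed the 2GB protobuf limit
big_model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).eval()
example = torch.rand(1, 1024)

# separate_weights stores weights outside the graph protobuf,
# allowing models larger than 2GB to be traced
model_neuron = torch_neuron.trace(big_model, example, separate_weights=True)
```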
Bug fixes#
Fixed aten::_convolution with grouping for: torch.nn.Conv1d, torch.nn.Conv3d, torch.nn.ConvTranspose2d
Fixed aten::linear to support 1d input tensors
Fixed an issue where an input could not be directly returned from the network
PyTorch Neuron release [2.5.0.0]#
Date: 11/23/2022
New in this release#
Added PyTorch 1.12 support
Added Python 3.8 support
Added new operators support. See PyTorch Neuron (torch-neuron) Supported operators
Added support for aten::lstm. See: Developer Guide - PyTorch Neuron (torch-neuron) LSTM Support
Improved logging:
Improved error messages for specific compilation failure modes, including out-of-memory errors
Added a warning to show the code location of prim::PythonOp operations
Removed overly-verbose tracing messages
Added improved error messages for neuron-cc and tensorflow dependency issues
Added more debug information when an invalid dynamic batching configuration is used
Added new beta explicit NeuronCore placement API. See: torch_neuron_core_placement_api
Added new guide for NeuronCore placement. See: PyTorch Neuron (torch-neuron) Core Placement
Improved torch_neuron.trace() performance when using large graphs
Reduced host memory usage of loaded models in libtorchneuron.so
Added single_fusion_ratio_threshold argument to torch_neuron.trace() to give more fine-grained control of partitioned graphs
Bug fixes#
Improved handling of tensor mutations which previously caused accuracy issues on certain models (e.g. yolor, yolov5)
Fixed an issue where inf and -inf values would cause unexpected NaN values. This could occur with newer versions of transformers
Fixed an issue where torch.neuron.DataParallel() would not fully utilize all NeuronCores for specific batch sizes
Fixed and improved operators:
aten::upsample_bilinear2d: Improved error messages in cases where the operation cannot be supported
aten::_convolution: Added support for output_padding argument
aten::div: Added support for rounding_mode argument
aten::sum: Fixed to handle non-numeric data types
aten::expand: Fixed to handle scalar tensors
aten::permute: Fixed to handle negative indices
aten::min: Fixed to support more input types
aten::max: Fixed to support more input types
aten::max_pool2d: Fixed to support both 3-dimensional and 4-dimensional input tensors
aten::Int: Fixed an issue where long values would incorrectly lose precision
aten::constant_pad_nd: Fixed to correctly use non-0 padding values
aten::pow: Fixed to support more input types & values
aten::avg_pool2d: Added support for count_include_pad argument. Added support for ceil_mode argument if padding isn't specified
aten::zero: Fixed to handle scalars correctly
prim::Constant: Fixed an issue where -inf was incorrectly handled
Improved handling of scalars in arithmetic operators
PyTorch Neuron release [2.3.0.0]#
Date: 04/29/2022
New in this release#
Added support for PyTorch 1.11.
Updated PyTorch 1.10 to version 1.10.2.
End of support for torch-neuron 1.5, see End of support for torch-neuron version 1.5.
Added support for new operators:
aten::masked_fill_
aten::new_zeros
aten::frobenius_norm
Bug fixes#
Improved aten::gelu accuracy
Updated aten::meshgrid to support the optional indexing argument introduced in torch 1.10, see PyTorch issue 50276
PyTorch Neuron release [2.2.0.0]#
Date: 03/25/2022
New in this release#
Added full support for aten::max_pool2d_with_indices (was previously supported only when indices were unused)
Added new torch-neuron packages compiled with -D_GLIBCXX_USE_CXX11_ABI=1; the new packages support PyTorch 1.8, PyTorch 1.9, and PyTorch 1.10. To install the additional packages compiled with -D_GLIBCXX_USE_CXX11_ABI=1, please change the package repo index to https://pip.repos.neuron.amazonaws.com/cxx11/
PyTorch Neuron release [2.1.7.0]#
Date: 01/20/2022
New in this release#
Added PyTorch 1.10 support
Added new operators support, see PyTorch Neuron (torch-neuron) Supported operators
Updated aten::_convolution to support 2d group convolution
Updated neuron::forward operators to allocate less dynamic memory. This can increase performance on models with many input & output tensors.
Updated neuron::forward to better handle batch sizes when dynamic_batch_size=True. This can increase performance at inference time when the input batch size is exactly equal to the traced model batch size.
Bug fixes#
Added the ability to torch.jit.trace a torch.nn.Module where a submodule has already been traced with torch_neuron.trace() on a CPU-type instance. Previously, if this had been executed on a CPU-type instance, an initialization exception would have been thrown.
Fixed aten::matmul behavior on 1-dimensional by n-dimensional multiplies. Previously, this would cause a validation error.
Fixed binary operator type promotion. Previously, in unusual situations, operators like aten::mul could produce incorrect results due to invalid casting.
Fixed aten::select when index was -1. Previously, this would cause a validation error.
Fixed aten::adaptive_avg_pool2d padding and striding behavior. Previously, this could generate incorrect results with specific configurations.
Fixed an issue where dictionary inputs could be incorrectly traced when the tensor values had gradients.
PyTorch Neuron release [2.0.536.0]#
Date: 01/05/2022
New in this release#
Added new operator support for specific variants of operations (See PyTorch Neuron (torch-neuron) Supported operators)
Added an optional optimizations keyword to torch_neuron.trace() which accepts a list of Optimization passes.
PyTorch Neuron release [2.0.468.0]#
Date: 12/15/2021
New in this release#
Added support for the aten::cumsum operation.
Fixed aten::expand to correctly handle adding new dimensions.
PyTorch Neuron release [2.0.392.0]#
Date: 11/05/2021
Updated the Neuron Runtime (which is integrated within this package) to libnrt 2.2.18.0 to fix a container issue that was preventing the use of containers when /dev/neuron0 was not present. See the neuron-runtime-release-notes for details.
PyTorch Neuron release [2.0.318.0]#
Date: 10/27/2021
New in this release#
PyTorch Neuron 1.x now supports Neuron Runtime 2.x (libnrt.so shared library) only.
Important
You must update to the latest Neuron Driver (aws-neuron-dkms version 2.1 or newer) for proper functionality of the new runtime library.
Read the Introducing Neuron Runtime 2.x (libnrt.so) application note, which describes why we are making this change and how it will affect the Neuron SDK in detail.
Read Migrate your application to Neuron Runtime 2.x (libnrt.so) for detailed information on how to migrate your application.
Introducing PyTorch 1.9.1 support (support for torch==1.9.1)
Added torch_neuron.DataParallel, see the ResNet-50 tutorial [html] and the Data Parallel Inference on Torch Neuron application note.
Added support for tracing on GPUs
Added support for ConvTranspose1d
Added support for new operators:
aten::empty_like
aten::log
aten::type_as
aten::movedim
aten::einsum
aten::argmax
aten::min
aten::argmin
aten::abs
aten::cos
aten::sin
aten::linear
aten::pixel_shuffle
aten::group_norm
aten::_weight_norm
Added torch_neuron.is_available()
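The new torch_neuron.DataParallel and torch_neuron.is_available() APIs can be combined along these lines. This is a sketch only: the model file name is hypothetical, and execution requires NeuronCores:

```python
import torch
import torch_neuron  # requires the torch-neuron package

# Only attempt Neuron inference when NeuronCores are available
if torch_neuron.is_available():
    model_neuron = torch.jit.load('model_neuron.pt')  # hypothetical compiled model
    # Replicate the model across visible NeuronCores and split input batches
    model_parallel = torch_neuron.DataParallel(model_neuron)
    output = model_parallel(torch.rand(16, 3, 224, 224))
```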
Resolved Issues#
Fixed a performance issue when using both the dynamic_batch_size=True trace option and the --neuron-core-pipeline compiler option. Dynamic batching now uses OpenMP to execute pipeline batches concurrently.
Fixed torch_neuron.trace issues:
Fixed a failure when the same submodule was traced with multiple inputs
Fixed a failure where some operations would fail to be called with the correct arguments
Fixed a failure where custom operators (torch plugins) would cause a trace failure
Fixed variants of aten::upsample_bilinear2d when scale_factor=1
Fixed variants of aten::expand using dim=-1
Fixed variants of aten::stack using multiple different input data types
Fixed variants of aten::max using indices outputs
[1.8.1.1.5.21.0]#
Date: 08/12/2021
Summary#
Minor updates.
[1.8.1.1.5.7.0]#
Date: 07/02/2021
Summary#
Added support for dictionary outputs using the strict=False flag. See /neuron-guide/neuron-frameworks/pytorch-neuron/troubleshooting-guide.rst.
Updated aten::batch_norm to correctly implement the affine flag.
Added support for aten::erf and prim::DictConstruct. See PyTorch Neuron (torch-neuron) Supported operators.
Added dynamic batch support. See /neuron-guide/neuron-frameworks/pytorch-neuron/api-compilation-python-api.rst.
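A sketch of tracing a model with dictionary outputs using the strict=False flag (the model here is hypothetical; running this requires the torch-neuron package on an Inf1 instance):

```python
import torch
import torch_neuron  # requires the torch-neuron package


class DictModel(torch.nn.Module):  # hypothetical model returning a dict
    def forward(self, x):
        return {'doubled': 2 * x, 'halved': 0.5 * x}


# strict=False allows tracing models with dictionary (non-tensor) outputs
model_neuron = torch_neuron.trace(DictModel().eval(), torch.rand(1, 4), strict=False)
```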
[1.8.1.1.4.1.0]#
Date: 5/28/2021
Summary#
Added support for PyTorch 1.8.1
Model compatibility:
Models compiled with previous versions of PyTorch Neuron (<1.8.1) are compatible with PyTorch Neuron 1.8.1.
Models compiled with PyTorch Neuron 1.8.1 are not backward compatible with previous versions of PyTorch Neuron (<1.8.1).
Updated tutorials to use Hugging Face Transformers 4.6.0.
Added a new set of forward operators (forward_v2)
Host memory allocation when loading the same model on multiple NeuronCores is significantly reduced
Fixed an issue where models would not deallocate all memory within a python session after being garbage collected.
Fixed a TorchScript/C++ issue where loading the same model multiple times would not use multiple NeuronCores by default.
Fixed logging to no longer configure the root logger.
Removed informative messages that were produced during compilations as warnings. The number of warnings reduced significantly.
Convolution operator support has been extended to include ConvTranspose2d variants.
Reduced the amount of host memory usage during inference.
[1.7.1.1.3.5.0]#
Date: 4/30/2021
Summary#
ResNext models now functional with new operator support
YoloV5 support; refer to aws/aws-neuron-sdk#253 and ultralytics/yolov5#2953, which optimized YoloV5 for AWS Neuron
Convolution operator support has been extended to include most Conv1d and Conv3d variants
New operator support. Please see PyTorch Neuron (torch-neuron) Supported operators for the complete list of operators.
[1.7.1.1.2.16.0]#
Date: 3/4/2021
Summary#
Minor enhancements.
[1.7.1.1.2.15.0]#
Date: 2/24/2021
Summary#
Fix for CVE-2021-3177.
[1.7.1.1.2.3.0]#
Date: 1/30/2021
Summary#
Made changes to allow models with -inf scalar constants to correctly compile
Added new operator support. Please see PyTorch Neuron (torch-neuron) Supported operators for the complete list of operators.
[1.1.7.0]#
Date: 12/23/2020
Summary#
We are dropping support for Python 3.5 in this release
torch.neuron.trace will now throw a RuntimeError if no operators are compiled for Neuron hardware
torch.neuron.trace will now display compilation progress indicators (dots) as default behavior (neuron-cc must be updated to the December release or greater to see this feature)
Added new operator support. Please see PyTorch Neuron (torch-neuron) Supported operators for the complete list of operators.
Extended the BERT pretrained tutorial to demonstrate execution on multiple cores and batch modification; updated the tutorial to accommodate changes in the Hugging Face Transformers code for version 4.0
Added a tutorial for torch-serve which extends the BERT tutorial
Added support for PyTorch 1.7
[1.0.1978.0]#
Date: 11/17/2020
Summary#
Fixed bugs in comparison operators, and added remaining variants (eq, ne, gt, ge, lt, le)
Added support for prim::PythonOp - note that this must be run on CPU and not Neuron. We recommend you replace this code with PyTorch operators if possible
Support for a series of new operators. Please see PyTorch Neuron (torch-neuron) Supported operators for the complete list of operators.
Performance improvements to the runtime library
Correction of a runtime library bug which caused models with large tensors to generate incorrect results in some cases
[1.0.1721.0]#
Date: 09/22/2020
Summary#
Various minor improvements to the PyTorch autopartitioner feature
Support for the operators aten::constant_pad_nd, aten::meshgrid
Improved performance on various torchvision models. Of note are resnet50 and vgg16
[1.0.1532.0]#
Date: 08/08/2020
Summary#
Various minor improvements to the PyTorch autopartitioner feature
Support for the aten::ones operator
[1.0.1522.0]#
Date: 08/05/2020
Summary#
Various minor improvements.
[1.0.1386.0]#
Date: 07/16/2020
Summary#
This release adds auto-partitioning, model analysis and PyTorch 1.5.1 support, along with a number of new operators
Major New Features#
Support for PyTorch 1.5.1
Introduce an automated operator device placement mechanism in torch.neuron.trace to run sub-graphs that contain operators that are not supported by the neuron compiler in native PyTorch. This new mechanism is on by default and can be turned off by adding argument fallback=False to the compiler arguments.
Model analysis to find supported and unsupported operators in a model
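The automatic operator device placement behavior described above can be sketched as follows (hypothetical model; running this requires the torch-neuron package on an Inf1 instance):

```python
import torch
import torch_neuron  # requires the torch-neuron package

model = torch.nn.Sequential(torch.nn.Linear(16, 4)).eval()  # hypothetical model
example = torch.rand(1, 16)

# Default: operators unsupported by the Neuron compiler run in native PyTorch
model_neuron = torch_neuron.trace(model, example)

# Turn automatic device placement off; unsupported operators become errors
model_neuron_strict = torch_neuron.trace(model, example, fallback=False)
```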
Resolved Issues#
[1.0.1168.0]#
Date 6/11/2020
Summary#
Major New Features#
Resolved Issues#
Known Issues and Limitations#
[1.0.1001.0]#
Date: 5/11/2020
Summary#
Additional PyTorch operator support and improved support for model saving and reloading.
Major New Features#
Added Neuron Compiler support for a number of previously unsupported PyTorch operators. Please see :ref:`neuron-cc-ops-pytorch` for the complete list of operators.
Added support for torch.neuron.trace on models which have previously been saved using torch.jit.save and then reloaded.
Resolved Issues#
Known Issues and Limitations#
[1.0.825.0]#
Date: 3/26/2020
Summary#
Major New Features#
Resolved Issues#
Known Issues and limitations#
[1.0.763.0]#
Date: 2/27/2020
Summary#
Added Neuron Compiler support for a number of previously unsupported PyTorch operators. Please see PyTorch Neuron (torch-neuron) Supported operators for the complete list of operators.
Major new features#
None
Resolved issues#
None
[1.0.672.0]#
Date: 1/27/2020
Summary#
Major new features#
Resolved issues#
Python 3.5 and Python 3.7 are now supported.
Known issues and limitations#
Other Notes#
[1.0.627.0]#
Date: 12/20/2019
Summary#
This is the initial release of torch-neuron. It is not distributed on the DLAMI yet and needs to be installed from the neuron pip repository.
Note that we are currently using TensorFlow as an intermediate format to pass to our compiler. This does not affect any runtime execution from PyTorch to the Neuron Runtime and Inferentia. This is why the neuron-cc installation must include [tensorflow] for PyTorch.
Major new features#
Resolved issues#
Known issues and limitations#
Models TESTED#
The following models have successfully run on neuron-inferentia systems
SqueezeNet
ResNet50
Wide ResNet50
PyTorch Serving#
In this initial version there is no specific serving support. Inference works correctly through Python on Inf1 instances using the Neuron runtime. Future releases will include support for production deployment and serving of models.
Profiler support#
Profiler support is not provided in this initial release and will be available in future releases
Automated partitioning#
Automatic partitioning of graphs into supported and non-supported operations is not currently supported. A tutorial is available to provide guidance on how to manually partition a model graph. Please see pytorch-manual-partitioning-jn-tutorial.
PyTorch dependency#
Currently, PyTorch support depends on a Neuron-specific version of PyTorch v1.3.1. Future revisions will add support for 1.4 and later releases.
Trace behavior#
In order to trace a model, it must be in evaluation mode. For examples, please see the ResNet50 model for Inferentia.
Six pip package is required#
The Six package is required for the torch-neuron runtime, but it is not modeled in the package dependencies. This will be fixed in a future release.
Multiple NeuronCore support#
If the num-neuroncores option is used, the number of cores must be manually set in the calling shell environment variable for compilation and inference.
For example: using the keyword argument compiler_args=['--num-neuroncores', '4'] in the trace call requires NEURONCORE_GROUP_SIZES=4 to be set in the environment at compile time and runtime.
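A sketch of the environment setup (the value 4 is only an example and must match the --num-neuroncores value passed in the trace call):

```shell
# NEURONCORE_GROUP_SIZES must match the --num-neuroncores value
# in both the compilation and inference environments.
export NEURONCORE_GROUP_SIZES=4
echo "NEURONCORE_GROUP_SIZES=$NEURONCORE_GROUP_SIZES"
```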
CPU execution#
At compilation time, a constant output is generated for the purposes of tracing. Running inference on a non-Neuron instance will generate incorrect results and must not be done. The following error message is generated to stderr:
Warning: Tensor output are ** NOT CALCULATED ** during CPU execution and only
indicate tensor shape
Other notes#
Python version(s) supported:
3.6
Linux distribution supported:
DLAMI Ubuntu 18 and Amazon Linux 2 (using Python 3.6 Conda environments)
Other AMIs based on Ubuntu 18
For Amazon Linux 2 please install Conda and use Python 3.6 Conda environment