Neuron Apache MXNet (Incubating) Release Notes

This document lists the release notes for the MXNet-Neuron framework.

[1.5.1.1.6.5.0]

Date 08/12/2021

Summary

Minor bug fixes and enhancements for MXNet 1.5 Neuron.

[1.8.0.1.3.4.0]

Date 08/12/2021

Summary

Minor bug fixes and enhancements for MXNet 1.8 Neuron.

[1.5.1.1.6.1.0]

Date 07/02/2021

Summary

Minor bug fixes and enhancements for MXNet 1.5 Neuron.

[1.8.0.1.3.0.0]

Date 07/02/2021

Summary

Support for Autoloop and the CPredict API, plus minor bug fixes and enhancements for MXNet 1.8 Neuron.

Major New Features

  • Added support for Autoloop feature for MXNet 1.8 Neuron.

Resolved Issues

  • Added support for CPredict API.

[1.8.0.1.2.1.0]

Date 5/28/2021

Summary

Minor bug fixes and enhancements for MXNet 1.8 Neuron.

Resolved Issues

  • Added support for Neuron profiler

[1.8.0.1.1.2.0]

Date 4/30/2021

Summary

Initial release of Apache MXNet (Incubating) 1.8 for Neuron.

Major New Features

[1.5.1.1.4.x.x]

Date 5/28/2021

Summary

  • Minor enhancements.

[1.5.1.1.4.4.0]

Date 4/30/2021

Summary

  • Resolve an issue with Neuron profiling.

Resolved Issues

  • Issue: when Neuron profiling is enabled in MXNet-Neuron 1.5.1 (using NEURON_PROFILE=<dir>) and TensorBoard is used to read the profiled data, the user sees the error message “panic: runtime error: index out of range”. This issue is resolved in this release.

[1.5.1.1.3.8.0]

Date 3/4/2021

Summary

Minor enhancements.

[1.5.1.1.3.7.0]

Date 2/24/2021

Summary

Fix for CVE-2021-3177.

[1.5.1.1.3.2.0]

Date 1/30/2021

Summary

Various minor improvements

[1.5.1.1.2.1.0]

Date 12/23/2020

Summary

Various minor improvements

[1.5.1.1.1.88.0]

Date 11/17/2020

Summary

This release includes a bug fix for MXNet Model Server not being able to clean up Neuron RTD states after a model is unloaded (deleted) from the model server.

Resolved Issues

  • Issue: MXNet Model Server is not able to clean up Neuron RTD states after a model is unloaded (deleted) from the model server.

    • Workaround for earlier versions: run “/opt/aws/neuron/bin/neuron-cli reset” to clear Neuron RTD states after all models are unloaded and the server is shut down.
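
For scripted deployments on earlier versions, the workaround above can be wrapped in a small helper. This is a minimal sketch: the function name `reset_neuron_rtd` and its injectable `runner` parameter are illustrative assumptions, and only the `neuron-cli reset` command itself comes from these notes.

```python
import subprocess

# Command quoted in the workaround above.
NEURON_CLI_RESET = ["/opt/aws/neuron/bin/neuron-cli", "reset"]


def reset_neuron_rtd(runner=subprocess.run):
    """Run `neuron-cli reset` to clear Neuron RTD states.

    Call this only after all models are unloaded and the model server is
    shut down. `runner` is a hypothetical hook so the call can be swapped
    out in tests; by default it is subprocess.run.
    """
    # check=True makes subprocess.run raise CalledProcessError on a
    # non-zero exit status.
    return runner(NEURON_CLI_RESET, check=True)
```

Calling `reset_neuron_rtd()` with no arguments runs the real command, so it should only be used on a host where the Neuron tools are installed.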

[1.5.1.1.1.52.0]

Date 09/22/2020

Summary

Various minor improvements.

Major New Features

Resolved Issues

  • Issue: When MXNet is first imported into a Python process and a subprocess call is then invoked, the user may get the exception “OSError: [Errno 14] Bad address” during the subprocess call (see https://github.com/apache/incubator-mxnet/issues/13875 for more details). This issue is fixed with a mitigation patch from MXNet for OpenMP fork race conditions.

    • Workaround for earlier versions: export KMP_INIT_AT_FORK=false before running the Python process.
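
When the Python process is started from another Python program rather than from a shell, the same workaround can be applied by passing the variable in the child's environment. The sketch below is one way to do that; the helper names `env_with_kmp_workaround` and `run_python` are hypothetical, and the child command is a stand-in for a real inference script.

```python
import os
import subprocess
import sys


def env_with_kmp_workaround(base_env=None):
    """Return a copy of the environment with KMP_INIT_AT_FORK=false."""
    env = dict(os.environ if base_env is None else base_env)
    env["KMP_INIT_AT_FORK"] = "false"
    return env


def run_python(args, base_env=None):
    """Start a new Python process with the workaround already exported."""
    return subprocess.run(
        [sys.executable, *args],
        env=env_with_kmp_workaround(base_env),
        capture_output=True,
        text=True,
        check=True,
    )


# Stand-in for a real script: print the value the child process sees.
result = run_python(["-c", "import os; print(os.environ['KMP_INIT_AT_FORK'])"])
print(result.stdout.strip())
```

Copying the environment rather than mutating `os.environ` keeps the change local to the child process.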

[1.5.1.1.1.1.0]

Date 08/08/2020

Summary

Various minor improvements.

Major New Features

Resolved Issues

[1.5.1.1.0.2101.0]

Date 08/05/2020

Summary

Various minor improvements.

Major New Features

Resolved Issues

[1.5.1.1.0.2093.0]

Date 07/16/2020

Summary

This release contains a few bug fixes and user experience improvements.

Major New Features

Resolved Issues

  • User can specify NEURONCORE_GROUP_SIZES without brackets (for example, “1,1,1,1”), as can be done in TensorFlow-Neuron and PyTorch-Neuron.

  • Fixed a memory leak when inferring neuron subgraph properties

  • Fixed a bug dealing with multi-input subgraphs
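
To illustrate the two accepted forms of NEURONCORE_GROUP_SIZES from the first item above, here is a small parser that treats “[1,1,1,1]” and “1,1,1,1” the same way. It is a sketch for illustration only: `parse_group_sizes` is a hypothetical helper, not the parser used inside MXNet-Neuron.

```python
def parse_group_sizes(value):
    """Parse a NEURONCORE_GROUP_SIZES-style string into a list of ints.

    Accepts both the bracketed form "[1,1,1,1]" and the bare form "1,1,1,1".
    """
    text = value.strip()
    # Drop one pair of surrounding brackets if present.
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    # Skip empty pieces so stray commas or an empty string yield no entries.
    return [int(part) for part in text.split(",") if part.strip()]
```

For example, `parse_group_sizes("[1,1,1,1]")` and `parse_group_sizes("1,1,1,1")` both return a list of four ones.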

[1.5.1.1.0.2033.0]

Date 6/11/2020

Summary

  • Added support for profiling during inference

Major New Features

  • Profiling can now be enabled during inference by setting the NEURON_PROFILE environment variable to the profiling work directory. For an example of using profiling, see Getting Started: TensorBoard-Neuron (Deprecated). Note that the graph view of the MXNet graph is not available in TensorBoard.
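
As a rough sketch of the step described above, the variable can be set from Python before inference runs. The helper name `enable_neuron_profile` and the directory name `neuron_profile` are assumptions made for this example; only the NEURON_PROFILE variable comes from these notes, and the point at which the framework reads it is not specified here.

```python
import os
import tempfile


def enable_neuron_profile(profile_dir, env=None):
    """Point NEURON_PROFILE at profile_dir, creating the directory if needed.

    `env` defaults to os.environ; pass a dict to prepare an environment
    for a child process instead.
    """
    env = os.environ if env is None else env
    os.makedirs(profile_dir, exist_ok=True)
    env["NEURON_PROFILE"] = profile_dir
    return env


# Assumed usage: set the variable first, then run inference in this process.
profile_dir = os.path.join(tempfile.gettempdir(), "neuron_profile")
enable_neuron_profile(profile_dir)
```

Setting the variable as early as possible, before the framework is imported, is the conservative choice when it is unclear when the value is read.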

Resolved Issues

Known Issues and Limitations

Other Notes

[1.5.1.1.0.1900.0]

Date 5/11/2020

Summary

Improved support for shared-memory communication with Neuron-Runtime.

Major New Features

  • Added support for the BERT-Base model (base: L-12 H-768 A-12), max sequence length 64 and batch size of 8.

  • Improved security for usage of shared-memory for data transfer between framework and Neuron-Runtime

  • Improved allocation and cleanup of shared-memory resource

  • Improved container support by automatically falling back to gRPC data transfer if shared-memory cannot be allocated by Neuron-Runtime

Resolved Issues

  • User is unable to allocate Neuron-Runtime shared-memory resource when using MXNet-Neuron in a container to communicate with Neuron-Runtime in another container. This is resolved by automatically falling back to gRPC data transfer if shared-memory cannot be allocated by Neuron-Runtime.

  • Fixed an issue where some large models could not be loaded on Inferentia.

Known Issues and Limitations

Other Notes

[1.5.1.1.0.1596.0]

Date 3/26/2020

Summary

No major changes or fixes

Major New Features

Resolved Issues

Known Issues and Limitations

Other Notes

[1.5.1.1.0.1498.0]

Date 2/27/2020

Summary

No major changes or fixes.

Major New Features

Resolved Issues

The issue(s) below are resolved:

  • Latest pip version 20.0.1 breaks installation of MXNet-Neuron pip wheel which has py2.py3 in the wheel name.

Known Issues and Limitations

  • User is unable to allocate Neuron-Runtime shared-memory resource when using MXNet-Neuron in a container to communicate with Neuron-Runtime in another container. To work around this, set the environment variable NEURON_RTD_USE_SHM to 0.

Other Notes

[1.5.1.1.0.1401.0]

Date 1/27/2020

Summary

No major changes or fixes.

Major New Features

Resolved Issues

  • The following issue is resolved when multi-model-server version 1.1.0 or later is used with MXNet-Neuron. You still need to run “/opt/aws/neuron/bin/neuron-cli reset” to clear all Neuron RTD states after multi-model-server exits:

    • Issue: MXNet Model Server is not able to clean up Neuron RTD states after a model is unloaded (deleted) from the model server, and the previous workaround “/opt/aws/neuron/bin/neuron-cli reset” is unable to clear all Neuron RTD states.

Known Issues and Limitations

  • Latest pip version 20.0.1 breaks installation of MXNet-Neuron pip wheel which has py2.py3 in the wheel name. This breaks all existing released versions. The error looks like:

    Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
    ERROR: Could not find a version that satisfies the requirement mxnet-neuron (from versions: none)
    ERROR: No matching distribution found for mxnet-neuron

    • Workaround: install the older version of pip using “pip install pip==19.3.1”.

Other Notes

[1.5.1.1.0.1325.0]

Date 12/1/2019

Summary

Major New Features

Resolved Issues

  • Issue: Compiler flags cannot be passed to the compiler during the compile call. The fix: compiler flags can now be passed to the compiler during the compile call using the “flags” option followed by a list of flags.

  • Issue: The advanced CPU fallback option is a way to attempt to increase the number of operators that run on Inferentia. It was on by default, which may cause failures. The fix: this option is now off by default.
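
A minimal sketch of the “flags” option follows. It assumes the `compile_args` dictionary convention that appears elsewhere in these notes for `excl_node_names`; the helper `build_compile_args` is hypothetical and the flag string is a placeholder, not a real compiler flag. The compile call itself is left as a comment because its signature is not given here.

```python
def build_compile_args(flags):
    """Return compile options with a "flags" entry holding a list of flags."""
    # A bare string would be split into characters by list(), so reject it.
    if isinstance(flags, str):
        raise TypeError("flags must be a list of strings, not a single string")
    return {"flags": list(flags)}


# "--placeholder-flag" is hypothetical; substitute real compiler flags.
compile_args = build_compile_args(["--placeholder-flag"])

# Assumed usage (not run here): pass these options to the MXNet-Neuron
# compile call, for example as keyword arguments via **compile_args.
```

Keeping the flags in a list, one entry per flag, matches the "list of flags" wording of the fix.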

Known Issues and Limitations

  • Issue: MXNet Model Server is not able to clean up Neuron RTD states after a model is unloaded (deleted) from the model server, and the previous workaround “/opt/aws/neuron/bin/neuron-cli reset” is unable to clear all Neuron RTD states.

    • Workaround: run “sudo systemctl restart neuron-rtd” to clear Neuron RTD states after all models are unloaded and the server is shut down.

Other Notes

[1.5.1.1.0.1349.0]

Date 12/20/2019

Summary

No major changes or fixes. Released with other Neuron packages.

[1.5.1.1.0.1260.0]

Date 11/25/2019

Summary

This version is available only in released DLAMI v26.0 and is based on MXNet version 1.5.1. Please see Known Issues and update to the latest version.

Major New Features

Resolved Issues

Known Issues and Limitations

  • Issue: Compiler flags cannot be passed to compiler during compile call.

  • Issue: The advanced CPU fallback option is a way to attempt to increase the number of operators that run on Inferentia. The default is currently set to on, which may cause failures.

    • Workaround: explicitly turn it off by setting compile option op_by_op_compiler_retry to 0.

  • Issue: Temporary files are put in current directory when debug is enabled.

    • Workaround: create a separate work directory and run the process from within the work directory

  • Issue: MXNet Model Server is not able to clean up Neuron RTD states after a model is unloaded (deleted) from the model server.

    • Workaround: run “/opt/aws/neuron/bin/neuron-cli reset” to clear Neuron RTD states after all models are unloaded and the server is shut down.

  • Issue: MXNet 1.5.1 may return inconsistent node names for some operators when they are the primary outputs of a Neuron subgraph. This causes failures during inference.

    • Workaround: exclude the affected nodes using the excl_node_names compile option:

    compile_args = { 'excl_node_names': ["node_name_to_exclude"] }
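
The workarounds in the list above can be combined in one script. The sketch below assumes that `op_by_op_compiler_retry` and `excl_node_names` can be supplied together in the same `compile_args` dictionary; the `working_directory` helper and the `neuron_work` directory name are hypothetical, and the compile and inference calls are omitted.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the current directory to path, creating it if needed."""
    previous = os.getcwd()
    os.makedirs(path, exist_ok=True)
    os.chdir(path)
    try:
        yield path
    finally:
        # Always return to the original directory, even on error.
        os.chdir(previous)


# Compile options that appear in the list above.
compile_args = {
    "op_by_op_compiler_retry": 0,  # turn advanced CPU fallback off
    "excl_node_names": ["node_name_to_exclude"],  # placeholder node name
}

work_dir = os.path.join(tempfile.gettempdir(), "neuron_work")
with working_directory(work_dir):
    # Compile and run from here so temporary debug files land in work_dir.
    pass
```

Using a context manager means the process returns to its original directory even if compilation raises an exception.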
    

Models Supported

The following models have run successfully on Neuron Inferentia systems:

  1. ResNet50 V1/V2

  2. Inception-V2/V3/V4

  3. Parallel-WaveNet

  4. Tacotron 2

  5. WaveRNN

Other Notes

  • Python versions supported:

    • 3.5, 3.6, 3.7

  • Linux distribution supported:

    • Ubuntu 18, Amazon Linux 2