Tutorial: Neuron Apache MXNet Model Serving

This MXNet Neuron Model Serving (MMS) example is adapted from the MXNet vision service example, which uses a pretrained SqueezeNet model to perform image classification.

Before starting this example, ensure that the Neuron-optimized MXNet package (mxnet-neuron) is installed along with the Neuron Compiler.


If you are using MXNet-1.5, note that MXNet-1.5 has entered maintenance mode and requires Neuron Runtime 1.x; see "10/27/2021 - Neuron support for Apache MXNet 1.5 enters maintenance mode". To set up a development environment for MXNet-1.5, see the installation instructions at MXNet Neuron Setup.

If using DLAMI, you can activate the environment aws_neuron_mxnet_p36 and skip the installation part in the first step below.

  1. First, install the Java runtime and multi-model-server:

cd ~/
# sudo yum -y install -q jre # for AML2
sudo apt-get install -y -q default-jre  # for Ubuntu
pip install multi-model-server

Download the example code:

git clone
cd ~/multi-model-server/examples/mxnet_vision
  2. Compile the ResNet50 model for the Inferentia target by saving the following Python script to a file and running it with python:

from packaging import version
import numpy as np
import mxnet as mx

mxnet_version = version.parse(mx.__version__)
if mxnet_version >= version.parse("1.8"):
   import mx_neuron as neuron
else:
   from mxnet.contrib import neuron


nn_name = "resnet-50"

# Load the model (expects resnet-50-symbol.json and resnet-50-0000.params in the current directory)
sym, args, auxs = mx.model.load_checkpoint(nn_name, 0)

#Define compilation parameters
#  - input shape and dtype
inputs = {'data' : mx.nd.zeros([1,3,224,224], dtype='float32') }

# compile graph to inferentia target
csym, cargs, cauxs = neuron.compile(sym, args, auxs, inputs)

# save compiled model
mx.model.save_checkpoint(nn_name + "_compiled", 0, csym, cargs, cauxs)
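
The checkpoint calls in the script above rely on MXNet's file naming: load_checkpoint(prefix, epoch) reads prefix-symbol.json and prefix-NNNN.params (the epoch zero-padded to four digits), and save_checkpoint writes files following the same pattern. As a minimal sketch, the helper below (checkpoint_files is a hypothetical name, not part of MXNet) computes the file names to expect before and after compilation:

```python
def checkpoint_files(prefix, epoch):
    """Return the (symbol, params) file names MXNet uses for a checkpoint."""
    return f"{prefix}-symbol.json", f"{prefix}-{epoch:04d}.params"


nn_name = "resnet-50"

# Files the compile script reads.
source_files = checkpoint_files(nn_name, 0)

# Files the compile script writes.
compiled_files = checkpoint_files(nn_name + "_compiled", 0)

print("input: ", source_files)
print("output:", compiled_files)
```

Both compiled files should be present in the mxnet_vision directory before the model is packaged in a later step.
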
  3. Prepare the signature file signature.json to configure the input name and shape (the shape matches the input used in the compilation script):

{
  "inputs": [
    {
      "data_name": "data",
      "data_shape": [
        1,
        3,
        224,
        224
      ]
    }
  ]
}
  4. Prepare synset.txt, which is a list of names for the ImageNet prediction classes:

curl -O
  5. Create a custom service class following the template in the model_service_template folder:

cp -r ../model_service_template/* .

Edit the copied service code to use the appropriate context.

Make the following change:

from packaging import version

mxnet_version = version.parse(mx.__version__)
if mxnet_version >= version.parse("1.8"):
   import mx_neuron as neuron
self.mxnet_ctx = mx.neuron()

Comment out the existing context assignment:

#self.mxnet_ctx = mx.cpu() if gpu_id is None else mx.gpu(gpu_id)

Also, comment out the unnecessary data copy for model_input:

#model_input = [item.as_in_context(self.mxnet_ctx) for item in model_input]
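
The version check used in the compile script and in the service edit selects the Neuron integration by MXNet release. As an illustration only, the sketch below reproduces that decision with the standard library. neuron_module_for is a hypothetical helper, and the simple major.minor parsing is an assumption made here; the tutorial's own code uses packaging.version instead:

```python
def neuron_module_for(mxnet_version):
    """Return the module name that provides Neuron support for an MXNet version string."""
    # Assumption: the version string starts with numeric "major.minor" components.
    major, minor = (int(part) for part in mxnet_version.split(".")[:2])
    if (major, minor) >= (1, 8):
        return "mx_neuron"         # separate package imported for MXNet 1.8 and later
    return "mxnet.contrib.neuron"  # module imported by the compile script for older releases


for v in ("1.5.1", "1.8.0"):
    print(v, "->", neuron_module_for(v))
```

Comparing (major, minor) tuples rather than raw strings avoids ordering mistakes such as treating "1.10" as older than "1.8".
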
  6. Package the model with model-archiver:

cd ~/multi-model-server/examples
model-archiver --force --model-name resnet-50_compiled --model-path mxnet_vision --handler mxnet_vision_service:handle
  7. Start the MXNet Model Server (MMS) and load the model using the RESTful API. Ensure that Neuron RTD is running with default settings (see rtd-getting-started):

cd ~/multi-model-server/
multi-model-server --start --model-store examples
# Pipe to log file if you want to keep a log of MMS
curl -v -X POST "http://localhost:8081/models?initial_workers=1&max_workers=1&synchronous=true&url=resnet-50_compiled.mar"
sleep 10 # allow sufficient time to load model

Each worker requires a NeuronCore group that can accommodate the compiled model. Additional workers can be added by increasing the max_workers setting, as long as enough NeuronCores are available. Use neuron-top to see which models are loaded on specific NeuronCores.
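
The registration request above is an HTTP POST to port 8081 with the worker settings passed as query parameters. A minimal Python sketch of the same call, using only the standard library, might look like the following. build_register_url and register_model are hypothetical helper names, and the host, port, and parameter values are simply those from the curl command:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_register_url(mar_name, initial_workers=1, max_workers=1,
                       host="localhost", port=8081):
    """Build the model registration URL used by the curl command."""
    query = urlencode({
        "initial_workers": initial_workers,
        "max_workers": max_workers,
        "synchronous": "true",  # same value as in the curl command
        "url": mar_name,
    })
    return f"http://{host}:{port}/models?{query}"


def register_model(url, timeout=60):
    """Send the POST request; this needs a running model server."""
    with urlopen(Request(url, method="POST"), timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


url = build_register_url("resnet-50_compiled.mar")
print(url)
```

Calling register_model(url) after the server has started would replace the curl command. Passing a larger max_workers only helps if enough NeuronCores are free for the extra workers.
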

  8. Test inference using an example image:

curl -O
curl -X POST -T kitten_small.jpg

You will see the following output:

[
  {
    "probability": 0.6375716328620911,
    "class": "n02123045 tabby, tabby cat"
  },
  {
    "probability": 0.1692783385515213,
    "class": "n02123159 tiger cat"
  },
  {
    "probability": 0.12187337130308151,
    "class": "n02124075 Egyptian cat"
  },
  {
    "probability": 0.028840631246566772,
    "class": "n02127052 lynx, catamount"
  },
  {
    "probability": 0.019691042602062225,
    "class": "n02129604 tiger, Panthera tigris"
  }
]
  9. To clean up after the test, issue a delete command via the RESTful API and stop the model server:

curl -X DELETE

multi-model-server --stop
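
The inference output from the test step pairs each predicted class with a probability. As a small sketch of post-processing it in Python, the example below parses a response and picks the most likely class. It assumes the response body is a JSON array of objects with "probability" and "class" keys, and top_prediction is a hypothetical helper name. The sample values are copied from the output shown in the test step:

```python
import json

# Sample response text, using the first three entries of the tutorial output.
response_text = """
[
  {"probability": 0.6375716328620911, "class": "n02123045 tabby, tabby cat"},
  {"probability": 0.1692783385515213, "class": "n02123159 tiger cat"},
  {"probability": 0.12187337130308151, "class": "n02124075 Egyptian cat"}
]
"""


def top_prediction(text):
    """Return (label, probability) for the highest-probability entry."""
    predictions = json.loads(text)
    best = max(predictions, key=lambda item: item["probability"])
    # The class string starts with the synset ID, followed by the readable names.
    synset_id, _, label = best["class"].partition(" ")
    return label, best["probability"]


label, probability = top_prediction(response_text)
print(f"{label}: {probability:.2%}")
```

Using max over the parsed list avoids relying on the entries already being sorted by probability.
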

