This document is relevant for: Inf1
Tutorial: Neuron Apache MXNet Model Serving#
This MXNet Neuron Model Serving (MMS) example is adapted from the MXNet vision service example which uses pretrained squeezenet to perform image classification: awslabs/multi-model-server.
Before starting this example, please ensure that Neuron-optimized MXNet version mxnet-neuron is installed along with Neuron Compiler.
Warning#
If you are using MXNet-1.5, please note that MXNet-1.5 entered maintenance mode and require Neuron Runtime 1.x, please see 10/27/2021 - Neuron support for Apache MXNet 1.5 enters maintenance mode. To setup development environment for MXNet-1.5 see installation instructions at MXNet Neuron Setup.
If using DLAMI, you can activate the environment aws_neuron_mxnet_p36 and skip the installation part in the first step below.
First, install Java runtime and multi-model-server:
cd ~/
# sudo yum -y install -q jre # for AML2
sudo apt-get install -y -q default-jre # for Ubuntu
pip install multi-model-server
Download the example code:
git clone https://github.com/awslabs/multi-model-server
cd ~/multi-model-server/examples/mxnet_vision
Compile ResNet50 model to Inferentia target by saving the following Python script to compile_resnet50.py and run “
python compile_resnet50.py
”
from packaging import version
import numpy as np
import mxnet as mx
mxnet_version = version.parse(mx.__version__)
if mxnet_version >= version.parse("1.8"):
import mx_neuron as neuron
else:
from mxnet.contrib import neuron
path='http://data.mxnet.io/models/imagenet/'
mx.test_utils.download(path+'resnet/50-layers/resnet-50-0000.params')
mx.test_utils.download(path+'resnet/50-layers/resnet-50-symbol.json')
mx.test_utils.download(path+'synset.txt')
nn_name = "resnet-50"
#Load a model
sym, args, auxs = mx.model.load_checkpoint(nn_name, 0)
#Define compilation parameters
# - input shape and dtype
inputs = {'data' : mx.nd.zeros([1,3,224,224], dtype='float32') }
# compile graph to inferentia target
csym, cargs, cauxs = neuron.compile(sym, args, auxs, inputs)
# save compiled model
mx.model.save_checkpoint(nn_name + "_compiled", 0, csym, cargs, cauxs)
Prepare signature file
signature.json
to configure the input name and shape:
{
"inputs": [
{
"data_name": "data",
"data_shape": [
1,
3,
224,
224
]
}
]
}
Prepare
synset.txt
which is a list of names for ImageNet prediction classes:
curl -O https://s3.amazonaws.com/model-server/model_archive_1.0/examples/squeezenet_v1.1/synset.txt
Create custom service class following template in model_server_template folder:
cp -r ../model_service_template/* .
Edit mxnet_model_service.py
to use the appropriate context.
Make the following change:
from packaging import version
mxnet_version = version.parse(mx.__version__)
if mxnet_version >= version.parse("1.8"):
import mx_neuron as neuron
self.mxnet_ctx = mx.neuron()
Comment out the existing context set:
#self.mxnet_ctx = mx.cpu() if gpu_id is None else mx.gpu(gpu_id)
Also, comment out unnecessary data copy for model_input in
mxnet_model_service.py
:
#model_input = [item.as_in_context(self.mxnet_ctx) for item in model_input]
Package the model with model-archiver:
cd ~/multi-model-server/examples
model-archiver --force --model-name resnet-50_compiled --model-path mxnet_vision --handler mxnet_vision_service:handle
Start MXNet Model Server (MMS) and load model using RESTful API. Please ensure that Neuron RTD is running with default settings (see rtd-getting-started):
cd ~/multi-model-server/
multi-model-server --start --model-store examples
# Pipe to log file if you want to keep a log of MMS
curl -v -X POST "http://localhost:8081/models?initial_workers=1&max_workers=1&synchronous=true&url=resnet-50_compiled.mar"
sleep 10 # allow sufficient time to load model
Each worker requires a NeuronCore group that can accommodate the compiled
model. Additional workers can be added by increasing max_workers
configuration as long as there are enough NeuronCores available. Use
neuron-top
to see which models are loaded on specific NeuronCores.
Test inference using an example image:
curl -O https://raw.githubusercontent.com/awslabs/multi-model-server/master/docs/images/kitten_small.jpg
curl -X POST http://127.0.0.1:8080/predictions/resnet-50_compiled -T kitten_small.jpg
You will see the following output:
[
{
"probability": 0.6375716328620911,
"class": "n02123045 tabby, tabby cat"
},
{
"probability": 0.1692783385515213,
"class": "n02123159 tiger cat"
},
{
"probability": 0.12187337130308151,
"class": "n02124075 Egyptian cat"
},
{
"probability": 0.028840631246566772,
"class": "n02127052 lynx, catamount"
},
{
"probability": 0.019691042602062225,
"class": "n02129604 tiger, Panthera tigris"
}
]
To cleanup after test, issue a delete command via RESTful API and stop the model server:
curl -X DELETE http://127.0.0.1:8081/models/resnet-50_compiled
multi-model-server --stop
This document is relevant for: Inf1