Neuron Apache MXNet - Configurations for NeuronCore Groups Using Resnet50#


In this tutorial we will compile and deploy a ResNet-50 model in parallel using the concept of NeuronCore Groups on an Inf1 instance. This Jupyter notebook should be run on an inf1.6xlarge instance or larger. For simplicity we will run this entire tutorial on inf1.6xlarge, but in a real-life scenario the compilation should be done on a compute instance and the deployment on an Inf1 instance, to save costs.

Set the environment variable NEURON_RT_NUM_CORES to the total number of NeuronCores that will be utilized. The Neuron Runtime will create consecutive NeuronCore groups and place each model onto cores according to its compiled size.
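For example, to reserve four NeuronCores from the shell before launching the notebook (the inference cell below sets the same variable from Python via os.environ):

```shell
# Make four NeuronCores available to this process; set this before the
# framework loads any compiled model.
export NEURON_RT_NUM_CORES=4
```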

Note that in order to map a model to a group, the model must be compiled to fit within the group size. To limit the number of NeuronCores used during compilation, pass a compiler_args dictionary with the field --neuroncore-pipeline-cores set to the group size. For example, if NEURON_RT_NUM_CORES=4 and two models compiled with --neuroncore-pipeline-cores=3 and --neuroncore-pipeline-cores=1 were loaded, the first model would occupy NC0-NC2 and the second model would occupy NC3.

compile_args = {'--neuroncore-pipeline-cores' : 2}
sym, args, auxs = neuron.compile(sym, args, auxs, inputs, **compile_args)
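The consecutive-packing rule described above can be sketched in plain Python. This helper is illustrative only, not part of the Neuron SDK; it just computes which NeuronCores each group would occupy:

```python
def neuroncore_layout(group_sizes, total_cores=4):
    """Illustrative helper (not a Neuron SDK API): compute which NeuronCores
    consecutive groups of the given sizes would occupy."""
    layout, start = {}, 0
    for i, size in enumerate(group_sizes):
        if start + size > total_cores:
            raise RuntimeError("not enough NeuronCores for model %d" % i)
        layout["model%d" % i] = list(range(start, start + size))
        start += size
    return layout

# NEURON_RT_NUM_CORES=4 with models compiled for 3 and 1 pipeline cores:
print(neuroncore_layout([3, 1]))  # {'model0': [0, 1, 2], 'model1': [3]}
```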

In this tutorial we provide two main sections:

  1. Compile the Resnet50 model for Neuron

  2. Run inference using NeuronCore Groups

Please use environment conda_aws_neuron_mxnet_p36.

Compile model for Neuron#

The model must be compiled for the Inferentia target before it can be used on Inferentia. In the following we will compile the model with the flag --neuroncore-pipeline-cores set to 2 and run it. The files resnet-50_compiled-0000.params and resnet-50_compiled-symbol.json will be created in the local directory.

[ ]:
from packaging import version
import mxnet as mx
import numpy as np

import mx_neuron as neuron

sym, args, aux = mx.model.load_checkpoint('resnet-50', 0)

# Compile for Inferentia using Neuron, fit to NeuronCore group size of 2
inputs = { "data" : mx.nd.ones([1,3,224,224], name='data', dtype='float32') }
compile_args = {'--neuroncore-pipeline-cores' : 2}
sym, args, aux = neuron.compile(sym, args, aux, inputs, **compile_args)

#save compiled model
mx.model.save_checkpoint("resnet-50_compiled", 0, sym, args, aux)

Run inference using NeuronCore Groups#

Within the framework, the model can be mapped to specific cores using the ctx=mx.neuron(N) context, where N specifies the index of the NeuronCore on which to deploy the model.

[ ]:
import os
path = 'http://data.mxnet.io/models/imagenet/'
mx.test_utils.download(path + 'synset.txt')

fname = mx.test_utils.download('https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg?raw=true')
img = mx.image.imread(fname) # convert into format (batch, RGB, width, height)
img = mx.image.imresize(img, 224, 224) # resize
img = img.transpose((2, 0, 1)) # Channel first
img = img.expand_dims(axis=0) # batchify
img = img.astype(dtype='float32')

sym, args, aux = mx.model.load_checkpoint('resnet-50_compiled', 0)
softmax = mx.nd.random_normal(shape=(1,))
args['softmax_label'] = softmax
args['data'] = img

os.environ["NEURON_RT_NUM_CORES"] = '4'

# Inferentia context - group index 1 (size 2) skips NC0 and places the
# compiled model onto NC1 and NC2
ctx = mx.neuron(1)

exe = sym.bind(ctx=ctx, args=args, aux_states=aux, grad_req='null')

with open('synset.txt', 'r') as f:
    labels = [l.rstrip() for l in f]

# run inference
exe.forward(data=img)

# print the top-5 predictions
prob = exe.outputs[0].asnumpy()
prob = np.squeeze(prob)
a = np.argsort(prob)[::-1]
for i in a[0:5]:
    print('probability=%f, class=%s' % (prob[i], labels[i]))
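The top-5 selection in the cell above is plain NumPy and can be tried standalone. The probability vector below is made up for illustration; it stands in for the real model output:

```python
import numpy as np

# Dummy probabilities standing in for exe.outputs[0] (illustration only)
prob = np.array([0.05, 0.60, 0.10, 0.20, 0.05])

# Indices sorted by descending probability give the top predicted classes
top5 = np.argsort(prob)[::-1][:5]
print(top5.tolist())  # [1, 3, 2, 4, 0]
```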

You can experiment with different NeuronCore group combinations and different models.


If not enough NeuronCores are available, an error message like the following will be displayed:

mxnet.base.MXNetError: [04:01:39] src/operator/subgraph/neuron/./neuron_util.h:541: Check failed: rsp.status().code() == 0: Failed load model with Neuron-RTD Error. Neuron-RTD Status Code: 9, details: ""