A typical Neuron developer flow includes a compilation phase followed by deployment (inference) on one or more Inf1 instances.
To quickly start developing with Neuron:
Set up your environment to run one of the Neuron tutorials on an AWS ML accelerator instance:
You can also check the Setup Guide for more Neuron installation options.
For Neuron container setup, please visit Containers.
Run a tutorial for one of the leading machine learning frameworks supported by Neuron:
Learn more about Neuron
Customers can train their models anywhere, easily migrate their ML applications to Neuron, and run high-performance production predictions with Inferentia. Once a model is trained to the required accuracy, it is compiled into an optimized binary form, referred to as the Neuron Executable File Format (NEFF), which the Neuron runtime driver loads to execute inference requests on the Inferentia chips. Developers have the option to train their models in fp16, or to keep training in 32-bit floating point for best accuracy and let Neuron auto-cast the 32-bit trained model to run at 16-bit speed using bfloat16.
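The auto-cast described above happens inside the Neuron compiler, not in user code. As a rough illustration of what casting fp32 to bfloat16 means numerically, the NumPy sketch below (an assumption for illustration, not a Neuron API) truncates an fp32 value to its top 16 bits: bfloat16 keeps fp32's 8-bit exponent, so the dynamic range is preserved, but only 7 explicit mantissa bits remain, so precision drops.

```python
import numpy as np

def fp32_to_bf16(x):
    """Simulate a bfloat16 cast by keeping only the top 16 bits of an
    fp32 value. (Real hardware typically rounds to nearest even;
    plain truncation keeps this sketch simple.)"""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

# 1.0 is exactly representable in bfloat16, so it survives unchanged;
# 0.1 loses its low mantissa bits and lands on a nearby value.
print(fp32_to_bf16(np.float32(1.0)))
print(fp32_to_bf16(np.float32(0.1)))
```

Because the exponent field is unchanged, values keep their approximate magnitude after the cast, which is why auto-casting fp32-trained weights to bfloat16 usually preserves model accuracy while running at 16-bit speed.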