This document is relevant for: Inf1

LibTorch C++ Tutorial


This tutorial demonstrates the use of LibTorch with Neuron, the SDK for Amazon Inf1, Inf2 and Trn1 instances. By the end of this tutorial, you will understand how to write a native C++ application that performs inference on EC2 Inf1, Inf2 and Trn1 instances. We will use an inf1.6xlarge and a pretrained BERT-Base model to determine if one sentence is a paraphrase of another.

Verify that this tutorial is running in a virtual environment that was set up according to the Torch-Neuronx Installation Guide or the Torch-Neuron Installation Guide.


The tutorial has been tested on Inf1, Inf2 and Trn1 instances running Ubuntu.

Run the tutorial

This tutorial is self-contained. It produces similar output to [html] [notebook].

Note: The tutorial will use about 8.5 GB of disk space. Please ensure you have sufficient space before beginning.
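One way to check the available space programmatically before downloading is sketched below using the C++17 std::filesystem library. The helper name has_free_space and the reading of "8.5 GB" as 8.5 × 10^9 bytes are assumptions made for illustration; neither comes from the tutorial archive.

```cpp
#include <cstdint>
#include <filesystem>
#include <system_error>

// Hypothetical helper (not part of the tutorial archive): returns true when
// the filesystem containing `path` has at least `required_bytes` available
// to the current user. Returns false if the query itself fails.
bool has_free_space(const std::filesystem::path& path,
                    std::uintmax_t required_bytes) {
    std::error_code ec;
    const std::filesystem::space_info info = std::filesystem::space(path, ec);
    if (ec) {
        return false;
    }
    return info.available >= required_bytes;
}

// Assumption: "8.5 GB" is interpreted as 8.5 * 10^9 bytes.
constexpr std::uintmax_t kTutorialBytes = 8'500'000'000ULL;
```

Calling has_free_space(".", kTutorialBytes) from the directory where the archive will be extracted gives a quick yes/no answer. The same check may be useful on the directory used for temporary files if SIGBUS errors appear at runtime.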

Copy the link address of the tutorial archive, then download and extract it.

wget <paste archive URL>
tar xvf libtorch_demo.tar.gz

Your directory tree should now look like this:

├── bert_neuronx
│   ├──
│   └──
├── core_count
│   ├──
│   └── main.cpp
├── example_app
│   ├──
│   ├── core_count.hpp
│   ├── example_app.cpp
│   ├── README.txt
│   ├── utils.cpp
│   └── utils.hpp
├── neuron.patch
├── tokenizer.json
└── tokenizers_binding
    ├── remote_rust_tokenizer.h
    ├── tokenizer.json
    ├── tokenizer_test
    ├── tokenizer_test.cpp

This tutorial uses the HuggingFace Tokenizers library implemented in Rust. Install Cargo, the package manager for the Rust programming language.


Ubuntu

sudo apt install -y cargo

Amazon Linux

sudo yum install -y cargo
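For context on what the tokenizer output is used for: a BERT sentence-pair classifier typically receives both sentences packed into one fixed-length sequence of the form [CLS] A [SEP] B [SEP], together with a segment ID and an attention mask for each position. The sketch below illustrates that packing on token IDs that have already been produced by a tokenizer. The struct and function names, the special-token IDs (101, 102 and 0, as used by the standard BERT-Base vocabularies), and the choice to throw instead of truncating are all assumptions for illustration. This is not the API of the bundled tokenizer binding.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical container for the three inputs a BERT pair classifier consumes.
struct EncodedPair {
    std::vector<std::int64_t> input_ids;
    std::vector<std::int64_t> token_type_ids;  // 0 = first sentence, 1 = second
    std::vector<std::int64_t> attention_mask;  // 1 = real token, 0 = padding
};

// Assumed special-token IDs; check tokenizer.json for the real values.
constexpr std::int64_t kCls = 101;
constexpr std::int64_t kSep = 102;
constexpr std::int64_t kPad = 0;

// Packs two token-ID sequences as [CLS] a [SEP] b [SEP] and pads to max_len.
// Throws instead of truncating, to keep the sketch short.
EncodedPair pack_pair(const std::vector<std::int64_t>& a,
                      const std::vector<std::int64_t>& b,
                      std::size_t max_len) {
    const std::size_t used = a.size() + b.size() + 3;  // three special tokens
    if (used > max_len) {
        throw std::length_error("pair does not fit in max_len");
    }
    EncodedPair out;
    out.input_ids.reserve(max_len);
    out.token_type_ids.reserve(max_len);

    out.input_ids.push_back(kCls);
    out.input_ids.insert(out.input_ids.end(), a.begin(), a.end());
    out.input_ids.push_back(kSep);
    // Segment 0 covers [CLS], sentence A and the first [SEP].
    out.token_type_ids.assign(out.input_ids.size(), 0);

    out.input_ids.insert(out.input_ids.end(), b.begin(), b.end());
    out.input_ids.push_back(kSep);
    // Segment 1 covers sentence B and the final [SEP].
    out.token_type_ids.resize(out.input_ids.size(), 1);

    out.attention_mask.assign(out.input_ids.size(), 1);

    // Pad all three vectors to the requested fixed length.
    out.input_ids.resize(max_len, kPad);
    out.token_type_ids.resize(max_len, 0);
    out.attention_mask.resize(max_len, 0);
    return out;
}
```

For example, pack_pair({7, 8}, {9}, 8) yields input IDs 101 7 8 102 9 102 0 0, with the last two positions masked out as padding.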

Run the setup script to download additional dependencies and build the app. (This may take a few minutes to complete.)

cd libtorch_demo
chmod +x && ./
+ PATH_NEURON_LIB=/opt/aws/neuron/lib/
+ g++ utils.cpp example_app.cpp -o ../example-app -O2 -D_GLIBCXX_USE_CXX11_ABI=0 -I../libtorch/include -L../tokenizers_binding/lib -L/opt/aws/neuron/lib/ -L../libtorch/lib -Wl,-rpath,libtorch/lib -Wl,-rpath,tokenizers_binding/lib -Wl,-rpath,/opt/aws/neuron/lib/ -ltokenizers -ltorchneuron -ltorch_cpu -lc10 -lpthread -lnrt
Successfully completed setup
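The -D_GLIBCXX_USE_CXX11_ABI=0 flag in the compile command above selects the older libstdc++ ABI for std::string and std::list, presumably to match the ABI of the libraries being linked; a mismatch typically surfaces as undefined-symbol errors at link time. As a small diagnostic sketch (the function name is hypothetical and not part of the tutorial sources), the following reports which setting a translation unit was compiled with:

```cpp
#include <string>  // any libstdc++ header defines _GLIBCXX_USE_CXX11_ABI

// Hypothetical diagnostic helper: returns 1 for the C++11 ABI, 0 for the
// older ABI, and -1 when the standard library is not libstdc++.
int cxx11_abi_setting() {
#ifdef _GLIBCXX_USE_CXX11_ABI
    return _GLIBCXX_USE_CXX11_ABI;
#else
    return -1;
#endif
}
```

Compiling this with the same flags as the application and printing the result is a quick way to confirm that the flag took effect.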


The setup script should have compiled and saved a PyTorch model compiled for Neuron. Run the provided sanity tests to ensure everything is working properly.

Running tokenization sanity checks.

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Tokenizing: 100%|██████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:00<00:00, 15021.69it/s]
Python took 0.67 seconds.
Sanity check passed.
Begin 10000 timed tests.
End timed tests.
C++ took 0.226 seconds.

Tokenization sanity checks passed.
Running end-to-end sanity check.

The company HuggingFace is based in New York City
HuggingFace's headquarters are situated in Manhattan
not paraphrase: 10%
paraphrase: 90%

The company HuggingFace is based in New York City
Apples are especially bad for your health
not paraphrase: 94%
paraphrase: 6%

Sanity check passed.
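The percentages above are consistent with a two-class classifier whose raw scores (logits) are converted to probabilities with a softmax. A minimal sketch of that conversion follows. It assumes a softmax is what the example app applies, and the function name is illustrative rather than taken from the tutorial sources.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax: subtracting the maximum logit before
// exponentiating avoids overflow without changing the result.
std::vector<double> softmax(const std::vector<double>& logits) {
    assert(!logits.empty());
    const double max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<double> probs(logits.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - max_logit);
        sum += probs[i];
    }
    for (double& p : probs) {
        p /= sum;
    }
    return probs;
}
```

The two returned values sum to 1, so multiplying by 100 and rounding gives percentages in the style shown. Two logits that differ by ln 9 (about 2.2), for instance, map to 10% and 90%.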

Finally, run the example app directly to benchmark the BERT model.


You can safely ignore the warning that begins "None of PyTorch, TensorFlow >= 2.0, ...". This occurs because the test runs in a small virtual environment that doesn’t require the full frameworks.

Getting ready................
Completed 32000 operations in 43 seconds => 4465.12 pairs / second

Summary information:
Batch size = 6
Num neuron cores = 16
Num runs per neuron core = 2000
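The figures in this output can be cross-checked by hand. The operation count equals cores times runs per core (16 × 2000 = 32000), and the reported rate matches operations times batch size divided by elapsed seconds (32000 × 6 / 43 ≈ 4465.12). The relation between operations and pairs is inferred from these numbers rather than stated by the tutorial, and the function names below are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Total inference calls: one "operation" per run on each NeuronCore.
std::int64_t total_operations(std::int64_t num_cores,
                              std::int64_t runs_per_core) {
    return num_cores * runs_per_core;
}

// Sentence pairs processed per second, assuming each operation handles
// one full batch of `batch_size` pairs.
double pairs_per_second(std::int64_t operations,
                        std::int64_t batch_size,
                        double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(operations * batch_size) / seconds;
}
```

With the values from the summary, pairs_per_second(total_operations(16, 2000), 6, 43.0) evaluates to roughly 4465.12, in agreement with the reported throughput.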

Congratulations! By now you should have successfully built and used a native C++ application with LibTorch.


  • In the event of SIGBUS errors you may have insufficient disk space for the creation of temporary model files at runtime. Consider clearing space or mounting additional disk storage.

  • In the event of a Neuron runtime failure, make sure the Neuron kernel module is loaded by running sudo modprobe neuron.
