This document is relevant for: Inf1

LibTorch C++ Tutorial#


This tutorial demonstrates the use of LibTorch with Neuron, the SDK for Amazon Inf1, Inf2, and Trn1 instances. By the end of this tutorial, you will understand how to write a native C++ application that performs inference on EC2 Inf1, Inf2, and Trn1 instances. We will use an inf1.6xlarge instance and a pretrained BERT-Base model to determine whether one sentence is a paraphrase of another.

Verify that this tutorial is running in a virtual environment that was set up according to the Torch-Neuronx Installation Guide <> or the Torch-Neuron Installation Guide <>


The tutorial has been tested on Inf1, Inf2, and Trn1 instances running Ubuntu.

Run the tutorial#

This tutorial is self-contained. It produces output similar to [html] [notebook].

Note: The tutorial will use about 8.5 GB of disk space. Please ensure you have sufficient space before beginning.

Right-click and copy this link address to the tutorial archive.

wget <paste archive URL>
tar xvf libtorch_demo.tar.gz

Your directory tree should now look like this:

├── bert_neuronx
│   ├──
│   └──
├── core_count
│   ├──
│   └── main.cpp
├── example_app
│   ├──
│   ├── core_count.hpp
│   ├── example_app.cpp
│   ├── README.txt
│   ├── utils.cpp
│   └── utils.hpp
├── neuron.patch
├── tokenizer.json
└── tokenizers_binding
    ├── remote_rust_tokenizer.h
    ├── tokenizer.json
    ├── tokenizer_test
    ├── tokenizer_test.cpp

This tutorial uses the HuggingFace Tokenizers library implemented in Rust. Install Cargo, the package manager for the Rust programming language.

Run the setup script to download additional dependencies and build the app. (This may take a few minutes to complete.)

cd libtorch_demo
chmod +x && ./
+ g++ utils.cpp example_app.cpp -o ../example-app -O2 -D_GLIBCXX_USE_CXX11_ABI=0 -I../libtorch/include -L../tokenizers_binding/lib -L/opt/aws/neuron/lib/ -L../libtorch/lib -Wl,-rpath,libtorch/lib -Wl,-rpath,tokenizers_binding/lib -Wl,-rpath,/opt/aws/neuron/lib/ -ltokenizers -ltorchneuron -ltorch_cpu -lc10 -lpthread -lnrt
Successfully completed setup


The setup script should have compiled and saved a PyTorch model compiled for Neuron. Run the provided sanity tests to ensure everything is working properly.

Running tokenization sanity checks.

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Tokenizing: 100%|██████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:00<00:00, 15021.69it/s]
Python took 0.67 seconds.
Sanity check passed.
Begin 10000 timed tests.
End timed tests.
C++ took 0.226 seconds.

Tokenization sanity checks passed.
Running end-to-end sanity check.

The company HuggingFace is based in New York City
HuggingFace's headquarters are situated in Manhattan
not paraphrase: 10%
paraphrase: 90%

The company HuggingFace is based in New York City
Apples are especially bad for your health
not paraphrase: 94%
paraphrase: 6%

Sanity check passed.

Finally, run the example app directly to benchmark the BERT model.


You can safely ignore the warning about None of PyTorch, TensorFlow >= 2.0, .... This occurs because the test runs in a small virtual environment that doesn't require the full frameworks.

Getting ready................
Completed 32000 operations in 43 seconds => 4465.12 pairs / second

Summary information:
Batch size = 6
Num neuron cores = 16
Num runs per neuron core = 2000

Congratulations! By now you should have successfully built and used a native C++ application with LibTorch.


  • If you encounter SIGBUS errors, you may have insufficient disk space for the temporary model files created at runtime. Consider clearing space or mounting additional disk storage.

  • In the event of a Neuron runtime failure, confirm that the Neuron kernel module is loaded by running sudo modprobe neuron.
