.. _torch_neuron_core_placement_guide:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

PyTorch Neuron (``torch-neuron``) Core Placement
================================================

.. warning::

   This document is archived. torch-neuron (Inf1) is no longer officially supported
   by the AWS Neuron SDK. It is provided for reference only. For current
   framework support, see :doc:`/frameworks/index`.


This programming guide describes the techniques and APIs available for
allocating NeuronCores to a process and placing models onto specific
NeuronCores. In order of precedence, the current recommendation is to use the
following placement techniques:

1. For most models, use default core placement in conjunction with
   ``NEURON_RT_NUM_CORES`` (:ref:`torch_placement_default`).
2. For more specific core placement of NeuronCore Pipelined models, use
   ``NEURONCORE_GROUP_SIZES`` (:ref:`torch_placement_ncg`).
3. For even more granular control, use the beta explicit placement APIs
   (:ref:`torch_placement_explicit`).

.. contents:: Table of Contents
    :depth: 3

This guide assumes a machine with 8 NeuronCores:

- NeuronCores use the notation ``nc0``, ``nc1``, etc.
- NeuronCore Groups use the notation ``ncg0``, ``ncg1``, etc.
- Models use the notation ``m0``, ``m1``, etc.

NeuronCores, NeuronCore Groups, and model allocations will be displayed in
the following format:

.. raw:: html
    :file: images/0-0-legend.svg

Note that the actual cores that are visible to the process can be adjusted
according to the :ref:`nrt-configuration`.

NeuronCore Pipeline
-------------------

NeuronCore Pipelining is a key concept for understanding the intent behind
certain core placement strategies (see :ref:`neuroncore-pipeline`). It
allows a model to be automatically split into pieces that execute on different
NeuronCores.

For most models only 1 NeuronCore will be required for execution. A model will
**only** require more than one NeuronCore when using NeuronCore Pipeline.
When model pipelining is enabled, the model is split between multiple
NeuronCores and data is transferred between them. For example, if the compiler
flag ``--neuroncore-pipeline-cores 4`` is used, this splits the model into
4 pieces to be executed on 4 separate NeuronCores.

.. _torch_placement_default:

Default Core Allocation & Placement
-----------------------------------

The most basic requirement of an inference application is to be able to place a
single model on a single NeuronCore. More complex applications may use multiple
NeuronCores or even multiple processes each executing different models. The
important thing to note about designing an inference application is that a
single NeuronCore will always be allocated to a single process. *Processes do
not share NeuronCores*. Different configurations can be used to ensure that
an application process has enough NeuronCores allocated to execute its model(s):

- Default: A process will attempt to take ownership of **all NeuronCores**
  visible on the instance. This should be used when an instance is only running
  a single inference process since no other process will be allowed to take
  ownership of any NeuronCores.
- ``NEURON_RT_NUM_CORES``: Specify the **number of NeuronCores** to allocate
  to the process. This places no restrictions on which NeuronCores will be used,
  however, the resulting NeuronCores will always be contiguous. This should be
  used in multi-process applications where each process should only use a subset
  of NeuronCores.
- ``NEURON_RT_VISIBLE_CORES``: Specifies exactly **which NeuronCores** are
  allocated to the process by index. Similar to ``NEURON_RT_NUM_CORES``, this
  can be used in multi-process applications where each process should only use a
  subset of NeuronCores. This provides more fine-grained control over the
  exact NeuronCores that are allocated to a given process.
- ``NEURONCORE_GROUP_SIZES``: Specifies a number of **NeuronCore Groups** which
  are allocated to the process. This is described in more detail in the
  :ref:`torch_placement_ncg` section.

See the :ref:`nrt-configuration` for more environment variable details.

Example: Default
^^^^^^^^^^^^^^^^

**Python Script**:

.. code-block:: python

    import torch
    import torch_neuron

    m0 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc0
    m1 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc1


.. raw:: html
    :file: images/0-1-default-2.svg

With no environment configuration, the process takes ownership of all
NeuronCores. In this example, only two of the NeuronCores are used by the
process; the remaining cores are allocated but left idle.


Example: ``NEURON_RT_NUM_CORES``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Environment Setup**:

.. code-block:: bash

    export NEURON_RT_NUM_CORES='2'

**Python Script**:

.. code-block:: python

    import torch
    import torch_neuron

    m0 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc0
    m1 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc1

.. raw:: html
    :file: images/0-2-default-rt-num-cores.svg

Since no other process is running on the instance, the process acquires the
first 2 NeuronCores. Models load in a simple linear order to the
least used NeuronCores.


Example: ``NEURON_RT_VISIBLE_CORES``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Environment Setup**:

.. code-block:: bash

    export NEURON_RT_VISIBLE_CORES='4-5'

**Python Script**:

.. code-block:: python

    import torch
    import torch_neuron

    m0 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc4
    m1 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc5


.. raw:: html
    :file: images/0-3-default-rt-visible-cores.svg

Unlike ``NEURON_RT_NUM_CORES``, setting the visible NeuronCores allows the
process to take control of a specific contiguous set. This gives an application
more fine-grained control over where models are placed.


Example: Overlapping Models
^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Environment Setup**:

.. code-block:: bash

    export NEURON_RT_VISIBLE_CORES='0-1'

**Python Script**:

.. code-block:: python

    import torch
    import torch_neuron

    m0 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc0
    m1 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt')  # Loads to nc0-nc1
    m2 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc1

.. raw:: html
    :file: images/0-4-default-overlap-model-2.svg

.. raw:: html
    :file: images/0-4-default-overlap.svg

This shows how models may share NeuronCores but the default model placement
will attempt to evenly distribute NeuronCore usage rather than overlapping all
models on a single NeuronCore.
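
The placements in this example can be reproduced with a small simulation. This
is a sketch of one plausible "least used" heuristic, not the actual
``torch-neuron`` runtime logic: the ``place_least_used`` helper, its
tie-breaking rule, and the per-core load counter are all assumptions made for
illustration.

.. code-block:: python

    def place_least_used(usage, cores_needed):
        """Place one model on the least loaded contiguous window of cores.

        ``usage`` is a list holding the number of models already on each core.
        The window with the lowest total load wins, and ties go to the lowest
        start index (an assumed rule, chosen to match the example above).
        """
        if not 1 <= cores_needed <= len(usage):
            raise ValueError('model does not fit on the visible NeuronCores')
        starts = range(len(usage) - cores_needed + 1)
        start = min(starts, key=lambda s: sum(usage[s:s + cores_needed]))
        cores = list(range(start, start + cores_needed))
        for nc in cores:
            usage[nc] += 1
        return cores

    # Two visible cores, as with NEURON_RT_VISIBLE_CORES='0-1'
    usage = [0, 0]
    # m0 needs 1 core, m1 needs 2 cores, m2 needs 1 core
    placements = [place_least_used(usage, size) for size in (1, 2, 1)]
    print(placements)

Under these assumptions ``m2`` lands on ``nc1`` because ``nc0`` already hosts
both ``m0`` and part of ``m1``.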


Example: Multiple Processes
^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Environment Setup**:

.. code-block:: bash

    export NEURON_RT_NUM_CORES='2'

**Python Script**:

.. code-block:: python

    import torch
    import torch_neuron

    m0 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc0
    m1 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc1


In this example, if the script is run **twice**, the following allocations
will be made:

.. raw:: html
    :file: images/0-5-default-multiprocess.svg

Note that each process takes ownership of as many NeuronCores as are
specified by the ``NEURON_RT_NUM_CORES`` configuration.
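
When each process must own a known, disjoint set of cores, one option is to
launch every worker with its own ``NEURON_RT_VISIBLE_CORES`` range instead of
relying on ``NEURON_RT_NUM_CORES``. The sketch below only builds the
environment dictionaries. ``worker_envs`` is a hypothetical helper, and the
split of 8 cores into workers of 2 cores each is assumed for illustration.

.. code-block:: python

    import os

    def worker_envs(total_cores, cores_per_worker):
        """Return one environment dict per worker, each with a disjoint,
        contiguous NEURON_RT_VISIBLE_CORES range."""
        if cores_per_worker < 1 or total_cores < cores_per_worker:
            raise ValueError('invalid core counts')
        envs = []
        last_start = total_cores - cores_per_worker
        for start in range(0, last_start + 1, cores_per_worker):
            end = start + cores_per_worker - 1
            env = dict(os.environ)  # copy so the parent environment is untouched
            if start == end:
                env['NEURON_RT_VISIBLE_CORES'] = str(start)
            else:
                env['NEURON_RT_VISIBLE_CORES'] = f'{start}-{end}'
            envs.append(env)
        return envs

    envs = worker_envs(total_cores=8, cores_per_worker=2)
    ranges = [env['NEURON_RT_VISIBLE_CORES'] for env in envs]
    print(ranges)

Each dictionary can then be passed as the ``env`` argument of
``subprocess.Popen`` when starting a worker script.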


.. _torch_placement_ncg:

NEURONCORE_GROUP_SIZES
----------------------

.. important::

    Explicit core placement should only be used when a specific
    performance goal is required. By default ``torch-neuron`` places models on
    the **least used** NeuronCores. This should be optimal for most
    applications.

    In addition, ``NEURONCORE_GROUP_SIZES`` will be deprecated in a future
    release and should be avoided in favor of newer placement methods.
    Use ``NEURON_RT_NUM_CORES`` or ``NEURON_RT_VISIBLE_CORES`` with default
    placement if possible (see :ref:`torch_placement_default`).


In the current release of the Neuron SDK, the most well-supported method of
placing models onto specific NeuronCores is to use the
``NEURONCORE_GROUP_SIZES`` environment variable. This defines a set of
"NeuronCore Groups" for the application process.

NeuronCore Groups are *contiguous sets of NeuronCores* that are allocated to
a given process. Creating groups allows an application to ensure that a
model has a defined set of NeuronCores that will always be allocated to it.

Note that NeuronCore Groups *can* be used to allocate non-pipelined models
(those requiring exactly 1 NeuronCore) to specific NeuronCores but this is
not the primary intended use. The intended use of NeuronCore Groups is to
ensure pipelined models (those requiring >1 NeuronCore) have exclusive access
to a specific set of contiguous NeuronCores.

In the cases where models are being used *without* NeuronCore Pipeline, the
general recommendation is to use default placement
(See :ref:`torch_placement_default`).

The following section demonstrates how ``NEURONCORE_GROUP_SIZES`` can be used
and the issues that may arise.

Example: Single NeuronCore Group
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When one model requires 4 NeuronCores, the correct environment
configuration is:

**Environment Setup**:

.. code-block:: bash

    export NEURONCORE_GROUP_SIZES='4'

**Python Script**:

.. code-block:: python

    import torch
    import torch_neuron

    m0 = torch.jit.load('model-with-4-neuron-pipeline-cores.pt')  # Loads to nc0-nc3


.. raw:: html
    :file: images/1-ncg-4.svg

This is the most basic usage of a NeuronCore Group. The environment setup
causes the process to take control of 4 NeuronCores and then the script loads
a model compiled with a NeuronCore Pipeline size of 4 to the first group.


Example: Multiple NeuronCore Groups
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

With more complicated configurations, the intended use of
``NEURONCORE_GROUP_SIZES`` is to create 1 Group per model with the correct size
to ensure that the models are placed on the intended NeuronCores. As in the
single-group example, the environment must be configured to create a NeuronCore
Group for each model:

**Environment Setup**:

.. code-block:: bash

    export NEURONCORE_GROUP_SIZES='3,4,1'

**Python Script**:

.. code-block:: python

    import torch
    import torch_neuron

    m0 = torch.jit.load('model-with-3-neuron-pipeline-cores.pt')  # Loads to nc0-nc2
    m1 = torch.jit.load('model-with-4-neuron-pipeline-cores.pt')  # Loads to nc3-nc6
    m2 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc7




.. raw:: html
    :file: images/2-ncg-3-4-1.svg
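
The mapping from group sizes to NeuronCore indices shown above can be computed
directly. The following is a minimal sketch that assumes groups are laid out
contiguously starting at ``nc0``, in the order they appear in the variable.
``neuroncore_group_ranges`` is a hypothetical helper, not part of
``torch-neuron``.

.. code-block:: python

    def neuroncore_group_ranges(group_sizes):
        """Convert a NEURONCORE_GROUP_SIZES string into (first, last) core
        indices for each group, assuming contiguous layout from nc0."""
        sizes = [int(size) for size in group_sizes.split(',')]
        if any(size < 1 for size in sizes):
            raise ValueError('every group needs at least one NeuronCore')
        ranges = []
        start = 0
        for size in sizes:
            ranges.append((start, start + size - 1))
            start += size
        return ranges

    groups = neuroncore_group_ranges('3,4,1')
    for index, (first, last) in enumerate(groups):
        print(f'ncg{index}: nc{first}-nc{last}')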


Issue: Overlapping Models with Differing Model Sizes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Loading multiple models to a single NeuronCore Group can cause
unintended inefficiencies. A single model is only intended to span a single
NeuronCore Group. Applications with many models of varying sizes can be
restricted by NeuronCore Group configurations since the optimal model
layout may require more fine-grained control.

**Environment Setup**:

.. code-block:: bash

    export NEURONCORE_GROUP_SIZES='2,2'

**Python Script**:

.. code-block:: python

    import torch
    import torch_neuron

    m0 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt')  # Loads to nc0-nc1
    m1 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt')  # Loads to nc2-nc3
    m2 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc0
    m3 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc2
    m4 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc0


.. raw:: html
    :file: images/3-models-m4-0-warning.svg

.. raw:: html
    :file: images/3-models-m2-0-m3-2.svg

.. raw:: html
    :file: images/3-ncg-2-2.svg


Here ``NEURONCORE_GROUP_SIZES`` does not produce an optimal layout
because placement strictly follows the layout of NeuronCore Groups. A
potentially better layout would place ``m4`` onto ``nc1``. In this
case, since no pipelined model can have exclusive access to a set
of NeuronCores, the default NeuronCore placement (no NeuronCore Groups
specified) would distribute the models more evenly.

This example also shows that the order of model loads
affects which model is assigned to which NeuronCore Group. If the order of the
load statements is changed, models may be assigned to different NeuronCore
Groups.


Issue: Incompatible Model Sizes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Another problem occurs when attempting to place a model which does not evenly
fit into a single group:

**Environment Setup**:

.. code-block:: bash

    export NEURONCORE_GROUP_SIZES='2,2'

**Python Script**:

.. code-block:: python

    import torch
    import torch_neuron

    m0 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt')  # Loads to nc0-nc1
    m1 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt')  # Loads to nc2-nc3
    m2 = torch.jit.load('model-with-3-neuron-pipeline-cores.pt')  # Loads to nc0-nc2


.. raw:: html
    :file: images/4-models-m2-0-2-warning.svg

.. raw:: html
    :file: images/3-ncg-2-2.svg


The model will be placed *across* NeuronCore Groups since there is no obvious
group to assign the model to according to the environment variable
configuration. Depending on the individual model and application requirements,
the placement here may not be optimal.
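
One way to catch this kind of mismatch before loading anything is to compare
the configured group sizes with the pipeline sizes of the models. The check
below is a sketch based on the intended use of one group per model with a
matching size. ``models_without_matching_group`` is a hypothetical helper, and
the list of model sizes is assumed to be known to the application.

.. code-block:: python

    from collections import Counter

    def models_without_matching_group(group_sizes, model_sizes):
        """Return the sizes of models that have no group of equal size left
        once every other model has been paired with a matching group."""
        groups = Counter(int(size) for size in group_sizes.split(','))
        models = Counter(model_sizes)
        # Counter subtraction keeps only the positive remainders
        unmatched = models - groups
        return sorted(unmatched.elements())

    # Configuration from this example: two 2-core groups, models of 2, 2 and 3
    unmatched = models_without_matching_group('2,2', [2, 2, 3])
    # Configuration from the multiple-group example: one group per model
    matched = models_without_matching_group('3,4,1', [3, 4, 1])
    print(unmatched, matched)

A non-empty result indicates that at least one model will either share a group
or be placed across group boundaries.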


Issue: Multiple Model Copies
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is common in inference serving applications to use multiple replicas of a
single model across different NeuronCores. This allows the hardware to be fully
utilized to maximize throughput. In this scenario, when using NeuronCore
Groups, the only way to replicate a model on multiple NeuronCores is to create a
*new model* object. In the example below, 4 model loads are performed to place
a model in each NeuronCore Group.

**Environment Setup**:

.. code-block:: bash

    export NEURONCORE_GROUP_SIZES='2,2,2,2'

**Python Script**:

.. code-block:: python

    import torch
    import torch_neuron

    models = list()
    for _ in range(4):
        model = torch.jit.load('model-with-2-neuron-pipeline-cores.pt')
        models.append(model)


.. raw:: html
    :file: images/3-ncg-2-2-2-2-copies.svg


The main consequence of this type of model allocation is that the application
code is responsible for routing inference requests to models. There are a
variety of ways to implement this switching, but in all cases the routing
logic must be implemented in the application code.
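
As an illustration of such routing logic, the sketch below cycles through the
loaded replicas in round-robin order. It is only one of many possible
strategies. ``RoundRobinRouter`` is a hypothetical class, and the replicas are
plain Python callables standing in for the objects returned by
``torch.jit.load``.

.. code-block:: python

    import itertools
    import threading

    class RoundRobinRouter:
        """Forward each call to the next replica in a fixed rotation."""

        def __init__(self, models):
            if not models:
                raise ValueError('at least one model replica is required')
            self._cycle = itertools.cycle(models)
            self._lock = threading.Lock()

        def __call__(self, *inputs):
            # Only the replica selection is serialized, not the inference call
            with self._lock:
                model = next(self._cycle)
            return model(*inputs)

    # Stand-ins for the 4 loaded replicas; each reports which replica ran
    replicas = [lambda x, index=index: (index, x) for index in range(4)]
    router = RoundRobinRouter(replicas)
    served_by = [router(10)[0] for _ in range(6)]
    print(served_by)

In a real application the ``replicas`` list would be the ``models`` list built
in the script above.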


Issue Summary
^^^^^^^^^^^^^

The use of ``NEURONCORE_GROUP_SIZES`` has the following problems:

- **Variable Sized Models**: Models which require crossing NeuronCore Group
  boundaries may be placed poorly. This means the group configuration limits
  the sizes of models that can be loaded.
- **Model Load Order**: Models are loaded to NeuronCore Groups greedily. This
  means that the order of model loads can potentially negatively affect
  application performance by causing unintentional overlap.
- **Implicit Placement**: NeuronCore Groups cannot be explicitly chosen in the
  application code.
- **Manual Replication**: When loading multiple copies of a model to different
  NeuronCore Groups, this requires that multiple model handles are used.


.. _torch_placement_explicit:

Explicit Core Placement
-----------------------

To address the limitations of ``NEURONCORE_GROUP_SIZES``, a new set of APIs has
been added which allows specific NeuronCores to be chosen by the application
code. These can be found in the :ref:`torch_neuron_core_placement_api` documentation.


Example: Manual Core Selection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The most direct usage of the placement APIs is to manually select the
start NeuronCore that each model is loaded to. This will automatically use as
many NeuronCores as are necessary for that model (1 for most models, >1 for
NeuronCore Pipeline models).

**Environment Setup**:

.. code-block:: bash

    export NEURON_RT_NUM_CORES='4'

**Python Script**:

.. code-block:: python

    import torch
    import torch_neuron

    # NOTE: Order of loads does NOT matter

    with torch_neuron.experimental.neuron_cores_context(2):
        m1 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt')  # Loads to nc2-nc3

    with torch_neuron.experimental.neuron_cores_context(0):
        m2 = torch.jit.load('model-with-3-neuron-pipeline-cores.pt')  # Loads to nc0-nc2

    with torch_neuron.experimental.neuron_cores_context(0):
        m0 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt')  # Loads to nc0-nc1

    with torch_neuron.experimental.neuron_cores_context(3):
        m3 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc3


.. raw:: html
    :file: images/5-models-m2-0-2-m3-3.svg

.. raw:: html
    :file: images/5-placement.svg


Note that this directly solves the ``NEURONCORE_GROUP_SIZES`` issues of:

- **Variable Sized Models**: Now since models are directly placed on the
  NeuronCores requested by the application, there is no disconnect
  between the model sizes and NeuronCore Group sizes.
- **Model Load Order**: Since the NeuronCores are explicitly selected, there is
  no need to be careful about the order in which models are loaded since they
  can be placed deterministically regardless of the load order.
- **Implicit Placement**: Similarly, explicit placement means there is no chance
  that a model will end up being allocated to an incorrect NeuronCore Group.


Example: Automatic Multicore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using explicit core placement, it is possible to replicate a model to multiple
NeuronCores simultaneously. This means that a single model object within Python
can utilize all available NeuronCores (or NeuronCores allocated to the process).

**Environment Setup**:

.. code-block:: bash

    export NEURON_RT_NUM_CORES='8'

**Python Script**:

.. code-block:: python

    import torch
    import torch_neuron

    with torch_neuron.experimental.multicore_context():
        m0 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads replications to nc0-nc7


.. raw:: html
    :file: images/6-multicore.svg


This addresses the last ``NEURONCORE_GROUP_SIZES`` issue of:

- **Manual Replication**: Since models can be automatically replicated to
  multiple NeuronCores, this means that applications no longer need to implement
  routing logic and perform multiple loads.

A secondary benefit of this API is that the exact same loading logic can be used
on an ``inf1.xlarge`` or an ``inf1.6xlarge``. In either case, it will use all
of the NeuronCores that are visible to the process. This means that no special
logic needs to be coded for different instance types.
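
The calling code can stay instance-independent as well. The sketch below
assumes that a replicated model can be called from several threads and that
concurrent calls are what allow more than one replica to be busy at once.
Neither assumption is confirmed in this guide. ``run_concurrently`` and
``stand_in_model`` are hypothetical names, with ``stand_in_model`` taking the
place of a model loaded under ``multicore_context()``.

.. code-block:: python

    from concurrent.futures import ThreadPoolExecutor

    def run_concurrently(model, requests, max_workers):
        """Call ``model`` on every request from a pool of worker threads and
        return the results in the same order as the requests."""
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(model, requests))

    def stand_in_model(x):
        # Placeholder for a loaded model; doubles its input
        return x * 2

    results = run_concurrently(stand_in_model, [1, 2, 3, 4], max_workers=4)
    print(results)

Only ``max_workers`` would need tuning per instance type, for example by
matching it to the number of NeuronCores allocated to the process.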


Example: Explicit Replication
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Replication is also possible with the
:func:`~torch_neuron.experimental.neuron_cores_context` API. The number of
replications is chosen by ``replications = floor(nc_count / cores_per_model)``.


**Environment Setup**:

.. code-block:: bash

    export NEURON_RT_NUM_CORES='8'

**Python Script**:

.. code-block:: python

    import torch
    import torch_neuron

    with torch_neuron.experimental.neuron_cores_context(start_nc=2, nc_count=4):
        m0 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt')  # Loads replications to nc2-nc5


.. raw:: html
    :file: images/7-replication.svg
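
The replication count from the formula above can be worked through in plain
Python. This is a sketch: ``replica_ranges`` is a hypothetical helper, and it
assumes that replicas are laid out back to back starting at ``start_nc``,
which is consistent with the ``nc2-nc5`` placement in the example.

.. code-block:: python

    def replica_ranges(start_nc, nc_count, cores_per_model):
        """Return (first, last) core indices for each replica, using
        replications = floor(nc_count / cores_per_model)."""
        if cores_per_model < 1 or nc_count < 0:
            raise ValueError('invalid core counts')
        replications = nc_count // cores_per_model
        ranges = []
        for index in range(replications):
            first = start_nc + index * cores_per_model
            ranges.append((first, first + cores_per_model - 1))
        return ranges

    # Values from the example: start_nc=2, nc_count=4, 2-core pipelined model
    ranges = replica_ranges(start_nc=2, nc_count=4, cores_per_model=2)
    print(ranges)

With a 3-core model and the same ``nc_count=4``, the formula gives a single
replica, so one of the four requested cores would hold no part of the model.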
