This document is relevant for: Inf1
PyTorch Neuron (torch-neuron) Core Placement API [Beta]#
Warning
The following functionality is beta and will not be supported in future releases of the Neuron SDK. This module serves only as a preview of future functionality. In future releases, equivalent functionality may be moved directly to the torch_neuron module and will no longer be available in the torch_neuron.experimental module.
Functions which enable placement of a torch.jit.ScriptModule onto specific NeuronCores. Two sets of functions are provided which can be used interchangeably but have different performance characteristics and advantages:
- The multicore_context() and neuron_cores_context() functions are context managers that allow a model to be placed on a given NeuronCore at torch.jit.load() time. These functions are the most efficient way of loading a model since the model is loaded directly to a NeuronCore. The alternative functions described below require that a model is unloaded from one core and then reloaded to another.
- The set_multicore() and set_neuron_cores() functions allow a model that has already been loaded to a NeuronCore to be moved to a different NeuronCore. This is less efficient than directly loading a model to a NeuronCore within a context manager, but it allows device placement to be fully dynamic at runtime. This is analogous to the torch.nn.Module.to() function for device placement. A brief sketch contrasting the two idioms follows this list.
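A rough sketch of the two idioms (the filename and core indices are illustrative; both approaches are documented in full below). A context manager places the model while torch.jit.load() runs, whereas set_neuron_cores() moves a model that is already loaded:
>>> # Load-time placement (most efficient): place the model while loading it.
>>> with torch_neuron.experimental.neuron_cores_context(start_nc=0):
>>>     model_a = torch.jit.load('example_neuron_model.pt')
>>> # Runtime placement (fully dynamic): load first, then move.
>>> model_b = torch.jit.load('example_neuron_model.pt')
>>> torch_neuron.experimental.set_neuron_cores(model_b, start_nc=1)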
Important
A prerequisite to enable placement functionality is that the loaded torch.jit.ScriptModule has already been compiled with the torch_neuron.trace() API. Attempting to place a regular torch.nn.Module onto a NeuronCore prior to compilation will do nothing.
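For context, a compiled artifact like the example_neuron_model.pt used throughout the examples below can be produced roughly as follows (a minimal sketch; the model, input shape, and filename are illustrative rather than taken from this document):
>>> import torch
>>> import torch_neuron
>>> model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU())
>>> example = torch.rand(1, 4)
>>> traced = torch_neuron.trace(model, example)  # Compile to a Neuron-optimized ScriptModule
>>> torch.jit.save(traced, 'example_neuron_model.pt')  # Reload later with the placement APIs below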
- torch_neuron.experimental.multicore_context()#
A context which loads all Neuron subgraphs to all visible NeuronCores.
This loads each Neuron subgraph within a torch.jit.ScriptModule to multiple NeuronCores without requiring multiple calls to torch.jit.load(). This allows a single torch.jit.ScriptModule to use multiple NeuronCores for concurrent threadsafe inferences. Executions use a round-robin strategy to distribute across NeuronCores.
Any calls to torch.jit.load() within this context will cause the underlying Neuron subgraphs to load to the specified NeuronCores. This context manager only needs to be used during the model load. After loading, inferences do not need to occur in this context in order to use the correct NeuronCores.
Note that this context is not threadsafe. Using multiple core placement contexts from multiple threads may not correctly place models.
- Raises:
RuntimeError – If the Neuron runtime cannot be initialized.
Examples
Multiple Core Replication: Directly load a model to all visible NeuronCores. This allows a single torch.jit.ScriptModule to use all NeuronCores by running round-robin executions.
>>> with torch_neuron.experimental.multicore_context():
>>>     model = torch.jit.load('example_neuron_model.pt')
>>> model(example)  # Executes on NeuronCore 0
>>> model(example)  # Executes on NeuronCore 1
>>> model(example)  # Executes on NeuronCore 2
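Because a model loaded under multicore_context() supports concurrent threadsafe inferences, the same ScriptModule can be shared across worker threads; a minimal sketch (the thread count, loop length, and input are illustrative assumptions, not part of this API):
>>> import threading
>>> with torch_neuron.experimental.multicore_context():
>>>     model = torch.jit.load('example_neuron_model.pt')
>>> def worker():
>>>     for _ in range(100):
>>>         model(example)  # Executions distribute round-robin across all visible NeuronCores
>>> threads = [threading.Thread(target=worker) for _ in range(4)]
>>> for t in threads: t.start()
>>> for t in threads: t.join()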
- torch_neuron.experimental.neuron_cores_context(start_nc: int = -1, nc_count: int = -1)#
A context which sets the NeuronCore start/count for all Neuron subgraphs.
Any calls to torch.jit.load() within this context will cause the underlying Neuron subgraphs to load to the specified NeuronCores. This context manager only needs to be used during the model load. After loading, inferences do not need to occur in this context in order to use the correct NeuronCores.
Note that this context is not threadsafe. Using multiple core placement contexts from multiple threads may not correctly place models.
- Parameters:
start_nc – The starting NeuronCore index where the Module is placed. The value -1 automatically loads to the optimal NeuronCore (least used). Note that this index is always relative to the NeuronCores visible to this process.
nc_count – The number of NeuronCores to use. The value -1 will load a model to exactly the number of cores required by that model (1 for most models, >1 when using NeuronCore Pipeline). If nc_count is greater than the number of NeuronCores required by the model, the model will be replicated across multiple NeuronCores (replications = floor(nc_count / cores_per_model)).
- Raises:
RuntimeError – If the Neuron runtime cannot be initialized.
ValueError – If nc_count is an invalid number of NeuronCores.
Examples
Single Load: Directly load a model from disk to the first visible NeuronCore.
>>> with torch_neuron.experimental.neuron_cores_context(start_nc=0, nc_count=1):
>>>     model = torch.jit.load('example_neuron_model.pt')
>>> model(example)  # Executes on NeuronCore 0
>>> model(example)  # Executes on NeuronCore 0
>>> model(example)  # Executes on NeuronCore 0
Multiple Core Replication: Directly load a model from disk to 2 NeuronCores. This allows a single torch.jit.ScriptModule to use multiple NeuronCores by running round-robin executions.
>>> with torch_neuron.experimental.neuron_cores_context(start_nc=2, nc_count=2):
>>>     model = torch.jit.load('example_neuron_model.pt')
>>> model(example)  # Executes on NeuronCore 2
>>> model(example)  # Executes on NeuronCore 3
>>> model(example)  # Executes on NeuronCore 2
Multiple Model Load: Directly load 2 models from disk and pin them to separate NeuronCores. This causes each torch.jit.ScriptModule to always execute on a specific NeuronCore.
>>> with torch_neuron.experimental.neuron_cores_context(start_nc=2):
>>>     model1 = torch.jit.load('example_neuron_model.pt')
>>> with torch_neuron.experimental.neuron_cores_context(start_nc=0):
>>>     model2 = torch.jit.load('example_neuron_model.pt')
>>> model1(example)  # Executes on NeuronCore 2
>>> model1(example)  # Executes on NeuronCore 2
>>> model2(example)  # Executes on NeuronCore 0
>>> model2(example)  # Executes on NeuronCore 0
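Because start_nc is relative to the NeuronCores visible to this process, restricting visibility changes which physical core index 0 refers to. A minimal sketch, assuming the Neuron runtime's NEURON_RT_VISIBLE_CORES environment variable is used to restrict visibility (the variable, its value format, and the timing of setting it are assumptions for illustration, not part of this API):
>>> import os
>>> # Assumed: must be set before the Neuron runtime initializes in this process.
>>> os.environ['NEURON_RT_VISIBLE_CORES'] = '2-3'  # Process only sees physical cores 2 and 3
>>> with torch_neuron.experimental.neuron_cores_context(start_nc=0, nc_count=1):
>>>     model = torch.jit.load('example_neuron_model.pt')
>>> model(example)  # Executes on the first visible NeuronCore (physical core 2)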
- torch_neuron.experimental.set_multicore(trace: torch.jit.ScriptModule)#
Loads all Neuron subgraphs in a torch Module to all visible NeuronCores.
This loads each Neuron subgraph within a torch.jit.ScriptModule to multiple NeuronCores without requiring multiple calls to torch.jit.load(). This allows a single torch.jit.ScriptModule to use multiple NeuronCores for concurrent threadsafe inferences. Executions use a round-robin strategy to distribute across NeuronCores.
This will unload the model from an existing NeuronCore if it is already loaded.
Requires Torch 1.8+
- Parameters:
trace – A torch module which contains one or more Neuron subgraphs.
- Raises:
RuntimeError – If the Neuron runtime cannot be initialized.
Examples
Multiple Core Replication: Move a model across all visible NeuronCores after loading. This allows a single torch.jit.ScriptModule to use all NeuronCores by running round-robin executions.
>>> model = torch.jit.load('example_neuron_model.pt')
>>> torch_neuron.experimental.set_multicore(model)
>>> model(example)  # Executes on NeuronCore 0
>>> model(example)  # Executes on NeuronCore 1
>>> model(example)  # Executes on NeuronCore 2
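Because set_multicore() unloads the model from its current NeuronCore before replicating it, it can also widen placement for a model that was previously pinned; a minimal sketch (the core indices are illustrative):
>>> model = torch.jit.load('example_neuron_model.pt')
>>> torch_neuron.experimental.set_neuron_cores(model, start_nc=1, nc_count=1)
>>> model(example)  # Executes on NeuronCore 1 only
>>> torch_neuron.experimental.set_multicore(model)  # Unloads from NeuronCore 1, replicates to all visible cores
>>> model(example)  # Executes on NeuronCore 0
>>> model(example)  # Executes on NeuronCore 1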
- torch_neuron.experimental.set_neuron_cores(trace: torch.jit.ScriptModule, start_nc: int = -1, nc_count: int = -1)#
Set the NeuronCore start/count for all Neuron subgraphs in a torch Module.
This will unload the model from an existing NeuronCore if it is already loaded.
Requires Torch 1.8+
- Parameters:
trace – A torch module which contains one or more Neuron subgraphs.
start_nc – The starting NeuronCore index where the Module is placed. The value -1 automatically loads to the optimal NeuronCore (least used). Note that this index is always relative to the NeuronCores visible to this process.
nc_count – The number of NeuronCores to use. The value -1 will load a model to exactly the number of cores required by that model (1 for most models, >1 when using NeuronCore Pipeline). If nc_count is greater than the number of NeuronCores required by the model, the model will be replicated across multiple NeuronCores (replications = floor(nc_count / cores_per_model)).
- Raises:
RuntimeError – If the Neuron runtime cannot be initialized.
ValueError – If nc_count is an invalid number of NeuronCores.
Examples
Single Load: Move a model to the first visible NeuronCore after loading.
>>> model = torch.jit.load('example_neuron_model.pt')
>>> torch_neuron.experimental.set_neuron_cores(model, start_nc=0, nc_count=1)
>>> model(example)  # Executes on NeuronCore 0
>>> model(example)  # Executes on NeuronCore 0
>>> model(example)  # Executes on NeuronCore 0
Multiple Core Replication: Replicate a model to 2 NeuronCores after loading. This allows a single torch.jit.ScriptModule to use multiple NeuronCores by running round-robin executions.
>>> model = torch.jit.load('example_neuron_model.pt')
>>> torch_neuron.experimental.set_neuron_cores(model, start_nc=2, nc_count=2)
>>> model(example)  # Executes on NeuronCore 2
>>> model(example)  # Executes on NeuronCore 3
>>> model(example)  # Executes on NeuronCore 2
Multiple Model Load: Move and pin 2 models to separate NeuronCores. This causes each torch.jit.ScriptModule to always execute on a specific NeuronCore.
>>> model1 = torch.jit.load('example_neuron_model.pt')
>>> torch_neuron.experimental.set_neuron_cores(model1, start_nc=2)
>>> model2 = torch.jit.load('example_neuron_model.pt')
>>> torch_neuron.experimental.set_neuron_cores(model2, start_nc=0)
>>> model1(example)  # Executes on NeuronCore 2
>>> model1(example)  # Executes on NeuronCore 2
>>> model2(example)  # Executes on NeuronCore 0
>>> model2(example)  # Executes on NeuronCore 0
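Since set_neuron_cores() can be called repeatedly on the same ScriptModule, placement can also be changed dynamically at runtime, much like moving a torch.nn.Module between devices with .to(); a minimal sketch (the core indices and call sequence are illustrative):
>>> model = torch.jit.load('example_neuron_model.pt')
>>> torch_neuron.experimental.set_neuron_cores(model, start_nc=0, nc_count=1)
>>> model(example)  # Executes on NeuronCore 0
>>> # Later, rebalance the same model onto a different core; it is unloaded and reloaded.
>>> torch_neuron.experimental.set_neuron_cores(model, start_nc=3, nc_count=1)
>>> model(example)  # Executes on NeuronCore 3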