This document is relevant for: Inf2, Trn1, Trn1n
nki.baremetal
- nki.baremetal(kernel=None, **kwargs)
Compile and run an NKI kernel on a NeuronDevice without involving ML frameworks such as PyTorch and JAX. If you decorate your NKI kernel function with @nki.baremetal(...), you can call it directly, just like any other Python function. You must run this API on a Trn/Inf instance with NeuronDevices (v2 or beyond) attached.
Note
The decorated function using nki.baremetal expects numpy.ndarray as input/output tensors instead of ML framework tensor objects.
This decorator compiles the NKI kernel into an executable for NeuronDevices (NEFF) and also collects an execution trace (NTFF) by running the NEFF on the local NeuronDevice. See Profiling NKI kernels with Neuron Profile for more information on how to visualize the execution trace for profiling purposes.
Since nki.baremetal runs the compiled NEFF without invoking any ML framework, it is the fastest way to compile and run any NKI kernel standalone on a NeuronDevice. This makes the decorator useful for quickly iterating on an early implementation of an NKI kernel until it is functionally correct, before porting it to the ML framework and injecting the kernel into the full ML model. To iterate on NKI kernel performance quickly, NKI also provides the nki.benchmark decorator, which uses the same underlying mechanism as nki.baremetal but additionally collects latency statistics at different percentiles.
- Parameters:
save_neff_name – A file path to save your NEFF file. By default, this is unspecified, and the NEFF file will be deleted automatically after execution.
save_trace_name – A file path to save your NTFF file. By default, this is unspecified, and the NTFF file will be deleted automatically after execution. Known issue: if save_trace_name is specified, save_neff_name must be set to “file.neff”.
additional_compile_opt – Additional Neuron compiler flags to pass in when compiling the kernel.
artifacts_dir – A directory path to save Neuron compiler artifacts. The directory must be empty before running the kernel. A non-empty directory would lead to a compilation error.
- Returns:
None
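The parameter constraints above (an empty artifacts_dir, and save_neff_name fixed to "file.neff" whenever save_trace_name is set) are easy to get wrong by hand. The following is a minimal sketch of one way to assemble the keyword arguments. The helper name make_baremetal_kwargs, the work_dir argument, and the "profile.ntff" file name are illustrative assumptions and not part of the NKI API. Only the keyword names come from the parameter list above.

```python
import os
import tempfile


def make_baremetal_kwargs(work_dir, save_trace=True):
    """Build keyword arguments for nki.baremetal (hypothetical helper, not an NKI API)."""
    # artifacts_dir must be empty before the kernel runs, so create a fresh
    # subdirectory on every call instead of reusing an existing one.
    artifacts_dir = tempfile.mkdtemp(prefix="nki_artifacts_", dir=work_dir)
    kwargs = {"artifacts_dir": artifacts_dir}
    if save_trace:
        # Known issue from the parameter list: when save_trace_name is set,
        # save_neff_name must be "file.neff".
        kwargs["save_neff_name"] = "file.neff"
        kwargs["save_trace_name"] = "profile.ntff"
    return kwargs


# Intended usage on a Trn/Inf instance (requires neuronxcc, so it is left
# commented out here):
#
#   from neuronxcc.nki import baremetal
#
#   @baremetal(**make_baremetal_kwargs(tempfile.gettempdir()))
#   def my_kernel(a_tensor, b_tensor, c_tensor):
#       ...
```

Creating a new directory per call is a design choice made for this sketch: it avoids the compilation error caused by a non-empty artifacts_dir, at the cost of leaving old artifact directories behind for manual cleanup.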
from neuronxcc.nki import baremetal
import neuronxcc.nki.language as nl
import numpy as np

@baremetal(save_neff_name='file.neff', save_trace_name='profile.ntff')
def nki_tensor_tensor_add(a_tensor, b_tensor, c_tensor):
    a = nl.load(a_tensor)
    b = nl.load(b_tensor)

    c = a + b

    nl.store(c_tensor, c)

a = np.zeros([128, 1024], dtype=np.float32)
b = np.random.random_sample([128, 1024]).astype(np.float32)
c = np.ndarray(shape=(128, 1024), dtype=np.float32)

nki_tensor_tensor_add(a, b, c)

assert np.allclose(c, b)
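Because a kernel decorated with nki.baremetal expects numpy.ndarray inputs and outputs, data that starts out as Python lists or other array-like objects has to be converted before the call. The sketch below shows one possible way to do that with plain NumPy. The helper name to_numpy_inputs is hypothetical, and forcing a C-contiguous float32 layout is an assumption made here for predictability, not a documented requirement of nki.baremetal.

```python
import numpy as np


def to_numpy_inputs(*tensors, dtype=np.float32):
    """Convert array-like inputs to numpy.ndarray (hypothetical helper, not an NKI API)."""
    # np.ascontiguousarray returns an ndarray with the requested dtype and a
    # C-contiguous layout, copying only when the input does not already match.
    return tuple(np.ascontiguousarray(t, dtype=dtype) for t in tensors)


a, b = to_numpy_inputs([[1, 2], [3, 4]], [[0.5, 0.5], [0.5, 0.5]])

# Output buffer with the same shape and dtype as the inputs. The kernel is
# expected to write its result into this array.
c = np.empty_like(a)

# On a Trn/Inf instance, the decorated kernel from the example above would
# then be called as:
#   nki_tensor_tensor_add(a, b, c)
```

ML framework tensors usually need a framework-specific export step first. For a PyTorch tensor, for instance, tensor.detach().cpu().numpy() is the common route to an ndarray.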