Neuron Kernel Interface (NKI) release notes
Neuron Kernel Interface (NKI) (Beta) [2.26]
Date: 09/18/2025
New nki.language APIs:
nki.language.gelu_apprx_sigmoid - Gaussian Error Linear Unit activation function with sigmoid approximation.
nki.language.tile_size.total_available_sbuf_size - get the total available SBUF size.
New nki.isa APIs:
nki.isa.select_reduce - selectively copy elements with max reduction.
nki.isa.sequence_bounds - compute sequence bounds of segment IDs.
nki.isa.dma_transpose: axes parameter to define a 4D transpose for some supported cases; dge_mode parameter to specify the Descriptor Generation Engine (DGE).
nl.gelu_apprx_sigmoid op support on nki.isa.activation.
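When checking kernel outputs against a host reference, it helps to know how far the sigmoid approximation of GELU is from the exact erf-based form. The NumPy sketch below compares the two. It assumes the approximation has the commonly used form x · σ(1.702x). That constant is an assumption; these notes do not state the exact formula that gelu_apprx_sigmoid uses.

```python
import math
import numpy as np

def gelu_exact(x: np.ndarray) -> np.ndarray:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    erf = np.vectorize(math.erf)
    return 0.5 * x * (1.0 + erf(x / math.sqrt(2.0)))

def gelu_apprx_sigmoid_ref(x: np.ndarray) -> np.ndarray:
    """Sigmoid approximation of GELU: x * sigmoid(1.702 * x).

    ASSUMPTION: 1.702 is the commonly used constant; the exact formula used by
    nki.language.gelu_apprx_sigmoid is not specified in these release notes.
    """
    return x / (1.0 + np.exp(-1.702 * x))

x = np.linspace(-6.0, 6.0, 1001, dtype=np.float64)
max_abs_err = np.max(np.abs(gelu_apprx_sigmoid_ref(x) - gelu_exact(x)))
print(f"max |approx - exact| on [-6, 6]: {max_abs_err:.4f}")
```

Use the measured error to set the tolerance when comparing kernel outputs against the exact GELU rather than the approximation.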
Fixes / improvements:
nki.language.store supports PSUM buffers, with an additional copy inserted.
Docs/tutorial improvements:
nki.isa.dma_transpose API doc and example.
nki.simulate_kernel example improvement.
Use nl.fp32.min in tutorial code instead of a magic number.
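Using the dtype's minimum instead of a hand-picked constant makes masking explicit and independent of the input range. Below is a NumPy analogue of a causal-masked softmax. In it, np.finfo(np.float32).min plays the role that nl.fp32.min plays in the tutorial code. The helper name and the causal mask are illustrative assumptions, not the tutorial's actual code.

```python
import numpy as np

# NumPy stand-in for nl.fp32.min: the most negative finite float32 value.
FP32_MIN = np.finfo(np.float32).min

def causal_masked_softmax(scores: np.ndarray) -> np.ndarray:
    """Row-wise softmax over a square score matrix with a causal mask.

    Hypothetical helper for illustration: positions j > i are filled with
    FP32_MIN, so after subtracting the row max their exp() underflows to 0.
    """
    n = scores.shape[0]
    future = np.triu(np.ones((n, n), dtype=bool), k=1)  # True above the diagonal
    masked = np.where(future, FP32_MIN, scores).astype(np.float32)
    row_max = masked.max(axis=1, keepdims=True)
    exp = np.exp(masked - row_max)
    return exp / exp.sum(axis=1, keepdims=True)

scores = np.random.default_rng(0).standard_normal((4, 4)).astype(np.float32)
probs = causal_masked_softmax(scores)
print(probs)
```

A finite minimum is also safer than -inf. If every entry in a row were masked, -inf would lead to (-inf) - (-inf), which is NaN.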
Better error reporting:
indirect indexing on transpose
mask expressions
Neuron Kernel Interface (NKI) (Beta) [2.24]
Date: 06/24/2025
sqrt: valid input data range extended, improving accuracy and supporting a wider range of numerical values.
nki.language.gather_flattened: new API.
nki.isa.nc_match_replace8: additional parameter dst_idx.
Improved docs/examples on nki.isa.nc_match_replace8 and nki.isa.nc_stream_shuffle.
Improved error messages.
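nc_match_replace8 can be paired with max8 to extract more than eight top values per partition. The loop finds the current eight largest values, records them, overwrites them with a very negative value, and repeats. The NumPy sketch below models that loop row by row, with one row per partition. It assumes that max8 returns values in descending order and that nc_match_replace8 replaces one occurrence per matched value. Both helpers are hypothetical stand-ins, not the NKI implementations.

```python
import numpy as np

def max8_ref(row: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for max8 on one partition: 8 largest, descending."""
    return np.sort(row)[::-1][:8]

def match_replace8_ref(row: np.ndarray, values: np.ndarray, imm: float) -> np.ndarray:
    """Hypothetical stand-in for nc_match_replace8 on one partition.

    Replaces the first remaining occurrence of each value in `values` with `imm`.
    """
    out = row.copy()
    for v in values:
        idx = np.flatnonzero(out == v)[0]
        out[idx] = imm
    return out

def topk_per_partition(data: np.ndarray, k: int) -> np.ndarray:
    """Top-k of each row (k a multiple of 8) via repeated max8 + match_replace8."""
    assert k % 8 == 0, "this sketch only handles multiples of 8"
    fill = np.finfo(data.dtype).min  # value written over already-found maxima
    rows = []
    for row in data:
        work = row.copy()
        found = []
        for _ in range(k // 8):
            top8 = max8_ref(work)
            found.append(top8)
            work = match_replace8_ref(work, top8, fill)
        rows.append(np.concatenate(found))
    return np.stack(rows)

data = np.random.default_rng(1).standard_normal((2, 64)).astype(np.float32)
top16 = topk_per_partition(data, k=16)
```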
Neuron Kernel Interface (NKI) (Beta) [2.23]
Date: 05/20/2025
nki.isa.range_select (for Trn2): new instruction.
abs and power ops supported on nki.isa tensor instructions.
abs op supported on the nki.isa.activation instruction.
GpSIMD engine support added for 32-bit integer add and multiply in nki.isa tensor operations.
nki.isa.tensor_copy_predicated: support for reversing the predicate.
nki.isa.tensor_copy_dynamic_src and tensor_copy_dynamic_dst: engine selection.
nki.isa.dma_copy: additional support for dge_mode, oob_mode, and in-place add (rmw_op).
The +=, -=, /=, and *= operators now work consistently across loop types, PSUM, and SBUF.
Fixed simulation for the instructions nki.language.rand, random_seed, and nki.isa.dropout.
Fixed simulation masking behavior.
Added a warning when the block dimension is used for SBUF and PSUM tensors; see the NKI Block Dimension Migration Guide.
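The change to +=, -=, /=, and *= matters most in the common accumulation pattern. There, partial products over tiles of the contraction dimension are summed into a single accumulator, which in a kernel is typically a PSUM tile. The NumPy sketch below models that pattern on the host. The tile size of 128 and the function name are illustrative assumptions, and nothing here calls NKI.

```python
import numpy as np

TILE_K = 128  # assumed contraction-tile size, for illustration only

def tiled_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Compute a @ b by accumulating one K-tile at a time with +=."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2 and k % TILE_K == 0
    acc = np.zeros((m, n), dtype=np.float32)  # plays the role of the accumulator tile
    for k0 in range(0, k, TILE_K):
        # Partial product for this slice of the contraction dimension.
        acc += a[:, k0:k0 + TILE_K] @ b[k0:k0 + TILE_K, :]
    return acc

rng = np.random.default_rng(2)
a = rng.standard_normal((64, 512)).astype(np.float32)
b = rng.standard_normal((512, 32)).astype(np.float32)
c = tiled_matmul(a, b)
```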
Neuron Kernel Interface (NKI) (Beta) [2.22]
Date: 04/03/2025
New modules and APIs:
nki.profile
nki.isa new APIs: tensor_copy_dynamic_dst, tensor_copy_predicated, max8, nc_find_index8, nc_match_replace8, nc_stream_shuffle
nki.language new APIs: mod, fmod, reciprocal, broadcast_to, empty_like
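mod and fmod give different results only when the operands have mixed signs. This sketch assumes that nki.language.mod and nki.language.fmod follow NumPy's conventions: np.mod takes the sign of the divisor and np.fmod takes the sign of the dividend. Check the API reference before relying on that. The NumPy analogues below also show reciprocal and broadcast_to.

```python
import numpy as np

x = np.array([7.0, -7.0, 7.0, -7.0], dtype=np.float32)
y = np.array([3.0, 3.0, -3.0, -3.0], dtype=np.float32)

mod_result = np.mod(x, y)    # sign follows the divisor y
fmod_result = np.fmod(x, y)  # sign follows the dividend x
recip = np.reciprocal(y)     # 1 / y, element-wise

# broadcast_to: expand a [1, 4] row to [3, 4]; NumPy returns a read-only view.
row = x.reshape(1, 4)
tiled = np.broadcast_to(row, (3, 4))

print(mod_result)   # [ 1.  2. -2. -1.]
print(fmod_result)  # [ 1. -1.  1. -1.]
```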
Improvements:
nki.isa.nc_matmul now supports the PE tiling feature.
nki.isa.activation updated to support reduce operations and reduce commands.
nki.isa.engine enum.
engine parameter added to more nki.isa APIs that support engine selection (i.e., tensor_scalar, tensor_tensor, memset).
Documentation for nki.kernels has been moved to GitHub: https://aws-neuron.github.io/nki-samples. The source code can be viewed at aws-neuron/nki-samples. These kernels are still shipped as part of the Neuron package in the neuronxcc.nki.kernels module.
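One way to picture nki.isa.activation with a reduction: the engine applies the activation element by element and can also reduce the result along the free (column) axis of each partition (row). The NumPy sketch below assumes the element-wise form op(data * scale + bias), with one bias value per partition, and a sum reduction. It uses exp, which gives the pieces of a softmax denominator. The function and parameter names are hypothetical and do not reproduce the NKI signature.

```python
import numpy as np

def activation_with_reduce_ref(data, op, scale=1.0, bias=None, reduce=np.sum):
    """Hypothetical reference: element-wise op(data * scale + bias), plus a
    per-partition reduction of the result along the free axis (axis=1)."""
    if bias is None:
        bias = np.zeros((data.shape[0], 1), dtype=data.dtype)
    out = op(data * scale + bias)  # bias has one value per partition (row)
    reduced = reduce(out, axis=1, keepdims=True)
    return out, reduced

data = np.arange(12, dtype=np.float32).reshape(3, 4) / 10.0
bias = np.array([[0.0], [0.5], [-0.5]], dtype=np.float32)
out, row_sums = activation_with_reduce_ref(data, np.exp, scale=2.0, bias=bias)
```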
Documentation updates:
Kernels public repository https://aws-neuron.github.io/nki-samples
Updated profiling guide to use nki.profile instead of nki.benchmark.
NKI ISA Activation functions table now lists valid input data ranges.
NKI ISA Supported Math operators now list the supported engine.
Clarified += syntax support and limitations.
Neuron Kernel Interface (NKI) (Beta) [2.21]
Date: 12/16/2024
New modules and APIs:
nki.compiler module with Allocation Control and Kernel decorators; see the guide for more info.
nki.isa: new APIs (activation_reduce, tensor_partition_reduce, scalar_tensor_tensor, tensor_scalar_reduce, tensor_copy, tensor_copy_dynamic_src, dma_copy), new activation functions (identity, silu, silu_dx), and target query APIs (nc_version, get_nc_version).
nki.language: new APIs (shared_identity_matrix, tan, silu, silu_dx, left_shift, right_shift, ds, spmd_dim, nc).
New datatype: float8_e5m2.
New kernels (allocated_fused_self_attn_for_SD_small_head_size, allocated_fused_rms_norm_qkv) added; kernels moved to the public repository.
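silu is the Sigmoid Linear Unit and silu_dx is its derivative. These notes do not spell out the formulas, so this sketch assumes the standard definitions: silu(x) = x · σ(x) and silu'(x) = σ(x) · (1 + x · (1 − σ(x))). A NumPy reference checked against a central finite difference is handy for validating kernel outputs.

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def silu_ref(x: np.ndarray) -> np.ndarray:
    """SiLU (swish): x * sigmoid(x)."""
    return x * sigmoid(x)

def silu_dx_ref(x: np.ndarray) -> np.ndarray:
    """Derivative of SiLU: sigmoid(x) * (1 + x * (1 - sigmoid(x)))."""
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))

# Check the analytic derivative against a central finite difference.
x = np.linspace(-8.0, 8.0, 401)
h = 1e-5
finite_diff = (silu_ref(x + h) - silu_ref(x - h)) / (2 * h)
max_err = np.max(np.abs(finite_diff - silu_dx_ref(x)))
```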
Improvements:
Semantic analysis checks for nki.isa APIs to validate supported ops, dtypes, and tile shapes.
Standardized naming conventions with keyword arguments for common optional parameters.
Transition from function calls to kernel decorators (jit, benchmark, baremetal, simulate_kernel).
Documentation updates:
Neuron Kernel Interface (NKI) (Beta)
Date: 12/03/2024
NKI support for Trainium2, including full integration with the Neuron Compiler. Users can directly shard NKI kernels across multiple Neuron Cores from an SPMD launch grid; see the tutorial for more info. See the Trainium2 Architecture Guide for an initial version of the architecture specification (more details to come in future releases).
New calling convention in NKI kernels: kernel output tensors are explicitly returned from the kernel instead of being passed by reference. See any NKI tutorial for code examples.
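An SPMD launch runs the same kernel body once per grid position, and each instance uses its program ID to pick its slice of the work. Under the new calling convention, the kernel allocates its output and returns it instead of writing into a tensor supplied by the caller. The pure NumPy emulation below sketches both ideas on the host. The launcher, the kernel name, and the row-slicing scheme are hypothetical. In NKI itself, the grid index comes from nl.program_id, and outputs are typically allocated in device HBM.

```python
import numpy as np

def scale_rows_kernel(x: np.ndarray, pid: int, grid: int) -> np.ndarray:
    """Hypothetical kernel body: each instance handles one contiguous row block
    and returns a newly allocated output instead of writing into an argument."""
    rows_per_instance = x.shape[0] // grid
    start = pid * rows_per_instance
    block = x[start:start + rows_per_instance]
    out = np.empty_like(block)  # output allocated inside the kernel
    out[...] = block * 2.0
    return out                  # returned explicitly (no pass-by-reference)

def launch(kernel, x: np.ndarray, grid: int) -> np.ndarray:
    """Hypothetical host-side emulation of a 1D SPMD launch grid."""
    assert x.shape[0] % grid == 0, "sketch assumes rows divide evenly"
    return np.concatenate([kernel(x, pid, grid) for pid in range(grid)])

x = np.arange(16, dtype=np.float32).reshape(8, 2)
y = launch(scale_rows_kernel, x, grid=4)
```

Because each instance returns its own block, the launcher only has to concatenate the results. No output buffer is shared between instances.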
Neuron Kernel Interface (NKI) (Beta) [2.20]
Date: 09/16/2024
This release includes the beta launch of the Neuron Kernel Interface (NKI). NKI is a programming interface that enables developers to build optimized compute kernels on top of Trainium and Inferentia. NKI empowers developers to enhance deep learning models with new capabilities, performance optimizations, and scientific innovation. It natively integrates with PyTorch and JAX, providing a Python-based programming environment with Triton-like syntax and tile-level semantics that offers a familiar programming experience for developers. Additionally, to enable bare-metal access that precisely programs the instructions used by the chip, this release includes a set of NKI APIs (nki.isa) that directly emit Neuron Instruction Set Architecture (ISA) instructions in NKI kernels.