nc_matmul
Compute the matrix multiplication stationary.T @ moving using Tensor Engine.
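The operand convention is easier to see in ordinary linear algebra. The NumPy sketch below models only the arithmetic, assuming that the contraction dimension K sits on the partition axis (axis 0) of both operands, which is why stationary is transposed. `matmul_reference` is a hypothetical helper, not an nki.isa call, and it does not model Tensor Engine tiling or PSUM accumulation.

```python
import numpy as np

def matmul_reference(stationary, moving):
    # Both inputs share axis 0 (assumed to be the partition axis), which is contracted away.
    # stationary: [K, M], moving: [K, N]  ->  result: [M, N]
    return stationary.T @ moving

stationary = np.arange(6, dtype=np.float32).reshape(3, 2)  # K=3, M=2
moving = np.ones((3, 4), dtype=np.float32)                 # K=3, N=4
out = matmul_reference(stationary, moving)
```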
nc_transpose
Perform a 2D transpose between the partition axis and the free axis of the input data (a PF-transpose) using Tensor or Vector Engine.
activation
Apply an activation function to every element of the input tile using Scalar Engine.
activation_reduce
Perform the same computation as nisa.activation and also a reduction along the free dimension of the nisa.activation result using Scalar Engine.
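The two-stage result can be sketched in NumPy on a [P, F] tile. This assumes a plain element-wise function with no scale or bias inputs and uses addition as the default reduction; `activation_reduce_reference` is a hypothetical name used only for illustration.

```python
import numpy as np

def activation_reduce_reference(data, act_fn, reduce_fn=np.add):
    # Stage 1: apply the activation element-wise over the [P, F] tile.
    activated = act_fn(data)
    # Stage 2: reduce the activated values along the free axis (axis 1), one value per partition.
    reduced = reduce_fn.reduce(activated, axis=1, keepdims=True)
    return activated, reduced

data = np.array([[-1.0, 2.0, 3.0], [0.5, -4.0, 1.0]], dtype=np.float32)
relu = lambda x: np.maximum(x, 0.0)
act, red = activation_reduce_reference(data, relu)
```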
tensor_reduce
Apply a reduction operation to the free axes of an input data tile using Vector Engine.
tensor_partition_reduce
Apply a reduction operation across partitions of an input data tile using GpSimd Engine.
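tensor_reduce and tensor_partition_reduce differ mainly in which axis collapses. A NumPy sketch on a [P, F] tile follows; the choice of max and sum and the keepdims output shapes are illustrative assumptions, not the documented output layouts.

```python
import numpy as np

tile = np.array([[1, 5, 2],
                 [7, 3, 4]], dtype=np.int32)  # [P=2, F=3]

# tensor_reduce-style: collapse the free axis, one result per partition -> [P, 1]
free_max = np.max(tile, axis=1, keepdims=True)

# tensor_partition_reduce-style: collapse the partition axis -> [1, F]
partition_sum = np.sum(tile, axis=0, keepdims=True)
```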
tensor_tensor
Perform an element-wise operation on two input tiles using Vector Engine or GpSimd Engine.
tensor_tensor_scan
Perform a scan operation over two input tiles using Vector Engine.
scalar_tensor_tensor
Apply up to two math operators using Vector Engine: (data <op0> operand0) <op1> operand1.
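The expression (data <op0> operand0) <op1> operand1 can be evaluated in NumPy as a sketch. Going by the instruction's name, it assumes operand0 is a scalar or a [P, 1] per-partition column broadcast along the free axis, and operand1 is a full tile shaped like data; `scalar_tensor_tensor_reference` is hypothetical.

```python
import numpy as np

def scalar_tensor_tensor_reference(data, op0, operand0, op1, operand1):
    # Evaluate (data <op0> operand0) <op1> operand1 element-wise.
    # Assumption: operand0 is a scalar or [P, 1] column broadcast along the free axis;
    # operand1 is a full [P, F] tile like data.
    return op1(op0(data, operand0), operand1)

data = np.array([[1.0, 2.0], [3.0, 4.0]])
scale = np.array([[10.0], [100.0]])      # one value per partition
residual = np.array([[0.5, 0.5], [1.0, 1.0]])
out = scalar_tensor_tensor_reference(data, np.multiply, scale, np.add, residual)
```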
tensor_scalar
Apply up to two math operators to the input data tile, broadcasting scalar or vector operands along the free dimension, using Vector, Scalar, or GpSimd Engine: (data <op0> operand0) <op1> operand1.
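Broadcasting "in the free dimension" means a per-partition operand of shape [P, 1] is reused across every element of its row. The NumPy sketch below shows that, with the second operator made optional to match "up to two math operators"; `tensor_scalar_reference` and its keyword layout are hypothetical.

```python
import numpy as np

def tensor_scalar_reference(data, op0, operand0, op1=None, operand1=None):
    # operand0 may be a scalar or a [P, 1] vector; NumPy broadcasts it along the free axis.
    result = op0(data, operand0)
    # The second operator is optional.
    if op1 is not None:
        result = op1(result, operand1)
    return result

data = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
bias = np.array([[1.0], [-1.0]])  # shape [P, 1]: one value per partition
out = tensor_scalar_reference(data, np.add, bias, np.multiply, 2.0)
single = tensor_scalar_reference(data, np.subtract, 1.0)
```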
tensor_scalar_reduce
Perform the same computation as nisa.tensor_scalar with one math operator and also a reduction along the free dimension of the nisa.tensor_scalar result using Vector Engine.
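A NumPy sketch of the combined result: one broadcast operator, then a reduction of that result along the free axis. Using max as the reduction here is an arbitrary choice, and `tensor_scalar_reduce_reference` is a hypothetical name.

```python
import numpy as np

def tensor_scalar_reduce_reference(data, op0, operand0, reduce_fn=np.add):
    # Single math operator, broadcasting operand0 along the free axis.
    result = op0(data, operand0)
    # Reduce that result along the free axis: one value per partition.
    reduced = reduce_fn.reduce(result, axis=1, keepdims=True)
    return result, reduced

data = np.array([[1.0, 2.0], [3.0, 4.0]])
res, red = tensor_scalar_reduce_reference(
    data, np.multiply, np.array([[2.0], [0.5]]), np.maximum)
```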
tensor_copy
Create a copy of the src tile within NeuronCore on-chip SRAMs using Vector, Scalar, or GpSimd Engine.
tensor_copy_dynamic_src
Create a copy of the src tile within NeuronCore on-chip SRAMs using Vector, Scalar, or GpSimd Engine, with src located at a dynamic offset within each partition.
tensor_copy_dynamic_dst
Create a copy of the src tile within NeuronCore on-chip SRAMs using Vector, Scalar, or GpSimd Engine, with dst located at a dynamic offset within each partition.
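The two dynamic-offset copies differ only in which side of the copy is indexed at runtime. The NumPy sketch below assumes a single offset shared by all partitions and applied along the free axis; both helper names are hypothetical.

```python
import numpy as np

def copy_dynamic_src(src, offset, width):
    # Read `width` elements from every partition of src, starting at a runtime
    # offset along the free axis (assumed to be the same for all partitions).
    return src[:, offset:offset + width].copy()

def copy_dynamic_dst(dst, src, offset):
    # Write all of src into dst at a runtime offset along the free axis.
    dst[:, offset:offset + src.shape[1]] = src
    return dst

src = np.arange(12).reshape(2, 6)
window = copy_dynamic_src(src, offset=2, width=3)
dst = copy_dynamic_dst(np.zeros((2, 6), dtype=src.dtype), window, offset=1)
```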
tensor_copy_predicated
Conditionally copy elements from the src tile to the destination tile in SBUF or PSUM based on a predicate, using Vector Engine.
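Under the natural reading of a conditional copy, positions where the predicate is false keep whatever the destination already held. A NumPy sketch under that assumption, with a hypothetical helper name:

```python
import numpy as np

def copy_predicated_reference(src, dst, predicate):
    # Take src where the predicate is nonzero; keep dst's existing value elsewhere.
    return np.where(predicate != 0, src, dst)

src = np.array([[1, 2], [3, 4]])
dst = np.zeros((2, 2), dtype=src.dtype)
predicate = np.array([[1, 0], [0, 1]])
out = copy_predicated_reference(src, dst, predicate)
```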
reciprocal
Compute the reciprocal of each element in the input data tile using Vector Engine.
iota
Build a constant literal in SBUF using GpSimd Engine, rather than transferring the constant literal values from the host to the device.
dropout
Randomly replace some elements of the input tile data with zeros based on input probabilities using Vector Engine.
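A NumPy sketch of the masking idea. It assumes `prob` is the probability that an element is dropped and that surviving elements are not rescaled; the real instruction's random source and scaling behavior are not described here, and `dropout_reference` is hypothetical.

```python
import numpy as np

def dropout_reference(data, prob, seed=0):
    # Assumption: prob is the chance of dropping (zeroing) each element,
    # and kept elements are passed through unscaled.
    rng = np.random.default_rng(seed)
    drop = rng.random(data.shape) < prob
    return np.where(drop, 0, data)

tile = np.ones((4, 8))
mixed = dropout_reference(tile, 0.5)
```

With prob = 0.0 nothing is dropped, and with prob = 1.0 every element is zeroed, since the generator draws values in [0, 1).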
affine_select
Select elements between an input tile on_true_tile and a scalar value on_false_value according to a boolean predicate tile using GpSimd Engine.
range_select
Select elements from on_true_tile based on comparison with bounds using Vector Engine.
select_reduce
Selectively copy elements from either on_true or on_false to the destination tile based on a predicate using Vector Engine, with optional reduction (max).
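A NumPy sketch of select-then-reduce. It assumes the optional max reduction runs along the free axis, giving one value per partition; `select_reduce_reference` and its `reduce_max` flag are hypothetical.

```python
import numpy as np

def select_reduce_reference(predicate, on_true, on_false, reduce_max=False):
    # Element-wise select: on_true where the predicate is nonzero, on_false elsewhere.
    selected = np.where(predicate != 0, on_true, on_false)
    # Optional max along the free axis (assumed), one value per partition.
    reduced = selected.max(axis=1, keepdims=True) if reduce_max else None
    return selected, reduced

predicate = np.array([[1, 0, 1], [0, 0, 1]])
on_true = np.array([[4.0, 9.0, 2.0], [1.0, 8.0, 3.0]])
on_false = np.full((2, 3), -1.0)
selected, row_max = select_reduce_reference(predicate, on_true, on_false, reduce_max=True)
```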
sequence_bounds
Compute the sequence bounds for a given set of segment IDs using GpSimd Engine.
memset
Initialize a tile with a compile-time constant value using Vector or GpSimd Engine.
bn_stats
Compute mean- and variance-related statistics for each partition of an input data tile in parallel using Vector Engine.
bn_aggr
Aggregate one or multiple bn_stats outputs to generate a mean and variance per partition using Vector Engine.
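Splitting a partition into chunks (bn_stats) and merging the chunk statistics afterwards (bn_aggr) is possible because a mean and a variance can be recombined exactly. The sketch below does not reproduce the actual bn_stats output layout: it assumes hypothetical (count, mean, population variance) triples per chunk and merges them with the simple sum-of-squares identity E[x²] = var + mean².

```python
import numpy as np

def chunk_stats(x):
    # Hypothetical per-chunk statistics: element count, mean, population variance.
    return x.size, x.mean(), x.var()

def aggregate(stats):
    # Merge (count, mean, var) triples into one overall mean and population variance.
    total = sum(n for n, _, _ in stats)
    mean = sum(n * m for n, m, _ in stats) / total
    # Each chunk's E[x^2] is var + mean^2; weight by chunk size, then subtract mean^2.
    ex2 = sum(n * (v + m * m) for n, m, v in stats) / total
    return mean, ex2 - mean * mean

row = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])  # one partition's data
stats = [chunk_stats(row[:3]), chunk_stats(row[3:])]
mean, var = aggregate(stats)
```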
local_gather
Gather SBUF data from src_buffer according to index using GpSimd Engine.
dma_copy
Copy data from src to dst using DMA Engine.
dma_transpose
Perform a transpose on input src using DMA Engine.
max8
Find the 8 largest values in each partition of the source tile.
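A per-partition top-8 in NumPy, as a sketch. It assumes each partition holds at least 8 elements and that results come back in descending order; `max8_reference` is hypothetical.

```python
import numpy as np

def max8_reference(src):
    # Sort each partition (row) ascending, reverse it, keep the first 8 values.
    # Assumption: descending output order and at least 8 elements per row.
    return np.sort(src, axis=1)[:, ::-1][:, :8]

src = np.array([[3, 9, 1, 7, 5, 0, 8, 2, 6, 4],
                [10, 30, 20, 60, 50, 40, 90, 80, 70, 0]])
top = max8_reference(src)
```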
nc_find_index8
Find the indices of the 8 given vals in each partition of the data tensor.
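A NumPy sketch of the lookup. It assumes each value occurs in its partition and that the first matching position is reported; how the hardware handles duplicates or missing values is not described here. `find_index8_reference` is hypothetical.

```python
import numpy as np

def find_index8_reference(data, vals):
    # For each partition, report the first position in data where each of its vals occurs.
    # Assumption: every value is present (otherwise indexing [0] raises IndexError).
    out = np.empty(vals.shape, dtype=np.int64)
    for p in range(data.shape[0]):
        for k in range(vals.shape[1]):
            out[p, k] = np.flatnonzero(data[p] == vals[p, k])[0]
    return out

data = np.array([[5, 1, 9, 3, 7, 2, 8, 6, 4, 0]])
vals = np.array([[9, 8, 7, 6, 5, 4, 3, 2]])  # e.g. the result of a per-partition top-8
idx = find_index8_reference(data, vals)
```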
nc_match_replace8
Replace the first occurrence of each value in vals with imm in data using Vector Engine, and return the resulting tensor.
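A NumPy sketch of first-occurrence replacement. It accepts any number of values per partition (the instruction works on 8), processes vals in order on the partially updated row so duplicate values hit successive occurrences, and ignores values that are absent. All of these behaviors are assumptions, and `match_replace8_reference` is hypothetical.

```python
import numpy as np

def match_replace8_reference(data, vals, imm):
    out = data.copy()
    for p in range(out.shape[0]):
        for v in vals[p]:
            hits = np.flatnonzero(out[p] == v)
            if hits.size:  # assumption: values that are not found are skipped
                out[p, hits[0]] = imm  # replace only the first remaining occurrence
    return out

data = np.array([[3.0, 1.0, 3.0, 2.0]])
vals = np.array([[3.0, 2.0]])
replaced = match_replace8_reference(data, vals, imm=-1.0)
```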
nc_stream_shuffle
Apply cross-partition data movement within a quadrant of 32 partitions from source tile src to destination tile dst using Vector Engine.
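Cross-partition movement within a quadrant can be pictured as a permutation of the 32 partition rows. The NumPy sketch below assumes src covers exactly one quadrant and that destination partition i receives source partition `shuffle_mask[i]`. The mask's name and direction are assumptions made for illustration.

```python
import numpy as np

def stream_shuffle_reference(src, shuffle_mask):
    # Assumption: src spans one 32-partition quadrant, and dst partition i
    # receives src partition shuffle_mask[i].
    assert src.shape[0] == 32 and len(shuffle_mask) == 32
    return src[np.asarray(shuffle_mask)]

src = np.arange(64).reshape(32, 2)
reversed_mask = list(range(31, -1, -1))
dst = stream_shuffle_reference(src, reversed_mask)
```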