This document is relevant for: Inf2, Trn1, Trn2
nki.isa.reciprocal
- nki.isa.reciprocal(data, *, dtype=None, mask=None, engine=0, **kwargs)
Compute the reciprocal of each element in the input data tile using the Scalar Engine or the Vector Engine.

The target compute engine can be specified with the engine parameter. The Vector Engine computes reciprocal at higher precision than the Scalar Engine; however, for large input tiles its reciprocal throughput is about 8x lower than that of the Scalar Engine. For input tiles with a small number of elements per partition (fewer than 64), we suggest using the Vector Engine: it gives better precision, and performance is comparable between the two engines because of instruction initiation intervals.

Estimated instruction cost:
Cost (Engine Cycles)    Condition
max(MIN_II, N)          engine set to nki.isa.scalar_engine
max(MIN_II, 8*N)        engine set to nki.isa.vector_engine
where:
- N is the number of elements per partition in data.
- MIN_II is the minimum instruction initiation interval for small input tiles; it is roughly 64 engine cycles.
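The cost estimate above can be sketched in plain Python to compare the two engines for a given tile width. This is only an illustration of the published formula, not part of the NKI API: the function name reciprocal_cost and the string labels "scalar" and "vector" are hypothetical, and MIN_II is fixed at 64 on the assumption that "roughly 64 engine cycles" can be treated as exact.

```python
# Illustrative model of the estimated instruction cost; not an NKI function.
# Assumption: MIN_II is treated as exactly 64 engine cycles.
MIN_II = 64


def reciprocal_cost(n, engine):
    """Return the estimated cost in engine cycles.

    n      -- number of elements per partition in the input tile
    engine -- "scalar" or "vector" (hypothetical labels for the two engines)
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if engine == "scalar":
        return max(MIN_II, n)
    if engine == "vector":
        return max(MIN_II, 8 * n)
    raise ValueError("engine must be 'scalar' or 'vector'")


# Compare both engines for a few tile widths.
for n in (8, 64, 512):
    print(n, reciprocal_cost(n, "scalar"), reciprocal_cost(n, "vector"))
```

Under this estimate the two engines cost the same whenever 8*N does not exceed MIN_II, that is, for N up to about 8; beyond that point the Vector Engine estimate grows 8x faster than the Scalar Engine estimate.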
- Parameters:
data – the input tile
dtype – (optional) data type to cast the output to (see Supported Data Types for more information); if not specified, it defaults to the data type of the input tile.
mask – (optional) a compile-time constant predicate that controls whether/how this instruction is executed (see NKI API Masking for details)
engine – (optional) the engine to use for the operation: nki.isa.vector_engine, nki.isa.scalar_engine, or nki.isa.unknown_engine (default; the compiler selects the best engine based on the input tile shape).
- Returns:
an output tile containing the element-wise reciprocal of data
Example:
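import neuronxcc.nki as nki
import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl
...
# Load a 128 x 512 tile from in_tensor
x = nl.load(in_tensor[nl.mgrid[0:128, 0:512]])
# Element-wise reciprocal; by default the compiler selects the engine
y = nisa.reciprocal(x)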