nki.isa.nc_stream_shuffle
- nki.isa.nc_stream_shuffle(dst, src, shuffle_mask, name=None)
Apply cross-partition data movement within each quadrant of 32 partitions, from source tile src to destination tile dst, using the Vector Engine. Both the source and destination tiles can be in either SBUF or PSUM and are passed in by reference as arguments. In-place shuffle is allowed, i.e., dst can be the same tile as src.

shuffle_mask is a 32-element list. Each mask element must be an int or an affine expression. shuffle_mask[i] indicates which input partition output partition [i] copies from within each 32-partition quadrant. The special value shuffle_mask[i] = 255 means that output partition [i] is left unmodified.

nc_stream_shuffle can be applied to multiple quadrants. When more than one quadrant is involved, the shuffle is applied to each quadrant independently, and the same shuffle_mask is used for every quadrant. For more information about cross-partition data movement, see Cross-partition Data Movement.

This API has 3 constraints on src and dst:
1. dst must have the same data type as src.
2. dst must have the same number of elements per partition as src.
3. The access start partition of src (src_start_partition) does not have to match, or be in the same quadrant as, that of dst (dst_start_partition). However, src_start_partition and dst_start_partition must follow special hardware rules that depend on the number of active partitions, num_active_partitions:

   num_active_partitions = ceil(max(src_num_partitions, dst_num_partitions) / 32) * 32

   where src_num_partitions and dst_num_partitions are the numbers of partitions accessed by the src and dst tensors, respectively. src_start_partition and dst_start_partition are constrained by the value of num_active_partitions:
   - If num_active_partitions is 96 or 128, src_start_partition and dst_start_partition must be 0.
   - If num_active_partitions is 64, src_start_partition and dst_start_partition must be 0 or 64.
   - If num_active_partitions is 32, src_start_partition and dst_start_partition must be 0, 32, 64, or 96.
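The start-partition rules above reduce to a small lookup once num_active_partitions is computed. The following is a minimal pre-flight checker in plain Python, assuming you know each access's start partition and partition count; the names num_active_partitions, start_partitions_ok, and ALLOWED_STARTS are hypothetical helpers for this sketch, not part of NKI, and the check covers only the start-partition rule (not data type or elements per partition).

```python
import math
from typing import Dict, Tuple

QUADRANT = 32  # partitions per quadrant

# Hypothetical table transcribed from the documented rules:
# num_active_partitions -> allowed src/dst start partitions.
ALLOWED_STARTS: Dict[int, Tuple[int, ...]] = {
    32: (0, 32, 64, 96),
    64: (0, 64),
    96: (0,),
    128: (0,),
}


def num_active_partitions(src_num_partitions: int, dst_num_partitions: int) -> int:
    """ceil(max(src, dst) / 32) * 32: round the wider access up to whole quadrants."""
    widest = max(src_num_partitions, dst_num_partitions)
    return math.ceil(widest / QUADRANT) * QUADRANT


def start_partitions_ok(src_start: int, src_num: int,
                        dst_start: int, dst_num: int) -> bool:
    """Hypothetical check of the start-partition rule only.

    Returns False for partition counts the documented rules do not cover.
    """
    allowed = ALLOWED_STARTS.get(num_active_partitions(src_num, dst_num))
    if allowed is None:
        return False
    return src_start in allowed and dst_start in allowed
```

For example, a 16-partition source starting at partition 96 shuffled into a 32-partition destination starting at partition 0 gives num_active_partitions = 32, so both starts are legal. A 64-partition access starting at partition 32 is rejected, because 64 active partitions only permit starts of 0 or 64.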
- Parameters:
dst – the destination tile
src – the source tile
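shuffle_mask – a 32-element list that specifies, for each destination partition within a quadrant, the source partition in the same quadrant to copy from (255 leaves that destination partition unmodified)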
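The per-quadrant semantics of shuffle_mask can be modeled in plain Python to reason about a mask before using it in a kernel. This is a minimal sketch, not the NKI API and not a description of how the Vector Engine implements the operation. The function stream_shuffle_model, the list-of-rows tile representation, and the requirement that src and dst span the same multiple of 32 partitions are hypothetical simplifications. Mask entries are plain ints here (affine expressions are not modeled). The model also snapshots src before writing so that the in-place case (dst is src) is well defined; treating hardware in-place behavior the same way is an assumption.

```python
from typing import List, Sequence

QUADRANT = 32     # partitions per quadrant
UNMODIFIED = 255  # mask value: leave this destination partition as it is


def stream_shuffle_model(dst: List[List[float]],
                         src: Sequence[Sequence[float]],
                         shuffle_mask: Sequence[int]) -> List[List[float]]:
    """Hypothetical pure-Python model of the documented shuffle semantics.

    Tiles are lists of rows, one row per partition. For this sketch only,
    src and dst must span the same number of partitions, a multiple of 32.
    """
    if len(shuffle_mask) != QUADRANT:
        raise ValueError("shuffle_mask must have exactly 32 elements")
    if len(src) != len(dst) or len(src) % QUADRANT != 0:
        raise ValueError("sketch assumes equal partition counts, a multiple of 32")
    if any(len(s) != len(d) for s, d in zip(src, dst)):
        raise ValueError("dst must have the same number of elements per partition as src")
    for m in shuffle_mask:
        if m != UNMODIFIED and not 0 <= m < QUADRANT:
            raise ValueError(f"invalid shuffle_mask entry: {m}")

    # Modeling assumption: read all of src first, so dst may be the same tile.
    snapshot = [list(row) for row in src]
    # The same mask is applied to every quadrant independently.
    for base in range(0, len(dst), QUADRANT):
        for i, m in enumerate(shuffle_mask):
            if m != UNMODIFIED:
                # Output partition base + i copies input partition base + m.
                dst[base + i] = list(snapshot[base + m])
    return dst


if __name__ == "__main__":
    src = [[float(p)] for p in range(64)]  # partition p holds the value p
    dst = [[0.0] for _ in range(64)]
    rotate = [(i - 1) % 32 for i in range(32)]
    stream_shuffle_model(dst, src, rotate)
    print(dst[0], dst[1], dst[32])  # [31.0] [0.0] [63.0]
```

With the rotation mask [(i - 1) % 32 for i in range(32)], output partition i of each quadrant receives input partition i - 1, and partition 0 wraps around to partition 31 of the same quadrant; no data crosses a quadrant boundary, which is why dst[32] receives partition 63 rather than 31.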