Index

_
__init__() (nki.isa.nc_version method)

A
abs (C++ function)
abs() (in module nki.language)
abs_max() (in module nki.language)
abs_min() (in module nki.language)
abs_out (C++ function)
accessor (C++ function), [1]
activate2() (in module nki.isa)
activation() (in module nki.isa)
activation_reduce() (in module nki.isa)
adam_kernel() (in module nkilib.experimental.optimizer)
adamw_kernel() (in module nkilib.experimental.optimizer)
add (C++ function), [1]
add() (in module nki.language)
add_out (C++ function), [1]
add_scalar_kernel() (in module nkilib.experimental.foreach)
add_tensor_kernel() (in module nkilib.experimental.foreach)
addcdiv_kernel() (in module nkilib.experimental.foreach)
addcmul_kernel() (in module nkilib.experimental.foreach)
affine_range() (in module nki.language)
affine_select() (in module nki.isa)
align_stack_curr_addr() (nkilib.core.utils.allocator.SbufManager method)
all() (in module nki.language)
all_gather() (in module nki.collectives)
all_reduce() (in module nki.collectives)
all_to_all() (in module nki.collectives)
all_to_all_v() (in module nki.collectives)
allgather_compute_matmul() (in module nkilib.experimental.collectives)
allgather_sb2sb() (in module nkilib.experimental.collectives)
allgather_sb2sb_tiled() (in module nkilib.experimental.collectives)
alloc() (nkilib.core.utils.allocator.SbufManager method)
alloc_heap() (nkilib.core.utils.allocator.SbufManager method)
alloc_stack() (nkilib.core.utils.allocator.SbufManager method)
arctan() (in module nki.language)
argsort_unstable() (in module nkilib.experimental.subkernels)
attention_block_tkg() (in module nkilib.core.attention_block_tkg.attention_block_tkg)
attention_cte() (in module nkilib.core.attention.attention_cte)
attention_segmented_cte() (in module nkilib.core.attention)
attention_tkg() (in module nkilib.core.attention_tkg)
AttnTKGConfig (class in nkilib.core.attention_tkg)

B
benchmark() (built-in function)
BF16
bfloat16 (in module nki.language)
bitwise_and (C++ function), [1], [2]
bitwise_and() (in module nki.language)
bitwise_and_out (C++ function), [1], [2]
bitwise_not (C++ function)
bitwise_not_out (C++ function)
bitwise_or (C++ function), [1], [2]
bitwise_or() (in module nki.language)
bitwise_or_out (C++ function), [1], [2]
bitwise_xor() (in module nki.language)
block_len (nkilib.core.attention_tkg.AttnTKGConfig attribute)
blockwise_mm_bwd() (in module nkilib.experimental.moe.bwd)
bn_aggr() (in module nki.isa)
bn_stats() (in module nki.isa)
bn_stats_fmax (nki.language.tile_size attribute)
bool_ (in module nki.language)
broadcast() (nkilib.core.utils.tensor_view.TensorView method)
broadcast_to() (in module nki.language)
bs (nkilib.core.attention_tkg.AttnTKGConfig attribute)
build_all_to_all_v_metadata() (in module nkilib.experimental.subkernels)
built-in function
  benchmark()
  compile()
  get_reports()
  model_index.append()
  model_index.copy()
  model_index.create()
  model_index.filter()
  model_index.load()
  model_index.move()
  model_index.save()
  print_reports()
  torch.neuron.DataParallel()
  torch.neuron.DataParallel.disable_dynamic_batching(), [1]
  torch_neuron.trace()
  torch_neuronx.analyze()
  torch_neuronx.async_load()
  torch_neuronx.bucket_model_trace()
  torch_neuronx.DataParallel()
  torch_neuronx.dynamic_batch()
  torch_neuronx.experimental.profiler.profile()
  torch_neuronx.experimental.profiler.profile.start()
  torch_neuronx.lazy_load()
  torch_neuronx.move_trace_to_device()
  torch_neuronx.multicore_context()
  torch_neuronx.neuron_cores_context()
  torch_neuronx.PartitionerConfig()
  torch_neuronx.replace_weights()
  torch_neuronx.set_multicore()
  torch_neuronx.set_neuron_cores()
  torch_neuronx.trace()
  write_csv()
  write_json()
bypass (in module nki.language)

C
CCE
ceil (C++ function)
ceil() (in module nki.language)
ceil_nisa_kernel() (in module nkilib.core.attention)
ceil_out (C++ function)
cFP8
clamp (C++ function)
clamp_out (C++ function)
close (C++ function), [1]
close_scope() (nkilib.core.utils.allocator.SbufManager method)
Collective Communication Engine
collective_permute() (in module nki.collectives)
collective_permute_implicit() (in module nki.collectives)
collective_permute_implicit_current_processing_rank_id() (in module nki.collectives)
collective_permute_implicit_reduce() (in module nki.collectives)
compile() (built-in function)
compute_fused_gate_up_down_mxfp8() (in module nkilib.experimental.mlp_mxfp8.mlp_fwd_mxfp8)
compute_phase1_down_proj_mm_grad_mxfp8() (in module nkilib.experimental.mlp_mxfp8.mlp_bwd_mxfp8)
compute_phase2_hidden_states_grad_mxfp8() (in module nkilib.experimental.mlp_mxfp8.mlp_bwd_mxfp8)
compute_phase3_gate_up_weight_grad_mxfp8() (in module nkilib.experimental.mlp_mxfp8.mlp_bwd_mxfp8)
compute_phase4_down_weight_grad_mxfp8() (in module nkilib.experimental.mlp_mxfp8.mlp_bwd_mxfp8)
conv1d() (in module nkilib.experimental.conv)
conv3d() (in module nkilib.experimental.conv)
copy() (in module nki.language)
core_barrier() (in module nki.isa)
cos (C++ function)
cos() (in module nki.language)
cos_out (C++ function)
create_auto_alloc_manager() (in module nkilib.core.utils.allocator)
cross_entropy_backward() (in module nkilib.experimental.loss)
cross_entropy_forward() (in module nkilib.experimental.loss)
cumsum() (in module nkilib.core.cumsum)
curr_sprior (nkilib.core.attention_tkg.AttnTKGConfig attribute)
CustomOps

D
d_head (nkilib.core.attention_tkg.AttnTKGConfig attribute)
depthwise_conv1d_implicit_gemm() (in module nkilib.experimental.conv)
device_print() (in module nki.language)
dge_mode (class in nki.isa)
div (C++ function), [1]
div_out (C++ function), [1]
div_scalar_kernel() (in module nkilib.experimental.foreach)
div_tensor_kernel() (in module nkilib.experimental.foreach)
dma_compute() (in module nki.isa)
dma_copy() (in module nki.isa)
dma_engine (class in nki.isa)
dma_transpose() (in module nki.isa)
DP
DPr
dropout()
  (in module nki.isa)
  (in module nki.language)
ds() (in module nki.language)
dynamic_elementwise_add() (in module nkilib.experimental.dynamic_shapes)
dynamic_range() (in module nki.language)

E
empty (C++ function)
empty_like() (in module nki.language)
engine (class in nki.isa)
eps (nkilib.core.rmsnorm_quant.rmsnorm_quant.RmsNormQuantKernelArgs attribute)
equal() (in module nki.language)
erf() (in module nki.language)
erf_dx() (in module nki.language)
exp (C++ function)
exp() (in module nki.language)
exp_out (C++ function)
expand_dim() (nkilib.core.utils.tensor_view.TensorView method)
expand_dims() (in module nki.language)
exponential() (in module nki.isa)
eye (C++ function)

F
fill_ (C++ function)
find_nonzero_indices() (in module nkilib.core.subkernels)
fine_grained_allgather() (in module nkilib.experimental.collectives)
flatten_dims() (nkilib.core.utils.tensor_view.TensorView method)
float16 (in module nki.language)
float32 (in module nki.language)
FLOAT32_TO_FLOAT16 (torch_neuron.Optimization attribute)
float4_e2m1fn_x4 (in module nki.language)
float8_e4m3 (in module nki.language)
float8_e4m3fn (in module nki.language)
float8_e4m3fn_x4 (in module nki.language)
float8_e5m2 (in module nki.language)
float8_e5m2_x4 (in module nki.language)
floor (C++ function)
floor() (in module nki.language)
floor_nisa_kernel() (in module nkilib.core.attention)
floor_out (C++ function)
flush_logs() (nkilib.core.utils.allocator.SbufManager method)
FP16
FP32
full (C++ function)
full() (in module nki.language)
full_sprior (nkilib.core.attention_tkg.AttnTKGConfig attribute)
fuse_rope (nkilib.core.attention_tkg.AttnTKGConfig attribute)

G
gather_flattened() (in module nki.language)
gelu() (in module nki.language)
gelu_apprx_sigmoid() (in module nki.language)
gelu_apprx_sigmoid_dx() (in module nki.language)
gelu_apprx_tanh() (in module nki.language)
gelu_dx() (in module nki.language)
gemm_moving_fmax (nki.language.tile_size attribute)
gemm_stationary_fmax (nki.language.tile_size attribute)
generate_random() (in module nkilib.experimental.rng)
get_accessor_coherence_policy (C++ function)
get_cpu_count (C++ function)
get_cpu_id (C++ function)
get_dst_tensor (C++ function)
get_free_space() (nkilib.core.utils.allocator.SbufManager method)
get_heap_curr_addr() (nkilib.core.utils.allocator.SbufManager method)
get_name_prefix() (nkilib.core.utils.allocator.SbufManager method)
get_nc_version() (in module nki.isa)
get_program_sharding_info() (in module nkilib.experimental.mlp_mxfp8.mlp_bwd_mxfp8)
get_reports() (built-in function)
get_rng_state_gpsimd() (in module nkilib.experimental.rng)
get_stack_curr_addr() (nkilib.core.utils.allocator.SbufManager method)
get_total_space() (nkilib.core.utils.allocator.SbufManager method)
get_used_space() (nkilib.core.utils.allocator.SbufManager method)
get_view() (nkilib.core.utils.tensor_view.TensorView method)
GPSIMD Engine
GpSimdE
greater() (in module nki.language)
greater_equal() (in module nki.language)

H
has_dynamic_access() (nkilib.core.utils.tensor_view.TensorView method)
has_lower_bound() (nkilib.core.rmsnorm_quant.rmsnorm_quant.RmsNormQuantKernelArgs method)
HBM
hbm (in module nki.language)
High Bandwidth Memory

I
increment_section() (nkilib.core.utils.allocator.SbufManager method)
Inf1
Inf2
Inferentia
int16 (in module nki.language)
int32 (in module nki.language)
int8 (in module nki.language)
invert() (in module nki.language)
iota() (in module nki.isa)
is_hbm() (in module nki.language)
is_on_chip() (in module nki.language)
is_psum() (in module nki.language)
is_sbuf() (in module nki.language)

J
jit() (in module nki)

K
k_out_in_sb (nkilib.core.attention_tkg.AttnTKGConfig attribute)
kv_parallel_segmented_prefill() (in module nkilib.core.attention)

L
l1_norm_kernel() (in module nkilib.experimental.foreach)
l2_norm_kernel() (in module nkilib.experimental.foreach)
left_shift() (in module nki.language)
lerp_kernel() (in module nkilib.experimental.foreach)
less() (in module nki.language)
less_equal() (in module nki.language)
linear_scan() (in module nkilib.experimental.scan)
linf_norm_kernel() (in module nkilib.experimental.foreach)
load() (in module nki.language)
load_kv_cache() (in module nkilib.core.attention)
load_transpose2d() (in module nki.language)
local_gather() (in module nki.isa)
log (C++ function)
log() (in module nki.language)
log10 (C++ function)
log10_out (C++ function)
log2 (C++ function)
log2_out (C++ function)
log_out (C++ function)
logical_and() (in module nki.language)
logical_not() (in module nki.language)
logical_or() (in module nki.language)
logical_xor() (in module nki.language)
lower_bound (nkilib.core.rmsnorm_quant.rmsnorm_quant.RmsNormQuantKernelArgs attribute)

M
matmul() (in module nki.language)
matmul_mxfp8() (in module nkilib.experimental.matmul_mxfp8)
matmul_perf_mode (class in nki.isa)
max() (in module nki.language)
max8() (in module nki.isa)
maximum() (in module nki.language)
mean() (in module nki.language)
memset() (in module nki.isa)
min() (in module nki.language)
minimum() (in module nki.language)
mish() (in module nki.language)
mlp() (in module nkilib.core.mlp)
mlp_backward_mxfp8_base_nki() (in module nkilib.experimental.mlp_mxfp8.mlp_bwd_mxfp8)
mlp_backward_mxfp8_nki() (in module nkilib.experimental.mlp_mxfp8.mlp_bwd_mxfp8)
mlp_forward_mxfp8_nki() (in module nkilib.experimental.mlp_mxfp8.mlp_fwd_mxfp8)
model_index.append() (built-in function)
model_index.copy() (built-in function)
model_index.create() (built-in function)
model_index.filter() (built-in function)
model_index.load() (built-in function)
model_index.move() (built-in function)
model_index.save() (built-in function)
module
  placement
moe_cte() (in module nkilib.core.moe_cte)
moe_tkg() (in module nkilib.core.moe_tkg)
mul (C++ function), [1]
mul_out (C++ function), [1]
mul_scalar_kernel() (in module nkilib.experimental.foreach)
mul_tensor_kernel() (in module nkilib.experimental.foreach)
multiply() (in module nki.language)
mx_moe_block_tkg_wrapper() (in module nkilib.experimental.moe_block)

N
NC
nc_find_index8() (in module nki.isa)
nc_match_replace8() (in module nki.isa)
nc_matmul() (in module nki.isa)
nc_matmul_mx() (in module nki.isa)
nc_n_gather() (in module nki.isa)
nc_stream_shuffle() (in module nki.isa)
nc_transpose() (in module nki.isa)
nc_version (class in nki.isa)
ND
ndarray() (in module nki.language)
needs_rms_normalization() (nkilib.core.rmsnorm_quant.rmsnorm_quant.RmsNormQuantKernelArgs method)
negative() (in module nki.language)
Neuron Device
Neuron Kernel Interface
neuron-cc
  neuron-cc command line option, [1], [2]
neuron-cc command line option
  neuron-cc, [1], [2]
neuron-monitor
  neuron-monitor command line option
neuron-monitor command line option
  neuron-monitor
NeuronCore, [1]
NeuronCore-v1
NeuronCore-v2
NeuronCore-v3
NeuronDevice
NeuronLink
NeuronLink-v1
NeuronLink-v2
NeuronLink-v3
neuronx-cc
  neuronx-cc command line option, [1], [2]
neuronx-cc command line option
  neuronx-cc, [1], [2]
NKI
no_reorder() (in module nki.language)
nonzero_with_count() (in module nki.isa)
norm_type (nkilib.core.rmsnorm_quant.rmsnorm_quant.RmsNormQuantKernelArgs attribute)
not_equal() (in module nki.language)
nrt_add_tensor_to_tensor_set (C function)
nrt_allocate_tensor_set (C function)
nrt_close (C function)
nrt_debug_client_connect (C function)
nrt_debug_client_connect_close (C function)
nrt_debug_client_read_one_event (C function)
nrt_destroy_tensor_set (C function)
nrt_execute (C function)
nrt_execute_repeat (C function)
nrt_free_model_tensor_info (C function)
nrt_get_model_instance_count (C function)
nrt_get_model_nc_count (C function)
nrt_get_model_tensor_info (C function)
nrt_get_tensor_from_tensor_set (C function)
nrt_get_total_nc_count (C function)
nrt_get_version (C function)
nrt_get_visible_nc_count (C function)
nrt_init (C function)
nrt_load (C function)
nrt_load_collectives (C function)
nrt_profile_start (C function)
nrt_profile_stop (C function)
nrt_tensor_allocate (C function)
nrt_tensor_allocate_empty (C function)
nrt_tensor_allocate_slice (C function)
nrt_tensor_attach_buffer (C function)
nrt_tensor_check_output_completion (C function)
nrt_tensor_copy (C function)
nrt_tensor_free (C function)
nrt_tensor_get_size (C function)
nrt_tensor_get_va (C function)
nrt_tensor_read (C function)
nrt_tensor_write (C function)
nrt_unload (C function)
num_programs() (in module nki.language)
NxD Core
NxD Inference
NxD Training

O
ones (C++ function)
ones() (in module nki.language)
oob_mode (class in nki.isa)
open_scope() (nkilib.core.utils.allocator.SbufManager method)
operator= (C++ function), [1]
out_in_sb (nkilib.core.attention_tkg.AttnTKGConfig attribute)
output_projection_cte() (in module nkilib.core.output_projection.output_projection_cte)
output_projection_tkg() (in module nkilib.core.output_projection.output_projection_tkg)

P
pad() (in module nkilib.experimental.pad)
Partial Sum Buffer
permute() (nkilib.core.utils.tensor_view.TensorView method)
permute_routed_tokens() (in module nkilib.experimental.subkernels)
placement
  module
pmax (nki.language.tile_size attribute)
pop_heap() (nkilib.core.utils.allocator.SbufManager method)
pow (C++ function), [1], [2]
pow_out (C++ function), [1], [2]
power() (in module nki.language)
PP
PPr
pre_combine_dequant_scales() (in module nkilib.core.quantization)
prelu (in module nki.language)
print_reports() (built-in function)
private_hbm (in module nki.language)
prod() (in module nki.language)
program_id() (in module nki.language)
program_ndim() (in module nki.language)
PSUM
psum (in module nki.language)
psum_fmax (nki.language.tile_size attribute)
psum_fmax_bytes (nki.language.tile_size attribute)
psum_min_align (nki.language.tile_size attribute)
psum_num_banks (nki.language.tile_size attribute)

Q
q_head (nkilib.core.attention_tkg.AttnTKGConfig attribute)
qk_in_sb (nkilib.core.attention_tkg.AttnTKGConfig attribute)
qkv() (in module nkilib.core.qkv)
quantization_type (nkilib.core.rmsnorm_quant.rmsnorm_quant.RmsNormQuantKernelArgs attribute)
quantize_block_mxfp8_kernel() (in module nkilib.experimental.quantize_mxfp8)
quantize_mx() (in module nki.isa)

R
rand() (in module nki.language)
rand2() (in module nki.isa)
rand_get_state() (in module nki.isa)
rand_set_state() (in module nki.isa)
random_seed() (in module nki.language)
range_select() (in module nki.isa)
rank_id() (in module nki.collectives)
read (C++ function)
read_stream_accessor (C++ function)
rearrange() (nkilib.core.utils.tensor_view.TensorView method)
reciprocal()
  (in module nki.isa)
  (in module nki.language)
reduce_cmd (class in nki.isa)
reduce_scatter() (in module nki.collectives)
register_alloc() (in module nki.isa)
register_load() (in module nki.isa)
register_move() (in module nki.isa)
register_store() (in module nki.isa)
relu() (in module nki.language)
ReplicaGroup (class in nki.collectives)
reshape() (nkilib.core.utils.tensor_view.TensorView method)
reshape_dim() (nkilib.core.utils.tensor_view.TensorView method)
right_shift() (in module nki.language)
ring_attention_spmd_bwd() (in module nkilib.experimental.attention)
ring_attention_spmd_fwd() (in module nkilib.experimental.attention)
rms_norm() (in module nki.language)
rmsnorm_quant_kernel() (in module nkilib.core.rmsnorm_quant.rmsnorm_quant)
RmsNormQuantKernelArgs (class in nkilib.core.rmsnorm_quant.rmsnorm_quant)
RNE
rng() (in module nki.isa)
RoPE() (in module nkilib.core.rope)
RoPE_sbuf() (in module nkilib.core.rope)
router_topk() (in module nkilib.core.router_topk)
router_topk_input_w_load() (in module nkilib.core.router_topk)
router_topk_input_x_load() (in module nkilib.core.router_topk)
row_quantization() (in module nkilib.core.quantization)
rsqrt() (in module nki.language)
RT

S
s_active (nkilib.core.attention_tkg.AttnTKGConfig attribute)
SBUF
sbuf (in module nki.language)
sbuf_fmax (nki.language.tile_size attribute)
sbuf_fmax_bytes (nki.language.tile_size attribute)
sbuf_min_align (nki.language.tile_size attribute)
sbuf_size_bytes (nki.language.tile_size attribute)
SbufManager (class in nkilib.core.utils.allocator)
Scalar Engine
scalar_tensor_tensor() (in module nki.isa)
ScalarE
select() (nkilib.core.utils.tensor_view.TensorView method)
select_reduce() (in module nki.isa)
selective_scan() (in module nkilib.experimental.scan)
sendrecv() (in module nki.isa)
sequence_bounds() (in module nki.isa)
sequential_range() (in module nki.language)
set_accessor_coherence_policy (C++ function)
set_name_prefix() (nkilib.core.utils.allocator.SbufManager method)
set_rng_seed() (in module nki.isa)
set_rng_state_gpsimd() (in module nkilib.experimental.rng)
shape (nkilib.core.utils.tensor_view.TensorView attribute)
shared_hbm (in module nki.language)
shared_identity_matrix() (in module nki.language)
should_store_packed_scales() (in module nkilib.experimental.quantize_mxfp8)
sigmoid() (in module nki.language)
sign() (in module nki.language)
silu() (in module nki.language)
silu_dx() (in module nki.language)
simulate() (in module nki)
sin (C++ function)
sin() (in module nki.language)
sin_out (C++ function)
slice() (nkilib.core.utils.tensor_view.TensorView method)
softmax() (in module nki.language)
softplus() (in module nki.language)
sqrt() (in module nki.language)
sqrt_kernel() (in module nkilib.experimental.foreach)
square() (in module nki.language)
squeeze_dim() (nkilib.core.utils.tensor_view.TensorView method)
SR
ssd() (in module nkilib.experimental.scan)
State Buffer
static_quantization() (in module nkilib.core.quantization)
static_range() (in module nki.language)
store() (in module nki.language)
strided_mm1 (nkilib.core.attention_tkg.AttnTKGConfig attribute)
strides (nkilib.core.utils.tensor_view.TensorView attribute)
sub (C++ function), [1]
sub_out (C++ function), [1]
sub_scalar_kernel() (in module nkilib.experimental.foreach)
sub_tensor_kernel() (in module nkilib.experimental.foreach)
subtract() (in module nki.language)
sum() (in module nki.language)
Sync Engine

T
tan (C++ function)
tan() (in module nki.language)
tan_out (C++ function)
tanh() (in module nki.language)
tcm_accessor (C++ function), [1]
tcm_to_tensor (C++ function)
Tensor Engine
tensor_copy() (in module nki.isa)
tensor_copy_predicated() (in module nki.isa)
tensor_partition_reduce() (in module nki.isa)
tensor_reduce() (in module nki.isa)
tensor_scalar() (in module nki.isa)
tensor_scalar_cumulative() (in module nki.isa)
tensor_scalar_reduce() (in module nki.isa)
tensor_tensor() (in module nki.isa)
tensor_tensor_scan() (in module nki.isa)
tensor_to_tcm (C++ function)
TensorE
TensorView (class in nkilib.core.utils.tensor_view)
TF32
tfloat32 (in module nki.language)
tile_size (class in nki.language)
topk_reduce() (in module nkilib.experimental.subkernels)
torch.neuron.DataParallel() (built-in function)
torch.neuron.DataParallel.disable_dynamic_batching() (built-in function), [1]
torch::neuron::tcm_free (C++ function)
torch::neuron::tcm_malloc (C++ function)
torch_neuron.experimental.multicore_context() (in module placement)
torch_neuron.experimental.neuron_cores_context() (in module placement)
torch_neuron.experimental.set_multicore() (in module placement)
torch_neuron.experimental.set_neuron_cores() (in module placement)
torch_neuron.Optimization (built-in class)
torch_neuron.trace() (built-in function)
torch_neuronx.analyze() (built-in function)
torch_neuronx.async_load() (built-in function)
torch_neuronx.bucket_model_trace() (built-in function)
torch_neuronx.BucketModelConfig (built-in class)
torch_neuronx.DataParallel() (built-in function)
torch_neuronx.dynamic_batch() (built-in function)
torch_neuronx.experimental.profiler.profile() (built-in function)
torch_neuronx.experimental.profiler.profile.start() (built-in function)
torch_neuronx.lazy_load() (built-in function)
torch_neuronx.move_trace_to_device() (built-in function)
torch_neuronx.multicore_context() (built-in function)
torch_neuronx.neuron_cores_context() (built-in function)
torch_neuronx.PartitionerConfig() (built-in function)
torch_neuronx.replace_weights() (built-in function)
torch_neuronx.set_multicore() (built-in function)
torch_neuronx.set_neuron_cores() (built-in function)
torch_neuronx.trace() (built-in function)
total_available_sbuf_size (nki.language.tile_size attribute)
TP
tp_k_prior (nkilib.core.attention_tkg.AttnTKGConfig attribute)
TPr
Trainium/Inferentia2
Trainium2
transformer_tkg() (in module nkilib.experimental.transformer)
transpose() (in module nki.language)
Trn1
Trn2
trunc() (in module nki.language)

U
uint16 (in module nki.language)
uint32 (in module nki.language)
uint8 (in module nki.language)
use_gpsimd_sb2sb (nkilib.core.attention_tkg.AttnTKGConfig attribute)
use_pos_id (nkilib.core.attention_tkg.AttnTKGConfig attribute)

V
var() (in module nki.language)
Vector Engine
VectorE
VirtualRegister (class in nki.isa)

W
where() (in module nki.language)
write (C++ function)
write_csv() (built-in function)
write_json() (built-in function)
write_stream_accessor (C++ function)

Z
zeros (C++ function)
zeros() (in module nki.language)
zeros_like() (in module nki.language)