Index

_
__init__() (nki.isa.nc_version method)

A
abs (C++ function)
abs_out (C++ function)
accessor (C++ function)
activation() (in module nki.isa)
activation_reduce() (in module nki.isa)
add (C++ function)
add_out (C++ function)
affine_range() (in module nki.language)
affine_select() (in module nki.isa)
attention_cte() (in module nkilib.core.attention_cte)
attention_tkg() (in module nkilib.core.attention_tkg)
AttnTKGConfig (class in nkilib.core.attention_tkg)

B
benchmark()
    built-in function
BF16
bfloat16 (in module nki.language)
bitwise_and (C++ function)
bitwise_and_out (C++ function)
bitwise_not (C++ function)
bitwise_not_out (C++ function)
bitwise_or (C++ function)
bitwise_or_out (C++ function)
block_len (nkilib.core.attention_tkg.AttnTKGConfig attribute)
bn_aggr() (in module nki.isa)
bn_stats() (in module nki.isa)
bool_ (in module nki.language)
bs (nkilib.core.attention_tkg.AttnTKGConfig attribute)
built-in function
    benchmark()
    compile()
    get_reports()
    model_index.append()
    model_index.copy()
    model_index.create()
    model_index.filter()
    model_index.load()
    model_index.move()
    model_index.save()
    print_reports()
    torch.neuron.DataParallel()
    torch.neuron.DataParallel.disable_dynamic_batching()
    torch_neuron.trace()
    torch_neuronx.analyze()
    torch_neuronx.async_load()
    torch_neuronx.bucket_model_trace()
    torch_neuronx.DataParallel()
    torch_neuronx.dynamic_batch()
    torch_neuronx.experimental.profiler.profile()
    torch_neuronx.experimental.profiler.profile.start()
    torch_neuronx.lazy_load()
    torch_neuronx.move_trace_to_device()
    torch_neuronx.multicore_context()
    torch_neuronx.neuron_cores_context()
    torch_neuronx.PartitionerConfig()
    torch_neuronx.replace_weights()
    torch_neuronx.set_multicore()
    torch_neuronx.set_neuron_cores()
    torch_neuronx.trace()
    write_csv()
    write_json()

C
CCE
ceil (C++ function)
ceil_out (C++ function)
cFP8
clamp (C++ function)
clamp_out (C++ function)
close (C++ function)
Collective Communication Engine
compile()
    built-in function
core_barrier() (in module nki.isa)
cos (C++ function)
cos_out (C++ function)
curr_sprior (nkilib.core.attention_tkg.AttnTKGConfig attribute)
CustomOps

D
d_head (nkilib.core.attention_tkg.AttnTKGConfig attribute)
dge_mode (class in nki.isa)
div (C++ function)
div_out (C++ function)
dma_compute() (in module nki.isa)
dma_copy() (in module nki.isa)
dma_transpose() (in module nki.isa)
DP
DPr
dropout() (in module nki.isa)
ds() (in module nki.language)

E
empty (C++ function)
engine (class in nki.isa)
eps (nkilib.core.rmsnorm_quant.rmsnorm_quant.RmsNormQuantKernelArgs attribute)
exp (C++ function)
exp_out (C++ function)
eye (C++ function)

F
fill_ (C++ function)
float16 (in module nki.language)
float32 (in module nki.language)
FLOAT32_TO_FLOAT16 (torch_neuron.Optimization attribute)
float4_e2m1fn_x4 (in module nki.language)
float8_e4m3 (in module nki.language)
float8_e4m3fn_x4 (in module nki.language)
float8_e5m2 (in module nki.language)
float8_e5m2_x4 (in module nki.language)
floor (C++ function)
floor_out (C++ function)
FP16
FP32
full (C++ function)
full_sprior (nkilib.core.attention_tkg.AttnTKGConfig attribute)
fuse_rope (nkilib.core.attention_tkg.AttnTKGConfig attribute)

G
get_accessor_coherence_policy (C++ function)
get_cpu_count (C++ function)
get_cpu_id (C++ function)
get_dst_tensor (C++ function)
get_nc_version() (in module nki.isa)
get_reports()
    built-in function
GPSIMD Engine
GpSimdE

H
has_lower_bound() (nkilib.core.rmsnorm_quant.rmsnorm_quant.RmsNormQuantKernelArgs method)
HBM
hbm (in module nki.language)
High Bandwidth Memory

I
Inf1
Inf2
Inferentia
int16 (in module nki.language)
int32 (in module nki.language)
int8 (in module nki.language)
iota() (in module nki.isa)

J
jit() (in module nki)

K
k_out_in_sb (nkilib.core.attention_tkg.AttnTKGConfig attribute)

L
local_gather() (in module nki.isa)
log (C++ function)
log10 (C++ function)
log10_out (C++ function)
log2 (C++ function)
log2_out (C++ function)
log_out (C++ function)
lower_bound (nkilib.core.rmsnorm_quant.rmsnorm_quant.RmsNormQuantKernelArgs attribute)

M
max8() (in module nki.isa)
memset() (in module nki.isa)
mlp_kernel() (in module nkilib.core.mlp)
model_index.append()
    built-in function
model_index.copy()
    built-in function
model_index.create()
    built-in function
model_index.filter()
    built-in function
model_index.load()
    built-in function
model_index.move()
    built-in function
model_index.save()
    built-in function
module
    placement
mul (C++ function)
mul_out (C++ function)

N
NC
nc_find_index8() (in module nki.isa)
nc_match_replace8() (in module nki.isa)
nc_matmul() (in module nki.isa)
nc_matmul_mx() (in module nki.isa)
nc_stream_shuffle() (in module nki.isa)
nc_transpose() (in module nki.isa)
nc_version (class in nki.isa)
ND
ndarray() (in module nki.language)
needs_rms_normalization() (nkilib.core.rmsnorm_quant.rmsnorm_quant.RmsNormQuantKernelArgs method)
Neuron Device
Neuron Kernel Interface
neuron-cc
    neuron-cc command line option
neuron-cc command line option
    neuron-cc
neuron-ls
    neuron-ls command line option
neuron-ls command line option
    neuron-ls
neuron-monitor
    neuron-monitor command line option
neuron-monitor command line option
    neuron-monitor
neuron-profile
    neuron-profile command line option
neuron-profile command line option
    neuron-profile
NeuronCore
NeuronCore-v1
NeuronCore-v2
NeuronCore-v3
NeuronDevice
NeuronLink
NeuronLink-v1
NeuronLink-v2
NeuronLink-v3
neuronx-cc
    neuronx-cc command line option
neuronx-cc command line option
    neuronx-cc
NKI
norm_type (nkilib.core.rmsnorm_quant.rmsnorm_quant.RmsNormQuantKernelArgs attribute)
nrt_add_tensor_to_tensor_set (C function)
nrt_allocate_tensor_set (C function)
nrt_close (C function)
nrt_debug_client_connect (C function)
nrt_debug_client_connect_close (C function)
nrt_debug_client_read_one_event (C function)
nrt_destroy_tensor_set (C function)
nrt_execute (C function)
nrt_execute_repeat (C function)
nrt_free_model_tensor_info (C function)
nrt_get_model_instance_count (C function)
nrt_get_model_nc_count (C function)
nrt_get_model_tensor_info (C function)
nrt_get_tensor_from_tensor_set (C function)
nrt_get_total_nc_count (C function)
nrt_get_version (C function)
nrt_get_visible_nc_count (C function)
nrt_init (C function)
nrt_load (C function)
nrt_load_collectives (C function)
nrt_profile_start (C function)
nrt_profile_stop (C function)
nrt_tensor_allocate (C function)
nrt_tensor_allocate_empty (C function)
nrt_tensor_allocate_slice (C function)
nrt_tensor_attach_buffer (C function)
nrt_tensor_check_output_completion (C function)
nrt_tensor_free (C function)
nrt_tensor_get_size (C function)
nrt_tensor_get_va (C function)
nrt_tensor_read (C function)
nrt_tensor_write (C function)
nrt_unload (C function)
num_programs() (in module nki.language)
NxD Core
NxD Inference
NxD Training

O
ones (C++ function)
operator= (C++ function)
out_in_sb (nkilib.core.attention_tkg.AttnTKGConfig attribute)
output_projection_cte() (in module nkilib.core.output_projection.output_projection_cte)
output_projection_tkg() (in module nkilib.core.output_projection.output_projection_tkg)

P
Partial Sum Buffer
placement
    module
pow (C++ function)
pow_out (C++ function)
PP
PPr
print_reports()
    built-in function
private_hbm (in module nki.language)
program_id() (in module nki.language)
program_ndim() (in module nki.language)
PSUM
psum (in module nki.language)

Q
q_head (nkilib.core.attention_tkg.AttnTKGConfig attribute)
qk_in_sb (nkilib.core.attention_tkg.AttnTKGConfig attribute)
qkv() (in module nkilib.core.qkv)
quantize_mx() (in module nki.isa)

R
range_select() (in module nki.isa)
read (C++ function)
read_stream_accessor (C++ function)
reciprocal() (in module nki.isa)
reduce_cmd (class in nki.isa)
register_alloc() (in module nki.isa)
register_load() (in module nki.isa)
register_move() (in module nki.isa)
register_store() (in module nki.isa)
rmsnorm_quant_kernel() (in module nkilib.core.rmsnorm_quant.rmsnorm_quant)
RmsNormQuantKernelArgs (class in nkilib.core.rmsnorm_quant.rmsnorm_quant)
RNE
RT

S
s_active (nkilib.core.attention_tkg.AttnTKGConfig attribute)
SBUF
sbuf (in module nki.language)
Scalar Engine
scalar_tensor_tensor() (in module nki.isa)
ScalarE
select_reduce() (in module nki.isa)
sendrecv() (in module nki.isa)
sequence_bounds() (in module nki.isa)
sequential_range() (in module nki.language)
set_accessor_coherence_policy (C++ function)
shared_hbm (in module nki.language)
sin (C++ function)
sin_out (C++ function)
SR
State Buffer
static_range() (in module nki.language)
strided_mm1 (nkilib.core.attention_tkg.AttnTKGConfig attribute)
sub (C++ function)
sub_out (C++ function)
Sync Engine

T
tan (C++ function)
tan_out (C++ function)
tcm_accessor (C++ function)
tcm_to_tensor (C++ function)
Tensor Engine
tensor_copy() (in module nki.isa)
tensor_copy_dynamic_dst() (in module nki.isa)
tensor_copy_dynamic_src() (in module nki.isa)
tensor_copy_predicated() (in module nki.isa)
tensor_partition_reduce() (in module nki.isa)
tensor_reduce() (in module nki.isa)
tensor_scalar() (in module nki.isa)
tensor_scalar_reduce() (in module nki.isa)
tensor_tensor() (in module nki.isa)
tensor_tensor_scan() (in module nki.isa)
tensor_to_tcm (C++ function)
TensorE
TF32
tfloat32 (in module nki.language)
tile_size (class in nki.language)
torch.neuron.DataParallel()
    built-in function
torch.neuron.DataParallel.disable_dynamic_batching()
    built-in function
torch::neuron::tcm_free (C++ function)
torch::neuron::tcm_malloc (C++ function)
torch_neuron.experimental.multicore_context() (in module placement)
torch_neuron.experimental.neuron_cores_context() (in module placement)
torch_neuron.experimental.set_multicore() (in module placement)
torch_neuron.experimental.set_neuron_cores() (in module placement)
torch_neuron.Optimization (built-in class)
torch_neuron.trace()
    built-in function
torch_neuronx.analyze()
    built-in function
torch_neuronx.async_load()
    built-in function
torch_neuronx.bucket_model_trace()
    built-in function
torch_neuronx.BucketModelConfig (built-in class)
torch_neuronx.DataParallel()
    built-in function
torch_neuronx.dynamic_batch()
    built-in function
torch_neuronx.experimental.profiler.profile()
    built-in function
torch_neuronx.experimental.profiler.profile.start()
    built-in function
torch_neuronx.lazy_load()
    built-in function
torch_neuronx.move_trace_to_device()
    built-in function
torch_neuronx.multicore_context()
    built-in function
torch_neuronx.neuron_cores_context()
    built-in function
torch_neuronx.PartitionerConfig()
    built-in function
torch_neuronx.replace_weights()
    built-in function
torch_neuronx.set_multicore()
    built-in function
torch_neuronx.set_neuron_cores()
    built-in function
torch_neuronx.trace()
    built-in function
TP
tp_k_prior (nkilib.core.attention_tkg.AttnTKGConfig attribute)
TPr
Trainium/Inferentia2
Trainium2
Trn1
Trn2

U
uint16 (in module nki.language)
uint32 (in module nki.language)
uint8 (in module nki.language)
use_gpsimd_sb2sb (nkilib.core.attention_tkg.AttnTKGConfig attribute)
use_pos_id (nkilib.core.attention_tkg.AttnTKGConfig attribute)

V
Vector Engine
VectorE

W
write (C++ function)
write_csv()
    built-in function
write_json()
    built-in function
write_stream_accessor (C++ function)

Z
zeros (C++ function)
zeros() (in module nki.language)