.. _trn1-performance:

Trn1/Trn1n Performance
======================

.. contents:: Table of contents
   :local:

*Last update: April 12th, 2023*

.. _NLP:

Training Performance (Trn1 / Trn1n)
-----------------------------------

.. csv-table::
   :file: trn1_trn1n_nlp_data.csv
   :header-rows: 1

Inference Performance
---------------------

.. tab-set::

   .. tab-item:: Throughput optimized

      .. df-table::
         :header-rows: 1

         # Join the benchmark results with per-instance On-Demand pricing.
         df = pd.read_csv('throughput_data.csv')
         df_prices = pd.read_csv('trn1_instance_prices.csv')
         df = pd.merge(df, df_prices, on='Inst. Type')

         # Cost per 1M inferences = seconds needed to serve 1M requests
         # multiplied by the On-Demand rate converted to dollars per second.
         df['Cost per 1M inferences'] = ((1.0e6 / df['Throughput (/sec)'])
                                         * (df['On-Demand hourly rate'] / 3.6e3))

         cols_to_show = ['Model', 'Framework', 'Inst. Type',
                         'Throughput (/sec)', 'Latency P50 (ms)',
                         'Latency P99 (ms)', 'Cost per 1M inferences',
                         'Application Type', 'Neuron Version', 'Run Mode',
                         'Batch Size', 'Model details']

         # Sort on the numeric cost column before formatting it as a string,
         # so rows are ordered by value rather than lexicographically.
         df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences'])
         df['Cost per 1M inferences'] = df['Cost per 1M inferences'].map('${:,.3f}'.format)

         df['Throughput (/sec)'] = df['Throughput (/sec)'].round(0).astype('int', copy=True)
         int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']
         df[int_cols] = df[int_cols].round(2).astype('float', copy=True)

      .. note::

         **Throughput optimization was performed by selecting the batch size that maximized throughput. All compiler flags, data types, and parameters are identical across model configurations.**

         **Cost per 1M inferences** is calculated using the On-Demand hourly rate.

         **Real Time** application refers to batch size 1 inference for minimal latency.

         **Batch** application refers to maximum throughput with minimum cost-per-inference.

   .. tab-item:: Latency optimized

      .. df-table::
         :header-rows: 1

         # Join the benchmark results with per-instance On-Demand pricing.
         df = pd.read_csv('latency_data.csv')
         df_prices = pd.read_csv('trn1_instance_prices.csv')
         df = pd.merge(df, df_prices, on='Inst. Type')

         # Cost per 1M inferences = seconds needed to serve 1M requests
         # multiplied by the On-Demand rate converted to dollars per second.
         df['Cost per 1M inferences'] = ((1.0e6 / df['Throughput (/sec)'])
                                         * (df['On-Demand hourly rate'] / 3.6e3))

         cols_to_show = ['Model', 'Framework', 'Inst. Type',
                         'Throughput (/sec)', 'Latency P50 (ms)',
                         'Latency P99 (ms)', 'Cost per 1M inferences',
                         'Application Type', 'Neuron Version', 'Run Mode',
                         'Batch Size', 'Model details']

         # Sort on the numeric cost column before formatting it as a string,
         # so rows are ordered by value rather than lexicographically.
         df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences'])
         df['Cost per 1M inferences'] = df['Cost per 1M inferences'].map('${:,.3f}'.format)

         df['Throughput (/sec)'] = df['Throughput (/sec)'].round(0).astype('int', copy=True)
         int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']
         df[int_cols] = df[int_cols].round(2).astype('float', copy=True)

      .. note::

         **Latency optimization was performed by selecting the batch size that minimized latency. All compiler flags, data types, and parameters are identical across model configurations.**

         **Cost per 1M inferences** is calculated using the On-Demand hourly rate.

         **Real Time** application refers to batch size 1 inference for minimal latency.

         **Batch** application refers to maximum throughput with minimum cost-per-inference.
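
The cost figures in the tables above follow directly from measured throughput and the instance's On-Demand hourly rate. The sketch below reproduces that calculation for a single configuration; the throughput and rate values are illustrative placeholders, not measured results from the tables.

.. code-block:: python

   # Hypothetical inputs -- substitute real values from the tables above.
   throughput_per_sec = 3500.0    # inferences per second (illustrative)
   on_demand_hourly_rate = 21.50  # USD per hour (illustrative)

   # Seconds needed to serve one million inferences at this throughput.
   seconds_per_million = 1.0e6 / throughput_per_sec

   # Convert the hourly rate to dollars per second, then scale by that time.
   cost_per_million = seconds_per_million * (on_demand_hourly_rate / 3.6e3)

   print(f"Cost per 1M inferences: ${cost_per_million:,.3f}")

With these placeholder numbers the script prints ``Cost per 1M inferences: $1.706``; the same arithmetic is what the table-generation code applies row by row.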