{ "cells": [ { "cell_type": "markdown", "id": "spectacular-payroll", "metadata": {}, "source": [ "# Tensorflow ResNet50 Optimization Tutorial" ] }, { "cell_type": "markdown", "id": "equivalent-stack", "metadata": {}, "source": [ "## Note: this tutorial runs on tensorflow-neuron 1.x only" ] }, { "cell_type": "markdown", "id": "alpine-aside", "metadata": {}, "source": [ "## Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tutorial covers three main sections:\n", "\n", "* Take a ResNet50 model and perform optimizations on it\n", "\n", "* Compile the model with different batch sizes and NeuronCore Group sizes (read about NeuronCore Group sizes here: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-runtime/nrt-theory-of-operation.html#neuron-core-group)\n", "\n", "* Run inference on the compiled models to see which configuration has the best throughput\n", "\n", "Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [Tensorflow Installation Guide](../../../../frameworks/tensorflow/tensorflow-neuron/setup/tensorflow-install.html#install-neuron-tensorflow). You can select the kernel from the “Kernel -> Change Kernel” option at the top of this Jupyter notebook page."
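, "\n", "\n", "As a quick illustration of NeuronCore Groups (the exact grouping to use depends on your instance size, so treat this value only as an example for an inf1.6xlarge with 16 NeuronCores, matching the inference runs later in this tutorial), eight groups of two cores each can be requested before launching the framework:\n", "\n", "```\n", "export NEURONCORE_GROUP_SIZES=2,2,2,2,2,2,2,2\n", "```"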
] }, { "cell_type": "markdown", "id": "opened-forty", "metadata": {}, "source": [ "## Install Dependencies" ] }, { "cell_type": "code", "execution_count": null, "id": "meaningful-algebra", "metadata": {}, "outputs": [], "source": [ "!pip install pillow requests # Necessary for loading images\n", "!pip install tensorflow_neuron==1.15.5.2.8.9.0 --extra-index-url=https://pip.repos.neuron.amazonaws.com/\n", "!pip install neuron_cc==1.13.5.0 --extra-index-url=https://pip.repos.neuron.amazonaws.com" ] }, { "cell_type": "markdown", "id": "remarkable-exercise", "metadata": {}, "source": [ "## Compile" ] }, { "cell_type": "markdown", "id": "consecutive-right", "metadata": {}, "source": [ "The following example shows how to compile an FP16 ResNet50 network using various batching parameters to find the optimal solution. On inf1.6xlarge, run through the following steps to get an optimized ResNet50 model.\n", "First, extract Keras ResNet50 FP32 (resnet50_fp32_keras.pb will be generated):" ] }, { "cell_type": "code", "execution_count": null, "id": "vertical-finland", "metadata": {}, "outputs": [], "source": [ "import re\n", "import argparse\n", "import tensorflow as tf\n", "import numpy as np\n", "\n", "from tensorflow.keras.applications.resnet50 import ResNet50\n", "from tensorflow.keras.preprocessing import image\n", "from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions\n", "\n", "from google.protobuf import text_format\n", "import tensorflow.python.saved_model\n", "\n", "# set Keras global configurations\n", "tf.keras.backend.set_learning_phase(0)\n", "tf.keras.backend.set_image_data_format('channels_last')\n", "\n", "float_type = 'float32'\n", "float_type2 = 'fp32'\n", "tf.keras.backend.set_floatx(float_type)\n", "\n", "# load pre-trained model using Keras\n", "model_name = 'resnet50_%s_keras'%float_type2\n", "model = ResNet50(weights='imagenet')\n", "\n", "# various save files\n", "frozen_file = model_name + '.pb'\n", "opt_file = model_name 
+ '_opt.pb'\n", "\n", "# obtain parameters\n", "model_input = model.input.name.replace(':0', '')\n", "model_output = model.output.name.replace(':0', '')\n", "batch, height, width, channels = model.input.shape\n", "\n", "print(\"model, frozen file, optimized file, input size, input node, output node,\")\n", "print(\"%s, %s, %s, %dx%dx%d, %s, %s\" % (model_name, frozen_file, opt_file, width, height, channels, model_input, model_output))\n", "\n", "# obtain the TF session\n", "sess = tf.compat.v1.keras.backend.get_session()\n", "\n", "# save checkpoint files for freeze_graph\n", "ckpt_file = '/tmp/' + model_name + '/' + model_name + '.ckpt'\n", "graph_file = '/tmp/' + model_name + '/' + model_name + '.pb'\n", "tf.compat.v1.train.Saver().save(sess, ckpt_file)\n", "tf.io.write_graph(sess.graph.as_graph_def(), logdir='.', name=graph_file, as_text=False)\n", "\n", "print(model_output)\n", "with tf.compat.v1.Session(graph=tf.Graph()) as sess:\n", "    saver = tf.compat.v1.train.import_meta_graph(ckpt_file + '.meta')\n", "    saver.restore(sess, ckpt_file)\n", "    output_graph_def = tf.compat.v1.graph_util.convert_variables_to_constants(\n", "        sess, tf.compat.v1.get_default_graph().as_graph_def(), [model_output])\n", "    output_graph_def = tf.compat.v1.graph_util.remove_training_nodes(\n", "        output_graph_def, protected_nodes=[model_output])\n", "    with open(frozen_file, 'wb') as f:\n", "        f.write(output_graph_def.SerializeToString())" ] }, { "cell_type": "markdown", "id": "romance-cyprus", "metadata": {}, "source": [ "Optimize the extracted Keras ResNet50 FP32 graph for inference before casting (resnet50_fp32_keras_opt.pb will be generated) with the following transformations to the graph:\n", "\n", "* Remove Identity and CheckNumerics nodes\n", "* Fold FusedBatchNorm constants into previous Conv2D weights\n", "* Fold other constants\n", "* Strip unused nodes\n", "* Sort by execution order" ] }, { "cell_type": "code", "execution_count": null, "id": "higher-grant", "metadata": {}, 
"outputs": [], "source": [ "import copy\n", "import string\n", "\n", "from google.protobuf import text_format\n", "from tensorflow.core.framework import node_def_pb2\n", "from tensorflow.core.framework import attr_value_pb2\n", "from tensorflow.python.framework import tensor_util\n", "from tensorflow.tools.graph_transforms import TransformGraph\n", "\n", "def clear_input(node):\n", " for i in range(len(node.input)):\n", " node.input.pop()\n", "\n", "def replace_name(node, name):\n", " node.name = name\n", " \n", "def replace_input(node, input_name, new_name):\n", " # node.input.replace(input_name, new_name)\n", " temp = []\n", " for i in node.input:\n", " temp.extend([new_name if i == input_name else i])\n", " clear_input(node)\n", " for i in temp:\n", " node.input.extend([i])\n", "\n", "def swap_names(node1, node2):\n", " temp = node2.name\n", " node2.name = node1.name\n", " node1.name = temp\n", "\n", "def get_const_node(const_node_name, const_by_name):\n", " name = re.sub(\"/read$\", \"\", const_node_name)\n", " return const_by_name[name]\n", "\n", "def get_const_ndarray(const_node_name, const_by_name):\n", " name = re.sub(\"/read$\", \"\", const_node_name)\n", " node = const_by_name[name]\n", " return tf.make_ndarray(node.attr.get(\"value\").tensor)\n", "\n", "def adjust_bias_values(bias_node, fbn_node, const_by_name):\n", " bias_val = get_const_ndarray(bias_node.input[1], const_by_name) \n", " gamma_val = get_const_ndarray(fbn_node.input[1], const_by_name) \n", " mean_val = get_const_ndarray(fbn_node.input[3], const_by_name) \n", " variance_val = get_const_ndarray(fbn_node.input[4], const_by_name) \n", " new_bias = bias_val * gamma_val / np.sqrt(variance_val)\n", " new_tensor = tensor_util.make_tensor_proto(new_bias, new_bias.dtype, new_bias.shape)\n", " bias_const_node = get_const_node(bias_node.input[1], const_by_name)\n", " bias_const_node.attr[\"value\"].CopyFrom(attr_value_pb2.AttrValue(tensor=new_tensor))\n", "\n", "def 
MoveBiasAddAfterFusedBatchNorm(graphdef):\n", " \"\"\"fold_batch_norm function of TransformGraph is unable to fold Keras ResNet50\n", " because of BiasAdd between Conv2D and FusedBatchNorm (BiasAdd is not needed\n", " if FusedBatchNorm is used, but it exists in Keras ResNet50). Here, we \n", " move BiasAdd to after FusedBatchNorm, and adjust bias value by gamma/sqrt(variance).\n", " \"\"\"\n", " sess = tf.compat.v1.Session(graph=tf.import_graph_def(graphdef))\n", " output_graph_def = tf.compat.v1.GraphDef()\n", " node_by_name = {}\n", " const_by_name = {}\n", " for node in graphdef.node:\n", " # Hack: use FusedBatchNormV2 so fold_batch_norm can recognize\n", " if node.op == \"FusedBatchNormV3\":\n", " node.op = \"FusedBatchNorm\"\n", " del(node.attr[\"U\"])\n", " #import pdb; pdb.set_trace()\n", " copied_node = node_def_pb2.NodeDef()\n", " copied_node.CopyFrom(node)\n", " node_by_name[node.name] = copied_node\n", " skip_add_node = False\n", " # Switch Mul/BiasAdd in Keras RN50 so fold_batch_norm transform would work\n", " if node.op == \"Const\":\n", " const_by_name[node.name] = copied_node \n", " elif node.op.startswith(\"FusedBatchNorm\"):\n", " inputs = node.input\n", " for i in inputs:\n", " input_node = node_by_name[i]\n", " if input_node.op == \"BiasAdd\":\n", " output_graph_def.node.remove(input_node)\n", " input_node_input0 = input_node.input[0]\n", " # Adjust bias values (multiply by scale/sqrt(variance))\n", " adjust_bias_values(input_node, node, const_by_name)\n", " # Hack: swap names to avoid changing input of activation\n", " swap_names(copied_node, input_node)\n", " # Fix inputs for these two ops\n", " replace_input(copied_node, i, input_node_input0)\n", " replace_input(input_node, input_node_input0, copied_node.name)\n", " # Fix order in node list\n", " output_graph_def.node.extend([copied_node])\n", " output_graph_def.node.extend([input_node])\n", " skip_add_node = True\n", " # Add maybe-modified nodes if not already done\n", " if not 
skip_add_node:\n", "            output_graph_def.node.extend([copied_node])\n", "    return output_graph_def\n", "\n", "def FoldFusedBatchNorm(graph_def):\n", "    \"\"\"Optimize training graph for inference:\n", "    - Remove Identity and CheckNumerics nodes\n", "    - Fold FusedBatchNorm constants into previous Conv2D weights\n", "    - Fold other constants\n", "    - Strip unused nodes\n", "    - Sort by execution order\n", "    \"\"\"\n", "    transformed_graph_def = TransformGraph(\n", "        graph_def,\n", "        ['input_1'],\n", "        ['probs/Softmax'],\n", "        [\n", "            'add_default_attributes',\n", "            'remove_nodes(op=Identity, op=CheckNumerics)',\n", "            'fold_constants(ignore_errors=true)',\n", "            'fold_batch_norms',\n", "            'fold_old_batch_norms',\n", "            'strip_unused_nodes',\n", "            'sort_by_execution_order',\n", "        ])\n", "    return transformed_graph_def\n", "\n", "def load_graph(model_file):\n", "    graph_def = tf.compat.v1.GraphDef()\n", "    with open(model_file, \"rb\") as f:\n", "        graph_def.ParseFromString(f.read())\n", "    return graph_def\n", "\n", "\n", "graph_orig = load_graph('resnet50_fp32_keras.pb')\n", "graph_mod = MoveBiasAddAfterFusedBatchNorm(graph_orig)\n", "graph_mod2 = FoldFusedBatchNorm(graph_mod)\n", "with tf.io.gfile.GFile('resnet50_fp32_keras_opt.pb', \"wb\") as f:\n", "    f.write(graph_mod2.SerializeToString())" ] }, { "cell_type": "markdown", "id": "corresponding-acquisition", "metadata": {}, "source": [ "Convert the full graph to FP16 (resnet50_fp16_keras_opt.pb will be generated).\n", "This will take about a minute."
] }, { "cell_type": "code", "execution_count": null, "id": "detected-training", "metadata": {}, "outputs": [], "source": [ "from tensorflow.core.framework import graph_pb2\n", "from tensorflow.python.platform import gfile\n", "\n", "def ConvertFP32ToOther(graphdef):\n", "    \"\"\"Converts an FP32 network by casting all constants (weights) to a lower\n", "    precision floating point type (FP16) and updating the dtypes\n", "    everywhere.\"\"\"\n", "    cast_type = \"float16\"\n", "    sess = tf.compat.v1.Session(graph=tf.import_graph_def(graphdef))\n", "    output_graph_def = graph_pb2.GraphDef()\n", "    dummy_tensor = sess.run(tf.constant([0.1]))\n", "    dummy_tensor_proto = tensor_util.make_tensor_proto(dummy_tensor,\n", "        dtype=cast_type, shape=dummy_tensor.shape)\n", "    dummy_tensor32 = sess.run(tf.constant([0.1]))\n", "    dummy_tensor_proto32 = tensor_util.make_tensor_proto(dummy_tensor32,\n", "        dtype=tf.float32, shape=dummy_tensor32.shape)\n", "    dt_float_type_attr = attr_value_pb2.AttrValue(type=dummy_tensor_proto32.dtype)\n", "    dt_half_type_attr = attr_value_pb2.AttrValue(type=dummy_tensor_proto.dtype)\n", "    for node in graphdef.node:\n", "        output_node = node_def_pb2.NodeDef()\n", "        output_node.CopyFrom(node)\n", "        if (node.op == \"Const\"):\n", "            if (node.attr[\"dtype\"] == dt_float_type_attr):\n", "                a = tensor_util.MakeNdarray(node.attr[\"value\"].tensor)\n", "                a = tf.cast(a, cast_type)\n", "                a = sess.run(a)\n", "                output_node.attr[\"dtype\"].CopyFrom(dt_half_type_attr)\n", "                output_node.attr[\"value\"].CopyFrom(\n", "                    attr_value_pb2.AttrValue(\n", "                        tensor=tensor_util.make_tensor_proto(a,\n", "                            dtype=cast_type, shape=a.shape)))\n", "        else:\n", "            if (\"T\" in node.attr.keys()):\n", "                if (output_node.attr[\"T\"] == dt_float_type_attr):\n", "                    output_node.attr[\"T\"].CopyFrom(dt_half_type_attr)\n", "            if (\"Tparams\" in node.attr.keys()):\n", "                if (output_node.attr[\"Tparams\"] == dt_float_type_attr):\n", "                    output_node.attr[\"Tparams\"].CopyFrom(dt_half_type_attr)\n", "            if (\"dtype\" in 
node.attr.keys()):\n", "                if (node.attr[\"dtype\"] == dt_float_type_attr):\n", "                    output_node.attr[\"dtype\"].CopyFrom(dt_half_type_attr)\n", "            if (\"SrcT\" in node.attr.keys()):\n", "                if (node.attr[\"SrcT\"] == dt_float_type_attr):\n", "                    output_node.attr[\"SrcT\"].CopyFrom(dt_half_type_attr)\n", "            if (\"DstT\" in node.attr.keys()):\n", "                if (node.attr[\"DstT\"] == dt_float_type_attr):\n", "                    output_node.attr[\"DstT\"].CopyFrom(dt_half_type_attr)\n", "        output_graph_def.node.extend([output_node])\n", "    return output_graph_def\n", "\n", "def load_graph(model_file):\n", "    graph_def = tf.compat.v1.GraphDef()\n", "    with open(model_file, \"rb\") as f:\n", "        graph_def.ParseFromString(f.read())\n", "    return graph_def\n", "\n", "graph_f32 = load_graph('resnet50_fp32_keras_opt.pb')\n", "graph_f16 = ConvertFP32ToOther(graph_f32)\n", "output_xformed_graph_name = 'resnet50_fp16_keras_opt.pb'\n", "with gfile.GFile(output_xformed_graph_name, \"wb\") as f:\n", "    f.write(graph_f16.SerializeToString())\n" ] }, { "cell_type": "markdown", "id": "correct-travel", "metadata": {}, "source": [ "Sweep through batch sizes up to 5 and several NeuronCore Group sizes up to 16 by running the compilation script pb2sm_compile.py for each configuration. Some error messages are expected due to known issues (see the Known Issues section of this tutorial). Running all the configurations takes about 45 minutes."
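, "\n", "\n", "As a hedged illustration of what one sweep step amounts to, a single (batch size, NeuronCore Group size) configuration can also be compiled from Python with the tensorflow-neuron 1.x saved-model API. This is only a sketch (pb2sm_compile.py may do more), and the input SavedModel directory name here is an assumption:\n", "\n", "```python\n", "import tensorflow.neuron as tfn\n", "\n", "# Compile one configuration: batch size 1, NeuronCore Group size 2.\n", "tfn.saved_model.compile(\n", "    'rn50_fp16_savedmodel',       # assumed input SavedModel directory\n", "    'rn50_fp16_compiled_b1_nc2',  # output directory, as named in the sweep\n", "    batch_size=1,\n", "    compiler_args=['--neuroncore-pipeline-cores', '2'])\n", "```"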
] }, { "cell_type": "code", "execution_count": null, "id": "shared-ratio", "metadata": {}, "outputs": [], "source": [ "%%bash\n", "#!/usr/bin/env bash\n", "\n", "echo \"\" > full_sweep.log\n", "echo \"\" > full_sweep_results.txt\n", "\n", "results=()\n", "for b in $(seq 1 5); do \n", " for i in 1 2 4 8 12 16; do \n", " python pb2sm_compile.py --batch_size=$b --neuroncore-pipeline-cores=$i | tee -a full_sweep.log;\n", " results[$b]+=\", \"`tail -1 full_sweep.log`\n", " done\n", "done\n", "\n", "head=\"batch\"\n", "for i in 1 2 4 8 12 16; do\n", " head+=\", nc${i}\"\n", "done \n", "echo $head | tee -a full_sweep_results.txt\n", "for b in $(seq 1 5); do \n", " echo $b${results[$b]} | tee -a full_sweep_results.txt\n", "done" ] }, { "cell_type": "markdown", "id": "attached-austin", "metadata": {}, "source": [ "You should see some output like this:\n", "```\n", "INFO: Compilation finished in 95 seconds with 99.5% operations placed on Inferentia\n", "\n", "1\n", "\n", "*** Batch size 1, num NeuronCores 2 (input shape: (1, 224, 224, 3), saved model dir: rn50_fp16_compiled_b1_nc2) ***\n", "\n", "INFO: Compilation finished in 95 seconds with 99.5% operations placed on Inferentia\n", "\n", "1\n", "\n", "*** Batch size 1, num NeuronCores 4 (input shape: (1, 224, 224, 3), saved model dir: rn50_fp16_compiled_b1_nc4) ***\n", "\n", "INFO: Compilation finished in 95 seconds with 99.5% operations placed on Inferentia\n", "\n", "1\n", "\n", "... 
(outputs removed)\n", "\n", "*** Batch size 5, num NeuronCores 16 (input shape: (5, 224, 224, 3), saved model dir: rn50_fp16_compiled_b5_nc16) ***\n", "\n", "ERROR: Compilation finished in 120 seconds with less than 50% operations placed on Inferentia (0.0%)\n", "\n", "INFO: Retry compilation without static weights\n", "\n", "ERROR: Retry compilation finished in 137 seconds with less than 50% operations placed on Inferentia (0.0%)\n", "\n", "0\n", "```\n", "\n", "The file full_sweep_results.txt shows a summary of the sweep results with the Neuron 1/27/20 release (0 means compilation was unsuccessful with no ops mapped to Inferentia, 1 means most ops mapped to Inferentia without static weights, and 2 means most ops mapped to Inferentia using static weights):\n", "\n", "```\n", "batch, nc1, nc2, nc4, nc8, nc12, nc16\n", "1, 1, 1, 1, 2, 2, 2\n", "2, 1, 1, 0, 1, 2, 2\n", "3, 1, 1, 1, 1, 1, 1\n", "4, 1, 1, 0, 1, 1, 1\n", "5, 1, 1, 0, 0, 0, 0\n", "```\n" ] }, { "cell_type": "markdown", "id": "surprised-abortion", "metadata": {}, "source": [ "## Inference" ] }, { "cell_type": "markdown", "id": "departmental-surprise", "metadata": {}, "source": [ "Run inference over different batch sizes and NeuronCore Groups to obtain throughput and latency results for ResNet50. To apply dynamic batching, the user batch size is set to 10x the compiled batch size, to keep the input queue full and to amortize the framework-to-Neuron overhead.\n", "\n", "Note: The results are based on the Neuron v1.12.2 (Mar 4th 2021) release. 
These will continue to improve as Neuron performance increases.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "requested-inspiration", "metadata": {}, "outputs": [], "source": [ "%cd ~/aws-neuron-sdk/src/examples/tensorflow/keras_resnet50/\n", "!echo \"\" > batch.log\n", "!for n in 1 2 4 8 12 16; do for i in $(seq 1 5); do python infer_resnet50_keras_loadtest.py --batch_size=$i --neuroncore-pipeline-cores=$n | tee -a batch.log; done; done" ] }, { "cell_type": "markdown", "id": "split-genesis", "metadata": {}, "source": [ "The file batch.log now contains the results for each configuration. Looking at the throughput values gives an idea of which models perform well; the output should look something like the listing below.\n", "\n", "The best model configuration for throughput (if you run on an inf1.6xlarge as suggested in this tutorial) is batch size 5 with NeuronCore Group size 2. Increasing the batch size usually increases throughput, up to a certain extent."
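, "\n", "\n", "To pull just the throughput numbers out of batch.log for a quick comparison, a simple grep over the log works (a sketch; adjust the pattern if your log format differs):\n", "\n", "```\n", "grep -A 1 'Throughput values collected' batch.log\n", "```"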
] }, { "cell_type": "markdown", "id": "filled-township", "metadata": {}, "source": [ "```\n", "*** Compiled batch size 5, user batch size 10, num NeuronCores 2 (input shape: (10, 224, 224, 3), saved model dir: ./rn50_fp16_compiled_b5_nc2/1) ***\n", "\n", "Instance type inf1.6xlarge with 16 NeuronCores\n", "NEURON_MAX_NUM_INFERS (env): 5\n", "NEURONCORE_GROUP_SIZES (env): 2,2,2,2,2,2,2,2\n", "NUM THREADS: 16\n", "NUM_LOOPS_PER_THREAD: 400\n", "USER_BATCH_SIZE: 10\n", "Throughput values collected:\n", "[10680, 10700, 10660]\n", "\n", "(rest of outputs removed)\n", "```" ] }, { "cell_type": "markdown", "id": "189c4f0e-1a4e-4067-921f-95449c45dedd", "metadata": {}, "source": [ "## Known Issues\n", "\n", "### Unable to compile with some batch size and NeuronCore combinations\n", "\n", "For some combinations of batch size and NeuronCore Group size, you may\n", "see an internal compiler error as below. Please see the sweep results\n", "above for the Neuron 1/27/20 release. Furthermore, auto-casting to\n", "bfloat16 from an FP32 network with a batch size larger than 1 results\n", "in the same error.\n", "\n", "\n", "```bash\n", "\n", "INFO:tensorflow:fusing subgraph neuron_op_a73aed4b95ca5d5b with neuron-cc; log file is at /home/ubuntu/keras_fp16_benchmarking_db/compiler_workdir/neuron_op_a73aed4b95ca5d5b/graph_def.neuron-cc.log\n", " WARNING:tensorflow:Failed to fuse subgraph neuron_op_a73aed4b95ca5d5b with '/home/ubuntu/test_venv/bin/neuron-cc compile /home/ubuntu/keras_fp16_benchmarking_db/compiler_workdir/neuron_op_a73aed4b95ca5d5b/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /home/ubuntu/keras_fp16_benchmarking_db/compiler_workdir/neuron_op_a73aed4b95ca5d5b/graph_def.neff --io-config \"{\\\"inputs\\\": {\\\"input_10/_0:0\\\": [[6, 224, 224, 3], \\\"float16\\\"]}, \\\"outputs\\\": [\\\"probs/Softmax:0\\\"]}\" --batching_en --rematerialization_en --sb_size 120 --spill_dis --enable-replication True'\n", " WARNING:tensorflow:neuron-cc error 
message:\n", " WARNING:tensorflow:01/23/2020 01:15:40 AM ERROR [neuron-cc]:\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: ***************************************************************\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: An Internal Compiler Error has occurred\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: ***************************************************************\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]:\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: Please contact Customer Support and provide the following details.\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]:\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: Error message: Non-zero exit status (134) for command: /home/ubuntu/test_venv/lib/python3.6/site-packages/neuroncc/starfish/bin/list_sch --hhir hh-tr-external-move.json --verbose 0 --sb_size 120 --arith_intensity_target 2300 --sb_watermark_low 0.250000 --sb_watermark_high 0.750000 --sb_size_tol 1 --alloc simple1 --alloc_opt --depth_diff 0.100000 --verbose_start_cycle 0 --tt_dist --mm_meet_cnt 1 --load_speed_factor 0.300000 --schir sch_tmp.json --spill_depth_limit 5 --spill_dis --true_dep --mm_order --batching_en --rematerialization_en\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]:\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: Error class: CompilerInternalError\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: Error location: job.Scheduler.3\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: Command line: /home/ubuntu/test_venv/bin/neuron-cc compile /home/ubuntu/keras_fp16_benchmarking_db/compiler_workdir/neuron_op_a73aed4b95ca5d5b/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /home/ubuntu/keras_fp16_benchmarking_db/compiler_workdir/neuron_op_a73aed4b95ca5d5b/graph_def.neff --io-config '{\"inputs\": {\"input_10/_0:0\": [[6, 224, 224, 3], \"float16\"]}, \"outputs\": [\"probs/Softmax:0\"]}' --batching_en --rematerialization_en --sb_size 120 --spill_dis --enable-replication True\n", " 01/23/2020 01:15:40 AM 
ERROR [neuron-cc]:\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: Internal details:\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: File \"neuroncc/driver/Job.py\", line 207, in neuroncc.driver.Job.runSingleInputFn\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: File \"neuroncc/driver/jobs/Scheduler.py\", line 58, in neuroncc.driver.jobs.Scheduler.Scheduler.runSingleInput\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: File \"neuroncc/driver/Job.py\", line 145, in neuroncc.driver.Job.Job.shellCommand\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]:\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: Version information:\n", " 01/23/2020 01:15:41 AM ERROR [neuron-cc]: Neuron Compiler version 1.0.6632.0+6001610955\n", " 01/23/2020 01:15:41 AM ERROR [neuron-cc]:\n", " 01/23/2020 01:15:41 AM ERROR [neuron-cc]: HWM version 1.0.839.0-6001300654\n", " 01/23/2020 01:15:41 AM ERROR [neuron-cc]: NEFF version 0.6\n", " 01/23/2020 01:15:41 AM ERROR [neuron-cc]: TVM version 1.0.1589.0+6001610955\n", " 01/23/2020 01:15:41 AM ERROR [neuron-cc]: NumPy version 1.16.5\n", " 01/23/2020 01:15:41 AM ERROR [neuron-cc]: MXNet not available\n", " 01/23/2020 01:15:41 AM ERROR [neuron-cc]: TF version 1.15.0\n", " 01/23/2020 01:15:41 AM ERROR [neuron-cc]:\n", "\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "gentle-census", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.9 64-bit", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.9" }, "vscode": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } } }, "nbformat": 4, "nbformat_minor": 5 }