Continuously Train, Optimize, and Deploy TensorFlow AI Models

The artifacts generated by TensorFlow AI model training are a bit more complex - and flexible - than traditional ML artifacts.  In this post, I describe the key components to saving, restoring, and optimizing TensorFlow AI models.

The goal of training an ML or AI model is ultimately to serve the model for prediction, aka inference.  Therefore, restoring, optimizing, and deploying a trained model are just as important as the training itself.  Additionally, once a model is trained, it can usually be simplified and optimized specifically for serving.

The faster we can retrain and push new models to production, the faster we can adapt to real-time trends and maximize our model's predictive power.  The holy grail of machine learning and artificial intelligence is continuous model training, optimization, and deployment within a real-time environment using live production data.

In this post, we describe how to achieve this holy grail!

Model Architecture

Neurons and layers make up the model architecture of a neural network.  The number of neurons within a layer - and number of layers - are chosen upfront.  The choices are somewhat arbitrary and usually based on empirical testing.  And while the model architecture can technically change during training, this is not common.
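To make the "chosen upfront" point concrete, here is a minimal sketch in plain NumPy (illustrative only, not the TensorFlow API): once the layer sizes are picked, the shape of every weight matrix and bias vector follows directly from those choices.

```python
import numpy as np

# Layer sizes are chosen upfront -- somewhat arbitrarily, then tuned empirically.
layer_sizes = [4, 8, 8, 3]  # input, two hidden layers, output

# The architecture fully determines the shape of every weight matrix and bias.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

print([w.shape for w in weights])  # [(4, 8), (8, 8), (8, 3)]
```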

Model Training

Once a model architecture is chosen, we train the model using labeled training data.  We're training the edge weights between the neurons and layers in our network: starting from initially-random weights, we adjust them based on known inputs and their known outputs (aka labels).

Similar to traditional machine learning, we usually hold out 15-20% of the labeled training data to test/validate our model after training.  If the model doesn't do well on this held-out test data, we should modify our hyperparameters - including the model architecture!
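The holdout step above can be sketched in a few lines of plain Python (the `holdout_split` helper is illustrative, not a TensorFlow API):

```python
import random

def holdout_split(examples, holdout_fraction=0.2, seed=42):
    """Shuffle labeled examples and split them into train and test sets."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * holdout_fraction)
    return shuffled[n_test:], shuffled[:n_test]

examples = [(i, i % 2) for i in range(100)]  # (features, label) pairs
train, test = holdout_split(examples, holdout_fraction=0.2)
print(len(train), len(test))  # 80 20
```

The model only ever sees `train` during training; `test` is reserved for measuring how well the learned weights generalize.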

Model Saving

A saved TensorFlow model consists of several distinct components:

  • Graph:  the model architecture (the structure of layers and neurons)

  • Variables:  the learned weights and biases, stored separately from the graph in checkpoint files

  • Saver V1, V2:  the two checkpoint formats used to store variables (V2 is the newer format)

  • Signature:  the model's inputs and outputs (like a method signature)
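To make the separation of these components concrete, here is a toy sketch (plain Python and JSON, not TensorFlow's actual file formats) that writes the graph, the variables, and the signature as separate artifacts, the way a checkpoint keeps variables apart from the graph definition:

```python
import json
import tempfile
from pathlib import Path

# Toy stand-ins for the saved components (illustrative only):
graph = {"layers": [{"name": "dense_1", "units": 8}, {"name": "out", "units": 3}]}
variables = {"dense_1/kernel": [[0.1] * 8] * 4, "out/kernel": [[0.2] * 3] * 8}
signature = {"inputs": {"x": "float32[?,4]"}, "outputs": {"probs": "float32[?,3]"}}

# Each component lands in its own file under the export directory.
export_dir = Path(tempfile.mkdtemp())
for name, blob in [("graph", graph), ("variables", variables), ("signature", signature)]:
    (export_dir / f"{name}.json").write_text(json.dumps(blob))

saved = sorted(p.name for p in export_dir.iterdir())
print(saved)  # ['graph.json', 'signature.json', 'variables.json']
```

Keeping the pieces separate is what makes the artifacts flexible: the same graph can be restored with different variable checkpoints, and the signature tells a serving system what to feed in and what to read out.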







Model Optimizing

Running a graph optimization tool, such as TensorFlow's optimize_for_inference script, on your trained model will strip many training-specific nodes from the graph.  This helps to improve inference performance.  You can reference the accompanying code for example usage.

Nodes and Node Attributes Not Needed for Inference

  • Devices:  GPUs and other worker types used during training may not be available in the deployment environment.  Passing clear_devices=True strips the device information from the graph.
  • Dropout nodes:  only used during training.
  • Removing training-only operations like checkpoint saving.
  • Stripping out parts of the graph that are never reached.
  • Removing debug operations like CheckNumerics.
  • Folding batch normalization ops into the pre-calculated weights.
  • Fusing common operations into unified versions.
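Stripping out unreached parts of the graph is essentially a reachability walk backwards from the output node through each node's inputs.  A toy sketch of the idea (illustrative only, not TensorFlow's actual graph-transform code):

```python
def prune_unreached(graph, output):
    """Keep only nodes reachable from `output` by walking input edges.

    `graph` maps node name -> list of input node names.
    """
    keep, stack = set(), [output]
    while stack:
        node = stack.pop()
        if node in keep:
            continue
        keep.add(node)
        stack.extend(graph.get(node, []))
    return {name: inputs for name, inputs in graph.items() if name in keep}

graph = {
    "input": [],
    "dense": ["input"],
    "softmax": ["dense"],
    "checkpoint_save": ["dense"],  # training-only op; not reachable from the output
    "check_numerics": ["dense"],   # debug op; not reachable from the output
}
pruned = prune_unreached(graph, "softmax")
print(sorted(pruned))  # ['dense', 'input', 'softmax']
```

Anything the output never depends on - checkpoint saving, debug checks, training-only plumbing - falls away, which is exactly why the optimized graph is smaller and faster to serve.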

Model Inference

Inference is a fancy synonym for prediction.  In a neural network context, this is simply a forward pass through your network.  Inputs are fed into the network.  Outputs are the predictions or classifications produced by the model architecture (i.e. layers, neurons) and the parameters (i.e. weights, biases) learned during the training phase.
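A forward pass is just matrix multiplications and activations applied layer by layer.  A minimal NumPy sketch (illustrative; real serving would execute the saved TensorFlow graph):

```python
import numpy as np

def forward(x, weights, biases):
    """One forward pass: inputs in, class probabilities out."""
    a = x
    for w, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(0.0, a @ w + b)        # hidden layers: ReLU activation
    logits = a @ weights[-1] + biases[-1]     # output layer
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)   # softmax probabilities

# Toy learned parameters: 4 inputs -> 8 hidden units -> 3 classes.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 8)), rng.standard_normal((8, 3))]
biases = [np.zeros(8), np.zeros(3)]

probs = forward(rng.standard_normal((5, 4)), weights, biases)
print(probs.shape)  # (5, 3): one probability distribution per input row
```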


