Deploying a PyTorch Model Locally

Allan Graves
10 min read · Dec 1, 2020

So, you’ve trained your amazing new AI neural net. It correctly picks stock prices and ensures that you can make millions of dollars.

But how do you actually use this thing?

Using a trained neural net model is called inference. Every run, meaning data applied to the input of the model and an output coming out of the model, is one inference. When we have a trained neural net model, we don’t generally update the neuron weights during an inference. (Note: model inference is also known as prediction, serving, and model evaluation.)

A model can be composed of one or more neural nets. This is where it gets a little tricky: one or more of those neural nets might still be updating its weights to keep learning. You never know.

In this article, I will use "model" to mean the system comprising the entire set of AI magic: every neural net and all of the code in the system. A single neural net is just one neural net inside that model. We’ll discuss how to finalize the model and deploy it locally in Python.

Sounds easy, right? Zoom! Let’s get going!

Photo by Benjamin Sow on Unsplash

This photo has no relation to neural nets; I just searched for "zoom" and it came up. It was one of the few results not related to Zoom video conferencing. Also, I would love to visit here.

Let’s start with a quick review of a neural net. We’ll use the DCGAN we trained earlier, in the previous articles I’ve written. Here’s a link to that article: https://allangraves.medium.com/gans-on-the-azure-ml-sea-part-1-e3af65061900

The 5 layers of our DCGAN, translating an input vector into a 3-channel image

As a quick review, we’ve got a 5-layer neural net that translates a single length-100 vector (z) into a 3-channel, 64x64 image. Each step is a simple transposed convolution.

We set up two neural nets and had them train each other. One of them was called the Discriminator; its job was to check the output of the Generator. The Generator was the neural net that did all the work, trying to create new faces. The Discriminator basically told the Generator whether it thought the current image was real or not. The Generator just kept on trying.

To deploy this model, we only need to save and deploy our Generator. That’s not to say there isn’t value in saving our Discriminator: if we ever wanted to retrain the model later, we could easily pick up where we left off without retraining the Discriminator from scratch. You could also use the Discriminator as a final check on the Generator’s output before showing the picture to the user.

At every step of the training process, we made a call:

optimizerG.step()

This was the actual call that worked through the neural network and updated the parameters. If it’s been a while since you worked with neural nets, remember, we have two passes:

  • Forward — this pass takes input and calculates an output
  • Backward — this pass steps back through the neural network, computing a gradient for each parameter. This is what makes training possible; without it (and the optimizer step that uses those gradients), the neural network doesn’t learn.

The ‘step’ call above is the call in PyTorch that actually updates the parameters in the model.

The value that it uses was computed elsewhere:

# Calculate G's loss based on this output
errG = criterion(output, label)
# Calculate gradients for G
errG.backward()

This value comes from a loss function: for the Generator, we compare the Discriminator’s output on the generated images against the "real" label. The backward call computes fresh gradients from that error, and the ‘step’ function then uses those gradients to update the Generator.
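For context, here is a minimal sketch of that update sequence for one Generator step. The variable names (netD, netG, fake, label, criterion, optimizerG) are assumed from the standard DCGAN training loop, so treat this as illustrative rather than the exact training script:

netG.zero_grad()                # clear any gradients left over from the last step
label.fill_(1.)                 # the Generator wants the Discriminator to say "real"
output = netD(fake).view(-1)    # the Discriminator's opinion of the generated batch
# Calculate G's loss based on this output
errG = criterion(output, label)
# Calculate gradients for G
errG.backward()
# Use those gradients to update G's parameters
optimizerG.step()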

So, how do we save this?

In PyTorch, a neural net’s learnable state is just a dictionary, called the state dictionary (state_dict). (It’s Python, so pretty much everything useful is a dictionary.)
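If you want to look at it yourself, here is a quick sketch for dumping it, assuming netG and optimizerG are the trained Generator and its optimizer from the training script:

# Print the name and shape of every tensor in the Generator's state_dict
print("Generator's state dictionary:")
for param_tensor in netG.state_dict():
    print(param_tensor, netG.state_dict()[param_tensor].size())

# The optimizer keeps a state dictionary too (step counts, moving averages, ...)
print("Optimizer's state dictionary:")
for var_name in optimizerG.state_dict():
    print(var_name, optimizerG.state_dict()[var_name])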

After a quick run on my machine, this is what we see for the state dictionary:

Generator’s state dictionary:
main.0.weight torch.Size([100, 512, 4, 4])
main.1.weight torch.Size([512])
main.1.bias torch.Size([512])
main.1.running_mean torch.Size([512])
main.1.running_var torch.Size([512])
main.1.num_batches_tracked torch.Size([])
main.3.weight torch.Size([512, 256, 4, 4])
main.4.weight torch.Size([256])
main.4.bias torch.Size([256])
main.4.running_mean torch.Size([256])
main.4.running_var torch.Size([256])
main.4.num_batches_tracked torch.Size([])
main.6.weight torch.Size([256, 128, 4, 4])
main.7.weight torch.Size([128])
main.7.bias torch.Size([128])
main.7.running_mean torch.Size([128])
main.7.running_var torch.Size([128])
main.7.num_batches_tracked torch.Size([])
main.9.weight torch.Size([128, 64, 4, 4])
main.10.weight torch.Size([64])
main.10.bias torch.Size([64])
main.10.running_mean torch.Size([64])
main.10.running_var torch.Size([64])
main.10.num_batches_tracked torch.Size([])
main.12.weight torch.Size([64, 3, 4, 4])

For each of the items in it, you could drill down and dump the values even further. You would eventually see the raw numbers from the current training run. Since every training run is different, those numbers are merely one potential solution for the result.

Similarly, for the Optimizer, you can dump it, and see pages and pages of output for its current state:

Optimizer’s state dictionary:
state {0: {'step': 7915, 'exp_avg': tensor([[[[-8.4379e-03, -1.6962e-03, 2.0153e-03, -2.3263e-03],
[ 9.3564e-03, 2.8464e-03, -1.4765e-03, -5.5706e-03],
[-5.1847e-03, -2.0944e-03, 3.0152e-03, 1.6600e-03],
[ 3.3945e-03, 2.1832e-03, -7.7789e-03, 1.3872e-03]],
[[ 1.3178e-04, 5.9188e-04, -3.4215e-03, 2.8209e-03],
[ 3.4224e-03, 6.9770e-03, 5.4208e-05, 3.6286e-03],
[-1.9070e-03, -3.6697e-03, 7.6668e-04, 6.0801e-03],
[-8.1516e-04, 8.2781e-04, -1.6558e-03, -1.4505e-03]],
[[-3.2616e-03, 1.8192e-04, 4.1123e-03, -2.3757e-03],
[ 2.2800e-04, 2.8144e-03, -4.4250e-03, 1.5944e-03],
[ 2.1202e-03, -1.6148e-03, 1.3747e-03, 2.0928e-03],
[ 9.3368e-04, 3.6862e-03, -2.0053e-03, 2.0065e-03]],
…,[[-4.8926e-03, -1.9031e-03, -6.0549e-04, -3.5536e-03],
[ 1.8643e-03, 1.0315e-04, -4.2139e-04, 5.9912e-04],
[-6.5301e-03, -3.1602e-03, -2.5134e-03, 2.4062e-04],
[-6.7217e-03, -2.8157e-03, -4.1863e-03, 6.1009e-04]],
[[-4.3491e-03, -4.4880e-03, 2.9170e-03, -1.8324e-03],
[ 5.1785e-03, -2.6050e-03, -7.0698e-04, 6.6632e-04],
[-2.8962e-03, -3.9267e-03, -1.3776e-05, 3.3616e-03],
[-2.6761e-03, -1.0645e-02, 4.2122e-04, -6.2535e-04]],
[[ 1.4652e-04, 4.9615e-03, -3.5904e-03, -2.4742e-03],

For more on this, see: https://pytorch.org/tutorials/beginner/saving_loading_models.html

We essentially have 3 possibilities for saving things:

  • Save the state dictionary — this saves the parameters. If you want to keep training later, save the optimizer for each net as well. For a DCGAN, that means saving the Discriminator, the Generator, and their respective optimizers. (You would still need some way to save the batch number, epoch, and other training checkpoints.) According to the PyTorch documentation, this is the recommended method for inference. This method is best if you are going to run inference from PyTorch in Python, or want to compare different models later on. I expect that sometime after PyTorch v1.6 the rest of the ecosystem will catch up with this method.
  • Next, we have the save of the entire model. This serializes the whole model object to disk using Python’s pickle. The pickled model is bound to the exact classes and directory structure used when it was saved, so if you later move things around, it can easily break. With Azure, this is best if you are going to register the model, download it, and deploy it elsewhere using PyTorch Android, ONNX, etc.
  • Lastly, we have a checkpoint. This one is handy for resuming training later: it saves any parameters you tell it to in a single dictionary, so you can load them back later. When working with Azure, this is best if you plan to continue training later and are just taking a break. (For instance, you’re going to hit your 21 day limit on your pay-as-you-go Azure account.)

The best method for you will depend on exactly what you are attempting to do!
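For reference, here are hedged sketches of all three approaches. The file names and the epoch/loss variables are placeholders, and the net/optimizer names are the ones from our training script:

import torch

# 1. Save only the state dictionaries (recommended for inference).
torch.save(netG.state_dict(), 'generator_state.pt')
torch.save(optimizerG.state_dict(), 'optimizerG_state.pt')
# Loading requires an instance of the Generator class first:
#   netG = Generator(ngpu)
#   netG.load_state_dict(torch.load('generator_state.pt'))
#   netG.eval()

# 2. Save the entire model with pickle (what we'll do below).
torch.save(netG, 'dcgan.pt')

# 3. Save a checkpoint with whatever you need to resume training.
torch.save({
    'epoch': epoch,
    'netG_state_dict': netG.state_dict(),
    'netD_state_dict': netD.state_dict(),
    'optimizerG_state_dict': optimizerG.state_dict(),
    'optimizerD_state_dict': optimizerD.state_dict(),
    'lossG': errG,
}, 'checkpoint.tar')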

So, for now, let’s go ahead and save and download the files. We’ll add the following to a copy of our training script (dcgan_azure-cuda-save.py):

torch.save(model, os.path.join(args.output, 'dcgan.pt'))

That will save our model in full pickle format, and place it in the output directory.

Now, we could add a call to our driver script to run run.download_file and download the model automatically. I don’t feel like leaving my terminal waiting, though; that’s really only good for small projects. If your project takes days or weeks to train, you don’t want to risk killing the driver script.

So, instead, we’ll use run.get_file_names() to list all the files we created, and then use the Azure Storage Explorer to download that file.

Another way of doing it would be to store the run id and have a separate script download the model from the run ids. This might be easier if you’re doing hyperparameterization and have hundreds of runs. You could easily download the best run (for an example of finding the best run, see https://allangraves.medium.com/finding-waldo-with-hyperparameterization-29b6dbfb1888).

for file in run.get_file_names():
    print("File:" + str(file))

This code will print out the file list at the end of our run.
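If you do go the run-id route instead, a rough sketch with the Azure ML SDK might look like this (the workspace config, experiment name, and run id are placeholders):

from azureml.core import Workspace, Experiment, Run

ws = Workspace.from_config()                        # reads your workspace's config.json
exp = Experiment(workspace=ws, name='dcgan-faces')  # experiment name is a placeholder
run = Run(exp, run_id='<your run id>')              # the run id you stored earlier

# Pull the pickled model out of the run's outputs folder
run.download_file(name='outputs/dcgan.pt', output_file_path='nets/dcgan.pt')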

So here’s a fun time to discover something. I had been passing the same dataset blob location to the training script as its output location, wanting to put all the output in a single place.

However, this led to the following error:

OSError: [Errno 30] Read-only file system: '/tmp/tmpbgzzc7lr/real.png'

This had me digging into some conventions in Azure. I learned that there are 2 special folders:

  • ./logs — these logs are uploaded in real time, so you can view them right from the portal while the run is going, just like the other Azure default logs.
  • ./outputs — these files are added to the run as artifacts, so they get logged as part of your experiment history.

Which means that we want to log to the ./outputs directory, not to the ../outputs directory: the former is on the writeable blob storage, while the latter is on the read-only dataset storage.
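In the training script, that just means writing anything we want to keep under the relative ./outputs path; a minimal sketch (netG being our trained Generator):

import os
import torch

output_dir = './outputs'                 # special Azure ML folder, uploaded as run artifacts
os.makedirs(output_dir, exist_ok=True)   # harmless if the folder already exists
torch.save(netG, os.path.join(output_dir, 'dcgan.pt'))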

After solving that issue, we get the following under our ‘Outputs + logs’ folder in our latest run:

Cool — there’s our finalized model from PyTorch!

And of course, the list of files we created during the run:

File:azureml-logs/55_azureml-execution-tvmps_bdf2d5554aaea8ecb28f826c58cd5da88f16c5a9a4b27fecfef7edda79dcb6c8_d.txt
File:azureml-logs/65_job_prep-tvmps_bdf2d5554aaea8ecb28f826c58cd5da88f16c5a9a4b27fecfef7edda79dcb6c8_d.txt
File:azureml-logs/70_driver_log.txt
File:azureml-logs/process_info.json
File:azureml-logs/process_status.json
File:logs/azureml/97_azureml.log
File:logs/azureml/dataprep/backgroundProcess.log
File:logs/azureml/dataprep/backgroundProcess_Telemetry.log
File:logs/azureml/dataprep/engine_spans_50c08491-8d59-4cf0-81f6-43aac45d3e6c.jsonl
File:logs/azureml/dataprep/engine_spans_cb744712-614b-4295-9815-5cbdaac9fc46.jsonl
File:logs/azureml/dataprep/python_span_50c08491-8d59-4cf0-81f6-43aac45d3e6c.jsonl
File:logs/azureml/dataprep/python_span_cb744712-614b-4295-9815-5cbdaac9fc46.jsonl
File:logs/azureml/job_prep_azureml.log
File:outputs/real.png
File:outputs/sample0.png
File:outputs/sample320.png
File:outputs/sample365.png
File:outputs/sample410.png
File:outputs/sample455.png
File:outputs/sample500.png
File:outputs/sample544.png

At this point, we could create a new run object and download the files using python, or we could download the model manually.

If you click the ‘…’ button next to dcgan.pt, you can select Download.

Let’s create a new directory and copy that file into it — I’ll call my directory ‘nets’.

When we use the model in inference mode, we’ll use it from that directory.

Okay, so now that we’ve trained a new neural net and downloaded it, let’s put it in our Python directory. To do this in WSL, open a new Windows Terminal and copy the file from your Downloads directory.

cp /mnt/c/Users/<user id>/Downloads/dcgan.pt nets/

This command copies the dcgan.pt to our nets directory.

This script is going to be pretty simple — we’ll do the following:

  • Add in our Generator structure
  • Set up the variables that the model needs to run
  • Load the model
  • Send a vector into the model
  • Write the resulting picture out as a file

We’ll call this new program ‘run-dcgan.py’.

The call to load our module is simple:

netG = torch.load(args.net_path)

This results in

AttributeError: Can't get attribute 'Generator' on <module '__main__' from 'run-dcgan.py'>

The module unpickler can’t tell what the class is — it knows it should find a Generator class, but our new program doesn’t have this class in it yet.

So let’s add that class in to our script:

class Generator(nn.Module):
    def __init__(self, ngpu):
        super(Generator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            <snip>
            nn.Tanh()
            # state size. (nc) x 64 x 64
        )

See the source for the full definition.
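If you don’t have it handy, here is a filled-out version consistent with the state dictionary dumped above. It follows the standard DCGAN tutorial layout, with nz=100, ngf=64, and nc=3 assumed:

import torch.nn as nn

nz, ngf, nc = 100, 64, 3  # latent size, feature-map size, output channels (assumed)

class Generator(nn.Module):
    def __init__(self, ngpu):
        super(Generator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. (ngf) x 32 x 32
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh()
            # state size. (nc) x 64 x 64
        )

    def forward(self, input):
        return self.main(input)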

Once we’ve done this, we can try and run it again.

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

Boo! Runtime errors! This one at least has a nice error message. It turns out that the tensors that make up our neural net are also tied to the particular device they were trained on. So, we’ll add a mapping from the GPU to the CPU for this run:

device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")
netG = torch.load(args.net_path, map_location=device)

This tells torch.load to map anything stored on a GPU onto the CPU. That will let us run the neural net anywhere, because it is a very simple net.

For more complicated nets, even an inference might be run on a GPU (or more than one!). You might also choose that route if you needed to make inferences many, many times a second.

Lastly, we need to tell PyTorch that we aren’t training this model. Calling eval() switches layers like batch norm and dropout into inference behavior instead of training behavior.

netG.eval()

Is the magic call we use here.
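It’s also common (though strictly optional here) to wrap the forward pass itself in torch.no_grad(), which tells autograd not to track operations at all and saves memory; fixed_noise here is the input vector we create next:

netG.eval()                 # switch layers like batch norm into inference mode
with torch.no_grad():       # don't build a gradient graph for this forward pass
    fake = netG(fixed_noise)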

Once that’s done — we just need to pass the input vector, and we get a picture!

To pass the input vector, we’ll just use a fixed vector of random noise:

fixed_noise = torch.randn(64, nz, 1, 1, device=device)

And finally, to force our generator to create a set of images:

# generate a batch of fake images from the fixed_noise input
fake = netG(fixed_noise)
# put some better borders between the images in our output - we'll have a grid of 64 of them
output = vutils.make_grid(fake, padding=2, normalize=True)
# save the output in faces.png
vutils.save_image(output, os.path.join(args.output, "faces.png"), normalize=True)
A single image
A full grid of them!

And that’s that!

We have a full DCGAN implementation.
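Putting it all together, a rough skeleton of run-dcgan.py might look like this. The argument names and defaults are assumptions, and the Generator class is the one defined above:

import argparse
import os

import torch
import torchvision.utils as vutils

# ... Generator class definition goes here (see above) ...

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--net_path', default='nets/dcgan.pt')  # pickled Generator
    parser.add_argument('--output', default='.')                # where to write faces.png
    args = parser.parse_args()

    nz, ngpu = 100, 0  # latent vector length and GPU count, assumed to match training
    device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")

    netG = torch.load(args.net_path, map_location=device)  # remap GPU tensors to the CPU
    netG.eval()

    fixed_noise = torch.randn(64, nz, 1, 1, device=device)  # 64 random input vectors
    with torch.no_grad():
        fake = netG(fixed_noise)

    output = vutils.make_grid(fake, padding=2, normalize=True)
    vutils.save_image(output, os.path.join(args.output, "faces.png"), normalize=True)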

Code: https://dev.azure.com/allangraves/_git/Public%20Azure%20ML

