Converting and deploying a PyTorch model tutorial

This tutorial describes how to convert and deploy a PyTorch model using the ML SDK for Vulkan®. To demonstrate each step of the end-to-end workflow, we generate a sample PyTorch model that contains a single MaxPool2D operation.

1. Save the following Python script as MaxPool2DModel.py. The script creates a PyTorch model that performs a single MaxPool2D operation, saves a random model input as a NumPy file, and uses ExecuTorch to convert the model to TOSA FlatBuffers:

#!/usr/bin/env python3
#
# SPDX-FileCopyrightText: Copyright 2024-2025 Arm Limited and/or its affiliates <open-source-office@arm.com>
# SPDX-License-Identifier: Apache-2.0
#
import numpy as np
import torch
import torch.nn as nn
from executorch.backends.arm.arm_backend import ArmCompileSpecBuilder
from executorch.backends.arm.tosa_partitioner import TOSAPartitioner
from executorch.exir import EdgeCompileConfig
from executorch.exir import to_edge_transform_and_lower

# Define a model that contains a single MaxPool2D operation
class MaxPoolModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(x)


# Generate a random NCHW test input and save it as the scenario input
example_input = torch.randn(1, 3, 64, 64)
np.save("input-0.npy", example_input.numpy())

model = MaxPoolModel().eval()

# Target the TOSA 1.0 floating-point profile and dump intermediate
# artifacts, including the .tosa file, to the current directory
compile_spec = (
    ArmCompileSpecBuilder()
    .tosa_compile_spec("TOSA-1.0+FP")
    .dump_intermediate_artifacts_to(".")
    .build()
)
partitioner = TOSAPartitioner(compile_spec)

# Capture the model as an ExportedProgram
exported_program = torch.export.export_for_training(model, (example_input,))

# Lower the program through the TOSA partitioner; the TOSA FlatBuffers
# file is written as a side effect of the artifact dump configured above
to_edge_transform_and_lower(
    exported_program,
    partitioner=[partitioner],
    compile_config=EdgeCompileConfig(
        _check_ir_validity=False,
    ),
)

Run the script:

python MaxPool2DModel.py

The script saves the model input to input-0.npy and writes a TOSA FlatBuffers file named ${NAME}.tosa to the current directory, where ${NAME} is a file name generated by the tool.
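For reference, nn.MaxPool2d(kernel_size=2, stride=2) keeps the maximum of each non-overlapping 2x2 window, so the (1, 3, 64, 64) input produces a (1, 3, 32, 32) output. The following NumPy sketch reproduces that computation and can serve as an expected result later in the workflow. The helper name maxpool2d_2x2 is hypothetical, and the sketch assumes an NCHW array with even height and width and no padding.

```python
import numpy as np


def maxpool2d_2x2(x: np.ndarray) -> np.ndarray:
    """Hypothetical NumPy reference for nn.MaxPool2d(kernel_size=2, stride=2).

    Assumes an NCHW array whose height and width are both even (no padding).
    """
    n, c, h, w = x.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    # Split H into (H/2, 2) and W into (W/2, 2), then take the max of each 2x2 window
    return x.reshape(n, c, h // 2, 2, w // 2, 2).max(axis=(3, 5))


if __name__ == "__main__":
    x = np.random.randn(1, 3, 64, 64).astype(np.float32)
    print(maxpool2d_2x2(x).shape)  # (1, 3, 32, 32)
```

Splitting each spatial axis into pairs turns the pooling into a single reduction, which avoids explicit loops over windows.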

2. Convert the TOSA FlatBuffers file into a VGF file:

model-converter --input ${NAME}.tosa --output maxpool.vgf

Note

For more information about the ML SDK Model Converter, see: ML SDK Model Converter.
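Because ${NAME} is chosen by the tool, it can help to locate the generated file programmatically before calling the converter. The following Python sketch is one way to script this step. It assumes that model-converter is on your PATH and that the directory contains exactly one .tosa file; the helper names find_tosa_file, model_converter_cmd, and run_model_converter are hypothetical, and only the --input and --output options shown above are used.

```python
import subprocess
from pathlib import Path


def find_tosa_file(directory: str = ".") -> Path:
    """Return the only .tosa file in `directory`; its name is chosen by the tool."""
    candidates = sorted(Path(directory).glob("*.tosa"))
    if len(candidates) != 1:
        raise FileNotFoundError(
            f"expected exactly one .tosa file in {directory!r}, found {len(candidates)}"
        )
    return candidates[0]


def model_converter_cmd(tosa_path: Path, vgf_path: str = "maxpool.vgf") -> list[str]:
    """Build the model-converter command line used in this step."""
    return ["model-converter", "--input", str(tosa_path), "--output", vgf_path]


def run_model_converter(directory: str = ".") -> None:
    """Convert the generated .tosa file to maxpool.vgf (assumes model-converter is on PATH)."""
    cmd = model_converter_cmd(find_tosa_file(directory))
    # check=True raises CalledProcessError if the converter exits with a non-zero status
    subprocess.run(cmd, check=True)
```

Calling run_model_converter() from the directory where MaxPool2DModel.py ran performs the same conversion as the command above. Passing the command as a list avoids shell quoting issues with the generated file name.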

3. Use the VGF Dump Tool to generate a scenario template. The ML SDK Scenario Runner requires a scenario specification in the form of a JSON file. Pass the VGF file generated in the previous step to the VGF Dump Tool:

vgf_dump --input maxpool.vgf --output scenario.json --scenario-template

Note

For more information about the VGF Library and the VGF Dump Tool, see: ML SDK VGF Library.
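Before editing the template, you can list the placeholders it contains by scanning the parsed JSON. This Python sketch assumes only that the placeholders are JSON string values beginning with TEMPLATE_, as TEMPLATE_PATH_TENSOR_INPUT_0 and TEMPLATE_PATH_TENSOR_OUTPUT_0 in the next step do; it makes no other assumptions about the scenario schema. The function names find_placeholders and list_placeholders are hypothetical.

```python
import json
from typing import Any


def find_placeholders(node: Any, prefix: str = "TEMPLATE_") -> list[str]:
    """Recursively collect JSON string values that start with `prefix`."""
    found: list[str] = []
    if isinstance(node, str):
        if node.startswith(prefix):
            found.append(node)
    elif isinstance(node, dict):
        for value in node.values():
            found.extend(find_placeholders(value, prefix))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_placeholders(item, prefix))
    return found


def list_placeholders(path: str = "scenario.json") -> list[str]:
    """Load a scenario file and return its distinct placeholders, sorted."""
    with open(path, encoding="utf-8") as f:
        return sorted(set(find_placeholders(json.load(f))))
```

Calling list_placeholders() after running vgf_dump shows which names must be replaced in the next step.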

4. The generated scenario.json file contains placeholder names for the input and output bindings of the scenario. Replace these placeholders with the actual input and output filenames to use when running the scenario. In the scenario.json file generated in the preceding step:

    1. Replace the name TEMPLATE_PATH_TENSOR_INPUT_0 with the actual input filename, input-0.npy.

    2. Replace the name TEMPLATE_PATH_TENSOR_OUTPUT_0 with the actual output filename, output-0.npy.

Note

For more information about the test description format, see: JSON Test Description Specification.
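The substitutions above can also be scripted. This Python sketch performs a plain text replacement on scenario.json rather than parsing and re-serializing it, which leaves the rest of the file's formatting untouched. It assumes that each placeholder appears verbatim in the file and raises an error if one is missing; the names REPLACEMENTS, fill_placeholders, and fill_scenario_file are hypothetical.

```python
from pathlib import Path

# Placeholder -> filename mapping taken from the substitutions listed in this step
REPLACEMENTS = {
    "TEMPLATE_PATH_TENSOR_INPUT_0": "input-0.npy",
    "TEMPLATE_PATH_TENSOR_OUTPUT_0": "output-0.npy",
}


def fill_placeholders(text: str, replacements: dict[str, str]) -> str:
    """Replace every occurrence of each placeholder; raise if one is absent."""
    for placeholder, filename in replacements.items():
        if placeholder not in text:
            raise KeyError(f"placeholder {placeholder!r} not found")
        text = text.replace(placeholder, filename)
    return text


def fill_scenario_file(path: str = "scenario.json") -> None:
    """Rewrite the scenario file in place with the placeholders filled in."""
    scenario = Path(path)
    updated = fill_placeholders(scenario.read_text(encoding="utf-8"), REPLACEMENTS)
    scenario.write_text(updated, encoding="utf-8")
```

Raising on a missing placeholder makes it obvious when the template was already edited or uses different names than expected.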

5. Run the ML SDK Scenario Runner on the Emulation Layer:

scenario-runner --scenario scenario.json

The scenario writes its output to output-0.npy, the output filename specified in scenario.json.
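To check the result, you can compare output-0.npy against a reference max pooling of input-0.npy. This Python sketch computes the reference with NumPy's sliding_window_view (NumPy 1.20 or later), which also works for other kernel sizes and strides without padding. It assumes that the Scenario Runner writes the output in the same NCHW layout as the input and that a tolerance of 1e-5 suits FP32 results; the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def reference_maxpool2d(x: np.ndarray, kernel: int = 2, stride: int = 2) -> np.ndarray:
    """NumPy stand-in for nn.MaxPool2d on an NCHW array (no padding, floor mode)."""
    # Shape (N, C, H - kernel + 1, W - kernel + 1, kernel, kernel)
    windows = sliding_window_view(x, (kernel, kernel), axis=(2, 3))
    # Keep every `stride`-th window along H and W, then take each window's maximum
    return windows[:, :, ::stride, ::stride].max(axis=(-2, -1))


def outputs_match(x: np.ndarray, actual: np.ndarray, atol: float = 1e-5) -> bool:
    """Compare a runner output with the reference; assumes both are NCHW."""
    expected = reference_maxpool2d(x)
    return actual.shape == expected.shape and bool(np.allclose(actual, expected, atol=atol))


def check_files(input_path: str = "input-0.npy", output_path: str = "output-0.npy") -> bool:
    """Load the scenario input and output files and compare them."""
    return outputs_match(np.load(input_path), np.load(output_path))
```

If check_files() returns False because the shapes differ, check which tensor layout the VGF file expects before suspecting the values.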

Note

For more information about building and running the ML SDK Scenario Runner, see: ML SDK Scenario Runner.

For more information about building and setting up the Emulation Layer, see: ML Emulation Layer for Vulkan®