Vulkan® Custom Shader Conversion to VGF

Overview

A TOSA custom operation is interpreted as a Vulkan® compute shader and converted into a VGF representation when all of the following conditions are met:

  • Domain Name (domain_name) is set to:

    "com.arm.VulkanCustomShader"
    
  • Operator Name (operator_name) encodes the logical name of the shader.

  • Implementation Attributes (implementation_attrs) follow the schema defined in:

    Vulkan® Shader in TOSA Custom Operation Attributes

When these conditions are satisfied, the custom operation is treated as a Vulkan® compute shader node during VGF generation.
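
The three conditions above can be expressed as a simple predicate. The following is a minimal sketch, assuming the custom operation is available as a plain Python dictionary with the same keys as the Minimal Example in this document. The function name is_vulkan_shader_op is hypothetical, and the sketch only checks that implementation_attrs is a JSON object; it does not validate the attributes against the schema.

```python
VULKAN_SHADER_DOMAIN = "com.arm.VulkanCustomShader"


def is_vulkan_shader_op(op):
    """Return True if a custom op (a plain dict) meets the three conditions.

    Hypothetical helper: the dict layout is an assumption for illustration.
    """
    # Condition 1: the domain name must match exactly.
    if op.get("domain_name") != VULKAN_SHADER_DOMAIN:
        return False
    # Condition 2: the operator name carries the logical shader name.
    name = op.get("operator_name")
    if not isinstance(name, str) or not name:
        return False
    # Condition 3 (shape only): attributes must be a JSON object.
    # Full schema validation is intentionally left out of this sketch.
    return isinstance(op.get("implementation_attrs"), dict)
```

An operation that fails any of these checks would fall through to the default custom operation path.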

This section defines both the semantic contract and the partitioning behavior required to correctly lower mixed ML and shader models.

Required TOSA Fields

Domain Name (domain_name)

Must be:

com.arm.VulkanCustomShader

This value acts as the feature gate that enables Vulkan® shader lowering.

Operator Name (operator_name)

Type: string

Encodes the logical name of the shader.

This name is used to identify the shader during conversion and may be used for:

  • Debug labeling

  • Pipeline identification

  • Symbol naming in generated artifacts

Implementation Attributes (implementation_attrs)

Type: JSON object

Must conform to the schema defined in:

docs/source/vkshader-custom-op-encoding.rst

This schema defines:

  • Entry point

  • Workgroup sizes

  • Shader source or SPIR-V™ binary

  • Descriptor bindings

  • Resource formats

  • Push constant layout

Model Partitioning

ML models are represented as directed acyclic graphs (DAGs), where nodes are operations and edges represent data dependencies.

Models may contain a mix of:

  • Regular ML operations

  • Shader custom operations (as defined above)

Vulkan® execution constraints require that:

  • ML workloads cannot be dispatched in the same pipeline call as shader workloads.

  • Neural Engine accelerators and classical GPU engines must be scheduled through separate Vulkan® pipelines.

Therefore, before VGF generation, the model is partitioned into sub-models that can be executed independently.

Partitioning Strategy

The partitioning follows these principles:

  • Each shader custom operation forms its own partition.

  • Regular ML operations are grouped into ML partitions according to where they occur:

      - before the first shader custom operation,

      - between two shader custom operations, or

      - after the last shader custom operation.

  • Connections crossing partition boundaries become explicit inputs and outputs of the generated sub-models.

This transforms a single model DAG into a sequence (or DAG) of sub-graphs, each mapped to an independent Vulkan® pipeline.
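
To illustrate how boundary-crossing connections become sub-model inputs and outputs, here is a minimal sketch. It assumes a partition assignment already exists as a mapping from operation name to partition_id, that edges are (producer, consumer) pairs, and that each operation produces a single tensor identified by the producer's name. The function name partition_boundaries and this data layout are hypothetical.

```python
from collections import defaultdict


def partition_boundaries(partition_of, edges):
    """Collect the tensors that enter and leave each partition.

    partition_of: dict mapping op name -> partition_id (assumed given).
    edges: iterable of (producer_op, consumer_op) pairs.
    """
    inputs = defaultdict(set)
    outputs = defaultdict(set)
    for src, dst in edges:
        p_src, p_dst = partition_of[src], partition_of[dst]
        if p_src != p_dst:
            # The tensor produced by src leaves its partition...
            outputs[p_src].add(src)
            # ...and becomes an explicit input of the consumer's partition.
            inputs[p_dst].add(src)
    return dict(inputs), dict(outputs)


# Example: an ML op feeding a shader op feeding another ML op.
example_partitions = {"conv": 0, "blur_shader": 1, "relu": 2}
example_edges = [("conv", "blur_shader"), ("blur_shader", "relu")]
sub_inputs, sub_outputs = partition_boundaries(example_partitions, example_edges)
```

In this example, partition 1 (the shader) takes the tensor produced by conv as its input and exposes its own result as an output consumed by partition 2.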

Execution Correctness

Partitioning preserves the semantics of the original model by:

  • Respecting all data dependencies.

  • Ensuring that an operation executes only after its inputs are produced.

  • Introducing synchronization between partitions where required (e.g., memory barriers and cache invalidation).

A partition cannot execute until all partitions producing its required inputs have completed.

This guarantees ordering correctness while allowing execution on heterogeneous GPU engines.
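
The ordering rule can be sketched as a topological sort over partitions. The example below is illustrative only: it assumes the same hypothetical inputs as a name-to-partition_id mapping and a list of (producer, consumer) edges, and it uses Python's standard graphlib module (Python 3.9 or later). It derives an order only; it does not model Vulkan® synchronization primitives.

```python
from graphlib import TopologicalSorter


def partition_schedule(partition_of, edges):
    """Return partition ids in an order where producers run before consumers."""
    # Map each partition to the set of partitions it depends on.
    deps = {pid: set() for pid in set(partition_of.values())}
    for src, dst in edges:
        p_src, p_dst = partition_of[src], partition_of[dst]
        if p_src != p_dst:
            deps[p_dst].add(p_src)
    # static_order() yields a node only after all of its predecessors.
    return list(TopologicalSorter(deps).static_order())


# Example: ML partition 0 feeds shader partition 1; both feed ML partition 2.
schedule = partition_schedule(
    {"a": 0, "b": 0, "s": 1, "c": 2},
    [("a", "s"), ("a", "b"), ("s", "c"), ("b", "c")],
)
```

Here partition 2 depends on both partitions 0 and 1, so it is scheduled last. A cyclic dependency between partitions would raise graphlib.CycleError, which signals an invalid partitioning.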

Partitioning Algorithm (Conceptual)

The model is traversed in breadth-first order. Each operation is assigned a partition_id.

  • Shader custom operations always receive a new partition.

  • Regular ML operations inherit or create partitions based on the partitions of their parent operations.

All operations sharing the same partition_id belong to the same sub-model.

The algorithm ensures:

  • Parent operations are processed before their children.

  • Shader and ML operations never coexist in the same partition.

  • Partition identifiers increase monotonically.
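
The following sketch shows one possible way to realize these rules; it is not necessarily the algorithm used by the converter. It assumes the model is given as a mapping from operation name to an is-shader flag plus a list of (parent, child) edges. It uses a queue-based traversal that visits an operation only after all of its parents, and it applies an assumed inheritance rule: an ML operation joins the partition of its highest-numbered parent if that partition is an ML partition, and otherwise starts a new one. The function name assign_partitions is hypothetical.

```python
from collections import deque


def assign_partitions(ops, edges):
    """Assign a partition_id to every operation.

    ops: dict mapping op name -> True for shader custom ops, False for ML ops.
    edges: list of (parent_op, child_op) pairs.
    """
    parents = {name: [] for name in ops}
    children = {name: [] for name in ops}
    for src, dst in edges:
        parents[dst].append(src)
        children[src].append(dst)

    pending = {name: len(parents[name]) for name in ops}
    queue = deque(name for name in ops if pending[name] == 0)
    partition_of = {}
    partition_is_shader = []  # list index is the partition_id

    def new_partition(is_shader):
        partition_is_shader.append(is_shader)
        return len(partition_is_shader) - 1  # ids only ever increase

    while queue:
        name = queue.popleft()
        if ops[name]:
            pid = new_partition(True)  # shader ops are always isolated
        else:
            latest = max((partition_of[p] for p in parents[name]), default=None)
            if latest is not None and not partition_is_shader[latest]:
                pid = latest  # inherit the most recent ML parent partition
            else:
                pid = new_partition(False)
        partition_of[name] = pid
        for child in children[name]:
            pending[child] -= 1
            if pending[child] == 0:  # all parents processed
                queue.append(child)

    if len(partition_of) != len(ops):
        raise ValueError("graph contains a cycle")
    return partition_of


result = assign_partitions(
    {"a": False, "s": True, "b": False, "c": False},
    [("a", "s"), ("a", "b"), ("s", "c"), ("b", "c")],
)
```

For this graph, a and b share ML partition 0, the shader s receives partition 1, and c starts partition 2 because its highest-numbered parent is a shader. Because every cross-partition edge goes from a lower to a higher partition_id, the resulting partition graph is acyclic. This rule does not try to minimize the number of ML partitions.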

Conversion Semantics

When a TOSA custom operation satisfies the required constraints:

  1. The operation is isolated into its own partition.

  2. A VGF compute node is generated for that partition.

  3. The shader is materialized in one of two ways:

      - as a SPIR-V™ binary (decoded from base64 if necessary), or

      - compiled from textual source if the shader is provided as GLSL or HLSL.

  4. Descriptor sets and bindings are created according to the indexed input_<index>_* and output_<index>_* attributes.

  5. Workgroup sizes are mapped to the compute dispatch configuration.

  6. Synchronization is inserted between adjacent partitions as required.

The conversion is deterministic and fully driven by the schema-defined attributes.
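
Steps 3 and 4 can be illustrated with a small sketch. It assumes the implementation attributes are available as a Python dictionary and that indexed attributes follow the input_<index>_<field> and output_<index>_<field> naming shown in this document. The helper names collect_bindings and decode_spirv are hypothetical, and the name of the attribute that carries a base64-encoded binary is not assumed here: decode_spirv simply takes the encoded string. The only SPIR-V™ fact relied on is that every module begins with the magic number 0x07230203.

```python
import base64
import re
import struct

SPIRV_MAGIC = 0x07230203  # first 32-bit word of every SPIR-V module

_INDEXED_KEY = re.compile(r"^(input|output)_(\d+)_(\w+)$")


def collect_bindings(attrs):
    """Group indexed resource attributes by (direction, index)."""
    bindings = {}
    for key, value in attrs.items():
        match = _INDEXED_KEY.match(key)
        if match is None:
            continue  # not an indexed resource attribute (e.g. entry_point)
        direction, index, field = match.group(1), int(match.group(2)), match.group(3)
        bindings.setdefault((direction, index), {})[field] = value
    return bindings


def decode_spirv(encoded):
    """Decode a base64 string and check that it looks like a SPIR-V module."""
    blob = base64.b64decode(encoded, validate=True)
    if len(blob) < 4 or len(blob) % 4 != 0:
        raise ValueError("SPIR-V modules are a sequence of 32-bit words")
    # The magic number also reveals the byte order, so accept either one.
    little = struct.unpack_from("<I", blob)[0]
    big = struct.unpack_from(">I", blob)[0]
    if SPIRV_MAGIC not in (little, big):
        raise ValueError("missing SPIR-V magic number")
    return blob
```

Applied to the attributes of the Minimal Example, collect_bindings would return a single entry keyed by ("input", 0) that holds the vkformat, vkdescriptortype, binding, and descriptorset fields.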

Minimal Example

{
  "domain_name": "com.arm.VulkanCustomShader",
  "operator_name": "MyComputeKernel",
  "implementation_attrs": {
    "entry_point": "main",
    "workgroup_sizes": [8, 8, 1],
    "shader_language": "GLSL",
    "shader_code": "void main() { }",

    "input_0_vkformat": "VK_FORMAT_R32_SFLOAT",
    "input_0_vkdescriptortype": "VK_DESCRIPTOR_TYPE_STORAGE_BUFFER",
    "input_0_binding": 0,
    "input_0_descriptorset": 0
  }
}

Design Notes

  • Domain name (domain_name) acts as an explicit opt-in mechanism.

  • The schema ensures that only well-formed Vulkan®-compatible shaders are converted.

  • The conversion assumes compute shader semantics.

  • Partitioning enables efficient scheduling across GPU engines while preserving correctness.

Any custom operation not satisfying these constraints is not interpreted as a Vulkan® shader and is handled according to the default custom operation path.