Vulkan® Custom Shader Conversion to VGF
=======================================

Overview
--------

A TOSA custom operation is interpreted as a Vulkan® compute shader and
converted into a VGF representation when all of the following conditions are
met:

- Domain Name (``domain_name``) is set to::

      com.arm.VulkanCustomShader

- Operator Name (``operator_name``) encodes the logical name of the shader.
- Implementation Attributes (``implementation_attrs``) follow the schema
  defined in :doc:`vkshader-custom-op-encoding`.

When these conditions are satisfied, the custom operation is treated as a
Vulkan® compute shader node during VGF generation.

This section defines both the semantic contract and the partitioning behavior
required to correctly lower models that mix ML operations and shader
operations.

Required TOSA Fields
--------------------

Domain Name (``domain_name``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Must be:**

::

    com.arm.VulkanCustomShader

This value acts as the feature gate that enables Vulkan® shader lowering.

Operator Name (``operator_name``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Type:** string

Encodes the logical name of the shader. This name identifies the shader during
conversion and may be used for:

- Debug labeling
- Pipeline identification
- Symbol naming in generated artifacts

Implementation Attributes (``implementation_attrs``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Type:** JSON object

Must conform to the schema defined in :doc:`vkshader-custom-op-encoding`.

This schema defines:

- Entry point
- Workgroup sizes
- Shader source or SPIR-V™ binary
- Descriptor bindings
- Resource formats
- Push constant layout

Model Partitioning
------------------

ML models are represented as directed acyclic graphs (DAGs), where nodes are
operations and edges represent data dependencies.
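
Whether a node in the model DAG is a shader custom operation or a regular ML
operation is decided by the opt-in conditions listed in the overview. The
following Python sketch expresses that decision as a predicate. It is
illustrative only: the helper names (``is_vulkan_shader_op``,
``parse_implementation_attrs``) are hypothetical, accepting
``implementation_attrs`` either as a JSON string or as an already-decoded object
is an assumption, and only an assumed subset of the schema (``entry_point`` plus
a three-component ``workgroup_sizes``, as in the minimal example) is checked.

.. code-block:: python

    import json
    from typing import Any, Optional

    VULKAN_SHADER_DOMAIN = "com.arm.VulkanCustomShader"


    def parse_implementation_attrs(raw: Any) -> Optional[dict]:
        """Decode the attributes; accepting a string or a dict is an assumption."""
        if isinstance(raw, str):
            try:
                raw = json.loads(raw)
            except json.JSONDecodeError:
                return None
        return raw if isinstance(raw, dict) else None


    def is_vulkan_shader_op(op: dict) -> bool:
        """Hypothetical opt-in check for Vulkan shader lowering."""
        # Feature gate: any other domain stays on the default custom-op path.
        if op.get("domain_name") != VULKAN_SHADER_DOMAIN:
            return False
        # The operator name must be a non-empty logical shader name.
        name = op.get("operator_name")
        if not isinstance(name, str) or not name:
            return False
        # Assumed minimal subset of the schema: an entry point and three
        # positive workgroup sizes (x, y, z).
        attrs = parse_implementation_attrs(op.get("implementation_attrs"))
        if attrs is None or not isinstance(attrs.get("entry_point"), str):
            return False
        sizes = attrs.get("workgroup_sizes")
        return (
            isinstance(sizes, list)
            and len(sizes) == 3
            and all(type(s) is int and s > 0 for s in sizes)
        )


    # Classify the nodes of a toy model: a regular TOSA op and a shader op.
    nodes = [
        {"op": "CONV2D"},
        {
            "domain_name": VULKAN_SHADER_DOMAIN,
            "operator_name": "MyComputeKernel",
            "implementation_attrs": '{"entry_point": "main", "workgroup_sizes": [8, 8, 1]}',
        },
    ]
    kinds = ["shader" if is_vulkan_shader_op(n) else "ml" for n in nodes]
    print(kinds)  # ['ml', 'shader']

A real implementation would validate against the complete schema; the point
here is only that the domain name check comes first, so any other custom
operation falls through to the default path without further inspection.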
Models may contain a mix of:

- Regular ML operations
- Shader custom operations (as defined above)

Vulkan® execution constraints require that:

- ML workloads cannot be dispatched in the same pipeline call as shader
  workloads.
- Neural Engine accelerators and classical GPU engines must be scheduled
  through separate Vulkan® pipelines.

Therefore, before VGF generation, the model is partitioned into sub-models that
can be executed independently.

Partitioning Strategy
---------------------

The partitioning follows these principles:

- Each shader custom operation forms its own partition.
- Regular ML operations occurring

  - before the first shader custom operation,
  - between two shader custom operations, or
  - after the last shader custom operation

  are grouped into ML partitions.
- Connections crossing partition boundaries become explicit inputs and outputs
  of the generated sub-models.

This transforms a single model DAG into a DAG of sub-graphs (a simple sequence
when the model is linear), each mapped to an independent Vulkan® pipeline.

Execution Correctness
---------------------

Partitioning preserves the semantics of the original model by:

- Respecting all data dependencies.
- Ensuring that an operation executes only after its inputs are produced.
- Introducing synchronization between partitions where required (e.g., memory
  barriers and cache invalidation).

A partition cannot execute until all partitions producing its required inputs
have completed. This guarantees correct ordering while allowing execution on
heterogeneous GPU engines.

Partitioning Algorithm (Conceptual)
-----------------------------------

The model is traversed in breadth-first topological order, so that each
operation is visited only after all of its parent operations. Each operation is
assigned a ``partition_id``.

- Shader custom operations always receive a new partition.
- Regular ML operations inherit or create partitions based on the partitions
  of their parent operations.

All operations sharing the same ``partition_id`` belong to the same sub-model.
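
One possible way to realize these rules is sketched below in Python. It assumes
a concrete inheritance rule that this document does not prescribe: an ML
operation inherits its parents' partition only when all of its parents belong
to a single ML partition, and otherwise opens a new one. The ``Op`` class and
``assign_partitions`` function are hypothetical names; the breadth-first
traversal is Kahn's topological sort, which visits an operation only after all
of its parents.

.. code-block:: python

    from collections import deque
    from dataclasses import dataclass, field
    from typing import Dict, List

    SHADER_DOMAIN = "com.arm.VulkanCustomShader"


    @dataclass
    class Op:  # hypothetical node type for illustration
        name: str
        parents: List[str] = field(default_factory=list)
        domain_name: str = ""  # empty for regular ML operations

        @property
        def is_shader(self) -> bool:
            return self.domain_name == SHADER_DOMAIN


    def assign_partitions(ops: List[Op]) -> Dict[str, int]:
        """Assign a partition_id to every op (hypothetical helper, not the real pass)."""
        by_name = {op.name: op for op in ops}
        children: Dict[str, List[str]] = {op.name: [] for op in ops}
        pending = {op.name: len(op.parents) for op in ops}  # unvisited parents
        for op in ops:
            for parent in op.parents:
                children[parent].append(op.name)

        # Kahn's algorithm: breadth-first, but an op is enqueued only once
        # every one of its parents has been visited.
        queue = deque(op.name for op in ops if not op.parents)
        partition: Dict[str, int] = {}
        is_shader_partition: Dict[int, bool] = {}
        next_id = 0
        while queue:
            op = by_name[queue.popleft()]
            parent_ids = {partition[p] for p in op.parents}
            if (
                not op.is_shader
                and len(parent_ids) == 1
                and not is_shader_partition[next(iter(parent_ids))]
            ):
                # ML op fed by exactly one ML partition: inherit it.
                partition[op.name] = next(iter(parent_ids))
            else:
                # Shader op, graph input, or ML op joining several partitions:
                # open a new partition with the next (larger) identifier.
                partition[op.name] = next_id
                is_shader_partition[next_id] = op.is_shader
                next_id += 1
            for child in children[op.name]:
                pending[child] -= 1
                if pending[child] == 0:
                    queue.append(child)

        if len(partition) != len(ops):
            raise ValueError("model graph contains a cycle")
        return partition


    model = [
        Op("conv", []),
        Op("relu", ["conv"]),
        Op("blur", ["relu"], SHADER_DOMAIN),
        Op("add", ["blur", "relu"]),
        Op("softmax", ["add"]),
    ]
    print(assign_partitions(model))
    # {'conv': 0, 'relu': 0, 'blur': 1, 'add': 2, 'softmax': 2}

Opening a new partition for ``add`` instead of merging it back into partition 0
is what keeps the partition graph acyclic: partition 0 feeds the shader in
partition 1, so placing a consumer of that shader in partition 0 would create a
cycle. Because a new partition always takes the next identifier, a child's
``partition_id`` is never smaller than its parents', and executing partitions in
increasing identifier order respects every dependency.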
The algorithm ensures that:

- Parent operations are processed before their children.
- Shader and ML operations never coexist in the same partition.
- Partition identifiers increase monotonically.

Conversion Semantics
--------------------

When a TOSA custom operation satisfies the required constraints:

1. The operation is isolated into its own partition.
2. A VGF compute node is generated for that partition.
3. The shader is materialized as either:

   - a SPIR-V™ binary (decoded from base64 if necessary), or
   - a module compiled from textual source, when the shader is written in GLSL
     or HLSL.

4. Descriptor sets and bindings are created according to the indexed
   ``input_<N>_*`` and ``output_<N>_*`` attributes.
5. Workgroup sizes are mapped to the compute dispatch configuration.
6. Synchronization is inserted between adjacent partitions as required.

The conversion is deterministic and fully driven by the schema-defined
attributes.

Minimal Example
---------------

.. code-block:: json

    {
      "domain_name": "com.arm.VulkanCustomShader",
      "operator_name": "MyComputeKernel",
      "implementation_attrs": {
        "entry_point": "main",
        "workgroup_sizes": [8, 8, 1],
        "shader_language": "GLSL",
        "shader_code": "void main() { }",
        "input_0_vkformat": "VK_FORMAT_R32_SFLOAT",
        "input_0_vkdescriptortype": "VK_DESCRIPTOR_TYPE_STORAGE_BUFFER",
        "input_0_binding": 0,
        "input_0_descriptorset": 0
      }
    }

Design Notes
------------

- The domain name (``domain_name``) acts as an explicit opt-in mechanism.
- The schema ensures that only well-formed, Vulkan®-compatible shaders are
  converted.
- The conversion assumes compute shader semantics.
- Partitioning enables efficient scheduling across GPU engines while preserving
  correctness.

Any custom operation that does not satisfy these constraints is not interpreted
as a Vulkan® shader and is handled by the default custom operation path.
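
Step 4 of the conversion semantics derives descriptor bindings from the indexed
resource attributes. The Python sketch below groups those attributes per
resource and orders them by descriptor set and binding. It is a sketch under
assumptions: the ``<direction>_<N>_<field>`` key pattern is inferred from the
minimal example, ``collect_bindings`` is a hypothetical helper, and the
``output_0_*`` entries in the usage are added purely for illustration. The
duplicate-slot check reflects the Vulkan rule that binding numbers within one
descriptor set layout must be unique.

.. code-block:: python

    import json
    import re
    from collections import defaultdict
    from typing import Dict, List

    # Assumed key pattern, inferred from the minimal example:
    # <direction>_<N>_<field>, e.g. "input_0_binding". The encoding schema
    # document is the authority on the real field list.
    RESOURCE_KEY = re.compile(r"^(input|output)_(\d+)_([a-z]+)$")


    def collect_bindings(implementation_attrs: Dict) -> List[Dict]:
        """Hypothetical helper: group indexed attributes per resource."""
        grouped: Dict[tuple, Dict] = defaultdict(dict)
        for key, value in implementation_attrs.items():
            match = RESOURCE_KEY.match(key)
            if match:  # non-resource keys such as "entry_point" are skipped
                direction, index, attr = match.groups()
                grouped[(direction, int(index))][attr] = value

        resources = []
        seen_slots = set()
        for (direction, index), attrs in grouped.items():
            slot = (attrs["descriptorset"], attrs["binding"])
            # Vulkan requires binding numbers to be unique within a set layout.
            if slot in seen_slots:
                raise ValueError(f"duplicate descriptor slot {slot}")
            seen_slots.add(slot)
            resources.append({"direction": direction, "index": index, **attrs})
        # Order resources by (descriptor set, binding) for layout creation.
        return sorted(resources, key=lambda r: (r["descriptorset"], r["binding"]))


    # Attributes from the minimal example plus a hypothetical output_0 resource.
    attrs = json.loads("""{
      "entry_point": "main",
      "workgroup_sizes": [8, 8, 1],
      "input_0_vkformat": "VK_FORMAT_R32_SFLOAT",
      "input_0_vkdescriptortype": "VK_DESCRIPTOR_TYPE_STORAGE_BUFFER",
      "input_0_binding": 0,
      "input_0_descriptorset": 0,
      "output_0_vkformat": "VK_FORMAT_R32_SFLOAT",
      "output_0_vkdescriptortype": "VK_DESCRIPTOR_TYPE_STORAGE_BUFFER",
      "output_0_binding": 1,
      "output_0_descriptorset": 0
    }""")
    for r in collect_bindings(attrs):
        print(r["direction"], r["index"], r["descriptorset"], r["binding"])
    # input 0 0 0
    # output 0 0 1

Sorting by slot rather than by attribute order keeps the result independent of
how the JSON object happens to be serialized, which fits the deterministic
conversion described above.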