minuet.nn.functional.convolution#
Functions

| `cuda_time_gather` | Benchmark the gather operation |
| `cuda_time_gemm` | Benchmark the GEMM operation |
| `cuda_time_scatter` | Benchmark the scatter operation |
| `set_gemm_parallel_level` | Set the parallelization level (number of CUDA streams) for executing GEMM operations |
| `sparse_convolution_forward` | Executes the forward pass of a sparse convolution |
- cuda_time_gather(weights: Tensor, source_masks: Tensor, target_masks: Tensor, kernel_map_sizes: Tensor, tile_size: int, allow_shortcut_matmul: bool = False, threshold: float = 0) → float#
Benchmark the gather operation
- Parameters:
weights – the weight tensors
source_masks – the source masks from the kernel map
target_masks – the target masks from the kernel map
kernel_map_sizes – the sizes of each weight in the kernel map
tile_size – the tile size for the gather operation
allow_shortcut_matmul – whether to allow a shortcut matmul for trivial weights
threshold – the threshold that controls the padding of the GEMM operands
- Returns:
the measured time of the gather operation
- cuda_time_gemm(weights: Tensor, source_masks: Tensor, target_masks: Tensor, kernel_map_sizes: Tensor, allow_shortcut_matmul: bool = False, parallel: int | None = None, threshold: float = 0)#
Benchmark the GEMM operation
- Parameters:
weights – the weight tensors
source_masks – the source masks from the kernel map
target_masks – the target masks from the kernel map
kernel_map_sizes – the sizes of each weight in the kernel map
allow_shortcut_matmul – whether to allow a shortcut matmul for trivial weights
parallel – the parallelization level of the GEMM operation
threshold – the threshold that controls the padding of the GEMM operands
- Returns:
the measured time of the GEMM operation
- cuda_time_scatter(weights: Tensor, source_masks: Tensor, target_masks: Tensor, kernel_map_sizes: Tensor, tile_size: int, allow_shortcut_matmul: bool = False, threshold: float = 0)#
Benchmark the scatter operation
- Parameters:
weights – the weight tensors
source_masks – the source masks from the kernel map
target_masks – the target masks from the kernel map
kernel_map_sizes – the sizes of each weight in the kernel map
tile_size – the tile size for the scatter operation
allow_shortcut_matmul – whether to allow a shortcut matmul for trivial weights
threshold – the threshold that controls the padding of the GEMM operands
- Returns:
the measured time of the scatter operation
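The three timers above share most of their arguments, so they can be driven from one set of inputs. The sketch below is a hedged illustration only: the tensor shapes, dtypes, and the mask layout are assumptions made for the example (real masks and sizes come from Minuet's kernel-map construction), and the tile sizes and `parallel` level are arbitrary values, not tuned ones.

```python
# Hedged sketch: timing the gather / GEMM / scatter phases of a sparse
# convolution. All shapes and dtypes below are illustrative assumptions,
# not the library's documented layout.
import torch

K, C_IN, C_OUT, N = 27, 16, 32, 1024  # kernel volume, channels, #points (assumed)

# One (C_IN, C_OUT) weight matrix per kernel offset.
weights = torch.randn(K, C_IN, C_OUT)
# Assumed mask layout: per-offset index lists into the point cloud.
source_masks = torch.randint(0, N, (K, N), dtype=torch.int32)
target_masks = torch.randint(0, N, (K, N), dtype=torch.int32)
# Number of matched input/output pairs for each kernel offset.
kernel_map_sizes = torch.full((K,), N, dtype=torch.int32)

if torch.cuda.is_available():  # the cuda_time_* benchmarks need a CUDA device
    from minuet.nn import functional as MF

    args = [t.cuda() for t in (weights, source_masks, target_masks, kernel_map_sizes)]
    t_gather = MF.cuda_time_gather(*args, tile_size=2)
    t_gemm = MF.cuda_time_gemm(*args, parallel=4)
    t_scatter = MF.cuda_time_scatter(*args, tile_size=2)
    print(f"gather={t_gather}  gemm={t_gemm}  scatter={t_scatter}")
```

A typical use of these timers is to sweep `tile_size` (and `parallel` for the GEMM) over candidate values and keep the fastest configuration for the actual forward pass.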
- set_gemm_parallel_level(level: int)#
Set the parallelization level (number of CUDA streams) for executing GEMM operations
- Parameters:
level – the parallelization level
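A minimal usage sketch; the level fixes how many CUDA streams subsequent GEMM executions may use. The value 4 here is arbitrary, and the call is guarded so the snippet is a no-op without a GPU and the `minuet` package installed.

```python
import torch

LEVEL = 4  # number of CUDA streams for GEMMs; any positive int (chosen arbitrarily)

if torch.cuda.is_available():
    from minuet.nn import functional as MF
    # Applies to later GEMM-based calls; the per-call `parallel` argument
    # of the timing/forward functions plays the same role.
    MF.set_gemm_parallel_level(LEVEL)
```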
- sparse_convolution_forward(sources: Tensor, weights: Tensor, source_masks: Tensor, target_masks: Tensor, kernel_map_order: Tensor | None, kernel_map_sizes: Tensor, gather_tile_size: int, scatter_tile_size: int, allow_shortcut_matmul: bool = False, parallel: int | None = None, threshold: float | None = 0) → Tensor#
Executes the forward pass of a sparse convolution
- Parameters:
sources – the feature tensor of the input SparseTensor
weights – the weight tensor of the SparseConv
source_masks – the source masks from the kernel map
target_masks – the target masks from the kernel map
kernel_map_order – the order of the weights after sorting by kernel map size
kernel_map_sizes – the sizes of each weight in the kernel map
gather_tile_size – the tile size for the gather operation
scatter_tile_size – the tile size for the scatter operation
allow_shortcut_matmul – whether to allow a shortcut matmul for trivial weights
parallel – the parallelization level of the GEMM operation
threshold – the threshold that controls the padding of the GEMM operands
- Returns:
the feature tensor of the output SparseTensor
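An end-to-end sketch of the forward call, hedged in the same way as the timing example: shapes, dtypes, and the mask layout are illustrative assumptions, and the tile sizes and parallel level are arbitrary placeholders rather than autotuned values.

```python
# Hedged sketch of sparse_convolution_forward. Real masks come from
# Minuet's kernel-map builder; shapes below are assumed for illustration.
import torch

K, C_IN, C_OUT, N = 27, 16, 32, 1024  # assumed problem sizes

sources = torch.randn(N, C_IN)         # per-point input features
weights = torch.randn(K, C_IN, C_OUT)  # one weight matrix per kernel offset
source_masks = torch.randint(0, N, (K, N), dtype=torch.int32)
target_masks = torch.randint(0, N, (K, N), dtype=torch.int32)
kernel_map_sizes = torch.full((K,), N, dtype=torch.int32)

if torch.cuda.is_available():  # the kernel runs on a CUDA device
    from minuet.nn import functional as MF

    out = MF.sparse_convolution_forward(
        sources.cuda(),
        weights.cuda(),
        source_masks.cuda(),
        target_masks.cuda(),
        None,  # kernel_map_order: None keeps the given weight order
        kernel_map_sizes.cuda(),
        gather_tile_size=2,
        scatter_tile_size=2,
        allow_shortcut_matmul=False,
        parallel=4,
        threshold=0.0,
    )
    # out holds the per-point output features of the convolved SparseTensor
```

In practice the tile sizes passed here would be the ones found fastest by the `cuda_time_gather` / `cuda_time_scatter` benchmarks above.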