minuet.nn.functional.convolution#

Functions

cuda_time_gather(weights, source_masks, ...)

Benchmark the gather operation

cuda_time_gemm(weights, source_masks, ...[, ...])

Benchmark the GEMM operation

cuda_time_scatter(weights, source_masks, ...)

Benchmark the scatter operation

set_gemm_parallel_level(level)

Set the parallelization level (number of CUDA streams) for executing GEMM operations

sparse_convolution_forward(sources, weights, ...)

Execute the forward pass of a sparse convolution

cuda_time_gather(weights: Tensor, source_masks: Tensor, target_masks: Tensor, kernel_map_sizes: Tensor, tile_size: int, allow_shortcut_matmul: bool = False, threshold: float = 0) float#

Benchmark the gather operation

Parameters:
  • weights – the weight tensors

  • source_masks – the source masks from the kernel map

  • target_masks – the target masks from the kernel map

  • kernel_map_sizes – the sizes of each weight in the kernel map

  • tile_size – the tile size for the gather operation

  • allow_shortcut_matmul – whether to allow shortcutting the matrix multiplication for the trivial weight

  • threshold – the threshold that controls the padding of the GEMM operands

Returns:

the measured time of the gather operation
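For intuition, the gather stage of a gather-GEMM-scatter sparse convolution copies the source feature rows selected by the kernel map into a contiguous buffer, one buffer per kernel offset, so the subsequent GEMM runs on dense inputs. A minimal NumPy sketch of that idea (the `features`/`source_indices` names and the dense index layout are illustrative assumptions, not Minuet's internal mask representation):

```python
import numpy as np

def gather(features, source_indices):
    """Copy the source rows referenced by one kernel offset into a
    contiguous buffer (illustrative stand-in for the gather kernel)."""
    return features[source_indices]

features = np.arange(12, dtype=np.float32).reshape(4, 3)  # 4 points, 3 channels
source_indices = np.array([2, 0, 3])                      # rows touched by one offset
buf = gather(features, source_indices)
print(buf.shape)  # (3, 3)
```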

cuda_time_gemm(weights: Tensor, source_masks: Tensor, target_masks: Tensor, kernel_map_sizes: Tensor, allow_shortcut_matmul: bool = False, parallel: int | None = None, threshold: float = 0)#

Benchmark the GEMM operation

Parameters:
  • weights – the weight tensors

  • source_masks – the source masks from the kernel map

  • target_masks – the target masks from the kernel map

  • kernel_map_sizes – the sizes of each weight in the kernel map

  • allow_shortcut_matmul – whether to allow shortcutting the matrix multiplication for the trivial weight

  • parallel – the parallelization level of the GEMM operation

  • threshold – the threshold that controls the padding of the GEMM operands

Returns:

the measured time of the GEMM operation
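Conceptually, the GEMM stage multiplies each offset's gathered buffer by that offset's weight matrix. A hedged NumPy sketch (the per-offset list layout is an assumption for illustration; Minuet batches and pads these matmuls internally, which is what `threshold` controls):

```python
import numpy as np

def per_offset_gemm(gathered, weights):
    """One dense matmul per kernel offset: (n_k, C_in) @ (C_in, C_out)."""
    return [g @ w for g, w in zip(gathered, weights)]

rng = np.random.default_rng(0)
gathered = [rng.standard_normal((5, 4)), rng.standard_normal((2, 4))]
weights = [rng.standard_normal((4, 8)) for _ in gathered]
outs = per_offset_gemm(gathered, weights)
print([o.shape for o in outs])  # [(5, 8), (2, 8)]
```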

cuda_time_scatter(weights: Tensor, source_masks: Tensor, target_masks: Tensor, kernel_map_sizes: Tensor, tile_size: int, allow_shortcut_matmul: bool = False, threshold: float = 0)#

Benchmark the scatter operation

Parameters:
  • weights – the weight tensors

  • source_masks – the source masks from the kernel map

  • target_masks – the target masks from the kernel map

  • kernel_map_sizes – the sizes of each weight in the kernel map

  • tile_size – the tile size for the scatter operation

  • allow_shortcut_matmul – whether to allow shortcutting the matrix multiplication for the trivial weight

  • threshold – the threshold that controls the padding of the GEMM operands

Returns:

the measured time of the scatter operation
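The scatter stage is the mirror of gather: each offset's GEMM results are accumulated into the output rows named by the kernel map's target masks. A minimal NumPy sketch (index-array form is illustrative only; `np.add.at` stands in for the scatter-accumulate kernel):

```python
import numpy as np

def scatter_add(outputs, target_indices, partial):
    """Accumulate one offset's partial results into the output rows.
    np.add.at handles repeated indices correctly (unbuffered add)."""
    np.add.at(outputs, target_indices, partial)
    return outputs

outputs = np.zeros((4, 2), dtype=np.float32)
partial = np.ones((3, 2), dtype=np.float32)
target_indices = np.array([1, 1, 3])  # two contributions land on row 1
scatter_add(outputs, target_indices, partial)
print(outputs[1])  # [2. 2.]
```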

set_gemm_parallel_level(level: int)#

Set the parallelization level (number of CUDA streams) for executing GEMM operations

Parameters:

level – the parallelization level
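The parallel level determines how many CUDA streams the per-offset GEMMs are spread across. As a rough mental model only (the actual work assignment inside Minuet is not specified here), a round-robin partition of offsets over `level` streams might look like:

```python
def partition_round_robin(num_offsets, level):
    """Illustrative sketch: spread per-offset GEMMs over `level` streams
    round-robin. This is a guess at the idea, not Minuet's scheduler."""
    groups = [[] for _ in range(level)]
    for k in range(num_offsets):
        groups[k % level].append(k)
    return groups

print(partition_round_robin(7, 3))  # [[0, 3, 6], [1, 4], [2, 5]]
```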

sparse_convolution_forward(sources: Tensor, weights: Tensor, source_masks: Tensor, target_masks: Tensor, kernel_map_order: Tensor | None, kernel_map_sizes: Tensor, gather_tile_size: int, scatter_tile_size: int, allow_shortcut_matmul: bool = False, parallel: int | None = None, threshold: float | None = 0) Tensor#

Execute the forward pass of a sparse convolution

Parameters:
  • sources – the feature tensor of the input SparseTensor

  • weights – the weight tensor of the SparseConv

  • source_masks – the source masks from the kernel map

  • target_masks – the target masks from the kernel map

  • kernel_map_order – the order of the weights after sorting by kernel map sizes

  • kernel_map_sizes – the sizes of each weight in the kernel map

  • gather_tile_size – the tile size for the gather operation

  • scatter_tile_size – the tile size for the scatter operation

  • allow_shortcut_matmul – whether to allow shortcutting the matrix multiplication for the trivial weight

  • parallel – the parallelization level of the GEMM operation

  • threshold – the threshold that controls the padding of the GEMM operands

Returns:

the feature tensor of the output SparseTensor
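Putting the three stages together, the forward pass gathers source features per kernel offset, multiplies by that offset's weight, and scatter-accumulates into the output. The following NumPy sketch shows the dataflow end to end; the per-offset index lists (`src_masks`, `tgt_masks`) are an illustrative simplification of Minuet's mask encoding, and the real implementation fuses and tiles these steps on the GPU:

```python
import numpy as np

def sparse_conv_forward(sources, weights, src_masks, tgt_masks):
    """Gather -> GEMM -> scatter sketch of a sparse convolution forward pass.
    src_masks[k] / tgt_masks[k] hold the source/target row indices that
    kernel offset k maps between (illustrative layout only)."""
    n, c_out = sources.shape[0], weights[0].shape[1]
    out = np.zeros((n, c_out), dtype=sources.dtype)
    for k, w in enumerate(weights):
        gathered = sources[src_masks[k]]       # gather
        partial = gathered @ w                 # GEMM
        np.add.at(out, tgt_masks[k], partial)  # scatter-accumulate
    return out

rng = np.random.default_rng(1)
sources = rng.standard_normal((4, 3))
weights = [rng.standard_normal((3, 2)) for _ in range(2)]
src_masks = [np.array([0, 2]), np.array([1, 3])]
tgt_masks = [np.array([1, 2]), np.array([0, 2])]
out = sparse_conv_forward(sources, weights, src_masks, tgt_masks)
print(out.shape)  # (4, 2)
```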