minuet.nn.functional.convolution#
Functions

| `cuda_time_gather` | Benchmark the gather operation |
| `cuda_time_gemm` | Benchmark the GEMM operation |
| `cuda_time_scatter` | Benchmark the scatter operation |
| `set_gemm_parallel_level` | Set the parallelization level (number of CUDA streams) for executing GEMM operations |
| `sparse_convolution_forward` | Executes the forward pass of a sparse convolution |
- cuda_time_gather(weights: Tensor, source_masks: Tensor, target_masks: Tensor, kernel_map_sizes: Tensor, tile_size: int, allow_shortcut_matmul: bool = False, threshold: float = 0) → float#
Benchmark the gather operation
- Parameters:
weights – the weight tensors
source_masks – the source masks from the kernel map
target_masks – the target masks from the kernel map
kernel_map_sizes – the sizes of each weight in the kernel map
tile_size – the tile size for the gather operation
allow_shortcut_matmul – whether to allow a shortcut matmul for trivial weights
threshold – the threshold that controls the padding of the GEMM operands
- Returns:
the measured time of the gather operation
- cuda_time_gemm(weights: Tensor, source_masks: Tensor, target_masks: Tensor, kernel_map_sizes: Tensor, allow_shortcut_matmul: bool = False, parallel: int | None = None, threshold: float = 0)#
Benchmark the GEMM operation
- Parameters:
weights – the weight tensors
source_masks – the source masks from the kernel map
target_masks – the target masks from the kernel map
kernel_map_sizes – the sizes of each weight in the kernel map
allow_shortcut_matmul – whether to allow a shortcut matmul for trivial weights
parallel – the parallelization level of the GEMM operation
threshold – the threshold that controls the padding of the GEMM operands
- Returns:
the measured time of the GEMM operation
- cuda_time_scatter(weights: Tensor, source_masks: Tensor, target_masks: Tensor, kernel_map_sizes: Tensor, tile_size: int, allow_shortcut_matmul: bool = False, threshold: float = 0)#
Benchmark the scatter operation
- Parameters:
weights – the weight tensors
source_masks – the source masks from the kernel map
target_masks – the target masks from the kernel map
kernel_map_sizes – the sizes of each weight in the kernel map
tile_size – the tile size for the scatter operation
allow_shortcut_matmul – whether to allow a shortcut matmul for trivial weights
threshold – the threshold that controls the padding of the GEMM operands
- Returns:
the measured time of the scatter operation
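The three timers above share most of their arguments, so they can be driven from one set of inputs. The sketch below is a hedged illustration only: the tensor shapes, dtypes, and the mask layout are assumptions made for the example (real masks and sizes come from Minuet's kernel-map construction), and the tile sizes and `parallel` level are arbitrary values, not tuned ones.

```python
# Hedged sketch: timing the gather / GEMM / scatter phases of a sparse
# convolution. All shapes and dtypes below are illustrative assumptions,
# not the library's documented layout.
import torch

K, C_IN, C_OUT, N = 27, 16, 32, 1024  # kernel volume, channels, #points (assumed)

# One (C_IN, C_OUT) weight matrix per kernel offset.
weights = torch.randn(K, C_IN, C_OUT)
# Assumed mask layout: per-offset index lists into the point cloud.
source_masks = torch.randint(0, N, (K, N), dtype=torch.int32)
target_masks = torch.randint(0, N, (K, N), dtype=torch.int32)
# Number of matched input/output pairs for each kernel offset.
kernel_map_sizes = torch.full((K,), N, dtype=torch.int32)

if torch.cuda.is_available():  # the cuda_time_* benchmarks need a CUDA device
    from minuet.nn import functional as MF

    args = [t.cuda() for t in (weights, source_masks, target_masks, kernel_map_sizes)]
    t_gather = MF.cuda_time_gather(*args, tile_size=2)
    t_gemm = MF.cuda_time_gemm(*args, parallel=4)
    t_scatter = MF.cuda_time_scatter(*args, tile_size=2)
    print(f"gather={t_gather}  gemm={t_gemm}  scatter={t_scatter}")
```

A typical use of these timers is to sweep `tile_size` (and `parallel` for the GEMM) over candidate values and keep the fastest configuration for the actual forward pass.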
- set_gemm_parallel_level(level: int)#
Set the parallelization level (number of CUDA streams) for executing GEMM operations
- Parameters:
level – the parallelization level
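A minimal usage sketch; the level fixes how many CUDA streams subsequent GEMM executions may use. The value 4 here is arbitrary, and the call is guarded so the snippet is a no-op without a GPU and the `minuet` package installed.

```python
import torch

LEVEL = 4  # number of CUDA streams for GEMMs; any positive int (chosen arbitrarily)

if torch.cuda.is_available():
    from minuet.nn import functional as MF
    # Applies to later GEMM-based calls; the per-call `parallel` argument
    # of the timing/forward functions plays the same role.
    MF.set_gemm_parallel_level(LEVEL)
```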
- sparse_convolution_forward(sources: Tensor, weights: Tensor, source_masks: Tensor, target_masks: Tensor, kernel_map_order: Tensor | None, kernel_map_sizes: Tensor, gather_tile_size: int, scatter_tile_size: int, allow_shortcut_matmul: bool = False, parallel: int | None = None, threshold: float | None = 0) → Tensor#
Executes the forward pass of a sparse convolution
- Parameters:
sources – the feature tensor of the input SparseTensor
weights – the weight tensor of the SparseConv
source_masks – the source masks from the kernel map
target_masks – the target masks from the kernel map
kernel_map_order – the order of the weights after sorting by kernel map size
kernel_map_sizes – the sizes of each weight in the kernel map
gather_tile_size – the tile size for the gather operation
scatter_tile_size – the tile size for the scatter operation
allow_shortcut_matmul – whether to allow a shortcut matmul for trivial weights
parallel – the parallelization level of the GEMM operation
threshold – the threshold that controls the padding of the GEMM operands
- Returns:
the feature tensor of the output SparseTensor
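An end-to-end sketch of the forward call, hedged in the same way as the timing example: shapes, dtypes, and the mask layout are illustrative assumptions, and the tile sizes and parallel level are arbitrary placeholders rather than autotuned values.

```python
# Hedged sketch of sparse_convolution_forward. Real masks come from
# Minuet's kernel-map builder; shapes below are assumed for illustration.
import torch

K, C_IN, C_OUT, N = 27, 16, 32, 1024  # assumed problem sizes

sources = torch.randn(N, C_IN)         # per-point input features
weights = torch.randn(K, C_IN, C_OUT)  # one weight matrix per kernel offset
source_masks = torch.randint(0, N, (K, N), dtype=torch.int32)
target_masks = torch.randint(0, N, (K, N), dtype=torch.int32)
kernel_map_sizes = torch.full((K,), N, dtype=torch.int32)

if torch.cuda.is_available():  # the kernel runs on a CUDA device
    from minuet.nn import functional as MF

    out = MF.sparse_convolution_forward(
        sources.cuda(),
        weights.cuda(),
        source_masks.cuda(),
        target_masks.cuda(),
        None,  # kernel_map_order: None keeps the given weight order
        kernel_map_sizes.cuda(),
        gather_tile_size=2,
        scatter_tile_size=2,
        allow_shortcut_matmul=False,
        parallel=4,
        threshold=0.0,
    )
    # out holds the per-point output features of the convolved SparseTensor
```

In practice the tile sizes passed here would be the ones found fastest by the `cuda_time_gather` / `cuda_time_scatter` benchmarks above.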