@YaelGitAccount
Contributor

Summary

Adds CUDA support for GGML_OP_CONV_3D, enabling full 3D convolution on NVIDIA GPUs with correct multi-dimensional indexing.
The implementation matches the CPU semantics exactly, including fused channel dimensions and nb[] byte-stride layout.
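For readers unfamiliar with the nb[] convention, here is a minimal host-side sketch of byte-stride addressing. The `TensorView` struct and both helper names are hypothetical and only for illustration; they are not the types or functions used in this PR. The sketch assumes F32 data and non-negative indices.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 4-D tensor view; field names mirror the ne[]/nb[] convention.
struct TensorView {
    int64_t     ne[4];  // number of elements in each dimension
    size_t      nb[4];  // stride in bytes for each dimension
    const void* data;   // pointer to the first element
};

// Fill nb[] for a densely packed tensor whose elements are elem_size bytes.
inline void set_contiguous_strides(TensorView& t, size_t elem_size) {
    t.nb[0] = elem_size;
    for (int i = 1; i < 4; ++i) {
        t.nb[i] = t.nb[i - 1] * static_cast<size_t>(t.ne[i - 1]);
    }
}

// Read one F32 element by summing index * byte-stride over all four dimensions.
inline float load_f32(const TensorView& t, int64_t i0, int64_t i1, int64_t i2, int64_t i3) {
    const char* base = static_cast<const char*>(t.data);
    const char* p = base
        + static_cast<size_t>(i0) * t.nb[0]
        + static_cast<size_t>(i1) * t.nb[1]
        + static_cast<size_t>(i2) * t.nb[2]
        + static_cast<size_t>(i3) * t.nb[3];
    return *reinterpret_cast<const float*>(p);
}
```

Because the offset is computed from nb[] rather than from the shape, the same accessor works for views whose strides are not densely packed.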

Changes

  • Added conv3d.cu and conv3d.cuh with CUDA kernel and helpers
  • Added dispatch path in ggml-cuda.cu
  • Updated operator registration in ggml-cuda.cu
  • Updated docs/ops.md and docs/ops/CUDA.csv to include CONV_3D

Implementation

  • One CUDA thread per output element (batch × OC × OD × OH × OW)
  • Correct fused-dimension addressing:
    • Input: b * IC + ic
    • Kernel: oc * IC + ic
    • Output: b * OC + oc
  • Full nb[] stride-aware indexing matching CPU layout
  • Supports F32 input/output and F16/F32 kernel weights
  • Fully respects stride, padding, dilation, and 3D spatial dimensions
  • Follows existing CUDA backend structure and coding conventions
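The indexing scheme above can be sketched on the host as follows. This is not the kernel from this PR: `Conv3dShape`, `conv_out_size`, `conv3d_one_output`, and `conv3d_reference` are hypothetical names, and the sketch assumes contiguous F32 buffers (element offsets rather than nb[] byte strides) and a single stride, padding, and dilation value shared by all three spatial axes. Each loop iteration in `conv3d_reference` plays the role of one CUDA thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical parameter bundle for illustration only.
struct Conv3dShape {
    int64_t N, IC, OC;         // batch, input channels, output channels
    int64_t ID, IH, IW;        // input depth, height, width
    int64_t KD, KH, KW;        // kernel depth, height, width
    int64_t stride, pad, dil;  // shared by all three spatial axes (assumption)
};

// Standard convolution output-size formula for one axis.
inline int64_t conv_out_size(int64_t in, int64_t k, int64_t s, int64_t p, int64_t d) {
    return (in + 2 * p - d * (k - 1) - 1) / s + 1;
}

// Compute the output element with flat index idx (what one thread would do).
inline float conv3d_one_output(const Conv3dShape& c, const std::vector<float>& src,
                               const std::vector<float>& ker, int64_t idx) {
    const int64_t OD = conv_out_size(c.ID, c.KD, c.stride, c.pad, c.dil);
    const int64_t OH = conv_out_size(c.IH, c.KH, c.stride, c.pad, c.dil);
    const int64_t OW = conv_out_size(c.IW, c.KW, c.stride, c.pad, c.dil);

    // Decompose idx = (((b * OC + oc) * OD + od) * OH + oh) * OW + ow.
    const int64_t ow = idx % OW;  idx /= OW;
    const int64_t oh = idx % OH;  idx /= OH;
    const int64_t od = idx % OD;  idx /= OD;
    const int64_t oc = idx % c.OC;
    const int64_t b  = idx / c.OC;

    float acc = 0.0f;
    for (int64_t ic = 0; ic < c.IC; ++ic) {
        const int64_t src_ch = b  * c.IC + ic;  // fused input channel
        const int64_t ker_ch = oc * c.IC + ic;  // fused kernel channel
        for (int64_t kd = 0; kd < c.KD; ++kd) {
            const int64_t id = od * c.stride - c.pad + kd * c.dil;
            if (id < 0 || id >= c.ID) continue;  // zero padding
            for (int64_t kh = 0; kh < c.KH; ++kh) {
                const int64_t ih = oh * c.stride - c.pad + kh * c.dil;
                if (ih < 0 || ih >= c.IH) continue;
                for (int64_t kw = 0; kw < c.KW; ++kw) {
                    const int64_t iw = ow * c.stride - c.pad + kw * c.dil;
                    if (iw < 0 || iw >= c.IW) continue;
                    const int64_t si = ((src_ch * c.ID + id) * c.IH + ih) * c.IW + iw;
                    const int64_t ki = ((ker_ch * c.KD + kd) * c.KH + kh) * c.KW + kw;
                    acc += src[static_cast<size_t>(si)] * ker[static_cast<size_t>(ki)];
                }
            }
        }
    }
    return acc;
}

// Evaluate every output element; the output's fused channel index is b * OC + oc.
inline std::vector<float> conv3d_reference(const Conv3dShape& c, const std::vector<float>& src,
                                           const std::vector<float>& ker) {
    const int64_t OD = conv_out_size(c.ID, c.KD, c.stride, c.pad, c.dil);
    const int64_t OH = conv_out_size(c.IH, c.KH, c.stride, c.pad, c.dil);
    const int64_t OW = conv_out_size(c.IW, c.KW, c.stride, c.pad, c.dil);
    const int64_t total = c.N * c.OC * OD * OH * OW;
    std::vector<float> dst(static_cast<size_t>(total));
    for (int64_t idx = 0; idx < total; ++idx) {
        dst[static_cast<size_t>(idx)] = conv3d_one_output(c, src, ker, idx);
    }
    return dst;
}
```

For example, under these assumptions a 3×3×3 input of ones convolved with a 3×3×3 kernel of ones at stride 1, padding 1, dilation 1 gives 27 at the center output and 8 at each corner, since only 2×2×2 kernel taps fall inside the input there.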

Testing

  • All CONV_3D backend tests pass for CUDA (F32/F16 kernels, all shapes)
  • Numerical parity with CPU across all tested configurations
  • No regressions in CUDA backend test suite
  • Full backend test suite passes (no global regressions)
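As an illustration of what a parity check between two backends can look like, here is a small sketch that computes a normalized mean squared error between a reference buffer and a candidate buffer. The function name `nmse` and the choice of metric are assumptions for this example; it is not taken from the test code in this PR, and any pass/fail threshold would be up to the test harness.

```cpp
#include <cstddef>
#include <vector>

// Normalized mean squared error: sum((ref - out)^2) / sum(ref^2).
// Assumes both buffers have the same length. If the reference is all
// zeros, the unnormalized squared error is returned instead.
inline double nmse(const std::vector<float>& ref, const std::vector<float>& out) {
    double err  = 0.0;
    double norm = 0.0;
    for (size_t i = 0; i < ref.size(); ++i) {
        const double r    = static_cast<double>(ref[i]);
        const double diff = r - static_cast<double>(out[i]);
        err  += diff * diff;
        norm += r * r;
    }
    return norm > 0.0 ? err / norm : err;
}
```

A value of 0 means the two buffers are identical; normalizing by the reference energy keeps the metric comparable across tensors of different magnitude.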

Compatibility

  • CUDA backend only
  • CPU path unchanged
  • No external dependencies added
  • Preserves GGML tensor layout conventions

@github-actions github-actions bot added the documentation, Nvidia GPU, and ggml labels on Nov 14, 2025
@YaelGitAccount
Contributor Author

This PR is ready for review.
Tagging @CISC and @slaren: your feedback would be greatly appreciated whenever you have the chance.
Thanks for your work on maintaining and improving the CUDA backend!

@CISC
Collaborator

CISC commented Nov 14, 2025

Unfortunately, the reason no backends support CONV_3D is that ggml_conv_3d uses the IM2COL_3D op instead. This is an unused op.

@Green-Sky
Collaborator

There is also #16948. You can use the conv3d test program from that PR to compare performance.
