NewsLab
Jun 28 17:06 UTC

Decoupling Compute and Memory for Async GPUs (news.ycombinator.com)

8 points | by yiyingzhang | 2 comments
Cool open-source project that introduces a new programming model for decoupling compute and memory on NVIDIA GPUs that support asynchronous memory operations (e.g., Hopper). It reports a 12% performance improvement over the state of the art (SOTA) and 67% less kernel code.

Paper: "VDCores: Resource Decoupled Programming and Execution for Asynchronous GPU" arXiv:2605.03190
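Why decoupling memory from compute matters on an async GPU can be seen in a toy timing model. The sketch below is not VDCores' programming model or API. It assumes one asynchronous copy engine, one compute unit, fixed per-tile load and compute costs, and a ring of `num_buffers` shared-memory slots. The function names are hypothetical. Loads run ahead of compute whenever a buffer slot is free, so total time tends toward the slower of the two streams rather than their sum.

```python
def serial_time(n_tiles, load, compute):
    """Coupled execution: each tile is loaded, then computed, with no overlap."""
    return n_tiles * (load + compute)


def pipelined_time(n_tiles, load, compute, num_buffers=2):
    """Makespan when loads are issued asynchronously ahead of compute.

    Toy model (an assumption, not any real GPU scheduler):
    - a single copy engine, so load i starts after load i-1 finishes;
    - load i also needs a free buffer slot, i.e. compute of tile
      i - num_buffers must have finished and released its slot;
    - compute of tile i starts once its own load and compute of tile i-1 finish.
    """
    load_end = [0.0] * n_tiles
    comp_end = [0.0] * n_tiles
    for i in range(n_tiles):
        # Wait for the copy engine to be free.
        start = load_end[i - 1] if i > 0 else 0.0
        # Wait for a buffer slot to be released by compute.
        if i >= num_buffers:
            start = max(start, comp_end[i - num_buffers])
        load_end[i] = start + load
        # Compute needs its data and a free compute unit.
        comp_start = max(load_end[i], comp_end[i - 1] if i > 0 else 0.0)
        comp_end[i] = comp_start + compute
    return comp_end[-1] if n_tiles else 0.0


if __name__ == "__main__":
    print(serial_time(4, 3, 2))                      # 20
    print(pipelined_time(4, 3, 2, num_buffers=2))    # 14.0 (memory-bound: 3 + 3*3 + 2)
    print(pipelined_time(4, 3, 2, num_buffers=1))    # 20.0 (one slot: no overlap)
```

With a single buffer, the model degenerates to the serial case, which is why async copies only pay off when there is room to prefetch. The model also assumes each tile's compute depends only on its own load. Phases that are not cleanly separable would add dependencies that this sketch does not capture.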

Comments (2)

  1. bobbyzhu2008
    67% less kernel code is the more interesting number here — Hopper's async capabilities have been underutilized largely because the programming model is painful. Curious how it handles cases where compute and memory phases aren't cleanly separable.
  2. jhap
    This seems like a better version of CUDA for Hopper GPUs?