NHacker Next
Decoupling Compute and Memory for Async GPUs
bobbyzhu2008 21 hours ago [-]
67% less kernel code is the more interesting number here. Hopper's async capabilities have been underutilized largely because the programming model is painful. Curious how it handles cases where compute and memory phases aren't cleanly separable.
jhap 19 hours ago [-]
This seems like a better version of CUDA, for Hopper GPUs?