Modern GPU Programming for MLSys (mlc.ai)

76 points by crowwork 4 days ago | 16 comments

throwaw12 19 hours ago [-]

So many frameworks are being built.

What are the state-of-the-art frameworks in the ML programming area? Something like what React is for the web and Tailwind is for CSS.

Triton, ONNX, JAX, PyTorch, cuBLAS, ...

I know they might be for different purposes, but having some idea what is for what and when to use would be helpful

mathisfun123 19 hours ago [-]

> ONNX, JAX, PyTorch

these are model-level frameworks

> Triton

this is a kernel DSL

> [cublas]

this is a BLAS library built atop CUDA

> I know they might be for different purposes, but having some idea what is for what and when to use would be helpful

when people ask this question i always ask: who are you and what is your job? if you're not an ML/DL/AI person then you knowing the specifics is about as useful as me knowing the specifics of react/express/angular/tailwind/django/whatever as an ML person. this is not meant to be condescending, this is meant to allay your anxiety, ie that if you ever find yourself in the position where you have to know these things for your job, it won't be that hard to figure out (just like it isn't that hard to figure out the difference between react and express and django if you're a webdev).

throwaw12 19 hours ago [-]

I am a product engineer in yet another enterprise SaaS CRUD shop, who wants to learn more about the landscape and find the way to enter it eventually.

mathisfun123 19 hours ago [-]

> wants to learn more about the landscape and find the way to enter it eventually

let's swap roles and let's pretend i'm an ML engineer asking you how to enter CRUD. what would you tell me? my strong suspicion (if i caught you in an honest, frank, moment) is you would say to me "why the fuck would you want to do that - it sucks". i have this suspicion because i actually did use to do CRUD and it does suck! but here's your moment of zen: so does ML/DL/AI. it really really does suck. it's basically just as bad as webdev in terms of tedium/boredom/incidental complexity/etc. it's not fun, interesting, exciting, whatever else you're projecting based on an outside-looking-in perspective.

now i'll acknowledge that there's one big difference: the pay is way better at the far end of the distribution - meaning if you can get to a FAANG ML team then you'll get more money than you're probably getting now (and a ton more stress too) and it's even more than the CRUD devs in FAANG. fine. but ask yourself if it's really worth learning a whole heap of new bullshit just for a chance at more money (no guarantee).

okay now a useful/practical answer: i went back to school for a PhD but i should've just dropped out with the MS. do that. even better do Georgia Tech's online MS.

golly_ned 15 hours ago [-]

I'm from the ML platforms and systems domain.

I strongly recommend it if one's able. It's a bit more stable than a quickly evolving ML/DL/AI ecosystem or frontend ecosystem. The skills are more durable. It repays deep investment and knowledge.

It allows you to straddle both the distributed systems and services domain and the ML domain.

ML systems problems are extremely interesting since they require extremes of compute, storage, network, and latency, in very different parts of the model lifecycle. Its unique problem is the scarcity and cost of hardware accelerators.

I've worked eleven years in the space and rarely have had the desire to leave.

mathisfun123 15 hours ago [-]

> rarely have had the desire to leave.

I'm currently a GPU compiler engineer in FAANG specializing in compute (not graphics). So clearly ML systems. Previously I worked at every level of the stack above this, and during my PhD I worked below it (RTL). I hate it and think about leaving every day (I stay because of the money and like wtf else am I gonna do lol).

saagarjha 9 hours ago [-]

Are you willing to take a pay cut?

dv35z 17 hours ago [-]

Would you (or someone else passionate about this topic) consider answering the question directly? I am curious about this too.

golly_ned 15 hours ago [-]

PyTorch is widely accepted as the de facto ML framework in both research and industry. TensorFlow comes second in industry. JAX is hardly used at all, but uses the same backend as TensorFlow.

Triton is a python-like language to define ML math operations that run efficiently on hardware accelerators like GPUs or TPUs. OpenAI open sourced it. If there's a particular math operation you have a unique need for in your model, and it hasn't already been implemented by some other library, and it's important for efficiency, you'd probably write it in Triton these days. It'll be compiled to an intermediate representation, then to an efficient runtime.
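
To make the "custom math operation" idea a bit more concrete, here is a rough sketch of the block-wise programming model that kernel DSLs like Triton use: each program instance handles one fixed-size block of elements and masks off the out-of-range tail. This is a plain-Python simulation, not Triton code, and it needs no GPU; the names `BLOCK`, `add_kernel`, and `launch` are made up for illustration.

```python
# Pure-Python simulation of a block-wise kernel. Nothing here is the Triton
# API; BLOCK, add_kernel and launch are hypothetical names for illustration.

BLOCK = 4  # number of elements handled by one program instance


def add_kernel(pid, x, y, out, n):
    """One 'program instance': adds one BLOCK-sized slice of x and y."""
    offsets = [pid * BLOCK + i for i in range(BLOCK)]
    mask = [off < n for off in offsets]  # guard the ragged final block
    for off, ok in zip(offsets, mask):
        if ok:
            out[off] = x[off] + y[off]


def launch(x, y):
    """Runs every program instance of a 1D grid.

    They run one after another here; on a GPU they would run in parallel.
    """
    n = len(x)
    out = [0.0] * n
    grid = (n + BLOCK - 1) // BLOCK  # ceiling division: blocks needed to cover n
    for pid in range(grid):
        add_kernel(pid, x, y, out, n)
    return out


result = launch([1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 20.0, 30.0, 40.0, 50.0])
print(result)
```

With five elements and `BLOCK = 4`, the grid has two program instances, and the second one touches only a single element because of the mask.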

The course linked deals with "MLSys", or "ml systems". That means using GPUs and other hardware accelerators efficiently to run ML math operations on one or more computers.

95% of working ML engineers will never need to write Triton, and will be more than satisfied with PyTorch. Many more ML engineers will, nevertheless, write Triton code, because it is interesting, fun, easy, and people are impressed when you tell them you did.

Hosting PyTorch models efficiently is currently awkward, because there's no clear winner in the ecosystem. ONNX is a way of representing model graphs in a framework-agnostic way. Other systems can interpret ONNX graphs to do inference. So sometimes, when someone wants to host a PyTorch model, they turn it into an ONNX model and run it with an efficient runtime on CPUs or GPUs.
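
As a toy illustration of what "framework-agnostic graph plus a separate runtime" means, the sketch below serializes a tiny computation graph to JSON and evaluates it with a small interpreter. It is not the ONNX format or any real runtime's API; the graph layout, the `OPS` table, and `run` are assumptions made up for this example.

```python
import json

# Toy graph format (hypothetical, not ONNX): named constants plus a list of
# nodes in execution order, each reading named values and writing one.
graph = {
    "inputs": ["x"],
    "constants": {"w": 2.0, "b": -3.0},
    "nodes": [
        {"op": "mul", "inputs": ["x", "w"], "output": "h"},
        {"op": "add", "inputs": ["h", "b"], "output": "z"},
        {"op": "relu", "inputs": ["z"], "output": "y"},
    ],
    "outputs": ["y"],
}

# The "runtime" side: it only knows op names, not the framework that
# produced the graph.
OPS = {
    "mul": lambda a, b: a * b,
    "add": lambda a, b: a + b,
    "relu": lambda a: max(a, 0.0),
}


def run(graph, feeds):
    """Evaluates the nodes in order and returns the graph's output values."""
    values = dict(graph["constants"])
    values.update(feeds)
    for node in graph["nodes"]:
        args = [values[name] for name in node["inputs"]]
        values[node["output"]] = OPS[node["op"]](*args)
    return [values[name] for name in graph["outputs"]]


# Export to a portable file format, then load and run it somewhere else.
serialized = json.dumps(graph)
restored = json.loads(serialized)
outputs = run(restored, {"x": 4.0})
print(outputs)
```

The point of the split is that the exporting side and the executing side only have to agree on the graph format and the set of op names.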

mathisfun123 15 hours ago [-]

> Triton is a python-like language to define ML math operations that run efficiently on hardware accelerators like GPUs or TPUs. OpenAI open sourced it.

This is incorrect. Triton has literally no path to TPU and it has always been open source because it was Philippe Tillet's PhD project (OAI simply hired Philippe).

> 95% of working ML engineers will never need to write Triton, and will be more than satisfied with PyTorch.

Maybe 95% of hobbyist ML engineers but professional ML engineers are absolutely writing Triton day-to-day (eg FB has an army of such people). Even if you're not writing Triton you're still using Triton through inductor.

> because it is interesting, fun, easy, and people are impressed when you tell them you did

Professionals write Triton not for any of the reasons you mentioned but for the same reason they wrote CUDA kernels prior: it's a path to peak performance for their specific workloads (where stock PyTorch kernels have mediocre performance).

mathisfun123 1 day ago [-]

"Modern [NVIDIA GPU] Programming for ..."

Everything after "Pipelining GEMM with TMA" (inclusive) is specific to NVIDIA. Which is fine but the title (of the guide itself) is clearly misleading.

nh23423fefe 22 hours ago [-]

> Our main target is the Blackwell generation,

misleading?

mathisfun123 21 hours ago [-]

what is it with hn people where they willfully misinterpret the simplest observations;

> the title (of the guide itself) is clearly misleading.

...

> title: the distinguishing name of a written, printed, or filmed production

do you understand now? or do i need to also define for you the word misleading?

nh23423fefe 21 hours ago [-]

nah talking to you sucks.

hazard 22 hours ago [-]

This looks great, but I'd really like to see associated exercises (and solutions) to make it useful for self-study

reinitctxoffset 18 hours ago [-]

I can't signal boost this enough.

I spent months, months of late nights watching commits to nvfuser and shit; I wrote a SASS decompiler and instrumented everything trying to learn Blackwell.

This is the first time I've seen something so clean, just a real work of scholarship on it.

My hat is off to the authors and the contribution it represents.

If I would caution a reader about anything, it's that the 2CTA (sm_100, sm_110) patterns here are different on 1CTA in important ways, and it's not a better/worse thing: they are good for different workloads.

Really outstanding work. I proved a lot of this in Lean 4 and published it, but I got lazy and stopped short of really doing the pedagogical work.

This is what you should be starting with if you want to max out 2CTA gear, it's immaculate.
