Task Failed Successfully: Saturating NIC and Disk Bandwidth (blog.mrcroxx.com)
nycerrrrrrrrrr 1 day ago [-]
This might be orthogonal to the TLB miss overhead you found, but have you looked at using P2PDMA to transfer directly from the NVMe SSDs to the NIC? Not sure how the CRC calculation would play into that.
MrCroxx 10 hours ago [-]
Thank you for your reply. This is a long-running service. Without CRC validation, errors caused by partial writes could accumulate over time and affect correctness. Therefore, we adopted this approach.
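
A minimal sketch of the kind of check described above: a checksum stored with each block lets a reader notice a write that only partly reached the disk. The thread does not say which CRC variant or on-disk layout the service uses, so CRC-32 via Python's `zlib`, the 4-byte trailer, and the `seal`/`verify` helpers are all assumptions made for illustration.

```python
import zlib

CRC_LEN = 4  # assumed layout: a 4-byte big-endian CRC-32 trailer after the payload


def seal(payload: bytes) -> bytes:
    """Return the payload followed by its CRC-32 (hypothetical block format)."""
    return payload + zlib.crc32(payload).to_bytes(CRC_LEN, "big")


def verify(block: bytes) -> bool:
    """Recompute the CRC-32 over the payload and compare it with the trailer."""
    if len(block) < CRC_LEN:
        return False
    payload, trailer = block[:-CRC_LEN], block[-CRC_LEN:]
    return zlib.crc32(payload) == int.from_bytes(trailer, "big")


# An older block already on disk, and the new block meant to overwrite it.
old_block = seal(b"B" * 4096)
new_block = seal(b"A" * 4096)

# Simulated partial write: only the first 2048 bytes of the new block landed,
# so the tail (including the trailer) still holds the old block's bytes.
torn_block = new_block[:2048] + old_block[2048:]

print("intact block verifies:", verify(new_block))
print("torn block verifies:  ", verify(torn_block))
```

Without such a check, the torn block would be served as if it were valid, which is how errors could accumulate in a long-running service.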
nycerrrrrrrrrr 4 hours ago [-]
Depending on your hardware, you may be able to do the CRC validation in the NIC. If you're using Intel, you could also use DSA (Data Streaming Accelerator), but then you're still copying the data through DRAM.

Just throwing out some ideas; obviously the best solution is the one that you already have working :)

jeffbee 1 day ago [-]
It seems that you could have reached this conclusion faster by elaborating on your use of the profiler. Don't assume that cycles are spent on instructions. Look at your IPC and drill down into what CPU-bound means for your workload. In your case I think a standard top down analysis would have made the virtual memory management cost jump right out.
MrCroxx 9 hours ago [-]
Thank you for your suggestion. You are absolutely right, XD. In fact, the order of events in this blog post does not match the actual order in which I debugged and analyzed the issue.

After I identified the TLB misses and confirmed that huge pages were effective, I noticed that there were still many suspicious points in the flame graph. Interestingly, my agent attributed the effectiveness of huge pages to those suspicious points, which turned out to be unrelated to the bottleneck at the time. That sparked my curiosity.

The structure of this blog post was mainly chosen to make the story easier to follow, while also covering the various issues I investigated in depth along the way.

In fact, I recently switched from my previous job building data infrastructure on cloud to an HPC-related role, so I am still not very familiar with some of the mature practices and established conclusions in the HPC world.

So thank you very much for your suggestions. I also hope to learn about more and better methods that can help people identify root causes more quickly and accurately in complex scenarios.

MrCroxx 5 days ago [-]
Author here. This post is a write-up of a performance-debugging rabbit hole I hit while trying to saturate NICs with NVMe reads using io_uring and RDMA.

The short version: READ_FIXED fixed the obvious per-I/O GUP overhead in a small demo, but the larger deployment still got stuck at roughly half of line rate. After ruling out io-wq backlog, request splitting, fd lookup, and CRC arithmetic, the actual wall turned out to be dTLB misses from scanning 1,028 KiB buffers backed by 4 KiB pages. Moving the read arena to hugepages brought the system close to NIC saturation.
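
A rough sketch of the arithmetic behind that dTLB pressure: how many distinct pages a sequential scan of one buffer touches at each page size. The 2 MiB hugepage size is an assumption (it is the common x86-64 size; the size actually used is not stated here), and `pages_touched` is a hypothetical helper, not code from the post.

```python
KIB = 1024
MIB = 1024 * KIB


def pages_touched(buf_len: int, page_size: int, offset: int = 0) -> int:
    """Count the distinct pages covered by buf_len bytes starting at byte `offset`."""
    if buf_len <= 0:
        return 0
    first_page = offset // page_size
    last_page = (offset + buf_len - 1) // page_size
    return last_page - first_page + 1


buf_len = 1028 * KIB  # buffer size quoted above

print(pages_touched(buf_len, 4 * KIB))                   # 257
print(pages_touched(buf_len, 2 * MIB))                   # 1
print(pages_touched(buf_len, 2 * MIB, offset=buf_len))   # 2 (straddles a hugepage boundary)
```

Each distinct page needs its own address translation, so a scan over 4 KiB pages needs 257 translations per buffer, while the same scan over 2 MiB pages needs one or two, depending on where the buffer sits in the arena.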

The funny part is that an AI agent suggested hugepages early and got the optimization right, but its explanation was wrong. This post is mostly about reconstructing the evidence for why it worked.

I’d be very interested in feedback from people who have used AI to debug performance issues in a complex system.

ozgrakkurt 1 day ago [-]
I disagree with the AI part, because hugepages are one of the things anyone could guess would improve performance when working with a substantial amount of data.

So anyone familiar with the space could have suggested something like that without knowing the details of the problem. Hence it is not useful advice IMO.

That aside, the blog post was really cool to read and an instant favorite; I wish there were more English posts on the blog.

I especially like the hardware-limit-based expectations, the detailed measurements, and the writing style.

MrCroxx 9 hours ago [-]
Thank you for liking this blog. I agree with your point. Actually, I’ve just recently transitioned from building data infrastructure on the cloud to taking on a high-performance computing role that truly handles massive amounts of data. So, although I’d heard about the benefits of hugepages before, I had never actually reproduced these issues in my own environment. This time, even though I initially suspected the problems were related to hugepages and the TLB, I didn’t write this blog from a seasoned perspective. Instead, I wanted to methodically investigate and eliminate all other possible issues I could think of. (Interestingly, my agent attributed the effectiveness of hugepages to the root causes of these bugs, which piqued my curiosity and drove my deeper exploration.)

Finally, thank you very much for your appreciation, which means a lot to me. Previously, I was working on open-source projects, but now that I’ve changed jobs, I may not have the same amount of energy to contribute to open-source code as before. However, I think blogging might be a new way for me to contribute. I hope I can keep it up.

(My English writing skills are poor, so I wrote in Chinese and used AI to translate it; I hope you don’t mind.)

MrCroxx 9 hours ago [-]
But I'm trying my best to practice it. Hope one day I can produce some solid posts in English directly. qwq
rnio 1 day ago [-]
[flagged]
serious_angel 1 day ago [-]
[flagged]
dang 22 hours ago [-]
You crossed into personal attack here, and that's not allowed on HN. It's not what this site is for, and destroys what it is for.

If you'd please review the site guidelines and stick to them when posting, we'd appreciate it.

https://news.ycombinator.com/newsguidelines.html

modslieulie 22 hours ago [-]
Lies lies lies. Mods routinely break their own policies. You delete accounts that do not break any guidelines, you delete accounts because they say things you don't agree with, lies lies lies.

You allow government agents to do whatever they want on this site. Lies lies lies. You're so full of yourself too it's gross. Liar.

dang 20 hours ago [-]
None of this is true as far as I know (ok, other than being full of myself). But why not supply links so readers can make up their own minds?
ozgrakkurt 1 day ago [-]
It is obvious that the blog is good quality if you have moderate knowledge of the subject and read the blog post.
MasterScrat 1 day ago [-]
This sounds like a strong statement with little backing. The author does infra at DeepSeek if his LinkedIn is to be trusted, and is the author of Foyer.
flipped 1 day ago [-]
[flagged]
dang 22 hours ago [-]
We've banned this account for repeatedly breaking the site guidelines and ignoring our request to stop.

If you don't want to be banned, you're welcome to email hn@ycombinator.com and give us reason to believe that you'll follow the rules in the future. They're here: https://news.ycombinator.com/newsguidelines.html.
