Making Julia as Fast as C++ (2019) (flow.byu.edu)
StilesCrisis 1 day ago [-]
Punchline: rewrote the code to look almost identical to C++, hand-held the compiler by adding @-marks to disable safety checks, forced SIMD codegen and fastmath on.

End result: code that is uglier and still much slower than C++. Kind of a shame.

celrod 21 hours ago [-]
I was once a bit of a Julia performance expert, but moved toward C++ for hobby projects even while still using Julia professionally.

I wrote a blog post at the time with exactly that punchline (not explicitly stated, but just look at the code!): https://spmd.org/posts/multithreadedallocations/ The example was similar to a real production-critical hot path from work.

Maybe things have changed since I left Julia, but that was December 2023, years after this blog post.

arbitrandomuser 21 hours ago [-]
Hey, what happened to LoopModels?
celrod 20 hours ago [-]
I'm still working on it. I'm currently working on a cache tile-size optimization algorithm that should (a) handle trees (a set of loops can be merged at some cache levels and split at others, e.g. in an MLP it may carry an output through the L3 cache, while doing sub-operations in the L2/L1/registers) (b) converge reasonably quickly so compile times are acceptable.

This is the last step before I move to code generation and then generating a ton of test cases/debugging.

My goal is some form of release by the end of the year.

arbitrandomuser 20 hours ago [-]
Oh, is it closed source now? I couldn't find it on GitHub anymore; github.com/LoopModels returns a 404.
celrod 19 hours ago [-]
Yeah, for now. I'd like it to be open, but I also want to potentially be able to make money/a living off of it. My dream would be that it can be open while hardware vendors pay me to optimize for their hardware. For now, being closed gives me more options. It's a lot easier to open in the future than to close, so it's just keeping options open.

I've thought a lot more about the engineering than about any sort of marketing or business plan, so I just want to defer those.

SatvikBeri 23 hours ago [-]
This is 7 years old. Julia is a totally different language by now.

As a quick anecdote, in our take-home interview exercise, we usually receive answers in C++ or Julia, and the two fastest answers have been in Julia.

HarHarVeryFunny 23 hours ago [-]
I'd have to guess that this is because of ease of use. C++ lets you get as close to the metal as you choose to, so there is no reason why a C++ solution shouldn't be at least as fast as one written in any other language, and yet ...

Of course it also depends on what additional libraries you are using, especially when it comes to parallel/GPU programming in C++, but it's easy to believe that Julia out of the box makes it easy to write high-performance parallel software.

tialaramex 20 hours ago [-]
> C++ lets you get as close to the metal as you choose to

This only ends up being true (for any language, but it's too often cited for C++) in a pretty useless Turing Tarpit sort of sense.

So it's not "no reason"; it's just sometimes impractical to solve some problems as well in C++ as in a language that is better suited to them.

Now people do do impractical things sometimes. It's not very practical to swim across the English channel, but people do it. It's not very practical to climb Mt Everest, but loads of people do that for some reason. Going to the moon wasn't practical but the Americans decided to do it anyway. But the reason even the Americans stopped going for a long time is that actually "that was too hard and I don't want to" is in fact a reason.

sheepscreek 10 hours ago [-]
Drawing from those analogies, what's the Julia equivalent?
SatvikBeri 18 hours ago [-]
Yes, with unlimited development time I would expect C++ solutions to be as fast or faster. But Julia hits a really nice combination of development speed and performance that I haven't found in other languages, at least for number crunching and data pipelines.
d_tr 23 hours ago [-]
> This is 7 years old.

Yeah, I actually totally forgot to check the date...

neutrinobro 23 hours ago [-]
Hardly seems worth the effort; perhaps things have improved since 2019. It would be interesting to see an updated benchmark, but if you're going to end up with code that looks like C++ to get proper performance, you might as well write it in C++. My biggest problem with Julia is that they decided to use column-major indexing for multi-dimensional arrays (i.e. Fortran/MATLAB style). This makes interoperability with C/C++ and Python NumPy a real pain, since you can't do zero-copy array sharing between the two without one side being forced into strided access. For that reason alone I haven't adopted it in any of my workflows.
adrian_b 17 hours ago [-]
Actually the column-major order of Fortran is more efficient for some linear algebra operations than the order of C, which has been inherited by many modern languages that do not care about high performance in scientific computations.

So I would say that the culprit for interoperability is C and its descendants, not Fortran or Julia. The designers of C, and of the languages that have imitated it, gave no thought to which order for multi-dimensional arrays is better, so the users of such languages have no right to blame the languages that did the right thing for interoperability problems. Even if the Fortran order had not been better, it had already been in use for 20 years before C, so there was no reason to choose a different order.

C has chosen to store arrays in the order in which they are typically read by humans when written on paper, but this is a choice like the choice between big-endian and little-endian, where big-endian was how Europeans wrote numbers, but little-endian is more efficient on computers.

An example of why column-major order is preferable is the matrix-vector product, i.e. the evaluation of a linear map between vector spaces.

The matrix-vector product should not be done as it is typically taught in schools, by scalar products of rows of the matrix with the vector, because that is less efficient: it makes more memory accesses. The right way to compute a matrix-vector product is by doing AXPY operations between columns of the matrix and elements of the vector operand (segments of the output of the AXPY operations are held in registers until all partial AXPY operations are accumulated, avoiding memory accesses). In this case you read a column of the input matrix for each AXPY operation, which is much more efficient when the elements of a column are stored compactly in memory, avoiding the need for strided accesses.

The same thing happens for matrix-matrix products, which must not be done in the naive way taught in schools, by scalar products of rows of the first matrix with columns of the second matrix, but it must be done by tensor products of columns of the first matrix with rows of the second matrix.
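A rough sketch of the two access patterns in plain Python (lists of columns standing in for a column-major matrix; the function names are illustrative, and real code would of course call BLAS dgemv rather than loop in Python):

```python
# Sketch: matrix-vector product y = A @ x two ways.
# A is stored column-major as a list of columns, as in Fortran/Julia.

def matvec_row_dots(cols, x):
    """Row-oriented: one dot product per row. On a column-major matrix
    this reads cols[j][i] with i fixed, jumping between columns
    (strided access)."""
    m = len(cols[0])
    return [sum(cols[j][i] * x[j] for j in range(len(cols))) for i in range(m)]

def matvec_axpy(cols, x):
    """Column-oriented: accumulate y += x[j] * column_j (AXPY).
    Each pass reads one column contiguously."""
    m = len(cols[0])
    y = [0.0] * m
    for j, col in enumerate(cols):
        xj = x[j]
        for i in range(m):
            y[i] += xj * col[i]
    return y

# The 2x3 matrix [[1, 2, 3], [4, 5, 6]] stored as columns:
cols = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
x = [1.0, 1.0, 1.0]
print(matvec_row_dots(cols, x))  # [6.0, 15.0]
print(matvec_axpy(cols, x))      # [6.0, 15.0]
```

Both orderings compute the same result; the difference is only which loop touches memory contiguously, which is what the column-major layout exploits.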

ilayn 2 hours ago [-]
> Actually the column-major order of Fortran is more efficient for some linear algebra operations than the order of C, which has been inherited by many modern languages that do not care about high performance in scientific computations.

This is a plausible assumption to make, but unfortunately it is not true in general. Especially when the traditional sizes are exceeded, say n >= 2000, certain operations such as LU factorization can be made faster with C-major arrays. The correct statement is that you lose in some places and win in others. There are certainly linear-algebra operations where F-major gives you more performance, but the same is also true for the C-major layout.

In your matrix-vector example, or any BLAS level-2 or level-3 operation, you can also swap the for-loop order to convert between the two (the row-times-column buffer interpretation vs. the sum-of-weighted-columns interpretation). Matrix norm operations (absolute column sums, absolute row sums, etc.) are the main exceptions, since certain norms prefer certain orders. In fact, if you go deep enough into the Goto method you'll see that the internal order is a bit like Morton ordering, to fit things into the L1 cache.

The reason column-major is preferred is historical, and it requires a lot of surgery to get things running with C-major ordering. Trust me, I tried, but it's too much work for too little gain. Maybe someday when I retire I can attempt it. Hence I kept the column-major layout in my retranslation of LAPACK: https://github.com/ilayn/semicolon-lapack

Instead I implemented a "high"-performance AVX2 matrix transpose operation so that swapping the memory layout is trivial compared to the linalg cost.

csvance 19 hours ago [-]
Just reverse the axis on one side, typically the Julia side. This is the convention used in Lux.jl/Flux.jl. I share memory between the two with zero additional copying for my workflows on a daily basis. If you are really allergic to doing this, I’m sure it’s possible to use metaprogramming / the type system to write it the same way in both places with zero performance overhead.
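The trick works because a C-order array of shape (m, n) is, byte for byte, an F-order array of shape (n, m) with the indices swapped. A minimal pure-Python sketch of the index arithmetic (a hypothetical flat buffer standing in for the shared memory; real code would use NumPy strides or Julia array wrappers):

```python
# Sketch: one flat buffer read as a C-order (row-major) 2x3 array
# and as an F-order (column-major) 3x2 array. Reversing the axes on
# one side means zero copies; only the index formula flips.

buf = [1, 2, 3, 4, 5, 6]  # shared memory, never copied

def c_order(i, j, ncols=3):
    """C/NumPy default: rows are contiguous, element (i, j) at i*ncols + j."""
    return buf[i * ncols + j]

def f_order(i, j, nrows=3):
    """Fortran/Julia default: columns are contiguous, element (i, j) at i + j*nrows."""
    return buf[i + j * nrows]

# Element (i, j) of the C view equals element (j, i) of the F view:
assert all(c_order(i, j) == f_order(j, i) for i in range(2) for j in range(3))
```

So as long as one side agrees to index with reversed axes, both languages can traverse the same buffer contiguously.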
brabel 24 hours ago [-]
> code that is uglier and still much slower than C++.

Oh such a shame indeed! They didn’t even manage to produce better looking code at least?? Julia was looking great in 2019 but it was very buggy still so I stopped looking. Had hopes that by now it would be a good choice over C++ and Rust with similar performance.

cmrdporcupine 23 hours ago [-]
There's simply no way it'd ever have similar performance to those. It's not possible.

I have always seen it as a potential alternative to Java, and definitely better than Python.

My experience working in it professionally was that it was... fine. But the GC in it was not good under load and not competitive with Java's.

csvance 22 hours ago [-]
From the sound of your post I'm guessing you view Julia as a general purpose language. I'd consider it general purpose insofar as the application leans into fast numerical computing, with everything else secondary. It can do most of the things other languages do reasonably well, but that's not why you would pick Julia for a project over, say, Java. You pick it because you want to write fast numerical code and express it elegantly. All of the other typical "glue" things you need to ship a product are secondary to that, but good enough to get the job done.

The key to performance with the GC in Julia is not allocating, but it has gotten substantially better since 2019.

2ndorderthought 23 hours ago [-]
How hard was it to maintain a large Julia code base rather than, say, an OOP or Rust one? It has an interesting paradigm. I feel like it could get really messy.
andyferris 22 hours ago [-]
Personally I never struggled. You can employ interfaces and maintain them judiciously.

But interfaces are informal. Not using a monorepo, say, makes it harder to be sure whether you broke downstream code or not (via downstream's unit tests).

But freedom from Rust's orphan rule etc. means you can decompose large code into fragments easily, while getting almost Zig-style specialisation with the ease of use of Python (for consumers). I would say this takes a fair bit of skill to wield safely/in a maintainable fashion, though, and many packages (including my own) are not extremely mature.

cmrdporcupine 22 hours ago [-]
I personally think it requires discipline, I saw it go both ways.

I was never an expert in the language, but I worked alongside people who were, and they generally wrote nice code.

But there were a few places where I saw intensely confusing patterns from overloading with multimethods. Code that became hard to follow, and had poor encapsulation.

drnick1 19 hours ago [-]
Came here to say that. It's just easier to write C++ in the first place, and LLMs now make this easier than ever.
2ndorderthought 24 hours ago [-]
I don't get the appeal. It's like an OSS MATLAB, but all contributions are used directly so the language developers can make money for a parent company? Most OSS languages aren't run that way. Seems kind of scammy.
KenoFischer 21 hours ago [-]
It always amuses me when people assume that the nefarious scheme is taking open source contributions and selling them. That's not the nefarious scheme. The nefarious scheme is going to partners, funding agencies and investors and saying "look at this unique capability / important research / profitable business opportunity that we can do together, but oops, all of our code is written in Julia, so I guess we'd better pay some people to maintain it, or it'll all come crashing down; wouldn't want that to happen".

Also, I'm of course using nefarious in jest here in both cases. While we don't directly try to monetize our open source work, I respect that sometimes people need to do that. As long as people are transparent about it, I don't have a problem. Doing the thing we're doing seems to work, but it's a lot harder, because you have to build a successful piece of software and one (or multiple) successful something-elses that have a critical dependency on it. It's like hitting the lottery twice.

2ndorderthought 21 hours ago [-]
I wouldn't say nefarious, but I don't know how I feel about the power structure. I could see it being very much a one way venture for most participants. I'd have to think about it before actually using the language.
csvance 21 hours ago [-]
Your baseline for comparison is a company that doesn't give anything away for free?

Also, contributing to open source is a choice, not a mandate. I greatly benefit from Julia and its ecosystem, so I chose to contribute back some of my work; no one forced me. I chose the MIT license because I want other people to be able to make money with it, just like I make money with other people's MIT-licensed stuff.

postflopclarity 22 hours ago [-]
the parent company is a consumer of Julia, and has no formal role in oversight or governance; they are of course invested in the success and performance of the language, but so are all other users!
2ndorderthought 22 hours ago [-]
Seems kind of contradictory with the other comment which states that they decide what features are prioritized. I guess not because it could be an informal process.

It's interesting. I like the more opaque approach Rust takes. Rust has its own issues but it seems less corporately motivated. Maybe that's why it has more corporations using it? You aren't going to end up with the core maintainers of the language rug-pulling packages or language features to slow down competitors who are also using the tool. I say competitors because it looks like they are making money through consultancies and very broad applications of the niche language.

Weird stuff to have to think about. I just want to write code

postflopclarity 15 hours ago [-]
> they decide what features are prioritized

this is not true; the other comment is wrong. there is no central body at all that "decides" what features are prioritized. features are simply worked on by whomever has the capacity, ability, and desire to do so.

many engineers at JuliaHub have all three of the capacity, ability, and desire to work on certain features because JuliaHub, in its capacity as a private business, pays them to do so. but with respect to Julia the programming language these are "just" third party contributions like any other.

2ndorderthought 14 hours ago [-]
So when I was googling it, I was seeing a few other corporate activities seemingly coming from the other major contributors to the language outside of JuliaHub? It looks like Pumas-AI has a few of the same people as JuliaHub. Or am I misunderstanding the situation?

From a quick Google search it looked kind of like a bunch of MIT staff/professors(?) are getting students to churn out code for a variety of business interests. It just doesn't seem right on the surface, and it does make me wonder about what other things happen, knowing what I know about human behavior.

I am personally not interested that's for sure. Thanks for sharing your experiences though.

kelipso 20 hours ago [-]
> I like the more opaque approach rust takes. Rust has its own issues but it seems less corporately motivated. Maybe that's why it has more corporations using it?

I don’t if these are contradictory exactly but it seems to come from a very cluttered space.

andyferris 22 hours ago [-]
Meh, I’ve never been associated with the company and AFAICT they provide value through platforms for enterprises. Not everyone gets OSS sponsorships to fund team (and using a social media presence to achieve this was a post-Julia phenomenon).

It’s nothing like Google-the-ad-company influencing Chrome. The company consumes Julia for products to sell, rather. Maybe this affects the ordering of features landing, but… meh.

Syzygies 17 hours ago [-]
Julia is reasonably fast. I returned to a language comparison project specific to my math research, to see how I might do better. My agents and I studied the advice in the post and various more recent links from the comments, but we were already mostly on target and nothing left moved the needle.

My work is more combinatorial. Julia does excel at numerical computation. There's a tribal divide in math between people who can't go 30 seconds away from the real or complex numbers, and those whose tolerance is about that long. I try to keep an open mind, but I'm closer to the second camp. Julia is good enough to consider either way.

A development in recent months, AI can assist in general purpose Lean 4 programming, no longer getting confused by the dominant proof-oriented training corpus. If one is a functional programmer who believes that Haskell was on the right track, then Lean is the most interesting language choice for shaping one's thoughts. Benchmarks are inherently misleading if a better language makes it possible to express algorithms out of reach of more primitive languages.

https://github.com/Syzygies/Compare

            C++  100    13.08s  ±0.08s
           Rust   99    13.16s  ±0.02s
          Julia   90    14.54s  ±0.01s
             F#   90    14.54s  ±0.04s
  Kotlin-native   88    14.79s  ±0.01s
         Kotlin   86    15.18s  ±0.01s
          Scala   79    16.50s  ±0.08s
   Scala-native   76    17.14s  ±0.02s
            Nim   65    20.17s  ±0.01s
          Swift   64    20.54s  ±0.04s
          Ocaml   52    25.38s  ±0.04s
           Chez   49    26.64s  ±0.02s
        Haskell   37    34.96s  ±0.06s
           Lean   29    45.39s  ±0.15s
kmaitreys 21 hours ago [-]
I really like Julia as a language but I have struggled to adopt it and be productive in it. Part of it is because of the JIT runtime and a sub-par LSP (at least when I last tried).

To those who regularly write Julia code, what is your workflow? The whole thing with Revise.jl did not suit me, honestly. I have enjoyed programming in Rust orders of magnitude more because there's no runtime and you can do AOT compilation. My intention is not to write scripts, but high-performance numerical/scientific code, and with Julia's JIT-based design, rapid iteration (to me at least) feels slower than in Rust (!).

jakobnissen 20 hours ago [-]
The boring answer is that I don't use huge dependencies that take minutes to compile, and I don't lean on the LSP; I tend to put more effort into reading the code.

In my experience you really gotta work with the tools the language gives you. Julia gives you Revise, so it’s a bit of a handicap not using it. Maybe analogous to writing Rust without an LSP.

I get that leaning on the LSP can become a habit, and also that the Julia LSP is quite poor, but I find it wild that rapid iteration for you is faster in Rust. I write Rust as well and can’t imagine how that would be the case.

kmaitreys 18 hours ago [-]
A lot of people have focused on the LSP in their replies, when it was only one of the problems I mentioned.

rust-analyzer is a great LSP, and paired with clippy it can teach you the language itself. Also, writing numerical code is extremely easy in Rust: I can write code and just run cargo run to see the output. Julia, on the other hand, forces a REPL-based workflow, which has never made sense to me. A REPL-based workflow makes sense when you just want to do some scripting. But when writing code that will run for a long duration on an HPC cluster? I don't get it. Part of the problem is that I'm not "holding it correctly", but again, the out-of-the-box experience isn't good. You define a struct and later add or remove a field from it; often you'll get an error because Revise.jl didn't recompile things. It was a sub-par experience, and I was hoping people would share their dev workflows in more detail.

leephillips 16 hours ago [-]
And yet Julia is used for large-scale simulations on giant HPC machines and Rust is not.

Recent versions of Revise let you redefine structs in the REPL.

You are not forced to use the REPL, ever. It’s a fantastic convenience, however.

My dev workflow is to write my code in Neovim, sometimes with a REPL attached to the editor to try out code snippets. I don’t need or use LSPs. I do enjoy the Aerial plugin, which pops up an outline of my code for easy navigation.

SatvikBeri 18 hours ago [-]
Well, my workflow uses Revise.jl. I develop either in Jupyter notebooks or in the REPL, prototyping code there and then moving functions to files when they're ready. In that context, rapid iteration is fairly fast.

Nowadays I often use Claude Code, working with a Julia REPL in a tmux or zellij session via send-keys. I'll have it prototype and try to optimize an algorithm there, then create a notebook to "present its results", then I'll take the bits I like and add them to the production codebase.

kmaitreys 18 hours ago [-]
How do you develop a program that will run for a long duration on HPCs? How do you quickly modify struct definitions? How do you handle imports (the using vs. include syntax is so confusing!)?

REPL-based workflow doesn't make sense to me other than scripting work.

SatvikBeri 17 hours ago [-]
Re: REPL use, you just use it to run code and look at results. e.g. for TDD – you can modify your code files normally in the IDE, changes get picked up by revise, and then you re-run the tests in the REPL.

For long-running jobs, I basically follow the same process as in any other language: make the functions I want to run, test them locally on a small dataset that runs relatively quickly, then launch them on the remote machines with the full data.

Revise.jl has struct redefinition now, but before that I would just use NamedTuples while iterating, then make a struct when I was ready to move something to production.

`using` is for importing modules, `include` is for specific files. At work, we currently have a monorepo, with one top-level OurProject.jl file that uses `using` to import external packages, and `include` for all the internal files.

adgjlsfhk1 17 hours ago [-]
> How do you develop a program which will run for a long duration on HPCs?

The main strategy is to have a way of parameterize the program to bring the runtime down to seconds-minutes on a laptop. E.G. for PDEs, you may be running the HPC version on a giant mesh, but you can run the same algorithm on your local computer on a much coarser mesh.
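As a toy illustration of that strategy (a hypothetical pure-Python sketch, not from any real codebase; a production version would be Julia on a 2D/3D mesh), the same explicit solver can be driven at two resolutions:

```python
# Sketch: a 1D heat equation solver parameterized by mesh resolution.
# Debug with a small n on a laptop; the HPC run uses the same code path
# with a much finer mesh.

def heat_step(u, alpha=0.1):
    """One explicit Euler step of du/dt = alpha * d2u/dx2; endpoints held fixed."""
    return [u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
            if 0 < i < len(u) - 1 else u[i]
            for i in range(len(u))]

def run_sim(n, steps):
    """Diffuse a point source placed at the middle of an n-cell mesh."""
    u = [1.0 if i == n // 2 else 0.0 for i in range(n)]
    for _ in range(steps):
        u = heat_step(u)
    return u

coarse = run_sim(64, 10)              # seconds on a laptop: iterate here
# fine = run_sim(1_000_000, 10_000)   # the cluster run, same algorithm
```

The point is simply that resolution is a parameter, so correctness and iteration happen locally and only the final scale-up touches the cluster.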

> How do you quickly modify struct definitions

Thankfully on 1.12 this has been solved. You can redefine structs while keeping the REPL up.

> how do you define imports (using vs include syntax is so confusing!)

Yeah, Julia messed this up. The basic rule is that include and using are basically the same.

arbitrandomuser 21 hours ago [-]
Yup, the LSP is bad. There is a new LSP being written, based on JET.jl, a static code analyzer; this should be faster than the old LSP, which kind of runs by loading all the modules into a Julia instance and querying it for symbols and docs (I'm not 100% sure, but I think that's how it works).
thimotedupuch 19 hours ago [-]
Exactly! The new LSP is getting ready at https://github.com/aviatesk/JETLS.jl/ with one of the compiler devs working hard on it. I tried it with VSCode, Zed and Helix, and it's already more than fine.

I hope Julia developer tools will one day match the best of what other programming languages have to offer.

tombert 19 hours ago [-]
Just an FYI...Claude is actually really good at building LSP servers [1].

If you want a better Julia LSP, you might just be able to get Claude or Codex to build one for you. I've been impressed with the TLA+ bindings it generated.

[1] https://github.com/Tombert/TLA-Language-Server-Protocol

paddim8 18 hours ago [-]
What's the problem with the JIT runtime? Why is rapid iteration slower with JIT? Just-in-time compilation isn't inherently slower and is normally faster than AOT for dynamic languages and even static languages that have some dynamic features like dynamic dispatch
lelanthran 19 hours ago [-]
> Part of it is because of the JIT runtime and a sub-par LSP (at least when I last tried)

Good LSPs do autocompletion; sub-par ones don't.

Is it really such a good idea to have every single automated aid turned on when picking up a new language?

How will you learn if you cannot get feedback on what you did wrong?

I mean, until you learn multiplication, maybe don't use the calculator.

Once you learn it then you get a small speed increase, but if you are new to something, LSP autocompletion is going to slow down your learning.

kmaitreys 18 hours ago [-]
I think LSPs like rust-analyzer are very good tools to learn the language itself. I think I learnt Rust solely through LSP and clippy.
mgkuhn 18 hours ago [-]
I'm always surprised when people describe Julia syntax as "Pythonic": Julia's syntax was clearly inspired by MATLAB rather than Python.

And that's a good thing, because Python+NumPy syntax is far more cumbersome than either Julia or MATLAB's.

You can see this at a glance from this nice trilingual cheat sheet:

https://cheatsheets.quantecon.org/

SatvikBeri 18 hours ago [-]
It's definitely closer to MATLAB than Python, but it's closer to Python than most mainstream programming languages are. I manually ported ~20k lines of Python code to Julia over a couple of years, and for the most part could do line-by-line translations that worked (but weren't necessarily performant until I profiled and switched to Julia idioms).
orthogonal_cube 21 hours ago [-]
Dang, haven’t read much on Julia as of late. I remember using it for a CS 300-level course around 2016 when learning about tokenizing and parsing as part of language fundamentals. Julia has undoubtedly made some significant performance improvements since then. Would love to see a follow-up that explores what, if anything, from this still holds true and what improvements can be made.
mgkuhn 18 hours ago [-]
Note that this article is about Julia 1.0.3, whereas today you should consider as obsolete any experience reports involving Julia versions prior to Julia 1.10 (the current LTS version), the most significant milestone in the maturity and usability of the language.
FattiMei 24 hours ago [-]
Very interesting post and I think this exposes the limitations of the Julia compiler. Note that an old version of the compiler is used (1.0.3 from 2019).

One could say that we can almost replicate the semantics of a C++ program while writing in Julia: for example, we can remove bounds checks on arrays or remove hidden memory allocations.

But the goal of a language for numerical computing is capturing the mathematical formulas using high level constructs closer to the original representation while compiling to efficient code.

Domain scientists want to play with the math and the formulas, not doing common subexpression elimination in their programs. Just curious to see how it evolves

northzen 24 hours ago [-]
I think the best compromise would be to get the best of both worlds: perform bounds checks by default, but have a compiler flag which skips them. It might break many programs written with the default behaviour in mind, but it would allow additional optimizations.
postflopclarity 22 hours ago [-]
This is exactly what Julia does. Bounds checks are on by default, and there are ways to disable them: either locally, via the `@inbounds` macro, or globally with `--check-bounds=no`.
vivzkestrel 9 hours ago [-]
- why are all the newer posts on page 1 and page 2 under blog empty? I mean I literally only see the title

- not a single post has anything inside here https://flow.byu.edu/posts/

kasperset 21 hours ago [-]
I wonder how Mojo ranks along with Julia. Mojo was discussed yesterday here. Mojo seems to be more python focused while Julia is very much focused on Scientific computation. I may be wrong.
ekjhgkejhgk 21 hours ago [-]
Phew. A 7-year-old post about a 10-year-old language. Triggers all the LLMs into posting empty generic responses: "Very interesting, exposes limitations...".

Prelude of what's to come in the self-reinforcing cycle of machines talking to machines and drowning everything else.

kelipso 20 hours ago [-]
It's a very predictable pattern, I swear. I thought it was mostly a Reddit thing, but dead internet theory is looking more and more real, even here.
slwvx 4 days ago [-]
From 2019