I have done extensive research on CDC, and it almost never works out because most utilities don't create compressed archives in an "rsyncable" format (rsync does CDC). I actually saved a lot of storage with restic when I switched my backups of certain things so that files were stored in archives uncompressed and sorted in a stable order. I know Syncthing eventually removed CDC and just went with constant-size blocks.
Bazel, on the other hand, is completely in control of this, and it makes perfect sense to do this at that point -- and it seems to be a relatively efficient implementation too, really nice to see!
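To illustrate why uncompressed, stably ordered data plays well with CDC while fixed-size blocks don't, here is a minimal sketch of a gear-hash chunker in the general style of FastCDC. The gear table, the 12-bit boundary mask (roughly 4 KiB average chunks on random-looking data) and all function names are hypothetical choices for illustration. This is not the chunker that rsync, restic or Bazel actually ship.

```python
import random

# Hypothetical gear table (one random 64-bit value per byte value), fixed seed for reproducibility.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def cdc_chunks(data: bytes, avg_bits: int = 12) -> list[bytes]:
    """Gear-hash content-defined chunking (no min/max size limits, for simplicity)."""
    # Test the top avg_bits bits; they depend only on the last 64 bytes of input.
    # Expected chunk size is about 2**avg_bits bytes on random-looking input.
    boundary_mask = ((1 << avg_bits) - 1) << (64 - avg_bits)
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Each shift pushes older bytes further left; after 64 steps they drop out.
        h = ((h << 1) + GEAR[byte]) & MASK64
        if h & boundary_mask == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def fixed_chunks(data: bytes, size: int = 4096) -> list[bytes]:
    """Constant-size blocks, for comparison."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared_fraction(old: list[bytes], new: list[bytes]) -> float:
    """Fraction of new chunks that already exist in the old chunk set."""
    seen = set(old)
    return sum(c in seen for c in new) / len(new)


if __name__ == "__main__":
    data = random.Random(0).randbytes(1 << 18)  # 256 KiB of random bytes
    edited = b"!" + data  # insert a single byte at the front
    print("fixed:", shared_fraction(fixed_chunks(data), fixed_chunks(edited)))
    print("cdc:  ", shared_fraction(cdc_chunks(data), cdc_chunks(edited)))
```

Inserting one byte at the front shifts every fixed-size block, so essentially none of them are reused, while the CDC boundaries realign after the first chunk or two. The same one-byte edit inside a gzip stream typically changes the compressed bytes from that point on, which is why compressed archives defeat this unless they are made "rsyncable".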
a_t48 1 day ago
This is something I'm very interested in implementing for Docker builds. I've tested CDC on the final image outputs: it produces smaller outputs, but you have to tune the trade-off between space saved and the number of requests when pulling. For the build cache it might be even more advantageous.
stabbles 16 hours ago
Isn't that rather difficult given the `.tar.gz` layers?
a_t48 12 hours ago
I have a custom pull client/registry/builder that uses a different format, but can output standard OCI if needed.
tracnar 16 hours ago
It also supports plain `.tar`, but that's probably not very commonly used.
auscompgeek 16 hours ago
In theory eStargz layers should be amenable to CDC.
a_t48 12 hours ago
It feels that way, but eStargz is still only addressable as a single layer, or a range within one.
londons_explore 1 day ago
Doesn't this mean that malicious inputs can deliberately cause super tiny or super huge chunks?
rienbdj 20 hours ago
Bazel caches tend to have a size limit.
You need to trust your build execution machine anyway. They have your source code and you will be executing the artifacts that they produce!
ramchip 1 day ago
The same is true without CDC, and you can configure a maximum size.
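Both replies point at the usual mitigation. A common approach (FastCDC does this, for example) is to ignore hash boundaries until a chunk reaches a minimum size and to force a cut at a maximum size, so even adversarial input, like a long run of a single byte value, can only produce chunks within [min_size, max_size], apart from the final chunk. Below is a minimal sketch with hypothetical limits of 2 KiB and 64 KiB and a made-up gear table. These are not Bazel's actual parameters, and unlike FastCDC this sketch still hashes the first min_size bytes instead of skipping them.

```python
import random

# Hypothetical gear table, not taken from any real chunker.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def bounded_cdc_chunks(data: bytes, min_size: int = 2048, avg_bits: int = 12,
                       max_size: int = 65536) -> list[bytes]:
    """Gear-hash CDC with hard lower and upper chunk-size limits."""
    boundary_mask = ((1 << avg_bits) - 1) << (64 - avg_bits)
    chunks, start = [], 0
    while start < len(data):
        end = min(start + max_size, len(data))  # never look past max_size
        cut = end  # default: forced cut at max_size (or end of data)
        h = 0
        for i in range(start, end):
            h = ((h << 1) + GEAR[data[i]]) & MASK64
            # Ignore hash boundaries until the chunk has reached min_size.
            if i + 1 - start >= min_size and h & boundary_mask == 0:
                cut = i + 1
                break
        chunks.append(data[start:cut])
        start = cut
    return chunks


if __name__ == "__main__":
    for name, blob in [("zeros", bytes(300_000)),
                       ("random", random.Random(1).randbytes(300_000))]:
        sizes = [len(c) for c in bounded_cdc_chunks(blob)]
        print(name, len(sizes), min(sizes), max(sizes))
```

With these limits, a malicious input can at worst push every chunk down to min_size (more objects to store and fetch) or up to max_size (less deduplication), so the cost is bounded either way.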