A directory over SSH can be your git server. If your CI isn't too complex, a post-receive hook looping into Docker can be enough. I wrote about self-hosting git and builds a few weeks ago [1].
There are heavier solutions, but even setting something like this up as a backstop might be useful. If your blog is being hammered by ChatGPT traffic, spare a thought for Github. I can only imagine their traffic has ballooned phenomenally.
Doesn't post-receive block the push operation and get cancelled when you cancel the push?
qudat 24 hours ago [-]
I use https://pipe.pico.sh for this use case. It’s a pubsub over ssh. It’s multicast so you can have multiple listeners on the same topic, and you can have it block or not block the event.
duggan 1 days ago [-]
It does, you're just running a command over ssh, so if you've a particularly long build then something more involved may make more sense.
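One common workaround is to keep the hook itself fast and detach the slow part, so the hook (and therefore the push) returns immediately. A sketch, with temp paths standing in for a real server repo and the real build command shown only as a comment:

```shell
# Install a post-receive hook whose slow part is detached, so the client's
# `git push` returns as soon as the hook exits. The repo path is a temp
# placeholder; the real build command appears only as a comment.
REPO="$(mktemp -d)/project.git"
git init --bare "$REPO" >/dev/null

cat > "$REPO/hooks/post-receive" <<'EOF'
#!/bin/sh
# Fast, synchronous work (e.g. checking out a worktree) goes here.
# Slow work runs detached; stdout/stderr must be redirected, or the
# push can still hang waiting for the hook's pipes to close.
#   nohup sh -c 'cd /srv/app && docker compose up -d --build' >>/var/log/deploy.log 2>&1 &
nohup sh -c 'sleep 1 && echo deploy finished' >>/tmp/deploy.log 2>&1 &
EOF
chmod +x "$REPO/hooks/post-receive"
```

The redirects matter: even a backgrounded process keeps the push's ssh session open if it inherits the hook's stdout/stderr.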
VorpalWay 24 hours ago [-]
Most builds take a long time, at least in C++ and Rust (the two languages I work in). And from what I have seen of people working in Python, the builds aren't fast there either (far faster of course, but still easily a minute or two).
Also, how would PRs and code review be handled?
Your suggestion really only makes sense for a small single developer hobby project in an interpreted language. Which, if that is what you intended, fair enough. But there really wasn't enough context to ascertain that.
duggan 24 hours ago [-]
I did give additional context in the blog post I linked, but yes, to be clear, this is something that will really work best for small projects with reasonably fast build cycles.
If you're already at the point where you're fielding pull requests, lots of long running tests, etc., you'll probably already know you need more than git over ssh.
TacticalCoder 23 hours ago [-]
> The origin of a git repo is more or less just the contents of the .git directory in a remote location. That's it. You don't even need to run a git server if you're happy enough using ssh for transport.
Yeah. You probably do want to make sure you turn your .git/ into a "bare" git repository but that's basically it.
And it's what I do too: an OCI container that gives me access to all my private Git repos (it sets up SSH with U2F so I get to use my Yubikey to push/pull from various machines to those Git repos).
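For anyone who hasn't tried this: the whole "server" really is just a bare repo at the end of an ssh path. A sketch, with local temp directories standing in for the remote host (over ssh, the init would run via `ssh user@myserver` and the remote URL would be `user@myserver:repos/project.git`):

```shell
# The "server" side: a bare repository. A temp dir stands in for the
# remote host here; over ssh you'd run git init --bare there instead.
SERVER_REPO="$(mktemp -d)/project.git"
git init --bare "$SERVER_REPO" >/dev/null

# The "client" side: any working repo can push to it like any other origin.
WORK="$(mktemp -d)/work"
git init -q "$WORK"
git -C "$WORK" -c user.name=me -c user.email=me@example.com \
    commit -q --allow-empty -m "first commit"
git -C "$WORK" remote add origin "$SERVER_REPO"
git -C "$WORK" push -q origin HEAD:main
```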
terminalbraid 1 days ago [-]
I would prefer we have posts when github is not having issues to cut down on noise.
OtomotO 23 hours ago [-]
Yeah, right. I mean, I'm so happy that only one of my clients is using GitHub as their GitForge. Every single other one hosts their own GitForge. And I can't state how much better every single other GitForge is.
GitHub was the pinnacle of GitForge a couple of years back, and it seems like they wanted to hit a wall.
Otherwise, you cannot explain how you can enshittify a software that much.
heliumtera 23 hours ago [-]
There was GitHub, and then it was Microsoft.
kevwil 23 hours ago [-]
Microsoft trying to run a Ruby on Rails + SSH + Git system.
OtomotO 23 hours ago [-]
Exactly. But given all the slop, not even just AI slop, I wonder how Microsoft can still be in business.
pothamk 1 days ago [-]
What’s interesting about outages like this is how many things depend on GitHub now beyond just git hosting.
CI pipelines, package registries, release automation, deployment triggers, webhooks — a lot of infrastructure quietly assumes GitHub is always available.
When GitHub degrades, the blast radius is surprisingly large because it breaks entire build and release chains, not just repo browsing.
littlestymaar 1 days ago [-]
> a lot of infrastructure quietly assumes GitHub is always available
Which is really baffling for a service that has hiccups at least weekly, even when it's not a complete outage.
Part of it is probably historical momentum.
GitHub started as “just git hosting,” so a lot of tooling gradually grew around it over the years — Actions, package registries, webhooks, release automation, etc. Once teams start wiring all those pieces together, replacing or decoupling them becomes surprisingly hard, even if everyone knows it’s a single point of failure.
ryandrake 1 days ago [-]
Insert the standard comment about how git doesn't even need a hub. The whole point of it is that it's distributed and doesn't need to be "hosted" anywhere. You can push or pull from any repo on anyone's machine. Shouldn't everyone just treat GitHub as an online backup? Zero reason it being down should block development.
anon7000 1 days ago [-]
The problem is that any kind of automatic code change process like CI, PRs, code review, deployments, etc etc are based on having a central git server. Even security may be based on SSO roles synced to GH allowing access to certain repos.
A self-hosted git server is trivial. Making sure everything built on top of that is able to fallback to that is not. Especially when GH has so many integrations out of the box
2001zhaozhao 1 days ago [-]
Forgejo has all of the features you mentioned and is completely open source!
anon7000 13 hours ago [-]
That’s awesome, but now we’re talking about moving a big enterprise install with loads of hooks connected to GH and hundreds of repos. Not an easy project.
zthrowaway 1 days ago [-]
Microslop ruins everything it touches.
shykes 1 days ago [-]
In moments like this, it's useful to have a "break glass" mode in your CI tooling: a way to run a production CI pipeline from scratch, when your production CI infrastructure is down. Otherwise, if your CI downtime coincides with other production downtime, you might find yourself with a "bricked" platform. I've seen it happen and it is not fun.
It can be a pain to setup a break-glass, especially if you have a lot of legacy CI cruft to deal with. But it pays off in spades during outages.
I'm biased because we (dagger.io) provide tooling that makes this break-glass setup easier, by decoupling the CI logic from CI infrastructure. But it doesn't matter what tools you use: just make sure you can run a bootstrap CI pipeline from your local machine. You'll thank me later.
nadirollo 1 days ago [-]
This is a must when your systems deal with critical workloads. At Fastly, we process a good chunk of the internet's traffic and can't afford to be "down" while waiting for the CI system to recover in the event of a production outage.
We built a CI platform using dagger.io on top of GH Actions, and the "break glass" pattern was not an afterthought; it was a requirement (and one of the main reasons we chose dagger as the underlying foundation of the platform in the first place)
noplacelikehome 23 hours ago [-]
I would really love to hear more about this, but my cursory search didn't find a write up about it.
I did a PoC of Dagger for an integration and delivery workload and loved the local development experience. Being able to define complex pipelines as a series of composable actions in a language which can be type checked was a great experience, and assembling these into unix-style pipelines felt very natural.
I struggled to go beyond this and into an integration environment, though. Dagger's current caching implementation is very much built around there being a single long-lived node and doesn't scale out well, at least without the undocumented experimental OCI caching implementation. Are you able to share any details on how Fastly operates Dagger?
nadirollo 23 hours ago [-]
We don't have any public posts around our setup (yet), but I think it's time we do. I'll put some time into it and will revert here to link to it.
vmaffet 13 hours ago [-]
Being able to run the exact same pipeline locally and in any CI environment is the most compelling feature of dagger. It frees you from any underlying platform, so you can adapt more easily.
VorpalWay 24 hours ago [-]
Times like this are when I'm so happy I don't work on deploying to a production environment; instead we release software that (after extensive qualification) customers can install in their environment, on their air-gapped networks, using a USB stick to cross the air gap. If we miss a release by a day or three, there is enough slack in the process before it goes to the customer that no one will be any the wiser.
Crazy in 2026, but installable software has some pros still, for both the developer and for the customer. And I would personally love if I could do things that way for more things.
ehnto 22 hours ago [-]
I had that revelation for embedded software. After years of live service hosted software, I released an embedded device. It just runs happily, somewhere, who knows, not me.
alex_suzuki 1 days ago [-]
100%. We used to design the pipeline a way that is easily reproducible locally, e.g. doesn’t rely on plugins of the CI runtime. Think build.sh shell script, normally invoked by CI runner but just as easy to run locally.
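That pattern can be as small as one script. A sketch of such a build.sh, with the actual lint/test/package commands left as placeholder comments since they depend on the project:

```shell
#!/bin/sh
# build.sh — the pipeline as one script; the CI runner calls it, and so can
# a laptop. The per-step commands are placeholders for a real project.
set -u

step() { printf '==> %s\n' "$1"; }

step "lint"
# shellcheck ./*.sh

step "test"
# make test

step "package"
# docker build -t myapp:"${GIT_SHA:-local}" .

step "done"
```

The CI config then shrinks to a single `./build.sh` invocation, which is exactly what makes the pipeline reproducible locally.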
hinkley 1 days ago [-]
My automation is always an escalation of a run book that has gotten very precise and handles corner cases.
Even if I get the idea of an automation before there’s a run book for it.
SAI_Peregrinus 20 hours ago [-]
I like run scripts. Shell or python scripts that do nothing other than prompt the user with what to do, or which choice to make, and wait for them to hit a key to proceed to the next step. Encode the run book flowchart into an interactive script. Then if a step can be automated, the run book script can directly call that automation. Eventually you may end up with a fully automated script, but even if you don't it can still be a significant help.
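A minimal version of such a run script might look like this; the step contents are invented for illustration, and the tty check lets the same script run unattended once every step is automated:

```shell
#!/bin/sh
# A runbook encoded as a script: manual steps print the instruction and wait
# for Enter; automated steps just run. Step contents are invented examples.

pause() {
  printf '%s\n[press Enter when done]\n' "$1"
  if [ -t 0 ]; then read -r _; fi   # skip the wait when stdin isn't a tty
}

run_book() {
  pause "1. Announce the deploy in #ops; confirm no release freeze is active."
  echo "2. Tagging the release (automated step):"
  # git tag "release-$(date +%Y%m%d-%H%M)" && git push --tags
  pause "3. Watch the canary dashboard for 10 minutes; abort on elevated 5xx."
  echo "Runbook complete."
}

run_book
```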
hinkley 2 hours ago [-]
Someone gave me that idea about eight years ago and I spent the next several trying to look for a nail for that hammer.
I eventually expanded the one I wrote to include URLs to the right places in Bamboo to do things like disable triggers or start manual deployments. By the time I finished that we were doing 10x as many canary deployments as we had been before, and we’re retiring tech debt way faster because of it. 10/10 would do again.
npm publish will open a web browser for you for passcode entry, and I think I’ll do that next time instead of using cut and paste.
hinkley 1 days ago [-]
It’s a hard sell. I always get blank looks when I suggest it, and often have to work off book to get us there.
I generally recommend that the break glass solution always be pair programmed.
tomwphillips 1 days ago [-]
A while back I think I heard you on a podcast describing these pain points. Experienced them myself; sounded like a compelling solution. I remember Dagger docs being all about AI a year or two ago, and frankly it put me off, but that seems to have gone again. Is your focus back to CI?
shykes 1 days ago [-]
Yes, we are re-focused on CI. We heard loud and clear that we should pick a lane: either a runtime for AI agents, or deterministic CI. We pick CI.
Ironically, this makes Dagger even more relevant in the age of coding agents: the bottleneck increasingly is not the ability to generate code, but to reliably test it end-to-end. So the more we all rely on coding agents to produce code, the more we will need a deterministic testing layer we can trust. That's what Dagger aspires to be.
For reference, a few other HN threads where we discussed this:
Yes, I agree on your assessment. AI means a higher rate of code changes, so you need more robust and fast CI.
dzonga 24 hours ago [-]
I remember the days when it was mostly Gitlab having issues.
GitHub was super stable. Then it got shitty once they switched the frontend to React instead of the server-rendered pages, then came the Copilot stuff.
Lately I haven't heard them bragging about the Rails monolith.
OtomotO 23 hours ago [-]
The question I ask myself to this day is why they began switching to React. It made no sense at all to me. It was a working product, so why would you switch?
I get that new developers might be more familiar with React, but then again, as soon as the trade-offs were apparent, I would've pulled the plug.
But they said: Buckle up, everyone, let's ruin our product!
ralph84 22 hours ago [-]
Promotion-driven development happens at Microsoft just like any other big tech company.
jamesfinlayson 23 hours ago [-]
Hm, I didn't realise they'd moved to React. I remember reading years ago that it used jQuery for the longest time but they put in some effort to move to pure Javascript (maybe using web components).
akoumjian 1 days ago [-]
Is this related to Cloudflare?
I'm getting cf-mitigated: challenge on openai API requests.
codeberg might be a little slower on git cli, but at least it's not becoming a weekly 'URL returned error: 500' situation...
popcornricecake 1 days ago [-]
These days it feels like people have simply forgotten that you could also just have a bare repository on a VPS and use it over ssh.
yoyohello13 1 days ago [-]
Most developers don’t even know git and GitHub are different things…
hrmtst93837 1 days ago [-]
I've found that a bare repo over SSH is the simplest way to keep control and reduce attack surface, especially when you don't need fancy PR workflows. I ran many projects with git init --bare on a Debian VPS, controlled access with authorized_keys and git-shell, and wrote a post-receive hook that runs docker-compose pull and systemctl restart so pushes actually deploy. The tradeoff is you lose built-in PRs, issue tracking, and easy third party CI, so either add gitolite or Gitea for access and a simple web UI, or accept writing hooks, backups, receive.denyNonFastForwards, and scheduled git gc to avoid surprises at 2AM.
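The server-side guardrails mentioned there are just a git config away. A sketch, with a temp path standing in for the real server repo (`receive.denyNonFastForwards` and `gc --auto` are standard git; the cron line is illustrative):

```shell
# Server-side guardrails for a shared bare repo. The config key and gc
# command are standard git; the repo path is a temp stand-in for a server.
REPO="$(mktemp -d)/project.git"
git init --bare "$REPO" >/dev/null

# Refuse force-pushes / history rewrites on the shared repo:
git -C "$REPO" config receive.denyNonFastForwards true

# Housekeeping so loose objects don't pile up; on a server, run from cron:
#   17 3 * * * git -C /path/to/project.git gc --auto --quiet
git -C "$REPO" gc --auto --quiet
```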
mynameisvlad 1 days ago [-]
I mean, this isn't a 'URL returned error: 500' situation for anything that Codeberg provides considering this is an issue with Copilot and Actions.
joecool1029 1 days ago [-]
Except actually it was, that was what my git client was reporting trying to run a pull.
mynameisvlad 1 days ago [-]
I'm going to trust the constant stream of updates from the company itself which shows exactly what went down and came back up rather than a random anecdote.
workethics 1 days ago [-]
I only found this post because I decided to check HN after getting HTTP 500 errors pulling some repos.
iovoid 1 days ago [-]
If you look at the incident details it also claims most services were impacted.
> Git Operations is experiencing degraded availability. We are continuing to investigate.
Recent years have shown this to be the wrong prediction strategy. The reason seems to be an incentive imbalance where there are quite a few reasons for companies to lie (including their own CLAs) and not a lot of repercussions for doing so (everybody competes on lock-in, not on product). Of course, the word-of-mouth approach is also exploitable by dishonest actors, but thus far there doesn’t look to be a lot of exploitation going on, likely because there’s little reason to bother (once again, lock-in is king).
slopinthebag 1 days ago [-]
This seems intelligent, after all companies are incapable of making errors in reporting and also have absolutely no incentive to lie about stuff like that. Those 500 errors others have reported as experiencing must have just been the wind.
Imustaskforhelp 1 days ago [-]
I used to use codeberg 2 years ago. I may have been ahead of my time.
ocdtrekkie 1 days ago [-]
I rarely successfully get Codeberg URLs to load. Which is sad because I actually would very much like to recommend it but I find it unreliable as a source.
That being said, GitHub is Microsoft now, known for that Microsoft 360 uptime.
Imustaskforhelp 1 days ago [-]
I have never had this issue. IIRC Codeberg has a matrix community, they are a non-profit and they would absolutely love to hear your feedback of them. I hope that you can find their matrix community and join it and talk with them
I mean... It's right in the name! It's up for 360 days a year.
IshKebab 1 days ago [-]
I mean... you understand the scale difference right?
cpfohl 1 days ago [-]
I swear this is my fault. I can go weeks without doing infra work. Github does fine, I don't see any hiccups, status page is all green.
But the day comes that I need to tweak a deploy flow, or update our testing infra and about halfway through the task I take the whole thing down. It's gotten to the point where when there's an outage I'm the first person people ask what I'm doing...and it's pretty dang consistent....
aezart 1 days ago [-]
Sounds like my Dad, who used to have an uncanny ability to get stuck in elevators. Even got stuck in one with his claustrophobia therapist.
LollipopYakuza 1 days ago [-]
Plot twist: cpfohl works at Github and actually messes with the infra.
sidewndr46 1 days ago [-]
Second plot twist: cpfohl actually works at Microsoft on Copilot
Related: In FreeBSD we used to talk often about the Wemm Field. Peter Wemm was one of the early FreeBSD developers and responsible for most of the early project server cluster, and hardware had a phenomenal habit of breaking in his vicinity. One notable story I heard involved transporting servers between data centers and hitting a Christmas tree in the middle of a highway... in March.
macintux 1 days ago [-]
At my old job we’d call that Daily bogons (my last name). Didn’t know I was in such illustrious company.
cpfohl 1 days ago [-]
Brilliant. I love it
hmokiguess 1 days ago [-]
You should be promoted to SRE - Schrodinger Reliability Engineer
trigvi 1 days ago [-]
Simple solution: do infra work every few months instead of every few weeks.
Imustaskforhelp 1 days ago [-]
Just let us know in advance when you want to do infra work from now on, alright?
cpfohl 1 days ago [-]
I’ll try. Lemme know if you need a day off too…
Imustaskforhelp 1 days ago [-]
I know a guy who knows a guy who might need a day off haha
And they are gonna give a pizza party if I get them a day off. I am gonna share a slice with ya too.
Doing a github worldwide outage by magical quantum entanglement for a slice of pizza? I think I would take that deal! xD.
RGamma 1 days ago [-]
Surely this would earn you loads of internet street cred.
duckkg5 1 days ago [-]
I would so very much love to see GitHub switch gears from building stuff like Copilot etc and focus on availability
> It’s existential for GitHub to have the ability to scale to meet the demands of AI and Copilot, and Azure is our path forward.
More existential than going down a few times a week?
coffeebeqn 1 days ago [-]
This is an absurd state for them to be in! Weekly outages in 2025 and 2026. The slide from developer-beloved and very solid to Microslop went faster than I expected.
esseph 1 days ago [-]
They may have been beloved before MS bought them. It takes a while for technical debt to catch up.
hrmtst93837 1 days ago [-]
I think GitHub shipping Copilot while suffering availability issues is a rational choice because they get more measurable business upside from a flashy AI product than from another uptime graph. In my experience the only things that force engineering orgs to prioritize uptime are public SLOs with enforced error budgets that can halt rollouts, plus solid observability like Prometheus and OpenTelemetry tracing, canary rollouts behind feature flags, multi-region active-active deployments, and regular chaos experiments to surface regressions. If you want them to change, push for public SLOs or pay for an enterprise SLA, otherwise accept that meaningful uptime improvements cost money and will slow down the flashy stuff.
rschiavone 1 days ago [-]
Unless a major out(r)age forces a change of leadership, expect more slop down our throats.
overshard 1 days ago [-]
I've taken to hosting everything critical like this myself, on a single system with Docker Compose, with regular off-premises backups and a restore process that I know works because I test it every 6 months. I can swap from local hosting to a VPS in 30 minutes if I need to.
It seems like the majority of large services like GitHub have had increasingly annoying downtime while I try to get work done. If you know what you're doing, it's a false premise that you'll just have more issues with self-hosting. If you don't know what you're doing, it's becoming an increasingly good time to learn.
I've had 4 years of continuous uptime on my services at this point. I still push to third parties like GitHub as yet another backup, and when I see the occasional 500 my workflow keeps chugging along. I've gotten old and grumpy and would rather just do it myself.
joshrw 1 days ago [-]
Happening very often lately
risyachka 1 days ago [-]
and we all know why
rezonant 1 days ago [-]
Because they're moving it to Azure and doing it far too quickly, not taking care to avoid availability issues
lelanthran 24 hours ago [-]
Could be.
Or could be that the recent 12 months of 100x increase in code and activity is more than they had planned for when they last did capacity planning.
Vibe-coders, many of them here, often boast about the insane amount of KLoC/hour they can generate and merge.
AlexeyBelov 11 hours ago [-]
I've seen this take in another GitHub thread, but are there any stats confirming this? As far as I know a lot of Github stats are publicly available, and can be queried via Clickhouse.
Zanfa 1 days ago [-]
It wasn't the migration to Azure that completely borked their PR UI.
bubblewand 24 hours ago [-]
There may be other problems, but as someone who's somehow ended up integrating Git into a service twice in my career without even trying very hard (it turns out it's weirdly handy in quite a few situations; god I wish it were implemented as a library and not a pile of Perl and shit, and yes, I know about libgit2), and who has looked into some of GitHub's and GitLab's posts about their architectures over the years through the lens of having fought a few of the same beasts: an Azure migration was very obviously going to make things worse.
risyachka 1 days ago [-]
Yeah, the AI slop rush.
Everyone builds off vibes and moves fast! Like, no: if you are a mature company you don't need to move fast; in fact you need to move slow.
The only thing that can kill e.g. GitHub is if they move fast and break things like they have recently.
nor0x 1 days ago [-]
> This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
does anyone know where these "detailed root cause analysis" reports are shared? is there maybe an archive?
I really wish Graphite had just gone down the path of better Git hosting and reviewing, instead of trying to charge me $40 a month for an AI reviewer. It would be nice to have a real first class alternative to Github
DauntingPear7 1 days ago [-]
Codeberg?
garciasn 1 days ago [-]
How reliable is githubstatus.com? I know that status pages are generally not updated until Leadership and/or PR has a chance to approve the changes; is that the case here?
Our health check checks against githubstatus.com to verify 'why' there may be a GHA failure and reports it, e.g.
Cannot run: repo clone failed — GitHub is reporting issues (Partial System Outage: 'Incident with Copilot and Actions'). No cached manifests available.
But, if it's not updated, we get more generic responses. Are there better ways that you all employ (other than to not use GHA, you silly haters :-))
duckkg5 1 days ago [-]
Right now the page says Copilot and Actions are affected but I can't even push anything to a repo from the CLI.
alemanek 1 days ago [-]
Yep getting 500 errors intermittently on fetch and checkout operations in my CI pretty consistently at the moment. Like 1 in 2 attempts
jjice 1 days ago [-]
Agreed. I believe that's marked under "Git Operations" and it's all green. Just began being able to push again a minute ago.
littlestymaar 1 days ago [-]
In many companies I worked for, there were a bunch of infrastructure astronauts who made everything very complicated in the name of zero downtime and sold it to management as “downtime would kill our credibility and our business”. And then you have billion-dollar companies everyone relies on (GitHub, Cloudflare) with repeated downtime, and it doesn't seem to affect their business in any way.
wiether 1 days ago [-]
It's a multitude of factors but basically they can act like that because they are dominant on the market.
The classic "nobody ever gets fired for buying IBM".
If you pick something else, and there's issue, people will complain about your choice being wrong, should have gone with the biggest player.
Even if you provide metrics showing your solution's downtime being 1% of the big player.
Something like Cloudflare is so big and ubiquitous, that, when there's a downtime, even your grandma is aware of it because they talk about it in the news.
So nobody will put the blame on the person choosing Cloudflare.
Even if people decide to go back (I had a few customers asking us to migrate to other solutions, or to build some kind of failover, after the recent Cloudflare incidents), it costs so much to find replacements with the same service level and to do the migration that, in the end, they prefer to eat the cost of the downtime.
Meanwhile, if you're a regular player in a very competitive market, yes, every downtime will result in lost income, customers leaving... which can hurt quite a lot when you don't have hundreds of thousands of customers.
bonesss 1 days ago [-]
Businesses are incommensurate.
GitHub is a distributed version control storage hub with additional add-on features. If peeps can’t work around a git server/hub being down, and don’t know to have independent reproducible builds or integrations, and aren’t using project software wildly better than GitHub’s, there are issues. And for how much money? A few hundred per dev per year? Forget total revenue, the billions; the entire thing is a pile of ‘suck it up, buttercup’ with ToS to match.
In contrast, I’ve been working for a private company selling patient-touching healthcare solutions and we all would have committed seppuku with outages like this. Yeah, zero downtime or as close to it as possible even if it means fixing MS bugs before they do. Fines, deaths, and public embarrassment were potential results of downtime.
All investments become smart or dumb depending on context. If management agrees that downtime would be lethal my prejudice would be to believe them since they know the contracts and sales perspective. If ‘they crashed that one time’ stops all sales, the 0% revenue makes being 30% faster than those astronauts irrelevant.
Krutonium 1 days ago [-]
To be fair - it SUPER does. Being down frequently makes your competition look better.
Of course, once you have the momentum it doesn't matter nearly as much, at least for a while. If it happens too much though, people will start looking for alternatives.
The key thing to remember is that momentum is hard to redirect, but with enough force (reasons), it will be.
littlestymaar 16 hours ago [-]
Few companies (and none of the companies I worked for) are “momentum”-based. The typical company grows because incoming cash flow allows to hire more salespeople and develop new features attracting new kinds of customers.
If people tolerate 10 monthly github failures, they can most likely tolerate one hypothetical hour of downtime from one physical server failure for some random Saas product you're selling to them.
baggy_trough 1 days ago [-]
The reality is that consumers don't really care about downtime unless it's truly frequent.
Maybe we should turn these weekly posts into an actionable item we can use to move organizations away from this critical infrastructure that is failing in realtime.
sharksandwich 24 hours ago [-]
Microsoft acquiring GitHub increasingly rhymes with Salesforce's acquisition of Heroku. What a shame.
granzymes 1 days ago [-]
I have a bug bash in an hour and fixes that need to go in beforehand. So of course GitHub is down.
banga 1 days ago [-]
Only on days with a "y"...
yoyohello13 1 days ago [-]
How many 9s is GitHub at now? 2?
jsheard 1 days ago [-]
If you count every service together, it's deep into one nine.
That's... gobsmacking. I knew it was memeably bad, but I had no idea it was going so badly.
modeless 1 days ago [-]
90 day non-degraded uptime of Github Actions is 98.8% if the official numbers can be believed
jandrese 1 days ago [-]
They're going to have to start advertising nine fives of reliability.
whateveracct 1 days ago [-]
they were down to a low 1 nine recently
Imustaskforhelp 1 days ago [-]
Github service has a better work life balance than many engineers here...
Octocat (The OG github mascot) has a family that it goes to the park with anytime he wants.
Luckily his boss, Microslop, is busy destroying the windows of his house and banning people from its Discord server.
cyberax 1 days ago [-]
You know that it's bad when the status page doesn't have the availability stats anymore.
xannabxlle 23 hours ago [-]
I was wondering why my Github Pages wasn't working
delduca 1 days ago [-]
Microslop is farting too hard on vibecoding
paddy_m 1 days ago [-]
I am getting really tired of GitHub. Outages happen, that's a given, but on so much stuff they don't even care or try. GitHub is becoming the bottleneck in my agentic coding workflows: unless I make Claude do it intelligently, I hit rate limits checking on CI jobs (5000 API requests in an hour). Depot makes their CI so much better, but it is still tied to GitHub in a couple of annoying places.
PRs are a de facto communication and coordination bus between different code review tools; it's all a mess.
LLMs make it worse because I'm pushing more code to GitHub than ever before, and it just isn't set up to deal with this type of workload even when it is working well.
lelanthran 24 hours ago [-]
> I am getting really tired of GitHub. Outages happen, that's a given, but on so much stuff they don't even care or try. GitHub is becoming the bottleneck in my agentic coding workflows: unless I make Claude do it intelligently, I hit rate limits checking on CI jobs (5000 API requests in an hour). Depot makes their CI so much better, but it is still tied to GitHub in a couple of annoying places.
Have you ever considered that this is the problem? GH never planned for this sort of pointless and unpaid activity before. Now they have a large increase (I've seen figures of 100x) in activity and they can't keep up.
It doesn't help that almost none of the added activity is actually useful; it's just thousands and thousands of clones of some other pointless product.
m_w_ 1 days ago [-]
Seems like the xkcd [1] for internet infrastructure that was posted earlier [2] should have github somewhere on it, even if just for how often it breaks. Maybe it falls under "whatever microsoft is doing"
Lowendtalk providers offering $7-per-year deals can provide more reliability than GitHub at this moment, and I am not kidding.
If anyone is using GitHub professionally and pays for GitHub Actions or any GitHub product, respectfully: why?
You can switch to a VPS provider and self-host Gitea/Forgejo in less time than you might think, and pay a fraction of a fraction of what you pay now.
The excuse holds even less because GitHub is used by developers, and devs are exactly the people most able to spin up a VPS, run Forgejo, and work in a terminal. I don't quite understand the point of staying.
There are ways to run GitHub Actions workflows in Forgejo as well, IIRC, even on a locally hosted instance, using https://github.com/nektos/act under the hood.
People, the time when you could spend hundreds of thousands of dollars and expect basic service with no outages is over.
What you are going to get is outages and lock-in. Also, your open source project is getting trained on by the parent company of said git provider.
PS: If you do end up using Gitea/Forgejo, please donate to Codeberg/Forgejo/Gitea (Gitea is a company, though, whereas Codeberg is a non-profit). I think donating $1k to Codeberg would be infinitely better than paying $10k or $100k to GitHub.
esafak 1 days ago [-]
I spent hours trying to figure out what was wrong with GHCR, &^$% Github.
I'm on the lookout for an alternative, this really is not acceptable.
rvz 1 days ago [-]
So Tay.ai and Zoe are still wrecking GitHub infrastructure.
This is very worrying if their mandate doesn't include quality control.
xeonmc 1 days ago [-]
Maybe they mandated to use AI for quality control?
esseph 1 days ago [-]
Wasn't QC fired a decade ago in most companies?
khaledh 1 days ago [-]
I figured it would be something like that. But it's been so frequent that I expect leadership to act decisively on a long-term reliability plan. Unfortunately they have a near monopoly in this space, so I guess there's not enough incentive to fix the situation.
gobalini 1 days ago [-]
How frequent? I think the obsession with uptime is annoying. If GitHub being down blocks something so critical, then you need more control over the system. Otherwise take a couple of hours and get a coffee or an early lunch.
khaledh 1 days ago [-]
Frequent enough to interrupt the flow of an entire organization, wasting thousands of hours. Take a look:
Yeah, that is pretty bad, I guess. For decades 99% has been achievable for many orgs. 92%, phew.
But “waste” is arguable. If folks have literally nothing to do when GitHub is down, I question that a bit. For example, design, administrative work (everyone has that), lunch. You know?
Critical CI/CD can use Jenkins, but in that case folks might end up with 89% uptime!
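For scale, here's a rough sketch of the downtime those availability figures imply, assuming a ~730-hour month (8760 hours / 12):

```shell
# Hours of downtime per month implied by an availability figure.
for avail in 0.99 0.92 0.89; do
  awk -v a="$avail" 'BEGIN {
    printf "%.0f%% uptime -> %.1f hours down per month\n", a * 100, (1 - a) * 730
  }'
done
```

So the gap between 99% and 92% is roughly seven hours versus a full lost work-week of downtime every month.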
khaledh 23 hours ago [-]
> If folks have literally nothing to do when GitHub is down, I question that a bit.
It's not about a single person. I work at a company with over 10k employees, most of whom rely on GitHub one way or another. It's not just PRs and issues; there's a huge amount of automation, workflows, and integrations that depend on GitHub, round the clock. With this kind of uptime, there's a material impact on the productivity of the company as a whole.
drcongo 1 days ago [-]
Does anything running on Azure have an acceptable uptime?
Imustaskforhelp 1 days ago [-]
Are we serious?
nlawalker 1 days ago [-]
The appearance of a thread here is so consistent that HN needs a black-bar-style indicator for GH outages that points to it.
Imustaskforhelp 1 days ago [-]
At this point I am thinking of creating a "0 days since GitHub outage" website, similar to the running joke of "0 days since a new JS framework dropped".
That site could use a little more. Maybe a count of how many in the current month and year, tallies for each year, maybe even trends. Could be nice. :)
Imustaskforhelp 1 days ago [-]
Too late to create a 0 days since github outage, Too early to create a crypto rugpull about this whole situation.
Born just in time to talk about this situation on hackernews xD (/jk)
1: https://duggan.ie/posts/self-hosting-git-and-builds-without-...
Yeah. You probably do want to make sure you turn your .git/ into a "bare" git repository but that's basically it.
And it's what I do too: an OCI container that gives me access to all my private Git repos (it sets up SSH with U2F so I get to use my Yubikey to push/pull from various machines to those Git repos).
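For anyone who hasn't tried it, here's a minimal sketch of that setup (hostnames and paths are hypothetical). The "server" is just a bare repository reachable over ssh:

```shell
# On the server (any box you can ssh into), create a bare repository --
# this is just the contents of .git, with no working tree:
#   ssh user@myserver 'git init --bare ~/repos/myproject.git'
#
# Locally, add it as a remote using plain ssh transport and push:
#   git remote add origin user@myserver:repos/myproject.git
#   git push -u origin main

# The same thing works on the local filesystem, which shows there is
# no server component involved at all:
dir=$(mktemp -d)
git init --bare "$dir/myproject.git" >/dev/null
ls "$dir/myproject.git"   # HEAD, config, objects, refs, ...
```

That's the entire "git server": the forge features (issues, PRs, CI) are what the hosted products layer on top.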
GitHub was the pinnacle of git forges a couple of years back, and it seems like they were determined to hit a wall.
Otherwise you cannot explain how you can enshittify a piece of software that much.
Which is really baffling when talking about a service that has at least weekly hiccups, even when it's not a complete outage.
There are almost 20 outages listed on HN over the past two months: https://news.ycombinator.com/from?site=githubstatus.com. So much for “always available”.
A self-hosted git server is trivial. Making sure everything built on top of it can fall back to that is not, especially when GH has so many integrations out of the box.
It can be a pain to set up a break-glass path, especially if you have a lot of legacy CI cruft to deal with. But it pays off in spades during outages.
I'm biased because we (dagger.io) provide tooling that makes this break-glass setup easier, by decoupling the CI logic from CI infrastructure. But it doesn't matter what tools you use: just make sure you can run a bootstrap CI pipeline from your local machine. You'll thank me later.
We built a CI platform using dagger.io on top of GH Actions, and the "break glass" pattern was not an afterthought; it was a requirement (and one of the main reasons we chose dagger as the underlying foundation of the platform in the first place)
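The shape of such a break-glass script can be very simple. A sketch (the image name, test script, and deploy target below are all hypothetical placeholders; a dagger-based setup like the one described above would swap the docker/scp lines for dagger calls):

```shell
#!/bin/sh
# break-glass.sh -- run the core CI pipeline from a laptop when the
# hosted CI is down. All names below are hypothetical placeholders.
set -eu

# DRY_RUN=1 prints each step instead of executing it, so the break-glass
# path can be rehearsed regularly -- an untested escape hatch rots fast.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

DRY_RUN=1   # rehearsal mode for this sketch

run docker build -t myapp:breakglass .
run docker run --rm myapp:breakglass ./run-tests.sh
run scp release/myapp.tar.gz deploy@prod-host:/srv/releases/
```

The important property isn't the tooling; it's that the script lives in the repo and gets exercised (even in dry-run) often enough that it still works when you actually need it.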
I did a PoC of Dagger for an integration and delivery workload and loved the local development experience. Being able to define complex pipelines as a series of composable actions in a language which can be type checked was a great experience, and assembling these into unix-style pipelines felt very natural.
I struggled to go beyond this and into an integration environment, though. Dagger's current caching implementation is very much built around there being a single long-lived node and doesn't scale out well, at least without the undocumented experimental OCI caching implementation. Are you able to share any details on how Fastly operates Dagger?
Crazy to say in 2026, but installable software still has some pros, for both the developer and the customer. And I would personally love to be able to do things that way for more things.
Even if I get the idea of an automation before there’s a run book for it.
I eventually expanded the one I wrote to include URLs to the right places in Bamboo to do things like disable triggers or start manual deployments. By the time I finished that we were doing 10x as many canary deployments as we had been before, and we’re retiring tech debt way faster because of it. 10/10 would do again.
npm publish will open a web browser for you for passcode entry, and I think I’ll do that next time instead of using cut and paste.
I generally recommend that the break glass solution always be pair programmed.
Ironically, this makes Dagger even more relevant in the age of coding agents: the bottleneck increasingly is not the ability to generate code, but to reliably test it end-to-end. So the more we all rely on coding agents to produce code, the more we will need a deterministic testing layer we can trust. That's what Dagger aspires to be.
For reference, a few other HN threads where we discussed this:
- https://news.ycombinator.com/item?id=46734553
- https://news.ycombinator.com/item?id=46268265
Yes, I agree on your assessment. AI means a higher rate of code changes, so you need more robust and fast CI.
GitHub was super stable; then it got shitty once they switched the frontend to React instead of server-rendered pages, and then came the Copilot stuff.
Lately I haven't heard them bragging about the Rails monolith.
I get that new developers might be more familiar with React, but then again, as soon as the trade-offs were apparent, I would've pulled the plug.
But they said: Buckle up, everyone, let's ruin our product!
I'm getting cf-mitigated: challenge on openai API requests.
https://www.cloudflarestatus.com/ https://status.openai.com/
> Git Operations is experiencing degraded availability. We are continuing to investigate.
https://www.githubstatus.com/incidents/n07yy1bk6kc4
That being said, GitHub is Microsoft now, known for that Microsoft 360 uptime.
Actually here you go, I have pasted the matrix link to their community, hope it helps https://matrix.to/#/#codeberg-space:matrix.org
I mean... It's right in the name! It's up for 360 days a year.
But the day comes that I need to tweak a deploy flow, or update our testing infra and about halfway through the task I take the whole thing down. It's gotten to the point where when there's an outage I'm the first person people ask what I'm doing...and it's pretty dang consistent....
And they are gonna give a pizza party if I get them a day off. I am gonna share a slice with ya too.
Doing a github worldwide outage by magical quantum entanglement for a slice of pizza? I think I would take that deal! xD.
More existential than going down a few times a week?
Or it could be that the 100x increase in code and activity over the past 12 months is more than they had planned for when they last did capacity planning.
Vibe-coders, many of them here, often boast about the insane amount of KLoC/hour they can generate and merge.
Everyone builds off vibes and moves fast! Like, no: if you are a mature company you don't need to move fast; in fact you need to move slow.
The only thing that can kill e.g. GitHub is if they move fast and break things, like they have recently.
Does anyone know where these "detailed root cause analysis" reports are shared? Is there maybe an archive?
There are also monthly availability reports: https://github.blog/tag/github-availability-report/
Our health check queries githubstatus.com to determine why a GHA failure may have occurred and reports it, e.g.:
Cannot run: repo clone failed — GitHub is reporting issues (Partial System Outage: 'Incident with Copilot and Actions'). No cached manifests available.
But if it's not updated, we get more generic responses. Are there better approaches that you all employ (other than not using GHA, you silly haters :-))?
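For what it's worth, githubstatus.com looks like a standard Statuspage instance, so (assuming the usual Statuspage v2 endpoints — worth verifying before depending on it) a health check can pull a machine-readable status instead of scraping. A sketch, parsing a canned payload so the extraction logic runs offline:

```shell
# A real check would fetch the live payload, e.g.:
#   curl -s https://www.githubstatus.com/api/v2/status.json
# Here we parse a canned sample of the same shape.
sample='{"status":{"indicator":"major","description":"Partial System Outage"}}'

# Pull out the overall indicator: none / minor / major / critical.
# (Crude sed extraction for the sketch; use jq in real code.)
indicator() {
  printf '%s' "$1" | sed -n 's/.*"indicator":"\([^"]*\)".*/\1/p'
}

if [ "$(indicator "$sample")" != "none" ]; then
  echo "GitHub is reporting issues: $(indicator "$sample")"
fi
```

The `description` field in the same payload gives the human-readable incident summary for the error message.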
The classic "nobody ever gets fired for buying IBM".
If you pick something else and there's an issue, people will complain that your choice was wrong and that you should have gone with the biggest player.
Even if you provide metrics showing your solution's downtime is 1% of the big player's.
Something like Cloudflare is so big and ubiquitous, that, when there's a downtime, even your grandma is aware of it because they talk about it in the news. So nobody will put the blame on the person choosing Cloudflare.
Even if people decide to go back (I had a few customers asking us to migrate to other solutions, or to build some kind of failover, after the recent Cloudflare incidents), it costs so much to find solutions that can replace it at the same service level and to do the migration that, in the end, they prefer to eat the cost of the downtime.
Meanwhile, if you're a regular player in a very competitive market, yes, every downtime will result in lost income, customers leaving... which can hurt quite a lot when you don't have hundreds of thousands of customers.
GitHub is a distributed version control storage hub with additional add-on features. If peeps can't work around a git server/hub being down, don't know to keep independent, reproducible builds and integrations, and aren't using project software wildly better than GitHub's, there are issues. And for how much money? A few hundred per dev per year? Forget total revenue, the billions; the entire thing is a pile of 'suck it up, buttercup' with ToS to match.
In contrast, I’ve been working for a private company selling patient-touching healthcare solutions and we all would have committed seppuku with outages like this. Yeah, zero downtime or as close to it as possible even if it means fixing MS bugs before they do. Fines, deaths, and public embarrassment were potential results of downtime.
All investments become smart or dumb depending on context. If management agrees that downtime would be lethal my prejudice would be to believe them since they know the contracts and sales perspective. If ‘they crashed that one time’ stops all sales, the 0% revenue makes being 30% faster than those astronauts irrelevant.
Of course, once you have the momentum it doesn't matter nearly as much, at least for a while. If it happens too much though, people will start looking for alternatives.
The key thing to remember is that momentum is hard to redirect, but with enough force (reasons), it will be.
If people tolerate 10 monthly github failures, they can most likely tolerate one hypothetical hour of downtime from one physical server failure for some random Saas product you're selling to them.
And the frequency they can tolerate is surprisingly high given that we're talking about the 20th or so outage of 2026 for github. (See: https://news.ycombinator.com/from?site=githubstatus.com)
https://mrshu.github.io/github-statuses/
Most individual services have two nines... but not all of them.
that's....gobsmacking...I knew it was memeably bad but I had no idea it was going so badly
Octocat (the OG GitHub mascot) has a family that he goes to the park with anytime he wants.
Luckily his boss, Microslop, is busy destroying the windows of his house and banning people from its Discord server.
PRs are a de facto communication and coordination bus between different code review tools; it's all a mess.
LLMs make it worse because I'm pushing more code to GitHub than ever before, and it just isn't set up to deal with this type of workload even when it is working well.
[1]: https://www.reddit.com/r/ProgrammerHumor/comments/1p204nx/ac... [2]: https://news.ycombinator.com/item?id=47230704
Should have self hosted.
https://www.windowscentral.com/microsoft/using-ai-is-no-long...
https://thenewstack.io/github-will-prioritize-migrating-to-a...
https://mrshu.github.io/github-statuses
> Too slow: https://github-incidents.pages.dev/
I am not even mad that I am slow honestly, this is really funny lol.