Oh, it's even more fun than that. If you sit there hitting F5, sooner or later you will get a proper page load. So some small subset of the servers is vending the correct data, and the rest aren't.
PRs don't load, issues don't load. Pretty much unusable for dev workflows. I felt like a lot of the hand-wringing over GH reliability was a bit dramatic, but this one seems pretty major (at least for me) and doesn't seem to even be getting that much coverage.
I think we have given GitHub enough time (more than half a decade) after Microsoft acquired it to sort itself out.
It is now being run into the ground.
At this point their chatbots Tay.ai, Zo, and Copilot are wrecking the platform and there is no CEO of GitHub to complain to about this so it now makes no sense to use GitHub at all. (Especially GitHub Actions)
It is now time to self host and not "centralize everything to GitHub". [0]

[0] https://news.ycombinator.com/item?id=22867803
For the record, it's failing silently, too, showing e.g. "There aren’t any open pull requests." even though there are dozens. That's pretty bad, this will definitely mislead people.
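If you'd rather quantify it than mash F5, a crude probe does the job. A minimal sketch in Python; the repo URL is hypothetical, and it assumes the silent failures still return HTTP 200 with that exact empty-state string in the body:

    import urllib.request

    # Hypothetical target; substitute any affected PR listing page.
    URL = "https://github.com/example/project/pulls"
    # Assumed to match the page text verbatim (note the curly apostrophe).
    EMPTY = "There aren’t any open pull requests."

    ok, n = 0, 20
    for _ in range(n):
        try:
            with urllib.request.urlopen(URL, timeout=10) as resp:
                body = resp.read().decode("utf-8", "replace")
                # A silent failure still returns 200, so sniff the body.
                ok += EMPTY not in body
        except OSError:
            pass  # hard failures count as bad loads too
    print(f"{ok}/{n} loads returned real data")

Twenty loads is enough to tell "intermittent" from "down 9 times out of 10".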
While external merge queues offer a ton more features, I wouldn't describe any of them as 'perfect', based on the simple fact that the UX is bolted on. GitHub continues to display its native UI components for merging, and users are forced to interact via arcane commands in comments or external CLIs/webpages. Not ideal!
I was surprised that incident didn’t seem to get as much attention since that was a pretty major data corruption bug, but I guess it was a much smaller scope of impacted repos/customers than a lot of these availability issues?
Wrapping my face in tinfoil: across the board, Amazon, Microsoft, GitHub, Anthropic and OpenAI, I’m seeing a lot of top-level service issues that sound an awful lot like code hitting production that hasn’t been fully tested.
Breaking buttons on the website is one thing, kinda, but Enterprise used to mean a certain degree of robustness and seriousness in product management.
Devs are expected to ship slop 10x faster. The AI tools genuinely help a bit, like maybe 2x, but the 10x "improvement" comes from not thinking about anything other than shipping your assigned features, not testing your code carefully, not getting proper code reviews, not dogfooding your stuff, and releasing carelessly.
Merge queues are not as frequently used… ~2000 PRs affected over 4 hours. I reckon that’s on the order of 10 commits per tenant. It’s a feature with low traction, probably because it creates more problems than it solves.
Yeah I think I've finally had enough. I need to start seriously advocating for alternatives since this is starting to impact our business. It's clearly not getting any better.
Go ahead. We've been self-hosting Gitea with Drone/Woodpecker for years; either it or Forgejo will do fine if you're okay with their feature set. I sometimes wander into these GitHub threads to have a laugh; our Gitea instance has had several minutes of downtime combined over the last few years, all of them planned (to upgrade Gitea) and in the middle of the night.
I struggled with Woodpecker for a bit, but now Gitea has Actions that work wonderfully for my use case (and one less tool to support). I believe they also highlight compatibility with a GitHub Actions protocol of sorts. Might be worth looking into.
Ooh, Woodpecker CI works with Gitea and Forgejo. https://woodpecker-ci.org/ That might be the last piece I need to migrate Git repos from GitHub to a self-hosted forge.
Edit: Actually there's Gitea Actions and Forgejo Actions; that might be enough for my use case.
https://docs.gitea.com/usage/actions/
https://forgejo.org/docs/next/user/actions/reference/
I’ve found Gitea Actions (based on act, so it’s nearly identical to a GitHub Actions runner) to work great. Migrating a GitHub workflow is mostly just a file path change.
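For anyone weighing that migration, here's a minimal sketch of the file move, assuming the stock directory layout (Gitea is documented to pick up .github/workflows too, so even this copy is often optional):

    import shutil
    from pathlib import Path

    # Gitea Actions reads .gitea/workflows; copy the GitHub workflows over.
    src = Path(".github/workflows")
    dst = Path(".gitea/workflows")
    dst.mkdir(parents=True, exist_ok=True)

    for wf in list(src.glob("*.yml")) + list(src.glob("*.yaml")):
        shutil.copy2(wf, dst / wf.name)
        print(f"copied {wf} -> {dst / wf.name}")

The workflow syntax itself usually carries over unchanged; the gotchas tend to be actions that assume GitHub-hosted runner images.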
If you want a GitHub-like UI (with org/repo structure limitations) use either Forgejo or Gitea.
If you want a similar but different experience use GitLab.
If you want something more akin to the kernel experience (i.e. hosting, flexible repository structure, user auth via ssh keys, and a simple web UI) use gitolite with cgit, or alternatively gitweb.
I mean, technically it's a code review platform, not a complete toolbox like GitLab and co, but damn if it isn't the most professional-feeling experience.
I love gitea, and I use it for my homelab, but the permissions system needs a lot of work. There’s still an open bug which doesn’t let anyone but the repo owner read CI logs regardless of settings.
I used Gitea for a while, but eventually switched to gitolite+cgit. That was down to the org/repo structure not fitting my git hierarchy (I'm using a topic/repo, topic/subtopic/repo style structure) and the lack of organization/topic wide issue tracking/management.
It's close in the sense that it's also Jabbascript SPA crap that needs a supercomputer just to (try and fail to) display a diff of a few thousand lines, you mean? We're using it at work and it sucks massively.
I'd rather have the open core one running on my own servers (i.e. GitLab), but performance is a few orders of magnitude away from acceptable for both Git**b.
I run GitLab and its CI on a Xeon server from 2010. It's fine. It runs exactly as fast as anything else on that machine. I've also run it on a tiny AWS instance. Also fine.
I don't like that the idle CPU load is high (for really inane reasons) but it performs perfectly well.
Was thinking the same honestly. GH is very sticky though, especially when you have actions and all kinds of other integrations set up. But it’s just kind of absurd at this point how many outages they have.
An alternate interpretation of that chart is "After the Microsoft acquisition, they got serious about actually tracking outages."
That said, anecdotally, it's felt much worse over the last 6 months. I'd guess it's a combination of MS-induced quality drops and AI-induced scale increases.
Well, perhaps not over the longer run (e.g. since the acquisition), but if you look at just the last four weeks or so, that part alone, you can clearly see that something is not working here. Microsoft is constantly mentioned on Hacker News, and not typically in a great, praising light.
If it’s just the last 4 weeks, then I would say it seems the Microsoft acquisition had little impact on their reliability.
It seems pretty reasonable that the massive surge in AI over the past 6 months has put tremendous strain on GitHub’s infrastructure, and that most of these outages are a result of that, one way or another.
They're moving to Azure and had to fix up Azure first to be stable enough for GH to even consider moving.
I'm guessing it's a combo of Azure still not being stable enough and a byproduct of trying to move an entire company's operations from a physical DC into a cloud while it's running.
Speculation from afar: clouds are not commensurate, and high-volume cloud services are going to anchor key architectural decisions around technical benefits/realities of the cloud environment they target. Moving GitHub isn’t a tech decision, and it’s broadly a Dumb Idea.
I think GitHub is well past the complexity threshold where the reflective architecting that happens during cloud development can’t be separated from product. If the engineers were begging for Azure it’d be one thing, but otherwise this is destabilizing churn.
I agree Azure needed a lift to even handle the job, and I see that gap as indicative of a more fundamental challenge. That change is kinda like a skeleton transplant… management's feelings and post-surgery desires don’t necessarily account for the impact and essential difficulty.
And when they introduced "Free" for everyone including teams, well, I tried to warn everyone that centralizing everything to GitHub was not a good idea [0] 6 years ago.

[0] https://news.ycombinator.com/item?id=22868406
It was maybe the epitome of the get-shit-done internet era, and despite AI's purported productivity gains, I actually don't think we've gotten anywhere close to the velocity, stability, and simplicity of the peak Rails era just coming out of those PHP days. And teams were actually way smaller than they are now, even after all these AI cuts!
Yeah, collaboration usually requires some sort of centralisation, whether that is LKML+git.kernel.org, gitlab.gnome.org, salsa.debian.org, Sourcehut, or GitHub. At least Sourcehut isn't completely proprietary and shoving AI down your throat at every possible chance. The same can be said for Codeberg and almost any GitLab CE, Gitea, or Forgejo instance.
If you need to self-host, self-host. Sourcehut is obviously not a replacement for that.
But, if not: It is different because Drew DeVault is scathingly anti-AI, and has a history of sticking to strong opinions (for better or worse). Seems like the best bet for off-premise source control if you are concerned about AI scraping and downtime.
I am once again here to say that my Gitea has had better uptime since I deployed it. It's way snappier too. Long live self-hosting. Diversify from the cloud, build your own!
Another happy self hosted Gitea user here for ~3 years now.
Came from GitLab, which started pushing out basic users in 2022 with massive price hikes. I weighed GitHub as an option but was like "no, I don't want to be dealing with this same problem in another 5 years" when some other rug pull or degradation happens with that service. So I'm feeling pretty validated for that decision these days.
The speed improvement was massive (super low latency) and was worth the switch on its own, but we also saved 90% in immediate cost... probably more in secondary effects from the git host just not being a pain point. The only long or unplanned downtime we've had was 2 hours in that whole 3 years, when the tiny Linode VPS host had a total hardware failure and got migrated, which is a pretty damn good number of 9s for a simple, easy-to-host, single-server solution. We also gained more durable and fast offsite backups (ZFS) that GitLab could never offer, but that's more of a custom self-hosted thing, not specific to Gitea.
Reminder to all OSS projects: it is extraordinarily easy to set up a simple CI job to keep your code in sync between multiple forges. And getting email notifications from a second forge is zero extra effort.
At least give people the option to start moving away from GitHub to contribute to your project. It will, ultimately, be better for the ecosystem.
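A minimal sketch of such a sync job in Python; the mirror URL is hypothetical, and it assumes the job runs inside a full checkout with push credentials already configured:

    import subprocess

    # Hypothetical second forge; point at your Codeberg/Gitea/Forgejo mirror.
    MIRROR_URL = "git@codeberg.org:example/project.git"

    # --mirror pushes all refs (branches and tags) and propagates deletions,
    # so the second forge stays an exact copy of the primary.
    subprocess.run(["git", "push", "--mirror", MIRROR_URL], check=True)

Run it on a schedule from whichever forge you treat as primary, and the mirror never drifts more than one interval behind.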
Syncing is trivial; the CI is the deal. GH Actions is still the best option. Neither the FSF nor any other OSS lab has come up with a proper CI for us open source maintainers. The CI load has also increased massively since then.
An increasingly disturbing trend from GitHub, and I only see this getting worse.
I wouldn't rule out them moving away from offering the free tier to stop all the code pushes. I think new code mostly written by AI isn't that appealing of a data set to train on.
My new projects do not use GitHub, and will not use GitHub as anything more than a mirror. Two nines of reliability isn't enough for devtools.
GitHub is in a tight space right now. The pace of software development is increasing and they are in a load-bearing position. In addition, their GitHub Copilot license was a massive loss-leader both directly costing them money, and making the traffic problem even worse. Simply put, they aren't prioritizing scaling and reliability like they need to be in this current situation and instead are focusing on feature build outs that boil down to being Microsoft's AI Middleman Salesperson.
Their position is hard, but they are potentially fumbling the ball in a big way. I for one don't trust them to not be down right before I want to do a production deploy.
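That distrust is cheap to act on, at least: GitHub publishes a machine-readable status feed at the standard Statuspage endpoint, so a deploy script can refuse to start during a declared incident. A minimal sketch, treating any non-"none" indicator as a no-go (with the caveat that, as noted elsewhere in this thread, the status page can lag reality):

    import json
    import urllib.request

    STATUS_URL = "https://www.githubstatus.com/api/v2/status.json"

    def github_is_healthy(timeout=5):
        try:
            with urllib.request.urlopen(STATUS_URL, timeout=timeout) as resp:
                indicator = json.load(resp)["status"]["indicator"]
        except OSError:
            return False  # status page unreachable: assume the worst
        # Statuspage indicators: "none", "minor", "major", "critical".
        return indicator == "none"

    if __name__ == "__main__":
        raise SystemExit(0 if github_is_healthy() else 1)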
It's crazy that the systems best designed for decentralization, like git, email, and the internet itself, wound up being the most centralized, with single points of failure.
Been noticing this all day... various workflows failing in weird ways... strange UI issues... Literally holding off on our deployment for a day... bad enough that it seems like I'm fixing a CI/CD breakage once a month or more.
While I appreciate the sentiment... there was something nice about the social aspects of many/most projects being on GitHub in terms of collaboration. I think there is starting to be a lot of friction for many reasons. I've been seeing more use of issues as spam, not to mention even more nefarious activities making the rounds.
Corpos tend to be on GitHub, whereas community-focused projects are very rapidly shifting to Codeberg, which is fully open source and also has commit signing integration that actually works.
Wow, this is taking unusually long to fix. I suppose the team trying to fix it hit the Claude session limits and now can't do anything until the end of the cooldown, and the only person who knows how to fix it without AI is out for surgery. When the entire generation of people who knew how to fix shit without AI retires, what happens then?
We're barely starting to see AI's impact on infra - probably <1% of what's coming. Repo hosting as it is today won't scale - it needs to be rebuilt from the ground up, starting with basic architecture. And I don't think we should go back to self hosting and sharing patches over email...
(FWIW https://diversion.dev is at 100% uptime. Different scale, obviously, but also we're not Microsoft.)
I suggest testing your website with uBlock (and all its filter lists enabled). All I see is an almost empty page. Don't point to JS or CSS on third-party CDNs; due to the changes in cross-site cache sharing, neither Chrome nor Firefox will benefit from caching.
Man, this SUCKS big time for me. Just a few months ago, $PARENT_CONGLOMERATE mandated all under its benevolent wing to migrate to GitHub for reasons of synergy and efficiency. So now it's my turn at $DAYJOB to be migrating us from our self-hosted Gitlab instance. I already have a few grievances...
- IT policies around GH accounts make no sense. It's a long story but, in short, you can't use any of your pre-existing GH accounts whether personal or professional (as in, an account I made exclusively for $DAYJOB before The Synergy Mandate) and must create a new one aligned with IT conventions.
- We don't monorepo, hence we made extensive use of groups. There is no direct mapping for this concept in GitHub, so we have to manually namespace projects (see the sketch after this list).
- And now of course GH's no-nines availability :(
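On the groups point, the obvious mechanical workaround is flattening the group path into the repo name. A rough sketch; the double-hyphen separator is an arbitrary choice, not anything GitHub prescribes:

    def github_repo_name(gitlab_path: str) -> str:
        # Flatten a GitLab group path into a single GitHub repo name,
        # e.g. "platform/billing/api" -> "platform--billing--api".
        # GitHub only has org/repo, so the hierarchy must live in the name.
        return "--".join(gitlab_path.strip("/").split("/"))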
For my team, profit happens to be sensitive to our release dates: a day or two of delay can really make the difference between hitting the month's projections or not. In another world, I would proactively mirror our profit-essential code, but it's not worth the risk of making a skunkworks guerrilla effort. I'd like to think we can blame The Synergy Mandate in a few postmortems in the near future, but of course I did not graduate yesterday; I know that's not gonna happen.
Thoughts and prayers we keep hitting our profit projections and they don't axe our product for underperformance.
(Writing this down, I can really feel how this job has changed since I joined.)
It's been a testbed for Azure since they started GitHub Actions. Since Azure is mismanaged and doomed (https://news.ycombinator.com/item?id=47616242), such outages are normal. I think they hired cheap staff to manually fix overprovisioned boxes on demand. Microsoft Quality at work.
https://status.codeberg.org/status/codeberg
https://social.anoxinon.de/@codebergstatus/11647770704799298...
It’s also massively more performant
https://trunk.io/merge-queue
The bug only affected repos using merge queues AND squash/rebase merging (instead of the default merge commit)
https://mrshu.github.io/github-statuses/
Same for Forgejo.
https://damrnelson.github.io/github-historical-uptime/
I don't think that chart shows what it seems like it shows. There were plenty of pre-2018 outages that don't show up there: https://hn.algolia.com/?dateEnd=1545696000&dateRange=custom&...
A few plausible culprits:
- Switching to Azure
- Adding more AI features
- Using AI more for development
- Higher load caused by AI agents
Three of those are top-down direction from MS.
But in the past year or so, it does feel like outages are becoming commonplace.
It’s astonishing how bad their software is now. I guess 20 years of outsourcing and bean-counting will do that.
They dropped Ruby on Rails.
Ruby on Rails got a bad rap IMO.
I don't trust Microsoft's status page. It might be "fine" overall, but it definitely is not fine for me.
https://news.ycombinator.com/item?id=45517173
The difficult part is everything around the code:
* the tickets/PRs (including the closed ones)
* the links referencing the project
* the CI setup
* for large projects, the committers permission setup
* if applicable, the push/commit/branch rules
All that will be deeply annoying to migrate on a per project basis, or might get lost.
But that's not even the worst of it, in my opinion. Losing the go-to platform for finding software is (fediverse for software, when?).
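The tickets/PRs at least can be snapshotted before any move via the REST API. A rough sketch; the repo name is hypothetical, and a real export would follow the pagination Link headers and send a token for private repos:

    import json
    import urllib.request

    REPO = "example/project"  # hypothetical
    URL = f"https://api.github.com/repos/{REPO}/issues?state=all&per_page=100"

    req = urllib.request.Request(URL, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        issues = json.load(resp)  # note: this endpoint includes PRs as issues

    with open("issues-export.json", "w") as fh:
        json.dump(issues, fh, indent=2)
    print(f"exported {len(issues)} issues/PRs (first page only)")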
"intermittent" is kind of underselling a failure on ~9/10 page loads
The Empire may fall ...
https://sfconservancy.org/GiveUpGitHub/
Can't wait for Microsoft to go the IBM way.
It's unbelievably snappy and fast. I can't recommend Forgejo enough.