NewsLab
Apr 28 19:09 UTC

An Update on GitHub Availability (github.blog)

274 points | by salkahfi | 199 comments

Comments (199)

  1. 1. mijoharas||context
    > we started working on path to multi cloud.

    Is this microsoft stating that they aren't able to get acceptable reliability from Azure? (I mean, I think a lot of us have heard that, but it's interesting to hear it from microsoft themselves).

  2. 2. derwiki||context
    It’s pretty damning. But as someone who has used Azure, I buy it.
  3. 3. everfrustrated||context
    Pretty damning that two Microsoft subsidiaries - GitHub and LinkedIn - either shelved their forced migration to Azure or are looking at non-Azure options.
  4. 4. jasoncartwright||context
    Seems pretty sensible to not rely on a single provider for their large complex system?
  5. 5. mijoharas||context
    I mean, amazon (shopping, along with prime video etc.) runs on AWS.
  6. 6. jasoncartwright||context
    Prime video uses a non-AWS CDN when I watch football on it here in the UK
  7. 7. farfatched||context
    The BBC were unable to find a single CDN that could serve the UK during its peak football matches. https://www.bbc.co.uk/webarchive/https%3A%2F%2Fwww.bbc.co.uk...
  8. 8. ksimukka||context
    When I was at AWS, retail was not yet running on AWS. Has that changed?

    Prime video does use some AWS services, but live and on-demand are two entirely different beasts.

  9. 9. mijoharas||context
    Really? I thought retail was. It's been almost a decade since I worked at prime video but I think everything was running on AWS. (Some things didn't use brazil etc, but I think all the servers etc. were on AWS)
  10. 10. malfist||context
    It's a distinction without a difference. All new development is nAWS (native AWS); legacy is MAWS (not sure about the acronym), which is still AWS under the hood and is mostly just a pool of EC2 instances with preconfigured networks. Nothing made in the last five or six years is on MAWS, and amazon is a microservice shop, so things are always being built new. If you joined today, there's a good chance you'd join a team without any MAWS infra.
  11. 11. cmckn||context
    MAWS is “Move to AWS”, the name of the internal campaign to get legacy services into a somewhat-retrofitted AWS environment. It was a single VPC at one point.
  12. 12. malfist||context
    I just finished a nearly five year stint at amazon and didn't realize there was pre-maws stuff still around. Never encountered any of it. I was like two months from my yellow badge but, uh, life is really better outside amazon.
  13. 13. PunchyHamster||context
    It was more "we built AWS to run our stuff and figured out we can sell it too".

    While Azure feels like Temu clone of Cloud

  14. 14. cyanydeez||context
    This isn't a mom and pop shop. They have locations all over the world: https://datacenters.microsoft.com/

    There's no intrinsic reason they should be vulnerable to themselves.

  15. 15. jasoncartwright||context
    That website (for me) uses Cloudflare via WPEngine, which also isn't Azure
  16. 16. farfatched||context
    +1. Multi-cloud is typically done for vendor independence.

    But Github don't have that rationale.

  17. 17. embedding-shape||context
    Man, you should have been there 6 months ago when they decided to start tearing down GitHub's own data centers and move everything exclusively to Azure. Seems they themselves realized this after they started moving, but imagine if you could have helped them realize this before they even started :)
  18. 18. benterix||context
    > Seems they themselves realized this after they started moving

    I guess most people at Github knew exactly it makes no sense but they didn't really have a choice. Maybe some voiced their statement, got "we hear you" in response and were told to proceed anyway.

  19. 19. embedding-shape||context
    Yeah, I don't know how it went down, but I also know exactly how it went down:

    Microsoft Execs: Everyone needs to move to Azure!

    GitHub developers: But Azure is not gonna be able to handle our load, we literally have our own data centers!

    Microsoft Execs: Sure, but you're Microsoft now, please publish blog post about how in half a year you'll be 100% on Azure.

    Few months later...

    GitHub Developer: We've tried our best, users are leaving in droves and Azure can't keep up!

    Microsoft Execs: Ok fine, you can use something else too, but only if you mainly use Azure and continue publishing blog posts about how great Azure is.

  20. 20. alper||context
    Azure is the MS Teams of clouds.
  21. 21. nextaccountic||context
    Made me think. Why not convert Github datacenters into Azure datacenters that have Github as their sole customer?

    Then it's up to Azure how they will manage this

  22. 22. hobofan||context
    That sounds like the worst of both worlds? The Azure division that can't even reliably provide decent infrastructure products based on their own data centers, trying to do the same on a bespoke data center.
  23. 23. cbg0||context
    I think this is more tailored towards enterprise clients that lose money when Github is down, that would probably help with retention.
  24. 24. bombcar||context
    You’d think they could have had the existing GitHub on whatever continue as is (maybe for paying customers) while all the AI new inrush goes to the Azure setup.
  25. 25. jofzar||context
    Yeah that's a top tier enterprise plan feature if I have ever seen it
  26. 26. jansan||context
    The entire concept of multi cloud is amusing if you think about what cloud originally was supposed to be. They could call them meta clouds (might infringe trademarks), and with the current growth trajectory of AI generated code eventually multi-meta-clouds, renamed to beyond-clouds, and then multi-beyond-clouds. I see no limits.
  27. 27. youwangd||context
    Show HN timing matters more than people think. Monday-Thursday, 9-11am Pacific, is when the front page has the most engaged readers. Weekend posts get less competition but also less engagement.
  28. 28. zamalek||context
    There was somewhat recently a post here about how priorities, pressure, and management subverted Dave Cutler's vision for Azure (which was to have near zero human involvement) - my Google fu isn't strong enough to find it. Supposedly, someone running over or opening a serial to a rack/VM is now typical operational procedure.
  29. 29. ok_dad||context
  30. 30. zamalek||context
    That's the one!
  31. 31. pbronez||context
  32. 32. tedd4u||context
    > multi-cloud

    XXXXL size project. May not ever deliver. But if it fails, it will only do so after years grinding through people, resources, etc.

  33. 33. pluc||context
    There are no words that Microsoft can use that would make me trust Microsoft.
  34. 34. baq||context
    openai, anthropic, google and a plethora of chinese models all end up pushing code into github. you can discuss whether gpt 5.5 is better than opus 4.7, but for github it doesn't matter: they'll be receiving the code no matter which llm spits it out.

    amazing on one hand, quite scary on the other for github and all other forges if this continues and there is no reason why it wouldn't.

  35. 35. graemep||context
    Simple solution: charge all users. Charge more for higher usage.
  36. 36. gattr||context
    And/or provide a baseline free tier, corresponding to how much a typical human user would at most push/clone etc. They have pre-LLM statistics on that.
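The free-tier idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the choice of a nearest-rank percentile, and the flat per-unit overage price are all hypothetical, and nothing here reflects how GitHub actually meters or bills usage.

```python
import math


def free_tier_quota(monthly_counts: list[int], percentile: float = 99.0) -> int:
    """Pick a free quota from historical per-user monthly operation counts.

    Uses the nearest-rank percentile, so the quota is always a value that
    some real user actually reached in the historical (pre-LLM) data.
    """
    if not monthly_counts:
        raise ValueError("need at least one historical observation")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(monthly_counts)
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[rank - 1]


def overage_charge(usage: int, quota: int, price_per_unit: float) -> float:
    """Charge only for operations above the free quota (hypothetical pricing)."""
    return max(0, usage - quota) * price_per_unit
```

A user at or below the quota pays nothing, while heavy automated usage pays in proportion to how far it exceeds what a typical human would do.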
  37. 37. jcattle||context
    When there's a gold rush invest in checks notes jewellery makers?
  38. 38. huijzer||context
    I’m pretty sure my Forgejo instance on a Raspberry Pi is outperforming GitHub reliability. It’s faster that’s for sure.
  39. 39. darkwater||context
    Glad that they released some data about new repo/issues/commits over the last years. It confirms what everyone else already believed from the outside: agents are putting a lot of extra, sudden pressure on GitHub. It's like a startup that is growing exponentially, with the difference that they already have a large user base to serve - and that keeps them in the bullseye - and probably a not-so-fast-moving organization when it comes down to changes. On the other side of the coin, they also have a lot of talent, infra and money a startup might not have yet.
  40. 40. maccard||context
    What data is that? There's an unlabelled graph and a number at the current peak.
  41. 41. ncruces||context
  42. 42. maccard||context
    This is the data that should be in the blog post. Thanks for sharing.
  43. 43. darkwater||context
    IMO it transmits the magnitude of the impact pretty well.
  44. 44. frangonf||context
    What are we doing?

    Stop subsidizing tokens now that we extracted enough training data from you and we have enough agentic junkies business to keep the flywheel going up and cut on the loss leaders. [0]

    [0] https://news.ycombinator.com/item?id=47923357

  45. 45. guidoiaquinti||context
    > While we were already in progress of migrating out of our smaller custom data centers into public cloud, we started working on path to multi cloud. This longer-term measure is necessary to achieve the level of resilience, low latency, and flexibility that will be needed in the future.

    Wild

  46. 46. maccard||context
    It's kind of hard to read this with a straight face.

    The unlabelled graph with big numbers on top, the priorities that don't match with what we're experiencing, and a list of things that they're doing without a real acknowledgement of the _dire_ uptime over the last 12 months....

  47. 47. ramon156||context
    "We hear you" in ~300 words, basically.
  48. 48. ncruces||context
    More numbers: https://x.com/kdaigle/status/2040164759836778878

    What's the question here, you don't believe growth is currently exponential, or do you think it shouldn't be hard to scale, when 10x YoY is not enough?

  49. 49. OtherShrezzing||context
    As a business user, our costs have gone up while service has gone down dramatically. Meanwhile our marginal cost to GitHub has hardly changed. Where our costs to them have increased, they mostly charge us per cpu minute, so obviously aren’t making any kind of loss on our account.

    I’m sure they’re experiencing scaling issues across the platform, but it’s unacceptable for that to have a negative impact on us when we're sending them $250/dev/yr for (what is in all honesty) hosting a bunch of static text files.

  50. 50. rdevilla||context
    > we're sending them $250/dev/yr for (what is in all honesty) hosting a bunch of static text files.

    You know, you can just host your own code forge. Or you can just drop gitolite on a server. Or pull directly from each others' dev machines on a LAN.

    GitHub is not git.

  51. 51. dist-epoch||context
    > we're sending them $250/dev/yr for (what is in all honesty) hosting a bunch of static text files.

    so start a GitHub competitor which bills $50/dev/yr for solving this easy problem and make a lot of money?

  52. 52. ncruces||context
    I understand that, and maybe GitHub became a bad deal because of that.

    But if anything, their post and your reply are precisely an endorsement of usage based billing.

    The bit that's growing 13x YoY (and which they expect will easily blow past that) is unmetered - commits. The bit that is metered (for some, not all folks) - action minutes, grew only 2x YoY.

    GitHub was not built to limit the number of commits, checkouts, forks, issues, PRs, etc - nor do we want them to - but that's what's growing ridiculously as people unleash hordes of busy beaver agents on GitHub, because they're either free or unlimited.

    Where there are limits - or usage based billing - people add guardrails and find optimizations.

    Because for all the talk, agents don't bring a 10x value increase; otherwise, they'd justify a 10x cost increase.

    Besides, other forges are having issues too. Even running your own. We have Anubis everywhere protecting them for a reason.

  53. 53. conartist6||context
    That sounds bad. Paying users don't want huge and ever-growing numbers of freeloaders reducing the return for each dollar they spend...

    That would only lead to further and further degradation of service until the paying customers were absolutely desperate to find a deal that didn't require them to lug around such a heavy ball and chain.

    It all made sense at the beginning when Github was free for OSS and OSS was thriving, but now these billions of commits are mostly incredibly low value. I'd bet the average commit now doesn't create 1/10th of the value the average commit did in, say, 2018

  54. 54. graemep||context
    In that case, why are you using them at all?
  55. 55. tracker1||context
    I'm curious how Azure DevOps reliability has been for comparison. My current job is managing stories in DevOps with SCC in GitHub ent. While I like Github slightly more, have been curious about the decision.
  56. 56. stackskipton||context
    We use Azure DevOps at work for a few things. It's been pretty rock solid since agents don't recommend it and it's a different architecture.

    It's also legacy at this point since Microsoft is pouring all resources into GitHub but for most people/companies, they could probably use Azure DevOps just fine.

  57. 57. maccard||context
    These numbers should have been in the blog post, not the graphs that are present.

    > What's the question here, you don't believe growth is currently exponential, or do you think it shouldn't be hard to scale

    I think you're putting words in my mouth here; I didn't say either of those things. I'm saying that this blog post is a meaningless platitude when the github stability issues predate this, and that all this post says is "we hear you're having issues".

  58. 58. ncruces||context
    Sorry if I misread your intent.

    I just think their charts, taken at face value, show substantially the same thing (for PRs, commits, new repos).

    Either those charts are a bald-faced lie (the tweet could be as well) or there is no way for that chart to be something else.

    The only way to fake exponential growth like that would be to use an inverse log scale (which would be a bald-faced lie).

    It doesn't even really matter what's the y-axis baseline, unless we really think growth was huge in 2020, then cratered to zero by 2023, now back to the previous normal.

    As for the rest of the post, I do think it's panic mode platitudes. But I honestly don't know what I'd write instead that's better.

    You can already see people complaining loudly where they instead of "we'll do better" decided to limit usage.

  59. 59. maccard||context
    No problem - it's tough online sometimes.

    > I just think their charts, taken at face value, show substantially the same thing (for PRs, commits, new repos).

    The problem is that these charts show the massive exponential growth in 2026. But this didn't start in 2026, this has been going on since early last year. My team had more build failures in 2025 due to actions outages or "degraded performance" than _any other reason_ and that includes PRs that failed linting or tests that developers were working on.

    > As for the rest of the post, I do think it's panic mode platitudes. But I honestly don't know what I'd write instead that's better.

    IMO, this needed to be written 6 months ago (around the time that the memo of them prioritising the migration to Azure was released), and then this post should have been "We're still struggling, this isn't good enough. Here's the amount of growth, here's what we've done to try and fix it, and here's what we're planning over the next 3-6 months", instead of "Our priorities are clear: availability first, then capacity, then new features" and "We are committed to improving availability, increasing resilience, scaling for the future of software development, and communicating more transparently along the way." This isn't transparency (yet).

  60. 60. ferguess_k||context
    You can do the same with so many clients.
  61. 61. georgyo||context
    These are not the worst graphs in the world... Sure the bottom left axis is not labeled, but it still conveys the point correctly. The growth between 2023->2024->2025->2026 is accelerating. And at the end of 2025/beginning of 2026 they saw more growth than in the three years before, combined!

    You don't need to know the bottom left axis number. We do have to assume the graph is linear, and not some kind of negative exponent log graph. But given the rest of the content, I think that is safe to assume.

    Any company that experiences significantly more growth than they were planning for will have capacity issues.

    The priorities are mostly in line with that. They are way beyond the point that they can just add more hardware. They need to make the backend more efficient, and all the stated goals are about helping there.

  62. 62. maccard||context
    > These are not the worst graphs in the world... Sure the bottom left axis is not labeled, but it still conveys the point correctly.

    No, they're completely useless. Using the "New repos per month" as an example, if the bottom left is 1m, then that's a 20x increase in 2 years which is a lot. If the bottom left is 19m, it's a 5% increase in 2 years which is nothing.

    The massive surge on their labelled X axis starts in 2026, and these issues have been going on for a lot longer than that. GHA has been borderline unusable for a year at this point, if not longer.

    > But given the rest of the content, I think that is safe to assume.

    The rest of the content is "we're working on it", and "here's two outages in the last 14 days, one of which caused actual data loss"
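The baseline argument above is easy to check with a little arithmetic. The sketch below assumes a peak of 20M new repos per month, which is only the value implied by the two scenarios in the comment (1M to 20x, 19M to about 5%); the function name is made up for illustration.

```python
def growth(baseline: float, peak: float) -> tuple[float, float]:
    """Return (multiple, percent_increase) when going from baseline to peak."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    multiple = peak / baseline
    percent = (multiple - 1.0) * 100.0
    return multiple, percent


# Assumed peak, inferred from the comment's own numbers; not read off the chart.
PEAK = 20_000_000

if __name__ == "__main__":
    # Same peak, two different (unlabelled) axis baselines.
    for baseline in (1_000_000, 19_000_000):
        multiple, percent = growth(baseline, PEAK)
        print(f"baseline={baseline:>10,}  {multiple:.2f}x  (+{percent:.1f}%)")
```

The same curve shape supports either reading, which is why the unlabelled axis matters.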

  63. 63. johndough||context
    > You don't need to know the bottom left axis number.

    We very much do. The graph suggests an insane growth in PRs from almost zero to 90M. Now compare this misleading graph with this much clearer one, which shows that the growth over the last three years has been less than 80%: https://github.blog/wp-content/uploads/2025/10/octoverse-202...

  64. 64. SkiFire13||context
    That link shows the number of PRs created to be less than 10M though.
  65. 65. johndough||context
    Yes, to be honest, that graph could use some improvements as well. I should probably just link to the blog post with actual numbers: https://github.blog/news-insights/octoverse/octoverse-a-new-...
  66. 66. PunchyHamster||context
    You mean since GH acquisition 6 years ago https://damrnelson.github.io/github-historical-uptime/
  67. 67. nraynaud||context
    So I gather that nobody is working on a search that stays on the current branch?
  68. 68. fontain||context
    Personally, I’m sympathetic. We know that GitHub did a huge amount of work over the last decade to make Git scale, which has benefited us all. These new scaling challenges are real challenges, 30x growth would be a nightmare for any system that was already pushing the limits of what was possible, I think we are being far too hard on GitHub, they deserve a little grace.
  69. 69. someone_eu||context
    GitHub's scaling issues are caused by their own vendor-lock approach and monopoly. Yes, of course _their_ goal is to be even bigger and even more all-consuming, so _they_ have to deal with the scale. Why would a user be sympathetic to that?

    The user (and not a big tech monopoly) answer to scaling issues is almost always to stop scaling and start federating and interoperating.

  70. 70. remus||context
    For all the negatives about github I agree. They offer a lot of free stuff, and LLMs seem likely to massively increase their costs with no guarantee they'll be making money off it. I can't think of many (any?) large businesses which could scale up to meet so much new demand without some significant growing pains along the way.
  71. 71. icy||context
    I'm biased (founder of tangled.org), but the future really should be federated forges. Host repositories on sovereign infra with global identity + federated "metadata" (issues, pulls, etc.).

    Global indices for this should be trivial to spin up so availability is never a concern (we're working towards this!).

  72. 72. ramon156||context
    Love the idea, would replace the LLM generated content on your site, though.

    I recently migrated to codeberg because I'm okay with self-hosting big runners, while using codeberg's available runners for smaller cron-based things (they even have lazy runners for this).

  73. 73. icy||context
    It’s… all hand written? We just sound “professional”.
  74. 74. ArcHound||context
    But, there are? I can host a repo on GitHub, Codeberg and self host it too. Then I need to watch over main to keep it consistent between those. After that's established, I can do updates from wherever. Link'em in the README.
  75. 75. nibbleyou||context
    There's also a tool to automatically push it to multiple repos: https://github.com/prashantsengar/GitEcho

    Disclaimer: the author is a colleague of mine

    Though to be fair, what the parent meant by federated forges is different than this approach.
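The manual mirroring described above (one repository pushed to several hosts) can be done with plain git by giving a single remote several push URLs. The sketch below only builds the commands; the helper names and the example URLs are hypothetical, and it makes no claim about how GitEcho itself works. The one real git feature it relies on is `git remote set-url --add --push`.

```python
import subprocess


def mirror_push_commands(remote: str, urls: list[str]) -> list[list[str]]:
    """Build git commands that make `git push <remote>` push to every URL.

    Once any push URL is set, git stops using the remote's fetch URL for
    pushing, so `urls` must list every destination, including the original.
    """
    if not urls:
        raise ValueError("need at least one push URL")
    return [
        ["git", "remote", "set-url", "--add", "--push", remote, url]
        for url in urls
    ]


def apply_commands(commands: list[list[str]], repo_dir: str) -> None:
    """Run the commands inside an existing repository (requires git)."""
    for argv in commands:
        subprocess.run(argv, cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Hypothetical mirror locations, for illustration only.
    for argv in mirror_push_commands(
        "origin",
        [
            "git@github.com:example/project.git",
            "git@codeberg.org:example/project.git",
        ],
    ):
        print(" ".join(argv))
```

This only mirrors the git data; issues, pull requests, and stars stay on each host, which is the gap the federated-forge idea is aimed at.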

  76. 76. pabs3||context
  77. 77. embedding-shape||context
    There are distributed forges? Yes, git is distributed, but often everything around it isn't. The case parent is trying to make, is that the rest ("federated forges") should also be distributed, not just git.
  78. 78. ArcHound||context
    Ok, gotcha. So there's a demand for the additional features that are not bundled within git to be federated somehow.

    I'd say we have emails, mailing lists and bug trackers. Or maybe: what is the missing killer feature that needs federation?

  79. 79. embedding-shape||context
    > what is the missing killer feature that needs federation?

    Issues, pull requests, collaboration/permissions/access, "starring"/"favoriting", etc.

    I think ultimately the goal is that people can run their own forges, yet still collaborate on repositories hosted in other forges, leveraging your existing authentication so you no longer need to sign up individually for each forge.

  80. 80. beernet||context
    What is "sovereign infra" exactly?
  81. 81. tfrancisl||context
    No less than self hosted, imo. If you're on some cloud it doesn't really matter that you pay them absurd amounts of money, you aren't sovereign.
  82. 82. embedding-shape||context
    So literally a computer at home/in the office, as with anything else you don't really "own" the infrastructure? Or is this just about "cloud"?
  83. 83. icy||context
    Yeah sorry it's marketing BS speak for self-hosted or just infra that you control. It could be a VPS, it could be a Raspberry Pi at home. Your repos live on your servers. (And we support this on Tangled today!)
  84. 84. embedding-shape||context
    > just infra that you control

    But a VPS isn't actually infrastructure you control, you essentially have as much control over it as "cloud", so I don't think that'd be counted as "sovereign", would it?

  85. 85. icy||context
    Perhaps, but it's still better than nothing!
  86. 86. beernet||context
    So if a company self hosts their physical infrastructure which will burn down once a fire sets in, they are more "sovereign" than a company running on a redundant cloud? I definitely would not want to be "sovereign" then.

    Point is: This discussion is much more multi-dimensional than some suggest.

  87. 87. mathgeek||context
    I know it's just marketing speak, but the term made me think of the scenes in the Matrix where what's left of humanity (ignoring all the cyclical lore that was added on top of it) has to make sure the machines can't remote in to any of their tech.
  88. 88. sikozu||context
    I've never heard of this before, going to sign up and check it out!
  89. 89. icy||context
    Thanks! If you need anything, email me anirudh@!
  90. 90. ljm||context
    I would love it if coding agents didn't default to GitHub for their deep VCS integration.

    If I could get the same bells and whistles by wiring up another forge, so long as it offered a decent API and/or sent events over a webhook, I'd have everything self-hosted.

    The agents would need to expose an interface on their own end, but as long as you implemented it with a plugin, it'd take away the dependency on GitHub and you could use MCP or skills for the rest of it.

  91. 91. icy||context
    The neat thing about Tangled is it's built on an open protocol (https://atproto.com)—this allows us to effectively build an API-free system since all data on Tangled can effectively be ingested via the AT Protocol firehose.

    Which is to say, this is perfect for agents given they don't need any bespoke SDK from us: simply write Tangled records for issues, pulls, whatever to your PDS and it'll show up on Tangled. We plan to start working on some exemplar agents first-party that would 1. enhance Tangled itself, 2. showcase cool things you can do with an open data firehose.

  92. 92. iso1631||context
    > the future really should be federated

    The internet should not be centralised, but you can't make a billion dollar company without capturing the world and selling your company to a trillion dollar company

  93. 93. PunchyHamster||context
    It's cute idea but most people don't want to host their own stuff.

    And if they are using 3rd parties to host their stuff, inevitable 1-3 big players will show up offering that as a service.

    And even if you do host your own stuff to avoid availability problems, the big actors can still fail just like GH and you can't do shit coz your dependencies need it.

    So the solution is same as it is now, proxy or mirror everything you use

  94. 94. icy||context
    Yeah that's fine, we offer first-party hosting for free forever.
  95. 95. jftuga||context
    Some interesting tidbits:

    * we had to resolve a variety of bottlenecks that appeared faster than expected from moving webhooks to a different backend (out of MySQL)

    * redesigning user session cache to redoing authentication and authorization flows to substantially reduce database load.

    * we accelerated parts of migrating performance or scale sensitive code out of Ruby monolith into Go.

    I'd like to know what database backend they migrated to. I was also surprised to read that the migration from Ruby to a more performant language had not already been completed. I assume this is because it is a large code base with many moving parts, etc.

  96. 96. mohsen1||context
    Another interesting bit: they are hitting performance issues due to the rise of monorepos. GitHub and frankly Git were not designed for monorepos
  97. 97. ghthor||context
    Yet the Linux kernel is a monorepo
  98. 98. mohsen1||context
    Try google3
  99. 99. guipsp||context
    The Linux kernel is pretty small
  100. 100. rootnod3||context
    > Our priorities are clear: availability first

    That's a delayed April fool's right?

  101. 101. embedding-shape||context
    No, just a 6 month old memo that was first opened today, as they said literally the same 6 months ago.
  102. 102. embedding-shape||context
    Hah, love that now they say "Our priorities are clear: availability first, then capacity, then new features" when 6 months ago, it was seemingly exactly the same except Azure supposedly was gonna save them:

    > GitHub Will Prioritize Migrating to Azure Over Feature Development - GitHub is working on migrating all of its infrastructure to Azure, even though this means it'll have to delay some feature development.

    > In a message to GitHub’s staff, CTO Vladimir Fedorov notes that GitHub is constrained on capacity in its Virginia data center. “It’s existential for us to keep up with the demands of AI and Copilot, which are changing how people use GitHub,” he writes.

    https://thenewstack.io/github-will-prioritize-migrating-to-a...

    So the currently delayed feature development is now gonna be further delayed, yet almost every week we see new features and changes, just the other day the single issues view was changed, as just one example. And it was "existential" 6 months ago yet they keep stumbling on the exact same issue today?

    Even if they're focused exclusively on reliability and uptime, we get the experience that we have today, kind of incredible how a company with the resources of Microsoft seemingly is unable to stop continuously shooting themselves in the foot. It's kind of impressive actually. As icing on the cake, they've decided to buy up all popular developer services then migrate them all to the same platform, great idea too.

  103. 103. ncruces||context
    > So the currently delayed feature development is now gonna be further delayed, yet almost every week we see new features and changes, just the other day the single issues view was changed, as just one example.

    They did that as a panic mode hack to mitigate performance: https://news.ycombinator.com/item?id=47912521

  104. 104. madeofpalk||context
    This seems uncharitable. Priorities aren't exclusive, especially at scale across large engineering orgs like GitHub. It could be that these are the top level priorities, but teams or individuals who aren't able to contribute to these priorities will work on other things like new features.
  105. 105. embedding-shape||context
    Ditto. I agree though, just because the priority is reliability, doesn't mean others can't work on features, especially features that might help with reliability, which I read was the motivation behind the new single-issue view, so that's my bad, might have been a bit much.

    I still think the rest of my point stands, especially the last one which is the move that has the biggest impact to the most of us developers.

  106. 106. voncheese||context
    Agree that priorities aren't exclusive and there may be teams/individuals that aren't able to contribute if they stay in their current teams/roles

    Where it becomes questionable though is when enough progress isn't being made on the top priority (reliability). If Github is being true to their word, they need to be pulling people off of teams that are working on features to work on reliability so that top priority gets the resourcing it needs.

    Given the pace of improvement, and the cited example of moving to Azure from months ago, it's not super clear they are doing that. Also not clear that they aren't, maybe the move to Azure is just a more than 6mo project no matter how many people are on it.

  107. 107. estimator7292||context
    Sure, but frontend devs fundamentally cannot contribute to the structural reliability issues.

    The person who rewrote the issue page view probably doesn't know anything about multi-cloud scaling for millions of users with Azure-crippling throughput. That's an incredibly specialized set of knowledge and experience that is utterly disjunct to frontend work.

    But at the same time, given the state that GitHub is in, I personally wouldn't want to allow any devs to push anything to prod that doesn't immediately affect stability. I'd completely freeze frontend work until the infrastructure is more stable. But then again I write C for microcontrollers so what do I know?

  108. 108. tedd4u||context
    I don't know their architecture but I would bet if FE devs want to contribute to availability in a capacity-constrained world (as GH CTO mentions) they could focus on profiling and optimization, backend-access patterns for example, caching, etc. Maybe they already have people dedicated to that but if they are coming out of a "new features first" operating regime I would bet there's some fruit to pick there.
  109. 109. saghm||context
    No, but they are ordered generally, and in this case they are explicitly saying that availability should come first
  110. 110. rwmj||context
    It's entirely possible the move to Azure has made the availability problems worse. Dedicated hardware is much more predictable than cloud. "Let's not move to Azure and instead buy a few more racks" was likely a decision beyond the pay grade of GitHub's management.
  111. 111. 0xy||context
    Azure is easily the least reliable and least secure of the 3 hyperscalers, which is crazy because GCP was an also-ran underdog not that long ago.
  112. 112. alper||context
    This entire exercise, if anything, is a huge indictment of Azure.

    But that doesn't matter because the kind of person that buys Azure, just like the kind of person that buys MS Teams, is entirely driven by price and does not care about anything else.

  113. 113. panarky||context
    > entirely driven by price

    I might buy that argument if Azure compensated for its awful availability and security with lower prices.

    But the kind of person who buys Azure is the kind of person who buys Windows and Teams, perfectly happy to pay a premium for all the extra abuse.

  114. 114. AntiUSAbah||context
    I mean, it's Microsoft and it's Azure. How much can go wrong clicking together a few, or a hundred, normal non-autoscaling VMs?

    There is so much workload running on Azure, and I have never heard of VMs going away.

    If Microsoft can source hardware for Azure, Microsoft can source hardware for GitHub.

  115. 115. ZoneZealot||context
    I've had Windows Server VMs soft crash and hard crash on Azure. Some soft-lock and a restart via Azure gets them back. Sometimes the only fix has been to power off / deprovision, then power on again (i.e. a restart didn't fix it). It's not common, but I've encountered it multiple times. These were VMs whose operating systems were created in Azure from their images.
  116. 116. dijit||context
    There's a lot that can go wrong with a hypervisor, even including hiding hardware issues from the guest OS.

    We don't think about it because we've been quite spoiled with excellent virtual machine platforms (KVM, Xen and even VMware).

    Those that have worked a lot with VirtualBox will be aware of this; it can be deeply unnerving that VM technology is the default way to deploy things after you've spent sufficient time with VirtualBox (which is very good for its original purpose, but not for reliability).

    The question is: Does Azure use something more like VirtualBox, or more like KVM?

    Hyper-V exhibits properties closer to VirtualBox.

  117. 117. stackskipton||context
    Hyper-V looks like VirtualBox but it's not. It's type 1 like KVM is.
  118. 118. dijit||context
    I meant in terms of bubbling up hardware issues.
  119. 119. giancarlostoro||context
    If they had not added or changed any features in GitHub for the past 5 years, nobody would be upset, and yet they keep changing it. It's a website that doesn't need to be reworked every five minutes. I assume the main development teams maintaining GitHub's codebase are run by managers who cannot justify their jobs unless they deliver new features for the sake of delivering new features to keep their jobs going, and / or in the hopes of getting new people to join GH, when in reality the more they wind up breaking, the more the opposite becomes true.

    They severely nerfed their search. I'm not sure why every other major tech company (Google, with both Search and YouTube) keeps breaking search for everything when it was working fine previously.

    What's a bigger joke is Microsoft has Azure DevOps which looks like it might be abandoned? But then you also have GitHub... My least favorite thing about both is the ticketing system. I cannot believe that I'd ever utter the phrase "I miss Jira" when every Jira project I've ever been in had been so inconsistently set up, every, single, one.

  120. 120. JCTheDenthog||context
    >What's a bigger joke is Microsoft has Azure DevOps which looks like it might be abandoned?

    My favorite was trying to figure out how to publish debug symbols with NuGet packages to Azure DevOps artifact feeds. Horrible documentation and I was never able to get it figured out.