• sobchak@programming.dev
    link
    fedilink
    arrow-up
    1
    ·
    12 hours ago

    14 H100 GPUs per user that they serve

    Not per user, but probably decent rough estimate to that per vibecoding dev that is continually running agents 8+ hours/day. Some people’s “workflows” involve running multiple parallel agents sometimes or even a significant portion of the time (using the git worktree feature), so I think that’s probably a decent rough estimate. I imagine the limit would be serving 10 of these types of “devs.” Of course, there’s batching and stuff that can be done, but I think it still slows everybody else down near linearly. H100s aren’t the only accelerators used for inference; I just chose it as an example. Google has ~5 million H100 equivalent accelerators, Microsoft has 3.5 million, and Amazon has 2.5 million (https://www.networkworld.com/article/4156949/google-owns-the-most-ai-compute-and-it-built-it-its-way.html).

    • theunknownmuncher@lemmy.world
      link
      fedilink
      arrow-up
      2
      ·
      edit-2
      12 hours ago

      Even so, your numbers are still a tiny fraction of GPU units compared to concurrent users, and the limit you “imagine” is just that, imagined.

      And you do need to remember that the majority of the compute at these companies is used for model training and not used for inference.