• Lettuce eat lettuce@lemmy.ml · 2 hours ago

    I personally think that general consumers will never use LLMs in any significant number. I think that LLMs will exist in two distinct spaces: FOSS for devs and other technical people who want to run their own infra locally, and B2B for everything else.

    The few big AI companies that manage to last will be selling access to their models for much higher prices. Probably similar to current proprietary commercial software like VMWare, SolidWorks, VEEAM, Splunk, etc. Companies will pay hundreds, possibly thousands of dollars per seat depending on the niche offering and amount of usage.

    Suppose that a company developed an LLM that is trained & tuned specifically to do legal work, and suppose it produced work that was around 95% the quality of a typical paralegal. If that company charged $6,000 a year per license to work on their platform, that’s expensive. But if you’re a small firm with, say, a dozen full-time lawyers, then for the yearly price of a single average paralegal, you could have every lawyer using that software to do most of the work the paralegal would have done. I can see those kinds of applications happening more and more.
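    A rough back-of-the-envelope on that, using the numbers from the example above (the paralegal cost is my own assumed figure for illustration):

```python
# Back-of-the-envelope for the hypothetical legal-LLM licensing example above.
# The license price and firm size come from the example; the paralegal cost
# is an assumed round number for comparison.

license_per_lawyer = 6_000       # USD per seat per year (from the example)
lawyers = 12                     # small firm (from the example)
assumed_paralegal_cost = 75_000  # USD per year, fully loaded (assumption)

total_license_cost = license_per_lawyer * lawyers
print(f"Licenses for the whole firm: ${total_license_cost:,}/year")       # $72,000/year
print(f"Roughly one paralegal's cost: ~${assumed_paralegal_cost:,}/year")
```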

    This assumes, though, that LLMs will continue to improve at a significant rate for a long time (5-10 more years), which isn’t at all obvious, and there is some evidence that they’re already starting to hit a ceiling.

    There are other ways it might work, like if a method of compression is discovered that reduces the necessary RAM and compute by 2-3 orders of magnitude. Models that are considered very large today (100-300 billion params at full quality) might then be able to run effectively on a single 32GB GPU that costs a few thousand dollars.
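    To put that in concrete terms, here is the raw memory arithmetic; the bit-widths are just common quantization levels, not a claim about any particular compression technique:

```python
# Rough weight-memory footprint: bytes = parameters * bits_per_param / 8.
# Ignores KV cache, activations, and runtime overhead.

GIB = 1024 ** 3

def weight_footprint_gib(params: float, bits_per_param: float) -> float:
    return params * bits_per_param / 8 / GIB

for params in (100e9, 300e9):
    for bits in (16, 8, 4, 2):
        size = weight_footprint_gib(params, bits)
        verdict = "fits" if size <= 32 else "does not fit"
        print(f"{params / 1e9:.0f}B params @ {bits}-bit: {size:6.1f} GiB -> {verdict} in 32 GiB")

# 300B at 16-bit is ~559 GiB, and even at 2-bit it is still ~70 GiB, so a
# single 32 GiB GPU really does need the kind of order-of-magnitude
# reduction described above.
```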

    So the cost to run these models would be reduced immensely, and a single small data center could run enormous models with 1,000,000+ context windows for tens of thousands of users at once.

    But that cuts both ways, which is something any AI company is going to have to deal with. Once small free models get good enough to do the vast majority of a task, users are going to start weighing the costs and benefits, and the prospect of just buying a box and throwing one of these models on it for a few grand will be very appealing.
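    That weighing is easy to sketch; every number below is a made-up assumption purely to show the shape of the comparison:

```python
# Toy break-even comparison: a one-time local "AI box" vs per-seat hosted
# subscriptions. All figures are invented assumptions for illustration.

box_cost = 4_000               # one-time hardware cost, USD (assumed)
box_running_cost = 300         # power + upkeep per year, USD (assumed)
subscription_per_seat = 1_200  # hosted price per user per year (assumed)

def compare(users: int, years: int) -> str:
    local = box_cost + box_running_cost * years
    hosted = subscription_per_seat * users * years
    return f"{users:>2} users over {years} years: local ${local:,} vs hosted ${hosted:,}"

for users in (1, 5, 20):
    print(compare(users, years=3))
# The more seats there are, the faster the box pays for itself -- provided
# the free local model is actually good enough for the task.
```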

    I think there may be a good market out there for “AI boxes”: compact computers designed to run a tuned LLM, set up with a little special sauce so the interface is user-friendly, etc. Companies could sell these with support contracts to legal firms, indie dev studios, startups, small government agencies, etc.

    Idk, it’s so up in the air right now, and everything is constantly changing so fast. It’s impossible to predict where things will be in 6 months, let alone 6 years from now.

  • mindbleach@sh.itjust.works · 4 hours ago

    They’re fucked.

    Local models are already winning. A year ago, they benchmarked a year behind the biggest of the big boys. Six months ago they were six months behind. Yesterday Qwen released 3.6 27B, and it outperforms 3.5 397B… from February.

    Either we’re plateauing toward the asymptotic limit of LLM capabilities, and the endgame runs as well on a toaster as it does on a server - or breakthroughs use big fat models as a glorified search space to be rapidly discarded. Both options point toward neural networks as a lump of algebra that sits on your hard drive and occasionally spins your fans. Remote computing loses, as it basically always must, and the drastically reduced requirements for competing on local software favor clever new competitors who aren’t a bajillion dollars in debt.

  • ☆ Yσɠƚԋσʂ ☆@lemmy.ml · 13 hours ago

    I think by the time AI becomes efficient enough to be profitable, it’s going to be efficient enough to run locally and the whole AI as a service business model is going to collapse. We’re basically in the mainframe era of AI right now, and we’ve seen this happen with many technologies before. There’s no reason to think this case will be different.

    Just to give you an idea of how fast this stuff is moving: Qwen 3.6 was just released and can be run on a high-end laptop, and it outperforms Qwen 3.5 from February, which required a commercial-grade server to run. https://qwen.ai/blog?id=qwen3.6-27b
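    For anyone curious what running something like that locally actually looks like, here’s a minimal sketch using llama-cpp-python; the GGUF file name and context size are placeholders, not an official release artifact:

```python
# Minimal local-inference sketch (pip install llama-cpp-python).
# The model path is a placeholder for whatever quantized GGUF build you
# download; a ~27B model at 4-bit quantization needs roughly 15-20 GB of RAM.

from llama_cpp import Llama

llm = Llama(
    model_path="./qwen-27b-q4.gguf",  # hypothetical local file
    n_ctx=8192,                       # context window to allocate
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the tradeoffs of running LLMs locally."}]
)
print(response["choices"][0]["message"]["content"])
```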

    • grue@lemmy.world · 11 hours ago

      “There’s no reason to think this case will be different.”

      Not even the end of Moore’s Law?

      I’m not sure if you’re aware, but processors aren’t really getting much more efficient anymore. They’re just getting bigger (more parallel), which is why the price for the newer generations of GPUs has been skyrocketing. A new top-end GPU costs twice as much (or more) as a previous-gen one because it has twice as many (or more) compute units, since they can’t make the individual compute units much faster due to fundamental laws of physics.

      • ☆ Yσɠƚԋσʂ ☆@lemmy.ml · 13 hours ago

        I expect that software will continue to get optimized, and we’ll see new algorithms that are more efficient than what people are doing currently. It’s also possible we’ll start seeing hardware built specifically for models. For example, there’s already a startup that uses ASIC chips to print the model directly onto the chip. Since each transistor acts as a state, it doesn’t need DRAM, and the whole chip only requires a small amount of SRAM, which isn’t in short supply right now: https://www.anuragk.com/blog/posts/Taalas.html

        The limitation with this approach is that the chip is made for a specific model, but that’s not really that different from the way regular chips work either. You buy a chip and if it does what you need, it keeps working. When new models come out, new chips get printed, and if you need the new capabilities then you upgrade.

        You can see how absurdly fast their hardware version of llama 3 is here https://chatjimmy.ai/

      • iByteABit@lemmy.ml · 13 hours ago

        There are always two sides to this: one is the power of the hardware, and the other is the efficiency of the software. I think in this case OP means that AI will be optimized so much that it will require a tiny fraction of the resources it previously needed, at least for the casual use cases of an average person asking a simple question or performing a small task.

      • eldavi@lemmy.ml · 12 hours ago

        i suspect that we’ve neared the end of what we can get out of silicon and the only way forward, at this point, is to switch materials altogether to something like graphene or carbon; but i bet it would take a long time to ever do that because the profit motives that keep us on silicon won’t allow for it.

        • grue@lemmy.world · 10 hours ago

          From a basic physics research perspective (as opposed to an engineering process development for production perspective), are we even sure graphene semiconductors have that much potential headroom for improvement beyond the best possible silicon ones? I’m not convinced it buys us more than a couple of process nodes. I mean, we’re already making transistors so small you can damn near count the individual atoms in them today. Is making them out of atoms with one less valence level gonna be enough for a 10x, 100x, or 1000x improvement, even in the long run?

          • eldavi@lemmy.ml · 7 hours ago

            The Chinese will likely be the first ones to know for certain, considering that they’ve already demonstrated a willingness to pour a metric fuck ton into public infrastructure like the United States used to do for its military.

        • ☆ Yσɠƚԋσʂ ☆@lemmy.ml · 11 hours ago

          There are a few different tracks here. One is software optimization, where models require less energy to use. That’s been moving really fast over the past few years, and there are still a lot of really promising papers that haven’t been integrated into production systems yet.

          Another track is hardware architecture, where the substrate stays the same but chip design improves. A general example of this is SoC architecture like the M series from Apple or the Kirin 9000 from Huawei. The architecture eliminates the memory bus, which is one of the main bottlenecks, and a RISC instruction set facilitates parallelism much better than CISC. A more specific example would be ASIC chips like what Taalas is making, which print the model directly onto the chip.
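          One way to see why the memory bus matters so much: single-stream token generation has to read roughly every weight once per token, so memory bandwidth puts a hard ceiling on tokens per second. The bandwidth figures below are ballpark assumptions, not specs for any particular chip:

```python
# Rough model of why memory architecture dominates LLM inference speed:
#   tokens/sec  <=  memory_bandwidth / bytes_of_weights_read_per_token
# Bandwidth numbers are ballpark assumptions for illustration only.

def max_tokens_per_sec(params: float, bits: int, bandwidth_gb_s: float) -> float:
    weight_bytes = params * bits / 8
    return bandwidth_gb_s * 1e9 / weight_bytes

params = 27e9  # a ~27B-parameter model
bits = 4       # 4-bit quantized weights

scenarios = [
    ("weights streamed over a ~30 GB/s PCIe link", 30),
    ("SoC with ~200 GB/s unified memory", 200),
    ("GPU with ~1000 GB/s of VRAM bandwidth", 1000),
]
for label, bw in scenarios:
    print(f"{label}: ~{max_tokens_per_sec(params, bits, bw):.0f} tokens/sec ceiling")
```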

          And the last track is the one you mention: using a more efficient substrate. Notably, this will directly benefit from the other two tracks as well. Whatever software and hardware architecture improvements people come up with will apply directly to chips made out of graphene or other materials.

          • eldavi@lemmy.ml · 11 hours ago

            Agreed, and all of those are tracks to squeeze as much as we can out of silicon.

            There’s a limit that we haven’t reached yet, but we will eventually, because of those profit motives.

            I bet that China will be the first to reach it since they’re willing to spend so much on all kinds of infrastructure.

            • ☆ Yσɠƚԋσʂ ☆@lemmy.ml · 11 hours ago

              I expect so as well, and China also has a lot of incentive to invest in alternative substrates since they’re behind on silicon. If one of the moonshot projects they’re pursuing delivers, that would make current silicon chips look like vacuum tubes by comparison.

  • NotMyOldRedditName@lemmy.world · 10 hours ago

    I’m not convinced something like Claude isn’t profitable with enough users. I don’t think people are using more in compute than they pay for.

    Getting enough paying users, though, requires the product to be better so that more people will pay.

    Obviously the free tier runs at a loss, but I mean at a per-paid-user level.

  • Dingaling@lemmy.ml · 14 hours ago

    It’s about market share (“Your first hit is free…” marketing), but you’re probably seeing only one aspect.

    They’re already charging very real money for subscription users, especially enterprise.

    Uber spent $3.4bn, their entire budget for AI fees for 2026, within the first four months of this year - that’s real money by anyone’s definition.

    We (not Uber) set up a monitoring portal (litellm) to manage this. Some users are burning through a surprising amount, hitting what we considered sane daily limits within their first hour. One person asked a single query that cost $30.
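    For context on how a single query can hit $30: agentic or long-context requests re-send the whole context on every call. The prices and token counts below are illustrative assumptions, not our actual vendor rates:

```python
# How one "query" can cost $30. All prices and token counts are
# illustrative assumptions, not real vendor rates.

input_price_per_mtok = 15.0   # USD per million input tokens (assumed)
output_price_per_mtok = 75.0  # USD per million output tokens (assumed)

context_tokens = 150_000      # large pasted documents / codebase (assumed)
output_tokens_per_call = 2_000
llm_calls = 12                # one user query fanned out into agent steps

cost = (context_tokens * llm_calls * input_price_per_mtok
        + output_tokens_per_call * llm_calls * output_price_per_mtok) / 1_000_000
print(f"Estimated cost of this one query: ${cost:.2f}")  # ~$28.80 with these numbers
```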

    Individual consumers of AI are riding free on this as the big AI players jostle for position and valuation.

    Will that bubble burst or gradually deflate? Or keep growing longer? Nobody knows, or if they do they’re investing cleverly and keeping their mouth shut.