Shlok Khemani

Pricing Models for LLM Apps

Historically, building software products has been a high-margin business because key costs - data storage, computing, transfer - have become commoditized and inexpensive, thanks to decades of focused innovation.

The emergence of large language models like GPT-4 and Llama, and image models like Stable Diffusion, alters the status quo. Training these models and generating outputs from them require substantially more computing resources, especially GPUs, which are costly and currently in short supply.

This shift fundamentally changes the business models for companies that rely on LLMs. Here, I’ll touch on why this shift is happening and then break down some of the emerging pricing models developers are adopting in response.

Marginal Costs

The marginal cost of replication of a product is the additional cost of producing one more unit of the product. Lower marginal costs translate to higher economies of scale.

Traditional SaaS companies benefit from very low marginal costs where new users are often just additional database entries.

This is not the case for LLM-based applications. Each query carries a cost (measured in tokens). As the application scales, so do the total queries sent to the LLM and the associated costs - all proportionally. This raises marginal costs and creates a need for businesses to develop new pricing models.
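To make the arithmetic concrete, here’s a minimal sketch of per-query marginal cost. The token prices are made-up assumptions; real provider rates vary by model and change over time.

```python
# Per-query marginal cost for an LLM app.
# Token prices below are illustrative assumptions, not real provider rates.
PRICE_PER_1K_INPUT_TOKENS = 0.03   # dollars (assumed)
PRICE_PER_1K_OUTPUT_TOKENS = 0.06  # dollars (assumed)

def query_cost(input_tokens: int, output_tokens: int) -> float:
    """Marginal cost of a single LLM query, in dollars."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT_TOKENS \
         + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT_TOKENS

# Unlike a new database row, this cost recurs on every request,
# so total spend grows linearly with usage:
monthly_spend = query_cost(1500, 500) * 100_000  # 100k queries/month
```

At these assumed rates, a 1,500-token prompt with a 500-token completion costs about 7.5 cents - negligible once, but significant multiplied across every user action.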


Open Source + BYOK

Because generative AI is a platform shift in software, it has naturally captured the imagination of enthusiasts and tinkerers, who are building amazing open-source projects.

Since these projects are not commercial, developers don’t bear the LLM costs; users pay for them directly. This is typically done through a Bring-Your-Own-Key (BYOK) model, where users procure API keys and purchase credits from providers like OpenAI and Anthropic.

This is great for enthusiasts but excludes the majority of everyday users, who may not even know what an API key is, let alone how to procure and recharge one.

SaaS + BYOK

Our first commercial pricing model. In this case, developers create applications and charge users as they do for SaaS - typically one-time fees or periodic subscriptions - but also ask users to enter their own keys for LLM usage.

Essentially, these developers seek the best of all worlds - the high margins of traditional SaaS, the added capabilities of LLMs, and not having to deal with the complexity of additional LLM costs.

While convenient for the developer, this model makes for a bad user experience: users have to pay for both the software and API credits. Again, this model only works for enthusiasts and can’t scale.

Credit Based

This is a pay-as-you-go model. Users purchase credits from the developer. These credits are priced to bake in the cost of the application + LLM model + margins.

Examples include DreamStudio by Stability AI, RunwayML, and Photo AI.

The benefit is that once you decide on pricing, this model is straightforward and highly scalable. Developers can also adjust the credits required for different actions to reflect changing costs or to increase their margin.
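As a sketch of how such a credit price might be set - all figures here are illustrative assumptions, not any company’s actual economics:

```python
from math import ceil

# Price one credit so it covers the blended per-action cost
# plus a target gross margin. All figures are assumed for illustration.
LLM_COST_PER_ACTION = 0.04   # provider cost per action, dollars (assumed)
APP_COST_PER_ACTION = 0.01   # amortized infra cost per action (assumed)
TARGET_GROSS_MARGIN = 0.60   # 60%

def credit_price() -> float:
    """Price of one credit such that cost / price = 1 - margin."""
    cost = LLM_COST_PER_ACTION + APP_COST_PER_ACTION
    return cost / (1 - TARGET_GROSS_MARGIN)

def credits_for_action(action_cost: float, price_per_credit: float) -> int:
    """If costs rise, raise the credits an action consumes instead of
    repricing credits: smallest count that still hits the margin target."""
    return ceil(action_cost / ((1 - TARGET_GROSS_MARGIN) * price_per_credit))
```

The second function reflects the flexibility mentioned above: rather than changing the sticker price of a credit, developers quietly change how many credits an action costs.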

The big downside is that revenue isn’t recurring. Recurring revenues form the bedrock of SaaS businesses, bringing predictable cash flows and reducing risk. Also, payments aren’t on auto-pilot. Users have to manually purchase more credits, leading to potential drop-offs.

Subscription with limits

Users pay a monthly/annual subscription fee and get access to a fixed number of requests within a certain timeframe.

Popular examples are ChatGPT with GPT-4 ($20 a month for 40 requests every three hours) and Midjourney (a fixed number of “fast hours” every month depending on the plan).
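One simple way to enforce a cap like “40 requests every three hours” is a rolling-window counter. OpenAI hasn’t published its exact mechanism, so treat this as an illustrative sketch rather than how ChatGPT actually works:

```python
import time
from collections import deque

class UsageCap:
    """Allow at most `limit` requests per rolling window of `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.timestamps = deque()  # times of recently allowed requests

    def allow(self, now: float = None) -> bool:
        """Record and permit the request if the user is under the cap."""
        if now is None:
            now = time.monotonic()
        # Evict requests that have aged out of the rolling window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False

# A ChatGPT-style cap: 40 requests per 3-hour window.
cap = UsageCap(limit=40, window=3 * 3600)
```

The appeal for the developer is that worst-case cost per user per billing period is bounded, which makes a flat subscription price safe to offer.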

This closely mirrors a typical SaaS model where developers benefit from recurring subscription revenues. Since users don’t have to make multiple payments or recharge credits manually, it is a better user experience as well.

The downside, of course, is the limit itself. Very often, when I’m in the middle of an important conversation with GPT-4, I hit my rate limit. I can’t extend it even if I’m willing to pay more. This can be a frustrating experience.

Midjourney somewhat solves this problem. You can purchase additional hours at a per-hour rate if you exhaust your allocated “fast hours” for the month.

Subscription without limits

A slight modification of the previous model. Users pay a subscription fee to get unlimited access to a model.

Given that the costs to run LLM apps scale linearly with usage, this is a tricky model to implement. For power users, the cost to the company could be higher than the subscription price. This means that these users are subsidized by infrequent ones. Much like a gym membership, the developer relies on a category of customers who pay for the product but don’t use it much.
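A back-of-the-envelope sketch of the subsidy dynamic, with made-up numbers:

```python
# Break-even usage under a flat unlimited subscription.
# All figures are illustrative assumptions, not any real product's economics.
SUBSCRIPTION_PRICE = 10.0  # dollars per month (assumed)
COST_PER_REQUEST = 0.02    # blended LLM + infra cost per request (assumed)

def breakeven_requests() -> int:
    """Monthly requests at which a user's cost equals their fee."""
    return round(SUBSCRIPTION_PRICE / COST_PER_REQUEST)

def monthly_margin(requests: int) -> float:
    """Developer's margin on one user; negative means a subsidized power user."""
    return SUBSCRIPTION_PRICE - requests * COST_PER_REQUEST
```

At these assumed rates, a user making 2,000 requests a month loses the developer money, while one making 100 requests a month subsidizes them - the gym-membership dynamic in miniature.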

This model is common with chatbot apps like Replika.

The advantage is a user experience superior to the previous models we discussed: one recurring payment, no limits, no manual top-ups.

However, setting a price in the first place is tricky. If costs rise for any reason, devs either have to increase the subscription price (leading to drop-offs) or introduce a usage limit (bad user experience).

Fully Free

Given everything we’ve discussed, you might conclude that it is almost infeasible to deploy a fully free LLM-based service.

But if you have billions of dollars and are in a race to become the de-facto player in the AI space, you might be tempted to release a free, unlimited-usage AI product.

OpenAI, Microsoft, and Google have this kind of money and the ambition to win the AI race. As a result, we have ChatGPT with GPT-3.5, Bing Chat, and Bard - AI apps that everyone can use as much as they want.

But AI is advancing very fast. Given the talent and investment, we can expect a steady reduction in infrastructure costs and model sizes, leading to basic intelligence becoming commoditized. An increasing number of products, like Perplexity and Cursor, are following ChatGPT’s pricing model - unlimited usage for a less powerful (and cheaper) LLM and a paid subscription for a better LLM with additional features.

A fully free product variant is sustainable only for teams that have money to burn, for now.

From a user experience point of view, well, we can’t ask for more!

Final Thoughts

There is no single answer to what is the best pricing model for an AI product. Each has clear upsides and downsides (apart from SaaS + BYOK, which has very few upsides - please don’t use it!).

Eventually, the factors teams should consider are the nature of the product itself, the user profile (or profiles), and their runway and appetite for growth.

All of this being said, we are very early in the LLM software era. With new technology comes new business models. We can and should expect innovation in pricing, especially with AI agents entering the mainstream soon. A few projects are emerging here, and I’ll cover one of them very soon!