The model layer is commoditizing at 280x in two years. The real defensibility is in the infrastructure between the model and the enterprise: memory, governance, deployment. That's the bet we're making.
TL;DR
- The cost of GPT-3.5-level inference dropped over 280-fold in two years. Model quality is converging. Pricing is a race to zero. If your platform's value is "we use the best model," you have a shrinking window before that's table stakes.
- Andreessen Horowitz's analysis of the generative AI stack found that infrastructure vendors capture the majority of dollars, while application companies grow fast but struggle with retention and margins.
- The three infrastructure layers that compound are deployment (where AI runs), memory (what AI knows), and governance (what AI is allowed to do). Models are interchangeable. These layers are not.
- Enterprise market share is volatile: Anthropic went from 12% to 32% of enterprise LLM market share in 18 months while OpenAI dropped from 50% to 27%. Building on any single model is building on sand.
- The companies that own the stack between the model and the enterprise will define the next decade of AI.
Two years ago, we made a bet. Not on which model would win, but on the assumption that it wouldn't matter which model won.
At the time, this felt contrarian. OpenAI had dominant market share. GPT-4 was the benchmark. Building "for OpenAI" seemed like the obvious play. Every startup in our space was optimizing prompts for a single provider, fine-tuning on one model family, and coupling their architecture to a specific API.
We went the other direction. We built an infrastructure layer: memory, governance, agent orchestration, deployment. The model was a parameter, not the product.
Two years later, the numbers confirm the thesis.
The Commoditization Curve Nobody Talks About
Stanford's 2025 AI Index Report tracked something remarkable: the inference cost of GPT-3.5-level performance dropped over 280-fold in just two years. Not 2x. Not 10x. Two hundred and eighty times cheaper.
This isn't a blip. It's a structural force. Every major lab is racing toward the same capabilities. Open-source models are closing the gap. DeepSeek, Llama, Mistral, and Qwen are all delivering competitive performance at a fraction of the cost. The model layer is becoming what compute became in the cloud era: essential, but not differentiating.
If your AI platform's core value proposition is "we use GPT-4" or "we're powered by Claude," you're selling access to a commodity that gets cheaper every quarter.

The market share data makes this even clearer. Andreessen Horowitz's empirical study with OpenRouter (covering over 100 trillion tokens of enterprise usage) showed Anthropic capturing 32% of enterprise LLM market share, up from just 12% in 2023, while OpenAI dropped from 50% to 27% over the same period. Measured by dollars, Anthropic now earns 40% of enterprise LLM spend.
That kind of volatility at the top is the signature of a market where the product is interchangeable. Enterprises aren't loyal to models. They're chasing the best price-performance ratio, and that ratio shifts every few months.
Where Value Actually Accrues
Andreessen Horowitz published what might be the most important framework for understanding AI market structure: "Who Owns the Generative AI Platform?". Their finding was direct:
Infrastructure vendors are the biggest winners in the GenAI market, capturing the majority of dollars. Application companies grow revenue quickly but struggle with retention, differentiation, and gross margins.
The pattern they documented: the companies creating the most value in AI haven't captured most of it. Value flows to whoever owns the critical chokepoints in the stack.
Sequoia's David Cahn framed the economics even more starkly in "AI's $600B Question". The AI industry faces a $600 billion annual revenue gap between what infrastructure spending demands and what AI companies actually generate in revenue. Most of that gap sits at the application layer. Infrastructure is where the economics work.
This isn't an argument against building applications. It's an observation about where defensibility lives. Applications that depend on model access alone will face the same margin pressure as any other thin wrapper around someone else's API.

The firms investing the most money understand this. Andreessen Horowitz raised a $4.25 billion fund in 2024 with substantial AI infrastructure allocations. Their "Theory of Well" thesis argues that the most defensible AI positions belong to companies controlling critical chokepoints in the supply chain. Not model providers. Not application wrappers. Infrastructure.
CoreWeave raised at a valuation exceeding $35 billion. Databricks closed at $62 billion. These are infrastructure companies. The market is telling you where it thinks the value is.
The Three Infrastructure Layers That Matter
If models are commoditizing, what isn't?
We've spent two years building the layer between models and enterprises. Through that work, three infrastructure problems have emerged as the ones that compound, the ones enterprises can't solve by swapping API keys.
Layer 1: Where the AI Runs (Deployment)
This is the most overlooked infrastructure decision. Where your AI platform physically runs determines who controls encryption keys, who sees audit logs, who manages network boundaries, and who can verify compliance in real time.
We deploy into the customer's own AWS account. The customer controls IAM, KMS, VPC, and security groups. Their CloudTrail logs every API call we make. If they want to revoke our access at 3am on a Saturday, they can.
This isn't a feature. It's an architecture. And it's not something you can bolt on later. The decision to deploy in a customer's account versus a multi-tenant cluster shapes every downstream design choice: how you handle secrets, how you manage state, how you structure DPAs, how you price.
Our Terraform deployment runs four phases and stands up the full platform in about two hours. The customer's AWS bill for the base infrastructure is $26-100/month depending on configuration. That transparency matters. When the customer can see every line item in their own AWS console, the conversation shifts from "trust us" to "verify yourself."
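The control the customer holds comes down to who owns the IAM role the vendor assumes. A minimal sketch of that arrangement, with placeholder account IDs and an illustrative external ID (none of this is our actual policy), shows why revocation is instant: the trust policy lives in the customer's account, so deleting the role or its trust statement cuts off vendor access immediately.

```python
import json

# Hypothetical illustration: a cross-account IAM trust policy the customer
# attaches to the role the vendor assumes. Account ID and external ID are
# placeholders, not real values.
VENDOR_ACCOUNT_ID = "111111111111"  # placeholder vendor account

def build_trust_policy(vendor_account_id: str, external_id: str) -> dict:
    """Return an IAM trust policy allowing the vendor to assume this role,
    gated by an ExternalId to prevent confused-deputy attacks."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{vendor_account_id}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }

policy = build_trust_policy(VENDOR_ACCOUNT_ID, external_id="org-1234")
print(json.dumps(policy, indent=2))
```

Because the role and its trust policy are resources in the customer's account, every AssumeRole call lands in their CloudTrail, and revoking access never requires asking the vendor for anything.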
Layer 2: What the AI Knows (Memory)
I've written extensively about why RAG alone doesn't solve the memory problem. The short version: retrieval finds documents, memory understands entities. An AI platform that can extract, structure, deduplicate, update, and compound knowledge about customers, deals, and relationships across every agent interaction has something a model provider can never offer.
We maintain 20+ DynamoDB tables and a vector store per customer organization. Every memorize call writes to both the vector layer (for semantic search) and the serving layer (for structured browsing and property retrieval). The extraction pipeline runs quality gates on every batch. Properties are schema-enforced with confidence scores. Open-set facts are deduplicated by cosine similarity.
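The dedupe-by-cosine-similarity step in that pipeline can be sketched in a few lines. This is a toy version with two-dimensional vectors and an assumed threshold, not our production code, but it shows the core idea: a new fact is stored only if nothing already in memory is nearly identical to it.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def dedupe_facts(facts, threshold=0.9):
    """Keep a fact only if no already-kept fact exceeds the similarity threshold.
    `facts` is a list of (text, embedding_vector) pairs; threshold is illustrative."""
    kept = []
    for text, vec in facts:
        if all(cosine(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((text, vec))
    return [text for text, _ in kept]

# Toy embeddings: the first two facts point in nearly the same direction.
facts = [
    ("ACME renewal is in Q3", [1.0, 0.0]),
    ("The ACME renewal lands in Q3", [0.99, 0.01]),  # near-duplicate, dropped
    ("ACME's champion left the company", [0.0, 1.0]),
]
print(dedupe_facts(facts))  # ['ACME renewal is in Q3', "ACME's champion left the company"]
```

In production the vectors come from the embedding model and the threshold is tuned per fact type, but the shape of the check is the same.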
Anyone can call an LLM API. Not everyone can maintain a living, governed knowledge graph that compounds with every interaction. Memory is infrastructure that gets more valuable with use. Model access does not.

The memory silos problem is real: when each agent has its own private memory, the enrichment agent's insights never reach the outbound agent. Shared entity memory is an infrastructure layer. It doesn't depend on which model powers the extraction.
Layer 3: What the AI Is Allowed to Do (Governance)
Governance is the layer most enterprise AI platforms skip entirely, and the one regulated industries need most.
System prompts don't scale. I covered this in detail: when you have ten agents across five teams across three platforms, each with its own system prompt, you don't have governance. You have ten independent interpretations of what the company is supposed to sound like.
Our governance layer (SmartContext) delivers organizational policies to agents at execution time. When pricing changes, every agent knows immediately. When legal updates a compliance boundary, it's reflected in the next agent interaction, not in the next prompt rewrite. The fast path delivers governance context in ~850ms without an LLM call.
This layer has to be centralized, dynamic, auditable, and platform-agnostic. That's infrastructure, not an application feature.
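A minimal sketch of what execution-time policy delivery looks like, under the assumption that the fast path is essentially a centralized lookup and string assembly (the store shape and field names here are illustrative, not the actual SmartContext implementation):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Policy:
    topic: str
    text: str
    updated_at: datetime

class PolicyStore:
    """Centralized, dynamic policy source: one write updates the context
    delivered to every agent's next interaction."""

    def __init__(self):
        self._policies: dict[str, Policy] = {}

    def set(self, topic: str, text: str) -> None:
        # Updating a policy here is all it takes; no prompt rewrites per agent.
        self._policies[topic] = Policy(topic, text, datetime.now(timezone.utc))

    def context_for(self, topics: list[str]) -> str:
        """Fast path: dictionary lookups and a string join, no LLM call."""
        return "\n".join(
            self._policies[t].text for t in topics if t in self._policies
        )

store = PolicyStore()
store.set("pricing", "Quote list price only; discounts require approval.")
store.set("legal", "Never promise uptime figures beyond the signed SLA.")
print(store.context_for(["pricing", "legal"]))
```

The `updated_at` field hints at the auditability requirement: every policy change carries a timestamp, so "who changed what, when" is answerable from the store itself.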
Why We Built an Infrastructure Company
This wasn't the easy path. The easy path was: pick a model, build a product on top, ship features fast. The infrastructure path means building Terraform modules, managing cross-account IAM roles, handling DynamoDB pagination edge cases, and debugging CloudTrail integration. It's less glamorous than shipping an AI chatbot.
But we kept coming back to the same observation in every enterprise conversation. Prospects didn't ask which model we use. They asked three questions:
- Where does my data go? (Deployment)
- Can my agents share context? (Memory)
- How do I control what agents say? (Governance)
These are infrastructure questions. And the answers can't be "we wrap GPT-4 and add a nice UI."
The model independence decision has already paid off in ways we didn't anticipate. When Anthropic surged past OpenAI in enterprise adoption, our customers didn't have to migrate anything. When a healthcare customer needed AWS Bedrock for HIPAA compliance (keeping LLM inference inside their account), we switched a parameter. When a prospect evaluated self-hosted vLLM for an air-gapped government deployment, we ran the cost analysis: break-even at roughly $2,000/month in API spend with a single A100.
None of those conversations would have been possible if we'd built on a single model.
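The vLLM break-even figure is back-of-envelope arithmetic. A sketch, using an assumed on-demand A100 rate of about $2.70/hour (the rate is illustrative; the ~$2,000/month conclusion came from our own analysis):

```python
# Back-of-envelope break-even for self-hosted vLLM on a single A100 vs. API spend.
# Hourly rate and 730 hours/month are assumptions for illustration.

def monthly_gpu_cost(hourly_rate: float, hours: float = 730.0) -> float:
    """Monthly cost of running one GPU around the clock."""
    return hourly_rate * hours

def self_hosting_pays_off(api_spend_per_month: float, hourly_rate: float) -> bool:
    """Self-hosting wins once monthly API spend exceeds the GPU's monthly cost."""
    return api_spend_per_month > monthly_gpu_cost(hourly_rate)

print(round(monthly_gpu_cost(2.70)))          # ~1971/month, near the $2,000 break-even
print(self_hosting_pays_off(2500, 2.70))      # True: above break-even
print(self_hosting_pays_off(1200, 2.70))      # False: cheaper to stay on the API
```

Real analyses also have to account for utilization, engineering time, and model quality differences, but the first-order comparison is this simple.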
The Embedding Trap
One caveat. There's one place where model dependency is real and dangerous: embeddings.
If you change your embedding model, you invalidate your entire vector store. Every stored embedding becomes incompatible with new query embeddings. For a platform with thousands of organizations and millions of records, re-embedding is a massive operation.
We use text-embedding-3-small and treat it as a foundational commitment. This is the one decision in our stack that has genuine switching costs. It's also the strongest argument for why the infrastructure layer is more defensible than the model layer: anyone can swap a chat model. Nobody wants to re-embed a production vector store.
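One practical consequence: the embedding model's identity has to travel with every stored vector, because vectors from different models are not comparable. A minimal sketch of that guard (the store shape is illustrative, not our actual schema):

```python
# Sketch of guarding a vector store against embedding-model drift.
# Vectors produced by different embedding models live in different spaces,
# so mixing them silently corrupts similarity search.

EMBED_MODEL = "text-embedding-3-small"  # the foundational commitment

class VectorStore:
    def __init__(self, model: str):
        self.model = model
        self.records: list[tuple[str, list[float]]] = []

    def add(self, doc_id: str, vector: list[float], model: str) -> None:
        if model != self.model:
            # Rejecting the write is the safe failure mode; the alternative is
            # a full re-embedding migration of every stored record.
            raise ValueError(
                f"vector from {model!r} is incompatible with a store built on {self.model!r}"
            )
        self.records.append((doc_id, vector))

store = VectorStore(EMBED_MODEL)
store.add("doc-1", [0.1, 0.2], model=EMBED_MODEL)
try:
    store.add("doc-2", [0.3, 0.4], model="text-embedding-ada-002")
except ValueError as e:
    print("rejected:", e)
```

This is also why the commitment is made once, deliberately: the check above turns an accidental model swap into a loud error instead of quietly degraded retrieval.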
What This Means If You're Buying AI
If you're evaluating AI platforms for your organization, here's the framework I'd suggest:
Ask about the infrastructure, not the model. "Which LLM do you use?" is the wrong question. The right questions: Where does my data live? Can I verify that independently? What happens to my data if you go out of business? Can I switch models without re-deploying? Do my agents share memory?
Look for deployment transparency. If the vendor can't tell you exactly which AWS services (or equivalent) they're running, what the infrastructure costs, and whether you can see the audit trail independently, you're buying a black box. The model inside the box will change. The box is what matters.
Evaluate governance as infrastructure. If governance is "we have a system prompt," run. Governance needs to be dynamic (updates without redeployment), centralized (one source of truth for all agents), auditable (who changed what, when), and platform-agnostic (works across tools and teams).
Check for memory architecture. Per-agent memory is a notebook. Shared entity memory is institutional knowledge. If the platform can't explain how Agent A's insights reach Agent B for the same customer, the memory is siloed. That gap costs real revenue.
The Next Decade
Total corporate AI investment hit $252.3 billion in 2024, according to Stanford HAI. Private investment jumped 44.5%. The money is pouring in.
Most of it will flow to model providers and application wrappers. Most of those application wrappers will face margin compression as models commoditize and customers realize they're paying for a thin integration layer.
The companies that will be around in ten years are the ones that own the infrastructure between the model and the enterprise: the memory that compounds, the governance that enforces organizational rules, the deployment architecture that gives customers real control.
That's the bet we're making. The model will change. The infrastructure won't.
Frequently Asked Questions
Aren't you also dependent on AWS infrastructure?
Yes, currently. We deploy on AWS ECS Fargate, DynamoDB, and Lambda. We've assessed multi-cloud feasibility: Kubernetes support requires 4-6 hours of Helm work with zero application code changes. Azure requires a service abstraction layer (15-25 hours). But we're demand-driven, not speculative. AWS covers roughly 60% of the enterprise cloud market. We'll build Azure and GCP support when paying customers need it, not before. Revenue first, speculation never.
If models are commoditizing, won't infrastructure commoditize too?
Eventually, some layers will. Basic vector storage is already a commodity. But the compound layers (memory that builds on itself over time, governance that reflects organizational knowledge, deployment that satisfies regulatory requirements) have compounding returns. The more data in the memory layer, the more valuable it gets. That's a different dynamic than model inference, where the output is stateless.
How do you handle customers who are committed to a single model provider?
We support it fully. If a customer wants to use only Anthropic, or only OpenAI, or only Bedrock, that's a configuration choice. The architecture doesn't require multi-model usage. It simply doesn't prevent it. The value is in the optionality: when the next model shift happens (and it will), the customer isn't locked in.
What about fine-tuned models? Don't they create real switching costs?
Fine-tuning does create model dependency, and it's a legitimate concern. Our approach: we invest in memory and governance (context assembly at inference time) rather than fine-tuning. A well-structured memory layer with governed retrieval often eliminates the need for fine-tuning entirely. The model gets the context it needs through retrieval, not through training.
References
- Stanford HAI — "The 2025 AI Index Report" (April 2025): https://hai.stanford.edu/ai-index/2025-ai-index-report
- Andreessen Horowitz — "Who Owns the Generative AI Platform?" (January 2023, updated 2024): https://a16z.com/who-owns-the-generative-ai-platform/
- Andreessen Horowitz — "State of AI: An Empirical 100 Trillion Token Study with OpenRouter" (2025): https://a16z.com/state-of-ai/
- Sequoia Capital — "AI's $600B Question" (David Cahn, June 2024): https://sequoiacap.com/article/ais-600b-question/
- WebProNews — "The 'Theory of Well' Thesis: How a16z's Vision for AI Infrastructure Is Reshaping Venture Capital Strategy" (2025): https://www.webpronews.com/the-theory-of-well-thesis-how-a16zs-vision-for-ai-infrastructure-is-reshaping-venture-capital-strategy/