MIT Tech Review AI June 19, 2026 NEUTRAL

SubQ Puts Sparse Attention Back at the Center of LLM Debate

Subquadratic says its SubQ model can make LLMs faster, cheaper, and more energy efficient by replacing dense attention with sparse attention. Independent testing by Appen supports several performance claims, but limited public access means the strongest conclusions remain unproven.

WTF Index NEUTRAL

Terminator: 1 | Idiocracy: 0

This is mainly a technical efficiency claim about LLM architecture, with limited implications for autonomy, harm, or social degradation.

Subquadratic, a Miami-based AI startup, is asking the AI world to take a serious look at a large language model that works differently from the dominant systems on the market. Its model, called SubQ, is built around sparse attention rather than the dense attention mechanism used in most leading LLMs.

The company says that change could make models faster, cheaper, and less energy-intensive while allowing them to work with much larger amounts of text at once. New independent testing gives the claim more weight, but it does not end the debate.

Why SubQ Is Drawing Attention

Subquadratic came out of stealth mode last month with a bold claim: it said it had solved a mathematical bottleneck that has limited large language models for almost a decade. At first, the evidence was thin. The company released only a small set of self-published test scores, and SubQ was not widely available for outside users to evaluate.

That made skepticism predictable. Dan McAteer, an artificial intelligence engineer, summed up the split reaction on X by saying, “SubQ is either the biggest breakthrough since the Transformer ... or it’s AI Theranos.”

Subquadratic has now shared more detail, including results from independent tests run by Appen, a third-party firm that evaluates models. Those results appear to support several of the company’s claims, especially around speed and long-context retrieval.

Alex Whedon, Subquadratic cofounder and chief technology officer, said the company expected skepticism. He also acknowledged that releasing third-party benchmarks alongside the original announcement would have helped, and said the company is now emphasizing outside verification before it publishes future results.

The Bottleneck Inside Modern LLMs

The issue Subquadratic is targeting sits at the center of how most large language models process text. Today’s LLMs typically rely on transformers, which use dense attention to compare tokens across a piece of text.

In simple terms, dense attention represents each word or part of a word (a token) as a list of numbers, then compares every token’s numbers with those of every other token. The goal is to capture meaning across the full text. But that process quickly becomes expensive.
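To make the comparison step concrete, here is a minimal toy sketch of dense self-attention in plain Python. It is a simplification, not any production model’s code: it assumes the same vectors serve as queries, keys, and values (real transformers apply learned projections first), and the function names are hypothetical. The point is the nested loop, which scores every token against every token.

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating, for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense_attention(vectors):
    """Toy dense self-attention (hypothetical helper, not production code).

    The same vectors act as queries, keys, and values; real transformers
    apply learned projections first. Returns outputs and the score count.
    """
    dim = len(vectors[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    comparisons = 0
    for q in vectors:
        # Score this token against every token in the text, itself included.
        scores = []
        for k in vectors:
            scores.append(scale * sum(a * b for a, b in zip(q, k)))
            comparisons += 1
        weights = softmax(scores)
        # Output: attention-weighted average of all the value vectors.
        outputs.append([sum(w * v[d] for w, v in zip(weights, vectors))
                        for d in range(dim)])
    return outputs, comparisons


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
outputs, comparisons = dense_attention(tokens)
print(comparisons)  # 16 = 4 tokens x 4 tokens
```

With four tokens the sketch computes 16 scores; with n tokens it computes n * n.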

The source article gives a concrete example: a text of 10,000 words can trigger almost 50 million individual multiplications. As the text grows, the number of computations rises sharply. Double the words, and the computations roughly quadruple, a pattern known as quadratic scaling.
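The 50 million figure is consistent with counting one multiplication per distinct pair of words, n(n - 1)/2; the article does not say how it was counted, so treat that as an assumption. A real transformer does more work than this, since each comparison is a dot product over many numbers and the full attention matrix has n * n entries. The short sketch here only checks the scaling.

```python
def pairwise_comparisons(n_tokens):
    # Number of distinct token pairs: n * (n - 1) / 2.
    return n_tokens * (n_tokens - 1) // 2


for n in (10_000, 20_000, 40_000):
    print(n, pairwise_comparisons(n))
# 10000 49995000
# 20000 199990000
# 40000 799980000
```

Each doubling of the text multiplies the count by almost exactly four.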

That is why long context windows are so technically demanding. A model that can handle more text at once must do far more work unless it changes the way attention is calculated.

How Sparse Attention Changes the Tradeoff

Subquadratic’s answer is to replace dense attention with sparse attention. Instead of comparing every token with every other token, sparse attention chooses only some relationships to compute.

The logic is straightforward: not every relationship between words matters equally. Whedon put it this way: “Sparse attention says not all of those relationships are important, because they’re not.”
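A simple way to see the savings is a fixed-pattern sparse scheme, sketched here as a sliding window in which each token is compared only with its nearby neighbors. This is a generic textbook pattern chosen for illustration, not SubQ’s method; the window size and function name are assumptions.

```python
import math


def sliding_window_attention(vectors, window):
    """Toy fixed-pattern sparse attention (hypothetical helper).

    Token i is compared only with tokens i - window .. i + window,
    no matter what the text says. Returns outputs and the score count.
    """
    n = len(vectors)
    dim = len(vectors[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    comparisons = 0
    for i, q in enumerate(vectors):
        # Fixed pattern: the neighbors are chosen by position alone.
        neighbors = vectors[max(0, i - window):min(n, i + window + 1)]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in neighbors]
        comparisons += len(neighbors)
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        outputs.append([sum(e / total * v[d] for e, v in zip(exps, neighbors))
                        for d in range(dim)])
    return outputs, comparisons


tokens = [[math.sin(i), math.cos(i)] for i in range(1000)]
outputs, comparisons = sliding_window_attention(tokens, window=2)
print(comparisons)  # 4994, versus 1000 * 1000 = 1,000,000 for dense attention
```

For 1,000 tokens and a window of 2, the sketch computes 4,994 scores instead of the 1,000,000 that dense attention would need. The catch is that a token can never directly attend to anything outside its window, whatever the text says.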

This is not a new idea. Other researchers and companies have tried sparse attention before. Will Depue, an independent AI researcher who previously worked at OpenAI, said that “pretty much everything under the sun has been attempted.”

The challenge is not merely reducing computation. The hard part is doing so without losing the ability to understand a document as well as dense attention does. Earlier methods often relied on fixed patterns for choosing which tokens to compare. Subquadratic says SubQ makes those choices dynamically, depending on the text it receives.
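Subquadratic has not said how SubQ picks its comparisons, so the following is only an illustration of what content-dependent selection can look like. It swaps in a known technique, locality-sensitive hashing (the approach popularized by the Reformer model): tokens are sorted into buckets by the signs of a few random projections, and each token is compared only with tokens in its own bucket. Function names and parameters are hypothetical.

```python
import math
import random


def lsh_buckets(vectors, n_planes, seed=0):
    """Bucket id from the signs of random projections (random-hyperplane LSH)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
    buckets = []
    for v in vectors:
        bucket = 0
        for p in planes:
            # One bit per plane: which side of the hyperplane the vector falls on.
            side = 1 if sum(a * b for a, b in zip(v, p)) >= 0 else 0
            bucket = (bucket << 1) | side
        buckets.append(bucket)
    return buckets


def bucketed_attention(vectors, n_planes=3, seed=0):
    """Toy content-dependent sparse attention (hypothetical helper).

    Each token is compared only with tokens that hashed to the same bucket,
    so which pairs get scored depends on the vectors themselves.
    """
    dim = len(vectors[0])
    scale = 1.0 / math.sqrt(dim)
    buckets = lsh_buckets(vectors, n_planes, seed)
    members = {}
    for i, b in enumerate(buckets):
        members.setdefault(b, []).append(i)
    outputs = []
    comparisons = 0
    for i, q in enumerate(vectors):
        keys = members[buckets[i]]  # always contains i itself
        scores = [scale * sum(a * b for a, b in zip(q, vectors[j])) for j in keys]
        comparisons += len(keys)
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        outputs.append([sum(e / total * vectors[j][d] for e, j in zip(exps, keys))
                        for d in range(dim)])
    return outputs, comparisons


rng = random.Random(1)
tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(1000)]
outputs, comparisons = bucketed_attention(tokens, n_planes=3)
print(comparisons, "scores versus", len(tokens) ** 2, "for dense attention")
```

Because similar vectors tend to share a bucket, the pattern adapts to the input. The hashing step is linear in the number of tokens, but if many tokens land in one bucket the cost drifts back toward quadratic, which hints at why dynamic schemes are hard to get right.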

The company has not disclosed exactly how that selection process works. Whedon described that mechanism as the “secret sauce.”

What The Tests Show

Appen evaluated SubQ on several standard tests. In a speed test designed to measure how fast a model can operate in theory, rather than what it can accomplish on practical tasks, Appen found SubQ was 56 times faster than models using FlashAttention, a widely used technique that speeds up standard dense attention without approximating it.
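FlashAttention does not skip any comparisons: it computes the same result as standard attention but processes keys in blocks sized for fast on-chip GPU memory, which is where its speedup comes from. The toy Python sketch here shows only the core numerical idea it relies on, an “online” softmax that is updated block by block and matches the one-shot result. It is not the real fused GPU kernel, and the function names are hypothetical.

```python
import math


def direct_attention(q, keys, values):
    # Reference: compute all scores, then one softmax over the full row.
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [sum(e * v[d] for e, v in zip(exps, values)) / total
            for d in range(len(values[0]))]


def blockwise_attention(q, keys, values, block_size):
    """Same result as direct_attention, one block of keys at a time.

    Keeps a running max, running softmax denominator, and running weighted
    sum, rescaling them whenever a larger score appears (online softmax).
    """
    dim = len(values[0])
    running_max = -math.inf
    denom = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(a * b for a, b in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated so far to the new max.
        correction = math.exp(running_max - new_max)
        denom *= correction
        acc = [x * correction for x in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            denom += w
            for d in range(dim):
                acc[d] += w * v[d]
        running_max = new_max
    return [x / denom for x in acc]


q = [0.3, -1.2, 0.8]
keys = [[math.sin(i), math.cos(i), math.sin(2 * i)] for i in range(10)]
values = [[float(i), float(i * i % 7)] for i in range(10)]
exact = direct_attention(q, keys, values)
tiled = blockwise_attention(q, keys, values, block_size=3)
print(all(abs(a - b) < 1e-9 for a, b in zip(exact, tiled)))  # True
```

Every query still scores every key, so the work remains quadratic in the length of the text; the gain comes from memory traffic, not from skipping comparisons.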

On LiveCodeBench, which tests performance on competitive coding problems from real contests, SubQ scored 89.7%. Appen’s Jeanine Sinanan-Singh said the model continues to show frontier-level coding performance.

Subquadratic also claims major cost advantages, though those claims are harder to verify because SubQ is not yet widely available. Justin Dangel, the company’s cofounder and CEO, said it costs $2,600 to run Anthropic’s Opus 4.6 model through RULER 128, an Nvidia benchmark that tests how well models retrieve and use information from very long inputs. For SubQ, he said, “It cost us eight dollars.”
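Taking the quoted figures at face value, the gap works out as follows. These are the CEO’s numbers, not Appen’s, and they cover a single benchmark.

```python
# Figures as quoted by Subquadratic's CEO in the article; not independently verified.
opus_cost_usd = 2600  # reported cost of running Opus 4.6 through RULER 128
subq_cost_usd = 8     # reported cost of running SubQ through the same test
print(opus_cost_usd / subq_cost_usd)  # 325.0
```

In other words, the claim amounts to roughly a 325-fold cost reduction on this one test.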

The model’s long-context claims are also central to the pitch. SubQ has a context window of up to 12 million tokens, while most top models today have context windows of about one million tokens. In a demonstration, Whedon asked SubQ to reason across 400 documents, and it responded in seconds. Given the same task, Perplexity failed to load all 400 documents.
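For a sense of why a 12-million-token window is such a strong claim, here is the dense-attention arithmetic, assuming one score per query-key pair for each attention head and layer. It says nothing about what SubQ actually computes, only about what a dense model would face at that length.

```python
def dense_scores(n_tokens):
    # Full attention matrix: one score per (query, key) pair, per head and layer.
    return n_tokens * n_tokens


one_million = 1_000_000
twelve_million = 12_000_000
print(dense_scores(twelve_million))                               # 144000000000000
print(dense_scores(twelve_million) // dense_scores(one_million))  # 144
```

A window 12 times longer means 144 times as many dense attention scores.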

Appen also ran a needle-in-a-haystack test, which measures whether a model can retrieve specific information from a large body of data. Its report says SubQ scored 98% at context lengths of six million and 12 million tokens, “sustaining near-perfect long-context retrieval at scales few models are tested at.”
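The article does not describe Appen’s exact test setup. As a generic illustration, a needle-in-a-haystack probe hides one fact at a chosen depth inside a long stretch of filler text and then asks the model to retrieve it; a score like 98% is typically the share of such probes answered correctly across depths and lengths. The helper names here are hypothetical, and the model call is left out.

```python
def build_haystack(filler_sentences, needle, depth):
    """Hide `needle` among filler sentences at a relative depth (hypothetical helper).

    depth=0.0 puts it at the start, depth=1.0 at the end.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(depth * len(filler_sentences))
    return " ".join(filler_sentences[:position] + [needle] + filler_sentences[position:])


def score_retrieval(model_answer, expected):
    # Pass/fail: does the model's answer contain the hidden fact?
    return expected.lower() in model_answer.lower()


filler = [f"Sentence number {i} is filler." for i in range(1000)]
needle = "The secret code is 7421."
prompt = build_haystack(filler, needle, depth=0.5) + "\n\nWhat is the secret code?"
# A real harness would send `prompt` to the model under test; this answer is made up.
print(score_retrieval("The code is 7421.", "7421"))  # True
```

Repeating this over many depths and haystack lengths, and averaging the pass/fail results, gives a single retrieval score.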

Why The Debate Is Not Over

The strongest case for SubQ is that independent testing now supports parts of Subquadratic’s story. The model appears especially interesting for coding and searching very large data sets, which are the areas the company is emphasizing.

But benchmarks are not the same as broad public use. A model can perform well under specific test conditions and still reveal limitations when many users apply it to varied real-world tasks.

Access is still narrow. Subquadratic says tens of thousands of potential users have signed up for early access, including more than 500 enterprise customers, but the company has given very few people access so far. Its explanation is practical: it is a new, small company with limited resources and cannot support too many users at once.

There is also a deeper question about how much SubQ really reinvents LLM architecture. Subquadratic reused weights from a version of the Chinese open-source model Qwen to bootstrap SubQ rather than training it from scratch. The article notes that this is common among model makers, but it complicates the company’s broader claim that it has fully changed how LLMs work.

Depue’s view captures the cautious middle ground: “They may have built something real and useful,” he said. “But the public evidence does not yet justify the stronger claim that they have solved the quadratic attention bottleneck.”

For now, SubQ looks less like a settled revolution and more like a serious technical claim that has moved from marketing promise into early evidence. The next test is broader access: more users, more tasks, and more chances to see whether sparse attention can deliver beyond the benchmark sheet.