AI coding has moved from a useful developer convenience into one of the most watched areas in generative AI. The first wave helped programmers write faster. The next wave is trying to take on a larger part of software work itself.
Startups including Zencoder, Merly, Cosine, Tessl and Poolside are competing with large technology companies to build systems that can understand codebases, generate working programs, test their own output and fix bugs. Some of the people building these tools also see software development as a possible route toward artificial general intelligence, or AGI.
Why coding became a core AI use case
For many developers, generative AI is already part of the daily workflow. Copilot, built on top of OpenAI’s large language models and launched by Microsoft-owned GitHub in 2022, is now used by millions of developers around the world. General-purpose chatbots such as Anthropic’s Claude, OpenAI’s ChatGPT and Google DeepMind’s Gemini are also used for everyday coding help.
Inside large companies, AI-generated code is no longer a side experiment. Alphabet CEO Sundar Pichai claimed on an earnings call in October that "more than a quarter of all new code at Google is generated by AI, then reviewed and accepted by engineers." That framing is important: the code is generated by AI, but humans still review and accept it.
The business case is also clear. Nathan Benaich, an analyst at Air Street Capital, says developers appear willing to pay for copilots, making code one of the easiest ways to monetize AI. That helps explain why new companies are entering the market so quickly, including Tessl, valued at $750 million within months of being set up, and Poolside, valued at $3 billion before it even released a product.
The harder problem is useful code
The central technical issue is correctness, a term software engineers use in more than one way. A program can be syntactically correct, meaning its words, numbers and mathematical operators are arranged in a way that allows it to run. That matters because a small mistake can stop a program from working at all.
Many first-generation AI coding assistants are already strong at this surface level. They have been trained on billions of pieces of code and can reproduce the structures common across many programs. But code that runs is not always code that solves the right problem.
The new wave is focused on functional correctness: whether the software does what the developer wanted. Alistair Pullen, a cofounder of Cosine, says large language models can write code that compiles, but may not produce the intended program. To improve that, companies are trying to capture the process a human coder would follow, not just the finished code that appears in online repositories.
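The gap between the two kinds of correctness can be shown in a few lines of Python. This is a minimal sketch, not taken from any of the companies named here: the `average` function, its deliberate bug and the two checker functions are hypothetical and exist only for illustration.

```python
import ast

# Hypothetical candidate: the developer wanted the average of a list of numbers.
CANDIDATE_SOURCE = '''
def average(values):
    return sum(values) / 2   # bug: divides by 2 instead of len(values)
'''


def is_syntactically_valid(source: str) -> bool:
    """Return True if Python can parse the source at all."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def is_functionally_correct(source: str) -> bool:
    """Run the candidate and compare its output with what the developer intended."""
    namespace = {}
    exec(source, namespace)  # acceptable here only because the snippet is trusted
    average = namespace["average"]
    return average([2, 4, 6]) == 4 and average([10]) == 10


syntax_ok = is_syntactically_valid(CANDIDATE_SOURCE)      # the code parses
behaviour_ok = is_functionally_correct(CANDIDATE_SOURCE)  # but gives wrong answers
```

Here the candidate parses cleanly yet fails the behavioral check, which is the case the newer tools are trying to eliminate.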
Context is becoming the new battleground
One major weakness of earlier coding tools is that they often lacked enough context. A programming task may depend on many files, documentation and conventions inside a larger repository. If the assistant only sees open tabs or a narrow slice of the project, it can miss the information needed to make the right change.
Zencoder founder Andrew Filev says context is critical. Zencoder has hired search engine veterans to help build a tool that can analyze large codebases and decide what is relevant. The company calls this "repo grokking," and says it can reduce hallucinations and improve code quality.
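To make the idea of selecting relevant context concrete, here is a deliberately simple sketch that ranks files by how many words they share with a task description. It is an assumption-laden toy, not Zencoder's method, which has not been published in detail: the `rank_files` function, the scoring rule and the three-file `REPO` are all hypothetical, and a production system would likely use far richer signals.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def rank_files(task: str, files: dict[str, str], top_k: int = 2) -> list[str]:
    """Return up to top_k file paths whose contents overlap most with the task."""
    task_terms = set(tokenize(task))
    scores = {}
    for path, content in files.items():
        counts = Counter(tokenize(content))
        # Score = how often the task's words appear in the file.
        scores[path] = sum(counts[term] for term in task_terms)
    ranked = sorted(files, key=lambda p: (-scores[p], p))
    return [p for p in ranked if scores[p] > 0][:top_k]


# Hypothetical miniature repository, for illustration only.
REPO = {
    "billing/invoice.py": "def total_invoice(items): return sum(item.price for item in items)",
    "billing/tax.py": "def apply_tax(amount, rate): return amount * (1 + rate)",
    "ui/theme.py": "PRIMARY_COLOR = 'blue'",
}

relevant = rank_files("fix tax rounding on invoice total", REPO)
```

Only the two billing files are selected, so the unrelated UI file never reaches the model's prompt.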
Cosine is taking a related but more process-heavy approach. It has asked dozens of coders to record what they were doing while working through hundreds of programming tasks. The company asked them to explain choices such as why they opened a file, why they scrolled through it and why they closed it. Coders also annotated finished code to show where other files or documentation would have been needed.
Cosine then uses that information to create a synthetic data set that links developer steps and information sources to finished code. The aim is to train a model that can infer the path it should follow before producing a program.
Different startups are betting on different methods
Poolside is also building a synthetic data set around the coding process, but it leans more heavily on RLCE, or reinforcement learning from code execution. The technique is analogous to RLHF, or reinforcement learning from human feedback, which has been used to make chatbots better conversationalists.
With RLCE, a model is trained to produce code that does what it is supposed to do when run. Poolside and Cosine both say they are inspired by DeepMind’s AlphaZero, which improved by playing games against itself and learning through trial and error. Applied to programming, the steps required to produce code become like moves in a game, and a correct program becomes the winning outcome.
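The core signal in an execution-based approach can be sketched as a reward function that runs candidate code against tests. This is only an illustration of the general idea, not Poolside's or Cosine's implementation: the `execution_reward` function, the `add` task and the test cases are hypothetical, and the training loop that would consume the reward is omitted.

```python
def execution_reward(source: str, func_name: str, test_cases) -> float:
    """Return the fraction of test cases the candidate passes (0.0 to 1.0)."""
    namespace = {}
    try:
        exec(source, namespace)  # a real system would sandbox this step
        func = namespace[func_name]
    except Exception:
        return 0.0  # code that does not even load earns nothing
    passed = 0
    for args, expected in test_cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash on this input counts as a failed test
    return passed / len(test_cases)


# Hypothetical task: implement add(a, b). Each case is (arguments, expected result).
TESTS = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]

CANDIDATES = {
    "correct": "def add(a, b):\n    return a + b\n",
    "wrong": "def add(a, b):\n    return a * b\n",
    "broken": "def add(a, b) return a + b\n",
}

rewards = {name: execution_reward(src, "add", TESTS) for name, src in CANDIDATES.items()}
best = max(rewards, key=rewards.get)
```

The candidate that passes every test earns the highest reward, the one that runs but computes the wrong thing earns partial credit, and the one that fails to parse earns none. That mirrors the game analogy: the correct program is the winning outcome.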
There are also deeper disagreements. Cosine uses a custom version of GPT-4o provided by OpenAI, while Poolside is building its own large language model from scratch. Poolside CEO and cofounder Eiso Kant argues that training on code from the start may be better than adapting a broader model.
Merly rejects the large language model approach altogether. Justin Gottschlich, its CEO and founder, argues that programming requires precise logic and says, "I can’t train an illogical system to become logical." Merly’s system is not trained on human-written code. Instead, it works with an intermediate representation, closer to the machine-readable notation that programming languages are translated into before execution.
Developers may become reviewers and managers
If these tools work as their builders hope, the developer role could change. Rather than writing every line from scratch, engineers may spend more time specifying outcomes, reviewing generated code and correcting mistakes. Cosine says its assistant, Genie, can run coding tasks while engineers move between other work.
The same tools could also make it easier to prototype several approaches at once. For example, a developer building software with a payment system could ask an assistant to try Stripe, Mango and Checkout in parallel, instead of coding each option by hand one after another.
Bug fixing is another target. Genie can read descriptions from bug-reporting tools, produce fixes and leave humans to review them before code is updated. Yang Li, another Cosine cofounder, says the bottleneck may become how quickly people can review machine-generated code.
The labor implications are direct. Li thinks companies will use the technology to reduce the number of programmers they hire. In his view, elite developers will diagnose failures when AI goes wrong, and smaller teams of 10 to 20 people will do work that once required hundreds of coders. He puts it bluntly: "Anything you want to do will be determined by compute and not head count."
That is why the second wave of AI coding matters. It is not only about faster autocomplete. It is about whether software teams can trust AI systems to understand context, follow a development process, test their own work and generate code that is not merely valid, but useful.