Weekly analysis at the intersection of artificial intelligence and industry.

January 6, 2026

Hello and welcome to Eye on AI. In this edition…Nvidia snags the team and tech from AI chip startup Groq…Meta buys Manus AI…AI gets better at improving AI…but we might not know enough about the brain to reach AGI.

Happy New Year! A lot has happened in AI since we signed off for the year just before Christmas Eve. We’ll aim to catch you up in the Eye on AI News section below.

Meanwhile, as I’ve noted here before, 2025 was supposed to be the year of AI agents, but most companies struggled to implement them. As the year drew to a close, most were still stuck in the pilot phase of experimenting with AI agents. I think that’s going to change this year, and one reason is that tech vendors are figuring out that simply offering AI models with agentic capabilities is not enough. They have to help their customers engineer the entire workflow around the AI agent—either directly, through forward-deployed engineers who act as consultants and “customer success” sherpas, or through software that makes it easy for customers to do this work on their own.

A key step in getting these workflows right is making sure AI agents have access to the right information. Since 2023, the standard way to do this has been with some kind of RAG, or retrieval augmented generation, process. Essentially, the AI system has access to some kind of search engine that lets it retrieve the most relevant documents or data from internal corporate sources or the public internet; the model then bases its response, or the action it takes, on that data rather than relying solely on what it learned during training. There are many different search tools that can be used for a RAG system—and many companies use a hybrid approach that combines vector databases, particularly for unstructured documents, with more traditional keyword search or even old-fashioned Boolean search.
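
For readers who want the shape of this in code: a minimal RAG loop looks something like the Python sketch below. The `search_index` stub and the model name are hypothetical stand-ins, not any particular vendor's API; a real deployment would plug in whatever vector, keyword, or Boolean search it actually uses.

```python
# A minimal retrieve-then-generate sketch. `search_index` is a hypothetical
# stand-in for a real retriever; the model name is illustrative.
from openai import OpenAI

client = OpenAI()

def search_index(query: str, k: int = 5) -> list[str]:
    # Stand-in for a hybrid retriever (vector + keyword + Boolean search).
    # A real system would query a vector database and/or keyword index here.
    return ["...top-k document chunks relevant to the query..."]

def answer_with_rag(question: str) -> str:
    # 1. Retrieve: fetch the most relevant documents for the question.
    context = "\n\n".join(search_index(question))
    # 2. Generate: ground the answer in the retrieved text rather than in
    #    whatever the model memorized during training.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```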

But RAG is not a panacea, and simple RAG pipelines can still suffer from relatively high error rates. One problem is that AI models often struggle to translate a user’s prompt into good search criteria. Another is that even when the search is conducted well, the model often fails to properly filter and sift the retrieved data. Sometimes that’s because too many different data formats are being retrieved, and sometimes it’s because the human prompting the AI model has not written good instructions. In some cases, the models themselves are simply not reliable enough and ignore some of the instructions.

But, most of the time, AI agents fail not because the agent “is not able to reason about data,” but because “the agent is not getting the right data in the first place,” Michael Bendersky, a research director at Databricks, tells me. Bendersky is a longtime Google veteran who worked on both Google Search and Google DeepMind.


Databricks introduces a new retrieval ‘architecture’ that beats RAG

Today, Databricks (known for its data analytics software) is debuting a new architecture for retrieval-augmented AI agents called Instructed Retriever that it says solves most of RAG’s shortcomings.

The system translates a user’s prompt and any custom specifications that the model should always consider (such as the recency of a document or whether a product has good customer reviews) into a multi-step search plan for both structured and unstructured data—and, crucially, metadata—to get the right information to the AI model.

Much of this has to do with translating the natural language of the user’s prompt and the search specifications into specialized search query language. “The magic is in how you translate the natural language, and sometimes it is very difficult, and create a really good model to do the query translation,” Hanlin Tang, Databricks’ CTO for neural networks, says. (Tang was one of the cofounders of MosaicML, which Databricks acquired in 2023.)
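
Databricks hasn’t published the prompt or model behind its query translation, but the general pattern is straightforward to sketch: ask an LLM to compile the user’s natural-language request, plus any standing instructions, into a structured search plan. Everything below (the JSON schema, field names, and model choice) is a hypothetical illustration, not Databricks’ actual query language.

```python
# Hedged sketch of LLM-based query translation. The plan schema and field
# names are invented for illustration, not Databricks' format.
import json
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = """Translate the user's request into a JSON search plan with keys:
  "semantic_query": text for vector search over unstructured documents
  "keyword_terms": list of terms for traditional keyword search
  "filters": metadata constraints, e.g. {"customer_tier": "enterprise"}
  "sort_by": a metadata field and direction, or null
Standing instruction: prefer documents updated within the last 12 months."""

def translate_query(user_prompt: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},  # force parseable JSON output
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
    )
    return json.loads(response.choices[0].message.content)

plan = translate_query("summarize recent complaints from enterprise customers about late deliveries")
# Plausible output: {"semantic_query": "complaints about late deliveries",
#   "keyword_terms": ["late delivery", "complaint"],
#   "filters": {"customer_tier": "enterprise"},
#   "sort_by": "created_at desc"}
```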


On a suite of benchmark tests that Databricks designed to reflect real-world enterprise use cases (instruction-following, domain-specific search, report generation, list generation, and searching PDFs with complex layouts), the company says its Instructed Retriever architecture delivered 70% better accuracy than a simple RAG method and, when used in a multi-step agentic process, a 30% improvement over the same process built on RAG, while requiring 8% fewer steps on average to reach a result.


Improving results even with under-specified instructions

The company also created a new test to see how well the model can handle queries that are not well-specified. It is based partly on an existing benchmark dataset from Stanford University called StaRK (Semi-structured Retrieval Benchmark). In this case, Databricks looked at a subset of these queries related to Amazon product searches, called StaRK-Amazon, and then further augmented this dataset with additional examples. They wanted to look at search queries that have implied conditions. For instance, the query “find a jacket from FooBrand that is best rated for cold weather” has multiple implied constraints: it has to be a jacket, it has to be from FooBrand, and it has to be the FooBrand jacket with the highest rating for cold weather. They also looked at queries where users want to exclude certain products or want the AI agent to find only products with recent reviews.


The idea of the Instructed Retriever architecture is that it turns these implied conditions into explicit search parameters. Bendersky says the breakthrough here is that Instructed Retriever knows how to turn a natural language query into one that will leverage metadata.
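
As a toy illustration (mine, not Databricks’ code) of what “implied conditions become explicit search parameters” means in practice, here is the FooBrand query decomposed into hard metadata filters and run against a tiny product index. The point is that metadata filters prune the candidate set before ranking, which plain semantic search over raw text cannot guarantee.

```python
# Toy example: the FooBrand query's implied constraints made explicit.
# The records and field names are invented for illustration.
plan = {
    "filters": {"brand": "FooBrand", "category": "jacket"},
    "sort_by": ("cold_weather_rating", "desc"),
}

products = [
    {"brand": "FooBrand", "category": "jacket", "cold_weather_rating": 4.8},
    {"brand": "FooBrand", "category": "jacket", "cold_weather_rating": 4.2},
    {"brand": "BarBrand", "category": "jacket", "cold_weather_rating": 4.9},
    {"brand": "FooBrand", "category": "boots",  "cold_weather_rating": 4.7},
]

# Hard metadata filters: only FooBrand jackets survive, no matter how
# semantically similar other products' descriptions might be.
matches = [p for p in products
           if all(p.get(k) == v for k, v in plan["filters"].items())]

field, order = plan["sort_by"]
matches.sort(key=lambda p: p[field], reverse=(order == "desc"))
print(matches[0])  # the top-rated FooBrand jacket for cold weather
```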


Databricks tested the Instructed Retriever architecture using OpenAI’s GPT-5 Nano and GPT-5.2 and Anthropic’s Claude Sonnet 4.5 models, as well as a small, fine-tuned 4-billion-parameter model they created specifically to handle these kinds of queries, which they call InstructedRetriever-4B. They evaluated all of these against a traditional RAG architecture, and the Instructed Retriever configurations scored between 35% and 50% better in terms of the accuracy of the results. The InstructedRetriever-4B scored about on par with the larger frontier models from OpenAI and Anthropic, while being cheaper to deploy.


As always with AI, having your data in the right place and formatted in the right way is the crucial first step to success. Bendersky says that Instructed Retriever should work well as long as an enterprise’s dataset has a search index that includes metadata. (Databricks also offers products to help take completely unstructured datasets and produce this metadata.)


The company says that Instructed Retriever is available today to its beta test customers using its Knowledge Assistant product in its Agent Bricks AI agent building platform and should be in wide release soon.


This is just one example of the kinds of innovations we are almost certainly going to see more of this year from all the AI agent vendors. They might just make 2026 the real year of AI agents.


With that, here’s more AI news.


Jeremy Kahn
jeremy.kahn@fortune.com
@jeremyakahn



FORTUNE ON AI


A year ago, Nvidia’s Jensen Huang said the ‘ChatGPT moment’ for robotics was around the corner. Now he says it’s ‘nearly here.’ But is it?—by Sharon Goldman

Google ex-CEO Eric Schmidt jumps into the AI data center business with a failed, 150-year-old Texas railroad turned oil giant—by Jordan Blum

‘He satisfies a lot of my needs’: Meet the women in love with ChatGPT—by Beatrice Nolan

At the edges of the AI data center boom, rural America is up against Silicon Valley billions—by Sharon Goldman

Why Singapore is the only Southeast Asian country in Pax Silica, the U.S.’s new AI ‘inner circle’—by Angelica Ang


AI IN THE NEWS


Meta acquires Manus AI. Meta Platforms is buying AI agent company Manus for more than $2 billion, marking one of the most prominent U.S. tech acquisitions of an AI product built in Asia. Manus, which gained attention for an AI agent that produces detailed research reports and builds websites using models from companies like Anthropic and Alibaba, will continue operating as a service and be integrated into Meta’s social-media products, the company said. Manus CEO Xiao Hong will report to Meta COO Javier Olivan. The deal may signal Meta’s push into the fast-growing AI-agent market as it competes with rivals such as Microsoft, Salesforce, and ServiceNow, all of which offer AI agent products. Although founded in China, Manus has since moved its headquarters and most of its employees to Singapore. Meta said Manus will cease operations in China and have no remaining Chinese ownership. You can read more here from The Wall Street Journal.

Nvidia reverse acquihires AI chip startup Groq. Nvidia struck a non-exclusive licensing deal with AI chip rival Groq and will hire Groq founder and CEO Jonathan Ross, president Sunny Madra, and other staff. It is yet another example of a “reverse acquihire” of an AI startup by a Big Tech player. (So-called reverse acquihires give the Big Tech company both people and technology, but without actually buying control of the startup.) CNBC reported Nvidia is buying assets from Groq for $20 billion, a figure Nvidia declined to confirm but one that, if accurate, would mark the chipmaker’s largest deal to date and further strengthen its dominance in AI hardware. Groq has developed a “language processing unit” that it claims can run large language models far faster and more efficiently than Nvidia’s GPUs. The move suggests Nvidia may feel its GPUs are vulnerable to rivals as the bulk of AI workloads shifts from training LLMs to running already-trained models at scale (what is known as inference). Groq has grown rapidly, recently raising $750 million at a $6.9 billion valuation and claiming more than 2 million developers use its technology. The startup had also inked a $1.5 billion deal for its chips with Saudi Arabia’s government. You can read more from TechCrunch here.

Accenture acquires UK AI company Faculty. Accenture has agreed to buy London-based AI startup Faculty, a 10-year-old, venture-backed company that helps enterprises adopt AI solutions. Faculty CEO Marc Warner will become Accenture’s chief technology officer and join its global management committee. Faculty reported £41.7 million ($56 million) in revenue last year. Financial terms of the deal were not disclosed, but the Financial Times reported that Accenture is spending more than $1 billion for Faculty, which would make it the largest-ever acquisition of a privately held U.K. AI startup.

Elon Musk-backed xAI in hot water over Grok-generated nonconsensual sexual images. Grok, built by xAI and integrated into social media platform X, is facing mounting scrutiny after allegedly generating nonconsensual sexualized images of real people, including children. The use of Grok in this way and the hosting of these images on X may violate the law in several countries and U.S. states. Ashley St. Clair, a conservative commentator and mother of one of Musk’s children, told Fortune’s Bea Nolan that she is considering legal action after Grok continued producing increasingly explicit fake images of her despite her objections. The controversy has triggered regulatory responses in multiple countries, including urgent inquiries by U.K. communications regulator Ofcom under the country’s Online Safety Act, as well as investigations or warnings from authorities in France, India, and Malaysia.


EYE ON AI RESEARCH


Self-improving AI may be getting closer. Researchers at the University of Tübingen have introduced a new benchmark, called PostTrainBench, that tests how good frontier AI models, such as OpenAI’s GPT-5.1 and Anthropic’s Claude Opus 4.5, are at improving smaller LLMs. The benchmark asks these models to autonomously fine-tune other open-weight AI models, given a fixed compute budget and time deadline, tools, and benchmarks to test their optimizations against. Results show today’s best models already achieve 20% to 30% performance gains, compared with roughly 60% for a human expert. OpenAI’s GPT-5.1 Codex Max performs best overall, followed by Anthropic’s Claude Opus 4.5 and Google’s Gemini 3 Pro. The findings suggest AI systems are rapidly approaching the ability to automate meaningful parts of AI research itself. You can see the benchmark and the results here on the PostTrainBench website.
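
To give a sense of the task shape, here is a schematic of that kind of evaluation loop. All the interfaces below (the agent, the eval suite, the action objects) are hypothetical reconstructions from the description above, not the benchmark’s actual harness, which lives on the PostTrainBench site.

```python
# Schematic of a PostTrainBench-style task loop: an agent gets a base model,
# tools, and a fixed budget, and is scored on held-out benchmark gain.
# All interfaces here are hypothetical, not the benchmark's real code.
import time
from typing import Protocol

BUDGET_SECONDS = 8 * 3600  # fixed wall-clock budget given to the agent

class EvalSuite(Protocol):
    def score(self, model) -> float: ...  # held-out benchmark accuracy

class Agent(Protocol):
    def next_action(self, model, evals): ...  # picks fine-tune / evaluate / stop

def run_task(agent: Agent, base_model, evals: EvalSuite) -> float:
    baseline = evals.score(base_model)
    deadline = time.monotonic() + BUDGET_SECONDS
    model = base_model
    while time.monotonic() < deadline:
        # The frontier model acts autonomously: it can build training data,
        # launch a fine-tuning run, or evaluate an intermediate checkpoint.
        action = agent.next_action(model, evals)
        if action.kind == "finetune":
            model = action.apply(model)  # e.g. an SFT or LoRA run it configured
        elif action.kind == "stop":
            break
    # The agent's score is its relative gain over the untouched base model.
    return (evals.score(model) - baseline) / baseline
```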


AI CALENDAR


Jan. 19-23: World Economic Forum, Davos, Switzerland.


Jan. 20-27: AAAI Conference on Artificial Intelligence, Singapore.


Feb. 10-11: AI Action Summit, New Delhi, India.


March 2-5: Mobile World Congress, Barcelona, Spain.


March 16-19: Nvidia GTC, San Jose, Calif.


BRAIN FOOD


Progress towards AGI may be stymied because we still don’t have the right learning algorithm. In a recent conversation with “Dwarkesh Podcast” host Dwarkesh Patel, neuroscientist and Convergent Research CEO Adam Marblestone argues that AI’s biggest bottleneck isn’t compute or scale, but our ignorance about how the brain actually learns. Humans learn far more efficiently than today’s neural networks, he says, not because of some magical architecture, but because evolution has baked in rich, highly specific reward functions and learning curricula that we barely understand. Modern AI, by contrast, relies on mathematically convenient objectives—like next-token prediction—that may miss what really allows animals and people to learn from very few examples and to learn continually throughout our lives.

Marblestone suggests that the brain’s cortex may function as a kind of omnidirectional prediction engine, capable of inferring any missing variable from any other, unlike today’s narrowly trained models. Crucially, we don’t yet know how the brain combines learning, memory, and motivation at low energy cost, or how it avoids catastrophic forgetting. Until neuroscience can answer those questions, attempts to build truly “brain-inspired” AI architectures may be mostly guesswork rather than principled design. The good news is that Marblestone thinks AI may be starting to help neuroscientists design experiments and analyze data in ways that may allow us to begin to answer some of these questions. You can listen to Marblestone on the podcast on YouTube here.




FORTUNE AIQ: THE YEAR IN AI—AND WHAT'S AHEAD


Businesses took big steps forward on the AI journey in 2025, from hiring Chief AI Officers to experimenting with AI agents. The lessons learned—both good and bad—combined with the technology's latest innovations will make 2026 another decisive year. Explore all of Fortune AIQ, and read the latest playbook below:


The 3 trends that dominated companies’ AI rollouts in 2025.


2025 was the year of agentic AI. How did we do?


AI coding tools exploded in 2025. The first security exploits show what could go wrong.


The big AI New Year’s resolution for businesses in 2026: ROI.


Businesses face a confusing patchwork of AI policy and rules. Is clarity on the horizon?


This message has been sent to you because you are currently subscribed to Eye on AI.
Unsubscribe

Please read our Privacy Policy, or copy and paste this link into your browser:
https://fortune.com/privacy/

FORTUNE may receive compensation for some links to products and services on this website. Offers may be subject to change without notice.

For Further Communication, Please Contact:
Fortune Customer Service
40 Fulton Street
New York, NY 10038


Advertising Info | Subscribe to Fortune