If you've used ChatGPT Search or Perplexity, you know that the ability to search the web and provide inline citations makes these AI chatbots significantly better. Incorporating up-to-date information makes their answers more accurate, and web search can help reduce hallucinations, the instances when a generative AI model outputs incorrect information. That's precisely why French startup Linkup is building an API that lets developers retrieve web content from premium, trusted sources and pass the results to a large language model (LLM) to improve its answers. Many AI developers call this workflow retrieval-augmented generation (RAG).
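To make the RAG workflow more concrete, here is a minimal Python sketch of its final step: numbering retrieved web snippets and packing them into a prompt that asks the model to cite them inline. It does not call Linkup or any real search API. `SearchResult`, `build_rag_prompt`, and the sample data are hypothetical, and the resulting string would be sent to whichever LLM the developer uses.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    """One retrieved web page (hypothetical shape; real search APIs differ)."""
    title: str
    url: str
    snippet: str


def build_rag_prompt(question: str, results: list[SearchResult]) -> str:
    """Number each retrieved source and ask the model to cite sources as [n]."""
    lines = [
        "Answer the question using only the sources below.",
        "Cite the sources you rely on inline as [n].",
        "",
    ]
    for n, result in enumerate(results, start=1):
        # Each source gets a stable number the model can refer back to.
        lines.append(f"[{n}] {result.title} ({result.url})")
        lines.append(result.snippet)
        lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder results standing in for a web search API response.
    hits = [
        SearchResult("Example article", "https://example.com/a", "First snippet."),
        SearchResult("Another source", "https://example.com/b", "Second snippet."),
    ]
    # In a real pipeline, this string would be sent to an LLM.
    print(build_rag_prompt("What does the first source say?", hits))
```

Numbering the sources is one common way to let the model's answer point back to specific pages, similar to the inline citations in ChatGPT Search or Perplexity. A production pipeline would also handle retrieval, ranking, and the model's context-length limits.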
Uncertain Future of Scraping Bots
The future of scraping bots remains highly uncertain. Unless they have a financial agreement in place with content publishers, the companies behind these bots extract content from the open web without paying for it. The practice has drawn significant regulatory scrutiny, particularly where scraped content is used to train AI models. There are also high-profile legal cases in progress, such as the New York Times' ongoing lawsuit against OpenAI, the maker of ChatGPT. As a result, the rules around web scraping could change significantly in the near future. That uncertainty is one reason OpenAI has signed multi-year content licensing deals with major publishers such as AP, Axel Springer, Condé Nast, El País, the Financial Times, and Le Monde, among others.
Content Publishers' Dilemma
Content publishers currently face a difficult choice about how to respond to generative AI's insatiable appetite for data. They can block web scrapers using robots.txt, a file that is not legally binding and tells crawlers, including those gathering AI training data, which parts of a site they may access. They can take legal action against AI companies they believe have infringed their copyright. They can let bots index their content freely. Or they can license their content to AI developers to obtain some compensation for their intellectual property.
However, thousands of AI companies (and tech companies using AI) lack the scale and reach of OpenAI. At the same time, much of the web's value lies in its long tail of content publishers. A small publisher often lacks the financial resources to file a lawsuit, and with millions of websites involved, moving from a scraping model to a licensing model one deal at a time is impractical.
Linkup as a Marketplace
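To illustrate how robots.txt works, the Python sketch below uses the standard library's `urllib.robotparser` to evaluate a sample file that blocks OpenAI's `GPTBot` crawler while allowing all other user agents. The rules, the `build_parser` helper, and the example.com URL are assumptions for illustration. Compliance is voluntary: the file signals the publisher's wishes, but a crawler that ignores it is not technically prevented from fetching pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block OpenAI's GPTBot crawler site-wide,
# allow every other user agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent in ("GPTBot", "ExampleBrowser"):
        # can_fetch checks the rules that apply to this user agent and path.
        allowed = rp.can_fetch(agent, "https://example.com/article")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running it prints `GPTBot: blocked` followed by `ExampleBrowser: allowed`, since only the `*` group applies to the second agent.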
Linkup isn't merely a technical solution; it's a marketplace that acts as an intermediary between content publishers and companies that want to enhance their LLM answers with web content. Linkup signs content licensing deals with publishers and integrates with their content management systems (CMS), so it can fetch their content without scraping. It then pays content partners based on how often Linkup's clients access their content.
"We're specifically targeting applications that are integrating AI into their own products," explains Mizrahi, Linkup's co-founder and CEO. "The typical use case is that I create an AI application using a model from Mistral or OpenAI. I build my own pipeline, but I need external information to enrich this pipeline."
For instance, OpenAI offers both a hugely popular application (ChatGPT) and LLMs that developers can use through an API (GPT). But while ChatGPT can browse the web, the GPT models do not have this capability; web search is a feature specific to ChatGPT.
"There's an example that we find quite interesting. One of our customers built an internal application for their salespeople. On one hand, they had listed all the advantages of their own products. On the other, thanks to our service, they obtain fresh, high-quality information about their prospects and feed it into a Mistral LLM. The Mistral LLM then generates a sales pitch that the sales reps can refer to when calling customer leads."
Linkup initially chose to focus on corporate and business information. In addition to news websites, the startup works with knowledge databases such as Statista, Xerfi, and similar resources.
Linkup isn't the only startup working to bring premium content to LLMs through licensing contracts. The most prominent competitor is ScalePost, a startup that has partnered with Perplexity to help speed up Perplexity's licensing deals with publishers.
Linkup raised a €3 million seed round ($3.2 million at current exchange rates) a few months ago from Axeleo Capital, Motier Ventures, Seedcamp, and a hundred business angels. The startup currently has around 10 employees and plans to hire another 10 over the next year.