ChatGPT: How would you replace Google Search?

Could Google’s grip on search be under threat from an AI chat system such as ChatGPT (or even, more immediately, ChatGPT itself)? The idea first popped up in early December (here is an NYT article for example, and it also popped up in some form in Fred Wilson’s 2023 tech predictions). This week got even spicier with the launch of several quick-to-market “search engines” that include ChatGPT, Perplexity.AI being the first one I saw.

There is also a Chrome extension that puts ChatGPT results alongside any search engine you use. (I’m not linking to it since it’s from an unknown developer, and you should do your diligence before installing something that sends your search queries anywhere! You can find it in the Chrome extensions store.)

Lastly, there have been reports that Microsoft is considering adding ChatGPT results to Bing (reported here and here), or even Word. However, they have not been substantiated and this could just be an extrapolation of the hypotheticals people have been spinning.

So what gives? Is this really a threat to Google? To search in general? Would it work?

As I already said in my 2023 Generative AI predictions piece: I don’t think this is an immediate threat to search engine dominance, but it does pose future questions. So let’s dig in.

What ChatGPT thinks

Before we start to pick the topic apart, the obvious place to uncover any master plan is with ChatGPT itself: how would it take over? If you ask ChatGPT the question in the obvious form, you get a very general answer.

All these things are true, but we’re not building a search engine. We’re trying to replace one.

If you ask a more specific question, you get this:

This is a solid answer and hints at some important points. The first is that it would be a requirement to connect to (and continuously train on) an extremely strong search index (which Google has and few others do). The response also misses the point a little in that it responds as though what we want is a list of results. It’s unclear that’s the case; often we may just want “the answer”.

Different types of queries

To really understand how ChatGPT might replace search, it helps to look at how it actually works and (more importantly) what types of search use-cases it might replace. A search engine to date can best be thought of as a giant database of all the web links in the world, each with abundant data stored that has been pulled from the link (descriptions, header text), as well as an overlay of powerful meta and ranking data that defines how relevant a given link is to a given query.

ChatGPT on the other hand can be thought of as a powerful transformer that has within it a large encoded set of knowledge items (often gleaned from web pages). On top of this, it has its own “learned” mapping that can craft answers it believes relevant to a query based on encoded knowledge.

In many ways these things are similar. However, the critical difference is that the search engine is returning the most relevant pieces of knowledge, whereas ChatGPT is returning a synthesis of those most relevant pieces of knowledge.

Another key difference is that a search engine typically ingests and ranks new web pages as soon as they appear, and systems such as ChatGPT require quite some time to encode new knowledge and update their models.
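To make the "giant database of links" picture a little more concrete, here is a minimal sketch of a toy inverted index in Python. It is only an illustration, not how Google or any real search engine is implemented: the class name, the `rank_score` parameter, and the example URLs are all hypothetical. What it does show is the two properties discussed above: a query returns the stored links themselves (ranked), and a newly added page is searchable straight away.

```python
from collections import defaultdict


class ToySearchIndex:
    """A tiny inverted index: maps each word to the URLs whose text contains it."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of URLs containing it
        self.rank = {}                    # URL -> static ranking score (hypothetical)

    def add_page(self, url, text, rank_score=1.0):
        # A new page becomes searchable as soon as it is added.
        self.rank[url] = rank_score
        for word in text.lower().split():
            self.postings[word].add(url)

    def search(self, query, limit=3):
        # Score each URL by its rank score once per matching query word.
        scores = defaultdict(float)
        for word in query.lower().split():
            for url in self.postings.get(word, ()):
                scores[url] += self.rank[url]
        # Highest score first; ties broken alphabetically by URL.
        ordered = sorted(scores, key=lambda u: (-scores[u], u))
        return ordered[:limit]


index = ToySearchIndex()
index.add_page("https://example.com/gold", "gold boiling point facts", rank_score=2.0)
index.add_page("https://example.com/ferries", "ferry companies spain morocco")
print(index.search("boiling point of gold"))
```

A chat model has no equivalent of `add_page`: new knowledge only arrives when the model itself is updated, and what comes back is a synthesized answer rather than the stored links.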

These differences are particularly relevant when thinking about the types of queries one might want to make of a search engine or an AI system. Here’s a rough breakdown. Queries could be:

  • Generative: A creative output which a human user can use as inspiration or a starting point but (crucially) can validate themselves. In other words, something like a creative story, or even a block of code to do something.
  • Factual: A query about a world fact. Anything from a query about physics such as “What is the boiling point of Gold?” (2700-2800 °C if you’re interested) to “Which ferry companies operate between Spain and Morocco?”
  • Opinion: A question about what is the “best” product or solution for a given problem. An example here would be “What are the best Adidas shoes for trail running?”
  • Give me a link: A query to find a link to a service (known or unknown) that you want to use. An example here would be “Spotify” or “United Airlines.” In this case, the intent is often to get a link to the website for the service so you can interact with that provider.
  • Current affairs: Questions about things happening in the world now, such as “Will Kevin McCarthy become the next US House of Representatives speaker?” (which as of today, January 7th, 2023, is an unresolved question). [Update: it seems the answer turns out to be “yes”.]

One could come up with other classifications or types, but these are quite illustrative for now. Each of these queries is quite different in intent and, in fact, needs different types of capabilities to answer it well.
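As a rough sketch, the taxonomy above can be written down as a small Python enum, with each type paired with the capability it seems to lean on most. The capability descriptions are my own shorthand for the points made in this post, not an established classification, and the names `QueryType`, `NEEDED_CAPABILITY`, and `EXAMPLES` are purely illustrative.

```python
from enum import Enum


class QueryType(Enum):
    GENERATIVE = "generative"
    FACTUAL = "factual"
    OPINION = "opinion"
    LINK = "give me a link"
    CURRENT_AFFAIRS = "current affairs"


# Hypothetical mapping: the main capability each query type appears to need.
NEEDED_CAPABILITY = {
    QueryType.GENERATIVE: "synthesizing new text the user can judge for themselves",
    QueryType.FACTUAL: "accurate, checkable knowledge",
    QueryType.OPINION: "recommendations with visible sources and biases",
    QueryType.LINK: "an index of links to the right provider",
    QueryType.CURRENT_AFFAIRS: "knowledge that is updated as events happen",
}

# Example queries taken from this post, labeled by hand.
EXAMPLES = {
    "Describe a walk in Central Park": QueryType.GENERATIVE,
    "What is the boiling point of Gold?": QueryType.FACTUAL,
    "What are the best Adidas shoes for trail running?": QueryType.OPINION,
    "Spotify": QueryType.LINK,
    "Will Kevin McCarthy become the next US House of Representatives speaker?": QueryType.CURRENT_AFFAIRS,
}

for query, query_type in EXAMPLES.items():
    print(f"{query_type.value}: {query!r} needs {NEEDED_CAPABILITY[query_type]}")
```

The labels here are assigned by hand; working out the type of an arbitrary query automatically is a hard problem in its own right and is not attempted in this sketch.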

How do Google and ChatGPT compare on these types of queries?

For factual, opinion, give-me-a-link, and current affairs queries, Google is the gold standard we know. Other search engines such as Bing deliver much the same results, but Google’s dominance from the late 90s onwards and its continual expansion since give it an 84% market share in terms of search queries (Statista). As we’ll see below, though, it’s often shocking to see just how heavily monetized search queries on Google now are. The days of Google returning a clean list of options are long gone.

Notably, for the “generative” query, Google simply isn’t set up to provide a useful answer. Here’s the result for “Describe a walk in Central Park”:

This provides some useful information if you’re actually considering making that walk yourself, but it won’t help you produce a fresh fictional description of such an event. In this category ChatGPT clearly gives something meaningfully different:

Modifying that query could give you many interesting variations. Here is “Describe an Ice Cream Van driving around Central Park”:

But ChatGPT does know its history. For example, “Describe Abraham Lincoln taking a walk in Central Park”:

These generative queries are clearly ChatGPT’s forte and a large part of the “wow” effect of using this technology. They are queries which have not been possible before at this level of depth and sophistication.

How about the other queries then?

This is where things get more interesting since the other types of queries ARE what Google and other search engines are used for today.

There are really two questions here: 1) can ChatGPT produce meaningful responses? and 2) could ChatGPT do so consistently in a way that could be trusted?

In answer to question #1, ChatGPT does surprisingly well at the next three query types: factual, opinion and give me a link.

The factual query: “What is the boiling point of Gold?” and “Which ferry companies operate between Spain and Morocco?”

The opinion query: “What are the best Adidas shoes for trail running?”

In each case, these results are on-point, concise, and useful. Google’s search result for “What is the boiling point of Gold?” also includes a factual summary at the top. It mentions a different number, but they are close.

In comparison, Google’s response to the running shoes query is enough to make anyone cry:

A large percentage of the above-the-fold screen is concerned not with “which is best” but with where to buy and for how much. There are more articles below comparing products, but they are also heavily interlaced with links that focus purely on the purchase. Unsurprisingly, Google monetizes these types of opinion queries heavily.

The last type of query (current affairs/news) is simply out of scope for ChatGPT:

I guess Google (and CNN) can breathe a sigh of relief here. This illustrates that, for now at least, the AI models that power something like ChatGPT are slower to update than a search index. This will likely be true for quite some time, but you can bet that with the right heuristics and filters, updates which “patch” a model will start to cut down this time lag.

Reading all this, it’s clear why there is interest and excitement about ChatGPT changing our experience of the search for answers.

Before jumping to this conclusion though, it’s important to think about that second question “could such results be produced consistently in a way that could be trusted?”. This is really where things start to get murkier. On the surface, both systems are providing answers to questions. However, underneath, they are doing this in different ways:

  • Google surfaces the links for a human to potentially check and determine which “answer” is most useful. Often this means one has a view on the credibility of a source for an answer.
  • ChatGPT returns an answer that it determines to be relevant. However, it is much harder to know if this answer is truly correct or just a “best guess”.

For certain classes of problems, this may not matter greatly. It certainly doesn’t matter a great deal for generative queries, since a human is likely on hand to validate or score whether an answer is good. It also likely doesn’t matter too much for very common facts (e.g. the boiling point of water), which ChatGPT will likely always get right. Low-risk trivia queries (e.g. estimating the distance to the moon, which is probably low risk unless you are NASA) are probably also fine. It gets more problematic, however, for high-risk queries such as “What supplies should I take on a hike in Death Valley?”. For the record, the ChatGPT response for the Death Valley query is:

Pretty good, but you might want to check some official sources before setting out!

The accuracy of results also begins to be more critical in “opinion” and “link” queries. For the former, it’s important to see where a particular opinion set comes from. How neutral is it really? What bias does it have? A trusted running website may be a better bet, and its biases will be clear. As for link queries, ChatGPT clearly isn’t optimized for them. A single link returned may be right for most queries, but a selection of options might be more appropriate so I can quickly choose the most relevant.

This concern with accuracy isn’t just a hypothetical or something which can be “addressed” later. The way generative AI models work is that they encode knowledge and then navigate it as they process queries, integrating it in a synthesis. It will be extremely hard to ensure any such model is truly “accurate” for 100% of the forms of any given query. OpenAI’s Sam Altman said this well himself:

That work on robustness and trust will involve a lot of effort to ensure certain query types are safe. For many common queries, this may not be so different from the pre-compiled answers Google provides. However, for the vast majority of long-tail queries, it is likely going to be better to think of ChatGPT as a very smart human who might be wrong about things and could certainly be biased. There will be many cases where getting a list of the “authoritative” written sources on a topic will be more valuable.

ChatGPT does well on many query types, and the form factor of its results can be incredibly appealing. So … will it replace Google?

Replacing Google Search

The central threat to Google’s business model that’s been talked about is that if a chat system can answer a question instantly, and without the need to click on any links, no links need to be shown and, thus, there is no room for ads to be shown.

There is clear appeal to this, and for some query types, ChatGPT responses are definitely appealing. In the short term though, it seems highly unlikely ChatGPT will outright replace Google searches.

  • Google is not only embedded in many devices; it is the default for most user search behavior.
  • ChatGPT results, while appealing, often lack the next step: “take me to the site to do what I need to do”
  • Accuracy is a concern for many query types.
  • A deep search index and knowledge base is still needed underneath to train models. Very few organizations have this.
  • Certain types of queries (e.g. current affairs) remain weak or impossible.
  • Right now, ChatGPT is free. Anyone that would like to use it at “Google Scale” would have to determine how to monetize sufficiently to defray the server costs. This may (ironically) mean bringing back ads.

Search company Algolia also has a useful post on why certain query types are difficult. This is a long road. However, it is not an impossible one, and ChatGPT technology certainly should give Google pause for thought. In particular:

  • It is another step in the shift to people wanting different formats for search results. In many cases, it really is better to just “get an answer.” Similar to how queries to Amazon’s Alexa skip ad content to give just one answer, user expectations are changing.
  • Other large search engine providers certainly could jump on the technology and use it to make market share inroads. Either from pure hype (“now with AI”) or by providing genuine utility.
  • The new use-cases, such as generative queries, could become important enough that they become a gateway for people using ChatGPT for some proportion of the queries they would normally send to Google.

I would expect Google to also begin “adopting” similar features for its search to get ahead of possible hype from other competitors. Perhaps this will be as simple as replacing its current canned responses to common queries, or as ambitious as a fully fledged ChatGPT clone.

None of this is likely to happen fast in terms of adoption by the general public. Search habits are deeply embedded, and Google also has a powerful advantage in being the incumbent search engine on iOS today, and in controlling Android. However, there will no doubt be a lot more “ChatGPT for search” stories for a while.

The technological achievement OpenAI has made is truly remarkable. The fact that we’re even talking about disrupting search is indeed impressive.
