Examining the Accuracy of ChatGPT

Discover the importance of accuracy in AI-generated responses and how language complexity and input clarity affect ChatGPT’s performance.


ChatGPT, the conversational chatbot from OpenAI, has taken the world by storm. Its ability to understand inputs and generate responses that sound human (or perhaps almost human) is uncanny, and users have already found all sorts of uses for the tech.

But ChatGPT and similar generative AI chatbots (like Google Bard) have faced some strong criticism: as good as they sound, they have this unfortunate habit of just…making stuff up.

There are also questions about whether OpenAI’s various GPT models are actually getting better over time (as was the original claim), or if they are in fact getting worse (as some research from Stanford seems to show).

So: just how accurate is ChatGPT?

Why does it seem amazing some of the time, only to turn around and make boneheaded mistakes?

Could it write this newsletter? (Spoiler alert: Sort of, but not really.)

Let’s dive in and explore what’s going on behind the scenes.

What Is ChatGPT?

ChatGPT is one of the recent generative AI tools developed by OpenAI using a large language model (LLM) approach. It’s caught on quickly because you can ask it just about anything, and you’ll get a response in just seconds. Usually the response is well-written, human-sounding, and at least mostly accurate.

ChatGPT can do more than answer questions, too: it can understand and generate computer code, and newer versions of the tool can interpret visual media as well.

How Does ChatGPT Generate Answers?

This part gets a little confusing. ChatGPT breaks prompts down into component parts called tokens (bigger than individual letters, but often smaller than whole words). Once it “understands” the user’s intent, it starts building an answer out of those same “language chunks”: the system predicts the most likely next chunk, based on the question asked and the chunks already chosen for the response.
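To make the “language chunk” idea concrete, here’s a toy sketch of a greedy longest-match subword tokenizer in Python. The vocabulary is invented purely for illustration; real systems like ChatGPT learn a much larger vocabulary from data using algorithms such as byte-pair encoding.

```python
# Toy illustration only: real tokenizers learn their vocabularies from
# data; this tiny hand-picked vocab just demonstrates the idea of
# chunks bigger than letters but smaller than words.
VOCAB = {"un", "believ", "able", "bel", "iev"}

def tokenize(word: str) -> list[str]:
    """Split a word into the longest vocabulary chunks, left to right.

    Characters not covered by the vocabulary fall back to
    single-letter tokens.
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest possible match first.
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # no match: emit a single character
            i += 1
    return tokens

print(tokenize("unbelievable"))  # → ['un', 'believ', 'able']
```

A word the vocabulary doesn’t cover well simply shatters into smaller pieces, which is roughly why unusual words cost a model more tokens than common ones.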

This approach is why the answers sound relatively human: it’s literally predicting what it thinks a human would say or write next, pulling from the billions of words of text in its training data.

We don’t have space to explain this in greater detail, but here’s a deep dive if you’re interested.

Why Does ChatGPT Get Things Wrong?

If the system trained itself on roughly half of the internet, then surely it knows the answers to just about any question you can throw at it, right? So why does it keep getting things weirdly wrong?

Essentially, while ChatGPT responds in ways that sound human, it still thinks like a computer.

It doesn’t understand your question the way a human would, so it can’t just look up the answer in an encyclopedia or answer from its own experience. Instead, it’s drawing on all that data it was trained on to generate a series of language chunks that’s likely to sound like what a human would say in response to the prompt you gave.

This is why ChatGPT can answer questions where there is no textbook or encyclopedia answer. But it’s also why sometimes those answers are off by a little (…or a lot).

Why Is Inaccuracy a Big Deal?

Inaccurate answers are a big deal here because of how convincing ChatGPT’s answers sound. It writes with an air of authority, and its responses are usually packed with specific-sounding facts. It’s easy for people to trust what these AIs say.

Even worse, people turn to these sorts of tools when they don’t know the answer themselves. In most cases, users aren’t in a position to fact-check the answers ChatGPT spits out (because if they were, they wouldn’t need to use the tool in the first place).

Language Is Complex, and Inputs Matter

Another confounding factor here is that human languages are complex and nuanced. English can be particularly rough in this regard, too. (For example, look for a word in the next paragraph that can mean either “someone who cares for animals” or “evaluate carefully”!) Language is complex enough that we humans don’t always understand each other correctly. AI systems have the same issue.

Also, be aware that inputs matter. The clearer and more precise the prompt, the more relevant ChatGPT’s response tends to be. Tightening up your prompts won’t solve the accuracy problem: you should still vet and inspect every answer. But better inputs generally lead to better outputs. So when you use these tools, strive for precise, unambiguous language in your prompts.
