100% Accurate AI Step by Step

Secret to Enterprise-Ready AI: Recombinant, Recall, and Reasoning

Have you been struggling to implement enterprise-ready AI? There’s a secret you need to know in order to get there.

If you want to implement enterprise-ready AI, you must learn the secret behind why today’s models are more powerful than ever yet hallucinate much, much more.

As The New York Times reported last week, OpenAI and Google “don’t know why” their latest models perform remarkably better on math and science benchmarks while simultaneously hallucinating a lot more often.

This article explains exactly why this is happening. Most importantly, it also explains exactly how to fix it.

Essential AI Vocabulary

ChatGPT captured the world’s attention in November 2022—two and a half years ago. Its success introduced a new vocabulary: generative AI, hallucinations, Retrieval Augmented Generation (RAG), etc.

However, the technology has been growing so fast that we still need new words to conceptualize it. For example, with the current vocabulary it’s hard to understand how “better” AI can have more “hallucinations.” But once we introduce some new vocabulary, the mystery of why “better” AI hallucinates more becomes self-evident.

The tremendous issue that’s perplexing OpenAI, Google, and others becomes self-apparent the moment you understand three categories of generative AI:

  • Recombinant AI
  • Recall AI
  • Reasoning AI

The moment you understand the three types of generative AI, you will be ready to implement 100% accurate, enterprise-ready chatbots. I promise.

Executive Summary

In short, there are three types of generative AI. Each has its own purpose, and each needs to be trained differently.

  • Recombinant AI: Involves generatively mixing learnings into new patterns. This ranges from ideation and essay writing to coding and even recombinant summaries.
  • Recall AI: This involves providing accurate answers that are extracted from one or more learned sources (i.e. answering questions through recall).
  • Reasoning AI: This involves making accurate deductions.

ChatGPT launched as a recombinant AI service. It wasn’t intended to be used for factual recall. In fact, the way that it was trained literally makes 100% accurate recall impossible. (More on this below.)

What more and more enterprises are looking for is a different type of generative AI—Recall AI. However, OpenAI and others leapfrogged over training Recall AI in order to pursue Reasoning AI instead.

Training Recall AI requires a precise set of criteria. The farther an LLM’s training diverges from these criteria, the more hallucinations the LLM will produce. And that’s precisely what has been happening. The newer models have a greater divergence from these training criteria; therefore, they hallucinate much more often.

This fact bears repeating:

Training Recall AI requires a precise set of criteria. The farther an LLM’s training diverges from these criteria, the more hallucinations the LLM will produce. And that’s precisely what has been happening. The newer models have a greater divergence from these training criteria; therefore, they hallucinate much more often.

Fortunately, once the criteria are adhered to, 100% accurate Recall AI results—providing enterprises the AI they have been looking for all along.

ChatGPT: Dawn of Recombinant LLMs

Ilya Sutskever, cofounder of OpenAI, was surprised by ChatGPT’s success.

Sutskever thought people would find ChatGPT “boring” because it was designed to be recombinant (not useful for accurate recall). As Sutskever stated:

When you asked it a factual question, it gave you a wrong answer. I thought it was going to be so unimpressive that people would say, ‘Why are you doing this? This is so boring!’ — Ilya Sutskever regarding the launch of ChatGPT

Ilya Sutskever (OpenAI Image Generator Cartoon Depiction)

If you have been struggling to implement enterprise AI, it’s important to know the following: ChatGPT was not originally intended to be used for providing factual answers to questions. That was not the original purpose of its release. It was intended to offer recombinant processing of source materials—not accurate recall of them.

Nevertheless, ChatGPT revolutionized AI. After all, there is tremendous value to Recombinant AI for ideation, coding, and more. Recombinant AI is also spectacular for image and video generation. All of these areas have been truly transformed by Recombinant AI.

However, Recombinant AI must be trained stochastically in order to fulfill its purpose. In other words, it must be trained to be probabilistic (the antithesis of deterministic).

Hallucinations Are Inherent In Recombinant AI

Recall AI must be deterministic. Every question must be answered directly from the provided sources without deviation.

The stochastic training used in recombinant AI produces degrees of deviation. Such deviation inherently results in hallucinations. In fact, you can think of hallucinations as deviation errors.

Square Peg, Meet Round Hole

OpenAI and other LLM makers quickly realized the mass interest in using chatbots for Question and Answering (Q/A). Therefore, they sought to add this capability to their models. The problem is that these LLM makers tried to force stochastic recombinant AI to perform the deterministic recall task.

Where deviation errors do not occur, the results of the models are stunning. However, when deviation errors occur, the models’ results can be utterly ridiculous. That’s why ChatGPT can be both stunning and ridiculous at the same time.

At first, LLM makers kept trying to force the recombinant architectures to produce accurate recall. But they hit an inevitable ceiling, as demonstrated by the disappointing release of GPT-4.5.

In view of GPT-4.5’s dismal performance, OpenAI officially declared that GPT-4.5 is the last of its recombinant models. OpenAI is now exclusively pursuing reasoning models instead.

o1: Dawn of Reasoning LLMs

OpenAI has officially shifted to focusing on reasoning models. Reasoning models are designed for deduction. Training Reasoning AI involves techniques like:

  • Chain-of-thought (CoT) prompting or training.
  • Intermediate step supervision (e.g., supervising intermediate thoughts, not just final answers).
  • Private chain of thought (as in o3): the model reasons internally before generating an answer.
  • Enhanced tool use, planning modules, or scratchpads for intermediate computation.

While such techniques do indeed improve deduction, these techniques cause training to further diverge away from the criteria needed to train Recall AI. This increased divergence causes increased hallucinations. This is the answer to the tremendous issue that is currently perplexing OpenAI, Google, and other LLM makers.

This bears repeating:

Such techniques do indeed improve deduction. However, these techniques cause training to further diverge away from the criteria needed to train Recall AI. This increased divergence causes increased hallucinations.

Yes, recall hallucinations are indeed “worse than ever.” However, now that we know the cause, we also know the solution.

BSD: Dawn of Recall LLMs

I work at a company called Acurai Inc. Acurai has taken the road less travelled. At Acurai, we focus on the boring side of AI—100% accurate recall.

I have received permission from Acurai’s CEO (Adam Forbes) to publish every detail of our company’s proprietary Bounded-Scope Deterministic (BSD) Models—the first models in the category of Recall AI.

You can read the entire series here: “100% Accurate AI Step-by-Step (Part One): BSD Neural Networks.”

In short, BSD introduces deterministic training to natural language models, thereby producing 100% consistent results. Everything is disclosed in the series linked above.

Why RAG Fails

Retrieval Augmented Generation (RAG) is the most popular approach to addressing hallucinations. However, it routinely fails to eliminate hallucinations.

On the surface, the intuition for RAG seems sound: If I send the facts to the LLM then it cannot hallucinate when providing the answer.

So why does the LLM still hallucinate? The answer is that you are sending the facts to a recombinant processor that inherently deviates from the provided facts. Such deviation is inherent in the stochastic training.

This is why LLMs fail to produce accurate answers even when you provide the answers using RAG.

Build Enterprise-Ready AI… Today

With BSD, enterprise AI is finally available now. Recall AI is what enterprises have been looking for.

If you want to know every step in building 100% accurate Recall AI, I encourage you to read the entire series. Enterprise-ready AI is already here. You just have to know where to look. 🙂

100% Accurate AI Step-by-Step (Part Three): ABCs of Hallucination Elimination

In this article, you will learn the actual cause of chatbot hallucinations. More importantly, this article then discloses the three ways to completely eliminate hallucinations—once and for all.

Hallucinations

A hallucination in the context of LLMs typically refers to instances where the model generates information that contradicts either its training data or reality — essentially, making things up. In this article series, an LLM dutifully reproducing learned narratives is not categorized as a hallucination. Moreover, when the provided narrative conflicts with reality, it is the LLM’s job to present the narrative.

For example, a properly functioning LLM based on the Flat Earth Society’s website will state that the earth is flat. From the perspective of the information provided, that is the 100% accurate response.

Hence, for this article series, a hallucination is either something that contradicts the knowledge source or is unsupported by the knowledge source.

In other words, “hallucination rate” as used in this article refers to faithfulness — the degree to which the response remains faithful to the provided information (whether provided during training and/or at the time of query).

With this in mind, you are now ready to solve perhaps the biggest issue in AI — fully eliminating hallucinations.

Root Causes of Hallucinations

There are three reasons why LLMs deviate from the provided information. The three causes of hallucinations are: 1) malformed queries; 2) incomplete information; and 3) Noun-Phrase Collisions.

Malformed Queries

First, if a query is malformed then there is no possibility of an accurate response. Malformed queries either need to be corrected or rejected. Malformed queries include:

  • complex queries
  • misspellings
  • grammatically incorrect queries
  • ambiguous queries

An example of an ambiguous query is: “Does my aunt live in a dangerous neighborhood?” Naturally, the LLM needs the aunt’s address, which is missing from the query. A future article in this series provides methods for detecting and interactively correcting the above types of malformed queries.

After the query has been corrected, it can be sent to an LLM fine-tuned on detecting whether the query can be handled by the target LLM or not. The query can be rejected if it cannot be handled by the LLM. Thus, the target LLM will only be receiving properly formed queries, thereby eliminating the first cause of hallucinations.
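
To make this gating step concrete, here is a minimal Python sketch. It is illustrative only (the detailed pipeline comes in the future article mentioned below), and the fine-tuned model name is a hypothetical placeholder:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

GATE_PROMPT = (
    "Classify the user's query. Reply with exactly one word: OK if the query "
    "is well-formed and can be answered by the target chatbot, or REJECT if it "
    "is complex, ambiguous, badly misspelled, or out of scope."
)

def gate_query(query: str) -> bool:
    """Return True if the (already corrected) query should reach the target LLM."""
    response = client.chat.completions.create(
        model="ft:gpt-4o-mini:query-gate",  # hypothetical fine-tuned classifier
        messages=[
            {"role": "system", "content": GATE_PROMPT},
            {"role": "user", "content": query},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip().upper() == "OK"

if gate_query("Does my aunt live in a dangerous neighborhood?"):
    print("Forward the query to the target LLM.")
else:
    print("Reject the query or ask the user for clarification.")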

A future article explains how to implement this query-correction-and-rejection pipeline.

Incomplete Information

Second, if an LLM does not have the complete information, this too can result in hallucinations. Traditional RAG-based implementations completely ignore this cause of hallucinations. Consider a recipe for a chocolate cake as an example. If the RAG-based implementation only sends back part of the recipe, it would be impossible for the LLM to instruct how to bake the cake.

In regard to enterprises, consider a RAG-based chatbot on a banking website. Suppose a user asks: “How do I open an account?” The RAG-based implementation needs to send the complete information to the LLM; otherwise a hallucination will ensue.

I will be explaining how to ensure RAG-based implementations always send complete information in the next article series after this one. The next article series is: “100% Accurate RAG Step by Step.” This series will be built on top of this one. In other words, the next series will presume that the reader has already read all the articles in this series.

Most Common Cause of Hallucinations—Noun-Phrase Collisions

Third, the remaining cause of hallucinations is the most common cause of hallucinations. The vast majority of hallucinations occur when LLMs provide a wrong answer to an appropriately written query even though the LLM has access to the full answer. All such hallucinations are caused by Noun-Phrase Collisions.

Key Discovery

LLMs can, and often do, hallucinate even when they have been trained on correct information. I tracked down the root cause of hundreds of such hallucinations. The root cause was always due to Noun-Phrase Collisions. Every single time.

A Noun-Phrase Collision occurs when two noun phrases refer to different entities yet have high vector similarity scores. For example, the names Alfonso and Afonso refer to two different entities, but they have high vector similarity scores. Thus, they form a Noun-Phrase Collision.

It appears that I was the first to discover that LLMs self-organize around noun phrases during pretraining. This, in turn, led to identifying Noun-Phrase Collisions as the cause of every hallucination in which the LLM has been trained on the correct information.

I first discovered Noun-Phrase Collisions when conducting experiments in which the LLM was provided clear and correct information, yet still hallucinated. Consider the following real-world example from GPT-4:

Prompt:
Based on the following passages, tell me about Afonso?
Passage 1: Afonso’s hair is blonde.
Passage 2: Alfonso is married to Gwen.

Response:
The passages provide information that Afonso has blonde hair 
and is married to a woman named Gwen.

This is a hallucination. Alfonso, not Afonso, is married to Gwen. Now consider another real-world example from GPT-4:

Prompt:
Based on the following passages, tell me about Chuck?
Passage 1: Chuck’s hair is blonde.
Passage 2: Bartholomew is married to Gwen.

Response:
The passages only provide information that Chuck has blonde hair.
There is no further information about Chuck.

Notice that this response is 100% accurate, even though it is the exact same query just using two different names. The difference is that Afonso and Alfonso have a high vector similarity score (92.3%), whereas Chuck and Bartholomew have a lower vector similarity score (76.0%).

The examples in this article use ADA-002 to obtain vector embeddings, and they use cosine similarity to compute the vector similarity scores between the obtained vector embeddings.
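
If you want to check these similarity scores yourself, here is a minimal Python sketch using the OpenAI embeddings API with text-embedding-ada-002 (the exact scores you get may differ slightly from the figures quoted above):

import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(text: str) -> np.ndarray:
    response = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(response.data[0].embedding)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# High similarity between two different entities signals a Noun-Phrase Collision.
print(cosine_similarity(embed("Afonso"), embed("Alfonso")))      # high (collision risk)
print(cosine_similarity(embed("Chuck"), embed("Bartholomew")))   # lower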

In short, GPT-4 hallucinated because two different noun phrases (Alfonso and Afonso) refer to two different entities, yet they have high vector similarity scores. In other words, GPT-4 hallucinated because of a Noun-Phrase Collision.

After discovering Noun-Phrase Collisions where external knowledge was provided, I then conducted experiments on LLM parametric knowledge by tracing the hallucinations back to the training corpus, confirming that parametric hallucinations are due to the exact same cause.

For example, when asked about the mother of Afonso II, GPT-4 gave the mother of Alfonso VII instead. This is an example of a parametric hallucination, as the LLM gave the answer based on its internal knowledge (not based on externally provided content). An analysis of the internet training corpus confirmed this to be due to a Noun-Phrase Collision. (As explained shortly below.)

Consider what I call “The Alfonso Debacle” as a perfect case in point.

Alfonso Debacle

I often discuss The Alfonso Debacle because it demonstrates that hallucinations are not caused by the reasons stated by LLM makers. To recap, a company called Vellum posted a ChatGPT-4 hallucination for the query: “Who was the mother of Afonso II, the third king of Portugal?”

ChatGPT 4 originally gave the wrong answer — Urraca of Castile. (You can verify this using gpt-4-0125-preview.)

OpenAI later fine-tuned ChatGPT 4 to provide the correct answer — Dulce of Aragon. However, OpenAI’s fine-tuning only “fixes” the original query verbatim. For example, on September 2, 2024 (after the fine tuning), I submitted the original query to ChatGPT 4 with “Afonso II” changed to “Alfonso II.”

In its response, GPT-4 hallucinated on multiple levels:

  • “Afonso” was the third king of Portugal. Not “Alfonso.” There was no “Alfonso II” as the king of Portugal.
  • The mother of “Afonso II” was Dulce of Aragon, not Urraca of Castile.

The hallucination was triggered by changing “Afonso II” in the original query to “Alfonso II.” This demonstrates that OpenAI’s fine tuning did not overcome the original issue. In other words, ChatGPT 4 still treats “Alfonso” and “Afonso” as being the same (except where it is fine tuned to behave otherwise).

Secret Behind the Alfonso Debacle

Please study this section carefully. It will guide you to 100% accurate chatbots once you fully internalize it.

All-important insights come from asking all-important questions. Here’s the all-important question regarding the Alfonso Debacle: Why did ChatGPT 4 consistently choose Alfonso VII’s mother over Afonso II’s mother even though Afonso II was referenced in the query?

Prior to OpenAI fine tuning the answer, ChatGPT 4 would routinely give Alfonso VII’s mother instead of Afonso II’s mother. Why?

Take a moment and think about this. In fact, the Noun-Phrase Dominance Model came from asking this same question about literally hundreds of queries. By searching for the answer on each query, a pattern emerged. The same pattern emerged every single time.

Let me give you a hint. Consider this webpage statement regarding Alfonso VII: “Alfonso’s Mother was Urraca (1079–1126) called the Reckless was Queen of Castile…”

Now, think about that statement with the original query in mind: “Who was the mother of Afonso II, the third king of Portugal?” Why would the above statement be such an attractive route? Remember, focus on the noun phrases. After all, they determine the route.

Hopefully you took time to study the noun phrases in the query along with the noun phrases in the statement. You will always find your answer here.

Did you notice that “mother” is in both the query and the Alfonso webpage statement? Did you notice that “mother” is right next to “Alfonso” in the statement? From the LLM’s perspective, “Alfonso” is basically the same as “Afonso” and “mother” is a direct match. Therefore, if there is no “mother” close to “Afonso” then the LLM will choose the Alfonso/mother combination. (And that’s exactly what ChatGPT 4 did until it was specifically fine-tuned to behave otherwise.)

Take time to compare the location of the word “mother” for Alfonso VII to the location of “mother” for Afonso II. Notice that the word “mother” is extremely disconnected from “Afonso” on the Afonso II page. That’s why the Alfonso route wins over Afonso for this query. It’s also the key to it all.

Every LLM hallucinates due to Noun-Phrase Collisions. For example, Noun-Phrase Collisions cause GPT-3.5 Turbo to wrongly conflate facts about magnesium with facts about calcium. They also cause GPT-3.5 Turbo to wrongly conflate facts about a Roth IRA with facts about a Roth 401k.

I created a video with demonstrations that you can conduct yourself to empirically prove that Noun-Phrase Collisions are the root cause of hallucinations. I strongly recommend you watch the video to understand the actual cause of hallucinations — and to understand how to finally eliminate them.

Noun-Phrase Collisions

I tracked down the origin of literally hundreds of hallucinations. They were all caused by such Noun-Phrase Collisions — they were all traceable to the LLM wrongly treating disparate noun phrases as synonyms due to their high vector similarity scores.

So how do you programmatically identify Noun-Phrase Collisions?

Tokens

It is important to note that LLMs typically convert text into numerical tokens. For example, GPT-4o converts “Chuck” into a single token: [187874]. Bartholomew is converted into four tokens: [4622, 134710, 747, 86]. Afonso is converted into two tokens: [32, 104460]. Alfonso is converted into [2348, 104460].

  • Chuck: [187874]
  • Bartholomew: [4622, 134710, 747, 86]
  • Afonso: [32, 104460]
  • Alfonso: [2348, 104460]

For purposes of programmatically identifying Noun-Phrase Collisions, a high vector similarity score can refer to a high vector similarity score on the entire noun phrase and/or a high degree of similarity between a subset of the noun phrase’s numerical tokens. The latter is a straightforward criterion. For example, consider how GPT-4o represents 1968 and 1969. GPT-4o converts 1968 into two tokens: [6514, 23]. GPT-4o converts 1969 into two tokens: [6514, 24]. Notice also that Afonso and Alfonso share token 104460 in the second position.

This is important because GPT models do not see “1968” or “1969,” as both of these concepts are outside their vocabulary (i.e., there is no single token dedicated to expressing either of them). With this in mind, consider text that contains one event that occurred in 1968 ([6514, 23]) and another event that occurred in 1969 ([6514, 24]).

  • 1968: [6514, 23]
  • 1969: [6514, 24]
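
You can inspect these token IDs yourself with the tiktoken library. The following is a small sketch assuming the o200k_base encoding used by GPT-4o; exact IDs depend on the encoding and on surrounding whitespace, so they may differ from the figures quoted above:

import tiktoken

enc = tiktoken.get_encoding("o200k_base")

for text in ["Chuck", "Bartholomew", "Afonso", "Alfonso", "1968", "1969"]:
    print(text, enc.encode(text))

def shared_tokens(a: str, b: str) -> set[int]:
    """Tokens two strings have in common, one signal of a potential collision."""
    return set(enc.encode(a)) & set(enc.encode(b))

# Per the discussion above, 1968/1969 share their leading token and
# Afonso/Alfonso share their trailing token.
print(shared_tokens("1968", "1969"))
print(shared_tokens("Afonso", "Alfonso"))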

Consider a RAG-based implementation which has both 1968 and 1969 in the provided content. During response generation, when the LLM outputs token 6514, it can wrongly conclude that it has two token paths to choose from (either token 23 or token 24). If it chooses the wrong one, then it will produce a hallucination by wrongly attributing something that happened in 1968 with something that happened in 1969.

These token-level collisions are responsible for LLMs having extraordinarily high hallucination rates for dates, part numbers, PubMed IDs, and more.

When dealing with language (i.e., narrative text), text similarity scores can be measured based on the vector embeddings of the text itself. Text similarity scores can also be computed by looking for identical tokens (such as the shared tokens between 1968 and 1969). It’s essential to resolve both types of Noun-Phrase Collisions.

Noun-Phrase Collisions Will Always Exist in LLM Parametric Knowledge

Noun-Phrase Collisions exist because LLMs are typically trained for creative language generation. Therefore, they are trained to recognize that words such as “car,” “automobile,” and “vehicle” can often be used interchangeably, e.g.: “My automobile broke down. I don’t like this car. This is the last vehicle I’m going to buy.” In this example, the LLM needs to understand that all three words refer to the same thing, and it needs to do so in a mathematical manner.

Likewise, when generating a response, it needs to know the variety of words that it can choose from to generate a convincing answer.

Noun-Phrase Collisions occur because the LLM typically learns to treat semantically similar words as synonyms during pretraining. During instruction fine tuning, many of the errant associations get overwritten. However, it is impossible to finetune all the errant associations — paving the way for future errors (called “hallucinations”).

While conflating “Alfonso” and “Afonso” may seem somewhat intuitive, consider the fact that GPT-3.5 Turbo routinely conflates “magnesium” and “calcium.” That is because the magnesium/calcium vectors have an 87.2% similarity, and the instruction tuning was not sufficient for the LLM to learn that they refer to different things despite their high similarity score. (See video above for demonstration of the magnesium/calcium collision.)

The fact that fine tuning overcomes these errors is seen in OpenAI’s continual retraining of its models to fix widely publicized hallucinations, something they acknowledge in the GPT-4 system card. “For tackling open-domain hallucinations, we collect real-world ChatGPT data that has been flagged by users as being not factual….”

Noun-Phrase Collisions are born during model pretraining. Noun-Phrase Collisions in the pretrained model shall be referred to as inherent Noun-Phrase Collisions. Subsequent instruction tuning teaches the model to overcome the inherent Noun-Phrase Collisions. For example, GPT-4’s pretraining has a Noun-Phrase Collision for Alfonso and Afonso. The original instruction tuning did not correct for this issue (e.g., gpt-4-0125-preview). After the Afonso/Alfonso collision became public, OpenAI retrained GPT-4 using instruction tuning to correct for some of the inherent collisions (e.g., gpt-4-0613). The change in GPT-4’s behavior regarding Afonso/Alfonso demonstrates that Noun-Phrase Collisions are indeed overcome during instruction tuning.

However, the problem is that it is impossible to train away all inherent Noun-Phrase Collisions. Thus, each LLM remains ready to hallucinate each time that a user’s query evokes an inherent Noun-Phrase Collision that was not corrected during instruction tuning.

For example, GPT-4 once routinely conflated “Alfonso” and “Afonso.” OpenAI used fine tuning to fix some of the errant conflations, allowing later models to make fewer mistakes. However, not all Alfonso/Afonso conflations were fixed. The conflation is only fixed when a user types in a query that is very similar to the one(s) used during fine tuning. For other queries, GPT-4 still wrongly conflates Afonso and Alfonso. (See example above.)

This exemplifies one problem with trying to fix hallucinations through fine tuning. When fine tuning on facts, the LLM does not tend to generalize as well as it does when fine tuning on behavior. You may fix the exact, highly publicized queries, but other queries may still experience the errant conflation.

Second, when LLMs are trained on behavior they can become increasingly strong and robust (i.e., better at performing the behavior). However, when LLMs are trained on facts they become increasingly worse with each and every added fact. The increase in error rate is literally linearly related to the number of fine-tuned facts.

For example, consider an LLM that can perfectly answer virtually any question regarding cows, but hallucinates on questions about dogs. Rather than going back to pretraining, the LLM maker decides to “fix” the publicized dog hallucinations through fact-based fine tuning instead. Unfortunately, for every one new query about dogs that gets added in, the LLM forgets how to answer two queries about cows. While the hallucination rate for the publicized dog queries gets “fixed,” the overall hallucination rate gets much worse.

Catastrophic Forgetting

This paradox is known as catastrophic forgetting. One can accurately conceptualize fine tuning on facts as creating an idiot savant. The LLM will indeed parrot back the provided facts, but it will also develop a degree of “dementia” or “amnesia” in other areas.

That is because LLMs have a finite number of parameters (the mechanisms they use for storing both facts and learned behaviors). Fine tuning means that some parameters that once handled one or more facts must now be reallocated (i.e., they are overwritten) to account for the new fact. Thus, OpenAI’s stated method of dealing with hallucinations actually increases the overall hallucination rate of their models instead.

Eliminating Hallucinations Once And For All

Understanding the above is helpful in understanding how to finally solve the problem of LLM hallucinations once and for all. First, it is helpful to know that LLMs cannot be finetuned to learn all facts about everything. The paradox of catastrophic forgetting ensures this. Second, it is helpful to know that LLMs often erroneously conflate references to different nouns when those references have a high similarity score.

In other words, there is no way to eliminate Noun-Phrase Collisions from LLM parametric knowledge. Nor is there any way to fully eliminate the Noun-Phrase Collisions within the models’ weights and biases (i.e., parameters) due to catastrophic forgetting.

Thus, eliminating hallucinations requires accepting the existence of such collisions, and then, fully addressing the issue head on.

Why RAG Fails to Eliminate Hallucinations

RAG is often promoted as an answer to hallucinations. However, it is far from a panacea. RAG often has double digit hallucination rates.

RAG has failed to eliminate hallucinations for two reasons. First, RAG retrievers are so imprecise that they often require sending hundreds of potentially relevant chunks — even when none of the chunks are relevant at all.

Second, even when the correct information is sent, LLMs can still hallucinate (due to noun phrase collisions).

Creating 100% Accurate RAG requires two steps:

  • Solely retrieving the precise facts that are relevant to the query (not hundreds of potentially relevant document excerpts).
  • Ensuring the chatbot faithfully presents the facts without any deviation whatsoever.

This current article series explains how to achieve 100% accurate faithfulness. The next article series explains how to build an information storage and retrieval system that instantly identifies the precise facts that are relevant to the user’s query. This combination provides 100% accurate RAG.

For now, it’s essential to know how to eliminate hallucinations from both internal parametric knowledge and externally retrieved knowledge. In other words, it’s essential to know how to ensure perfect faithfulness.

The ABCs of Eliminating Hallucinations described immediately below resolve the issue of faithfulness. They provide the three systematic ways to fully address the issue of noun phrase collisions.

ABCs of Eliminating Hallucinations

Given the inherent existence of Noun-Phrase Collisions inside the LLM itself, and given that these collisions result in hallucinations, there are only three ways to eliminate hallucinations:

  • Avoid noun-phrase collision routes during generative LLM tasks
  • Bypass generative LLM tasks (thereby bypassing the issue)
  • Correct for the errors caused by the noun-phrase collisions

I refer to this as the ABCs of Eliminating Hallucinations: (A)void, (B)ypass, and (C)orrect.

Each method is briefly introduced below. The step-by-step instructions for implementing each method are given in separate articles—one article per method.

However, all three methods depend on Formatted Facts (FFs). FFs are the fundamental building block of 100% accurate AI. They are key to completely eliminating hallucinations from chatbot responses.

Formatted Facts and Fully-Formatted Facts

As stated in the first article, Formatted Facts (FFs) are statements that are both simple and self-contained.

The concept of Fully-Formatted Facts (FFFs) goes one step further. FFFs are a collection of FFs that are devoid of Noun-Phrase Collisions.

For example, if a collection of FFs includes statements about magnesium and other statements about calcium then that group of FFs does not qualify as being an FFF.

With FFFs, all semantically similar noun-phrases in the collection of FFs must refer to the same entity. For example, an FFF may contain semantically similar words (such as car, automobile, and vehicle) as long as they all refer to the same entity.
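
To make this concrete, here is a rough Python sketch (not Acurai’s pipeline) that ranks the noun-phrase pairs inside a collection of FFs by vector similarity so that potential collisions can be reviewed. It assumes spaCy with the en_core_web_md model as a stand-in for a stronger embedding model such as ADA-002; a flagged pair only counts as a collision if the two phrases refer to different entities:

import itertools
import spacy

nlp = spacy.load("en_core_web_md")  # assumes the model has been downloaded

def rank_noun_phrase_pairs(formatted_facts: list[str]):
    """Return (phrase_a, phrase_b, similarity) tuples, most similar first."""
    noun_phrases = [chunk for fact in formatted_facts for chunk in nlp(fact).noun_chunks]
    pairs = []
    for a, b in itertools.combinations(noun_phrases, 2):
        if a.text.lower() != b.text.lower():
            pairs.append((a.text, b.text, a.similarity(b)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)

facts = [
    "Magnesium supports muscle function.",
    "Calcium strengthens bones.",
]
for phrase_a, phrase_b, score in rank_noun_phrase_pairs(facts):
    print(f"{phrase_a!r} vs {phrase_b!r}: {score:.2f}")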

FFFs are the key to avoiding noun-phrase collision routes during generative LLM tasks.

The Bypassing and Correcting methods solely require FFs (not FFFs).

The fourth row will be filled in in the next article series: “100% Accurate RAG: Step by Step.”

Formatted Facts

Avoiding hallucinations relies on Fully-Formatted Facts (FFFs). Bypassing and Correcting hallucinations rely on Formatted Facts (FFs). However, given that Formatted Facts are a subcomponent of Fully-Formatted Facts, Formatted Facts are the foundation of all three methods of completely eliminating hallucinations from chatbot responses. Formatted Facts are the missing key to 100% accurate AI.

Therefore, the next article in this series (Part Four) details the various pipelines for producing Formatted Facts—from the basic pipeline up through pipelines capable of handling complex text such as scientific, medical, and financial information.

Avoiding Hallucinations

Part Five of this series explains the Avoiding method of hallucination elimination. As stated above, Avoiding hallucinations requires using Fully-Formatted Facts (FFFs). Therefore, Part Five of this series instructs on how to identify and remove noun-phrase collisions to convert a series of FFs into FFFs.

Bypassing Hallucinations

Part Six of this series explains the Bypassing method of hallucination elimination. In short, this article explains how to convert generative LLM tasks into non-generative tasks—thereby bypassing the issue of noun-phrase collisions altogether. Doing so results in 100% accurate responses—every single time.

Correcting Hallucinations

Part Seven of this series explains the Correcting method of hallucination elimination. This method is akin to what is commonly referred to as “grounding” and “reverse RAG.” The key difference is that Formatted Facts (FFs) are used for the reverse RAG process. FFs provide the missing key to 100% accurate grounding.

Roadmap Ahead

As stated above:

  • Part Four: Formatted Facts Pipelines
  • Part Five: Avoiding Method of Hallucination Elimination
  • Part Six: Bypassing Method of Hallucination Elimination
  • Part Seven: Correcting Method of Hallucination Elimination

After that, various articles discuss additional sentence simplification and self-containment processes—including a process that can be used to train tiny neural networks on 100% accurate Formatted Facts generation. Thus, you will not only learn how 100% accurate AI is generated, but also how it can be generated on extremely small models for fast, accurate, and cheap responses.

This series then ends with the “Grand Finale.”

Grand Finale

This series culminates in a very special surprise—the unveiling of an industry-disrupting demonstration empirically documenting that 100% accurate AI is finally here. This article reveals the results of a direct head-to-head with OpenAI, Anthropic, and Perplexity.

In other words, you will see empirical proof that these steps do indeed outperform all the major AI players including: OpenAI, Anthropic, Perplexity, and more.

You will see empirical proof that 100% accurate AI is already here.
