Prompt troubleshooting

Ideas to try if Claude is hallucinating

Though this is not fully solved yet, there are ways to minimize hallucinations.

Ask Claude multiple times

One feature of hallucinations is that they tend to be different across different outputs. So if you’re worried about Claude generating hallucinations, you can create multiple outputs and ask the model if the two outputs are consistent.

For example, suppose you want the model to extract dollar amounts from a document and to produce a summary like “The cost of the house is $500k and it’s located in the state of Texas”. If you generate two outputs and the dollar amount and state are the same, it’s less likely to be a hallucination.

If there are inconsistent facts in the two outputs, it’s likely that one of them contains a hallucination. You can ask the model if the two responses contain any inconsistencies and use this as a way to flag potential hallucinations.

You should check the accuracy of this technique using your own examples, since its success (and efficiency relative to alternatives) will vary by task type.

Give Claude a "way out" if it doesn't know the answer

Explicitly ask Claude to tell you when it doesn’t know the answer. You can think of Claude as trying, very literally, to do exactly what you ask it to do - so if you ask it a question and it doesn't know the answer, it "tries to be helpful" very hard which involves hallucinating. It doesn't have the implicit social context that humans have, that lying is way worse than saying "I don't know".

For example:


Human: What is the heaviest hippo ever recorded?

Assistant: The heaviest hippo ever recorded was a bull named Hubert, who weighed 5,000 kg or 11,023 lbs. Hubert was measured in Kenya in the early 20th century. Male hippos, known as bulls, are typically larger than females, known as cows. Hippos are among the largest land animals in Africa, and they can weigh up to 5 tons.

This is incorrect.


Human: Answer the following question only if you know the answer or can make a well-informed guess; otherwise tell me you don't know it.

What was the heaviest hippo ever recorded?

Assistant: I'm sorry, I don't know the exact weight of the heaviest hippo ever recorded. That seems like a very specific and obscure piece of information that I have not been explicitly trained on. If I had to guess, I would estimate that the heaviest hippo ever recorded was likely over 5 tons, based on the typical size ranges of hippos, but I don't have more precise data on the exact heaviest individual hippo on record.

Ask Claude for direct quotes

Warning

This applies to extracting information from documents you provide in the prompt. This is better for longer documents and worse for short ones (<300 words). Claude is more likely to hallucinate fake quotes if documents are short.

Models seem less likely to hallucinate direct quotes from long documents than to hallucinate content of documents if asked a question about them.

If you have a document with various statistics about cats and you say “What is the average weight of a Russian Blue?”, the model is more likely to hallucinate an answer than if you say “Please extract word-for-word quotes from this document that are relevant to the question ‘What is the average weight of a Russian Blue?’”

This is especially true if you can have a few shot prompt that contains examples where there are no relevant quotes to which the model response “I can’t find any quotes relevant to that”. But this might not be possible if you’re extracting quotes from very long documents (since it’s costly to have very long few-shot prompts in this case).

Additionally, direct quotes are easier to verify the accuracy of than other answers. If you have a document and you request word-for-word quotes, you can do a string match on the model quotes to check that they appear in the document and check for percentage of overlap.

Note

You might not get 100% overlap but want it to be high, e.g. the model might add "[sic.]" if there is an error in the document or might add context to the quotes like “he [Brian] asked her [Diana] to dinner” which is fine as long as the added content is accurate. If you think it’s adding inaccurate content then you may want to just filter for a very high degree of overlap and make the instructions more rigorous, e.g. by adding something like “Please ensure your quotes are directly from the document, and do not add any additional content like disambiguations or comments.”

Some examples of ways to do overlap checks in Python:

# edit distance
import nltk
surplus = max(0, len(doc) - len(quote))
edit_distance = nltk.edit_distance(quote, doc) - surplus

# block matching
from difflib import SequenceMatcher
max([b[-1] for b in SequenceMatcher(None, doc, quote).get_matching_blocks()]) / len(quote)

What you want is quotes that appear in the document and are relevant to the question. If the model is good at identifying relevant quotes for your use case (which it often is but you should check), this ensures that it’s not hallucinating the quotes.

Example: zero-shot prompt to generate direct quotes


Human: Consider the following document:

{document}

Please identify the quotes in this article most relevant to the question "{question}" and copy them out word-for-word. If there are no quotes in this document that seem relevant to this question, please just say "I can’t find any relevant quotes".

Assistant:

Document summary

Document summary or text + direct quotes often make answers more accurate. Sometimes the model might need the full text plus the direct quotes to give an answer, but sometimes a summary plus the direct quotes will be enough.

For example, one can ask for:

  1. A summary of the article:

Human: Consider the following article:

{document}

Please write a one paragraph, high level summary of this article.

Assistant: Here is a summary of the document:

-
  1. Separately, direct quotes from the article relevant to the question (see previous section)

  2. Then request an answer based on these:


Human: I want you to use a summary of a document and quotes from the document to answer the question \“{question}\”

Here is a summary of the document: {document summary}

Here are direct quotes from the document that are most relevant to the question "{question}": {quotes} 

Please use these to construct an answer to the question "{question}" as though you were answering the question directly. Ensure that your answer is accurate and doesn’t contain any information not directly supported by the summary and quotes.

Assistant:

This can be more accurate than extracting quotes alone.

Was this page helpful?
Previous
Claude's response is in the wrong format