API Reference

Base URL

All API requests should be sent to https://api.anthropic.com.
API requests require that an API key be passed in an X-API-Key header. New API keys can be generated via the console. Both of our SDKs accept an API key when you instantiate the client object:

# Python
import anthropic

client = anthropic.Client(api_key)

// TypeScript
import { Client } from "@anthropic-ai/sdk";

const client = new Client(apiKey);


All arguments must be passed as a JSON object. As a result, you must also pass the Content-Type: application/json header with your request to signal that you are sending JSON.

All responses are given in JSON. If using streaming, each new instance of data sent will be a full JSON object.
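Putting the header and body requirements together, a minimal sketch in Python that builds (but does not send) a request; "my_api_key" is a placeholder, and the payload fields mirror the examples later on this page:

```python
import json

# Required headers: the API key and the JSON content type.
# ("my_api_key" is a placeholder, not a real key.)
headers = {
    "x-api-key": "my_api_key",
    "content-type": "application/json",
}

# All arguments are passed as a single JSON object in the request body.
body = json.dumps({
    "prompt": "\n\nHuman: Tell me a haiku about trees\n\nAssistant:",
    "model": "claude-v1",
    "max_tokens_to_sample": 300,
})
```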


POST /v1/complete

This endpoint sends a prompt to Claude for completion.


prompt
string • required
The prompt you want Claude to complete. For proper response generation you will need to format your prompt as follows:
const userQuestion = "Why is the sky blue?";
const prompt = `\n\nHuman: ${userQuestion}\n\nAssistant:`;
See our comments on prompts for more context.
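The same formatting in Python, as a minimal sketch (format_prompt is our own helper name, not part of the SDK):

```python
def format_prompt(user_question: str) -> str:
    # The prompt alternates "\n\nHuman:" / "\n\nAssistant:" turns and must
    # end with "\n\nAssistant:" so that Claude produces the next turn.
    return f"\n\nHuman: {user_question}\n\nAssistant:"

prompt = format_prompt("Why is the sky blue?")
```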

model
string • required

As we improve Claude, we develop new versions of it that you can query. This controls which version of Claude answers your request. Right now we are offering two model families: Claude and Claude Instant.

Specifying any of the following models will automatically switch you to the newest compatible models as they are released:

  • "claude-v1": Our largest model, ideal for a wide range of more complex tasks.
  • "claude-v1-100k": An enhanced version of claude-v1 with a 100,000 token (roughly 75,000 word) context window. Ideal for summarizing, analyzing, and querying long documents and conversations for nuanced understanding of complex topics and relationships across very long spans of text.
  • "claude-instant-v1": A smaller model with far lower latency, sampling at roughly 40 words/sec! Its output quality is somewhat lower than the latest claude-v1 model, particularly for complex tasks. However, it is much less expensive and blazing fast. We believe that this model provides more than adequate performance on a range of tasks including text classification, summarization, and lightweight chat applications, as well as search result summarization.
  • "claude-instant-v1-100k": An enhanced version of claude-instant-v1 with a 100,000 token context window that retains its performance. Well-suited for high throughput use cases needing both speed and additional context, allowing deeper understanding from extended conversations and documents.

You can also select specific sub-versions of the above models:

  • "claude-v1.3": Compared to claude-v1.2, it's more robust against red-team inputs, better at precise instruction-following, better at code, and better at non-English dialogue and writing.
  • "claude-v1.3-100k": An enhanced version of claude-v1.3 with a 100,000 token (roughly 75,000 word) context window.
  • "claude-v1.2": An improved version of claude-v1. It is slightly improved at general helpfulness, instruction following, coding, and other tasks. It is also considerably better with non-English languages. This model also has the ability to role play (in harmless ways) more consistently, and it defaults to writing somewhat longer and more thorough responses.
  • "claude-v1.0": An earlier version of claude-v1.
  • "claude-instant-v1.1": Our latest version of claude-instant-v1. It is better than claude-instant-v1.0 at a wide variety of tasks including writing, coding, and instruction following. It performs better on academic benchmarks, including math, reading comprehension, and coding tests. It is also more robust against red-teaming inputs.
  • "claude-instant-v1.1-100k": An enhanced version of claude-instant-v1.1 with a 100,000 token context window that retains its lightning fast 40 word/sec performance.
  • "claude-instant-v1.0": An earlier version of claude-instant-v1.
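Since the 100k variants differ only in context window, one way to choose a model is to estimate the prompt's token count first. A sketch: pick_model is our own helper, the 1 token ≈ 0.75 words ratio comes from the 100,000-token ≈ 75,000-word figure above, and the 8,000-token cutoff is an assumed standard context size, not a documented limit:

```python
def pick_model(prompt: str, fast: bool = False) -> str:
    # Rough heuristic: ~0.75 words per token (100k tokens ≈ 75k words).
    approx_tokens = len(prompt.split()) / 0.75
    base = "claude-instant-v1" if fast else "claude-v1"
    # Assumed cutoff: switch to the 100k variant for very long prompts.
    return base + "-100k" if approx_tokens > 8000 else base
```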
max_tokens_to_sample
int • required
The maximum number of tokens to generate before stopping.
stop_sequences
list of strings • optional
Our models stop on "\n\nHuman:", and may include additional built-in stop sequences in the future. By providing the stop_sequences parameter, you may include additional strings that will cause the model to stop generating.
stream
boolean • optional
defaults to false
Whether to incrementally stream the response using SSE.
temperature
float • optional
defaults to 1
Amount of randomness injected into the response. Ranges from 0 to 1. Use a temperature closer to 0 for analytical / multiple choice tasks, and closer to 1 for creative and generative tasks.
top_k
int • optional
defaults to -1
Only sample from the top K options for each subsequent token. Used to remove "long tail" low probability responses. Defaults to -1, which disables it. Learn more technical details here.
top_p
float • optional
defaults to -1
Does nucleus sampling, in which we compute the cumulative distribution over all the options for each subsequent token in decreasing probability order and cut it off once it reaches a particular probability specified by top_p. Defaults to -1, which disables it. Note that you should either alter temperature or top_p, but not both.
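Combining the guidance above (temperature near 0 for analytical work, near 1 for creative work, and never setting temperature and top_p together), a sketch of per-task presets; the preset names and values are our own illustrations:

```python
SAMPLING_PRESETS = {
    # Analytical / multiple choice: near-deterministic output.
    "analytical": {"temperature": 0.0},
    # Creative and generative tasks: full default randomness.
    "creative": {"temperature": 1.0},
    # Nucleus sampling instead of temperature; never set both.
    "nucleus": {"top_p": 0.7},
}

def sampling_args(task: str) -> dict:
    preset = SAMPLING_PRESETS[task]
    # Guard against the documented pitfall of altering both knobs.
    assert not ({"temperature", "top_p"} <= preset.keys()), \
        "alter temperature or top_p, but not both"
    return preset
```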
metadata
object • optional
defaults to {}
An object describing metadata about the request. Child parameters:
user_id
string • optional
A uuid, hash value, or other external identifier for the user associated with the request. Anthropic may use this id to help detect abuse. Do not include any identifying information such as name, email address, or phone number.
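One way to meet the "no identifying information" requirement is to send a one-way hash of your internal user identifier rather than the identifier itself. A sketch; the helper name, salt, and sample id are our own:

```python
import hashlib

def anonymous_user_id(internal_id: str, salt: str = "my-app-salt") -> str:
    # A salted SHA-256 digest: stable per user, but not reversible to a
    # name, email address, or phone number.
    return hashlib.sha256((salt + internal_id).encode()).hexdigest()

metadata = {"user_id": anonymous_user_id("user-12345")}
```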


completion
string
The resulting completion, up to and excluding the stop sequences.
stop_reason
string
"stop_sequence" or "max_tokens"
The reason we stopped sampling:
  • "stop_sequence": we reached a stop sequence — either provided by you via the stop_sequences parameter, or a stop sequence built into the model
  • "max_tokens": we exceeded max_tokens_to_sample or the model's maximum
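A response handler might branch on stop_reason, for example to detect truncated completions. A sketch over the response shape described above; the sample response is illustrative, not captured API output:

```python
def check_truncation(response: dict) -> bool:
    # "max_tokens" means sampling hit max_tokens_to_sample (or the model's
    # maximum) before any stop sequence, so the completion may be cut off.
    return response["stop_reason"] == "max_tokens"

# Illustrative response, not real API output:
resp = {"completion": "Leaves whisper softly", "stop_reason": "stop_sequence"}
```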



# Synchronous request: only replies with the full response
export API_KEY=my_api_key
curl https://api.anthropic.com/v1/complete \
  -H "x-api-key: $API_KEY" \
  -H 'content-type: application/json' \
  -d '{
    "prompt": "\n\nHuman: Tell me a haiku about trees\n\nAssistant: ",
    "model": "claude-v1", "max_tokens_to_sample": 300, "stop_sequences": ["\n\nHuman:"]
  }'

# Streaming request: sends the response as it's generated
# The -N arg tells curl not to buffer
export API_KEY=my_api_key
curl -N https://api.anthropic.com/v1/complete \
  -H "x-api-key: $API_KEY" \
  -H 'content-type: application/json' \
  -d '{
    "prompt": "\n\nHuman: Tell me a haiku about trees\n\nAssistant: ",
    "model": "claude-v1", "max_tokens_to_sample": 300, "stop_sequences": ["\n\nHuman:"],
    "stream": true
  }'
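Since each streamed SSE event carries a full JSON object (as noted above), parsing the stream client-side amounts to stripping the data: prefix and decoding each line. A sketch; the sample lines are illustrative, and the [DONE] sentinel handling is an assumption about the stream's terminator:

```python
import json

def parse_sse_line(line: str):
    # SSE frames payloads as "data: <payload>"; blank lines separate events.
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload == "[DONE]":  # assumed end-of-stream sentinel
        return None
    # Each event's payload is a complete JSON object.
    return json.loads(payload)

# Illustrative event lines, not real API output:
event = parse_sse_line('data: {"completion": "Leaves", "stop_reason": null}')
```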


Python library

See the Python library GitHub repo.


TypeScript library

See the TypeScript library GitHub repo.
