Beginner’s Guide to Working with Chat Completions Models in Azure OpenAI
1. What Are Chat Completion Models?
OpenAI’s latest models — GPT-3.5-Turbo, GPT-4, and GPT-4o — are chat-based models.
They are designed to work like a conversation between you and the AI.
Unlike older GPT-3 models where you gave a plain text prompt and got back plain text, these new chat models use a structured conversation format:
- You send messages in a list, where each message has:
  - role: who is speaking: "system", "user", or "assistant".
  - content: what they are saying.
- The AI responds with its own message (with the "assistant" role).
2. Key Roles in Chat Messages
- System Role
  - Sets the tone, personality, or rules for the assistant.
  - Example:
    {"role": "system", "content": "You are a helpful assistant for finance and data analysis questions."}
- User Role
  - Represents your question or input.
  - Example:
    {"role": "user", "content": "Who were the founders of Microsoft?"}
- Assistant Role
  - Represents the AI's previous answer.
  - Used for context when continuing a conversation.
3. How a Basic Conversation Works
Every API request sends all previous messages so the AI remembers context.
Example conversation flow:
[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who were the founders of Microsoft?"},
{"role": "assistant", "content": "Microsoft was founded by Bill Gates and Paul Allen."},
{"role": "user", "content": "When was it founded?"}
]
4. Setting Up Your Environment
You need:
- Python installed
- An Azure OpenAI account with:
  - An API key (saved as AZURE_OPENAI_API_KEY)
  - An endpoint URL (saved as AZURE_OPENAI_ENDPOINT)
- The OpenAI Python SDK, installed with:
pip install openai
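Before moving on, it's worth confirming that both environment variables are actually visible to Python. A quick optional sanity check:
import os

# Confirm both variables are visible to Python before going further
for var in ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"):
    print(var, "is set" if os.getenv(var) else "is MISSING")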
5. Python Example — Single Question
import os
from openai import AzureOpenAI
# Create a client to connect to Azure OpenAI
client = AzureOpenAI(
api_key=os.getenv("AZURE_OPENAI_API_KEY"), # Your API key from Azure
api_version="2024-10-21", # API version
azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT") # Endpoint URL from Azure
)
# Prepare the conversation
messages = [
{"role": "system", "content": "You are a helpful assistant for answering general knowledge questions."},
{"role": "user", "content": "Who were the founders of Microsoft?"}
]
# Send the request
response = client.chat.completions.create(
model="gpt-4o", # Your deployed model name
messages=messages
)
# Print the model's reply
print(response.choices[0].message.content)
6. Explanation of Each Part
- import os → lets you read environment variables (for security, never hardcode your API key).
- from openai import AzureOpenAI → imports the Azure OpenAI client library.
- AzureOpenAI(...) → creates the connection to the Azure OpenAI service:
  - api_key: your secret API key (from the Azure portal).
  - api_version: the version of the API (given by Azure).
  - azure_endpoint: your Azure endpoint URL.
- messages → a list of conversation messages, each with "role" and "content".
- model="gpt-4o" → the deployment name of your model (configured in Azure).
- response.choices[0].message.content → the actual AI-generated text.
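The response object carries more than just the text. A short addition to the example above that prints the metadata you'll most often need (these fields are standard in the openai v1 SDK):
# The same response object also carries useful metadata
choice = response.choices[0]
print("Reply:", choice.message.content)
print("Finish reason:", choice.finish_reason)       # "stop" = finished naturally
print("Total tokens:", response.usage.total_tokens)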
7. Multiple Turns (Conversation Loop)
If you want to keep chatting without losing context:
conversation = [
{"role": "system", "content": "You are a friendly financial analyst assistant."}
]
while True:
user_input = input("You: ") # Ask user for input
conversation.append({"role": "user", "content": user_input})
response = client.chat.completions.create(
model="gpt-4o",
messages=conversation
)
ai_reply = response.choices[0].message.content
print("AI:", ai_reply)
conversation.append({"role": "assistant", "content": ai_reply})
How it works:
- conversation starts with a system message.
- Each new user input is appended to the conversation.
- The AI's response is appended as well.
- This keeps the full history, so the AI remembers previous exchanges.
8. Token Limits (Important!)
Each model can only handle a certain number of tokens (words and pieces of words) at once, counting both your input messages and the reply.
Example:
- gpt-4 → 8,192 tokens
- gpt-4o and gpt-4o-mini → 128k tokens
If your conversation is too long, you must remove older messages or the API will return an error.
9. Example — Trimming Old Messages
MAX_MESSAGES = 10                     # Set a safe limit on conversation length
if len(conversation) > MAX_MESSAGES:  # If the conversation is too long
    conversation.pop(1)               # Remove the oldest non-system message (index 0 is the system message)
10. Common Errors & Fixes
| Error Code | Meaning | Fix |
| --- | --- | --- |
| 400 | "Special tokens" error | Remove legacy ChatML tokens such as <\|im_start\|> from your messages |
| 500 | Internal server error | Retry the request; lowering temperature below 1 can also help |
| length (finish_reason) | Output too long (cut off) | Increase max_tokens |
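In code, these checks can look like the following sketch (it reuses the client and messages from section 5):
# A minimal defensive call (sketch)
try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        max_tokens=500,
    )
    if response.choices[0].finish_reason == "length":
        print("Reply was cut off; raise max_tokens or shorten the prompt")
    else:
        print(response.choices[0].message.content)
except Exception as e:
    print("Request failed:", e)  # e.g., a transient 500; wait briefly and retry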
11. A Reusable Template for Your Projects
Here’s a full reusable template for your projects:
import os
from openai import AzureOpenAI
# Initialize client
client = AzureOpenAI(
api_key=os.getenv("AZURE_OPENAI_API_KEY"),
api_version="2024-10-21",
azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)
# Start conversation
conversation = [
{"role": "system", "content": "You are a helpful assistant specialized in financial data analysis."}
]
def ask_gpt(prompt):
"""Send a message to GPT and get a reply."""
conversation.append({"role": "user", "content": prompt})
response = client.chat.completions.create(
model="gpt-4o",
messages=conversation,
temperature=0.7,
max_tokens=500
)
reply = response.choices[0].message.content
conversation.append({"role": "assistant", "content": reply})
return reply
# Example usage
print(ask_gpt("Give me 3 financial KPIs for a retail store."))
print(ask_gpt("Now explain them with examples."))
Chat Completions with Azure OpenAI — Beginner’s Guide + Reusable Syntax
0) What you’re building
You’ll send a list of messages (system / user / assistant) to a chat model (e.g., gpt-4o). The API returns an assistant message. For multi-turn chats, you resend the full running transcript each time (the API itself is stateless).
1) Setup (once)
What each piece means
- API key: Secret key from your Azure OpenAI resource.
- Endpoint: Your Azure OpenAI endpoint (e.g., https://<your-resource>.openai.azure.com).
- API version: A date string (e.g., 2024-10-21) that picks the REST API surface.
Install + environment variables
# Install the SDK
pip install openai
# Set environment variables (macOS / Linux; add to ~/.zshrc if you want persistence)
export AZURE_OPENAI_API_KEY="YOUR_KEY_HERE"
export AZURE_OPENAI_ENDPOINT="https://YOUR-RESOURCE.openai.azure.com"
Minimal client creation (Azure)
import os
from openai import AzureOpenAI
client = AzureOpenAI(
api_key=os.getenv("AZURE_OPENAI_API_KEY"),
api_version="2024-10-21",
azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)
Note (Azure deployment name): In Azure, the model= field expects your deployment name (the label you gave the deployed model), not the raw model family string. Many people name the deployment “gpt-4o”, which is why examples use that.
2) Message roles and structure
Roles
- system — frames the assistant’s identity, tone, scope, and rules.
- user — your questions or tasks.
- assistant — the model’s prior replies (you include them to preserve context).
Minimal single-turn example
messages = [
{"role": "system", "content": "You are a helpful assistant for finance and data analysis."},
{"role": "user", "content": "Who were the founders of Microsoft?"}
]
Call the API
response = client.chat.completions.create(
model="gpt-4o", # <- your Azure deployment name
messages=messages
)
print(response.choices[0].message.content)
3) Understanding the response (finish_reason, usage, etc.)
Key fields you’ll use
- response.choices[0].message.content — the assistant’s text answer.
- response.choices[0].finish_reason — why generation stopped:
  - "stop" — completed normally.
  - "length" — hit max_tokens or the context window.
  - "content_filter" — the safety filter removed content.
  - None / null — incomplete/in-progress (rare in sync calls).
- response.usage — token counts (prompt, completion, total).
Tip: set max_tokens a bit higher than you think you need, to avoid "length" endings.
Syntax (print common metadata)
choice = response.choices[0]
print("FINISH:", choice.finish_reason)
print("USAGE:", response.usage)
print("REPLY:", choice.message.content)
4) Single-turn call — fully annotated
import os
from openai import AzureOpenAI
client = AzureOpenAI(
api_key=os.getenv("AZURE_OPENAI_API_KEY"),
api_version="2024-10-21",
azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)
messages = [
# SYSTEM: set scope, tone, and boundaries
{"role": "system", "content":
"You are a concise assistant for general knowledge. If unsure, say so."},
# USER: the actual question
{"role": "user", "content": "Who were the founders of Microsoft?"}
]
response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
temperature=0.7, # randomness (0=deterministic, ~0.7=balanced)
max_tokens=300 # upper bound for the reply length
)
print(response.choices[0].message.content)
5) Give the model stronger instructions (guardrails)
Use the system message to constrain behaviour.
messages = [
{"role": "system", "content":
"You are an assistant that only answers UK personal tax questions. \
If unsure, say 'I do not know' and direct users to GOV.UK."},
{"role": "user", "content": "When are my taxes due?"}
]
6) Grounding with small “built-in” context
Put brief facts in the system message; for larger documents, use retrieval (embeddings + vector search), then insert retrieved snippets into the prompt.
messages = [
{"role": "system", "content":
"You answer questions about Azure OpenAI using ONLY this context:\n"
"- Azure OpenAI gives API access to OpenAI models via Azure.\n"
"- Use enterprise security/compliance in Azure.\n"
"If not in the context, say 'I do not know.'"},
{"role": "user", "content": "What is Azure OpenAI?"}
]
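For larger documents, the retrieval step mentioned above might look like this sketch. It assumes an embeddings deployment named "text-embedding-3-small" (a placeholder for whatever you deployed) and keeps the "vector search" to a plain cosine similarity over a handful of snippets:
import math

# Hypothetical snippets standing in for your document store
snippets = [
    "Azure OpenAI gives API access to OpenAI models via Azure.",
    "Azure OpenAI supports enterprise security and compliance features.",
    "Deployments are created per model in the Azure portal.",
]

def embed(texts):
    # Assumes an embeddings deployment named "text-embedding-3-small" exists
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in resp.data]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

question = "What is Azure OpenAI?"
snippet_vecs = embed(snippets)
question_vec = embed([question])[0]

# Insert the best-matching snippet into the system message
best = max(range(len(snippets)), key=lambda i: cosine(question_vec, snippet_vecs[i]))
messages = [
    {"role": "system", "content":
        "Answer ONLY from this context:\n" + snippets[best] +
        "\nIf not in the context, say 'I do not know.'"},
    {"role": "user", "content": question},
]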
7) Few-shot learning (examples inline)
Seed the model with example Q&A pairs before the real question.
messages = [
{"role": "system", "content":
"You are an assistant for UK tax FAQs. Be brief and cite official sources."},
# Example 1
{"role": "user", "content": "When do I need to file my taxes?"},
{"role": "assistant", "content":
"Most Self Assessment returns are due by 31 January. See GOV.UK."},
# Example 2
{"role": "user", "content": "How do I check my tax refund status?"},
{"role": "assistant", "content":
"Use your HMRC online account. See GOV.UK."},
# Actual prompt
{"role": "user", "content": "What is the penalty for late filing?"}
]
8) Multi-turn conversation loop (console)
The API has no memory. You must resend the whole conversation history each time.
conversation = [
{"role": "system", "content": "You are a friendly finance/data assistant."}
]
while True:
user_text = input("You: ")
conversation.append({"role": "user", "content": user_text})
resp = client.chat.completions.create(
model="gpt-4o",
messages=conversation
)
ai_text = resp.choices[0].message.content
print("AI:", ai_text, "\n")
conversation.append({"role": "assistant", "content": ai_text})
9) Token limits and trimming
Each model has a context window (sum of input messages + output). If you exceed it, you’ll get errors or truncated outputs.
Strategies:
- Keep a short, strict system message.
- Trim old turns (keep the system message).
- Summarize older context into a single shorter message (see the sketch after the trimming pattern below).
Minimal trimming pattern
MAX_TURNS = 20  # keep it short; tune for your model/window
while len(conversation) > MAX_TURNS:
    # Keep the system message at index 0; drop the oldest message after it
    conversation.pop(1)
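The summarization strategy can be implemented by asking the model itself to compress older turns into one message. A minimal sketch, assuming the client from section 1 and a gpt-4o deployment:
def compress_history(conversation, keep_last=4):
    """Summarize older turns into a single message (sketch)."""
    if len(conversation) <= keep_last + 1:   # just system message + recent turns
        return conversation
    system_msg = conversation[0]
    old = conversation[1:-keep_last]
    recent = conversation[-keep_last:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": "Summarize this conversation in under 100 words:\n" + transcript}],
        max_tokens=150,
    ).choices[0].message.content
    return [system_msg,
            {"role": "assistant", "content": "(Summary of earlier turns) " + summary},
            *recent]

conversation = compress_history(conversation)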
(Optional) Estimate tokens with tiktoken
This is approximate; good enough for safety checks.
pip install tiktoken --upgrade
import tiktoken
enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-family encoding; GPT-4o actually uses o200k_base, but this is close enough for rough estimates
def estimate_tokens(msgs):
# crude but useful: count characters of content + small overhead per message
total = 0
for m in msgs:
total += len(enc.encode(m.get("content","")))
total += 4 # overhead per message (rule of thumb)
return total
TOKEN_BUDGET = 8000 # e.g., for an 8k model
MAX_REPLY = 500
while estimate_tokens(conversation) + MAX_REPLY > TOKEN_BUDGET:
# preserve system at index 0; drop the oldest after it
conversation.pop(1)
10) Non-chat tasks with chat format (e.g., entity extraction)
You can ask the model to output JSON.
messages = [
{"role": "system", "content":
"Extract {name, company, phone} as JSON. If missing, use null. \
Return ONLY JSON, no extra text."},
{"role": "user", "content":
"Hello, I'm Robert Smith from Contoso Insurance, Delaware. \
Call me at (555) 346-9322."}
]
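To actually consume the result, parse the reply as JSON. A sketch: the response_format parameter below is supported for gpt-4o on recent API versions (worth verifying for yours) and nudges the model to emit strictly valid JSON:
import json

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    response_format={"type": "json_object"},  # ask for strictly valid JSON
)
try:
    data = json.loads(response.choices[0].message.content)
    print(data.get("name"), data.get("phone"))
except json.JSONDecodeError:
    print("Model did not return valid JSON; consider retrying")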
11) Parameters that shape the output
- temperature (0 → deterministic, ~0.7 balanced creativity)
- top_p (alternative to temperature; usually keep the default)
- presence_penalty (discourage repetition; −2.0 to 2.0)
- frequency_penalty (discourage frequent tokens; −2.0 to 2.0)
- max_tokens (upper bound on reply length)
- stop (list of strings that, if generated, cut off the output)
Syntax (custom shaping)
response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
temperature=0.4,
presence_penalty=0.2,
frequency_penalty=0.2,
max_tokens=400,
stop=["\nUser:"]
)
12) Error handling & retries (robust patterns)
Common issues:
- 400 “special tokens” — you pasted legacy ChatML tokens like <|im_start|>; remove them.
- 500 transient server error — retry with backoff.
- content_filter finish — your prompt or response triggered safety filters.
- length finish — increase max_tokens or trim the prompt.
Syntax (safe call with retries)
import time
from openai import APIConnectionError, RateLimitError
def safe_chat_call(messages, attempts=3, delay=2):
for i in range(attempts):
try:
return client.chat.completions.create(
model="gpt-4o",
messages=messages,
temperature=0.7,
max_tokens=500
)
except (APIConnectionError, RateLimitError) as e:
if i == attempts - 1:
raise
time.sleep(delay * (2 ** i)) # exponential backoff
response = safe_chat_call(messages)
13) A clean, reusable function you can drop into any project
import os
from openai import AzureOpenAI
client = AzureOpenAI(
api_key=os.getenv("AZURE_OPENAI_API_KEY"),
api_version="2024-10-21",
azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)
conversation = [
{"role": "system", "content": "You are a precise assistant for financial data analysis."}
]
def ask_gpt(prompt, max_reply_tokens=500, temp=0.7):
"""Send a prompt, get a reply, and maintain context automatically."""
conversation.append({"role": "user", "content": prompt})
# Keep the conversation within a safe size
MAX_TURNS = 24
while len(conversation) > MAX_TURNS:
conversation.pop(1) # keep system message
resp = client.chat.completions.create(
model="gpt-4o",
messages=conversation,
temperature=temp,
max_tokens=max_reply_tokens
)
reply = resp.choices[0].message.content
conversation.append({"role": "assistant", "content": reply})
return reply
# Example usage
print(ask_gpt("List 3 KPIs for retail finance and define each."))
print(ask_gpt("Give a short example for each KPI with numbers."))
14) Conversation persistence (optional)
Save and load the running transcript between sessions.
import json
def save_conv(path="conversation.json"):
with open(path, "w") as f:
json.dump(conversation, f, ensure_ascii=False, indent=2)
def load_conv(path="conversation.json"):
global conversation
with open(path) as f:
conversation = json.load(f)
# save_conv() after a session; load_conv() at startup
15) Quick “starter syntax” cheat sheet (copy-paste)
Create client
from openai import AzureOpenAI
client = AzureOpenAI(api_key=..., api_version="2024-10-21", azure_endpoint=...)
Single call
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Question here"}
]
resp = client.chat.completions.create(model="gpt-4o", messages=messages, max_tokens=400)
print(resp.choices[0].message.content)
Few-shot
messages = [
{"role": "system", "content": "You answer UK tax FAQs."},
{"role": "user", "content": "Q1?"}, {"role": "assistant", "content": "A1."},
{"role": "user", "content": "Q2?"}, {"role": "assistant", "content": "A2."},
{"role": "user", "content": "Real question now."}
]
Loop with context
conv = [{"role":"system","content":"You are concise and accurate."}]
# append {"role":"user","content":...} then call; append reply as {"role":"assistant",...}
Read metadata
c = resp.choices[0]
print(c.finish_reason, resp.usage, c.message.content)
Trim history
if len(conv) > 24:
conv.pop(1) # keep system message
Error-safe call
try:
resp = client.chat.completions.create(model="gpt-4o", messages=messages)
except Exception as e:
print("Error:", e)
Ground small context
{"role":"system","content":"Answer ONLY from this context:\n- point 1\n- point 2\nIf unsure, say 'I do not know.'"}
Non-chat structured output
{"role":"system","content":"Extract {name, phone} as JSON only."}
16) Practical tips (so your results stay good)
- Keep the system message short and specific.
- Use few-shot to shape style/format (one or two strong examples).
- Prefer temperature ~0.2–0.7 for analysis tasks; raise for ideation.
- Always check finish_reason and usage to debug long replies.
- Avoid legacy ChatML tokens (<|im_start|>, etc.) in your messages.
Complete Beginner's Guide to Chat Completion Models
What Are Chat Completion Models?
Chat completion models are AI language models specifically designed for conversations. Think of them as the "brain" behind chatbots like ChatGPT. The most popular ones include:
- GPT-3.5-Turbo - Fast and cost-effective
- GPT-4 - More capable and accurate
- GPT-4o - A faster, cheaper "omni" version of GPT-4
How Are They Different from Older Models?
Old Way (Text Completion)
Input: "Complete this sentence: The weather today is"
Output: "sunny and warm with a gentle breeze"
You give it text, it continues writing.
New Way (Chat Completion)
Input: A conversation with roles
[
{"role": "user", "content": "How's the weather?"}
]
Output: A response as part of the conversation
{"role": "assistant", "content": "I don't have access to current weather data..."}
You give it a conversation, it responds as a participant.
Understanding the Message Format
Every chat completion uses messages with three possible roles:
1. System Role
- Purpose: Sets the rules and personality for the AI
- Think of it as: Instructions for how the AI should behave
- Example: "You are a helpful math tutor who explains things simply"
2. User Role
- Purpose: Represents what humans say
- Think of it as: Your questions or comments
- Example: "Can you help me with algebra?"
3. Assistant Role
- Purpose: Represents what the AI says back
- Think of it as: The AI's responses
- Example: "Of course! What algebra topic would you like help with?"
Basic Structure
Every chat completion request looks like this:
messages = [
{"role": "system", "content": "Instructions for the AI"},
{"role": "user", "content": "Your question or message"},
{"role": "assistant", "content": "AI's previous response"}, # Optional
{"role": "user", "content": "Your follow-up question"}
]
Step-by-Step Implementation
Step 1: Set Up Your Environment
First, install the required library:
pip install openai tiktoken
Set up your environment variables:
export AZURE_OPENAI_API_KEY="your-api-key"
export AZURE_OPENAI_ENDPOINT="your-endpoint"
Step 2: Your First Chat Completion
import os
from openai import AzureOpenAI
# Create the client
client = AzureOpenAI(
api_key=os.getenv("AZURE_OPENAI_API_KEY"),
api_version="2024-10-21",
azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)
# Create your first chat completion
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{
"role": "system",
"content": "You are a friendly assistant who explains things clearly."
},
{
"role": "user",
"content": "What is Python programming?"
}
]
)
# Get the AI's response
answer = response.choices[0].message.content
print(answer)
Step 3: Understanding the Response
When you make a request, you get back a response object with several parts:
{
"id": "unique-request-id",
"choices": [
{
"message": {
"role": "assistant",
"content": "Python is a programming language..."
},
"finish_reason": "stop" # Why it stopped responding
}
],
"usage": {
"prompt_tokens": 25, # Tokens in your input
"completion_tokens": 150, # Tokens in AI's response
"total_tokens": 175 # Total tokens used
}
}
Finish Reasons Explained:
- "stop" - AI finished its response naturally
- "length" - Hit the maximum token limit
- "content_filter" - Response was filtered for safety
- null - Response is still being generated
Different Ways to Use System Messages
1. Basic Assistant
{"role": "system", "content": "You are a helpful assistant."}
2. Specialized Role
{
"role": "system",
"content": "You are a patient math tutor. Always show your work step-by-step and encourage the student."
}
3. With Specific Rules
{
"role": "system",
"content": """You are a customer service bot for a pizza restaurant.
Rules:
- Only answer questions about pizza, orders, and store hours
- If asked about other topics, politely redirect to pizza-related questions
- Always be friendly and helpful
- If you don't know something, say so clearly"""
}
4. With Background Knowledge
{
"role": "system",
"content": """You are an expert on our company's products.
Our Products:
- Widget A: $10, red color, waterproof
- Widget B: $15, blue color, includes warranty
- Widget C: $20, green color, premium version
Store Hours: Mon-Fri 9AM-6PM
Phone: (555) 123-4567"""
}
Teaching the AI with Examples (Few-Shot Learning)
You can show the AI how to respond by including example conversations:
messages = [
{"role": "system", "content": "You are a translator that converts English to simple explanations."},
# Example 1
{"role": "user", "content": "What does 'precipitation' mean?"},
{"role": "assistant", "content": "Precipitation means rain, snow, or any water falling from the sky."},
# Example 2
{"role": "user", "content": "What does 'velocity' mean?"},
{"role": "assistant", "content": "Velocity means how fast something is moving in a specific direction."},
# Your actual question
{"role": "user", "content": "What does 'photosynthesis' mean?"}
]
Building a Conversation Loop
Here's how to create a chatbot that remembers previous messages:
import os
from openai import AzureOpenAI
client = AzureOpenAI(
api_key=os.getenv("AZURE_OPENAI_API_KEY"),
api_version="2024-10-21",
azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)
# Start with system message
conversation = [
{"role": "system", "content": "You are a helpful assistant."}
]
print("Chat with the AI! Type 'quit' to exit.")
while True:
# Get user input
user_message = input("You: ")
# Check if user wants to quit
if user_message.lower() == 'quit':
break
# Add user message to conversation
conversation.append({"role": "user", "content": user_message})
# Get AI response
response = client.chat.completions.create(
model="gpt-4o",
messages=conversation
)
# Extract the response
ai_message = response.choices[0].message.content
# Add AI response to conversation
conversation.append({"role": "assistant", "content": ai_message})
# Show the response
print(f"AI: {ai_message}")
Understanding Tokens and Limits
What Are Tokens?
Tokens are pieces of text that the AI processes. Think of them like words, but not exactly:
- 1 token ≈ 4 characters in English
- 1 token ≈ ¾ of a word on average
- "Hello world" = about 2 tokens
- "The quick brown fox" = about 4 tokens
Token Limits by Model
- GPT-3.5-Turbo: 4,096 tokens
- GPT-4: 8,192 tokens
- GPT-4-32k: 32,768 tokens
Your input messages + AI response must fit within these limits.
Managing Long Conversations
Here's a simple way to handle token limits:
import tiktoken
def count_tokens(messages, model="gpt-4"):
"""Count how many tokens are in your messages"""
try:
encoding = tiktoken.encoding_for_model(model)
except KeyError:
encoding = tiktoken.get_encoding("cl100k_base")
token_count = 0
for message in messages:
# Each message has some overhead
token_count += 4
for key, value in message.items():
token_count += len(encoding.encode(value))
# AI response preparation
token_count += 2
return token_count
# Example usage
conversation = [
{"role": "system", "content": "You are helpful."},
{"role": "user", "content": "Hello!"},
{"role": "assistant", "content": "Hi there!"},
{"role": "user", "content": "How are you?"}
]
tokens_used = count_tokens(conversation)
print(f"This conversation uses {tokens_used} tokens")
# If getting close to limit, remove old messages
MAX_TOKENS = 4000
if tokens_used > MAX_TOKENS:
# Keep system message, remove oldest user/assistant pairs
conversation = [conversation[0]] + conversation[-4:]
Common Use Cases and Examples
1. Customer Service Bot
messages = [
{
"role": "system",
"content": """You are a customer service representative for TechCorp.
Available Services:
- Technical support (Mon-Fri 9AM-5PM)
- Billing questions (24/7)
- Product returns (within 30 days)
Be helpful, professional, and always ask how you can assist further."""
},
{"role": "user", "content": "I need help with my billing"}
]
2. Code Helper
messages = [
{
"role": "system",
"content": """You are a programming tutor. When helping with code:
- Always explain what the code does
- Point out best practices
- Suggest improvements when appropriate
- Use simple language for beginners"""
},
{"role": "user", "content": "How do I create a list in Python?"}
]
3. Data Extraction
messages = [
{
"role": "system",
"content": """Extract information from text and return it as JSON.
Format:
{
"name": "person's name",
"company": "company name",
"phone": "phone number",
"email": "email address"
}
If information is missing, use null."""
},
{
"role": "user",
"content": "Hi, I'm Sarah Johnson from Acme Corp. You can reach me at sarah@acme.com or call (555) 123-4567."
}
]
Error Handling and Best Practices
1. Always Handle Errors
try:
response = client.chat.completions.create(
model="gpt-4o",
messages=messages
)
if response.choices[0].finish_reason == "length":
print("Response was cut off due to length limit")
elif response.choices[0].finish_reason == "content_filter":
print("Response was filtered for safety reasons")
else:
print(response.choices[0].message.content)
except Exception as e:
print(f"Error occurred: {e}")
2. Set Reasonable Token Limits
response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
max_tokens=500, # Limit response length
temperature=0.7 # Control randomness (0-2)
)
3. Use Temperature Settings
- Temperature 0: Deterministic, same response every time
- Temperature 0.7: Balanced creativity and consistency
- Temperature 1.5: More creative and varied responses
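A quick way to feel the difference is to send the same prompt at two temperatures (a sketch using the client from earlier; run it a few times and compare the variation):
# Same prompt at two temperatures: 0 should be near-identical across runs,
# 1.5 noticeably more varied
for temp in (0.0, 1.5):
    r = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Name a color."}],
        temperature=temp,
        max_tokens=10,
    )
    print(temp, "->", r.choices[0].message.content)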
Common Mistakes to Avoid
1. Not Including System Messages
# Bad: No guidance for the AI
messages = [
{"role": "user", "content": "Help me"}
]
# Good: Clear instructions
messages = [
{"role": "system", "content": "You are a helpful coding assistant"},
{"role": "user", "content": "Help me with Python"}
]
2. Forgetting Conversation History
# Bad: AI has no memory of previous messages
for question in user_questions:
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": question}]
)
# Good: Include conversation history
conversation = [{"role": "system", "content": "You are helpful"}]
for question in user_questions:
conversation.append({"role": "user", "content": question})
response = client.chat.completions.create(
model="gpt-4o",
messages=conversation
)
conversation.append({"role": "assistant", "content": response.choices[0].message.content})
3. Not Monitoring Token Usage
# Always check token usage
response = client.chat.completions.create(
model="gpt-4o",
messages=messages
)
print(f"Tokens used: {response.usage.total_tokens}")
print(f"Cost estimate: ${response.usage.total_tokens * 0.00003:.4f}") # Rough estimate
Quick Reference
Basic Request Structure
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": "System instructions"},
{"role": "user", "content": "User message"}
],
max_tokens=500,
temperature=0.7
)
answer = response.choices[0].message.content
Message Roles
- system: Instructions and context for the AI
- user: Human messages and questions
- assistant: AI responses (for conversation history)
Important Response Fields
- response.choices[0].message.content: The AI's text response
- response.choices[0].finish_reason: Why it stopped responding
- response.usage.total_tokens: Total tokens used
- response.usage.completion_tokens: Tokens in the AI's response
- response.usage.prompt_tokens: Tokens in your input
Next Steps
Now that you understand the basics:
- Practice: Try building simple chatbots with different personalities
- Experiment: Test different system messages and see how they change behavior
- Learn: Explore advanced features like function calling and streaming responses (see the streaming sketch below)
- Build: Create applications that solve real problems
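As a taste of one of those advanced features, streaming lets you print tokens as they arrive instead of waiting for the full reply. A sketch using the same client; the chunk structure follows the openai v1 SDK:
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain lists in Python in two sentences."}],
    stream=True,                     # yield partial chunks as they are generated
)
for chunk in stream:
    # Some chunks (e.g., initial filter results on Azure) carry no choices
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()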
Remember: Chat completion models are powerful tools, but they work best when you give them clear instructions and proper context. Start simple and gradually add complexity as you become more comfortable with the concepts.