Here is an example of how to use the GPT-4o API in Python to enable real-time interaction:
```python
import io

import openai
import speech_recognition as sr

openai.api_key = "YOUR_API_KEY"

# Initialize the speech recognizer
r = sr.Recognizer()

# Start the real-time interaction loop
while True:
    try:
        # Listen for audio input
        with sr.Microphone() as source:
            print("Listening...")
            audio = r.listen(source)

        # Wrap the captured audio in a named, file-like object so the
        # transcription endpoint can detect its format
        wav_buffer = io.BytesIO(audio.get_wav_data())
        wav_buffer.name = "speech.wav"

        # Transcribe the audio with Whisper
        transcription = openai.Audio.transcribe(
            model="whisper-1",
            file=wav_buffer,
        )

        # Generate a response using GPT-4o (a chat model, so it is called
        # through the chat completions endpoint)
        response = openai.ChatCompletion.create(
            model="gpt-4o",
            messages=[
                {"role": "user", "content": transcription.text},
            ],
            max_tokens=100,
            n=1,
            stop=None,
            temperature=0.7,
        )

        # Print the assistant's response
        print(f"Assistant: {response.choices[0].message.content}")
    except Exception as e:
        print(f"Error: {e}")
```
Key points:
- Use the `speech_recognition` library to capture audio input from the microphone
- Transcribe the audio using the `openai.Audio.transcribe()` function with the "whisper-1" model
- Generate a response using `openai.ChatCompletion.create()` with the "gpt-4o" model (GPT-4o is a chat model, so it is served through the chat completions endpoint rather than the legacy completions endpoint)
- Pass the transcribed text to GPT-4o as the user message
- Adjust parameters like `max_tokens` and `temperature` to customize the response
- Print the assistant's response to enable real-time interaction; to speak the reply aloud instead of only printing it, see the text-to-speech sketch after this list
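A console `print()` is enough for testing, but a voice assistant usually reads its reply aloud. A minimal sketch of that last step, using the offline text-to-speech library `pyttsx3` (an addition for illustration, not something used in the cited sources), could look like this:

```python
import pyttsx3  # offline text-to-speech; assumed here, not part of the original example

# Initialize the TTS engine once, outside the interaction loop
tts_engine = pyttsx3.init()

def speak(text):
    """Read the assistant's reply aloud."""
    tts_engine.say(text)
    tts_engine.runAndWait()

# Inside the loop, after generating the response:
# reply = response.choices[0].message.content
# print(f"Assistant: {reply}")
# speak(reply)
```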
This example demonstrates how to leverage the GPT-4o API for real-time voice interaction. The user's speech is captured, transcribed using Whisper, and then passed to GPT-4o to generate a relevant response. The response is then printed, creating a conversational experience.
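One limitation of the loop above is that each request contains only the latest utterance, so the model has no memory of earlier turns. A minimal sketch of multi-turn context, keeping the same legacy SDK calls as the example, is to accumulate the chat history in a `messages` list and resend it on every request:

```python
import openai

openai.api_key = "YOUR_API_KEY"

# Accumulate the conversation so GPT-4o can refer back to earlier turns
messages = [{"role": "system", "content": "You are a helpful voice assistant."}]

def chat_turn(user_text):
    """Append the user's utterance, query GPT-4o, and record the reply."""
    messages.append({"role": "user", "content": user_text})
    response = openai.ChatCompletion.create(
        model="gpt-4o",
        messages=messages,
        max_tokens=100,
        temperature=0.7,
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply

# In the loop above, the ChatCompletion call can be replaced with:
# print(f"Assistant: {chat_turn(transcription.text)}")
```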
Note that this is a simplified example; in a production environment, you would want to add error handling, a user interface, and other features to create a robust and user-friendly voice assistant.
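As one illustration of the kind of hardening involved (a sketch under assumptions of our own, not something shown in the cited sources), you might calibrate the recognizer for background noise, bound how long it listens, and retry transient API failures:

```python
import time

import openai
import speech_recognition as sr

r = sr.Recognizer()

def listen_once(timeout=5.0):
    """Capture one utterance, calibrating for background noise first."""
    with sr.Microphone() as source:
        r.adjust_for_ambient_noise(source, duration=0.5)
        try:
            return r.listen(source, timeout=timeout, phrase_time_limit=15)
        except sr.WaitTimeoutError:
            return None  # nothing was said within the timeout

def with_retries(call, attempts=3):
    """Retry transient API failures with a simple exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except (openai.error.RateLimitError, openai.error.APIConnectionError):
            time.sleep(2 ** attempt)
    raise RuntimeError("API call failed after several retries")

# Example usage inside the loop:
# audio = listen_once()
# if audio is None:
#     continue
```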
The search results indicate that GPT-4o is designed to process audio natively, without requiring a separate transcription model like Whisper, which allows for faster and more efficient real-time interaction than pipelines that rely on external speech recognition components. At the time of the launch announcement, however, the API exposed only text and image inputs for GPT-4o, which is why the example above still routes speech through Whisper; direct audio input was slated to reach the API later.
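For reference, once audio input is exposed through chat completions, a direct-audio request might look roughly like the sketch below. The model name `gpt-4o-audio-preview`, the `modalities` parameter, and the `input_audio` content format follow OpenAI's later audio-in-chat-completions interface and are not taken from the cited sources; the sketch also uses the newer 1.x Python SDK rather than the legacy calls above.

```python
import base64

from openai import OpenAI  # newer 1.x SDK client

client = OpenAI(api_key="YOUR_API_KEY")

# Read a short WAV recording and base64-encode it for the request
with open("speech.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

# Send the audio directly to an audio-capable GPT-4o model; no separate Whisper step
response = client.chat.completions.create(
    model="gpt-4o-audio-preview",  # assumed audio-capable model name
    modalities=["text"],           # request a plain text reply
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please respond to this voice message."},
                {"type": "input_audio", "input_audio": {"data": audio_b64, "format": "wav"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```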
Citations:
[1] https://github.com/TheStoneMX/conversation_with_GPT4o
[2] https://deepgram.com/learn/how-to-make-the-most-of-gpt-4o
[3] https://tilburg.ai/2024/05/tutorial-gpt-4o-api/
[4] https://apidog.com/blog/gpt-4o-api/
[5] https://community.openai.com/t/announcing-gpt-4o-in-the-api/744700?page=3