GPT-4o API Real-time interaction with Python


Here is an example of how to use the GPT-4o API in Python to enable real-time voice interaction. It uses the official `openai` client together with the `SpeechRecognition` package (which needs PyAudio for microphone access):

```python
import io

import speech_recognition as sr
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

# Initialize the speech recognizer
r = sr.Recognizer()

# Start the real-time interaction loop
while True:
    try:
        # Listen for audio input
        with sr.Microphone() as source:
            print("Listening...")
            audio = r.listen(source)

        # Wrap the captured WAV bytes in a named file-like object,
        # which the transcription endpoint expects
        wav_file = io.BytesIO(audio.get_wav_data())
        wav_file.name = "speech.wav"

        # Transcribe the audio using Whisper
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=wav_file,
        )

        # Generate a response using GPT-4o (a chat model, so it is
        # called through the Chat Completions endpoint)
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": transcription.text}],
            max_tokens=100,
            temperature=0.7,
        )

        # Print the assistant's response
        print(f"Assistant: {response.choices[0].message.content}")

    except KeyboardInterrupt:
        break
    except Exception as e:
        print(f"Error: {e}")
```

Key points:

- Use the `speech_recognition` library to capture audio input from the microphone (`sr.Microphone` requires PyAudio)
- Wrap the captured audio in a named file-like object and transcribe it with `client.audio.transcriptions.create()` using the "whisper-1" model
- Generate a response with `client.chat.completions.create()` using the "gpt-4o" model
- Pass the transcribed text as the user message to GPT-4o
- Adjust parameters like `max_tokens` and `temperature` to customize the response
- Print the assistant's response to complete each turn of the loop

This example demonstrates how to use the GPT-4o API for real-time voice interaction: the user's speech is captured, transcribed with Whisper, and passed to GPT-4o, which generates a relevant response. Printing that response completes the turn and creates a conversational experience.
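As written, though, each iteration sends only the latest utterance, so the model has no memory of earlier turns. Below is a minimal sketch of carrying conversation history across turns, reusing the `client` from the example above; the system prompt wording is an illustrative assumption.

```python
# Running dialogue history; the system prompt below is illustrative
messages = [{"role": "system", "content": "You are a helpful voice assistant."}]

def chat_turn(user_text: str) -> str:
    # Append the user's transcribed speech to the history
    messages.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        max_tokens=100,
        temperature=0.7,
    )
    reply = response.choices[0].message.content
    # Store the assistant's reply so the next turn has full context
    messages.append({"role": "assistant", "content": reply})
    return reply
```

Inside the loop, the chat completion call would then be replaced with `print(f"Assistant: {chat_turn(transcription.text)}")`.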

Note that this is a simplified example; in a production environment you would want to add more thorough error handling, a user interface, and other features to create a robust and user-friendly voice assistant.
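One natural extension is speaking the reply instead of only printing it. Here is a hedged sketch using OpenAI's text-to-speech endpoint with the same `client`; the model and voice names ("tts-1", "alloy") are assumptions based on current defaults, and playing the saved file back is left to a platform audio player.

```python
from pathlib import Path

def speak(text: str) -> None:
    # Synthesize the reply as MP3 audio and save it to disk
    speech = client.audio.speech.create(
        model="tts-1",   # assumed standard TTS model
        voice="alloy",   # assumed default voice
        input=text,
    )
    Path("reply.mp3").write_bytes(speech.content)
```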

The sources below also note that GPT-4o can process audio directly, without requiring a separate transcription model such as Whisper. This allows faster and more efficient real-time interaction than pipelines that rely on an external speech recognition step.
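At the time of writing, direct audio input through the Chat Completions API is exposed via an audio-capable preview model rather than plain "gpt-4o". The sketch below sends the captured utterance as audio in a single call; the model name "gpt-4o-audio-preview" and the message shape are assumptions that may change as the API evolves.

```python
import base64

# Base64-encode the raw WAV bytes captured by speech_recognition
wav_b64 = base64.b64encode(audio.get_wav_data()).decode("utf-8")

# Send the audio directly, skipping the separate Whisper step
response = client.chat.completions.create(
    model="gpt-4o-audio-preview",  # assumed audio-capable model name
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {"data": wav_b64, "format": "wav"},
                }
            ],
        }
    ],
)
print(f"Assistant: {response.choices[0].message.content}")
```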

Citations:
[1] https://github.com/TheStoneMX/conversation_with_GPT4o
[2] https://deepgram.com/learn/how-to-make-the-most-of-gpt-4o
[3] https://tilburg.ai/2024/05/tutorial-gpt-4o-api/
[4] https://apidog.com/blog/gpt-4o-api/
[5] https://community.openai.com/t/announcing-gpt-4o-in-the-api/744700?page=3