A Python-based SIP relay server with OpenAI integration for intelligent voice interactions. Supports SIP signaling, RTP media handling, WebSocket-driven real-time control, and AI-powered audio transcription and response generation. Designed for environments requiring call routing, audio streaming, SIP integration, and AI-assisted call center functionality.
- SIP Signaling Support: INVITE, ACK, BYE, CANCEL
- RTP Media Streaming: Real-time audio using G.711 (PCMA/PCMU)
- WebSocket Integration: Bi-directional control and audio transmission
- WAV Audio Playback: Automatic audio playback on call establishment
- Call Recording: Saves inbound audio streams to WAV files
- Dual Operation Modes
  - Incoming call handling (server mode)
  - Outgoing call initiation (client mode)
- Dynamic RTP Port Allocation
- Multi-Session Management
- OpenAI Integration
  - Speech-to-Text (Whisper API)
  - Text-to-Speech (TTS API)
  - LLM-powered responses (GPT-4o-mini)
- Call Center Mode
  - Real-time audio buffering and processing
  - AI-powered conversation handling
- Environment-based Configuration
  - Centralized config management
  - `.env` file support

| Command | Format |
|---|---|
| CALL | CALL:{PHONE_NUMBER} |
| RTP | RTP:{PCM Byte String} |
| RTP | RTP:{CALL_ID}##{BASE64 AUDIO} |
| CALL_ANS | CALL_ANS:{CALL_ID} |
| CALL_IGNORE | CALL_IGNORE:{CALL_ID} |
| HANGUP | HANGUP:{CALL_ID} |
| BYE | BYE:{CALL_ID} |
| RING_ANS | RING_ANS:{PHONE_NUMBER} |
| RING_IGNORE | RING_IGNORE:{CALL_ID} |
| CALL_FAILED | CALL_FAILED:{STATUS_CODE} {REASON} |
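
The formats in the table all follow a `NAME:payload` shape, and the second RTP row separates fields with `##`. As a minimal sketch of how such a string could be split, assuming the payload starts after the first colon and that `##` is the only field separator (`WsCommand` and `parse_ws_command` are hypothetical names, not the API in `helper/ws_command.py`):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WsCommand:
    """Hypothetical container; the project's own models live in model/ws_command.py."""

    name: str
    fields: tuple[str, ...]


def parse_ws_command(raw: str) -> WsCommand:
    """Split 'NAME:payload' on the first colon, then split the payload on '##'."""
    name, sep, payload = raw.strip().partition(":")
    if not name:
        raise ValueError(f"not a command: {raw!r}")
    # A bare command name (no colon at all) carries no fields.
    fields = tuple(payload.split("##")) if sep else ()
    return WsCommand(name=name, fields=fields)


print(parse_ws_command("CALL:0912341234"))
print(parse_ws_command("CALL_FAILED:486 Busy Here"))
```

Splitting on the first colon only keeps payloads that contain colons of their own intact.
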
- RelayServer (`receive_server.py`): Handles SIP signaling and orchestrates all subsystems.
- RTPHandler (`helper/rtp_handler.py`): Manages RTP packet sending and receiving.
- SIPRTPSession (`helper/sip_session.py`): Maintains session state and resources.
- SIPMessageParser (`helper/sip_parsers.py`): Parses and validates SIP messages.
- WebSocket Helper (`helper/ws_helper.py`): Handles WebSocket communication.
- LLMHandler (`call_center.py`): Integrates OpenAI services for speech-to-text, text-to-speech, and LLM responses.
- Config (`config.py`): Centralized configuration management with environment variable support.

SIP_server_v2/
    main.py                 # Main entry point
    receive_server.py       # SIP relay server
    call_center.py          # AI call center implementation
    reply_handler.py        # OpenAI integration (STT, TTS, LLM)
    config.py               # Configuration management
    helper/
        rtp_handler.py      # RTP packet handling
        sip_session.py      # Session management
        sip_parsers.py      # SIP message parsing
        ws_helper.py        # WebSocket communication
        ws_command.py       # WebSocket command helpers
        wav_handler.py      # WAV file operations
    model/
        sip_message.py      # SIP message models
        rtp.py              # RTP packet models
        ws_command.py       # WebSocket command models
        call_status.py      # Call status enums
    recording/              # Call recordings
    output/
        convented/          # Converted audio files
        response/           # AI response audio
        transcode/          # Transcoded audio (greeting.wav)

- Python 3.12+
- OpenAI API Key (required for AI features)
- Dependencies:
  - openai >= 2.8.1
  - pydantic >= 2.12.4
  - pydub >= 0.25.1
  - python-dotenv >= 1.2.1
  - websockets >= 15.0.1
  - realtimestt >= 0.3.104

git clone <repository-url>
cd SIP_server_v2
uv sync

Create a `.env` file:
# Required
OPENAI_API_KEY=your_openai_api_key_here
# SIP Configuration (optional, defaults shown)
SIP_LOCAL_IP=192.168.1.101
SIP_LOCAL_PORT=5062
SIP_TRANSFER_PORT=5060
SIP_SERVER_IP=192.168.1.170
# WebSocket Configuration (optional)
WS_HOST=192.168.1.101
WS_PORT=8080
# RTP Configuration (optional)
RTP_PORT_START=31000
RTP_PORT_END=31010
# Logging (optional)
LOG_LEVEL=INFO
# Call Center (optional)
CALL_CENTER_BUFFER_SIZE=120

The server uses a centralized `Config` class that loads settings from environment variables (`.env` file). All configuration is managed through the `config.py` module.

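| Select | ID | Description | Source type | Source ID | Caller prefix | Callee prefix | Destination type | Destination ID | Line selection | Number conversion type | Digits deleted from left | Digits deleted from right | Prefix added | Suffix added | Digits kept from right |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 1 | | IP | 192.168.157.126 | | | GSM | 1 | Round-robin | Convert callee number | | | | | |
| | 2 | | GSM | 1 | | | IP | 192.168.157.126 | Round-robin | Convert caller number | | | | | |
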
See the .env file created during installation for all available options. The configuration is validated on startup to ensure required values (like OPENAI_API_KEY) are present.
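
To illustrate the load-then-validate behaviour described above, here is a minimal standard-library sketch. `Settings` and `load_settings` are hypothetical stand-ins for the project's `Config` class, whose real fields and method names are not shown in this README. The sketch also skips `.env` parsing, which the project presumably delegates to `python-dotenv`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical stand-in for the project's Config class."""

    openai_api_key: str
    sip_local_ip: str
    sip_local_port: int
    rtp_port_start: int
    rtp_port_end: int
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping, apply the documented defaults, validate."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise ValueError("OPENAI_API_KEY is required")
    settings = Settings(
        openai_api_key=api_key,
        sip_local_ip=env.get("SIP_LOCAL_IP", "192.168.1.101"),
        sip_local_port=int(env.get("SIP_LOCAL_PORT", "5062")),
        rtp_port_start=int(env.get("RTP_PORT_START", "31000")),
        rtp_port_end=int(env.get("RTP_PORT_END", "31010")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
    if settings.rtp_port_start > settings.rtp_port_end:
        raise ValueError("RTP_PORT_START must not exceed RTP_PORT_END")
    return settings


print(load_settings({"OPENAI_API_KEY": "sk-example"}))
```

Passing the environment in as a mapping keeps the validation testable without touching the real process environment.
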
Using the main entry point (recommended):
python main.py

This starts both the SIP server and WebSocket server with proper configuration validation.
To run the AI-powered call center that processes audio with OpenAI:
python call_center.py

This mode:
- Connects to the WebSocket server
- Buffers incoming RTP audio packets
- Transcribes audio using Whisper
- Generates responses using GPT-4o-mini
- Converts responses to speech using TTS
- Sends audio back through the call

`CALL:<phone_number>`

Example:

`CALL:0912341234`

`RTP:<call_id>##<base64_audio>`

`BYE`

`RING_ANS:<phone_number>##<call_id>`

`CALL_ANS:<call_id>`

`CALL_IGNORE:<call_id>`

`CALL_FAILED:<status> <reason>`

`BYE:<call_id>`

`RTP:<hex_audio_data>`

- Receive INVITE
- Parse SDP offer
- Allocate RTP ports
- Reply with 200 OK + SDP
- Receive ACK
- Start RTP
- Play greeting audio
- Record inbound audio
- Handle BYE
- Finalize recording
- Receive WebSocket CALL command
- Allocate RTP ports
- Send INVITE
- Handle 180 Ringing
- Receive 200 OK
- Send ACK
- Stream audio
- Handle BYE
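
Both flows above depend on reading the peer's SDP, either the offer in an INVITE or the answer in a 200 OK. A minimal sketch of pulling the audio address, port, and payload types out of an SDP body follows. It handles only `c=IN IP4` and `m=audio` lines, and `AudioOffer`, `parse_audio_offer`, and `pick_codec` are hypothetical names rather than the functions in `helper/sip_parsers.py`.

```python
from dataclasses import dataclass

# Static payload types this README lists as supported.
SUPPORTED_CODECS = {8: "PCMA", 0: "PCMU"}


@dataclass(frozen=True)
class AudioOffer:
    ip: str
    port: int
    payload_types: tuple[int, ...]


def parse_audio_offer(sdp: str) -> AudioOffer:
    """Extract the connection address and the first-seen audio media line."""
    ip, port, payload_types = "", 0, ()
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("c=IN IP4 "):
            ip = line.split()[2]
        elif line.startswith("m=audio ") and not port:
            parts = line.split()  # m=audio <port> RTP/AVP <pt> <pt> ...
            port = int(parts[1])
            payload_types = tuple(int(pt) for pt in parts[3:])
    if not ip or not port:
        raise ValueError("SDP has no usable audio description")
    return AudioOffer(ip=ip, port=port, payload_types=payload_types)


def pick_codec(offer: AudioOffer) -> int:
    """Return the first offered payload type that is supported."""
    for pt in offer.payload_types:
        if pt in SUPPORTED_CODECS:
            return pt
    raise ValueError("no common codec")


offer = parse_audio_offer(
    "v=0\r\nc=IN IP4 192.168.1.170\r\nm=audio 40000 RTP/AVP 8 0 101\r\n"
)
print(offer, SUPPORTED_CODECS[pick_codec(offer)])
```
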
- PCMA (G.711 A-law) – Payload 8
- PCMU (G.711 μ-law) – Payload 0
- 8000 Hz sample rate
- Mono
- 16-bit PCM
- 160 samples per 20 ms frame
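
The numbers above are linked: 8000 samples per second times 0.020 seconds gives 160 samples, which is 320 bytes of 16-bit PCM and 160 bytes once G.711 maps each sample to one byte. The sketch below works through that arithmetic and encodes one frame with the commonly used μ-law companding routine. It assumes little-endian PCM as stored in WAV files, and it is not necessarily the encoder this project uses.

```python
import struct

SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 160 samples
PCM_FRAME_BYTES = SAMPLES_PER_FRAME * 2                # 16-bit samples: 320 bytes
G711_FRAME_BYTES = SAMPLES_PER_FRAME                   # 8-bit codewords: 160 bytes

_BIAS = 0x84
_CLIP = 32635


def linear_to_ulaw(sample: int) -> int:
    """Compress one signed 16-bit sample into an 8-bit mu-law codeword."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), _CLIP) + _BIAS
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # Codewords are transmitted bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def encode_ulaw_frame(pcm_frame: bytes) -> bytes:
    """Encode exactly one 20 ms frame of little-endian 16-bit PCM."""
    if len(pcm_frame) != PCM_FRAME_BYTES:
        raise ValueError(f"expected {PCM_FRAME_BYTES} bytes, got {len(pcm_frame)}")
    samples = struct.unpack(f"<{SAMPLES_PER_FRAME}h", pcm_frame)
    return bytes(linear_to_ulaw(s) for s in samples)


silence = bytes(PCM_FRAME_BYTES)
print(SAMPLES_PER_FRAME, PCM_FRAME_BYTES, len(encode_ulaw_frame(silence)))
```
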
Place WAV files in:
`./output/transcode/greeting.wav`

- Default port range: 31000–31010
- Ports allocated in pairs
- 4-port spacing between each session
- Automatic cleanup on session termination
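
The allocation rules above can be sketched as a small pool. The sketch assumes that a pair means two consecutive ports starting at each base and that bases sit four ports apart, which gives three concurrent sessions in the default range. The README does not spell out either detail, and `RtpPortAllocator` is a hypothetical name.

```python
class RtpPortAllocator:
    """Hand out (base, base + 1) port pairs with a fixed spacing between bases."""

    def __init__(self, start: int = 31000, end: int = 31010, spacing: int = 4) -> None:
        # Every base must leave room for base + 1 inside the range.
        self._bases = list(range(start, end, spacing))
        self._in_use: set[int] = set()

    def allocate(self) -> tuple[int, int]:
        """Reserve the lowest free pair or raise when the range is exhausted."""
        for base in self._bases:
            if base not in self._in_use:
                self._in_use.add(base)
                return base, base + 1
        raise RuntimeError("no free RTP ports")

    def release(self, base: int) -> None:
        """Return a pair to the pool when its session ends."""
        self._in_use.discard(base)


allocator = RtpPortAllocator()
first = allocator.allocate()
second = allocator.allocate()
print(first, second)
allocator.release(first[0])
```
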
- UDP transport
- G.711 payload
- 160-byte payload per packet
- Automatic sequence number rollover
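
To make the packet layout concrete, the sketch below builds the fixed 12-byte RTP header defined in RFC 3550 in front of a 160-byte G.711 payload and wraps the 16-bit sequence number. The function names are hypothetical, and the project's own packet model lives in `model/rtp.py`.

```python
import struct

SAMPLES_PER_PACKET = 160  # 20 ms at 8000 Hz


def build_rtp_packet(
    payload: bytes, seq: int, timestamp: int, ssrc: int, payload_type: int = 8
) -> bytes:
    """Prefix the payload with a 12-byte RTP header (version 2, no CSRC, no marker)."""
    header = struct.pack(
        "!BBHII",
        0x80,                    # V=2, P=0, X=0, CC=0
        payload_type & 0x7F,     # M=0, PT (8 = PCMA, 0 = PCMU)
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + payload


def next_seq(seq: int) -> int:
    """Advance the sequence number, rolling over after 65535."""
    return (seq + 1) & 0xFFFF


def next_timestamp(timestamp: int) -> int:
    """Advance the timestamp by one packet's worth of samples."""
    return (timestamp + SAMPLES_PER_PACKET) & 0xFFFFFFFF


packet = build_rtp_packet(b"\xd5" * 160, seq=65535, timestamp=0, ssrc=0x1234)
print(len(packet), next_seq(65535))
```
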
The system integrates with OpenAI's APIs for AI-powered voice interactions:
- OpenAiSTT (`reply_handler.py`)
  - Uses Whisper-1 model for speech-to-text
  - Supports multiple languages (default: Chinese)
  - Transcribes audio files to text
- OpenAiTTS (`reply_handler.py`)
  - Uses GPT-4o-mini-TTS for text-to-speech
  - Multiple voice options (default: alloy)
  - Adjustable speed (0.25-4.0x)
- OpenAiLLM (`reply_handler.py`)
  - Uses GPT-4o-mini for text generation
  - Customizable system prompts
  - Generates conversational responses

from reply_handler import OpenAiSTT, OpenAiTTS, OpenAiLLM
from pathlib import Path
# Initialize
api_key = "your-api-key"
stt = OpenAiSTT(api_key)
tts = OpenAiTTS(api_key)
llm = OpenAiLLM(api_key, model="gpt-4o-mini")
# Process audio
text = stt.transcribe(Path("input.wav"))
response = llm.chat(text)
tts.speak(response, output=Path("output.wav"))

- File: `sip_server.log`
- Console: stdout
- Format: `[LEVEL] - TIMESTAMP - MESSAGE - FILE:LINE`
- INFO
- DEBUG
- WARNING
- ERROR
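
A minimal sketch of a standard-library `logging` setup that matches the destinations and format listed above. The logger name `sip_server` and the function `configure_logging` are assumptions, and the project's actual setup may differ.

```python
import logging
import sys

# [LEVEL] - TIMESTAMP - MESSAGE - FILE:LINE
LOG_FORMAT = "[%(levelname)s] - %(asctime)s - %(message)s - %(filename)s:%(lineno)d"


def configure_logging(level: str = "INFO", log_file: str = "sip_server.log") -> logging.Logger:
    """Attach a file handler and a stdout handler that share one format."""
    logger = logging.getLogger("sip_server")
    logger.setLevel(level.upper())
    logger.handlers.clear()  # avoid duplicate lines when called twice
    formatter = logging.Formatter(LOG_FORMAT)
    for handler in (logging.FileHandler(log_file), logging.StreamHandler(sys.stdout)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Calling `configure_logging("DEBUG").debug("INVITE received")` would then write the same line to the file and to the console.
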
- Full type hint coverage
- Pydantic for all data structures
- Match/case routing
- Structured logging practices
python receive_server.py

socket.error: [Errno 98] Address already in use

Change the port or stop the process that is holding it.
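
One way to check for this error before starting the server is to try binding the port yourself. This is a minimal sketch that assumes the listener uses UDP; `udp_port_is_free` is a hypothetical helper, not part of the project.

```python
import socket


def udp_port_is_free(host: str, port: int) -> bool:
    """Return True if a UDP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


print(udp_port_is_free("127.0.0.1", 5062))
```

The check is only a snapshot: another process can still take the port between the check and the server's own bind.
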
- Check UDP firewall rules
- Verify `greeting.wav` exists
- Confirm RTP ports are allocated correctly
- Verify IP and port config
- Check NAT/firewall
- Review SIP logs
- Confirm SIP server IP/port
- Codec compatibility
- Look at SIP response codes
Missing API Key:
ValueError: OPENAI_API_KEY is required
- Ensure the `.env` file exists with a valid `OPENAI_API_KEY`
- Check that the API key is active in your OpenAI account
Authentication Error:
- Verify API key is correct and not expired
- Check OpenAI account has available credits
Audio Processing Issues:
- Ensure audio files are in WAV format for transcription
- Check that `CALL_CENTER_BUFFER_SIZE` is appropriate for your use case
- Verify network connectivity to OpenAI APIs
Code by DHT@Matthew