# External Meeting Control & Command Patterns

Send audio, chat messages, images, and playback controls into a live meeting through Meetstream's control WebSocket channel. Works with Google Meet, Zoom, and Microsoft Teams.

---

## Overview

When you create a bot with the `socket_connection_url` configuration, Meetstream opens a WebSocket connection from the bot to your server. You send JSON commands over this connection to control what the bot does inside the meeting.

### Enabling the Control Channel

Include `socket_connection_url` in your Create Bot API request:

```json
{
  "meeting_url": "https://meet.google.com/abc-defg-hij",
  "socket_connection_url": {
    "websocket_url": "wss://your-server.com/control"
  }
}
```

The `websocket_url` is a WebSocket endpoint **you host**. Meetstream connects to it as a client.

---

## Connection Lifecycle

### 1. Bot connects to your WebSocket endpoint

The bot initiates the connection when it joins the meeting.

### 2. Bot sends a JSON text handshake

```json
{
  "type": "ready",
  "bot_id": "bot_abc123",
  "message": "Ready to receive messages"
}
```

After you receive this handshake, the bot is ready to accept commands.

### 3. You send JSON commands

All commands are JSON text WebSocket frames. The bot executes them in the meeting.

### 4. Connection closes when the bot leaves

The WebSocket closes with a normal `1000` close code when the bot exits.

### Timeline

![Control Command Lifecycle](https://files.buildwithfern.com/meetstream-ai-573402.docs.buildwithfern.com/683357c37e68a61ed6a829a9e54bf7bfe05c4574a94b1d7a671089dbfcba46e8/docs/assets/images/control-command-lifecycle.png)

---

## Command Reference

All commands share this structure:

```json
{
  "command": "",
  "bot_id": "",
  ...fields specific to the command
}
```

---

### `sendaudio` — Play Audio in the Meeting

Plays audio through the bot's virtual microphone so all meeting participants hear it. Use this for text-to-speech output, pre-recorded audio prompts, or any audio your application generates.
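A quick way to exercise this command end to end is to play a synthesized test tone. The sketch below is ours, not part of the Meetstream SDK (the helper names and the 440 Hz tone are arbitrary choices); it uses only the Python standard library to build a payload in the shape documented in this section:

```python
import base64
import json
import math
import struct

def make_test_tone(freq_hz: float = 440.0, seconds: float = 1.0, rate: int = 48000) -> bytes:
    """Synthesize a sine tone as raw PCM16 signed little-endian bytes (no container)."""
    n = int(rate * seconds)
    samples = (int(0.3 * 32767 * math.sin(2 * math.pi * freq_hz * i / rate)) for i in range(n))
    return b"".join(struct.pack("<h", s) for s in samples)  # "<h" = little-endian int16

def tone_command(bot_id: str) -> str:
    """Wrap the tone in a sendaudio command, serialized and ready to send over the socket."""
    pcm = make_test_tone()
    return json.dumps({
        "command": "sendaudio",
        "bot_id": bot_id,
        "audiochunk": base64.b64encode(pcm).decode("utf-8"),
        "sample_rate": 48000,
        "encoding": "pcm16",
        "channels": 1,
        "endianness": "little",
    })
```

Sending the returned string as a text frame on the control WebSocket should play a one-second beep in the meeting, which makes it a useful first smoke test before wiring up real TTS audio.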
```json
{
  "command": "sendaudio",
  "bot_id": "bot_abc123",
  "audiochunk": "",
  "sample_rate": 48000,
  "encoding": "pcm16",
  "channels": 1,
  "endianness": "little"
}
```

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `command` | `string` | yes | `"sendaudio"` |
| `bot_id` | `string` | yes | The bot identifier from the handshake |
| `audiochunk` | `string` | yes | Base64-encoded audio bytes |
| `sample_rate` | `int` | yes | Sample rate in Hz. **48000 recommended.** |
| `encoding` | `string` | no | `"pcm16"` (only supported format) |
| `channels` | `int` | no | `1` (mono only) |
| `endianness` | `string` | no | `"little"` |

#### Audio Encoding Requirements

The `audiochunk` field must be a base64-encoded string of **raw PCM16 signed little-endian** audio bytes. No WAV headers, no MP3 — just raw samples.

| Property | Value |
|----------|-------|
| Format | Signed 16-bit integer (PCM16) |
| Byte order | Little-endian |
| Sample rate | 48,000 Hz recommended |
| Channels | 1 (mono) |
| Container | None — raw samples only |
| Transport encoding | Base64 |

#### Encoding Audio — Python

```python
import base64

import numpy as np

def encode_pcm16_to_base64(pcm_bytes: bytes) -> str:
    """Encode raw PCM16 LE bytes to base64 for the sendaudio command."""
    return base64.b64encode(pcm_bytes).decode("utf-8")

def float32_to_pcm16_bytes(float_audio: np.ndarray) -> bytes:
    """Convert float32 audio (-1.0 to 1.0) to PCM16 LE bytes."""
    clipped = np.clip(float_audio, -1.0, 1.0)
    return (clipped * 32767).astype(np.int16).tobytes()

def resample_to_48k(pcm_bytes: bytes, source_rate: int) -> bytes:
    """Resample PCM16 LE audio from any sample rate to 48kHz."""
    if source_rate == 48000:
        return pcm_bytes
    samples = np.frombuffer(pcm_bytes, dtype=np.int16).astype(np.float32)
    num_output = int(len(samples) * 48000 / source_rate)
    t_in = np.linspace(0, 1, len(samples), endpoint=False)
    t_out = np.linspace(0, 1, num_output, endpoint=False)
    resampled = np.interp(t_out, t_in, samples)
    return np.clip(resampled, -32768, 32767).astype(np.int16).tobytes()

def build_sendaudio_command(bot_id: str, pcm_bytes: bytes, sample_rate: int = 48000) -> dict:
    """Build a complete sendaudio command ready to send over the WebSocket."""
    return {
        "command": "sendaudio",
        "bot_id": bot_id,
        "audiochunk": encode_pcm16_to_base64(pcm_bytes),
        "sample_rate": sample_rate,
        "encoding": "pcm16",
        "channels": 1,
        "endianness": "little",
    }
```

#### Encoding Audio — JavaScript / Node.js

```javascript
function encodePcm16ToBase64(pcmBuffer) {
  return pcmBuffer.toString("base64");
}

function float32ToPcm16Buffer(float32Array) {
  const pcm = Buffer.alloc(float32Array.length * 2);
  for (let i = 0; i < float32Array.length; i++) {
    const clamped = Math.max(-1, Math.min(1, float32Array[i]));
    pcm.writeInt16LE(Math.round(clamped * 32767), i * 2);
  }
  return pcm;
}

function buildSendAudioCommand(botId, pcmBuffer, sampleRate = 48000) {
  return {
    command: "sendaudio",
    bot_id: botId,
    audiochunk: encodePcm16ToBase64(pcmBuffer),
    sample_rate: sampleRate,
    encoding: "pcm16",
    channels: 1,
    endianness: "little",
  };
}
```

#### Sending a WAV File — Python

```python
import json
import wave

def send_wav_file(ws, bot_id: str, wav_path: str):
    """Read a WAV file and send it as a sendaudio command.

    Uses resample_to_48k and build_sendaudio_command from the helpers above.
    """
    with wave.open(wav_path, "rb") as wf:
        assert wf.getsampwidth() == 2, "WAV must be 16-bit"
        assert wf.getnchannels() == 1, "WAV must be mono"
        pcm_bytes = wf.readframes(wf.getnframes())
        source_rate = wf.getframerate()
    pcm_48k = resample_to_48k(pcm_bytes, source_rate)
    command = build_sendaudio_command(bot_id, pcm_48k)
    ws.send(json.dumps(command))
```

#### Chunked Audio Streaming

For long audio (TTS streams, file playback), send audio in chunks rather than one large command.
A good chunk size is 0.5–2 seconds:

```python
CHUNK_SAMPLES = 48000            # 1 second at 48kHz
CHUNK_BYTES = CHUNK_SAMPLES * 2  # 2 bytes per sample

for i in range(0, len(pcm_bytes), CHUNK_BYTES):
    chunk = pcm_bytes[i : i + CHUNK_BYTES]
    command = build_sendaudio_command(bot_id, chunk)
    await ws.send(json.dumps(command))
    await asyncio.sleep(0.8)  # pace slightly below real-time to avoid queue gaps
```

---

### `sendmsg` — Send a Chat Message

Posts a text message in the meeting's chat panel, visible to all participants.

```json
{
  "command": "sendmsg",
  "bot_id": "bot_abc123",
  "message": "Hello from the bot!",
  "msg": "Hello from the bot!"
}
```

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `command` | `string` | yes | `"sendmsg"` |
| `bot_id` | `string` | yes | The bot identifier |
| `message` | `string` | yes | The chat message text |
| `msg` | `string` | yes | Same text as `message` |

> **Why both `message` and `msg`?** Different platforms read different fields internally. Always include both with the same value for cross-platform compatibility.

#### Example — Python

```python
def build_chat_command(bot_id: str, text: str) -> dict:
    return {
        "command": "sendmsg",
        "bot_id": bot_id,
        "message": text,
        "msg": text,
    }

await ws.send(json.dumps(build_chat_command("bot_abc123", "Meeting notes are ready!")))
```

#### Example — JavaScript

```javascript
function buildChatCommand(botId, text) {
  return {
    command: "sendmsg",
    bot_id: botId,
    message: text,
    msg: text,
  };
}

ws.send(JSON.stringify(buildChatCommand("bot_abc123", "Meeting notes are ready!")));
```

---

### `sendchat` — Send a Chat Message with Role and Streaming

An extended chat command that supports agent/user role tagging and incremental streaming.
```json
{
  "command": "sendchat",
  "bot_id": "bot_abc123",
  "role": "assistant",
  "text": "Here is my response...",
  "is_final": true
}
```

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `command` | `string` | yes | `"sendchat"` |
| `bot_id` | `string` | yes | The bot identifier |
| `role` | `string` | yes | `"assistant"` for bot output, `"user"` for user transcript echo |
| `text` | `string` | yes | The message text |
| `is_final` | `bool` | yes | `false` for interim streaming tokens, `true` for the final committed message |

#### Streaming Pattern

Send partial messages as they are generated, then a final complete message:

```python
accumulated_text = ""

# Stream tokens as they arrive
for token in llm_stream:
    accumulated_text += token
    await ws.send(json.dumps({
        "command": "sendchat",
        "bot_id": bot_id,
        "role": "assistant",
        "text": accumulated_text,
        "is_final": False,
    }))

# Send the final complete message
await ws.send(json.dumps({
    "command": "sendchat",
    "bot_id": bot_id,
    "role": "assistant",
    "text": accumulated_text,
    "is_final": True,
}))
```

---

### `interrupt` — Stop Audio Playback

Immediately stops any audio currently playing through the bot's speaker and clears the audio playback queue. Use this for barge-in (when a human starts speaking while the bot is talking) or to cancel a response.
```json
{
  "command": "interrupt",
  "bot_id": "bot_abc123",
  "action": "clear_audio_queue"
}
```

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `command` | `string` | yes | `"interrupt"` |
| `bot_id` | `string` | yes | The bot identifier |
| `action` | `string` | yes | `"clear_audio_queue"` |

#### Platform Support

| Platform | Status |
|----------|--------|
| Google Meet | Fully supported — clears browser audio queue immediately |
| Zoom | Not yet supported — command is accepted but audio queue is not cleared |
| Teams | Not yet supported — command is accepted but audio queue is not cleared |

#### Example — Barge-In Pattern

```python
async def handle_barge_in(ws, bot_id: str):
    """Stop the bot's audio when a user starts speaking."""
    await ws.send(json.dumps({
        "command": "interrupt",
        "bot_id": bot_id,
        "action": "clear_audio_queue",
    }))
```

---

### `sendimg` — Set Bot Video Frame (Base64)

Sets the bot's camera feed to a static image. The image is displayed as the bot's video in the meeting.

```json
{
  "command": "sendimg",
  "bot_id": "bot_abc123",
  "img": ""
}
```

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `command` | `string` | yes | `"sendimg"` |
| `bot_id` | `string` | yes | The bot identifier |
| `img` | `string` | yes | Base64-encoded image (JPEG or PNG) |

#### Example — Python

```python
import base64

def build_image_command(bot_id: str, image_path: str) -> dict:
    with open(image_path, "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode("utf-8")
    return {
        "command": "sendimg",
        "bot_id": bot_id,
        "img": img_b64,
    }

await ws.send(json.dumps(build_image_command("bot_abc123", "avatar.png")))
```

---

### `sendimg_url` — Set Bot Video Frame (URL)

Same as `sendimg` but provides a URL instead of inline base64. The bot downloads the image and sets it as its video frame.
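As with the other commands, a small builder function keeps call sites tidy. A minimal sketch (the function name is ours; the URL is just an example value):

```python
import json

def build_image_url_command(bot_id: str, image_url: str) -> dict:
    """Build a sendimg_url command; the bot fetches the image itself."""
    return {
        "command": "sendimg_url",
        "bot_id": bot_id,
        "img_url": image_url,
    }

# Example (inside an async handler):
# await ws.send(json.dumps(build_image_url_command("bot_abc123", "https://example.com/bot-avatar.png")))
```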
```json
{
  "command": "sendimg_url",
  "bot_id": "bot_abc123",
  "img_url": "https://example.com/bot-avatar.png"
}
```

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `command` | `string` | yes | `"sendimg_url"` |
| `bot_id` | `string` | yes | The bot identifier |
| `img_url` | `string` | yes | Publicly accessible URL to a JPEG or PNG image |

---

## Full Server Examples

### Python Server (FastAPI)

A complete server that accepts both the audio and control channels, logs audio, and sends a welcome message.

```python
import asyncio
import base64
import json

import numpy as np
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
control_sockets: dict[str, WebSocket] = {}

def decode_audio_frame(data: bytes):
    if len(data) < 5 or data[0] != 0x01:
        return None
    sid_len = int.from_bytes(data[1:3], "little")
    speaker_id = data[3 : 3 + sid_len].decode("utf-8")
    off = 3 + sid_len
    sname_len = int.from_bytes(data[off : off + 2], "little")
    off += 2
    speaker_name = data[off : off + sname_len].decode("utf-8")
    off += sname_len
    return speaker_id, speaker_name, data[off:]

async def send_command(bot_id: str, command: dict):
    ws = control_sockets.get(bot_id)
    if ws:
        await ws.send_text(json.dumps(command))

@app.websocket("/{bot_id}/audio")
async def audio_endpoint(websocket: WebSocket, bot_id: str):
    """Receives live meeting audio from the Meetstream bot."""
    await websocket.accept()
    try:
        while True:
            raw = await websocket.receive()
            if "text" in raw and raw["text"]:
                handshake = json.loads(raw["text"])
                print(f"[{bot_id}] Audio handshake: {handshake}")
                continue
            if "bytes" in raw and raw["bytes"]:
                result = decode_audio_frame(raw["bytes"])
                if result is None:
                    continue
                speaker_id, speaker_name, pcm_bytes = result
                duration_ms = (len(pcm_bytes) / 2 / 48000) * 1000
                print(f"[{bot_id}] [{speaker_name}] {duration_ms:.0f}ms audio")
    except WebSocketDisconnect:
        print(f"[{bot_id}] Audio disconnected")

@app.websocket("/{bot_id}/control")
async def control_endpoint(websocket: WebSocket, bot_id: str):
    """Receives commands from and sends commands to the Meetstream bot."""
    await websocket.accept()
    control_sockets[bot_id] = websocket
    try:
        while True:
            text = await websocket.receive_text()
            data = json.loads(text)
            if data.get("type") == "ready":
                print(f"[{bot_id}] Control ready")
                # Send a welcome chat message
                await send_command(bot_id, {
                    "command": "sendmsg",
                    "bot_id": bot_id,
                    "message": "Bot is now live!",
                    "msg": "Bot is now live!",
                })
    except WebSocketDisconnect:
        print(f"[{bot_id}] Control disconnected")
    finally:
        control_sockets.pop(bot_id, None)
```

Run:

```bash
pip install fastapi uvicorn websockets numpy
uvicorn server:app --host 0.0.0.0 --port 8000
```

### Node.js Server (ws)

```javascript
const { WebSocketServer } = require("ws");
const http = require("http");

const server = http.createServer();
const wss = new WebSocketServer({ server });
const controlSockets = new Map();

function decodeAudioFrame(buf) {
  if (buf.length < 5 || buf[0] !== 0x01) return null;
  const sidLen = buf.readUInt16LE(1);
  const speakerId = buf.subarray(3, 3 + sidLen).toString("utf-8");
  let off = 3 + sidLen;
  const snameLen = buf.readUInt16LE(off);
  off += 2;
  const speakerName = buf.subarray(off, off + snameLen).toString("utf-8");
  off += snameLen;
  return { speakerId, speakerName, pcmData: buf.subarray(off) };
}

function sendCommand(botId, command) {
  const ws = controlSockets.get(botId);
  if (ws && ws.readyState === 1) ws.send(JSON.stringify(command));
}

wss.on("connection", (ws, req) => {
  const [, botId, channel] = req.url.split("/");

  if (channel === "audio") {
    console.log(`[${botId}] Audio connected`);
    ws.on("message", (data, isBinary) => {
      if (!isBinary) {
        console.log(`[${botId}] Audio handshake:`, JSON.parse(data.toString()));
        return;
      }
      const frame = decodeAudioFrame(data);
      if (frame) {
        const ms = ((frame.pcmData.length / 2) / 48000 * 1000).toFixed(0);
        console.log(`[${botId}] [${frame.speakerName}] ${ms}ms`);
      }
    });
  } else if (channel === "control") {
    controlSockets.set(botId, ws);
    console.log(`[${botId}] Control connected`);
    ws.on("message", (data) => {
      const msg = JSON.parse(data.toString());
      if (msg.type === "ready") {
        console.log(`[${botId}] Control ready`);
        sendCommand(botId, {
          command: "sendmsg",
          bot_id: botId,
          message: "Bot is now live!",
          msg: "Bot is now live!",
        });
      }
    });
    ws.on("close", () => controlSockets.delete(botId));
  }
});

server.listen(8000, () => console.log("Listening on :8000"));
```

---

## Using Both Channels Together

When you enable both `live_audio_required` and `socket_connection_url`, the bot opens **two independent WebSocket connections** to your server. A typical Create Bot request:

```json
{
  "meeting_url": "https://meet.google.com/abc-defg-hij",
  "bot_name": "My Assistant",
  "live_audio_required": {
    "websocket_url": "wss://your-server.com/{bot_id}/audio"
  },
  "socket_connection_url": {
    "websocket_url": "wss://your-server.com/{bot_id}/control"
  }
}
```

### Pattern: Receive Audio → Process → Respond

```python
# Audio channel: receive meeting audio
speaker_id, speaker_name, pcm_bytes = decode_audio_frame(binary_message)

# Your processing pipeline
transcript = your_stt_service.transcribe(pcm_bytes)
response = your_llm.generate(transcript)
tts_audio = your_tts_service.synthesize(response)

# Control channel: send response back into the meeting
await send_command(bot_id, build_sendaudio_command(bot_id, tts_audio))
await send_command(bot_id, {
    "command": "sendchat",
    "bot_id": bot_id,
    "role": "assistant",
    "text": response,
    "is_final": True,
})
```

### Pattern: Barge-In Detection

```python
# When you detect the user started speaking while bot audio is playing:
await send_command(bot_id, {
    "command": "interrupt",
    "bot_id": bot_id,
    "action": "clear_audio_queue",
})
# Then process the new user input and generate a fresh response
```

---

## Command Summary

| Command | Purpose | Key Fields |
|---------|---------|------------|
| `sendaudio` | Play audio through the bot's speaker | `audiochunk` (base64 PCM16 LE), `sample_rate` |
| `sendmsg` | Post a chat message | `message`, `msg` (same text in both) |
| `sendchat` | Post a chat message with role and streaming | `role`, `text`, `is_final` |
| `interrupt` | Stop audio playback and clear queue | `action: "clear_audio_queue"` |
| `sendimg` | Set bot's video frame from base64 | `img` (base64 JPEG/PNG) |
| `sendimg_url` | Set bot's video frame from URL | `img_url` |

---

## Audio Specs — Quick Reference

| Direction | Format | Encoding | Sample Rate | Channels | Transport |
|-----------|--------|----------|-------------|----------|-----------|
| **Incoming** (meeting → you) | PCM16 signed LE | Raw binary | 48,000 Hz | 1 (mono) | Binary WebSocket frame |
| **Outgoing** (you → meeting) | PCM16 signed LE | Base64 in JSON | 48,000 Hz recommended | 1 (mono) | JSON text WebSocket frame |
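---

## Sanity-Checking Outgoing Audio

A cheap pre-send check can catch common `sendaudio` mistakes: a WAV header left in the payload, an odd byte count (not whole 16-bit samples), or a sample-rate mismatch. The helper below is a sketch of ours, not part of the Meetstream API; it decodes a serialized command and returns the implied audio duration so you can compare it against what you expect:

```python
import base64
import json

def validate_sendaudio(command_json: str) -> float:
    """Validate a serialized sendaudio command and return the audio duration in seconds.

    Raises ValueError if the payload cannot be valid raw PCM16 mono audio.
    """
    cmd = json.loads(command_json)
    if cmd.get("command") != "sendaudio":
        raise ValueError("not a sendaudio command")
    pcm = base64.b64decode(cmd["audiochunk"], validate=True)
    if len(pcm) % 2 != 0:
        raise ValueError("odd byte count: payload is not whole 16-bit samples")
    # RIFF/WAVE magic at the start means a header was left in by mistake
    if pcm[:4] == b"RIFF":
        raise ValueError("payload starts with a WAV header; send raw samples only")
    rate = cmd.get("sample_rate", 48000)
    return (len(pcm) // 2) / rate
```

If the returned duration is wildly different from the length of audio you meant to send, the sample rate or the float-to-int conversion is usually the culprit.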