External Meeting Control & Command Patterns


Send audio, chat messages, images, and playback controls into a live meeting through Meetstream’s control WebSocket channel. Works with Google Meet, Zoom, and Microsoft Teams.


Overview

When you create a bot with the socket_connection_url configuration, Meetstream opens a WebSocket connection from the bot to your server. You send JSON commands over this connection to control what the bot does inside the meeting.

Enabling the Control Channel

Include socket_connection_url in your Create Bot API request:

```json
{
  "meeting_url": "https://meet.google.com/abc-defg-hij",
  "socket_connection_url": {
    "websocket_url": "wss://your-server.com/control"
  }
}
```

The websocket_url is a WebSocket endpoint you host. Meetstream connects to it as a client.


Connection Lifecycle

1. Bot connects to your WebSocket endpoint

The bot initiates the connection when it joins the meeting.

2. Bot sends a JSON text handshake

```json
{
  "type": "ready",
  "bot_id": "bot_abc123",
  "message": "Ready to receive messages"
}
```

After you receive this, the bot is ready to accept commands.

3. You send JSON commands

All commands are JSON text WebSocket frames. The bot executes them in the meeting.

4. Connection closes when the bot leaves

The WebSocket closes with a normal 1000 close code when the bot exits.
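The lifecycle above implies one practical rule: hold outbound commands until the `ready` handshake has arrived. A minimal sketch of that gate, where `send_text` stands in for whatever WebSocket send coroutine your server framework provides (the class and names here are illustrative, not part of the Meetstream API):

```python
import asyncio
import json


class ControlChannel:
    """Gates outbound commands until the bot's 'ready' handshake arrives."""

    def __init__(self, send_text):
        self._send_text = send_text  # your framework's WebSocket send coroutine
        self._ready = asyncio.Event()
        self.bot_id = None

    def handle_incoming(self, raw_text: str) -> None:
        """Feed every inbound text frame here; unblocks senders on 'ready'."""
        msg = json.loads(raw_text)
        if msg.get("type") == "ready":
            self.bot_id = msg.get("bot_id")
            self._ready.set()

    async def send_command(self, command: dict) -> None:
        """Waits for the handshake, then serializes and sends the command."""
        await self._ready.wait()
        await self._send_text(json.dumps(command))
```

Callers can then fire `send_command` as soon as they like; nothing reaches the bot before the handshake.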

Timeline

[Diagram: Control Command Lifecycle]


Command Reference

All commands share this structure:

```json
{
  "command": "<command_name>",
  "bot_id": "<bot_id>",
  ...fields specific to the command
}
```

sendaudio — Play Audio in the Meeting

Plays audio through the bot’s virtual microphone so all meeting participants hear it. Use this for text-to-speech output, pre-recorded audio prompts, or any audio your application generates.

```json
{
  "command": "sendaudio",
  "bot_id": "bot_abc123",
  "audiochunk": "<base64-encoded PCM16 LE audio bytes>",
  "sample_rate": 48000,
  "encoding": "pcm16",
  "channels": 1,
  "endianness": "little"
}
```
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| command | string | yes | "sendaudio" |
| bot_id | string | yes | The bot identifier from the handshake |
| audiochunk | string | yes | Base64-encoded audio bytes |
| sample_rate | int | yes | Sample rate in Hz. 48000 recommended. |
| encoding | string | no | "pcm16" (only supported format) |
| channels | int | no | 1 (mono only) |
| endianness | string | no | "little" |

Audio Encoding Requirements

The audiochunk field must be a base64-encoded string of raw PCM16 signed little-endian audio bytes. No WAV headers, no MP3 — just raw samples.

| Property | Value |
| --- | --- |
| Format | Signed 16-bit integer (PCM16) |
| Byte order | Little-endian |
| Sample rate | 48,000 Hz recommended |
| Channels | 1 (mono) |
| Container | None — raw samples only |
| Transport encoding | Base64 |

Encoding Audio — Python

```python
import base64
import numpy as np

def encode_pcm16_to_base64(pcm_bytes: bytes) -> str:
    """Encode raw PCM16 LE bytes to base64 for the sendaudio command."""
    return base64.b64encode(pcm_bytes).decode("utf-8")


def float32_to_pcm16_bytes(float_audio: np.ndarray) -> bytes:
    """Convert float32 audio (-1.0 to 1.0) to PCM16 LE bytes."""
    clipped = np.clip(float_audio, -1.0, 1.0)
    return (clipped * 32767).astype(np.int16).tobytes()


def resample_to_48k(pcm_bytes: bytes, source_rate: int) -> bytes:
    """Resample PCM16 LE audio from any sample rate to 48 kHz (linear interpolation)."""
    if source_rate == 48000:
        return pcm_bytes
    samples = np.frombuffer(pcm_bytes, dtype=np.int16).astype(np.float32)
    num_output = int(len(samples) * 48000 / source_rate)
    t_in = np.linspace(0, 1, len(samples), endpoint=False)
    t_out = np.linspace(0, 1, num_output, endpoint=False)
    resampled = np.interp(t_out, t_in, samples)
    return np.clip(resampled, -32768, 32767).astype(np.int16).tobytes()


def build_sendaudio_command(bot_id: str, pcm_bytes: bytes, sample_rate: int = 48000) -> dict:
    """Build a complete sendaudio command ready to send over the WebSocket."""
    return {
        "command": "sendaudio",
        "bot_id": bot_id,
        "audiochunk": encode_pcm16_to_base64(pcm_bytes),
        "sample_rate": sample_rate,
        "encoding": "pcm16",
        "channels": 1,
        "endianness": "little",
    }
```
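To smoke-test the pipeline without a TTS backend, you can synthesize a short tone and wrap it in a command yourself. A stdlib-only sketch that mirrors the `build_sendaudio_command` output above (`make_test_tone` is a hypothetical helper, not part of the API):

```python
import base64
import json
import math
import struct

def make_test_tone(freq_hz: float = 440.0, seconds: float = 0.25, sample_rate: int = 48000) -> bytes:
    """Generate a PCM16 LE mono sine tone at 30% amplitude."""
    n = int(seconds * sample_rate)
    samples = [
        int(0.3 * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n)
    ]
    return struct.pack(f"<{n}h", *samples)

# Build a sendaudio command for the tone.
pcm = make_test_tone()
command = {
    "command": "sendaudio",
    "bot_id": "bot_abc123",
    "audiochunk": base64.b64encode(pcm).decode("utf-8"),
    "sample_rate": 48000,
    "encoding": "pcm16",
    "channels": 1,
    "endianness": "little",
}
payload = json.dumps(command)  # ready to send over the control WebSocket
```

If participants hear a clean quarter-second beep, your encoding, sample rate, and base64 wrapping are all correct.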

Encoding Audio — JavaScript / Node.js

```javascript
function encodePcm16ToBase64(pcmBuffer) {
  return pcmBuffer.toString("base64");
}

function float32ToPcm16Buffer(float32Array) {
  const pcm = Buffer.alloc(float32Array.length * 2);
  for (let i = 0; i < float32Array.length; i++) {
    const clamped = Math.max(-1, Math.min(1, float32Array[i]));
    pcm.writeInt16LE(Math.round(clamped * 32767), i * 2);
  }
  return pcm;
}

function buildSendAudioCommand(botId, pcmBuffer, sampleRate = 48000) {
  return {
    command: "sendaudio",
    bot_id: botId,
    audiochunk: encodePcm16ToBase64(pcmBuffer),
    sample_rate: sampleRate,
    encoding: "pcm16",
    channels: 1,
    endianness: "little",
  };
}
```

Sending a WAV File — Python

```python
import json
import wave

def send_wav_file(ws, bot_id: str, wav_path: str):
    """Read a WAV file and send it as a sendaudio command."""
    with wave.open(wav_path, "rb") as wf:
        assert wf.getsampwidth() == 2, "WAV must be 16-bit"
        assert wf.getnchannels() == 1, "WAV must be mono"
        pcm_bytes = wf.readframes(wf.getnframes())
        source_rate = wf.getframerate()

    pcm_48k = resample_to_48k(pcm_bytes, source_rate)
    command = build_sendaudio_command(bot_id, pcm_48k)
    ws.send(json.dumps(command))
```

Chunked Audio Streaming

For long audio (TTS streams, file playback), send audio in chunks rather than one large command. A good chunk size is 0.5–2 seconds:

```python
CHUNK_SAMPLES = 48000  # 1 second at 48 kHz
CHUNK_BYTES = CHUNK_SAMPLES * 2  # 2 bytes per sample

for i in range(0, len(pcm_bytes), CHUNK_BYTES):
    chunk = pcm_bytes[i : i + CHUNK_BYTES]
    command = build_sendaudio_command(bot_id, chunk)
    await ws.send(json.dumps(command))
    await asyncio.sleep(0.8)  # pace slightly below real-time to avoid queue gaps
```
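If you prefer a reusable helper over the inline loop, the splitting step can be factored out. A small sketch (function name and defaults are illustrative):

```python
def chunk_pcm(pcm_bytes: bytes, chunk_seconds: float = 1.0, sample_rate: int = 48000):
    """Yield fixed-duration chunks of raw PCM16 LE mono bytes.

    The final chunk may be shorter than chunk_seconds.
    """
    chunk_bytes = int(chunk_seconds * sample_rate) * 2  # 2 bytes per PCM16 sample
    for i in range(0, len(pcm_bytes), chunk_bytes):
        yield pcm_bytes[i : i + chunk_bytes]
```

Each yielded chunk can then be wrapped with `build_sendaudio_command` and sent at whatever pace suits your latency budget.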

sendmsg — Send a Chat Message

Posts a text message in the meeting’s chat panel, visible to all participants.

```json
{
  "command": "sendmsg",
  "bot_id": "bot_abc123",
  "message": "Hello from the bot!",
  "msg": "Hello from the bot!"
}
```
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| command | string | yes | "sendmsg" |
| bot_id | string | yes | The bot identifier |
| message | string | yes | The chat message text |
| msg | string | yes | Same text as message |

Why both message and msg? Different platforms read different fields internally. Always include both with the same value for cross-platform compatibility.

Example — Python

```python
def build_chat_command(bot_id: str, text: str) -> dict:
    return {
        "command": "sendmsg",
        "bot_id": bot_id,
        "message": text,
        "msg": text,
    }

await ws.send(json.dumps(build_chat_command("bot_abc123", "Meeting notes are ready!")))
```

Example — JavaScript

```javascript
function buildChatCommand(botId, text) {
  return {
    command: "sendmsg",
    bot_id: botId,
    message: text,
    msg: text,
  };
}

ws.send(JSON.stringify(buildChatCommand("bot_abc123", "Meeting notes are ready!")));
```

sendchat — Send a Chat Message with Role and Streaming

An extended chat command that supports agent/user role tagging and incremental streaming.

```json
{
  "command": "sendchat",
  "bot_id": "bot_abc123",
  "role": "assistant",
  "text": "Here is my response...",
  "is_final": true
}
```
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| command | string | yes | "sendchat" |
| bot_id | string | yes | The bot identifier |
| role | string | yes | "assistant" for bot output, "user" for user transcript echo |
| text | string | yes | The message text |
| is_final | bool | yes | false for interim streaming tokens, true for the final committed message |

Streaming Pattern

Send partial messages as they are generated, then a final complete message:

```python
# Stream tokens as they arrive
accumulated_text = ""
for token in llm_stream:
    accumulated_text += token
    await ws.send(json.dumps({
        "command": "sendchat",
        "bot_id": bot_id,
        "role": "assistant",
        "text": accumulated_text,
        "is_final": False,
    }))

# Send the final complete message
await ws.send(json.dumps({
    "command": "sendchat",
    "bot_id": bot_id,
    "role": "assistant",
    "text": accumulated_text,
    "is_final": True,
}))
```

interrupt — Stop Audio Playback

Immediately stops any audio currently playing through the bot’s speaker and clears the audio playback queue. Use this for barge-in (when a human starts speaking while the bot is talking) or to cancel a response.

```json
{
  "command": "interrupt",
  "bot_id": "bot_abc123",
  "action": "clear_audio_queue"
}
```
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| command | string | yes | "interrupt" |
| bot_id | string | yes | The bot identifier |
| action | string | yes | "clear_audio_queue" |

Platform Support

| Platform | Status |
| --- | --- |
| Google Meet | Fully supported — clears browser audio queue immediately |
| Zoom | Not yet supported — command is accepted but audio queue is not cleared |
| Teams | Not yet supported — command is accepted but audio queue is not cleared |
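Because Zoom and Teams do not yet clear the remote queue, one workaround is to keep most of the unsent audio on your side: stream small paced chunks from a cancellable task, so that stopping the task stops further playback. This is a client-side sketch under that assumption, not platform-verified behavior; `send_json` stands in for your WebSocket send coroutine:

```python
import asyncio


class PacedPlayback:
    """Streams prepared sendaudio commands at a fixed pace.

    Cancelling the task stops further sends, approximating `interrupt`
    on platforms whose remote audio queue is not cleared.
    """

    def __init__(self, send_json):
        self._send_json = send_json  # your WebSocket send coroutine
        self._task = None

    def play(self, commands, pace_s: float = 0.8) -> None:
        """Start streaming; replaces any playback already in progress."""
        self.stop()
        self._task = asyncio.ensure_future(self._run(commands, pace_s))

    async def _run(self, commands, pace_s):
        for cmd in commands:
            await self._send_json(cmd)
            await asyncio.sleep(pace_s)

    def stop(self) -> None:
        """Stop sending further chunks (barge-in / cancellation)."""
        if self._task is not None and not self._task.done():
            self._task.cancel()
```

On Google Meet you would still send the `interrupt` command as well, so the already-queued remote audio is flushed immediately.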

Example — Barge-In Pattern

```python
async def handle_barge_in(ws, bot_id: str):
    """Stop the bot's audio when a user starts speaking."""
    await ws.send(json.dumps({
        "command": "interrupt",
        "bot_id": bot_id,
        "action": "clear_audio_queue",
    }))
```

sendimg — Set Bot Video Frame (Base64)

Sets the bot’s camera feed to a static image. The image is displayed as the bot’s video in the meeting.

```json
{
  "command": "sendimg",
  "bot_id": "bot_abc123",
  "img": "<base64-encoded image data>"
}
```
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| command | string | yes | "sendimg" |
| bot_id | string | yes | The bot identifier |
| img | string | yes | Base64-encoded image (JPEG or PNG) |

Example — Python

```python
import base64

def build_image_command(bot_id: str, image_path: str) -> dict:
    with open(image_path, "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode("utf-8")
    return {
        "command": "sendimg",
        "bot_id": bot_id,
        "img": img_b64,
    }

await ws.send(json.dumps(build_image_command("bot_abc123", "avatar.png")))
```

sendimg_url — Set Bot Video Frame (URL)

Same as sendimg but provides a URL instead of inline base64. The bot downloads the image and sets it as its video frame.

```json
{
  "command": "sendimg_url",
  "bot_id": "bot_abc123",
  "img_url": "https://example.com/bot-avatar.png"
}
```
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| command | string | yes | "sendimg_url" |
| bot_id | string | yes | The bot identifier |
| img_url | string | yes | Publicly accessible URL to a JPEG or PNG image |
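A builder for this command, mirroring the sendimg example (the function name is illustrative):

```python
def build_image_url_command(bot_id: str, image_url: str) -> dict:
    """Build a sendimg_url command pointing at a publicly accessible image."""
    return {
        "command": "sendimg_url",
        "bot_id": bot_id,
        "img_url": image_url,
    }
```

Send it with `ws.send(json.dumps(...))` exactly as in the sendimg example; the URL must be reachable by the bot without authentication.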

Full Server Examples

Python Server (FastAPI)

A complete server that accepts both the audio and control channels, logs audio, and sends a welcome message.

```python
import asyncio
import base64
import json
import numpy as np
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

control_sockets: dict[str, WebSocket] = {}


def decode_audio_frame(data: bytes):
    if len(data) < 5 or data[0] != 0x01:
        return None
    sid_len = int.from_bytes(data[1:3], "little")
    speaker_id = data[3 : 3 + sid_len].decode("utf-8")
    off = 3 + sid_len
    sname_len = int.from_bytes(data[off : off + 2], "little")
    off += 2
    speaker_name = data[off : off + sname_len].decode("utf-8")
    off += sname_len
    return speaker_id, speaker_name, data[off:]


async def send_command(bot_id: str, command: dict):
    ws = control_sockets.get(bot_id)
    if ws:
        await ws.send_text(json.dumps(command))


@app.websocket("/{bot_id}/audio")
async def audio_endpoint(websocket: WebSocket, bot_id: str):
    """Receives live meeting audio from the Meetstream bot."""
    await websocket.accept()

    try:
        while True:
            raw = await websocket.receive()

            if "text" in raw and raw["text"]:
                handshake = json.loads(raw["text"])
                print(f"[{bot_id}] Audio handshake: {handshake}")
                continue

            if "bytes" in raw and raw["bytes"]:
                result = decode_audio_frame(raw["bytes"])
                if result is None:
                    continue
                speaker_id, speaker_name, pcm_bytes = result
                duration_ms = (len(pcm_bytes) / 2 / 48000) * 1000
                print(f"[{bot_id}] [{speaker_name}] {duration_ms:.0f}ms audio")

    except WebSocketDisconnect:
        print(f"[{bot_id}] Audio disconnected")


@app.websocket("/{bot_id}/control")
async def control_endpoint(websocket: WebSocket, bot_id: str):
    """Receives commands from and sends commands to the Meetstream bot."""
    await websocket.accept()
    control_sockets[bot_id] = websocket

    try:
        while True:
            text = await websocket.receive_text()
            data = json.loads(text)

            if data.get("type") == "ready":
                print(f"[{bot_id}] Control ready")

                # Send a welcome chat message
                await send_command(bot_id, {
                    "command": "sendmsg",
                    "bot_id": bot_id,
                    "message": "Bot is now live!",
                    "msg": "Bot is now live!",
                })

    except WebSocketDisconnect:
        print(f"[{bot_id}] Control disconnected")
    finally:
        control_sockets.pop(bot_id, None)
```

Run:

```shell
pip install fastapi uvicorn websockets numpy
uvicorn server:app --host 0.0.0.0 --port 8000
```

Node.js Server (ws)

```javascript
const { WebSocketServer } = require("ws");
const http = require("http");

const server = http.createServer();
const wss = new WebSocketServer({ server });

const controlSockets = new Map();

function decodeAudioFrame(buf) {
  if (buf.length < 5 || buf[0] !== 0x01) return null;
  const sidLen = buf.readUInt16LE(1);
  const speakerId = buf.subarray(3, 3 + sidLen).toString("utf-8");
  let off = 3 + sidLen;
  const snameLen = buf.readUInt16LE(off);
  off += 2;
  const speakerName = buf.subarray(off, off + snameLen).toString("utf-8");
  off += snameLen;
  return { speakerId, speakerName, pcmData: buf.subarray(off) };
}

function sendCommand(botId, command) {
  const ws = controlSockets.get(botId);
  if (ws && ws.readyState === 1) ws.send(JSON.stringify(command));
}

wss.on("connection", (ws, req) => {
  const [, botId, channel] = req.url.split("/");

  if (channel === "audio") {
    console.log(`[${botId}] Audio connected`);
    ws.on("message", (data, isBinary) => {
      if (!isBinary) {
        console.log(`[${botId}] Audio handshake:`, JSON.parse(data.toString()));
        return;
      }
      const frame = decodeAudioFrame(data);
      if (frame) {
        const ms = ((frame.pcmData.length / 2) / 48000 * 1000).toFixed(0);
        console.log(`[${botId}] [${frame.speakerName}] ${ms}ms`);
      }
    });

  } else if (channel === "control") {
    controlSockets.set(botId, ws);
    console.log(`[${botId}] Control connected`);

    ws.on("message", (data) => {
      const msg = JSON.parse(data.toString());
      if (msg.type === "ready") {
        console.log(`[${botId}] Control ready`);
        sendCommand(botId, {
          command: "sendmsg",
          bot_id: botId,
          message: "Bot is now live!",
          msg: "Bot is now live!",
        });
      }
    });

    ws.on("close", () => controlSockets.delete(botId));
  }
});

server.listen(8000, () => console.log("Listening on :8000"));
```

Using Both Channels Together

When you enable both live_audio_required and socket_connection_url, the bot opens two independent WebSocket connections to your server. A typical Create Bot request:

```json
{
  "meeting_url": "https://meet.google.com/abc-defg-hij",
  "bot_name": "My Assistant",
  "live_audio_required": {
    "websocket_url": "wss://your-server.com/{bot_id}/audio"
  },
  "socket_connection_url": {
    "websocket_url": "wss://your-server.com/{bot_id}/control"
  }
}
```

Pattern: Receive Audio → Process → Respond

```python
# Audio channel: receive meeting audio
speaker_id, speaker_name, pcm_bytes = decode_audio_frame(binary_message)

# Your processing pipeline
transcript = your_stt_service.transcribe(pcm_bytes)
response = your_llm.generate(transcript)
tts_audio = your_tts_service.synthesize(response)

# Control channel: send response back into the meeting
await send_command(bot_id, build_sendaudio_command(bot_id, tts_audio))
await send_command(bot_id, {
    "command": "sendchat",
    "bot_id": bot_id,
    "role": "assistant",
    "text": response,
    "is_final": True,
})
```

Pattern: Barge-In Detection

```python
# When you detect the user started speaking while bot audio is playing:
await send_command(bot_id, {
    "command": "interrupt",
    "bot_id": bot_id,
    "action": "clear_audio_queue",
})

# Then process the new user input and generate a fresh response
```
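Detecting that a user has started speaking is up to your pipeline. As a crude stand-in for a real VAD, an energy threshold over incoming PCM16 frames can work as a sketch (the 500.0 threshold is an arbitrary illustration; production systems typically use a proper voice-activity detector):

```python
import math
import struct


def rms_energy(pcm_bytes: bytes) -> float:
    """Root-mean-square amplitude of a raw PCM16 LE mono frame."""
    n = len(pcm_bytes) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", pcm_bytes[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)


def is_speech(pcm_bytes: bytes, threshold: float = 500.0) -> bool:
    """Crude voice-activity check: True when frame energy exceeds the threshold."""
    return rms_energy(pcm_bytes) > threshold
```

Run each decoded audio frame through `is_speech`; when it fires while the bot is mid-playback, send the interrupt command above.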

Command Summary

| Command | Purpose | Key Fields |
| --- | --- | --- |
| sendaudio | Play audio through the bot's speaker | audiochunk (base64 PCM16 LE), sample_rate |
| sendmsg | Post a chat message | message, msg (same text in both) |
| sendchat | Post a chat message with role and streaming | role, text, is_final |
| interrupt | Stop audio playback and clear queue | action: "clear_audio_queue" |
| sendimg | Set bot's video frame from base64 | img (base64 JPEG/PNG) |
| sendimg_url | Set bot's video frame from URL | img_url |

Audio Specs — Quick Reference

| Direction | Format | Encoding | Sample Rate | Channels | Transport |
| --- | --- | --- | --- | --- | --- |
| Incoming (meeting → you) | PCM16 signed LE | Raw binary | 48,000 Hz | 1 (mono) | Binary WebSocket frame |
| Outgoing (you → meeting) | PCM16 signed LE | Base64 in JSON | 48,000 Hz recommended | 1 (mono) | JSON text WebSocket frame |