Types of Streams


There are two types of Streams:

1) Unidirectional Streams

Unidirectional Streams allow you to stream live calls over a WebSocket in a single direction, from Exotel to the WebSocket endpoint. Some use cases are live transcription, real-time monitoring of agents, and real-time coaching.

2) Bidirectional Streams

Bidirectional Streams allow a two-way flow of voice data over a WebSocket. Exotel sends the voice data of the caller to a WebSocket endpoint. The endpoint can return voice data on the same WebSocket, and Exotel plays it out to the caller. The primary use case is building intelligent conversational bots that help you optimize your workforce.


Enabling And Disabling Streams

Streams can be enabled for a call flow using the “Stream” Applet when creating Custom Apps in the App Bazaar.

This Applet might not be available by default for all accounts. If you do not see it in the list of Voice Applets, send an email to hello@exotel.in or talk to your Account Manager.



Configuring a Stream

You can enable streaming on a call flow using the Stream Applet.


The applet takes 4 parameters:

1. Action - You can either start a new stream or stop a stream that you started earlier in the same call flow. Use the Stop action if you have started a Unidirectional Stream earlier in the same flow. When you choose Stop, it is the only input you need to configure.

2. URL - This is the URL to which Exotel will stream the voice media. You can specify either a wss endpoint or an http/https endpoint. If you specify an http/https endpoint, Exotel expects that endpoint to return a wss URL in its response. This allows

a. Dynamic endpoints for the same call flow

b. Dynamic custom parameters that are passed to the WebSocket endpoint to handle any application-specific customization

When you specify an https endpoint, it must return a JSON object with the key “url”:

{
"url" : "wss://streamhandler.yourdomain.com"
}


On receiving this, Exotel will initiate a connection to wss://streamhandler.yourdomain.com.

You can also return custom parameters in the JSON; these are passed on in the Start message.


3. Type - Unidirectional or Bidirectional Stream. When you choose Unidirectional, you can choose to initiate the stream immediately or when the second leg of the call has been picked up.

4. Next Applet

a. In the case of a Unidirectional Stream, the stream is created and the call flow proceeds to the next applet configured.

b. In the case of a Bidirectional Stream, the call flow is blocked here until the stream ends. The stream can end if the call is disconnected, the WebSocket is closed, or the stream is explicitly stopped by the client. For a Bidirectional Stream you do not need to add an explicit “Stop” Stream applet, since the stream is automatically closed before the next Applet is executed.
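
The https option for the URL parameter can be sketched as a small helper that builds the JSON body such an endpoint would return. This is a minimal sketch, not Exotel's implementation: the function name build_stream_response is hypothetical, the "params" key follows the sample response shown under the Start message, and the cap of 3 custom parameters is taken from the Limitations section.

```python
import json

MAX_CUSTOM_PARAMS = 3  # limit stated in the Limitations section


def build_stream_response(wss_url, params=None):
    """Return the JSON body an https endpoint could send back to Exotel.

    Hypothetical helper: only the "url" and "params" keys come from the
    documentation; the validation below is an illustrative assumption.
    """
    if not wss_url.startswith("wss://"):
        raise ValueError("Exotel expects a wss URL in the response")
    body = {"url": wss_url}
    if params:
        if len(params) > MAX_CUSTOM_PARAMS:
            raise ValueError("at most 3 custom parameters are passed on")
        body["params"] = dict(params)
    return json.dumps(body)


print(build_stream_response(
    "wss://streamhandler.yourdomain.com",
    {"queuename": "premium", "product": "radio"},
))
```

Rejecting extra parameters up front, rather than letting them be dropped silently, is a design choice of this sketch; how Exotel treats a fourth parameter is not stated here.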


Video WalkThrough

You can find a quick walkthrough of a sample flow here



Protocol

Communication between Exotel and the customer endpoint happens over a WebSocket connection.


Websocket messages - From Exotel

Each message on the WebSocket is sent and received as a JSON string. Exotel sends the following types of messages:

Connected

Start

Media

Stop

Mark (Only in Bidirectional)
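
Since each sample message in this section carries an "event" field, a receiver can route incoming messages on that field. The following is a minimal sketch under that assumption; the dispatch function and the handler mapping are hypothetical names, and the WebSocket transport itself is left out.

```python
import json

KNOWN_EVENTS = {"connected", "start", "media", "stop", "mark"}


def dispatch(raw_message, handlers):
    """Parse one JSON text message and call the handler for its event type.

    `handlers` maps an event name to a callable that takes the parsed dict.
    Unknown events are ignored so that an unexpected message type does not
    crash the receiver (an illustrative choice, not documented behaviour).
    """
    message = json.loads(raw_message)
    event = message.get("event")
    if event not in KNOWN_EVENTS:
        return None
    handler = handlers.get(event)
    return handler(message) if handler else None


seen = []
dispatch('{"event": "connected"}', {"connected": lambda m: seen.append(m["event"])})
print(seen)  # ['connected']
```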


Connected message:

This message is sent once the WebSocket connection is established.

{
   "event":"connected"
}


Start message:

The Start message contains information about the stream parameters. It is sent only once, right after the Connected message. The custom parameters are picked up from the URL configured in the Stream Applet.

For example, if the https/http URL configured in the Applet returns the following,

{
   "url":"wss://example.com/media",
   "params":{
      "queuename":"premium",
      "product":"radio"
   }
}


queuename and product would be passed in as keys, with premium and radio as their values, under custom_parameters in the Start message:


{
   "event":"start",
   "sequence_number":1,
   "stream_sid":"<stream sid>",
   "start":{
      "stream_sid":"<>",
      "call_sid":"",
      "account_sid":"",
      "custom_parameters":{
         "queuename":"premium",
         "product":"radio"
      },
      "media_format":{
         "encoding":"<>",
         "sample_rate":"<>",
         "bit_rate":"<>"
      }
   }
}
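
A receiver will typically want to pull the identifiers and custom parameters out of the Start message before any media arrives. The sketch below shows one way to do that; parse_start is a hypothetical helper, the sample values "S1", "C1" and "A1" are placeholders, and .get() is used because the documentation does not say which fields are always present.

```python
import json


def parse_start(raw_message):
    """Extract the main fields of a Start message.

    Hypothetical helper; the field names are taken from the sample above.
    """
    message = json.loads(raw_message)
    if message.get("event") != "start":
        raise ValueError("not a start message")
    start = message.get("start", {})
    return {
        "stream_sid": message.get("stream_sid"),
        "call_sid": start.get("call_sid"),
        "custom_parameters": start.get("custom_parameters", {}),
        "media_format": start.get("media_format", {}),
    }


sample = json.dumps({
    "event": "start",
    "sequence_number": 1,
    "stream_sid": "S1",  # placeholder value
    "start": {
        "stream_sid": "S1",
        "call_sid": "C1",
        "account_sid": "A1",
        "custom_parameters": {"queuename": "premium", "product": "radio"},
        "media_format": {"encoding": "<>", "sample_rate": "<>", "bit_rate": "<>"},
    },
})
print(parse_start(sample)["custom_parameters"])
# {'queuename': 'premium', 'product': 'radio'}
```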


Media message:

This message encapsulates the audio packets.


{
   "event":"media",
   "sequence_number":3,
   "stream_sid":"<stream sid>",
   "media":{
      "chunk":2,
      "timestamp":"10",
      "payload":"<>"
   }
}


media.chunk : Chunk number of the message.

media.timestamp : Timestamp in milliseconds from the start of the stream.
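
To get at the audio, the payload of a Media message has to be base64-decoded. A minimal sketch follows; decode_media is a hypothetical helper, and the timestamp is converted with int() because the sample above shows it as a string.

```python
import base64
import json


def decode_media(raw_message):
    """Return chunk number, timestamp (ms) and raw audio bytes of a Media message."""
    message = json.loads(raw_message)
    if message.get("event") != "media":
        raise ValueError("not a media message")
    media = message["media"]
    return {
        "chunk": int(media["chunk"]),
        "timestamp_ms": int(media["timestamp"]),  # sample shows a string value
        "audio": base64.b64decode(media["payload"]),
    }


example = json.dumps({
    "event": "media",
    "sequence_number": 3,
    "stream_sid": "S1",  # placeholder value
    "media": {
        "chunk": 2,
        "timestamp": "10",
        "payload": base64.b64encode(b"\x01\x02").decode("ascii"),
    },
})
print(decode_media(example)["timestamp_ms"])  # 10
```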


Stop message:

Stop message is sent when the stream is stopped or the call has ended.

{
   "event":"stop",
   "sequence_number":10,
   "stream_sid":"<stream sid>",
   "stop":{
      "call_sid":"<>",
      "account_sid":"<>",
      "reason":"stopped or callended"
   }
}



Mark Message:

The Mark message is used only in bidirectional streaming. Exotel sends it, with a matching name, once the audio preceding a mark that you sent has been processed.

{
   "event":"mark",
   "sequence_number":15,
   "stream_sid":"<stream sid>",
   "mark":{
      "name":"<label>"
   }
}




Websocket messages - To Exotel

These messages will be used only in bidirectional streaming.

Mark Message:

The Mark message is used only in bidirectional streaming, to track when media has been processed. Send a mark event message after a media message to request a notification once the audio you sent has been processed. You will receive a mark event message with a matching name from Exotel when that audio is processed.

{
   "event":"mark",
   "stream_sid":"<stream sid>",
   "mark":{
      "name":"<label>"
   }
}
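
The request-and-notify pattern described for marks can be sketched as a small tracker that remembers each mark name until Exotel echoes it back. MarkTracker and its method names are hypothetical, and actually sending the message over the WebSocket is left to the caller.

```python
import json


class MarkTracker:
    """Remember marks sent to Exotel until the matching mark comes back."""

    def __init__(self):
        self.pending = set()

    def build_mark(self, stream_sid, name):
        """Record the mark as pending and return the JSON string to send."""
        self.pending.add(name)
        return json.dumps({
            "event": "mark",
            "stream_sid": stream_sid,
            "mark": {"name": name},
        })

    def on_mark(self, raw_message):
        """Return True if an incoming mark matches one that is still pending."""
        name = json.loads(raw_message)["mark"]["name"]
        if name in self.pending:
            self.pending.remove(name)
            return True
        return False


tracker = MarkTracker()
outgoing = tracker.build_mark("S1", "greeting")  # "S1" is a placeholder sid
echoed = json.dumps({"event": "mark", "sequence_number": 15,
                     "stream_sid": "S1", "mark": {"name": "greeting"}})
print(tracker.on_mark(echoed))  # True
```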



Media message:

This message encapsulates the audio packets.

{
   "event":"media",
   "stream_sid":"<stream sid>",
   "media":{
      "payload":"<>"
   }
}
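
Building the outbound Media message is the mirror image of decoding an inbound one: the raw audio bytes are base64-encoded and wrapped in the structure above. A minimal sketch, with build_media_message as a hypothetical helper name:

```python
import base64
import json


def build_media_message(stream_sid, audio_bytes):
    """Wrap raw audio bytes in a Media message to send back to Exotel."""
    payload = base64.b64encode(audio_bytes).decode("ascii")
    return json.dumps({
        "event": "media",
        "stream_sid": stream_sid,
        "media": {"payload": payload},
    })


# "S1" is a placeholder; use the stream_sid received in the Start message.
print(build_media_message("S1", b"\x00\x00\x00\x00"))
```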


Media Format

Media payloads are sent as raw/SLIN16 audio (8 kHz, mono), encoded in base64. The same format is expected from the client for audio that is to be played back to the caller in bidirectional streams.
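
As a rough illustration of working with this format, the sketch below turns a base64 payload into samples and computes how much audio it holds. It rests on assumptions that are not stated in this document: that SLIN16 here means 16-bit signed linear PCM (2 bytes per sample) and that samples are little-endian. The 8 kHz rate is taken from the text above; the function names are hypothetical.

```python
import base64
import struct

SAMPLE_RATE_HZ = 8000  # from the Media Format description
BYTES_PER_SAMPLE = 2   # assumption: 16-bit signed linear PCM


def payload_to_samples(payload_b64):
    """Decode a base64 payload into a list of signed 16-bit samples.

    Assumes little-endian byte order, which is not stated in the document.
    """
    raw = base64.b64decode(payload_b64)
    if len(raw) % BYTES_PER_SAMPLE:
        raise ValueError("payload is not a whole number of 16-bit samples")
    count = len(raw) // BYTES_PER_SAMPLE
    return list(struct.unpack("<%dh" % count, raw))


def duration_ms(payload_b64):
    """Return the playback duration of a payload in milliseconds."""
    count = len(base64.b64decode(payload_b64)) // BYTES_PER_SAMPLE
    return count * 1000 / SAMPLE_RATE_HZ


silence = base64.b64encode(bytes(320)).decode("ascii")  # 160 zero samples
print(duration_ms(silence))  # 20.0
```

Under these assumptions one second of audio is 16000 bytes before base64 encoding, which is a convenient figure when sizing outbound chunks.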


Sample Code

https://github.com/exotel/voice-streaming



Limitations


1) The number of custom parameters that can be passed in the Start message is limited to 3.

2) The stream is sent as mono-channel raw audio, so the client has to handle speaker diarization itself.