Websocket Message Formats

During session establishment the scene connects to the orchestration websocket. This connection enables a server to monitor and/or control the behavior of a Digital Person. If the connection fails, the session fails with reason controlFailed.

Each message is JSON-encoded and sent as a websocket 'text' message.

After the initial connection is established, a 'state' event message is sent that includes all current persona state.

If the orchestration websocket connection is lost, the session is closed with reason controlDisconnected.
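As an illustration, a minimal orchestration endpoint can be sketched in TypeScript with the Node.js "ws" package; the port and handler shape here are assumptions, not part of the protocol:

// A minimal sketch of an orchestration server endpoint, assuming the
// Node.js "ws" package; the port is illustrative only.
import { WebSocketServer } from "ws";

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket) => {
  socket.on("message", (data) => {
    // Every orchestration message is a JSON-encoded websocket 'text' message.
    const message = JSON.parse(data.toString());
    if (message.kind === "event" && message.name === "state") {
      // The first 'state' event after connection carries all current persona state.
      console.log("initial or delta state:", message.body);
    }
  });
});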

Messages

Each JSON-encoded message includes standard fields that identify the kind of message. There are three kinds of messages: event, request and response.

Event

Each event includes the category ('scene'), kind ('event'), name and body. Events are sent by the Persona to all connected servers and to the connected client web browser (if any).

Request

Each request includes the category ('scene'), kind ('request'), name, transaction and body. Requests are sent from the Server to the Scene. If the transaction is not null, a response will be sent for the request; the transaction value is unique across all requests and is echoed in the response. If no transaction is included, no response is sent: the request is one-way.

Response

Each response includes the category ('scene'), kind ('response'), name, transaction (from the matching request), status (the C SDK status integer, where >= 0 is success and < 0 is failure) and body. Responses are sent from the Scene to the Server.

The remainder of this document outlines the messages sent and received between the scene and the server.
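For servers written in TypeScript, the three kinds can be captured as message shapes. A sketch derived from the field lists above, not an official SDK definition:

// Hypothetical TypeScript shapes for the three message kinds.
type SceneEvent = {
  category: "scene";
  kind: "event";
  name: string;
  body: unknown;
};

type SceneRequest = {
  category: "scene";
  kind: "request";
  name: string;
  transaction: string | number | null; // null => one-way, no response
  body: unknown;
};

type SceneResponse = {
  category: "scene";
  kind: "response";
  name: string;
  transaction: string | number; // echoes the matching request
  status: number; // C SDK status: >= 0 success, < 0 failure
  body: unknown;
};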

Example event:

{
  "category": "scene",
  "kind": "event",
  "name": "state",
  "body": {
    "persona": {
      "1": { // the personaId
        "speechState": "idle",
        "currentSpeech": "Blah blah"
      }
    }
  }
}

state

Direction: Scene to Server.

Notes: The initial state message is sent after the connection is established. Subsequent state messages include only fields that have changed since the last state event. Changes can happen either internally in the persona or be requested by a server or API. The session state is always 'connected' for a local/kiosk persona.

{
  "category": "scene",
  "kind": "event",
  "name": "state",
  "body": {
    "scene": {
      "sceneId": "<>", // unique guid
      "runtime": "<>",
      "presentation": "local|web",
      "speechRecognizer": null | {
        "kind": "watson"
      }
    },
    "configuration": {
      "audio": {
        "squelchLevel": -30.0,
        "ambientNoiseLevel": -50.0
      }
    },
    "persona": {
      "1": { // the personaId
        "users": [
          {
            "confidence": 1.0, // face present = 1, no face = 0
            "emotion": {
              "positivity": 0.1,
              "negativity": 0.1,
              "confusion": 0.1,
              "emotionalConfidence": 0.6
            },
            "activity": {
              "userIsTalking": 1.0,
              "userIsAttentive": 0.9
            },
            "conversation": {
              "turn": "persona",
              "context": {
                "name1": <value1>,
                "name2": <value2>
                ...
              }
            }
          }
        ],
        "speechState": "idle" | "speaking" | "animating",
        "currentSpeech": "Blah blah",
        "currentSpeechEML": "<EML tagged speech>",
        "currentSpeechContext": "<speech context as specified in startSpeaking>",
        "conversationProvider": {
          "kind": "watson|dialogflow"
        } // null if no provider
      }
    },
    "recognizing": true | false,
    "session": {
      "sessionId": "guid",
      "state": "idle|offered|accepted|connected",
      "userInfo": null | ""
    }
  }
}
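Because every state event after the first carries only changed fields, a server that wants the full picture can deep-merge each delta into its last known state. A sketch; the merge rules (for example, replacing arrays wholesale) are an assumption rather than documented behavior:

// Maintain a merged view of persona state from delta 'state' events.
type Json = { [key: string]: any };

let personaState: Json = {};

function mergeState(target: Json, delta: Json): Json {
  for (const [key, value] of Object.entries(delta)) {
    if (value && typeof value === "object" && !Array.isArray(value)) {
      target[key] = mergeState(target[key] ?? {}, value);
    } else {
      target[key] = value; // scalars and arrays replace wholesale (assumption)
    }
  }
  return target;
}

// On every 'state' event:
// personaState = mergeState(personaState, event.body);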

recognizeResults

Direction: Scene to Server.

Notes: After recognition has been started by a startRecognize request, the persona sends recognition results. These include intermediate results (final = false), sent while the recognizer continues to refine its view of the transcript, and final results (final = true). From SDK11 and above we recommend using "conversationRequest" instead.

{
  "category": "scene",
  "kind": "event",
  "name": "recognizeResults",
  "body": {
    "status": <sdk status int>, // >= 0 is success
    "errorMessage": "", // only present if status < 0
    "results": [
      {
        "final": true | false,
        "alternatives": [
          {
            "confidence": 0.6,
            "transcript": ""
          }
        ]
      }
    ]
  }
}
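A consumer will typically act only on final results and treat intermediate ones as provisional captions. A sketch following the body shape above:

// Handle a recognizeResults event body: final vs. intermediate results.
function onRecognizeResults(body: any): void {
  if (body.status < 0) {
    console.error("recognition failed:", body.errorMessage);
    return;
  }
  for (const result of body.results ?? []) {
    const best = result.alternatives?.[0];
    if (!best) continue;
    if (result.final) {
      console.log("final transcript:", best.transcript, best.confidence);
    } else {
      console.log("provisional:", best.transcript); // still being refined
    }
  }
}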

conversationResult

Direction: Scene to Server.

Notes: A conversation result has been received from the backend (Watson/Dialogflow). In particular, this result includes the output text along with any context variables specified by the corpus that are prefixed with 'public'.

{
  "category": "scene",
  "kind": "event",
  "name": "conversationResult",
  "body": {
    "status": 0, // SmOK
    "errorMessage": "...", // optional
    "personaId": <string|number>,
    "input": {
      "text": "...",
      "context": { ... }
    },
    "output": {
      "text": "...",
      "context": {
        "<var1>": <value1>, // only variables of the form public* will be included
        "<var2>": <value2>
      },
      "provider": {
        "kind": "watson" | "dialogflow",
        "meta": {
          // watson:
          "system": { ... }, // from the Watson response
          // provisionalText: ["...", ...],
          // entities: [ { entity: "", value: "", locations: [<int>, <int>, ...] }, ... ],
          // intents: [ { intent: "", confidence: <float> }, ... ],
          // dialogflow:
          "dialogflow": { ... } // the queryResult object from the Dialogflow response; see
          // https://cloud.google.com/dialogflow/docs/reference/rest/v2/projects.agent.sessions/detectIntent#QueryResult
          // and Dialogflow metadata below
        }
      }
    }
  }
}

The "dialogflow" metadata entry takes the form of a "queryResult" key containing the queryResult object returned from Dialogflow. Example payload:

{
  "queryResult": {
    "allRequiredParamsPresent": true,
    "fulfillmentMessages": [
      {
        "text": {
          "text": [
            "This is a simple response."
          ]
        }
      }
    ],
    "fulfillmentText": "This is a simple response.",
    "intent": {
      "displayName": "Respond Simple",
      "name": "projects/df-conversation-test/agent/intents/c6a499cd-ccb2-4af1-bd41-3ad922c627cd"
    }
  }
}

Provider details: the Watson Assistant (v1) API documentation describes the response contained in body: <personaId>."output"."provider"."meta"."watson".

The Dialogflow (v2) QueryResult documentation (linked above) describes the response contained in body: <personaId>."output"."provider"."meta"."dialogflow".
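For example, a handler might pull the reply text and the public* context variables out of the event body. A sketch; the prefix filter is defensive, since the scene already restricts the set to public* variables:

// Extract the spoken reply and public* context variables
// from a conversationResult event body.
function onConversationResult(body: any): void {
  const text: string = body.output?.text ?? "";
  const context: Record<string, unknown> = body.output?.context ?? {};
  const publicVars = Object.fromEntries(
    Object.entries(context).filter(([name]) => name.startsWith("public"))
  );
  console.log("persona will say:", text, "with variables:", publicVars);
}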

conversationActivation

A UX rule has activated.

Direction: Scene to Server.

{
  "category": "scene",
  "kind": "event",
  "name": "activation",
  "body": {
    "personaId": <string|number>,
    "kind": "intent" | "emote",
    "name": "<rule name>",
    "turn": "user" | "persona",
    "context": { ... relevant triggering context variable(s) ... }
  }
}

speechMarker

Direction: Scene to Server.

Notes: Events triggered during speech. These can be used to show content (e.g. showcards) or as generic events to trigger other behaviors (e.g. opening a menu, scrolling the page).

{
  "category": "scene",
  "kind": "event",
  "name": "speechMarker",
  "body": {
    "personaId": <string|number>,
    "name": <string>, // e.g. showcards, hidecards
    "arguments": string[]
  }
}
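A typical handler dispatches on the marker name; showcards/hidecards are the example names from the comment above, and the card functions are illustrative placeholders:

// Route speechMarker events to UI behavior.
function onSpeechMarker(body: { name: string; arguments: string[] }): void {
  switch (body.name) {
    case "showcards":
      showCards(body.arguments); // e.g. card ids to display
      break;
    case "hidecards":
      hideCards();
      break;
    default:
      console.log("unhandled speech marker:", body.name, body.arguments);
  }
}

declare function showCards(ids: string[]): void; // illustrative
declare function hideCards(): void; // illustrative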

Requests/Responses

Example request/response:

{
  "category": "scene",
  "kind": "request",
  "name": "startSpeaking",
  "transaction": 1, // null | <string|number>; if null the request is one-way and no response will be sent
  "body": {
    "personaId": <string|number>,
    "text": "Blah blah",
    "context": "..." // a context string, e.g. node or intentName,
                     // that will be included in the transcript
  }
}

{
  "category": "scene",
  "kind": "response",
  "name": "startSpeaking",
  "transaction": 1,
  "status": 0, // 'C' SDK status value where >= 0 is success, < 0 is failure
  "body": {
  }
}

startSpeaking

Direction: Server to Scene.

Notes: Sends text to the persona to start speaking. This text will interrupt and replace any existing text that the persona is speaking. From SDK11 and above we recommend implementing conversation using "conversationRequest" and "conversationResponse" instead (see Custom conversation).

{
  "category": "scene",
  "kind": "request",
  "name": "startSpeaking",
  "transaction": null | <string|number>, // if null the request is one-way and no response will be sent
  "body": {
    "personaId": <string|number>,
    "text": "Blah blah",
    "context": "..." // a context string, e.g. node or intentName,
                     // that will be included in the transcript
  }
}
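Since responses are correlated by transaction, a server can wrap requests in promises keyed by a transaction counter. A sketch, assuming 'socket' is the connected orchestration websocket:

// Issue startSpeaking with a transaction and resolve the matching response.
let nextTransaction = 1;
const pending = new Map<number, (response: any) => void>();

function startSpeaking(socket: { send(data: string): void },
                       personaId: number, text: string): Promise<any> {
  const transaction = nextTransaction++;
  socket.send(JSON.stringify({
    category: "scene",
    kind: "request",
    name: "startSpeaking",
    transaction, // non-null, so a response will be sent
    body: { personaId, text },
  }));
  return new Promise((resolve) => pending.set(transaction, resolve));
}

// On every incoming message with kind === "response":
// pending.get(message.transaction)?.(message); pending.delete(message.transaction);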

stopSpeaking

Direction: Server to Scene.

Notes: Stops the persona speaking.

{
  "category": "scene",
  "kind": "request",
  "name": "stopSpeaking",
  "transaction": null | <string|number>, // if null the request is one-way and no response will be sent
  "body": {
    "personaId": <string|number>
  }
}

conversationSend

Direction: Server to Scene.

Notes: This message is used to implement chat window functionality, where a user types a request that goes to conversation as if it had been spoken. Custom variables can be included in 'variables' and will be received by the conversation corpus. Query parameters for Dialogflow may be provided in the 'optionalArgs' object. Note that the "event" kind is only supported in Dialogflow. If sending an event, the "text" field is sent as the Dialogflow event name.

{
  "category": "scene",
  "kind": "request",
  "name": "conversationSend",
  "transaction": null | <string|number>, // if null the request is one-way and no response will be sent
  "body": {
    "personaId": <string|number>,
    "text": "Blah blah",
    "variables": { "name1": "value1", "name2": "value2", ... },
    "optionalArgs": {
      "kind": "text" | "event" | "init"
    }
  }
}

Example ‘optionalArgs’/query parameters for Dialogflow:

{
  "kind": "text" | "event" | "init",
  "contexts": [
    {
      "name": "contextId", // may be provided in its full form, e.g. "projects/<project_id>/agent/sessions/<session_id>/contexts/contextId", or as just the context ID without the prefix
      "lifespanCount": 100,
      "parameters": {
        "name": "Bob"
      }
    }
  ],
  "payload": { "foo": 1 },
  "timeZone": "America/New_York",
  "geoLocation": {
    "latitude": 51.42,
    "longitude": 9.999
  },
  "sentimentAnalysisRequestConfig": {
    "analyzeQueryTextSentiment": true
  }
}

Response: a normal status-only response with no body. If successful, a conversationResult event will be received once the request has been dequeued and processed by the conversation backend.
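For example, a Dialogflow welcome event could be triggered as follows; the event name "WELCOME" is illustrative, and with kind "event" the text field is interpreted as the event name:

// Fire a Dialogflow event via conversationSend.
function sendWelcomeEvent(socket: { send(data: string): void },
                          personaId: number): void {
  socket.send(JSON.stringify({
    category: "scene",
    kind: "request",
    name: "conversationSend",
    transaction: null, // one-way; the result arrives as a conversationResult event
    body: {
      personaId,
      text: "WELCOME", // interpreted as the Dialogflow event name
      variables: {},
      optionalArgs: { kind: "event" },
    },
  }));
}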

conversationSetVariables

Direction: Server to Scene.

Notes: This message is used to set variables that are sent to the conversation provider on the next and subsequent requests. Examples include timezone and Digital Human name.

{
  "category": "scene",
  "kind": "request",
  "name": "conversationSetVariables",
  "transaction": null | <string|number>, // if null the request is one-way and no response will be sent
  "body": {
    "personaId": <string|number>,
    "variables": { "name1": "value1", "name2": "value2", ... }
  }
}

Response: a normal status-only response with no body.

conversationGetVariables

Direction: Server to Scene.

Notes: This message is used to get the current conversation variable values.

{
  "category": "scene",
  "kind": "request",
  "name": "conversationGetVariables",
  "transaction": null | <string|number>, // if null the request is one-way and no response will be sent
  "body": {
    "personaId": <string|number>
  }
}

Response body:

{
  ...
  "body": {
    "variables": {
      ... // name/value pairs for each conversation variable
    }
  }
}

conversationRequest

Direction: Scene to Server.

Notes: The Speech-To-Text (STT) result or chat window entry (see "conversationSend"). The server implementation must ensure that every "conversationRequest" message is matched with a corresponding "conversationResponse" (the "output.text" field can be empty if necessary). Custom variables can be included in "variables" and will be received by the conversation corpus. See: Custom Conversation and Orchestration Server Messages.

{
  "category": "scene",
  "kind": "event",
  "name": "conversationRequest",
  "body": {
    "personaId": 1, // <number|string>
    "input": {
      "text": "Can I apply for a credit card?"
    },
    "variables": {
      // ... name/value pairs for each conversation variable
    },
    "optionalArgs": {
      // name/value pairs for each optional arg
      "kind": "init" // example arg prompting for the Welcome message
    }
  }
}

conversationResponse

Direction: Server to Scene.

Notes: Sends text to the persona to start speaking. By default this text will interrupt and replace any existing text that the persona is speaking. This behavior can be configured in "variables" using conversation state variables. Custom variables can also be included in "variables" and will be received by the conversation corpus. See: Custom Conversation and Orchestration Server Messages.

{
  "category": "scene",
  "kind": "request",
  "name": "conversationResponse",
  "transaction": null, // <null|number|string>; if null the request is one-way and no response will be sent
  "body": {
    "personaId": 1, // <number|string>
    "input": {
      "text": "Can I apply for a credit card?"
    },
    "output": {
      "text": "Yes, I can help you apply for your credit card. Please tell me which type of card you are after, Visa or Mastercard?"
    },
    "variables": {
      // ["allow_interrupt"|"ignore_speech"|"allow_gestures"]: [true|false]
      // ... name/value pairs for each conversation variable
    },
    "fallback": false // <bool>; flag as true to indicate a fallback response, used for interoperability with skills
  }
}

Conversation state variables:

allow_interrupt: Allow/disallow interruption of speech. This setting will persist until a subsequent message changes its value.
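Putting the two messages together, a server must answer every conversationRequest event with a conversationResponse request, optionally setting state variables such as allow_interrupt. A minimal sketch, assuming a connected socket and an illustrative generateReply function standing in for the NLP backend:

// Answer each conversationRequest with a conversationResponse.
declare function generateReply(text: string): Promise<string>; // illustrative

async function onConversationRequest(socket: { send(data: string): void },
                                     body: any): Promise<void> {
  const reply = await generateReply(body.input?.text ?? "");
  socket.send(JSON.stringify({
    category: "scene",
    kind: "request",
    name: "conversationResponse",
    transaction: null,
    body: {
      personaId: body.personaId,
      input: body.input,
      output: { text: reply }, // may be empty, but must always be sent
      variables: {},
      fallback: false,
    },
  }));
}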

configure

Direction: Server to Scene.

Notes: Configures the scene. This request is atomic: if one item cannot be applied then none are applied and the configure request fails. Each property in the body is optional.

{
  "category": "scene",
  "kind": "request",
  "name": "configure",
  "transaction": null | <string|number>, // if null the request is one-way and no response will be sent
  "body": {
    "directionalLight": { "x": <float>, "y": <float>, "z": <float> },
    "audio": {
      "squelchLevel": -30.0
    },
    "persona": {
      "<personaId>": {
        "conversation": {
          "speakResults": <bool> // defaults to true
        }
      }
    },
    "statistics": {
      "intervalSeconds": <float>
    }
  }
}

getState

Direction: Server to Scene.

Notes: Gets the Scene's current complete state. The response contains a body identical in form to the initial 'state' event. Note that the values of all currently monitored variables are included.

{
  "category": "scene",
  "kind": "request",
  "name": "getState",
  "transaction": <string|number>,
  "body": {
  }
}

Response body:

{
  ...
  "body": {
    // identical to the initial 'state' event
  }
}

getTranscript

Direction: Server to Scene.

Notes: Gets transcript records for the session that describe interactions between the user and the persona. Each record includes a millisecond timestamp since the start of the session and either recognized user speech or persona state information. Records are requested with a starting cursor position (an integer id); all records from that cursor position up to the time the request is made are returned.

{
  "category": "scene",
  "kind": "request",
  "name": "getTranscript",
  "transaction": <string|number>,
  "body": {
    "cursor": 0 // optional, defaults to 0
  }
}

Response body:

{
  ...
  "body": {
    "nextCursor": 5,
    "startCursor": 0,
    "records": [
      {
        "timestamp": 0, // milliseconds since start of session
        "recognizeResults": { ... } // same content as the recognizeResults event; only successful final results are included
      },
      {
        "timestamp": 2000, // milliseconds since start of session
        "state": { ... } // same content as the persona state event; only changed portions of the 'persona' object are included
      },
      {
        "timestamp": 5000,
        "recognizeResults": { ... }
      },
      ...
    ]
  }
}
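A server can page through the transcript by carrying nextCursor forward between polls. A sketch, assuming a transaction-correlated sendRequest helper like the one sketched earlier:

// Poll getTranscript, resuming from the last returned cursor.
declare function sendRequest(name: string, body: unknown): Promise<any>; // illustrative

let transcriptCursor = 0;

async function pollTranscript(): Promise<void> {
  const response = await sendRequest("getTranscript", { cursor: transcriptCursor });
  if (response.status < 0) return;
  for (const record of response.body.records ?? []) {
    console.log(record.timestamp, record.recognizeResults ?? record.state);
  }
  transcriptCursor = response.body.nextCursor; // resume point for the next poll
}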

startRecognize

Direction: Server to Scene.

Notes: Starts the speech recognizer. Relies on the persona having a configured speech recognizer.

{
  "category": "scene",
  "kind": "request",
  "name": "startRecognize",
  "transaction": null | <string|number>, // if null the request is one-way and no response will be sent
  "body": {
    "audioSource": 2 | 4 // 2 = processed, 4 = squelched; defaults to processed
  }
}

stopRecognize

Direction: Server to Scene.

Notes: Stops the speech recognizer.

{
  "category": "scene",
  "kind": "request",
  "name": "stopRecognize",
  "transaction": null | <string|number>, // if null the request is one-way and no response will be sent
  "body": {
  }
}

closeSession

Direction: Server to Scene.

Notes: Closes the current offered or connected session (e.g. webrtc). This request succeeds (but is ignored) on a local/kiosk persona.

{
  "category": "scene",
  "kind": "request",
  "name": "closeSession",
  "transaction": null | <string|number>, // if null the request is one-way and no response will be sent
  "body": {
    "sessionId": "guid",
    "reason": "normal" | "timeout"
  }
}

sendSession

Direction: Server to Scene.

Notes: Sends custom text to the browser.

{
  "category": "scene",
  "kind": "request",
  "name": "sendSession",
  "transaction": null | <string|number>, // if null the request is one-way and no response will be sent
  "body": {
    "sessionId": "guid",
    "userText": ""
  }
}

animateToNamedCamera

Direction: Server to Scene.

Notes: Changes to a new named camera position, with orbital camera adjustments.

{
  "category": "scene",
  "kind": "request",
  "name": "animateToNamedCamera",
  "transaction": null | <string|number>, // if null the request is one-way and no response will be sent
  "body": {
    "personaId": 1,
    "camera_name": "CloseUp",
    "time": 0.5,
    "orbitDegX": 10.0,
    "orbitDegY": 0.0,
    "panDeg": 1.0,
    "tiltDeg": 0.0
  }
}

transitionToComponent

Direction: Server to Scene.

Notes: Applies the configuration associated with a named component.

{
  "category": "scene",
  "kind": "request",
  "name": "transitionToComponent",
  "transaction": null | <string|number>, // if null the request is one-way and no response will be sent
  "body": {
    "personaId": 1,
    "componentName": <string> // a component name in config/soul
  }
}

applyEmlDictionaries

Direction: Server to Scene.

Notes: Applies the specified list of EML keyword dictionaries to the persona, in the order specified. The supported dictionary kinds are "object" and "base64". If the same EML tag appears in more than one dictionary, the last occurrence overrides the others.

{
  "category": "scene",
  "kind": "request",
  "name": "applyEmlDictionaries",
  "transaction": null | <string|number>, // if null the request is one-way and no response will be sent
  "body": {
    "emlDictionaries": [
      { "name": "happy_dictionary", "kind": "object", "content": { "SAM_HappyModerate": ["cat"], "SAM_EmphasisGesture": ["bad", "answer for"] } },
      { "name": "base64_dictionary", "kind": "base64", "content": "eyAiU0FNX1F1ZXN0aW9uTW9kZXJhdGUiOiBbInF1ZXN0aW9uIiwgIlNLSUkiLCAiWXVtaSJdIH0=" }
    ]
  }
}
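The "base64" kind is simply the JSON dictionary object encoded as base64; the example content above decodes to { "SAM_QuestionModerate": ["question", "SKII", "Yumi"] }. A Node.js sketch:

// Build a "base64" EML dictionary entry from a plain object.
const dictionary = { SAM_HappyModerate: ["cat"] };

const emlDictionary = {
  name: "happy_dictionary", // illustrative name
  kind: "base64",
  content: Buffer.from(JSON.stringify(dictionary)).toString("base64"),
};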

updateContentAwareness

Direction: Server to Scene.

Notes: Updates the persona's content awareness with the given UI elements and their positions. Note that the "SmVideoFrame" element (persona video frame) is a required element in the "content" array, as it is the reference point for the other content in the list.

{
  "category": "scene",
  "kind": "request",
  "name": "updateContentAwareness",
  "transaction": null | <string|number>, // if null the request is one-way and no response will be sent
  "body": {
    "viewWidth": 1024,
    "viewHeight": 768,
    "videoFrame": {
      "x1": 0, // top left x coordinate of object
      "y1": 0, // top left y coordinate
      "x2": 600, // bottom right x coordinate
      "y2": 700 // bottom right y coordinate
    },
    "safeArea": {
      "x1": 0, // top left x coordinate of object
      "y1": 0, // top left y coordinate
      "x2": 400, // bottom right x coordinate
      "y2": 700 // bottom right y coordinate
    },
    "content": [
      {
        "id": "object-1",
        "x1": 400, // top left x coordinate of object
        "y1": 500, // top left y coordinate
        "x2": 600, // bottom right x coordinate
        "y2": 700, // bottom right y coordinate
        "depthReference": "camera", // optional; only "camera" is supported currently
        "relativeDepth": 0.1, // optional; defaults to 0 relative to the camera if not specified
        "meta": { } // optional
      },
      {
        "id": "object-2",
        "x1": 500,
        "y1": 600,
        "x2": 700,
        "y2": 800,
        "depthReference": "camera",
        "relativeDepth": 0.1
      }
    ]
  }
}
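On the browser side, the coordinates for a content entry can be taken straight from an element's bounding box. A sketch; the element id and rounding are illustrative:

// Build one "content" entry from a DOM element's bounding box.
function contentEntryFor(id: string): object {
  const rect = document.getElementById(id)!.getBoundingClientRect();
  return {
    id,
    x1: Math.round(rect.left),  // top left x
    y1: Math.round(rect.top),   // top left y
    x2: Math.round(rect.right), // bottom right x
    y2: Math.round(rect.bottom) // bottom right y
  };
}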