Adding NLP and Conversational Content
This section describes the NLP platforms that can be used with a Soul Machines Digital Person and the options available for integrating dynamic data into the conversational content.
NLP Integration
Soul Machines Digital People leverage Natural Language Processing platforms (NLPs) in order to:
- Understand the user’s intention or inquiry by analyzing their utterance.
- Select the proper pre-written response from a finite set of responses.
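The two steps above can be illustrated with a toy sketch. It is not how Dialogflow, Watson Assistant, or Azure Bot Service work internally (they use trained language models); it only shows the shape of the problem: map an utterance to an intent, then return the pre-written response for that intent. The intent names, training phrases, responses, and the 0.3 threshold are all hypothetical.

```python
import re

# Hypothetical pre-written responses, one per intent.
RESPONSES = {
    "opening_hours": "We are open from 9am to 5pm, Monday to Friday.",
    "account_balance": "I can help with that once you have signed in.",
    "fallback": "Sorry, I didn't catch that. Could you rephrase?",
}

# Hypothetical example phrases an NLP would be trained on.
TRAINING_PHRASES = {
    "opening_hours": ["when are you open", "what are your opening hours"],
    "account_balance": ["what is my account balance", "how much money do i have"],
}


def tokenize(text: str) -> set:
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def detect_intent(utterance: str, threshold: float = 0.3) -> str:
    """Step 1: pick the intent whose training phrase best overlaps the utterance."""
    words = tokenize(utterance)
    best_intent, best_score = "fallback", 0.0
    for intent, phrases in TRAINING_PHRASES.items():
        for phrase in phrases:
            phrase_words = tokenize(phrase)
            # Jaccard similarity: shared words divided by all distinct words.
            score = len(words & phrase_words) / len(words | phrase_words)
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else "fallback"


def select_response(utterance: str) -> str:
    """Step 2: return the pre-written response for the detected intent."""
    return RESPONSES[detect_intent(utterance)]
```

An utterance that matches no training phrase falls through to the fallback response, which mirrors the "finite set of responses" constraint described above.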
Several NLP platforms are currently supported out of the box, including Google Dialogflow (ES and CX), IBM Watson Assistant, and Microsoft Azure Bot Service. Other NLPs can be used by building an "Orchestration Layer": a customer-built API integration between the Soul Machines backend WebSocket events and the desired NLP’s API. This is covered further in the Orchestration section below.
For detailed information, see Supported NLP Platforms. If you need further assistance, contact the Soul Machines technical team to set up a Digital Person with your NLP provider.
Including Dynamic Data in the Conversational Content
While developing the Conversational Content, you may find scenarios in which static, predetermined text is not sufficient to create a natural, person-like conversation. For example, consider the Digital Person being asked “What’s the weather outside?” or “What is my account balance?”
Where dynamic content is needed, you can use Dialogflow's Fulfillment Integrations to pull relevant data from third-party services (such as a weather provider, in the case above) or from your company's API.
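As a rough illustration of a Dialogflow fulfillment, the sketch below handles the body of a Dialogflow ES webhook request and returns a webhook response. The `queryResult.intent.displayName`, `queryResult.parameters`, and `fulfillmentText` fields follow the Dialogflow ES webhook format; the intent name `get_weather`, the `city` parameter, and the `fetch_weather` callable are hypothetical stand-ins for your own agent and weather provider. Hosting the handler behind an HTTPS endpoint is left out.

```python
from typing import Callable


def handle_webhook(request_body: dict, fetch_weather: Callable[[str], str]) -> dict:
    """Build a Dialogflow ES webhook response from a webhook request body.

    fetch_weather is a hypothetical callable that takes a city name and
    returns a short forecast string from a third-party weather service.
    """
    query_result = request_body.get("queryResult", {})
    intent = query_result.get("intent", {}).get("displayName", "")
    parameters = query_result.get("parameters", {})

    if intent == "get_weather":  # hypothetical intent name
        city = parameters.get("city") or "your area"  # hypothetical parameter
        forecast = fetch_weather(city)
        return {"fulfillmentText": f"The weather in {city} is {forecast}."}

    # Any other intent gets a static fallback line.
    return {"fulfillmentText": "Sorry, I don't have that information yet."}
```

Passing the data source in as a callable keeps the handler easy to test without network access and makes it simple to swap the weather provider for your company's API.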
IBM Watson Assistant lets you make API requests for this third-party data using Cloud Functions. For any assistance with making Fulfillment requests, please contact your Soul Machines Solutions Architect or Conversational Engineer.
Note that dynamic content can also be provided via an Orchestration server.
Speech Recognition (STT) and Voice Synthesis (TTS)
Alongside your NLP, third-party Speech-to-Text (STT) and Text-to-Speech (TTS) services process the audio produced by your users and voice the responses in a natural way.
An overview of the data flow can be found in the NLP Orchestration Implementation section.
Audio utterances from the end user are transmitted via WebRTC and directed to the desired third-party STT API for conversion to text. The resulting text is then passed to the NLP and processed as the end user's query.
Soul Machines supports Google, IBM, AWS, and Microsoft STT services. Soul Machines also supports phrase hints (Google and AWS) for STT services. Phrase hints increase the likelihood that the STT will correctly transcribe domain-specific words and phrases. For details, refer to the relevant third-party vendor documentation. Your Soul Machines technical contact can help with this.
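To give a sense of what phrase hints look like on the vendor side, the sketch below builds a recognition config in the shape used by the Google Cloud Speech-to-Text v1 REST API, where hints are passed as `speechContexts`. The encoding, sample rate, and phrases are example values only, and how hints are handed to Soul Machines for a given Digital Person is not shown here; treat this as an illustration of the vendor concept rather than a Soul Machines configuration format.

```python
from typing import List


def build_recognition_config(phrases: List[str], language_code: str = "en-US") -> dict:
    """Return a Google Speech-to-Text v1 style RecognitionConfig with phrase hints."""
    # Drop blank entries so only meaningful hints are sent.
    cleaned = [phrase.strip() for phrase in phrases if phrase.strip()]
    return {
        "encoding": "LINEAR16",        # example value
        "sampleRateHertz": 16000,      # example value
        "languageCode": language_code,
        "speechContexts": [{"phrases": cleaned}],
    }


# Example: bias transcription toward product and brand names.
config = build_recognition_config(["Soul Machines", "Digital Person", " "])
```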
Text responses generated by the NLP are first pre-processed by your Soul Machines Persona Server to parse any Emotional Markup Language (EML), with the remaining static text processed by the TTS service. The final output is the voiced content, uttered by your Digital Person. Soul Machines supports Google, Microsoft, and AWS Polly TTS services. Soul Machines supports several of the voice services' configuration options, including language, accent, speed of dialogue, and other voice components.
The entire speech stack (STT/NLP/TTS) is hosted by the Soul Machines customer’s chosen provider, and API keys for these services must be provided to Soul Machines, as described in the NLP Integration section.
Orchestration Layer (optional)
The Soul Machines Orchestration Layer is an additional layer of logic, hosted on a separate server from the Soul Machines Digital Person and operated by the customer. Use cases include, but are not limited to:
- Connecting to an unsupported NLP (see Linking an Orchestration Server section).
- Providing user-specific data based on the authenticated user specified in your JWT or an additional authentication mechanism.
- Integrating with other systems to retrieve or save data.
- Collecting and saving system data, such as conversation transcripts and EQ data.
Note: Transcript data is not saved; it is only passed to clients via the Orchestration server for analytics purposes.
Upon instantiation of the Digital Person session, the Persona server connects to the Orchestration server via the nominated WebSocket server URI over HTTPS (as specified in the JWT).
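The sketch below shows, in general terms, how a JWT carrying such a URI could be signed and checked using only the Python standard library. It assumes HS256 signing with a shared secret; the claim name `orchestration_uri`, the example URI, and the secret are hypothetical, and the actual claim names and signing requirements should be taken from the Soul Machines documentation.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _signature(signing_input: str, secret: str) -> str:
    """Compute the HMAC-SHA256 signature segment for a header.payload string."""
    digest = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()
    return b64url(digest)


def sign_jwt(claims: dict, secret: str) -> str:
    """Build an HS256-signed JWT from a claims dictionary."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_signature(signing_input, secret)}"


def verify_jwt(token: str, secret: str) -> bool:
    """Check that the token's signature matches the shared secret."""
    signing_input, _, signature = token.rpartition(".")
    return hmac.compare_digest(signature, _signature(signing_input, secret))


def decode_claims(token: str) -> dict:
    """Decode the payload segment (call verify_jwt first in real use)."""
    payload = token.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


# Hypothetical claim name and URI, for illustration only.
token = sign_jwt({"orchestration_uri": "wss://orchestration.example.com"}, "demo-secret")
```

In production, a maintained JWT library is usually a better choice than hand-rolled signing; the manual version is shown only to make the token structure visible.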
Sample code templates written in Node.js can be provided. Your Soul Machines Customer Success team can assist you in determining whether your use case needs Orchestration.
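Independently of those templates, the core of an Orchestration server can be thought of as a function that turns an incoming WebSocket message into an outgoing reply. The Python sketch below shows only that message-handling step, with the WebSocket transport omitted. The message fields (`name`, `body`, `input`, `output`) and the `conversationRequest` and `conversationResponse` names are assumptions made for illustration and should be checked against the Orchestration Server Configuration Guide; `get_reply` is a hypothetical hook for an unsupported NLP or another backend system.

```python
import json
from typing import Callable, Optional


def handle_message(raw: str, get_reply: Callable[[str], str]) -> Optional[str]:
    """Turn one incoming JSON message into a JSON reply, or None to ignore it.

    get_reply is a hypothetical callable that sends the user's text to your
    own NLP or backend and returns the text the Digital Person should speak.
    """
    message = json.loads(raw)

    # Only conversation requests need a reply (assumed message name).
    if message.get("name") != "conversationRequest":
        return None

    body = message.get("body", {})
    user_text = body.get("input", {}).get("text", "")

    response = {
        "name": "conversationResponse",  # assumed message name
        "body": {
            "output": {"text": get_reply(user_text)},
            "variables": {},             # room for user-specific data
        },
    }
    return json.dumps(response)
```

Keeping the handler free of networking code means the same function can be called from whichever WebSocket server library you choose, and it can be unit-tested with plain strings.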
The following diagram provides a sample Orchestration server configuration:
Refer to the Orchestration Server Configuration Guide for more information.