Text Transcript
The transcript for a conversation contains, as text, everything said by both the Digital Person and the user. To display a transcript, your UI needs to capture this text as it arrives and store it in a list.
A transcript should include all text that:
- The Digital Person spoke
- The user spoke
- The user typed (if you support this in your UI)
Digital Person speech
The Digital Person's speech is sent back to the UI in the state websocket message. These messages are mapped to the onStateEvent EventEmitter, which you can listen to.
The onStateEvent emitter fires frequently, and its messages contain a lot of data unrelated to the Digital Person's speech. You will need to filter the messages for changes to the speechState inside the persona property.
When you receive an event where the speechState is equal to 'speaking', the currentSpeech property will contain the text representation of the Digital Person's speech.
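scene.onStateEvent.addListener((scene, event) => {
  // state for the entry keyed '1' under the persona property, if present
  const personaState = event.persona?.['1'];
  // only messages with speechState 'speaking' carry the spoken text
  if (personaState?.speechState === 'speaking') {
    const personaSpeech = personaState?.currentSpeech;
    console.log('[personaSpeech]', personaSpeech);
  }
});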
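Because state events arrive frequently, a listener like the one above could see the same speech more than once. The following is a minimal sketch of one way to filter for changes, assuming the event shape shown above. The helper name createPersonaSpeechFilter, the 'idle' state value, and the unrelated field in the simulated events are illustrative assumptions, not part of the SDK.

```javascript
// Hypothetical helper (not part of the SDK): returns a function that
// extracts the Digital Person's speech from a state event, skipping
// repeated events that carry the same text while it is still speaking.
function createPersonaSpeechFilter(personaId = '1') {
  let lastSpoken = null; // text most recently returned while speaking

  return function extractSpeech(event) {
    const personaState = event?.persona?.[personaId];
    const speechState = personaState?.speechState;

    // Most state events carry no speechState for this persona; ignore them.
    if (speechState === undefined) return null;

    // Any state other than 'speaking' resets the filter, so the same
    // sentence spoken again later is still recorded.
    if (speechState !== 'speaking') {
      lastSpoken = null;
      return null;
    }

    const text = personaState.currentSpeech;
    if (!text || text === lastSpoken) return null;
    lastSpoken = text;
    return text;
  };
}

// Simulated events; 'idle' and `unrelated` are made-up values for the demo.
const extractSpeech = createPersonaSpeechFilter();
const simulatedEvents = [
  { persona: { '1': { speechState: 'speaking', currentSpeech: 'Hello there.' } } },
  { persona: { '1': { speechState: 'speaking', currentSpeech: 'Hello there.' } } },
  { unrelated: true },
  { persona: { '1': { speechState: 'idle' } } },
];
const spoken = simulatedEvents
  .map((event) => extractSpeech(event))
  .filter(Boolean);
console.log(spoken); // [ 'Hello there.' ]
```

In a real listener you would call extractSpeech(event) inside the onStateEvent callback and add the returned text to your transcript only when it is not null.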
User speech
User speech is sent back to the UI as text in the recognizeResults websocket message. These messages are mapped to the onRecognizeResultsEvent EventEmitter, which you can listen to.
For user speech, the UI is sent intermediate results as the user is speaking, and is also sent a final result when it is determined that the user has finished speaking.
For a transcript, you should use only the final results.
For displaying live transcription, you can use the intermediate (non-final) results.
scene.onRecognizeResultsEvent.addListener(
  (scene, status, errorMessage, results) => {
    const result = results[0];
    const userSpeech = result.alternatives[0].transcript;
    if (result.final === false) {
      console.log('[userSpeech] user is saying:', userSpeech);
    }
    if (result.final === true) {
      console.log('[userSpeech] user said:', userSpeech);
    }
  }
);
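The listener above reads results[0].alternatives[0].transcript directly, which throws if a message arrives without results. Here is a hedged sketch of a more defensive approach that also separates live captions from transcript entries. The helper names parseRecognizeResult and onRecognizeResults are hypothetical, and the payload shape is assumed to match the example above.

```javascript
// Hypothetical helper (not part of the SDK): normalises a recognizeResults
// payload into { text, isFinal }, or null when nothing usable is present.
function parseRecognizeResult(results) {
  const result = Array.isArray(results) ? results[0] : undefined;
  const text = result?.alternatives?.[0]?.transcript;
  if (typeof text !== 'string' || text.trim() === '') return null;
  return { text: text.trim(), isFinal: result.final === true };
}

const transcriptEntries = []; // final results only
let liveCaption = '';         // latest intermediate result, for live display

// Call this from your onRecognizeResultsEvent listener with `results`.
function onRecognizeResults(results) {
  const parsed = parseRecognizeResult(results);
  if (!parsed) return;
  if (parsed.isFinal) {
    transcriptEntries.push({ source: 'user', text: parsed.text });
    liveCaption = ''; // the utterance is complete, so clear the caption
  } else {
    liveCaption = parsed.text;
  }
}

// Simulated messages: one intermediate, one final, one with no results.
onRecognizeResults([{ final: false, alternatives: [{ transcript: 'what is' }] }]);
onRecognizeResults([{ final: true, alternatives: [{ transcript: 'what is the time' }] }]);
onRecognizeResults(undefined);
console.log(transcriptEntries); // [ { source: 'user', text: 'what is the time' } ]
```

Keeping the live caption separate from the transcript list means intermediate guesses never end up in the saved transcript.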
User text input
If your UI allows a user to send typed text input, you will need to capture this text yourself and add it to your app's transcript. You must send the typed text to the server, but the server will not send it back again as an event.
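mySendMessageButton.addEventListener('click', () => {
  const userText = myTextInput.value;
  // send the typed text to the server, using the same call as the
  // transcript example; the server will not send it back as an event
  persona.conversationSend(userText, {}, {});
  console.log('[userText] user typed:', userText);
});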
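Since the server does not echo typed text, sending it and recording it have to happen together in your own code. A small sketch of that pairing follows, with empty input ignored. The helper submitTypedText is hypothetical, and its send parameter stands in for whatever function your app uses to deliver text to the server.

```javascript
// Hypothetical helper (not part of the SDK): validates typed text, sends it
// with the supplied `send` function, and records it in the transcript.
function submitTypedText(rawText, send, transcript) {
  const userText = String(rawText ?? '').trim();
  if (userText === '') return false; // nothing to send or record

  send(userText);
  // The server will not send this text back, so record it locally.
  transcript.push({ source: 'user', text: userText });
  return true;
}

// Usage with a stand-in send function that only collects what was sent.
const sent = [];
const transcript = [];
submitTypedText('  Hello!  ', (text) => sent.push(text), transcript);
submitTypedText('   ', (text) => sent.push(text), transcript);
console.log(sent);       // [ 'Hello!' ]
console.log(transcript); // [ { source: 'user', text: 'Hello!' } ]
```

Passing the send function in as a parameter keeps the helper independent of any particular SDK call and makes it easy to exercise without a live connection.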
Example code for transcript
const transcript = [];

// add event listener for user's spoken input as text
scene.onRecognizeResultsEvent.addListener(
  (scene, status, errorMessage, results) => {
    const result = results[0];
    const userSpeech = result.alternatives[0].transcript;
    if (result.final === true) {
      console.log('[userSpeech] user said:', userSpeech);
      // add it to the transcript
      transcript.push({ source: 'user', text: userSpeech });
      console.log('[transcript]: ', transcript);
    }
  }
);

// add event listener for Digital Person's spoken output as text
scene.onStateEvent.addListener((scene, event) => {
  const personaState = event.persona?.['1'];
  if (personaState?.speechState === 'speaking') {
    const personaSpeech = personaState?.currentSpeech;
    console.log('[personaSpeech]', personaSpeech);
    // add it to the transcript
    transcript.push({ source: 'persona', text: personaSpeech });
    console.log('[transcript]: ', transcript);
  }
});

// send a message typed by the user into a UI input
function onSendMessage() {
  const textInput = document.getElementById('text-message-input');
  const userText = textInput.value;
  // send it to the server
  persona.conversationSend(userText, {}, {});
  console.log('[userText] user typed:', userText);
  // add it to the transcript
  transcript.push({ source: 'user', text: userText });
  console.log('[transcript]: ', transcript);
  textInput.value = '';
}
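Once the entries are collected, displaying them is a matter of turning the list into text. The following is one possible sketch, assuming the { source, text } entry shape used in the example above. The formatTranscript function and the SOURCE_LABELS mapping are hypothetical names chosen for illustration.

```javascript
// Hypothetical display labels for the `source` values used in the example.
const SOURCE_LABELS = { user: 'User', persona: 'Digital Person' };

// Turns transcript entries into one line of text per entry, skipping
// entries whose text is missing or blank.
function formatTranscript(entries) {
  return entries
    .filter((entry) => typeof entry.text === 'string' && entry.text.trim() !== '')
    .map((entry) => {
      // Fall back to the raw source value if there is no label for it.
      const label = SOURCE_LABELS[entry.source] ?? entry.source;
      return `${label}: ${entry.text.trim()}`;
    })
    .join('\n');
}

const sampleTranscript = [
  { source: 'persona', text: 'Hi, how can I help?' },
  { source: 'user', text: 'What are your opening hours?' },
  { source: 'persona', text: undefined },
];
console.log(formatTranscript(sampleTranscript));
// Digital Person: Hi, how can I help?
// User: What are your opening hours?
```

The resulting string can be assigned to an element's textContent, or the same filter and map steps can be adapted to create one DOM node per entry.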