
Text Transcript

The transcript of a conversation contains the text of everything said by both the Digital Person and the user. If you want to display a transcript in your UI, you need to capture this text yourself and store it in a list.

A transcript should include all text that:

  • The Digital Person spoke
  • The user spoke
  • The user typed (if you support this in your UI)
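A minimal sketch of the list that holds the transcript, assuming each entry records who produced the text and whether it was spoken or typed. The `createTranscript` helper, the `mode` field, and the `'spoken'`/`'typed'` values are hypothetical choices for this sketch, not part of the Soul Machines SDK; the `{ source, text }` fields match the example later on this page.

```javascript
// Hypothetical transcript store (not part of the Soul Machines SDK).
// Each entry records who produced the text and how it arrived.
const SOURCES = new Set(['persona', 'user']);
const MODES = new Set(['spoken', 'typed']);

function createTranscript() {
  const entries = [];

  return {
    // Append one line of dialogue; empty or whitespace-only text is ignored.
    add(source, text, mode = 'spoken') {
      if (!SOURCES.has(source)) {
        throw new Error(`Unknown transcript source: ${source}`);
      }
      if (!MODES.has(mode)) {
        throw new Error(`Unknown transcript mode: ${mode}`);
      }
      if (typeof text !== 'string' || text.trim() === '') {
        return false;
      }
      entries.push({ source, mode, text: text.trim() });
      return true;
    },
    // Return copies so callers cannot mutate the stored entries directly.
    list() {
      return entries.map((entry) => ({ ...entry }));
    },
  };
}

// Usage: one entry for each kind of text listed above.
const transcript = createTranscript();
transcript.add('persona', 'Hello, how can I help?');
transcript.add('user', 'What are your opening hours?');
transcript.add('user', 'Thanks!', 'typed');
console.log(transcript.list());
```

Keeping the spoken/typed distinction is optional, but it lets a UI style typed messages differently without changing how entries are collected.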

Digital Person speech

The Digital Person's speech is sent back to the UI in the state websocket message. These messages are mapped to the onStateEvent EventEmitter, which you can listen to.

The onStateEvent emitter fires frequently, and most of the data in each event is unrelated to the Digital Person's speech. You will need to filter the events, looking for changes to the speechState field of the persona property.

When an event's speechState is equal to 'speaking', its currentSpeech property contains the text of what the Digital Person is saying.

scene.onStateEvent.addListener((scene, event) => {
  // persona state is keyed by persona ID; this example reads persona '1'
  const personaState = event.persona?.['1'];

  if (personaState?.speechState === 'speaking') {
    const personaSpeech = personaState.currentSpeech;
    console.log('[personaSpeech]', personaSpeech);
  }
});
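The filter in the listener above can be pulled out into a plain function, which makes it testable without a live scene. This sketch also assumes that consecutive state events might repeat the same currentSpeech while speechState stays 'speaking' (for example, if another part of the state changes mid-utterance); whether that happens depends on the server, so treat the de-duplication as a precaution. `createPersonaSpeechTracker` is a hypothetical helper, and its default persona ID '1' is taken from the example above.

```javascript
// Hypothetical helper: turns a stream of onStateEvent payloads into new
// utterances. Assumes persona state lives at event.persona[personaId],
// as in the listener above.
function createPersonaSpeechTracker(personaId = '1') {
  let lastSpeech = null; // text of the utterance currently being spoken

  // Returns the new utterance text, or null if this event adds nothing.
  return function handleStateEvent(event) {
    const personaState = event?.persona?.[personaId];
    const speechState = personaState?.speechState;

    if (speechState === undefined) {
      // Event carries no speech state for this persona: keep tracking as-is.
      return null;
    }
    if (speechState !== 'speaking') {
      // Speech ended, so the same sentence may legitimately be spoken again.
      lastSpeech = null;
      return null;
    }

    const speech = personaState.currentSpeech;
    if (!speech || speech === lastSpeech) {
      return null; // nothing new to record
    }
    lastSpeech = speech;
    return speech;
  };
}

// Usage with the SDK listener:
// const nextPersonaSpeech = createPersonaSpeechTracker();
// scene.onStateEvent.addListener((scene, event) => {
//   const speech = nextPersonaSpeech(event);
//   if (speech !== null) console.log('[personaSpeech]', speech);
// });
```

Resetting on any non-speaking state, rather than comparing against the last text forever, means a Digital Person that repeats a sentence (such as a re-prompt) still gets a second transcript entry.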

User speech

User speech is sent back to the UI as text in the recognizeResults websocket message. These messages are mapped to the onRecognizeResultsEvent EventEmitter, which you can listen to.

While the user is speaking, the UI receives intermediate results; once the server determines that the user has finished speaking, it sends a final result.

For a transcript, you should use only the final results.

For displaying live transcription, you can use the intermediate (non-final) results.

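scene.onRecognizeResultsEvent.addListener(
  (scene, status, errorMessage, results) => {
    // skip events that carry no results (for example, if recognition failed)
    if (!results || results.length === 0) return;

    const result = results[0];
    const userSpeech = result.alternatives[0].transcript;

    if (result.final === true) {
      // final result: the user has finished speaking
      console.log('[userSpeech] user said:', userSpeech);
    } else {
      // intermediate result: the user is still speaking
      console.log('[userSpeech] user is saying:', userSpeech);
    }
  }
);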
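A sketch of how intermediate and final results could feed two different outputs: a live caption that is overwritten on every intermediate result, and a transcript that only receives final results. `handleRecognizeResults`, its `onCaption`/`onFinal` callbacks, and the `captionElement` in the usage comment are hypothetical; the `final` and `alternatives[0].transcript` fields are the ones read by the listener above. It returns early when no transcript text is present, on the assumption that an error event may carry no results.

```javascript
// Hypothetical helper: routes one onRecognizeResultsEvent payload.
// onCaption receives live (intermediate) text; onFinal receives the text
// that belongs in the transcript. Passing '' to onCaption clears the caption.
function handleRecognizeResults(results, { onCaption, onFinal }) {
  const result = results?.[0];
  const userSpeech = result?.alternatives?.[0]?.transcript;
  if (typeof userSpeech !== 'string') {
    return; // nothing usable in this event
  }

  if (result.final === true) {
    onCaption('');       // utterance complete: clear the live caption
    onFinal(userSpeech); // only final results go into the transcript
  } else {
    onCaption(userSpeech); // intermediate result: show it, but do not store it
  }
}

// Usage with the SDK listener:
// scene.onRecognizeResultsEvent.addListener(
//   (scene, status, errorMessage, results) => {
//     handleRecognizeResults(results, {
//       onCaption: (text) => { captionElement.textContent = text; },
//       onFinal: (text) => transcript.push({ source: 'user', text }),
//     });
//   }
// );
```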

User text input

If your UI lets the user type text, you will need to capture that text yourself and add it to your app's transcript. The typed text must still be sent to the server, but the server does not send it back as an event, so it never reaches the listeners above.

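mySendMessageButton.addEventListener('click', () => {
  const userText = myTextInput.value;

  // send the text to the server (assumes `persona` is the Persona instance
  // for this scene, as in the example code below)
  persona.conversationSend(userText, {}, {});

  // the server will not echo this text back, so capture it here
  console.log('[userText] user typed:', userText);
});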
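Because the server never echoes typed text, the app alone decides what counts as a valid message. A minimal sketch, assuming blank input should be neither sent nor recorded: the hypothetical `submitTypedMessage` trims the text and, only if something is left, calls the supplied `send` function and records the entry, so the transcript and what the server received stay in step. `send` is injected so the logic runs without the SDK; in the example code below it would wrap `persona.conversationSend`.

```javascript
// Hypothetical helper: sends typed text and records it in the transcript.
// `send` is whatever call your app uses to pass text to the server.
function submitTypedMessage(rawText, send, transcript) {
  const userText = rawText.trim();
  if (userText === '') {
    return null; // ignore empty submissions: nothing sent, nothing recorded
  }

  send(userText);

  // The server does not echo typed text, so record it locally.
  transcript.push({ source: 'user', text: userText });
  return userText;
}

// Usage (in the browser, with the SDK):
// submitTypedMessage(myTextInput.value,
//   (text) => persona.conversationSend(text, {}, {}), transcript);
```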

Example code for transcript

const transcript = [];

// add event listener for user's spoken input as text
scene.onRecognizeResultsEvent.addListener(
  (scene, status, errorMessage, results) => {
    // skip events that carry no results
    if (!results || results.length === 0) return;

    const result = results[0];
    const userSpeech = result.alternatives[0].transcript;

    // only final results belong in the transcript
    if (result.final === true) {
      console.log('[userSpeech] user said:', userSpeech);

      // add it to the transcript
      transcript.push({ source: 'user', text: userSpeech });
      console.log('[transcript]: ', transcript);
    }
  }
);

// add event listener for Digital Person's spoken output as text
scene.onStateEvent.addListener((scene, event) => {
  const personaState = event.persona?.['1'];

  if (personaState?.speechState === 'speaking') {
    const personaSpeech = personaState.currentSpeech;
    console.log('[personaSpeech]', personaSpeech);

    // add it to the transcript
    transcript.push({ source: 'persona', text: personaSpeech });
    console.log('[transcript]: ', transcript);
  }
});

// send a message typed by the user into a UI input
// (assumes `persona` is the Persona instance for this scene)
function onSendMessage() {
  const textInput = document.getElementById('text-message-input');
  const userText = textInput.value;

  // send it to the server
  persona.conversationSend(userText, {}, {});
  console.log('[userText] user typed:', userText);

  // the server will not echo typed text, so add it to the transcript here
  transcript.push({ source: 'user', text: userText });
  console.log('[transcript]: ', transcript);

  textInput.value = '';
}
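Once entries are collected in `transcript`, displaying them is a matter of mapping each entry to a speaker label and its text. A minimal sketch, assuming the `{ source, text }` shape used above; `formatTranscript` and the 'Digital Person'/'User' labels are hypothetical. In a browser you would typically write each line into a DOM element; using `textContent` rather than `innerHTML` keeps transcript text from being interpreted as markup.

```javascript
// Hypothetical display helper for entries shaped like { source, text }.
const SPEAKER_LABELS = { persona: 'Digital Person', user: 'User' };

function formatTranscript(transcript) {
  return transcript.map(({ source, text }) => {
    const label = SPEAKER_LABELS[source] ?? source; // fall back to the raw source
    return `${label}: ${text}`;
  });
}

// Usage
const lines = formatTranscript([
  { source: 'persona', text: 'Hello, how can I help?' },
  { source: 'user', text: 'What are your opening hours?' },
]);
console.log(lines.join('\n'));
// Digital Person: Hello, how can I help?
// User: What are your opening hours?
```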