Generative Content Skills

You may wish to create a Base Corpus whose content is generated programmatically. This can be done with a machine learning model such as GPT, or with a more traditional method of scripted code.

This approach enables you to have full control over the creation of a Digital Person's response to each user input, and to use whichever programmatic tools are most suitable for your use case.

Streaming with Skills Async API

The Skills Async API can be useful when implementing a generative content skill. A common scenario is using an LLM (Large Language Model) or integrating with an existing generative API; these models and APIs often support streaming, i.e. they generate a stream of tokens. The Async API allows you to stream messages as they are generated, without waiting for the full response to finish. This improves latency and responsiveness, resulting in a better user experience.

Here we explore an example that illustrates the basics of streaming, using the Skills Async API.

Sample Code

Getting started

You can download or clone a Skills Async API template app from GitHub:

The code that is walked through in this document can be applied to the template.

Streaming API mock

Let's start with a mock implementation of a generative API, to simulate streaming of chunks.

import random
from time import sleep

def response_generator(input_text):
    """
    Generates a response based on the input and gradually streams it
    in chunks of varying sizes.
    """
    response = make_response(input_text)
    while len(response) > 0:
        chunk, response = get_chunk(response)
        is_final = len(response) == 0
        yield chunk, is_final
        if not is_final:
            sleep(0.1)  # sleep between each chunk to simulate latency

def make_response(input_text):
    """
    Respond by copying the input multiple times, concatenating as separate sentences.
    """
    # repeat the input sentence a random number of times
    times_to_repeat = random.randint(2, 4)
    response = ". ".join([input_text] * times_to_repeat)
    return response

def get_chunk(text):
    """
    Take out a chunk of a random size from the given text and return
    both the chunk and the remaining text.
    """
    chunk_size = random.randint(3, 5)  # a random chunk size between 3 and 5
    chunk = text[:chunk_size]
    text = text[chunk_size:]
    return chunk, text  # both the chunk and the updated text
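As a quick sanity check, a small standalone driver (hypothetical code, not part of the template, with the mock definitions repeated so the snippet runs on its own) shows that reassembling the streamed chunks recovers the full response:

```python
import random
from time import sleep

def make_response(input_text):
    # repeat the input sentence a random number of times
    return ". ".join([input_text] * random.randint(2, 4))

def get_chunk(text):
    chunk_size = random.randint(3, 5)
    return text[:chunk_size], text[chunk_size:]

def response_generator(input_text):
    response = make_response(input_text)
    while len(response) > 0:
        chunk, response = get_chunk(response)
        is_final = len(response) == 0
        yield chunk, is_final
        if not is_final:
            sleep(0.01)  # shortened latency for this demo

chunks = [chunk for chunk, _ in response_generator("Hi")]
print("".join(chunks))  # the chunks reassemble into the full repeated response
```

Only the last yielded pair has `is_final` set to `True`, which is the property the sentence assembly below relies on.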

Generate sentences from chunks

We want to respond with coherent sentences, rather than with individual chunks of the response, so we put the chunks together into sentences:

import re

class ResponseData:
    def __init__(self, text: str, is_final: bool = True):
        self.text = text
        self.is_final = is_final

def sentence_generator(response):
    text = ""
    for chunk, is_final in response:
        text += chunk
        # split on whitespace that follows sentence-ending punctuation
        # (a raw string avoids invalid-escape warnings in the pattern)
        sentences = re.split(r'(?<=[^\d][.!?])\s+', text)
        for sentence in sentences[:-1]:
            yield ResponseData(text=sentence.strip(), is_final=False)
        text = sentences[-1] if sentences else ""

        if is_final:
            yield ResponseData(text=text.strip(), is_final=True)
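To see the sentence assembly in isolation, we can feed it a hand-written chunk stream instead of the mock API (the definitions are repeated from above so the snippet runs on its own):

```python
import re

class ResponseData:
    def __init__(self, text: str, is_final: bool = True):
        self.text = text
        self.is_final = is_final

def sentence_generator(response):
    text = ""
    for chunk, is_final in response:
        text += chunk
        # split on whitespace that follows sentence-ending punctuation
        sentences = re.split(r'(?<=[^\d][.!?])\s+', text)
        for sentence in sentences[:-1]:
            yield ResponseData(text=sentence.strip(), is_final=False)
        text = sentences[-1] if sentences else ""

        if is_final:
            yield ResponseData(text=text.strip(), is_final=True)

# a hand-written chunk stream of (chunk, is_final) pairs
chunks = [("Hel", False), ("lo the", False), ("re. How a", False), ("re you?", True)]
for s in sentence_generator(chunks):
    print(s.is_final, s.text)
# → False Hello there.
# → True How are you?
```

Note how "Hello there." is only emitted once the punctuation and following whitespace arrive, while the trailing fragment is held back until the final chunk completes it.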

Stream the sentences

Now that we can get sentences from our mock generative API, we can stream those responses via the Async API:

conversation.py
import uuid
from smskillsdk.models.api_async import UserConversationMessage, SkillConversationMessage
from utils.ws_connection import Connection

def conversation_handler(connection: Connection, msg: UserConversationMessage):
    input_text = msg.text

    # reuse the turn ID if one was provided, otherwise generate a new one
    turn_id = msg.variables.get("Turn_Id") if msg.variables else None
    if not turn_id:
        turn_id = str(uuid.uuid4())

    response = response_generator(input_text)
    sentences = sentence_generator(response)
    for sentence in sentences:
        variables = {
            "Turn_Id": turn_id,
            "isFinalSentence": sentence.is_final
        }
        connection.send(SkillConversationMessage(text=sentence.text, variables=variables))
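The message flow can be exercised without a running Digital Person. The sketch below uses plain stand-ins for the SDK message class and the WebSocket connection (hypothetical names, not part of smskillsdk), and streams pre-split sentences in place of the generative pipeline:

```python
import uuid
from dataclasses import dataclass

# stand-in for the SDK's SkillConversationMessage, so this sketch runs without smskillsdk
@dataclass
class SkillConversationMessage:
    text: str
    variables: dict

class RecordingConnection:
    """Collects sent messages instead of writing to a WebSocket."""
    def __init__(self):
        self.sent = []
    def send(self, message):
        self.sent.append(message)

def conversation_handler(connection, text, variables=None):
    # reuse the turn ID if one was provided, otherwise generate a new one
    turn_id = (variables or {}).get("Turn_Id") or str(uuid.uuid4())
    # pre-split sentences stand in for the generative pipeline above
    sentences = ["First sentence.", "Second sentence.", "Third sentence."]
    for i, sentence in enumerate(sentences):
        connection.send(SkillConversationMessage(
            text=sentence,
            variables={"Turn_Id": turn_id, "isFinalSentence": i == len(sentences) - 1},
        ))

conn = RecordingConnection()
conversation_handler(conn, "Hello")
for msg in conn.sent:
    print(msg.variables["isFinalSentence"], msg.text)
# → False First sentence.
# → False Second sentence.
# → True Third sentence.
```

Every message in the turn carries the same Turn_Id, and only the last one has isFinalSentence set to True.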

Note that we add the Turn_Id and isFinalSentence variables. This information lets the Digital Person know that these messages belong to the same turn, and also indicates when the message is final. This is important for early cancelling of responses, and it facilitates better handling of the streamed messages.