
LANGCHAIN — What Are Data-Driven Characters?
The most profound technologies are those that disappear. They weave themselves into the fabric of everyday life until they are indistinguishable from it. — Mark Weiser
Data-driven characters is a repository for creating, debugging, and interacting with character chatbots conditioned on your own story corpora. It offers three main ways to interact with your data-driven characters: exporting to character.ai, debugging locally, and hosting a self-contained Streamlit app in the browser.
The repository offers a simple library for processing any text corpus, creating character definitions, and managing memory.
For example, the following snippet generates a character definition:
from dataclasses import asdict
import json

from data_driven_characters.character import generate_character_definition
from data_driven_characters.corpus import generate_corpus_summaries, load_docs

CORPUS = "data/everything_everywhere_all_at_once.txt"
CHARACTER_NAME = "Evelyn"

# Load the corpus and split it into overlapping chunks.
docs = load_docs(corpus_path=CORPUS, chunk_size=2048, chunk_overlap=64)

# Summarize the chunks, then build the character definition from the summaries.
character_definition = generate_character_definition(
    name=CHARACTER_NAME,
    corpus_summaries=generate_corpus_summaries(docs=docs),
)

print(json.dumps(asdict(character_definition), indent=4))

You can export this character definition to character.ai and run your own chatbot there. The benefit of creating characters on character.ai is that it hosts an entire ecosystem of character chatbots that you can interact with for free. The data-driven characters repository, by contrast, lets you create, debug, and run your own chatbots conditioned on your own corpora, giving you control over character grounding and memory management.
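The `chunk_size` and `chunk_overlap` arguments in the `load_docs` call above suggest that the corpus is split into overlapping pieces before it is summarized. The sketch below illustrates that general idea in plain Python. It is not the repository's implementation: the function `chunk_text` is a hypothetical helper, and it counts characters, which is an assumption (the library may measure chunks differently).

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Hypothetical helper for illustration only. Each window starts
    chunk_size - chunk_overlap characters after the previous one, so
    consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
    # ['abcd', 'defg', 'ghij']
```

The overlap means that a sentence falling on a chunk boundary still appears, at least partly, in both neighboring chunks, so less context is lost at the seams.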
The repository also provides tools to compare different methods for packaging information about a character’s backstory to create the character, such as character summary, retrieval over the transcript, and retrieval over a summarized version of the transcript.
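To make the difference between these three approaches concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the functions `retrieve` and `build_prompt` are hypothetical, the toy strings are made up, and the word-overlap scoring is only a stand-in for whatever retrieval method the repository actually uses. The point it shows is that the three approaches differ mainly in where the context placed in the prompt comes from.

```python
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its words."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks that share the most words with the query.

    Hypothetical stand-in for a real retriever; it scores by word overlap.
    """
    query_words = tokenize(query)

    def score(chunk: str) -> int:
        chunk_words = tokenize(chunk)
        return sum(min(n, chunk_words[w]) for w, n in query_words.items())

    return sorted(chunks, key=score, reverse=True)[:k]


def build_prompt(name: str, context: str, message: str) -> str:
    """Assemble a prompt from a character name, context, and user message."""
    return f"You are {name}.\nContext:\n{context}\nUser: {message}\n{name}:"


# Toy data, invented for this example.
transcript_chunks = [
    "Ana repairs clocks in the harbor town.",
    "Ben sails north every winter.",
    "Ana once lost a brass key at the market.",
]
summary_chunks = [
    "Ana is a clock repairer who lost a key.",
    "Ben is a sailor.",
]
character_summary = "Ana is a careful, quiet clock repairer."

message = "Where did Ana lose the key?"

# 1. Character summary: the same fixed context for every message.
prompt_summary = build_prompt("Ana", character_summary, message)

# 2. Retrieval over the transcript: context is picked per message.
prompt_transcript = build_prompt(
    "Ana", "\n".join(retrieve(message, transcript_chunks)), message
)

# 3. Retrieval over a summarized transcript: same idea, shorter source text.
prompt_summarized = build_prompt(
    "Ana", "\n".join(retrieve(message, summary_chunks)), message
)

if __name__ == "__main__":
    print(prompt_transcript)
```

A fixed summary is cheap and consistent but cannot surface specific events, while retrieval can pull in details relevant to the current message at the cost of an extra lookup step.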
In addition, you can contribute characters generated with data-driven-characters to the repository. The long-term goal is to create a decentralized ecosystem and community of data-driven artificial characters.
Data-driven characters is an evolving repository with a lot of potential for future work, including new chatbot architectures, memory management schemes, and better user interfaces. If you’re interested in contributing, check out the contributing section in the GitHub README for details.




