Serialization
Core objects in Rigging, such as Chats and Completions, have strong serialization support for storage and retrieval.
Most of this stems from our use of Pydantic for the core models, plus some helpful fields we've included for reconstructing those objects.
JSON Serialization
Let's build a joke pipeline and serialize the final chat into JSON.
import rigging as rg

class Joke(rg.Model):
    content: str

chat = (
    await
    rg.get_generator("gpt-3.5-turbo")
    .chat(f"Provide 3 jokes each between {Joke.xml_tags()} tags.")
    .meta(tags=['joke'])
    .with_(temperature=1.25)
    .run()
)

chat.last.parse_set(Joke)

serialized = chat.model_dump_json(indent=2)
print(serialized)
{
  "uuid": "891c3834-2588-4652-8371-e9746086fd46",
  "timestamp": "2024-05-10T11:44:15.501326",
  "messages": [
    {
      "role": "user",
      "parts": [],
      "content": "Provide 3 jokes each between <joke></joke> tags."
    }
  ],
  "generated": [
    {
      "role": "assistant",
      "parts": [
        {
          "model": {
            "content": " Why was the math book sad? Because it had too many problems. "
          },
          "slice_": [
            0,
            75
          ]
        },
        {
          "model": {
            "content": " I told my wife she should embrace her mistakes. She gave me a hug. "
          },
          "slice_": [
            76,
            157
          ]
        },
        {
          "model": {
            "content": " Why did the scarecrow win an award? Because he was outstanding in his field. "
          },
          "slice_": [
            158,
            249
          ]
        }
      ],
      "content": "<joke> Why was the math book sad? Because it had too many problems. </joke>\n<joke> I told my wife she should embrace her mistakes. She gave me a hug. </joke>\n<joke> Why did the scarecrow win an award? Because he was outstanding in his field. </joke>"
    }
  ],
  "metadata": {
    "tags": [
      "joke"
    ]
  },
  "generator_id": "litellm!gpt-3.5-turbo,temperature=1.25"
}
You'll notice that every Chat gets a unique uuid field to help track it in a datastore like Elastic or Pandas. We also assign a timestamp to record when the generation took place, and we use .meta() to add a tracking tag for filtering later.
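As a rough illustration of how those fields might be used once chats are stored as JSON, the sketch below filters records by tag and sorts them by timestamp using only the standard library. The records are hypothetical and trimmed to the uuid, timestamp, and metadata fields shown in the output above; filter_by_tag is an illustrative helper, not part of Rigging.

```python
import json
from datetime import datetime

# Hypothetical stored records, trimmed to the fields shown in the output above.
# Only the first uuid comes from the example; the others are made up.
records = [
    '{"uuid": "891c3834-2588-4652-8371-e9746086fd46", '
    '"timestamp": "2024-05-10T11:44:15.501326", "metadata": {"tags": ["joke"]}}',
    '{"uuid": "00000000-0000-0000-0000-000000000001", '
    '"timestamp": "2024-05-10T12:00:00.000000", "metadata": {"tags": ["haiku"]}}',
    '{"uuid": "00000000-0000-0000-0000-000000000002", '
    '"timestamp": "2024-05-09T08:30:00.000000", "metadata": {}}',
]


def filter_by_tag(raw_records, tag):
    """Return decoded records whose metadata tags include `tag`."""
    matches = []
    for raw in raw_records:
        record = json.loads(raw)
        # Records without metadata or tags are simply skipped.
        tags = record.get("metadata", {}).get("tags", [])
        if tag in tags:
            matches.append(record)
    return matches


jokes = filter_by_tag(records, "joke")
joke_ids = [record["uuid"] for record in jokes]

# The ISO 8601 timestamp can be parsed directly for ordering.
newest_first = sorted(
    (json.loads(raw) for raw in records),
    key=lambda record: datetime.fromisoformat(record["timestamp"]),
    reverse=True,
)
```

In a real datastore the same filter would more likely be a query on the metadata field; the point here is only that the tag and timestamp travel with the serialized chat.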
JSON Deserialization
The JSON has everything required to reconstruct a Chat, including a generator_id dynamically constructed to preserve the parameters used to create the generated message(s). We can now deserialize a chat from a datastore and immediately step back into a ChatPipeline for exploration.
chat = rg.Chat.model_validate_json(serialized)
print(chat.conversation)
# [user]: Provide 3 jokes each between <joke></joke> tags.
# [assistant]:
# <joke> Why was the math book sad? Because it had too many problems. </joke>
# <joke> I told my wife she should embrace her mistakes. She gave me a hug. </joke>
# <joke> Why did the scarecrow win an award? Because he was outstanding in his field. </joke>
continued = await chat.continue_("Please explain the first joke to me.").run()
print(continued.last)
# [assistant]: In the first joke, the pun is based on the double meaning of the word "problems."
# The math book is described as being sad because it has "too many problems," which could be
# interpreted as having both mathematical problems (equations to solve) and emotional difficulties.
# This play on words adds humor to the joke.
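When inspecting stored records by hand, it can help to split a generator_id into its parts. The sketch below assumes the layout seen in the single example above ("provider!model,key=value"); that layout is inferred from this page rather than taken from a specification, and parse_generator_id is a hypothetical helper, not a Rigging function. Rigging does its own parsing when it rebuilds the generator.

```python
def parse_generator_id(identifier):
    """Split 'provider!model,key=value,...' into (provider, model, params).

    Assumption: the layout is inferred from one example identifier,
    "litellm!gpt-3.5-turbo,temperature=1.25".
    """
    provider, _, rest = identifier.partition("!")
    if not rest:
        # No "!" present: treat the whole string as model plus parameters.
        provider, rest = "", identifier

    model, *pairs = rest.split(",")

    params = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        params[key] = value  # values are kept as strings, not converted

    return provider, model, params


provider, model, params = parse_generator_id(
    "litellm!gpt-3.5-turbo,temperature=1.25"
)
print(provider, model, params)
```

Parameter values are left as strings on purpose, since guessing their types from the identifier alone could be wrong.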
Pandas DataFrames
Rigging also has helpers in the rigging.data module for converting between Chat objects and other storage formats like Pandas. In chats_to_df, the messages are flattened and stored with a chat_id column for grouping. df_to_chats lets you reconstruct a list of Chat objects from a DataFrame.
import rigging as rg

chats = (
    await
    rg.get_generator("claude-3-haiku-20240307")
    .chat("Write me a haiku.")
    .run_many(3)
)
df = rg.data.chats_to_df(chats)
# or
df = chats.to_df()
print(df.info())
# RangeIndex: 6 entries, 0 to 5
# Data columns (total 9 columns):
#  #   Column             Non-Null Count  Dtype
# ---  ------             --------------  -----
#  0   chat_id            6 non-null      string
#  1   chat_metadata      6 non-null      string
#  2   chat_generator_id  6 non-null      string
#  3   chat_timestamp     6 non-null      datetime64[ms]
#  4   generated          6 non-null      bool
#  5   role               6 non-null      category
#  6   parts              6 non-null      string
#  7   content            6 non-null      string
#  8   message_id         6 non-null      string
# dtypes: bool(1), category(1), datetime64[ms](1), string(6)
df.content.apply(lambda x: len(x)).mean()
# 60.166666666666664
back = rg.data.df_to_chats(df)
print(back[0].conversation)
# [user]: Write me a haiku.
#
# [assistant]: Here's a haiku for you:
#
# Gentle breeze whispers,
# Flowers bloom in vibrant hues,
# Nature's simple bliss.
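To make the flattening idea concrete, here is a minimal standard-library sketch of the round trip described above: one row per message, each carrying its parent chat_id, then regrouped on that column. It is not how chats_to_df or df_to_chats are implemented; the chat dictionaries, flatten, and regroup are all hypothetical stand-ins.

```python
from collections import defaultdict

# Hypothetical chats, reduced to an id plus (role, content) messages.
chats = [
    {"chat_id": "a", "messages": [("user", "Write me a haiku."), ("assistant", "first haiku")]},
    {"chat_id": "b", "messages": [("user", "Write me a haiku."), ("assistant", "second haiku")]},
]


def flatten(chats):
    """Produce one row per message, each tagged with its parent chat_id."""
    return [
        {"chat_id": chat["chat_id"], "role": role, "content": content}
        for chat in chats
        for role, content in chat["messages"]
    ]


def regroup(rows):
    """Rebuild per-chat message lists by grouping rows on chat_id."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["chat_id"]].append((row["role"], row["content"]))
    # Dicts preserve insertion order, so chats come back in first-seen order.
    return [{"chat_id": cid, "messages": msgs} for cid, msgs in grouped.items()]


rows = flatten(chats)
restored = regroup(rows)
```

Two chats of two messages each become four rows, which matches the shape of the example above, where three chats yield six rows.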