Home#
Rigging is a lightweight LLM framework to make using language models in production code as simple and effective as possible. Here are the highlights:
- Structured Pydantic models can be used interchangeably with unstructured text output.
- LiteLLM as the default generator giving you instant access to a huge array of models.
- Define prompts as Python functions with type hints and docstrings.
- Simple tool use, even for models which don't support them at the API level.
- Store different models and configs as simple connection strings just like databases.
- Integrated tracing support with Logfire to track activity.
- Chat templating, forking, continuations, generation parameter overloads, stripping segments, etc.
- Async batching and fast iterations for large scale generation.
- Metadata, callbacks, and data format conversions.
- Modern python with type hints, async support, pydantic validation, serialization, etc.
import rigging as rg

@rg.prompt(generator_id="gpt-4")
async def get_authors(count: int = 3) -> list[str]:
    """Provide famous authors."""

print(await get_authors())
# ['William Shakespeare', 'J.K. Rowling', 'Jane Austen']
Rigging is built by dreadnode where we use it daily.
Installation#
We publish every version to PyPI, so you can install it with pip install rigging.
If you want all the extras (vLLM, transformers, examples), just specify the all extra, as in pip install rigging[all].
If you want to build from source:
Migration Guides#
Getting Started#
Rigging is a flexible library built on other flexible libraries. As such, it might take a bit to warm up to its interfaces, given the many ways you can accomplish your goals. However, the code is well documented, and the topic pages and source are great places to step in and out of as you explore.
IDE Setup
Rigging has been built with full type support which provides clear guidance on what methods return what types, and when they return those types. It's recommended that you operate in a development environment which can take advantage of this information. Rigging will almost "fall" into place and you won't be guessing about objects as you work.
Basic Chats#
Let's start with a very basic generation example that doesn't include any parsing features, continuations, etc. You want to chat with a model and collect its response.
We first need to get a generator object. We'll use get_generator, which will resolve an identifier string to the underlying generator class object.
API Keys
The default Rigging generator is LiteLLM, which
wraps a large number of providers and models. We assume for these examples that you
have API tokens set as environment variables for these models. You can refer to the
LiteLLM docs for supported providers and their key format.
If you'd like, you can change any of the model IDs we use and/or add ,api_key=[sk-1234] to the end of any of the generator IDs to specify them inline.
import rigging as rg # (1)!

generator = rg.get_generator("claude-3-sonnet-20240229") # (2)!
pipeline = generator.chat(
    [
        {"role": "system", "content": "You are a wizard harry."},
        {"role": "user", "content": "Say hello!"},
    ]
)
chat = await pipeline.run() # (3)!
print(chat.all)
# [
#   Message(role='system', parts=[], content='You are a wizard harry.'),
#   Message(role='user', parts=[], content='Say hello!'),
#   Message(role='assistant', parts=[], content='Hello! How can I help you today?'),
# ]
- You'll see us use this shorthand import syntax throughout our code. It's totally optional but makes things look nice.
- This is actually shorthand for litellm!anthropic/claude-3-sonnet-20240229, where litellm is the provider. We just default to that generator and you don't have to be explicit. You can find more information about this in the generators docs.
- From version 2 onwards, Rigging is fully async. You can use await to trigger generation and get your results, or use await_.
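To make the identifier format concrete, here is a minimal sketch of how a string like the one above could be split into a provider, a model, and inline parameters. The GeneratorSpec class and parse_generator_id function are hypothetical names used only for illustration. This is not Rigging's actual parser, and it does not add provider prefixes such as anthropic/ for you.

```python
from dataclasses import dataclass, field


@dataclass
class GeneratorSpec:
    """Hypothetical container for the pieces of an identifier string."""

    provider: str
    model: str
    params: dict[str, str] = field(default_factory=dict)


def parse_generator_id(identifier: str, default_provider: str = "litellm") -> GeneratorSpec:
    # Split off an explicit "provider!" prefix if one is present,
    # otherwise fall back to the default provider.
    if "!" in identifier:
        provider, rest = identifier.split("!", 1)
    else:
        provider, rest = default_provider, identifier

    # Everything after the first comma is treated as key=value parameters.
    model, *raw_params = rest.split(",")
    params: dict[str, str] = {}
    for item in raw_params:
        key, _, value = item.partition("=")
        params[key.strip()] = value.strip()

    return GeneratorSpec(provider=provider, model=model, params=params)


spec = parse_generator_id("litellm!anthropic/claude-3-sonnet-20240229,api_key=sk-1234")
print(spec.provider, spec.model, spec.params)
# litellm anthropic/claude-3-sonnet-20240229 {'api_key': 'sk-1234'}
```

The same idea is why a bare ID such as gpt-4 works: with no explicit prefix, the default provider is assumed.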
Generators have an easy chat() method which you'll use to initiate the conversations. You can supply messages in many different forms: dictionary objects, full Message classes, or a simple str, which will be converted to a user message.
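If it helps to picture that conversion, the following is a rough sketch of normalizing the three input forms into one list. SimpleMessage and normalize are hypothetical stand-ins written for this example. They are not Rigging's Message class or its internal logic.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class SimpleMessage:
    """Hypothetical stand-in for a chat message (not Rigging's Message)."""

    role: str
    content: str


MessageLike = Union[str, dict, SimpleMessage]


def normalize(messages: Union[MessageLike, list]) -> list[SimpleMessage]:
    # Accept either a single item or a list of items.
    items = messages if isinstance(messages, list) else [messages]
    result: list[SimpleMessage] = []
    for item in items:
        if isinstance(item, SimpleMessage):
            result.append(item)
        elif isinstance(item, dict):
            result.append(SimpleMessage(role=item["role"], content=item["content"]))
        elif isinstance(item, str):
            # A bare string is treated as a user message.
            result.append(SimpleMessage(role="user", content=item))
        else:
            raise TypeError(f"Unsupported message type: {type(item).__name__}")
    return result


print(normalize("Say hello!"))
# [SimpleMessage(role='user', content='Say hello!')]
```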
import rigging as rg

generator = rg.get_generator("claude-3-sonnet-20240229")
pipeline = generator.chat( # (1)!
    [
        {"role": "system", "content": "You are a wizard harry."},
        {"role": "user", "content": "Say hello!"},
    ]
)
chat = await pipeline.run()
print(chat.all)
# [
#   Message(role='system', parts=[], content='You are a wizard harry.'),
#   Message(role='user', parts=[], content='Say hello!'),
#   Message(role='assistant', parts=[], content='Hello! How can I help you today?'),
# ]
- generator.chat is actually just a helper for chat(generator, ...); they do the same thing.
ChatPipeline vs Chat
You'll notice we name the result of chat() as pipeline. The naming might be confusing, but chats go through 2 phases. We first stage them into a pipeline, where we operate and prepare them before we actually trigger generation with run().
Calling .chat() doesn't trigger any generation, but calling any of these run methods will:
In this case, we have nothing additional we want to add to our chat pipeline, and we are only interested in generating exactly one response message. We simply call .run() to execute the generation process and collect our final Chat object.
import rigging as rg

generator = rg.get_generator("claude-3-sonnet-20240229")
pipeline = generator.chat(
    [
        {"role": "system", "content": "You are a wizard harry."},
        {"role": "user", "content": "Say hello!"},
    ]
)
chat = await pipeline.run()
print(chat.all)
# [
#   Message(role='system', parts=[], content='You are a wizard harry.'),
#   Message(role='user', parts=[], content='Say hello!'),
#   Message(role='assistant', parts=[], content='Hello! How can I help you today?'),
# ]
View more about Chat objects and their properties over here. In general, chats give you access to exactly what messages were passed into a model, and what came out the other side.
Prompts#
Operating chat pipelines manually is very flexible, but can feel a bit verbose. Rigging supports the concept of "prompt functions" where you define the interaction with an LLM as a Python function signature, and convert that to a callable object which abstracts the pipeline away from you.
Prompts are very powerful. You can take control over any of the inputs in your docstring, gather outputs as structured objects, lists, dataclasses, and collect the underlying Chat object, etc.
Check out Prompt Functions for more information.
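As a rough mental model of how a signature and docstring can become a prompt, here is a small sketch. The toy_prompt decorator is a hypothetical illustration built on the standard inspect module. It only renders text, does not call a model, and is not how rg.prompt is implemented. The tag-wrapped input format is also an assumption made for this example.

```python
import inspect
from typing import Callable


def toy_prompt(func: Callable) -> Callable[..., str]:
    """Hypothetical decorator: turn a signature + docstring into prompt text."""
    signature = inspect.signature(func)
    instructions = inspect.getdoc(func) or ""

    def render(*args, **kwargs) -> str:
        # Bind the call arguments to parameter names and fill in defaults.
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        lines = [instructions]
        for name, value in bound.arguments.items():
            # Each input is wrapped in a tag named after its parameter.
            lines.append(f"<{name}>{value}</{name}>")
        return "\n".join(lines)

    return render


@toy_prompt
def get_authors(count: int = 3) -> list[str]:
    """Provide famous authors."""


print(get_authors())
# Provide famous authors.
# <count>3</count>
```

A real prompt function would also use the return annotation to decide how to parse the model's output, which this sketch leaves out.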
Tools#
Tools exposed to LLMs are super simple with Rigging. You can define a python function and make it available straight in the chat pipeline.
import rigging as rg

def add_numbers(x: float, y: float) -> float:
    return x + y

chat = (
    await
    rg.get_generator("gpt-4o-mini")
    .chat("What is 1337 + 42?")
    .using(add_numbers)
    .run()
)
print(chat.conversation)
# [user]: What is 1337 + 42?
#
# [assistant]:
# |- add_numbers({"x":1337,"y":42})
#
# [tool]: 1379
#
# [assistant]: 1337 + 42 equals 1379.
You can add as many tools as you'd like, document them and their parameters, and we support complex argument types like pydantic models and dataclasses. Your function can return standard objects to cast into strings, Message objects, or even content parts for multi-modal generation (ContentImageUrl).
Check out Tools for more information.
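To give a feel for what exposing a function as a tool involves, here is a minimal sketch that derives a description from type hints and dispatches a JSON-encoded call. The describe_tool and call_tool helpers are hypothetical and exist only for this example. Rigging's real tool handling is richer and is not shown here.

```python
import inspect
import json
from typing import Any, Callable, get_type_hints


def add_numbers(x: float, y: float) -> float:
    """Add two numbers."""
    return x + y


def describe_tool(func: Callable) -> dict[str, Any]:
    # Build a simple description from the function's name, docstring, and hints.
    hints = get_type_hints(func)
    parameters = {
        name: getattr(hints.get(name), "__name__", "Any")
        for name in inspect.signature(func).parameters
    }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": parameters,
    }


def call_tool(func: Callable, arguments_json: str) -> str:
    # Decode the arguments the model supplied and cast the result to a string.
    arguments = json.loads(arguments_json)
    return str(func(**arguments))


print(describe_tool(add_numbers)["parameters"])
# {'x': 'float', 'y': 'float'}
print(call_tool(add_numbers, '{"x":1337,"y":42}'))
# 1379
```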
Tools + Prompts#
You can combine prompts and tools to achieve "multi-agent" behavior:
import rigging as rg
from typing import Annotated

Joke = Annotated[str, rg.Ctx("joke")]

@rg.prompt(generator_id="gpt-4o-mini")
async def generate_jokes(count: int) -> list[Joke]:
    "Write {{count}} short hilarious jokes."

@rg.prompt(generator_id="gpt-4o", tools=[generate_jokes])
async def write_joke() -> Joke:
    """
    Generate some jokes, then choose the best.
    You must return just a single joke.
    """

joke = await write_joke()
Underneath, the generate_jokes prompt will be presented as an available tool when gpt-4o is working on tasks, and Rigging will handle all the inference and type processing for you.
Conversations#
Both ChatPipeline and Chat objects provide freedom for forking off the current state of messages, or continuing a stream of messages after generation has occurred.
In general:
- ChatPipeline.fork will clone the current chat pipeline and let you maintain both the new and original object for continued processing.
- Chat.fork will produce a fresh ChatPipeline from all the messages prior to the previous generation (useful for "going back" in time).
- Chat.continue_ is similar to fork (actually a wrapper) which tells fork to include the generated messages as you move on (useful for "going forward" in time).
In other words, the abstraction of going back and forth in a "conversation" would be continuously calling Chat.continue_ after each round of generation.
import rigging as rg

generator = rg.get_generator("gpt-3.5-turbo")
chat = generator.chat("Hello, how are you?")

# We can fork before generation has occurred
specific = await chat.fork("Be specific please.").run()
poetic = await chat.fork("Be as poetic as possible").with_(temperature=1.5).run() # (1)!

# We can also continue after generation
next_chat = poetic.continue_("That's good, tell me a joke") # (2)!
update = await next_chat.run()
- In this case the temperature change will only be applied to the poetic path because fork has created a clone of our chat pipeline.
- For convenience, we can usually just pass str objects in place of full messages, which underneath will be converted to a Message object with the user role.
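The difference between forking and continuing can be modelled with plain lists. The ToyPipeline and ToyChat classes below are hypothetical and only mirror the behavior described in this section. No model is called, and the generated reply is a canned string.

```python
from dataclasses import dataclass, field


@dataclass
class ToyChat:
    """Hypothetical result of a run: input messages plus generated ones."""

    messages: list[str]
    generated: list[str]

    def fork(self, message: str, include_generated: bool = False) -> "ToyPipeline":
        # By default, start again from the messages prior to generation.
        base = self.messages + (self.generated if include_generated else [])
        return ToyPipeline(base + [message])

    def continue_(self, message: str) -> "ToyPipeline":
        # A thin wrapper around fork that keeps the generated messages.
        return self.fork(message, include_generated=True)


@dataclass
class ToyPipeline:
    """Hypothetical staged conversation that has not been run yet."""

    messages: list[str] = field(default_factory=list)

    def fork(self, message: str) -> "ToyPipeline":
        # Build a new list so the original pipeline is left untouched.
        return ToyPipeline(self.messages + [message])

    def run(self) -> ToyChat:
        # Stand-in for generation: echo the last message back.
        reply = f"reply to: {self.messages[-1]}"
        return ToyChat(messages=list(self.messages), generated=[reply])


pipeline = ToyPipeline(["Hello, how are you?"])
chat = pipeline.fork("Be specific please.").run()
print(chat.fork("Try again").messages)
# ['Hello, how are you?', 'Be specific please.', 'Try again']
print(chat.continue_("Tell me a joke").messages)
# ['Hello, how are you?', 'Be specific please.', 'reply to: Be specific please.', 'Tell me a joke']
```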
Basic Parsing#
Now let's assume we want to ask the model for a piece of information, and we want to make sure this item conforms to a pre-defined structure. Underneath, Rigging uses Pydantic XML, which itself is built on Pydantic. We'll cover more about constructing models in a later section, so don't stress the details for now.
XML vs JSON
Rigging is opinionated with regard to using XML to weave unstructured data with structured contents as the underlying LLM generates text responses, at least when it comes to raw text content. If you want to take advantage of structured JSON parsing provided by model providers or inference tools, APITools are a good alternative.
You can read more about XML tag use from Anthropic who have done extensive research with their models.
To begin, let's define a FunFact model which we'll have the LLM fill in. Rigging exposes a Model base class which you should inherit from when defining structured inputs. This is a lightweight wrapper around pydantic-xml's BaseXMLModel with some added features and functionality to make it easy for Rigging to manage. However, everything these models support (for the most part) is also supported in Rigging.
import rigging as rg

class FunFact(rg.Model):
    fact: str # (1)!

chat = await rg.get_generator('gpt-3.5-turbo').chat(
    f"Provide a fun fact between {FunFact.xml_example()} tags."
).run()

fun_fact = chat.last.parse(FunFact)

print(fun_fact.fact)
# The Eiffel Tower can be 15 cm taller during the summer due to the expansion of the iron in the heat.
- This is what Pydantic XML refers to as a "primitive" class, as it is simply a single typed value placed between the tags. See more about primitive types, elements, and attributes in the Pydantic XML Docs.
We need to show the target LLM how to format its response, so we'll use the .xml_example() class method which all models support. By default this will simply emit empty XML tags of our model: <fun-fact></fun-fact>
Customizing Model Tags
Tags for a model are auto-generated based on the name of the class. You are free to override these by passing tag=[value] into your class definition like this:
We wrap up the generation and extract our parsed object by calling .parse() on the last message of our generated chat. This will process the contents of the message, extract the first matching model which parses successfully, and return it to us as a Python object.
import rigging as rg

class FunFact(rg.Model):
    fact: str

chat = await rg.get_generator('gpt-3.5-turbo').chat(
    f"Provide a fun fact between {FunFact.xml_example()} tags."
).run()

fun_fact = chat.last.parse(FunFact)

print(fun_fact.fact) # (1)!
# The Eiffel Tower can be 15 cm taller during the summer due to the expansion of the iron in the heat.
- Because we've defined FunFact as a class, the result of .parse() is typed to that object. In our code, all the properties of the fact will be available just as if we had created the object directly.
Notice that we don't have to worry about the model being verbose in its response, as we've communicated that the text between the <fun-fact></fun-fact> tags is the relevant place to put its answer.
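The idea of ignoring everything outside the tags can be sketched with a regular expression. This is only an illustration of the concept, with extract_tag as a hypothetical helper. Rigging parses into typed models through Pydantic XML and does not work this way internally.

```python
import re
from typing import Optional


def extract_tag(text: str, tag: str) -> Optional[str]:
    # Capture whatever sits between the first matching pair of tags.
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


response = (
    "Sure! Here's one you might enjoy: "
    "<fun-fact>Honey never spoils.</fun-fact> "
    "Let me know if you'd like another."
)
print(extract_tag(response, "fun-fact"))
# Honey never spoils.
print(extract_tag("No tags here.", "fun-fact"))
# None
```

Returning None when nothing matches is the same trade-off an optional parse makes: the caller decides what to do when the structure is missing.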
Strict Parsing#
In the example above, we don't handle the case where the model fails to properly conform to our desired output structure. If the last message content is invalid in some way, our call to parse will result in an exception from Rigging. Rigging is designed at its core to manage this process, and we have a few options:
- We can extend our chat pipeline with .until_parsed_as(), which will cause the run() function to internally check if parsing is succeeding before returning the chat back to you.
- We can make the parsing optional by switching to .try_parse(). The type of the return value will automatically switch to FunFact | None and you can handle cases where parsing failed.
chat = (
    await
    rg.get_generator('gpt-3.5-turbo')
    .chat(f"Provide a fun fact between {FunFact.xml_example()} tags.")
    .until_parsed_as(FunFact)
    .run()
)

fun_fact = chat.last.parse(FunFact) # This call should never fail

print(fun_fact or "Failed to get fact")
Double Parsing
We still have to call .parse() on the message despite using .until_parsed_as(). This is a limitation of type hinting, as we'd have to turn every ChatPipeline and Chat into a generic which could carry types forward. It's a small price for big code complexity savings. However, the use of .until_parsed_as() will cause the generated messages to have parsed models in their .parts. So if you don't need to access the typed object immediately, you can be confident serializing the chat object and the model will be there when you need it.
Max Rounds Concept
When control is passed into a chat pipeline with .until_parsed_as(), a callback is registered internally to operate during generation. When model output is received, the callback will attempt to parse, and if it fails, it will re-trigger generation with or without context depending on the attempt_recovery parameter. This process will repeat until the model produces a valid output or the maximum number of "rounds" is reached.
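The loop described here can be sketched in a few lines. Everything below is a hypothetical illustration: generate_until_parsed, parse_fact, and MaxRoundsExceeded are names made up for this example (the last one stands in for ExhaustedMaxRoundsError), the model is replaced by a plain callable, and the recovery message is an assumption rather than what Rigging actually sends.

```python
import re
from typing import Callable, Optional


class MaxRoundsExceeded(Exception):
    """Hypothetical error standing in for ExhaustedMaxRoundsError."""


def parse_fact(text: str) -> Optional[str]:
    match = re.search(r"<fun-fact>(.*?)</fun-fact>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def generate_until_parsed(
    generate: Callable[[list[str]], str],
    prompt: str,
    max_rounds: int = 3,
    attempt_recovery: bool = True,
) -> str:
    messages = [prompt]
    for _ in range(max_rounds):
        response = generate(messages)
        fact = parse_fact(response)
        if fact is not None:
            return fact
        if attempt_recovery:
            # Keep the failed output as context and ask for a corrected answer.
            messages = messages + [
                response,
                "Please wrap your answer in <fun-fact></fun-fact> tags.",
            ]
        # Without recovery, the next round simply retries the same messages.
    raise MaxRoundsExceeded(f"No valid output after {max_rounds} rounds")


# A fake "model" that fails once and then complies.
replies = iter(["Sure, here is a fact!", "<fun-fact>Honey never spoils.</fun-fact>"])
fact = generate_until_parsed(lambda messages: next(replies), "Provide a fun fact.")
print(fact)
# Honey never spoils.
```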
Often you might find yourself constantly getting ExhaustedMaxRoundsError exceptions. This is usually a sign that the LLM doesn't have enough information about the desired output, or that the complexity of your model is too high. You have a few options for gracefully handling these situations:
Parsing Multiple Models#
Assuming we wanted to extend our example to produce a set of interesting facts, we have a couple of options:
- Simply use run_many() and generate N examples individually.
- Rework our code slightly and let the model provide us multiple facts at once.
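For the second option, the core idea is collecting every tagged item from a single response instead of only the first. Here is a minimal sketch of that idea with a regular expression. The extract_all helper is hypothetical and is not Rigging's parsing API, which returns typed models.

```python
import re


def extract_all(text: str, tag: str) -> list[str]:
    # Collect the contents of every matching pair of tags, in order.
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    return [item.strip() for item in re.findall(pattern, text, flags=re.DOTALL)]


response = """
Here are a few facts:
<fun-fact>Honey never spoils.</fun-fact>
<fun-fact>Octopuses have three hearts.</fun-fact>
Hope you like them!
"""
print(extract_all(response, "fun-fact"))
# ['Honey never spoils.', 'Octopuses have three hearts.']
```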
Parsing with Prompts#
The use of Prompt functions can make parsing even easier. We can refactor our previous example and have Rigging parse out FunFacts directly for us:
Keep Going#
Check out the topics section for more in-depth explanations and examples.