rigging.generator
Generators produce completions for a given set of messages or text.
StopReason = t.Literal['stop', 'length', 'content_filter', 'unknown']
module-attribute
#
The reported reason for a generation completing.
GenerateParams
#
Bases: BaseModel
Parameters for generating text using a language model.
These are designed to generally overlap with underlying APIs like litellm, but will be extended as needed.
Note
Use the extra field to pass additional parameters to the API.
api_base: str | None = None
class-attribute
instance-attribute
#
The base URL for the API.
extra: dict[str, t.Any] = Field(default_factory=dict)
class-attribute
instance-attribute
#
Extra parameters to be passed to the API.
frequency_penalty: float | None = None
class-attribute
instance-attribute
#
The frequency penalty.
max_tokens: int | None = None
class-attribute
instance-attribute
#
The maximum number of tokens to generate.
presence_penalty: float | None = None
class-attribute
instance-attribute
#
The presence penalty.
seed: int | None = None
class-attribute
instance-attribute
#
The random seed.
stop: list[str] | None = None
class-attribute
instance-attribute
#
A list of stop sequences to stop generation at.
temperature: float | None = None
class-attribute
instance-attribute
#
The sampling temperature.
timeout: int | None = None
class-attribute
instance-attribute
#
The timeout for the API request.
top_k: int | None = None
class-attribute
instance-attribute
#
The top-k sampling parameter.
top_p: float | None = None
class-attribute
instance-attribute
#
The nucleus sampling probability.
merge_with(*others: t.Optional[GenerateParams]) -> GenerateParams
#
Apply a series of parameter overrides to the current instance and return a copy.
Parameters:
- *others (Optional[GenerateParams], default: ()) – The parameters to be merged with the current instance's parameters. Can be multiple, and overrides will be applied in order.
Returns:
- GenerateParams – The merged parameters instance.
Source code in rigging/generator/base.py
to_dict() -> dict[str, t.Any]
#
Convert the parameters to a dictionary.
Returns:
- dict[str, Any] – The parameters as a dictionary.
Source code in rigging/generator/base.py
GeneratedMessage
#
Bases: BaseModel
A generated message with additional generation information.
extra: dict[str, t.Any] = Field(default_factory=dict)
class-attribute
instance-attribute
#
Any additional information from the generation.
message: Message
instance-attribute
#
The generated message.
stop_reason: t.Annotated[StopReason, BeforeValidator(convert_stop_reason)] = 'unknown'
class-attribute
instance-attribute
#
The reason for stopping generation.
usage: t.Optional[Usage] = None
class-attribute
instance-attribute
#
The usage statistics for the generation if available.
GeneratedText
#
Bases: BaseModel
A generated text with additional generation information.
extra: dict[str, t.Any] = Field(default_factory=dict)
class-attribute
instance-attribute
#
Any additional information from the generation.
stop_reason: t.Annotated[StopReason, BeforeValidator(convert_stop_reason)] = 'unknown'
class-attribute
instance-attribute
#
The reason for stopping generation.
text: str
instance-attribute
#
The generated text.
usage: t.Optional[Usage] = None
class-attribute
instance-attribute
#
The usage statistics for the generation if available.
Generator
#
Bases: BaseModel
Base class for all rigging generators.
This class provides common functionality and methods for generating completion messages.
A subclass of this can implement both or one of the following:
- generate_messages: Process a batch of messages.
- generate_texts: Process a batch of texts.
api_key: str | None = Field(None, exclude=True)
class-attribute
instance-attribute
#
The API key used for authentication.
model: str
instance-attribute
#
The model name to be used by the generator.
params: GenerateParams
instance-attribute
#
The parameters used for generating completion messages.
chat(messages: t.Sequence[MessageDict] | t.Sequence[Message] | MessageDict | Message | str | None = None, params: GenerateParams | None = None) -> ChatPipeline
#
Build a chat pipeline with the given messages and optional params overloads.
Parameters:
- messages (Sequence[MessageDict] | Sequence[Message] | MessageDict | Message | str | None, default: None) – The messages to be sent in the chat.
- params (GenerateParams | None, default: None) – Optional parameters for generating responses.
Returns:
- ChatPipeline – The chat pipeline to run.
Source code in rigging/generator/base.py
complete(text: str, params: GenerateParams | None = None) -> CompletionPipeline
#
Build a completion pipeline of the given text with optional param overloads.
Parameters:
- text (str) – The input text to be completed.
- params (GenerateParams | None, default: None) – The parameters to be used for completion.
Returns:
- CompletionPipeline – The completion pipeline to run.
Source code in rigging/generator/base.py
generate_messages(messages: t.Sequence[t.Sequence[Message]], params: t.Sequence[GenerateParams]) -> t.Sequence[GeneratedMessage]
async
#
Generate a batch of messages using the specified parameters.
Note
The length of params must be the same as the length of messages.
Parameters:
- messages (Sequence[Sequence[Message]]) – A sequence of sequences of messages.
- params (Sequence[GenerateParams]) – A sequence of GenerateParams objects.
Returns:
- Sequence[GeneratedMessage] – A sequence of generated messages.
Raises:
- NotImplementedError – This method is not supported by this generator.
Source code in rigging/generator/base.py
generate_texts(texts: t.Sequence[str], params: t.Sequence[GenerateParams]) -> t.Sequence[GeneratedText]
async
#
Generate a batch of text completions using the generator.
Note
This method falls back to looping over the inputs and calling generate_text for each item.
Note
If supplied, the length of params must be the same as the length of texts.
Parameters:
- texts (Sequence[str]) – The input texts for generating the batch.
- params (Sequence[GenerateParams]) – Additional parameters for generating each text in the batch.
Returns:
- Sequence[GeneratedText] – The generated texts.
Raises:
- NotImplementedError – This method is not supported by this generator.
Source code in rigging/generator/base.py
load() -> Self
#
If supported, trigger underlying loading and preparation of the model.
Returns:
- Self – The generator.
prompt(func: t.Callable[P, t.Coroutine[None, None, R]]) -> Prompt[P, R]
#
Decorator to convert a function into a prompt bound to this generator.
See rigging.prompt.prompt for more information.
Parameters:
- func (Callable[P, Coroutine[None, None, R]]) – The function to be converted into a prompt.
Returns:
- Prompt[P, R] – The prompt.
Source code in rigging/generator/base.py
to_identifier(params: GenerateParams | None = None) -> str
#
Converts the generator instance back into a rigging identifier string.
This calls rigging.generator.get_identifier with the current instance.
Parameters:
- params (GenerateParams | None, default: None) – The generation parameters.
Returns:
- str – The identifier string.
Source code in rigging/generator/base.py
unload() -> Self
#
If supported, trigger underlying unloading and cleanup of the model.
watch(*callbacks: WatchCallbacks, allow_duplicates: bool = False) -> Generator
#
Registers watch callbacks to be passed to any created rigging.chat.ChatPipeline or rigging.completion.CompletionPipeline.
Parameters:
- *callbacks (WatchCallbacks, default: ()) – The callback functions to be executed.
- allow_duplicates (bool, default: False) – Whether to allow (seemingly) duplicate callbacks to be added.
Returns:
- Generator – The current generator instance.
Source code in rigging/generator/base.py
wrap(func: t.Callable[[CallableT], CallableT] | None) -> Self
#
If supported, wrap any underlying interior framework calls with this function.
This is useful for adding things like backoff or rate limiting.
Parameters:
- func (Callable[[CallableT], CallableT] | None) – The function to wrap the calls with.
Returns:
- Self – The generator.
Source code in rigging/generator/base.py
LiteLLMGenerator
#
Bases: Generator
Generator backed by the LiteLLM library.
Find more information about supported models and formats in their docs.
Note
Batching support is not performant and is simply a loop over the inputs.
Warning
While some providers support passing n to produce a batch of completions per request, we don't currently use this in the implementation due to its brittle requirements.
Tip
Consider setting max_connections or min_delay_between_requests if you run into API limits. You can pass these directly in the generator id:
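For example (an illustrative identifier following the documented <provider>!<model>,<**kwargs> format; the model name is a placeholder):

```
litellm!gpt-4,max_connections=2,min_delay_between_requests=500
```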
max_connections: int = 10
class-attribute
instance-attribute
#
How many simultaneous requests to pool at one time. This is useful to set when you run into API limits at a provider.
Set to 0 to remove the limit.
min_delay_between_requests: float = 0.0
class-attribute
instance-attribute
#
Minimum time (ms) between each request. This is useful to set when you run into API limits at a provider.
Usage
#
chat(generator: Generator, messages: t.Sequence[MessageDict] | t.Sequence[Message] | MessageDict | Message | str | None = None, params: GenerateParams | None = None) -> ChatPipeline
#
Creates a chat pipeline using the given generator, messages, and params.
Parameters:
- generator (Generator) – The generator to use for creating the chat.
- messages (Sequence[MessageDict] | Sequence[Message] | MessageDict | Message | str | None, default: None) – The messages to include in the chat. Can be a single message or a sequence of messages.
- params (GenerateParams | None, default: None) – Additional parameters for generating the chat.
Returns:
- ChatPipeline – The chat pipeline to run.
Source code in rigging/generator/base.py
get_generator(identifier: str, *, params: GenerateParams | None = None) -> Generator
#
Get a generator by an identifier string. Uses LiteLLM by default.
Identifier strings are formatted like <provider>!<model>,<**kwargs>
(the provider is optional and defaults to litellm if not specified).
Examples:
- "gpt-3.5-turbo" -> LiteLLMGenerator(model="gpt-3.5-turbo")
- "litellm!claude-2.1" -> LiteLLMGenerator(model="claude-2.1")
- "mistral/mistral-tiny" -> LiteLLMGenerator(model="mistral/mistral-tiny")
You can also specify arguments to the generator by comma-separating them:
- "mistral/mistral-medium,max_tokens=1024"
- "gpt-4-0613,temperature=0.9,max_tokens=512"
- "claude-2.1,stop_sequences=Human:;test,max_tokens=100"
(These get parsed as rigging.generator.GenerateParams)
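The identifier format above can be sketched with a small stdlib parser (illustrative only; the real parsing lives in rigging and coerces values into GenerateParams fields, while this sketch leaves them as strings and does not split semicolon-separated list values like stop_sequences):

```python
# Illustrative parser for the documented identifier format:
#   <provider>!<model>,<key>=<value>,...
# The provider segment is optional and defaults to "litellm".

def parse_identifier(identifier: str) -> tuple[str, str, dict[str, str]]:
    provider, sep, rest = identifier.partition("!")
    if not sep:  # no "!" means the provider was omitted
        provider, rest = "litellm", identifier
    model, *pairs = rest.split(",")
    kwargs = dict(pair.split("=", 1) for pair in pairs)
    return provider, model, kwargs

print(parse_identifier("mistral/mistral-medium,max_tokens=1024"))
# ('litellm', 'mistral/mistral-medium', {'max_tokens': '1024'})
```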
Parameters:
- identifier (str) – The identifier string to use to get a generator.
- params (GenerateParams | None, default: None) – The generation parameters to use for the generator. These will override any parameters specified in the identifier string.
Returns:
- Generator – The generator object.
Raises:
- InvalidModelSpecified – If the identifier is invalid.
Source code in rigging/generator/base.py
get_identifier(generator: Generator, params: GenerateParams | None = None) -> str
#
Converts the generator instance back into a rigging identifier string.
Warning
The extra parameter field is not currently supported in identifiers.
Parameters:
- generator (Generator) – The generator object.
- params (GenerateParams | None, default: None) – The generation parameters.
Returns:
- str – The identifier string for the generator.
Source code in rigging/generator/base.py
register_generator(provider: str, generator_cls: type[Generator] | LazyGenerator) -> None
#
Register a generator class for a provider id.
This lets you use rigging.generator.get_generator with a custom generator class.
Parameters:
- provider (str) – The name of the provider.
- generator_cls (type[Generator] | LazyGenerator) – The generator class to register.
Source code in rigging/generator/base.py
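The provider-registry pattern behind register_generator can be sketched in isolation (an illustrative stand-in, not rigging's internals; the Generator and EchoGenerator classes here are minimal placeholders):

```python
# Illustrative provider registry, sketched independently of rigging.

class Generator:  # stand-in base class for this sketch
    def __init__(self, model: str) -> None:
        self.model = model

_registry: dict[str, type[Generator]] = {}

def register_generator(provider: str, generator_cls: type[Generator]) -> None:
    # Map a provider id to a generator class.
    _registry[provider] = generator_cls

def get_generator(identifier: str) -> Generator:
    # Resolve "<provider>!<model>" against the registry.
    provider, _, model = identifier.partition("!")
    return _registry[provider](model)

class EchoGenerator(Generator):
    pass

register_generator("echo", EchoGenerator)
gen = get_generator("echo!my-model")
print(type(gen).__name__, gen.model)
# EchoGenerator my-model
```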
VLLMGenerator
#
Bases: Generator
Generator backed by the vLLM library for local model loading.
Find more information about supported models and formats in their docs.
Warning
The use of vLLM requires the vllm package to be installed directly or by installing rigging as rigging[all].
Note
This generator doesn't leverage any async capabilities.
Note
The model will be loaded into memory lazily when the first generation is requested. If you want to force this to happen earlier, use the .load() method. To unload, call .unload().
enforce_eager: bool = False
class-attribute
instance-attribute
#
Eager enforcement passed to vllm.LLM
gpu_memory_utilization: float = 0.9
class-attribute
instance-attribute
#
Memory utilization passed to vllm.LLM
llm: vllm.LLM
property
#
The underlying vLLM model
instance.
quantization: str | None = None
class-attribute
instance-attribute
#
Quantization passed to vllm.LLM
trust_remote_code: bool = False
class-attribute
instance-attribute
#
Trust remote code passed to vllm.LLM
from_obj(model: str, llm: vllm.LLM, *, params: GenerateParams | None = None) -> VLLMGenerator
classmethod
#
Create a generator from an existing vLLM instance.
Parameters:
- model (str) – The model name.
- llm (LLM) – The vLLM instance to create the generator from.
Returns:
- VLLMGenerator – The VLLMGenerator instance.
Source code in rigging/generator/vllm_.py
DEFAULT_MAX_TOKENS = 1024
module-attribute
#
A raised default for max tokens over the lower transformers default.
TransformersGenerator
#
Bases: Generator
Generator backed by the Transformers library for local model loading.
Warning
The use of Transformers requires the transformers
package to be installed directly or by
installing rigging as rigging[all]
.
Warning
The transformers library is expansive with many different models, tokenizers, options, constructors, etc. We do our best to implement a consistent interface, but there may be limitations. Where needed, use .from_obj().
Note
This generator doesn't leverage any async capabilities.
Note
The model will be loaded into memory lazily when the first generation is requested. If you want to force this to happen earlier, use the .load() method. To unload, call .unload().
device_map: str = 'auto'
class-attribute
instance-attribute
#
Device map passed to AutoModelForCausalLM.from_pretrained
llm: AutoModelForCausalLM
property
#
The underlying AutoModelForCausalLM
instance.
load_in_4bit: bool = False
class-attribute
instance-attribute
#
Load in 4 bit passed to AutoModelForCausalLM.from_pretrained
load_in_8bit: bool = False
class-attribute
instance-attribute
#
Load in 8 bit passed to AutoModelForCausalLM.from_pretrained
pipeline: TextGenerationPipeline
property
#
The underlying TextGenerationPipeline
instance.
tokenizer: PreTrainedTokenizer
property
#
The underlying AutoTokenizer
instance.
torch_dtype: str = 'auto'
class-attribute
instance-attribute
#
Torch dtype passed to AutoModelForCausalLM.from_pretrained
trust_remote_code: bool = False
class-attribute
instance-attribute
#
Trust remote code passed to AutoModelForCausalLM.from_pretrained
from_obj(model: str, llm: AutoModelForCausalLM, tokenizer: PreTrainedTokenizer, *, pipeline: TextGenerationPipeline | None = None, params: GenerateParams | None = None) -> TransformersGenerator
classmethod
#
Create a new instance of TransformersGenerator from an already loaded model and tokenizer.
Parameters:
- model (str) – The model name.
- llm (AutoModelForCausalLM) – The loaded model for text generation.
- tokenizer (PreTrainedTokenizer) – The tokenizer associated with the model.
- pipeline (TextGenerationPipeline | None, default: None) – The text generation pipeline. Defaults to None.
Returns:
- TransformersGenerator – The TransformersGenerator instance.