llm_handler¶

About¶

The llm_handler is a utility for interacting with Large Language Models (LLMs) within the Cascade.

Usage¶

Each Cascade can create an associated llm_handler instance:

cascade = gd.Cascade(...)
cascade.set_llm_handler(...)

Once set_llm_handler() has been run, the llm_handler can be accessed and run on the data in the cascade.

cascade.llm_handler.[...]

llm_handler¶

Returns the llm_handler instance once set_llm_handler() has been run.

Handles interaction with LLM providers.

Inherits all attributes, properties & methods from both CascadeLLMHandler and BaseLLMHandler classes.

Type:: llm_handler (Cascade attribute)

Cascade Extension¶

The .llm_handler instance available in the Cascade has some extra functionality.

Internally, this is added by the CascadeLLMHandler subclass, which adds cascade functionality to the BaseLLMHandler base class.

class glyphdeck.processors.cascade.Cascade.CascadeLLMHandler( *args, **kwargs, )¶

Bases: BaseLLMHandler

Inherits from BaseLLMHandler. Handles the interaction with LLM providers and manages the processing of input data for asynchronous querying.

outer_cascade¶: Reference to the Cascade instance that this Handler is associated with.

use_selected¶: Boolean flag to indicate whether to use manually ‘selected’ data or the latest data.

use_selected_of_record¶: Boolean flag to indicate whether to use a selected record in the cascade.

selected_record_identifier¶: The identifier (key or title) of the selected record to be accessed.

selected_input_data¶: The data dictionary selected for use by the handler.

selected_column_names¶: A list of column names to be used by the handler, if specified.

selected_record_title¶: The title of the selected record, used to keep the cache identifier unique.

property active_column_names: List[str]¶

Returns the column names of the active record.

Depending on the state of self.use_selected, this property returns either the selected column names or the column names of the active record in the cascade.

Returns:: The list of active column names.
Return type:: List[str]

property active_input_data: Dict[int | str, List]¶

Returns the input data to be used by the Handler, determined by the current selection state.

If self.use_selected is True, it returns self.selected_input_data. Otherwise, it returns the data of the active record key.

Returns:: The input data dictionary to be used by the Handler.
Return type:: DataDict

property active_record_key: int¶

Returns the key of the active record based on current selection state.

If self.use_selected_of_record is True, returns the selected record key. Otherwise, returns the key of the latest record.

Returns:: The key of the active record.
Return type:: int

property active_record_title: str¶

Returns the title of the active record.

Depending on the state of self.use_selected, this property returns the title of either the selected record or the active record in the cascade.

Returns:: The title of the active record.
Return type:: str
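The selection logic described for the active_* properties can be sketched with a small stand-in class. This is a minimal illustration, not glyphdeck's implementation: ToyHandler, its records dictionary, and the record layout are hypothetical, and only the flag and property names are taken from the documentation above.

```python
from typing import Dict, List, Optional, Union

DataDict = Dict[Union[int, str], List]


class ToyHandler:
    """Hypothetical stand-in that mimics the documented selection flags."""

    def __init__(self, records: Dict[int, dict]) -> None:
        # Assumed layout: record key -> {"title": str, "data": DataDict}.
        self.records = records
        self.use_selected = False
        self.use_selected_of_record = False
        self.selected_record_identifier: Optional[int] = None
        self.selected_input_data: DataDict = {}

    @property
    def active_record_key(self) -> int:
        # Selected record if one was chosen, otherwise the latest record.
        if self.use_selected_of_record:
            return self.selected_record_identifier
        return max(self.records)

    @property
    def active_input_data(self) -> DataDict:
        # Manually selected data wins; otherwise use the active record's data.
        if self.use_selected:
            return self.selected_input_data
        return self.records[self.active_record_key]["data"]


handler = ToyHandler(
    {
        0: {"title": "raw", "data": {1: ["hello"], 2: ["world"]}},
        1: {"title": "cleaned", "data": {1: ["Hello"], 2: ["World"]}},
    }
)
latest_key = handler.active_record_key  # no selection: latest record

handler.use_selected_of_record = True
handler.selected_record_identifier = 0
chosen_key = handler.active_record_key  # explicitly chosen record
chosen_data = handler.active_input_data
```

In this toy, the identifier is always an integer key; resolving a title string to a key is left out for brevity.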

run( title, )¶

Runs the CascadeLLMHandler and appends the results to the cascade.

The function will process the CascadeLLMHandler with the current settings and append the resulting output data to the cascade as a new record with the specified title.

Parameters:: title (str) – The title to be assigned to the new record in the cascade.
Returns:: The CascadeLLMHandler object, allowing further chained operations.
Return type:: CascadeLLMHandler
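Because run() returns the handler itself, calls can be chained. The sketch below shows only that return-self pattern: ToyChainHandler and its records dictionary are hypothetical, and no LLM is queried.

```python
from typing import Dict, List


class ToyChainHandler:
    """Hypothetical stand-in: run() stores a record and returns self."""

    def __init__(self) -> None:
        self.records: Dict[str, List[str]] = {}

    def run(self, title: str) -> "ToyChainHandler":
        # A real handler would query an LLM here; this sketch just stores
        # a placeholder result under the given title.
        self.records[title] = [f"output for {title}"]
        return self  # returning self is what enables chaining


handler = ToyChainHandler().run("first pass").run("second pass")
titles = list(handler.records)
```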

use_latest()¶

Set the CascadeLLMHandler to use the latest record in the Cascade.

When invoked, this method ensures that the CascadeLLMHandler will operate on the latest record in the Cascade rather than any manually selected data.

Parameters:: None
Returns:: None

use_record( record_identifier: int | str, )¶

Set cascade.llm_handler to use a specified record.

Parameters:

record_identifier – The identifier of the record to be used. Can be an integer representing the record key, or a string representing the record title.

Returns:

None

use_selection( data: Dict[int | str, List], record_title: str, column_names: List[str] | None = None, )¶

Updates the selected data and column names. If column_names is not specified, the latest column names are used.

When selected through this method, the handler will use the provided data, column names (if any), and record title for future processing steps.

Parameters:

data – The data to be utilized.
record_title – A unique title given to this specific record of data.
column_names – A list of column names to be used for this data. Defaults to None.

Returns:

None
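The interplay between use_selection() and use_latest() can be sketched as follows. This is a simplified model under stated assumptions, not the library's code: ToySelection is hypothetical, and the latest_column_names attribute stands in for wherever the real handler looks up the latest column names.

```python
from typing import Dict, List, Optional, Union

DataDict = Dict[Union[int, str], List]


class ToySelection:
    """Hypothetical stand-in for the documented selection methods."""

    def __init__(self, latest_column_names: List[str]) -> None:
        self.latest_column_names = latest_column_names  # assumed attribute
        self.use_selected = False
        self.selected_input_data: DataDict = {}
        self.selected_record_title: Optional[str] = None
        self.selected_column_names: Optional[List[str]] = None

    def use_selection(
        self,
        data: DataDict,
        record_title: str,
        column_names: Optional[List[str]] = None,
    ) -> None:
        # Fall back to the latest column names when none are supplied.
        self.selected_input_data = data
        self.selected_record_title = record_title
        self.selected_column_names = (
            column_names if column_names is not None else self.latest_column_names
        )
        self.use_selected = True

    def use_latest(self) -> None:
        # Revert to operating on the latest record.
        self.use_selected = False


handler = ToySelection(latest_column_names=["comment"])
handler.use_selection({1: ["great product"]}, record_title="manual sample")
fallback_names = handler.selected_column_names

handler.use_selection({1: ["ok"]}, "second sample", column_names=["review"])
explicit_names = handler.selected_column_names

handler.use_latest()
```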

BaseLLMHandler¶

The .llm_handler inherits the features of the BaseLLMHandler.

class glyphdeck.processors.llm_handler.BaseLLMHandler( input_data: Dict[int | str, List], provider: str, model: str, system_message: str, validation_model, cache_identifier: str, use_cache: bool = True, temperature: float = 0.2, max_validation_retries: int = 2, max_preprepared_coroutines: int = 10, max_awaiting_coroutines: int = 100, )¶

Bases: object

Handler for interacting with Large Language Models (LLMs) and managing their settings, inputs, and outputs.

It can be used on its own from this module, but it can also be accessed in a more streamlined way from within the Cascade class.

input_data¶

Dictionary containing the input data.

Type:: DataDict

provider¶

Name of the LLM provider.

Type:: str

model¶

Model identifier for the LLM.

Type:: str

system_message¶

The system message to provide in the LLM prompts.

Type:: str

validation_model¶: Pydantic class used for validating LLM outputs.

cache_identifier¶

Unique string used to identify discrete jobs and avoid cache mixing.

Type:: str

use_cache¶

Boolean indicating whether to use cache or not.

Type:: bool
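To illustrate why a unique cache_identifier avoids cache mixing, the sketch below keys a toy in-memory cache on the identifier plus the prompt. How glyphdeck actually stores and keys its cache is not described here, so cached_call, fake_llm, and the dictionary cache are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

_cache: Dict[Tuple[str, str], str] = {}


def cached_call(
    cache_identifier: str,
    prompt: str,
    compute: Callable[[str], str],
    use_cache: bool = True,
) -> str:
    """Return a cached result for (cache_identifier, prompt) when allowed."""
    key = (cache_identifier, prompt)
    if use_cache and key in _cache:
        return _cache[key]
    result = compute(prompt)
    _cache[key] = result
    return result


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)  # record every real (uncached) call
    return prompt.upper()


cached_call("job_a", "hello", fake_llm)
cached_call("job_a", "hello", fake_llm)  # same job: served from the cache
cached_call("job_b", "hello", fake_llm)  # different job: computed again
uncached_calls = len(calls)
```

Two jobs that send the same prompt do not share results in this model, because their identifiers differ.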

temperature¶

Controls how deterministic (lower values) or random (higher values) the responses are.

Type:: float

max_validation_retries¶

Maximum number of retries for validation attempts.

Type:: int (default: 2)
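A validation retry limit of this kind can be sketched as a simple loop. The helper below is hypothetical and assumes one initial attempt plus up to max_validation_retries retries; the library's exact retry semantics may differ, and a plain predicate stands in for the Pydantic validation model.

```python
from typing import Callable, List


def query_with_retries(
    attempt: Callable[[], str],
    validate: Callable[[str], bool],
    max_validation_retries: int = 2,
) -> str:
    """Call attempt() until validate() accepts the result.

    Assumed semantics: one initial try plus up to
    max_validation_retries further tries.
    """
    for _ in range(max_validation_retries + 1):
        result = attempt()
        if validate(result):
            return result
    raise ValueError("output failed validation after all retries")


# Fake "LLM" that returns an invalid answer once, then a valid one.
responses: List[str] = ["not-a-number", "42"]
answer = query_with_retries(lambda: responses.pop(0), str.isdigit)
```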

max_preprepared_coroutines¶

Semaphore to limit the number of pre-prepared coroutines.

Type:: int (default: 10)

max_awaiting_coroutines¶

Semaphore to limit the number of awaiting coroutines.

Type:: int (default: 100)
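Limiting the number of awaiting coroutines with a semaphore is a standard asyncio pattern. The sketch below uses asyncio.Semaphore from the standard library; run_limited and the fake job are hypothetical and only demonstrate the technique, not how the handler wires it up internally.

```python
import asyncio


async def run_limited(n_jobs: int, max_awaiting_coroutines: int) -> int:
    """Run n_jobs fake requests and return the peak number in flight."""
    semaphore = asyncio.Semaphore(max_awaiting_coroutines)
    in_flight = 0
    peak = 0

    async def job() -> None:
        nonlocal in_flight, peak
        async with semaphore:  # waits here once the limit is reached
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for a network call
            in_flight -= 1

    await asyncio.gather(*(job() for _ in range(n_jobs)))
    return peak


peak = asyncio.run(run_limited(n_jobs=20, max_awaiting_coroutines=5))
```

However many jobs are submitted, the number running inside the semaphore at once never exceeds the configured limit.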

_raw_output_data¶: Dictionary to store the intermediate LLM outputs.

new_output_data¶: Flattened output data to be generated.

new_column_names¶: Generated column names to be used in the flattened output data.

available_providers¶: List of LLM providers that are available.

property column_names: List[str]¶

Accesses the column names after they have been generated during data flattening.

Returns:: The list of column names.
Return type:: List[str]
Raises:: AssertionError – If column_names is accessed before flatten_output_data() has been run.

flatten_output_data( column_names: List[str], )¶

Flattens the output data into a dictionary of lists for compatibility with the Cascade class. Also creates the new column names for the eventual output.

Parameters:: column_names – List of column names to be used.
Returns:: Dictionary of flattened output data.

property output_data: Dict[int | str, List]¶

Accesses the output data after it has been flattened.

Returns:: The flattened output data.
Return type:: DataDict
Raises:: AssertionError – If output_data is accessed before flatten_output_data() has been run.
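The flatten-then-access pattern described for flatten_output_data(), column_names, and output_data can be sketched as below. ToyFlattener is hypothetical: the raw output shape and the "column_field" naming scheme are assumptions chosen for illustration, and only the attribute and method names come from the documentation above.

```python
from typing import Dict, List, Optional, Union

DataDict = Dict[Union[int, str], List]


class ToyFlattener:
    """Hypothetical stand-in showing the flatten-then-access pattern."""

    def __init__(self, raw_output: Dict[Union[int, str], List[dict]]) -> None:
        # Assumed raw shape: per row, one dict of fields per input column.
        self._raw_output_data = raw_output
        self.new_output_data: Optional[DataDict] = None
        self.new_column_names: Optional[List[str]] = None

    def flatten_output_data(self, column_names: List[str]) -> DataDict:
        flat: DataDict = {}
        names: List[str] = []
        for row_key, outputs in self._raw_output_data.items():
            flat[row_key] = []
            for column, fields in zip(column_names, outputs):
                for field, value in fields.items():
                    flat[row_key].append(value)
                    name = f"{column}_{field}"  # assumed naming scheme
                    if name not in names:
                        names.append(name)
        self.new_output_data = flat
        self.new_column_names = names
        return flat

    @property
    def output_data(self) -> DataDict:
        # Guard against access before flattening, as the docs describe.
        assert self.new_output_data is not None, "run flatten_output_data() first"
        return self.new_output_data


handler = ToyFlattener({1: [{"sentiment": "positive", "score": 0.9}]})
try:
    handler.output_data
    accessed_early = True
except AssertionError:
    accessed_early = False

handler.flatten_output_data(["comment"])
flattened = handler.output_data
```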