llm_handler

About

The llm_handler is a utility for interacting with Large Language Models (LLMs) within the Cascade.

Usage

Each Cascade can create an associated llm_handler instance:

cascade = gd.Cascade(...)
cascade.set_llm_handler(...)

Once set_llm_handler() has been run, the llm_handler can be accessed and run on the data in the cascade.

cascade.llm_handler.[...]
llm_handler

Returns the llm_handler instance once set_llm_handler() has been run.

Handles interaction with LLM providers.

Inherits all attributes, properties & methods from both CascadeLLMHandler and BaseLLMHandler classes.

Type:

CascadeLLMHandler

Cascade Extension

The .llm_handler instance available in the Cascade has some extra functionality.

Internally, this is provided by the CascadeLLMHandler subclass, which adds cascade functionality to the BaseLLMHandler base class.

class glyphdeck.processors.cascade.Cascade.CascadeLLMHandler(
*args,
**kwargs,
)

Bases: BaseLLMHandler

Inherits from BaseLLMHandler. Handles the interaction with LLM providers and manages the processing of input data for asynchronous querying.

outer_cascade

Reference to the Cascade instance that this Handler is associated with.

use_selected

Boolean flag to indicate whether to use manually ‘selected’ data or the latest data.

use_selected_of_record

Boolean flag to indicate whether to use a selected record in the cascade.

selected_record_identifier

The identifier (key or title) of the selected record to be accessed.

selected_input_data

The data dictionary selected for use by the handler.

selected_column_names

A list of column names to be used by the handler, if specified.

selected_record_title

The title of the selected record, used to keep the cache identifier unique.

property active_column_names: List[str]

Returns the column names of the active record.

Depending on the state of self.use_selected, this property retrieves the column names from either the selected column names or the active record in the cascade.

Returns:

The list of active column names.

Return type:

List[str]

property active_input_data: Dict[int | str, List]

Returns the input data to be used by the Handler, determined by the current selection state.

If self.use_selected is True, it returns self.selected_input_data. Otherwise, it returns the data of the active record key.

Returns:

The input data dictionary to be used by the Handler.

Return type:

DataDict

property active_record_key: int

Returns the key of the active record based on current selection state.

If self.use_selected_of_record is True, returns the selected record key. Otherwise, returns the key of the latest record.

Returns:

The key of the active record.

Return type:

int

property active_record_title: str

Returns the title of the active record.

Depending on the state of self.use_selected, this property retrieves the title from either the selected record or the active record in the cascade.

Returns:

The title of the active record.

Return type:

str
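The four active_* properties above all follow the same selection rule. The following is a minimal, self-contained sketch of that rule, assuming records are stored in a plain dict keyed by integer with the highest key being the latest. Record, SelectionStateSketch and selected_record_key are hypothetical stand-ins, not glyphdeck classes or attributes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    # Hypothetical stand-in for one record stored in a Cascade.
    title: str
    data: Dict[int, List[str]]
    column_names: List[str]


@dataclass
class SelectionStateSketch:
    # Hypothetical: records keyed by int, where the highest key is the latest.
    records: Dict[int, Record]
    use_selected: bool = False
    use_selected_of_record: bool = False
    selected_record_key: Optional[int] = None
    selected_input_data: Optional[Dict[int, List[str]]] = None
    selected_column_names: Optional[List[str]] = None
    selected_record_title: Optional[str] = None

    @property
    def active_record_key(self) -> int:
        # A selected record wins; otherwise fall back to the latest record.
        if self.use_selected_of_record:
            return self.selected_record_key
        return max(self.records)

    @property
    def active_input_data(self) -> Dict[int, List[str]]:
        # Manually selected data overrides the cascade's records.
        if self.use_selected:
            return self.selected_input_data
        return self.records[self.active_record_key].data

    @property
    def active_column_names(self) -> List[str]:
        if self.use_selected:
            return self.selected_column_names
        return self.records[self.active_record_key].column_names

    @property
    def active_record_title(self) -> str:
        if self.use_selected:
            return self.selected_record_title
        return self.records[self.active_record_key].title


records = {
    1: Record("raw", {0: ["hello"]}, ["text"]),
    2: Record("cleaned", {0: ["hello!"]}, ["text_clean"]),
}
state = SelectionStateSketch(records)
print(state.active_record_key, state.active_record_title)  # 2 cleaned
state.use_selected_of_record = True
state.selected_record_key = 1
print(state.active_input_data)  # {0: ['hello']}
```

Here use_selected (manual data) takes priority over use_selected_of_record (a chosen record), which in turn takes priority over the latest record.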

run(
title,
)

Runs the CascadeLLMHandler and appends the results to the cascade.

Processes the input data with the current settings and appends the resulting output data to the cascade as a new record with the specified title.

Parameters:

title (str) – The title to be assigned to the new record in the cascade.

Returns:

The CascadeLLMHandler object, allowing further chained operations.

Return type:

CascadeLLMHandler
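Because run() returns the handler itself, calls can be chained. The sketch below illustrates that return-self pattern with hypothetical stand-ins (MiniCascade, RunSketch, _process); the LLM step is replaced by a fixed placeholder result, and the integer keying of new records is an assumption.

```python
from typing import Dict, List, Tuple


class MiniCascade:
    """Hypothetical stand-in: stores records as {key: (title, data)}."""

    def __init__(self) -> None:
        self.records: Dict[int, Tuple[str, Dict[int, List[str]]]] = {}

    def append(self, title: str, data: Dict[int, List[str]]) -> None:
        key = len(self.records) + 1  # next integer key
        self.records[key] = (title, data)


class RunSketch:
    def __init__(self, cascade: MiniCascade) -> None:
        self.outer_cascade = cascade

    def _process(self) -> Dict[int, List[str]]:
        # Placeholder for the real LLM calls; returns a fixed result.
        return {0: ["positive"]}

    def run(self, title: str) -> "RunSketch":
        output = self._process()
        self.outer_cascade.append(title, output)
        return self  # returning self enables method chaining


cascade = MiniCascade()
RunSketch(cascade).run("sentiment").run("sentiment_again")
print(list(cascade.records))  # [1, 2]
```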

use_latest()

Set the CascadeLLMHandler to use the latest record in the Cascade.

When invoked, this method ensures that the CascadeLLMHandler will operate on the latest record in the Cascade rather than any manually selected data.

Parameters:

None

Returns:

None

use_record(
record_identifier: int | str,
)

Set cascade.llm_handler to use a specified record.

Parameters:
  • record_identifier – The identifier of the record to be used. Can be an integer representing the record key, or a string representing the record title.

Returns:

None
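use_record() accepts either a record key or a record title. A minimal sketch of resolving such an identifier to a key might look like this, assuming titles are unique and available as a {key: title} mapping. resolve_record_key is a hypothetical helper, not part of glyphdeck.

```python
from typing import Dict, Union


def resolve_record_key(titles: Dict[int, str], identifier: Union[int, str]) -> int:
    """Hypothetical helper: map a record key or a record title to a key."""
    if isinstance(identifier, int):
        if identifier not in titles:
            raise KeyError(f"No record with key {identifier}")
        return identifier
    # Otherwise treat the identifier as a title and search for it.
    for key, title in titles.items():
        if title == identifier:
            return key
    raise KeyError(f"No record titled {identifier!r}")


titles = {1: "raw", 2: "cleaned"}
print(resolve_record_key(titles, 2))      # 2
print(resolve_record_key(titles, "raw"))  # 1
```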

use_selection(
data: Dict[int | str, List],
record_title: str,
column_names: List[str] | None = None,
)

Updates the selected data and column names. If column_names is not specified, the latest column names in the cascade are used.

When selected through this method, the handler will use the provided data, column names (if any), and record title for future processing steps.

Parameters:
  • data – The data to be utilized.

  • record_title – A unique title given to this specific record of data.

  • column_names – A list of column names to be used for this data. Defaults to None.

Returns:

None
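The column-name fallback described for use_selection() can be sketched as follows. SelectionSketch and its latest_column_names attribute are hypothetical stand-ins for the handler and the cascade's latest column names; the attribute names selected_input_data, selected_column_names and selected_record_title mirror those documented above.

```python
from typing import Dict, List, Optional, Union

DataDict = Dict[Union[int, str], List]


class SelectionSketch:
    def __init__(self, latest_column_names: List[str]) -> None:
        # Hypothetical: column names of the cascade's latest record.
        self.latest_column_names = latest_column_names
        self.use_selected = False
        self.selected_input_data: Optional[DataDict] = None
        self.selected_column_names: Optional[List[str]] = None
        self.selected_record_title: Optional[str] = None

    def use_selection(
        self,
        data: DataDict,
        record_title: str,
        column_names: Optional[List[str]] = None,
    ) -> None:
        self.selected_input_data = data
        self.selected_record_title = record_title
        # Fall back to the latest column names when none are given.
        self.selected_column_names = (
            column_names if column_names is not None else self.latest_column_names
        )
        self.use_selected = True


s = SelectionSketch(latest_column_names=["comment"])
s.use_selection({0: ["Great service"]}, record_title="manual_subset")
print(s.selected_column_names)  # ['comment']
```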

BaseLLMHandler

The .llm_handler inherits the features of the BaseLLMHandler.

class glyphdeck.processors.llm_handler.BaseLLMHandler(
input_data: Dict[int | str, List],
provider: str,
model: str,
system_message: str,
validation_model,
cache_identifier: str,
use_cache: bool = True,
temperature: float = 0.2,
max_validation_retries: int = 2,
max_preprepared_coroutines: int = 10,
max_awaiting_coroutines: int = 100,
)

Bases: object

Handler for interacting with Large Language Models (LLMs) and managing their settings, inputs, and outputs.

It can be used on its own from this module, or accessed in a more streamlined way through the Cascade class.

input_data

Dictionary containing the input data.

Type:

DataDict

provider

Name of the LLM provider.

Type:

str

model

Model identifier for the LLM.

Type:

str

system_message

The system message to provide in the LLM prompts.

Type:

str

validation_model

Pydantic class used for validating LLM outputs.

cache_identifier

Unique string used to identify discrete jobs and avoid cache mixing.

Type:

str

use_cache

Boolean indicating whether to use cache or not.

Type:

bool
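To illustrate why cache_identifier keeps discrete jobs from mixing, here is a generic sketch of an identifier-scoped cache together with a use_cache switch. The hashing scheme, make_cache_key, cached_call and the model name "model-x" are all hypothetical; this does not describe how glyphdeck actually stores its cache.

```python
import hashlib
import json
from typing import Callable, Dict, List


def make_cache_key(cache_identifier: str, model: str, system_message: str, row: List[str]) -> str:
    # Hypothetical scheme: hash the job identifier together with the request contents.
    payload = json.dumps(
        {"job": cache_identifier, "model": model, "system": system_message, "row": row},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_call(cache: Dict[str, str], key: str, use_cache: bool, compute: Callable[[], str]) -> str:
    # With use_cache=False the cache is neither read nor written.
    if use_cache and key in cache:
        return cache[key]
    result = compute()
    if use_cache:
        cache[key] = result
    return result


calls: List[str] = []


def fake_llm() -> str:
    # Stand-in for a provider request; records each real call.
    calls.append("called")
    return "positive"


cache: Dict[str, str] = {}
key_a = make_cache_key("reviews_v1", "model-x", "Classify sentiment.", ["Great!"])
key_b = make_cache_key("reviews_v2", "model-x", "Classify sentiment.", ["Great!"])
cached_call(cache, key_a, True, fake_llm)
cached_call(cache, key_a, True, fake_llm)  # served from the cache
print(key_a == key_b, len(calls))  # False 1
```

Identical requests under different identifiers produce different keys, so two jobs never share cache entries.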

temperature

Controls how deterministic (lower values) or random (higher values) the responses are.

Type:

float

max_validation_retries

Maximum number of retries for validation attempts.

Default:

2
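A generic sketch of a validate-and-retry loop is shown below. It assumes one initial attempt plus up to max_validation_retries retries, and uses a plain parsing function in place of a Pydantic model to stay within the standard library. query_with_validation, ValidationFailed and parse are hypothetical names.

```python
import json
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ValidationFailed(Exception):
    pass


def query_with_validation(
    query: Callable[[], str],
    validate: Callable[[str], T],
    max_validation_retries: int = 2,
) -> T:
    """Hypothetical: one initial attempt plus up to max_validation_retries retries."""
    last_error: Optional[Exception] = None
    for _attempt in range(1 + max_validation_retries):
        raw = query()
        try:
            return validate(raw)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            last_error = exc
    raise ValidationFailed("all validation attempts failed") from last_error


def parse(raw: str) -> dict:
    # Stand-in for a validation model: require JSON with a 'sentiment' field.
    data = json.loads(raw)
    if "sentiment" not in data:
        raise ValueError("missing 'sentiment'")
    return data


responses = iter(["not json", '{"sentiment": "positive"}'])
print(query_with_validation(lambda: next(responses), parse))  # {'sentiment': 'positive'}
```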

max_preprepared_coroutines

Semaphore to limit the number of pre-prepared coroutines.

Default:

10

max_awaiting_coroutines

Semaphore to limit the number of awaiting coroutines.

Default:

100
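Both coroutine limits above rely on semaphores. The sketch below shows the general technique with asyncio.Semaphore, capping how many requests are awaited at once. It is a generic illustration with a single semaphore and a fake provider call (fake_llm_call and process_all are hypothetical), not glyphdeck's actual pre-prepared/awaiting pipeline.

```python
import asyncio
from typing import Dict, List, Tuple


async def fake_llm_call(text: str) -> str:
    # Stand-in for a provider request.
    await asyncio.sleep(0.01)
    return text.upper()


async def process_all(
    input_data: Dict[int, List[str]], max_awaiting_coroutines: int = 100
) -> Tuple[Dict[int, str], int]:
    semaphore = asyncio.Semaphore(max_awaiting_coroutines)
    in_flight = 0
    peak = 0

    async def bounded(key: int, text: str) -> Tuple[int, str]:
        nonlocal in_flight, peak
        async with semaphore:  # waits here once the limit is reached
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return key, await fake_llm_call(text)
            finally:
                in_flight -= 1

    pairs = await asyncio.gather(*(bounded(k, row[0]) for k, row in input_data.items()))
    return dict(pairs), peak


data = {i: [f"row {i}"] for i in range(10)}
results, peak = asyncio.run(process_all(data, max_awaiting_coroutines=3))
print(results[0])  # ROW 0
print(peak <= 3)  # True
```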

_raw_output_data

Dictionary to store the intermediate LLM outputs.

new_output_data

Flattened output data to be generated.

new_column_names

Generated column names to be used in the flattened output data.

available_providers

List of LLM providers that are available.

property column_names: List[str]

Accesses the column names after they have been generated during data flattening.

Returns:

The list of column names.

Return type:

List[str]

Raises:

AssertionError – If column_names is accessed before flatten_output_data() has been run.

flatten_output_data(
column_names: List[str],
)

Flattens output data into a dictionary of lists for compatibility with the cascade class. Also creates the new column names for the eventual output.

Parameters:

column_names – List of column names to be used.

Returns:

Dictionary of flattened output data.

property output_data: Dict[int | str, List]

Accesses the output data after it has been flattened.

Returns:

The flattened output data.

Return type:

DataDict

Raises:

AssertionError – If output_data is accessed before flatten_output_data() has been run.
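The relationship between flatten_output_data(), output_data and column_names can be sketched as follows. The raw output shape (one dict of LLM fields per input column) and the f"{column}_{field}" naming scheme are assumptions made for illustration; FlattenSketch is a hypothetical stand-in for the handler.

```python
from typing import Dict, List, Optional


class FlattenSketch:
    def __init__(self, raw_output_data: Dict[int, List[Dict[str, str]]]) -> None:
        # Hypothetical shape: key -> one dict of LLM fields per input column.
        self._raw_output_data = raw_output_data
        self.new_output_data: Optional[Dict[int, List[str]]] = None
        self.new_column_names: Optional[List[str]] = None

    def flatten_output_data(self, column_names: List[str]) -> Dict[int, List[str]]:
        flattened: Dict[int, List[str]] = {}
        new_names: List[str] = []
        for key, per_column in self._raw_output_data.items():
            row: List[str] = []
            names: List[str] = []
            for column, fields in zip(column_names, per_column):
                for field, value in fields.items():
                    row.append(value)
                    names.append(f"{column}_{field}")  # hypothetical naming scheme
            flattened[key] = row
            new_names = names  # assumes every row yields the same columns
        self.new_output_data = flattened
        self.new_column_names = new_names
        return flattened

    @property
    def output_data(self) -> Dict[int, List[str]]:
        assert self.new_output_data is not None, "Run flatten_output_data() first"
        return self.new_output_data

    @property
    def column_names(self) -> List[str]:
        assert self.new_column_names is not None, "Run flatten_output_data() first"
        return self.new_column_names


handler = FlattenSketch({
    0: [{"sentiment": "positive", "topic": "service"}],
    1: [{"sentiment": "negative", "topic": "price"}],
})
handler.flatten_output_data(["comment"])
print(handler.column_names)    # ['comment_sentiment', 'comment_topic']
print(handler.output_data[1])  # ['negative', 'price']
```

Accessing either property before flattening raises AssertionError, matching the behaviour documented above.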