Why?
YALC is open source — find it on GitHub.
There are many libraries that wrap calls to LLM providers, especially for Python. However, one thing they don't do very well is provide structure to the responses. That's where the instructor library comes in. It uses pydantic to give structure to the final response, meaning a user can define the response structure via a model, for example:
```python
class Person(BaseModel):
    name: str
    age: int
    occupation: str
```
and we have a guarantee that the response from the LLM will either be in this format or fail validation. This guarantee is important because it brings some predictability to the already unpredictable world of LLMs.
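To make that guarantee concrete, here is a minimal sketch using plain pydantic (independent of YALC or instructor) showing how such a model accepts a well-formed response and rejects a malformed one:

```python
from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    name: str
    age: int
    occupation: str


# A well-formed response parses into a typed object
ok = Person.model_validate({"name": "Ada", "age": 36, "occupation": "engineer"})
print(ok.age)  # 36

# A response missing a required field fails validation loudly
try:
    Person.model_validate({"name": "Ada", "occupation": "engineer"})
except ValidationError:
    print("validation failed")
```

Instructor retries or surfaces exactly this kind of validation error when the LLM output doesn't match the model.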
Instructor also returns metadata about the LLM call. Metadata here means information like token usage: how many input and output tokens it took to generate the answer. However, this metadata can be laid out differently in the final pydantic models for different providers, so it can be hard to handle efficiently. Since YALC uses instructor internally, it provides an easy and unified way of working with metadata. More on this in the following sections.
Most LLM providers report token usage but don't provide the final pricing, i.e. how much money the actual call cost. YALC handles this too, through the same unified interface it provides for token tracking.
So the answer to the why question is that YALC provides a unified and easy way to track token and cost usage, plus structured output for LLM responses.
Unified metadata handling & usage
The main thing the yalc library provides is the create_client function. It uses the factory pattern to create an instance of the Client class, which provides methods for calling LLMs. The factory takes a few parameters, such as which LLM model should be used, and a list of metadata strategies, which will be explained later. For now let's focus on calling the factory without any strategies, i.e. with an empty list. We call this the metadata return mode.
Metadata return mode
An example of this usage:
```python
client = create_client(LLMModel.gpt_4o_mini)
result, metadata = await client.structured_response(
    JudgmentResult, messages
)
```
As you can see, it's very easy and straightforward to create the initial client. YALC also provides a unified LLMModel enum for creating clients, so no string constants need to be remembered. After the client is created, structured_response can be called to get the result, which will be typed as the JudgmentResult pydantic model, and the metadata, which, as the name suggests, contains the call metadata: the cost/token tracking mentioned earlier. This way of using YALC is useful if you need to process the metadata straight away after the call and do something with it. However, every time you call this method you need to handle the metadata "manually", which adds cognitive load, or you can simply forget to handle it. Below is a summary of this approach.
| Advantages | Disadvantages |
|---|---|
| Simple: No setup required; get up and running immediately. | Manual Labor: You must handle or log metadata manually on every single call. |
| Direct Access: Metadata is available right at the call site for quick debugging. | Inconsistency: Risk of handling data differently across various parts of your codebase or forgetting it entirely. |
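The "manual labor" drawback can be illustrated with a self-contained sketch. CallMetadata here is a hypothetical stand-in for the metadata object YALC returns; the field names are assumptions mirroring the ones used in the strategy example later in this document:

```python
from dataclasses import dataclass


# Hypothetical stand-in for the metadata YALC returns alongside the result
@dataclass
class CallMetadata:
    input_tokens: int
    output_tokens: int
    input_tokens_cost: float
    output_tokens_cost: float


def handle_metadata(metadata: CallMetadata) -> float:
    # In return mode, this bookkeeping must be repeated at every call site
    total_cost = metadata.input_tokens_cost + metadata.output_tokens_cost
    total_tokens = metadata.input_tokens + metadata.output_tokens
    print(f"tokens={total_tokens} cost={total_cost:.4f}")
    return total_cost


# Every call site must remember to do this after unpacking (result, metadata)
cost = handle_metadata(CallMetadata(120, 35, 0.0018, 0.0021))
```

Forgetting one of these calls silently drops the tracking data, which is exactly the inconsistency the table above warns about.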
Strategy metadata mode
This mode uses the strategy pattern. The factory can take any number of metadata strategies that handle the metadata automatically. That means the handling is defined during client creation and doesn't need to be thought about anymore. See a usage example below.
```python
# 1. Define your strategy
class LogStrategy(ClientMetadataStrategy[LLMLogContext]):
    def handle(self, call: ClientCall, context: LLMLogContext):
        print(f"Tokens: {call.input_tokens + call.output_tokens}")
        print(f"Cost: {call.input_tokens_cost + call.output_tokens_cost}")
        # Save to db for persistence
        db.save(call.model_dump(), context.request_id)

# 2. Create client with the strategy
client = create_client(LLMModel.gpt_4o_mini, metadata_strategies=[LogStrategy()])

# 3. Pass context to trigger the strategy
result = await client.structured_response(
    JudgmentResult, messages, context=llm_log_context
)
```
First, a strategy needs to be defined by inheriting from ClientMetadataStrategy and implementing the handle abstract method. The method receives 2 arguments:

- ClientCall: contains the metadata for cost/token tracking and the messages used to generate the LLM answer (identical to the metadata in the metadata return mode)
- LLMLogContext: a context object that the user can define via a generic, containing custom context data
In the example above, the context argument contains the request_id, which can be used to pair the LLM call with an app-specific request, for example a support bot request. The strategy then needs to be passed to the client factory function.
As can be seen from the code snippet, the final structured_response call returns only the result, no longer a tuple. This makes using the response method much simpler than the first approach. The custom strategy is then called automatically during the structured_response method. Again, a summary is below.
| Advantages | Disadvantages |
|---|---|
| Consistent: Metadata handling is set up once and applied consistently. | More Setup: More initial setup required compared to the manual approach. |
| Clean Call Sites: No need to unpack or handle metadata each time. | Implicit Handling: Metadata handling is implicit, which can be harder to trace. |
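The mechanism behind this mode is easy to model. Below is a simplified, self-contained sketch of the strategy pattern as described above (not YALC's actual implementation, and all names are illustrative): the client stores a list of strategies and invokes each one after a call, so separate concerns like logging and accumulation compose independently:

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Call:
    input_tokens: int
    output_tokens: int


class MetadataStrategy(Protocol):
    def handle(self, call: Call) -> None: ...


class TokenCountStrategy:
    """Accumulates total token usage across calls."""

    def __init__(self) -> None:
        self.total = 0

    def handle(self, call: Call) -> None:
        self.total += call.input_tokens + call.output_tokens


class PrintStrategy:
    """Logs each call as it happens."""

    def handle(self, call: Call) -> None:
        print(f"call used {call.input_tokens + call.output_tokens} tokens")


@dataclass
class Client:
    strategies: list = field(default_factory=list)

    def structured_response(self) -> str:
        # Pretend we called an LLM; fabricate the call metadata
        call = Call(input_tokens=100, output_tokens=40)
        # Every registered strategy runs automatically,
        # so the call site stays clean
        for strategy in self.strategies:
            strategy.handle(call)
        return "result"


counter = TokenCountStrategy()
client = Client(strategies=[counter, PrintStrategy()])
client.structured_response()
client.structured_response()
print(counter.total)  # 280 after two calls of 140 tokens each
```

The call sites never mention metadata at all, which is the "clean call sites" advantage from the table, at the cost of the handling being implicit.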
YALC philosophy
YALC could be compared to other libraries like litellm, which does a similar thing but also a lot more, and has a huge codebase. The aim with YALC is to provide a minimalistic library that does one thing great and nothing else. This makes the library's code much easier to understand and review before using it. Additionally, YALC is meant to be used at the "low level" of just calling LLM APIs, without any larger frameworks, which bring a lot of abstraction and are opinionated.
Some of the ideas behind this library came from the Unix philosophy, where a program should do the most trivial things in isolation very well, and large programs are then built from these small ones into something greater. We apply this way of thinking to the LLM applications we build as well: it lets us build from the bottom up and understand the applications very well, which makes them reliable and predictable.