oumi.core.inference#
Inference module for the Oumi (Open Universal Machine Intelligence) library.
This module provides base classes for model inference in the Oumi framework.
- class oumi.core.inference.BaseInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None)[source]#
Bases: ABC
Base class for running model inference.
- apply_chat_template(conversation: Conversation, **tokenizer_kwargs) str[source]#
Applies the chat template to the conversation.
- Parameters:
conversation – The conversation to apply the chat template to.
tokenizer_kwargs – Additional keyword arguments to pass to the tokenizer.
- Returns:
The conversation with the chat template applied.
- Return type:
str
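To make the contract concrete, here is a minimal stand-in for what applying a chat template does: serialize a conversation's messages into a single prompt string. The `Message`/`Conversation` shapes and the `<|role|>` template format below are illustrative assumptions, not Oumi's actual implementation (which delegates to the model's tokenizer).

```python
# Minimal stand-in for applying a chat template to a conversation.
# The message shapes and template format are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # e.g. "user" or "assistant"
    content: str


@dataclass
class Conversation:
    messages: list[Message]


def apply_chat_template(conversation: Conversation) -> str:
    """Serialize a conversation into a single prompt string."""
    parts = [f"<|{m.role}|>\n{m.content}" for m in conversation.messages]
    # Append a generation prompt so the model continues as the assistant.
    parts.append("<|assistant|>")
    return "\n".join(parts)


convo = Conversation(messages=[Message("user", "Hello!")])
print(apply_chat_template(convo))
```

In Oumi itself, the formatting is determined by the tokenizer's configured chat template, so the exact output string depends on the model.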
- get_batch_results_partial(batch_id: str, conversations: list[Conversation]) BatchResult[source]#
Gets partial results of a completed batch job.
Engines that support batch inference should override this method.
- Parameters:
batch_id – The batch job ID.
conversations – Original conversations used to create the batch.
- Returns:
BatchResult with successful conversations and failure details.
- Raises:
NotImplementedError – If the engine does not support batch inference.
- abstractmethod get_supported_params() set[str][source]#
Returns a set of supported generation parameters for this engine.
Override this method in derived classes to specify which parameters are supported.
- Returns:
A set of supported parameter names.
- Return type:
set[str]
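The override pattern for an abstract method like this can be sketched as follows. The class and parameter names here are illustrative, not Oumi's real engine classes:

```python
# Sketch of overriding an abstract get_supported_params() in a derived
# engine class. EngineBase and MyEngine are illustrative stand-ins.
from abc import ABC, abstractmethod


class EngineBase(ABC):
    @abstractmethod
    def get_supported_params(self) -> set[str]:
        """Return the generation parameter names this engine understands."""
        ...


class MyEngine(EngineBase):
    def get_supported_params(self) -> set[str]:
        # Only the parameters listed here are honored; the framework can
        # use this set to warn about unsupported generation settings.
        return {"max_new_tokens", "temperature", "top_p"}


engine = MyEngine()
print(sorted(engine.get_supported_params()))
```

Because the method is abstract, a subclass that omits it cannot be instantiated, which keeps every concrete engine's capabilities explicit.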
- infer(input: list[Conversation] | None = None, inference_config: InferenceConfig | None = None) list[Conversation][source]#
Runs model inference.
- Parameters:
input – A list of conversations to run inference on. Optional.
inference_config – Parameters for inference. If not specified, a default config is inferred.
- Returns:
Inference output.
- Return type:
list[Conversation]
- list_models(chat_only: bool = True) list[str][source]#
Returns a list of model IDs supported by this engine.
Override this method in derived classes to query the provider’s API for available models. The default implementation returns the model name this engine was initialized with.
- Parameters:
chat_only – If True (default), only return models that support chat completions. If False, return all models.
- Returns:
A list of supported model ID strings.
- Return type:
list[str]
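The documented default behavior (return the model name the engine was initialized with) can be sketched like this; the `SketchEngine` class and its `_model_name` attribute are assumptions for illustration:

```python
# Sketch of list_models' documented default: return the model name the
# engine was constructed with. Class and attribute names are assumptions.
class SketchEngine:
    def __init__(self, model_name: str):
        self._model_name = model_name

    def list_models(self, chat_only: bool = True) -> list[str]:
        # A provider-backed engine would query the provider's API here,
        # filtering to chat-capable models when chat_only is True.
        return [self._model_name]


print(SketchEngine("my-org/my-model").list_models())
```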
- class oumi.core.inference.BatchResult(successful: list[tuple[int, Conversation]], failed_indices: list[int], error_messages: dict[int, str])[source]#
Bases: object
Result of a partial batch retrieval, separating successes from failures.
- error_messages: dict[int, str]#
Mapping of failed index to error message.
- failed_indices: list[int]#
Indices of requests that failed.
- property has_failures: bool#
Return True if any requests failed.
- successful: list[tuple[int, Conversation]]#
List of (original_index, conversation) for successful requests.
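Putting the fields together, a caller can use `has_failures` to decide whether to report errors while still consuming the successful conversations. The stand-in below mirrors `BatchResult`'s documented fields; plain strings replace `Conversation` objects purely for illustration:

```python
# Stand-in mirroring BatchResult's documented fields, showing how a
# caller might consume successes and report failures. Strings replace
# Conversation objects purely for illustration.
from dataclasses import dataclass, field


@dataclass
class BatchResult:
    successful: list[tuple[int, str]] = field(default_factory=list)
    failed_indices: list[int] = field(default_factory=list)
    error_messages: dict[int, str] = field(default_factory=dict)

    @property
    def has_failures(self) -> bool:
        return bool(self.failed_indices)


result = BatchResult(
    successful=[(0, "convo-0"), (2, "convo-2")],
    failed_indices=[1],
    error_messages={1: "rate limit exceeded"},
)
if result.has_failures:
    for i in result.failed_indices:
        print(f"request {i} failed: {result.error_messages[i]}")
```

Keeping the original index alongside each successful conversation lets the caller re-align partial results with the input list even when some requests failed.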