oumi.core.collators

Submodules

oumi.core.collators.text_collator_with_padding module

class oumi.core.collators.text_collator_with_padding.TextCollatorWithPadding(tokenizer: PreTrainedTokenizerBase, *, max_length: int | None, truncation: bool = False, label_ignore_index: int | None = None, max_variable_sized_dims: int = 1, debug: bool = False)

Bases: object

__call__(batch) → dict[str, Any]

Pads every item to the length of the longest item in the batch.

Parameters:

batch – List of batch items.

Returns:

Processed batch.

Return type:

Dict[str, torch.Tensor]
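To make the pad-to-longest behaviour concrete, here is a minimal pure-Python sketch of what such a collator does to a batch of token-ID lists. It is not the oumi implementation: the helper name pad_to_longest, right-side padding, the pad_token_id argument, copying input_ids when an item has no labels, and using -100 as the label ignore value are all assumptions made for illustration.

```python
def pad_to_longest(
    batch: list[dict[str, list[int]]],
    pad_token_id: int,
    label_ignore_index: int = -100,
) -> dict[str, list[list[int]]]:
    """Right-pad every item to the length of the longest item in the batch.

    Hypothetical helper for illustration; not part of oumi.
    """
    longest = max(len(item["input_ids"]) for item in batch)
    result: dict[str, list[list[int]]] = {
        "input_ids": [],
        "attention_mask": [],
        "labels": [],
    }
    for item in batch:
        ids = item["input_ids"]
        n_pad = longest - len(ids)
        result["input_ids"].append(ids + [pad_token_id] * n_pad)
        # 1 marks real tokens, 0 marks padding.
        result["attention_mask"].append([1] * len(ids) + [0] * n_pad)
        # Assumption: items without labels reuse their input IDs.
        labels = item.get("labels", ids)
        # Padded label positions get the ignore value so they add no loss.
        result["labels"].append(list(labels) + [label_ignore_index] * n_pad)
    return result


batch = [{"input_ids": [5, 6, 7]}, {"input_ids": [8]}]
padded = pad_to_longest(batch, pad_token_id=0)
print(padded["input_ids"])       # [[5, 6, 7], [8, 0, 0]]
print(padded["attention_mask"])  # [[1, 1, 1], [1, 0, 0]]
print(padded["labels"])          # [[5, 6, 7], [8, -100, -100]]
```

Padding only to the longest item in each batch, rather than to a fixed max_length, keeps batches of short sequences small.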

oumi.core.collators.text_completions_collator_with_padding module

class oumi.core.collators.text_completions_collator_with_padding.TextCompletionsCollatorWithPadding(tokenizer: PreTrainedTokenizerBase, response_template: str, train_target: str, instruction_template: str | None = None, debug: bool = False, end_of_turn_template: str | None = None, ignore_index: int = -100)

Bases: object

__call__(batch: list[dict[str, Any]]) → dict[str, Any]

Pads every item to the length of the longest item in the batch.

Parameters:

batch – List of batch items.

Returns:

Processed batch.

Return type:

Dict[str, torch.Tensor]

oumi.core.collators.trl_data_collator_for_completion_only_lm module

class oumi.core.collators.trl_data_collator_for_completion_only_lm.DataCollatorForCompletionOnlyLM(response_template: str | list[int], instruction_template: str | list[int] | None = None, *args, train_target: str, end_of_turn_template: str | list[int] | None = None, mlm: bool = False, ignore_index: int = -100, padding_free: bool = False, **kwargs)

Bases: DataCollatorForLanguageModeling

Data collator for completion-only training.

Masks input labels so that the loss is only computed on specific tokens (typically assistant responses), while ignoring other tokens (system prompts, user messages, padding).

The train_target parameter selects the training target:

"all_assistant_turns":

Span-based masking for multi-turn and tool-calling conversations. Masks every token, then unmasks each assistant response span bounded by response_template and end_of_turn_template, with the end-of-turn (EOT) tokens included in the unmasked span. Correctly handles interleaved tool results and parallel tool calls.

"final_assistant_turn":

Masks all tokens before the last response_template occurrence. Only the final assistant response is trained on. Suitable for single-turn completions.
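The two masking modes can be sketched on plain token-ID lists. This is an illustrative approximation, not the collator's code: the helper names, the toy token IDs (2 for the response template, 9 for end of turn), masking the template tokens themselves, and training on the rest of the sequence when no end-of-turn marker follows a response are all assumptions.

```python
IGNORE_INDEX = -100  # matches the documented ignore_index default


def find_subsequence(seq: list[int], pattern: list[int], start: int = 0) -> int:
    """Return the first index >= start where pattern occurs in seq, or -1."""
    for i in range(start, len(seq) - len(pattern) + 1):
        if seq[i : i + len(pattern)] == pattern:
            return i
    return -1


def mask_final_assistant_turn(
    input_ids: list[int], response_ids: list[int], ignore_index: int = IGNORE_INDEX
) -> list[int]:
    """Mask every token up to and including the last response template."""
    labels = [ignore_index] * len(input_ids)
    last = -1
    pos = find_subsequence(input_ids, response_ids)
    while pos != -1:
        last = pos
        pos = find_subsequence(input_ids, response_ids, pos + 1)
    if last != -1:
        begin = last + len(response_ids)
        labels[begin:] = input_ids[begin:]
    return labels


def mask_all_assistant_turns(
    input_ids: list[int],
    response_ids: list[int],
    eot_ids: list[int],
    ignore_index: int = IGNORE_INDEX,
) -> list[int]:
    """Mask everything, then unmask each response span through its EOT marker."""
    labels = [ignore_index] * len(input_ids)
    pos = find_subsequence(input_ids, response_ids)
    while pos != -1:
        begin = pos + len(response_ids)
        end = find_subsequence(input_ids, eot_ids, begin)
        # Assumption: with no EOT after a response, train on the rest.
        stop = len(input_ids) if end == -1 else end + len(eot_ids)
        labels[begin:stop] = input_ids[begin:stop]
        pos = find_subsequence(input_ids, response_ids, stop)
    return labels


# Toy IDs: 1 = user template, 2 = response template, 9 = end of turn.
ids = [1, 10, 11, 9, 2, 20, 21, 9, 1, 12, 9, 2, 22, 9]
print(mask_all_assistant_turns(ids, [2], [9]))
print(mask_final_assistant_turn(ids, [2]))
```

In this toy conversation, the all_assistant_turns sketch keeps both assistant replies (each with its EOT token), while the final_assistant_turn sketch keeps only the last reply.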

Parameters:
  • response_template – String or token IDs marking the start of an assistant response. Required for all modes.

  • instruction_template – String or token IDs marking the start of a user instruction. Legacy; only used with the instruction+response fallback path.

  • train_target – One of "all_assistant_turns", "final_assistant_turn", "_legacy_instruction_response". Resolved by the builder before construction.

  • end_of_turn_template – String or token IDs marking the end of a conversational turn. Required for all_assistant_turns mode.

  • mlm – Whether to use masked language modeling. Default False.

  • ignore_index – Label value for masked tokens. Default -100.

  • padding_free – If True, removes padding and adds position_ids. Default False.
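With padding_free enabled, the batch is no longer a padded rectangle. A minimal sketch of the idea, assuming a hypothetical helper pack_padding_free that works on plain lists: sequences are concatenated into a single row, and position_ids restart at 0 at each sequence boundary so each packed example keeps its own positions. The real collator works on tensors; labels would need the same flattening, which this sketch omits.

```python
def pack_padding_free(sequences: list[list[int]]) -> dict[str, list[list[int]]]:
    """Concatenate sequences into one row with restarting position_ids.

    Hypothetical helper for illustration; not part of oumi.
    """
    input_ids: list[int] = []
    position_ids: list[int] = []
    for seq in sequences:
        input_ids.extend(seq)
        # Positions restart at 0 for every original sequence.
        position_ids.extend(range(len(seq)))
    # Shape [1, total_tokens]: one packed row with no pad tokens.
    return {"input_ids": [input_ids], "position_ids": [position_ids]}


packed = pack_padding_free([[5, 6, 7], [8, 9]])
print(packed["input_ids"])     # [[5, 6, 7, 8, 9]]
print(packed["position_ids"])  # [[0, 1, 2, 0, 1]]
```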

torch_call(examples: list[list[int] | Any | dict[str, Any]]) → dict[str, Any]

Collates a list of examples into a batch.

oumi.core.collators.vision_language_collator_with_padding module

class oumi.core.collators.vision_language_collator_with_padding.VisionLanguageCollatorWithPadding(tokenizer: PreTrainedTokenizerBase, *, max_length: int | None, truncation: bool = False, label_ignore_index: int | None = None, allow_multi_image_inputs: bool = True, main_image_feature: str = 'pixel_values', debug: bool = False)

Bases: object

__call__(batch) → dict[str, Any]

Custom collator for multi-modal vision-language training.

Parameters:

batch – List of batch items.

Returns:

Processed batch.

Return type:

Dict[str, torch.Tensor]

collate_images(images) → Tensor

Collate images for multi-modal training.

Parameters:

images – List of images to collate.

Returns:

Batch of processed images.

Return type:

torch.Tensor

oumi.core.collators.vision_language_sft_collator module

Vision-Language SFT collator for conversation-based multimodal training.

This module provides a collator specifically designed for supervised fine-tuning (SFT) of vision-language models using conversation data.

Unlike VisionLanguageCollatorWithPadding, which expects pre-processed features, this collator works with raw conversation objects and handles the complete feature generation pipeline.

Example

>>> from oumi.builders import build_tokenizer
>>> from oumi.core.collators.vision_language_sft_collator import (
...     VisionLanguageSftCollator,
... )
>>> from oumi.core.configs import ModelParams
>>> tokenizer = build_tokenizer(ModelParams(model_name="llava-hf/llava-1.5-7b-hf"))
>>> collator = VisionLanguageSftCollator(
...     tokenizer=tokenizer,
...     processor_name="llava-hf/llava-1.5-7b-hf",
...     max_length=512,
...     truncation=True,
... )
>>> # Each batch item needs a conversation_json field;
>>> # conversation1 is an existing Conversation object.
>>> batch = collator([{"conversation_json": conversation1.to_json()}])
class oumi.core.collators.vision_language_sft_collator.VisionLanguageSftCollator(tokenizer: PreTrainedTokenizerBase, processor_name: str, *, processor_kwargs: dict[str, Any] | None = None, max_length: int | None = None, truncation: bool = False, truncation_side: str = 'right', label_ignore_index: int | None = None, allow_multi_image_inputs: bool = True, trust_remote_code: bool = False, train_on_completions_only: bool = False, response_template: str | None = None, instruction_template: str | None = None, process_individually: bool = False)

Bases: object

Collator for vision-language SFT that processes conversation data.

This collator is designed for supervised fine-tuning of vision-language models where training data comes in the form of conversations containing both text and images. It handles the complete pipeline from raw conversations to model-ready tensor batches.

Key Features:

  • Processes Conversation objects containing text and image data

  • Uses model-specific processors to extract image features

  • Handles tokenization and feature generation in one step

  • Supports various vision-language architectures

  • Manages padding, truncation, and label masking

The collator expects batch items with a “conversation_json” field containing serialized Conversation objects. These conversations can include:

  • Multiple turns of dialogue

  • Image references (paths, URLs, or base64 data)

  • System prompts and user/assistant messages

__call__(batch) → dict[str, Any]

Process a batch of conversation data into model-ready features.

This method converts serialized conversations into the tensor format expected by vision-language models. It handles the complete pipeline:

  1. Deserializes the conversation JSON strings.

  2. Passes the conversations to the feature generator.

  3. Returns batched tensors ready for training.

Parameters:

batch – List of dictionaries, where each dictionary must contain a “conversation_json” field with a serialized Conversation object.

Expected format:

[
    {"conversation_json": '{"messages": [...], "images": [...]}'},
    {"conversation_json": '{"messages": [...], "images": [...]}'},
    ...
]

The conversation JSON should include:

  • messages: List of message dictionaries with role and content

  • images: Optional list of image data (paths, URLs, or base64)

Returns:

  • “input_ids”: Token IDs including image placeholders

  • “attention_mask”: Attention masks for the input

  • “labels”: Target labels with appropriate masking

  • “pixel_values” or model-specific image features

  • Additional model-specific features (cross_attention_mask, etc.)

The exact keys depend on the model architecture and processor used.

Return type:

Dictionary containing all features needed for model training

Raises:

ValueError – If the batch is empty or any item lacks a “conversation_json” field.
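The input contract for this method (a non-empty batch whose items each carry a conversation_json string) can be checked before any feature generation runs. The sketch below is hypothetical: deserialize_batch is not an oumi function, and it parses with json.loads into plain dictionaries instead of building Conversation objects or calling the feature generator.

```python
import json
from typing import Any


def deserialize_batch(batch: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Validate the batch and parse each conversation_json string.

    Hypothetical helper for illustration; not part of oumi.
    """
    if not batch:
        raise ValueError("Batch is empty.")
    conversations = []
    for i, item in enumerate(batch):
        if "conversation_json" not in item:
            raise ValueError(f"Batch item {i} lacks a 'conversation_json' field.")
        # Plain json.loads stands in for deserializing into a Conversation.
        conversations.append(json.loads(item["conversation_json"]))
    return conversations


item = {
    "conversation_json": json.dumps(
        {"messages": [{"role": "user", "content": "What's in this image?"}]}
    )
}
convs = deserialize_batch([item])
print(convs[0]["messages"][0]["role"])  # user
```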

Example:

>>> conversation = Conversation(messages=[
...     {"role": "user", "content": "What's in this image?"},
...     {"role": "assistant", "content": "I see a cat."}
... ], images=["path/to/image.jpg"])
>>> batch_item = {"conversation_json": conversation.to_json()}
>>> features = collator([batch_item])
>>> print(features.keys())
dict_keys(['input_ids', 'attention_mask', 'labels', 'pixel_values'])