April 24, 2026

Case Study: DMG achieves 6% higher quality and 100x lower costs for invoice validation

With Oumi’s technology, DMG’s custom AI beat GPT5.2 by 6% on both validity and appropriateness at 100x lower cost

By Stefan Webb

Problem

Divisions Maintenance Group (DMG) coordinates facility maintenance across thousands of properties, managing a shared pool of contractors (plumbers, electricians, and handymen) who submit invoices for reimbursement after completing jobs.

Each invoice must be validated on two dimensions: validity (correct formatting and categorization) and appropriateness (reasonable charges given the job scope, profession, and pricing norms).

At scale, manual review was labor-intensive, slow, and expensive, and DMG’s existing automated approaches were underperforming: validity accuracy sat at just 72%, and there was no reliable baseline for appropriateness at all.

Solution

DMG partnered with Oumi to fine-tune an ultra-small language model (Qwen3-0.6B) purpose-built for invoice classification. The small footprint was critical: DMG’s volume demands and interest in edge deployment ruled out large proprietary models for production use.

“Every job we handle is bespoke — even the same HVAC unit breaking down twice runs differently. I’m convinced our future is to have our own fine-tuned models. The results have only gotten better.”
— Kumar Srinivasan, Chief Product Officer

The core challenge was data. DMG provided ~10k unlabeled invoice examples covering only plumbing, but the model needed to generalize across plumbing, electrical, and handyman domains.

Oumi built a synthesis recipe to generate labeled training data, including entirely new examples for the missing electrical and handyman domains. To increase sample diversity, each synthesis prompt included randomly sampled real invoices as few-shot formatting examples.
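To make the few-shot idea concrete, here is a minimal sketch of how one synthesis prompt might be assembled. It is not Oumi's actual recipe: the function name `build_synthesis_prompt`, the prompt wording, the label names, and the default of three examples are all assumptions made for illustration.

```python
import random

def build_synthesis_prompt(real_invoices, domain, k=3, rng=None):
    """Assemble one synthesis prompt (hypothetical wording) that shows
    k randomly sampled real invoices as formatting examples."""
    if rng is None:
        rng = random.Random()
    # Drawing a fresh random subset for every prompt varies the context
    # the generator sees, which is the source of the added diversity.
    shots = rng.sample(real_invoices, k)
    parts = [
        f"Write one new {domain} contractor invoice and label it.",
        "Copy only the layout of the samples below, not their content.",
    ]
    for i, invoice in enumerate(shots, start=1):
        parts.append(f"Example {i}:\n{invoice}")
    # Assumed label names, mirroring the two dimensions in the case study.
    parts.append("Output labels for: validity, appropriateness.")
    return "\n\n".join(parts)

# Placeholder strings stand in for real invoice text.
pool = [f"PLACEHOLDER-INVOICE-{n:02d}" for n in range(20)]
print(build_synthesis_prompt(pool, "electrical", k=3, rng=random.Random(0)))
```

Passing a seeded `random.Random` keeps a run reproducible, while leaving `rng` unset gives a different set of examples on every call.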

Outcome

The small, specialized 0.6B model fine-tuned with Oumi’s technology delivered dramatic improvements on both tasks over the same model before fine-tuning: validity rose from 72% to 99%, and appropriateness rose from 52% to 91%.

Importantly, it also exceeded the large, general-purpose frontier model GPT5.2 by 6% on both metrics (see table above). For context, Kimi K2 (a much larger model) achieved only marginally higher agreement than GPT5.2, at 95% and 87%, still below the fine-tuned model on both metrics.
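The case study does not spell out how these scores were computed. One simple reading, assumed here, is the share of invoices on which a model's label matches the reference label, scored separately for each dimension. The sketch below uses made-up toy labels, not DMG data, and the helper name `agreement` is hypothetical.

```python
def agreement(predictions, references, field):
    """Fraction of invoices whose predicted label matches the reference
    label for one dimension (assumed definition of the metric)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must be the same length")
    if not references:
        raise ValueError("need at least one labeled invoice")
    matches = sum(p[field] == r[field] for p, r in zip(predictions, references))
    return matches / len(references)

# Toy labels for illustration only.
references = [
    {"validity": "valid", "appropriateness": "appropriate"},
    {"validity": "invalid", "appropriateness": "appropriate"},
    {"validity": "valid", "appropriateness": "inappropriate"},
    {"validity": "valid", "appropriateness": "appropriate"},
]
predictions = [
    {"validity": "valid", "appropriateness": "appropriate"},
    {"validity": "invalid", "appropriateness": "inappropriate"},
    {"validity": "valid", "appropriateness": "inappropriate"},
    {"validity": "valid", "appropriateness": "appropriate"},
]

for field in ("validity", "appropriateness"):
    print(f"{field}: {agreement(predictions, references, field):.0%}")
```

Scoring each dimension on its own keeps a model that formats invoices well but misjudges pricing from hiding behind a single blended number.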

The bottom line: A sub-1B fine-tuned model exceeded frontier-scale performance on this task, enabling DMG to move toward fully automated invoice processing with high reliability.

What’s next

With Oumi’s technology, DMG was able to build a small custom AI model that solved a pressing business need. DMG’s, however, is just one use case. Many other enterprises are discovering the benefits of custom AI models and the ease and economy of model development that the Oumi Platform makes possible.

Why not try it out today and see for yourself? You only need to come with your prompt and the Oumi Agent will take it from there!