Why general-purpose LLMs beat fine-tuned models for UK SMEs

General-purpose large language models are AI systems trained on broad data that can perform a wide range of language tasks without being customised for a specific domain. For UK SMEs considering AI adoption in 2026, choosing between a general-purpose model and a fine-tuned specialist model is one of the first decisions they face. This post explains why, for the vast majority of businesses under £8M turnover, the general-purpose model wins.

What fine-tuning actually involves

Fine-tuning is the process of taking a pre-trained model and training it further on a domain-specific dataset. The result is a model that performs better on tasks similar to its training data, at the cost of reduced performance on everything else.

The process requires a clean, labelled dataset of sufficient size. For most SME use cases, collecting and cleaning that dataset takes 40 to 80 hours of skilled work before any model training begins. The training itself requires either GPU infrastructure or a managed fine-tuning service, both of which carry meaningful ongoing costs.
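To make the dataset requirement concrete, here is a sketch of what a single labelled training example looks like in the chat-style JSONL format used by managed fine-tuning services such as OpenAI's. The business name, invoice reference, and query content are invented for illustration; a usable dataset needs hundreds to thousands of examples like this, each cleaned and checked.

```python
import json

# One labelled example in the chat-style JSONL format used by managed
# fine-tuning services (e.g. OpenAI's). All content below is invented
# for illustration.
example = {
    "messages": [
        {"role": "system",
         "content": "You answer invoice queries for Acme Logistics."},
        {"role": "user",
         "content": "Why was invoice INV-2041 charged at the old rate?"},
        {"role": "assistant",
         "content": "INV-2041 was raised before the April rate change, "
                    "so the previous tariff applies."},
    ]
}

# Each example becomes one line of the training file.
jsonl_line = json.dumps(example)
```

Multiply the effort of writing, checking, and anonymising one of these by the size of a real dataset and the 40-to-80-hour estimate above becomes easy to believe.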

UK SMEs lose an average of 33 hours per month to internal administration, according to the UK Business Growth Service (2025). The appeal of a model tuned to your specific admin workflows is obvious. The cost of achieving it is less obvious until you attempt it.

The capability gap has effectively closed

In 2022, there was a meaningful performance gap between general-purpose models and fine-tuned specialists on narrow tasks. A customer service model fine-tuned on 10,000 support tickets genuinely outperformed GPT-3 on that specific task.

By 2025, that gap had largely closed. GPT-4o achieves 87% accuracy on professional-grade document tasks without any domain-specific fine-tuning, according to OpenAI’s 2024 capability benchmarks. Claude 3.5 Sonnet performs at equivalent levels on structured business document tasks. For the kinds of admin automation that generate measurable ROI for an SME, including document review, email drafting, report generation, and FAQ handling, the general-purpose model is already good enough.

“Good enough” is not a concession. It is the correct standard. An automation that handles 87% of your invoice queries accurately and routes the remaining 13% to a human for review saves 87% of your invoice-query handling time. That is the result that matters.

For UK SMEs, the question is not which model is most accurate. It is which model recovers the most hours at the most justifiable cost. In 2026, that is almost always a general-purpose model with well-designed prompting.

The real costs of fine-tuning

The upfront cost of fine-tuning a model for a specific SME use case ranges from £5,000 to £25,000, depending on the quality of the training data, the complexity of the task, and the model being used. That estimate comes from observed project costs across engagements in 2024 and 2025.

That is the cost of the initial build. It does not include:

  • Ongoing retraining. As your business changes, your fine-tuned model goes stale. The prices you quoted in 2024, the processes you followed in 2023, and the emails written by staff who have since left are all baked into its training data. A model that is not retrained drifts toward incorrect outputs, and each retraining cycle costs money and time.
  • Maintenance overhead. Fine-tuned models require monitoring, evaluation, and regular quality checks. This is skilled work. For a business without an internal AI engineer, it means either paying for external support or accepting degrading performance.
  • Opportunity cost. The time and money spent on fine-tuning a model for one specific use case is time and money not spent on building additional automations that could recover more hours.

73% of enterprise AI projects involving fine-tuning fail to reach production within 18 months, according to Gartner’s 2025 AI adoption survey. The failure rate for SME fine-tuning projects is likely higher, because SMEs have less internal capacity to manage the process.

When fine-tuning is actually justified

Fine-tuning makes sense in three specific circumstances:

  1. You have a specialised domain that general models perform poorly on. Clinical coding, legal document classification, and highly technical engineering documentation are examples where the gap remains meaningful.
  2. You are operating at scale that justifies the economics. A business processing 10,000 documents per month has different economics than one processing 500.
  3. You have the internal capacity to maintain the model. Without an internal ML engineer or a managed service, fine-tuning creates a maintenance liability.

For a UK SME with £1M to £8M turnover, the most common admin automation use cases, including document summarisation, email drafting, FAQ responses, and data extraction from unstructured text, do not meet these criteria. They are well within the capability of the current generation of general-purpose models.

What good prompting achieves that fine-tuning rarely beats

A well-designed prompt, combined with a general-purpose model, can achieve performance on par with a fine-tuned model on most SME tasks. The technique is called prompt engineering, and while it sounds technical, the core principle is simple: give the model the context it needs to perform the specific task you need.

For a logistics business that needs to extract delivery exception information from driver messages, a general-purpose model with a prompt that defines the format, the terminology, and the expected output performs within five percentage points of a fine-tuned model in controlled tests. That five-point difference costs £15,000 to close via fine-tuning. It is almost never worth it.
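As an illustration of what "defines the format, the terminology, and the expected output" can mean in practice, here is a minimal sketch of such a prompt for the delivery-exception case. The exception types, JSON keys, and sample message are invented for the example, not a real schema.

```python
# A sketch of a task-specific extraction prompt. The exception types,
# JSON keys, and sample driver message are illustrative inventions.

def build_exception_prompt(driver_message: str) -> str:
    """Build a prompt that fixes the format, terminology, and output shape."""
    return (
        "You extract delivery exception details from driver messages.\n"
        "Use these exception types only: FAILED_DELIVERY, DELAY, "
        "DAMAGE, WRONG_ADDRESS.\n"
        "Respond with JSON using exactly these keys:\n"
        '  {"exception_type": ..., "consignment_ref": ..., "summary": ...}\n'
        "If a value is not present in the message, use null.\n\n"
        f"Driver message:\n{driver_message}"
    )

prompt = build_exception_prompt(
    "Can't deliver CN-4471, gate locked, customer not answering."
)
```

Everything the fine-tuned model would learn implicitly from training data, the prompt states explicitly, and it can be revised in minutes when the terminology changes.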

The practical approach for most SMEs is to start with a general-purpose model, invest in prompt design, and measure the output quality against a defined threshold. If performance is insufficient after proper prompt engineering, that is the point at which fine-tuning becomes worth evaluating.
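Measuring output quality against a defined threshold can be as simple as scoring the model's answers on a small hand-labelled sample. A minimal sketch, with invented labels and an illustrative 85% acceptance bar:

```python
# A minimal sketch of threshold-based evaluation: score model outputs
# against a hand-labelled sample before deciding whether prompting alone
# is good enough. The labels and 0.85 threshold are illustrative.

def accuracy(predictions, labels):
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

THRESHOLD = 0.85  # your acceptance bar, agreed before testing begins

predictions = ["DELAY", "DAMAGE", "DELAY", "FAILED_DELIVERY", "DELAY"]
labels      = ["DELAY", "DAMAGE", "WRONG_ADDRESS", "FAILED_DELIVERY", "DELAY"]

score = accuracy(predictions, labels)
meets_bar = score >= THRESHOLD  # here 0.8 < 0.85, so not yet good enough
```

The important discipline is fixing the threshold before you test, so the decision to invest in fine-tuning is made against a standard rather than an impression.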

The compounding advantage of starting general

General-purpose models update regularly. GPT-4o and Claude receive performance improvements from their providers on a continuous basis. When you use a general-purpose model, you benefit from those improvements automatically. Your fine-tuned model does not improve unless you retrain it.

Over a two-year period, a business using a general-purpose model will typically be running a meaningfully more capable system than a business that fine-tuned a model in year one and has not retrained it. The maintenance cost of keeping a fine-tuned model current typically exceeds the performance advantage it provides.


Frequently asked questions

What is the difference between prompt engineering and fine-tuning? Prompt engineering is the practice of designing the input you give to a general-purpose model to improve its output on a specific task. It requires no additional model training and can be updated instantly. Fine-tuning involves additional model training on domain-specific data, which changes the model’s behaviour at a structural level. For most SME use cases, prompt engineering achieves equivalent results at a fraction of the cost.

How do I know if my use case actually needs a specialised model? Test the general-purpose model first with a well-designed prompt. Measure the output quality against a defined threshold. If it meets the threshold, fine-tuning is not justified. If it falls short after proper prompt engineering, fine-tuning becomes worth evaluating. In practice, fewer than one in five SME automation use cases require fine-tuning after proper prompt design.

What does a general-purpose LLM cost to run for a typical SME use case? API costs for GPT-4o and Claude 3.5 Sonnet are roughly £0.01 to £0.05 per 1,000 tokens processed. For an SME processing 200 documents per week at an average of 2,000 tokens each, that is roughly 1.7 million tokens per month, for a monthly API cost of approximately £15 to £90 depending on the model tier. This is typically a small fraction of the labour cost it replaces.
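The monthly figure follows from straightforward arithmetic. A quick sketch using the per-1,000-token rates quoted above (the 4.33 weeks-per-month factor is an assumption):

```python
# Back-of-envelope monthly API cost for the workload described above.
# Rates are the rough per-1,000-token figures quoted in the text;
# 4.33 weeks per month is an assumed averaging factor.

DOCS_PER_WEEK = 200
TOKENS_PER_DOC = 2_000
WEEKS_PER_MONTH = 4.33

RATE_LOW = 0.01   # £ per 1,000 tokens, lower bound
RATE_HIGH = 0.05  # £ per 1,000 tokens, upper bound

tokens_per_month = DOCS_PER_WEEK * TOKENS_PER_DOC * WEEKS_PER_MONTH
cost_low = tokens_per_month / 1_000 * RATE_LOW
cost_high = tokens_per_month / 1_000 * RATE_HIGH
# About 1.73M tokens/month, i.e. roughly £17 to £87.
```

Substituting your own document volumes and token sizes gives a first-pass budget before any pilot begins.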

Can I switch from a general-purpose model to a fine-tuned one later? Yes. Starting with a general-purpose model does not prevent you from fine-tuning later. Starting with fine-tuning locks you into a specific model architecture and maintenance obligation from day one. The general-purpose approach is the lower-risk starting point in almost every scenario.

Is there a quality difference between different general-purpose models? Yes, meaningfully so. GPT-4o and Claude 3.5 Sonnet are the current leading general-purpose models for business document tasks. GPT-3.5 Turbo and older models perform noticeably worse on complex tasks. The choice of model matters more than the choice between general and fine-tuned for most SME use cases.

What if my business operates in a highly regulated sector? Regulated sectors, including healthcare, financial services, and legal, add compliance requirements around data handling and model explainability. These requirements affect how you use any model, general or fine-tuned. The model type decision is separate from the compliance question. Most regulated SMEs can use general-purpose models with appropriate data processing arrangements.

Find out what this means for your business

The Operations Review takes 20 minutes. You leave with a clear, specific picture of your automation opportunities and what each one would cost to act on.

Book the free review
