Case Study — First Wealth

Evidence-based AI model selection for First Wealth | Certified B Corp

First Wealth, an independent UK financial planning firm, needed to know which AI models to trust across their advice and operations workflows and how to run them in a way that stood up to scrutiny on data protection. I tested ten models against eight of their workflows, scored every output and wired the chosen platforms straight into their existing n8n automation setup, coordinating with First Wealth's ops team to ensure GDPR compliance.

10

AI models tested

4

Hosting platforms

8

Live workflows

~45pp

Evaluation report

“Phil took our AI research to the next level. Practical advice on which models to use, security, costs and implementation. Superb.”

Anthony Villis

First Wealth

The brief

Which models, accessed how — and how to keep client data safe?

First Wealth wanted to put AI to work across their advice and operations: drafting pension transfer suitability reports, producing annual review reports and handling a handful of other document-heavy workflows. The question wasn't whether AI could help. It was which models to use, and how to access them in a way that was appropriate for an FCA-regulated business.

They wanted the result wired into their existing n8n automation environment, with API credentials in place and validated, so the recommended setup was ready to run the day the work finished.

At a glance

Client

First Wealth | Certified B Corp

Sector

UK financial services (FCA-regulated)

Engagement

AI model selection + GDPR-compliant deployment

Delivered

~45-page evaluation report + live n8n integration

What I did

Two parts: decide what to use, then make it run

Part 01

AI Model Selection Report

I designed and ran a structured evaluation of ten AI models across four hosting platforms, tested against eight of First Wealth's main workflows, from pension transfer suitability reports to annual review documents.

Every model was run against a custom-built fictional scenario rather than real client data (a deliberate decision for both data protection and test fairness), scored on a multi-dimension quality rubric plus workflow-specific criteria, and reviewed by a human before anything made it into a recommendation.

The report covered

Workflow-by-workflow performance tables and scoring

Cost modelling — cost per run, token counts and latency, captured from provider APIs

A multi-tier data residency framework

A phased implementation roadmap

~45 pages, produced as a branded document

Part 02

n8n handover & API credential setup

Once the report was signed off by First Wealth's compliance team, I worked alongside their technical lead to implement the recommended platform connections inside their existing n8n automation environment.

Each workflow was connected to the correct model and platform per the report's recommendations, API credentials were configured, and the whole setup was validated end-to-end before handover, so nothing was left as theory.

The handover included

Recommended platform connections built into the existing n8n setup

Each workflow wired to its correct model and platform

API credentials configured for every connection

End-to-end validation before handover

Worked directly with First Wealth's technical lead

How it was proven

Tested. Scored. Evidenced.

The point of difference here isn't an opinion about which AI is best. It's that every recommendation is backed by a test, with the working shown.

Tested on real workflows

Ten models across four hosting platforms, run against eight of First Wealth's actual workflows — not generic demos. Every test used a custom-built fictional scenario rather than live client data, for both data protection and test fairness.

Scored on a rubric

Each output was scored against a multi-dimension quality rubric plus workflow-specific criteria. For long-form outputs such as the pension transfer suitability report, I used an LLM-as-judge technique, with every flagged output reviewed by a human.

Measured from the source

Cost per run, token counts and latency were captured directly from the provider APIs — not estimated. So the cost modelling and performance comparisons reflect what these models actually do, not what a brochure claims.

The key finding

Deployment — not model quality — was the deciding factor

The most consequential finding was about how a model is accessed, not how good it is. Reach a frontier model through a "Global Standard" deployment and requests can be processed in any region worldwide — which brings them within reach of the US CLOUD Act. Reach the same model through an EU Data Zone deployment and processing stays in the EU. Same model, very different data residency.

For a regulated firm processing client data, that distinction is decisive: strong output quality doesn't make up for losing control of where processing happens. Several models that scored well on quality were ruled out in their default configuration for exactly this reason. The finding reoriented the whole report: all eight primary recommendations moved to EU-sovereign, open-weight models, with backup models retained where appropriate.

8/8

primary recommendations re-pointed to EU-sovereign, open-weight models

The compliance picture

Data residency, in plain English

The hard part of AI for a regulated firm usually isn't the technology — it's knowing what the data protection picture actually is. What does "EU-hosted" really mean when the provider is a US-parented company subject to the US CLOUD Act? What does the ICO's January 2026 guidance on processing location mean in practice for an advice firm?

The report mapped the regulatory landscape in plain English and turned it into a multi-tier data residency framework — something concrete they could act on without needing to be AI specialists themselves.

What the framework clarified

✓ What "EU-hosted" does and doesn't guarantee for a US-parented provider
✓ Where the US CLOUD Act changes the risk picture
✓ How the ICO's January 2026 guidance on processing location applies in practice
✓ A tiered view of which workflows need which level of data residency

The outcome

What First Wealth walked away with

✓

A decision they can defend

A structured, evidenced basis for every model choice, built on testing, scoring and comparison rather than guesswork. Each of the eight workflows has a recommendation with the working shown behind it.

✓

The right model per workflow

The testing showed clearly that the best model varies by workflow. Picking one model and applying it everywhere — the obvious temptation — would have left quality and cost on the table.

✓

A safe footing on client data

Their existing AI subscription was convenient, but it was no longer a safe default once data residency was properly understood. The recommended setup is built around where client data is actually processed.

✓

A compliance map they can use

A plain-English framework covering data residency, the US CLOUD Act and the ICO's current position — ready to act on and explain to clients and regulators.

✓

A working setup, not a slide deck

The recommended platforms wired into their existing n8n environment, credentials configured and validated end-to-end — ready to run, not just to read about.

✓

Cost and performance on real numbers

Cost per run, token counts and latency were captured straight from the provider APIs — so First Wealth can budget, compare and choose on evidence rather than vendor estimates.

Working on something similar?

Want this kind of clarity for your firm?

If you're weighing up which AI models to use — and how to run them in a way that holds up for a regulated business — let's talk. Book a no-obligation discovery call and I'll tell you honestly whether a structured evaluation is the right next step.

Book a discovery call

This is the Tech Stack & Tool Evaluation service in practice.