“Phil took our AI research to the next level. Practical advice on which models to use, security, costs and implementation. Superb.”
First Wealth wanted to put AI to work across their advice and operations - drafting pension transfer suitability reports, producing annual review reports and a handful of other document-heavy workflows. The question wasn't whether AI could help. It was which models to use, and how to access them in a way that was appropriate for an FCA-regulated business.
They wanted the result wired into their existing n8n automation environment, with API credentials in place and validated, so the recommended setup was ready to run the day the work finished.
I designed and ran a structured evaluation of ten AI models across four hosting platforms, tested against eight of First Wealth's main workflows - from pension transfer suitability reports to annual review documents.
Every model was run against a custom-built fictional scenario rather than real client data (a deliberate decision for both data protection and test fairness). Outputs were scored on a multi-dimension quality rubric plus workflow-specific criteria, and reviewed by a human before anything made it into a recommendation.
Once the report was signed off by First Wealth's compliance team, I worked alongside their technical lead to implement the recommended platform connections inside their existing n8n automation environment.
Each workflow was connected to the correct model and platform per the report's recommendations, the API credentials were configured and the whole setup was validated end-to-end before handover - so nothing was left as a theory.
The point of difference here isn't an opinion about which AI is best. It's that every recommendation is backed by a test you can see the working for.
Ten models across four hosting platforms, run against eight of First Wealth's actual workflows — not generic demos. Every test used a custom-built fictional scenario rather than live client data, for both data protection and test fairness.
Each output was scored against a multi-dimension quality rubric plus workflow-specific criteria. For long-form outputs like the pension transfer report I used an LLM-as-judge technique, with every flagged output reviewed by a human.
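To make the scoring idea concrete, here is a minimal sketch of how a weighted rubric score and a human-review flag might be combined. The dimension names, weights, 1-5 scale and review threshold are all illustrative assumptions, not the rubric used in the First Wealth report.

```python
from dataclasses import dataclass

# Hypothetical rubric dimensions and weights (assumed for illustration only).
RUBRIC_WEIGHTS = {"accuracy": 0.4, "completeness": 0.3, "clarity": 0.2, "tone": 0.1}
REVIEW_THRESHOLD = 3.5  # assumed cut-off on a 1-5 scale
MIN_DIMENSION_SCORE = 2  # assumed: any single dimension this low triggers review


@dataclass
class ScoredOutput:
    model: str
    workflow: str
    scores: dict  # dimension name -> score from 1 to 5 (e.g. from an LLM judge)

    def weighted_score(self) -> float:
        # Weighted average across rubric dimensions; weights sum to 1.
        return sum(RUBRIC_WEIGHTS[d] * self.scores[d] for d in RUBRIC_WEIGHTS)

    def needs_human_review(self) -> bool:
        # Flag weak overall scores, or any single dimension that is very low.
        lowest = min(self.scores[d] for d in RUBRIC_WEIGHTS)
        return self.weighted_score() < REVIEW_THRESHOLD or lowest <= MIN_DIMENSION_SCORE


# Example with placeholder names and made-up scores.
example = ScoredOutput(
    model="model-a",
    workflow="pension_transfer_report",
    scores={"accuracy": 4, "completeness": 5, "clarity": 4, "tone": 3},
)
```

In a setup like this, the judge model only proposes scores; anything flagged goes to a person before it influences a recommendation.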
Cost per run, token counts and latency were captured directly from the provider APIs — not estimated. So the cost modelling and performance comparisons reflect what these models actually do, not what a brochure claims.
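As a rough sketch of what capturing these figures can look like, the snippet below times a call and turns reported token counts into a cost per run. The `RunMetrics` type, the `timed_call` helper and the per-million-token prices are hypothetical; real provider responses and pricing differ by platform.

```python
import time
from dataclasses import dataclass


@dataclass
class RunMetrics:
    input_tokens: int   # as reported in the provider's usage data
    output_tokens: int  # as reported in the provider's usage data
    latency_s: float    # wall-clock time measured around the call


def cost_per_run(m: RunMetrics, input_price_per_m: float, output_price_per_m: float) -> float:
    # Prices are expressed per million tokens (assumed convention).
    total = m.input_tokens * input_price_per_m + m.output_tokens * output_price_per_m
    return total / 1_000_000


def timed_call(call):
    # Wrap any zero-argument callable and return (response, seconds elapsed).
    start = time.perf_counter()
    response = call()
    return response, time.perf_counter() - start


# Example with made-up token counts and prices.
metrics = RunMetrics(input_tokens=12_000, output_tokens=3_000, latency_s=0.0)
example_cost = cost_per_run(metrics, input_price_per_m=3.0, output_price_per_m=15.0)
```

Recording measured usage per run, rather than multiplying list prices by guessed document lengths, is what lets the comparison reflect real behaviour.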
The hard part of AI for a regulated firm usually isn't the technology — it's knowing what the data protection picture actually is. What does "EU-hosted" really mean when the provider is a US-parented company subject to the US CLOUD Act? What does the ICO's January 2026 guidance on processing location mean in practice for an advice firm?
The report mapped the regulatory landscape in plain English and turned it into a multi-tier data residency framework — something concrete they could act on without needing to be AI specialists themselves.
A structured, evidenced basis for every model choice — testing, scoring and comparison — instead of being asked to guess. Each of the eight workflows has a recommendation with the working shown behind it.
The testing showed clearly that the best model varies by workflow. Picking one model and applying it everywhere — the obvious temptation — would have left quality and cost on the table.
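A per-workflow choice can be sketched as a simple selection over test results: highest quality score wins, with lower cost as the tie-breaker. The field names, model names and figures below are placeholders, not results from the evaluation.

```python
def best_model_per_workflow(results):
    """Pick one model per workflow: highest score, then lowest cost on ties."""
    best = {}
    for r in results:
        current = best.get(r["workflow"])
        # Compare on (score, -cost) so a cheaper model wins when scores tie.
        if current is None or (r["score"], -r["cost"]) > (current["score"], -current["cost"]):
            best[r["workflow"]] = r
    return {workflow: r["model"] for workflow, r in best.items()}


# Placeholder results (illustrative only).
sample_results = [
    {"workflow": "pension_transfer_report", "model": "model-a", "score": 4.4, "cost": 0.09},
    {"workflow": "pension_transfer_report", "model": "model-b", "score": 4.1, "cost": 0.02},
    {"workflow": "annual_review", "model": "model-a", "score": 4.0, "cost": 0.05},
    {"workflow": "annual_review", "model": "model-b", "score": 4.0, "cost": 0.01},
]
choices = best_model_per_workflow(sample_results)
```

In practice the selection would also need to respect data residency constraints, so a model that scores well but is hosted in the wrong place would be excluded before this step.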
Their existing AI subscription was convenient, but not a safe assumption once data residency was properly understood. The recommended setup is built around where client data is actually processed.
A plain-English framework covering data residency, the US CLOUD Act and the ICO's current position — ready to act on and explain to clients and regulators.
The recommended platforms wired into their existing n8n environment, credentials configured and validated end-to-end — ready to run, not just to read about.
Cost per run, token counts and latency were captured straight from the provider APIs — so First Wealth can budget, compare and choose on evidence rather than vendor estimates.
If you're weighing up which AI models to use — and how to run them in a way that holds up for a regulated business — let's talk. Book a no-obligation discovery call and I'll tell you honestly whether a structured evaluation is the right next step.
Book a discovery call
This is the Tech Stack & Tool Evaluation service in practice.