A field map. The tools. The benchmarks. What we built. What is real.
Hands up.
Before we begin
What you'll get in 45 minutes
Global · finance & reporting · 2024 → 2027
Share of companies piloting or using AI in financial reporting: today, and projected within three years.
Saudi Arabia · the forcing functions
Part one of three
Five layers. Each does something the layer below cannot. Each costs more.
From browser to autonomous agent
Layer 00 · Chat in a browser
Open the tab. Paste the question. Read the answer. Frontier reasoning models now pass all three CFA exam levels.
Good for: quick research, drafting memos, explaining a standard, translating jargon, brainstorming a risk list.
Limits: hallucinated figures, no memory of your client, no access to your working papers, no audit trail.
ChatGPT · audit research in 8 seconds
Under IFRS 15, when a Saudi construction firm receives a 30% advance for a 3-year project, how do I recognise revenue — input or output method? Cite the standard.
For long-term construction under IFRS 15, performance obligations are usually satisfied over time (IFRS 15.35), so revenue is recognised using progress measurement — not on cash collection.
The 30% advance is a contract liability until earned (IFRS 15.106).
Method: input method (cost-to-cost) is standard for construction, because outputs are hard to measure mid-project. Output method only fits when milestones are independently verifiable (IFRS 15.B14–B19).
Caveat: verify the contract has no termination clauses that would defeat "over time" criteria. Check IFRS 15.35(c).
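A worked sketch of the cost-to-cost arithmetic in the answer above, using hypothetical figures. The contract is SAR 30m, estimated total cost is SAR 24m, the 30% advance is SAR 9m, there is one SAR 6m progress billing in year two, and the balance is paid at completion. It does not model a significant financing component, which an advance spanning three years would need to be assessed for. It also nets everything into one contract balance per year. The function name and all numbers are illustrative and do not come from any real engagement.

```python
from fractions import Fraction

def cost_to_cost_schedule(price, total_cost, cum_costs, cum_cash):
    """Input method (cost-to-cost): progress = costs incurred to date / total estimated costs.

    cum_costs and cum_cash are cumulative year-end figures. The contract balance is
    cumulative revenue minus cumulative cash received: negative means a contract
    liability (advance not yet earned); positive means revenue has run ahead of cash
    (a contract asset, or a receivable where the right to payment is unconditional).
    """
    rows = []
    prior_revenue = Fraction(0)
    for cost_to_date, cash_to_date in zip(cum_costs, cum_cash):
        progress = Fraction(cost_to_date, total_cost)
        cum_revenue = progress * price
        rows.append({
            "progress": progress,
            "revenue": cum_revenue - prior_revenue,   # revenue recognised this year
            "contract_balance": cum_revenue - cash_to_date,
        })
        prior_revenue = cum_revenue
    return rows

# Hypothetical contract, SAR: price 30m, estimated total cost 24m,
# 30% advance (9m) at signing, a 6m progress billing in year 2, the rest at completion.
schedule = cost_to_cost_schedule(
    price=30_000_000,
    total_cost=24_000_000,
    cum_costs=[6_000_000, 15_000_000, 24_000_000],
    cum_cash=[9_000_000, 15_000_000, 30_000_000],
)
for year, row in enumerate(schedule, start=1):
    print(year, f"{float(row['progress']):.1%}",
          f"{int(row['revenue']):,}", f"{int(row['contract_balance']):,}")
# 1 25.0% 7,500,000 -1,500,000
# 2 62.5% 11,250,000 3,750,000
# 3 100.0% 11,250,000 0
```

In year one, SAR 1.5m of the advance is still unearned and sits as a contract liability. By year two, revenue has overtaken cash, so the balance flips to a contract asset.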
Layer 01 · Workspace integrations
AI inside the tool you already work in. Reads your workbook — every tab, every formula dependency — then explains, audits, or updates it with cell-level citations.
Good for: variance analysis, formula auditing across nested tabs, updating assumptions without breaking dependencies, cash-flow narratives.
Limits: no audit-grade accountability, no client memory across files, near-zero Saudi-specific knowledge. Cost: ~$20–$30 per user per month on top of Excel.
Live, today · claude.com/claude-for-excel
Q3 revenue (C2: 7,140) is +46% vs Q2, well outside the 7–15% range elsewhere.
Cell C2 has a hard-coded value, not a formula. Other quarters use =Forecast!C2.
Likely cause: manual override. Recommend tying it back to source or flagging as a reclass.
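The same two checks can be run by hand. A minimal sketch does two things. It flags any quarter whose growth falls outside the 7–15% band, and it flags any cell whose raw entry is a typed number rather than a formula. The labels, formula references and Q1/Q2 figures are hypothetical; only Q3's 7,140 comes from the slide. The formula test assumes raw entries arrive as strings beginning with "=". That is how openpyxl returns formulas when a workbook is loaded without data_only=True.

```python
def review_quarters(cells, low=0.07, high=0.15):
    """Flag out-of-band quarter-on-quarter growth and hard-coded cells.

    cells: list of (label, raw_entry, value) in quarter order, where raw_entry
    is the formula text (e.g. "=Forecast!B2") or a typed-in number.
    """
    findings = []
    # Compare each quarter with the one before it.
    for (prev_label, _, prev_value), (label, _, value) in zip(cells, cells[1:]):
        growth = value / prev_value - 1
        if not low <= growth <= high:
            findings.append(f"{label}: {growth:+.0%} vs {prev_label}, outside {low:.0%} to {high:.0%}")
    # A raw entry that is not a formula string is a hard-coded value.
    for label, raw, _ in cells:
        if not (isinstance(raw, str) and raw.startswith("=")):
            findings.append(f"{label}: hard-coded value {raw}, not a formula")
    return findings

# Hypothetical quarters; only Q3's 7,140 comes from the slide.
cells = [
    ("Q1", "=Forecast!B2", 4_400),
    ("Q2", "=Forecast!C2", 4_890),
    ("Q3", 7_140, 7_140),
]
for finding in review_quarters(cells):
    print(finding)
# Q3: +46% vs Q2, outside 7% to 15%
# Q3: hard-coded value 7140, not a formula
```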
Layer 02 · Vertical AI tools
Software built to do one audit or finance workflow, deeply. The point solutions that partners already buy.
Good for: sample testing, journal-entry risk scoring, lease & revenue recognition, AP automation. Ready Monday morning. DataSnipper alone reports running in 175 countries and saving its clients $1.4B in 2025.
Limits: each tool sees only its slice. Five vendors, five contracts, five logins, no shared memory of the engagement. Arabic and SOCPA working-paper support is thin to zero.
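At its simplest, journal-entry risk scoring is a set of red-flag rules with weights. The weights are summed per entry, so the riskiest entries are tested first. The five rules and weights in this sketch are hypothetical. It is not how DataSnipper or any other vendor scores entries. The weekend rule uses Friday and Saturday, the Saudi working-week weekend.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class JournalEntry:
    entry_id: str
    posted: date
    amount: int          # SAR, whole riyals for simplicity
    manual: bool         # keyed by hand rather than system-generated
    description: str

WEEKEND = {4, 5}  # date.weekday(): Friday = 4, Saturday = 5

# Hypothetical red-flag rules: (name, weight, test). Weights are illustrative only.
RULES = [
    ("weekend posting", 2, lambda je, close: je.posted.weekday() in WEEKEND),
    ("round amount", 1, lambda je, close: je.amount >= 10_000 and je.amount % 1_000 == 0),
    ("manual entry", 2, lambda je, close: je.manual),
    ("posted after period close", 3, lambda je, close: je.posted > close),
    ("blank description", 1, lambda je, close: not je.description.strip()),
]

def score_entries(entries, period_close):
    """Return (score, entry_id, flags) tuples, highest score first."""
    results = []
    for je in entries:
        hits = [(name, weight) for name, weight, test in RULES if test(je, period_close)]
        results.append((sum(w for _, w in hits), je.entry_id, [name for name, _ in hits]))
    return sorted(results, key=lambda r: r[0], reverse=True)

entries = [
    JournalEntry("JE-001", date(2025, 12, 31), 12_345, False, "Utilities accrual"),
    JournalEntry("JE-002", date(2026, 1, 3), 50_000, True, ""),
    JournalEntry("JE-003", date(2025, 12, 15), 20_000, True, "Reclass"),
]
for score, entry_id, flags in score_entries(entries, period_close=date(2025, 12, 31)):
    print(score, entry_id, flags)
# 9 JE-002 ['weekend posting', 'round amount', 'manual entry', 'posted after period close', 'blank description']
# 3 JE-003 ['round amount', 'manual entry']
# 0 JE-001 []
```

A real tool would add statistical tests and user-level patterns. The principle is the same: rank first, then send the auditor's time to the top of the list.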
Layer 03 · Finance-specific agents
Purpose-built for finance. Trained on filings. Connected to authoritative content (FASB, IFRS, IRS, ZATCA when ready).
Layer 04 · Proprietary audit platforms
Every serious audit firm is now building, not buying. Here are the ten that matter.
November 2025
HUMAIN is PIF's AI company, launched May 2025. EY MENA's AI business solutions are being embedded into HUMAIN ONE for Saudi government and enterprise clients.
Part two of three
Hard numbers. Which models actually win, on which finance task.
Chartered Financial Analyst exam · Dec 2025
CFA Levels I, II and III — passed by frontier reasoning models. Some clear all three with near-perfect scores.
Human CFA Level III pass rate: 56%. AI completed Level III essays in minutes. Source: Columbia / RPI / UNC, December 2025. Models tested: OpenAI o4-mini, Google Gemini 2.5 Pro, Anthropic Claude Opus 4.
May 2026 · Public benchmarks
| Task | Winner | Score | Runner-up | Score | Human baseline |
|---|---|---|---|---|---|
| FinanceBench (10-K Q&A) | OpenAI o3 | ~90% | GPT-5 | ~88% | — |
| Finance Agent (analyst tasks) | Claude Opus 4.7 | 64.4% | Claude Sonnet 4.6 | 63.3% | — |
| FinQA (table reasoning) | Fin-R1* | 76.0% | GPT-4.1 | ~68% | 91% |
| CFA Level I | Gemini 3.0 Pro | 97.6% | GPT-5 | 96.1% | 43% pass |
| CFA Level III essay | Gemini 3.0 Pro | 92.0% | Claude Opus | ~75% | 56% pass |
| DocVQA (document images) | Claude Opus 4.7 | 93.8% | GPT-5.4 | 91.1% | ~95% |
| Arabic reasoning (MMMLU) | Claude Mythos | 92.7% | Gemini 3.1 Pro | ~91% | — |
*Fin-R1 = open-source finance reasoning model from Shanghai University of Finance and Economics, built on Alibaba's Qwen2.5. All other rows are general-purpose frontier models. Scores from Vals AI, awesomeagents.ai, arxiv:2512.08270, MindStudio (May 2026).
What the benchmarks tell us
A practical cheat sheet
| If you need to… | Use | Why it wins |
|---|---|---|
| Research a standard, draft a memo, translate jargon | Claude Opus 4.7 · GPT-5.5 | Best at long-form reasoning and clear English/Arabic writing. |
| Ask questions of a 10-K, audited financials, ZATCA filing | OpenAI o3 · GPT-5 | Top of FinanceBench for filings Q&A. Strong citation discipline. |
| Audit a workbook: formulas, tabs, dependencies | Claude for Excel | Reads multi-tab workbooks with cell-level citations. |
| Read scanned working papers, invoices, contracts | Claude Opus 4.7 | Best on document images and OCR (DocVQA 93.8%). |
| Study for the CPA, CFA, SOCPA or ACCA exams | Gemini 3.0 Pro | Highest CFA Level I and Level III scores in the table above. |
| Work in Arabic — explain, summarise, draft | Claude (latest) · Gemini 3.x | Best Arabic reasoning today. Still verify numbers. |
| Production audit work on a client engagement | Proprietary platform | None of the above alone. You need workflow + audit trail + partner sign-off. |
Rule of thumb: open with the cheapest model that fits, escalate if the answer feels thin. Always cross-check numbers against the source.
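"Always cross-check numbers against the source" can be partly automated. This minimal sketch assumes both the model's answer and the source are plain text. It pulls every number out of each, drops thousands separators, and lists the numbers in the answer that never appear in the source. The sample texts are invented.

```python
import re

# A number: digits with optional thousands commas and an optional decimal part.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")

def numbers_in(text):
    """Return the set of numbers in text, with thousands separators removed."""
    return {match.replace(",", "") for match in NUMBER.findall(text)}

def unsupported_numbers(answer, source):
    """Numbers the answer states that never appear in the source text."""
    return sorted(numbers_in(answer) - numbers_in(source))

source = "Revenue for 2025 was SAR 48,200,000; gross margin 31.5%."
answer = "Revenue reached SAR 48,200,000 in 2025, with a gross margin of 35.1%."
print(unsupported_numbers(answer, source))   # ['35.1']
```

This only catches figures copied wrongly. It will also flag legitimate derived figures, such as a computed variance or "48.2m" written for 48,200,000. Treat every hit as a question for the reviewer, not a verdict.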
Part three of three
Three things we have learned. One demo. Top tips you can use Monday morning.
Insight 01
"The backbone of transformation is standardization. You've got to clean out your closet."
Jessie Kanter · Partner, Citrin Cooperman · Journal of Accountancy, 2026
When we acquire a firm, we are digitising thirty years of institutional memory. The roll-up is the standardisation play.
Concrete example
Insight 02
Before: the junior auditor prepares working papers, tests samples by hand, builds memos line by line.
After: designs the AI agent's scope, reviews its output, makes the judgment calls AI cannot.
More intellectually demanding. Not less.
What AI cannot do · 2026
Insight 03
Every AI output has a named partner who signs off. The AI moves fast. The partner stays accountable. We sell both.
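This sketch shows one way to make "a named partner signs off" enforceable in software. Each AI output records the partner's name and a SHA-256 hash of the exact text they approved. Nothing is released unless a sign-off exists and the text is unchanged since signing. The class, field and function names are hypothetical and do not describe the platform shown in the demo.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class AIOutput:
    engagement: str                  # hypothetical engagement code
    model: str                       # which model produced the draft
    content: str
    partner: Optional[str] = None    # named partner who signed off
    signed_hash: Optional[str] = None
    signed_at: Optional[datetime] = None

    def content_hash(self) -> str:
        return hashlib.sha256(self.content.encode("utf-8")).hexdigest()

    def sign_off(self, partner: str) -> None:
        # Record who signed and exactly which text they signed.
        self.partner = partner
        self.signed_hash = self.content_hash()
        self.signed_at = datetime.now(timezone.utc)

def release(output: AIOutput) -> str:
    """Return the content only if a named partner signed this exact version."""
    if output.partner is None:
        raise PermissionError("not released: no named partner sign-off")
    if output.signed_hash != output.content_hash():
        raise PermissionError("not released: content changed after sign-off")
    return output.content

memo = AIOutput(engagement="ENG-001", model="model-x", content="Draft revenue memo")
try:
    release(memo)
except PermissionError as err:
    print(err)                      # not released: no named partner sign-off
memo.sign_off("A. Partner")
print(release(memo))                # Draft revenue memo
memo.content += " (edited after review)"
try:
    release(memo)
except PermissionError as err:
    print(err)                      # not released: content changed after sign-off
```

Hashing the signed text means any later edit, by a person or a model, quietly voids the sign-off instead of inheriting it.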
The proprietary platform, opened up
Demo · 4 minutes
An internal tool. Not a product pitch. We are showing you the work.
Switch to laptop
What just happened
The junior auditor doesn't disappear. The afternoon disappears. The auditor moves to the harder question: is the anomaly actually wrong?
If you are starting out · 5 tips
If you run a firm · 4 tips
Now
Your questions are usually smarter than my answers.
Before we end
The question is not whether the audit profession changes. The question is who is ready for the audit profession that comes next.
We are starting now. Find us after the session.