Joined March 2012
9 Photos and videos
Pinned Tweet
Most senior operators score below the median on AET's AI Workflow Maturity Scorecard. Not because they're behind on AI. Because the scorecard measures the four dimensions most adoption frameworks skip. Here's what the 12-question scorecard actually tests: 1. Workflow Adoption Where AI is actually being used in your day-to-day deliverables. Not where people say it's being used. The gap between "we use AI" and the workflows where AI is genuinely integrated is wider than most CFOs realise. Most senior teams over-report adoption by 2-3x. 2. Output Quality & Trust Whether outputs are verified, whether fabrications have been caught, whether stakeholders can rely on what leaves your desk. This is the dimension most teams stall on. Adoption is easy. Trust takes structured verification — sample testing, materiality thresholds, paired-output checks. 3. Governance & Risk Written policy, data handling, approved tool list, named ownership of judgement calls. The audit-defensibility dimension. Without it, a regulator's review reveals workflows nobody can explain and outputs nobody can reproduce. 4. Team Capability Whether AI capability is one person's hobby or a team-wide operating standard. The "Claude into a 15-tab spreadsheet" moment lands once per team. The teams that compound it codify the workflow; the teams that don't stay stuck at one fluent operator. You'll score across 5 maturity levels — Ad Hoc to Optimised. The 12 questions take 5 minutes. The result returns: your peer benchmark, the ROI gap on the dimension where you're weakest, and a 90-day plan to close it. DM SCORECARD for the link. AET reads every DM.
24
1,000 invoice PDFs to a clean CSV of dates, names and totals: under 15 minutes with Claude Code. The detail worth copying is not the speed. Before renaming a single file, it ran a dry run on samples and showed its plan. Test on a sample before bulk execution. AI workflows deserve the same discipline as a data migration, because that is what they are.
1
A 13-week cash flow is the first document a stressed board asks for, and it normally costs an analyst 1-2 hours to build properly. Excel Agent Mode built it in minutes: "Create a 13-week cash flow forecast with two products. Product A has two-week payment terms, product B three weeks. Each product has different prices, units and costs. Include leasing and salary costs paid monthly, and a machine paid in advance. Link everything with formulas so the model recalculates when inputs change." Then the part that used to break spreadsheets: "For product B change the payment receipt term to 1.5 weeks." One minute. Every receipt lag re-phased, formulas intact. The check before it goes anywhere near a lender: pick one product and trace its receipts across the 13 weeks by hand. Payment-term lags are exactly where these models lie convincingly, and a lender's analyst will trace them even if nobody else does.
2
A financial model with no error-check row is a draft, whoever built it. AI made building models cheap. It made checking them mandatory.
2
Five Excel modelling prompts that build like a senior analyst. Verbatim, tested against a known answer key. 1. The rebuild test. CFI deleted every forecast formula from a finished course model, five years of them, and ran: "Use this Excel template to complete the forecast. You have all the assumptions you need to build the three financial statements linked in Excel." Net income, balance sheet and ending cash matched the answer key to the dollar. 2. The architecture clause. End every model prompt with: "Include a drivers tab that shows the historical and projected value for each account, and include error checks where relevant." A modeller once missed one link from a headcount tab. The error-check row is what catches millions in unbudgeted cash burn. 3. Kill the cash plug: "Create a statement of cash flows and replace your cash formula with a link to the cash flows tab." If the model's cash line is the net movement of everything else, this one line upgrades it to a real indirect-method cash flow. 4. Revenue from unit economics: "Build me a revenue build. SaaS, $300 a licence, customers acquired through marketing partnerships and sales reps, average customer life 18 months. Connect it to the drivers tab." It derived the 5.5% monthly churn from the 18-month life on its own. 5. The hardcode audit. AI builds, the reviewer still checks: blue inputs, black formulas, then F5, Go To Special, Constants, before anyone trusts a number.
17
Same tax scenario, 7 AI engines. Answers ranged from $135k to $206k. The two reasoning models agreed within rounding at roughly $173.5k. The chat models scattered. A $20,000 spread is not a tooling quirk, it is a control gap. The cheap fix: run any number that matters through two reasoning models independently. Agreement is not proof, but disagreement is a stop sign.
5
The CEO wants competitor benchmarks for tomorrow's board prep. The 14-minute version: Prompt, verbatim: "Provide benchmarks for financials and operational metrics: gross margins, ARPU, revenue growth rate, LTV, churn. Size of company: 10-50 million revenue. Specific vertical: fintech. Include timing, location and type of values." Then always: "Give me a summary table." One real run: 14 minutes, the equivalent of 83 searches, 29 cited sources, a board-ready table. The same approach turned a 2-3 week tariff impact analysis into 2 days. Two rules that keep it defensible: 1. Constrain the sources. One tax practitioner's version: "Only reference the IRS site and it's for tax year 2024." Swap in HMRC, the FRC or whichever authority applies. Source discipline is the difference between research and rumour. 2. Check the citations exist before anything reaches a board pack. Ten minutes of citation-checking on 29 sources is still 10x faster than doing the research from scratch.
10
"AI closed our books in 4 days" is the brochure. The operating reality behind the teams actually doing it: AI drafts the close, a controller verifies the exceptions, the FD signs a log an auditor can read. Teams with that sentence written down move faster than teams with the slogan. The written version is the unfair advantage.
15
A CPA reconciled a live QuickBooks file with ChatGPT to a difference of exactly $0. The pipeline: 1. Bank statement PDF in: "Extract the data and create a CSV with date, transaction number, description and amount. Double-check that the beginning balance plus the transactions equals the ending balance." 2. Export the ledger side to CSV. 3. Both files into a fresh chat, with the matching prompt he spent 4 hours perfecting: "Standardise the dollar amounts so deposits are positive and payments are negative. Match transaction by amount. Ensure each transaction is only matched once. If a dollar amount appears multiple times, use additional context such as date, description or memo to find the best match. Then provide a list of transactions that do not have a match on either file." 4. In the ledger, untick only the flagged exceptions. Difference: zero. The clause doing the heavy lifting is "ensure each transaction is only matched once". Naive prompts double-match identical amounts and the reconciliation lies to you. The 4 hours were spent once. The prompt now runs every month. And the control that keeps it audit-safe: a person reviews the exceptions list line by line. AI proposes the matches. Someone still owns them.
1
43
From an AI test of 13,295 journal entries: 109 round-dollar amounts, 99 posted after hours, 12 sequence gaps, 4 that did not balance. Honest question for controllers: how many of those four categories does your close test for every month, across the whole population rather than a sample?
2
$928 million of journal entries. Audit tested in 10 minutes. Here is the entire workflow, prompts included. A CPA pointed Claude at a folder containing three things: a 30,000-row GL export, a test-procedures PDF and a blank Excel workpaper template. Then one prompt: "Use the data in this folder and the test procedures PDF file to perform audit testing, documenting your testing and results in the Excel file using best practices for an auditor." The smarter follow-up, when it asked how to proceed: "Review the PDF first and recommend which tests to perform." That line encodes actual judgement. It scoped to the feasible, highest-risk tests instead of brute-forcing the whole programme. It then wrote and ran a Python audit script across 13,295 entries and 28,160 line items: completeness, debits equal credits, sequence gaps, round amounts, off-hours postings. Findings: 4 unbalanced entries. 12 numbering gaps. 109 round-dollar amounts. 99 entries posted outside business hours. Plus a recommendation to review access rights for 15 rarely seen users. Two caveats, because this is where most AI posts stop and should not: 1. Some workpaper tabs came back blank. The fix was a prompt any reviewer will recognise: "The lead sheet is missing the log counts and the testing results tab is blank. What's up with that?" It corrected itself. 2. This is experimentation grade, not production grade. Nothing in that workpaper goes to file until a named senior signs every exception. The 10 minutes is not the point. Testing the population instead of a sample is the point. The reviewer's signature is still the product.
43
o3-mini hallucination rate on technical research: 0.8%. GPT-4o: 1.7%. Same vendor, same subscription. Model choice alone halves the error risk, and most teams still use whatever the dropdown defaults to. Reasoning models for technical questions. Fast models for drafting. Treat it like staffing: the right grade for the right work.
15
The month-end close prompt pack. 5 prompts, verbatim. Screenshot the card, run the prompts. 1. Statement to CSV: "Extract the data from this bank statement and create a CSV with columns for date, transaction number, description and amount. Double-check that the beginning balance plus the transactions equals the ending balance." 2. Consolidate the card-statement tabs: "Consolidate the tabs into one tab. Keep only date, description, amount and chargebacks. Tag every transaction with the company and cardholder it came from. Then build pivot tables to analyse the cost." 3. Consolidate entities, with a control: "Generate a consolidated P&L where every entity's P&L is a column and the last column is the total. Before consolidating, confirm with me that you read each entity's net income correctly." 4. Prove the maths: "Add subtotals for each major category and make sure the math works, all the way to net income." 5. When a session works, end it with: "Now give me a single prompt that outlines everything we did, so I can do it in a single request next time." Number 5 is how a good afternoon becomes a monthly SOP. The exceptions any of these surface still get reviewed by a person, line by line.
9
Senior operators reading this: Where does your team sit on AI maturity? — Level 1: Ad Hoc (one-offs, no structure) — Level 2: Defined (named workflows, no controls) — Level 3: Managed (workflows register materiality) — Level 4: Predictable (outputs are measurable, reproducible) — Level 5: Optimised (continuous improvement built in) No wrong answer.
7
The 5 levels of AI maturity for senior finance teams. Most teams land at level 2. Most THINK they're at level 4. The gap is governance overhead, not tool count. Level 1 — Ad Hoc A few senior people use ChatGPT for one-offs. No team-wide workflow. No register. No governance. Sign: Someone in legal has a paid ChatGPT subscription. Finance doesn't know. Risk: Shadow AI. Everybody is doing it; nobody is governing it. Level 2 — Defined AI is built into 2–3 named workflows. Variance commentary. Reconciliation triage. AP coding. Each has a documented owner and a basic prompt template. Sign: "We use AI for X" is in your team comms with a named owner. What's missing: written register across the function, kill criteria, materiality threshold. The workflows run; the controls don't. Risk: An AI-touched output ends up in a regulator's hands. You can describe what AI did. You can't prove the human review happened. Level 3 — Managed Same workflows as level 2, with a written register: every AI workflow listed, owner named, audit log specified. Kill criteria documented. Materiality threshold defined. Sign: When asked "show me your AI register," your team produces it in under 5 minutes. What's still missing: outputs are reliable but not yet measurably better than the human-only baseline. The team has controls but hasn't compounded. Risk: Speed of AI rollout still limited by team-by-team handoffs. Level 3 teams still have a velocity ceiling. Level 4 — Predictable Outputs are reliable, measurable, and reproducible. The team can answer "what would this AI workflow have produced 3 months ago, and does the result still tie?" Reproducibility is the level 4 test. Sign: Quarterly review of AI workflows shows measurable improvement against the pre-AI baseline. Output quality has a number on it. What's missing: improvements happen reactively, not by design. Level 5 — Optimised Continuous improvement on AI workflows is built into the operating cadence. The team is identifying new workflows, retiring obsolete ones, and measuring the cost-vs-time-saved trade-off without prompting. Sign: When something breaks, the team diagnoses, documents, kills, and rebuilds — without escalation. Most senior teams sit at level 2. The 5-minute AI Workflow Maturity Scorecard tells you where your team actually sits across all 4 dimensions (adoption, trust, governance, capability) and benchmarks you against peers. DM SCORECARD for the test. AET reads every DM.
22
AET's 3-rule AI deployment minimum: 1. Register exists (written, not in someone's head) 2. Named human reviewer per workflow 3. Audit log lives somewhere a regulator could reach Without all three, you're not deploying AI. You're letting AI deploy itself.
8
The AI workflow most senior teams won't trust to AI: Going-concern judgement. Question for the operators here: what's the workflow YOUR team won't hand off to AI, no matter how good the model gets?
4
Token economics, current pricing: Claude Sonnet 4: $3 input / $15 output per million tokens. Senior teams burn 75–80% of cost on input context (long PDFs, transcripts, historical data) — not generation. Optimise the input, not the prompt.
19
What AET observed this week: Six senior operators flagged the same thing. "AI saves draft time. It costs verify time." Net? Roughly flat for most workflows. The teams winning are the ones that re-engineered the verify step. Not the draft.
5
AET's 5-minute AI Workflow Maturity Scorecard ranks you across 4 dimensions: — Workflow Adoption — Output Quality & Trust — Governance & Risk — Team Capability 12 questions, 5 maturity levels (Ad Hoc → Optimised). Returns your peer benchmark, ROI gap, and 90-day plan. DM SCORECARD for the link.
22
The AI divide isn't between teams that use AI and teams that don't. It's between teams that govern what AI touches and teams that hope nobody notices. Two years from now, the gap shows up at the auditor's desk.
6