RESOURCE

How to measure the ROI of an AI consultant (a CFO's framework)

Measure three things: hours saved per automated workflow, revenue recovered from fixed leaks, and decision latency reduced. Track monthly, compare to the fully loaded retainer, publish the delta. If your consultant cannot show you all three by month 3, the engagement is not working.

Ignacio Lopez·Fractional Head of AI, Work-Smart.ai·Coconut Grove, Miami
Published April 8, 2026

The CFO already asked the question. You need a real answer.

The CFO of a $14B wealth advisory firm asked me, in the first 10 minutes of the first call, two questions. What does this cost. What do I get. He was not being difficult. He was doing his job. Every dollar that leaves the operation needs a line item next to it that explains why it left, and what came back. AI consulting is not exempt from that rule. It is the rule.

Most AI consulting engagements never produce a defensible answer to those two questions. The work happens, the team feels busier, the slide decks get longer, and at some point the CFO asks for the ROI report and the consultant sends a deck full of phrases like "increased productivity" and "improved decision velocity" with no numbers under them. That is not an ROI report. That is a marketing brochure. The condition is that AI is now a board-level line item. The reality is that almost nobody is measuring it correctly. What to do about it is the rest of this guide.

Why most AI ROI claims fall apart on contact

The first failure mode is the vendor ROI calculator. A SaaS company hands you a spreadsheet with industry-average multipliers, plugs in your headcount, and produces a number with two decimal places. The number is fiction. It assumes the tool gets adopted at the vendor's claimed rate, it ignores your actual workflows, and it does not subtract the implementation cost or the change-management friction. Operators learn to ignore those calculators. CFOs should too.

The second failure mode is "AI productivity gains" with no baseline. A consultant runs a 90-day engagement, the team uses the new tool, and the consultant reports that productivity is up 18%. Up from what. Measured how. Compared to which week. There is no answer because there was no baseline before the engagement started. Without a baseline, you cannot prove the gain. Without proof, the CFO has to take it on faith, and the CFO does not take things on faith.

The third failure mode is vanity metrics. Chats sent. Prompts run. Automations triggered. These numbers measure activity, not value. A team can run a thousand prompts a week and ship nothing. The right metrics tie back to dollars: hours saved at fully loaded cost, revenue recovered from a leak, decisions made faster on the things that move the business. Anything else is decoration.

Metric 1: Hours saved per automated workflow

This is the easiest metric to measure and the most defensible. Pick a workflow your team runs on a recurring basis. Time the manual version with a stopwatch and the person who does the work, not the manager who guesses. Multiply the time by the frequency. Multiply that by the fully loaded hourly cost of the person doing the work. That is your baseline cost per month. That is the number you compare against.

Once the automation is live, log savings per execution. Most automations do not eliminate the work entirely. They cut it from 60 minutes to 5, or from 45 minutes to a click. The delta per execution times the executions per month is your monthly savings on that workflow. Roll up across all the workflows you have automated and you have your hours-saved line item. Boring. Defensible. Repeatable. The CFO can audit it in 10 minutes.

The example I use most often comes from a construction client. The team was spending 60 minutes per project finding the right document across project folders and email. After the custom search system shipped, the same task took roughly 30 seconds. Across 7 active projects and 4 site managers, that is hundreds of hours per month back to people whose fully loaded cost is not trivial. That is the entire calculation. No multipliers. No assumptions. Just baseline times saved times rate.

Metric 1 Formula

The hours-saved metric is the foundation of every other metric in this framework. If you cannot baseline a single workflow before the engagement starts, the engagement does not start. That is the discipline.

(baseline_minutes - new_minutes) × executions_per_month × loaded_hourly_cost / 60
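The formula is small enough to sanity-check as a Python function. The numbers below are illustrative, not from any client engagement:

```python
def monthly_savings(baseline_minutes, new_minutes, executions_per_month, loaded_hourly_cost):
    """Dollar value of time saved on one automated workflow per month."""
    minutes_saved = (baseline_minutes - new_minutes) * executions_per_month
    return minutes_saved * loaded_hourly_cost / 60

# Illustrative: a 60-minute task cut to 5 minutes, run 40 times a month,
# done by someone whose fully loaded cost is $75 per hour.
print(monthly_savings(60, 5, 40, 75))  # 2750.0
```

Run it once per workflow, sum the results, and that sum is the hours-saved line item on the monthly report.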

Metric 2: Revenue recovered from fixed leaks

Hours saved is the floor. Revenue recovered is where AI consulting actually pays for itself. Every mid-market operation has revenue leaking through cracks the team knows about but cannot get to. Quotes that take 3 days when the buyer needed a number by end of day. Renewal notices that go out late. Invoices that never get reconciled because nobody owns the spreadsheet. Pipeline opportunities that go cold because nobody followed up at week 4.

To measure this metric you need a before number. Pull last quarter and count the leaks. How many quotes were late. How many renewals slipped past 90 days. How many invoices sit in dispute. Put a dollar amount on each one. That is your leak baseline. After the engagement, count the same things again with the same definitions. The difference is your revenue recovered. It is not perfect attribution, but it is honest, and it is more honest than any vendor calculator.

The mistake operators make here is trying to claim recovered revenue without a baseline. You cannot say you recovered $200K in late invoices if you never measured how much late invoice revenue you had before. Pick the leaks. Size them. Track them. The CFO will respect a small number that is real over a large number that is invented.

Metric 2 Formula

The leak categories vary by business. For services firms it is usually quote turnaround, renewal slippage, and project overrun catches. For distribution it is order errors, inventory mismatches, and reorder lag. Pick the three biggest in your operation, baseline them in week 1, and track them every month after.

baseline_leak_dollars - current_leak_dollars (per quarter, per leak category)
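The leak delta can be sketched the same way. The category names and dollar amounts here are hypothetical placeholders:

```python
def revenue_recovered(baseline_leaks, current_leaks):
    """Per-category delta between the baseline quarter and the current quarter."""
    return {cat: baseline_leaks[cat] - current_leaks.get(cat, 0)
            for cat in baseline_leaks}

# Hypothetical quarterly figures for three leak categories.
baseline = {"late_quotes": 120_000, "renewal_slippage": 80_000, "invoice_disputes": 45_000}
current = {"late_quotes": 70_000, "renewal_slippage": 60_000, "invoice_disputes": 45_000}

recovered = revenue_recovered(baseline, current)
print(sum(recovered.values()))  # 70000
```

The per-category breakdown goes on the report, not just the total, so the CFO can audit which leak actually closed.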

Metric 3: Decision latency reduced

Decision latency is the time from when a question is asked to when an answer is delivered. How long does month-end close take. How long to get margin per SKU. How long to know cash position. How long to answer a board question about pipeline health. These numbers compound in ways the other two metrics miss. A team that can answer questions in minutes makes better decisions than a team that waits 3 days for a spreadsheet, and the difference shows up in places that are hard to attribute one quarter at a time.

To measure it, pick the three questions leadership asks most often that currently take too long. Time the answer process today, end to end. After the engagement, time the same questions again. The delta is your latency reduction. You will not get a clean dollar figure out of this metric, and that is fine. Report it as days or hours saved, with the question listed, and let the CFO decide what it is worth in context. Latency is the metric that matters most for strategic decisions, even when it is the hardest to monetize.

The reason to measure latency at all is that it is the leading indicator for whether the operation is actually getting smarter. Hours saved tells you the team is working faster on the same tasks. Latency tells you the team can think about new things they could not think about before. That is the structural change AI is supposed to deliver. If latency does not move, the engagement is automating busywork, not raising the ceiling.

Metric 3 Formula

A reasonable target for the first 90 days is to cut decision latency on three named questions by at least half. If you cannot identify three questions, the operation does not have a latency problem and you should focus on metrics 1 and 2.

baseline_latency - current_latency (per question, per month)
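A sketch of the latency delta and the 90-day half-latency target check. The question names and answer times are hypothetical:

```python
def latency_reduction(baseline_hours, current_hours):
    """Hours saved per named question; report the delta, not a dollar figure."""
    return {q: baseline_hours[q] - current_hours[q] for q in baseline_hours}

# Hypothetical answer times, in hours, for three leadership questions.
baseline = {"margin_per_sku": 72, "cash_position": 24, "pipeline_health": 48}
current = {"margin_per_sku": 2, "cash_position": 1, "pipeline_health": 20}

deltas = latency_reduction(baseline, current)

# 90-day target: latency cut by at least half on every named question.
target_met = all(current[q] <= baseline[q] / 2 for q in baseline)
print(deltas, target_met)
```

Report the deltas in hours or days per question and leave the dollar interpretation to the CFO, as described above.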

The monthly ROI report (one page, no slides)

The monthly report is one page, four sections, sent to the CFO and the leadership team on the first business day of every month. It is not a slide deck. It is not a dashboard screenshot. It is one page of plain text that anyone can read in 3 minutes.

Section 1: hours saved this month. List each automated workflow, the baseline minutes, the new minutes, the executions, and the dollar value at fully loaded cost. Total at the bottom. Section 2: cumulative year-to-date. Same format, rolled up. Section 3: revenue recovered. List each leak category, the baseline dollars, the current dollars, the delta. Total at the bottom. Section 4: decision latency. List the three questions you are tracking, baseline time, current time, delta in days.

Below the four sections, two more lines. Total benefit (hours saved plus revenue recovered). Net of fully loaded cost (total benefit minus retainer minus tooling minus client-side time investment). That is the entire report. No charts. No prose. Just numbers the CFO can audit and a footer with the date and the consultant's name.
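The bottom two lines of the report reduce to one subtraction. Every input below is a placeholder, not a real engagement figure:

```python
def report_bottom_lines(hours_saved_dollars, revenue_recovered_dollars,
                        retainer, tooling, client_time_dollars):
    """Total benefit and net of fully loaded cost, as on the one-page report."""
    total_benefit = hours_saved_dollars + revenue_recovered_dollars
    net = total_benefit - (retainer + tooling + client_time_dollars)
    return total_benefit, net

# Placeholder month: $14K in hours saved, $50K recovered,
# against a $12K retainer, $1.4K tooling, $3K of client-side time.
total, net = report_bottom_lines(14_000, 50_000, 12_000, 1_400, 3_000)
print(total, net)  # 64000 47600
```

If `net` is negative three months running, that is the signal discussed in the section on when to fire your consultant.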

A real example: Argentina's largest construction group

One of the engagements I keep coming back to was with Argentina's largest construction group. The presenting problem was operational chaos across 7 active projects. The CFO and the operations team could not see costs in real time, document search took 60 minutes per look-up, and certifications were tracked in spreadsheets nobody trusted. The leak was real but unquantified. The CFO needed a framework, not a tool.

We baselined three workflows in week 1. Document search at 60 minutes per look-up, multiple times per day per site manager. Cost tracking via WhatsApp screenshots and a 15-tab Excel file. Certifications tracked in a shared drive nobody owned. After the build, document search dropped to roughly 30 seconds. Cost tracking moved into a single source of truth the team checked daily. Certifications became a workflow with an owner and an audit trail. Each of those was sized in hours and rate before the engagement so the after-numbers had something to compare against.

The result was not a single dramatic figure. It was a structural change in how the operation ran, measured month over month against three baselines that the CFO had signed off on at week 1. The full case study is on the construction case study page. The part that matters for this guide is the discipline. Three workflows. Baselined before. Measured after. Reported monthly. That is the entire framework, applied to a real operation, with numbers the CFO could defend.

The retainer is not the full cost

Net ROI is gross benefit minus fully loaded cost. The retainer is one line in the cost column. There are three more. Tool licenses are real money. ChatGPT Team and Claude for Work both run roughly 25 to 30 dollars per user per month at list price. For a 50-person company that is meaningful. Client-side time is the second hidden cost. Your team will spend hours in workshops, hours testing, hours adopting. Track those hours at the same fully loaded rate you used for the savings calculation. The third cost is change-management friction. People resist new tools. Productivity dips before it climbs. Budget for the dip and surface it in the first month so nobody is surprised.

For the published pricing context on the retainer side, the AI consulting cost guide covers the standard ranges across audit, build, and retainer. The ROI calculation does not depend on a specific dollar figure for the retainer. It depends on whether the gross benefit, measured honestly, exceeds the total of all four cost lines, measured honestly. That is the only number that matters.

What working looks like at 30, 60, and 90 days

Month 1 is baselines. Three workflows time-studied with the people who do the work, three leak categories sized from last quarter, three decision-latency questions identified and timed. One automation lives by week 4, even if it is small. The first time-savings log starts immediately. If month 1 ends without baselines and a single live workflow, the engagement is on the wrong track and you should fix it right then.

Month 2 is volume. Three to five workflows live, each one logging savings per execution. The first monthly ROI report goes out. The numbers will be small. That is fine. The point of the first report is to prove the report exists, the format is right, and the measurement loop is working. The format does not change after this. Only the numbers change.

Month 3 is signal. Hours saved is climbing. Revenue recovery from at least one leak category is starting to surface. Decision latency on at least one of the three questions has measurably dropped. The retainer is paying for itself or it is not, and you can see it in the report. If month 3 looks like month 1, the engagement is not working and you should have an honest conversation. If month 3 looks like real movement, you are in the loop that this framework is built to produce.

When to fire your AI consultant

The honest part. If your consultant did not set a baseline by week 2, fire them. They are not measuring anything and they will not be able to prove ROI later. If the monthly report is vibes instead of numbers, fire them. If month 3 shows zero hours saved and zero revenue recovered, fire them. If the consultant tells you the gains are real but cannot show you the math, fire them. If the consultant pushes back on publishing the monthly report because the numbers are not flattering, fire them and find someone whose work survives contact with a CFO.

The reason to write this paragraph in a guide on my own site is that I would rather you fire a bad consultant on month 3 than spend a year wondering if it is working. The bar for AI consulting should be the same bar you apply to every other line item in the operating budget. If you want to see what that bar looks like in practice, the Fractional Head of AI engagement is built around exactly this kind of measurement. The services overview shows where it fits with the audit and build tiers. If you want to start with a baseline today, the free assessment walks you through the first three workflows in under an hour. Background on how I work is on the about page.

Frequently Asked Questions

How fast should an AI consulting engagement show measurable ROI?

For a properly scoped engagement, the first measurable wins land in month 1, the first monthly report goes out in month 2, and net positive ROI is visible by month 3. If the consultant cannot show you a baseline by week 2 and a working automation by week 6, the engagement is on the wrong track. Faster claims are usually theater. Slower means the foundation work is real, which is fine, but you should know what is happening and why.

What is a reasonable payback period for the retainer?

For a fractional engagement at the mid-market scale, payback inside 6 months is a reasonable bar. The earliest wins are hours saved on workflows the team runs every week. Those compound monthly. Revenue recovery from fixed leaks lands later, usually months 3 to 6, because the data has to clean up first. If you are still negative at month 6 and the consultant cannot tell you why, that is a real signal.

How do I baseline a workflow before the engagement starts?

Pick the three workflows your team complains about most. For each one, run a 5-minute time study with the person who does it. How long does it take, how many times per week, who does it, what does it cost in fully loaded labor. Write it down. That is your baseline. Most teams skip this step because it feels slow. It is the single most important number in the entire engagement, so do it.

What about soft benefits that do not show up in the numbers?

Track them, do not count them. Morale, energy, less context-switching, fewer late nights for the analyst, these are real and they matter for retention. But they are not the line item the CFO uses to defend the budget. Keep the hard metrics in the monthly report and keep the soft observations in a separate paragraph at the bottom. The soft signals tell you whether the team is actually adopting the tools, which is the leading indicator for the hard metrics.

What if a consultant will not commit to measurable outcomes?

Walk away. A consultant who will not commit to a baseline, a monthly metric, and a published report is selling you advice, not outcomes. Advice is fine for a strategy deck. It is not fine for a 5-figure monthly retainer. The honest version of this is a fractional operator who builds working systems and lets the numbers prove the work. If you cannot measure it, you cannot defend it, and you should not be paying for it.

How is this framework different from a vendor ROI calculator?

Vendor calculators start from a number the vendor wants to land on and work backwards. They use industry-average multipliers, they assume you actually adopt the tool, and they ignore the implementation cost. The framework on this page starts from your actual baseline, measures what you actually saved, and subtracts the fully loaded cost of the engagement. One is marketing math. The other is operating math. The CFO knows the difference.

Do tax credits change the ROI math for custom AI development?

Yes, and most operators leave money on the table here. Custom AI development qualifies as research expenditure under Section 174A and IRC Section 41. For mid-market companies, that can offset 20 to 45% of the project cost across federal credits, state credits, and the immediate expensing fix. The net ROI calculation should include the credit recovery as a line item below the hard benefits. The full breakdown lives on the tax credits and grants page at /resources/tax-credits-grants-ai.

The cheapest way to find out if an AI consultant will pay back is to baseline three workflows before you start. The free assessment walks you through it in under an hour.