TAME — Test Automation Man-hours Estimator

Draft 1

TAME: A Man-Hours Efficiency Model for Test Automation

Test Automation Man-hours Estimator — with application to AI-assisted test creation and automated execution.

Abstract

This paper introduces TAME (Test Automation Man-hours Estimator), a two-stage model for quantifying the human effort saved when a testing process moves from manual work to AI-assisted test creation and automated execution.

Unlike informal “percent faster” claims, TAME measures effort in man-hours and separates two stages whose economics differ fundamentally. The framework integrates:

a one-time creation stage, where AI generation plus human review replaces manual authoring;
a recurring execution stage, where automated scripts replace manual re-runs each cycle;
an accuracy-bounded savings ceiling for creation;
a volume-independent break-even point for automation.

The model reports human effort only; the faster wall-clock turnaround of an unattended run is treated as a separate throughput benefit and excluded from the man-hours result.

Manual and automated execution also differ in kind, which is the basis of the investment case: manual execution is a raw operating expense, paid in full every cycle and leaving no asset behind, whereas automation scripting is a capital-like outlay, incurred once and amortised across every cycle it serves. Break-even is the point at which that amortised investment falls below the recurring manual expense.

Notation & conventions

Units are minutes unless noted; outputs are man-hours. Symbols are introduced where they first appear — each formula carries a small table of just the symbols it adds, with the illustrative default used in the worked examples. The full list is collected in the Symbol reference at the end.

The defaults are placeholders for explanation only. The savings the model reports depend entirely on these inputs, so substitute your own measured values for every one before drawing any conclusion (see §14, Calibration). Different inputs give different results, by design.

1. Core Philosophy

Most efficiency claims answer one question: how much faster is the new tool? TAME asks a more precise one: how many man-hours does the new process remove, and how does that change as the work repeats?

This distinction matters because creating a test case is paid once, while running it is paid every regression cycle. A model that blends the two hides where the savings actually come from. TAME is built on the principle:

Creation is paid once. Execution is paid every cycle.

The Model in One Line

Before the details, here is the entire model in a single expression. Every duration TAME uses (authoring, review, scripting, execution, triage) is a special case of one master form that holds all the variables at once:

\tau_x(i) \;=\; \sigma_x\,\rho_x\,\kappa_x \cdot t_x \cdot i^{-b_x}

Reading it left to right: $\sigma$ , $\rho$ and $\kappa$ are the seniority, client-process and tool-proficiency modifiers; $t_x$ is the intrinsic time the task takes in the neutral case; and $i^{-b_x}$ is the learning discount that accrues with repetition. The document starts from the base — the neutral case where every modifier equals 1 and learning is switched off, so the master form is simply $t_x$ — and then builds each term back in: the base times in §§2–8, the learning term in §9, and the context modifiers in §10, reassembled in full in §11. Leading with the complete form is deliberate: it shows that nothing is omitted, and that the simpler model which follows is this same equation with its extension terms switched off.

2. Manual Creation Cost

Formula

T_{\mathrm{manual}} = N\,t_{\mathrm{author}}

(1)

Symbol	Meaning	Default / units
$T_{\mathrm{manual}}$	Creation man-hours if every case is authored by hand	output
$N$	Number of test cases needed	400
$t_{\mathrm{author}}$	Manual authoring time per case	20 min

Meaning

The baseline effort to author every test case by hand: the number of cases times the time to write one.

Example

$T_{\mathrm{manual}} = 400 \times 20 = 8{,}000$ minutes $=$ 133.3 man-hours.

Assumption

Authoring time per case is treated as an average. In practice it varies by complexity; sampling a representative batch is the way to fix it.

3. AI-Assisted Creation Cost

Formula

T_{\mathrm{AI}} = k\,t_{\mathrm{setup}} + (1-m)\,N\,(t_{\mathrm{review}} + p\,t_{\mathrm{correct}}) + m\,N\,t_{\mathrm{author}}

(2)

Symbol	Meaning	Default / units
$T_{\mathrm{AI}}$	Creation man-hours with AI generation plus human review	output
$k$	Scenarios prompted into the AI tool	20
$t_{\mathrm{setup}}$	One-time setup per scenario — prompting, generation, data, vetting	10 min
$m$	AI miss rate — fraction of needed cases the AI fails to produce	10%
$t_{\mathrm{review}}$	Review time per AI-generated case	4 min
$p$	Fraction of AI cases needing rework	30%
$t_{\mathrm{correct}}$	Correction time per reworked case	6 min

Meaning

AI does not remove the work; it shifts authoring to review. The tool drafts cases that a human reviews; a fraction need rework; and the cases the AI fails to produce fall back to full manual authoring.

Example

Setup $k\,t_{\mathrm{setup}} = 200$ ; reviewed AI cases $0.90 \times 400 \times (4 + 0.3{\times}6) = 2{,}088$ ; missed cases $0.10 \times 400 \times 20 = 800$ . Total $T_{\mathrm{AI}} = 3{,}088$ min $=$ 51.5 man-hours.
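The arithmetic of (1) and (2) can be sketched in a few lines of Python. This is an illustrative sketch only: the function and argument names are hypothetical, and the defaults are the placeholder values from the tables above, not measured data.

```python
def manual_creation_minutes(n_cases=400, t_author=20):
    """Equation (1): every case authored by hand."""
    return n_cases * t_author


def ai_creation_minutes(n_cases=400, k=20, t_setup=10, m=0.10,
                        t_review=4, p=0.30, t_correct=6, t_author=20):
    """Equation (2): setup, plus reviewed AI cases, plus missed cases authored by hand."""
    setup = k * t_setup
    reviewed = (1 - m) * n_cases * (t_review + p * t_correct)
    missed = m * n_cases * t_author
    return setup + reviewed + missed


t_manual_h = manual_creation_minutes() / 60
t_ai_h = ai_creation_minutes() / 60
print(f"manual: {t_manual_h:.1f} h, AI-assisted: {t_ai_h:.1f} h")
```

Keeping the three terms of (2) as separate variables makes it easy to see which one dominates when the inputs are replaced with measured values.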

What $t_{\mathrm{setup}}$ really contains

The example treats setup as prompt-and-login time alone, but standing up AI generation for a scenario is more than prompting. In full it is the sum of four one-time costs:

t_{\mathrm{setup}} = t_{\mathrm{prompt}} + t_{\mathrm{gen}} + t_{\mathrm{data}} + t_{\mathrm{vet}}

— login and prompting, AI generation plus the orchestration a human shepherds, provisioning the test data at least once, and vetting the generated cases before they are trusted. Counted in full, $t_{\mathrm{setup}}$ can approach the manual authoring time for a scenario. That does not by itself erase the saving, because it is paid once per scenario and spread over every case that scenario generates. AI creation pays off only once enough cases amortise the setup:

\frac{k\,t_{\mathrm{setup}}}{N} \;<\; (1-m)\,\big[\,t_{\mathrm{author}} - (t_{\mathrm{review}} + p\,t_{\mathrm{correct}})\,\big]

The right side is the per-case advantage of reviewing over authoring, scaled by the share $(1-m)$ of cases the AI actually produces (missed cases are authored by hand either way and cancel); the left is the setup spread across cases. The deeper payoff is reuse on two levels: the setup amortises over cases in creation, and the resulting automated tests amortise again over cycles in execution (§12). The heavier the setup, the more the value depends on running the tests many times rather than on creating them once.

Data provisioning needs one caution. Provisioned once, it is a setup cost as above; but when test data is consumed or deleted between runs, provisioning recurs, and that recurring share belongs in the per-cycle execution cost (§12), not the one-time setup. For stable regression where the data persists, the recurring share is zero.
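Whether a given setup is repaid can be checked directly by comparing the full AI-assisted total of (2) with the manual total of (1). The sketch below does exactly that; the function name and the two sample scenarios are hypothetical illustrations, and the defaults are the placeholder inputs from §3.

```python
def ai_creation_pays_off(n_cases, k, t_setup, t_author=20, t_review=4,
                         p=0.30, t_correct=6, m=0.10):
    """True when total AI-assisted creation time is below manual authoring time."""
    t_ai = (k * t_setup
            + (1 - m) * n_cases * (t_review + p * t_correct)
            + m * n_cases * t_author)
    return t_ai < n_cases * t_author


# Light setup shared by many cases per scenario.
print(ai_creation_pays_off(n_cases=400, k=20, t_setup=10))
# Heavy setup with only one case per scenario.
print(ai_creation_pays_off(n_cases=20, k=20, t_setup=30))
```

The first call reports a saving and the second a loss, which is the amortisation effect described above: the same tool wins or loses depending on how many cases share each setup.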

Why this was chosen

Review and rework are the real residual costs of AI generation. Modelling them explicitly keeps the saving honest rather than assuming the AI output is used as-is.

Alternatives considered

Pure generation, no review. Rejected — unreviewed AI cases are not safe to run and overstate the saving.
Flat “X% faster” factor. Rejected — it hides the dependence on miss rate and rework, the variables that actually move the result.

4. Creation Savings and the Accuracy Ceiling

Formula

S_{\mathrm{creation}} = \frac{T_{\mathrm{manual}} - T_{\mathrm{AI}}}{T_{\mathrm{manual}}} = 1 - \frac{T_{\mathrm{AI}}}{T_{\mathrm{manual}}}

(3)

e = \frac{t_{\mathrm{review}} + p\,t_{\mathrm{correct}}}{t_{\mathrm{author}}}

(4)

S_{\mathrm{creation}} \approx (1-m)(1-e)

(5)

Symbol	Meaning	Default / units
$S_{\mathrm{creation}}$	Fraction of creation man-hours saved	output
$e$	Residual effort ratio — share of manual effort an AI case still costs	derived

From definition to closed form

Equation (3) is the definition of savings; (5) is what it becomes once the costs are substituted in. Writing the cost ratio and dividing each term of the numerator by $N\,t_{\mathrm{author}}$ :

\frac{T_{\mathrm{AI}}}{T_{\mathrm{manual}}} = \frac{k\,t_{\mathrm{setup}}}{N\,t_{\mathrm{author}}} + (1-m)\,\frac{t_{\mathrm{review}} + p\,t_{\mathrm{correct}}}{t_{\mathrm{author}}} + m \;\approx\; (1-m)\,e + m

The middle ratio is exactly $e$ from (4); the first term — the one-time setup spread across all $N$ cases — is about 2.5% at the defaults, small enough to drop. Substituting back and factoring out $(1-m)$ gives (5): the saving is coverage $(1-m)$ times per-case efficiency $(1-e)$ — the product that makes the accuracy ceiling visible.

Key property

Savings are bounded by accuracy, not speed. If the AI misses 20% of cases, the saving cannot exceed 80% no matter how fast review becomes.

Example

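With $e = (4 + 0.3{\times}6)/20 = 0.29$ and $m = 0.10$ , the closed form (5) gives $S_{\mathrm{creation}} \approx (1-0.10)(1-0.29) \approx$ 64%. Subtracting the 2.5% setup share that (5) drops gives the exact figure from (3), 61%: a saving of 81.9 of 133.3 man-hours.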
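The gap between the definition (3) and the closed form (5) is easy to check numerically. The following is a minimal sketch under the default inputs; the function names are hypothetical.

```python
def creation_saving_exact(n_cases=400, k=20, t_setup=10, m=0.10,
                          t_review=4, p=0.30, t_correct=6, t_author=20):
    """Equation (3): saving computed from the full totals, setup included."""
    t_manual = n_cases * t_author
    t_ai = (k * t_setup
            + (1 - m) * n_cases * (t_review + p * t_correct)
            + m * n_cases * t_author)
    return 1 - t_ai / t_manual


def creation_saving_closed_form(m=0.10, t_review=4, p=0.30, t_correct=6,
                                t_author=20):
    """Equations (4) and (5): coverage times per-case efficiency, setup dropped."""
    e = (t_review + p * t_correct) / t_author
    return (1 - m) * (1 - e)


exact = creation_saving_exact()
closed = creation_saving_closed_form()
print(f"exact: {exact:.1%}, closed form: {closed:.1%}, "
      f"setup share: {closed - exact:.1%}")
```

The difference between the two results is the setup term spread over all cases, so the closed form always sits slightly above the exact saving.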

Setup time and what to automate first

The setup term does not touch the execution break-even — that is purely an execution quantity. What it decides is whether the creation stage pays at all, a per-scenario question: a scenario's one-time setup must be repaid by the cases it yields. Per scenario the saving is:

S_{\mathrm{scenario}}(c) = (1-m)(1-e) - \frac{t_{\mathrm{setup}}}{c\,t_{\mathrm{author}}}

The first term is the setup-free ceiling — about 64% at the defaults; the second is the setup spread over the $c$ cases that share it. Setup breaks even when the two are equal:

c^{\star} = \frac{t_{\mathrm{setup}}}{(1-m)(1-e)\,t_{\mathrm{author}}}

At the defaults the denominator is about 12.8 minutes, so $c^{\star}$ is roughly 0.8, 1.6 and 2.3 cases for a $t_{\mathrm{setup}}$ of 10, 20 and 30 minutes. Below that the scenario loses; above it the saving climbs toward the ceiling.

Per-scenario creation saving, by cases yielded (rows) and setup time (columns)
Cases $c$	10 min	20 min	30 min
1	+14%	−36%	−86%
2	+39%	+14%	−11%
5	+54%	+44%	+34%
10	+59%	+54%	+49%
20	+61%	+59%	+56%
50	+63%	+62%	+61%

**Figure 1.** Creation saving per scenario against the number of cases a scenario yields, for three setup times. A heavy setup (30 min) sits in the shaded loss zone until the scenario yields about two to three cases; a light one (10 min) turns positive almost at once. All three climb toward the same 64% setup-free ceiling.
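The per-scenario table can be regenerated from the two formulas above. This is a sketch with hypothetical function names, using the default $m = 0.10$ , $e = 0.29$ and $t_{\mathrm{author}} = 20$ min.

```python
def scenario_saving(c, t_setup, m=0.10, e=0.29, t_author=20):
    """Per-scenario creation saving for a scenario that yields c cases."""
    return (1 - m) * (1 - e) - t_setup / (c * t_author)


def breakeven_cases(t_setup, m=0.10, e=0.29, t_author=20):
    """Cases a scenario must yield before its setup is repaid (c-star)."""
    return t_setup / ((1 - m) * (1 - e) * t_author)


print("cases  10min  20min  30min")
for c in (1, 2, 5, 10, 20, 50):
    row = "  ".join(f"{scenario_saving(c, s):+5.0%}" for s in (10, 20, 30))
    print(f"{c:>5}  {row}")

for s in (10, 20, 30):
    print(f"setup {s} min: break-even at {breakeven_cases(s):.2f} cases")
```

Swapping in measured setup and authoring times turns this into a quick ranking tool for which scenarios to automate first.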

Automate the low-hanging scenarios first: low setup, many cases. They pay back immediately and fund the harder ones later.

This is why the order of automation matters as much as the decision to automate. Front-load the scenarios that yield early gains; defer the complex, low-yield ones until the suite — and the team's setup speed — has matured (§9). Chasing the hardest scenarios first is the most common way a promising automation effort posts a loss in its opening project.

5. Manual Execution Cost

Formula

H_{\mathrm{manual}} = R\,N\,t_{\mathrm{exec}}

(6)

Symbol	Meaning	Default / units
$H_{\mathrm{manual}}$	Execution man-hours if every case is run by hand each cycle	output
$R$	Execution cycles (regression runs)	12
$t_{\mathrm{exec}}$	Manual execution time per case, per cycle	12 min

Meaning

The recurring baseline: a human re-runs every case, every regression cycle. This cost is paid $R$ times, which is what makes execution the dominant term over a project.

Example

$H_{\mathrm{manual}} = 12 \times 400 \times 12 = 57{,}600$ minutes $=$ 960 man-hours.

Assumption

Every case is executed every cycle. Suites that run only a subset per cycle can scale $R$ per case, or use an effective average.

6. Automated Execution Cost

Formula

H_{\mathrm{auto}} = a\,N\,t_{\mathrm{script}} + R\big[\,a\,N\,(t_{\mathrm{triage}} + \mu\,t_{\mathrm{maint}}) + (1-a)\,N\,t_{\mathrm{exec}}\,\big]

(7)

Symbol	Meaning	Default / units
$H_{\mathrm{auto}}$	Execution man-hours with automation	output
$a$	Automation coverage — share of cases scripted	70%
$t_{\mathrm{script}}$	Scripting time per automated case (one-time)	60 min
$t_{\mathrm{triage}}$	Triage time per automated case, per cycle	1 min
$\mu$	Scripts needing maintenance per cycle	8%
$t_{\mathrm{maint}}$	Maintenance time per affected script	20 min

Meaning

Automation front-loads a large scripting cost, then runs for almost nothing each cycle. The unattended machine run itself costs zero man-hours; the only recurring human costs are triaging results, repairing broken scripts, and executing the cases that were never automated.

Example

Upfront scripting $0.70 \times 400 \times 60 = 16{,}800$ ; per cycle: automated upkeep $280 \times 2.6 = 728$ plus manual remainder $0.30 \times 400 \times 12 = 1{,}440$ , so $2{,}168$ per cycle. $H_{\mathrm{auto}} = 16{,}800 + 12 \times 2{,}168 = 42{,}816$ min $=$ 713.6 man-hours (of which 280 h is one-time scripting).
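Equations (6) and (7) can be sketched side by side as follows. The function names are hypothetical and the defaults are the illustrative inputs from the tables in §§5 and 6.

```python
def manual_execution_minutes(cycles=12, n_cases=400, t_exec=12):
    """Equation (6): every case run by hand in every cycle."""
    return cycles * n_cases * t_exec


def automated_execution_minutes(cycles=12, n_cases=400, a=0.70, t_script=60,
                                t_triage=1, mu=0.08, t_maint=20, t_exec=12):
    """Equation (7): one-time scripting plus per-cycle upkeep and manual remainder."""
    scripting = a * n_cases * t_script                      # paid once
    upkeep = a * n_cases * (t_triage + mu * t_maint)        # per cycle
    manual_rest = (1 - a) * n_cases * t_exec                # per cycle
    return scripting + cycles * (upkeep + manual_rest)


h_manual = manual_execution_minutes() / 60
h_auto = automated_execution_minutes() / 60
print(f"manual: {h_manual:.1f} h, automated: {h_auto:.1f} h")
```

Separating `scripting` from the two per-cycle terms mirrors the structure of (7) and makes the payback dynamic visible when `cycles` is varied.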

Why this was chosen

Separating one-time scripting from per-cycle upkeep is what reveals the payback dynamic. Folding them together would hide why automation loses at low cycle counts and wins at high ones.

Alternatives considered

Zero per-cycle cost. Rejected — ignores triage and maintenance, the costs that quietly erode automation ROI.
Maintenance as a one-time cost. Rejected — scripts break as the application changes, so upkeep recurs each cycle.

7. Break-Even Cycles

Formula

B = \frac{t_{\mathrm{script}}}{t_{\mathrm{exec}} - t_{\mathrm{triage}} - \mu\,t_{\mathrm{maint}}}

(8)

Symbol	Meaning	Default / units
$B$	Break-even cycles — cycles for one script to repay its scripting cost	derived

Meaning

The number of cycles needed for one automated script to repay its scripting cost. The manual cases and the coverage fraction appear identically on both sides of the comparison and cancel out. Break-even is fundamentally an amortisation question: how many cycles must share the one-time cost before the amortised figure drops below the recurring manual expense.

Key property

Break-even is independent of test volume and automation coverage.

Example

$B = 60 / (12 - 1 - 1.6) =$ 6.4 cycles. Past this point the suite has repaid its scripting cost and every further cycle widens the gap against manual.

Failure case

If $t_{\mathrm{exec}} - t_{\mathrm{triage}} - \mu\,t_{\mathrm{maint}} \le 0$ , automated upkeep costs at least as much as a manual run and automation never pays back. The model flags this explicitly.
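A small sketch of (8) with the failure case handled explicitly is shown below. Returning `None` for "never pays back" is a choice made for this illustration, not something the model prescribes; the function name is hypothetical.

```python
def break_even_cycles(t_script=60, t_exec=12, t_triage=1, mu=0.08, t_maint=20):
    """Equation (8): cycles for one script to repay its scripting cost.

    Returns None when the per-cycle gain is zero or negative, in which
    case automation never pays back.
    """
    per_cycle_gain = t_exec - t_triage - mu * t_maint
    if per_cycle_gain <= 0:
        return None
    return t_script / per_cycle_gain


print(break_even_cycles())               # default inputs, about 6.4 cycles
print(break_even_cycles(t_exec=2))       # upkeep exceeds a manual run
```

Note that neither the number of cases nor the coverage appears among the arguments, which reflects the volume-independence property stated above.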

8. Combined Program Savings

Formula

H_{\mathrm{old}} = N\,t_{\mathrm{author}} + R\,N\,t_{\mathrm{exec}}

(9)

H_{\mathrm{new}} = T_{\mathrm{AI}} + H_{\mathrm{auto}}

(10)

S_{\mathrm{program}} = \frac{H_{\mathrm{old}} - H_{\mathrm{new}}}{H_{\mathrm{old}}}

(11)

Symbol	Meaning	Default / units
$H_{\mathrm{old}}$	Total man-hours if everything stays manual	output
$H_{\mathrm{new}}$	Total man-hours with AI plus automation	output
$S_{\mathrm{program}}$	Fraction of total man-hours saved over the program	output

Worked example

Stage	Manual (h)	New (h)	Saved (h)	Saved (%)
Creation (one-time)	133.3	51.5	81.9	61%
Execution (12 cycles)	960.0	713.6	246.4	26%
Combined program	1,093.3	765.1	328.3	30%

Execution shows only 26% at 12 cycles because upfront scripting and the manual remainder hold it down. Raise the cycle count or the coverage and the figure climbs steeply — the single largest lever in the model is $R$ .

Combined program saving across a range of cycle counts
Cycles $R$	Manual (h)	New (h)	Combined saved
1	213	368	−72%
4	453	476	−5%
6	613	548	+11%
12	1,093	765	+30%
24	2,053	1,199	+42%
48	3,973	2,066	+48%
100	8,133	3,945	+52%

The rows are not multiples of the first because each total splits into a one-time cost plus a per-cycle cost: $\text{Manual} = 133.3 + 80R$ and $\text{New} = 331.5 + 36.1R$ hours. Only the per-cycle term scales. The one-time creation (133.3 h manual; 51.5 h of AI creation plus 280 h of upfront scripting for the new process) is paid once, not once per cycle.
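The cycle table follows directly from (9) to (11). The sketch below recomputes it from the raw inputs rather than from the rounded hourly coefficients; the function name is hypothetical and the defaults are the illustrative values used throughout.

```python
def program_hours(cycles, n_cases=400, t_author=20, t_exec=12,
                  k=20, t_setup=10, m=0.10, t_review=4, p=0.30, t_correct=6,
                  a=0.70, t_script=60, t_triage=1, mu=0.08, t_maint=20):
    """Return (old_hours, new_hours, saving_fraction) per equations (9)-(11)."""
    old = n_cases * t_author + cycles * n_cases * t_exec
    t_ai = (k * t_setup
            + (1 - m) * n_cases * (t_review + p * t_correct)
            + m * n_cases * t_author)
    h_auto = (a * n_cases * t_script
              + cycles * (a * n_cases * (t_triage + mu * t_maint)
                          + (1 - a) * n_cases * t_exec))
    new = t_ai + h_auto
    return old / 60, new / 60, (old - new) / old


for r in (1, 4, 6, 12, 24, 48, 100):
    old_h, new_h, saving = program_hours(r)
    print(f"R={r:>3}  manual={old_h:6.0f} h  new={new_h:6.0f} h  saved={saving:+.0%}")
```

Because only the per-cycle terms scale with `cycles`, the saving starts negative and rises as the one-time costs are shared across more runs.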

**Figure 2.** Combined program saving against execution cycles. The curve is negative below about four to five cycles, because automating the whole program costs more than staying manual until then; it then climbs steeply and flattens toward a ceiling near 55%. This combined break-even (≈ 4.5 cycles) sits below the execution break-even $B = 6.4$ of (8), because the one-time creation saving gives the program a head start.

The steep early rise is the point: most of the value is won in the first ten to twenty cycles, which is why $R$ — not coverage or per-case efficiency — is the lever that moves the result most.

9. Learning Curve Extension (optional)

The base model holds every per-unit time constant. In reality, constructive work speeds up with repetition — most sharply for automation scripting, where the first scripts build reusable scaffolding that later scripts inherit. TAME models this with a power-law learning curve that switches off cleanly, reducing exactly to the base model.

Formula

t_i = t_1\,i^{-b}, \qquad b = -\log_2 L

(12)

T_{\mathrm{learn}}(Q) \approx t\,n_0^{\,b}\,\frac{(n_0+Q)^{1-b} - n_0^{\,1-b}}{1-b}

(13)

Symbol	Meaning	Default / units
$L$	Learning rate — time multiplier per doubling	85% / 95%
$b$	Learning exponent, $b=-\log_2 L$	derived
$t_i$	Time to perform the $i$ -th repetition of a learnable activity	derived
$Q$	Number of new repetitions in this project	input
$n_0$	Experience offset — repetitions already completed	prior
$t$	Current per-unit time at the team's present skill	input

Meaning

Each repetition of a learnable activity is a little faster than the last, following a constant percentage improvement per doubling of cumulative experience. The offset $n_0$ says how far up that curve the team already sits — so the entered rate keeps its natural meaning: what the task costs us now.

Parameters

$L$ — learning rate: the time multiplier per doubling. Default 85% for scripting, 95% for authoring and review.
$n_0$ — experience offset: small = cold start (new framework); large = seasoned team (≈ flat).
$L = 100\%\ (b=0)$ — reduces to $T_{\mathrm{learn}} = t \times Q$ : the base model exactly. Learning is off by default.

Where it applies

Scripting — primary. Strongest and best-documented learning effect.
Authoring and review — optional, mild (high $L$ ), applied to both the manual baseline and the AI path so savings are not inflated.
Execution, triage, maintenance — flat. Mechanical and bounded; learning is negligible.

Example

Scripting at $t = 60$ min, $L = 85\%$ (so $b = 0.234$ ), $n_0 = 40$ , over $Q = 280$ scripts gives $T_{\mathrm{learn}} \approx 12{,}260$ min $\approx$ 204 man-hours, against 280 h at a flat rate. The combined program saving rises from 30% to ≈ 37%.
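The cumulative form (13) can be evaluated with a short function. This is a sketch only: the function name is hypothetical, and it assumes $b \ne 1$ (that is, $L \ne 50\%$ ), where the closed form would need a logarithm instead.

```python
import math


def learning_total_minutes(t, q, n0, learning_rate):
    """Equation (13): total time for q new repetitions with power-law learning.

    t is the current per-unit time, n0 the repetitions already completed,
    learning_rate the time multiplier per doubling (1.0 switches learning off).
    """
    b = -math.log2(learning_rate)
    if b == 0:
        return t * q          # learning off: the base model exactly
    return t * n0 ** b * ((n0 + q) ** (1 - b) - n0 ** (1 - b)) / (1 - b)


with_learning = learning_total_minutes(t=60, q=280, n0=40, learning_rate=0.85)
flat = learning_total_minutes(t=60, q=280, n0=40, learning_rate=1.0)
print(f"with learning: {with_learning / 60:.0f} h, flat: {flat / 60:.0f} h")
```

With the example inputs this lands at about 204 man-hours against 280 at a flat rate, and setting the learning rate to 100% returns the flat figure unchanged.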

**Figure 3.** Power-law learning. Scripting (85%) compounds through reuse, falling to about 27% of first-unit time over eight doublings of cumulative experience (256 units); authoring and review (95%) have little reusable artifact, so per-unit time bends only to about 66%. The gap (scripting steeper than authoring) is the substantive claim, not the exact percentages.

Why a power law was chosen

Repeated constructive work has followed a power law since Wright (1936): a straight line on log–log axes, a constant percentage improvement per doubling of experience. This matches how a scripting framework actually matures — early effort builds shared structure, later effort reuses it.

Learning rewards repetition. Scripting repeats the most, so it learns the most.

10. Effort Modifiers: Seniority, Client Process, Tool Proficiency

The base model treats each per-unit time as a fixed average. Three real-world factors shift those times systematically: who does the work (seniority), where it is done (the client's process), and how well the team knows the tools. TAME folds them in as dimensionless multipliers on the base times, each defaulting to 1 so the base model is the neutral case.

Formula

t_x^{\mathrm{eff}} = \sigma_x\,\rho_x\,\kappa_x\,t_x

(14)

Symbol	Meaning	Default / units
$\sigma$	Seniority factor — skill of who performs the activity (< 1 senior, > 1 junior)	0.7–1.4
$\rho$	Client-process factor — governance overhead per unit of work	1.0–1.8
$\kappa$	Tool-proficiency factor (AI and automation)	0.7–1.5
$t_x^{\mathrm{eff}}$	Effective time for activity $x$ after modifiers	derived

Every per-unit time $t$ in §§2–8 is read as its effective value $t^{\mathrm{eff}}$ . The factors apply per activity; an activity a factor does not touch simply takes 1.

Where each factor applies
Activity	$\sigma$ seniority	$\rho$ process	$\kappa$ tools
Manual authoring	✓	✓	—
AI review	✓	◐	$\kappa_{\mathrm{AI}}$
Rework	✓	—	$\kappa_{\mathrm{AI}}$
Prompt / setup	◐	—	$\kappa_{\mathrm{AI}}$
Manual execution	◐	✓✓	—
Scripting	✓	—	$\kappa_{\mathrm{auto}}$
Triage	✓	—	$\kappa_{\mathrm{auto}}$
Maintenance	✓	—	$\kappa_{\mathrm{auto}}$
Automated run	—	◐	—

✓✓ strong · ✓ applies · ◐ minor · — none

Seniority ( $\sigma$ )

The skill level of whoever performs the activity. $\sigma < 1$ is faster (senior), $\sigma > 1$ is slower (junior). The effect is strongest on judgement-heavy work — review, rework, scripting — and weakest on mechanical execution. Typical range 0.7–1.4. $\sigma$ scales man-hours only; to convert to cost, weight each activity's effective hours by the loaded rate of the role that performs it.

Client process ( $\rho$ )

The client's governance overhead per unit of work: environment access, approvals, evidence capture, traceability, sign-offs. $\rho = 1$ is a lightweight or agile client; $\rho$ rises toward ~1.8 in heavily regulated programs. The asymmetry is the important part: a human pays $\rho$ on every manual run, while an automated run emits its logs and audit trail automatically, so $\rho$ barely touches it.

Governance is paid on every manual run, but essentially once by automation.

So high process maturity multiplies the automation advantage — central for regulated programs such as SAP finance, where every manual execution carries mandatory documentation. Concretely, $\rho$ rises with the compliance regime; common ones include:

SOX §404 — IT general controls on financial-reporting systems such as SAP FICO; every change is documented, tested, and signed off.
FDA 21 CFR Part 11 with GAMP 5 — computerized-system validation in life sciences (IQ/OQ/PQ protocols, full traceability), the heaviest common regime.
PCI-DSS — controlled testing and evidence for payment-card systems.
GDPR / HIPAA — data handling, masking, and privacy controls in test environments.
SOC 1 / SOC 2 and SR 11-7 — audit and model-risk controls in banking and financial services.
IEC 62304 / DO-178C / ISO 26262 — verification rigour in medical, avionics, and automotive software.

Tool proficiency ( $\kappa$ )

How well the team knows the specific tools — the AI generator ( $\kappa_{\mathrm{AI}}$ ) and the automation framework ( $\kappa_{\mathrm{auto}}$ ). It works through two channels. Time: $\kappa$ multiplies tool-mediated times. Quality: a fluent team prompts better and writes sturdier scripts, lowering the rework, miss, and maintenance fractions:

p_{\mathrm{eff}} = \kappa_{\mathrm{AI}}\,p, \qquad m_{\mathrm{eff}} = \kappa_{\mathrm{AI}}\,m, \qquad \mu_{\mathrm{eff}} = \kappa_{\mathrm{auto}}\,\mu

(15)

$\kappa$ is today's static proficiency; the learning curve (§9) is its trajectory. For scripting, set $\kappa_{\mathrm{auto}}$ for the current rate and let $L$ and $n_0$ describe how fast it improves from there. Do not count the same gain in both.

Why multiplicative

The factors compound rather than add: a senior expert at a low-governance client is fast on every count, and the effects stack proportionally to task size. Multiplicative factors keep each cause separable and auditable, and collapse to the base model when set to 1.

Example

At a heavily governed client, $\rho \approx 1.6$ on manual execution lifts it to $12 \times 400 \times 12 \times 1.6 = 92{,}160$ minutes $=$ 1,536 man-hours, up from 960. Automated upkeep is almost unchanged, so the program saving rises from 30% to roughly 44% and the effort break-even falls from 6.4 to about 3.6 cycles.
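The governed example can be reproduced with a small sketch. It rests on two assumptions that are choices of this illustration: $\rho$ is applied to every manual execution, including the un-automated remainder inside the new process, and the AI creation total is taken as the 3,088 minutes from the §3 example. The function name is hypothetical.

```python
def governed_program(rho_exec=1.0, cycles=12, n_cases=400, t_author=20,
                     t_exec=12, t_ai_minutes=3088, a=0.70, t_script=60,
                     t_triage=1, mu=0.08, t_maint=20):
    """Return (manual_exec_hours, program_saving, effort_break_even)."""
    t_exec_eff = rho_exec * t_exec                 # equation (14), rho only
    manual_exec = cycles * n_cases * t_exec_eff
    old = n_cases * t_author + manual_exec
    upkeep = t_triage + mu * t_maint               # assumed untouched by rho
    h_auto = (a * n_cases * t_script
              + cycles * (a * n_cases * upkeep
                          + (1 - a) * n_cases * t_exec_eff))
    new = t_ai_minutes + h_auto
    return manual_exec / 60, (old - new) / old, t_script / (t_exec_eff - upkeep)


for rho in (1.0, 1.6):
    hours, saving, b = governed_program(rho)
    print(f"rho={rho}: manual exec {hours:.0f} h, saving {saving:.0%}, "
          f"break-even {b:.1f} cycles")
```

Only the manual execution time is scaled, which is why a single modifier moves both the saving and the break-even at once.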

11. The Complete Model: Base Plus Extensions

Having developed each term on its own — the base times (§§2–8), the learning curve (§9), and the effort modifiers (§10) — this section reassembles them into the complete form previewed in §1. The base is the starting point, and the learning and context terms are extensions switched on top of it.

The master time

Every time that appears anywhere in TAME is, in full generality, one expression combining the intrinsic cost, the context modifiers of §10, and the learning term of §9:

\tau_x(i) = \sigma_x\,\rho_x\,\kappa_x \cdot t_x \cdot i^{-b_x}

(16)

Symbol	Meaning	Default / units
$\tau_x(i)$	Effective time for the $i$ -th unit of activity $x$	derived
$t_x$	Intrinsic time for activity $x$ — the neutral, first-unit cost	base
$b_x$	Learning exponent for activity $x$ , $b_x = -\log_2 L_x$	derived

Three independent influences: intrinsic cost (the neutral first-unit time), context (who does it, where, how skilled — each 1 in the neutral case), and experience (the learning discount; $b_x = 0$ for activities that do not learn).

From per-unit time to the totals

The master time is per repetition. The stage totals need the average per-unit time across a whole batch — the modifiers times the learning-integrated base:

\bar\tau_x = \frac{1}{n}\sum_i \tau_x(i) = \sigma_x\,\rho_x\,\kappa_x \cdot \frac{1}{n}\sum_i t_x\,i^{-b_x}

(17)

The inner sum is the cumulative learning form of (13); with learning off it equals the count times the intrinsic time, so the batch-effective time reduces to the modifiers times $t_x$ . This is what enters the totals.

The master totals

T_{\mathrm{create}} = k\,\tau_{\mathrm{setup}} + (1-m)\,N\,(\tau_{\mathrm{review}} + p\,\tau_{\mathrm{correct}}) + m\,N\,\tau_{\mathrm{author}}

(18)

H_{\mathrm{exec}} = a\,N\,\tau_{\mathrm{script}} + R\big[\,a\,N\,(\tau_{\mathrm{triage}} + \mu\,\tau_{\mathrm{maint}}) + (1-a)\,N\,\tau_{\mathrm{exec}}\,\big]

(19)

These are (2) and (7) with every constant time replaced by its batch-effective value. The quality terms travel the same way — $p$ , $m$ , $\mu$ carry the tool-proficiency channel of (15).

Consistency: the neutral case recovers the base

Setting $\sigma_x = \rho_x = \kappa_x = 1$ and $b_x = 0$ , the master time collapses to the intrinsic time:

\tau_x(i) = (1)(1)(1) \cdot t_x \cdot i^{0} = t_x

(20)

and the complete totals (18) and (19) return exactly to the constant-rate equations (2) and (7). The base is the complete model with its extensions switched off — precisely as the one-line preview promised.
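The consistency claim can be checked with a direct transcription of (16) and (17). This is a sketch with hypothetical function names; the batch average here sums the per-unit times exactly rather than using the integral approximation of (13).

```python
import math


def master_time(i, t_x, sigma=1.0, rho=1.0, kappa=1.0, learning_rate=1.0):
    """Equation (16): effective time for the i-th unit of an activity."""
    b = -math.log2(learning_rate)
    return sigma * rho * kappa * t_x * i ** (-b)


def batch_average(n, t_x, **modifiers):
    """Equation (17): mean per-unit time over units 1..n."""
    return sum(master_time(i, t_x, **modifiers) for i in range(1, n + 1)) / n


print(master_time(5, 60))                          # neutral case: the base time
print(master_time(2, 60, learning_rate=0.85))      # one doubling at 85%
print(batch_average(10, 12, rho=1.6))              # governed manual execution
```

With every modifier at 1 and learning off, both functions return the intrinsic time, which is the neutral-case reduction of (20).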

One formula, different inputs → different savings
Setting	Inputs changed from neutral	Saving
Neutral baseline	all $\sigma,\rho,\kappa = 1$ ; learning off	30%
+ scripting learning	$L = 85\%$ , $n_0 = 40$	≈ 37%
+ heavy governance	$\rho \approx 1.6$ on manual execution	≈ 44%
+ senior, tool-fluent team	$\sigma \approx 0.85$ , $\kappa \approx 0.8$	higher still

The base model is one configuration. Different inputs give different savings.

Every row uses the identical formula; only the inputs differ. The intended use is for a reader to set the variables to their own situation and read off their own number. The model's job is not to assert a single percentage — it is to be honest about which variables move the result and to let each reader arrive at theirs.

Worked example — a governed SAP FICO program

The modifiers are not academic; they often decide the verdict. Take the same 400-case suite inside a SOX-controlled SAP FICO program. Under §404 IT general controls — and, in life-sciences finance, GAMP 5 validation — every manual test execution carries mandatory documentation: pre-approval, step-by-step evidence, requirement traceability, four-eyes review, and archival. That overhead lands on each manual run, so $\rho \approx 1.6$ on manual execution, while the automated run emits the same evidence as a by-product and barely feels it (§10). Carry that one change through the whole model:

The same suite, neutral vs governed — cost layer on (w = $80/h, C_fix = $20,000, C_cyc = $500), R = 12 cycles
Quantity	Neutral	Governed SAP	+ Senior, fluent
Context modifiers	all 1	$\rho = 1.6$	$\rho{=}1.6,\ \sigma{=}0.85,\ \kappa{=}0.8$
Creation saving	61%	61%	70%
Manual execution	960 h	1,536 h	1,536 h
Effort break-even $B$	6.4 cyc	3.6 cyc	2.3 cyc
Combined saving	30%	44%	53%
Man-hours saved (12 cycles)	328 h	732 h	877 h
Cost-aware break-even $B_{\mathrm{cost}}$	14.1 cyc	7.4 cyc	5.8 cyc
Net value $V(12)$	−$6,288	+$25,968	+$37,838

The governance that punishes every manual run is exactly what makes automation win. In the neutral case automation falls short of breaking even in money: at the planned 12 cycles it loses about $6.3k. The same suite in the governed program flips to +$26k, because the manual baseline it now replaces grew by 60% (960 → 1,536 h) and the cost-aware break-even falls from 14 cycles to 7. Layer on a senior, tool-fluent team ( $\sigma \approx 0.85$ , $\kappa \approx 0.8$ ) and creation saving climbs to 70%, combined to 53%, and net value to +$38k. Note that $\sigma$ and $\kappa$ sharpen creation and the automated upkeep but leave the manual baseline untouched, so they widen the gap rather than create it.

High process maturity multiplies the automation advantage. The heavier the governance on each manual run, the stronger the case to automate it away.

Reproduce it live: open the calculator, switch on the cost layer, and set the context modifiers to $\rho = 1.6$ (governance on manual exec), $\sigma = 0.85$ , and both $\kappa = 0.8$ (AI and automation).

12. Tooling Cost and the Investment Decision

Man-hours are the headline, but the go/no-go for automation also turns on money — the loaded cost of the hours saved, and the tool's own licence and infrastructure cost. This section adds the cost layer and turns the effort break-even of §7 into a money break-even management can act on.

From man-hours to money

\mathrm{Cost} = w\,H + C_{\mathrm{fix}} + C_{\mathrm{cyc}}\,R

(21)

Symbol	Meaning	Default / units
$w$	Blended loaded labour rate (cost per man-hour)	$/h
$H$	Man-hours from the model (manual or new process)	output
$C_{\mathrm{fix}}$	One-time tooling cost (licence, onboarding, setup)	$
$C_{\mathrm{cyc}}$	Per-cycle tooling cost (amortised licence, compute)	$/cycle

One-time $C_{\mathrm{fix}}$ : platform licence onboarding, implementation and CI integration, initial environment build, one-off training. Per-cycle $C_{\mathrm{cyc}}$ : subscription and per-seat licences amortised to a cycle, runner/compute or cloud-grid minutes, and any per-execution vendor charge.

Cost-aware break-even

B_{\mathrm{cost}} = \frac{w\,a\,N\,t_{\mathrm{script}} + C_{\mathrm{fix}}}{w\,a\,N\,(t_{\mathrm{exec}} - t_{\mathrm{triage}} - \mu\,t_{\mathrm{maint}}) - C_{\mathrm{cyc}}}

(22)

The numerator is the total upfront automation investment; the denominator is the per-cycle cost saving net of tooling. With both tooling terms zero, dividing through recovers the effort break-even (8) exactly.

Licence cost raises the break-even two ways: a fixed cost lifts the numerator; a per-cycle cost shrinks the denominator.

The viability test and net value

V(R) = \big[\,w\,a\,N\,(t_{\mathrm{exec}} - t_{\mathrm{triage}} - \mu\,t_{\mathrm{maint}}) - C_{\mathrm{cyc}}\,\big]\,R - \big(w\,a\,N\,t_{\mathrm{script}} + C_{\mathrm{fix}}\big)

(23)

Symbol	Meaning	Default / units
$B_{\mathrm{cost}}$	Cost-aware break-even — cycles to repay the upfront investment	derived
$V(R)$	Net value at R cycles (positive ⇒ worth automating)	derived

$V$ is positive when two conditions hold. First, the per-cycle saving net of tooling (the bracketed term in (23), which is also the denominator of (22)) must be positive; second, the program must run more cycles than $B_{\mathrm{cost}}$ .

Worked example

With $w = \$80/\text{h}$, $C_{\mathrm{fix}} = \$20{,}000$, $C_{\mathrm{cyc}} = \$500$ and the execution defaults: upfront investment $\$42{,}400$; per-cycle saving net of tooling $\$3{,}009$; $B_{\mathrm{cost}} = 42{,}400 / 3{,}009 \approx 14.1$ cycles. The effort break-even was 6.4, so licence and compute more than double it. At a planned 12 cycles, automation shows a net loss of about $\$6{,}300$ ($V(12) \approx -\$6{,}300$); it pays back only beyond about 14 cycles.
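The arithmetic of the worked example can be reproduced in a few lines of Python. This is a sketch under the same illustrative inputs; the constant names are hypothetical and minutes are converted to hours by dividing by 60.

```python
W, A, N = 80.0, 0.7, 400            # rate per hour, coverage, case count
T_SCRIPT, T_EXEC, T_TRIAGE, MU, T_MAINT = 60, 12, 1, 0.08, 20  # minutes, except MU
C_FIX, C_CYC = 20_000.0, 500.0      # one-time and per-cycle tooling cost

upfront = W * A * N * T_SCRIPT / 60 + C_FIX
saving = W * A * N * (T_EXEC - T_TRIAGE - MU * T_MAINT) / 60 - C_CYC


def net_value(cycles):
    """Equation (23): cumulative per-cycle saving minus the upfront investment."""
    return saving * cycles - upfront


print(round(upfront), round(saving), round(upfront / saving, 1), round(net_value(12)))
```

Run as written, this prints `42400 3009 14.1 -6288`, in line with the rounded figures quoted in the text.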

Amortisation: how much to invest

Manual execution is a raw operating expense — the same cost every cycle, capitalising into nothing. Automation scripting is a capital-like outlay — spent once, then spread across every cycle it serves.

A_{\mathrm{manual}} = w\,N\,t_{\mathrm{exec}}

(24)

A_{\mathrm{auto}}(R) = \frac{w\,a\,N\,t_{\mathrm{script}} + C_{\mathrm{fix}}}{R} + g

(25)

g = w\big[\,a\,N\,(t_{\mathrm{triage}} + \mu\,t_{\mathrm{maint}}) + (1-a)\,N\,t_{\mathrm{exec}}\,\big] + C_{\mathrm{cyc}}

The automation cost per cycle is a hyperbola in $R$ : the upfront term shrinks as more cycles share it, so the cost per cycle falls toward the running-cost floor $g$ , while the manual line stays flat. The two cross exactly at the cost-aware break-even:

A_{\mathrm{auto}}(R) < A_{\mathrm{manual}} \quad\Longleftrightarrow\quad R > B_{\mathrm{cost}}

(26)

**Figure 4.** Cost per cycle: a flat manual expense against the amortised automation hyperbola, falling toward the running-cost floor $g$ and crossing the manual line at the cost-aware break-even, ≈ 14.1 cycles in the worked example. Left of the crossing, automation is the more expensive choice.
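The crossing in Figure 4 can be checked numerically. The sketch below evaluates (24), (25) and the floor $g$ for the worked-example inputs; function names are hypothetical and times are assumed to be in minutes.

```python
def manual_cost_per_cycle(w, n, t_exec):
    """Equation (24), with t_exec in minutes and w per hour."""
    return w * n * t_exec / 60


def running_cost_floor(w, a, n, t_exec, t_triage, mu, t_maint, c_cyc):
    """Floor g: triage and maintenance on automated cases, manual runs on the rest, tooling."""
    minutes = a * n * (t_triage + mu * t_maint) + (1 - a) * n * t_exec
    return w * minutes / 60 + c_cyc


def auto_cost_per_cycle(cycles, upfront, floor):
    """Equation (25): upfront investment shared across cycles, plus the floor."""
    return upfront / cycles + floor


manual = manual_cost_per_cycle(80, 400, 12)
floor = running_cost_floor(80, 0.7, 400, 12, 1, 0.08, 20, 500)
for cycles in (4, 12, 14, 15, 30):
    cost = auto_cost_per_cycle(cycles, 42_400, floor)
    print(cycles, round(cost), "cheaper" if cost < manual else "dearer")
```

The gap between the manual line and the floor is the per-cycle saving net of tooling, which is why the crossing point coincides with $B_{\mathrm{cost}}$.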

Manual execution is expensed every cycle; automation is a one-time asset amortised across them.

How many cycles does a project actually run?

R = R_{\mathrm{impl}} + R_{\mathrm{reg}}, \qquad R_{\mathrm{reg}} = f\cdot W

(27)

Symbol	Meaning	Default / units
$R_{\mathrm{impl}}$	Implementation cycles — the SIT and UAT passes during the project	≈ 3–4
$R_{\mathrm{reg}}$	Ongoing regression cycles after go-live	input
$f,\,W$	Regression runs per period; periods the suite is maintained	input

A typical implementation runs only a handful of passes — two system-integration cycles (SIT1, SIT2), a user-acceptance cycle (UAT), and often a pre-go-live dry run — so $R_{\mathrm{impl}}$ is about three to four. Everything beyond is regression. Since the cost-aware break-even was ~14 cycles but implementation supplies only three or four, the investment is not recovered by the project that builds the suite — it is recovered, if at all, by the regression that follows. Automation pays back only when:

R_{\mathrm{impl}} + R_{\mathrm{reg}} \ge B_{\mathrm{cost}} \quad\Longleftrightarrow\quad R_{\mathrm{reg}} \ge B_{\mathrm{cost}} - R_{\mathrm{impl}}

(28)

Automation is rarely repaid by the project that builds it — it is repaid by the regression cycles that follow.
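As a rough sketch of (27) and (28), the regression cycles still owed after implementation, and the number of periods the suite must stay in service, follow directly. The function names and the figure of four regression runs per period are hypothetical.

```python
import math


def regression_cycles_needed(b_cost, r_impl):
    """Equation (28): cycles still owed after the implementation passes."""
    return max(0.0, b_cost - r_impl)


def periods_needed(b_cost, r_impl, runs_per_period):
    """Periods W the suite must be maintained, from R_reg = f * W."""
    return regression_cycles_needed(b_cost, r_impl) / runs_per_period


owed = regression_cycles_needed(14.1, 4)
print(math.ceil(owed))                # whole regression cycles required
print(periods_needed(14.1, 4, 4))     # periods at an assumed four runs per period
```

With the worked-example break-even of 14.1 and four implementation passes, eleven whole regression cycles are needed before the investment is recovered.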

Guiding the decision

Estimate the loaded rate $w$ for the people who run and maintain the suite.
Get the tool economics from the vendor quote: split into one-time $C_{\mathrm{fix}}$ and recurring $C_{\mathrm{cyc}}$ .
Compute the per-cycle saving net of tooling. If zero or negative, stop — no number of cycles redeems it.
Otherwise compute $B_{\mathrm{cost}}$ and compare to the cycles you genuinely expect — implementation passes plus realistic regression, not the horizon you hope for.
Invest only if expected cycles comfortably exceed the break-even, leaving margin for maintenance spikes and coverage that falls short of plan.
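The steps above can be sketched as a single function. The name `automation_decision`, its return strings and the `margin` safety factor of 1.25 are all assumptions made for illustration; the source only asks that expected cycles "comfortably exceed" the break-even.

```python
def automation_decision(w, a, n, t_script, t_exec, t_triage, mu, t_maint,
                        c_fix, c_cyc, expected_cycles, margin=1.25):
    """Walk the decision steps; `margin` is a hypothetical safety factor on break-even."""
    saving = w * a * n * (t_exec - t_triage - mu * t_maint) / 60 - c_cyc
    if saving <= 0:
        return "stop: per-cycle saving is not positive"
    b_cost = (w * a * n * t_script / 60 + c_fix) / saving
    if expected_cycles >= margin * b_cost:
        return f"invest: break-even at {b_cost:.1f} cycles"
    return f"hold: break-even at {b_cost:.1f} cycles leaves too little margin"


# Worked-example inputs; times in minutes, w per hour.
base = dict(w=80, a=0.7, n=400, t_script=60, t_exec=12, t_triage=1, mu=0.08,
            t_maint=20, c_fix=20_000, c_cyc=500)
print(automation_decision(expected_cycles=12, **base))
print(automation_decision(expected_cycles=30, **base))
```

Checking the sign of the saving before dividing mirrors the order of the steps: a non-positive saving makes the break-even meaningless, so it is never computed.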

13. System Behavior

Few cycles ( $R$ below $B$ ). Scripting cost not yet repaid; automation shows a net loss; stay manual.

Many cycles ( $R$ well above $B$ ). Upfront scripting amortizes away; each cycle is near-free; percent saved approaches the coverage limit.

Low coverage (small $a$ ). Most cases still run manually each cycle; saving is capped well below 100% regardless of cycles.
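The three regimes can be illustrated with a short sketch. It assumes automated execution effort has the form implied by the floor $g$ in §12 with the rate and tooling terms removed (one-time scripting plus per-cycle triage, maintenance and the manual remainder); that form and the function names are assumptions here, not a restatement of the model's own equations.

```python
def execution_saving(cycles, a, t_script, t_exec, t_triage, mu, t_maint):
    """Fraction of manual execution effort saved after `cycles` cycles (N cancels out)."""
    manual = t_exec * cycles
    auto = a * t_script + cycles * (a * (t_triage + mu * t_maint) + (1 - a) * t_exec)
    return 1 - auto / manual


def coverage_limit(a, t_exec, t_triage, mu, t_maint):
    """Value the saving approaches as the number of cycles grows without bound."""
    return a * (t_exec - t_triage - mu * t_maint) / t_exec


for cycles in (3, 12, 100):
    print(cycles, round(execution_saving(cycles, 0.7, 60, 12, 1, 0.08, 20), 3))
print("limit", round(coverage_limit(0.7, 12, 1, 0.08, 20), 3))
```

With the illustrative defaults the saving is negative at three cycles, positive at twelve, and bounded by a limit set by coverage and the residual triage and maintenance effort.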

14. Calibration

The defaults are industry-typical priors, not measured truth. Before reporting, sample real cases to fix the parameters that matter most:

$t_{\mathrm{author}}$ and $t_{\mathrm{review}}$ — time a small batch by hand and by review to set the creation ratio.
$m$ — the AI miss rate is the hard ceiling on creation savings and the figure most often forgotten.
$t_{\mathrm{script}}$ and $\mu$ — scripting effort and flaky-test maintenance are where optimistic automation cases break down.
$a$ and $R$ — be honest about how many cases can truly be automated and how often the suite runs.
$L$ and $n_0$ — if learning is enabled, fit them from a script-time log: plot per-script time against cumulative count on log–log axes; the slope is $-b$ (hence $L = 2^{-b}$), and how far in you already are gives $n_0$.
$\sigma,\rho,\kappa$ — set seniority from the staffing mix, the client-process factor from the documentation each run carries, and tool proficiency from the team's familiarity with the tools.
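For the learning parameters, the log–log fit can be done with an ordinary least-squares line. The sketch below uses only the standard library; the function name is hypothetical, the experience offset is taken as already known, and the time log is synthetic and noise-free, generated purely for illustration.

```python
import math


def fit_learning_curve(times, n0=0):
    """Least-squares line through log(time) vs log(cumulative count); returns (b, L)."""
    xs = [math.log(n0 + i) for i in range(1, len(times) + 1)]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    b = -slope                 # the fitted slope is -b
    return b, 2 ** (-b)        # L = 2^(-b)


# Synthetic per-script times (minutes) following an 85% curve; not measured data.
script_times = [60 * i ** math.log2(0.85) for i in range(1, 41)]
b, rate = fit_learning_curve(script_times)
print(round(b, 3), round(rate, 3))
```

On a real log the points will scatter around the line, so the fitted rate should be reported with the sample size it came from.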

15. Scope and Limitations

Man-hours only. Faster wall-clock turnaround from unattended runs is a separate throughput benefit, intentionally excluded.
Linear per-cycle costs. Maintenance is modelled as a steady fraction; it can spike around major releases.
Learning is optional and parametric. When enabled it assumes a single stable learning rate per activity; abrupt tooling changes or heavy turnover are only approximated.
Quality effects excluded. Earlier defect detection and broader coverage have real value but sit outside this effort-based model — treat them as unquantified upside.
Inputs are estimates. The output is only as good as the calibration in §14; present it as a defensible estimate with stated assumptions, not a measurement.

16. Conclusion

TAME shifts the question from an informal speed claim to a measured one: how many man-hours does the new process remove, stage by stage, as the work repeats? The model combines a one-time, accuracy-bounded creation saving; a recurring execution saving that accumulates linearly with cycles; and a volume-independent break-even point. It rests on a simple philosophy:

Manual testing pays by the cycle. Automation pays once, then runs free.

Rationale

This appendix consolidates the reasoning behind every modelling choice. Each entry states the decision, the reasoning, and the main alternative rejected.

Two stages

Separate creation from execution: their cost structures are categorically different — creation incurred once, execution every cycle — so a single blended figure hides where the savings come from. Rejected: a one-number “X% faster.”

Man-hours as the unit

Measure human effort, not calendar time or money. The mandate is effort saved; wall-clock turnaround would inflate the figure if mixed in, and money depends on rates that vary by organisation. Man-hours convert cleanly to cost later by weighting with loaded rates (§12).

Linear in case count and cycles — and why not exponential

Authoring or running one case does not change the cost of the next: no shared computation, no feedback loop, so no compounding mechanism justifies exponential growth. An exponential form here would be curve-fitting without a cause. Linearity is also what makes break-even independent of volume and coverage — a consequence, not a coincidence. The only legitimate departure is the mild sub-linear discount from learning, handled separately (§9).

Rework as an expected value

Rework enters as the expected cost per case. TAME predicts an aggregate over many cases, and by the law of large numbers the expected value is an accurate predictor of a sum — the same logic an insurer uses pricing on expected loss.
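A small sketch shows why the expected value is a reasonable predictor at scale. It assumes each case needs rework independently with the same probability $p$, which is a simplification introduced here; under that assumption the total rework time has a binomial spread that shrinks, relative to the mean, as the case count grows.

```python
import math


def rework_minutes(n_cases, p, t_correct):
    """Mean and standard deviation of total rework time, assuming independent cases."""
    mean = n_cases * p * t_correct
    sd = t_correct * math.sqrt(n_cases * p * (1 - p))
    return mean, sd


for n_cases in (40, 400, 4000):
    mean, sd = rework_minutes(n_cases, 0.30, 6)
    print(n_cases, round(mean), f"{sd / mean:.1%}")
```

The relative spread falls with the square root of the case count, so an estimate that is loose for a batch of 40 is much tighter for 400.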

A missed case costs full authoring time

The conservative choice — a reviewer with context might author a missed case slightly faster, so assuming full cost cannot overstate the saving. Adjust downward only with evidence.

The capped form $(1-m)(1-e)$

Express the creation saving as two bounded factors — coverage and per-case efficiency. This makes explicit that accuracy, not speed, sets the ceiling — the most scrutinised claim in any AI-efficiency pitch. A raw speed-up hides the miss-rate ceiling and was rejected.
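The ceiling is easy to see in code. In this sketch the function name is hypothetical and the residual effort ratio $e = 0.30$ is an assumed input chosen for illustration, not a value taken from the model.

```python
def creation_saving(m, e):
    """Capped form: share of cases the AI covers times the effort saved on each."""
    return (1 - m) * (1 - e)


print(round(creation_saving(0.10, 0.30), 2))  # assumed residual effort of 30%
print(round(creation_saving(0.10, 0.00), 2))  # even with zero residual effort
```

However small $e$ becomes, the saving cannot exceed $1 - m$, so a lower miss rate is the only way to raise the ceiling.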

Why a power law for learning — the alternatives, formally

A learning law must stay positive, reproduce constant-improvement-per-doubling, and be parsimonious. One diagnostic separates the candidates — the ratio of times one doubling apart. For a power law it is constant:

\frac{t(2i)}{t(i)} = \frac{(2i)^{-b}}{i^{-b}} = 2^{-b} = L \quad(\text{constant})

a straight line on log–log axes (Wright, 1936). Exponential to a floor measures improvement against the floor, so the ratio decays with experience and the curve flattens too early — understating long-run gains; kept only as the documented variant when a hard floor exists. Logarithmic is unbounded below — it crosses zero and turns negative, physically impossible; rejected outright. Logistic S-curve needs two hard-to-identify parameters and models slow early improvement, the opposite of the steep initial drop real learning shows; the offset $n_0$ already absorbs any head start. Only the power law stays positive, keeps the per-doubling ratio constant, and does so with one parameter.

Anchoring learning on current skill

Anchor on the team's present rate through an experience offset, not a first-unit time. A “first script ever” time is nearly impossible to estimate and produces absurd deflation across hundreds of units; anchoring on what a script costs now keeps the entered rate meaningful and lets the offset carry how far up the curve the team sits.

Why 85% for scripting and 95% for authoring and review

Which rate fits an activity is governed by one question: how much does performing it once create reusable leverage for the next time? Scripting is constructive with high leverage — the first scripts build the framework (page objects, locator helpers, fixtures, CI wiring) that every later script inherits — exactly the mechanism behind the steep 80–85% curves long reported for construction and tooling. Authoring is cognitive with low leverage — each case targets different functionality, with a floor set by human comprehension speed that repetition cannot compress — pointing to a shallow curve near 95%. The gap, not the exact figures, is the substantive claim: over eight doublings an 85% curve cuts per-unit time to ~27%, a 95% curve only to ~66%. Both are priors to fit from a time log before production.
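The eight-doubling comparison can be verified directly. The sketch below is illustrative only; the function names are hypothetical, and `scale` is simply the time at unit 1 used to draw the curve, not the anchoring the model itself uses.

```python
import math


def time_multiplier(learning_rate, doublings):
    """Per-unit time relative to the start after a number of doublings."""
    return learning_rate ** doublings


def unit_time(i, scale, learning_rate):
    """Power-law curve t(i) = scale * i^(-b) with b = -log2(L)."""
    return scale * i ** math.log2(learning_rate)


print(round(time_multiplier(0.85, 8), 3))   # scripting prior
print(round(time_multiplier(0.95, 8), 3))   # authoring and review prior
print(round(unit_time(200, 60, 0.85) / unit_time(100, 60, 0.85), 3))  # one doubling
```

The last line confirms the constant-ratio property: one doubling apart, the ratio of times equals the learning rate wherever on the curve it is measured.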

Effort modifiers multiply, not add

Each is a proportional effect on the same unit of work, and proportional effects compound — a senior 15% faster at a client 60% heavier lands at $0.85 \times 1.6$ , not $0.85 + 0.6$ . Multiplication keeps each cause separable and auditable and reduces to the base model at 1. Additive overheads fit only genuinely fixed steps, kept as a variant.
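A minimal sketch of the multiplicative rule, with a hypothetical helper name and a 10-minute base time chosen for illustration:

```python
from math import prod


def adjusted_minutes(base_minutes, *factors):
    """Scale a base time by multiplicative modifiers; all-ones leaves it unchanged."""
    return base_minutes * prod(factors)


print(round(adjusted_minutes(10, 0.85, 1.6), 2))  # senior tester, heavy client process
print(adjusted_minutes(10, 1, 1, 1))              # neutral factors recover the base
```

The combined factor is $0.85 \times 1.6 = 1.36$, so the 10-minute task takes about 13.6 minutes, and setting every factor to 1 returns the base model.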

Tool proficiency scales the quality terms too

Fluency with a tool is not only faster but better — a skilled prompter elicits fewer misses and less rework, a skilled engineer writes sturdier scripts. Limiting $\kappa$ to time would miss its largest effect. It is tied to the learning offset so the same improvement is not counted twice.

Why the client-process factor reaches about 1.8

$\rho = 1.8$ means roughly 0.8 hour of documentation, review and sign-off for every hour of testing — high-risk validated environments (GAMP Category 5, SOX-controlled finance) where every execution is pre-approved, evidenced step by step, traced to a requirement, four-eyes reviewed and archived. Reported compliance overhead runs ~+30% for light regimes to +80–100% for the strictest. The asymmetry is what makes the factor matter: automation produces most of this evidence as a by-product of running, so a higher $\rho$ widens the automation advantage rather than narrowing it.

The machine run counts as zero man-hours

The model measures people's time, and a script on a server consumes none. The recurring human costs that remain — triage and maintenance — are modelled explicitly. If runs are supervised in practice, that time belongs in the triage term, not at zero.

Every factor defaults to 1

Every modifier, and the learning switch, defaults to the neutral value, so the base model is recoverable exactly and added realism is opt-in and auditable.

Symbol reference

Every symbol in one place. Defaults are the illustrative values used in the worked examples — placeholders, not recommendations (see §14, Calibration).

Symbol	Meaning	Default / units
Creation inputs
$N$	Number of test cases needed	400
$k$	Scenarios prompted into the AI tool	20
$t_{\mathrm{author}}$	Manual authoring time per case	20 min
$t_{\mathrm{setup}}$	Login and prompt time per scenario	10 min
$t_{\mathrm{review}}$	Review time per AI-generated case	4 min
$p$	Fraction of AI cases needing rework	30%
$t_{\mathrm{correct}}$	Correction time per reworked case	6 min
$m$	AI miss rate — cases authored manually	10%
Execution inputs
$a$	Automation coverage — share of cases scripted	70%
$t_{\mathrm{script}}$	Scripting time per automated case (one-time)	60 min
$t_{\mathrm{exec}}$	Manual execution time per case, per cycle	12 min
$t_{\mathrm{triage}}$	Triage time per automated case, per cycle	1 min
$\mu$	Scripts needing maintenance per cycle	8%
$t_{\mathrm{maint}}$	Maintenance time per affected script	20 min
$R$	Execution cycles (regression runs)	12
Learning curve (extension)
$L$	Learning rate — time multiplier per doubling	85% / 95%
$b$	Learning exponent, $b=-\log_2 L$	derived
$n_0$	Experience offset — units already completed	prior
Effort modifiers (extension)
$\sigma$	Seniority factor — skill of who performs the activity	0.7–1.4
$\rho$	Client-process factor — governance overhead	1.0–1.8
$\kappa$	Tool-proficiency factor (AI and automation)	0.7–1.5
Cost layer (optional)
$w$	Blended loaded labour rate (cost per man-hour)	$/h
$C_{\mathrm{fix}}$	One-time tooling cost (licence, onboarding, setup)	$
$C_{\mathrm{cyc}}$	Per-cycle tooling cost (amortised licence, compute)	$/cycle
Derived quantities and outputs
$e$	Residual effort ratio per AI case	derived
$B,\ B_{\mathrm{cost}}$	Break-even cycles (effort; cost-aware)	derived
$T_{\mathrm{manual}},\,T_{\mathrm{AI}}$	Creation effort (manual baseline; AI plus review)	man-hours
$H_{\mathrm{manual}},\,H_{\mathrm{auto}}$	Execution effort (manual baseline; automated)	man-hours
$S_{\mathrm{creation}},\,S_{\mathrm{program}}$	Fraction of effort saved (creation; whole program)	output

Prepared as a companion to the TAME calculator. Figures recompute live in the model.
