Specforge · Methodology · TAME
TAME — Test Automation Man-hours Estimator

Draft 1

TAME: A Man-Hours Efficiency Model for Test Automation

Test Automation Man-hours Estimator — with application to AI-assisted test creation and automated execution.

Abstract

This paper introduces TAME (Test Automation Man-hours Estimator), a two-stage model for quantifying the human effort saved when a testing process moves from manual work to AI-assisted test creation and automated execution.

Unlike informal “percent faster” claims, TAME measures effort in man-hours and separates two stages whose economics differ fundamentally. The framework integrates:

  • a one-time creation stage, where AI generation plus human review replaces manual authoring;
  • a recurring execution stage, where automated scripts replace manual re-runs each cycle;
  • an accuracy-bounded savings ceiling for creation;
  • a volume-independent break-even point for automation.

The model reports human effort only; the faster wall-clock turnaround of an unattended run is treated as a separate throughput benefit and excluded from the man-hours result.

The two stages also differ in kind, which is the basis of the investment case: manual execution is a raw operating expense, paid in full every cycle and leaving no asset behind, whereas automation scripting is a capital-like outlay, incurred once and amortised across every cycle it serves. Break-even is the point at which that amortised investment falls below the recurring manual expense.

Notation & conventions

Units are minutes unless noted; outputs are man-hours. Symbols are introduced where they first appear — each formula carries a small table of just the symbols it adds, with the illustrative default used in the worked examples. The full list is collected in the Symbol reference at the end.

The defaults are placeholders for explanation only. The savings the model reports depend entirely on these inputs, so substitute your own measured values for every one before drawing any conclusion (see §14, Calibration). Different inputs give different results, by design.

1. Core Philosophy

Most efficiency claims answer one question: how much faster is the new tool? TAME asks a more precise one: how many man-hours does the new process remove, and how does that change as the work repeats?

This distinction matters because creating a test case is paid once, while running it is paid every regression cycle. A model that blends the two hides where the savings actually come from. TAME is built on the principle:

Creation is paid once. Execution is paid every cycle.

The Model in One Line

Before the details, here is the entire model in a single expression. Every time TAME uses — authoring, review, scripting, execution, triage — is a special case of one master form that holds all the variables at once:

\[\tau_x(i) \;=\; \sigma_x\,\rho_x\,\kappa_x \cdot t_x \cdot i^{-b_x}\]

Reading it left to right: \(\sigma\), \(\rho\) and \(\kappa\) are the seniority, client-process and tool-proficiency modifiers; \(t_x\) is the intrinsic time the task takes in the neutral case; and \(i^{-b_x}\) is the learning discount that accrues with repetition \(i\). The document starts from the base, the neutral case where every modifier equals 1 and learning is switched off, so the master form is simply \(t_x\). It then builds each term back in: the base times in §§2–8, the learning term in §9, and the context modifiers in §10, reassembled in full in §11. Leading with the complete form is deliberate: it shows that nothing is omitted, and that the simpler model which follows is this same equation with its extension terms switched off.
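As a reading aid, here is a minimal Python sketch of the master form, assuming times in minutes and a single learning rate per activity. The function name `master_time` and its parameter names are illustrative only; they are not part of TAME itself.

```python
import math

def master_time(t_x, i=1, sigma=1.0, rho=1.0, kappa=1.0, learning_rate=1.0):
    """Effective time for the i-th unit of activity x (the one-line TAME form).

    t_x                intrinsic, neutral first-unit time (minutes)
    i                  repetition index, starting at 1
    sigma, rho, kappa  seniority, client-process and tool-proficiency modifiers
    learning_rate      L, the time multiplier per doubling; b = -log2(L)
    """
    b = -math.log2(learning_rate)    # L = 1.0 gives b = 0: learning switched off
    return sigma * rho * kappa * t_x * i ** (-b)

print(master_time(60))                            # 60.0: the neutral case is just t_x
print(master_time(60, i=2, learning_rate=0.85))   # about 51: 85% of the first unit
print(master_time(12, rho=1.6))                   # about 19.2: governance on a manual run
```

With every modifier at 1 and \(L = 100\%\) the function returns \(t_x\) unchanged, which is the neutral base case the following sections start from.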

2. Manual Creation Cost

Formula

\[T_{\mathrm{manual}} = N\,t_{\mathrm{author}} \tag{1}\]
| Symbol | Meaning | Default / units |
|---|---|---|
| \(T_{\mathrm{manual}}\) | Creation man-hours if every case is authored by hand | output |
| \(N\) | Number of test cases needed | 400 |
| \(t_{\mathrm{author}}\) | Manual authoring time per case | 20 min |

Meaning

The baseline effort to author every test case by hand: the number of cases times the time to write one.

Example

\(T_{\mathrm{manual}} = 400 \times 20 = 8{,}000\) minutes \(=\) 133.3 man-hours.

Assumption

Authoring time per case is treated as an average. In practice it varies by complexity; timing a representative sample of cases is the way to set it.

3. AI-Assisted Creation Cost

Formula

\[T_{\mathrm{AI}} = k\,t_{\mathrm{setup}} + (1-m)\,N\,(t_{\mathrm{review}} + p\,t_{\mathrm{correct}}) + m\,N\,t_{\mathrm{author}} \tag{2}\]
| Symbol | Meaning | Default / units |
|---|---|---|
| \(T_{\mathrm{AI}}\) | Creation man-hours with AI generation plus human review | output |
| \(k\) | Scenarios prompted into the AI tool | 20 |
| \(t_{\mathrm{setup}}\) | One-time setup per scenario: prompting, generation, data, vetting | 10 min |
| \(m\) | AI miss rate: fraction of needed cases the AI fails to produce | 10% |
| \(t_{\mathrm{review}}\) | Review time per AI-generated case | 4 min |
| \(p\) | Fraction of AI cases needing rework | 30% |
| \(t_{\mathrm{correct}}\) | Correction time per reworked case | 6 min |

Meaning

AI does not remove the work; it shifts authoring to review. The tool drafts cases that a human reviews; a fraction need rework; and the cases the AI fails to produce fall back to full manual authoring.

Example

Setup \(k\,t_{\mathrm{setup}} = 200\); reviewed AI cases \(0.90 \times 400 \times (4 + 0.3 \times 6) = 2{,}088\); missed cases \(0.10 \times 400 \times 20 = 800\). Total \(T_{\mathrm{AI}} = 3{,}088\) min \(=\) 51.5 man-hours.
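Equations (1) and (2) can be checked with a short Python sketch at the illustrative defaults above; the helper names `manual_creation` and `ai_creation` are hypothetical, and all times are in minutes.

```python
def manual_creation(N, t_author):
    """Equation (1): minutes to author every case by hand."""
    return N * t_author

def ai_creation(N, t_author, k, t_setup, m, t_review, p, t_correct):
    """Equation (2): setup + review/rework of AI cases + manual authoring of misses."""
    setup = k * t_setup                                  # one-time, per scenario
    reviewed = (1 - m) * N * (t_review + p * t_correct)  # cases the AI produced
    missed = m * N * t_author                            # cases the AI failed to produce
    return setup + reviewed + missed

T_manual = manual_creation(N=400, t_author=20)
T_ai = ai_creation(N=400, t_author=20, k=20, t_setup=10, m=0.10,
                   t_review=4, p=0.30, t_correct=6)
print(T_manual, round(T_manual / 60, 1))   # 8000 133.3
print(round(T_ai), round(T_ai / 60, 1))    # 3088 51.5
```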

What \(t_{\mathrm{setup}}\) really contains

The example treats setup as prompt-and-login time alone, but standing up AI generation for a scenario is more than prompting. In full it is the sum of four one-time costs:

\[t_{\mathrm{setup}} = t_{\mathrm{prompt}} + t_{\mathrm{gen}} + t_{\mathrm{data}} + t_{\mathrm{vet}}\]

The four terms are login and prompting, AI generation plus the orchestration a human shepherds, provisioning the test data at least once, and vetting the generated cases before they are trusted. Counted in full, \(t_{\mathrm{setup}}\) can approach the manual authoring time for a scenario. That does not by itself erase the saving, because it is paid once per scenario and spread over every case that scenario generates. AI creation pays off only once enough cases amortise the setup:

\[\frac{k\,t_{\mathrm{setup}}}{N} \;<\; (1-m)\,\big[\,t_{\mathrm{author}} - (t_{\mathrm{review}} + p\,t_{\mathrm{correct}})\,\big]\]

This is \(T_{\mathrm{AI}} < T_{\mathrm{manual}}\) from (1) and (2), divided by \(N\). The right side is the per-case advantage of reviewing over authoring, earned only on the share \((1-m)\) of cases the AI actually produces; the left is the setup spread across all cases. The deeper payoff is reuse on two levels: the setup amortises over cases in creation, and the resulting automated tests amortise again over cycles in execution (§12). The heavier the setup, the more the value depends on running the tests many times rather than on creating them once.

Data provisioning needs one caution. Provisioned once, it is a setup cost as above; but when test data is consumed or deleted between runs, provisioning recurs, and that recurring share belongs in the per-cycle execution cost (§12), not the one-time setup. For stable regression where the data persists, the recurring share falls to zero.

Why this was chosen

Review and rework are the real residual costs of AI generation. Modelling them explicitly keeps the saving honest rather than assuming the AI output is used as-is.

Alternatives considered

  • Pure generation, no review. Rejected — unreviewed AI cases are not safe to run and overstate the saving.
  • Flat “X% faster” factor. Rejected — it hides the dependence on miss rate and rework, the variables that actually move the result.

4. Creation Savings and the Accuracy Ceiling

Formula

\[S_{\mathrm{creation}} = \frac{T_{\mathrm{manual}} - T_{\mathrm{AI}}}{T_{\mathrm{manual}}} = 1 - \frac{T_{\mathrm{AI}}}{T_{\mathrm{manual}}} \tag{3}\]
\[e = \frac{t_{\mathrm{review}} + p\,t_{\mathrm{correct}}}{t_{\mathrm{author}}} \tag{4}\]
\[S_{\mathrm{creation}} \approx (1-m)(1-e) \tag{5}\]
| Symbol | Meaning | Default / units |
|---|---|---|
| \(S_{\mathrm{creation}}\) | Fraction of creation man-hours saved | output |
| \(e\) | Residual effort ratio: share of manual effort an AI case still costs | derived |

From definition to closed form

Equation (3) is the definition of savings; (5) is what it becomes once the costs are substituted in. Writing out the cost ratio from (1) and (2) and dividing each term by \(T_{\mathrm{manual}} = N\,t_{\mathrm{author}}\):

\[\frac{T_{\mathrm{AI}}}{T_{\mathrm{manual}}} = \frac{k\,t_{\mathrm{setup}}}{N\,t_{\mathrm{author}}} + (1-m)\,\frac{t_{\mathrm{review}} + p\,t_{\mathrm{correct}}}{t_{\mathrm{author}}} + m \;\approx\; (1-m)\,e + m\]

The middle ratio is exactly \(e\) from (4); the first term, the one-time setup spread across all \(N\) cases, is about 2.5% at the defaults (\(200/8{,}000\)), small enough to drop. Substituting back gives \(S_{\mathrm{creation}} \approx 1 - (1-m)\,e - m\), and factoring out \((1-m)\) gives (5): the saving is coverage \((1-m)\) times per-case efficiency \((1-e)\), the product that makes the accuracy ceiling visible.

Key property

Savings are bounded by accuracy, not speed. If the AI misses 20% of cases, the saving cannot exceed 80% no matter how fast review becomes.

Example

With \(e = (4 + 0.3 \times 6)/20 = 0.29\) and \(m = 0.10\): \(S_{\mathrm{creation}} \approx (1-0.10)(1-0.29) \approx 64\%\) before setup; subtracting the 2.5% setup term gives the exact \(1 - 3{,}088/8{,}000 \approx 61\%\), saving 81.9 of 133.3 man-hours.
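The gap between the product form (5) and the definition (3) is exactly the dropped setup share. This Python sketch, using the section's defaults, makes that visible: the closed form gives about 64%, the exact ratio about 61%, and the difference is the 2.5% setup term.

```python
N, t_author, k, t_setup = 400, 20, 20, 10
m, t_review, p, t_correct = 0.10, 4, 0.30, 6

T_manual = N * t_author
T_ai = k * t_setup + (1 - m) * N * (t_review + p * t_correct) + m * N * t_author

e = (t_review + p * t_correct) / t_author     # equation (4): residual effort ratio
exact = 1 - T_ai / T_manual                   # equation (3): definition
closed_form = (1 - m) * (1 - e)               # equation (5): setup term dropped
setup_term = k * t_setup / (N * t_author)     # what (5) leaves out

print(round(e, 2), round(exact, 3), round(closed_form, 3), setup_term)
# 0.29 0.614 0.639 0.025
```

Algebraically, `exact == closed_form - setup_term`, so (5) is an upper bound that tightens as scenarios yield more cases.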

Setup time and what to automate first

The setup term does not touch the execution break-even — that is purely an execution quantity. What it decides is whether the creation stage pays at all, a per-scenario question: a scenario's one-time setup must be repaid by the cases it yields. Per scenario the saving is:

\[S_{\mathrm{scenario}}(c) = (1-m)(1-e) - \frac{t_{\mathrm{setup}}}{c\,t_{\mathrm{author}}}\]

The first term is the setup-free ceiling, about 64% at the defaults; the second is the setup spread over the \(c\) cases that share it. Setup breaks even when the two are equal:

\[c^{\star} = \frac{t_{\mathrm{setup}}}{(1-m)(1-e)\,t_{\mathrm{author}}}\]

At the defaults the denominator is about 12.8 minutes, so \(c^{\star}\) is roughly 0.8, 1.6 and 2.3 cases for a \(t_{\mathrm{setup}}\) of 10, 20 and 30 minutes. Below that the scenario loses; above it the saving climbs toward the ceiling.

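Per-scenario creation saving, by cases yielded (rows) and setup time (columns)

| Cases \(c\) | 10 min | 20 min | 30 min |
|---|---|---|---|
| 1 | +14% | −36% | −86% |
| 2 | +39% | +14% | −11% |
| 5 | +54% | +44% | +34% |
| 10 | +59% | +54% | +49% |
| 20 | +61% | +59% | +56% |
| 50 | +63% | +62% | +61% |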
Figure 1. Creation saving per scenario against the number of cases a scenario yields, for three setup times. A heavy setup (30 min) sits in the loss zone until the scenario yields about two to three cases; a light one (10 min) turns positive almost at once. All three climb toward the same 64% setup-free ceiling.
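The per-scenario table and the break-even counts can be regenerated from \(S_{\mathrm{scenario}}(c)\) and \(c^{\star}\) with a short Python sketch. The function names are hypothetical, and the ceiling uses the defaults \(m = 0.10\), \(e = 0.29\), \(t_{\mathrm{author}} = 20\) min.

```python
m, e, t_author = 0.10, 0.29, 20
ceiling = (1 - m) * (1 - e)          # setup-free ceiling, about 0.639

def scenario_saving(c, t_setup):
    """Per-scenario creation saving when one setup is shared by c cases."""
    return ceiling - t_setup / (c * t_author)

def setup_break_even(t_setup):
    """c*: cases a scenario must yield before its setup is repaid."""
    return t_setup / (ceiling * t_author)

for t_setup in (10, 20, 30):
    row = [f"{scenario_saving(c, t_setup):+.0%}" for c in (1, 2, 5, 10, 20, 50)]
    print(t_setup, round(setup_break_even(t_setup), 1), row)
```

Each printed row is one column of the table, preceded by its \(c^{\star}\); substituting measured values for \(m\), \(e\) and the setup times gives the same table for a real suite.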
Automate the low-hanging scenarios first: low setup, many cases. They pay back immediately and fund the harder ones later.

This is why the order of automation matters as much as the decision to automate. Front-load the scenarios that yield early gains; defer the complex, low-yield ones until the suite — and the team's setup speed — has matured (§9). Chasing the hardest scenarios first is the most common way a promising automation effort posts a loss in its opening project.

5. Manual Execution Cost

Formula

\[H_{\mathrm{manual}} = R\,N\,t_{\mathrm{exec}} \tag{6}\]
| Symbol | Meaning | Default / units |
|---|---|---|
| \(H_{\mathrm{manual}}\) | Execution man-hours if every case is run by hand each cycle | output |
| \(R\) | Execution cycles (regression runs) | 12 |
| \(t_{\mathrm{exec}}\) | Manual execution time per case, per cycle | 12 min |

Meaning

The recurring baseline: a human re-runs every case, every regression cycle. This cost is paid \(R\) times, which is what makes execution the dominant term over a project.

Example

\(H_{\mathrm{manual}} = 12 \times 400 \times 12 = 57{,}600\) minutes \(=\) 960 man-hours.

Assumption

Every case is executed every cycle. Suites that run only a subset per cycle can scale \(R\) per case, or use an effective average.

6. Automated Execution Cost

Formula

\[H_{\mathrm{auto}} = a\,N\,t_{\mathrm{script}} + R\big[\,a\,N\,(t_{\mathrm{triage}} + \mu\,t_{\mathrm{maint}}) + (1-a)\,N\,t_{\mathrm{exec}}\,\big] \tag{7}\]
| Symbol | Meaning | Default / units |
|---|---|---|
| \(H_{\mathrm{auto}}\) | Execution man-hours with automation | output |
| \(a\) | Automation coverage: share of cases scripted | 70% |
| \(t_{\mathrm{script}}\) | Scripting time per automated case (one-time) | 60 min |
| \(t_{\mathrm{triage}}\) | Triage time per automated case, per cycle | 1 min |
| \(\mu\) | Scripts needing maintenance per cycle | 8% |
| \(t_{\mathrm{maint}}\) | Maintenance time per affected script | 20 min |

Meaning

Automation front-loads a large scripting cost, then runs for almost nothing each cycle. The unattended machine run itself costs zero man-hours; the only recurring human costs are triaging results, repairing broken scripts, and executing the cases that were never automated.

Example

Upfront scripting \(0.70 \times 400 \times 60 = 16{,}800\) min. Per cycle: automated upkeep \(280 \times (1 + 0.08 \times 20) = 280 \times 2.6 = 728\), plus the manual remainder \(0.30 \times 400 \times 12 = 1{,}440\), so \(2{,}168\) min per cycle. \(H_{\mathrm{auto}} = 16{,}800 + 12 \times 2{,}168 = 42{,}816\) min \(=\) 713.6 man-hours (of which 280 h is one-time scripting).
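A Python sketch of equations (6) and (7) at the defaults, keeping the one-time scripting and the per-cycle upkeep as separate quantities so the payback dynamic stays visible. Variable names are illustrative; times are in minutes and converted to hours only for display.

```python
R, N, t_exec = 12, 400, 12
a, t_script, t_triage, mu, t_maint = 0.70, 60, 1, 0.08, 20

H_manual = R * N * t_exec                                   # equation (6), minutes

scripting = a * N * t_script                                # one-time, capital-like
per_cycle = a * N * (t_triage + mu * t_maint) + (1 - a) * N * t_exec   # recurring
H_auto = scripting + R * per_cycle                          # equation (7), minutes

print(H_manual / 60, round(scripting / 60), round(per_cycle), round(H_auto / 60, 1))
# 960.0 280 2168 713.6
```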

Why this was chosen

Separating one-time scripting from per-cycle upkeep is what reveals the payback dynamic. Folding them together would hide why automation loses at low cycle counts and wins at high ones.

Alternatives considered

  • Zero per-cycle cost. Rejected — ignores triage and maintenance, the costs that quietly erode automation ROI.
  • Maintenance as a one-time cost. Rejected — scripts break as the application changes, so upkeep recurs each cycle.

7. Break-Even Cycles

Formula

\[B = \frac{t_{\mathrm{script}}}{t_{\mathrm{exec}} - t_{\mathrm{triage}} - \mu\,t_{\mathrm{maint}}} \tag{8}\]
| Symbol | Meaning | Default / units |
|---|---|---|
| \(B\) | Break-even cycles: cycles for one script to repay its scripting cost | derived |

Meaning

The number of cycles needed for one automated script to repay its scripting cost. The manual cases and the coverage fraction appear identically on both sides of the comparison and cancel out. Break-even is fundamentally an amortisation question: how many cycles must share the one-time cost before the amortised figure drops below the recurring manual expense.

Key property

Break-even is independent of test volume and automation coverage.

Example

\(B = 60 / (12 - 1 - 1.6) \approx 6.4\) cycles. Past this point the suite has repaid its scripting cost and every further cycle widens the gap against manual.

Failure case

If \(t_{\mathrm{exec}} - t_{\mathrm{triage}} - \mu\,t_{\mathrm{maint}} \le 0\), automated upkeep costs at least as much as a manual run and automation never pays back. The model flags this explicitly.
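Equation (8), including the failure case, fits in a few lines of Python. Returning `None` when the per-cycle saving is not positive is just one way to flag that case, chosen here for illustration. Neither \(N\) nor \(a\) is an argument, which is the volume independence stated above.

```python
def break_even_cycles(t_script, t_exec, t_triage, mu, t_maint):
    """Equation (8): cycles for one script to repay its scripting time.

    Returns None when automated upkeep per cycle is not cheaper than a
    manual run, i.e. automation never pays back (the failure case).
    """
    saving_per_cycle = t_exec - t_triage - mu * t_maint
    if saving_per_cycle <= 0:
        return None
    return t_script / saving_per_cycle

print(round(break_even_cycles(60, 12, 1, 0.08, 20), 1))   # 6.4
print(break_even_cycles(60, 2.5, 1, 0.08, 20))            # None: 2.5 - 1 - 1.6 < 0
```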

8. Combined Program Savings

Formula

\[H_{\mathrm{old}} = N\,t_{\mathrm{author}} + R\,N\,t_{\mathrm{exec}} \tag{9}\]
\[H_{\mathrm{new}} = T_{\mathrm{AI}} + H_{\mathrm{auto}} \tag{10}\]
\[S_{\mathrm{program}} = \frac{H_{\mathrm{old}} - H_{\mathrm{new}}}{H_{\mathrm{old}}} \tag{11}\]
| Symbol | Meaning | Default / units |
|---|---|---|
| \(H_{\mathrm{old}}\) | Total man-hours if everything stays manual | output |
| \(H_{\mathrm{new}}\) | Total man-hours with AI plus automation | output |
| \(S_{\mathrm{program}}\) | Fraction of total man-hours saved over the program | output |

Worked example

| Stage | Manual (h) | New (h) | Saved (h) | Saved |
|---|---|---|---|---|
| Creation (one-time) | 133.3 | 51.5 | 81.9 | 61% |
| Execution (12 cycles) | 960.0 | 713.6 | 246.4 | 26% |
| Combined program | 1,093.3 | 765.1 | 328.3 | 30% |

Execution shows only 26% at 12 cycles because upfront scripting and the manual remainder hold it down. Raise the cycle count or the coverage and the figure climbs steeply; the single largest lever in the model is \(R\).

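Combined program saving across a range of cycle counts

| Cycles \(R\) | Manual (h) | New (h) | Combined saved |
|---|---|---|---|
| 1 | 213 | 368 | −72% |
| 4 | 453 | 476 | −5% |
| 6 | 613 | 548 | +11% |
| 12 | 1,093 | 765 | 30% |
| 24 | 2,053 | 1,199 | 42% |
| 48 | 3,973 | 2,066 | 48% |
| 100 | 8,133 | 3,945 | 51% |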

The rows are not multiples of the first because each total splits into a one-time cost plus a per-cycle cost: \(\text{Manual} = 133.3 + 80R\) and \(\text{New} = 331.5 + 36.1R\) hours. Only the per-cycle term scales. The one-time cost (133.3 h of manual creation; 51.5 h of AI creation plus 280 h of upfront scripting for the new process) is paid once, not once per cycle.

Figure 2. Combined program saving against execution cycles. The curve is negative below about four to five cycles (automating the whole program costs more than staying manual until then), then climbs steeply and flattens toward a ceiling near 55%. This combined break-even (≈ 4.5 cycles) sits below the execution break-even \(B = 6.4\) of (8), because the one-time creation saving gives the program a head start.
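The cycle table and the combined break-even near 4.5 cycles both follow from the one-time plus per-cycle split. This Python sketch reuses the section 3 and section 6 totals at the defaults (3,088 min of AI creation, 280 h of scripting, 2,168 min of new-process work per cycle); names are illustrative. At \(R = 100\) the ratio comes out just under 51.5%.

```python
N, t_author, t_exec = 400, 20, 12
a, t_script = 0.70, 60

creation_manual_h = N * t_author / 60        # 133.3 h, paid once
creation_ai_h = 3088 / 60                     # section 3 total, about 51.5 h
scripting_h = a * N * t_script / 60           # 280 h, paid once
per_cycle_old_h = N * t_exec / 60             # 80 h: every case run by hand
per_cycle_new_h = 2168 / 60                   # about 36.1 h: upkeep + manual remainder

def program_saving(R):
    """Equations (9)-(11) for R execution cycles."""
    H_old = creation_manual_h + R * per_cycle_old_h
    H_new = creation_ai_h + scripting_h + R * per_cycle_new_h
    return (H_old - H_new) / H_old

for R in (1, 4, 6, 12, 24, 48, 100):
    print(R, f"{program_saving(R):+.0%}")

# Combined break-even: the cycle count at which both totals are equal.
R_star = (creation_ai_h + scripting_h - creation_manual_h) / (per_cycle_old_h - per_cycle_new_h)
print(round(R_star, 1))                                   # 4.5
# Long-run ceiling: only the per-cycle terms matter as R grows.
print(f"{1 - per_cycle_new_h / per_cycle_old_h:.0%}")     # 55%
```

Solving \(133.3 + 80R = 331.5 + 36.1R\) by hand gives the same crossing, and the ratio of the per-cycle terms gives the ceiling the curve flattens toward.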

The steep early rise is the point: most of the value is won in the first ten to twenty cycles, which is why \(R\), not coverage or per-case efficiency, is the lever that moves the result most.

9. Learning Curve Extension (optional)

The base model holds every per-unit time constant. In reality, constructive work speeds up with repetition — most sharply for automation scripting, where the first scripts build reusable scaffolding that later scripts inherit. TAME models this with a power-law learning curve that switches off cleanly, reducing exactly to the base model.

Formula

\[t_i = t_1\,i^{-b}, \qquad b = -\log_2 L \tag{12}\]
\[T_{\mathrm{learn}}(Q) \approx t\,n_0^{\,b}\,\frac{(n_0+Q)^{1-b} - n_0^{\,1-b}}{1-b} \tag{13}\]
| Symbol | Meaning | Default / units |
|---|---|---|
| \(L\) | Learning rate: time multiplier per doubling | 85% / 95% |
| \(b\) | Learning exponent, \(b = -\log_2 L\) | derived |
| \(t_i\) | Time to perform the \(i\)-th repetition of a learnable activity | derived |
| \(Q\) | Number of new repetitions in this project | input |
| \(n_0\) | Experience offset: repetitions already completed | prior |
| \(t\) | Current per-unit time at the team's present skill | input |

Meaning

Each repetition of a learnable activity is a little faster than the last, following a constant percentage improvement per doubling of cumulative experience. The offset \(n_0\) says how far up that curve the team already sits, so the entered rate keeps its natural meaning: what the task costs us now.

Parameters

  • \(L\): learning rate, the time multiplier per doubling. Default 85% for scripting, 95% for authoring and review.
  • \(n_0\): experience offset. Small = cold start (new framework); large = seasoned team (≈ flat).
  • \(L = 100\%\ (b = 0)\): reduces to \(T_{\mathrm{learn}} = t \times Q\), the base model exactly. Learning is off by default.

Where it applies

  • Scripting: primary. Strongest and best-documented learning effect.
  • Authoring and review: optional, mild (high \(L\)), applied to both the manual baseline and the AI path so savings are not inflated.
  • Execution, triage, maintenance: flat. Mechanical and bounded; learning is negligible.

Example

Scripting at \(t = 60\) min, \(L = 85\%\) (so \(b = 0.234\)), \(n_0 = 40\), over \(Q = 280\) scripts gives \(T_{\mathrm{learn}} \approx 12{,}260\) min \(\approx\) 204 man-hours, against 280 h at a flat rate. The combined program saving rises from 30% to ≈ 37%.
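A Python sketch of equation (13), assuming the offset form in which the team's current time \(t\) corresponds to repetition \(n_0\), so the implied first-unit time is \(t\,n_0^{\,b}\). The unit-by-unit sum is included only to show that the closed form is an integral approximation of it; the function names are hypothetical.

```python
import math

def learning_total(t, L, n0, Q):
    """Equation (13): approximate minutes for Q new repetitions, for a team whose
    current per-unit time t corresponds to repetition n0 of the curve."""
    b = -math.log2(L)
    if b == 0:                                  # L = 100%: the flat base model
        return t * Q
    return t * n0**b * ((n0 + Q)**(1 - b) - n0**(1 - b)) / (1 - b)

def learning_sum(t, L, n0, Q):
    """The discrete sum that (13) approximates: repetitions n0+1 .. n0+Q."""
    b = -math.log2(L)
    return sum(t * (i / n0) ** (-b) for i in range(n0 + 1, n0 + Q + 1))

scripting = learning_total(t=60, L=0.85, n0=40, Q=280)
print(round(scripting / 60, 1))                                  # about 204 h
print(round(learning_sum(t=60, L=0.85, n0=40, Q=280) / 60, 1))   # slightly less
print(learning_total(t=60, L=1.0, n0=40, Q=280) / 60)            # 280.0: learning off

# Swap the flat 280 h of scripting in the section 8 totals (R = 12) for the learned figure.
H_old, H_new = 1093.33, 765.07 - 280 + scripting / 60
print(f"{(H_old - H_new) / H_old:.0%}")                          # 37%
```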

Figure 3. Power-law learning. Scripting (85%) compounds through reuse, falling to about 27% of first-unit time over eight doublings of cumulative experience; authoring and review (95%) have little reusable artifact, so per-unit time bends only to about 66%. The gap — scripting steeper than authoring — is the substantive claim, not the exact percentages.

Why a power law was chosen

Repeated constructive work has followed a power law since Wright (1936): a straight line on log–log axes, a constant percentage improvement per doubling of experience. This matches how a scripting framework actually matures — early effort builds shared structure, later effort reuses it.

Learning rewards repetition. Scripting repeats the most, so it learns the most.

10. Effort Modifiers: Seniority, Client Process, Tool Proficiency

The base model treats each per-unit time as a fixed average. Three real-world factors shift those times systematically: who does the work (seniority), where it is done (the client's process), and how well the team knows the tools. TAME folds them in as dimensionless multipliers on the base times, each defaulting to 1 so the base model is the neutral case.

Formula

\[t_x^{\mathrm{eff}} = \sigma_x\,\rho_x\,\kappa_x\,t_x \tag{14}\]
| Symbol | Meaning | Default / units |
|---|---|---|
| \(\sigma\) | Seniority factor: skill of whoever performs the activity (< 1 senior, > 1 junior) | 0.7–1.4 |
| \(\rho\) | Client-process factor: governance overhead per unit of work | 1.0–1.8 |
| \(\kappa\) | Tool-proficiency factor (AI and automation) | 0.7–1.5 |
| \(t_x^{\mathrm{eff}}\) | Effective time for activity \(x\) after modifiers | derived |

Every per-unit time \(t\) in §§2–8 is read as its effective value \(t^{\mathrm{eff}}\). The factors apply per activity; an activity a factor does not touch simply takes 1.

Where each factor applies

| Activity | \(\sigma\) seniority | \(\rho\) process | \(\kappa\) tools |
|---|---|---|---|
| Manual authoring | | | |
| AI review | | | \(\kappa_{\mathrm{AI}}\) |
| Rework | | | \(\kappa_{\mathrm{AI}}\) |
| Prompt / setup | | | \(\kappa_{\mathrm{AI}}\) |
| Manual execution | | ✓✓ | |
| Scripting | | | \(\kappa_{\mathrm{auto}}\) |
| Triage | | | \(\kappa_{\mathrm{auto}}\) |
| Maintenance | | | \(\kappa_{\mathrm{auto}}\) |
| Automated run | | | |

✓✓ strong · ✓ applies · ◐ minor · — none

Seniority (\(\sigma\))

The skill level of whoever performs the activity. \(\sigma < 1\) is faster (senior), \(\sigma > 1\) is slower (junior). The effect is strongest on judgement-heavy work (review, rework, scripting) and weakest on mechanical execution. Typical range 0.7–1.4. \(\sigma\) scales man-hours only; to convert to cost, weight each activity's effective hours by the loaded rate of the role that performs it.

Client process (\(\rho\))

The client's governance overhead per unit of work: environment access, approvals, evidence capture, traceability, sign-offs. \(\rho = 1\) is a lightweight or agile client; \(\rho\) rises toward ~1.8 in heavily regulated programs. The asymmetry is the important part: a human pays \(\rho\) on every manual run, while an automated run emits its logs and audit trail automatically, so \(\rho\) barely touches it.

Governance is paid on every manual run, but essentially once by automation.

So high process maturity multiplies the automation advantage. This is central for regulated programs such as SAP finance, where every manual execution carries mandatory documentation. Concretely, \(\rho\) rises with the compliance regime; common ones include:

  • SOX §404 — IT general controls on financial-reporting systems such as SAP FICO; every change is documented, tested, and signed off.
  • FDA 21 CFR Part 11 with GAMP 5 — computerized-system validation in life sciences (IQ/OQ/PQ protocols, full traceability), the heaviest common regime.
  • PCI-DSS — controlled testing and evidence for payment-card systems.
  • GDPR / HIPAA — data handling, masking, and privacy controls in test environments.
  • SOC 1 / SOC 2 and SR 11-7 — audit and model-risk controls in banking and financial services.
  • IEC 62304 / DO-178C / ISO 26262 — verification rigour in medical, avionics, and automotive software.

Tool proficiency (\(\kappa\))

How well the team knows the specific tools: the AI generator (\(\kappa_{\mathrm{AI}}\)) and the automation framework (\(\kappa_{\mathrm{auto}}\)). It works through two channels. Time: \(\kappa\) multiplies tool-mediated times. Quality: a fluent team prompts better and writes sturdier scripts, lowering the rework, miss, and maintenance fractions:

\[p_{\mathrm{eff}} = \kappa_{\mathrm{AI}}\,p, \qquad m_{\mathrm{eff}} = \kappa_{\mathrm{AI}}\,m, \qquad \mu_{\mathrm{eff}} = \kappa_{\mathrm{auto}}\,\mu \tag{15}\]

\(\kappa\) is today's static proficiency; the learning curve (§9) is its trajectory. For scripting, set \(\kappa_{\mathrm{auto}}\) for the current rate and \(n_0\) for how fast it improves, and do not count the same gain in both.

Why multiplicative

The factors compound rather than add: a senior expert at a low-governance client is fast on every count, and the effects stack proportionally to task size. Multiplicative factors keep each cause separable and auditable, and collapse to the base model when set to 1.

Example

At a heavily governed client, \(\rho \approx 1.6\) on manual execution lifts it to \(12 \times 400 \times 12 \times 1.6 = 92{,}160\) minutes \(=\) 1,536 man-hours, up from 960. Automated upkeep is almost unchanged, so the program saving rises from 30% to roughly 44% and break-even arrives sooner (about 3.6 cycles instead of 6.4).
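A Python sketch of the governed case, assuming \(\rho = 1.6\) multiplies every manual execution (including the unautomated 30% remainder) while triage, maintenance and scripting keep \(\rho = 1\), as the asymmetry above describes. Variable names are illustrative; under this reading the figures match the governed column of the §11 table.

```python
N, R, a = 400, 12, 0.70
t_author, t_exec, t_script = 20, 12, 60
t_triage, mu, t_maint = 1, 0.08, 20
rho = 1.6                      # governance factor on manual execution only

t_exec_eff = rho * t_exec      # equation (14) with sigma = kappa = 1

H_manual = R * N * t_exec_eff / 60
H_old = N * t_author / 60 + H_manual
per_cycle = a * N * (t_triage + mu * t_maint) + (1 - a) * N * t_exec_eff
H_new = 3088 / 60 + (a * N * t_script + R * per_cycle) / 60   # section 3 creation + (7)
B = t_script / (t_exec_eff - t_triage - mu * t_maint)          # equation (8)

print(round(H_manual), f"{(H_old - H_new) / H_old:.0%}", round(B, 1))
# 1536 44% 3.6
```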

11. The Complete Model: Base Plus Extensions

Having developed each term on its own — the base times (§§2–8), the learning curve (§9), and the effort modifiers (§10) — this section reassembles them into the complete form previewed in §1. The base is the starting point, and the learning and context terms are extensions switched on top of it.

The master time

Every time that appears anywhere in TAME is, in full generality, one expression combining the intrinsic cost, the context modifiers of §10, and the learning term of §9:

\[\tau_x(i) = \sigma_x\,\rho_x\,\kappa_x \cdot t_x \cdot i^{-b_x} \tag{16}\]
| Symbol | Meaning | Default / units |
|---|---|---|
| \(\tau_x(i)\) | Effective time for the \(i\)-th unit of activity \(x\) | derived |
| \(t_x\) | Intrinsic time for activity \(x\): the neutral, first-unit cost | base |
| \(b_x\) | Learning exponent for activity \(x\), \(b_x = -\log_2 L_x\) | derived |

Three independent influences: intrinsic cost (the neutral first-unit time), context (who does it, where, how skilled; each 1 in the neutral case), and experience (the learning discount; \(b_x = 0\) for activities that do not learn).

From per-unit time to the totals

The master time is per repetition. The stage totals need the average per-unit time across a whole batch — the modifiers times the learning-integrated base:

\[\bar\tau_x = \frac{1}{n}\sum_i \tau_x(i) = \sigma_x\,\rho_x\,\kappa_x \cdot \frac{1}{n}\sum_i t_x\,i^{-b_x} \tag{17}\]

The inner sum is the cumulative learning form of (13); with learning off it equals the count times the intrinsic time, so the batch-effective time reduces to the modifiers times \(t_x\). This is what enters the totals.

The master totals

\[T_{\mathrm{create}} = k\,\tau_{\mathrm{setup}} + (1-m)\,N\,(\tau_{\mathrm{review}} + p\,\tau_{\mathrm{correct}}) + m\,N\,\tau_{\mathrm{author}} \tag{18}\]
\[H_{\mathrm{exec}} = a\,N\,\tau_{\mathrm{script}} + R\big[\,a\,N\,(\tau_{\mathrm{triage}} + \mu\,\tau_{\mathrm{maint}}) + (1-a)\,N\,\tau_{\mathrm{exec}}\,\big] \tag{19}\]

These are (2) and (7) with every constant time replaced by its batch-effective value. The quality terms travel the same way: \(p\), \(m\), \(\mu\) carry the tool-proficiency channel of (15).

Consistency: the neutral case recovers the base

Setting \(\sigma_x = \rho_x = \kappa_x = 1\) and \(b_x = 0\), the master time collapses to the intrinsic time:

\[\tau_x(i) = (1)(1)(1) \cdot t_x \cdot i^{0} = t_x \tag{20}\]

and the complete totals (18) and (19) return exactly to the constant-rate equations (2) and (7). The base is the complete model with its extensions switched off — precisely as the one-line preview promised.

One formula, different inputs → different savings

| Setting | Inputs changed from neutral | Saving |
|---|---|---|
| Neutral baseline | all \(\sigma, \rho, \kappa = 1\); learning off | 30% |
| + scripting learning | \(L = 85\%\), \(n_0 = 40\) | ≈ 37% |
| + heavy governance | \(\rho \approx 1.6\) on manual execution | ≈ 44% |
| + senior, tool-fluent team | \(\sigma \approx 0.85\), \(\kappa \approx 0.8\) | higher still |
The base model is one configuration. Different inputs give different savings.

Every row uses the identical formula; only the inputs differ. The intended use is for a reader to set the variables to their own situation and read off their own number. The model's job is not to assert a single percentage — it is to be honest about which variables move the result and to let each reader arrive at theirs.

Worked example — a governed SAP FICO program

The modifiers are not academic; they often decide the verdict. Take the same 400-case suite inside a SOX-controlled SAP FICO program. Under §404 IT general controls (and, in life-sciences finance, GAMP 5 validation), every manual test execution carries mandatory documentation: pre-approval, step-by-step evidence, requirement traceability, four-eyes review, and archival. That overhead lands on each manual run, so \(\rho \approx 1.6\) on manual execution, while the automated run emits the same evidence as a by-product and barely feels it (§10). Carry that one change through the whole model:

The same suite, neutral vs governed, with the cost layer on (w = $80/h, C_fix = $20,000, C_cyc = $500), R = 12 cycles

| Quantity | Neutral | Governed SAP | + Senior, fluent |
|---|---|---|---|
| Context modifiers | all 1 | \(\rho = 1.6\) | \(\rho = 1.6,\ \sigma = 0.85,\ \kappa = 0.8\) |
| Creation saving | 61% | 61% | 70% |
| Manual execution | 960 h | 1,536 h | 1,536 h |
| Effort break-even \(B\) | 6.4 cyc | 3.6 cyc | 2.3 cyc |
| Combined saving | 30% | 44% | 53% |
| Man-hours saved (12 cycles) | 328 h | 732 h | 877 h |
| Cost-aware break-even \(B_{\mathrm{cost}}\) | 14.1 cyc | 7.4 cyc | 5.8 cyc |
| Net value \(V(12)\) | −$6,288 | +$25,968 | +$37,838 |

The governance that punishes every manual run is exactly what makes automation win. In the neutral case automation barely breaks even in money: at the planned 12 cycles it loses about $6.3k. The same suite in the governed program flips to +$26k, because the manual baseline it now replaces grew by 60% (960 → 1,536 h) and the cost-aware break-even falls from 14 cycles to 7. Layer on a senior, tool-fluent team (\(\sigma \approx 0.85\), \(\kappa \approx 0.8\)) and creation saving climbs to 70%, combined saving to 53%, and net value to +$38k. Note that \(\sigma\) and \(\kappa\) sharpen creation and the automated upkeep but leave the manual baseline untouched, so they widen the gap rather than create it.

High process maturity multiplies the automation advantage. The heavier the governance on each manual run, the stronger the case to automate it away.

Reproduce it live: open the calculator, switch on the cost layer, and set the context modifiers to \(\rho = 1.6\) (governance on manual execution), \(\sigma = 0.85\), and both \(\kappa = 0.8\) (AI and automation).

12. Tooling Cost and the Investment Decision

Man-hours are the headline, but the go/no-go for automation also turns on money — the loaded cost of the hours saved, and the tool's own licence and infrastructure cost. This section adds the cost layer and turns the effort break-even of §7 into a money break-even management can act on.

From man-hours to money

\[\mathrm{Cost} = w\,H + C_{\mathrm{fix}} + C_{\mathrm{cyc}}\,R \tag{21}\]
| Symbol | Meaning | Default / units |
|---|---|---|
| \(w\) | Blended loaded labour rate (cost per man-hour) | $/h |
| \(H\) | Man-hours from the model (manual or new process) | output |
| \(C_{\mathrm{fix}}\) | One-time tooling cost (licence, onboarding, setup) | $ |
| \(C_{\mathrm{cyc}}\) | Per-cycle tooling cost (amortised licence, compute) | $/cycle |

One-time \(C_{\mathrm{fix}}\): platform licence onboarding, implementation and CI integration, initial environment build, one-off training. Per-cycle \(C_{\mathrm{cyc}}\): subscription and per-seat licences amortised to a cycle, runner/compute or cloud-grid minutes, and any per-execution vendor charge.

Cost-aware break-even

\[B_{\mathrm{cost}} = \frac{w\,a\,N\,t_{\mathrm{script}} + C_{\mathrm{fix}}}{w\,a\,N\,(t_{\mathrm{exec}} - t_{\mathrm{triage}} - \mu\,t_{\mathrm{maint}}) - C_{\mathrm{cyc}}} \tag{22}\]

The numerator is the total upfront automation investment; the denominator is the per-cycle cost saving net of tooling. Because \(w\) is a rate per hour, the time terms enter in hours (minutes ÷ 60). With both tooling terms zero, \(w\) cancels and (22) reduces exactly to the effort break-even (8).
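A minimal Python sketch of (22) uses the Symbol-reference defaults and converts minutes to hours. It assumes the effort break-even (8) has the form \(t_{\mathrm{script}} / (t_{\mathrm{exec}} - t_{\mathrm{triage}} - \mu\,t_{\mathrm{maint}})\), which is what (22) reduces to when \(C_{\mathrm{fix}} = C_{\mathrm{cyc}} = 0\). Function and parameter names are illustrative, not the calculator's.

```python
# Sketch of the cost-aware break-even (22). Times in minutes, w in $/h.

def per_cycle_saving(w, c_cyc, n=400, a=0.70, t_exec=12.0,
                     t_triage=1.0, mu=0.08, t_maint=20.0):
    """Denominator of (22): per-cycle labour saving net of tooling, in $."""
    return w * a * n * (t_exec - t_triage - mu * t_maint) / 60 - c_cyc


def b_cost(w, c_fix, c_cyc, n=400, a=0.70, t_script=60.0, **upkeep):
    """Cycles needed to repay scripting labour plus one-time tooling."""
    upfront = w * a * n * t_script / 60 + c_fix
    saving = per_cycle_saving(w, c_cyc, n=n, a=a, **upkeep)
    if saving <= 0:
        raise ValueError("per-cycle saving net of tooling is not positive")
    return upfront / saving


def b_effort(t_script=60.0, t_exec=12.0, t_triage=1.0, mu=0.08, t_maint=20.0):
    """Effort break-even: (22) with both tooling terms set to zero."""
    return t_script / (t_exec - t_triage - mu * t_maint)


print(round(b_effort(), 2))                      # effort break-even
print(round(b_cost(80.0, 0.0, 0.0), 2))          # same value: w cancels
print(round(b_cost(80.0, 20_000.0, 500.0), 2))   # licence and compute added
```

The guard on a non-positive saving reflects the viability test. When tooling eats the whole per-cycle saving, no break-even exists, so the sketch raises an error rather than returning a negative cycle count.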

Licence cost raises the break-even two ways: a fixed cost lifts the numerator; a per-cycle cost shrinks the denominator.

The viability test and net value

\[ V(R) = \big[\,w\,a\,N\,(t_{\mathrm{exec}} - t_{\mathrm{triage}} - \mu\,t_{\mathrm{maint}}) - C_{\mathrm{cyc}}\,\big]\,R - \big(w\,a\,N\,t_{\mathrm{script}} + C_{\mathrm{fix}}\big) \tag{23} \]

| Symbol | Meaning | Default / units |
|---|---|---|
| \(B_{\mathrm{cost}}\) | Cost-aware break-even — cycles to repay the upfront investment | derived |
| \(V(R)\) | Net value at \(R\) cycles (positive ⇒ worth automating) | derived |

\(V\) is positive precisely when two conditions hold. First, the per-cycle saving net of tooling (the bracketed term) must be positive. Second, the program must run more cycles than \(B_{\mathrm{cost}}\). Once the per-cycle saving is positive, the second condition alone decides the sign.

Worked example

With \(w = \$80/\text{h}\), \(C_{\mathrm{fix}} = \$20{,}000\) and \(C_{\mathrm{cyc}} = \$500\), the upfront investment is \(\$42{,}400\) and the per-cycle saving net of tooling is \(\$3{,}009\), giving \(B_{\mathrm{cost}} = 42{,}400 / 3{,}009 = 14.1\) cycles. The effort break-even was 6.4 — licence and compute more than double it. At a planned 12 cycles, automation would lose roughly \(V \approx -\$6{,}300\); it becomes worth it only beyond ~14 cycles.
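The worked example can be checked with a short Python sketch of (22) and (23). Inputs are the Symbol-reference defaults plus the three tooling figures above, and the names are illustrative. The sketch also confirms the viability test: for this input set, \(V(R)\) turns positive exactly at the first whole cycle count above \(B_{\mathrm{cost}}\).

```python
# Sketch: reproduce the worked example with equations (22) and (23).
W, C_FIX, C_CYC = 80.0, 20_000.0, 500.0
N, A = 400, 0.70
T_SCRIPT, T_EXEC, T_TRIAGE, MU, T_MAINT = 60.0, 12.0, 1.0, 0.08, 20.0

upfront = W * A * N * T_SCRIPT / 60 + C_FIX                           # about $42,400
saving = W * A * N * (T_EXEC - T_TRIAGE - MU * T_MAINT) / 60 - C_CYC  # about $3,009
b_cost = upfront / saving                                             # about 14.1 cycles


def net_value(r: int) -> float:
    """Equation (23): net value after r cycles."""
    return saving * r - upfront


print(f"upfront ${upfront:,.0f}, saving/cycle ${saving:,.0f}, B_cost {b_cost:.1f}")
print(f"V(12) = ${net_value(12):,.0f}")
first_positive = next(r for r in range(1, 100) if net_value(r) > 0)
print("first cycle count with V > 0:", first_positive)
```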

Amortisation: how much to invest

Manual execution is a raw operating expense — the same cost every cycle, capitalising into nothing. Automation scripting is a capital-like outlay — spent once, then spread across every cycle it serves.

\[ A_{\mathrm{manual}} = w\,N\,t_{\mathrm{exec}} \tag{24} \]

\[ A_{\mathrm{auto}}(R) = \frac{w\,a\,N\,t_{\mathrm{script}} + C_{\mathrm{fix}}}{R} + g \tag{25} \]

\[ g = w\big[\,a\,N\,(t_{\mathrm{triage}} + \mu\,t_{\mathrm{maint}}) + (1-a)\,N\,t_{\mathrm{exec}}\,\big] + C_{\mathrm{cyc}} \]

The automation cost per cycle is a hyperbola in \(R\): the upfront term shrinks as more cycles share it, so the cost per cycle falls toward the running-cost floor \(g\), while the manual line stays flat. The two cross exactly at the cost-aware break-even:

\[ A_{\mathrm{auto}}(R) < A_{\mathrm{manual}} \quad\Longleftrightarrow\quad R > B_{\mathrm{cost}} \tag{26} \]
Figure 4. Cost per cycle: a flat manual expense against the amortised automation hyperbola, falling toward the running-cost floor \(g\) and crossing the manual line at the cost-aware break-even, ≈ 14.1 cycles in the worked example. Left of the crossing, automation is the more expensive choice.
Manual execution is expensed every cycle; automation is a one-time asset amortised across them.
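A minimal Python sketch of (24)–(26) with the worked-example inputs shows the crossing (names are illustrative). The automated cost per cycle falls toward the floor \(g\), about $3,391 here, while the manual line stays at $6,400. The first whole cycle count at which automation is cheaper per cycle is 15, the first integer above \(B_{\mathrm{cost}} \approx 14.1\).

```python
# Sketch of equations (24)-(26): flat manual cost vs amortised automation cost.
W, C_FIX, C_CYC = 80.0, 20_000.0, 500.0
N, A = 400, 0.70
T_SCRIPT, T_EXEC, T_TRIAGE, MU, T_MAINT = 60.0, 12.0, 1.0, 0.08, 20.0

a_manual = W * N * T_EXEC / 60                                   # (24), $ per cycle
upfront = W * A * N * T_SCRIPT / 60 + C_FIX
g = W * (A * N * (T_TRIAGE + MU * T_MAINT) + (1 - A) * N * T_EXEC) / 60 + C_CYC


def a_auto(r: float) -> float:
    """Equation (25): upfront cost spread over r cycles plus the running floor g."""
    return upfront / r + g


# a_manual - g equals the denominator of (22), so this is B_cost.
b_cost = upfront / (a_manual - g)
crossing = next(r for r in range(1, 100) if a_auto(r) < a_manual)
print(f"manual ${a_manual:,.0f}/cycle, floor g ${g:,.0f}, "
      f"B_cost {b_cost:.1f}, first cheaper cycle count {crossing}")
```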

How many cycles does a project actually run?

\[ R = R_{\mathrm{impl}} + R_{\mathrm{reg}}, \qquad R_{\mathrm{reg}} = f\cdot W \tag{27} \]

| Symbol | Meaning | Default / units |
|---|---|---|
| \(R_{\mathrm{impl}}\) | Implementation cycles — the SIT and UAT passes during the project | ≈ 3–4 |
| \(R_{\mathrm{reg}}\) | Ongoing regression cycles after go-live | input |
| \(f,\,W\) | Regression runs per period; periods the suite is maintained | input |

A typical implementation runs only a handful of passes — two system-integration cycles (SIT1, SIT2), a user-acceptance cycle (UAT), and often a pre-go-live dry run — so \(R_{\mathrm{impl}}\) is about three to four. Everything beyond is regression. Since the cost-aware break-even was ~14 cycles but implementation supplies only three or four, the investment is not recovered by the project that builds the suite — it is recovered, if at all, by the regression that follows. Automation pays back only when:

\[ R_{\mathrm{impl}} + R_{\mathrm{reg}} \ge B_{\mathrm{cost}} \quad\Longleftrightarrow\quad R_{\mathrm{reg}} \ge B_{\mathrm{cost}} - R_{\mathrm{impl}} \tag{28} \]
Automation is rarely repaid by the project that builds it — it is repaid by the regression cycles that follow.
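Condition (28) becomes a minimum regression count once \(B_{\mathrm{cost}}\) is known. This Python sketch takes the worked example's \(B_{\mathrm{cost}} \approx 14.09\) and an \(R_{\mathrm{impl}}\) of 3 or 4. The regression frequency of two runs per month and the function names are hypothetical, chosen only for illustration.

```python
import math

B_COST = 14.09   # cost-aware break-even from the worked example (cycles)


def min_regression_runs(b_cost: float, r_impl: int) -> int:
    """Smallest whole R_reg satisfying (28): R_impl + R_reg >= B_cost."""
    return max(0, math.ceil(b_cost - r_impl))


def min_periods(b_cost: float, r_impl: int, runs_per_period: float) -> int:
    """Smallest whole W with f * W >= B_cost - R_impl, using R_reg = f * W."""
    return max(0, math.ceil((b_cost - r_impl) / runs_per_period))


for r_impl in (3, 4):
    print(f"R_impl={r_impl}: need R_reg >= {min_regression_runs(B_COST, r_impl)} "
          f"runs, i.e. >= {min_periods(B_COST, r_impl, 2)} months at 2 runs/month")
```

Under these assumptions, the suite must survive roughly half a year of fortnightly regression before it repays itself.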

Guiding the decision

  • Estimate the loaded rate \(w\) for the people who run and maintain the suite.
  • Get the tool economics from the vendor quote: split into one-time \(C_{\mathrm{fix}}\) and recurring \(C_{\mathrm{cyc}}\).
  • Compute the per-cycle saving net of tooling. If zero or negative, stop — no number of cycles redeems it.
  • Otherwise compute \(B_{\mathrm{cost}}\) and compare to the cycles you genuinely expect — implementation passes plus realistic regression, not the horizon you hope for.
  • Invest only if expected cycles comfortably exceed the break-even, leaving margin for maintenance spikes and coverage that falls short of plan.
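The checklist above can be sketched as a small Python decision helper. The 20% safety margin, the function name and the returned labels are hypothetical choices for illustration. The document asks only that expected cycles comfortably exceed the break-even.

```python
def automation_decision(w, c_fix, c_cyc, expected_cycles, margin=0.20,
                        n=400, a=0.70, t_script=60.0, t_exec=12.0,
                        t_triage=1.0, mu=0.08, t_maint=20.0):
    """Apply the checklist: net saving first, then break-even with a margin."""
    # Per-cycle saving net of tooling (times in minutes, w in $/h).
    saving = w * a * n * (t_exec - t_triage - mu * t_maint) / 60 - c_cyc
    if saving <= 0:
        return "stop: no number of cycles redeems it", None
    # Cost-aware break-even (22).
    b_cost = (w * a * n * t_script / 60 + c_fix) / saving
    # Invest only with headroom above the break-even.
    if expected_cycles >= (1 + margin) * b_cost:
        return "invest", b_cost
    return "stay manual", b_cost


# Expected cycles = implementation passes + realistic regression runs.
print(automation_decision(80.0, 20_000.0, 500.0, expected_cycles=4 + 14))
print(automation_decision(80.0, 20_000.0, 500.0, expected_cycles=12))
print(automation_decision(80.0, 20_000.0, 4_000.0, expected_cycles=50))
```

With a 20% margin, 15 expected cycles is not enough even though it clears the 14.1-cycle break-even. That is the point of leaving room for maintenance spikes.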

13. System Behavior

Few cycles (\(R\) below \(B\)). Scripting cost not yet repaid; automation shows a net loss; stay manual.

Many cycles (\(R\) well above \(B\)). Upfront scripting amortises away; each cycle costs only triage and maintenance on the automated share plus the manual runs that remain; percent saved approaches the ceiling set by coverage.

Low coverage (small \(a\)). Most cases still run manually each cycle; saving is capped well below 100% regardless of cycles.
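The three regimes show up in a short Python sketch of the execution-stage saving \(1 - H_{\mathrm{auto}}/H_{\mathrm{manual}}\) in man-hours. The sketch assumes \(H_{\mathrm{auto}}\) is one-time scripting plus, per cycle, triage and maintenance on the automated share and manual runs of the rest (the labour terms of \(g\) in §12). Under that assumption the saving tends to \(a\,(t_{\mathrm{exec}} - t_{\mathrm{triage}} - \mu\,t_{\mathrm{maint}})/t_{\mathrm{exec}}\) as \(R\) grows, about 55% at the defaults. Names are illustrative.

```python
# Sketch: execution-stage saving as cycles and coverage vary (no cost layer).

def exec_saving(r, a=0.70, n=400, t_script=60.0, t_exec=12.0,
                t_triage=1.0, mu=0.08, t_maint=20.0):
    """Fraction of execution man-hours saved after r cycles."""
    h_manual = n * t_exec * r
    h_auto = a * n * t_script + r * (a * n * (t_triage + mu * t_maint)
                                     + (1 - a) * n * t_exec)
    return 1 - h_auto / h_manual


def ceiling(a=0.70, t_exec=12.0, t_triage=1.0, mu=0.08, t_maint=20.0):
    """Limit of exec_saving as r grows: scripting fully amortised."""
    return a * (t_exec - t_triage - mu * t_maint) / t_exec


for r in (3, 12, 100, 10_000):             # few cycles -> many cycles
    print(f"R={r:>6}: saved {exec_saving(r):6.1%}  (ceiling {ceiling():.1%})")
print(f"low coverage a=0.2: ceiling {ceiling(a=0.2):.1%}")
```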

14. Calibration

The defaults are industry-typical priors, not measured truth. Before reporting, sample real cases to fix the parameters that matter most:

  • \(t_{\mathrm{author}}\) and \(t_{\mathrm{review}}\) — time a small batch by hand and by review to set the creation ratio.
  • \(m\) — the AI miss rate is the hard ceiling on creation savings and the figure most often forgotten.
  • \(t_{\mathrm{script}}\) and \(\mu\) — scripting effort and flaky-test maintenance are where optimistic automation cases break down.
  • \(a\) and \(R\) — be honest about how many cases can truly be automated and how often the suite runs.
  • \(L\) and \(n_0\) — if learning is enabled, fit them from a script-time log: plot per-script time against cumulative count on log–log axes. The slope is \(-b\) (hence \(L = 2^{-b}\)), and how far in you already are gives \(n_0\).
  • \(\sigma, \rho, \kappa\) — set seniority from the staffing mix, the client-process factor from the documentation each run carries, and tool proficiency from the team's familiarity with the tools.
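The log–log fit for \(L\) can be sketched in Python with an ordinary least-squares slope. The data below are synthetic, generated from an 85% curve, purely to show that the fit recovers the rate. A real script-time log will carry noise, and this sketch fits only \(b\) and \(L\); estimating \(n_0\) is left out. The function name is illustrative.

```python
import math


def fit_learning_rate(counts, minutes):
    """Fit t = t1 * i**(-b) by least squares on log-log axes; return (L, b)."""
    xs = [math.log(c) for c in counts]
    ys = [math.log(t) for t in minutes]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx            # the fitted line's slope is -b
    b = -slope
    return 2 ** (-b), b          # b = -log2 L, so L = 2**(-b)


# Synthetic, noise-free log drawn from an 85% curve (illustration only).
true_b = -math.log2(0.85)
counts = list(range(1, 41))
minutes = [90.0 * i ** (-true_b) for i in counts]

rate, b = fit_learning_rate(counts, minutes)
print(f"fitted L = {rate:.3f}, b = {b:.4f}")
```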

15. Scope and Limitations

  • Man-hours only. Faster wall-clock turnaround from unattended runs is a separate throughput benefit, intentionally excluded.
  • Linear per-cycle costs. Maintenance is modelled as a steady fraction; it can spike around major releases.
  • Learning is optional and parametric. When enabled it assumes a single stable learning rate per activity; abrupt tooling changes or heavy turnover are only approximated.
  • Quality effects excluded. Earlier defect detection and broader coverage have real value but sit outside this effort-based model — treat them as unquantified upside.
  • Inputs are estimates. The output is only as good as the calibration in §14; present it as a defensible estimate with stated assumptions, not a measurement.

16. Conclusion

TAME shifts the question from an informal speed claim to a measured one: how many man-hours does the new process remove, stage by stage, as the work repeats? The model combines a one-time, accuracy-bounded creation saving; a recurring execution saving that compounds with cycles; and a volume-independent break-even point. It rests on a simple philosophy:

Manual testing pays by the cycle. Automation pays once, then runs free.

Rationale

This appendix consolidates the reasoning behind every modelling choice. Each entry states the decision, the reasoning, and the main alternative rejected.

Two stages

Separate creation from execution: their cost structures are categorically different — creation incurred once, execution every cycle — so a single blended figure hides where the savings come from. Rejected: a one-number “X% faster.”

Man-hours as the unit

Measure human effort, not calendar time or money. The mandate is effort saved; wall-clock turnaround would inflate the figure if mixed in, and money depends on rates that vary by organisation. Man-hours convert cleanly to cost later by weighting with loaded rates (§12).

Linear in case count and cycles — and why not exponential

Authoring or running one case does not change the cost of the next: no shared computation, no feedback loop, so no compounding mechanism justifies exponential growth. An exponential form here would be curve-fitting without a cause. Linearity is also what makes break-even independent of volume and coverage — a consequence, not a coincidence. The only legitimate departure is the mild sub-linear discount from learning, handled separately (§9).

Rework as an expected value

Rework enters as the expected cost per case. TAME predicts an aggregate over many cases, and by the law of large numbers the expected value is an accurate predictor of a sum — the same logic an insurer uses when pricing on expected loss.
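A quick Python simulation illustrates the law-of-large-numbers argument. Each case independently needs rework with probability \(p = 30\%\), costing \(t_{\mathrm{correct}} = 6\) min, and the simulated total is compared with the expected value \(N\,p\,t_{\mathrm{correct}}\). Independence between cases is an assumption of the sketch, and the seed is arbitrary.

```python
import random

P_REWORK, T_CORRECT = 0.30, 6.0     # defaults from the Symbol reference


def simulated_rework_minutes(n_cases: int, rng: random.Random) -> float:
    """Total rework minutes when each case independently needs rework."""
    return sum(T_CORRECT for _ in range(n_cases) if rng.random() < P_REWORK)


rng = random.Random(42)              # arbitrary seed for repeatability
for n in (10, 400, 40_000):
    expected = n * P_REWORK * T_CORRECT
    simulated = simulated_rework_minutes(n, rng)
    print(f"N={n:>6}: expected {expected:>8,.0f} min, simulated {simulated:>8,.0f} min, "
          f"gap {abs(simulated - expected) / expected:.1%}")
```

The relative gap shrinks roughly with \(1/\sqrt{N}\). That is why the expected value is a poor guide for a handful of cases but a good one for a suite of hundreds.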

A missed case costs full authoring time

The conservative choice — a reviewer with context might author a missed case slightly faster, so assuming full cost cannot overstate the saving. Adjust downward only with evidence.

The capped form \((1-m)(1-e)\)

Express the creation saving as two bounded factors — coverage and per-case efficiency. This makes explicit that accuracy, not speed, sets the ceiling — the most scrutinised claim in any AI-efficiency pitch. A raw speed-up hides the miss-rate ceiling and was rejected.

Why a power law for learning — the alternatives, formally

A learning law must stay positive, reproduce constant-improvement-per-doubling, and be parsimonious. One diagnostic separates the candidates — the ratio of times one doubling apart. For a power law it is constant:

\[ \frac{t(2i)}{t(i)} = \frac{(2i)^{-b}}{i^{-b}} = 2^{-b} = L \quad (\text{constant}) \]

a straight line on log–log axes (Wright, 1936). Exponential to a floor measures improvement against the floor, so the ratio decays with experience and the curve flattens too early — understating long-run gains; kept only as the documented variant when a hard floor exists. Logarithmic is unbounded below — it crosses zero and turns negative, physically impossible; rejected outright. Logistic S-curve needs two hard-to-identify parameters and models slow early improvement, the opposite of the steep initial drop real learning shows; the offset \(n_0\) already absorbs any head start. Only the power law stays positive, keeps the per-doubling ratio constant, and does so with one parameter.
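The doubling-ratio diagnostic is easy to check numerically. In this Python sketch the power law uses \(L = 0.85\). The exponential-to-floor and logarithmic curves use hypothetical parameters chosen only to show their shape, not fitted values.

```python
import math

L = 0.85
B = -math.log2(L)                     # learning exponent b


def power_law(i, t1=60.0):
    return t1 * i ** (-B)


def exp_to_floor(i, floor=20.0, t1=60.0, lam=0.1):   # hypothetical parameters
    return floor + (t1 - floor) * math.exp(-lam * (i - 1))


def logarithmic(i, t1=60.0, c=8.0):                  # hypothetical parameters
    return t1 - c * math.log(i)


for i in (1, 4, 16, 64):
    print(f"i={i:>2}: power {power_law(2 * i) / power_law(i):.3f}  "
          f"exp-to-floor {exp_to_floor(2 * i) / exp_to_floor(i):.3f}")
print("logarithmic time at unit 2000:", round(logarithmic(2000), 1))   # negative
```

The power-law ratio stays at 0.85 at every experience level. The exponential-to-floor ratio drifts, ending near 1 once the curve has flattened onto its floor. The logarithmic curve has gone below zero by unit 2,000.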

Anchoring learning on current skill

Anchor on the team's present rate through an experience offset, not a first-unit time. A “first script ever” time is nearly impossible to estimate and produces absurd deflation across hundreds of units; anchoring on what a script costs now keeps the entered rate meaningful and lets the offset carry how far up the curve the team sits.

Why 85% for scripting and 95% for authoring and review

Which rate fits an activity is governed by one question: how much does performing it once create reusable leverage for the next time? Scripting is constructive with high leverage — the first scripts build the framework (page objects, locator helpers, fixtures, CI wiring) that every later script inherits — exactly the mechanism behind the steep 80–85% curves long reported for construction and tooling. Authoring is cognitive with low leverage — each case targets different functionality, with a floor set by human comprehension speed that repetition cannot compress — pointing to a shallow curve near 95%. The gap, not the exact figures, is the substantive claim: over eight doublings an 85% curve cuts per-unit time to ~27%, a 95% curve only to ~66%. Both are priors to fit from a time log before production.
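The eight-doubling comparison takes one line per rate. After eight doublings (unit 256 relative to unit 1 on the power law), the per-unit time is \(L^8\) of the starting time. A Python check:

```python
import math

for rate in (0.85, 0.95):
    b = -math.log2(rate)                 # learning exponent for this rate
    after_doublings = rate ** 8          # eight doublings: multiply by L each time
    via_power_law = 256 ** (-b)          # same thing as unit 256 vs unit 1
    print(f"L={rate:.0%}: per-unit time falls to {after_doublings:.1%} "
          f"(power law at unit 256: {via_power_law:.1%})")
```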

Effort modifiers multiply, not add

Each is a proportional effect on the same unit of work, and proportional effects compound — a senior 15% faster at a client 60% heavier lands at \(0.85 \times 1.6\), not \(0.85 + 0.6\). Multiplication keeps each cause separable and auditable and reduces to the base model at 1. Additive overheads fit only genuinely fixed steps, kept as a variant.
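This short Python sketch applies the multiplicative rule to one activity time. It uses the example in the text (a senior at \(\sigma = 0.85\) working at a client with \(\rho = 1.6\)) on the 12-minute default execution time. The helper name is illustrative, and `math.prod` requires Python 3.8 or later.

```python
import math


def effective_time(base_minutes: float, *factors: float) -> float:
    """Scale a base activity time by each proportional modifier in turn."""
    return base_minutes * math.prod(factors)


multiplied = effective_time(12.0, 0.85, 1.6)         # 12 * 1.36
added = 12.0 * (1 + (0.85 - 1) + (1.6 - 1))          # additive offsets: 12 * 1.45
print(f"multiplicative {multiplied:.2f} min, additive {added:.2f} min")
print("neutral factors recover the base:", effective_time(12.0, 1, 1, 1) == 12.0)
```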

Tool proficiency scales the quality terms too

Fluency with a tool is not only faster but better — a skilled prompter elicits fewer misses and less rework, a skilled engineer writes sturdier scripts. Limiting \(\kappa\) to time would miss its largest effect. It is tied to the learning offset so the same improvement is not counted twice.

Why the client-process factor reaches about 1.8

\(\rho = 1.8\) means roughly 0.8 hour of documentation, review and sign-off for every hour of testing — high-risk validated environments (GAMP Category 5, SOX-controlled finance) where every execution is pre-approved, evidenced step by step, traced to a requirement, four-eyes reviewed and archived. Reported compliance overhead runs ~+30% for light regimes to +80–100% for the strictest. The asymmetry is what makes the factor matter: automation produces most of this evidence as a by-product of running, so a higher \(\rho\) widens the automation advantage rather than narrowing it.

The machine run counts as zero man-hours

The model measures people's time, and a script on a server consumes none. The recurring human costs that remain — triage and maintenance — are modelled explicitly. If runs are supervised in practice, that time belongs in the triage term, not at zero.

Every factor defaults to 1

Every modifier, and the learning switch, defaults to the neutral value, so the base model is recoverable exactly and added realism is opt-in and auditable.


Symbol reference

Every symbol in one place. Defaults are the illustrative values used in the worked examples — placeholders, not recommendations (see §14, Calibration).

| Symbol | Meaning | Default / units |
|---|---|---|
| Creation inputs | | |
| \(N\) | Number of test cases needed | 400 |
| \(k\) | Scenarios prompted into the AI tool | 20 |
| \(t_{\mathrm{author}}\) | Manual authoring time per case | 20 min |
| \(t_{\mathrm{setup}}\) | Login and prompt time per scenario | 10 min |
| \(t_{\mathrm{review}}\) | Review time per AI-generated case | 4 min |
| \(p\) | Fraction of AI cases needing rework | 30% |
| \(t_{\mathrm{correct}}\) | Correction time per reworked case | 6 min |
| \(m\) | AI miss rate — cases authored manually | 10% |
| Execution inputs | | |
| \(a\) | Automation coverage — share of cases scripted | 70% |
| \(t_{\mathrm{script}}\) | Scripting time per automated case (one-time) | 60 min |
| \(t_{\mathrm{exec}}\) | Manual execution time per case, per cycle | 12 min |
| \(t_{\mathrm{triage}}\) | Triage time per automated case, per cycle | 1 min |
| \(\mu\) | Share of scripts needing maintenance per cycle | 8% |
| \(t_{\mathrm{maint}}\) | Maintenance time per affected script | 20 min |
| \(R\) | Execution cycles (regression runs) | 12 |
| \(R_{\mathrm{impl}}\) | Implementation cycles (SIT and UAT passes during the project) | ≈ 3–4 |
| \(R_{\mathrm{reg}}\) | Ongoing regression cycles after go-live | input |
| \(f,\,W\) | Regression runs per period; periods the suite is maintained | input |
| Learning curve (extension) | | |
| \(L\) | Learning rate — time multiplier per doubling | 85% / 95% |
| \(b\) | Learning exponent, \(b = -\log_2 L\) | derived |
| \(n_0\) | Experience offset — units already completed | prior |
| Effort modifiers (extension) | | |
| \(\sigma\) | Seniority factor — skill of who performs the activity | 0.7–1.4 |
| \(\rho\) | Client-process factor — governance overhead | 1.0–1.8 |
| \(\kappa\) | Tool-proficiency factor (AI and automation) | 0.7–1.5 |
| Cost layer (optional) | | |
| \(w\) | Blended loaded labour rate (cost per man-hour) | $/h |
| \(C_{\mathrm{fix}}\) | One-time tooling cost (licence, onboarding, setup) | $ |
| \(C_{\mathrm{cyc}}\) | Per-cycle tooling cost (amortised licence, compute) | $/cycle |
| Derived quantities and outputs | | |
| \(e\) | Residual effort ratio per AI case | derived |
| \(B,\ B_{\mathrm{cost}}\) | Break-even cycles (effort; cost-aware) | derived |
| \(V(R)\) | Net value at \(R\) cycles (positive ⇒ worth automating) | $ |
| \(A_{\mathrm{manual}},\,A_{\mathrm{auto}}(R)\) | Cost per cycle: manual; automated, amortised over \(R\) | $/cycle |
| \(g\) | Running-cost floor of automation per cycle | $/cycle |
| \(T_{\mathrm{manual}},\,T_{\mathrm{AI}}\) | Creation effort — manual baseline; AI plus review | eff-hrs |
| \(H_{\mathrm{manual}},\,H_{\mathrm{auto}}\) | Execution effort — manual baseline; automated | eff-hrs |
| \(S_{\mathrm{creation}},\,S_{\mathrm{program}}\) | Fraction of effort saved (creation; whole program) | output |

Prepared as a companion to the TAME calculator. Figures recompute live in the model.