Draft 1
TAME: A Man-Hours Efficiency Model for Test Automation
Test Automation Man-hours Estimator — with application to AI-assisted test creation and automated execution.
Abstract
This paper introduces TAME (Test Automation Man-hours Estimator), a two-stage model for quantifying the human effort saved when a testing process moves from manual work to AI-assisted test creation and automated execution.
Unlike informal “percent faster” claims, TAME measures effort in man-hours and separates two stages whose economics differ fundamentally. The framework integrates:
- a one-time creation stage, where AI generation plus human review replaces manual authoring;
- a recurring execution stage, where automated scripts replace manual re-runs each cycle;
- an accuracy-bounded savings ceiling for creation;
- a volume-independent break-even point for automation.
The model reports human effort only; the faster wall-clock turnaround of an unattended run is treated as a separate throughput benefit and excluded from the man-hours result.
The two stages also differ in kind, which is the basis of the investment case: manual execution is a raw operating expense, paid in full every cycle and leaving no asset behind, whereas automation scripting is a capital-like outlay, incurred once and amortised across every cycle it serves. Break-even is the point at which that amortised investment falls below the recurring manual expense.
Notation & conventions
Units are minutes unless noted; outputs are man-hours. Symbols are introduced where they first appear — each formula carries a small table of just the symbols it adds, with the illustrative default used in the worked examples. The full list is collected in the Symbol reference at the end.
The defaults are placeholders for explanation only. The savings the model reports depend entirely on these inputs, so substitute your own measured values for every one before drawing any conclusion (see §14, Calibration). Different inputs give different results, by design.
1. Core Philosophy
Most efficiency claims answer one question: how much faster is the new tool? TAME asks a more precise one: how many man-hours does the new process remove, and how does that change as the work repeats?
This distinction matters because creating a test case is paid once, while running it is paid every regression cycle. A model that blends the two hides where the savings actually come from. TAME is built on the principle:
The Model in One Line
Before the details, here is the entire model in a single expression. Every time TAME uses (authoring, review, scripting, execution, triage) is a special case of one master form that holds all the variables at once:
$$t_i(j) = k_s \, k_p \, k_t \cdot t_i \cdot \left(\frac{n_0 + j}{n_0}\right)^{b_i}$$
Reading it left to right: $k_s$, $k_p$ and $k_t$ are the seniority, client-process and tool-proficiency modifiers; $t_i$ is the intrinsic time the task takes in the neutral case; and $\left(\frac{n_0 + j}{n_0}\right)^{b_i}$ is the learning discount that accrues with repetition. The document starts from the base (the neutral case where every modifier equals 1 and learning is switched off, so the master form is simply $t_i(j) = t_i$) and then builds each term back in: the base times in §§2–8, the learning term in §9, and the context modifiers in §10, reassembled in full in §11. Leading with the complete form is deliberate: it shows that nothing is omitted, and that the simpler model which follows is this same equation with its extension terms switched off.
2. Manual Creation Cost
Formula
$$C_{\text{man}} = \frac{N \, t_a}{60} \tag{1}$$
| Symbol | Meaning | Default / units |
|---|---|---|
| $C_{\text{man}}$ | Creation man-hours if every case is authored by hand | output |
| $N$ | Number of test cases needed | 400 |
| $t_a$ | Manual authoring time per case | 20 min |
Meaning
The baseline effort to author every test case by hand: the number of cases times the time to write one.
Example
$400 \times 20 = 8{,}000$ minutes $\approx 133.3$ man-hours.
Assumption
Authoring time per case is treated as an average. In practice it varies by complexity; sampling a representative batch is the way to fix it.
3. AI-Assisted Creation Cost
Formula
$$C_{\text{AI}} = \frac{1}{60}\Big[\, S \, t_s + N (1 - m)(t_r + p \, t_c) + N \, m \, t_a \Big] \tag{2}$$
| Symbol | Meaning | Default / units |
|---|---|---|
| $C_{\text{AI}}$ | Creation man-hours with AI generation plus human review | output |
| $S$ | Scenarios prompted into the AI tool | 20 |
| $t_s$ | One-time setup per scenario: prompting, generation, data, vetting | 10 min |
| $m$ | AI miss rate: fraction of needed cases the AI fails to produce | 10% |
| $t_r$ | Review time per AI-generated case | 4 min |
| $p$ | Fraction of AI cases needing rework | 30% |
| $t_c$ | Correction time per reworked case | 6 min |
Meaning
AI does not remove the work; it shifts authoring to review. The tool drafts cases that a human reviews; a fraction need rework; and the cases the AI fails to produce fall back to full manual authoring.
Example
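Setup $20 \times 10 = 200$ min; reviewed AI cases $360 \times (4 + 0.30 \times 6) = 2{,}088$ min; missed cases $40 \times 20 = 800$ min. Total $3{,}088$ min $\approx 51.5$ man-hours.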
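The three components of the example (setup, reviewed AI cases, missed cases) can be checked with a short script. This is a minimal sketch, not the authors' calculator: the function and parameter names are illustrative assumptions, and the inputs are the illustrative defaults from the tables in §§2–3.

```python
def manual_creation_hours(cases, author_min):
    """Hand-authoring baseline: every case is written manually."""
    return cases * author_min / 60


def ai_creation_hours(cases, scenarios, setup_min, miss_rate,
                      review_min, rework_frac, fix_min, author_min):
    """AI-assisted creation: setup + reviewed AI cases + missed cases."""
    setup = scenarios * setup_min                      # one-time, per scenario
    produced = cases * (1 - miss_rate)                 # cases the AI delivers
    reviewed = produced * (review_min + rework_frac * fix_min)
    missed = cases * miss_rate * author_min            # fall back to hand authoring
    return (setup + reviewed + missed) / 60


manual_h = manual_creation_hours(cases=400, author_min=20)
ai_h = ai_creation_hours(cases=400, scenarios=20, setup_min=10, miss_rate=0.10,
                         review_min=4, rework_frac=0.30, fix_min=6, author_min=20)
print(f"manual: {manual_h:.1f} h, AI-assisted: {ai_h:.1f} h")
```

With the defaults this reproduces the 133.3 and 51.5 man-hours quoted above; substituting measured values changes both figures, as intended.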
What $t_s$ really contains
The example treats setup as prompt-and-login time alone, but standing up AI generation for a scenario is more than prompting. In full it is the sum of four one-time costs:
$$t_s = t_{\text{prompt}} + t_{\text{gen}} + t_{\text{data}} + t_{\text{vet}}$$
These are login and prompting, AI generation plus the orchestration a human shepherds, provisioning the test data at least once, and vetting the generated cases before they are trusted. Counted in full, $t_s$ can approach the manual authoring time for a scenario. That does not by itself erase the saving, because it is paid once per scenario and spread over every case that scenario generates. AI creation pays off only once enough cases amortise the setup:
$$\frac{t_s}{n_s} < (1 - m)\big(t_a - (t_r + p \, t_c)\big)$$
Here $n_s$ is the number of cases the scenario yields. The right side is the per-case advantage of reviewing over authoring, weighted by the share of cases the AI actually produces; the left is the setup spread across cases. The deeper payoff is reuse on two levels: the setup amortises over cases in creation, and the resulting automated tests amortise again over cycles in execution (§12). The heavier the setup, the more the value depends on running the tests many times rather than on creating them once.
Data provisioning needs one caution. Provisioned once, it is a setup cost as above; but when test data is consumed or deleted between runs, provisioning recurs, and that recurring share belongs in the per-cycle execution cost (§12), not the one-time setup. For stable regression where the data persists, it falls back to zero.
Why this was chosen
Review and rework are the real residual costs of AI generation. Modelling them explicitly keeps the saving honest rather than assuming the AI output is used as-is.
Alternatives considered
- Pure generation, no review. Rejected — unreviewed AI cases are not safe to run and overstate the saving.
- Flat “X% faster” factor. Rejected — it hides the dependence on miss rate and rework, the variables that actually move the result.
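4. Creation Savings and the Accuracy Ceiling
Formula
$$\sigma_c = 1 - \frac{C_{\text{AI}}}{C_{\text{man}}} \tag{3}$$
$$\rho = \frac{t_r + p \, t_c}{t_a} \tag{4}$$
$$\sigma_c \approx (1 - m)(1 - \rho) \tag{5}$$
| Symbol | Meaning | Default / units |
|---|---|---|
| $\sigma_c$ | Fraction of creation man-hours saved | output |
| $\rho$ | Residual effort ratio: share of manual effort an AI case still costs | derived |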
From definition to closed form
Equation (3) is the definition of savings; (5) is what it becomes once the costs are substituted in. Writing the cost ratio $C_{\text{AI}} / C_{\text{man}}$ and dividing each term of the numerator by $N \, t_a$:
$$\frac{C_{\text{AI}}}{C_{\text{man}}} = \frac{S \, t_s}{N \, t_a} + (1 - m)\,\frac{t_r + p \, t_c}{t_a} + m$$
The middle ratio is exactly $\rho$ from (4); the first term, the one-time setup spread across all cases, is about 2.5% at the defaults, small enough to drop. Substituting back and factoring out $(1 - m)$ gives (5): the saving is coverage times per-case efficiency, the product that makes the accuracy ceiling visible.
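Key property
Because the per-case efficiency $1 - \rho$ is at most 1, the saving can never exceed the coverage $1 - m$: the AI miss rate is a hard ceiling on creation savings, however fast review becomes.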
Example
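With $m = 0.10$ and $\rho = 0.29$, the closed form (5) gives $0.90 \times 0.71 \approx 64\%$; subtracting the 2.5% setup share, the exact saving from (3) is $\approx 61\%$, or 81.9 of 133.3 man-hours.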
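The gap between the setup-free closed form and the exact saving can be checked numerically. The sketch below assumes the illustrative defaults from §§2–3; the constant names are illustrative, not part of the model.

```python
# Illustrative defaults from the symbol tables in sections 2 and 3.
CASES, SCENARIOS = 400, 20
AUTHOR_MIN, SETUP_MIN, REVIEW_MIN, FIX_MIN = 20, 10, 4, 6
MISS_RATE, REWORK_FRAC = 0.10, 0.30

# Residual effort ratio: what an AI case still costs relative to hand authoring.
residual = (REVIEW_MIN + REWORK_FRAC * FIX_MIN) / AUTHOR_MIN

# Closed form: coverage times per-case efficiency, with setup ignored.
closed_form = (1 - MISS_RATE) * (1 - residual)

# Setup share: one-time setup spread over the whole manual baseline.
setup_share = (SCENARIOS * SETUP_MIN) / (CASES * AUTHOR_MIN)

# Exact saving: the closed form minus the setup share.
exact = closed_form - setup_share
print(f"residual {residual:.2f}, closed form {closed_form:.1%}, "
      f"setup share {setup_share:.1%}, exact {exact:.1%}")
```

At the defaults the closed form evaluates to 63.9% and the exact saving to 61.4%, so dropping the setup term overstates the saving by the 2.5% setup share and nothing else.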
Setup time and what to automate first
The setup term $t_s$ does not touch the execution break-even, which is purely an execution quantity. What it decides is whether the creation stage pays at all, a per-scenario question: a scenario's one-time setup must be repaid by the cases it yields. For a scenario that yields $n_s$ cases, the saving is:
$$\sigma_c(n_s) = (1 - m)(1 - \rho) - \frac{t_s}{n_s \, t_a}$$
The first term is the setup-free ceiling, about 64% at the defaults; the second is the setup spread over the cases that share it. Setup breaks even when the two are equal:
$$n_s^* = \frac{t_s}{t_a \, (1 - m)(1 - \rho)}$$
At the defaults the denominator is about 12.8 minutes, so $n_s^*$ is roughly 0.8, 1.6 and 2.4 cases for a $t_s$ of 10, 20 and 30 minutes. Below that the scenario loses; above it the saving climbs toward the ceiling.
| Cases $n_s$ | $t_s$ = 10 min | $t_s$ = 20 min | $t_s$ = 30 min |
|---|---|---|---|
| 1 | +14% | −36% | −86% |
| 2 | +39% | +14% | −11% |
| 5 | +54% | +44% | +34% |
| 10 | +59% | +54% | +49% |
| 20 | +61% | +59% | +56% |
| 50 | +63% | +62% | +61% |
This is why the order of automation matters as much as the decision to automate. Front-load the scenarios that yield early gains; defer the complex, low-yield ones until the suite — and the team's setup speed — has matured (§9). Chasing the hardest scenarios first is the most common way a promising automation effort posts a loss in its opening project.
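The per-scenario table above can be regenerated from the setup-free ceiling and the amortised setup term. This is a sketch under the illustrative defaults; `scenario_saving` and `break_even_cases` are hypothetical helper names, and real scenarios will differ in every input.

```python
def scenario_saving(cases, setup_min, author_min=20, miss_rate=0.10,
                    review_min=4, rework_frac=0.30, fix_min=6):
    """Creation saving for one scenario that yields `cases` test cases."""
    residual = (review_min + rework_frac * fix_min) / author_min
    ceiling = (1 - miss_rate) * (1 - residual)        # setup-free saving
    return ceiling - setup_min / (cases * author_min)  # minus amortised setup


def break_even_cases(setup_min, author_min=20, miss_rate=0.10,
                     review_min=4, rework_frac=0.30, fix_min=6):
    """Cases a scenario must yield before its setup is repaid."""
    residual = (review_min + rework_frac * fix_min) / author_min
    return setup_min / (author_min * (1 - miss_rate) * (1 - residual))


for cases in (1, 2, 5, 10, 20, 50):
    row = "  ".join(f"{scenario_saving(cases, s):+.0%}" for s in (10, 20, 30))
    print(f"{cases:>3}  {row}")
```

Ranking candidate scenarios by `scenario_saving` with measured setup times is one simple way to decide which ones to front-load.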
5. Manual Execution Cost
Formula
$$E_{\text{man}} = \frac{R \, N \, t_x}{60} \tag{6}$$
| Symbol | Meaning | Default / units |
|---|---|---|
| $E_{\text{man}}$ | Execution man-hours if every case is run by hand each cycle | output |
| $R$ | Execution cycles (regression runs) | 12 |
| $t_x$ | Manual execution time per case, per cycle | 12 min |
Meaning
The recurring baseline: a human re-runs every case, every regression cycle. This cost is paid $R$ times, which is what makes execution the dominant term over a project.
Example
$12 \times 400 \times 12 = 57{,}600$ minutes $= 960$ man-hours.
Assumption
Every case is executed every cycle. Suites that run only a subset per cycle can scale per case, or use an effective average.
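6. Automated Execution Cost
Formula
$$E_{\text{auto}} = \frac{1}{60}\Big[\, \alpha N \, t_{\text{scr}} + R \big( \alpha N (t_{\text{tri}} + \mu \, t_{\text{mnt}}) + (1 - \alpha) N \, t_x \big) \Big] \tag{7}$$
| Symbol | Meaning | Default / units |
|---|---|---|
| $E_{\text{auto}}$ | Execution man-hours with automation | output |
| $\alpha$ | Automation coverage: share of cases scripted | 70% |
| $t_{\text{scr}}$ | Scripting time per automated case (one-time) | 60 min |
| $t_{\text{tri}}$ | Triage time per automated case, per cycle | 1 min |
| $\mu$ | Scripts needing maintenance per cycle | 8% |
| $t_{\text{mnt}}$ | Maintenance time per affected script | 20 min |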
Meaning
Automation front-loads a large scripting cost, then runs for almost nothing each cycle. The unattended machine run itself costs zero man-hours; the only recurring human costs are triaging results, repairing broken scripts, and executing the cases that were never automated.
Example
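Upfront scripting $280 \times 60 = 16{,}800$ min; per cycle: automated upkeep $280 \times (1 + 0.08 \times 20) = 728$ min plus manual remainder $120 \times 12 = 1{,}440$ min, so $2{,}168$ min per cycle. $16{,}800 + 12 \times 2{,}168 = 42{,}816$ min $\approx 713.6$ man-hours (of which 280 h is one-time scripting).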
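The split between one-time scripting and per-cycle upkeep is easy to see in code. The following is a minimal sketch with the illustrative defaults from §§5–6; the function names are assumptions made for this example.

```python
def manual_execution_hours(cases, cycles, exec_min):
    """Every case is re-run by hand in every cycle."""
    return cases * cycles * exec_min / 60


def automated_execution_hours(cases, cycles, coverage, script_min,
                              triage_min, maint_frac, maint_min, exec_min):
    """One-time scripting plus per-cycle upkeep and the manual remainder."""
    automated = cases * coverage
    scripting = automated * script_min                           # paid once
    upkeep = automated * (triage_min + maint_frac * maint_min)   # per cycle
    manual_rest = (cases - automated) * exec_min                 # per cycle
    return (scripting + cycles * (upkeep + manual_rest)) / 60


manual_h = manual_execution_hours(cases=400, cycles=12, exec_min=12)
auto_h = automated_execution_hours(cases=400, cycles=12, coverage=0.70,
                                   script_min=60, triage_min=1,
                                   maint_frac=0.08, maint_min=20, exec_min=12)
print(f"manual: {manual_h:.1f} h, automated: {auto_h:.1f} h")
```

Setting `cycles=1` shows the low-cycle loss directly: the scripting hours are already spent but only one cycle of manual effort has been avoided.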
Why this was chosen
Separating one-time scripting from per-cycle upkeep is what reveals the payback dynamic. Folding them together would hide why automation loses at low cycle counts and wins at high ones.
Alternatives considered
- Zero per-cycle cost. Rejected — ignores triage and maintenance, the costs that quietly erode automation ROI.
- Maintenance as a one-time cost. Rejected — scripts break as the application changes, so upkeep recurs each cycle.
7. Break-Even Cycles
Formula
$$R^* = \frac{t_{\text{scr}}}{t_x - (t_{\text{tri}} + \mu \, t_{\text{mnt}})} \tag{8}$$
| Symbol | Meaning | Default / units |
|---|---|---|
| $R^*$ | Break-even cycles: cycles for one script to repay its scripting cost | derived |
Meaning
The number of cycles needed for one automated script to repay its scripting cost. The manual cases and the coverage fraction appear identically on both sides of the comparison and cancel out. Break-even is fundamentally an amortisation question: how many cycles must share the one-time cost before the amortised figure drops below the recurring manual expense.
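Key property
Break-even does not depend on the number of cases $N$ or on the coverage $\alpha$: it is set by per-case times alone.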
Example
$R^* = 60 / (12 - (1 + 0.08 \times 20)) = 60 / 9.4 \approx 6.4$ cycles. Past this point the suite has repaid its scripting cost and every further cycle widens the gap against manual.
Failure case
If $t_{\text{tri}} + \mu \, t_{\text{mnt}} \ge t_x$, automated upkeep costs as much as a manual run and automation never pays back. The model flags this explicitly.
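Both the break-even and its failure case fit in a few lines. This sketch assumes the illustrative defaults; returning `math.inf` to flag "never pays back" is a design choice made for this example, not something the source prescribes.

```python
import math


def break_even_cycles(script_min, exec_min, triage_min, maint_frac, maint_min):
    """Cycles one script needs to repay its one-time scripting cost."""
    upkeep = triage_min + maint_frac * maint_min   # recurring cost per script
    saving_per_cycle = exec_min - upkeep           # what each cycle wins back
    if saving_per_cycle <= 0:
        return math.inf                            # upkeep >= manual run
    return script_min / saving_per_cycle


print(f"{break_even_cycles(60, 12, 1, 0.08, 20):.1f} cycles")
print(break_even_cycles(60, 12, 2, 0.50, 20))      # upkeep 12 min: never repaid
```

Neither the case count nor the coverage appears among the arguments, which is the volume independence described above.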
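8. Combined Program Savings
Formula
$$T_{\text{man}} = C_{\text{man}} + E_{\text{man}} \tag{9}$$
$$T_{\text{new}} = C_{\text{AI}} + E_{\text{auto}} \tag{10}$$
$$\sigma = 1 - \frac{T_{\text{new}}}{T_{\text{man}}} \tag{11}$$
| Symbol | Meaning | Default / units |
|---|---|---|
| $T_{\text{man}}$ | Total man-hours if everything stays manual | output |
| $T_{\text{new}}$ | Total man-hours with AI plus automation | output |
| $\sigma$ | Fraction of total man-hours saved over the program | output |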
Worked example
| Stage | Manual (h) | New (h) | Saved (h) | Saved |
|---|---|---|---|---|
| Creation (one-time) | 133.3 | 51.5 | 81.9 | 61% |
| Execution (12 cycles) | 960.0 | 713.6 | 246.4 | 26% |
| Combined program | 1,093.3 | 765.1 | 328.3 | 30% |
Execution shows only 26% at 12 cycles because upfront scripting and the manual remainder hold it down. Raise the cycle count or the coverage and the figure climbs steeply; the single largest lever in the model is the cycle count $R$.
| Cycles | Manual (h) | New (h) | Combined saved |
|---|---|---|---|
| 1 | 213 | 368 | −72% |
| 4 | 453 | 476 | −5% |
| 6 | 613 | 548 | +11% |
| 12 | 1,093 | 765 | +30% |
| 24 | 2,053 | 1,199 | +42% |
| 48 | 3,973 | 2,066 | +48% |
| 100 | 8,133 | 3,945 | +51% |
The rows are not multiples of the first because each total splits into a one-time cost plus a per-cycle cost: $T_{\text{man}} = 133.3 + 80.0\,R$ and $T_{\text{new}} = 331.5 + 36.1\,R$ hours. Only the per-cycle term scales. The one-time creation (133.3 h manual; 51.5 h of AI creation plus 280 h of upfront scripting for the new process) is paid once, not once per cycle.
The steep early rise is the point: most of the value is won in the first ten to twenty cycles, which is why the cycle count $R$, not coverage or per-case efficiency, is the lever that moves the result most.
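The one-time versus per-cycle split can be made explicit by sweeping the cycle count. The sketch below assumes the illustrative defaults from §§2–6; `program_totals` is a hypothetical helper, not the published calculator.

```python
def program_totals(cycles, cases=400, author_min=20, scenarios=20, setup_min=10,
                   miss_rate=0.10, review_min=4, rework_frac=0.30, fix_min=6,
                   exec_min=12, coverage=0.70, script_min=60, triage_min=1,
                   maint_frac=0.08, maint_min=20):
    """Return (manual hours, new-process hours, fraction saved)."""
    automated = cases * coverage
    # One-time costs, in minutes.
    manual_once = cases * author_min
    new_once = (scenarios * setup_min
                + cases * (1 - miss_rate) * (review_min + rework_frac * fix_min)
                + cases * miss_rate * author_min
                + automated * script_min)
    # Per-cycle costs, in minutes.
    manual_cycle = cases * exec_min
    new_cycle = (automated * (triage_min + maint_frac * maint_min)
                 + (cases - automated) * exec_min)
    manual_h = (manual_once + cycles * manual_cycle) / 60
    new_h = (new_once + cycles * new_cycle) / 60
    return manual_h, new_h, 1 - new_h / manual_h


for cycles in (1, 4, 6, 12, 24, 48):
    manual_h, new_h, saved = program_totals(cycles)
    print(f"{cycles:>3}  {manual_h:>7,.0f}  {new_h:>7,.0f}  {saved:+.0%}")
```

Because only the per-cycle terms scale with `cycles`, the saving starts negative and rises quickly, then flattens as the one-time costs stop mattering.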
9. Learning Curve Extension (optional)
The base model holds every per-unit time constant. In reality, constructive work speeds up with repetition — most sharply for automation scripting, where the first scripts build reusable scaffolding that later scripts inherit. TAME models this with a power-law learning curve that switches off cleanly, reducing exactly to the base model.
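Formula
$$t(j) = t_0 \left(\frac{n_0 + j}{n_0}\right)^{b}, \qquad b = \log_2 L \tag{12}$$
$$T(n) = \sum_{j=1}^{n} t(j) = t_0 \sum_{j=1}^{n} \left(\frac{n_0 + j}{n_0}\right)^{b} \tag{13}$$
| Symbol | Meaning | Default / units |
|---|---|---|
| $L$ | Learning rate: time multiplier per doubling | 85% / 95% |
| $b$ | Learning exponent, $b = \log_2 L$ | derived |
| $t(j)$ | Time to perform the $j$-th repetition of a learnable activity | derived |
| $n$ | Number of new repetitions in this project | input |
| $n_0$ | Experience offset: repetitions already completed | prior |
| $t_0$ | Current per-unit time at the team's present skill | input |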
Meaning
Each repetition of a learnable activity is a little faster than the last, following a constant percentage improvement per doubling of cumulative experience. The offset $n_0$ says how far up that curve the team already sits, so the entered rate $t_0$ keeps its natural meaning: what the task costs us now.
Parameters
- $L$: the learning rate, i.e. the time multiplier per doubling. Default 85% for scripting, 95% for authoring and review.
- $n_0$: the experience offset. Small means a cold start (new framework); large means a seasoned team (curve nearly flat).
- $L = 1$: gives $b = 0$, so $t(j) = t_0$, the base model exactly. Learning is off by default.
Where it applies
- Scripting: primary. Strongest and best-documented learning effect.
- Authoring and review: optional, mild (high $L$), applied to both the manual baseline and the AI path so savings are not inflated.
- Execution, triage, maintenance: flat. Mechanical and bounded; learning is negligible.
Example
Scripting at $t_0 = 60$ min, $L = 0.85$ (so $b = -0.234$), $n_0 = 40$, over $n = 280$ scripts gives $T(280) \approx 12{,}260$ min $\approx 204$ man-hours, against 280 h at a flat rate. The combined program saving rises from 30% to ≈ 37%.
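The cumulative scripting effort under a learning curve can be summed directly. This sketch rests on two assumptions that the surviving text does not confirm: the offset power-law form `t0 * ((n0 + j) / n0) ** b`, and an experience offset of 40 scripts, chosen because it reproduces the roughly 204 man-hours quoted in the example.

```python
import math


def cumulative_minutes(current_min, learning_rate, offset, count):
    """Total minutes for `count` new repetitions on an offset power-law curve.

    Assumed form: the j-th new repetition costs
    current_min * ((offset + j) / offset) ** log2(learning_rate).
    """
    b = math.log2(learning_rate)           # 0 when learning_rate == 1
    return sum(current_min * ((offset + j) / offset) ** b
               for j in range(1, count + 1))


with_learning = cumulative_minutes(60, 0.85, offset=40, count=280) / 60
flat = cumulative_minutes(60, 1.0, offset=40, count=280) / 60
print(f"with learning: {with_learning:.0f} h, flat: {flat:.0f} h")
```

With `learning_rate=1.0` the exponent is zero and the sum collapses to the flat 280 h, which is the switch-off behaviour the section describes.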
Why a power law was chosen
Repeated constructive work has followed a power law since Wright (1936): a straight line on log–log axes, a constant percentage improvement per doubling of experience. This matches how a scripting framework actually matures — early effort builds shared structure, later effort reuses it.
10. Effort Modifiers: Seniority, Client Process, Tool Proficiency
The base model treats each per-unit time as a fixed average. Three real-world factors shift those times systematically: who does the work (seniority), where it is done (the client's process), and how well the team knows the tools. TAME folds them in as dimensionless multipliers on the base times, each defaulting to 1 so the base model is the neutral case.
Formula
$$t_i^{\text{eff}} = k_s \, k_p \, k_t \cdot t_i \tag{14}$$
| Symbol | Meaning | Default / units |
|---|---|---|
| $k_s$ | Seniority factor: skill of who performs the activity (< 1 senior, > 1 junior) | 0.7–1.4 |
| $k_p$ | Client-process factor: governance overhead per unit of work | 1.0–1.8 |
| $k_t$ | Tool-proficiency factor (AI and automation) | 0.7–1.5 |
| $t_i^{\text{eff}}$ | Effective time for activity $i$ after modifiers | derived |
Every per-unit time in §§2–8 is read as its effective value $t_i^{\text{eff}}$. The factors apply per activity; an activity a factor does not touch simply takes 1.
| Activity | $k_s$ seniority | $k_p$ process | $k_t$ tools |
|---|---|---|---|
| Manual authoring | ✓ | ✓ | – |
| AI review | ✓ | ◐ | $k_t^{\text{AI}}$ |
| Rework | ✓ | – | $k_t^{\text{AI}}$ |
| Prompt / setup | ◐ | – | $k_t^{\text{AI}}$ |
| Manual execution | ◐ | ✓✓ | – |
| Scripting | ✓ | – | $k_t^{\text{auto}}$ |
| Triage | ✓ | – | $k_t^{\text{auto}}$ |
| Maintenance | ✓ | – | $k_t^{\text{auto}}$ |
| Automated run | – | ◐ | – |
✓✓ strong · ✓ applies · ◐ minor · – none
Seniority ($k_s$)
The skill level of whoever performs the activity. $k_s < 1$ is faster (senior), $k_s > 1$ is slower (junior). The effect is strongest on judgement-heavy work (review, rework, scripting) and weakest on mechanical execution. Typical range 0.7–1.4. $k_s$ scales man-hours only; to convert to cost, weight each activity's effective hours by the loaded rate of the role that performs it.
Client process ($k_p$)
The client's governance overhead per unit of work: environment access, approvals, evidence capture, traceability, sign-offs. $k_p = 1$ is a lightweight or agile client; $k_p$ rises toward ~1.8 in heavily regulated programs. The asymmetry is the important part: a human pays $k_p$ on every manual run, while an automated run emits its logs and audit trail automatically, so $k_p$ barely touches it.
So high process maturity multiplies the automation advantage, which is central for regulated programs such as SAP finance, where every manual execution carries mandatory documentation. Concretely, $k_p$ rises with the compliance regime; common ones include:
- SOX §404 — IT general controls on financial-reporting systems such as SAP FICO; every change is documented, tested, and signed off.
- FDA 21 CFR Part 11 with GAMP 5 — computerized-system validation in life sciences (IQ/OQ/PQ protocols, full traceability), the heaviest common regime.
- PCI-DSS — controlled testing and evidence for payment-card systems.
- GDPR / HIPAA — data handling, masking, and privacy controls in test environments.
- SOC 1 / SOC 2 and SR 11-7 — audit and model-risk controls in banking and financial services.
- IEC 62304 / DO-178C / ISO 26262 — verification rigour in medical, avionics, and automotive software.
Tool proficiency ($k_t$)
How well the team knows the specific tools: the AI generator ($k_t^{\text{AI}}$) and the automation framework ($k_t^{\text{auto}}$). It works through two channels. Time: $k_t$ multiplies tool-mediated times. Quality: a fluent team prompts better and writes sturdier scripts, lowering the rework, miss, and maintenance fractions:
$$p' = k_t^{\text{AI}} \, p, \qquad m' = k_t^{\text{AI}} \, m, \qquad \mu' = k_t^{\text{auto}} \, \mu \tag{15}$$
$k_t$ is today's static proficiency; the learning curve (§9) is its trajectory. For scripting, set $k_t$ for the current rate and $L$ for how fast it improves; do not count the same gain in both.
Why multiplicative
The factors compound rather than add: a senior expert at a low-governance client is fast on every count, and the effects stack proportionally to task size. Multiplicative factors keep each cause separable and auditable, and collapse to the base model when set to 1.
Example
At a heavily governed client, $k_p = 1.6$ on manual execution lifts it to 1,536 man-hours, up from 960. Automated upkeep is almost unchanged, so the program saving rises from 30% to roughly 44% and break-even arrives sooner.
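The effect of a client-process factor on manual execution can be traced through the whole program in a few lines. This sketch assumes the illustrative defaults, uses the 3,088 creation minutes from §3, and takes a factor of 1.6 because 960 h × 1.6 gives the 1,536 h quoted above; it applies the factor to every manual run, including the cases left unautomated.

```python
def program_saving(process_factor=1.0, cycles=12):
    """Return (manual hours, new hours, fraction saved, break-even cycles)."""
    cases, automated = 400, 280              # 70% coverage
    exec_min = 12 * process_factor           # governance lands on manual runs only
    upkeep_min = 1 + 0.08 * 20               # triage + expected maintenance
    manual = cases * 20 + cycles * cases * exec_min
    new = (3088                              # AI-assisted creation, from section 3
           + automated * 60                  # one-time scripting
           + cycles * (automated * upkeep_min
                       + (cases - automated) * exec_min))
    break_even = 60 / (exec_min - upkeep_min)
    return manual / 60, new / 60, 1 - new / manual, break_even


for factor in (1.0, 1.6):
    manual_h, new_h, saved, cycles_needed = program_saving(factor)
    print(f"factor {factor}: saved {saved:.0%}, break-even {cycles_needed:.1f} cycles")
```

The manual baseline grows with the factor while automated upkeep does not, so the same suite saves more and repays its scripting sooner under heavier governance.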
11. The Complete Model: Base Plus Extensions
Having developed each term on its own — the base times (§§2–8), the learning curve (§9), and the effort modifiers (§10) — this section reassembles them into the complete form previewed in §1. The base is the starting point, and the learning and context terms are extensions switched on top of it.
The master time
Every time that appears anywhere in TAME is, in full generality, one expression combining the intrinsic cost, the context modifiers of §10, and the learning term of §9:
$$t_i(j) = k_s \, k_p \, k_t \cdot t_i \cdot \left(\frac{n_0 + j}{n_0}\right)^{b_i} \tag{16}$$
| Symbol | Meaning | Default / units |
|---|---|---|
| $t_i(j)$ | Effective time for the $j$-th unit of activity $i$ | derived |
| $t_i$ | Intrinsic time for activity $i$: the neutral, first-unit cost | base |
| $b_i$ | Learning exponent for activity $i$, $b_i = \log_2 L_i$ | derived |
Three independent influences: intrinsic cost (the neutral first-unit time), context (who does it, where, how skilled; each 1 in the neutral case), and experience (the learning discount; $b_i = 0$ for activities that do not learn).
From per-unit time to the totals
The master time is per repetition. The stage totals need the average per-unit time across a whole batch of $n$ units, which is the modifiers times the learning-integrated base:
$$\bar{t}_i = k_s \, k_p \, k_t \cdot \frac{1}{n} \sum_{j=1}^{n} t_i \left(\frac{n_0 + j}{n_0}\right)^{b_i} \tag{17}$$
The inner sum is the cumulative learning form of (13); with learning off it equals the count times the intrinsic time, so the batch-effective time reduces to the modifiers times $t_i$. This is what enters the totals.
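The master totals
$$C_{\text{AI}} = \frac{1}{60}\Big[\, S \, \bar{t}_s + N (1 - m')(\bar{t}_r + p' \, \bar{t}_c) + N \, m' \, \bar{t}_a \Big] \tag{18}$$
$$E_{\text{auto}} = \frac{1}{60}\Big[\, \alpha N \, \bar{t}_{\text{scr}} + R \big( \alpha N (\bar{t}_{\text{tri}} + \mu' \, \bar{t}_{\text{mnt}}) + (1 - \alpha) N \, \bar{t}_x \big) \Big] \tag{19}$$
These are (2) and (7) with every constant time replaced by its batch-effective value. The quality terms travel the same way: $m'$, $p'$ and $\mu'$ carry the tool-proficiency channel of (15).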
Consistency: the neutral case recovers the base
Setting $k_s = k_p = k_t = 1$ and $b_i = 0$, the master time collapses to the intrinsic time:
$$t_i(j) = 1 \cdot 1 \cdot 1 \cdot t_i \cdot \left(\frac{n_0 + j}{n_0}\right)^{0} = t_i \tag{20}$$
and the complete totals (18) and (19) return exactly to the constant-rate equations (2) and (7). The base is the complete model with its extensions switched off, precisely as the one-line preview promised.
| Setting | Inputs changed from neutral | Saving |
|---|---|---|
| Neutral baseline | all $k = 1$; learning off | 30% |
| + scripting learning | $L = 0.85$, $n_0 = 40$ | ≈ 37% |
| + heavy governance | $k_p = 1.6$ on manual execution | ≈ 44% |
| + senior, tool-fluent team | $k_s < 1$, $k_t < 1$ | higher still |
Every row uses the identical formula; only the inputs differ. The intended use is for a reader to set the variables to their own situation and read off their own number. The model's job is not to assert a single percentage — it is to be honest about which variables move the result and to let each reader arrive at theirs.
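The claim that every setting is the same formula with different inputs can be illustrated with a single function. This is a sketch under an assumed functional form (multiplicative modifiers times an offset power-law learning term); `master_time` and its parameter names are hypothetical, and the exact expression in the original is not recoverable from the text.

```python
import math


def master_time(base_min, unit, k_s=1.0, k_p=1.0, k_t=1.0,
                learning_rate=1.0, offset=1.0):
    """Effective minutes for the `unit`-th repetition of one activity."""
    b = math.log2(learning_rate)                  # 0 when learning is off
    learning = ((offset + unit) / offset) ** b    # 1.0 when b == 0
    return k_s * k_p * k_t * base_min * learning


print(master_time(60, unit=150))                  # neutral case: base time
print(master_time(12, unit=1, k_p=1.6))           # governed manual execution
print(master_time(60, unit=40, learning_rate=0.85, offset=40))  # one doubling
```

Every default is the neutral value, so a caller only names the inputs that differ from the base model.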
Worked example — a governed SAP FICO program
The modifiers are not academic; they often decide the verdict. Take the same 400-case suite inside a SOX-controlled SAP FICO program. Under §404 IT general controls (and, in life-sciences finance, GAMP 5 validation) every manual test execution carries mandatory documentation: pre-approval, step-by-step evidence, requirement traceability, four-eyes review, and archival. That overhead lands on each manual run, so $k_p = 1.6$ on manual execution, while the automated run emits the same evidence as a by-product and barely feels it (§10). Carry that one change through the whole model:
| Quantity | Neutral | Governed SAP | + Senior, fluent |
|---|---|---|---|
| Context modifiers | all 1 | $k_p = 1.6$ | $k_p = 1.6$; $k_s < 1$, $k_t < 1$ |
| Creation saving | 61% | 61% | 70% |
| Manual execution | 960 h | 1,536 h | 1,536 h |
| Effort break-even | 6.4 cyc | 3.6 cyc | 2.3 cyc |
| Combined saving | 30% | 44% | 53% |
| Man-hours saved (12 cycles) | 328 h | 732 h | 877 h |
| Cost-aware break-even | 14.1 cyc | 7.4 cyc | 5.8 cyc |
| Net value | −$6,288 | +$25,968 | +$37,838 |
The governance that punishes every manual run is exactly what makes automation win. In the neutral case automation barely breaks even in money: at the planned 12 cycles it loses about \$6.3k. The same suite in the governed program flips to +\$26k, because the manual baseline it now replaces grew by 60% (960 → 1,536 h) and the cost-aware break-even falls from 14 cycles to 7. Layer on a senior, tool-fluent team ($k_s < 1$, $k_t < 1$) and creation saving climbs to 70%, combined to 53%, and net value to +\$38k. Note that $k_s$ and $k_t$ sharpen creation and the automated upkeep but leave the manual baseline untouched, so they widen the gap rather than create it.
Reproduce it live: open the calculator, switch on the cost layer, and set the context modifiers: $k_p = 1.6$ (governance on manual exec), then $k_s$ and both $k_t$ values (AI and automation) for the senior, tool-fluent team.
12. Tooling Cost and the Investment Decision
Man-hours are the headline, but the go/no-go for automation also turns on money — the loaded cost of the hours saved, and the tool's own licence and infrastructure cost. This section adds the cost layer and turns the effort break-even of §7 into a money break-even management can act on.
From man-hours to money
$$\text{Cost}_{\text{man}} = w \, H_{\text{man}}, \qquad \text{Cost}_{\text{new}} = w \, H_{\text{new}} + K_0 + R \, K_c$$
| Symbol | Meaning | Default / units |
|---|---|---|
| $w$ | Blended loaded labour rate (cost per man-hour) | $/h |
| $H$ | Man-hours from the model (manual or new process) | output |
| $K_0$ | One-time tooling cost (licence, onboarding, setup) | $ |
| $K_c$ | Per-cycle tooling cost (amortised licence, compute) | $/cycle |
One-time $K_0$: platform licence onboarding, implementation and CI integration, initial environment build, one-off training. Per-cycle $K_c$: subscription and per-seat licences amortised to a cycle, runner/compute or cloud-grid minutes, and any per-execution vendor charge.
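Cost-aware break-even
$$R^*_c = \frac{w \, \alpha N \, t_{\text{scr}} / 60 + K_0}{w \, \alpha N \, \big(t_x - (t_{\text{tri}} + \mu \, t_{\text{mnt}})\big) / 60 - K_c}$$
The numerator is the total upfront automation investment; the denominator is the per-cycle cost saving net of tooling. With both tooling terms zero, dividing through recovers the effort break-even (8) exactly.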
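The viability test and net value
$$V(R) = R \left( w \, \frac{\alpha N \, \big(t_x - (t_{\text{tri}} + \mu \, t_{\text{mnt}})\big)}{60} - K_c \right) - \left( w \, \frac{\alpha N \, t_{\text{scr}}}{60} + K_0 \right)$$
| Symbol | Meaning | Default / units |
|---|---|---|
| $R^*_c$ | Cost-aware break-even: cycles to repay the upfront investment | derived |
| $V(R)$ | Net value at $R$ cycles (positive ⇒ worth automating) | derived |
$V(R)$ is positive precisely when the planned cycles exceed the cost-aware break-even. First the per-cycle saving must be positive; second the program must run more cycles than $R^*_c$.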
Worked example
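With $w = \$80/\text{h}$, $K_0 = \$20{,}000$ and $K_c = \$500$ per cycle: upfront investment $80 \times 280 + 20{,}000 = \$42{,}400$; per-cycle saving net of tool $80 \times 43.87 - 500 \approx \$3{,}009$; $R^*_c = 42{,}400 / 3{,}009 \approx 14.1$ cycles. The effort break-even was 6.4; licence and compute more than double it. At a planned 12 cycles, automation would lose roughly \$6.3k; it becomes worth it only beyond ~14 cycles.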
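The cost layer can be sketched as one function that returns both the cost-aware break-even and the net value. The rate and tooling figures used here (80 per hour, 20,000 one-time, 500 per cycle) are assumptions: they are not stated in the surviving text and were chosen because they reproduce the 14.1-cycle break-even and the −$6,288 and +$25,968 net values in the §11 table.

```python
def cost_case(rate, tool_once, tool_per_cycle, cycles, exec_min=12.0,
              automated=280, script_min=60, upkeep_min=2.6):
    """Return (cost-aware break-even cycles, net value at `cycles`)."""
    # Upfront investment: scripting labour plus one-time tooling.
    upfront = rate * automated * script_min / 60 + tool_once
    # Per-cycle saving: labour no longer spent, net of recurring tooling.
    per_cycle = rate * automated * (exec_min - upkeep_min) / 60 - tool_per_cycle
    break_even = upfront / per_cycle if per_cycle > 0 else float("inf")
    net_value = cycles * per_cycle - upfront
    return break_even, net_value


neutral = cost_case(rate=80, tool_once=20_000, tool_per_cycle=500, cycles=12)
governed = cost_case(rate=80, tool_once=20_000, tool_per_cycle=500, cycles=12,
                     exec_min=12 * 1.6)   # governance factor on manual runs
print(f"neutral:  break-even {neutral[0]:.1f} cycles, net {neutral[1]:+,.0f}")
print(f"governed: break-even {governed[0]:.1f} cycles, net {governed[1]:+,.0f}")
```

With both tooling arguments set to zero the rate cancels and the function returns the effort break-even of §7, which is a quick sanity check on any set of inputs.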
Amortisation: how much to invest
Manual execution is a raw operating expense — the same cost every cycle, capitalising into nothing. Automation scripting is a capital-like outlay — spent once, then spread across every cycle it serves.
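The automation cost per cycle is a hyperbola in $R$: the upfront term shrinks as more cycles share it, so the cost per cycle falls toward the running-cost floor $c_{\text{run}}$, while the manual line $c_{\text{man}}$ stays flat:
$$c_{\text{auto}}(R) = \frac{w \, \alpha N \, t_{\text{scr}} / 60 + K_0}{R} + c_{\text{run}}, \qquad c_{\text{run}} = w \, \frac{\alpha N \, (t_{\text{tri}} + \mu \, t_{\text{mnt}})}{60} + K_c, \qquad c_{\text{man}} = w \, \frac{\alpha N \, t_x}{60}$$
The two cross exactly at the cost-aware break-even:
$$c_{\text{auto}}(R^*_c) = c_{\text{man}}$$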
How many cycles does a project actually run?
| Symbol | Meaning | Default / units |
|---|---|---|
| $R_{\text{impl}}$ | Implementation cycles: the SIT and UAT passes during the project | ≈ 3–4 |
| $R_{\text{reg}}$ | Ongoing regression cycles after go-live | input |
| $f$, $P$ | Regression runs per period; periods the suite is maintained | input |
A typical implementation runs only a handful of passes: two system-integration cycles (SIT1, SIT2), a user-acceptance cycle (UAT), and often a pre-go-live dry run, so $R_{\text{impl}}$ is about three to four. Everything beyond $R_{\text{impl}}$ is regression. Since the cost-aware break-even was ~14 cycles but implementation supplies only three or four, the investment is not recovered by the project that builds the suite; it is recovered, if at all, by the regression that follows. Automation pays back only when:
$$R_{\text{impl}} + R_{\text{reg}} = R_{\text{impl}} + f \, P > R^*_c$$
Guiding the decision
- Estimate the loaded rate $w$ for the people who run and maintain the suite.
- Get the tool economics from the vendor quote: split into one-time $K_0$ and recurring $K_c$.
- Compute the per-cycle saving net of tooling. If zero or negative, stop: no number of cycles redeems it.
- Otherwise compute $R^*_c$ and compare to the cycles you genuinely expect: implementation passes plus realistic regression, not the horizon you hope for.
- Invest only if expected cycles comfortably exceed the break-even, leaving margin for maintenance spikes and coverage that falls short of plan.
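The checklist above can be read as a small decision procedure. The sketch below is one possible encoding, not a prescribed rule: the function name, the three return labels, and in particular the `margin` of 1.25 standing in for "comfortably exceed" are assumptions made for illustration.

```python
def automation_decision(rate, hours_saved_per_cycle, scripting_hours,
                        tool_once, tool_per_cycle, expected_cycles,
                        margin=1.25):
    """Return 'stop', 'hold' or 'invest' following the checklist order."""
    # Per-cycle saving net of tooling.
    net_per_cycle = rate * hours_saved_per_cycle - tool_per_cycle
    if net_per_cycle <= 0:
        return "stop"          # no number of cycles redeems it
    # Cost-aware break-even against the cycles genuinely expected.
    break_even = (rate * scripting_hours + tool_once) / net_per_cycle
    if expected_cycles >= margin * break_even:
        return "invest"        # clears break-even with headroom
    return "hold"              # break-even not cleared with margin


# Implementation passes only (about 4) versus implementation plus regression.
print(automation_decision(80, 43.87, 280, 20_000, 500, expected_cycles=4))
print(automation_decision(80, 43.87, 280, 20_000, 500, expected_cycles=4 + 24))
```

Passing the implementation-only cycle count first makes the point of the previous subsection concrete: the verdict changes only once realistic regression cycles are added.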
13. System Behavior
Few cycles ($R$ below $R^*$). Scripting cost not yet repaid; automation shows a net loss; stay manual.
Many cycles ($R$ well above $R^*$). Upfront scripting amortises away; each cycle is near-free; percent saved approaches the coverage limit.
Low coverage (small $\alpha$). Most cases still run manually each cycle; saving is capped well below 100% regardless of cycles.
14. Calibration
The defaults are industry-typical priors, not measured truth. Before reporting, sample real cases to fix the parameters that matter most:
- $t_a$ and $t_r$: time a small batch by hand and by review to set the creation ratio.
- $m$: the AI miss rate is the hard ceiling on creation savings and the figure most often forgotten.
- $t_{\text{scr}}$ and $\mu$: scripting effort and flaky-test maintenance are where optimistic automation cases break down.
- $\alpha$ and $R$: be honest about how many cases can truly be automated and how often the suite runs.
- $L$ and $n_0$: if learning is enabled, fit them from a script-time log: plot per-script time against cumulative count on log–log axes; the slope gives $b$ (hence $L = 2^{b}$), and how far in you already are gives $n_0$.
- $k_s$, $k_p$, $k_t$: set seniority from the staffing mix, the client-process factor from the documentation each run carries, and tool proficiency from the team's familiarity with the tools.
15. Scope and Limitations
- Man-hours only. Faster wall-clock turnaround from unattended runs is a separate throughput benefit, intentionally excluded.
- Linear per-cycle costs. Maintenance is modelled as a steady fraction; it can spike around major releases.
- Learning is optional and parametric. When enabled it assumes a single stable learning rate per activity; abrupt tooling changes or heavy turnover are only approximated.
- Quality effects excluded. Earlier defect detection and broader coverage have real value but sit outside this effort-based model — treat them as unquantified upside.
- Inputs are estimates. The output is only as good as the calibration in §14; present it as a defensible estimate with stated assumptions, not a measurement.
16. Conclusion
TAME shifts the question from an informal speed claim to a measured one: how many man-hours does the new process remove, stage by stage, as the work repeats? The model combines a one-time, accuracy-bounded creation saving; a recurring execution saving that compounds with cycles; and a volume-independent break-even point. It rests on a simple philosophy:
Rationale
This appendix consolidates the reasoning behind every modelling choice. Each entry states the decision, the reasoning, and the main alternative rejected.
Two stages
Separate creation from execution: their cost structures are categorically different — creation incurred once, execution every cycle — so a single blended figure hides where the savings come from. Rejected: a one-number “X% faster.”
Man-hours as the unit
Measure human effort, not calendar time or money. The mandate is effort saved; wall-clock turnaround would inflate the figure if mixed in, and money depends on rates that vary by organisation. Man-hours convert cleanly to cost later by weighting with loaded rates (§12).
Linear in case count and cycles — and why not exponential
Authoring or running one case does not change the cost of the next: no shared computation, no feedback loop, so no compounding mechanism justifies exponential growth. An exponential form here would be curve-fitting without a cause. Linearity is also what makes break-even independent of volume and coverage — a consequence, not a coincidence. The only legitimate departure is the mild sub-linear discount from learning, handled separately (§9).
Rework as an expected value
Rework enters as the expected cost per case. TAME predicts an aggregate over many cases, and by the law of large numbers the expected value is an accurate predictor of a sum — the same logic an insurer uses pricing on expected loss.
A missed case costs full authoring time
The conservative choice — a reviewer with context might author a missed case slightly faster, so assuming full cost cannot overstate the saving. Adjust downward only with evidence.
The capped form
Express the creation saving as two bounded factors — coverage and per-case efficiency. This makes explicit that accuracy, not speed, sets the ceiling — the most scrutinised claim in any AI-efficiency pitch. A raw speed-up hides the miss-rate ceiling and was rejected.
Why a power law for learning — the alternatives, formally
A learning law must stay positive, reproduce constant-improvement-per-doubling, and be parsimonious. One diagnostic separates the candidates: the ratio of times one doubling apart. For a power law $t(n) \propto n^{b}$ it is constant:
$$\frac{t(2n)}{t(n)} = 2^{b} = L$$
a straight line on log–log axes (Wright, 1936). Exponential to a floor measures improvement against the floor, so the ratio decays with experience and the curve flattens too early — understating long-run gains; kept only as the documented variant when a hard floor exists. Logarithmic is unbounded below — it crosses zero and turns negative, physically impossible; rejected outright. Logistic S-curve needs two hard-to-identify parameters and models slow early improvement, the opposite of the steep initial drop real learning shows; the offset already absorbs any head start. Only the power law stays positive, keeps the per-doubling ratio constant, and does so with one parameter.
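The doubling diagnostic is simple to run on candidate curves. The sketch below checks a plain power law and a logarithmic curve; the logarithmic coefficients (60 and 12) are arbitrary values chosen only to show the sign change, not figures from the source.

```python
import math


def power_law_time(first_unit_min, learning_rate, unit):
    """Power-law learning: time of the `unit`-th repetition."""
    return first_unit_min * unit ** math.log2(learning_rate)


def logarithmic_time(first_unit_min, slope, unit):
    """Logarithmic alternative: falls without bound as `unit` grows."""
    return first_unit_min - slope * math.log(unit)


# Ratio of times one doubling apart, early and late in the curve.
ratios = [power_law_time(60, 0.85, 2 * n) / power_law_time(60, 0.85, n)
          for n in (1, 10, 100)]
print([round(r, 3) for r in ratios])

# The logarithmic curve eventually predicts a negative time.
print(logarithmic_time(60, 12, 200) < 0)
```

The power-law ratio equals the learning rate at every point on the curve, which is the constant-per-doubling property; the logarithmic curve fails the positivity requirement once the repetition count is large enough.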
Anchoring learning on current skill
Anchor on the team's present rate through an experience offset, not a first-unit time. A “first script ever” time is nearly impossible to estimate and produces absurd deflation across hundreds of units; anchoring on what a script costs now keeps the entered rate meaningful and lets the offset carry how far up the curve the team sits.
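One way to express the offset anchoring is sketched below. The functional form is an assumption made for illustration, not taken from the model's definition: the next unit costs the entered current rate, and time falls by the learning rate each time cumulative experience (offset plus new units) doubles. The names and the offset of 50 prior scripts are hypothetical.

```python
import math


def unit_time(k, t_now, offset, rate):
    """Time for the k-th further unit (k = 0 is the next one), anchored on t_now."""
    b = -math.log2(rate)
    return t_now * ((offset + k) / offset) ** (-b)


def implied_first_unit_time(t_now, offset, rate):
    """What a first-unit anchor would have to be to reach t_now after `offset` units."""
    b = -math.log2(rate)
    return t_now * offset ** b


t_now, offset, rate = 60.0, 50, 0.85  # illustrative: 50 scripts already written

print(unit_time(0, t_now, offset, rate))                  # 60.0, the entered rate itself
print(round(unit_time(offset, t_now, offset, rate), 2))   # 51.0, one doubling later
print(round(implied_first_unit_time(t_now, offset, rate), 1))
```

The last line shows the first-unit time a team would otherwise have to guess; anchoring on the current rate avoids needing that figure at all.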
Why 85% for scripting and 95% for authoring and review
Which rate fits an activity is governed by one question: how much does performing it once create reusable leverage for the next time? Scripting is constructive with high leverage — the first scripts build the framework (page objects, locator helpers, fixtures, CI wiring) that every later script inherits — exactly the mechanism behind the steep 80–85% curves long reported for construction and tooling. Authoring is cognitive with low leverage — each case targets different functionality, with a floor set by human comprehension speed that repetition cannot compress — pointing to a shallow curve near 95%. The gap, not the exact figures, is the substantive claim: over eight doublings an 85% curve cuts per-unit time to ~27%, a 95% curve only to ~66%. Both are priors to fit from a time log before production.
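The eight-doubling comparison is a one-line calculation, reproduced here as a quick check of the quoted figures. The function name is hypothetical.

```python
def remaining_fraction(rate, doublings):
    """Per-unit time after the given number of doublings, as a share of the start."""
    return rate ** doublings


for label, rate in [("scripting", 0.85), ("authoring/review", 0.95)]:
    share = remaining_fraction(rate, 8)
    print(f"{label}: {share:.1%} of initial per-unit time")
# scripting: 27.2% of initial per-unit time
# authoring/review: 66.3% of initial per-unit time
```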
Effort modifiers multiply, not add
Each modifier is a proportional effect on the same unit of work, and proportional effects compound: a senior 15% faster at a client 60% heavier lands at $0.85 \times 1.60 = 1.36$, not $1 - 0.15 + 0.60 = 1.45$. Multiplication keeps each cause separable and auditable and reduces to the base model when every factor is 1. Additive overheads fit only genuinely fixed steps and are kept as a variant.
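A minimal sketch of multiplicative modifiers with neutral defaults follows. The class and field names are hypothetical; the three factors mirror the seniority, client-process and tool-proficiency modifiers, applied here to time only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Modifiers:
    seniority: float = 1.0         # below 1 means faster than the baseline performer
    client_process: float = 1.0    # above 1 means heavier governance overhead
    tool_proficiency: float = 1.0  # 1 leaves the base model untouched

    def combined(self):
        """Factors multiply, so each cause stays separable and auditable."""
        return self.seniority * self.client_process * self.tool_proficiency


base_minutes = 12.0
neutral = Modifiers()
senior_heavy = Modifiers(seniority=0.85, client_process=1.60)

print(base_minutes * neutral.combined())   # 12.0, the base model
print(round(senior_heavy.combined(), 2))   # 1.36; an additive rule would give 1.45
```

Because every field defaults to 1, constructing `Modifiers()` with no arguments reproduces the base model exactly, and each departure from it is visible at the call site.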
Tool proficiency scales the quality terms too
Fluency with a tool is not only faster but better: a skilled prompter elicits fewer misses and less rework, and a skilled engineer writes sturdier scripts. Limiting the factor to time alone would miss its largest effect. It is tied to the learning offset so the same improvement is not counted twice.
Why the client-process factor reaches about 1.8
A factor of 1.8 means roughly 0.8 hour of documentation, review and sign-off for every hour of testing, as in high-risk validated environments (GAMP Category 5, SOX-controlled finance) where every execution is pre-approved, evidenced step by step, traced to a requirement, four-eyes reviewed and archived. Reported compliance overhead runs from about +30% for light regimes to +80–100% for the strictest. The asymmetry is what makes the factor matter: automation produces most of this evidence as a by-product of running, so a higher factor widens the automation advantage rather than narrowing it.
The machine run counts as zero man-hours
The model measures people's time, and a script on a server consumes none. The recurring human costs that remain — triage and maintenance — are modelled explicitly. If runs are supervised in practice, that time belongs in the triage term, not at zero.
Every factor defaults to 1
Every modifier, and the learning switch, defaults to the neutral value, so the base model is recoverable exactly and added realism is opt-in and auditable.
Symbol reference
Every symbol in one place. Defaults are the illustrative values used in the worked examples — placeholders, not recommendations (see §14, Calibration).
| Symbol | Meaning | Default / units |
|---|---|---|
| Creation inputs | | |
| | Number of test cases needed | 400 |
| | Scenarios prompted into the AI tool | 20 |
| | Manual authoring time per case | 20 min |
| | Login and prompt time per scenario | 10 min |
| | Review time per AI-generated case | 4 min |
| | Fraction of AI cases needing rework | 30% |
| | Correction time per reworked case | 6 min |
| | AI miss rate (cases authored manually) | 10% |
| Execution inputs | | |
| | Automation coverage (share of cases scripted) | 70% |
| | Scripting time per automated case (one-time) | 60 min |
| | Manual execution time per case, per cycle | 12 min |
| | Triage time per automated case, per cycle | 1 min |
| | Scripts needing maintenance per cycle | 8% |
| | Maintenance time per affected script | 20 min |
| | Execution cycles (regression runs) | 12 |
| Learning curve (extension) | | |
| | Learning rate (time multiplier per doubling) | 85% / 95% |
| | Learning exponent | derived |
| | Experience offset (units already completed) | prior |
| Effort modifiers (extension) | | |
| | Seniority factor (skill of who performs the activity) | 0.7–1.4 |
| | Client-process factor (governance overhead) | 1.0–1.8 |
| | Tool-proficiency factor (AI and automation) | 0.7–1.5 |
| Cost layer (optional) | | |
| | Blended loaded labour rate (cost per man-hour) | $/h |
| | One-time tooling cost (licence, onboarding, setup) | $ |
| | Per-cycle tooling cost (amortised licence, compute) | $/cycle |
| Derived quantities and outputs | | |
| | Residual effort ratio per AI case | derived |
| | Break-even cycles (effort; cost-aware) | derived |
| | Creation effort (manual baseline; AI plus review) | eff-hrs |
| | Execution effort (manual baseline; automated) | eff-hrs |
| | Fraction of effort saved (creation; whole program) | output |
Prepared as a companion to the TAME calculator. Figures recompute live in the model.