Agents at Work: Phase 4 Report

On Sale

£14.99

A behavioural audit of how AI judgement holds under repetition, constraint and cross-system evaluation.

Best place to start if you are interested in AI reliability, stability, and real-world evaluation.

DESCRIPTION

This report presents Phase 4 of the Agents at Work research series, examining how an AI system behaves when assessing age-related bias in job adverts under repeated evaluation.

Building on earlier phases, which identified variation in AI judgement, Phase 4 examines how that behaviour holds at scale and under structured test conditions.

The report applies a behavioural audit approach to observe how judgement changes when the same task is repeated, when input is constrained, and when outputs are compared across systems.

Rather than evaluating individual outputs, the focus is on observable system behaviour.

Key findings include:

Judgement stability is conditional, not absolute
Confidence remains stable even when judgements vary
Explanations can differ while remaining plausible
Reduced context produces more uniform, not more cautious, outputs
Agreement and confidence do not align as reliability signals

Taken together, these findings highlight a distinction between what AI systems produce and how they behave over time.

WHAT THIS REPORT DOES

Phase 4 examines how AI judgement behaves under:

repeated execution of the same task
constrained or reduced input context
cross-system comparison
interaction between confidence, agreement and explanation signals

The focus is on behavioural patterns rather than single results.
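As a rough illustration of what looking at behaviour across repeated runs can mean in practice, the sketch below summarises a set of repeated judgements on one job advert. The report's methodology is proprietary and is not described on this page, so this is not the author's method. The function name `stability_summary`, the label values, and the confidence scores are all hypothetical, chosen only to show how agreement and confidence can be measured separately.

```python
from collections import Counter
from statistics import mean


def stability_summary(runs):
    """Summarise repeated judgements of the same input.

    `runs` is a list of (label, confidence) pairs, one per repetition.
    Both the structure and the metric names are illustrative assumptions.
    """
    if not runs:
        raise ValueError("runs must not be empty")
    labels = [label for label, _ in runs]
    # Most frequent judgement and how often it occurred
    modal_label, modal_count = Counter(labels).most_common(1)[0]
    return {
        "modal_label": modal_label,
        # Share of runs that match the most frequent judgement
        "agreement_rate": modal_count / len(labels),
        # Average self-reported confidence across all runs
        "mean_confidence": mean(conf for _, conf in runs),
        "distinct_labels": len(set(labels)),
    }


# Hypothetical data: five repetitions of one task on one advert
runs = [
    ("biased", 0.9),
    ("not_biased", 0.9),
    ("biased", 0.9),
    ("not_biased", 0.9),
    ("biased", 0.9),
]
summary = stability_summary(runs)
print(summary)
```

In this made-up example the stated confidence is identical on every run while the judgement itself flips between two labels, so a high mean confidence sits alongside an agreement rate of only 0.6. Reading either number alone would give a misleading picture of reliability.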

WHAT THIS REPORT DOES NOT DO

This report does not:

assess real-world discrimination or hiring outcomes
evaluate employer intent
provide compliance or legal determinations
measure model accuracy against ground truth

The analysis focuses on system behaviour under controlled conditions.

WHO THIS IS FOR

This report is intended for:

researchers examining AI system behaviour
audit, risk and assurance professionals
policymakers and regulators
practitioners working with AI decision-support systems

RESEARCH CONTEXT

This report forms Phase 4 of the Agents at Work series:

Phase 1 — detection of age-adjacent language
Phase 2 — interpretation of that language
Phase 3 — behavioural variation under repetition
Phase 4 — evaluation of how that behaviour holds under structured conditions

WHY THIS MATTERS

AI systems are often trusted based on individual outputs.

This report shows that reliability cannot be inferred from a single result.

A system may produce outputs that are clear, confident and well explained, while the underlying judgement does not remain stable.

LICENCE AND USAGE

Licensed under Creative Commons CC BY-NC-ND 4.0.

The underlying methodology, agent design, prompts and analytical framework remain proprietary and are not licensed for reuse.

You will get a PDF file (976 KB).
