Agents at Work: Phase 3 Report

On Sale

£11.99

A behavioural audit of how AI judgement changes under repetition, ambiguity and constraint.

Introduces the behavioural framework used to evaluate AI judgement.

Description

This report presents Phase 3 of the Agents at Work research series, examining how an AI system behaves when asked to evaluate age-related bias in recruitment language under repeated and constrained conditions.

Building on earlier phases, which examined where age-related signals appear and how they are interpreted, Phase 3 focuses on how judgement behaves when the same task is performed multiple times.

The report introduces a behavioural framework for evaluating AI systems beyond single outputs, examining patterns of stability, variation and signal response over repeated evaluation.

What This Report Does

Phase 3 examines how AI judgement behaves under:

repeated execution of the same task
ambiguous or borderline language
partial or degraded input context
variation in internal signals such as confidence and agreement

The report applies a structured behavioural audit to analyse:

run-to-run judgement stability
variation in explanations
confidence behaviour under uncertainty
consistency of cue identification
cross-model agreement
responsiveness of internal self-review signals
sensitivity to truncated input

The focus is on observable behaviour rather than individual results.
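As a rough illustration of what "run-to-run judgement stability" could mean in practice, the sketch below scores a set of repeated judgements by how often they match the most common outcome. This is an assumed, simplified measure for illustration only: the function name `run_stability`, the labels and the sample data are hypothetical and are not taken from the report, which may define stability differently.

```python
from collections import Counter


def run_stability(labels):
    """Return the most common label and the share of runs that agree with it.

    Illustrative measure only (an assumption, not the report's method):
    1.0 means every run gave the same judgement; lower values mean the
    judgement varied between runs of the same task.
    """
    if not labels:
        raise ValueError("need at least one run")
    counts = Counter(labels)
    modal_label, modal_count = counts.most_common(1)[0]
    return modal_label, modal_count / len(labels)


# Hypothetical judgements from five runs on the same job advert.
runs = ["biased", "biased", "neutral", "biased", "biased"]
label, stability = run_stability(runs)
print(label, stability)
```

A score of this kind describes only how consistent the repeated judgements are with each other. It says nothing about whether the most common judgement is correct, which matches the report's focus on behaviour rather than accuracy.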

What This Report Does Not Do

This report does not:

assess real-world discrimination or hiring outcomes
determine employer intent
provide compliance or legal determinations
measure model accuracy against ground truth

The analysis focuses on system behaviour under controlled conditions.

Who This Is For

This report is intended for:

researchers examining AI system behaviour
audit, risk and assurance professionals
policymakers and regulators
practitioners working with AI decision-support systems

Research Context

This report forms Phase 3 of the Agents at Work series.

Phase 1 examines detection of age-related signals
Phase 2 examines interpretation of those signals
Phase 3 examines how judgement behaves under repetition and constraint

This phase establishes the behavioural perspective that underpins later evaluation work.

Why This Matters

AI systems are often trusted on the strength of individual outputs and fluent explanations.

Phase 3 shows that these signals do not fully reflect how a system behaves over time. Reliability emerges from patterns of behaviour, not from a single result.