67 changes: 67 additions & 0 deletions docs/mkdocs/en/replay-consistency.md
@@ -0,0 +1,67 @@
# Replay Consistency Test Framework

tRPC-Agent supports three backends for Session/Memory storage: InMemory, SQL, and Redis. Developers often prototype with InMemory and then switch to SQL or Redis for production. If different backends store inconsistent event order, state, memory, or summary data for the same agent trajectory, the result is replay errors, context loss, long-term memory corruption, and summary overwrite errors.

This framework provides a set of standardized input trajectories to drive multiple backends, automatically generates diff reports, and pinpoints the field path and values of each inconsistency. It serves both as a testing tool and a quality benchmark for backend implementations.

## Architecture

Core components:

- **ReplayCase / ReplayStep**: JSONL files defining standardized input trajectories
- **ReplayHarness**: Parses JSONL steps, drives two backends in parallel, collects raw results
- **DiffEngine**: Compares the two result sets along four dimensions (events / state / memory / summary) and produces a DiffReport
- **Normalizer**: Truncates timestamps to second precision, reassigns stable IDs by content, excludes `is_final_response`

The implementation lives in [tests/sessions/conftest.py](../../../tests/sessions/conftest.py) and [tests/sessions/test_replay_consistency.py](../../../tests/sessions/test_replay_consistency.py).
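
To illustrate what "pinpointing the field path and values" of an inconsistency can look like, here is a minimal sketch of a recursive comparison. It is not the framework's `DiffEngine`: `diff_paths` and `MISSING` are hypothetical names, and the backend snapshots are assumed to be plain dicts and lists.

```python
from typing import Any, Iterator, Tuple


class _Missing:
    """Sentinel for a key or index that exists on only one side."""

    def __repr__(self) -> str:
        return "<missing>"


MISSING = _Missing()


def diff_paths(a: Any, b: Any, path: str = "") -> Iterator[Tuple[str, Any, Any]]:
    """Yield (field_path, value_a, value_b) for every leaf where a and b differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        # Walk the union of keys so that one-sided keys are reported too.
        for key in sorted(set(a) | set(b)):
            child = f"{path}.{key}" if path else str(key)
            yield from diff_paths(a.get(key, MISSING), b.get(key, MISSING), child)
    elif isinstance(a, list) and isinstance(b, list):
        # Compare lists position by position; extra items show up as MISSING.
        for i in range(max(len(a), len(b))):
            left = a[i] if i < len(a) else MISSING
            right = b[i] if i < len(b) else MISSING
            yield from diff_paths(left, right, f"{path}[{i}]")
    elif a != b:
        yield (path, a, b)
```

Calling `list(diff_paths(snapshot_a, snapshot_b))` on two snapshots yields entries such as `("events[0].text", "hi", "hi!")`, which is the shape of information a diff report needs.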

## Replay Cases

| # | Case Name | Type | Description |
|---|---|---|---|
| 1 | `single_turn` | Normal | Single user → agent exchange |
| 2 | `multi_turn` | Normal | 3 rounds of alternating conversation |
| 3 | `tool_call` | Normal | `function_call` + `function_response` |
| 4 | `state_update` | Normal | Multiple `state_delta` writes and overwrites |
| 5 | `memory_rw` | Normal | `store_session` + `search_memory` |
| 6 | `summary_gen` | Normal | 22-turn conversation triggering summary |
| 7 | `summary_truncate` | Known divergence | Two-layer validation: strict metadata + per-backend semantics |
| 8 | `exception_recovery` | Injected | `inject_skip_append` to simulate write failure |
| 9 | `injected_event_order` | Injected | `inject_reorder_events` to swap events |
| 10 | `injected_summary_session` | Injected | `inject_summary_session_id` to alter summary ownership |
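
The cases above are defined as JSONL files, one step per line. The actual step schema is not shown in this document, so the sketch below assumes a hypothetical layout with an `op` field and an optional `args` object; `ReplayStepSketch` and `parse_steps` are illustrative names, not the framework's classes.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ReplayStepSketch:
    """Hypothetical stand-in for one replay step (field names are assumptions)."""

    op: str
    args: Dict[str, Any] = field(default_factory=dict)


def parse_steps(jsonl_text: str) -> List[ReplayStepSketch]:
    """Parse JSONL text into steps, skipping blank lines."""
    steps: List[ReplayStepSketch] = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if "op" not in record:
            raise ValueError(f"line {line_no}: missing 'op'")
        steps.append(ReplayStepSketch(op=record["op"], args=record.get("args", {})))
    return steps
```

Under this assumed layout, an injected case would simply contain a step whose `op` names the injection, for example `{"op": "inject_skip_append"}`.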

## Normalization Strategy

Before cross-backend comparison, non-business differences are removed:

| Field | Treatment |
|-------|-----------|
| event.timestamp | Truncate to second precision (int) |
| event.id | Reassign stable ID sorted by content |
| state_delta | Unify JSON key ordering |
| is_final_response | Excluded (computed property differs across serialization paths) |
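
The table can be read as a small transformation over each event. The sketch below is one possible interpretation, not the framework's `Normalizer`: it assumes events are plain dicts with the field names from the table, and it reads "stable ID sorted by content" as ranking events by their canonical JSON while keeping the original list order. `normalize_events` and the `evt-NNNN` ID format are hypothetical.

```python
import json
from typing import Any, Dict, List


def normalize_events(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return normalized copies of the events, in their original order."""
    canonical: List[str] = []
    for event in events:
        # Drop the backend-assigned id and the computed is_final_response flag.
        item = {k: v for k, v in event.items() if k not in ("id", "is_final_response")}
        if "timestamp" in item:
            item["timestamp"] = int(item["timestamp"])  # truncate to whole seconds
        # sort_keys gives nested dicts such as state_delta one canonical key order.
        canonical.append(json.dumps(item, sort_keys=True, ensure_ascii=False))

    cleaned = [json.loads(text) for text in canonical]
    # Rank events by canonical content and derive the stable ID from the rank.
    by_content = sorted(range(len(cleaned)), key=lambda i: canonical[i])
    for rank, index in enumerate(by_content):
        cleaned[index]["id"] = f"evt-{rank:04d}"
    return cleaned
```

Because the list order is left untouched, a backend that returns events in a different order still produces a detectable diff after normalization.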

Three categories of differences are explicitly allowed and written to `allowed_diff`:

1. Backend-generated `invocation_id`
2. Backend-specific `save_key` format differences
3. Event count differences after summary compression (InMemory compresses the event list in memory, whereas SQL `get_session` re-reads all raw events from the event table)
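
One way to apply these rules is to partition the raw differences into an allowed bucket and an unexpected bucket. The sketch below assumes each difference is a `(field_path, value_a, value_b)` tuple; `split_diffs`, the `unexpected_diff` key, and the `events.count` path are hypothetical and only serve the illustration.

```python
from typing import Any, Dict, List, Tuple

Diff = Tuple[str, Any, Any]  # (field_path, value_on_backend_a, value_on_backend_b)

# Leaf field names whose values are backend-generated or backend-specific.
ALLOWED_LEAF_FIELDS = {"invocation_id", "save_key"}
# Hypothetical path used in this sketch for the session's event count.
EVENT_COUNT_PATH = "events.count"


def split_diffs(diffs: List[Diff], summary_compressed: bool) -> Dict[str, List[Diff]]:
    """Sort each diff into 'allowed_diff' or 'unexpected_diff'."""
    report: Dict[str, List[Diff]] = {"allowed_diff": [], "unexpected_diff": []}
    for diff in diffs:
        path = diff[0]
        leaf = path.rsplit(".", 1)[-1]
        # The event-count exemption only applies once a summary has compressed events.
        allowed = leaf in ALLOWED_LEAF_FIELDS or (
            summary_compressed and path == EVENT_COUNT_PATH
        )
        report["allowed_diff" if allowed else "unexpected_diff"].append(diff)
    return report
```

A test would then assert that `unexpected_diff` is empty while still recording `allowed_diff` for inspection.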

## Summary Comparison Strategy

The comparison operates in two layers:

1. **Summary metadata**: `session_id`, `summary_text`, `original_event_count`, and `compressed_event_count` must match exactly across backends. This is the core requirement for replay correctness.
2. **Per-backend independent validation**: summary text is non-empty, compression has taken effect (compressed < original), and new events appended after compression are preserved

The exact boundary between summary text and retained events is allowed to differ due to backend storage model differences.
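
The two layers can be sketched as two small checks. This is an illustration under assumptions, not the framework's code: summaries are taken to be dicts keyed by the four metadata fields named above, events are represented by their text only, and `compare_metadata` and `validate_backend` are hypothetical names.

```python
from typing import Any, Dict, List, Tuple

STRICT_FIELDS = (
    "session_id",
    "summary_text",
    "original_event_count",
    "compressed_event_count",
)


def compare_metadata(a: Dict[str, Any], b: Dict[str, Any]) -> List[Tuple[str, Any, Any]]:
    """Layer 1: report every strict metadata field that differs between backends."""
    return [(f, a.get(f), b.get(f)) for f in STRICT_FIELDS if a.get(f) != b.get(f)]


def validate_backend(
    summary: Dict[str, Any], stored_texts: List[str], appended_texts: List[str]
) -> List[str]:
    """Layer 2: check one backend on its own terms and return a list of problems."""
    problems: List[str] = []
    if not str(summary.get("summary_text", "")).strip():
        problems.append("summary_text is empty")
    if not summary["compressed_event_count"] < summary["original_event_count"]:
        problems.append("compression did not take effect")
    lost = [t for t in appended_texts if t not in stored_texts]
    if lost:
        problems.append(f"events appended after compression were lost: {lost}")
    return problems
```

Layer 1 runs once per backend pair, while layer 2 runs once per backend, so a divergence in where the retained events begin does not fail the test as long as each backend passes its own checks.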

## Backend Access

| Mode | Backend A | Backend B | Trigger |
|------|-----------|-----------|---------|
| Lightweight (default) | InMemorySessionService | SqlSessionService(SQLite) | Always |
| SQL integration | InMemorySessionService | SqlSessionService(MySQL) | `TEST_MYSQL_URL` is set |
| Redis integration | InMemorySessionService | RedisSessionService | `TEST_REDIS_URL` is set |

All three backends conform to the `SessionServiceABC` interface; adding a new backend only requires implementing that interface.
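
The trigger column amounts to environment-based selection of backend pairs. The sketch below shows that logic in isolation; it returns descriptive names rather than constructing the services, since their constructors are not described here, and `enabled_pairs` is a hypothetical helper.

```python
import os
from typing import List, Mapping, Optional, Tuple

Pair = Tuple[str, str, str]  # (mode, backend_a, backend_b)


def enabled_pairs(env: Optional[Mapping[str, str]] = None) -> List[Pair]:
    """Return the backend pairs that should run for the given environment."""
    if env is None:
        env = os.environ
    # The lightweight SQLite pair needs no external service, so it always runs.
    pairs: List[Pair] = [
        ("lightweight", "InMemorySessionService", "SqlSessionService(SQLite)")
    ]
    if env.get("TEST_MYSQL_URL"):
        pairs.append(("sql", "InMemorySessionService", "SqlSessionService(MySQL)"))
    if env.get("TEST_REDIS_URL"):
        pairs.append(("redis", "InMemorySessionService", "RedisSessionService"))
    return pairs
```

Treating an empty variable the same as an unset one keeps a CI job from attempting a connection with a blank URL.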
67 changes: 67 additions & 0 deletions docs/mkdocs/zh/replay-consistency.md
@@ -0,0 +1,67 @@
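# Replay Consistency Test Framework

tRPC-Agent supports three Session/Memory backends: InMemory, SQL, and Redis. Developers often build against InMemory first and then switch to SQL or Redis for production. If different backends store inconsistent event order, state, memory, or summary data for the same agent trajectory, the result is replay errors, context loss, long-term memory corruption, and summary overwrite errors.

This framework drives multiple backends with a set of standardized input trajectories, automatically generates a diff report, and pinpoints the field path and concrete values of each inconsistency. It serves both as a testing tool and as a quality benchmark for backend implementations.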

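## Architecture

Core components:

- **ReplayCase / ReplayStep**: JSONL files defining standardized input trajectories
- **ReplayHarness**: Parses JSONL steps, drives two backends in parallel, collects raw results
- **DiffEngine**: Compares the two result sets along four dimensions (events / state / memory / summary) and produces a DiffReport
- **Normalizer**: Truncates timestamps to second precision, reassigns IDs by content, excludes `is_final_response`

The implementation lives in [tests/sessions/conftest.py](../../../tests/sessions/conftest.py) and [tests/sessions/test_replay_consistency.py](../../../tests/sessions/test_replay_consistency.py).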

## Replay Cases

| # | Case Name | Type | Description |
|---|---|---|---|
| 1 | `single_turn` | Normal | Single user → agent exchange |
| 2 | `multi_turn` | Normal | 3 rounds of alternating conversation |
| 3 | `tool_call` | Normal | `function_call` + `function_response` |
| 4 | `state_update` | Normal | Multiple `state_delta` writes and overwrites |
| 5 | `memory_rw` | Normal | `store_session` + `search_memory` |
| 6 | `summary_gen` | Normal | 22-turn conversation triggering summary |
| 7 | `summary_truncate` | Known divergence | Two-layer validation: strict metadata + per-backend semantics |
| 8 | `exception_recovery` | Injected | `inject_skip_append` to simulate write failure |
| 9 | `injected_event_order` | Injected | `inject_reorder_events` to swap events |
| 10 | `injected_summary_session` | Injected | `inject_summary_session_id` to alter summary ownership |

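## Normalization Strategy

Before cross-backend comparison, non-business differences must be removed:

| Field | Treatment |
|-------|-----------|
| event.timestamp | Truncate to second precision (int) |
| event.id | Reassign stable ID after sorting by content |
| state_delta | Unify JSON key ordering |
| is_final_response | Excluded (computed property; serialization paths differ) |

Three categories of differences are explicitly allowed and written to `allowed_diff`:

1. Backend-generated `invocation_id`
2. Backend-specific `save_key` format differences
3. Differences in total event count after summary compression (InMemory compresses the event list in memory, whereas SQL `get_session` re-reads all raw events from the event table)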

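## Summary Comparison Strategy

The comparison operates in two layers:

1. **Summary metadata**: `session_id`, `summary_text`, `original_event_count`, and `compressed_event_count` must match exactly across backends. This is the core of replay correctness.
2. **Per-backend independent validation**: summary text is non-empty, compression has taken effect (compressed < original), and new events appended after compression are preserved

The exact boundary between the summary text and the event list is allowed to differ because the backends use different storage models.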

## Backend Access

| Mode | Backend A | Backend B | Trigger |
|------|-----------|-----------|---------|
| Lightweight (default) | InMemorySessionService | SqlSessionService(SQLite) | Always |
| SQL integration | InMemorySessionService | SqlSessionService(MySQL) | `TEST_MYSQL_URL` is set |
| Redis integration | InMemorySessionService | RedisSessionService | `TEST_REDIS_URL` is set |

All three backends expose the same `SessionServiceABC` interface; a new backend only needs to implement that interface to plug into the framework.