fix(harvest): exclude sub-agent transcripts and agent-generated sessions from task mining #99
Open
codeL1985 wants to merge 1 commit into
…k mining

Three filters to stop machine-generated prompts from polluting the mined task pool (they accounted for ~80% of mined tasks on this machine):

- _is_meta_prompt: drop expanded slash-command bodies (<command-message> tags or '# /' headers); plugin self-invocations are not user intents
- _AGENT_SESSION_MARKERS + _is_agent_session: skip sessions whose first prompt is another tool's agent brief (claude-mem observers, CLAUDE.md critic sub-agents, SkillOpt-Sleep's own command body)
- load walk: skip <session>/subagents/ dirs and agent-*.jsonl files; Agent-tool sidechain transcripts are Claude-authored, not user tasks

Verified: harvest went from 120 sessions / mostly-noise tasks to 55 sessions / 38 real user tasks.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@codeL1985 please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.
Problem
Harvest mines machine-generated prompts as if they were user tasks. On a machine with memory/observer plugins installed (e.g. claude-mem) and regular Agent-tool use, ~80% of mined tasks were not authored by the user:
- Agent-tool sidechain transcripts live at <session>/subagents/agent-*.jsonl; os.walk sweeps them in, so every Task/Agent spawn's prompt becomes a "recurring user task" (one session contributed 19 near-duplicate tasks).
- Sessions driven by other tools' agents (e.g. claude-mem observers): #62 ("… claude -p calls and sub-agent fan-out as user tasks, inflating recurring-task signal") never catches them.
- _is_meta_prompt only skips short /cmd forms, so the plugin's own /skillopt-sleep command body got mined as a task (self-harvest).

Downstream effect: the optimizer trains and gates on prompts the user never wrote.
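For the first item, the remedy is structural: prune those paths during the walk itself. A minimal sketch, assuming transcripts are .jsonl files under a single root with sidechains in <session>/subagents/ as described here; iter_user_transcripts is a hypothetical name, not the plugin's actual loader:

```python
import os
from pathlib import Path
from typing import Iterator


def iter_user_transcripts(root: str) -> Iterator[Path]:
    """Yield session transcripts under root, skipping Agent-tool sidechains.

    Sketch only: assumes sidechains sit in <session>/subagents/ and are
    named agent-*.jsonl, as this PR describes.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into subagents/.
        dirnames[:] = [d for d in dirnames if d != "subagents"]
        for name in filenames:
            if not name.endswith(".jsonl"):
                continue
            if name.startswith("agent-"):
                # Second guard: sidechain files found outside a subagents/ dir.
                continue
            yield Path(dirpath) / name
```

Assigning to dirnames[:] rather than rebinding the name is what makes the pruning take effect, since os.walk (top-down by default) reads that same list to decide where to recurse.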
Fix
- Skip subagents/ directories and agent-*.jsonl files in the transcript walk (structural, no heuristics).
- _AGENT_SESSION_MARKERS + _is_agent_session(): drop sessions whose FIRST user prompt is a known agent brief. Ships with markers for claude-mem and this plugin's own command; user-extensible via SKILLOPT_SLEEP_AGENT_MARKERS (comma-separated), mirroring the existing SKILLOPT_SLEEP_NEG_FEEDBACK pattern.
- _is_meta_prompt: also skip prompts containing <command-message>/<command-name> or starting with # /.

Verification
On a real ~/.claude with claude-mem installed: harvest went from 120 sessions / mostly machine-generated tasks to 55 sessions / 38 tasks, all genuinely user-authored.
🤖 Generated with Claude Code