amanojak

@amanojak2024

Jun 3

Replying to @amanojak2024 @claudeai

Updated for the "course <invoke" variant. Annotated so it can keep up if more variants of this kind appear. I recommend always running it inside a sandbox first. ※ As before, use at your own risk.

The complete edition of a fix for a nasty Claude Code bug: the one where a tool call gets garbled and turns into a "pretend execution" (the tool never ran, yet the model reports "done," and broken XML gets baked into the conversation log). This (1) detects it, (2) stops the turn and makes the model reissue the call correctly, (3) cleans the contamination out of the current conversation log, and (4) automatically sweeps past conversation logs in bulk at startup. If new leak words appear, you can adapt by adding one word or setting an environment variable. Runs on any OS as long as Node is installed. Zero dependencies.

It is very easy to use. Copy everything between "Copy from here" and "Copy to here" below and paste it into your own Claude Code. Claude creates the two files, configures the two hooks, and runs the verification for you.

━━━ Copy from here and paste into Claude ━━━

I want you (Claude Code) to install a permanent safety hook. The goal is to automatically block "tool-call leaks" and to automatically clean up contaminated text that leaked and got baked into conversation logs (the current conversation plus all past conversation logs).

Background: sometimes the assistant does not actually call a tool and instead writes the raw tool-call markup (a line-leading invoke / function_calls / parameter start tag) as body text. The tool never runs even once, yet a completion report appears, and broken markup remains in the conversation log (transcript). I want this handled in two stages:
(A) A Stop hook detects the leak and, if no real tool execution happened, blocks the stop and forces exactly one reissue in the proper structured form, and also removes the leak from the current conversation log and reformats it into a summary.
(B) A SessionStart hook sweeps the past conversation logs (.jsonl) in the same project folder in bulk (changed files only; the live conversation is never touched).

Steps:

1. Create ~/.claude/hooks/toolcall-leak-clean.js with the following content, exactly as written (this is the cleaner module; it handles both the current-log cleanup and the bulk sweep of past logs; create the directory if it does not exist):

#!/usr/bin/env node
'use strict';
/*
 * Tool-Call Leak Cleaner - portable companion to toolcall-leak-guard.js
 * ====================================================================
 * Removes leaked tool-call markup that got baked into an assistant message as
 * plain text, and rewrites the JSONL transcript in place. It targets ONLY
 * assistant text blocks; user messages are never touched.
 *
 * A "leak burst" = a stray marker word (see MARKERS below: call/court/count/course) alone on a line,
 * followed by one or more line-leading tool-call blocks (and possibly an
 * unterminated tail). Each burst is replaced by a short, human-readable summary
 * that PRESERVES the fact a call was attempted and its arguments (full value,
 * untruncated) - only the raw markup is stripped. Genuine prose before/after the
 * burst is kept verbatim. Fail-safe: never throws to the caller.
 *
 * cleanText() uses a DETERMINISTIC line-based parser (not one big backtracking
 * regex). This guarantees linear time even on multi-MB logs (no ReDoS surface)
 * and makes over-deletion impossible: a burst ends the moment a line is no
 * longer part of the markup, so following prose is always preserved.
 *
 * Two ways it runs:
 *   1. Called by the Stop guard on the CURRENT live transcript (immediate repair).
 *   2. As a SessionStart hook, it sweeps every OTHER past transcript .jsonl in the
 *      same project folder (incremental: only files changed since the last sweep),
 *      leaving the live conversation untouched. This retro-cleans old logs.
 *
 * Usage:
 *   - as a module    : const { cleanFile, cleanText, runSessionStart } = require('./toolcall-leak-clean.js')
 *   - clean one file : node toolcall-leak-clean.js <file.jsonl>
 *   - SessionStart   : node toolcall-leak-clean.js (reads hook payload on stdin)
 */
const fs = require('fs');
const path = require('path');

// == Leak "marker" words -- HOW TO EXTEND (read me) ===========================
// A leak prints a short stray word on its own line right before the tag. Seen in
// the wild: call, court, count, course. MARKERS below is the SINGLE SOURCE OF
// TRUTH -- every marker regex in this file is built from it. When a NEW variant
// shows up, add it with NO other edits, either way:
//   1) add the lowercase word to the default array below, OR
//   2) set env LEAK_GUARD_MARKERS="call,court,count,course,<newword>" (overrides
//      the default array -- handy when you cannot edit the file).
// Matching is case-insensitive. Keeping this curated list = zero false positives.
// If you would rather not maintain a list at all, set env LEAK_GUARD_LOOSE=1 (see
// the guard) to catch ANY line-leading tool-call tag even without a marker word.
// ============================================================================
let MARKERS = (process.env.LEAK_GUARD_MARKERS || 'call,court,count,course')
  .split(',').map(s => s.trim().toLowerCase()).filter(Boolean);
if (!MARKERS.length) MARKERS = ['call', 'court', 'count', 'course']; // never end up empty
const M = '(?:' + MARKERS.map(s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')).join('|') + ')'; // regex-escaped, OR-joined

// fast pre-check: a marker line immediately followed by a line-leading tag
const HASLEAK = new RegExp('(?:^|\\n)[ \\t]*' + M + '[ \\t]*\\r?\\n[ \\t]*<\\s*(?:antml:)?(?:invoke|parameter|function_calls)\\b', 'i');
// cheap raw-string pre-filter: skip files with no leak trace without parsing JSON
const PREFILTER = new RegExp('<\\s*\\/?\\s*(?:antml:)?(?:invoke|function_calls)\\b|(?:^|\\n)[ \\t]*' + M + '[ \\t]*(?:\\n|<)', 'i');
// a marker word alone on a line
const MARKER_LINE = new RegExp('^[ \\t]*' + M + '[ \\t]*$', 'i');
// lone marker lines to strip after a burst is removed
const MARKER_STRIP = new RegExp('^[ \\t]*' + M + '[ \\t]*$', 'gim');
// a line that opens a tool-call block
const OPEN_LINE = /^[ \t]*<\s*(?:antml:)?(?:invoke|function_calls)\b/i;
// any line that is part of tool-call markup (open/close/parameter)
const MARKUP_LINE = /^[ \t]*<\s*(?:antml:)?(?:invoke|parameter|function_calls|\/invoke|\/parameter|\/function_calls)\b/i;

function summarizeInvoke(name, body) {
  // pull each <parameter name=...>value</parameter> out of one call block
  const paramRe = /<\s*(?:antml:)?parameter\s+name\s*=\s*["']?([^"'>\s]+)["']?\s*>([\s\S]*?)(?:<\/\s*(?:antml:)?parameter\s*>|$)/gi;
  const params = [];
  let pm;
  while ((pm = paramRe.exec(body)) !== null) {
    params.push(' ' + pm[1] + '=' + (pm[2] || '')); // full value, no truncation (preserve)
  }
  const head = '[leaked tool-call markup removed (was never executed; the real call is reissued separately): ' + name;
  if (!params.length) return head + ']';
  return head + '\n' + params.join('\n') + ']';
}

// Deterministic line-based cleaner. Linear time; no catastrophic backtracking.
function cleanText(text) {
  if (typeof text !== 'string' || !HASLEAK.test(text)) return { text, changed: false };
  const lines = text.split(/\r?\n/);
  const resultLines = [];
  let changed = false;
  let i = 0;
  while (i < lines.length) {
    const isBurst = MARKER_LINE.test(lines[i]) && i + 1 < lines.length && OPEN_LINE.test(lines[i + 1]);
    if (!isBurst) { resultLines.push(lines[i]); i++; continue; }
    changed = true;
    i++; // drop the marker line
    // collect contiguous markup lines into one burst (stop at the next marker,
    // or the first line that is neither markup nor a value-continuation)
    let burst = '';
    while (i < lines.length) {
      if (burst && MARKER_LINE.test(lines[i])) break; // next burst starts
      burst += (burst ? '\n' : '') + lines[i];
      const next = i + 1 < lines.length ? lines[i + 1] : null;
      i++;
      if (next === null) break;
      if (!MARKUP_LINE.test(next)) break; // markup ended -> prose resumes
    }
    const invokeRe = /<\s*(?:antml:)?invoke\s+name\s*=\s*["']?([^"'>\s]+)["']?[^>]*>([\s\S]*?)(?:<\/\s*(?:antml:)?invoke\s*>|$)/gi;
    const summaries = [];
    let im;
    while ((im = invokeRe.exec(burst)) !== null) summaries.push(summarizeInvoke(im[1], im[2] || ''));
    if (summaries.length === 0) summaries.push('[leaked tool-call markup removed (was never executed; reissued separately)]');
    for (const s of summaries) resultLines.push(s);
  }
  let n = resultLines.join('\n');
  if (changed) n = n.replace(MARKER_STRIP, ''); // drop stray marker lines left by stacked markers
  n = n.replace(/\n{3,}/g, '\n\n').replace(/[ \t]+\n/g, '\n').trim();
  return { text: n, changed: n !== text || changed };
}

function cleanFileRaw(file, raw) {
  const lines = raw.split('\n');
  let removed = 0;
  let fileChanged = false;
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    if (!line.trim()) continue;
    let obj;
    try { obj = JSON.parse(line); } catch (_) { continue; } // non-JSON: leave as-is
    const isAssistant = obj.type === 'assistant' || (obj.message && obj.message.role === 'assistant');
    if (!isAssistant) continue;
    const msg = obj.message || obj;
    const content = msg.content;
    if (!Array.isArray(content)) continue;
    let lineChanged = false;
    for (const b of content) {
      if (!b || typeof b !== 'object' || b.type !== 'text') continue;
      const r = cleanText(b.text || '');
      if (r.changed) { b.text = r.text; removed++; lineChanged = true; }
    }
    if (lineChanged) { lines[i] = JSON.stringify(obj); fileChanged = true; }
  }
  if (fileChanged) {
    try {
      const tmp = file + '.leakclean.tmp';
      fs.writeFileSync(tmp, lines.join('\n'));
      fs.renameSync(tmp, file); // atomic replace: a reader sees old OR new, never a half-written file
    } catch (_) { return 0; } // file locked/busy (e.g. held open by the app) -> skip; never corrupt, never abort the sweep
  }
  return removed;
}

function cleanFile(file) {
  let raw;
  try { raw = fs.readFileSync(file, 'utf8'); } catch (_) { return 0; }
  return cleanFileRaw(file, raw);
}

// recursively collect *.jsonl under dir, skipping our own helper/backup folders
function collectJsonl(dir, out) {
  let ents;
  try { ents = fs.readdirSync(dir, { withFileTypes: true }); } catch (_) { return; }
  for (const e of ents) {
    const p = path.join(dir, e.name);
    if (e.isDirectory()) {
      if (e.name.startsWith('_')) continue; // skip _backup/_temp style folders
      collectJsonl(p, out);
    } else if (e.isFile() && e.name.endsWith('.jsonl')) {
      out.push(p);
    }
  }
}

// SessionStart sweep: clean every OTHER past transcript in the project folder.
// Incremental (only files changed since the last sweep). Never touches the live
// conversation. Returns a small summary. Fail-safe.
function runSessionStart(payload) {
  const tpath = (payload && (payload.transcript_path || payload.transcriptPath)) || '';
  if (!tpath) return { scanned: 0, touched: 0, removed: 0 };
  const projDir = path.dirname(tpath);
  const current = path.resolve(tpath);
  const statePath = path.join(projDir, '_leakclean_state.json');
  let lastRun = 0;
  try { lastRun = Number(JSON.parse(fs.readFileSync(statePath, 'utf8')).lastRun) || 0; } catch (_) {}
  const startedAt = Date.now();
  const files = [];
  collectJsonl(projDir, files);
  let removed = 0, touched = 0, scanned = 0;
  for (const f of files) {
    if (path.resolve(f) === current) continue; // never touch the live conversation
    let st;
    try { st = fs.statSync(f); } catch (_) { continue; }
    if (st.mtimeMs <= lastRun) continue; // unchanged since last sweep -> skip
    scanned++;
    let raw;
    try { raw = fs.readFileSync(f, 'utf8'); } catch (_) { continue; }
    if (!PREFILTER.test(raw)) continue; // no leak trace -> skip without parsing
    const r = cleanFileRaw(f, raw);
    if (r > 0) { removed += r; touched++; }
  }
  try { fs.writeFileSync(statePath, JSON.stringify({ lastRun: startedAt })); } catch (_) {}
  if (removed > 0) {
    try {
      fs.appendFileSync(path.join(projDir, '_leakclean_activity.log'),
        `[${new Date().toISOString()}] SessionStart: cleaned ${removed} leaked block(s) across ${touched} file(s) (scanned ${scanned}, excluded live ${path.basename(current)}, ${Date.now() - startedAt}ms)\n`);
    } catch (_) {}
  }
  return { scanned, touched, removed };
}

module.exports = { cleanText, cleanFile, cleanFileRaw, collectJsonl, runSessionStart };

if (require.main === module) {
  const arg = process.argv[2];
  if (arg && arg.endsWith('.jsonl')) {
    const n = cleanFile(arg);
    process.stdout.write('cleaned ' + n + ' leaked block(s) in ' + arg + '\n');
    process.exit(0);
  }
  // SessionStart hook: read payload from stdin, sweep, always output {} (fail-open)
  let input = '';
  try { input = fs.readFileSync(0, 'utf8'); } catch (_) {}
  let payload = {};
  try { payload = JSON.parse(input || '{}'); } catch (_) {}
  try { runSessionStart(payload); } catch (_) {}
  process.stdout.write('{}');
  process.exit(0);
}

2. Create ~/.claude/hooks/toolcall-leak-guard.js with the following content, exactly as written (detection + blocking + the call that cleans the current log; put it in the same folder as step 1, because the guard calls the cleaner module in that folder):

#!/usr/bin/env node
'use strict';
/*
 * Tool-Call Leak Guard - portable Claude Code "Stop" hook
 * =======================================================
 * PROBLEM
 *   Occasionally an assistant turn prints the RAW tool-call markup as plain
 *   text - a line that starts with a tool-call start-tag (the invoke,
 *   function_calls, or parameter tag) - instead of actually invoking the tool.
 *   The tool never runs, yet the model frequently reports as if it had. This
 *   corrupts the transcript and produces fake "done" reports.
 *
 * WHAT THIS DOES
 *   Runs when the assistant finishes a turn (the "Stop" event). It reads the
 *   transcript, inspects the LAST assistant message, and:
 *     - if that message already contains a real structured tool_use block,
 *       it does nothing (the model really did call a tool this turn);
 *     - else if it detects leaked tool-call markup in the plain text, it
 *       BLOCKS the stop and tells the model to re-issue the call exactly once
 *       using the proper structured mechanism.
 *
 * DESIGN RULES
 *   - Fail-open: any error or unexpected shape -> allow (never wedge a session).
 *   - Loop-guard: if we already blocked once this stop-cycle
 *     (payload.stop_hook_active), give up and allow (no infinite loop).
 *   - Zero false positives by default: requires a stray marker word on its own
 *     line right before the tag. Prose that merely QUOTES a tag inline or
 *     inside a code fence will NOT trigger.
 *
 * CONTRACT (Claude Code Stop hook)
 *   stdin  : JSON hook payload (transcript_path, stop_hook_active, ...)
 *   stdout : {}                                   -> allow the stop
 *            {"decision":"block","reason":"..."}  -> force the model to continue
 */
const fs = require('fs');

function out(o) { try { process.stdout.write(JSON.stringify(o)); } catch (_) {} }
function allow() { out({}); process.exit(0); }
function block(reason) { out({ decision: 'block', reason }); process.exit(0); }

// read hook payload from stdin
let input = '';
try { input = fs.readFileSync(0, 'utf8'); } catch (_) { allow(); }
let payload = {};
try { payload = JSON.parse(input || '{}'); } catch (_) { allow(); }

// loop guard: if a previous Stop hook in this same cycle already blocked, stop here.
if (payload.stop_hook_active) allow();

const tpath = payload.transcript_path || payload.transcriptPath;
if (!tpath) allow();
let raw;
try { raw = fs.readFileSync(tpath, 'utf8'); } catch (_) { allow(); }

// find the last assistant message in the JSONL transcript
const lines = raw.split(/\r?\n/).filter(Boolean);
let last = null;
for (let i = lines.length - 1; i >= 0; i--) {
  let o;
  try { o = JSON.parse(lines[i]); } catch (_) { continue; }
  const m = o.message || o;
  const role = (m && m.role) || o.role || o.type;
  if (role === 'assistant') { last = m; break; }
}
if (!last) allow();

let content = last.content;
if (typeof content === 'string') content = [{ type: 'text', text: content }];
if (!Array.isArray(content)) allow();

let hasToolUse = false;
let text = '';
for (const b of content) {
  if (!b || typeof b !== 'object') continue;
  // forward-compatible: treat any tool-invocation block as real activity,
  // tolerant of future/variant type names so an SDK rename can't silently break it
  if (b.type === 'tool_use' || b.type === 'server_tool_use' || b.type === 'tool_call') hasToolUse = true;
  if (b.type === 'text' && typeof b.text === 'string') text += '\n' + b.text;
}

// leak signature
// STRICT (default): a stray marker word (see MARKERS below: call/court/count/course) alone on a line,
//   immediately followed by a line that STARTS with a tool-call start-tag.
//   This is the empirically observed shape of real leaks and is
//   false-positive-proof.
// LOOSE (opt-in via env LEAK_GUARD_LOOSE=1): any line that starts with a
//   tool-call start-tag. Higher recall, tiny false-positive risk on prose that
//   quotes a tag at the start of a line.

// == Leak "marker" words -- HOW TO EXTEND (read me) ===========================
// A leak prints a short stray word on its own line right before the tag. Seen in
// the wild: call, court, count, course. MARKERS below is the SINGLE SOURCE OF
// TRUTH -- STRICT is built from it. When a NEW variant shows up, add it with NO
// other edits, either way:
//   1) add the lowercase word to the default array below, OR
//   2) set env LEAK_GUARD_MARKERS="call,court,count,course,<newword>" (overrides
//      the default array -- handy when you cannot edit the file).
// Matching is case-insensitive. Keep this list = zero false positives. If you'd
// rather not maintain it, set env LEAK_GUARD_LOOSE=1 to fall back to LOOSE, which
// fires on ANY line-leading tool-call tag even without a marker word (slightly
// higher false-positive risk on prose that quotes a tag at the start of a line).
// Keep the SAME default list in the cleaner file so both halves agree.
// ============================================================================
let MARKERS = (process.env.LEAK_GUARD_MARKERS || 'call,court,count,course')
  .split(',').map(s => s.trim().toLowerCase()).filter(Boolean);
if (!MARKERS.length) MARKERS = ['call', 'court', 'count', 'course']; // never end up empty
const M = '(?:' + MARKERS.map(s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')).join('|') + ')'; // regex-escaped, OR-joined
const STRICT = new RegExp('(?:^|\\n)[ \\t]*' + M + '[ \\t]*\\r?\\n[ \\t]*<\\s*(?:antml:)?(?:invoke|parameter|function_calls)\\b', 'i');
const LOOSE = /(?:^|\n)[ \t]*<\s*(?:antml:)?(?:invoke|parameter|function_calls)\b/i;
const re = process.env.LEAK_GUARD_LOOSE ? LOOSE : STRICT;
const isLeak = re.test(text);

// in-place repair: if the cleaner companion is present, strip the leaked markup
// from the transcript so the saved conversation is not corrupted. fail-open if
// the file is absent (detect+block still works without it).
if (isLeak) {
  try { require(require('path').join(__dirname, 'toolcall-leak-clean.js')).cleanFile(tpath); } catch (_) {}
}

// a real structured tool call this turn -> never block.
if (hasToolUse) allow();

if (isLeak) {
  block(
    'Tool-call leak detected. Your last message contains raw tool-call markup as ' +
    'plain text (a line beginning with a tool-call start-tag), but no tool was ' +
    'actually invoked, so nothing ran. Re-issue the intended tool call exactly ' +
    'once using the proper structured tool-call mechanism. Never paste tool-call ' +
    'markup as prose.'
  );
}
allow();

3. Add the following two hooks to ~/.claude/settings.json without breaking anything already there (hooks.Stop and hooks.SessionStart). Replace the paths in command with the absolute paths of the files created in steps 1 and 2 (on Windows, escape them like node "C:\\Users\\<you>\\.claude\\hooks\\toolcall-leak-guard.js"). If a key does not exist, create it; if it already exists, append to its array:

{
  "hooks": {
    "Stop": [
      { "hooks": [ { "type": "command", "command": "node \"<absolute path to guard>\"", "timeout": 8 } ] }
    ],
    "SessionStart": [
      { "hooks": [ { "type": "command", "command": "node \"<absolute path to clean>\"", "timeout": 15 } ] }
    ]
  }
}

4. Actually run the verification and show the results:
(a) Using node, feed the guard a mock transcript that contains a leak (no real tool_use; the body text contains a court marker, a line-leading invoke tag, and a call with its closing tag). Confirm that {"decision":"block",...} is returned and that the raw markup has disappeared from that transcript and been replaced by a summary.
(b) Confirm that ordinary prose returns {}.
(c) In a separate folder, create several past logs that contain leaks plus one live log, run the clean module through runSessionStart, and confirm that only the past logs are cleaned and the live log is untouched.
(d) Confirm that with the environment variable LEAK_GUARD_MARKERS="call,court,count,course,cause" set, a leak using the new word cause is also blocked, and that without it cause passes straight through (this demonstrates the flexible extension).

5. Tell the user how to extend it when new leak words appear: the standalone word that marks a leak (the marker word) is managed only by the MARKERS array at the top of both files (single source of truth), and every other regular expression is generated from it automatically. If a new standalone word other than call/court/count/course (for example cause) starts appearing right before a leak, there are two ways to respond: (i) add that one lowercase word to the MARKERS array in both files, or (ii) edit no files at all and override with the environment variable LEAK_GUARD_MARKERS="call,court,count,course,cause" (it can also be set in the hook's command). If you really do not want to chase words, LEAK_GUARD_LOOSE=1 detects every line-leading tag even without a marker word (the false-positive risk rises slightly).

Tell the user that Claude Code must be restarted (a new session) for the changes to take effect.

Constraints that must never be removed: fail-open (if anything breaks, always let the stop through) / loop guard (if stop_hook_active is true, let it through, which prevents an infinite loop) / cleanup targets assistant text only and never touches user messages / the SessionStart sweep never touches the live conversation log / the default is STRICT (marker word required, zero false positives).

━━━ Copy to here ━━━

With this in place, from the next session on, leaked turns are automatically blocked and a correct reissue is forced; contaminated XML in the current conversation log is automatically reformatted into a one-line summary (arguments are preserved and the surrounding prose is kept); and past conversation logs are also swept automatically in bulk at startup (changed files only; the live log is untouched). It is a zero-false-positive design that does not fire on ordinary prose or quoted code. Extension is flexible too: the standalone marker words are currently call / court / count / course. If some other word (for example cause) starts appearing right before leaks, add one lowercase word to the MARKERS array at the top of both files, or override with the environment variable LEAK_GUARD_MARKERS="call,court,count,course,cause", and the hooks follow along (the other regular expressions are generated from MARKERS). If you would rather not touch the files, LEAK_GUARD_LOOSE=1 detects leaks even without a marker word. The same procedure is written in the "HOW TO EXTEND" comment at the top of the code.
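The marker-list idea in the tweet above can be tried in isolation. What follows is a minimal sketch, not the author's files: `buildStrict`, `looksLikeLeak`, `DEFAULTS`, and `TAG` are hypothetical names introduced here for illustration, and the tag text is assembled by string concatenation only so that the snippet itself never contains a line-leading tag.

```javascript
'use strict';

// Hypothetical helper: build a STRICT-style detector from a list of marker words.
// Each word is regex-escaped, then OR-joined, mirroring the approach in the tweet.
function buildStrict(markers) {
  const escaped = markers
    .map(s => s.trim().toLowerCase())
    .filter(Boolean)
    .map(s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  if (!escaped.length) throw new Error('marker list must not be empty');
  const alt = '(?:' + escaped.join('|') + ')';
  return new RegExp(
    '(?:^|\\n)[ \\t]*' + alt +
    '[ \\t]*\\r?\\n[ \\t]*<\\s*(?:antml:)?(?:invoke|parameter|function_calls)\\b',
    'i'
  );
}

// LOOSE-style detector: any line that starts with a tool-call start tag.
const LOOSE = /(?:^|\n)[ \t]*<\s*(?:antml:)?(?:invoke|parameter|function_calls)\b/i;

// Hypothetical wrapper that picks the detector, as the env switch does in the tweet.
function looksLikeLeak(text, markers, loose) {
  return (loose ? LOOSE : buildStrict(markers)).test(text);
}

const DEFAULTS = ['call', 'court', 'count', 'course'];
const TAG = '<' + 'invoke name="Bash">'; // concatenated on purpose (see lead-in)

const leaked = 'Running it now.\ncourt\n' + TAG;            // known marker word
const newWord = 'Running it now.\ncause\n' + TAG;           // marker not in the list
const quoted = 'The tag ' + TAG + ' appears inline here.';  // prose quoting a tag

console.log(looksLikeLeak(leaked, DEFAULTS, false));                   // true
console.log(looksLikeLeak(newWord, DEFAULTS, false));                  // false
console.log(looksLikeLeak(newWord, DEFAULTS.concat('cause'), false));  // true
console.log(looksLikeLeak(newWord, DEFAULTS, true));                   // true
console.log(looksLikeLeak(quoted, DEFAULTS, true));                    // false
```

The third and fourth calls correspond to the two extension routes the tweet describes: extend the word list, or drop the marker requirement and accept a slightly higher chance of false positives.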

5,623

amanojak

@amanojak2024

Jun 3

Replying to @amanojak2024 @claudeai

色々検証したけどおそらくこれで1発でセーフガードできそう（自分のは今のとこできている🧐再発なし）心配ならまずはこれがちゃんと動くのか？と先に質問してから動かせばいいかもね ※自己責任でどうぞあなた(Claude Code)に常設の安全フックを導入してほしい。目的は「ツール呼び出しの漏れ」の自動ブロックと、漏れて会話ログに焼き付いた汚染テキストの自動掃除(現在の会話＋過去の全会話ログ)。背景: アシスタントがツールを実際に呼ばず、ツール呼び出しの生マークアップ(行頭の invoke / function_calls / parameter 開始タグ)を本文テキストとして書いてしまうことがある。ツールは1回も走らないのに完了報告が出て、会話ログ(transcript)にも壊れたマークアップが残る。これを次の二段構えで処理したい: (A) Stopフックで検出し、本物のツール実行が無ければstopをブロックして正しい構造化形式での1回だけの再発行を強制＋現在の会話ログの漏れを除去・要約に整形。(B) SessionStartフックで、同じプロジェクトフォルダの過去の会話ログ(.jsonl)を一括で掃除(差分のみ・liveの会話は不可侵)。手順: 次の内容で ~/.claude/hooks/toolcall-leak-clean.js を一字一句このまま作成する(掃除モジュール。現ログ掃除と過去ログ一括掃除の両方を担う。ディレクトリが無ければ作る): #!/usr/bin/env node 'use strict'; /* * Tool-Call Leak Cleaner - portable companion to toolcall-leak-guard.js * ==================================================================== * Removes leaked tool-call markup that got baked into an assistant message as * plain text, and rewrites the JSONL transcript in place. It targets ONLY * assistant text blocks; user messages are never touched. * * A "leak burst" = a stray marker word (court/count/call) alone on a line, * followed by one or more line-leading tool-call blocks (and possibly an * unterminated tail). Each burst is replaced by a short, human-readable summary * that PRESERVES the fact a call was attempted and its arguments - only the raw * markup is stripped. Genuine prose before/after the burst is kept verbatim. * Line count is preserved; only changed lines are re-serialized; non-JSON lines * pass through untouched. Fail-safe: never throws to the caller. * * Two ways it runs: * 1. Called by the Stop guard on the CURRENT live transcript (immediate repair). * 2. As a SessionStart hook, it sweeps every OTHER past transcript .jsonl in the * same project folder (incremental: only files changed since the last sweep), * leaving the live conversation untouched. This retro-cleans old logs. 
* * Usage: * - as a module : const { cleanFile, cleanText, runSessionStart } = require('./toolcall-leak-clean.js') * - clean one file : node toolcall-leak-clean.js <file.jsonl> * - SessionStart : node toolcall-leak-clean.js (reads hook payload on stdin) */ const fs = require('fs'); const path = require('path'); // marker line following line-leading tool-call block(s) [ unterminated tail] const BURST = /(?:^|\n)[ \t]*(?:court|count|call)[ \t]*(?:(?:\r?\n[ \t]*<\s*(?:antml:)?(?:invoke|function_calls)\b[\s\S]*?<\/\s*(?:antml:)?(?:invoke|function_calls)\s*>) (?:\r?\n[ \t]*<\s*(?:antml:)?(?:invoke|parameter|function_calls)\b[\s\S]*)?|\r?\n[ \t]*<\s*(?:antml:)?(?:invoke|parameter|function_calls)\b[\s\S]*)/gi; // each call block inside a burst (unterminated tail tolerated) const INVOKE = /<\s*(?:antml:)?invoke\s name\s*=\s*["']?([^"'>\s] )["']?[^>]*>([\s\S]*?)(?:<\/\s*(?:antml:)?invoke\s*>|$)/gi; // a single argument inside a call block const PARAM = /<\s*(?:antml:)?parameter\s name\s*=\s*["']?([^"'>\s] )["']?\s*>([\s\S]*?)(?:<\/\s*(?:antml:)?parameter\s*>|$)/gi; // a lone marker word left over after burst removal const LONE_MARKER = /^[ \t]*(?:court|count|call)[ \t]*$/gim; // fast pre-check: a marker line immediately followed by a line-leading tag const HASLEAK = /(?:^|\n)[ \t]*(?:court|count|call)[ \t]*\r?\n[ \t]*<\s*(?:antml:)?(?:invoke|parameter|function_calls)\b/i; const MULTI_NL = /\n{3,}/g; // cheap raw-string pre-filter: skip files with no leak trace without parsing JSON const PREFILTER = /<\s*\/?\s*(?:antml:)?(?:invoke|function_calls)\b|(?:^|\n)[ \t]*(?:court|count|call)[ \t]*(?:\n|<)/i; function summarizeInvoke(name, body) { PARAM.lastIndex = 0; const params = []; let pm; while ((pm = PARAM.exec(body)) !== null) { params.push(' ' pm[1] '=' (pm[2] || '')); // full value, no truncation } const head = '[leaked tool-call markup removed (was never executed; the real call is reissued separately): ' name; if (!params.length) return head ']'; return head '\n' 
params.join('\n') ']'; } function cleanText(text) { if (typeof text !== 'string' || !HASLEAK.test(text)) return { text, changed: false }; let n = text.replace(BURST, function (burst) { INVOKE.lastIndex = 0; const out = []; let im; while ((im = INVOKE.exec(burst)) !== null) out.push(summarizeInvoke(im[1], im[2] || '')); if (!out.length) out.push('[leaked tool-call markup removed (was never executed; reissued separately)]'); return '\n' out.join('\n'); }); if (n !== text) n = n.replace(LONE_MARKER, ''); n = n.replace(MULTI_NL, '\n\n').replace(/[ \t] \n/g, '\n').trim(); return { text: n, changed: n !== text }; } function cleanFileRaw(file, raw) { const lines = raw.split('\n'); let removed = 0; let fileChanged = false; for (let i = 0; i < lines.length; i ) { const line = lines[i]; if (!line.trim()) continue; let obj; try { obj = JSON.parse(line); } catch (_) { continue; } // non-JSON: leave as-is const isAssistant = obj.type === 'assistant' || (obj.message && obj.message.role === 'assistant'); if (!isAssistant) continue; const msg = obj.message || obj; const content = msg.content; if (!Array.isArray(content)) continue; let lineChanged = false; for (const b of content) { if (!b || typeof b !== 'object' || b.type !== 'text') continue; const r = cleanText(b.text || ''); if (r.changed) { b.text = r.text; removed ; lineChanged = true; } } if (lineChanged) { lines[i] = JSON.stringify(obj); fileChanged = true; } } if (fileChanged) { const tmp = file '.leakclean.tmp'; fs.writeFileSync(tmp, lines.join('\n')); fs.renameSync(tmp, file); // atomic replace } return removed; } function cleanFile(file) { let raw; try { raw = fs.readFileSync(file, 'utf8'); } catch (_) { return 0; } return cleanFileRaw(file, raw); } // recursively collect *.jsonl under dir, skipping our own helper/backup folders function collectJsonl(dir, out) { let ents; try { ents = fs.readdirSync(dir, { withFileTypes: true }); } catch (_) { return; } for (const e of ents) { const p = path.join(dir, e.name); if 
(e.isDirectory()) { if (e.name.startsWith('_')) continue; // skip _backup/_temp style folders collectJsonl(p, out); } else if (e.isFile() && e.name.endsWith('.jsonl')) { out.push(p); } } } // SessionStart sweep: clean every OTHER past transcript in the project folder. // Incremental (only files changed since the last sweep). Never touches the live // conversation. Returns a small summary. Fail-safe. function runSessionStart(payload) { const tpath = (payload && (payload.transcript_path || payload.transcriptPath)) || ''; if (!tpath) return { scanned: 0, touched: 0, removed: 0 }; const projDir = path.dirname(tpath); const current = path.resolve(tpath); const statePath = path.join(projDir, '_leakclean_state.json'); let lastRun = 0; try { lastRun = Number(JSON.parse(fs.readFileSync(statePath, 'utf8')).lastRun) || 0; } catch (_) {} const startedAt = Date.now(); const files = []; collectJsonl(projDir, files); let removed = 0, touched = 0, scanned = 0; for (const f of files) { if (path.resolve(f) === current) continue; // never touch the live conversation let st; try { st = fs.statSync(f); } catch (_) { continue; } if (st.mtimeMs <= lastRun) continue; // unchanged since last sweep -> skip scanned ; let raw; try { raw = fs.readFileSync(f, 'utf8'); } catch (_) { continue; } if (!PREFILTER.test(raw)) continue; // no leak trace -> skip without parsing const r = cleanFileRaw(f, raw); if (r > 0) { removed = r; touched ; } } try { fs.writeFileSync(statePath, JSON.stringify({ lastRun: startedAt })); } catch (_) {} if (removed > 0) { try { fs.appendFileSync(path.join(projDir, '_leakclean_activity.log'), `[${new Date().toISOString()}] SessionStart: cleaned ${removed} leaked block(s) across ${touched} file(s) (scanned ${scanned}, excluded live ${path.basename(current)}, ${Date.now() - startedAt}ms)\n`); } catch (_) {} } return { scanned, touched, removed }; } module.exports = { cleanText, cleanFile, cleanFileRaw, collectJsonl, runSessionStart }; if (require.main === module) { const 
arg = process.argv[2]; if (arg && arg.endsWith('.jsonl')) { const n = cleanFile(arg); process.stdout.write('cleaned ' n ' leaked block(s) in ' arg '\n'); process.exit(0); } // SessionStart hook: read payload from stdin, sweep, always output {} (fail-open) let input = ''; try { input = fs.readFileSync(0, 'utf8'); } catch (_) {} let payload = {}; try { payload = JSON.parse(input || '{}'); } catch (_) {} try { runSessionStart(payload); } catch (_) {} process.stdout.write('{}'); process.exit(0); } 次の内容で ~/.claude/hooks/toolcall-leak-guard.js を一字一句このまま作成する(検出＋ブロック＋現ログ掃除呼び出し。手順1と同じフォルダに置くこと。ガードが同フォルダの掃除モジュールを呼ぶ): #!/usr/bin/env node 'use strict'; /* * Tool-Call Leak Guard - portable Claude Code "Stop" hook * ======================================================= * PROBLEM * Occasionally an assistant turn prints the RAW tool-call markup as plain * text - a line that starts with a tool-call start-tag (the invoke, * function_calls, or parameter tag) - instead of actually invoking the tool. * The tool never runs, yet the model frequently reports as if it had. This * corrupts the transcript and produces fake "done" reports. * * WHAT THIS DOES * Runs when the assistant finishes a turn (the "Stop" event). It reads the * transcript, inspects the LAST assistant message, and: * - if that message already contains a real structured tool_use block, * it does nothing (the model really did call a tool this turn); * - else if it detects leaked tool-call markup in the plain text, it * BLOCKS the stop and tells the model to re-issue the call exactly once * using the proper structured mechanism. * * DESIGN RULES * - Fail-open: any error or unexpected shape -> allow (never wedge a session). * - Loop-guard: if we already blocked once this stop-cycle * (payload.stop_hook_active), give up and allow (no infinite loop). * - Zero false positives by default: requires a stray marker word on its own * line right before the tag. 
Prose that merely QUOTES a tag inline or * inside a code fence will NOT trigger. * * CONTRACT (Claude Code Stop hook) * stdin : JSON hook payload (transcript_path, stop_hook_active, ...) * stdout : {} -> allow the stop * {"decision":"block","reason":"..."} -> force the model to continue */ const fs = require('fs'); function out(o) { try { process.stdout.write(JSON.stringify(o)); } catch (_) {} } function allow() { out({}); process.exit(0); } function block(reason) { out({ decision: 'block', reason }); process.exit(0); } // read hook payload from stdin let input = ''; try { input = fs.readFileSync(0, 'utf8'); } catch (_) { allow(); } let payload = {}; try { payload = JSON.parse(input || '{}'); } catch (_) { allow(); } // loop guard: if a previous Stop hook in this same cycle already blocked, stop here. if (payload.stop_hook_active) allow(); const tpath = payload.transcript_path || payload.transcriptPath; if (!tpath) allow(); let raw; try { raw = fs.readFileSync(tpath, 'utf8'); } catch (_) { allow(); } // find the last assistant message in the JSONL transcript const lines = raw.split(/\r?\n/).filter(Boolean); let last = null; for (let i = lines.length - 1; i >= 0; i--) { let o; try { o = JSON.parse(lines[i]); } catch (_) { continue; } const m = o.message || o; const role = (m && m.role) || o.role || o.type; if (role === 'assistant') { last = m; break; } } if (!last) allow(); let content = last.content; if (typeof content === 'string') content = [{ type: 'text', text: content }]; if (!Array.isArray(content)) allow(); let hasToolUse = false; let text = ''; for (const b of content) { if (!b || typeof b !== 'object') continue; if (b.type === 'tool_use') hasToolUse = true; if (b.type === 'text' && typeof b.text === 'string') text = '\n' b.text; } // leak signature // STRICT (default): a stray marker word (court/count/call) alone on a line, // immediately followed by a line that STARTS with a tool-call start-tag. 
//   This is the empirically observed shape of real leaks and is
//   false-positive-proof.
//   LOOSE (opt-in via env LEAK_GUARD_LOOSE=1): any line that starts with a
//   tool-call start-tag. Higher recall, tiny false-positive risk on prose that
//   quotes a tag at the start of a line.
const STRICT = /(?:^|\n)[ \t]*(?:court|count|call)[ \t]*\r?\n[ \t]*<\s*(?:antml:)?(?:invoke|parameter|function_calls)\b/i;
const LOOSE = /(?:^|\n)[ \t]*<\s*(?:antml:)?(?:invoke|parameter|function_calls)\b/i;
const re = process.env.LEAK_GUARD_LOOSE ? LOOSE : STRICT;
const isLeak = re.test(text);

// in-place repair: if the cleaner companion is present, strip the leaked markup
// from the transcript so the saved conversation is not corrupted. fail-open if
// the file is absent (detect block still works without it).
if (isLeak) {
  try { require(require('path').join(__dirname, 'toolcall-leak-clean.js')).cleanFile(tpath); } catch (_) {}
}

// a real structured tool call this turn -> never block.
if (hasToolUse) allow();

if (isLeak) {
  block(
    'Tool-call leak detected. Your last message contains raw tool-call markup as ' +
    'plain text (a line beginning with a tool-call start-tag), but no tool was ' +
    'actually invoked, so nothing ran. Re-issue the intended tool call exactly ' +
    'once using the proper structured tool-call mechanism. Never paste tool-call ' +
    'markup as prose.'
  );
}

allow();

Add the following two hooks to ~/.claude/settings.json without breaking any existing settings (hooks.Stop and hooks.SessionStart). Replace the command paths with the absolute paths of the files created in steps 1 and 2 (on Windows, escape them like node "C:\\Users\\<you>\\.claude\\hooks\\toolcall-leak-guard.js"). If a key does not exist, create it; if it already exists, append to its array:

{
  "hooks": {
    "Stop": [
      { "hooks": [ { "type": "command", "command": "node \"<absolute path to guard>\"", "timeout": 8 } ] }
    ],
    "SessionStart": [
      { "hooks": [ { "type": "command", "command": "node \"<absolute path to clean>\"", "timeout": 15 } ] }
    ]
  }
}

Actually run a verification and show the results:
(a) Feed the guard, via node, a mock transcript containing a leak (no real tool_use; the body contains a court marker, a line-leading invoke tag, and a call with its closing tag). Confirm that {"decision":"block",...} is returned and that the raw markup in that transcript is replaced by a summary.
(b) Confirm that ordinary prose returns {}.
(c) In a separate folder, create several past logs containing leaks plus a live log, run the clean module via runSessionStart, and confirm that only the past logs are cleaned and the live log is untouched.

Tell me that Claude Code must be restarted (a new session) for the changes to take effect.

Constraints that must never be removed: fail-open (if anything breaks, always pass through) / loop guard (pass through when stop_hook_active is true, to prevent an infinite loop) / cleaning targets assistant text only and user messages are never touched / the SessionStart cleanup never touches the live conversation log / the default is STRICT (marker word required, zero false positives).

━━━ Copy up to here ━━━

With this in place, from the next session onward, leaked turns are blocked automatically and a correct re-issue is forced; the corrupted XML in the current conversation log is automatically rewritten into a one-line summary (arguments preserved, surrounding prose kept); and past conversation logs are cleaned in bulk at startup (changed files only; the live log is untouched). It is designed for zero false positives: ordinary prose and quoted code do not trigger it.
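Checks (a) and (b) can be tried in isolation before installing anything. The following is a minimal sketch, assuming the STRICT and LOOSE patterns are used exactly as they appear in the guard; the helper name detectLeak and the sample strings are hypothetical and are not part of the original hook files. It exercises only the detection logic, not the stdin/stdout hook contract or the transcript cleanup.

```javascript
// Sketch only: the two patterns are copied from the guard; `detectLeak` and the
// sample messages below are hypothetical names introduced for illustration.
const STRICT = /(?:^|\n)[ \t]*(?:court|count|call)[ \t]*\r?\n[ \t]*<\s*(?:antml:)?(?:invoke|parameter|function_calls)\b/i;
const LOOSE = /(?:^|\n)[ \t]*<\s*(?:antml:)?(?:invoke|parameter|function_calls)\b/i;

// Join the text blocks of one assistant message, test them for a leak, and
// report whether the guard would block (leak present and no real tool_use).
function detectLeak(content, loose = false) {
  const blocks = typeof content === 'string' ? [{ type: 'text', text: content }] : content;
  let text = '';
  let hasToolUse = false;
  for (const b of blocks) {
    if (!b || typeof b !== 'object') continue;
    if (b.type === 'tool_use') hasToolUse = true;
    if (b.type === 'text' && typeof b.text === 'string') text += '\n' + b.text;
  }
  const isLeak = (loose ? LOOSE : STRICT).test(text);
  return { isLeak, shouldBlock: isLeak && !hasToolUse };
}

// Build the tag from pieces so this file never contains a literal start-tag.
const tag = '<' + 'invoke name="Bash">';
const leaked = 'Done, I ran it.\ncourt\n' + tag + '\n';               // marker line + line-leading tag
const quoted = 'The model sometimes prints ' + tag + ' mid-sentence.'; // tag quoted inline
const bare = 'Here is the call:\n' + tag + '\n';                       // line-leading tag, no marker

console.log(detectLeak(leaked));      // STRICT: leak, would block
console.log(detectLeak(quoted));      // STRICT: no leak
console.log(detectLeak(bare));        // STRICT: no leak (marker word required)
console.log(detectLeak(bare, true));  // LOOSE: leak
```

The last two calls show the trade-off the guard's comments describe: STRICT ignores a line-leading tag that has no marker word before it, while LOOSE catches it at the cost of possibly flagging prose that quotes a tag at the start of a line.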

1,581

norvid_studies @norvid_studies

23 Oct 2025

Replying to @norvid_studies @janbamjan

the text being kind of garbled and the production sloppy makes the image funnier to me although the "cleantext" version of the ad would also be slightly differently funny

150

Ezekiel

@ezekiel_aleke

19 Sep 2025

32. REPLACE() – Replace part of a string. Ex: MaskedPhone = REPLACE(Customers[Phone], 1, 6, "XXXXXX")
33. SUBSTITUTE() – Replace all occurrences. Ex: CleanText = SUBSTITUTE(Products[Description], "Old", "New")
34. UPPER() – Convert to uppercase. Ex: ShoutName = UPPER(Customers[Name])

245

Cory Doctorow NO LONGER ON TWIT TER

@doctorow

24 Feb 2025

#15yrsago Twitter phishing scam memex.craphound.com/2010/02/… #15yrsago Stephen Levy on Google’s algorithm wired.com/2010/02/ff-google-… #15yrsago Cleantext: turn your ASCII pastebombs into formatted text cleantext.org 8/

Data Blogger @DataBloggerInfo

6 Sep 2022

This is done by using a clean text tool called the CleanTextTool. #CleanTextTool #Cleantext #CleanText towardsdatascience.com/prime…

Vincent D. Warmerdam @fishnets88

2 Aug 2022

Replying to @Mwadz2

There are tools like ftfy or cleantext that can remove some of those characters. That said: 1.) do you even need pre-trained models? 2.) sometimes special characters have useful information, so filtering them out isn't always a good idea.

Cory Doctorow NO LONGER ON TWIT TER

@doctorow

3 Mar 2022

Dang, I'll be on the road with family then. But thank you very much (and doubly for cleantext)

JT @jmons

8 Dec 2021

Oh, also I messed up the schedule, so it's a little late today (even though it's pre-recorded days ago)... but time to CLEAN UP THE INTERNET with Clean-text (or cleantext) youtu.be/p7mgbX7ZTZU #python #Advent

幻日 @__genzitsu__

7 Sep 2021

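"6 Cool Python Libraries I Found Recently": it is a translation of an overseas article, but cleantext (for text preprocessing), Gramformer (for proofreading), and Styleformer (for converting the style of a text) stood out to me. qiita.com/baby-degu/items/86…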

Charles Franklin @PollsAndVotes

12 Jul 2021

Replying to @SeanTrende

cleantext <- function(text) {
  require(stringr)
  str_replace_all(trimws(tolower(text), which = "both"), "[^A-Za-z0-9]", '')
}

AccudeTI @AccudeTi

22 Jun 2021

Guide to CleanText: A #Python Package to Clean Raw Text Data buff.ly/3wjPsja #developer #programador #Programmer #SoftwareEngineer #programación #fullstack #code #lenguajesdeprogramación

CEH-Course.Com @CEHCourseCom

20 Jun 2021

Useful guide for anyone analyzing large amounts of text using #Python Guide to CleanText: A Python Package to Clean Raw Text Data hubs.la/H0QjFGJ0 #programming #coding #coder #programmer #it

TrainACE @trainACE

18 Jun 2021

Useful guide for anyone analyzing large amounts of text using #Python Guide to CleanText: A Python Package to Clean Raw Text Data hubs.la/H0QjFs80 #programming #coding #coder #programmer #it

AIM

@Analyticsindiam

13 Jun 2021

Guide to CleanText: A Python Package to Clean Raw Text Data #Analytics #DataScience #AI #IoT #IIoT #Python #CloudComputing #machinelearning #Linux #Programming #Coding #100DaysofCode bit.ly/3xmXbxd

Towards Data Science

@TDataScience

5 May 2021

.@aigeek7 dives into an interesting library called CleanText, that eases the process of cleaning textual data and speeds up the data preprocessing pipeline. buff.ly/33cS9Gz