Updated to handle the course + invoke-tag variant. Annotated so that further variants of the same kind are easy to add.
I recommend always running it in a sandbox first. *As before, use at your own risk.

Complete edition of the workaround for a nasty Claude Code bug: the one where a tool call comes out garbled and the model only pretends to have run it (the tool never runs, yet it reports "done", and broken XML gets baked into the conversation log). This setup (1) detects the leak, (2) stops the turn and makes the model re-issue the call correctly, (3) cleans the contamination out of the current conversation log, and (4) automatically sweeps past conversation logs at startup. When new leak words appear, you can adapt by adding one word or setting an environment variable. Runs on any OS as long as Node is installed. Zero dependencies.

It is very simple. Copy everything between "Copy from here" and "Copy up to here" below and paste it into your own Claude Code. Claude will create the two files, configure the two hooks, and run the verification.

━━━ Copy from here and paste into Claude ━━━

I want you (Claude Code) to install a permanent safety hook. The goal is to automatically block "tool-call leaks" and to automatically clean the contaminated text that a leak bakes into the conversation log (the current conversation plus all past conversation logs).

Background: Sometimes the assistant does not actually call a tool and instead writes the raw tool-call markup (a line-leading invoke / function_calls / parameter start tag) as body text. The tool never runs even once, yet a completion report appears, and broken markup is left in the conversation log (transcript). I want this handled in two stages:
(A) A Stop hook detects the leak and, if no real tool execution happened, blocks the stop and forces exactly one re-issue in the proper structured format. It also removes the leak from the current conversation log and rewrites it as a summary.
(B) A SessionStart hook sweeps the past conversation logs (.jsonl) in the same project folder in one pass (changed files only; the live conversation is never touched).

Steps:

1. Create ~/.claude/hooks/toolcall-leak-clean.js with exactly the following content, character for character (this is the cleaner module; it handles both cleaning the current log and sweeping past logs; create the directory if it does not exist):

#!/usr/bin/env node
'use strict';
/*
 * Tool-Call Leak Cleaner - portable companion to toolcall-leak-guard.js
 * ====================================================================
 * Removes leaked tool-call markup that got baked into an assistant message as
 * plain text, and rewrites the JSONL transcript in place. It targets ONLY
 * assistant text blocks; user messages are never touched.
 *
 * A "leak burst" = a stray marker word (see MARKERS below: call/court/count/course) alone on a line,
 * followed by one or more line-leading tool-call blocks (and possibly an
 * unterminated tail). Each burst is replaced by a short, human-readable summary
 * that PRESERVES the fact a call was attempted and its arguments (full value,
 * untruncated) - only the raw markup is stripped. Genuine prose before/after the
 * burst is kept verbatim. Fail-safe: never throws to the caller.
 *
 * cleanText() uses a DETERMINISTIC line-based parser (not one big backtracking
 * regex). This guarantees linear time even on multi-MB logs (no ReDoS surface)
 * and makes over-deletion impossible: a burst ends the moment a line is no
 * longer part of the markup, so following prose is always preserved.
 *
 * Two ways it runs:
 *  1. Called by the Stop guard on the CURRENT live transcript (immediate repair).
 *  2. As a SessionStart hook, it sweeps every OTHER past transcript .jsonl in the
 *     same project folder (incremental: only files changed since the last sweep),
 *     leaving the live conversation untouched. This retro-cleans old logs.
 *
 * Usage:
 *  - as a module    : const { cleanFile, cleanText, runSessionStart } = require('./toolcall-leak-clean.js')
 *  - clean one file : node toolcall-leak-clean.js <file.jsonl>
 *  - SessionStart   : node toolcall-leak-clean.js   (reads hook payload on stdin)
 */
const fs = require('fs');
const path = require('path');

// == Leak "marker" words -- HOW TO EXTEND (read me) ===========================
// A leak prints a short stray word on its own line right before the tag. Seen in
// the wild: call, court, count, course. MARKERS below is the SINGLE SOURCE OF
// TRUTH -- every marker regex in this file is built from it. When a NEW variant
// shows up, add it with NO other edits, either way:
//   1) add the lowercase word to the default array below, OR
//   2) set env LEAK_GUARD_MARKERS="call,court,count,course,<newword>" (overrides
//      the default array -- handy when you cannot edit the file).
// Matching is case-insensitive. Keeping this curated list = zero false positives.
// If you would rather not maintain a list at all, set env LEAK_GUARD_LOOSE=1 (see
// the guard) to catch ANY line-leading tool-call tag even without a marker word.
// ============================================================================
let MARKERS = (process.env.LEAK_GUARD_MARKERS || 'call,court,count,course')
  .split(',').map(s => s.trim().toLowerCase()).filter(Boolean);
if (!MARKERS.length) MARKERS = ['call', 'court', 'count', 'course']; // never end up empty
const M = '(?:' + MARKERS.map(s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')).join('|') + ')'; // regex-escaped, OR-joined

// fast pre-check: a marker line immediately followed by a line-leading tag
const HASLEAK = new RegExp('(?:^|\\n)[ \\t]*' + M + '[ \\t]*\\r?\\n[ \\t]*<\\s*(?:antml:)?(?:invoke|parameter|function_calls)\\b', 'i');
// cheap raw-string pre-filter: skip files with no leak trace without parsing JSON
const PREFILTER = new RegExp('<\\s*\\/?\\s*(?:antml:)?(?:invoke|function_calls)\\b|(?:^|\\n)[ \\t]*' + M + '[ \\t]*(?:\\n|<)', 'i');
// a marker word alone on a line
const MARKER_LINE = new RegExp('^[ \\t]*' + M + '[ \\t]*$', 'i');
// lone marker lines to strip after a burst is removed
const MARKER_STRIP = new RegExp('^[ \\t]*' + M + '[ \\t]*$', 'gim');
// a line that opens a tool-call block
const OPEN_LINE = /^[ \t]*<\s*(?:antml:)?(?:invoke|function_calls)\b/i;
// any line that is part of tool-call markup (open/close/parameter);
// the optional slash comes before the optional prefix so prefixed closing tags match too
const MARKUP_LINE = /^[ \t]*<\s*\/?\s*(?:antml:)?(?:invoke|parameter|function_calls)\b/i;

function summarizeInvoke(name, body) {
  // pull each parameter name/value pair out of one call block
  const paramRe = /<\s*(?:antml:)?parameter\s+name\s*=\s*["']?([^"'>\s]+)["']?\s*>([\s\S]*?)(?:<\/\s*(?:antml:)?parameter\s*>|$)/gi;
  const params = [];
  let pm;
  while ((pm = paramRe.exec(body)) !== null) {
    params.push(' ' + pm[1] + '=' + (pm[2] || '')); // full value, no truncation (preserve)
  }
  const head = '[leaked tool-call markup removed (was never executed; the real call is reissued separately): ' + name;
  if (!params.length) return head + ']';
  return head + '\n' + params.join('\n') + ']';
}

// Deterministic line-based cleaner. Linear time; no catastrophic backtracking.
function cleanText(text) {
  if (typeof text !== 'string' || !HASLEAK.test(text)) return { text, changed: false };
  const lines = text.split(/\r?\n/);
  const resultLines = [];
  let changed = false;
  let i = 0;
  while (i < lines.length) {
    const isBurst = MARKER_LINE.test(lines[i]) && i + 1 < lines.length && OPEN_LINE.test(lines[i + 1]);
    if (!isBurst) { resultLines.push(lines[i]); i++; continue; }
    changed = true;
    i++; // drop the marker line
    // collect contiguous markup lines into one burst (stop at the next marker,
    // or at the first following line that is not tool-call markup)
    let burst = '';
    while (i < lines.length) {
      if (burst && MARKER_LINE.test(lines[i])) break; // next burst starts
      burst += (burst ? '\n' : '') + lines[i];
      const next = i + 1 < lines.length ? lines[i + 1] : null;
      i++;
      if (next === null) break;
      if (!MARKUP_LINE.test(next)) break; // markup ended -> prose resumes
    }
    const invokeRe = /<\s*(?:antml:)?invoke\s+name\s*=\s*["']?([^"'>\s]+)["']?[^>]*>([\s\S]*?)(?:<\/\s*(?:antml:)?invoke\s*>|$)/gi;
    const summaries = [];
    let im;
    while ((im = invokeRe.exec(burst)) !== null) summaries.push(summarizeInvoke(im[1], im[2] || ''));
    if (summaries.length === 0) summaries.push('[leaked tool-call markup removed (was never executed; reissued separately)]');
    for (const s of summaries) resultLines.push(s);
  }
  let n = resultLines.join('\n');
  if (changed) n = n.replace(MARKER_STRIP, ''); // drop stray marker lines left by stacked markers
  n = n.replace(/\n{3,}/g, '\n\n').replace(/[ \t]+\n/g, '\n').trim();
  return { text: n, changed: n !== text || changed };
}

function cleanFileRaw(file, raw) {
  const lines = raw.split('\n');
  let removed = 0;
  let fileChanged = false;
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    if (!line.trim()) continue;
    let obj;
    try { obj = JSON.parse(line); } catch (_) { continue; } // non-JSON: leave as-is
    const isAssistant = obj.type === 'assistant' || (obj.message && obj.message.role === 'assistant');
    if (!isAssistant) continue;
    const msg = obj.message || obj;
    const content = msg.content;
    if (!Array.isArray(content)) continue;
    let lineChanged = false;
    for (const b of content) {
      if (!b || typeof b !== 'object' || b.type !== 'text') continue;
      const r = cleanText(b.text || '');
      if (r.changed) { b.text = r.text; removed++; lineChanged = true; }
    }
    if (lineChanged) { lines[i] = JSON.stringify(obj); fileChanged = true; }
  }
  if (fileChanged) {
    try {
      const tmp = file + '.leakclean.tmp';
      fs.writeFileSync(tmp, lines.join('\n'));
      fs.renameSync(tmp, file); // atomic replace: a reader sees old OR new, never a half-written file
    } catch (_) { return 0; } // file locked/busy (e.g. held open by the app) -> skip; never corrupt, never abort the sweep
  }
  return removed;
}

function cleanFile(file) {
  let raw;
  try { raw = fs.readFileSync(file, 'utf8'); } catch (_) { return 0; }
  return cleanFileRaw(file, raw);
}

// recursively collect *.jsonl under dir, skipping our own helper/backup folders
function collectJsonl(dir, out) {
  let ents;
  try { ents = fs.readdirSync(dir, { withFileTypes: true }); } catch (_) { return; }
  for (const e of ents) {
    const p = path.join(dir, e.name);
    if (e.isDirectory()) {
      if (e.name.startsWith('_')) continue; // skip _backup/_temp style folders
      collectJsonl(p, out);
    } else if (e.isFile() && e.name.endsWith('.jsonl')) {
      out.push(p);
    }
  }
}

// SessionStart sweep: clean every OTHER past transcript in the project folder.
// Incremental (only files changed since the last sweep). Never touches the live
// conversation. Returns a small summary. Fail-safe.
function runSessionStart(payload) {
  const tpath = (payload && (payload.transcript_path || payload.transcriptPath)) || '';
  if (!tpath) return { scanned: 0, touched: 0, removed: 0 };
  const projDir = path.dirname(tpath);
  const current = path.resolve(tpath);
  const statePath = path.join(projDir, '_leakclean_state.json');
  let lastRun = 0;
  try { lastRun = Number(JSON.parse(fs.readFileSync(statePath, 'utf8')).lastRun) || 0; } catch (_) {}
  const startedAt = Date.now();
  const files = [];
  collectJsonl(projDir, files);
  let removed = 0, touched = 0, scanned = 0;
  for (const f of files) {
    if (path.resolve(f) === current) continue; // never touch the live conversation
    let st;
    try { st = fs.statSync(f); } catch (_) { continue; }
    if (st.mtimeMs <= lastRun) continue; // unchanged since last sweep -> skip
    scanned++;
    let raw;
    try { raw = fs.readFileSync(f, 'utf8'); } catch (_) { continue; }
    if (!PREFILTER.test(raw)) continue; // no leak trace -> skip without parsing
    const r = cleanFileRaw(f, raw);
    if (r > 0) { removed += r; touched++; }
  }
  try { fs.writeFileSync(statePath, JSON.stringify({ lastRun: startedAt })); } catch (_) {}
  if (removed > 0) {
    try {
      fs.appendFileSync(path.join(projDir, '_leakclean_activity.log'),
        `[${new Date().toISOString()}] SessionStart: cleaned ${removed} leaked block(s) across ${touched} file(s) (scanned ${scanned}, excluded live ${path.basename(current)}, ${Date.now() - startedAt}ms)\n`);
    } catch (_) {}
  }
  return { scanned, touched, removed };
}

module.exports = { cleanText, cleanFile, cleanFileRaw, collectJsonl, runSessionStart };

if (require.main === module) {
  const arg = process.argv[2];
  if (arg && arg.endsWith('.jsonl')) {
    const n = cleanFile(arg);
    process.stdout.write('cleaned ' + n + ' leaked block(s) in ' + arg + '\n');
    process.exit(0);
  }
  // SessionStart hook: read payload from stdin, sweep, always output {} (fail-open)
  let input = '';
  try { input = fs.readFileSync(0, 'utf8'); } catch (_) {}
  let payload = {};
  try { payload = JSON.parse(input || '{}'); } catch (_) {}
  try { runSessionStart(payload); } catch (_) {}
  process.stdout.write('{}');
  process.exit(0);
}

2. Create ~/.claude/hooks/toolcall-leak-guard.js with exactly the following content, character for character (detection, blocking, and the call into the current-log cleaner; place it in the same folder as step 1, because the guard loads the cleaner module from its own folder):

#!/usr/bin/env node
'use strict';
/*
 * Tool-Call Leak Guard - portable Claude Code "Stop" hook
 * =======================================================
 * PROBLEM
 *   Occasionally an assistant turn prints the RAW tool-call markup as plain
 *   text - a line that starts with a tool-call start-tag (the invoke,
 *   function_calls, or parameter tag) - instead of actually invoking the tool.
 *   The tool never runs, yet the model frequently reports as if it had. This
 *   corrupts the transcript and produces fake "done" reports.
 *
 * WHAT THIS DOES
 *   Runs when the assistant finishes a turn (the "Stop" event). It reads the
 *   transcript, inspects the LAST assistant message, and:
 *     - if that message already contains a real structured tool_use block,
 *       it does nothing (the model really did call a tool this turn);
 *     - else if it detects leaked tool-call markup in the plain text, it
 *       BLOCKS the stop and tells the model to re-issue the call exactly once
 *       using the proper structured mechanism.
 *
 * DESIGN RULES
 *   - Fail-open: any error or unexpected shape -> allow (never wedge a session).
 *   - Loop-guard: if we already blocked once this stop-cycle
 *     (payload.stop_hook_active), give up and allow (no infinite loop).
 *   - Zero false positives by default: requires a stray marker word on its own
 *     line right before the tag. Prose that merely QUOTES a tag inline or
 *     inside a code fence will NOT trigger.
 *
 * CONTRACT (Claude Code Stop hook)
 *   stdin  : JSON hook payload (transcript_path, stop_hook_active, ...)
 *   stdout : {}                                    -> allow the stop
 *            {"decision":"block","reason":"..."}   -> force the model to continue
 */
const fs = require('fs');

function out(o) { try { process.stdout.write(JSON.stringify(o)); } catch (_) {} }
function allow() { out({}); process.exit(0); }
function block(reason) { out({ decision: 'block', reason }); process.exit(0); }

// read hook payload from stdin
let input = '';
try { input = fs.readFileSync(0, 'utf8'); } catch (_) { allow(); }
let payload = {};
try { payload = JSON.parse(input || '{}'); } catch (_) { allow(); }

// loop guard: if a previous Stop hook in this same cycle already blocked, stop here.
if (payload.stop_hook_active) allow();

const tpath = payload.transcript_path || payload.transcriptPath;
if (!tpath) allow();
let raw;
try { raw = fs.readFileSync(tpath, 'utf8'); } catch (_) { allow(); }

// find the last assistant message in the JSONL transcript
const lines = raw.split(/\r?\n/).filter(Boolean);
let last = null;
for (let i = lines.length - 1; i >= 0; i--) {
  let o;
  try { o = JSON.parse(lines[i]); } catch (_) { continue; }
  const m = o.message || o;
  const role = (m && m.role) || o.role || o.type;
  if (role === 'assistant') { last = m; break; }
}
if (!last) allow();

let content = last.content;
if (typeof content === 'string') content = [{ type: 'text', text: content }];
if (!Array.isArray(content)) allow();

let hasToolUse = false;
let text = '';
for (const b of content) {
  if (!b || typeof b !== 'object') continue;
  // forward-compatible: treat any tool-invocation block as real activity,
  // tolerant of future/variant type names so an SDK rename can't silently break it
  if (b.type === 'tool_use' || b.type === 'server_tool_use' || b.type === 'tool_call') hasToolUse = true;
  if (b.type === 'text' && typeof b.text === 'string') text += '\n' + b.text;
}

// leak signature
//   STRICT (default): a stray marker word (see MARKERS below: call/court/count/course) alone on a line,
//     immediately followed by a line that STARTS with a tool-call start-tag.
//     This is the empirically observed shape of real leaks and is
//     false-positive-proof.
//   LOOSE (opt-in via env LEAK_GUARD_LOOSE=1): any line that starts with a
//     tool-call start-tag. Higher recall, tiny false-positive risk on prose that
//     quotes a tag at the start of a line.

// == Leak "marker" words -- HOW TO EXTEND (read me) ===========================
// A leak prints a short stray word on its own line right before the tag. Seen in
// the wild: call, court, count, course. MARKERS below is the SINGLE SOURCE OF
// TRUTH -- STRICT is built from it. When a NEW variant shows up, add it with NO
// other edits, either way:
//   1) add the lowercase word to the default array below, OR
//   2) set env LEAK_GUARD_MARKERS="call,court,count,course,<newword>" (overrides
//      the default array -- handy when you cannot edit the file).
// Matching is case-insensitive. Keep this list = zero false positives. If you'd
// rather not maintain it, set env LEAK_GUARD_LOOSE=1 to fall back to LOOSE, which
// fires on ANY line-leading tool-call tag even without a marker word (slightly
// higher false-positive risk on prose that quotes a tag at the start of a line).
// Keep the SAME default list in the cleaner file so both halves agree.
// ============================================================================
let MARKERS = (process.env.LEAK_GUARD_MARKERS || 'call,court,count,course')
  .split(',').map(s => s.trim().toLowerCase()).filter(Boolean);
if (!MARKERS.length) MARKERS = ['call', 'court', 'count', 'course']; // never end up empty
const M = '(?:' + MARKERS.map(s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')).join('|') + ')'; // regex-escaped, OR-joined
const STRICT = new RegExp('(?:^|\\n)[ \\t]*' + M + '[ \\t]*\\r?\\n[ \\t]*<\\s*(?:antml:)?(?:invoke|parameter|function_calls)\\b', 'i');
const LOOSE = /(?:^|\n)[ \t]*<\s*(?:antml:)?(?:invoke|parameter|function_calls)\b/i;
const re = process.env.LEAK_GUARD_LOOSE ? LOOSE : STRICT;
const isLeak = re.test(text);

// in-place repair: if the cleaner companion is present, strip the leaked markup
// from the transcript so the saved conversation is not corrupted. fail-open if
// the file is absent (detect + block still works without it).
if (isLeak) {
  try { require(require('path').join(__dirname, 'toolcall-leak-clean.js')).cleanFile(tpath); } catch (_) {}
}

// a real structured tool call this turn -> never block.
if (hasToolUse) allow();

if (isLeak) {
  block(
    'Tool-call leak detected. Your last message contains raw tool-call markup as ' +
    'plain text (a line beginning with a tool-call start-tag), but no tool was ' +
    'actually invoked, so nothing ran. Re-issue the intended tool call exactly ' +
    'once using the proper structured tool-call mechanism. Never paste tool-call ' +
    'markup as prose.'
  );
}
allow();

3. Add the following two hooks to ~/.claude/settings.json without breaking anything already there (hooks.Stop and hooks.SessionStart). Replace the paths in command with the absolute paths of the files created in steps 1 and 2 (on Windows, escape them like node "C:\\Users\\<you>\\.claude\\hooks\\toolcall-leak-guard.js"). If a key does not exist, create it; if it already exists, append to its array:

{
  "hooks": {
    "Stop": [
      { "hooks": [ { "type": "command", "command": "node \"<absolute path to guard>\"", "timeout": 8 } ] }
    ],
    "SessionStart": [
      { "hooks": [ { "type": "command", "command": "node \"<absolute path to clean>\"", "timeout": 15 } ] }
    ]
  }
}

4. Actually run the verification and show the results:
(a) Feed the guard, via node, a mock transcript that contains a leak (no real tool_use; the body text contains the court marker, a line-leading invoke tag, and a call with its closing tag). It must return {"decision":"block",...}, and the raw markup must disappear from that transcript and be replaced by a summary.
(b) For ordinary prose it must return {}.
(c) In a separate folder, create several past logs containing leaks plus a live log, then run the clean module through runSessionStart. Only the past logs must be cleaned; the live log must be untouched.
(d) With the environment variable LEAK_GUARD_MARKERS="call,court,count,course,cause" set, a leak preceded by the new word cause must also be blocked, and without the variable cause must pass through (this demonstrates the extension mechanism).

5. Tell the user how to extend the setup when new leak words appear: the standalone words that mark a leak (marker words) are managed only in the MARKERS array near the top of both files (single source of truth), and every other marker regex is generated from it automatically. If a new standalone word other than call/court/count/course (for example cause) starts appearing right before a leak, there are two options: (i) add that one lowercase word to the MARKERS array in both files, or (ii) leave the files unedited and override the list with the environment variable LEAK_GUARD_MARKERS="call,court,count,course,cause" (it can also be set in the hook's command). If you really do not want to chase words, LEAK_GUARD_LOOSE=1 detects every line-leading tag even without a marker word (the false-positive risk rises slightly).

6. Tell the user that a restart of Claude Code (a new session) is required for the change to take effect.

Constraints that must never be removed: fail-open (if anything breaks, always let the turn through) / loop guard (if stop_hook_active is true, let it through, which prevents an infinite loop) / cleaning targets assistant text only and never touches user messages / the SessionStart sweep never touches the live conversation log / the default is STRICT (marker word required, zero false positives).

━━━ Copy up to here ━━━

With this in place, from the next session on, a leaked turn is blocked automatically and a correct re-issue is forced, the contaminated XML in the current conversation log is automatically rewritten as a one-line summary (arguments are kept and the surrounding prose is preserved), and past conversation logs are swept automatically at startup (changed files only; the live log is never touched). It is designed for zero false positives: ordinary prose and quoted code do not trigger it.

It is also easy to extend. The standalone words that mark a leak are currently call / court / count / course. If some other word (for example cause) starts appearing right before a leak, you only need to add that one lowercase word to the MARKERS array near the top of both files, or override the list with the environment variable LEAK_GUARD_MARKERS="call,court,count,course,cause" (the other regexes are generated from MARKERS automatically). If you would rather not touch the files, LEAK_GUARD_LOOSE=1 detects leaks even without a marker word. The same procedure is written out in the "HOW TO EXTEND" comment at the top of the code.
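The STRICT detection rule described in the post (a marker word alone on a line, immediately followed by a line that starts with a tool-call start tag) can be tried out in isolation. The following is a minimal sketch, not the post's own file: `escapeRegex`, `buildStrictDetector`, and the sample strings are hypothetical names introduced here for illustration, and the sample tag is assembled by string concatenation so that the snippet itself never contains a line-leading tag.

```javascript
'use strict';

// Escape regex metacharacters so each marker word is matched literally.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a STRICT-style detector from a comma-separated marker list (hypothetical helper).
// It matches: marker word alone on a line, then a line starting with a tool-call start tag.
function buildStrictDetector(markerList) {
  const markers = markerList.split(',').map(s => s.trim().toLowerCase()).filter(Boolean);
  if (markers.length === 0) throw new Error('need at least one marker word');
  const alt = '(?:' + markers.map(escapeRegex).join('|') + ')';
  return new RegExp(
    '(?:^|\\n)[ \\t]*' + alt + '[ \\t]*\\r?\\n[ \\t]*<\\s*(?:antml:)?(?:invoke|parameter|function_calls)\\b',
    'i'
  );
}

// Sample inputs; the tag is built by concatenation on purpose.
const TAG = '<' + 'invoke name="Bash">';
const leaked = 'Done.\ncourt\n' + TAG + '\n';            // marker line + line-leading tag
const quoted = 'The docs mention the `' + TAG + '` tag.'; // tag quoted inline in prose
const newVariant = 'cause\n' + TAG;                       // marker word not in the default list

const defaults = buildStrictDetector('call,court,count,course');
const extended = buildStrictDetector('call,court,count,course,cause');

console.log(defaults.test(leaked));     // true
console.log(defaults.test(quoted));     // false
console.log(defaults.test(newVariant)); // false
console.log(extended.test(newVariant)); // true
```

The regexes are built without the `g` flag, so repeated `test()` calls carry no `lastIndex` state between inputs.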

色々検証したけど おそらくこれで1発でセーフガードできそう (自分のは今のとこできている🧐再発なし) 心配ならまずはこれがちゃんと動くのか?と先に質問してから動かせばいいかもね ※自己責任でどうぞ あなた(Claude Code)に常設の安全フックを導入してほしい。目的は「ツール呼び出しの漏れ」の自動ブロックと、漏れて会話ログに焼き付いた汚染テキストの自動掃除(現在の会話+過去の全会話ログ)。 背景: アシスタントがツールを実際に呼ばず、ツール呼び出しの生マークアップ(行頭の invoke / function_calls / parameter 開始タグ)を本文テキストとして書いてしまうことがある。ツールは1回も走らないのに完了報告が出て、会話ログ(transcript)にも壊れたマークアップが残る。これを次の二段構えで処理したい: (A) Stopフックで検出し、本物のツール実行が無ければstopをブロックして正しい構造化形式での1回だけの再発行を強制+現在の会話ログの漏れを除去・要約に整形。(B) SessionStartフックで、同じプロジェクトフォルダの過去の会話ログ(.jsonl)を一括で掃除(差分のみ・liveの会話は不可侵)。 手順: 次の内容で ~/.claude/hooks/toolcall-leak-clean.js を一字一句このまま作成する(掃除モジュール。現ログ掃除と過去ログ一括掃除の両方を担う。ディレクトリが無ければ作る): #!/usr/bin/env node 'use strict'; /* * Tool-Call Leak Cleaner - portable companion to toolcall-leak-guard.js * ==================================================================== * Removes leaked tool-call markup that got baked into an assistant message as * plain text, and rewrites the JSONL transcript in place. It targets ONLY * assistant text blocks; user messages are never touched. * * A "leak burst" = a stray marker word (court/count/call) alone on a line, * followed by one or more line-leading tool-call blocks (and possibly an * unterminated tail). Each burst is replaced by a short, human-readable summary * that PRESERVES the fact a call was attempted and its arguments - only the raw * markup is stripped. Genuine prose before/after the burst is kept verbatim. * Line count is preserved; only changed lines are re-serialized; non-JSON lines * pass through untouched. Fail-safe: never throws to the caller. * * Two ways it runs: * 1. Called by the Stop guard on the CURRENT live transcript (immediate repair). * 2. As a SessionStart hook, it sweeps every OTHER past transcript .jsonl in the * same project folder (incremental: only files changed since the last sweep), * leaving the live conversation untouched. This retro-cleans old logs. 
* * Usage: * - as a module : const { cleanFile, cleanText, runSessionStart } = require('./toolcall-leak-clean.js') * - clean one file : node toolcall-leak-clean.js <file.jsonl> * - SessionStart : node toolcall-leak-clean.js (reads hook payload on stdin) */ const fs = require('fs'); const path = require('path'); // marker line following line-leading tool-call block(s) [ unterminated tail] const BURST = /(?:^|\n)[ \t]*(?:court|count|call)[ \t]*(?:(?:\r?\n[ \t]*<\s*(?:antml:)?(?:invoke|function_calls)\b[\s\S]*?<\/\s*(?:antml:)?(?:invoke|function_calls)\s*>) (?:\r?\n[ \t]*<\s*(?:antml:)?(?:invoke|parameter|function_calls)\b[\s\S]*)?|\r?\n[ \t]*<\s*(?:antml:)?(?:invoke|parameter|function_calls)\b[\s\S]*)/gi; // each call block inside a burst (unterminated tail tolerated) const INVOKE = /<\s*(?:antml:)?invoke\s name\s*=\s*["']?([^"'>\s] )["']?[^>]*>([\s\S]*?)(?:<\/\s*(?:antml:)?invoke\s*>|$)/gi; // a single argument inside a call block const PARAM = /<\s*(?:antml:)?parameter\s name\s*=\s*["']?([^"'>\s] )["']?\s*>([\s\S]*?)(?:<\/\s*(?:antml:)?parameter\s*>|$)/gi; // a lone marker word left over after burst removal const LONE_MARKER = /^[ \t]*(?:court|count|call)[ \t]*$/gim; // fast pre-check: a marker line immediately followed by a line-leading tag const HASLEAK = /(?:^|\n)[ \t]*(?:court|count|call)[ \t]*\r?\n[ \t]*<\s*(?:antml:)?(?:invoke|parameter|function_calls)\b/i; const MULTI_NL = /\n{3,}/g; // cheap raw-string pre-filter: skip files with no leak trace without parsing JSON const PREFILTER = /<\s*\/?\s*(?:antml:)?(?:invoke|function_calls)\b|(?:^|\n)[ \t]*(?:court|count|call)[ \t]*(?:\n|<)/i; function summarizeInvoke(name, body) { PARAM.lastIndex = 0; const params = []; let pm; while ((pm = PARAM.exec(body)) !== null) { params.push(' ' pm[1] '=' (pm[2] || '')); // full value, no truncation } const head = '[leaked tool-call markup removed (was never executed; the real call is reissued separately): ' name; if (!params.length) return head ']'; return head '\n' 
params.join('\n') ']'; } function cleanText(text) { if (typeof text !== 'string' || !HASLEAK.test(text)) return { text, changed: false }; let n = text.replace(BURST, function (burst) { INVOKE.lastIndex = 0; const out = []; let im; while ((im = INVOKE.exec(burst)) !== null) out.push(summarizeInvoke(im[1], im[2] || '')); if (!out.length) out.push('[leaked tool-call markup removed (was never executed; reissued separately)]'); return '\n' out.join('\n'); }); if (n !== text) n = n.replace(LONE_MARKER, ''); n = n.replace(MULTI_NL, '\n\n').replace(/[ \t] \n/g, '\n').trim(); return { text: n, changed: n !== text }; } function cleanFileRaw(file, raw) { const lines = raw.split('\n'); let removed = 0; let fileChanged = false; for (let i = 0; i < lines.length; i ) { const line = lines[i]; if (!line.trim()) continue; let obj; try { obj = JSON.parse(line); } catch (_) { continue; } // non-JSON: leave as-is const isAssistant = obj.type === 'assistant' || (obj.message && obj.message.role === 'assistant'); if (!isAssistant) continue; const msg = obj.message || obj; const content = msg.content; if (!Array.isArray(content)) continue; let lineChanged = false; for (const b of content) { if (!b || typeof b !== 'object' || b.type !== 'text') continue; const r = cleanText(b.text || ''); if (r.changed) { b.text = r.text; removed ; lineChanged = true; } } if (lineChanged) { lines[i] = JSON.stringify(obj); fileChanged = true; } } if (fileChanged) { const tmp = file '.leakclean.tmp'; fs.writeFileSync(tmp, lines.join('\n')); fs.renameSync(tmp, file); // atomic replace } return removed; } function cleanFile(file) { let raw; try { raw = fs.readFileSync(file, 'utf8'); } catch (_) { return 0; } return cleanFileRaw(file, raw); } // recursively collect *.jsonl under dir, skipping our own helper/backup folders function collectJsonl(dir, out) { let ents; try { ents = fs.readdirSync(dir, { withFileTypes: true }); } catch (_) { return; } for (const e of ents) { const p = path.join(dir, e.name); if 
(e.isDirectory()) { if (e.name.startsWith('_')) continue; // skip _backup/_temp style folders collectJsonl(p, out); } else if (e.isFile() && e.name.endsWith('.jsonl')) { out.push(p); } } } // SessionStart sweep: clean every OTHER past transcript in the project folder. // Incremental (only files changed since the last sweep). Never touches the live // conversation. Returns a small summary. Fail-safe. function runSessionStart(payload) { const tpath = (payload && (payload.transcript_path || payload.transcriptPath)) || ''; if (!tpath) return { scanned: 0, touched: 0, removed: 0 }; const projDir = path.dirname(tpath); const current = path.resolve(tpath); const statePath = path.join(projDir, '_leakclean_state.json'); let lastRun = 0; try { lastRun = Number(JSON.parse(fs.readFileSync(statePath, 'utf8')).lastRun) || 0; } catch (_) {} const startedAt = Date.now(); const files = []; collectJsonl(projDir, files); let removed = 0, touched = 0, scanned = 0; for (const f of files) { if (path.resolve(f) === current) continue; // never touch the live conversation let st; try { st = fs.statSync(f); } catch (_) { continue; } if (st.mtimeMs <= lastRun) continue; // unchanged since last sweep -> skip scanned ; let raw; try { raw = fs.readFileSync(f, 'utf8'); } catch (_) { continue; } if (!PREFILTER.test(raw)) continue; // no leak trace -> skip without parsing const r = cleanFileRaw(f, raw); if (r > 0) { removed = r; touched ; } } try { fs.writeFileSync(statePath, JSON.stringify({ lastRun: startedAt })); } catch (_) {} if (removed > 0) { try { fs.appendFileSync(path.join(projDir, '_leakclean_activity.log'), `[${new Date().toISOString()}] SessionStart: cleaned ${removed} leaked block(s) across ${touched} file(s) (scanned ${scanned}, excluded live ${path.basename(current)}, ${Date.now() - startedAt}ms)\n`); } catch (_) {} } return { scanned, touched, removed }; } module.exports = { cleanText, cleanFile, cleanFileRaw, collectJsonl, runSessionStart }; if (require.main === module) { const 
arg = process.argv[2]; if (arg && arg.endsWith('.jsonl')) { const n = cleanFile(arg); process.stdout.write('cleaned ' n ' leaked block(s) in ' arg '\n'); process.exit(0); } // SessionStart hook: read payload from stdin, sweep, always output {} (fail-open) let input = ''; try { input = fs.readFileSync(0, 'utf8'); } catch (_) {} let payload = {}; try { payload = JSON.parse(input || '{}'); } catch (_) {} try { runSessionStart(payload); } catch (_) {} process.stdout.write('{}'); process.exit(0); } 次の内容で ~/.claude/hooks/toolcall-leak-guard.js を一字一句このまま作成する(検出+ブロック+現ログ掃除呼び出し。手順1と同じフォルダに置くこと。ガードが同フォルダの掃除モジュールを呼ぶ): #!/usr/bin/env node 'use strict'; /* * Tool-Call Leak Guard - portable Claude Code "Stop" hook * ======================================================= * PROBLEM * Occasionally an assistant turn prints the RAW tool-call markup as plain * text - a line that starts with a tool-call start-tag (the invoke, * function_calls, or parameter tag) - instead of actually invoking the tool. * The tool never runs, yet the model frequently reports as if it had. This * corrupts the transcript and produces fake "done" reports. * * WHAT THIS DOES * Runs when the assistant finishes a turn (the "Stop" event). It reads the * transcript, inspects the LAST assistant message, and: * - if that message already contains a real structured tool_use block, * it does nothing (the model really did call a tool this turn); * - else if it detects leaked tool-call markup in the plain text, it * BLOCKS the stop and tells the model to re-issue the call exactly once * using the proper structured mechanism. * * DESIGN RULES * - Fail-open: any error or unexpected shape -> allow (never wedge a session). * - Loop-guard: if we already blocked once this stop-cycle * (payload.stop_hook_active), give up and allow (no infinite loop). * - Zero false positives by default: requires a stray marker word on its own * line right before the tag. 
Prose that merely QUOTES a tag inline or
 * inside a code fence will NOT trigger.
 *
 * CONTRACT (Claude Code Stop hook)
 *   stdin  : JSON hook payload (transcript_path, stop_hook_active, ...)
 *   stdout : {}                                   -> allow the stop
 *            {"decision":"block","reason":"..."}  -> force the model to continue
 */
const fs = require('fs');

function out(o) { try { process.stdout.write(JSON.stringify(o)); } catch (_) {} }
function allow() { out({}); process.exit(0); }
function block(reason) { out({ decision: 'block', reason }); process.exit(0); }

// read hook payload from stdin
let input = '';
try { input = fs.readFileSync(0, 'utf8'); } catch (_) { allow(); }
let payload = {};
try { payload = JSON.parse(input || '{}'); } catch (_) { allow(); }

// loop guard: if a previous Stop hook in this same cycle already blocked, stop here.
if (payload.stop_hook_active) allow();

const tpath = payload.transcript_path || payload.transcriptPath;
if (!tpath) allow();
let raw;
try { raw = fs.readFileSync(tpath, 'utf8'); } catch (_) { allow(); }

// find the last assistant message in the JSONL transcript
const lines = raw.split(/\r?\n/).filter(Boolean);
let last = null;
for (let i = lines.length - 1; i >= 0; i--) {
  let o;
  try { o = JSON.parse(lines[i]); } catch (_) { continue; }
  const m = o.message || o;
  const role = (m && m.role) || o.role || o.type;
  if (role === 'assistant') { last = m; break; }
}
if (!last) allow();

let content = last.content;
if (typeof content === 'string') content = [{ type: 'text', text: content }];
if (!Array.isArray(content)) allow();

// collect all text blocks and note whether a real structured tool call exists
let hasToolUse = false;
let text = '';
for (const b of content) {
  if (!b || typeof b !== 'object') continue;
  if (b.type === 'tool_use') hasToolUse = true;
  if (b.type === 'text' && typeof b.text === 'string') text += '\n' + b.text;
}

// leak signature
// STRICT (default): a stray marker word (court/count/call/course) alone on a line,
//   immediately followed by a line that STARTS with a tool-call start-tag.
//   This is the empirically observed shape of real leaks and is
//   false-positive-proof.
// LOOSE (opt-in via env LEAK_GUARD_LOOSE=1): any line that starts with a
//   tool-call start-tag. Higher recall, tiny false-positive risk on prose that
//   quotes a tag at the start of a line.
const STRICT = /(?:^|\n)[ \t]*(?:court|count|call|course)[ \t]*\r?\n[ \t]*<\s*(?:antml:)?(?:invoke|parameter|function_calls)\b/i;
const LOOSE = /(?:^|\n)[ \t]*<\s*(?:antml:)?(?:invoke|parameter|function_calls)\b/i;
const re = process.env.LEAK_GUARD_LOOSE ? LOOSE : STRICT;
const isLeak = re.test(text);

// in-place repair: if the cleaner companion is present, strip the leaked markup
// from the transcript so the saved conversation is not corrupted. fail-open if
// the file is absent (detect block still works without it).
if (isLeak) {
  try {
    require(require('path').join(__dirname, 'toolcall-leak-clean.js')).cleanFile(tpath);
  } catch (_) {}
}

// a real structured tool call this turn -> never block.
if (hasToolUse) allow();

if (isLeak) {
  block(
    'Tool-call leak detected. Your last message contains raw tool-call markup as ' +
    'plain text (a line beginning with a tool-call start-tag), but no tool was ' +
    'actually invoked, so nothing ran. Re-issue the intended tool call exactly ' +
    'once using the proper structured tool-call mechanism. Never paste tool-call ' +
    'markup as prose.'
  );
}
allow();

Add the following two hooks to ~/.claude/settings.json without breaking existing entries (hooks.Stop and hooks.SessionStart). Replace the command paths with the absolute paths of the files created in steps 1 and 2 (on Windows, escape them like node "C:\\Users\\<you>\\.claude\\hooks\\toolcall-leak-guard.js"). Create the keys if they do not exist; if they already exist, append to the arrays:

{
  "hooks": {
    "Stop": [
      { "hooks": [ { "type": "command", "command": "node \"<absolute path to guard>\"", "timeout": 8 } ] }
    ],
    "SessionStart": [
      { "hooks": [ { "type": "command", "command": "node \"<absolute path to clean>\"", "timeout": 15 } ] }
    ]
  }
}

Actually run the verification and show the results:
(a) Use node to feed the guard a mock transcript containing a leak (no real tool_use; the body contains the court marker, a line-leading invoke tag, and a call with its closing tag). Confirm that {"decision":"block",...} is returned and that the raw markup is gone from that transcript, replaced by a summary.
(b) Confirm that ordinary prose returns {}.
(c) In a separate folder, create several past logs containing leaks plus one live log, run the clean module via runSessionStart, and confirm that only the past logs are cleaned and the live log is untouched.

Tell the user that Claude Code must be restarted (a new session) for the changes to take effect.

Constraints that must never be removed: fail-open (if anything breaks, always let the stop through) / loop guard (pass through when stop_hook_active is true, which prevents infinite loops) / cleaning targets assistant text only and never touches user messages / the SessionStart sweep never touches the live conversation log / the default is STRICT (marker word required, zero false positives).

━━━ Copy up to here ━━━

From the next session on, leaked turns are automatically blocked and forced to be re-issued correctly, the polluted XML in the current conversation log is automatically reformatted into a one-line summary (arguments preserved, surrounding prose kept), and past conversation logs are also swept automatically at startup (changed files only; the live log is untouched). The design aims for zero false positives: ordinary prose and quoted code do not trigger it.
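Verification steps (a) and (b) can be approximated before installing anything. The following is a minimal sketch, not the author's script: it reuses the two leak signatures from the guard (with "course" added to the marker list, which is an assumption based on the post's title), and checkTranscript is a hypothetical helper name. The transcript shape (one JSON object per line, with message.role and message.content blocks) is also assumed for illustration.

```javascript
'use strict';

// Leak signatures modelled on the guard in the post. Including "course" in the
// marker list is an assumption taken from the post's title.
const STRICT = /(?:^|\n)[ \t]*(?:court|count|call|course)[ \t]*\r?\n[ \t]*<\s*(?:antml:)?(?:invoke|parameter|function_calls)\b/i;
const LOOSE = /(?:^|\n)[ \t]*<\s*(?:antml:)?(?:invoke|parameter|function_calls)\b/i;

// Hypothetical helper: classify the last assistant message of a JSONL transcript.
function checkTranscript(jsonl, loose = false) {
  const lines = jsonl.split(/\r?\n/).filter(Boolean);
  for (let i = lines.length - 1; i >= 0; i--) {
    let o;
    try { o = JSON.parse(lines[i]); } catch (_) { continue; }
    const m = o.message || o;
    if (!m || m.role !== 'assistant') continue;
    const blocks = typeof m.content === 'string'
      ? [{ type: 'text', text: m.content }]
      : (Array.isArray(m.content) ? m.content : []);
    let hasToolUse = false;
    let text = '';
    for (const b of blocks) {
      if (!b || typeof b !== 'object') continue;
      if (b.type === 'tool_use') hasToolUse = true;
      if (b.type === 'text' && typeof b.text === 'string') text += '\n' + b.text;
    }
    const isLeak = (loose ? LOOSE : STRICT).test(text);
    // Block only when markup leaked AND no real structured call happened.
    return { isLeak, hasToolUse, wouldBlock: isLeak && !hasToolUse };
  }
  return { isLeak: false, hasToolUse: false, wouldBlock: false };
}

// The tag is assembled by concatenation so this file never contains a
// line-leading start-tag itself.
const tag = '<' + 'invoke name="Bash">';
const leakText = 'Done.\ncourt\n' + tag + '\nls\n';

const leaky = JSON.stringify({ message: { role: 'assistant', content: [{ type: 'text', text: leakText }] } });
const plain = JSON.stringify({ message: { role: 'assistant', content: [{ type: 'text', text: 'I counted the files and everything looks fine.' }] } });
const withRealCall = JSON.stringify({ message: { role: 'assistant', content: [
  { type: 'text', text: leakText },
  { type: 'tool_use', name: 'Bash', input: {} },
] } });

console.log(checkTranscript(leaky).wouldBlock);        // true
console.log(checkTranscript(plain).wouldBlock);        // false
console.log(checkTranscript(withRealCall).wouldBlock); // false
```

The third case mirrors the guard's ordering: a turn that contains a genuine tool_use block is never blocked, even if leaked markup is also present.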

the text being kind of garbled and the production sloppy makes the image funnier to me although the "cleantext" version of the ad would also be slightly differently funny
19 Sep 2025
32. REPLACE() – Replace part of string
Ex: MaskedPhone = REPLACE(Customers[Phone], 1, 6, "XXXXXX")
33. SUBSTITUTE() – Replace all occurrences
Ex: CleanText = SUBSTITUTE(Products[Description], "Old", "New")
34. UPPER() – To uppercase
Ex: ShoutName = UPPER(Customers[Name])
#15yrsago Twitter phishing scam memex.craphound.com/2010/02/… #15yrsago Stephen Levy on Google’s algorithm wired.com/2010/02/ff-google-… #15yrsago Cleantext: turn your ASCII pastebombs into formatted text cleantext.org 8/
This is done by using a clean text tool called the CleanTextTool. #CleanTextTool #Cleantext #CleanText towardsdatascience.com/prime…

Replying to @Mwadz2
There are tools like ftfy or cleantext that can remove some of those characters. That said: 1.) do you even need pre-trained models? 2.) sometimes special characters have useful information, so filtering them out isn't always a good idea.
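As a rough illustration of what "removing some of those characters" can mean, here is a minimal sketch in JavaScript. It is not the API of ftfy or cleantext; stripInvisible is a hypothetical helper that only normalizes to NFC and drops control and format characters.

```javascript
'use strict';

// Hypothetical helper (not ftfy's or cleantext's API): normalize to NFC, then
// drop control characters (\p{Cc}) and format characters (\p{Cf}, which
// includes zero-width spaces), keeping newlines and tabs.
// Caveat: \p{Cf} also covers the zero-width joiner used inside emoji
// sequences, so this filter does discard information in some inputs.
function stripInvisible(text) {
  return text
    .normalize('NFC')
    .replace(/\p{Cc}|\p{Cf}/gu, (ch) => (ch === '\n' || ch === '\t' ? ch : ''));
}

console.log(JSON.stringify(stripInvisible('a\u0000b\u200Bc\n'))); // "abc\n"
```

Whether such filtering is appropriate depends on the downstream model, since some special characters carry signal.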
Dang, I'll be on the road with family then. But thank you very much (and doubly for cleantext)
8 Dec 2021
Oh, also I messed up the schedule, so it's a little late today (even though it was pre-recorded days ago)... but time to CLEAN UP THE INTERNET with Clean-text (or cleantext) youtu.be/p7mgbX7ZTZU #python #Advent

7 Sep 2021
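6 cool Python libraries I found recently. It is a translation of an overseas article, but cleantext for text preprocessing, Gramformer for proofreading text, and Styleformer for converting the style of text stood out to me. qiita.com/baby-degu/items/86…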

Replying to @SeanTrende
cleantext <- function(text) {
  require(stringr)
  str_replace_all(trimws(tolower(text), which = "both"), "[^A-Za-z0-9]", "")
}
Useful guide for anyone analyzing large amounts of text using #Python Guide to CleanText: A Python Package to Clean Raw Text Data hubs.la/H0QjFGJ0 #programming #coding #coder #programmer #it

18 Jun 2021
Useful guide for anyone analyzing large amounts of text using #Python Guide to CleanText: A Python Package to Clean Raw Text Data hubs.la/H0QjFs80 #programming #coding #coder #programmer #it

.@aigeek7 dives into an interesting library called CleanText, that eases the process of cleaning textual data and speeds up the data preprocessing pipeline. buff.ly/33cS9Gz

Replying to @Agornello
Thanks buddy, will reconfirm that once, but even if it's a trusted domain, doesn't leaking a password in cleantext violate privacy terms?
15 Feb 2021
Unless it is a password in cleantext, I do not see the impact.
Try cleantext or puretext
Those hidden nulls are a pain... I tend to have A LOT of auto-enters CleanText ( Self ) to at least try to remove this and other unwanted characters.