sqry analysis of the Linux kernel (/srv/repos/public/linux)
Index & Scale
sqry has a complete index:
• 65,048 files indexed
• 12,066,713 symbols
• 20,053,202 edges
• Index version 1.2 (built 2026-06-07)
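As a rough sanity check on scale, the three totals above can be turned into average densities. This is plain arithmetic on the reported numbers; the variable names are illustrative and are not part of sqry.

```python
# Totals copied from the index summary above.
files = 65_048
symbols = 12_066_713
edges = 20_053_202

symbols_per_file = symbols / files    # average symbols indexed per file
edges_per_symbol = edges / symbols    # average graph edges per symbol

print(f"symbols per file: {symbols_per_file:.1f}")
print(f"edges per symbol: {edges_per_symbol:.2f}")
```

That works out to roughly 185 symbols per file and about 1.7 edges per symbol, which fits a header-heavy C tree dominated by constants and declarations.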
Language breakdown (top ones):
• C: 63,142 files, ~11.91M symbols (dominant)
• Rust: 349 files, ~96k symbols (real kernel Rust, not just bindings)
• Shell: 1,115 files
• Python: 364 files (tooling, generators, kconfig, etc.)
• Small amounts of Perl, C++, etc.
Symbol kinds (selected):
• Constants: 5.94M
• Functions: 2.25M
• Variables: 1.1M
• Types: 925k
• Structs: 554k
• Enums: 55k
• Modules: 65k
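To put the kind counts in proportion, the sketch below divides each rounded figure by the total symbol count from the index summary. The counts are the rounded values listed above, so the shares are approximate; the dictionary layout is only for illustration.

```python
total_symbols = 12_066_713  # from the index summary

# Rounded counts as reported in the list above.
kinds = {
    "constants": 5_940_000,
    "functions": 2_250_000,
    "variables": 1_100_000,
    "types": 925_000,
    "structs": 554_000,
    "modules": 65_000,
    "enums": 55_000,
}

# Share of all indexed symbols for each listed kind.
shares = {kind: count / total_symbols for kind, count in kinds.items()}
for kind, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{kind:<10} {share:6.1%}")

listed = sum(kinds.values())
print(f"listed kinds cover {listed / total_symbols:.1%} of all symbols")
```

Constants alone account for roughly half of the index, and the selected kinds together cover about 90% of all symbols.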
Health Metrics
• Call cycles: 1,071
• Duplicate groups: 1,384,148 (very high, but expected given per-arch copies and config-gated variants)
• "Unused" symbols: 356,579
• Cross-language edges: 464
Cycles (call graph, cycle_type: calls): Mostly small and benign.
• Wrapper → internal nolock variant patterns (common in OMAP clockdomain, block submit_bio chains, etc.).
• KVM MMU page table walkers and zap logic (deeper cycles, 4–5 nodes).
• Arch-specific: Alpha error logging, PowerPC PCI of-scan, x86 KVM RTC/IOAPIC/TDPMMU, MIPS/Octeon PCIe init, PAT page attr aliasing.
• Tooling/boot code (e.g. relocs.c).
These are largely intentional recursion in walkers or paired lock/unlock-style helpers, not classic deadlocks.
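To show what a call cycle looks like in graph terms, here is a minimal sketch that finds cycles as strongly connected components using Tarjan's algorithm. It is not sqry's implementation, which this report does not describe. The function names in the toy edge list (probe, dev_wake, dev_wake_nolock, walk_table) are hypothetical stand-ins for the wrapper/nolock and recursive-walker patterns above.

```python
from collections import defaultdict


def find_call_cycles(calls):
    """Return the strongly connected components that contain a call cycle."""
    graph = defaultdict(list)
    for caller, callee in calls:
        graph[caller].append(callee)
        graph[callee]  # register leaf functions as nodes too

    index, lowlink = {}, {}
    stack, on_stack = [], set()
    cycles = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for nxt in graph[node]:
            if nxt not in index:
                visit(nxt)
                lowlink[node] = min(lowlink[node], lowlink[nxt])
            elif nxt in on_stack:
                lowlink[node] = min(lowlink[node], index[nxt])
        if lowlink[node] == index[node]:  # node is the root of a component
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            # A single node counts as a cycle only if it calls itself.
            if len(component) > 1 or node in graph[node]:
                cycles.append(sorted(component))

    for node in list(graph):
        if node not in index:
            visit(node)
    return cycles


# Hypothetical call edges: a wrapper/nolock pair and a self-recursive walker.
calls = [
    ("probe", "dev_wake"),
    ("dev_wake", "dev_wake_nolock"),
    ("dev_wake_nolock", "dev_wake"),
    ("probe", "walk_table"),
    ("walk_table", "walk_table"),
]
cycles = find_call_cycles(calls)
print(cycles)
```

The recursive visit is fine for a toy graph; on a graph with millions of nodes an iterative variant would be needed to avoid Python's recursion limit.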
Complexity (sqry's size + fanout-derived metric):
• Heavily skewed by massive test code (e.g. tools/testing/radix-tree/maple.c:check_erase2_sets at 16k complexity / 33k LOC) and enormous AMD display mode math functions (dml*_ModeSupportAndSystemConfigurationFull and prefetch/watermark calcs — many 1.5k–2k LOC single functions).
• Real kernel notables: NTFS3 log_replay, some old SCSI (qla1280), pktgen, hrtimer.c:clock_was_set, PowerPC single-step analyzer, etc.
The metric surfaces "big gnarly functions + tables" more than classic cyclomatic complexity.
Core Kernel Highlights
• Scheduler: Canonical __schedule at kernel/sched/core.c:7017. schedule() has hundreds of callers (477 in sampled results) — signal return paths (do_work_pending across arches), KVM vcore blocking/wait, block device prep, driver sleep points, etc. Direct internal callees in the core file include __schedule_loop, sched_submit_work, etc.
• Locking (kernel/locking/mutex.c): Full implementation with fastpath, optimistic spinning (osq_lock), slowpath, ww_mutex (wound-wait), handoff, debug hooks, mutex_lock* variants (_nested, _killable, _interruptible, _io). Separate files exist for rtmutex, rwsem, etc.
• "schedule", "mutex_lock", and similar core symbols have many local/arch/driver definitions or shims (sqry surfaces the full set; the real core ones live under kernel/).
"sys_" pattern produces ~9k matches — mostly noise from struct fields, arch headers, and module names rather than clean syscall entries.
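One way to cut that noise is to post-filter matches by symbol kind and by the usual syscall entry naming. The sketch below assumes each match is a dict with "name" and "kind" keys. That record shape is hypothetical and is not sqry's output format. The prefix list is a heuristic based on common kernel wrapper names, not an exhaustive rule.

```python
import re

# Heuristic: plain sys_* names or common arch/wrapper prefixes such as __x64_sys_*.
SYSCALL_NAME = re.compile(r"^(?:__(?:x64|ia32|arm64|se|do)_)?sys_[a-z0-9_]+$")


def likely_syscall_entries(matches):
    """Keep only function symbols whose names look like syscall entry points."""
    return [
        m for m in matches
        if m["kind"] == "function" and SYSCALL_NAME.match(m["name"])
    ]


# Hypothetical sample of raw matches for the "sys_" pattern.
matches = [
    {"name": "__x64_sys_read", "kind": "function"},
    {"name": "sys_read", "kind": "function"},
    {"name": "sys_clk", "kind": "field"},
    {"name": "ksys_read", "kind": "function"},
    {"name": "sys_info", "kind": "struct"},
]
kept = [m["name"] for m in likely_syscall_entries(matches)]
print(kept)
```

A filter like this still lets through ordinary functions that merely start with sys_, so the result is a shortlist to inspect rather than a definitive syscall table.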
Rust in the Kernel
Real, integrated Rust code (not just experiments):
• drivers/android/binder/rust_binder_main.rs plus a supporting C side (binderfs init, proc file helpers), connected via explicit FFI.
• rust/kernel/time.rs and friends abstracting ktime_get (calls into many arch C implementations).
• Build-time generators (Python DRM register header parser interacting with Rust syn/proc_macro2).
• ~96k Rust symbols across 349 files. Cross-language edges capture the C ↔ Rust boundary cleanly.
Other Observations
• Enormous constant/enum surface (device tables, tracepoints, Kconfig-derived, arch headers).
• find_unused and broad duplicate queries will return high volume (config-gated code, arch copies, exported symbols for modules, test code, etc.). Use exclude_cfg_gated and kind/language filters.
• Macro expansion cache is absent in this index; some unresolved attributes and cfg-gated symbols are tracked.
• Subsystem scoping (e.g. path:kernel, path:drivers) works on some tools but can hit workspace resolution or budget limits on very broad queries due to the tree size.
sqry Usage Notes on This Tree
Broad queries (lang:c kind:function, path:rust/kernel, etc.) frequently hit the row budget (~5M rows examined). Prefer:
• file: or tighter path: predicates
• sqry_query (planner syntax), relation_query, direct_callers/direct_callees, call_hierarchy, trace_path
• get_document_symbols on specific files (excellent for one-file overviews)
• get_definition, get_references, dependency_impact
• find_cycles, complexity_metrics (with path or target where supported)
The index is high quality and the unified graph (C + Rust + others) works well even on a codebase this large.
Would you like deeper dives (specific subsystem, particular symbol/call paths, Rust component analysis, cycle investigation in an area like KVM/MM, duplicate patterns in drivers, etc.)?