Computer Vision and Pattern Recognition Papers


Computer Vision and Pattern Recognition Papers @CSVisionPapers

16h

ERN-Net: Evolving Reason Node-Net for Document Binarization Hsin-Jui Pan, Sheng-Wei Chan, Jen-Shiung Chiang arxiv.org/abs/2606.11710 [𝚌𝚜.𝙲𝚅]

ERN-Net: Evolving Reason Node-Net for Document Binarization

This paper presents ERN-Net, an Evolving Reason Node-Net for efficient document image binarization. ERN-Net enhances degradation-sensitive regions, such as faint strokes, broken characters, and...

arxiv.org

Yongzhong Xu

@yongzhong_xu

Jun 12

4/ Pipeline: max-attention per head per example, per-head median binarization, pseudolikelihood Ising, spectral clustering — Bhalla et al.'s recipe, minimally adapted. Then closure: ablate the discovered community, compare per-example damage to five matched random head-sets, report z on loss, accuracy, target-logit.
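The binarization and closure steps of this pipeline can be sketched roughly as follows. This is a minimal illustration, not the author's or Bhalla et al.'s code: the function names, the strict `>` comparison at the median, the ±1 encoding, and the use of the sample standard deviation for z are all assumptions.

```python
from statistics import mean, median, stdev

def binarize_heads_by_median(max_attention):
    """max_attention[e][h] = max attention weight of head h on example e."""
    n_heads = len(max_attention[0])
    # Per-head threshold: the median of that head's values across examples.
    thresholds = [median(row[h] for row in max_attention) for h in range(n_heads)]
    # Ising-style spins: +1 if strictly above the head's median, else -1 (assumed encoding).
    return [[1 if row[h] > thresholds[h] else -1 for h in range(n_heads)]
            for row in max_attention]

def z_score(observed_damage, random_damages):
    """Damage from ablating the community, relative to matched random head-sets."""
    return (observed_damage - mean(random_damages)) / stdev(random_damages)

# Hypothetical scores: 4 examples x 3 heads.
scores = [
    [0.9, 0.2, 0.5],
    [0.1, 0.8, 0.6],
    [0.7, 0.3, 0.4],
    [0.2, 0.9, 0.7],
]
spins = binarize_heads_by_median(scores)

# Hypothetical loss increases: one discovered community vs. five random head-sets.
z = z_score(5.0, [1.0, 2.0, 3.0, 4.0, 5.0])
```

The spin matrix is what a pseudolikelihood Ising fit would consume; the fit and the spectral clustering are left out here.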

Computer Vision and Pattern Recognition Papers @CSVisionPapers

Jun 12

DeepMine-Mamba: Mitigating Information Dilution in Mamba-Based State Space Models for Document Image Binarization Sheng-Wei Chan, Yung-Che Wang, Hsin-Jui Pan, Chia-Min Lin, Jen-Shiun Chiang arxiv.org/abs/2606.08781 [𝚌𝚜.𝙲𝚅] 💬Code: github.com/henrychan0719/Dee…

DeepMine-Mamba: Mitigating Information Dilution in Mamba-Based...

Document image binarization aims to separate foreground text from degraded backgrounds while preserving thin, broken, and low-contrast strokes. Although deep learning methods have improved...

arxiv.org
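For readers new to the task, a classical baseline helps show what "separating foreground text from background" means in practice. The sketch below uses Otsu's global threshold, which is not the method of either paper above; the tiny image and the function names are made up for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance
    when pixels are split into {p <= t} and {p > t}."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the dark class so far
    sum0 = 0  # intensity sum of the dark class so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(image, threshold):
    # 1 = foreground (dark ink), 0 = background (bright paper)
    return [[1 if p <= threshold else 0 for p in row] for row in image]

# Hypothetical 4x4 grayscale patch: a dark 2x2 stroke on bright paper.
image = [
    [200, 210, 205, 198],
    [202,  60,  55, 207],
    [199,  58,  62, 203],
    [201, 204, 208, 200],
]
flat = [p for row in image for p in row]
t = otsu_threshold(flat)
mask = binarize(image, t)
```

A single global threshold like this tends to fail on exactly the cases the abstract lists (thin, broken, and low-contrast strokes), which is why learned methods are used.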

115

Arpit Mishra

@arpitcodecamp

Jun 11

Day 9 📚 Today I learnt about Binning and Binarization in Machine Learning. Watched an old orientation session video for the MLP project of the IITM BS Degree program. Also understood several things about Kaggle competitions.

Artificial Intelligence Papers @SciFi

Jun 5

Bridging the Sim-to-Real Gap in Semiconductor Visual Program Synthesis via Input Binarization Yusuke Ohtsubo, Kota Dohi, Koichiro Yawata, Koki Takeshita, Tatsuya Sasaki arxiv.org/abs/2606.02434 [𝚌𝚜.𝙰𝙸]

Precise parametric control over circuit geometry is essential for semiconductor inspection, yet obtaining sufficient real training data remains costly. Although generative models such as diffusion models and Generative Adversarial Networks (GANs) can augment training data, they cannot guarantee the nanometer-scale geometric accuracy required for metrology tasks. We propose a visual program synthesis framework in which a Vision-Language Model (VLM) converts inspection images into editable Domain-Specific Language (DSL) code describing circuit geometries, enabling controlled generation of training data with exact parameter manipulation. Because the VLM is trained solely on synthetic DSL-rendered data, a domain gap arises when processing real Scanning Electron Microscope (SEM) images. We bridge this gap with an input binarization strategy that strips SEM-specific texture and noise, letting the model focus on geometric structure. On the MIIC dataset, binarized inputs improve the mean Dice
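As a rough illustration of the two ingredients named in the abstract, the sketch below thresholds a grayscale patch and scores the result with the Dice coefficient. The abstract does not say how the binarization is performed, so the fixed threshold, the sample values, and the function names are assumptions rather than the paper's procedure.

```python
def binarize(image, threshold):
    # Assumed convention: bright pixels (>= threshold) are structure, the rest is background.
    return [[1 if p >= threshold else 0 for p in row] for row in image]

def dice(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks of equal shape."""
    inter = sum(p & t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    size = sum(map(sum, pred)) + sum(map(sum, target))
    return 1.0 if size == 0 else 2.0 * inter / size

# Hypothetical noisy patch with a vertical line, and its clean reference mask.
patch = [
    [30, 180, 40],
    [25, 200, 35],
    [20, 190, 170],
]
reference = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]
mask = binarize(patch, 128)
score = dice(mask, reference)
```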


156

Luciano Martelotto 🛠🧬💻🇦🇺@LGMartelotto

Jun 4

Yessss! Wanted to share #STAMPede, a new Python toolkit for exploring and analyzing #STAMP data by @mhlangalab’s talented duo Niels Velthuijs and Siebren Frölich (mhlangalab.org).

If you haven't heard of #STAMP yet, check this paper out: cell.com/cell/fulltext/S0092… It's a single-cell method published in @CellCellPress last year, repurposing spatial-omics imagers to profile millions of cells at the lowest cost yet, without compromising data quality and with no sequencing required. It retains cell morphology and supports multimodal profiling (RNA, protein, and HnE) all in one workflow.

But powerful data needs powerful tools. That's where #STAMPede comes in. Built to handle the massive, shallow-depth datasets that #STAMP generates, #STAMPede gives you:
→ Familiar scanpy-style syntax so there's almost no learning curve
→ Full pipeline support: QC, filtering, binarization, dimensionality reduction, clustering, and differential expression
→ Easy installation via pip or conda

Whether you're exploring rare cell populations or running large-scale perturbation studies, STAMPede is designed to keep up. Check it out 👉 siebrenf.github.io/stampede/

@DrJasPlummer #SingleCell #Bioinformatics #Genomics #OpenSource #STAMP #scRNAseq #ComputationalBiology

STAMP: Single-cell transcriptomics analysis and multimodal profiling through imaging

Single-cell transcriptomics analysis and multimodal profiling (STAMP) by imaging enables single-cell analysis of cells in suspension without the need for sequencing. The markedly reduced costs and...

cell.com

2,359

Raghava Krishna | రాఘవ కృష్ణ

@Anviksiki

Jun 2

The performative anxiety around Ravana being a Shiva Bhakta or having any virtues at all is the perfect illustration of what happens when the social becomes the dominant compass over the core of spiritual knowledge. The Hindu mind abhors easy binarization because Dharma is sukshma and human nature is volatile. It is to help us understand the essential human condition that the empathy of our rishis gave us our Itihasa and Purana, to make the eternal essence of the Vedic corpus relatable to everyone. The spiritual principle explained through Ravana is that one might have great qualities, but when they fail Dharma, they devolve. This is all the more critical for those who are extraordinary, because their fall is the swiftest and gravest. Tradition is as clear on the extraordinary attributes of Ravana as it is on his fall. There is neither glorification nor obfuscation. If you wish to understand the real lesson on Ravana that comes to us from Valmiki Ramayana, please go through this beautiful thread by @jamvasu garu x.com/jamvasu/status/2037223… Sample this excerpt from the thread: "अहो रूपमहो धैर्यमहो सत्त्वमहो द्युतिः। अहो राक्षसराजस्य सर्वलक्षणयुक्तता।। 'What form! What courage! What strength! What radiance! This king of rākṣasas is endowed with every great quality.' Hanumān is not praising an enemy. He is bearing witness to a truth: Rāvaṇa's greatness was real. rūpa. dhairya. sattva. dyuti. Sarvalakṣaṇayuktatā: the fullness of all great qualities. By any measure, he was extraordinary." We can only hope that we don't cancel Hanuman ji in our social stupor since he praises Ravana on certain attributes. "The ability to hold opposing ideas and be able to function is the hallmark of a first rate intelligence" - F. Scott Fitzgerald. This is what our Shastras equip us to do, as long as we remain humble and do not make them an instrument to serve our agenda.

Srinivas Jammalamadaka @jamvasu

Mar 26

1/ When Greatness Has No Foundation Rāvaṇa had everything. Form. Courage. Strength. Radiance. Scholarship. Valour. And yet he fell. A thread from the Sundarakāṇḍa on Dharma 🧵

246

18,549

Zhihu Frontier

@ZhihuFrontier

May 29

Lngram: Latent N-Gram Memory for All-Modal LLMs

LLMs have a fatal architectural inefficiency 🚨 Insights from Zhihu Contributor 我是猫👇

We force the same Transformer stack to handle two entirely different tasks:
🧠 Hard dynamic work: Multi-step reasoning & long-range dependencies
📌 Trivial static work: Fixed phrases, entities & domain pattern matching

Transformers lack native lookup logic. They waste precious layer depth endlessly re-learning basic static patterns instead of focusing on real reasoning 💥

🔹 Engram → Lngram: Fixing the Core Flaw
Engram added memory branches to offload pattern matching, but it's trapped by tokenizer IDs ❌ It builds N-gram keys from tokenizer IDs ❌ This creates 3 critical bottlenecks:
1️⃣ Token segmentation rarely aligns with true semantic units
2️⃣ Token N-grams explode combinatorially, causing hash collisions & information loss
3️⃣ Text-only limitation: incompatible with vision, robotics & VLA tasks
Lngram redefines the paradigm: N-gram lookup moves from rigid token space to flexible latent hidden space ✨

🔹 How Lngram Works (4 Simple Steps)
A lightweight residual plug-and-play branch for Transformers (no backbone rewrite needed):
1️⃣ Hidden State Discretization: Compresses continuous hidden states into compact 4-bit latent symbols, preserving all task-essential information
2️⃣ Latent N-Gram Key Generation: Builds 2/3-gram keys for exact, fast table lookup, with no attention and no approximate search overhead
3️⃣ Context-Aware Gating: Dynamically weights retrieved memory vectors via current context, avoiding rigid static pattern misuse
4️⃣ Residual Fusion: Injects optimized memory features back into the backbone, leaving native attention/MLP fully functional

🔹 The Key Difference: Engram vs Lngram
🟠 Engram: Token-bound, text-only, fixed hash memory (tokenizer-dependent)
🔵 Lngram: Latent-space, full multimodal, learnable routing memory
Not RAG. Not attention replacement. Pure, in-model trainable memory optimization 🧠

🔹 Training Innovation: Counterfactual Surrogate Gradient
Hard discretization breaks raw gradients, and naive STE training fails entirely ❌ Lngram solves this with counterfactual surrogate gradients: it calculates precise local gradients by contrasting embedding outputs of 0/1 bit flips, aligning discrete routing with loss optimization.
Result: Rock-solid training stability & ultra-precise latent symbol learning ✅

🔹 Key Proof: Binarized Latent States Are Powerful
Qwen3-1.7B Window Attention validation:
✅ Preserves 99% of general language capability
✅ NIAH-32k long-context score: 6.2 → 100 (on par with global attention!)
Binarization filters noise, retains core pattern-matching signals 📊

🔹 SOTA Benchmark Performance (All Scales)
🏆 2B MoE Pre-Training: Lngram outperforms vanilla MoE (+1.41%) & Engram (+0.62%) under identical parameter budgets. A 23-layer Lngram model beats a 24-layer vanilla MoE, so it directly boosts effective model depth 💪
🏆 Large Model & Long-Train Robustness: Gains hold steady at 140B training tokens (2B) & full 8B model scale, with no small-model bias
🏆 64k Long-Context Modeling: Consistently lower perplexity across all context ranges, freeing attention for true long-range dependency modeling
🏆 Domain Adaptation (Zero Extra Overhead): Freeze pretrained backbone, only train Lngram: Driving benchmark accuracy jumps from 50.59 → 55.73%. Joint fine-tune pushes it to 62.45% (far surpasses full vanilla fine-tuning)

🔹 True Cross-Modality Power 🎥🤖
No tokenizer dependency = works for all modalities:
🖼️ VLM (LLaVA setup): Average benchmark score +0.7%, big gains in visual reasoning (+2.1%) & object localization
🤖 VLA Robotics: Improved long-horizon & spatial task success rates on LIBERO/StarVLA

🔹 Internal Mechanism: Boost Effective Model Depth
LogitLens & CKA analysis confirm:
✅ Lngram pushes task-related information to earlier Transformer layers
✅ Creates an effective depth gain of 2–3 layers
✅ Separates static lookup tasks from dynamic reasoning tasks (perfect computational division of labor)

🔹 Deployment Efficiency 🚀
✅ Prefill speed +16.7%, latency -14.3% (faster long-sequence parallel processing)
✅ Decode latency slightly higher (+6.7%) but zero incremental cache growth
✅ No exploding memory overhead with longer context lengths

🔹 Limitations (No Hype, Full Transparency)
❌ Not a replacement for Transformer reasoning
❌ Slight general capability drop after heavy domain adaptation
❌ Less effective for dynamic cross-instance relational modeling

💡 Core Industry Insight
Future LLM progress isn't just about scaling parameters; it's about specialized computational division. Let attention handle long-range context, MLP handle complex reasoning, and Lngram handle static local pattern lookup. This is the next evolution of efficient Transformer design 🔥

#AIResearch #Lngram #Engram #LLMArchitecture #DeepLearning #AIEfficiency #LLMTraining #VLM #VLA
🔗 Full article: zhuanlan.zhihu.com/p/2042559…
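The four steps described in the post (discretize, build an n-gram key, gate, fuse residually) can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration: the sign-of-projection discretizer, the bit-packing of keys, the sigmoid gate, the dictionary table, and all names and numbers. It is not the Lngram implementation and omits training entirely.

```python
import math

def to_symbol(hidden, projections):
    """Map a hidden vector to a 4-bit symbol: one sign bit per projection row."""
    symbol = 0
    for row in projections:  # 4 rows -> 4 bits
        dot = sum(w * x for w, x in zip(row, hidden))
        symbol = (symbol << 1) | (1 if dot > 0 else 0)
    return symbol

def ngram_key(symbols, n):
    """Pack the last n 4-bit symbols into one integer key for exact lookup."""
    key = 0
    for s in symbols[-n:]:
        key = (key << 4) | s
    return key

def lngram_step(hidden, history, table, gate_w, n=2):
    """Look up a memory vector by latent n-gram key, gate it, add it residually."""
    memory = table.get(ngram_key(history, n))
    if memory is None:
        return list(hidden)  # no entry: the backbone state passes through unchanged
    logit = sum(w * x for w, x in zip(gate_w, hidden))
    gate = 1.0 / (1.0 + math.exp(-logit))  # assumed sigmoid gate on the current state
    return [h + gate * m for h, m in zip(hidden, memory)]

# Hypothetical 3-dimensional hidden states and projection matrix.
projections = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
h1 = [0.5, -1.0, 2.0]
h2 = [-0.3, 0.4, -0.2]
history = [to_symbol(h1, projections), to_symbol(h2, projections)]
table = {ngram_key(history, 2): [1.0, 1.0, 1.0]}
fused = lngram_step(h2, history, table, gate_w=[0.0, 0.0, 0.0])
```

With 4-bit symbols a 2-gram key has only 256 possible values and a 3-gram key 4096, which is what makes exact table lookup cheap in this toy version.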

2,996

Fumi Kawano

@fumikawano

May 25

Predicting knockout-induced transcriptomic responses from unperturbed single-cell RNA-seq

IGNITE infers a gene regulatory network (GRN) directly from unperturbed scRNA-seq data, then uses that GRN to simulate in silico gene knockouts. From pseudotime-ordered single-cell expression dynamics, this model estimates a directed, weighted, and signed effective GRN and generates wild-type and KO cell-state patterns.

1) Data used for machine learning
mouse PSCs: 10x scRNA-seq during naïve-to-formative transition
- 9894 cells across 0, 6, 12, 24, and 48 h
- model input: curated 24-gene PSC transition program
human PSCs: independent differentiation dataset toward definitive endoderm
- 758 cells across 0, 12, 24, 36, 72, and 96 h
- model input: 98 transcriptional regulators

2) From scRNA-seq to gene activity
scRNA-seq counts
→ QC and log-normalization
→ dimensionality reduction / clustering
→ Slingshot pseudotime ordering
→ Mini-Bulk smoothing along pseudotime
→ binarization by gene-specific half-maximum expression
→ gene activity

3) Kinetic Ising model as a generative GRN
- genes are treated as ON/OFF activity states
- the inferred GRN defines activating or repressing effects between genes
- gene states are updated probabilistically by Glauber dynamics
- repeated updates generate wild-type-like cell-state patterns

4) GRN inference as an inverse problem
observed gene activity transitions
→ infer signed gene–gene effects that can reproduce them
→ generate WT-like cells from 250 candidate GRNs
→ select the GRN that best preserves the input correlation structure

5) Mouse KO prediction
Model condition:
- unperturbed mouse PSC scRNA-seq
- curated 24-gene transition program
- KO = target-gene interaction removal from the inferred GRN
Spearman correlation between predicted and experimental KO–WT gene-activity changes:
Rbpj KO: ρ = 0.531
Etv5 KO: ρ = 0.803
Tcf7l1 KO: ρ = 0.010
triple KO: ρ = 0.716

My scientific thought: For KO prediction, the model focused on 24 genes known to be involved in the naïve-to-formative transition of mouse PSCs. Because the correlation was calculated within this restricted and biologically important gene set, the reported values indicate relatively high predictive accuracy. This may also avoid the problem that correlations based on whole-RNA-seq data can appear high simply because of the large number of genes, although I would still be interested in seeing the whole-gene correlation. Gene-wise KO-WT change correlation is useful as a measure of molecular validity of the perturbation response, whereas cell-state prediction is useful for asking whether that response has biological phenotypic meaning.

#Bioinformatics #ComputationalBiology
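The binarization, Glauber update, and knockout steps summarized above can be sketched in a few lines. This is a toy under stated assumptions, not IGNITE's code: ON/OFF is encoded as +1/-1, the ON probability uses the standard kinetic Ising form with no external field, a knockout is read as zeroing every coupling that involves the target gene, and all names and numbers are hypothetical.

```python
import math

def binarize_half_max(expression):
    """expression[gene] = smoothed values along pseudotime -> +1 (ON) / -1 (OFF),
    thresholded at half of that gene's own maximum."""
    states = {}
    for gene, values in expression.items():
        half_max = max(values) / 2.0
        states[gene] = [1 if v >= half_max else -1 for v in values]
    return states

def glauber_on_probability(i, spins, J):
    """P(gene i is ON at the next step), with local field h_i = sum_j J[i][j] * s_j."""
    field = sum(J[i][j] * s for j, s in enumerate(spins))
    return 1.0 / (1.0 + math.exp(-2.0 * field))

def knock_out(J, k):
    """Return a copy of J with every interaction into or out of gene k removed."""
    return [[0.0 if (i == k or j == k) else J[i][j] for j in range(len(J))]
            for i in range(len(J))]

# Hypothetical smoothed expression for two genes at four pseudotime points.
activity = binarize_half_max({"A": [0, 2, 4, 8], "B": [5, 1, 1, 5]})

# Hypothetical signed couplings: gene 1 activates gene 0, gene 0 represses gene 1.
J = [[0.0, 1.0], [-1.0, 0.0]]
p_wt = glauber_on_probability(0, [1, 1], J)
p_ko = glauber_on_probability(0, [1, 1], knock_out(J, 1))
```

Sampling each gene's next state from these probabilities, repeatedly, is what would generate the wild-type-like or KO-like activity patterns.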

895

cooper ⚢🪶@jazforthesoul

May 17

Replying to @jazforthesoul @fagdisease

one of the coolest projects ive ever done for my native studies cert. was on the differences between different indigenous concepts of gender identity and how the binarization is a product of colonial understanding.

241

HeisRae💡

@Heisrae_Vibez

May 11

x.com/i/article/205374703095…

145

4,111

PK Lesbian @enbyian

May 4

Replying to @junebug1408_

a fuckkk ton of kirby characters are canonically nonbinary yet the binarization of nonbinary ppl continues to ensue

121

ↁэѓ Ќlошиѕтэіи 🇵🇸@DerKlownstein

Apr 25

Replying to @kingvmr @bornposting

People put too much emphasis on an arbitrary percentage derived from the binarization of said collection, yes. Thanks for agreeing.

196

空き缶詰 @ABCDMARTN

Apr 23

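Shot by combining a world mirror (binarization mode), a VRC print (normal color), and a personal mirror (made highly transparent to overlay the binarization and normal-color photos) 📸 Trying a different combination gave this one a distinctive atmosphere. #VRC鏡遊び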

空き缶詰 @ABCDMARTN

Apr 21

『Pixel Minahoshi🌸』 #みなほしフォトコン #VRC鏡遊び

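ALT Shot by combining a world mirror (outline mode), a VRC print (stepping back from the print so that it switches to a low-resolution mipmap), and a personal mirror (made highly transparent to overlay the low-resolution normal-color image and the outline) 📸 Note: no compositing (the photo was completed inside VRC).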

1,689

Devansh Singh @DevanshSin52865

Apr 17

1/6 🧵 Data preprocessing can make or break your ML model. Two techniques that often get confused are Binning and Binarization. They both transform numerical data, but their goals are very different. Let’s break them down. 👇 #MachineLearning #DataScience #Python
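The distinction drawn in this thread can be made concrete with a small sketch. It uses only the standard library; the function names, the age data, and the bin edges are made up for illustration. (scikit-learn offers `Binarizer` and `KBinsDiscretizer` in `sklearn.preprocessing` for the same two jobs.)

```python
from bisect import bisect_right

def binarize(values, threshold):
    """Binarization: one threshold, two outcomes (0 or 1)."""
    return [1 if v > threshold else 0 for v in values]

def bin_values(values, edges):
    """Binning: several sorted edges, one integer bin index per value."""
    return [bisect_right(edges, v) for v in values]

ages = [3, 17, 18, 42, 70]

is_adult = binarize(ages, 17)           # yes/no feature
age_group = bin_values(ages, [18, 65])  # 0 = minor, 1 = adult, 2 = senior
```

Binarization throws away everything except which side of the threshold a value falls on, while binning keeps a coarse ordering across several ranges.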

空き缶詰 @ABCDMARTN

Apr 10

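[ALT details] Three photos with the same composition (default camera settings → sketch and mosaic filters, world mirror (binarization mode)) were taken as VRC prints and overlaid in a single shot through a highly transparent personal mirror 📸 Note: no compositing (the photo was completed inside VRC); only one of the three photos includes the avatar.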

空き缶詰 @ABCDMARTN

14 Dec 2025

『Christmas Without Me』 #VRChatPhotography #VRC鏡遊び

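Shot by combining a world mirror (binarization mode), three VRC prints, and a personal mirror (made highly transparent and overlaid) 📸
Note: no compositing (the photo was completed inside VRC); the same composition was shot with three filters (binarization, sketch, mosaic color) and layered.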


426

Adam Green

@adamlewisgreen

Apr 9

x.com/i/article/204230708394…

136

14,375

Robert Jankowski @rjankowskiub

Apr 8

New paper in @MLSTjournal: task difficulty leaves a clear signature in neural networks, and for harder tasks even simple binarization can collapse performance to chance. Read more: iopscience.iop.org/article/1… @filrad @MAngelsSerranoM Marián Boguñá @santo_fortunato

Task complexity shapes internal representations and robustness in neural networks

Task complexity shapes internal representations and robustness in neural networks, Jankowski, Robert, Radicchi, Filippo, Serrano, M Ángeles, Boguñá, Marián, Fortunato, Santo

iopscience.iop.org

548

Hitachi Electron Microscope

@Hitachi_EM

Apr 7

Happy Metric System Day! 📏 📐⚖️🧪 (April 7) Automated measurement of nanofibers: the diameters of nanofibers were measured automatically from low-accelerating-voltage SEM images, without charging. Each fiber is recognized by image binarization, and its length, angle, and diameter are reported automatically, together with average values, standard deviations, minimum and maximum values, and their distribution. To learn more about the measurement tools, check out the following post!

0:23

277

U-Mentalism

@U_Mentalism

Mar 27

🚀 Excited to announce that Phase 1 of the Entreacte OCR project is officially launching this Monday! 🔄 The core innovation? Reversed binarization — turning inter-character spaces into primary computational objects. Voids become data. 🐍 Built in Python with OpenCV, scikit-image, and NumPy. First phase focuses on binarization, skeletonization, and interval extraction. 🧠 This is a patent-backed research project bridging philosophy of computing and machine-centered vision. 🏢 Proudly developed under Hebdómada, Unipessoal Lda — bringing novel computer vision to life, one interval at a time. 🔜 Phases 2 and 3 will expand toward modular experimentation, OCR integration, and multi-script support. 📄 Paper upcoming. Conference in June. More to come. #OCR #ComputerVision #Entreacte #ReversedBinarization #DeepLearning #OpenCV #Python #AI #Hebdomada #Research #Patent
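One possible reading of "turning inter-character spaces into primary computational objects" is to binarize a text line and then extract the runs of ink-free columns instead of the glyphs. The sketch below follows that reading only. It is not the Entreacte method, which the post does not describe in detail; the threshold, the pixel values, and the function names are assumptions, and it uses plain Python lists rather than OpenCV or NumPy.

```python
def binarize(image, threshold=128):
    # 1 = ink (dark pixel), 0 = background
    return [[1 if p < threshold else 0 for p in row] for row in image]

def gap_intervals(ink):
    """Return (start, end) column ranges, end exclusive, that contain no ink."""
    width = len(ink[0])
    empty = [all(row[c] == 0 for row in ink) for c in range(width)]
    intervals, start = [], None
    for c, is_empty in enumerate(empty):
        if is_empty and start is None:
            start = c                      # a gap opens
        elif not is_empty and start is not None:
            intervals.append((start, c))   # a gap closes at the next inked column
            start = None
    if start is not None:
        intervals.append((start, width))   # gap running to the right edge
    return intervals

# Hypothetical 2x8 grayscale strip containing two dark glyphs.
strip = [
    [255,   0,   0, 255, 255,   0, 255, 255],
    [255,   0, 255, 255, 255,   0,   0, 255],
]
gaps = gap_intervals(binarize(strip))
```

Here the gaps, not the strokes, are the output: their positions and widths become the data that later stages would work on.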