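Using HP going up or down as the trigger is something everyone ends up thinking of. For Street Fighter 6 footage, I think it would be easy to drop the background and pull out just the characters by extracting them with SegmentAnything.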
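A minimal sketch of the "HP change as trigger" idea, assuming the HP bar has already been cropped from each frame and reduced to one row of grayscale values. The names `hp_ratio` and `hp_change_triggers` and both thresholds are hypothetical, chosen only for illustration; nothing here comes from the post itself.

```python
def hp_ratio(bar_row, filled_threshold=128):
    """Fraction of pixels in one row of the HP bar that count as filled.

    Assumption: bright pixels (>= filled_threshold) are remaining HP.
    """
    if not bar_row:
        raise ValueError("bar_row must not be empty")
    filled = sum(1 for value in bar_row if value >= filled_threshold)
    return filled / len(bar_row)


def hp_change_triggers(bar_rows, min_delta=0.05):
    """Indices of frames whose HP ratio moved by at least min_delta
    (up or down) compared with the previous frame."""
    triggers = []
    previous = None
    for index, row in enumerate(bar_rows):
        current = hp_ratio(row)
        if previous is not None and abs(current - previous) >= min_delta:
            triggers.append(index)
        previous = current
    return triggers


# Toy data: one 10-pixel row per frame (hypothetical values).
frames = [
    [255] * 10,            # full bar
    [255] * 10,            # unchanged
    [255] * 7 + [0] * 3,   # hit: HP drops
    [255] * 7 + [0] * 3,   # unchanged
    [255] * 8 + [0] * 2,   # small recovery
]
print(hp_change_triggers(frames))
```

The returned frame indices could then mark where a clip starts, before any segmentation model is run on the characters.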
Meta releases SAM 3.1, an update to its Segment Anything Model that improves video processing efficiency through object multiplexing and global reasoning. #segmentanything More details here: todaysai.ai/tool/145
This will eventually be a part of it! There are a ton of side models like DAv3/SegmentAnything/etc. that also need evaluations that help with SLAM. But I wanted to focus on a constrained version of things to start. Very cool demo btw =] What depth model are you using? It seems like the fisheye lens makes the depth model struggle some. Might be worth looking at github.com/yuliangguo/depth_… or any of the other wider FOV depth models. This is another cool one: nam1410.github.io/cam3r/
SAM3 detects and tracks hands very well. Prompt: “hands” Frame 0 → detect 2 hands Rest of video → fully tracked #SegmentAnything
Check out this amazing point cloud from FoundationStereo by @bowenwen_me Neural stereo depth is the future
🏆 Won 1st Place at the AGI Hackathon at @agihouse_org with @juliakeem @JerryHan_og and @OpenGraph_Labs! We built a "Temporal Action Segmentation Pipeline" for Physical AI.
The Problem: Robotics data today = short clips, RGB-only, lab settings. We need long-horizon, multi-modal, in-the-wild data.
Our Solution:
🎬 Input: Long manipulation video (5 mins)
🤖 Gemini VLM → Action & Phase segmentation
🎯 SAM3 → Object tracking with text prompts
🌐 Pi3 → 3D reconstruction & camera poses
📚 Skill clustering → Reusable skill library
→ Output: Structured robot training data with timestamps, masks & 3D
Humans ARE the ultimate robots 🦾 #PhysicalAI #Robotics #Hackathon #Gemini #SegmentAnything
Huge thanks to @henry_yu_01 @NomadicML @zoox @DynaRobotics
New tutorial | Text-prompt segmentation with @AIatMeta SAM3 ✨ Learn how to segment objects in images and videos using single or multiple text prompts with SAM3. Watch here ➡️ bit.ly/48UI16s #SAM3 #SegmentAnything #Ultralytics
9 Dec 2025
🚀 I am very excited to release the SamGeo QGIS plugin for geospatial image segmentation, powered by Meta’s Segment Anything Model (SAM 3). In this full tutorial, I’ll walk you through how to install, configure, and start segmenting satellite imagery in QGIS without writing a single line of code! 👉 Download the plugin here: github.com/opengeos/qgis-sam… 💻 Full video tutorial: youtu.be/oPZc7BvDsHE #QGIS #SegmentAnything #SAM #GeoAI #RemoteSensing
✨ We found that #SegmentAnything hides a rich semantic structure, and we show how to unlock it! Our paper SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation is a #NeurIPS2025 Spotlight. 📍 Come check it out! Poster Friday, 11 a.m. 📄github.com/ClaudiaCuttano/SA…
To the leads pushing “Segment Anything” forward @AIatMeta @nikhilaravi @kate_saenko_ @PengchuanZ @cfeichtenhofer @alexandr_wang (and multimodal teams at Adobe, Apple, etc.): Adobe’s Semantic Audio Search shows audio segmentation is here, powered by techniques like logit adjustment from our FLAM work. Let’s discuss how these methods could enhance open-set detection in images and video, or how audio can integrate into broader multimodal models. DMs open, and I am excited to explore synergies, limitations and extensions! #Adobe #SemanticAudioSearch #Meta #SegmentAnything #SAM3 #AudioAI #Multimodal
Meta just dropped SAM 3: open-vocabulary segmentation for images and video is here and it’s incredible 🔥 Huge congrats @AIatMeta and the entire FAIR team! #SegmentAnything While vision is in the spotlight… collaborators at Adobe just shipped the **audio version** in production 🥳 Semantic Audio Search is now live in @Adobe Premiere Pro. Type literally any sound (“glass breaking”, “crowd cheering”, “opera singing”) and it jumps to every single instance in your timeline. Open-set. Frame-accurate. Shipping to millions today. Congratulations to the team @justin_salamon, @urinieto, @pseetharaman, @wuyusongwys and others for making this possible, and for the opportunity to deliver real-world impact in production-grade creative software through my multi-year Ph.D. research! (find details below) Announcement: blog.adobe.com/en/publish/20…
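I ran segmentation using the #点群 (point cloud) data and #オルソ画像 (orthoimagery) published by the Tokyo Metropolitan Government. Tokyo Dome is recognized as a single large object, and the surrounding buildings are also cleanly color-coded. The segmentation was done with #SegmentAnything. #デジタルツイン実現プロジェクト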
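One way a 2D segmentation of an orthoimage could be carried over to a point cloud is to look up, for each point, the mask pixel it falls in. The post does not say how its result was produced, so the sketch below is only an assumption: the function name, the georeferencing parameters (`origin_x`, `origin_y`, `pixel_size`) and the toy data are all hypothetical.

```python
import math


def label_points_from_mask(points, mask, origin_x, origin_y, pixel_size):
    """Give each (x, y, z) point the label of the ortho-image pixel it falls in.

    origin_x, origin_y: map coordinates of the image's top-left corner.
    pixel_size: ground size of one pixel, in the same units as the points.
    Points that fall outside the image get the label None.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels = []
    for x, y, _z in points:
        col = math.floor((x - origin_x) / pixel_size)
        row = math.floor((origin_y - y) / pixel_size)  # image rows grow southward
        if 0 <= row < height and 0 <= col < width:
            labels.append(mask[row][col])
        else:
            labels.append(None)
    return labels


# Toy 2x3 label mask (e.g. segment ids from a 2D segmenter) and three points.
mask = [
    [1, 1, 2],
    [1, 0, 2],
]
points = [(100.2, 199.9, 5.0), (102.5, 198.5, 12.0), (110.0, 199.0, 3.0)]
print(label_points_from_mask(points, mask, origin_x=100.0, origin_y=200.0, pixel_size=1.0))
```

This ignores height entirely, so points under overhangs or on facades would inherit the roof label; a real pipeline would need to handle that.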
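Apparently Gemini 2.5 can now do image segmentation from natural language. You might think, "Couldn't SegmentAnything already handle image segmentation?" But SegmentAnything could only take coarse instructions like "people", whereas Gemini can understand context, so the nice part is that you can give fine-grained instructions like "mask the man in the red shirt standing next to that tree."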
Gemini 2.5 introduces conversational image segmentation for AI, enabling advanced visual understanding through object relationships, conditional logic, and in-image text. developers.googleblog.com/en…
📢 SAM4D: Segment Anything in Camera and LiDAR Streams
SAM4D introduces a 4D foundation model for promptable segmentation across camera and LiDAR streams, addressing the limitations of frame-centric and modality-isolated approaches in autonomous driving.
Key Highlights:
✅ Promptable Multi-modal Segmentation (PMS) – Enables interactive segmentation across sequences from both modalities using diverse prompts (points, boxes, masks), allowing cross-modal propagation and long-term object tracking.
✅ Unified Multi-modal Positional Encoding (UMPE) – Aligns image and LiDAR features in a shared 3D space using sinusoidal and MLP-based encoding for seamless cross-modal interaction while preserving modality-specific structure.
✅ Motion-aware Cross-modal Memory Attention (MCMA) – Incorporates ego-motion compensation into memory attention, enabling temporally consistent retrieval and robust segmentation in dynamic scenes.
✅ Multi-modal Architecture – Builds on SAM2 with Hiera for image encoding and MinkUNet (via TorchSparse) for LiDAR voxelization, allowing efficient 2D-3D joint segmentation.
✅ Efficient Prompt Handling – Supports point, box, and mask prompts from either modality, using a unified decoder to produce temporally consistent masks across the stream.
✅ Waymo-4DSeg Dataset – A large-scale pseudo-labeled dataset containing 15M image masks, 30M LiDAR masks, and 300k cross-modal masklets, generated via VFM segmentation, 4D LiDAR reconstruction, and ray casting.
✅ Cross-Modal Label Fusion Pipeline – Builds dense pixel-to-voxel mappings, filters noisy masklets using DBSCAN clustering, and merges multi-view data into high-quality voxel masklets.
✅ Cross-Dataset Generalization – Demonstrates strong zero-shot and fine-tuned performance on nuScenes, validating robust transferability across sensor configurations and environments.
✅ Quantitative Performance – Achieves 69.8% mIoU on images and 55.7% on LiDAR with 80.1% J&F, significantly outperforming single-modality and projection-based baselines.
✅ Scalable & Efficient Design – 119.88M parameter model optimized with memory banks, FIFO queues, and prompt imitation logic for high-throughput 4D segmentation.
✅ Future-Proof Foundation – Roadmap includes natural language prompting via LLMs, multi-sensor scaling, weak/self-supervised learning, and improved memory and compute efficiency.
➡️ Project: SAM4D-Project.github.io
➡️ Github Repo: github.com/CN-ADLab/SAM4D
➡️ LearnOpenCV blog post: learnopencv.com/sam-2/
#SegmentAnything #SAM4D #LiDAR #Camera #4DPerception #AutonomousDriving #MultiModal #PromptableSegmentation
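For readers unfamiliar with the mIoU figure quoted above, here is a minimal sketch of intersection over union for binary masks and its mean over several mask pairs. How SAM4D averages (per object, per class, or per frame) is not stated in the post, so the plain mean over pairs below is an assumption, and the function names are illustrative.

```python
def iou(pred, target):
    """Intersection over union of two equally sized binary masks,
    each given as a list of rows of 0/1 values."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Convention chosen here: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


def mean_iou(pairs):
    """Unweighted mean of IoU over (prediction, target) mask pairs."""
    scores = [iou(pred, target) for pred, target in pairs]
    return sum(scores) / len(scores)


pairs = [
    ([[1, 1], [0, 0]], [[1, 0], [0, 0]]),  # 1 shared pixel, 2 in the union
    ([[1, 1], [1, 1]], [[1, 1], [1, 1]]),  # identical masks
]
print(mean_iou(pairs))
```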
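I used #PointSAM to segment the tail from a 3D #点群 (point cloud) of a rhinoceros. In the video, a single click on the tail is enough to extract it automatically. I used the code and demo data linked below. Think of it as applying SegmentAnything (#SAM) to 3D models. point-sam.github.io/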
I tried segmenting a flowerpot with #PointSAM. With about four clicks, I was able to extract and color the intended shape properly. SegmentAnything (#SAM) runs on images, whereas this runs on 3D point clouds. point-sam.github.io/
Arguably one of the most important papers for microscopy landed in February this year. This Nature paper provides a segmentation and fine tuning framework for anything microscopy. Fast, general, and open-source. #Microscopy #AI #SegmentAnything ow.ly/mvHV50W25SO
📢 Segment Any Motion in Videos: fine-grained video object segmentation, without flow supervision or manual annotations during inference.
By integrating long-range motion trajectories, DINO-based semantics, and SAM2 prompting, SAMotion delivers dynamic segmentation masks per object even in complex, real-world scenes.
Key Highlights:
✅ Spatio-Temporal Trajectory Attention (ST-ATT) – Encodes long-range motion by alternating spatial attention (across trajectories) and temporal attention (along each trajectory), capturing both global inter-object relationships and local motion evolution.
✅ Motion-Semantic Decoupled Embedding (MSDE) – Separates motion and semantic reasoning in the decoder: motion-only attention is followed by DINO-based semantic augmentation through cross-attention, ensuring semantic cues refine but do not dominate motion prediction.
✅ BootsTAP-Based Track Generation – Leverages high-confidence 2D trajectories from BootsTAP with visibility and confidence filtering, enriching motion cues with depth and frame-to-frame deltas (Δu, Δv, Δd) for enhanced temporal modeling.
✅ Frequency-Based Positional Encoding (PE) – Adopts NeRF-style sinusoidal embeddings on spatial and temporal signals to avoid oversmoothing and preserve fine-grained motion localization across trajectories.
✅ Depth-Enhanced Motion Encoding – Incorporates monocular depth estimates from Depth-Anything to model scene structure and occlusions, enabling better segmentation under 3D layout variations and partial visibility.
✅ Two-Stage SAM2 Prompting – 1. Groups tracks per object (spatial/frame heuristics) 2. Uses long-range point prompts and merges fragmented masks.
✅ Fine-Grained Instance-Level Masks – Handles multiple similarly-moving objects, complex articulation, clothing, limbs, etc.
✅ Superior Benchmark Results – Outperforms state-of-the-art MOS and fine-grained MOS baselines (e.g., RCF, ABR, OCLR) across DAVIS17, SegTrackv2, FBMS59:
DAVIS17-Moving (Fine-grained MOS): J=77.4, F=83.6
DAVIS16-Moving (MOS): J=89.0, F=89.2
✅ Robust in Challenging Conditions – Demonstrates resilience to camouflage textures and motion blur, transparent surfaces and reflections, and strong camera motion and partial occlusion.
✅ Ablation-Backed Architecture – Removing DINO, MSDE, or ST-ATT leads to significant drops (up to -17% J&F), confirming the necessity of decoupled semantic integration and spatio-temporal modeling.
✅ Modular & Data-Efficient Training – Trained on a mix of synthetic (Kubric, DynamicReplica) and real-world (HOI4D) datasets, showing generalization across scene types without needing dense motion annotations at inference.
Paper: lnkd.in/giH-YuFr
Github: lnkd.in/gquJ_TwP
Project: lnkd.in/gxmiJ6q9
Related articles from LearnOpenCV:
SAM2: lnkd.in/gkG7dx65
MedSAM2: lnkd.in/gg78Pri3
#SAM2 #Segmentation #SegmentAnything
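The "NeRF-style sinusoidal embeddings" mentioned above can be illustrated with the standard frequency encoding from the NeRF paper, which maps a scalar p to sin and cos of 2^k·π·p for k = 0 … L−1. The sketch below applies it to one trajectory sample (u, v, t); the number of bands, the input normalisation and the function names are assumptions, not details taken from SAMotion.

```python
import math


def frequency_encode(value, num_bands=4):
    """NeRF-style encoding of one scalar:
    [sin(2^0 * pi * p), cos(2^0 * pi * p), ..., sin(2^(L-1) * pi * p), cos(2^(L-1) * pi * p)].

    Assumption: `value` is already normalised to a small range such as [-1, 1].
    """
    features = []
    for band in range(num_bands):
        angle = (2 ** band) * math.pi * value
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features


def encode_track_point(u, v, t, num_bands=4):
    """Concatenate the encodings of image position (u, v) and time t
    for a single sample along a trajectory (hypothetical layout)."""
    return (
        frequency_encode(u, num_bands)
        + frequency_encode(v, num_bands)
        + frequency_encode(t, num_bands)
    )


embedding = encode_track_point(0.25, 0.5, 0.0)
print(len(embedding))  # 3 coordinates * 2 functions * 4 bands
```

Higher bands oscillate faster, which is what lets a downstream network distinguish nearby positions that a raw coordinate input would tend to smooth over.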
New tutorial | @AIatMeta Segment Anything 2 in @Google Colab with Ultralytics! 🚀 Segment objects using point and box prompts, or segment everything automatically with a ready-to-use Colab notebook. Watch here ➡️ ow.ly/1brb50VXBtC #SAM2 #SegmentAnything #Ultralytics #AI
Shocked 💀⚡️ Initially tried to use #klingai for the ball swap but found the mask too restricting. Ended up using a custom #ComfyUI workflow with #segmentanything and VACE!! Featuring @sweaty__palms getting electrocuted 😬