いずいずちゃん @izk14v

Jun 9

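Using changes in the HP bar as the trigger really is what everyone ends up thinking of. For Street Fighter 6 footage, I think it would be easy to remove the background and pull out just the characters by extracting them with SegmentAnything.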
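
The idea in the post above can be sketched roughly as follows. This is only an illustration under stated assumptions: the mask is taken as already produced by some segmentation model (no SegmentAnything call is shown), images are plain nested lists, and the names `cut_out_character`, `hp_changed`, and the `threshold` value are hypothetical, not from the post.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]        # (R, G, B)
Image = List[List[Pixel]]           # rows of pixels
Mask = List[List[bool]]             # True where the character is (assumed model output)
RGBA = Tuple[int, int, int, int]


def cut_out_character(frame: Image, mask: Mask) -> List[List[RGBA]]:
    """Keep character pixels opaque and make every background pixel transparent."""
    if len(frame) != len(mask) or any(len(r) != len(m) for r, m in zip(frame, mask)):
        raise ValueError("frame and mask must have the same shape")
    return [
        [(r, g, b, 255) if keep else (0, 0, 0, 0)
         for (r, g, b), keep in zip(row, mask_row)]
        for row, mask_row in zip(frame, mask)
    ]


def hp_changed(prev_hp: float, curr_hp: float, threshold: float = 0.01) -> bool:
    """Hypothetical trigger: fire when the HP value moves by more than `threshold`."""
    return abs(curr_hp - prev_hp) > threshold


if __name__ == "__main__":
    frame = [[(10, 20, 30), (40, 50, 60)], [(70, 80, 90), (1, 2, 3)]]
    mask = [[True, False], [False, True]]
    if hp_changed(0.8, 0.7):
        print(cut_out_character(frame, mask))
```

How the HP value is read from the footage is left out here, since the post does not say.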

Today's AI @TodaysAIdotai

May 22

Meta releases SAM 3.1, an update to its Segment Anything Model that improves video processing efficiency through object multiplexing and global reasoning. #segmentanything More details here: todaysai.ai/tool/145

Pablo Vela

@pablovelagomez1

Apr 16

Replying to @uddupa @rerundotio @Gradio

This will eventually be a part of it! There are a ton of side models like DAv3/SegmentAnything/etc. that also need evaluations that help with SLAM. But I wanted to focus on a constrained version of things to start. Very cool demo btw =] What depth model are you using? It seems like the fisheye lens makes the depth model struggle some. Might be worth looking at github.com/yuliangguo/depth_… or any of the other wider FOV depth models. This is another cool one: nam1410.github.io/cam3r/

[CVPR 2025] Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera - yuliangguo/depth_any_camera

Yu Xiang

@YuXiang_IRVL

Feb 27

SAM3 detects and tracks hands very well.
Prompt: “hands”
Frame 0 → detect 2 hands
Rest of video → fully tracked
#SegmentAnything

Yu Xiang

@YuXiang_IRVL

Feb 26

Check out this amazing point cloud from FoundationStereo by @bowenwen_me
Neural stereo depth is the future

Beomsoo Son

@BeomsooSon

Feb 8

🏆 Won 1st Place at the AGI Hackathon at @agihouse_org with @juliakeem @JerryHan_og and @OpenGraph_Labs!
We built a "Temporal Action Segmentation Pipeline" for Physical AI.
The Problem: Robotics data today = short clips, RGB-only, lab settings. We need long-horizon, multi-modal, in-the-wild data.
Our Solution:
🎬 Input: Long manipulation video (5 mins)
🤖 Gemini VLM → Action & Phase segmentation
🎯 SAM3 → Object tracking with text prompts
🌐 Pi3 → 3D reconstruction & camera poses
📚 Skill clustering → Reusable skill library
→ Output: Structured robot training data with timestamps, masks & 3D
Humans ARE the ultimate robots 🦾
#PhysicalAI #Robotics #Hackathon #Gemini #SegmentAnything
Huge thanks to @henry_yu_01 @NomadicML @zoox @DynaRobotics

Ultralytics

@ultralytics

24 Dec 2025

New tutorial | Text-prompt segmentation with @AIatMeta SAM3 ✨ Learn how to segment objects in images and videos using single or multiple text prompts with SAM3. Watch here ➡️ bit.ly/48UI16s #SAM3 #SegmentAnything #Ultralytics

Qiusheng Wu

@giswqs

9 Dec 2025

🚀 I am very excited to release the SamGeo QGIS plugin for geospatial image segmentation, powered by Meta’s Segment Anything Model (SAM 3).
In this full tutorial, I’ll walk you through how to install, configure, and start segmenting satellite imagery in QGIS without writing a single line of code!
👉 Download the plugin here: github.com/opengeos/qgis-sam…
💻 Full video tutorial: youtu.be/oPZc7BvDsHE
#QGIS #SegmentAnything #SAM #GeoAI #RemoteSensing

Claudia Cuttano @ClaudiaCuttano

28 Nov 2025

✨ We found that #SegmentAnything hides a rich semantic structure, and we show how to unlock it! Our paper SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation is a #NeurIPS2025 Spotlight. 📍 Come check it out! Poster Friday, 11 a.m. 📄github.com/ClaudiaCuttano/SA…

Christos Tsirigotis

@tsirigoc

20 Nov 2025

To the leads pushing “Segment Anything” forward @AIatMeta @nikhilaravi @kate_saenko_ @PengchuanZ @cfeichtenhofer @alexandr_wang (and multimodal teams at Adobe, Apple, etc.): Adobe’s Semantic Audio Search shows audio segmentation is here, powered by techniques like logit adjustment from our FLAM work. Let’s discuss how these methods could enhance image and video open-set detection, or how audio can integrate into broader multimodal models. DMs open, and I am excited to explore synergies, limitations and extensions!
#Adobe #SemanticAudioSearch #Meta #SegmentAnything #SAM3 #AudioAI #Multimodal

Christos Tsirigotis

@tsirigoc

20 Nov 2025

Meta just dropped SAM 3: open-vocabulary segmentation for images and video is here and it’s incredible 🔥 Huge congrats @AIatMeta and the entire FAIR team! #SegmentAnything
While vision is in the spotlight… collaborators at Adobe just shipped the **audio version** in production 🥳
Semantic Audio Search is now live in @Adobe Premiere Pro. Type literally any sound (“glass breaking”, “crowd cheering”, “opera singing”) and it jumps to every single instance in your timeline. Open-set. Frame-accurate. Shipping to millions today.
Congratulations to the team @justin_salamon, @urinieto, @pseetharaman, @wuyusongwys and others for making this possible, and for the opportunity to deliver real-world impact to production-grade creative software through my multi-year Ph.D. research! (find details below)
Announcement: blog.adobe.com/en/publish/20…

Adobe Premiere 25.6 adds smarter search, faster edits, and seamless collaboration | Adobe Blog

The newest version of Adobe Premiere brings next-level intelligence, speed, and collaboration to every part of your video workflow. Whether you’re cutting on your desktop or on the go, Premiere now...

Kenta@ImVisionLabs Inc.

@imvisionlabs

23 Aug 2025

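I ran segmentation on the point cloud data (#点群データ) and orthoimagery (#オルソ画像) published by the Tokyo Metropolitan Government. Tokyo Dome is recognized as a single large object, and the surrounding buildings are also cleanly color-coded. The segmentation was done with #SegmentAnything. #デジタルツイン実現プロジェクト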

うみゆき@AI研究

@umiyuki_ai

22 Jul 2025

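Apparently Gemini 2.5 can now do image segmentation from natural language. You might think, "Couldn't SegmentAnything already handle image segmentation?" But SegmentAnything could only take rough instructions like "people", whereas Gemini can understand context, so the nice part is that you can give fine-grained instructions such as "mask the man in the red shirt standing next to that tree".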

Google AI Developers

@googleaidevs

21 Jul 2025

Gemini 2.5 introduces conversational image segmentation for AI, enabling advanced visual understanding through object relationships, conditional logic, and in-image text. developers.googleblog.com/en…

OpenCV University

@OpenCVUniverse

30 Jun 2025

📢 SAM4D: Segment Anything in Camera and LiDAR Streams
SAM4D introduces a 4D foundation model for promptable segmentation across camera and LiDAR streams, addressing the limitations of frame-centric and modality-isolated approaches in autonomous driving.
Key Highlights:
✅ Promptable Multi-modal Segmentation (PMS): Enables interactive segmentation across sequences from both modalities using diverse prompts (points, boxes, masks), allowing cross-modal propagation and long-term object tracking.
✅ Unified Multi-modal Positional Encoding (UMPE): Aligns image and LiDAR features in a shared 3D space using sinusoidal and MLP-based encoding for seamless cross-modal interaction while preserving modality-specific structure.
✅ Motion-aware Cross-modal Memory Attention (MCMA): Incorporates ego-motion compensation into memory attention, enabling temporally consistent retrieval and robust segmentation in dynamic scenes.
✅ Multi-modal Architecture: Builds on SAM2 with Hiera for image encoding and MinkUNet (via TorchSparse) for LiDAR voxelization, allowing efficient 2D-3D joint segmentation.
✅ Efficient Prompt Handling: Supports point, box, and mask prompts from either modality, using a unified decoder to produce temporally consistent masks across the stream.
✅ Waymo-4DSeg Dataset: A large-scale pseudo-labeled dataset containing 15M image masks, 30M LiDAR masks, and 300k cross-modal masklets, generated via VFM segmentation, 4D LiDAR reconstruction, and ray casting.
✅ Cross-Modal Label Fusion Pipeline: Builds dense pixel-to-voxel mappings, filters noisy masklets using DBSCAN clustering, and merges multi-view data into high-quality voxel masklets.
✅ Cross-Dataset Generalization: Demonstrates strong zero-shot and fine-tuned performance on nuScenes, validating robust transferability across sensor configurations and environments.
✅ Quantitative Performance: Achieves 69.8% mIoU on images and 55.7% on LiDAR with 80.1% J&F, significantly outperforming single-modality and projection-based baselines.
✅ Scalable & Efficient Design: 119.88M parameter model optimized with memory banks, FIFO queues, and prompt imitation logic for high-throughput 4D segmentation.
✅ Future-Proof Foundation: Roadmap includes natural language prompting via LLMs, multi-sensor scaling, weak/self-supervised learning, and improved memory and compute efficiency.
➡️ Project: SAM4D-Project.github.io
➡️ GitHub Repo: github.com/CN-ADLab/SAM4D
➡️ LearnOpenCV blog post: learnopencv.com/sam-2/
#SegmentAnything #SAM4D #LiDAR #Camera #4DPerception #AutonomousDriving #MultiModal #PromptableSegmentation
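
For readers unfamiliar with the mIoU figure quoted above, here is a minimal sketch of how a mean intersection-over-union over binary masks is commonly computed. It is not taken from the SAM4D code: the post does not say how the scores are averaged, so averaging over (prediction, ground truth) mask pairs is an assumption, as are the function names and the convention that two empty masks score 1.0. The J&F metric is not covered.

```python
from typing import Iterable, Sequence, Tuple


def mask_iou(pred: Sequence[bool], target: Sequence[bool]) -> float:
    """Intersection over union of two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


def mean_iou(pairs: Iterable[Tuple[Sequence[bool], Sequence[bool]]]) -> float:
    """Average mask_iou over (prediction, ground truth) pairs."""
    scores = [mask_iou(pred, target) for pred, target in pairs]
    if not scores:
        raise ValueError("at least one mask pair is required")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pairs = [
        ([True, True, False, False], [True, False, False, False]),
        ([False, True, True, True], [False, True, True, False]),
    ]
    print(f"mIoU = {mean_iou(pairs):.4f}")
```

The same two functions apply to image pixels or LiDAR points, since both reduce to a flat sequence of per-element labels.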

Kenta@ImVisionLabs Inc.

@imvisionlabs

24 Jun 2025

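I used #PointSAM to segment the tail from a 3D point cloud (#点群) of a rhinoceros. In the video, a single click on the tail is enough to extract it automatically. I used the code and demo data linked below. Think of it as SegmentAnything (#SAM) applied to 3D models. point-sam.github.io/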

Kenta@ImVisionLabs Inc.

@imvisionlabs

6 Jun 2025

I tried segmenting a flowerpot with #PointSAM. With about four clicks, I was able to extract and color the intended shape correctly. SegmentAnything (#SAM) runs on images, but this one runs on 3D point clouds. point-sam.github.io/

MicroanalysisSociety @MicrobeamSoc

1 Jun 2025

Arguably one of the most important papers for microscopy landed in February this year. This Nature paper provides a segmentation and fine tuning framework for anything microscopy. Fast, general, and open-source. #Microscopy #AI #SegmentAnything ow.ly/mvHV50W25SO

OpenCV University

@OpenCVUniverse

26 May 2025

📢 Segment Any Motion in Videos: fine-grained video object segmentation, without flow supervision or manual annotations during inference.
By integrating long-range motion trajectories, DINO-based semantics, and SAM2 prompting, SAMotion delivers dynamic segmentation masks per object even in complex, real-world scenes.
Key Highlights:
✅ Spatio-Temporal Trajectory Attention (ST-ATT): Encodes long-range motion by alternating spatial attention (across trajectories) and temporal attention (along each trajectory), capturing both global inter-object relationships and local motion evolution.
✅ Motion-Semantic Decoupled Embedding (MSDE): Separates motion and semantic reasoning in the decoder: motion-only attention is followed by DINO-based semantic augmentation through cross-attention, ensuring semantic cues refine but do not dominate motion prediction.
✅ BootsTAP-Based Track Generation: Leverages high-confidence 2D trajectories from BootsTAP with visibility and confidence filtering, enriching motion cues with depth and frame-to-frame deltas (Δu, Δv, Δd) for enhanced temporal modeling.
✅ Frequency-Based Positional Encoding (PE): Adopts NeRF-style sinusoidal embeddings on spatial and temporal signals to avoid oversmoothing and preserve fine-grained motion localization across trajectories.
✅ Depth-Enhanced Motion Encoding: Incorporates monocular depth estimates from Depth-Anything to model scene structure and occlusions, enabling better segmentation under 3D layout variations and partial visibility.
✅ Two-Stage SAM2 Prompting:
1. Groups tracks per object (spatial/frame heuristics)
2. Uses long-range point prompts and merges fragmented masks.
✅ Fine-Grained Instance-Level Masks: Handles multiple similarly-moving objects, complex articulation, clothing, limbs, etc.
✅ Superior Benchmark Results: Outperforms state-of-the-art MOS and fine-grained MOS baselines (e.g., RCF, ABR, OCLR) across DAVIS17, SegTrackv2, FBMS59:
DAVIS17-Moving (Fine-grained MOS): J=77.4, F=83.6
DAVIS16-Moving (MOS): J=89.0, F=89.2
✅ Robust in Challenging Conditions: Demonstrates resilience to:
Camouflage textures and motion blur
Transparent surfaces and reflections
Strong camera motion and partial occlusion
✅ Ablation-Backed Architecture: Removing DINO, MSDE, or ST-ATT leads to significant drops (up to -17% J&F), confirming the necessity of decoupled semantic integration and spatio-temporal modeling.
✅ Modular & Data-Efficient Training: Trained on a mix of synthetic (Kubric, DynamicReplica) and real-world (HOI4D) datasets, showing generalization across scene types without needing dense motion annotations at inference.
Paper: lnkd.in/giH-YuFr
GitHub: lnkd.in/gquJ_TwP
Project: lnkd.in/gxmiJ6q9
Related articles from LearnOpenCV:
SAM2: lnkd.in/gkG7dx65
MedSAM2: lnkd.in/gg78Pri3
#SAM2 #Segmentation #SegmentAnything
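
The frame-to-frame deltas (Δu, Δv, Δd) with visibility and confidence filtering mentioned above might look roughly like this. It is a sketch under assumptions, not the paper's implementation: the `TrackPoint` fields, the `min_confidence` threshold of 0.5, and the choice to emit `None` for unreliable frame pairs are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TrackPoint:
    """One tracked point in one frame (hypothetical layout)."""
    u: float           # horizontal image coordinate
    v: float           # vertical image coordinate
    depth: float       # monocular depth estimate at (u, v)
    visible: bool      # tracker's visibility flag
    confidence: float  # tracker's confidence score


Delta = Tuple[float, float, float]  # (du, dv, dd)


def frame_deltas(track: List[TrackPoint],
                 min_confidence: float = 0.5) -> List[Optional[Delta]]:
    """Return (du, dv, dd) per consecutive frame pair, or None if either end is unreliable."""
    deltas: List[Optional[Delta]] = []
    for prev, curr in zip(track, track[1:]):
        reliable = all(p.visible and p.confidence >= min_confidence
                       for p in (prev, curr))
        if reliable:
            deltas.append((curr.u - prev.u, curr.v - prev.v, curr.depth - prev.depth))
        else:
            deltas.append(None)
    return deltas


if __name__ == "__main__":
    track = [
        TrackPoint(0.0, 0.0, 1.0, True, 0.9),
        TrackPoint(2.0, 1.0, 1.5, True, 0.8),
        TrackPoint(3.0, 3.0, 1.5, False, 0.9),  # occluded in this frame
        TrackPoint(4.0, 4.0, 2.0, True, 0.9),
    ]
    print(frame_deltas(track))
```

A single occluded frame invalidates both pairs that touch it, which is why one hidden point produces two `None` entries.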

Ultralytics

@ultralytics

23 May 2025

New tutorial | @AIatMeta Segment Anything 2 in @Google Colab with Ultralytics! 🚀 Segment objects using point and box prompts, or segment everything automatically with a ready-to-use Colab notebook. Watch here ➡️ ow.ly/1brb50VXBtC #SAM2 #SegmentAnything #Ultralytics #AI

Kenta@ImVisionLabs Inc.

@imvisionlabs

23 May 2025

[Video, 0:20]

Amy Liu 🥟

@hungrydumpling_

12 May 2025

Shocked 💀⚡️ Initially tried to use #klingai for the ball swap but found the mask too restricting. Ended up using a custom #ComfyUI workflow with #segmentanything and VACE!! Featuring @sweaty__palms getting electrocuted 😬
