I’m excited to announce that Sierra has acquired Opera Tech in Japan. Opera’s co-founders, Keita Morikawa and Kiyo Kunii, started the company with the simple idea that AI could help businesses deliver high-quality customer experiences at scale. We’re so excited to have them join us to lead Sierra in Japan. sierra.ai/blog/sierra-acquir…


Lyaka @lyakaap

Feb 10

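I asked Claude Code to develop some photos. It converted RAW to JPEG, looked at the results with its own eyes, and started iterating by trial and error ("it's dark, so let's raise the exposure a bit," "the sky is blown out, so let's bring it down"), and it ended up producing something perfectly decent.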


Lyaka retweeted

watany @_watany

Jan 23

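I ran a coding-agent training session for 80 people at a certain organization. I'm publishing the training materials. speakerdeck.com/watany/agent…

Agentic Coding Hands-On Workshop

Training materials from a session at a certain organization. Aimed at beginners and designed to take 3 to 4 hours. On-site delivery is also possible, so requests are welcome.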

speakerdeck.com


Lyaka retweeted

ITmedia AI＋

@itm_aiplus

18 Dec 2025

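LY Corporation develops Japanese multimodal foundation model "clip-japanese-base-v2"; commercial use is permitted itmedia.co.jp/aiplus/article…

LY Corporation develops Japanese multimodal foundation model "clip-japanese-base-v2"; commercial use is permitted

LY Corporation announced that it has developed the Japanese multimodal foundation model "clip-japanese-base-v2".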

itmedia.co.jp


Lyaka @lyakaap

18 Dec 2025

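We've released a new version of our Japanese CLIP! It's considerably more powerful thanks to distillation and expanded data! It's Apache 2.0 again this time, so please use it in all kinds of settings! 🤗: huggingface.co/line-corporat…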

line-corporation/clip-japanese-base-v2 · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

LINEヤフー Tech

@lycorptech_jp

18 Dec 2025

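LINEヤフー Tech Blog 🆕 "Releasing the high-performance Japanese multimodal foundation model 'clip-japanese-base-v2'" - Improved and released a Japanese-specialized CLIP - Raised baseline accuracy through large-scale data collection and precise filtering - Further accuracy gains through knowledge distillation techblog.lycorp.co.jp/ja/202…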


Lyaka retweeted

LINEヤフー Tech

@lycorptech_jp

25 Nov 2025

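LINEヤフー Tech Blog 🆕 "Our paper and workshop were accepted at ICCV 2025, the most competitive international conference in computer vision" - Report on paper acceptance and attendance at ICCV 2025 - Hosting an international workshop on foundation "data" - A survey of the state of the art in design x computer vision techblog.lycorp.co.jp/ja/202…

Our paper and workshop were accepted at ICCV 2025, the most competitive international conference in computer vision (attendance report)

Hello. I'm Kitada (@shunk031), and I work on R&D for image generation and design generation at LY Corporation. Recently, from October 19 to 23, 2025, held in Hawaii, USA ...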

techblog.lycorp.co.jp


Lyaka @lyakaap

19 Nov 2025

An extremely helpful article. This kind of work is so painful that I'd been putting it off, so I'm really grateful 🙏

Kazuki Fujii

@kazukifujii

19 Nov 2025

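I wrote an article on how to train gpt-oss using NVIDIA NeMo. Beyond updating the TransformerEngine and cuDNN versions inside the NGC container, we also had to modify the NeMo implementation and the Megatron-Core implementation. It's an article about library maintenance, which is actually a lot of work in LLM research and development. zenn.dev/turing_motors/artic…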


Lyaka @lyakaap

19 Nov 2025

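The problem that full fine-tuning of gpt-oss is hard... I wonder if the only option is to collect Japanese CoT data ourselves > "It is not easy (that is, it is difficult) to strengthen Japanese language ability and Japanese knowledge without degrading gpt-oss's English, math, and coding abilities or its deep reasoning ability."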


Lyaka retweeted

Issa Sugiura @strayer_13

28 Oct 2025

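We've released WAON, a large-scale, high-quality Japanese image-text pair dataset! 🇯🇵 On WAON-Bench, our newly built benchmark for classifying images of Japanese culture, we show that WAON improves model performance more efficiently than ReLAION and achieves SoTA performance. Please also check out the blog post! speed1313.github.io/posts/WA…

WAON: A Large-Scale, High-Quality Japanese Image-Text Pair Dataset | speed1313 Blog

This post introduces WAON, a large-scale, high-quality Japanese image-text pair dataset built at the LLM Study Group (LLM勉強会).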

speed1313.github.io

Issa Sugiura @strayer_13

28 Oct 2025

We introduce WAON, a large-scale and high-quality Japanese image–text dataset comprising 155M pairs. Fine-tuning SigLIP2 on WAON improves performance on Japanese cultural benchmark WAON-Bench more efficiently than using ReLAION, achieving SoTA. Try WAON now! 🇯🇵📷


Lyaka @lyakaap

21 Oct 2025

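The speculation that Gemini, which is good at handling long contexts, might be doing something similar to DeepSeek-OCR's approach of compressing documents directly into vision tokens does seem plausible.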

Jeffrey Emanuel

@doodlestein

20 Oct 2025

DeepSeek just released a pretty shocking new paper. They really buried the lede here by referring to it simply as DeepSeek OCR. While it’s a very strong OCR model, the purpose of it and the implications of their approach go far beyond what you’d expect of “yet another OCR model.”

Traditionally, vision LLM tokens almost seemed like an afterthought or “bolt on” to the LLM paradigm. And 10k words of English would take up far more space in a multimodal LLM when expressed as intelligible pixels than when expressed as tokens. So those 10k words may have turned into 15k tokens, or 30k to 60k “visual tokens.” So vision tokens were way less efficient and really only made sense to use for data that couldn’t be effectively conveyed with words.

But that gets inverted now from the ideas in this paper. DeepSeek figured out how to get 10x better compression using vision tokens than with text tokens! So you could theoretically store those 10k words in just 1,500 of their special compressed visual tokens.

This might not be as unexpected as it sounds if you think of how your own mind works. After all, I know that when I’m looking for a part of a book that I’ve already read, I imagine it visually and always remember which side of the book it was on and approximately where on the page it was, which suggests some kind of visual memory representation at work.

Now, it’s not clear how exactly this interacts with the other downstream cognitive functioning of an LLM; can the model reason as intelligently over those compressed visual tokens as it can using regular text tokens? Does it make the model less articulate by forcing it into a more vision-oriented modality? But you can imagine that, depending on the exact tradeoffs, it could be a very exciting new axis to greatly expand effective context sizes. Especially when combined with DeepSeek’s other recent paper from a couple weeks ago about sparse attention.

For all we know, Google could have already figured out something like this, which could explain why Gemini has such a huge context size and is so good and fast at OCR tasks. If they did, they probably wouldn’t say because it would be viewed as an important trade secret. But the nice thing about DeepSeek is that they’ve made the entire thing open source and open weights and explained how they did it, so now everyone can try it out and explore.

Even if these tricks make attention more lossy, the potential of getting a frontier LLM with a 10 or 20 million token context window is pretty exciting. You could basically cram all of a company’s key internal documents into a prompt preamble and cache this with OpenAI and then just add your specific query or prompt on top of that and not have to deal with search tools and still have it be fast and cost-effective. Or put an entire code base into the context and cache it, and then just keep appending the equivalent of the git diffs as you make changes to the code.

If you’ve ever read stories about the great physicist Hans Bethe, he was known for having vast amounts of random physical facts memorized (like the entire periodic table; boiling points of various substances, etc.) so that he could seamlessly think and compute without ever having to interrupt his flow to look something up in a reference table. Having vast amounts of task-specific knowledge in your working memory is extremely useful. This seems like a very clever and additive approach to potentially expanding that memory bank by 10x or more.
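The token arithmetic in the post above can be checked with a short sketch. It uses only the figures quoted there (10k English words becoming roughly 15k text tokens, and a 10x compression ratio for visual tokens). The function names and the 1M-token base window in the usage lines are hypothetical, chosen for illustration, and are not taken from the DeepSeek-OCR paper.

```python
import math

# Figures quoted in the post; treat them as rough assumptions, not measurements.
TOKENS_PER_WORD = 1.5      # 10k English words ~ 15k text tokens
COMPRESSION_RATIO = 10.0   # claimed text-token to visual-token compression


def text_tokens(words: int, tokens_per_word: float = TOKENS_PER_WORD) -> int:
    """Estimate how many text tokens a passage of `words` words occupies."""
    return math.ceil(words * tokens_per_word)


def visual_tokens(n_text_tokens: int, ratio: float = COMPRESSION_RATIO) -> int:
    """Estimate compressed visual tokens needed to hold `n_text_tokens`."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return math.ceil(n_text_tokens / ratio)


def effective_context(window_tokens: int, ratio: float = COMPRESSION_RATIO) -> int:
    """Text-token equivalent of a window filled entirely with visual tokens."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return int(window_tokens * ratio)


if __name__ == "__main__":
    n_text = text_tokens(10_000)
    print(n_text)                        # 15000
    print(visual_tokens(n_text))         # 1500
    # Hypothetical 1M-token window, assumed only for illustration.
    print(effective_context(1_000_000))  # 10000000
```

Under these assumptions, a 1M-token window would hold the equivalent of about 10M text tokens, which is in line with the "10 or 20 million token" range the post speculates about. Whether a model reasons as well over compressed tokens is a separate question that this arithmetic does not address.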


Lyaka retweeted

すずどら @sz_dr

8 Sep 2025

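This is the work I've been doing recently: Vector search for Yahoo!フリマ with Vespa: expanding product discovery through similar images techblog.lycorp.co.jp/ja/202…

Vector search for Yahoo!フリマ with Vespa: expanding product discovery through similar images

Hello. I'm Suzuki from LY Corporation. At work I'm responsible for improving search for Yahoo!オークション and Yahoo!フリマ. Recently, on Yahoo!フリマ, we released a feature that lets you find items that look similar, and...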

techblog.lycorp.co.jp


Lyaka @lyakaap

15 Apr 2025

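Our team is recruiting summer interns! You get to do VLM research and development using large-scale in-house image data, which I think will be fun, so please apply! lycorp.co.jp/ja/recruit/newg…

Research and development of a high-accuracy, large-scale Japanese Vision and Language Model | LY Corporation

Details of LY Corporation's FY2025 internship: the page for research and development of a high-accuracy, large-scale Japanese Vision and Language Model.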

lycorp.co.jp


Lyaka retweeted

Mikihiro Tanaka @mikittt417

8 Mar 2025

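I'll be attending NLP2025 starting Monday! I plan to present the following paper, which covers 1. building a Japanese MLLM that surpasses the accuracy of existing public models, and 2. the newly created JIC-VQA benchmark. JIC-VQA: huggingface.co/datasets/line… Paper project page: mikittt.github.io/posts/Japa… #NLP2025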

line-corporation/JIC-VQA · Datasets at Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co


Lyaka retweeted

eisaku｜LayerX @eisaku9393

6 Mar 2025

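March 7 is Sauna Day! So here's a sauna feature! It also introduces the favorite saunas of our CEO @fukkyy, who is actually a sauna enthusiast, and of @y_matsuwitter! Please take a look! 🤟 -- Recommended saunas chosen by LayerX members #日めくりLayerX｜Shimomura Eisaku @eisaku9393 #note note.com/jolly_koala293/n/n7…

Recommended saunas chosen by LayerX members #日めくりLayerX｜Shimomura Eisaku

Introduction. Hello, I'm shimomu, and I work in business development at LayerX. After joining LayerX in business development in March 2024, I moved to FS partway through, following the "メラ期" declaration. I plan to write up what I learned in FS in a separate note. And since March I've also joined the バクラク勤怠 team in a concurrent role. What a whirlwind!!! Note: for more on the メラ期, please see the note by our CEO Fukushima...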

note.com


Lyaka retweeted

Mikihiro Tanaka @mikittt417

5 Feb 2025

#NLP2025 On Tuesday, March 11, 13:00-14:30, in session Q3, I'll give a poster presentation on developing a Japanese multimodal large language model. If you're interested, please come by!

NLP2026 UTSUNOMIYA @anlpmeeting

5 Feb 2025

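🎉 Conference program released 🎉 #NLP2025 has 778 presentations, the most ever! After repeated coordination, the program committee grouped both oral and poster presentations by theme and organized the sessions to enable lively discussion among chairs and attendees. Please check the program here. anlp.jp/proceedings/annual_m…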


Lyaka retweeted

Ikki Tanaka(kyazuki) @ikki407

5 Feb 2025

I'll be talking a lot about the AI work behind the BayStars, which has been kept under wraps until now ⚾️ (from 16:00). You can watch it here! techcon2025.dena.dev

DeNA × AI Day ‖ DeNA TechCon 2025

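New technologies are evolving and spreading at a remarkable pace, and we live in an era in which we can change the world for the better with more speed and impact than ever before. At DeNA, we value "leveraging technology to create new value," and we work every day toward delivering further Delight by making full use of both existing and emerging technologies. Through two events, we introduce DeNA's efforts to master a wide range of technologies while pushing one another to improve...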

techcon2025.dena.dev

DeNA×AI

@DeNAxAI_NEWS

9 Dec 2024

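#ベイスターズを支えるAI技術をついに公開 (the AI technology behind the BayStars, finally revealed) ⚾️ DeNA × AI Day || TechCon session introduction ✨ A catcher's growth, a pitcher's comeback, and one of the most powerful batting lineups in the league. Behind them is DeNA's AI technology. We'll present it together with DeNA's sports business strategy! #DeNAxAI_Day #denatechcon techcon2025.dena.dev/session…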


Lyaka retweeted

しゅんけー「📕Pythonで学ぶ画像生成」発売中！ @shunk031

4 Dec 2024

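A story about how the foundation model team in my department built an insanely strong CLIP-extension model and is pushing hard on product applications 😤 Streamlining listing review on Yahoo!オークション with an in-house multimodal foundation model techblog.lycorp.co.jp/ja/202…

Streamlining listing review on Yahoo!オークション with an in-house multimodal foundation model

This article is part of the LINEヤフー Advent Calendar 2024. Hello. We are the Foundation Models development team at LY Corporation. In our team, image-and-language multimodal...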

techblog.lycorp.co.jp


Lyaka @lyakaap

3 Dec 2024

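I'll be giving a talk on VLMs in the Special Lecture 2 session at #ViEW2024. I think it will be packed with content, including VLM development, examples of business applications at ヤフオク (Yahoo! Auctions), and challenges and solutions in real-world deployment. It's a long one-hour slot, but please come!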

ViEW2026 @iaipviewx

30 Sep 2024

The special lecture entry (speaker: Yokoo) has been updated. tc-iaip.org/view/2024/speake… #パシフィコ横浜 #ViEW2024


Lyaka retweeted

Sebastian Raschka

@rasbt

3 Nov 2024

If you are curious how Multimodal LLMs work, I wrote a new article to explain the two main approaches, decoder-only- and cross-attention-style: magazine.sebastianraschka.co… Plus, I reviewed and summarized the 10 latest research papers to see how it's done in practice. Happy reading!

Understanding Multimodal LLMs

An introduction to the main techniques and latest models

magazine.sebastianraschka.com
