12kaz

Tweets

Pinned Tweet

12kaz @12_technology

7 May 2022

I tried denoising, deblurring, and super-resolution with the machine learning method NAFNet. Color noise removal is quite accurate. Usage instructions are here: 12-technology.com/2022/05/na…

0:12

697

3,341

12kaz retweeted

Axross Recipe: A place to learn practical knowledge together

@AxrossRecipe_SB

5 Dec 2024

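＼🍁 November paid recipe ranking 🏆／ 🥇: Recipe for 3D object detection with PointNet 🥈: Recipe for object detection and object tracking with YOLO and StrongSORT 🥉: Recipe for developing a retrieval-augmented generation (RAG) app with MemoRAG. For 4th place and below, see this URL 👇 axross-recipe.com/news_items… #AxrossRecipe #AI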

426

12kaz retweeted

Axross Recipe: A place to learn practical knowledge together

@AxrossRecipe_SB

3 Sep 2024

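＼🍀 August paid recipe ranking 🍀／ 🥇: Recipe for text generation with rinna's Japanese-specialized GPT model 🥈: Recipe for deploying a Django app that lets you talk with a GPT-3.5 series model, the same NLP model as ChatGPT 🥉: Recipe for text-to-image generation with Imagen axross-recipe.com/news_items… #AxrossRecipe #生成AI

August 2024 popular recipe ranking announced 🎐 | News | Axross Recipe (アクロスレシピ)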

axross-recipe.com

927

12kaz retweeted

Axross Recipe: A place to learn practical knowledge together

@AxrossRecipe_SB

1 Jul 2024

This recipe shows how to convert video with StreamV2V, which runs up to 158 times faster than prior techniques, and how to build a real-time image generation app with StreamV2V and Gradio. axross-recipe.com/recipes/15… #AxrossRecipe #StreamV2V #画像生成 #動画変換

Recipe for video conversion and image generation with StreamV2V

axross-recipe.com

262

12kaz retweeted

Jinbo Xing @Double47685693

5 Feb 2024

🚀Our 𝑫𝙮𝒏𝙖𝒎𝙞𝑪𝙧𝒂𝙛𝒕𝙚𝒓 just got a massive upgrade!🚀 🎯Better Dynamic, Higher Resolution and Stronger Coherence! Code - github.com/Doubiiu/DynamiCra… Project - doubiiu.github.io/projects/D… Demo - huggingface.co/spaces/Doubii…

1:37

42,213

12kaz retweeted

Zuntan @Zuntan03

8 Jan 2024

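EasyZatuGen takes a short theme in Japanese and haphazardly generates an image generation prompt with its Japanese translation, an upscaled picture, and a line of dialogue with emotion-laden speech. It is a three-part combination of calm2-chat-AWQ, StreamDiffusion, and Style-Bert-VITS2, and generates everything locally. Requires an RTX 3060 12GB; 8GB is enough for voice only. github.com/Zuntan03/EasyZatu…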

0:39

275

39,751

12kaz retweeted

深津貴之 / THE GUILD, note

@fladdict

4 Jan 2024

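reading... The moment he learned that a song he liked was AI-generated: "I suddenly felt uneasy that there was no human behind the emotion. I realized I was not mentally prepared yet." He was also shocked by the reality that he can no longer tell AI-made songs apart. kobe-np.co.jp/news/culture/2…

<AI and Creativity: The Artists> (1) Music producer tofubeats: "There is no human behind the emotion"

■ Music generation and the changing values of art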

kobe-np.co.jp

513

2,563

326,664

12kaz retweeted

Alex Carlier @alexcarliera

20 Dec 2023

Google just announced VideoPoet: a multimodal video generation model! It's massively multimodal and can take as input: text, image, depth & optical flow or a masked video and is one of the first models that generates video audio! More info below ⬇️⬇️

0:04

314

43,408

12kaz @12_technology

13 Dec 2023

I used Stable Video Diffusion to generate video from an image. The input is a single image. You can try it here: huggingface.co/spaces/multim… #AI #AIArt #stablediffusion #AIイラスト

0:04

238

12kaz @12_technology

12 Dec 2023

Google Colab has started offering code generation from text. It should cut down on tedious searching, for example when you need to display an image.

157

AK

12kaz retweeted

@_akhaliq

21 Nov 2023

Stability releases Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets model: huggingface.co/stabilityai/s… present Stable Video Diffusion — a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation. Recently, latent diffusion models trained for 2D image synthesis have been turned into generative video models by inserting temporal layers and finetuning them on small, high-quality video datasets. However, training methods in the literature vary widely, and the field has yet to agree on a unified strategy for curating video data. In this paper, we identify and evaluate three different stages for successful training of video LDMs: text-to-image pretraining, video pretraining, and high-quality video finetuning.

0:20

150

640

200,430

AK

12kaz retweeted

@_akhaliq

16 Nov 2023

Drivable 3D Gaussian Avatars paper page: huggingface.co/papers/2311.0… present Drivable 3D Gaussian Avatars (D3GA), the first 3D controllable model for human bodies rendered with Gaussian splats. Current photorealistic drivable avatars require either accurate 3D registrations during training, dense input images during testing, or both. The ones based on neural radiance fields also tend to be prohibitively slow for telepresence applications. This work uses the recently presented 3D Gaussian Splatting (3DGS) technique to render realistic humans at real-time framerates, using dense calibrated multi-view videos as input. To deform those primitives, we depart from the commonly used point deformation method of linear blend skinning (LBS) and use a classic volumetric deformation method: cage deformations. Given their smaller size, we drive these deformations with joint angles and keypoints, which are more suitable for communication applications. Our experiments on nine subjects with varied body shapes, clothes, and motions obtain higher-quality results than state-of-the-art methods when using the same training and test data.

6:20

316

1,493

327,043

12kaz retweeted

camenduru

@camenduru

18 Nov 2023

Little @Gradio ❤ code 🌿 and little @diffuserslib ❤ code 🍅 real-time drawing app is done 🥗 Thanks to @SimianLuo (LCM) ❤ @Gradio Team ❤ @diffuserslib Team ❤ 🦒colab: please try it 🐣 github.com/camenduru/latent-…

1:48

113

22,117

AK

12kaz retweeted

@_akhaliq

15 Nov 2023

One-2-3-45 : Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion paper page: huggingface.co/papers/2311.0… Recent advancements in open-world 3D object generation have been remarkable, with image-to-3D methods offering superior fine-grained control over their text-to-3D counterparts. However, most existing models fall short in simultaneously providing rapid generation speeds and high fidelity to input images - two features essential for practical applications. In this paper, we present One-2-3-45 , an innovative method that transforms a single image into a detailed 3D textured mesh in approximately one minute. Our approach aims to fully harness the extensive knowledge embedded in 2D diffusion models and priors from valuable yet limited 3D data. This is achieved by initially finetuning a 2D diffusion model for consistent multi-view image generation, followed by elevating these images to 3D with the aid of multi-view conditioned 3D native diffusion models. Extensive experimental evaluations demonstrate that our method can produce high-quality, diverse 3D assets that closely mirror the original input image.

0:08

437

90,107

12kaz @12_technology

27 Oct 2023

Fuyu-8B is trending on Hugging Face Spaces. This multimodal model combines text and image prompts to perform a variety of tasks. The image below shows it locating the text "Share your repair" in a screenshot. #Python #AI

[Image: Find text in screenshot]

183

12kaz @12_technology

10 Oct 2023

A #HuggingFace #Space that estimates the number of GPUs needed to run an #LLM. It looks useful when searching for an LLM that fits on the GPUs you have. huggingface.co/spaces/Voktur…

308

12kaz @12_technology

4 Oct 2023

#DreamGaussian, a 3D mesh generation technique that is faster than prior methods, has been announced. It can generate 3D meshes from images and text. GitHub: github.com/dreamgaussian/dre… #Python #TextTo3D #ImageTo3D

0:06

691

12kaz @12_technology

24 Sep 2023

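I ran Nougat, which specializes in OCR for papers saved as PDF, on #GoogleColab. It can convert equations in a PDF into a markup language. 12-technology.com/2023/09/no… #Python #OCR

[Nougat] Recognizing text in arXiv papers with OCR specialized for papers and equations

This article shows how to perform optical character recognition (OCR) on scientific documents saved as PDF, using a machine learning method called Nougat.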

12-technology.com

378

12kaz @12_technology

22 Sep 2023

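This is a Space for #Nougat, an OCR tool for academic documents. Enter a link to an arXiv paper and it not only recognizes the text but also converts equations into a markup language. huggingface.co/spaces/ysharm… #Python #OCR #AI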

Nougat - a Hugging Face Space by ysharma

Discover amazing ML apps made by the community

huggingface.co

322

12kaz @12_technology

11 Sep 2023

I tried text-to-music with #MusicGen. Below is music generated from the prompt "crazy piano jazz". You can try it on Google Colab: 12-technology.com/2023/09/mu… #Python #GenerativeAI #AIart

0:10

315

12kaz @12_technology

11 Sep 2023

I tried generating music from text with #MusicGen. Below is music generated from the prompt "Solid sound that makes you feel Japanese tradition". You can try it on Google Colab: 12-technology.com/2023/09/mu…

0:10

367
