Filter
Exclude
Time range
-
Near
🏗️ Juggernaut Z Fast is built on the Z-Image family and released by Team Juggernaut, with training by KandooAI and publishing support from RunDiffusion. ⚡ Juggernaut Z Fast is the speed-focused version of Juggernaut Z. 🎯 It is designed for 4-step generation. Based on testing in Draw Things, it typically takes at least 6 steps to consistently produce satisfying results in a single pass. ⚙️ The officially recommended settings are: Steps: 4–8 CFG: 1–1.5 Sampler: DDIM Trailing 📋 Below are the DT JSON Configs used for the four example images above, provided for reference only. —————— {"tiledDiffusion":false,"tiledDecoding":false,"height":1280,"hiresFix":false,"colorCalibration":"none","cfgZeroStar":false,"faceRestoration":"","sharpness":0,"cfgZeroInitSteps":0,"resolutionDependentShift":false,"loras":[],"strength":1,"causalInferencePad":0,"shift":2.2200000000000002,"model":"juggernautz_fast_by_rundiffusion_f16.ckpt","maskBlurOutset":0,"seedMode":2,"width":960,"steps":8,"upscaler":"","preserveOriginalAfterInpaint":true,"guidanceScale":1,"sampler":16,"batchSize":1,"maskBlur":1.5,"batchCount":1,"refinerModel":"","seed":1732208121,"controls":[]} —————— 🔗Model link: huggingface.co/RunDiffusion/…
2
11
1,089
Replying to @FSchaipp
already did, this has variable batchsize, starts small then increases
1
142
Jun 9
Replying to @akazwz_
batchsize 并发低吧
1
1,362
BULK INSERT moves flat-file data into SQL Server at maximum speed. BULK INSERT dbo.Orders FROM 'D:\Import\orders.csv' WITH ( FORMAT = 'CSV', FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK, BATCHSIZE = 50000 ); TABLOCK: takes a table lock instead of row locks — dramatically faster for large loads. BATCHSIZE: commits every N rows. Smaller batches reduce log usage but add commit overhead. For minimal logging (bulk-logged recovery model), add ROWS_PER_BATCH and ensure the table has no clustered index or you're inserting into an empty clustered index. BCP is the command-line equivalent when you need automation or scripting. #SQLServer #BULKINSERT #ETL #DataLoading
16
The paper mentions using 8x A100 for all their experiments. @zhang_weny92997 and @Jingwen_Sun_ can probably tell you more about it. On our side for the finetuning we used 4x H100, you can definitely get away with less GPUs and VRAM, especially given the size of the batchsize.
1
47
SkillOpt: Executive Strategy for Self-Evolving Agent Skills Dropped from Microsoft Train agent skills like you train neural networks — with epochs, (mini-)batchsize, learning rates, and validation gates — but without touching model weights. github.com/microsoft/SkillOp…
3
2
10
440
👇🏻 Here are the configs and the LoRA link: 🔷With Z-Image-Fun-Lora-Distill-2-Steps-2603 —————— {"cfgZeroStar":false,"tiledDecoding":false,"strength":1,"cfgZeroInitSteps":0,"batchSize":1,"guidanceScale":1,"preserveOriginalAfterInpaint":true,"resolutionDependentShift":false,"controls":[],"causalInferencePad":0,"maskBlurOutset":0,"maskBlur":1.5,"batchCount":1,"refinerModel":"","shift":1.5,"steps":3,"loras":[{"mode":"all","file":"z_image_fun_lora_distill_2_steps_2603_lora_f16.ckpt","weight":0.90000000000000002}],"width":1280,"sampler":16,"sharpness":0,"faceRestoration":"","seed":385987460,"model":"z_image_turbo_1.0_i8x.ckpt","upscaler":"","hiresFix":false,"tiledDiffusion":false,"seedMode":2,"height":960} —————— 🔷With Z-Image-Fun-Lora-Distill-4-Steps-2603 —————— {"cfgZeroStar":false,"tiledDecoding":false,"strength":1,"cfgZeroInitSteps":0,"batchSize":1,"guidanceScale":1,"preserveOriginalAfterInpaint":true,"resolutionDependentShift":false,"controls":[],"causalInferencePad":0,"maskBlurOutset":0,"maskBlur":1.5,"batchCount":1,"refinerModel":"","shift":2.5,"steps":4,"loras":[{"mode":"all","file":"z_image_fun_lora_distill_4_steps_2603_lora_f16.ckpt","weight":0.75}],"width":1280,"sampler":16,"sharpness":0,"faceRestoration":"","seed":3919178504,"model":"z_image_turbo_1.0_i8x.ckpt","upscaler":"","hiresFix":false,"tiledDiffusion":false,"seedMode":2,"height":960} —————— 🔗LoRA Link:huggingface.co/alibaba-pai/Z…
1
10
2,462
跑媒认大作业发现准确率比同学低半截 结果发现我把batchsize调低了(
7
733
Replying to @KassLoizo @94bord_
j'pense pas, batchsize 2 sous 20g de vram a crash, et tu me dis que 16g suffit ''largement'' ?
1
4
1,618
Replying to @pyrovaler0nex
Force a toi, sinon un T4 de collab fait pas l'affaire ? Si tu augmente le batchsize au pire?
1
1
5,367
Replying to @joewill1082630
Tpu v4-64 and batchsize ~800
2
2,782
Replying to @vidhanmertiya
Training can do a lot of task in the same time. (like 800 users.) Inference is like one guy. Training is so much faster cus nothing need to wait for anything. Cus there is so much data! And inference first need to process embedding then other then other them other. And yk it appear slower. If I would measure inference similar to training so 800users vs batchsize 800. I would get similar numbers. But individual user would not seen that 1Million. Cus he would see only his part of the model computed so he would anyway need to wait for the model to pass all the layers. Where in training one layer finish task x move to layer 2 while layer 1 automatically do task y. No time wasted! 😁 Ye. I'm also trying to make them very very fast on inference. I hope my explanation is Okey clear it's night for me so 😅 but gemini/chatgpt can explain it too. Maybe better than half sleeping me.
1
3
267
Replying to @vidhanmertiya
Its training 😅 so technically it answer fast but only with a big batch size! And it's 800~~(don't remeber exact numer rn) batchsize. On inference for one client it sadly going to be slower :(((
1
4
2,602
Digital agent learning needs massive rollouts. But digital agent rollouts are painfully slow due to heavy environments. 🐌 🚀 We introduce NanoRollout, a lightweight open infra (900 lines core code) for digital agent rollout at scale, validated with three workloads: 🏋️ Large batchsize (4K) SWE Agent RL -> surpasses DeepSWE-32B 🧪 250k distilled coding trajectories -> SOTA ≤32B open coding agent ⚡ Fast evaluation on coding/cua/unified agent -> finish Check our Blog: cocoa-org.notion.site/nanoro…
2
39
136
35,478
"defining an extra low effort scringlo scramble thought" isn't a second smaller classification model! it's fast only because of batchsize>>2 inference and writing an Actual API Server, because blah blah blah words words words 'prefill<->AR decode inequality'!
1
5
598
my favorite part about having 96GB vram is i can run many small experiments at once. the tiny matmuls usually dont saturate the gpu, even when they step on each other. cranking batchsize and making the flywheel faster
1
30
2,571
🆚Some Comparisons and Draw Things Configs for SeedVR2 3B/7B are provided for your reference, no prompt needed: (feel free to set the dimensions according to your needs) 📝SeedVR2 3B —————— {"faceRestoration":"","seedMode":2,"controls":[],"loras":[],"cfgZeroStar":false,"upscaler":"","width":768,"batchSize":1,"tiledDiffusion":false,"height":1152,"shift":1,"refinerModel":"","cfgZeroInitSteps":0,"maskBlur":2.5,"maskBlurOutset":0,"steps":1,"batchCount":1,"strength":1,"hiresFix":false,"preserveOriginalAfterInpaint":true,"tiledDecoding":false,"sharpness":0,"guidanceScale":1,"model":"seedvr2_3b_i8x.ckpt","causalInferencePad":0,"sampler":10,"seed":3016078078} —————— 📝SeedVR2 7B —————— {"faceRestoration":"","seedMode":2,"controls":[],"loras":[],"cfgZeroStar":false,"upscaler":"","width":768,"batchSize":1,"tiledDiffusion":false,"height":1152,"shift":1,"refinerModel":"","cfgZeroInitSteps":0,"maskBlur":2.5,"maskBlurOutset":0,"steps":1,"batchCount":1,"strength":1,"hiresFix":false,"preserveOriginalAfterInpaint":true,"tiledDecoding":false,"sharpness":0,"guidanceScale":1,"model":"seedvr2_7b_i8x.ckpt","causalInferencePad":0,"sampler":10,"seed":2757549182} ——————
2
1
12
2,510