moe - WWW.YOUINFO.SITE

SILX AI officially releases Quasar-Preview: an early preview of an 18B MoE architecture with a 5M-token context length

LinuxDo Latest Topics · 2026-06-09 13:28:35+08:00 · tech

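Today, SILX AI announced Quasar-Preview, the first public release of its Quasar foundation model series. Quasar-Preview is not meant to compete with today's top models on leaderboards. It is a foundational release for validating and exploring a frontier architecture. Its main technical specifications are:

- A Mixture-of-Experts (MoE) architecture with about 18B total parameters, of which only about 2B are active parameters, which keeps inference highly efficient.
- An experimental 5-million-token (5M) context window, reached through staged long-context extension in the style of Safe NoPE / DrOPE and designed for future memory-based systems.
- Built on a Loop Transformer and Quasar hybrid attention, with internal Quasar, Raven, and GLA hybrid layers combined with sparse MoE routing.
- Trained so far on between 1T and 1.5T tokens. The long-context extension path has received fewer than 1B tokens to date.

SILX AI stresses that Quasar-Preview is not the final form of the Quasar model and does not represent the architecture's final quality. It is released as open source under the MIT license so that the architecture is public and researchers can test and build on it. The model was trained on Bittensor (SN24) decentralized infrastructure. SILX AI plans to keep improving the model through:

- Iterative subnet training and knowledge distillation
- Longer training runs and stronger post-training
- Further long-context extension training
- Architecture updates

Model page: silx-ai/Quasar-Preview on Hugging Face (huggingface.co)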
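The announcement says only "sparse MoE routing" and gives no routing details. The sketch below is therefore a generic top-k MoE router in Python, not SILX AI's implementation. The expert count, top-k value, and dimensions are hypothetical toy values. It illustrates why each token touches only a small share of the parameters (about 2B of 18B in this case), even though all experts generally still have to be held in memory.

```python
import math
import random

# Hypothetical toy configuration, NOT Quasar-Preview's real settings.
NUM_EXPERTS = 8
TOP_K = 2
DIM = 4

rng = random.Random(0)

def random_matrix(rows, cols):
    """Toy weights drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

experts = [random_matrix(DIM, DIM) for _ in range(NUM_EXPERTS)]  # one toy linear "expert" each
router = random_matrix(NUM_EXPERTS, DIM)  # produces one score per expert

def moe_forward(x, top_k=TOP_K):
    """Route a single token vector x to its top_k experts and mix their outputs.

    Returns the mixed output, the chosen expert indices, and their gate weights.
    """
    logits = matvec(router, x)
    chosen = sorted(range(NUM_EXPERTS), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the chosen experts only (renormalized top-k gating).
    m = max(logits[i] for i in chosen)
    exps = [math.exp(logits[i] - m) for i in chosen]
    total = sum(exps)
    gates = [e / total for e in exps]
    out = [0.0] * len(x)
    # Only these top_k experts are evaluated: their weights are the "active" parameters.
    for i, g in zip(chosen, gates):
        y = matvec(experts[i], x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, chosen, gates

def active_fraction(total_params, active_params):
    """Share of parameters touched per token."""
    return active_params / total_params

if __name__ == "__main__":
    out, chosen, gates = moe_forward([0.5, -1.0, 0.25, 2.0])
    print("experts used:", chosen, "gates:", [round(g, 3) for g in gates])
    # Announcement figures: ~18B total, ~2B active.
    print(f"active share: {active_fraction(18e9, 2e9):.1%}")
```

With the announced figures, the active share comes to roughly 11% of the total parameters per token.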

How does Gemma 4 12B run on 16 GB of VRAM?

V2EX - Tech · 2026-06-06 03:58:27+08:00 · tech

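Google has released a new Gemma 4 model with 12B parameters. According to the announcement, it is not an MoE model. https://blog.google/innovation-and-ai/technology/developers-tools/introducing-gemma-4-12b/ On both HF and Kaggle the weights are BF16, and the weight files total about 23.9 GB. https://huggingface.co/google/gemma-4-12B-it/tree/main https://www.kaggle.com/models/google/gemma-4/transformers/gemma-4-12b-it Google's blog specifically highlights "Laptop ready: Small enough to run locally with just 16GB of VRAM or unified memory." How does it fit in 16 GB of VRAM? Or does the BF16 version not fit, so you need an FP8-quantized one? But plenty of models already fit on a 16 GB card after that kind of quantization, including many with more parameters.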
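The post does not give Google's deployment recipe, so this sketch only does the arithmetic behind the question. It assumes exactly 12e9 parameters and counts weights alone. KV cache, activations, and framework overhead are ignored, and the bit widths are illustrative. The 23.9 GB BF16 file size is consistent with roughly 12B parameters at 2 bytes each.

```python
# Weight-only memory estimate: bytes = parameters * bits_per_param / 8.
# KV cache, activations, and runtime overhead come on top of this.

GB = 1e9                 # decimal gigabytes, as file sizes are usually listed
GIB = 2**30              # binary gibibytes
VRAM_BYTES = 16 * GIB    # assumption: a "16 GB" card exposes 16 GiB

def weight_bytes(num_params: float, bits_per_param: float) -> float:
    """Bytes needed to store the weights alone at a given precision."""
    return num_params * bits_per_param / 8

if __name__ == "__main__":
    params = 12e9  # nominal "12B"; the exact count may differ slightly
    for name, bits in [("BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
        b = weight_bytes(params, bits)
        verdict = "fits" if b < VRAM_BYTES else "does not fit"
        print(f"{name:6s} {b / GB:5.1f} GB ({b / GIB:5.1f} GiB): weights alone {verdict} in 16 GiB")
```

Under these assumptions, BF16 weights alone need about 24 GB (about 22.4 GiB), which already exceeds a 16 GB card. 8-bit weights need about 12 GB and 4-bit weights about 6 GB, which leaves room for the KV cache. The post does not say which format Google's 16 GB claim assumes.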

Found a bug in vLLM 0.19.0

LinuxDo Latest Topics · 2026-06-01 17:14:39+08:00 · tech

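While deploying vLLM 0.19.0 I ran into the following problem. Deploying an MoE model on 6 GPUs with tensor-parallel-size set to 1 and data-parallel-size set to 6 raises an error. The cause is that, when the model loads the MoE (Mixture-of-Experts) SharedFusedMoE layer, it executes the assertion assert intermediate_size % self.tp_size == 0. When the vLLM V1 engine reworked its multiprocess executor, it did not cleanly separate the global process count (world size = 6) from the local tensor-parallel size (TP size = 1). As a result, when SharedFusedMoE reads the current process's self.tp_size, it mistakenly gets the global GPU count (6). The expert network dimension of Qwen models (e.g. 3584) is not divisible by 6, so this raises an AssertionError that should never have fired.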
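The reported failure can be reproduced in miniature. This is a hypothetical Python sketch, not vLLM source code. ParallelConfig, global_world_size, and shard_moe_layer are invented names. Only the assertion intermediate_size % tp_size == 0, the TP=1 / DP=6 setup, and the 3584 example come from the post. The sketch shows that the same check passes with the intended TP size of 1 and fails once the global process count (6) is read instead.

```python
from dataclasses import dataclass

@dataclass
class ParallelConfig:
    """Hypothetical stand-in for a parallel config; not vLLM's actual class."""
    tensor_parallel_size: int
    data_parallel_size: int

    @property
    def global_world_size(self) -> int:
        # Total number of processes across all TP and DP ranks.
        return self.tensor_parallel_size * self.data_parallel_size

def shard_moe_layer(intermediate_size: int, tp_size: int) -> int:
    """Mimics the check described in the post: the MoE intermediate dimension
    must split evenly across tensor-parallel ranks. Returns the per-rank size."""
    assert intermediate_size % tp_size == 0, (
        f"intermediate_size={intermediate_size} not divisible by tp_size={tp_size}"
    )
    return intermediate_size // tp_size

if __name__ == "__main__":
    cfg = ParallelConfig(tensor_parallel_size=1, data_parallel_size=6)
    size = 3584  # example dimension quoted in the post

    # Intended behavior: each DP replica holds the full layer, so TP size is 1.
    print("per-rank size with local TP size:", shard_moe_layer(size, cfg.tensor_parallel_size))

    # Reported bug: the layer reads the global world size (6) as its TP size.
    try:
        shard_moe_layer(size, cfg.global_world_size)
    except AssertionError as e:
        print("AssertionError:", e)
```

The fix the post implies is to give the layer the local TP size, not the global world size. How vLLM actually resolves this is not covered here.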

[Belated News] StepFun (阶跃星辰) releases Step 3.7 Flash

LinuxDo Latest Topics · 2026-05-30 19:43:36+08:00 · tech

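The model uses a 198B-parameter Mixture-of-Experts (MoE) architecture with only 11B parameters dynamically activated. This greatly improves inference efficiency while maintaining strong performance. It natively supports a 256K-token context window and can efficiently handle large volumes of text and long-sequence tasks. For tool use it aims for high reliability, scoring over 98% on τ²-bench. Step 3.7 Flash is compatible with toolchains such as Claude Code and the MCP protocol, and it supports running locally on devices such as the Mac Studio M4 Max. The model weights are open-sourced under the Apache 2.0 license.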
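The 198B-total / 11B-active split is what makes local inference plausible. Per-token compute tracks the active parameters, while weight memory tracks the total. This rough sketch assumes the common estimate of about 2 FLOPs per active parameter per token for a forward pass. It ignores attention over long contexts, KV cache, and quantization overhead. The bit widths are illustrative and are not the formats StepFun ships.

```python
# Back-of-envelope compute vs. memory for an MoE model.
# Assumption: forward pass ~ 2 FLOPs per *active* parameter per token.
# Weight memory, in contrast, scales with *total* parameters.

def forward_flops_per_token(active_params: float) -> float:
    return 2 * active_params

def weight_gb(total_params: float, bits_per_param: float) -> float:
    """Decimal gigabytes needed for the weights alone."""
    return total_params * bits_per_param / 8 / 1e9

if __name__ == "__main__":
    total, active = 198e9, 11e9  # figures from the announcement
    print(f"active share:         {active / total:.1%}")
    print(f"GFLOPs/token (MoE):   {forward_flops_per_token(active) / 1e9:.0f}")
    print(f"GFLOPs/token (dense): {forward_flops_per_token(total) / 1e9:.0f}")
    for bits in (16, 8, 4):
        print(f"weights @ {bits:2d}-bit:    {weight_gb(total, bits):.0f} GB")
```

Per token, the model does roughly the work of an 11B dense model, about 18 times less than a dense 198B model. At 4 bits, however, the weights alone still come to about 99 GB, so any local setup needs a large unified-memory pool. The post does not say which configuration or quantization it assumes.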
