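Hello to all the V2EX experts and fellow V2EXers! Whether you're managing multiple cross-border e-commerce accounts, running an overseas social-media account matrix (TikTok/FB/Ins), doing large-scale web data scraping, or cleaning multimodal training data for large AI models, a clean, stable, and cost-effective overseas residential IP is a must.

To give back to the V2EX community, Novproxy has launched its mid-year sale! We're here with a large batch of newly launched resources and some of the best-value deals on the web!

🚀 Novproxy's core advantages (why choose us?)
Clean resources, high anonymity: 100 million+ active residential IPs covering 200+ countries and regions worldwide. Besides precise country/city-level targeting, we've added new premium resources, with 100,000+ clean IPs refreshed every day!
Speed and stability: 99.9% connection success rate and response times under 0.5 seconds.
Flexible policies: dynamic residential traffic never expires! Supports rotation and sticky sessions (custom duration from 1 to 120 minutes).
Full tech-stack compatibility: native support for HTTP, HTTPS, and SOCKS5; works with scraping frameworks in mainstream languages such as Python, Go, Java, Node.js, PHP, and C/C++, as well as with fingerprint browsers and automation tools.

🎁 Mid-year celebration: multiple perks!
🔥 Perk 1: high-value traffic packages at rock-bottom prices
Dynamic residential traffic: buy the 1 TB dynamic residential traffic package and pay as little as $0.5/GB!
We also offer long-term static ISP IPs (from $3.0 per IP per month, dedicated and native, unlimited traffic), unlimited-traffic port/bandwidth plans, and short-lived residential IPs billed per IP. Whatever the scale of your business, there's a plan that fits your budget!
🆓 Perk 2: free trial, no strings attached
New users who register and contact customer support during the promotion get 500 MB of free residential IP trial traffic! No top-up required: run a test script first and judge for yourself.
💰 Perk 3: upgraded affiliate program, earn US dollars easily
Webmasters, influencers, community owners, and V2EXers with referral channels are welcome to join our affiliate program. Enjoy commissions of up to 10% and earn up to $1000 in referral cash rewards!

🛠️ Use cases
Data for AI: efficiently crawl training data for multimodal large language models (LLMs).
Cross-border e-commerce & social media: run multi-account matrices on Facebook, TikTok, Instagram, and Discord without the accounts being linked.
Data collection & monitoring: competitor price monitoring, market research and analysis, global flight/hotel data aggregation.

📌 Quick links
🌐 Official website (click to register): https://novproxy.com?kwd=tt-v
✈️ Official Telegram: contact support at @Nov669 to claim free traffic

If you run into any technical issues during integration or use, our technical team offers 24/7 support. You're welcome to sign up and give it a try, and feel free to leave suggestions in this thread!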
Bottom line first: it runs, but not for long stretches. The main problem is cooling: even an external fan stand doesn't really solve it. Under heavy load the temperature climbs fast, and sustained high temperatures make the machine throttle. If you want portability plus real productivity, I'd recommend a MacBook Pro instead.

I installed two platforms, Ollama and oMLX, and in my testing oMLX was somewhat faster. Given the machine's 32 GB of RAM, keep the models you run under 22 GB.

Here are the download sizes of some mainstream models, plus the oMLX benchmark results, for reference:

Model                     Download size
Qwen3.5-4B-MLX-4bit       2.85 GB
gemma-4-26b-a4b-it-4bit   14.57 GB
Qwen3.6-35B-A3B-4bit      15.13 GB
GLM-4.7-Flash-4bit        15.71 GB
gpt-oss-20b-MXFP4-Q8      11.27 GB

oMLX - LLM inference, optimized for your Mac

Benchmark Model: Qwen3.5-4B-MLX-4bit
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test          TTFT(ms)  TPOT(ms)  pp TPS        tg TPS      E2E(s)  Throughput   Peak Mem
pp1024/tg128  1001.6    22.74     1022.4 tok/s  44.3 tok/s  3.889   296.2 tok/s  3.29 GB
pp4096/tg128  3540.9    23.76     1156.8 tok/s  42.4 tok/s  6.558   644.1 tok/s  3.90 GB

Continuous Batching pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     44.3 tok/s   1.00x    1022.4 tok/s  1022.4 tok/s  1001.6    3.889
2x     88.3 tok/s   1.99x    407.6 tok/s   203.8 tok/s   3040.1    7.924
4x     175.1 tok/s  3.95x    322.7 tok/s   80.7 tok/s    6833.9    15.617

Benchmark Model: gemma-4-26b-a4b-it-4bit
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test          TTFT(ms)  TPOT(ms)  pp TPS        tg TPS      E2E(s)  Throughput   Peak Mem
pp1024/tg128  1500.5    24.21     682.4 tok/s   41.6 tok/s  4.575   251.8 tok/s  14.23 GB
pp4096/tg128  4863.4    25.14     842.2 tok/s   40.1 tok/s  8.056   524.3 tok/s  14.91 GB

Continuous Batching pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     41.6 tok/s   1.00x    682.4 tok/s   682.4 tok/s   1500.5    4.575
2x     82.5 tok/s   1.98x    361.6 tok/s   180.8 tok/s   3495.8    8.767
4x     166.1 tok/s  3.99x    283.4 tok/s   70.8 tok/s    7840.6    17.536

Benchmark Model: Qwen3.6-35B-A3B-4bit
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test          TTFT(ms)  TPOT(ms)  pp TPS        tg TPS      E2E(s)  Throughput   Peak Mem
pp1024/tg128  1676.1    17.20     610.9 tok/s   58.6 tok/s  3.860   298.4 tok/s  18.80 GB
pp4096/tg128  5046.3    17.93     811.7 tok/s   56.2 tok/s  7.323   576.8 tok/s  19.24 GB

Continuous Batching pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     58.6 tok/s   1.00x    610.9 tok/s   610.9 tok/s   1676.1    3.860
2x     116.2 tok/s  1.98x    435.5 tok/s   217.8 tok/s   2973.7    6.907
4x     230.7 tok/s  3.94x    352.0 tok/s   88.0 tok/s    6445.2    13.855

Benchmark Model: GLM-4.7-Flash-4bit
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test          TTFT(ms)  TPOT(ms)  pp TPS        tg TPS      E2E(s)  Throughput   Peak Mem
pp1024/tg128  1985.0    21.78     515.9 tok/s   46.3 tok/s  4.752   242.4 tok/s  16.27 GB
pp4096/tg128  6839.2    27.31     598.9 tok/s   36.9 tok/s  10.307  409.8 tok/s  17.34 GB

Continuous Batching pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     46.3 tok/s   1.00x    515.9 tok/s   515.9 tok/s   1985.0    4.752
2x     91.5 tok/s   1.98x    362.7 tok/s   181.3 tok/s   3549.9    8.445
4x     174.9 tok/s  3.78x    321.2 tok/s   80.3 tok/s    6393.9    15.679

Benchmark Model: gpt-oss-20b-MXFP4-Q8
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test          TTFT(ms)  TPOT(ms)  pp TPS        tg TPS      E2E(s)  Throughput   Peak Mem
pp1024/tg128  1687.6    24.70     606.8 tok/s   40.8 tok/s  4.824   238.8 tok/s  11.67 GB
pp4096/tg128  4088.8    26.44     1001.8 tok/s  38.1 tok/s  7.446   567.3 tok/s  11.75 GB

Continuous Batching pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     40.8 tok/s   1.00x    606.8 tok/s   606.8 tok/s   1687.6    4.824
2x     82.1 tok/s   2.01x    359.0 tok/s   179.5 tok/s   3489.1    8.822
4x     159.5 tok/s  3.91x    293.2 tok/s   73.3 tok/s    7335.0    17.180
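The columns in these benchmark tables are tied together by simple arithmetic. The sketch below is a reconstruction inferred from the reported numbers, not taken from oMLX's source code, so treat the formulas as assumptions: E2E = TTFT + TPOT × (tg − 1) (the first generated token is assumed to be counted in TTFT), pp TPS = prompt tokens / TTFT, tg TPS = generated tokens / (E2E − TTFT), and Throughput = (prompt + generated tokens) / E2E. Under those assumptions it reproduces the single-request rows to within rounding. For continuous batching, Speedup looks like the batch's aggregate tg TPS divided by the 1x tg TPS, and pp TPS/req like pp TPS divided by the batch size. The function names are hypothetical.

```python
"""Recompute the derived oMLX benchmark columns from TTFT and TPOT.

The formulas are inferred from the published table, not from oMLX's code.
"""


def single_request_metrics(prompt_tokens: int, gen_tokens: int,
                           ttft_ms: float, tpot_ms: float) -> dict:
    """Derive pp TPS, tg TPS, E2E and Throughput for one request (assumed formulas)."""
    ttft_s = ttft_ms / 1000.0
    # Assumption: the first output token arrives at TTFT, so only the
    # remaining gen_tokens - 1 tokens are paced by TPOT.
    decode_s = tpot_ms * (gen_tokens - 1) / 1000.0
    e2e_s = ttft_s + decode_s
    return {
        "pp_tps": prompt_tokens / ttft_s,                        # prefill speed
        "tg_tps": gen_tokens / decode_s,                         # generation speed
        "e2e_s": e2e_s,                                          # total wall time
        "throughput_tps": (prompt_tokens + gen_tokens) / e2e_s,  # all tokens / wall time
    }


def batching_metrics(batch_size: int, batch_tg_tps: float,
                     single_tg_tps: float, batch_pp_tps: float) -> dict:
    """Derive Speedup and pp TPS/req for a continuous-batching row (assumed formulas)."""
    return {
        "speedup": batch_tg_tps / single_tg_tps,
        "pp_tps_per_req": batch_pp_tps / batch_size,
    }


if __name__ == "__main__":
    # Qwen3.5-4B-MLX-4bit, pp1024/tg128 row from the table above.
    m = single_request_metrics(1024, 128, ttft_ms=1001.6, tpot_ms=22.74)
    for name, value in m.items():
        print(f"{name}: {value:.3f}")
    # Qwen3.6-35B-A3B-4bit, 4x batch row compared with its 1x row.
    b = batching_metrics(4, batch_tg_tps=230.7, single_tg_tps=58.6, batch_pp_tps=352.0)
    print(f"speedup: {b['speedup']:.2f}x, pp TPS/req: {b['pp_tps_per_req']:.1f}")
```

For the Qwen3.5-4B row this gives roughly 1022.4 tok/s prefill, 44.3 tok/s generation, about 3.89 s end to end and 296.2 tok/s overall, in line with the table. One practical reading: at pp4096 the prompt-processing time (TTFT) dominates E2E for most of these models, so long contexts hurt latency far more than the per-token generation speed suggests.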
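It can take an extra RAM module and an SSD, but when I looked it up, the RAM costs over ¥1,000 and SSDs are pretty pricey too. Any recommendations on how to go about it? Adding 16 GB of RAM costs ¥1,799 from the official store, and adding a 1 TB drive costs ¥1,499.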
先说结论,能跑,但没办法长期跑,主要问题是散热,外挂风扇支架也不太能解决问题,高强度跑温度上升快,持续高温机器会降频。如果考虑便携+生产力,推荐上 mac book pro 吧。 装了两个平台,ollama 跟 olmx ,测试下来,olmx 平台会更快些,考虑到机器 32G 的内存,能跑的模型大小不要超 22GB 附上部分主流模型下载容量大小及 olmx 平台测试结果给大家做参考 Qwen3.5-4B-MLX-4bit 2.85GB gemma-4-26b-a4b-it-4bit 14.57GB Qwen3.6-35B-A3B-4bit 15.13GB GLM-4.7-Flash-4bit 15.71GB gpt-oss-20b-MXFP4-Q8 11.27GB oMLX - LLM inference, optimized for your Mac Benchmark Model: Qwen3.5-4B-MLX-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1001.6 22.74 1022.4 tok/s 44.3 tok/s 3.889 296.2 tok/s 3.29 GB pp4096/tg128 3540.9 23.76 1156.8 tok/s 42.4 tok/s 6.558 644.1 tok/s 3.90 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 44.3 tok/s 1.00x 1022.4 tok/s 1022.4 tok/s 1001.6 3.889 2x 88.3 tok/s 1.99x 407.6 tok/s 203.8 tok/s 3040.1 7.924 4x 175.1 tok/s 3.95x 322.7 tok/s 80.7 tok/s 6833.9 15.617 Benchmark Model: gemma-4-26b-a4b-it-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1500.5 24.21 682.4 tok/s 41.6 tok/s 4.575 251.8 tok/s 14.23 GB pp4096/tg128 4863.4 25.14 842.2 tok/s 40.1 tok/s 8.056 524.3 tok/s 14.91 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 41.6 tok/s 1.00x 682.4 tok/s 682.4 tok/s 1500.5 4.575 2x 82.5 tok/s 1.98x 361.6 tok/s 180.8 tok/s 3495.8 8.767 4x 166.1 tok/s 3.99x 283.4 tok/s 70.8 tok/s 7840.6 17.536 Benchmark Model: Qwen3.6-35B-A3B-4bit 
================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1676.1 17.20 610.9 tok/s 58.6 tok/s 3.860 298.4 tok/s 18.80 GB pp4096/tg128 5046.3 17.93 811.7 tok/s 56.2 tok/s 7.323 576.8 tok/s 19.24 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 58.6 tok/s 1.00x 610.9 tok/s 610.9 tok/s 1676.1 3.860 2x 116.2 tok/s 1.98x 435.5 tok/s 217.8 tok/s 2973.7 6.907 4x 230.7 tok/s 3.94x 352.0 tok/s 88.0 tok/s 6445.2 13.855 Benchmark Model: GLM-4.7-Flash-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1985.0 21.78 515.9 tok/s 46.3 tok/s 4.752 242.4 tok/s 16.27 GB pp4096/tg128 6839.2 27.31 598.9 tok/s 36.9 tok/s 10.307 409.8 tok/s 17.34 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 46.3 tok/s 1.00x 515.9 tok/s 515.9 tok/s 1985.0 4.752 2x 91.5 tok/s 1.98x 362.7 tok/s 181.3 tok/s 3549.9 8.445 4x 174.9 tok/s 3.78x 321.2 tok/s 80.3 tok/s 6393.9 15.679 Benchmark Model: gpt-oss-20b-MXFP4-Q8 ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1687.6 24.70 606.8 tok/s 40.8 tok/s 4.824 238.8 tok/s 11.67 GB pp4096/tg128 4088.8 26.44 1001.8 tok/s 38.1 tok/s 7.446 567.3 tok/s 11.75 GB Continuous Batching pp1024 / 
tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 40.8 tok/s 1.00x 606.8 tok/s 606.8 tok/s 1687.6 4.824 2x 82.1 tok/s 2.01x 359.0 tok/s 179.5 tok/s 3489.1 8.822 4x 159.5 tok/s 3.91x 293.2 tok/s 73.3 tok/s 7335.0 17.180
先说结论,能跑,但没办法长期跑,主要问题是散热,外挂风扇支架也不太能解决问题,高强度跑温度上升快,持续高温机器会降频。如果考虑便携+生产力,推荐上 mac book pro 吧。 装了两个平台,ollama 跟 olmx ,测试下来,olmx 平台会更快些,考虑到机器 32G 的内存,能跑的模型大小不要超 22GB 附上部分主流模型下载容量大小及 olmx 平台测试结果给大家做参考 Qwen3.5-4B-MLX-4bit 2.85GB gemma-4-26b-a4b-it-4bit 14.57GB Qwen3.6-35B-A3B-4bit 15.13GB GLM-4.7-Flash-4bit 15.71GB gpt-oss-20b-MXFP4-Q8 11.27GB oMLX - LLM inference, optimized for your Mac Benchmark Model: Qwen3.5-4B-MLX-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1001.6 22.74 1022.4 tok/s 44.3 tok/s 3.889 296.2 tok/s 3.29 GB pp4096/tg128 3540.9 23.76 1156.8 tok/s 42.4 tok/s 6.558 644.1 tok/s 3.90 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 44.3 tok/s 1.00x 1022.4 tok/s 1022.4 tok/s 1001.6 3.889 2x 88.3 tok/s 1.99x 407.6 tok/s 203.8 tok/s 3040.1 7.924 4x 175.1 tok/s 3.95x 322.7 tok/s 80.7 tok/s 6833.9 15.617 Benchmark Model: gemma-4-26b-a4b-it-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1500.5 24.21 682.4 tok/s 41.6 tok/s 4.575 251.8 tok/s 14.23 GB pp4096/tg128 4863.4 25.14 842.2 tok/s 40.1 tok/s 8.056 524.3 tok/s 14.91 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 41.6 tok/s 1.00x 682.4 tok/s 682.4 tok/s 1500.5 4.575 2x 82.5 tok/s 1.98x 361.6 tok/s 180.8 tok/s 3495.8 8.767 4x 166.1 tok/s 3.99x 283.4 tok/s 70.8 tok/s 7840.6 17.536 Benchmark Model: Qwen3.6-35B-A3B-4bit 
================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1676.1 17.20 610.9 tok/s 58.6 tok/s 3.860 298.4 tok/s 18.80 GB pp4096/tg128 5046.3 17.93 811.7 tok/s 56.2 tok/s 7.323 576.8 tok/s 19.24 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 58.6 tok/s 1.00x 610.9 tok/s 610.9 tok/s 1676.1 3.860 2x 116.2 tok/s 1.98x 435.5 tok/s 217.8 tok/s 2973.7 6.907 4x 230.7 tok/s 3.94x 352.0 tok/s 88.0 tok/s 6445.2 13.855 Benchmark Model: GLM-4.7-Flash-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1985.0 21.78 515.9 tok/s 46.3 tok/s 4.752 242.4 tok/s 16.27 GB pp4096/tg128 6839.2 27.31 598.9 tok/s 36.9 tok/s 10.307 409.8 tok/s 17.34 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 46.3 tok/s 1.00x 515.9 tok/s 515.9 tok/s 1985.0 4.752 2x 91.5 tok/s 1.98x 362.7 tok/s 181.3 tok/s 3549.9 8.445 4x 174.9 tok/s 3.78x 321.2 tok/s 80.3 tok/s 6393.9 15.679 Benchmark Model: gpt-oss-20b-MXFP4-Q8 ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1687.6 24.70 606.8 tok/s 40.8 tok/s 4.824 238.8 tok/s 11.67 GB pp4096/tg128 4088.8 26.44 1001.8 tok/s 38.1 tok/s 7.446 567.3 tok/s 11.75 GB Continuous Batching pp1024 / 
tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 40.8 tok/s 1.00x 606.8 tok/s 606.8 tok/s 1687.6 4.824 2x 82.1 tok/s 2.01x 359.0 tok/s 179.5 tok/s 3489.1 8.822 4x 159.5 tok/s 3.91x 293.2 tok/s 73.3 tok/s 7335.0 17.180
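Codex recommended the Samsung 990 EVO Plus 1TB / 2TB (M.2 2280 NVMe). I checked Taobao and the 1TB version costs 1,400 yuan. I only have 3,000 yuan to my name right now, so I really can't afford it. Could any fellow forum members suggest another brand I could buy instead?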
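IT之家 (IT Home), June 6: Lenovo's ThinkPad T16 2025 laptop (ThinkPad T16 Gen4) went on sale yesterday. It is powered by an Intel Core Ultra 7 255H processor, and the 32GB + 1TB configuration is priced at 13,999 yuan.

The laptop pairs the 16-core, 16-thread Core Ultra 7 255H with 32GB of dual-channel DDR5 6400MT/s memory and a 1TB PCIe 4.0 M.2 2280 SSD.

It has a 16-inch IPS display with a resolution of 1920×1200, 500-nit brightness and 100% sRGB coverage. It weighs 1.76kg, is 18.14mm thick and has a 52.5Whr battery.

For connectivity, it offers two Thunderbolt 4 ports, two USB-A 3.2 Gen1 ports, one HDMI 2.1 port, one RJ45 Ethernet port and one 3.5mm headphone/microphone combo jack. It also supports facial-recognition login and fingerprint recognition.

IT之家 also attached a table of the laptop's main specifications (not preserved).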
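IT之家 (IT Home), June 3: KIOXIA held its 2026 Investor Day yesterday, local time.

The company announced that it will begin sampling BiCS10 1Tb TLC NAND this summer. This flash will be used in its next-generation CM-series enterprise SSDs, which support PCIe Gen6.

BiCS10 uses a 332-layer design, which is fewer layers than some competitors' products. Kioxia explained the design choice by saying that increasing the layer count alone can no longer improve the cost structure of 3D NAND. Taller stacks mean greater process complexity and higher manufacturing costs, and they also hurt power efficiency and reliability.

According to Kioxia, the 332-layer BiCS10 delivers 10% lower cost, 10% better power efficiency and a 35% reliability advantage compared with the 400-layer product it projected.

At the market level, Kioxia forecasts that total NAND bit shipments will grow at a 22% CAGR from 2026 to 2028. Within that, it expects the data-center segment to grow at a 46% CAGR and the AI-inference sub-segment at 86%. Demand from PCs and smartphones, by contrast, is expected to be flat or to decline slightly.

Kioxia also believes the NAND market's supply-demand imbalance is now at its most severe. The situation should gradually improve, but supply will remain short of demand until the end of 2027.

Against this backdrop, Kioxia is focusing on its data-center and enterprise business. It aims to raise that segment's share of total revenue to more than 60% and to improve profitability through high-value-added products, while keeping PC and smartphone revenue stable.

For fiscal 2026 to 2028, Kioxia's average capital expenditure and average R&D spending will each be more than 60% above fiscal 2025 levels. The company will move quickly on process upgrades, targeting a front-end cost reduction per unit of capacity of roughly 10% each year.

IT之家 has learned that Kioxia is evaluating a further expansion of its Kitakami production site, with production targeted to start after fiscal 2029.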
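CAGR figures compound, so the 22%, 46% and 86% rates above imply very different cumulative growth. The same formula with a negative rate covers the targeted cost reduction per unit of capacity of roughly 10% a year. Below is a minimal Python sketch of the compounding arithmetic. The three-year horizon is an illustrative assumption, because the article does not state the base year. The `compound` helper is hypothetical.

```python
def compound(rate: float, years: int) -> float:
    """Return the multiplier after `years` periods of a constant annual rate.

    rate = 0.22 means +22% per year; rate = -0.10 means a 10% reduction per year.
    """
    return (1.0 + rate) ** years


# Three years of compounding is an assumption for illustration, not a Kioxia figure.
YEARS = 3

scenarios = {
    "NAND bit shipments overall (22% CAGR)": 0.22,
    "Data center (46% CAGR)": 0.46,
    "AI inference (86% CAGR)": 0.86,
    "Front-end cost per unit capacity (-10%/yr)": -0.10,
}

for label, rate in scenarios.items():
    print(f"{label}: x{compound(rate, YEARS):.2f} after {YEARS} years")
```

Under this assumption, a 22% CAGR works out to roughly 1.8x growth over three years, and 86% works out to more than 6x. Three successive 10% cost cuts leave the cost at about 73% of its starting level, not 70%.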