Conclusion first: it runs, but it can't run for long stretches. The main problem is cooling. An external fan stand doesn't really fix it either: under heavy load the temperature climbs quickly, and sustained high temperature makes the machine throttle. If you want portability plus productivity, I'd recommend getting a MacBook Pro instead.

I installed two platforms, ollama and oMLX. In my testing oMLX was faster. Given the machine's 32 GB of memory, the models you run should not exceed 22 GB.

Below are the download sizes of some mainstream models and the oMLX benchmark results, for reference.

Qwen3.5-4B-MLX-4bit       2.85 GB
gemma-4-26b-a4b-it-4bit   14.57 GB
Qwen3.6-35B-A3B-4bit      15.13 GB
GLM-4.7-Flash-4bit        15.71 GB
gpt-oss-20b-MXFP4-Q8      11.27 GB

oMLX - LLM inference, optimized for your Mac

Benchmark Model: Qwen3.5-4B-MLX-4bit
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test          TTFT(ms)  TPOT(ms)  pp TPS        tg TPS      E2E(s)  Throughput   Peak Mem
pp1024/tg128  1001.6    22.74     1022.4 tok/s  44.3 tok/s  3.889   296.2 tok/s  3.29 GB
pp4096/tg128  3540.9    23.76     1156.8 tok/s  42.4 tok/s  6.558   644.1 tok/s  3.90 GB

Continuous Batching pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     44.3 tok/s   1.00x    1022.4 tok/s  1022.4 tok/s  1001.6    3.889
2x     88.3 tok/s   1.99x    407.6 tok/s   203.8 tok/s   3040.1    7.924
4x     175.1 tok/s  3.95x    322.7 tok/s   80.7 tok/s    6833.9    15.617

Benchmark Model: gemma-4-26b-a4b-it-4bit
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test          TTFT(ms)  TPOT(ms)  pp TPS        tg TPS      E2E(s)  Throughput   Peak Mem
pp1024/tg128  1500.5    24.21     682.4 tok/s   41.6 tok/s  4.575   251.8 tok/s  14.23 GB
pp4096/tg128  4863.4    25.14     842.2 tok/s   40.1 tok/s  8.056   524.3 tok/s  14.91 GB

Continuous Batching pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     41.6 tok/s   1.00x    682.4 tok/s   682.4 tok/s   1500.5    4.575
2x     82.5 tok/s   1.98x    361.6 tok/s   180.8 tok/s   3495.8    8.767
4x     166.1 tok/s  3.99x    283.4 tok/s   70.8 tok/s    7840.6    17.536

Benchmark Model: Qwen3.6-35B-A3B-4bit
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test          TTFT(ms)  TPOT(ms)  pp TPS        tg TPS      E2E(s)  Throughput   Peak Mem
pp1024/tg128  1676.1    17.20     610.9 tok/s   58.6 tok/s  3.860   298.4 tok/s  18.80 GB
pp4096/tg128  5046.3    17.93     811.7 tok/s   56.2 tok/s  7.323   576.8 tok/s  19.24 GB

Continuous Batching pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     58.6 tok/s   1.00x    610.9 tok/s   610.9 tok/s   1676.1    3.860
2x     116.2 tok/s  1.98x    435.5 tok/s   217.8 tok/s   2973.7    6.907
4x     230.7 tok/s  3.94x    352.0 tok/s   88.0 tok/s    6445.2    13.855

Benchmark Model: GLM-4.7-Flash-4bit
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test          TTFT(ms)  TPOT(ms)  pp TPS        tg TPS      E2E(s)  Throughput   Peak Mem
pp1024/tg128  1985.0    21.78     515.9 tok/s   46.3 tok/s  4.752   242.4 tok/s  16.27 GB
pp4096/tg128  6839.2    27.31     598.9 tok/s   36.9 tok/s  10.307  409.8 tok/s  17.34 GB

Continuous Batching pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     46.3 tok/s   1.00x    515.9 tok/s   515.9 tok/s   1985.0    4.752
2x     91.5 tok/s   1.98x    362.7 tok/s   181.3 tok/s   3549.9    8.445
4x     174.9 tok/s  3.78x    321.2 tok/s   80.3 tok/s    6393.9    15.679

Benchmark Model: gpt-oss-20b-MXFP4-Q8
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test          TTFT(ms)  TPOT(ms)  pp TPS        tg TPS      E2E(s)  Throughput   Peak Mem
pp1024/tg128  1687.6    24.70     606.8 tok/s   40.8 tok/s  4.824   238.8 tok/s  11.67 GB
pp4096/tg128  4088.8    26.44     1001.8 tok/s  38.1 tok/s  7.446   567.3 tok/s  11.75 GB

Continuous Batching pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     40.8 tok/s   1.00x    606.8 tok/s   606.8 tok/s   1687.6    4.824
2x     82.1 tok/s   2.01x    359.0 tok/s   179.5 tok/s   3489.1    8.822
4x     159.5 tok/s  3.91x    293.2 tok/s   73.3 tok/s    7335.0    17.180
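For anyone who wants to sanity-check the numbers: the single-request columns look like they are tied together by simple arithmetic. The Python sketch below recomputes pp TPS, tg TPS and Throughput from TTFT and E2E for the pp1024/tg128 rows, and applies the 22 GB rule of thumb to the download sizes. The three formulas are inferred from the reported figures, not taken from oMLX documentation, and the names `SingleRun`, `RUNS` and `MODEL_SIZE_LIMIT_GB` are made up for this illustration.

```python
from dataclasses import dataclass

# Rule of thumb from the post: on a 32 GB machine, keep models at or under 22 GB.
MODEL_SIZE_LIMIT_GB = 22.0


@dataclass(frozen=True)
class SingleRun:
    """One pp1024/tg128 row copied from the single-request results."""
    model: str
    download_gb: float
    ttft_ms: float
    e2e_s: float
    reported_pp_tps: float
    reported_tg_tps: float
    reported_throughput: float
    prompt_tokens: int = 1024
    gen_tokens: int = 128

    def pp_tps(self) -> float:
        # Assumed relation: prompt tokens divided by time to first token.
        return self.prompt_tokens / (self.ttft_ms / 1000.0)

    def tg_tps(self) -> float:
        # Assumed relation: generated tokens divided by the time after TTFT.
        return self.gen_tokens / (self.e2e_s - self.ttft_ms / 1000.0)

    def throughput(self) -> float:
        # Assumed relation: all tokens (prompt + generated) over end-to-end time.
        return (self.prompt_tokens + self.gen_tokens) / self.e2e_s

    def fits(self) -> bool:
        # Compare the download size against the post's 22 GB guideline.
        return self.download_gb <= MODEL_SIZE_LIMIT_GB


RUNS = [
    SingleRun("Qwen3.5-4B-MLX-4bit", 2.85, 1001.6, 3.889, 1022.4, 44.3, 296.2),
    SingleRun("gemma-4-26b-a4b-it-4bit", 14.57, 1500.5, 4.575, 682.4, 41.6, 251.8),
    SingleRun("Qwen3.6-35B-A3B-4bit", 15.13, 1676.1, 3.860, 610.9, 58.6, 298.4),
    SingleRun("GLM-4.7-Flash-4bit", 15.71, 1985.0, 4.752, 515.9, 46.3, 242.4),
    SingleRun("gpt-oss-20b-MXFP4-Q8", 11.27, 1687.6, 4.824, 606.8, 40.8, 238.8),
]


def main() -> None:
    for run in RUNS:
        print(
            f"{run.model:<26}"
            f" pp {run.pp_tps():7.1f} (reported {run.reported_pp_tps})"
            f" tg {run.tg_tps():5.1f} (reported {run.reported_tg_tps})"
            f" total {run.throughput():6.1f} (reported {run.reported_throughput})"
            f" fits={run.fits()}"
        )


if __name__ == "__main__":
    main()
```

If these relations are right, the Throughput column counts prompt tokens as well as generated ones, so tg TPS is the figure that matches how fast text actually appears in a chat.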
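IT之家 (ITHome), June 8: At COMPUTEX 2026 in Taipei last week, Cooler Master announced two new cases, the HAF II 500 and the Silencio 600.

The new-generation HAF 500 case had already appeared in a preview of upcoming products. It ships with three pre-installed thick, large-format fans from the Mighty40 series, two 22040 units at the front and one 18040 at the rear, and relies on their size to deliver strong airflow.

The Silencio 600 silent case combines acoustic engineering, high-airflow cooling and a minimalist design. Its distinctive front-panel structure forms an acoustic labyrinth that "lets air through but not sound", cutting noise while keeping cooling performance.

Cooler Master also showed more limited-edition products using a gold, industrial-grade 3D Mesh design. Besides the COSMOS Alpha Gold and the MasterFrame 400 Mesh Gold, these include the MasterLiquid Atmos II Gold liquid cooler and an air cooler.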
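With NVIDIA's latest AI platform, Vera Rubin, entering mass production, competition among SK hynix, Samsung and Micron is shifting from a race over layer counts to solving hard engineering problems, and thermal management inside the chip has become the key battleground of the HBM5 era.

AI hardware is iterating faster, and the per-chip power draw of the next-generation AI server GPUs from NVIDIA and AMD is approaching 1000 W. HBM4 already stacks 12 to 16 layers, and HBM5 will move to 20-layer stacks. The more layers are stacked, the more heat builds up inside the HBM, and overheating triggers throttling, reduced compute and lower system stability. Customers such as NVIDIA and AMD have explicitly asked HBM suppliers to improve thermal management.

SK hynix recently announced its iHBM cooling technology, which embeds integrated cooling elements in the HBM to open direct heat-dissipation paths inside the chip. Compared with conventional designs, it reduces thermal resistance by more than 30%. SK hynix plans to use iHBM in HBM5 and later products.

Samsung Electronics showed an HBM5 prototype publicly for the first time at Computex 2026 and introduced its HPB cooling solution, which buries thermally conductive blocks between the stacked DRAM dies, in effect building several independent heat chimneys inside the stack. The technology has been validated on seventh-generation HBM4E, and the first samples were delivered to customers at the end of May. Samsung says it reduces thermal resistance by 16%, and expects HBM5 to reach mass production around 2028.

Micron is focusing on low-power HBM design, complemented by through-silicon-via trench cooling. Micro-trenches etched inside the silicon of the AI accelerator chip let coolant circulate through them, reducing internal heat buildup.

Industry insiders say the upgrade in cooling technology will drive a surge in demand for high-thermal-conductivity materials and advanced packaging processes, reshaping the semiconductor supply chain. Low power and thermal management will be the core directions of future HBM development.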
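IT Home, June 8: The Cooler Master 冰神 G360 龙影 2 代 (Gen 2) Performance Edition all-in-one liquid cooler is now on sale at JD.com, priced at 599 yuan. The pump head in this series comes with a magnetic top-cover kit, including a VRM fan top-cover option to improve cooling; that fan cover has a maximum speed of 4000 RPM ±10%, a maximum airflow of 9.3 CFM, and a maximum static pressure of 2.8 mm H2O. The cooler is paired with MOBIUS 120P main fans measuring 120 (L) x 120 (W) x 25 (H) mm, with a maximum speed of 2400 RPM ±10%, an airflow of 127.8 m³/h (75.2 CFM), and a noise level below 30 dB. The cooler uses an ultra-thin pump design with EPDM tubing and a 29 mm thick aluminum radiator, and its mounting hardware is compatible with Intel LGA 1851/1700/1200/1151/1155/1156 and AMD AM4/AM5 platforms.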
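The fan figures above mix CFM, m³/h, and mm H2O. Both CFM and m³/h are volumetric airflow units, so 75.2 CFM and 127.8 m³/h should describe the same quantity. A minimal sketch for cross-checking such spec-sheet numbers follows; the function names are illustrative, and the mm H2O factor assumes the conventional definition (1 mm H2O = 9.80665 Pa).

```python
FOOT_IN_METRES = 0.3048  # exact by definition
CFM_TO_M3_PER_HOUR = FOOT_IN_METRES ** 3 * 60  # ft^3/min -> m^3/h
MM_H2O_TO_PASCAL = 9.80665  # conventional millimetre of water


def cfm_to_m3_per_hour(cfm: float) -> float:
    """Convert airflow from cubic feet per minute to cubic metres per hour."""
    return cfm * CFM_TO_M3_PER_HOUR


def mm_h2o_to_pascal(mm_h2o: float) -> float:
    """Convert static pressure from mm H2O to pascals."""
    return mm_h2o * MM_H2O_TO_PASCAL


if __name__ == "__main__":
    print(f"75.2 CFM   = {cfm_to_m3_per_hour(75.2):.1f} m^3/h")
    print(f"9.3 CFM    = {cfm_to_m3_per_hour(9.3):.1f} m^3/h")
    print(f"2.8 mm H2O = {mm_h2o_to_pascal(2.8):.1f} Pa")
```

The first conversion comes out at 127.8 m³/h after rounding, matching the quoted pair.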
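IT Home, June 7: Cooling brand RYVNTEC (睿温) recently announced the Airjet 360, a "universal" cross-flow fan for PC cases. Judging by the name, it should fit a case's 3×120mm fan position or 360 radiator position. RYVNTEC's website says the Airjet 360 targets 1U / 2U server systems; judging by the RGB lighting on both sides, however, its gaming orientation is more apparent. The cross-flow fan contains a pair of fan units, runs on 12 V DC, and supports PWM control. IT Home notes that cross-flow fans have previously been used in a few cases, but only as pre-installed accessories; the market potential of a universal PC cross-flow fan is worth watching.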
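IT Home, June 7: amiiba, a new PC case, power supply, and cooling brand, "exited stealth mode" just before COMPUTEX 2026 and brought a range of products to this year's show in Taipei. Both the Ferra case and the Proteus liquid cooler integrate a decorative ferrofluid module on the exterior. When a magnet is brought close, the magnetic fluid inside the module deforms under the magnetic force, and users can vary the magnet's position and distance to make the ferrofluid take on different three-dimensional shapes. The Ferra is a micro ATX "mini tower" case with a volume of 31 L; it supports graphics cards up to 418 mm and 360 radiators, fits seven 120 mm fans, and has dust filters on four sides. In its standard form the front panel is solid wood plus aluminum alloy, with gypsum board as an option; a version without the ferrofluid module is also available. The Proteus liquid cooler uses a quiet ceramic pump, and the decorative unit on its pump head supports music-reactive ARGB lighting. It uses a radiator with a thickened 20 mm fin array and thickened 12028-size fans. At COMPUTEX, amiiba also showed the Leuceus, an AIO liquid cooler with a swappable pump-head module, and the Vitalis power supply, whose fan grille follows the Proteus's "metal skeleton" style; the Vitalis is 80 PLUS Titanium certified and is rated up to 2000 W.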