To start with the conclusion: it runs, but it can't run for long stretches. The main problem is cooling, and an external fan stand doesn't really solve it. Under heavy load the temperature climbs quickly, and sustained high temperatures make the machine throttle. If you need both portability and productivity, I'd recommend a MacBook Pro instead.

I installed two platforms, Ollama and oMLX. In my testing oMLX was somewhat faster. Given this machine's 32 GB of memory, keep the models you run under 22 GB.

For reference, here are the download sizes of some mainstream models, followed by their oMLX benchmark results:

Qwen3.5-4B-MLX-4bit      2.85 GB
gemma-4-26b-a4b-it-4bit  14.57 GB
Qwen3.6-35B-A3B-4bit     15.13 GB
GLM-4.7-Flash-4bit       15.71 GB
gpt-oss-20b-MXFP4-Q8     11.27 GB

Column key: pp = prompt processing (prefill) and tg = token generation, so pp1024/tg128 is a 1024-token prompt followed by 128 generated tokens. TTFT = time to first token, TPOT = time per output token, E2E = end-to-end latency.

oMLX - LLM inference, optimized for your Mac

Benchmark Model: Qwen3.5-4B-MLX-4bit
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test          TTFT(ms)  TPOT(ms)  pp TPS        tg TPS      E2E(s)  Throughput   Peak Mem
pp1024/tg128  1001.6    22.74     1022.4 tok/s  44.3 tok/s  3.889   296.2 tok/s  3.29 GB
pp4096/tg128  3540.9    23.76     1156.8 tok/s  42.4 tok/s  6.558   644.1 tok/s  3.90 GB

Continuous Batching pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     44.3 tok/s   1.00x    1022.4 tok/s  1022.4 tok/s  1001.6    3.889
2x     88.3 tok/s   1.99x    407.6 tok/s   203.8 tok/s   3040.1    7.924
4x     175.1 tok/s  3.95x    322.7 tok/s   80.7 tok/s    6833.9    15.617

Benchmark Model: gemma-4-26b-a4b-it-4bit
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test          TTFT(ms)  TPOT(ms)  pp TPS        tg TPS      E2E(s)  Throughput   Peak Mem
pp1024/tg128  1500.5    24.21     682.4 tok/s   41.6 tok/s  4.575   251.8 tok/s  14.23 GB
pp4096/tg128  4863.4    25.14     842.2 tok/s   40.1 tok/s  8.056   524.3 tok/s  14.91 GB

Continuous Batching pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     41.6 tok/s   1.00x    682.4 tok/s   682.4 tok/s   1500.5    4.575
2x     82.5 tok/s   1.98x    361.6 tok/s   180.8 tok/s   3495.8    8.767
4x     166.1 tok/s  3.99x    283.4 tok/s   70.8 tok/s    7840.6    17.536

Benchmark Model: Qwen3.6-35B-A3B-4bit
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test          TTFT(ms)  TPOT(ms)  pp TPS        tg TPS      E2E(s)  Throughput   Peak Mem
pp1024/tg128  1676.1    17.20     610.9 tok/s   58.6 tok/s  3.860   298.4 tok/s  18.80 GB
pp4096/tg128  5046.3    17.93     811.7 tok/s   56.2 tok/s  7.323   576.8 tok/s  19.24 GB

Continuous Batching pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     58.6 tok/s   1.00x    610.9 tok/s   610.9 tok/s   1676.1    3.860
2x     116.2 tok/s  1.98x    435.5 tok/s   217.8 tok/s   2973.7    6.907
4x     230.7 tok/s  3.94x    352.0 tok/s   88.0 tok/s    6445.2    13.855

Benchmark Model: GLM-4.7-Flash-4bit
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test          TTFT(ms)  TPOT(ms)  pp TPS        tg TPS      E2E(s)  Throughput   Peak Mem
pp1024/tg128  1985.0    21.78     515.9 tok/s   46.3 tok/s  4.752   242.4 tok/s  16.27 GB
pp4096/tg128  6839.2    27.31     598.9 tok/s   36.9 tok/s  10.307  409.8 tok/s  17.34 GB

Continuous Batching pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     46.3 tok/s   1.00x    515.9 tok/s   515.9 tok/s   1985.0    4.752
2x     91.5 tok/s   1.98x    362.7 tok/s   181.3 tok/s   3549.9    8.445
4x     174.9 tok/s  3.78x    321.2 tok/s   80.3 tok/s    6393.9    15.679

Benchmark Model: gpt-oss-20b-MXFP4-Q8
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test          TTFT(ms)  TPOT(ms)  pp TPS        tg TPS      E2E(s)  Throughput   Peak Mem
pp1024/tg128  1687.6    24.70     606.8 tok/s   40.8 tok/s  4.824   238.8 tok/s  11.67 GB
pp4096/tg128  4088.8    26.44     1001.8 tok/s  38.1 tok/s  7.446   567.3 tok/s  11.75 GB

Continuous Batching pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     40.8 tok/s   1.00x    606.8 tok/s   606.8 tok/s   1687.6    4.824
2x     82.1 tok/s   2.01x    359.0 tok/s   179.5 tok/s   3489.1    8.822
4x     159.5 tok/s  3.91x    293.2 tok/s   73.3 tok/s    7335.0    17.180
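The benchmark output doesn't say how its columns relate to each other. Checked by hand against the single-request rows, the numbers are consistent with the formulas in this sketch. It is reverse-engineered from the posted results, not taken from oMLX's code, and the `SingleRun`, `derived_metrics`, and `batching_summary` names are hypothetical. One thing it makes visible: tg TPS is generated tokens divided by the time after the first token, so it is not simply 1000 / TPOT.

```python
from dataclasses import dataclass


@dataclass
class SingleRun:
    """One single-request run such as pp1024/tg128 (hypothetical container)."""
    prompt_tokens: int  # the "pp" count: tokens prefilled before generation
    gen_tokens: int     # the "tg" count: tokens generated
    ttft_s: float       # time to first token, in seconds
    e2e_s: float        # end-to-end latency, in seconds


def derived_metrics(run: SingleRun) -> dict:
    """Recompute the derived columns from raw timings (formulas inferred, not official)."""
    decode_s = run.e2e_s - run.ttft_s  # time spent after the first token arrives
    return {
        "pp_tps": run.prompt_tokens / run.ttft_s,           # prefill speed
        "tg_tps": run.gen_tokens / decode_s,                # decode speed as reported
        "tpot_ms": 1000 * decode_s / (run.gen_tokens - 1),  # average gap between consecutive output tokens
        "throughput_tps": (run.prompt_tokens + run.gen_tokens) / run.e2e_s,  # all tokens over E2E
    }


def batching_summary(batch: int, tg_tps: float, pp_tps: float, tg_tps_single: float) -> dict:
    """Speedup and per-request prefill rate for one Continuous Batching row."""
    return {
        "speedup": tg_tps / tg_tps_single,  # aggregate decode rate relative to the 1x row
        "pp_tps_per_req": pp_tps / batch,   # aggregate prefill rate split across requests
    }


# Qwen3.5-4B-MLX-4bit, pp1024/tg128 row from the table above
row = derived_metrics(SingleRun(prompt_tokens=1024, gen_tokens=128, ttft_s=1.0016, e2e_s=3.889))
print({name: round(value, 2) for name, value in row.items()})
```

Rounded to the table's precision, this reproduces the Qwen3.5-4B pp1024/tg128 row: about 1022.4 tok/s prefill, 44.3 tok/s decode, 22.74 ms TPOT and 296.2 tok/s overall throughput. Likewise `batching_summary(4, 175.1, 322.7, 44.3)` gives the 3.95x speedup and roughly 80.7 tok/s of prefill per request shown in the 4x row.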
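The post doesn't say how the 22 GB ceiling was arrived at, but the tables do show that download size understates what a loaded model occupies: Peak Mem is higher for every model, by about 4 GB for Qwen3.6-35B-A3B. This minimal sketch compares peak memory (pp4096/tg128 row) rather than file size against that 22 GB budget. The data is copied from the tables; `headroom_report` is a hypothetical helper, and attributing the extra memory to KV cache and runtime buffers is an assumption, since the benchmark doesn't break it down.

```python
# Hypothetical helper: checks the post's 22 GB rule of thumb against the posted numbers.
BUDGET_GB = 22.0  # the author's suggested ceiling for a 32 GB machine

# name: (download size in GB, Peak Mem in GB from the pp4096/tg128 row)
MODELS = {
    "Qwen3.5-4B-MLX-4bit": (2.85, 3.90),
    "gemma-4-26b-a4b-it-4bit": (14.57, 14.91),
    "Qwen3.6-35B-A3B-4bit": (15.13, 19.24),
    "GLM-4.7-Flash-4bit": (15.71, 17.34),
    "gpt-oss-20b-MXFP4-Q8": (11.27, 11.75),
}


def headroom_report(models: dict, budget_gb: float = BUDGET_GB) -> list:
    """Return (name, peak minus download, budget minus peak) tuples, tightest fit first."""
    rows = []
    for name, (download_gb, peak_gb) in models.items():
        # Memory used beyond the weights file (presumably KV cache and buffers; an assumption)
        overhead_gb = peak_gb - download_gb
        # Room left under the budget at this prompt length
        headroom_gb = budget_gb - peak_gb
        rows.append((name, round(overhead_gb, 2), round(headroom_gb, 2)))
    return sorted(rows, key=lambda row: row[2])


for name, overhead_gb, headroom_gb in headroom_report(MODELS):
    print(f"{name:24s} +{overhead_gb:.2f} GB over download, {headroom_gb:.2f} GB under budget")
```

Sorted this way, Qwen3.6-35B-A3B-4bit is the tightest fit at roughly 2.8 GB under the budget with a 4096-token prompt, so it would likely be the first model to run into memory pressure with longer contexts.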
先说结论,能跑,但没办法长期跑,主要问题是散热,外挂风扇支架也不太能解决问题,高强度跑温度上升快,持续高温机器会降频。如果考虑便携+生产力,推荐上 mac book pro 吧。 装了两个平台,ollama 跟 olmx ,测试下来,olmx 平台会更快些,考虑到机器 32G 的内存,能跑的模型大小不要超 22GB 附上部分主流模型下载容量大小及 olmx 平台测试结果给大家做参考 Qwen3.5-4B-MLX-4bit 2.85GB gemma-4-26b-a4b-it-4bit 14.57GB Qwen3.6-35B-A3B-4bit 15.13GB GLM-4.7-Flash-4bit 15.71GB gpt-oss-20b-MXFP4-Q8 11.27GB oMLX - LLM inference, optimized for your Mac Benchmark Model: Qwen3.5-4B-MLX-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1001.6 22.74 1022.4 tok/s 44.3 tok/s 3.889 296.2 tok/s 3.29 GB pp4096/tg128 3540.9 23.76 1156.8 tok/s 42.4 tok/s 6.558 644.1 tok/s 3.90 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 44.3 tok/s 1.00x 1022.4 tok/s 1022.4 tok/s 1001.6 3.889 2x 88.3 tok/s 1.99x 407.6 tok/s 203.8 tok/s 3040.1 7.924 4x 175.1 tok/s 3.95x 322.7 tok/s 80.7 tok/s 6833.9 15.617 Benchmark Model: gemma-4-26b-a4b-it-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1500.5 24.21 682.4 tok/s 41.6 tok/s 4.575 251.8 tok/s 14.23 GB pp4096/tg128 4863.4 25.14 842.2 tok/s 40.1 tok/s 8.056 524.3 tok/s 14.91 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 41.6 tok/s 1.00x 682.4 tok/s 682.4 tok/s 1500.5 4.575 2x 82.5 tok/s 1.98x 361.6 tok/s 180.8 tok/s 3495.8 8.767 4x 166.1 tok/s 3.99x 283.4 tok/s 70.8 tok/s 7840.6 17.536 Benchmark Model: Qwen3.6-35B-A3B-4bit 
================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1676.1 17.20 610.9 tok/s 58.6 tok/s 3.860 298.4 tok/s 18.80 GB pp4096/tg128 5046.3 17.93 811.7 tok/s 56.2 tok/s 7.323 576.8 tok/s 19.24 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 58.6 tok/s 1.00x 610.9 tok/s 610.9 tok/s 1676.1 3.860 2x 116.2 tok/s 1.98x 435.5 tok/s 217.8 tok/s 2973.7 6.907 4x 230.7 tok/s 3.94x 352.0 tok/s 88.0 tok/s 6445.2 13.855 Benchmark Model: GLM-4.7-Flash-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1985.0 21.78 515.9 tok/s 46.3 tok/s 4.752 242.4 tok/s 16.27 GB pp4096/tg128 6839.2 27.31 598.9 tok/s 36.9 tok/s 10.307 409.8 tok/s 17.34 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 46.3 tok/s 1.00x 515.9 tok/s 515.9 tok/s 1985.0 4.752 2x 91.5 tok/s 1.98x 362.7 tok/s 181.3 tok/s 3549.9 8.445 4x 174.9 tok/s 3.78x 321.2 tok/s 80.3 tok/s 6393.9 15.679 Benchmark Model: gpt-oss-20b-MXFP4-Q8 ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1687.6 24.70 606.8 tok/s 40.8 tok/s 4.824 238.8 tok/s 11.67 GB pp4096/tg128 4088.8 26.44 1001.8 tok/s 38.1 tok/s 7.446 567.3 tok/s 11.75 GB Continuous Batching pp1024 / 
tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 40.8 tok/s 1.00x 606.8 tok/s 606.8 tok/s 1687.6 4.824 2x 82.1 tok/s 2.01x 359.0 tok/s 179.5 tok/s 3489.1 8.822 4x 159.5 tok/s 3.91x 293.2 tok/s 73.3 tok/s 7335.0 17.180
先说结论,能跑,但没办法长期跑,主要问题是散热,外挂风扇支架也不太能解决问题,高强度跑温度上升快,持续高温机器会降频。如果考虑便携+生产力,推荐上 mac book pro 吧。 装了两个平台,ollama 跟 olmx ,测试下来,olmx 平台会更快些,考虑到机器 32G 的内存,能跑的模型大小不要超 22GB 附上部分主流模型下载容量大小及 olmx 平台测试结果给大家做参考 Qwen3.5-4B-MLX-4bit 2.85GB gemma-4-26b-a4b-it-4bit 14.57GB Qwen3.6-35B-A3B-4bit 15.13GB GLM-4.7-Flash-4bit 15.71GB gpt-oss-20b-MXFP4-Q8 11.27GB oMLX - LLM inference, optimized for your Mac Benchmark Model: Qwen3.5-4B-MLX-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1001.6 22.74 1022.4 tok/s 44.3 tok/s 3.889 296.2 tok/s 3.29 GB pp4096/tg128 3540.9 23.76 1156.8 tok/s 42.4 tok/s 6.558 644.1 tok/s 3.90 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 44.3 tok/s 1.00x 1022.4 tok/s 1022.4 tok/s 1001.6 3.889 2x 88.3 tok/s 1.99x 407.6 tok/s 203.8 tok/s 3040.1 7.924 4x 175.1 tok/s 3.95x 322.7 tok/s 80.7 tok/s 6833.9 15.617 Benchmark Model: gemma-4-26b-a4b-it-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1500.5 24.21 682.4 tok/s 41.6 tok/s 4.575 251.8 tok/s 14.23 GB pp4096/tg128 4863.4 25.14 842.2 tok/s 40.1 tok/s 8.056 524.3 tok/s 14.91 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 41.6 tok/s 1.00x 682.4 tok/s 682.4 tok/s 1500.5 4.575 2x 82.5 tok/s 1.98x 361.6 tok/s 180.8 tok/s 3495.8 8.767 4x 166.1 tok/s 3.99x 283.4 tok/s 70.8 tok/s 7840.6 17.536 Benchmark Model: Qwen3.6-35B-A3B-4bit 
================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1676.1 17.20 610.9 tok/s 58.6 tok/s 3.860 298.4 tok/s 18.80 GB pp4096/tg128 5046.3 17.93 811.7 tok/s 56.2 tok/s 7.323 576.8 tok/s 19.24 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 58.6 tok/s 1.00x 610.9 tok/s 610.9 tok/s 1676.1 3.860 2x 116.2 tok/s 1.98x 435.5 tok/s 217.8 tok/s 2973.7 6.907 4x 230.7 tok/s 3.94x 352.0 tok/s 88.0 tok/s 6445.2 13.855 Benchmark Model: GLM-4.7-Flash-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1985.0 21.78 515.9 tok/s 46.3 tok/s 4.752 242.4 tok/s 16.27 GB pp4096/tg128 6839.2 27.31 598.9 tok/s 36.9 tok/s 10.307 409.8 tok/s 17.34 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 46.3 tok/s 1.00x 515.9 tok/s 515.9 tok/s 1985.0 4.752 2x 91.5 tok/s 1.98x 362.7 tok/s 181.3 tok/s 3549.9 8.445 4x 174.9 tok/s 3.78x 321.2 tok/s 80.3 tok/s 6393.9 15.679 Benchmark Model: gpt-oss-20b-MXFP4-Q8 ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1687.6 24.70 606.8 tok/s 40.8 tok/s 4.824 238.8 tok/s 11.67 GB pp4096/tg128 4088.8 26.44 1001.8 tok/s 38.1 tok/s 7.446 567.3 tok/s 11.75 GB Continuous Batching pp1024 / 
tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 40.8 tok/s 1.00x 606.8 tok/s 606.8 tok/s 1687.6 4.824 2x 82.1 tok/s 2.01x 359.0 tok/s 179.5 tok/s 3489.1 8.822 4x 159.5 tok/s 3.91x 293.2 tok/s 73.3 tok/s 7335.0 17.180
先说结论,能跑,但没办法长期跑,主要问题是散热,外挂风扇支架也不太能解决问题,高强度跑温度上升快,持续高温机器会降频。如果考虑便携+生产力,推荐上 mac book pro 吧。 装了两个平台,ollama 跟 olmx ,测试下来,olmx 平台会更快些,考虑到机器 32G 的内存,能跑的模型大小不要超 22GB 附上部分主流模型下载容量大小及 olmx 平台测试结果给大家做参考 Qwen3.5-4B-MLX-4bit 2.85GB gemma-4-26b-a4b-it-4bit 14.57GB Qwen3.6-35B-A3B-4bit 15.13GB GLM-4.7-Flash-4bit 15.71GB gpt-oss-20b-MXFP4-Q8 11.27GB oMLX - LLM inference, optimized for your Mac Benchmark Model: Qwen3.5-4B-MLX-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1001.6 22.74 1022.4 tok/s 44.3 tok/s 3.889 296.2 tok/s 3.29 GB pp4096/tg128 3540.9 23.76 1156.8 tok/s 42.4 tok/s 6.558 644.1 tok/s 3.90 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 44.3 tok/s 1.00x 1022.4 tok/s 1022.4 tok/s 1001.6 3.889 2x 88.3 tok/s 1.99x 407.6 tok/s 203.8 tok/s 3040.1 7.924 4x 175.1 tok/s 3.95x 322.7 tok/s 80.7 tok/s 6833.9 15.617 Benchmark Model: gemma-4-26b-a4b-it-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1500.5 24.21 682.4 tok/s 41.6 tok/s 4.575 251.8 tok/s 14.23 GB pp4096/tg128 4863.4 25.14 842.2 tok/s 40.1 tok/s 8.056 524.3 tok/s 14.91 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 41.6 tok/s 1.00x 682.4 tok/s 682.4 tok/s 1500.5 4.575 2x 82.5 tok/s 1.98x 361.6 tok/s 180.8 tok/s 3495.8 8.767 4x 166.1 tok/s 3.99x 283.4 tok/s 70.8 tok/s 7840.6 17.536 Benchmark Model: Qwen3.6-35B-A3B-4bit 
================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1676.1 17.20 610.9 tok/s 58.6 tok/s 3.860 298.4 tok/s 18.80 GB pp4096/tg128 5046.3 17.93 811.7 tok/s 56.2 tok/s 7.323 576.8 tok/s 19.24 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 58.6 tok/s 1.00x 610.9 tok/s 610.9 tok/s 1676.1 3.860 2x 116.2 tok/s 1.98x 435.5 tok/s 217.8 tok/s 2973.7 6.907 4x 230.7 tok/s 3.94x 352.0 tok/s 88.0 tok/s 6445.2 13.855 Benchmark Model: GLM-4.7-Flash-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1985.0 21.78 515.9 tok/s 46.3 tok/s 4.752 242.4 tok/s 16.27 GB pp4096/tg128 6839.2 27.31 598.9 tok/s 36.9 tok/s 10.307 409.8 tok/s 17.34 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 46.3 tok/s 1.00x 515.9 tok/s 515.9 tok/s 1985.0 4.752 2x 91.5 tok/s 1.98x 362.7 tok/s 181.3 tok/s 3549.9 8.445 4x 174.9 tok/s 3.78x 321.2 tok/s 80.3 tok/s 6393.9 15.679 Benchmark Model: gpt-oss-20b-MXFP4-Q8 ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1687.6 24.70 606.8 tok/s 40.8 tok/s 4.824 238.8 tok/s 11.67 GB pp4096/tg128 4088.8 26.44 1001.8 tok/s 38.1 tok/s 7.446 567.3 tok/s 11.75 GB Continuous Batching pp1024 / 
tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 40.8 tok/s 1.00x 606.8 tok/s 606.8 tok/s 1687.6 4.824 2x 82.1 tok/s 2.01x 359.0 tok/s 179.5 tok/s 3489.1 8.822 4x 159.5 tok/s 3.91x 293.2 tok/s 73.3 tok/s 7335.0 17.180
先说结论,能跑,但没办法长期跑,主要问题是散热,外挂风扇支架也不太能解决问题,高强度跑温度上升快,持续高温机器会降频。如果考虑便携+生产力,推荐上 mac book pro 吧。 装了两个平台,ollama 跟 olmx ,测试下来,olmx 平台会更快些,考虑到机器 32G 的内存,能跑的模型大小不要超 22GB 附上部分主流模型下载容量大小及 olmx 平台测试结果给大家做参考 Qwen3.5-4B-MLX-4bit 2.85GB gemma-4-26b-a4b-it-4bit 14.57GB Qwen3.6-35B-A3B-4bit 15.13GB GLM-4.7-Flash-4bit 15.71GB gpt-oss-20b-MXFP4-Q8 11.27GB oMLX - LLM inference, optimized for your Mac Benchmark Model: Qwen3.5-4B-MLX-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1001.6 22.74 1022.4 tok/s 44.3 tok/s 3.889 296.2 tok/s 3.29 GB pp4096/tg128 3540.9 23.76 1156.8 tok/s 42.4 tok/s 6.558 644.1 tok/s 3.90 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 44.3 tok/s 1.00x 1022.4 tok/s 1022.4 tok/s 1001.6 3.889 2x 88.3 tok/s 1.99x 407.6 tok/s 203.8 tok/s 3040.1 7.924 4x 175.1 tok/s 3.95x 322.7 tok/s 80.7 tok/s 6833.9 15.617 Benchmark Model: gemma-4-26b-a4b-it-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1500.5 24.21 682.4 tok/s 41.6 tok/s 4.575 251.8 tok/s 14.23 GB pp4096/tg128 4863.4 25.14 842.2 tok/s 40.1 tok/s 8.056 524.3 tok/s 14.91 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 41.6 tok/s 1.00x 682.4 tok/s 682.4 tok/s 1500.5 4.575 2x 82.5 tok/s 1.98x 361.6 tok/s 180.8 tok/s 3495.8 8.767 4x 166.1 tok/s 3.99x 283.4 tok/s 70.8 tok/s 7840.6 17.536 Benchmark Model: Qwen3.6-35B-A3B-4bit 
================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1676.1 17.20 610.9 tok/s 58.6 tok/s 3.860 298.4 tok/s 18.80 GB pp4096/tg128 5046.3 17.93 811.7 tok/s 56.2 tok/s 7.323 576.8 tok/s 19.24 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 58.6 tok/s 1.00x 610.9 tok/s 610.9 tok/s 1676.1 3.860 2x 116.2 tok/s 1.98x 435.5 tok/s 217.8 tok/s 2973.7 6.907 4x 230.7 tok/s 3.94x 352.0 tok/s 88.0 tok/s 6445.2 13.855 Benchmark Model: GLM-4.7-Flash-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1985.0 21.78 515.9 tok/s 46.3 tok/s 4.752 242.4 tok/s 16.27 GB pp4096/tg128 6839.2 27.31 598.9 tok/s 36.9 tok/s 10.307 409.8 tok/s 17.34 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 46.3 tok/s 1.00x 515.9 tok/s 515.9 tok/s 1985.0 4.752 2x 91.5 tok/s 1.98x 362.7 tok/s 181.3 tok/s 3549.9 8.445 4x 174.9 tok/s 3.78x 321.2 tok/s 80.3 tok/s 6393.9 15.679 Benchmark Model: gpt-oss-20b-MXFP4-Q8 ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1687.6 24.70 606.8 tok/s 40.8 tok/s 4.824 238.8 tok/s 11.67 GB pp4096/tg128 4088.8 26.44 1001.8 tok/s 38.1 tok/s 7.446 567.3 tok/s 11.75 GB Continuous Batching pp1024 / 
tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 40.8 tok/s 1.00x 606.8 tok/s 606.8 tok/s 1687.6 4.824 2x 82.1 tok/s 2.01x 359.0 tok/s 179.5 tok/s 3489.1 8.822 4x 159.5 tok/s 3.91x 293.2 tok/s 73.3 tok/s 7335.0 17.180
先说结论,能跑,但没办法长期跑,主要问题是散热,外挂风扇支架也不太能解决问题,高强度跑温度上升快,持续高温机器会降频。如果考虑便携+生产力,推荐上 mac book pro 吧。 装了两个平台,ollama 跟 olmx ,测试下来,olmx 平台会更快些,考虑到机器 32G 的内存,能跑的模型大小不要超 22GB 附上部分主流模型下载容量大小及 olmx 平台测试结果给大家做参考 Qwen3.5-4B-MLX-4bit 2.85GB gemma-4-26b-a4b-it-4bit 14.57GB Qwen3.6-35B-A3B-4bit 15.13GB GLM-4.7-Flash-4bit 15.71GB gpt-oss-20b-MXFP4-Q8 11.27GB oMLX - LLM inference, optimized for your Mac Benchmark Model: Qwen3.5-4B-MLX-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1001.6 22.74 1022.4 tok/s 44.3 tok/s 3.889 296.2 tok/s 3.29 GB pp4096/tg128 3540.9 23.76 1156.8 tok/s 42.4 tok/s 6.558 644.1 tok/s 3.90 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 44.3 tok/s 1.00x 1022.4 tok/s 1022.4 tok/s 1001.6 3.889 2x 88.3 tok/s 1.99x 407.6 tok/s 203.8 tok/s 3040.1 7.924 4x 175.1 tok/s 3.95x 322.7 tok/s 80.7 tok/s 6833.9 15.617 Benchmark Model: gemma-4-26b-a4b-it-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1500.5 24.21 682.4 tok/s 41.6 tok/s 4.575 251.8 tok/s 14.23 GB pp4096/tg128 4863.4 25.14 842.2 tok/s 40.1 tok/s 8.056 524.3 tok/s 14.91 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 41.6 tok/s 1.00x 682.4 tok/s 682.4 tok/s 1500.5 4.575 2x 82.5 tok/s 1.98x 361.6 tok/s 180.8 tok/s 3495.8 8.767 4x 166.1 tok/s 3.99x 283.4 tok/s 70.8 tok/s 7840.6 17.536 Benchmark Model: Qwen3.6-35B-A3B-4bit 
================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1676.1 17.20 610.9 tok/s 58.6 tok/s 3.860 298.4 tok/s 18.80 GB pp4096/tg128 5046.3 17.93 811.7 tok/s 56.2 tok/s 7.323 576.8 tok/s 19.24 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 58.6 tok/s 1.00x 610.9 tok/s 610.9 tok/s 1676.1 3.860 2x 116.2 tok/s 1.98x 435.5 tok/s 217.8 tok/s 2973.7 6.907 4x 230.7 tok/s 3.94x 352.0 tok/s 88.0 tok/s 6445.2 13.855 Benchmark Model: GLM-4.7-Flash-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1985.0 21.78 515.9 tok/s 46.3 tok/s 4.752 242.4 tok/s 16.27 GB pp4096/tg128 6839.2 27.31 598.9 tok/s 36.9 tok/s 10.307 409.8 tok/s 17.34 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 46.3 tok/s 1.00x 515.9 tok/s 515.9 tok/s 1985.0 4.752 2x 91.5 tok/s 1.98x 362.7 tok/s 181.3 tok/s 3549.9 8.445 4x 174.9 tok/s 3.78x 321.2 tok/s 80.3 tok/s 6393.9 15.679 Benchmark Model: gpt-oss-20b-MXFP4-Q8 ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1687.6 24.70 606.8 tok/s 40.8 tok/s 4.824 238.8 tok/s 11.67 GB pp4096/tg128 4088.8 26.44 1001.8 tok/s 38.1 tok/s 7.446 567.3 tok/s 11.75 GB Continuous Batching pp1024 / 
tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 40.8 tok/s 1.00x 606.8 tok/s 606.8 tok/s 1687.6 4.824 2x 82.1 tok/s 2.01x 359.0 tok/s 179.5 tok/s 3489.1 8.822 4x 159.5 tok/s 3.91x 293.2 tok/s 73.3 tok/s 7335.0 17.180
先说结论,能跑,但没办法长期跑,主要问题是散热,外挂风扇支架也不太能解决问题,高强度跑温度上升快,持续高温机器会降频。如果考虑便携+生产力,推荐上 mac book pro 吧。 装了两个平台,ollama 跟 olmx ,测试下来,olmx 平台会更快些,考虑到机器 32G 的内存,能跑的模型大小不要超 22GB 附上部分主流模型下载容量大小及 olmx 平台测试结果给大家做参考 Qwen3.5-4B-MLX-4bit 2.85GB gemma-4-26b-a4b-it-4bit 14.57GB Qwen3.6-35B-A3B-4bit 15.13GB GLM-4.7-Flash-4bit 15.71GB gpt-oss-20b-MXFP4-Q8 11.27GB oMLX - LLM inference, optimized for your Mac Benchmark Model: Qwen3.5-4B-MLX-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1001.6 22.74 1022.4 tok/s 44.3 tok/s 3.889 296.2 tok/s 3.29 GB pp4096/tg128 3540.9 23.76 1156.8 tok/s 42.4 tok/s 6.558 644.1 tok/s 3.90 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 44.3 tok/s 1.00x 1022.4 tok/s 1022.4 tok/s 1001.6 3.889 2x 88.3 tok/s 1.99x 407.6 tok/s 203.8 tok/s 3040.1 7.924 4x 175.1 tok/s 3.95x 322.7 tok/s 80.7 tok/s 6833.9 15.617 Benchmark Model: gemma-4-26b-a4b-it-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1500.5 24.21 682.4 tok/s 41.6 tok/s 4.575 251.8 tok/s 14.23 GB pp4096/tg128 4863.4 25.14 842.2 tok/s 40.1 tok/s 8.056 524.3 tok/s 14.91 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 41.6 tok/s 1.00x 682.4 tok/s 682.4 tok/s 1500.5 4.575 2x 82.5 tok/s 1.98x 361.6 tok/s 180.8 tok/s 3495.8 8.767 4x 166.1 tok/s 3.99x 283.4 tok/s 70.8 tok/s 7840.6 17.536 Benchmark Model: Qwen3.6-35B-A3B-4bit 
================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1676.1 17.20 610.9 tok/s 58.6 tok/s 3.860 298.4 tok/s 18.80 GB pp4096/tg128 5046.3 17.93 811.7 tok/s 56.2 tok/s 7.323 576.8 tok/s 19.24 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 58.6 tok/s 1.00x 610.9 tok/s 610.9 tok/s 1676.1 3.860 2x 116.2 tok/s 1.98x 435.5 tok/s 217.8 tok/s 2973.7 6.907 4x 230.7 tok/s 3.94x 352.0 tok/s 88.0 tok/s 6445.2 13.855 Benchmark Model: GLM-4.7-Flash-4bit ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1985.0 21.78 515.9 tok/s 46.3 tok/s 4.752 242.4 tok/s 16.27 GB pp4096/tg128 6839.2 27.31 598.9 tok/s 36.9 tok/s 10.307 409.8 tok/s 17.34 GB Continuous Batching pp1024 / tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 46.3 tok/s 1.00x 515.9 tok/s 515.9 tok/s 1985.0 4.752 2x 91.5 tok/s 1.98x 362.7 tok/s 181.3 tok/s 3549.9 8.445 4x 174.9 tok/s 3.78x 321.2 tok/s 80.3 tok/s 6393.9 15.679 Benchmark Model: gpt-oss-20b-MXFP4-Q8 ================================================================================ Single Request Results -------------------------------------------------------------------------------- Test TTFT(ms) TPOT(ms) pp TPS tg TPS E2E(s) Throughput Peak Mem pp1024/tg128 1687.6 24.70 606.8 tok/s 40.8 tok/s 4.824 238.8 tok/s 11.67 GB pp4096/tg128 4088.8 26.44 1001.8 tok/s 38.1 tok/s 7.446 567.3 tok/s 11.75 GB Continuous Batching pp1024 / 
tg128 -------------------------------------------------------------------------------- Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s) 1x 40.8 tok/s 1.00x 606.8 tok/s 606.8 tok/s 1687.6 4.824 2x 82.1 tok/s 2.01x 359.0 tok/s 179.5 tok/s 3489.1 8.822 4x 159.5 tok/s 3.91x 293.2 tok/s 73.3 tok/s 7335.0 17.180
Bottom line: it runs, but not for long sessions. The main problem is cooling. Even an external fan stand doesn't really fix it: under heavy load the temperature climbs quickly, and sustained high temperatures make the machine throttle. If you want portability plus productivity, I'd recommend a MacBook Pro instead.
I installed two platforms, Ollama and oMLX, and in my tests oMLX was somewhat faster. Given the machine's 32 GB of memory, keep models under 22 GB.
For reference, here are the download sizes of some popular models, followed by my oMLX benchmark results:
Qwen3.5-4B-MLX-4bit: 2.85 GB
gemma-4-26b-a4b-it-4bit: 14.57 GB
Qwen3.6-35B-A3B-4bit: 15.13 GB
GLM-4.7-Flash-4bit: 15.71 GB
gpt-oss-20b-MXFP4-Q8: 11.27 GB

oMLX - LLM inference, optimized for your Mac

Benchmark Model: Qwen3.5-4B-MLX-4bit
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test           TTFT(ms)  TPOT(ms)  pp TPS        tg TPS      E2E(s)  Throughput   Peak Mem
pp1024/tg128   1001.6    22.74     1022.4 tok/s  44.3 tok/s  3.889   296.2 tok/s  3.29 GB
pp4096/tg128   3540.9    23.76     1156.8 tok/s  42.4 tok/s  6.558   644.1 tok/s  3.90 GB

Continuous Batching pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     44.3 tok/s   1.00x    1022.4 tok/s  1022.4 tok/s  1001.6    3.889
2x     88.3 tok/s   1.99x    407.6 tok/s   203.8 tok/s   3040.1    7.924
4x     175.1 tok/s  3.95x    322.7 tok/s   80.7 tok/s    6833.9    15.617

Benchmark Model: gemma-4-26b-a4b-it-4bit
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test           TTFT(ms)  TPOT(ms)  pp TPS        tg TPS      E2E(s)  Throughput   Peak Mem
pp1024/tg128   1500.5    24.21     682.4 tok/s   41.6 tok/s  4.575   251.8 tok/s  14.23 GB
pp4096/tg128   4863.4    25.14     842.2 tok/s   40.1 tok/s  8.056   524.3 tok/s  14.91 GB

Continuous Batching pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     41.6 tok/s   1.00x    682.4 tok/s   682.4 tok/s   1500.5    4.575
2x     82.5 tok/s   1.98x    361.6 tok/s   180.8 tok/s   3495.8    8.767
4x     166.1 tok/s  3.99x    283.4 tok/s   70.8 tok/s    7840.6    17.536

Benchmark Model: Qwen3.6-35B-A3B-4bit
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test           TTFT(ms)  TPOT(ms)  pp TPS        tg TPS      E2E(s)  Throughput   Peak Mem
pp1024/tg128   1676.1    17.20     610.9 tok/s   58.6 tok/s  3.860   298.4 tok/s  18.80 GB
pp4096/tg128   5046.3    17.93     811.7 tok/s   56.2 tok/s  7.323   576.8 tok/s  19.24 GB

Continuous Batching pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     58.6 tok/s   1.00x    610.9 tok/s   610.9 tok/s   1676.1    3.860
2x     116.2 tok/s  1.98x    435.5 tok/s   217.8 tok/s   2973.7    6.907
4x     230.7 tok/s  3.94x    352.0 tok/s   88.0 tok/s    6445.2    13.855

Benchmark Model: GLM-4.7-Flash-4bit
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test           TTFT(ms)  TPOT(ms)  pp TPS        tg TPS      E2E(s)  Throughput   Peak Mem
pp1024/tg128   1985.0    21.78     515.9 tok/s   46.3 tok/s  4.752   242.4 tok/s  16.27 GB
pp4096/tg128   6839.2    27.31     598.9 tok/s   36.9 tok/s  10.307  409.8 tok/s  17.34 GB

Continuous Batching pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     46.3 tok/s   1.00x    515.9 tok/s   515.9 tok/s   1985.0    4.752
2x     91.5 tok/s   1.98x    362.7 tok/s   181.3 tok/s   3549.9    8.445
4x     174.9 tok/s  3.78x    321.2 tok/s   80.3 tok/s    6393.9    15.679

Benchmark Model: gpt-oss-20b-MXFP4-Q8
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test           TTFT(ms)  TPOT(ms)  pp TPS        tg TPS      E2E(s)  Throughput   Peak Mem
pp1024/tg128   1687.6    24.70     606.8 tok/s   40.8 tok/s  4.824   238.8 tok/s  11.67 GB
pp4096/tg128   4088.8    26.44     1001.8 tok/s  38.1 tok/s  7.446   567.3 tok/s  11.75 GB

Continuous Batching pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     40.8 tok/s   1.00x    606.8 tok/s   606.8 tok/s   1687.6    4.824
2x     82.1 tok/s   2.01x    359.0 tok/s   179.5 tok/s   3489.1    8.822
4x     159.5 tok/s  3.91x    293.2 tok/s   73.3 tok/s    7335.0    17.180
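The derived columns in these oMLX results appear to follow from TTFT and E2E alone. Checking the published numbers (not oMLX's source code) suggests: pp TPS = prompt tokens / TTFT, tg TPS = generated tokens / (E2E - TTFT), TPOT = (E2E - TTFT) / (generated tokens - 1), and Throughput = (prompt + generated tokens) / E2E. For continuous batching, Speedup looks like the batch tg TPS divided by the 1x tg TPS, and pp TPS/req like pp TPS divided by the batch size. The minimal Python sketch below recomputes them under those inferred definitions. The names `SingleRequest`, `derived_metrics`, and `batching_summary` are hypothetical and are not part of oMLX.

```python
from dataclasses import dataclass


@dataclass
class SingleRequest:
    """One single-request benchmark row (hypothetical container, not an oMLX type)."""
    prompt_tokens: int  # the "pp" in the test name, e.g. 1024
    gen_tokens: int     # the "tg" in the test name, e.g. 128
    ttft_ms: float      # time to first token, in milliseconds
    e2e_s: float        # end-to-end latency, in seconds


def derived_metrics(r: SingleRequest) -> dict:
    """Recompute the derived columns from TTFT and E2E.

    The formulas are inferred from the published numbers, not taken from oMLX.
    """
    ttft_s = r.ttft_ms / 1000.0
    decode_s = r.e2e_s - ttft_s  # time spent after the first token arrives
    return {
        # prefill speed: prompt tokens over time to first token
        "pp_tps": r.prompt_tokens / ttft_s,
        # generation speed: generated tokens over the decode phase
        "tg_tps": r.gen_tokens / decode_s,
        # time per output token: decode time spread over the tokens after the first
        "tpot_ms": decode_s * 1000.0 / (r.gen_tokens - 1),
        # overall throughput: prompt + generated tokens over end-to-end time
        "throughput_tps": (r.prompt_tokens + r.gen_tokens) / r.e2e_s,
    }


def batching_summary(single_tg_tps: float, batch: int,
                     batch_tg_tps: float, batch_pp_tps: float) -> dict:
    """Speedup and per-request prefill speed for one continuous-batching row."""
    return {
        "speedup": batch_tg_tps / single_tg_tps,
        "pp_tps_per_req": batch_pp_tps / batch,
    }


if __name__ == "__main__":
    # Qwen3.5-4B-MLX-4bit, pp1024/tg128 single-request row
    row = SingleRequest(prompt_tokens=1024, gen_tokens=128, ttft_ms=1001.6, e2e_s=3.889)
    for name, value in derived_metrics(row).items():
        print(f"{name}: {value:.2f}")
    # Same model, 4x continuous-batching row
    print(batching_summary(44.3, 4, 175.1, 322.7))
```

Fed the Qwen3.5-4B-MLX-4bit pp1024/tg128 row (TTFT 1001.6 ms, E2E 3.889 s), the sketch reproduces the published 1022.4 tok/s, 44.3 tok/s, 22.74 ms and 296.2 tok/s to within rounding, which is the evidence behind the inferred formulas.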
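IT Home, June 3: Square Enix has released version 1.005 of its popular role-playing game Final Fantasy VII Rebirth, adding a "game booster" feature and a New Game+ mode to the PS5 and PC versions. When enabled, the game boosters provide several convenient enhancements, such as keeping characters' HP and MP permanently full, locking all damage dealt to enemies in battle at 9,999, and keeping item counts permanently at the maximum (with some exceptions). IT Home notes: players can switch each option on or off at any time during play to suit their preferences, making combat and exploration easier. New Game+ lets players carry over part of their progress from an earlier save and start a new adventure with characters at level 65 (the game's level cap is 70) and several leveled-up Materia. In addition, the update adds support for AMD FSR upscaling to the PC version, which helps deliver more stable, detailed image quality across a wider range of hardware. The patch also improves the overall stability of the PC version and fixes some known issues. Notably, as the update went live on the existing platforms, the Xbox Series X|S and Switch 2 versions of Final Fantasy VII Rebirth launched at the same time, and both include all of the features above. Related reading: "9,999 damage per hit? SE's Final Fantasy VII Remake 'official cheats' anger hardcore players: a platinum shouldn't be this easy"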
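IT Home, June 2: According to the breach-notification site Have I Been Pwned, Atlas Menu, a cheat provider for the popular online game Grand Theft Auto V, has been hacked. The site says the leaked data includes users' email addresses, usernames, hashed passwords, IP addresses and support tickets, covering nearly 64,000 accounts in total. Ironically, Atlas Menu's website had claimed that its in-house advanced encryption enabled "secure authentication and enhanced user privacy protection"; at the time of writing, the site could no longer be reached. A hacker claiming responsibility for the breach published what appears to be the stolen data on the code-hosting platform GitHub, apparently as revenge against a scammer. Judging from the demo videos on its website, Atlas Menu offered a range of cheat features: character invisibility, super jumps (far higher than normal), free flight across the whole map, and more. The breach means the details of every player who used the cheat have been exposed. Game cheating has grown into an industry worth millions of dollars, and many professional players buy cheats to gain an edge in matches. IT Home notes that Atlas Menu is not the first cheat provider to be breached: years ago, a well-known cheat platform for Counter-Strike: Global Offensive also suffered a large-scale data theft.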
Anyone here play Valorant? I'm plagued by cheaters. Does anyone know of any anti-cheat software?
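IT Home, May 25: On May 22, Tencent Game Security Center published a notice announcing a targeted crackdown on "AI visual aim-snapping" cheats in Delta Force. The security team had recently noticed a type of cheat spreading in the community that uses AI visual recognition to assist aiming. According to the notice, these cheats capture the game screen through live-streaming software, use an AI model to locate targets, and then convert the results into mouse-movement commands that snap the aim onto them automatically. Whether or not such a tool reads game memory or modifies game files, it counts as cheating. How it works: screen capture via streaming software → cheat core plugin loaded → AI model recognition → automatic aim snapping. The notice says the security team has reverse-engineered the core plugin of this type of cheat, extracted code signatures and deployed targeted detection, and a full-scale crackdown is under way. Every account confirmed to have cheated is banned for ten years and permanently added to the game's blacklist. Players should not believe claims that the cheat "doesn't read memory, so it won't get you banned"; the team's detection is not limited to memory modification. Normal use of live-streaming software is unaffected. The notice adds that the team has worked with law enforcement to initiate criminal proceedings against those who make, distribute and sell this type of cheat. Making, selling or distributing game cheats is illegal, and those responsible will face criminal prosecution.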
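IT Home, May 25: The Cybersecurity Bureau of the Ministry of Public Security disclosed today that, since the start of this year, Shanghai's cyber police have run a targeted crackdown on suspected crimes of sabotaging computer information systems, such as order-grabbing cheats for ride-hailing apps and cheating devices for taxi meters, solving 15 such cases and arresting 73 people. Typical cases published by the authorities, as compiled by IT Home: In late April, the Cybersecurity Corps of the Shanghai Municipal Public Security Bureau found that a person surnamed Li was selling automatic order-grabbing cheat software for ride-hailing drivers on an online second-hand marketplace, disrupting the ride-hailing market. Investigation showed that Li sold the software through an online store; it could crack the ride-hailing platform's system and, with configurable options such as pickup time, price, distance and city, automatically grab and accept orders for drivers. Li also promoted the software through referrals among drivers and ads in drivers' chat groups, and offered after-sales service such as online installation guidance. By the time the case came to light, Li had sold more than 600 copies. Li has been placed under criminal coercive measures on suspicion of providing programs or tools for intruding into or illegally controlling computer information systems. In a separate case, a passenger reported that a taxi trip actually covered 58.2 km, yet the meter billed 101.7 km, inflating the distance by 43.5 km and the fare from 229 yuan to 400 yuan. After receiving the report, police in Shanghai's Jing'an district quickly opened an investigation, traced the seized cheating device back to its source, and in late April arrested three suspects surnamed Li, Mi and Xia. Investigation showed that since October 2025, the three had earned more than 30,000 yuan by illegally modifying taxi meters and selling the cheating devices: Li recruited drivers who wanted them, Mi bought the parts and provided installation support, and Xia carried out the installations. Digging further, police also arrested two more suspects, surnamed Jiang and Peng, who used the same method. All five suspects have been placed under criminal coercive measures on suspicion of sabotaging computer information systems. The drivers who had the devices installed have been referred to transport law-enforcement authorities.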