"导致" ("cause" / "lead to") - WWW.YOUINFO.SITE

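llama.cpp currently has a major performance bug: checkpoint lookup does not work for hybrid models (such as qwen3.6-27B), so most conversation turns end up prefilling the full text again, which severely slows things down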

V2EX - Tech · 2026-06-12 11:16:28+08:00 · tech

While looking into optimizations for qwen3.6-27B yesterday, I came across this issue: "server: fix context checkpoint restore for hybrid/recurrent models (DeltaNet/Mamba)".

In short, llama.cpp's cache lookup logic is flawed. When you call the model n times (n > 1), llama.cpp will most likely fail to find the earlier conversation and will prefill the full conversation again from the start. In plain terms: every time you say one more sentence to someone, you have to repeat everything from the first sentence.

Worse, in May the llama.cpp maintainers introduced another piece of checkpoint logic that degraded cache lookup performance further: Commit e98cb51

According to measurements posted in that thread, an NVIDIA RTX PRO 6000 Blackwell running qwen3.6-27B Q8 with a 50K context wastes about 40 seconds on every LLM request:

3 consecutive full re-processings logged:

┌───────────┬────────────────────┬───────┐
│ Turn      │ Tokens reprocessed │ Time  │
├───────────┼────────────────────┼───────┤
│ Task 2795 │ 67,608             │ 38.4s │
├───────────┼────────────────────┼───────┤
│ Task 3241 │ 71,211             │ 41.0s │
├───────────┼────────────────────┼───────┤
│ Task 3401 │ 71,105             │ 41.4s │
└───────────┴────────────────────┴───────┘

Root cause visible in logs: The new prompt is ~19k tokens, but all checkpoints sit at positions 39k–71k (from previous longer requests). Every checkpoint is checked against 19340 and rejected because they're all beyond the new prompt length. Result: 0 usable checkpoints → full reprocess from BOS.

Conclusion: for agent tooling, the current llama.cpp + qwen3.6-27B combination is too slow to be usable. The issue is still open and awaiting a fix.
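The rejection described in the quoted log can be modelled in a few lines. This is only a minimal sketch of the selection rule as the log describes it, not llama.cpp's actual implementation: the function names `pick_checkpoint` and `prefill_start` are hypothetical, and the checkpoint positions are made-up values inside the 39k–71k range the log mentions. The sketch assumes a checkpoint can be reused only if it sits at or before the number of reusable prompt tokens, because a recurrent state cannot be rolled back to an earlier position.

```python
from typing import Optional, Sequence


def pick_checkpoint(checkpoint_positions: Sequence[int], n_reusable: int) -> Optional[int]:
    """Return the latest checkpoint at or before n_reusable, or None if none fits.

    Hypothetical model of the rule in the quoted log, not llama.cpp code.
    """
    usable = [pos for pos in checkpoint_positions if pos <= n_reusable]
    return max(usable) if usable else None


def prefill_start(checkpoint_positions: Sequence[int], n_reusable: int) -> int:
    """Position where prefill must resume; 0 means a full reprocess from BOS."""
    chosen = pick_checkpoint(checkpoint_positions, n_reusable)
    return 0 if chosen is None else chosen


if __name__ == "__main__":
    # Illustrative positions only; the log gives a range, not exact values.
    stale = [39_000, 55_000, 71_000]
    print(prefill_start(stale, 19_340))             # every checkpoint is too late
    print(prefill_start([16_000] + stale, 19_340))  # an earlier checkpoint fits
```

With only the stale checkpoints, nothing fits under 19,340 and prefill restarts at position 0. If a checkpoint at an earlier position had been kept (16,000 in the second call), only the tokens after it would need to be processed again.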

Does heavy use of the Web Pro model cause "dumbing down" (degraded responses)?

LinuxDo Latest Topics · 2026-06-12 09:51:36+08:00 · tech

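Continuing the discussion from "On the dumbing-down problem when reverse-proxying the ChatGPT web Pro model". At this point I suspect only two possible causes remain: the IP, or excessive usage. I currently use the Pro model about 50 times a day, on Pro 20x. In my case, two concurrent requests trigger the degradation, and a single request occasionally does too. So I'd like to ask others here: do you also see degraded responses under heavy use, assuming a very stable IP? One more note: I have tried genuine residential broadband IPs in Hong Kong, Taiwan, and the United States, and all of them showed degradation without exception. Still, I can't rule out the IP as the cause, because I don't know whether all of my IPs have been flagged by OpenAI; I ran automated operations on them before. 1 post - 1 participant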

Just venting: road rage is truly terrifying!

LinuxDo Latest Topics · 2026-06-12 09:24:46+08:00 · tech

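Maybe I've been stuck in an information bubble, but I only recently came across the video of the "Foshan Guidan Road malicious cut-off that caused a chain collision". Watching it left me furious and deeply saddened. A Honda deliberately cut off a coach. The coach driver, probably acting on instinct, couldn't brake in time and steered right to avoid it, hit a heavy truck in the next lane, and rolled over. Several families were ruined just like that. What makes your fists clench most in the video is that the Honda driver got out of the car and, looking completely indifferent, stood there smoking and making a phone call. This tragedy is a bloody confirmation of the saying "give up speed, not your lane": it keeps proving its worth.

The video also reminded me of when I first got my license. Back then I had a bit of a road rage tendency too. If someone cut in or honked at me, I would flare up immediately and assume they were provoking me on purpose. Perhaps sitting inside a car, in an enclosed space, makes it easier to lose control of your emotions. After several more years of driving, though, I've become completely laid-back. When I meet a reckless driver I just calmly let them pass and never get into a conflict. Put simply, if you pick a fight over people and situations like this, winning gains you nothing, and losing could mean a wrecked car and lost lives. The only goal when we drive is to get home safely.

What mindset do you all drive with these days, and how do you keep road rage in check when someone maliciously cuts in or cuts you off? 4 posts - 4 participants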

Americans born after 1970 face higher death rates in middle age from several major causes

LinuxDo Latest Topics · 2026-06-12 09:13:13+08:00 · tech

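Fox News: Americans born after 1970 face higher death rates from several major causes... Americans born after 1970 are dying faster than their parents, with higher rates of heart disease, cancer and overdoses between ages 30 and 49.

Quote: Data show that Americans born after 1970 are dying faster than their parents. A new analysis from Tufts University shows that Generation X and Millennials are not living longer; they die from common chronic diseases and external causes at higher rates than people of the same age in earlier cohorts. The data show that for most of the 20th century, life expectancy in the United States rose steadily, meaning each generation generally lived longer than the one before. That changed starting with people born in the 1950s. Americans born in the 1940s saw survival improve steadily at every stage of life, while those born in the 1950s saw that progress slow or reverse. The decline has continued in every generation since, with the largest change among Americans born after 1970.

TheUNN - 11 Jun 26: Americans Born After 1970 Experience Rising Middle-Age Death Rates. Americans born after 1970 are experiencing higher death rates from major health issues compared to previous generations, according to a new analysis from

3 posts - 2 participants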

Let's chat about the World Cup

LinuxDo Latest Topics · 2026-06-12 09:06:40+08:00 · tech

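Is anyone here watching the matches? Because of the time difference with the US, Canada, and Mexico, the 10 o'clock game is a must-watch every day. 1 post - 1 participant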

iOS 27 Wi-Fi by default causes the DNS servers assigned via Wi-Fi DHCP to become ineffective

V2EX - Tech · 2026-06-12 03:18:09+08:00 · tech