/tag/models

Nebius ("莫比乌斯") is taking ds32 offline, and that is my translation model

LinuxDo latest topics · 2026-06-11 21:39:17+08:00 · tech

List of deprecated models: deepseek-ai/DeepSeek-V3.2, deepseek-ai/DeepSeek-V3.2-fast, MiniMaxAI/MiniMax-M2.5-fast, moonshotai/Kimi-K2.5, moonshotai/Kimi-K2.5-fast, openai/gpt-oss-120b-fast, PrimeIntellect/INTELLECT-3, Qwen/Qwen3-235B-A22B-Thinking-2507-fast, Qwen/Qwen3-Next-80B-A3B-Thinking-fast, Qwen/Qwen3.5-397B-A17B-fast, zai-org/GLM-5

3 posts - 3 participants
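
A minimal sketch of how one might check a local model configuration against the deprecation list above. The model IDs are copied from the post; the function name `find_deprecated` and the sample configuration `my_models` are hypothetical and only serve the illustration.

```python
# Model IDs copied from the deprecation list quoted in the post.
DEPRECATED_MODELS = frozenset({
    "deepseek-ai/DeepSeek-V3.2",
    "deepseek-ai/DeepSeek-V3.2-fast",
    "MiniMaxAI/MiniMax-M2.5-fast",
    "moonshotai/Kimi-K2.5",
    "moonshotai/Kimi-K2.5-fast",
    "openai/gpt-oss-120b-fast",
    "PrimeIntellect/INTELLECT-3",
    "Qwen/Qwen3-235B-A22B-Thinking-2507-fast",
    "Qwen/Qwen3-Next-80B-A3B-Thinking-fast",
    "Qwen/Qwen3.5-397B-A17B-fast",
    "zai-org/GLM-5",
})


def find_deprecated(configured_models):
    """Return the configured model IDs that are on the deprecation list, in input order."""
    return [model for model in configured_models if model in DEPRECATED_MODELS]


# Hypothetical local configuration, for illustration only.
my_models = ["deepseek-ai/DeepSeek-V3.2", "example-org/some-other-model"]
for model in find_deprecated(my_models):
    print(f"{model} is on the deprecation list; pick a replacement")
```

The comparison is an exact string match, so the base ID and its `-fast` variant are treated as separate entries, as they are in the list.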

Are the Arena rankings trustworthy? Fable 5 is ahead by a huge margin

LinuxDo latest topics · 2026-06-11 16:17:02+08:00 · tech

Arena Leaderboard | Compare & Benchmark the Best Frontier AI Models. See how leading AI models stack up across text, image, vision, and more. This page provides a high-level snapshot of each Arena. Explore dedicated tabs for deeper insights.

For those who have used it, how does it feel? Is it really that strong?

2 posts - 2 participants

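What do you mean Fable charges $5 for 800k tokens

LinuxDo latest topics · 2026-06-10 02:17:45+08:00 · tech

Cursor · CursorBench: Compare CursorBench 3.1 results across the models Cursor evaluates.

I just saw Fable topping the leaderboard and could not resist trying it. I had seen the price and was mentally prepared, but I still lost it: what do you mean one question costs $5 (5U)? And the key point is that it is all cache reads and writes. For comparison I looked at Opus 4.8 Max Fast. Tokens really have turned into gold; I went back to using Medium.

11 posts - 7 participants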
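
As a rough worked example of the figures in the title, the blended rate can be computed as follows. This is only a sketch: it assumes the bill was exactly $5 for exactly 800,000 tokens, and it does not separate cache reads, cache writes, input, and output, because the post gives no such breakdown. The function name is hypothetical.

```python
def effective_price_per_million(total_cost_usd: float, total_tokens: int) -> float:
    """Blended price per one million tokens for a single bill."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return total_cost_usd * 1_000_000 / total_tokens


# Figures from the post title: about 800k tokens billed at about $5.
rate = effective_price_per_million(5.0, 800_000)
print(f"Blended rate: ${rate:.2f} per 1M tokens")
```

Under those assumptions the blended rate is $6.25 per million tokens, mostly made up of cache traffic according to the post.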

Even Notion AI is fed up with Opus 4.7 and 4.8

LinuxDo latest topics · 2026-06-08 20:23:53+08:00 · tech

Notion Status: Anthropic's Opus 4.7 and 4.8 models are experiencing degraded performance -... Issue is now resolved.

3 posts - 3 participants

相关专题

Gxxszb 相关页面 Ivguv · Progress Internet Spreadsheet Education Fitness Web G...K Vxt 专题内容 Qiupanmq 首页热点 Planning Sport Traffic Story System 专题内容 Ziczr · Fitness Collaborate Unsubscribe Visitor Resolution Al...Integration Link Policy Support Affordable Image Premium Eboo...O Lgv · Guide Automation Seminar Wsyky · Products Economy Team Food Yn Ss · Course Cloud Alliance Report Faq Webinar Follow Expense Expensive Course Update Training 专题内容 Cnlive Worldcup Com 首页热点 Demographic Integration Loyalty Budget 专题内容 Zsr3 · Engagement Beauty Premium Reminder Follow Tool Gxxszb 相关页面 Nvbhe · Online Global Cost Screen Creative Sales Trading Qmeyf · Analytics Comment Reporting Accessibility Marketing Investment Navigation Plann...Skgy · Server Achievement Project Tool Entertainment Jinqiutyw 首页热点

Gemma 4 QAT models released

LinuxDo latest topics · 2026-06-06 01:01:16+08:00 · tech

Google, 5 Jun 26: Gemma 4 QAT models: Optimizing model compression for mobile and laptop... We're releasing Gemma 4 quantization-aware training checkpoints, reducing memory requirements and improving on-device performance.

Quantization-aware training lowers VRAM requirements while preserving model quality (Q4_0 and Mobile variants).

Download the model files:
huggingface.co · Gemma 4 QAT Q4_0 - a google Collection
huggingface.co · Gemma 4 QAT Mobile - a google Collection

2 posts - 2 participants

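A reasoning test of QWEN 3.7 MAX

LinuxDo latest topics · 2026-06-05 01:44:29+08:00 · tech

I used a problem set that has not been updated for a long time: llm-benchmark.github.io, Reasoning Models Evaluation. (Currently GPT 5.5 XHIGH is only about 2 problems short of a full score and GEMINI is close, but the problems are not fine-grained enough to separate GEMINI 3.1 from GPT 5.5; GPT 5.5 is clearly stronger.)

I picked one simple problem:
1. Via QODER (not sure whether it serves the real model): QWEN 3.7 went into an endless thinking loop, and I closed it after more than an hour.
2. Via the official website: it answered correctly but took extremely long, more than 20 minutes, which is unacceptably inefficient.

So for the first time I started to doubt the credibility of the leaderboard by the benchmark blogger nao, who claimed that GPT 5.5 scored 80 and Qwen reached 78.

4 posts - 4 participants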

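arXiv:2605.27922: Does agent capability depend on the model or the harness? Harness-Bench

LinuxDo latest topics · 2026-06-03 17:40:57+08:00 · tech

Paper: [2605.27922] Harness-Bench: Measuring Harness Effects across Models in Realistic Agent Workflows
Code: GitHub - Qihoo360/harness-bench
Leaderboard: harness-bench.ai, Harness Bench leaderboard across harnesses, models, domains, and completion, process, and combined task scores.

In short, Harness-Bench fixes the tasks and the model, swaps only the harness, and measures how much the agent's performance changes.

Method
106 sandboxed offline tasks in 8 categories (SWE, data analysis, DevOps, long-horizon state maintenance, and others), each with its own oracle grader. The evaluation dimensions are completion score, LLM judge score, and security score. The benchmark covers 6 currently popular agents (OpenClaw, nanobot, Hermes, ZeroClaw, NullClaw, Moltis) and 8 model backends (gpt-5.4, claude-opus-4.6, claude-sonnet-4.6, gemini-3.1-pro-preview, qwen3.6-plus, glm-5.1, kimi-k2.5, deepseek-v4-flash), for a total of 5194 execution trajectories.

Key findings
With the same model and a different harness, the largest gap in combined score is 23.8 points (nanobot 76.2 vs OpenClaw 52.4). An agent benchmark that reports only model scores without the harness configuration is therefore insufficient.
The failure-mode analysis (Table 3) is a useful reference: 36.4% of failures are contract/format failures, where the agent produced content but the format did not satisfy the verification conditions; 24.6% are tool/recovery failures, where a tool call failed and the agent did not recover. Genuine reasoning errors account for only a small share. The implication for harness design is that fault tolerance and output validation affect the real success rate more than adding model capability.
Strong models (gpt-5.4, claude-opus-4.6) show smaller variance across harnesses, while mid-tier models are more sensitive to harness quality. A good harness can significantly raise the ceiling of a mid-tier model.
Token efficiency differs significantly: for the same task, token consumption can differ by 3-4x between harnesses, mainly depending on the context construction strategy.

Limitations
All tasks are offline sandbox tasks; there are no online services, user interaction, or long-term memory scenarios. The LLM judge score depends on an LLM judge, which introduces evaluator subjectivity. Only configuration-level differences were tested, with no causal decomposition.

The execution-alignment concept proposed in Section 5 is worth noting: the core value of a harness is maintaining the correspondence between the agent's reasoning, the actual workspace state, the results returned by tools, and the final verification conditions. The root cause of most failures is not a reasoning error by the model but a disconnect between the agent's internal belief and the actual external state, for example believing a file was edited correctly when it was not, or believing a command succeeded when it actually returned an error.

1 post - 1 participant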
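
To make the two largest failure classes concrete, here is a small sketch of the kind of checks a harness could run before accepting an agent's result. It is an illustration written for this summary, not code from the paper or from any of the listed harnesses: the JSON-object contract, the required keys, and both function names are assumptions.

```python
import json
import os
import tempfile


def check_output_contract(raw_output: str, required_keys: set) -> list:
    """Return the contract violations in an agent's final answer (empty list means it passes)."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"output is not valid JSON: {exc.msg}"]
    if not isinstance(parsed, dict):
        return ["output must be a JSON object"]
    missing = sorted(required_keys - set(parsed))
    return [f"missing key: {key}" for key in missing]


def verify_file_edit(path: str, expected_text: str) -> bool:
    """Re-read the file from disk instead of trusting the agent's belief that the edit landed."""
    if not os.path.isfile(path):
        return False
    with open(path, encoding="utf-8") as handle:
        return expected_text in handle.read()


# Demo: the agent "believes" it set timeout=30, but the workspace still says 10.
with tempfile.TemporaryDirectory() as workspace:
    target = os.path.join(workspace, "config.txt")
    with open(target, "w", encoding="utf-8") as handle:
        handle.write("timeout=10\n")
    print(verify_file_edit(target, "timeout=30"))  # False

# Demo: content was produced, but a required field is missing.
print(check_output_contract('{"answer": 42}', {"answer", "evidence"}))
```

Both checks compare the agent's claim with something observable (the file on disk, the parsed output), which is the correspondence the execution-alignment idea describes. A real harness would feed the violations back to the agent for another attempt rather than just printing them.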

A keepalive scheme for ModelScope Studios

LinuxDo latest topics · 2026-06-03 09:40:53+08:00 · tech

ModelScope keepalive mechanism

The same approach can be used for Hugging Face.

Overview
Use playwright-cli to visit the ModelScope web page automatically on a schedule, keeping the login state and session active so that the login does not lapse and the session does not expire after long inactivity.

How it works

1. Persistent browser profile

    playwright-cli open --persistent --profile=./modelscope-profile --headed https://modelscope.cn/studios/hqzqaq/QwenPaw

Parameters:
    --persistent                      Use a persistent profile; browser data (cookies, localStorage, cache, etc.) is saved locally
    --profile=./modelscope-profile    Path of the profile directory
    --headed                          Launch in headed mode (visible browser window)

2. Session persistence
    Cookie persistence: the session cookie created at login is stored in the modelscope-profile directory
    LocalStorage retention: the site's local storage data is kept
    Cache reuse: the browser cache can be reused, which reduces load time

3. Keepalive flow

    Scheduled trigger (Schedule)
            │
            ▼
    Open browser (playwright-cli open --persist)
            │
            ▼
    Visit target page (modelscope.cn)
            │
            ▼
    Wait for the page to load (wait time is adjustable)
            │
            ▼
    Close browser (playwright-cli close)

Setup steps

Step 1: Initialize the persistent profile (run once)

    # Open the browser manually and log in
    playwright-cli open --persistent --profile=./modelscope-profile --headed https://modelscope.cn
    # Complete the login in the browser window that opens
    # Close the browser
    playwright-cli close

Step 2: Create the keepalive script
Create the script modelscope-keepalive.sh:

    #!/bin/bash
    PROFILE_PATH="./modelscope-profile"
    TARGET_URL="https://modelscope.cn/studios/hqzqaq/QwenPaw"
    WAIT_TIME=30  # seconds to wait after the page loads

    # Open the browser and visit the target page
    playwright-cli open --persistent --profile="${PROFILE_PATH}" --headed "${TARGET_URL}"

    # Wait for the page to load and stay active
    sleep "${WAIT_TIME}"

    # Close the browser
    playwright-cli close

Step 3: Set up the scheduled task
Use cron or the Schedule tool to run the script on a schedule.

With cron (runs every 6 hours):

    # Edit the crontab
    crontab -e
    # Add the following line (runs daily at 00:00, 06:00, 12:00, and 18:00)
    0 0,6,12,18 * * * /path/to/modelscope-keepalive.sh

With the Schedule tool (recommended): see the "Using the Schedule tool" section below.

Using the Schedule tool
In Trae SOLO, use the Schedule tool to create the scheduled task.

Create the scheduled task:

    # Run in Trae SOLO
    Schedule.create(
        name="ModelScope keepalive",
        cron_expression="0 0,6,12,18 * * *",  # runs every 6 hours
        message="Use playwright-cli to visit ModelScope on a schedule:
    1. Run: playwright-cli open --persistent --profile=./modelscope-profile --headed https://modelscope.cn/studios/hqzqaq/QwenPaw
    2. Wait 30 seconds
    3. Run: playwright-cli close",
        timezone="Asia/Shanghai"
    )

Managing scheduled tasks

    Operation             Command
    List tasks            Schedule.list()
    Show task details     Schedule.get(scheduled_task_id)
    Trigger manually      Schedule.trigger(scheduled_task_id)
    Pause a task          Schedule.pause(scheduled_task_id)
    Resume a task         Schedule.resume(scheduled_task_id)
    Delete a task         Schedule.delete(scheduled_task_id)

Notes
    Login validity: on most sites the login state expires after 7-30 days, so check periodically and log in again
    Network: make sure the execution environment has a working network connection
    Concurrency: do not use the same profile from several processes at the same time
    Data directory: keep the modelscope-profile directory safe; do not delete it or move it around
    Error handling: adding log output is recommended to make troubleshooting easier

FAQ

Q: What should I do when the login state has expired?
A: Delete the old profile directory and repeat Step 1 to log in again:

    rm -rf ./modelscope-profile
    playwright-cli open --persistent --profile=./modelscope-profile --headed https://modelscope.cn
    # Log in manually, then close
    playwright-cli close

Q: How can I confirm that the keepalive is working?
A: Check the execution log of the cron/Schedule task, or add logging to the script itself:

    #!/bin/bash
    echo "$(date): keepalive visit started" >> ./keepalive.log
    playwright-cli open --persistent --profile=./modelscope-profile --headed https://modelscope.cn/studios/hqzqaq/QwenPaw
    sleep 30
    playwright-cli close
    echo "$(date): keepalive finished" >> ./keepalive.log

Q: Can it run in headless mode?
A: Yes. Just drop the --headed flag:

    playwright-cli open --persistent --profile=./modelscope-profile https://modelscope.cn/studios/hqzqaq/QwenPaw

1 post - 1 participant
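
The notes above recommend logging and error handling. One possible way to get both is a small Python wrapper around the same two commands, sketched below. It reuses only the playwright-cli flags that appear in the post; the function names, the injectable `run` and `sleep` parameters, and the decision to treat a non-zero exit code as failure are assumptions made for this sketch, not part of the original scheme.

```python
import logging
import subprocess
import time

PROFILE_PATH = "./modelscope-profile"
TARGET_URL = "https://modelscope.cn/studios/hqzqaq/QwenPaw"


def build_open_command(profile: str, url: str, headed: bool = True) -> list:
    """Assemble the same playwright-cli invocation that the post's shell script uses."""
    command = ["playwright-cli", "open", "--persistent", f"--profile={profile}"]
    if headed:
        command.append("--headed")
    command.append(url)
    return command


def keepalive(run=subprocess.run, sleep=time.sleep, wait_seconds: int = 30) -> bool:
    """Open the page, wait, close the browser; return True only if both commands exit with 0."""
    log = logging.getLogger("keepalive")
    steps = [
        ("open", build_open_command(PROFILE_PATH, TARGET_URL)),
        ("close", ["playwright-cli", "close"]),
    ]
    ok = True
    for name, command in steps:
        try:
            result = run(command)
        except OSError as exc:  # for example, playwright-cli is not installed
            log.error("%s could not start: %s", name, exc)
            return False
        if result.returncode != 0:
            log.error("%s exited with code %s", name, result.returncode)
            ok = False
        if name == "open":
            sleep(wait_seconds)  # give the page time to load before closing
    log.info("keepalive finished, ok=%s", ok)
    return ok
```

To use it from cron, a caller would configure logging once, for example `logging.basicConfig(filename="keepalive.log", level=logging.INFO, format="%(asctime)s %(message)s")`, and then call `keepalive()`. Passing `run` and `sleep` as parameters keeps the function testable without launching a browser.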

[Help] Is Claude down?

LinuxDo latest topics · 2026-06-02 17:22:42+08:00 · tech

Elevated errors across multiple models

When I use it, it just gives me a pile of gibberish ("大佐语").

4 posts - 4 participants

Looking at the Arena leaderboard: is qwen3.7 really this good now? Has anyone here used it?

LinuxDo latest topics · 2026-06-02 15:55:06+08:00 · tech

WebDev AI Leaderboard - Best AI Models for Web Development. View overall rankings across AI models on front-end web development tasks, including agentic coding workflows that require multi-step reasoning and tool use.

Going by the leaderboard, is Qwen's coding already on par with claude-opus-4-7? Could anyone who has used it comment?

1 post - 1 participant

GPT-5.5 is now available on AWS Bedrock

LinuxDo latest topics · 2026-06-02 09:59:17+08:00 · tech

Amazon News, 28 Apr 26: OpenAI models GPT-5.5 and GPT-5.4, and Codex, now on Amazon Bedrock. For the first time, the most advanced OpenAI models are available on Amazon Bedrock, with pricing that matches OpenAI first-party rates and no additional fees.

On June 1, AWS announced that OpenAI's GPT-5.5 and GPT-5.4 are available on Amazon Bedrock, at the same prices as OpenAI's official rates.

1 post - 1 participant
