WWW.YOUINFO.SITE
Tag aggregation: across

LinuxDo latest topics · 2026-06-05 09:28:55+08:00 · tech

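Business Insider: Google has quietly cut staff across its Cloud business

Google has laid off employees across parts of its Cloud business, including at an elite cybersecurity intelligence unit.

> [!quote]+
> Two people familiar with the matter told Business Insider that employees working in Google Cloud were laid off over the past two weeks.
>
> The people said the affected teams include Google's Threat Intelligence Group, one of Google's top security units, which regularly publishes research on hackers. Some employees posted about the layoffs on LinkedIn.
>
> They added that the layoffs were not limited to that unit: Mandiant, a cybersecurity company Google acquired in 2022, and other teams within Google Cloud were also affected.
>
> It is unclear exactly how many people were affected or why the cuts are happening now. One of the people said Google justified the move by citing the need to reinvest in growth areas such as AI.
>
> A Google spokesperson told Business Insider: "We regularly evaluate our internal structure to ensure we are best positioned to meet the changing needs of our customers and the industry."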

LinuxDo latest topics · 2026-06-03 17:40:57+08:00 · tech

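Paper: [2605.27922] Harness-Bench: Measuring Harness Effects across Models in Realistic Agent Workflows
Code: GitHub - Qihoo360/harness-bench
Leaderboard: harness-bench.ai (Harness Bench leaderboard across harnesses, models, domains, and completion, process, and combined task scores)

In short, Harness-Bench fixes the tasks and the model, swaps only the harness (the agent framework), and measures how much agent performance changes.

Method

106 sandboxed offline tasks in 8 categories (SWE, data analysis, DevOps, long-horizon state maintenance, and others), each with its own oracle grader. The evaluation dimensions are completion score, LLM judge score, and security score. The benchmark covers 6 currently popular agents (OpenClaw, nanobot, Hermes, ZeroClaw, NullClaw, Moltis) and 8 model backends (gpt-5.4, claude-opus-4.6, claude-sonnet-4.6, gemini-3.1-pro-preview, qwen3.6-plus, glm-5.1, kimi-k2.5, deepseek-v4-flash), for a total of 5194 execution trajectories.

Key findings

With the same model and a different harness, the combined score differs by up to 23.8 points (nanobot 76.2 vs OpenClaw 52.4). An agent benchmark that reports only model scores without the harness configuration is therefore insufficient.

The failure-mode analysis (Table 3) is a useful reference: 36.4% of failures are contract/format failures, where the agent produced output but its format did not satisfy the verification conditions; 24.6% are tool/recovery failures, where the agent did not recover after a tool call went wrong. Genuine reasoning errors account for only a small share. The implication for harness design is that fault tolerance and output validation affect the real success rate more than adding raw model capability.

Strong models (gpt-5.4, claude-opus-4.6) show lower variance across harnesses, while mid-tier models are more sensitive to harness quality. A good harness can significantly raise the ceiling of a mid-tier model.

Token efficiency varies widely: for the same task, token consumption can differ by 3-4x between harnesses, driven mainly by the context-construction strategy.

Limitations

All tasks are offline sandbox tasks; there are no online services, user interaction, or long-term memory scenarios. The LLM judge score depends on an LLM judge, which introduces evaluator subjectivity. Only configuration-level differences are measured, with no causal decomposition.

The execution-alignment concept proposed in Section 5 is worth noting: the core value of a harness lies in keeping the agent's reasoning, the actual workspace state, the tool return values, and the final verification conditions consistent with one another. The root cause of most failures is not faulty model reasoning but a disconnect between the agent's internal belief and the actual external state, for example believing a file was edited correctly when it was not, or believing a command succeeded when it actually failed.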
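The execution-alignment idea can be illustrated with a minimal Python sketch. This is not code from Harness-Bench or from any of the agents listed; `StepClaim` and `verify_step` are hypothetical names, and the sketch only assumes that a harness can re-read the file and re-check the exit code before accepting the agent's belief that a step succeeded.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class StepClaim:
    """What the agent believes it did (hypothetical structure)."""
    path: Path               # file the agent says it edited
    expected_substring: str  # text the agent says is now in the file
    command: list[str]       # command the agent says succeeded


def verify_step(claim: StepClaim) -> list[str]:
    """Return mismatches between the claim and the real workspace state."""
    problems = []
    # Read the file from disk instead of trusting the agent's belief.
    if not claim.path.exists():
        problems.append(f"missing file: {claim.path.name}")
    elif claim.expected_substring not in claim.path.read_text(encoding="utf-8"):
        problems.append(f"edit not applied: {claim.path.name}")
    # Check the real exit code instead of assuming the command succeeded.
    result = subprocess.run(claim.command, capture_output=True, text=True)
    if result.returncode != 0:
        problems.append(f"command failed with exit code {result.returncode}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target = Path(workdir) / "config.txt"
        target.write_text("debug = false\n", encoding="utf-8")
        # The agent believes it set debug = true and that its command passed.
        failing = [sys.executable, "-c", "raise SystemExit(3)"]
        print(verify_step(StepClaim(target, "debug = true", failing)))
```

A harness could feed the returned mismatches back to the model as an observation rather than letting the run proceed to final grading, which is one way to address the tool/recovery failure class described above.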

LinuxDo latest topics · 2026-05-10 18:04:38+08:00 · tech

Original email, with some sensitive information removed:

Came across your github and was impressed =)

I’m putting together a virtual, invite-only hackathon for people building AI agents that trade, invest, create, and interface with markets, settled on Arc, the stablecoin-native L1 from Circle. We will have ~$50k in cash (or equivalents) to distribute as prizes, grants and challenges! You may be new to crypto and that is fine, we just want you to build something beautiful and useful because we’re a fan of your open source work =)

The event will be virtual and run from May 11 to May 25, 2026. The invite link is here (please don’t share broadly): Agora Agents Hackathon · Luma and the required passphrase is *********.

A few ideas worth exploring, to give you a simple set of examples:

Trading-R1: Reasoning traces as the product (Wang et al., 2025, Tauric Research). Trading-R1 is a large-scale financial reasoning model whose value is the reasoning trace, not the trade. The full reasoning trace can be hashed and pinned (trace to IPFS/Irys, hash on Arc) without eroding PnL. That unlocks a new market type: bets on which reasoning patterns converge to profit, with TradingAgents v0.2.4’s structured outputs (Trader / Research Manager / Portfolio Manager all emit JSON reasoning blocks) as the machine-readable substrate. More here: [2509.11420] Trading-R1: Financial Trading with LLM Reasoning via Reinforcement Learning

Hyperliquid Whale Index. Top Hyperliquid whales migrate across forks (Aster, Polynomial, etc.). The hack: an Arc-native ERC-20 that holds USDC and auto-rebalances exposure across HL forks based on top-trader migration. Each rebalance is a Gateway cross-chain move; weekly rebalances cost cents on Arc rather than dollars elsewhere. The rebalance signal is the research, e.g. “where smart money is currently trading.” Buyers hold tokens and the underlying is a live migration-tracking index.

Slash-bonded copy-trading. The Hyperliquid leaderboard rank may not persist out-of-sample. The hack: a USDC performance bond on Arc for a given whale that users can stake alongside. A smart contract reads leaderboard rank via oracle; if the leader falls below a defined threshold, the bond slashes proportionally. The empirical decay function becomes the slash schedule directly. Arc’s cheap fees mean this works at retail follower size; on other chains the gas would erode the bond.

Translation as a source of alpha (Wang et al., 2025, TradingAgents). The forks of TradingAgents have added different data brokers each locale’s investors trust. For example, hsliuping/TradingAgents-CN added Tushare for Chinese A-share fundamentals and news; huygiatrng/AlpacaTradingAgent added Coindesk + DeFiLlama + Reddit term matching for crypto-native flow. The framework is interchangeable; the translation layer is the moat. Polymarket only operates in English-language US events because translating Mandarin macro news into a well-formed prediction market question is the bottleneck, which is exactly what TradingAgents-CN’s structured outputs already do, just in a different output format. The hack: a market where agents bid in USDC for the right to translate a non-English news event into a Polymarket-shaped question, with builder fees flowing back to the translator on every fill. More here: [2412.20138] TradingAgents: Multi-Agents LLM Financial Trading Framework

We will have a ton of surprises! If you’re around, we would love to have you there.

Canteen is a community of passionate builders, investors and operators in NYC - open invite to visit us if you’re ever around =)

Cheers!
The team at Canteen

Not sure how much this is really worth. Should I take part?
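For the first idea in the email (hash the reasoning trace, pin the trace off-chain, record only the hash), a minimal Python sketch of the hashing step might look like the following. The email names no hash function or trace schema, so SHA-256, the `trace_digest` helper, and the example field names are all assumptions for illustration. Pinning to IPFS/Irys and writing the hash on Arc are not shown.

```python
import hashlib
import json


def trace_digest(trace: dict) -> str:
    """Return a SHA-256 hex digest of a JSON reasoning trace.

    Keys are sorted and separators fixed so that the same trace content
    always serializes to the same bytes, whatever the key order.
    """
    canonical = json.dumps(
        trace, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical trace fields; the real structured output may differ.
    trace = {"role": "Trader", "reasoning": "momentum weakening", "decision": "hold"}
    print(trace_digest(trace))
```

Canonical serialization matters here because anyone holding the pinned trace must be able to recompute the same digest that was recorded.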