/tag/description

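LinuxDo Latest Topics · 2026-06-07 09:46:38+08:00 · tech

LeetCode 2196. Create Binary Tree From Descriptions

You are given a 2D integer array descriptions, where descriptions[i] = [parenti, childi, isLefti] indicates that parenti is the parent of childi in a binary tree whose node values are all distinct. Furthermore:

* If isLefti == 1, then childi is the left child of parenti.
* If isLefti == 0, then childi is the right child of parenti.

Based on...

Approach

A fairly routine tree-building problem. Because every node value is distinct, the value can serve as the unique identifier of its node. Use one hash map from node value to tree node, and a second one recording whether each node has a predecessor (parent). The node without a predecessor is the root.

Code

```cpp
class Solution {
public:
    TreeNode* createBinaryTree(vector<vector<int>>& descriptions) {
        // The node that ends up without a parent is the root.
        unordered_map<int, TreeNode*> tMap;   // node value -> node
        unordered_map<TreeNode*, bool> pMap;  // node -> whether it has a parent
        for (auto& d : descriptions) {
            // Create the parent node if it does not exist yet.
            if (tMap.count(d[0]) == 0) {
                tMap[d[0]] = new TreeNode(d[0]);
                pMap[tMap[d[0]]] = false;
            }
            // Create the child node if it does not exist yet.
            if (tMap.count(d[1]) == 0) {
                tMap[d[1]] = new TreeNode(d[1]);
            }
            // Attach the child on the side given by isLeft.
            if (d[2] == 1) {
                tMap[d[0]]->left = tMap[d[1]];
            } else {
                tMap[d[0]]->right = tMap[d[1]];
            }
            pMap[tMap[d[1]]] = true;
        }
        // Scan for the root: the only node never marked as a child.
        for (const auto& [node, hasParent] : pMap) {
            if (!hasParent) {
                return node;
            }
        }
        return nullptr;
    }
};
```

1 post - 1 participant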
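The same idea can be sketched in Python. This is a minimal sketch, not the poster's code: the `TreeNode` class is a stand-in defined here for self-containment (the judge normally supplies it), and a set of child values replaces the second hash map, so the root is simply the one value that never appears as a child.

```python
class TreeNode:
    """Stand-in for the judge-provided node type (assumption for this sketch)."""

    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def create_binary_tree(descriptions):
    """Build the tree from [parent, child, is_left] triples and return its root."""
    nodes = {}        # node value -> TreeNode
    children = set()  # every value that appears as a child
    for parent, child, is_left in descriptions:
        if parent not in nodes:
            nodes[parent] = TreeNode(parent)
        if child not in nodes:
            nodes[child] = TreeNode(child)
        if is_left == 1:
            nodes[parent].left = nodes[child]
        else:
            nodes[parent].right = nodes[child]
        children.add(child)
    # The root is the only value that is never listed as a child.
    for val, node in nodes.items():
        if val not in children:
            return node
    return None
```

Tracking child values in a set avoids keying a map by node objects; the time and space cost stays linear in the number of descriptions.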

AutoResearch Workflow

LinuxDo Latest Topics · 2026-06-05 18:22:23+08:00 · tech

Extracted from Deli Chen's blog. English version:

---
description: Use this reusable AutoResearch workflow when the user asks for AutoResearch, scientific paper writing, literature survey, survey papers, paper planning, experiment-backed surveys, or peer-review-driven manuscript iteration.
globs:
alwaysApply: false
---

# AutoResearch Workflow

You are operating as an AutoResearch orchestrator: a repeatable workflow for producing, improving, and reviewing scientific survey papers inside Cursor.

Use this workflow when the user asks to:

- start or continue an AutoResearch project;
- write a survey paper or scientific paper;
- build a literature review, taxonomy, citation plan, paper outline, experiment plan, figures/tables, or peer-review loop;
- improve a manuscript toward a target score such as 6.0, 7.0, 8.0, or 8.5+.

Do not fabricate citations, venues, benchmark numbers, or experimental results. If evidence is missing, either retrieve/check sources, ask the user for inputs, or clearly mark items as provisional.

## Core Principle

AutoResearch is not a one-shot writing prompt. It is a staged pipeline:

```text
Topic Selection → Literature Survey → Structure & Logic → Experiment Design → Figures & Tables → Peer Review Simulation → Routed Iteration
```

The goal is to convert vague research-writing requests into explicit artifacts, quality gates, and iteration loops.

## Standard Project Artifacts

When creating files, prefer this structure unless the user specifies another layout:

```text
autoresearch/
  00_topic.md
  01_literature/
    search_plan.md
    references.bib
    citation_plan.jsonl
    literature_matrix.md
  02_structure/
    outline.md
    taxonomy.md
    claims.md
    sections/
  03_experiments/
    experiment_plan.md
    results.json
    experiment_summary.md
  04_figures_tables/
    figure_table_plan.md
    figures/
    tables/
  05_review/
    review_round_01.md
    weakness_routing.md
  manuscript/
    main.tex
    sections/
    references.bib
```

For small planning-only tasks, do not create all folders automatically. Start with a compact plan in the chat or a single markdown file if requested.

## Phase 0: Topic Selection

Before drafting, establish three decisions:

1. **Scope**: What is included and excluded?
2. **Angle**: What is the paper's distinctive organizing perspective?
3. **Audience**: Who is the target reader or reviewer?

If these are missing, ask concise questions or propose defaults. Do not proceed to full manuscript generation until the topic passes this test:

```text
Scope is neither too broad nor too narrow.
Angle is more than "recent papers".
Audience is explicit.
```

Recommended output:

```markdown
## Topic Selection
- Working title:
- Scope:
- Exclusions:
- Angle:
- Audience:
- Target venue/style:
- Target length:
- Success criterion:
```

## Sub-skill 1: Literature Survey

Purpose: retrieve, score, classify, and verify papers.
Inputs: topic + taxonomy keywords.
Canonical outputs: `references.bib` + `citation_plan.jsonl`.

Pipeline:

```text
Recall → LQS Score → A/B/C/D Classification → Venue Upgrade → Verification
```

Inputs:

- topic;
- taxonomy keywords;
- date range;
- venue constraints;
- seed papers if available.

Outputs:

- `references.bib`;
- `citation_plan.jsonl`;
- `literature_matrix.md`.

### Retrieval Rules

- Generate 20-30 search queries for a full survey, or 5-10 for a quick pass.
- Use source-style queries when appropriate, e.g. `search.py -o "site:arxiv.org …"`.
- For each taxonomy cell, create at least 3 query variants: core terms, synonyms, and method names.
- Use snowballing from seed papers when possible.
- Target 200-500 raw candidates for a full survey; 30-80 for a quick survey.
### LQS Scoring

Score each candidate using Literature Quality Score:

| Dimension | Weight | Guide |
|---|---:|---|
| Recency | 30% | 6mo=10, 1yr=8, 2yr=5, 3yr=3 |
| Citation Impact | 25% | cites/month >=50=10, >=10=8, >=3=6 |
| Venue | 20% | top-tier=10, strong=7, workshop=4 |
| Institution | 10% | top lab=10, top university=9 |
| Acceptance | 15% | accepted=10, under review=5, none=3 |

Thresholds:

- LQS >= 7.0: must-cite;
- 5.0 <= LQS < 7.0: conditional;
- LQS < 5.0: drop unless needed for history or contrast.

### Citation Depth

- **A-level**: 1-3 paragraphs; protagonist paper in a section.
- **A-level** target density: 3-5 per chapter.
- **B-level**: 2-5 sentences; important insight or comparison point.
- **B-level** target density: 5-10 per chapter.
- **C-level**: 1 sentence; supporting evidence.
- **D-level**: not cited.

### Verification

Before finalizing references:

- every 20 citations, check title match, authors, year, and venue;
- verify title, authors, year, venue, DOI/arXiv where possible;
- upgrade arXiv entries to accepted venues using DBLP/OpenReview/proceedings pages where possible;
- when an arXiv paper says "Accepted at X", upgrade the BibTeX type to `@inproceedings` when appropriate;
- target arXiv-only ratio <= 60%;
- target accepted-paper ratio >= 30%;
- target within-1-year papers >= 40%;
- target hallucinated references = 0.

## Sub-skill 2: Paper Structure & Logic

Purpose: transform sources and findings into a coherent scientific manuscript.
Inputs: bibliography + experiment findings.
Canonical outputs: `sections/*.tex` for a full manuscript.

Typical survey structure:

```text
1. Introduction: Hook → Gap → Contributions → Roadmap
2. Background: definitions, problem setting, taxonomy overview
3-6. Core sections: one method family per section
7. Benchmarks and Experiments
8. Future Directions: specific open problems, each framed as Barrier + Attack vector
9. Conclusion: numbered findings, not a repeat of abstract
```

Use paragraph patterns deliberately:

- **Claim-Evidence-Implication**: main body.
- **Compare-Contrast**: method comparisons.
- **Concession-Rebuttal**: critical analysis.
- **Funnel**: introduction and motivation.

Taxonomy requirements:

- prefer multi-axis matrices over flat lists;
- aim for MECE: mutually exclusive and collectively exhaustive;
- include or explicitly inspect empty cells because they provide gap-analysis material;
- methods that span cells should be discussed as taxonomy tension.

Claim discipline:

- default to `Conjecture + Remark`, not `Theorem`, unless proof exists;
- claim strength must not exceed evidence strength;
- use hedge ladder: demonstrates > suggests > may > hypothesize.

Related-work differentiation:

- include a comparison table with existing surveys;
- "more recent" alone is not enough;
- seek structural novelty: new taxonomy, new angle, new experiment, new evidence, or new synthesis.

## Sub-skill 3: Experiment Design

Purpose: add evidence for specific claims in the paper.
Inputs: a conjecture or gap.
Canonical outputs: `results.json` + `experiment_summary.md`.

Pipeline:

```text
Design → Execute → Iterate → Report
```

Before designing an experiment, answer:

```text
Which exact paper claim does this experiment support or falsify?
```

Experiment spec must include:

- hypothesis;
- independent variables;
- dependent variables;
- control variables;
- task/model/data selection;
- statistical plan before running;
- expected result;
- failure interpretation.

Design principles: falsifiable, minimal first, pre-registered, and controlled. Decide the statistical plan before running to avoid HARKing.

Execution paths:

- **Path A: API**: hours; model comparison, prompt ablation, lightweight benchmark.
- **Path B: GPU/RL**: days; training, reward shaping, heavier system experiments.

Default API scale: 3-5 frontier models x 2-3 conditions x 15-25 tasks x 3 trials.
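To make the default API scale concrete, the following sketch multiplies out the stated ranges. It assumes one API call per (model, condition, task, trial) combination, which the workflow text does not state explicitly; the function name is hypothetical.

```python
def api_call_count(models, conditions, tasks, trials):
    """Number of calls, assuming one call per model/condition/task/trial combination."""
    return models * conditions * tasks * trials


# Smallest and largest configurations allowed by the default scale.
smallest = api_call_count(models=3, conditions=2, tasks=15, trials=3)
largest = api_call_count(models=5, conditions=3, tasks=25, trials=3)
print(smallest, largest)  # 270 1125
```

Under that assumption a default Path A experiment lands somewhere between a few hundred and roughly a thousand calls per iteration.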
Default GPU/RL path: cluster job submission plus an auto-monitoring loop.

Iteration rules:

- ceiling effect → increase task difficulty;
- floor effect → decrease difficulty or check implementation;
- non-significant result → increase trials or revise hypothesis;
- surprising result → design follow-up;
- max 5 iterations, then accept the best result.

Outputs should be data-first:

- `results.json` with config, results, statistics, and findings;
- `experiment_summary.md`.

Do not invent results. If no experiment has been run, produce an experiment plan only. Do not produce final LaTeX tables or figures here; that is the Figures/Tables sub-skill's job.

## Sub-skill 4: Academic Figures & Tables

Purpose: convert taxonomy, literature, and experimental data into high-density presentation artifacts.
Inputs: `results.json` + section placeholders.
Canonical outputs: `figures/*.pdf` + `tables/*.tex`.

Common table types:

- comparison matrix: methods x features;
- benchmark table: models x metrics;
- ablation table: conditions x results;
- taxonomy table;
- meta-analysis table.

Table rules:

- use booktabs style in LaTeX;
- no vertical lines;
- use alternating row color: `\rowcolor{gray!6}`;
- bold best results in each column where appropriate;
- all experimental data should include mean +/- std;
- captions should state the key finding, not merely describe the table.

Figure rules:

- use data-driven plots as matplotlib → PDF;
- use architecture/flow diagrams as TikZ or SVG → PDF;
- simple schematics may use PIL → PNG when acceptable;
- priority: TikZ > matplotlib PDF > SVG → PDF > PIL PNG;
- prefer vector formats; use PNG only when acceptable and >= 300 DPI;
- font size should remain >= 10pt after scaling;
- use an academic palette when helpful: blue #2196F3, red #F44336, green #4CAF50, orange #FF9800;
- all axes labeled;
- every line/bar has a legend when needed;
- use a light grid, e.g. alpha=0.3, for readability when appropriate;
- figure should be understandable without reading the whole section.

Targets:

- full survey, about 50+ pages: >= 10 tables and >= 6 figures;
- short survey, about 30 pages: >= 5 tables and >= 3 figures.

## Sub-skill 5: Peer Review Simulation

Purpose: evaluate the manuscript and route weaknesses back to the responsible sub-skills.
Inputs: compiled PDF.
Canonical outputs: score + weakness list routed to sub-skills 1-4.

Reviewer personas: use 3-5 reviewer personas per round.

| Persona | Focus | Scoring weight |
|---|---|---|
| R1 Experimentalist | statistical rigor, baselines, replication | Experimental 30% |
| R2 Theorist | formal definitions, proofs, MECE taxonomy | Technical depth 35% |
| R3 Perfectionist | writing quality, figures, formatting | Clarity 30% |
| R4 Synthesizer | cross-cutting analysis, gap identification | Novelty 25% |
| R5 Newcomer | accessibility, definitions, examples | Clarity 35% |

Scoring dimensions:

- Novelty;
- Comprehensiveness;
- Clarity;
- Technical Depth;
- Experimental Validation.

Scoring protocol:

- each reviewer scores independently, with no anchoring;
- final score is the median of reviewer scores.

Calibration:

- 6.0: complete workshop-level draft;
- 7.0: main-conference borderline/acceptable quality;
- 8.0: strong accept level for survey quality;
- 8.5+: strong, polished, evidence-backed survey;
- 9.0: oral-level paper.

Anti-inflation rules:

- first review round score is capped at 7.0;
- max improvement per round is +1.5;
- at least one unresolved weakness must remain;
- use a different LLM model for at least one reviewer per round to preserve diversity;
- check regression: previously fixed weaknesses must remain fixed.
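The scoring protocol and the two numeric anti-inflation rules can be combined in a few lines. This is a sketch under assumptions: the function and constant names are hypothetical, the per-persona dimension weights are not modeled, and the source does not say whether the caps apply before or after taking the median (here they apply after).

```python
from statistics import median

FIRST_ROUND_CAP = 7.0      # "first review round score is capped at 7.0"
MAX_GAIN_PER_ROUND = 1.5   # "max improvement per round is +1.5"


def round_score(reviewer_scores, previous_score=None):
    """Median of independent reviewer scores, clamped by the anti-inflation rules."""
    raw = median(reviewer_scores)
    if previous_score is None:
        # First review round: apply the absolute cap.
        return min(raw, FIRST_ROUND_CAP)
    # Later rounds: limit how far the score may rise in one round.
    return min(raw, previous_score + MAX_GAIN_PER_ROUND)


first = round_score([6.5, 8.0, 7.5])          # raw median 7.5, capped
second = round_score([9.0, 9.0, 9.5], first)  # raw median 9.0, limited to first + 1.5
print(first, second)  # 7.0 8.5
```

Scores may still drop freely between rounds; only upward movement is limited.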
Review output format:

```markdown
## Review Round N

### Scores
| Dimension | Score | Rationale |
|---|---:|---|

Overall score: X/10
Recommendation: Accept / Weak Accept / Borderline / Reject

### Strengths

### Weaknesses
| Priority | Weakness | Evidence | Suggested Fix | Route |
|---|---|---|---|---|

### Regression Check
- Previously fixed issue:
- Still fixed? yes/no
```

Return 3-5 strengths and 3-5 weaknesses, prioritized as Major/Minor.

## Workflow and Phase Routing

### Phase 1: Draft, target 6.0/10

```text
Iter 1: Structure → skeleton, sections 1-2, compile
Iter 2: Literature → recall and LQS scoring
Iter 3: Structure → core sections 3-6; Figures → 2+ figures
Iter 4: Literature → citation classification and venue upgrade; Structure → sections 7-8
Iter 5: verify citations → compile → first Review
Iter 6: route fixes → compile
```

### Phase 2: Deep Improvement, target 7.5-8.0

```text
Iter 7: Experiment → design and execute or produce executable plan
Iter 8: Figures → present results; Structure → integrate findings
Iter 9: compile → Review → route fixes
```

### Phase 3: Sprint, target 8.5+

```text
Loop: Review → weakness routing → fix → compile → Review
Stop when score >= 8.5, or score delta <= 0.3 for two rounds, or iteration > 12.
```

## Weakness Routing Table

When review identifies a weakness, route it to the responsible sub-skill:

| Weakness | Route | Action |
|---|---|---|
| Citation coverage insufficient | Literature | Stage 1-2 targeted search |
| Too many arXiv-only references | Literature | Stage 4 upgrade via DBLP |
| Missing recent papers | Literature | 2025-2026 focused search |
| Structure unclear | Structure | Reorganize + add transitions |
| Analysis lacks depth | Structure | Add Critical Assessment |
| Taxonomy not novel | Structure | Redesign multi-axis |
| Claims too strong | Structure | Hedge language downgrade |
| No experiments | Experiment | Design pilot study |
| Experiment not rigorous | Experiment | Add trials / ablation |
| Tables incomparable | Figures/Tables | Regroup + add delta column |
| Missing visualizations | Figures/Tables | Add figure |
| No error bars | Figures/Tables | Add +/- std |

## Quality Gates

Each sub-skill output must pass its gate before integration. Gates 1 and 2 can run in parallel; Gate 5 is blocking.

### Gate 1: Literature

- citations >= 80 for draft and >= pages x 3 for final;
- within-1-year papers >= 40%;
- accepted papers >= 30%;
- arXiv-only <= 60%;
- verification rate >= 80%;
- every taxonomy cell has at least 2 A/B references.

### Gate 2: Experiment

- hypothesis is explicit and pre-specified;
- statistical test is reported, such as p-value or confidence interval;
- >= 3 trials with std when empirical results are claimed;
- no unresolved ceiling/floor effect;
- experiment links to a specific manuscript claim;
- bonus: a surprise finding with follow-up analysis.

### Gate 3: Structure

- manuscript compiles with 0 errors and 0 undefined references when LaTeX is used;
- each `.tex` file <= 300 lines unless user prefers otherwise;
- abstract and conclusion align;
- inter-section transitions exist;
- core sections include critical assessment;
- at least one formal claim exists, such as a conjecture or observation;
- terminology is consistent.
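Gate 1 is the most mechanical of these gates, so its thresholds lend themselves to an automated check. The sketch below is illustrative only: the function name, argument names, and failure labels are hypothetical, and the counts are assumed to come from whatever bookkeeping the project keeps alongside `citation_plan.jsonl`.

```python
def literature_gate_failures(total, recent, accepted, arxiv_only, verified,
                             ab_refs_per_cell, final_pages=None):
    """Return the Gate 1 checks that fail; an empty list means the gate passes.

    All count arguments are numbers of references. `ab_refs_per_cell` maps each
    taxonomy cell to its number of A/B-level references. Pass `final_pages`
    for a final manuscript; leave it as None for a draft.
    """
    required = 80 if final_pages is None else final_pages * 3
    failures = []
    if total < required:
        failures.append("citation count")
    if total == 0:
        return failures  # ratios are undefined without any references
    if recent / total < 0.40:
        failures.append("recency")
    if accepted / total < 0.30:
        failures.append("accepted ratio")
    if arxiv_only / total > 0.60:
        failures.append("arxiv-only ratio")
    if verified / total < 0.80:
        failures.append("verification rate")
    if any(count < 2 for count in ab_refs_per_cell.values()):
        failures.append("taxonomy cell coverage")
    return failures
```

Returning the list of failed checks, rather than a single boolean, gives the weakness-routing step something concrete to act on.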
### Gate 4: Figures & Tables

- tables >= 10 and figures >= 6 for a full survey;
- each figure/table carries a non-trivial insight;
- every figure/table is referenced in text;
- captions contain conclusions;
- experimental data include mean +/- std, CI, or limitations.

### Gate 5: Final Review, blocking

- all Gates 1-4 passed;
- PDF compiles cleanly;
- peer-review score reaches the target phase: 6.0, 7.0, 8.0, or 8.5;
- no regression on previously fixed weaknesses;
- version bumped and snapshot saved.

## Score Progression

Use this validated target ladder:

| Target | Requirements beyond previous stage | Typical additions |
|---:|---|---|
| 6.0 | complete draft, 80+ references, compiles | full 8 sections + basic tables |
| 7.0 | logical transitions, quantitative data, gap analysis | formal conjecture + grouped tables |
| 8.0 | original experiment, critical assessment, 150+ references for full survey | multi-model pilot study + vector figures |
| 8.5 | cross-validation, meta-analysis, key takeaways, proof sketch | cross-benchmark table + deeper theory |

## Reference Production Statistics

These are source-page production statistics, not mandatory targets:

| Sub-skill | Percent of time | Score contribution | Key output |
|---|---:|---|---|
| Literature Survey | 20% | foundation, without it <= 6.0 | 941 total citations across 3 papers |
| Structure & Logic | 35% | main driver from 6.0 → 7.5 | 190 pages of manuscript |
| Experiment Design | 20% | +1.0 to +1.5 points | 3,300+ API calls, 9 models evaluated |
| Figures & Tables | 10% | +0.5 to +1.0 points | 59+ tables, 26+ figures |
| Review + Integration | 15% | drives iteration | 14 review rounds total |

## Recommended User-Facing Start Prompt

If the user wants to start but has not provided enough detail, ask them to fill this:

```text
Topic:
Target paper type: survey / position paper / empirical paper / other
Target audience:
Target length:
Target venue/style:
Date range for literature:
Must-cover papers, if any:
Do you want experiments? yes/no/maybe
Desired output now: plan only / files / LaTeX draft / review
```

## Default First Response

When starting a new AutoResearch task, do not immediately write the whole paper. First produce:

1. Scope / Angle / Audience;
2. candidate title;
3. taxonomy draft;
4. chapter outline;
5. literature search plan;
6. next action checklist.

Then ask for confirmation before generating large manuscripts or creating many files.

Chinese version: the post follows the English text with a Chinese rendering of the same workflow. It mirrors the English version section by section, with the artifact file names localized (for example `00_主题.md` and `参考文献.bib`).

Original blog: Deli Chen - DeepSeek AI Researcher

3 posts - 3 participants

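[Claude Code Tips] Writing a Skill's description matters more than writing its body

LinuxDo Latest Topics · 2026-05-31 11:24:57+08:00 · tech

In Claude Code Skills, I think the most underrated part is the description. Many people write the body carefully but dash off a one-line description, and the skill then rarely triggers correctly.

A poor description

```yaml
description: A powerful skill for frontend development.
```

The problem is that it is too generic. The model cannot tell when it should load the skill, or when it should not.

A better description

```yaml
description: Implement or modify React frontend UI, then verify the result with browser screenshots across desktop and mobile viewports. Use when the user asks for visible UI changes.
```

This covers three things:

- What it does: React UI.
- How it does it: screenshot verification.
- When to use it: visible UI changes.

A template for writing descriptions

```yaml
description: [Task type]. Use when [trigger condition]. Do not use when [exclusion condition].
```

For code review, for example, you could write:

```yaml
description: Review code diffs for correctness bugs and regression risk. Use when the user asks for review, PR check, or pre-merge validation. Do not use for general refactoring requests.
```

Why exclusion conditions help

Some skills are easy to trigger by mistake. Take security-review: if its description is too broad, the model may drop into security-audit mode on every code change, and the output becomes heavy. You could write:

```yaml
description: Use for authentication, authorization, secrets, dependency risk, and input validation reviews. Do not use for ordinary UI copy changes.
```

My experience

A Skill's body decides "how to do it"; the description decides "when to do it". If the trigger is wrong, even a well-written body just gets in the way of the task.

1 post - 1 participant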
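The template can also be turned into a quick self-check. The sketch below is a hypothetical helper, not part of Claude Code: it only looks for sentences that start with "Use when"/"Use for" and "Do not use", which is a rough textual heuristic for the trigger and exclusion clauses described in the post.

```python
import re


def lint_description(description):
    """Return which template parts a skill description appears to be missing."""
    # Split into sentences and normalize case for prefix matching.
    sentences = [s.strip().lower()
                 for s in re.split(r"(?<=[.!?])\s+", description) if s.strip()]
    has_trigger = any(s.startswith(("use when", "use for")) for s in sentences)
    has_exclusion = any(s.startswith("do not use") for s in sentences)
    issues = []
    if not has_trigger:
        issues.append("missing trigger condition")
    if not has_exclusion:
        issues.append("missing exclusion condition")
    return issues


print(lint_description("A powerful skill for frontend development."))
# ['missing trigger condition', 'missing exclusion condition']
```

Running it on the code-review description from the post returns an empty list, while the React example is flagged only for the missing exclusion clause.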

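Translating skills' descriptions into Chinese

LinuxDo Latest Topics · 2026-05-17 23:53:06+08:00 · tech

Has anyone tried translating skills into Chinese? I often feel that skills do not trigger proactively enough on their own, and the English descriptions are a chore to read. It is not that I cannot understand them; they just do not read as smoothly as Chinese.

3 posts - 3 participants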

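Any technique for keeping too many skills from eating up the context window?

LinuxDo Latest Topics · 2026-05-12 21:39:24+08:00 · tech

Is there any technique right now for the problem where having too many skills makes the skill descriptions take up too much of the context window? Codex keeps warning me that the skill descriptions already occupy more than 2% of the context. This feels like a real pain point, but I cannot find any solution by searching. Or will it only be solved by model iteration (larger context windows)?

14 posts - 13 participants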

Standard fields for Skills, and the tags and category fields

linux.do · 2026-05-06 14:53:26+08:00 · tech

I used to see some AI-generated Skills that had category and tags fields in addition to name and description.

Claude Code Docs: Extend Claude with skills. Create, manage, and share skills to extend Claude's capabilities in Claude Code. Includes custom commands and bundled skills.

After reading the official Claude documentation, I found that standard Skills have no category or tags fields. So I concluded they were AI hallucinations, and that Skills should be written by hand with the official docs as the reference. But recently, while reading the OpenSpec source, I found that .claude/commands/opsx/propose.md does contain these two fields, category and tags, which the official docs do not define:

```markdown
---
name: "OPSX: Propose"
description: Propose a new change - create it and generate all artifacts in one step
category: Workflow
tags: [workflow, artifacts, experimental]
---

Propose a new change - create the change and generate all artifacts in one step.

I'll create a change with artifacts:
- proposal.md (what & why)
- design.md (how)
- tasks.md (implementation steps)

When ready to implement, run /opsx:apply
```

1 post - 1 participant

Folks, Codex app error

linux.do · 2026-05-03 21:36:40+08:00 · tech

Folks, why do I get this error whenever I use the Codex app? Other ways of calling it, such as the CLI, all work fine; it only happens with the Codex app:

Invalid Value: 'tools.tool_search.description'. Server-executed tool_search does not accept a description. (request id:

4 posts - 2 participants

Codex 小新 (Shin-chan) pet

linux.do · 2026-05-03 18:38:06+08:00 · tech

Sharing a 小新 pet that took me a lot of rerolls to generate. Just put the png and pet.json under the /.codex/pets path. pet.json is as follows:

```json
{
  "id": "小新",
  "displayName": "小新",
  "description": "Crayon 小新 inspired Codex desk pet with thick eyebrows, red top, yellow shorts, mischievous expressions, and playful personality.",
  "spritesheetPath": "spritesheet.webp"
}
```

3 posts - 2 participants

Help needed: Codex error with any [config]

linux.do · 2026-04-29 23:22:03+08:00 · tech

I am using the app (?). Error message:

```json
{"error":{"message":"Invalid Value: 'tools.tool_search.description'. Server-executed tool_search does not accept a description. (request id: 20260429232220772370429lz19MnVU) (request id: 20260429232047217103355V2q2yClT)","type":"invalid_request_error","param":"tools","code":null}}
```

Config:

```toml
model_provider = "OpenAI"
model = "gpt-5.5"
review_model = "gpt-5.4"
model_reasoning_effort = "xhigh"
disable_response_storage = true
network_access = "enabled"
windows_wsl_setup_acknowledged = true
model_context_window = 1000000
model_auto_compact_token_limit = 900000

[model_providers.OpenAI]
name = "OpenAI"
base_url = "https://a-ocnfniawgw.cn-shanghai.fcapp.run/v1"
wire_api = "responses"
requires_openai_auth = true

[windows]
sandbox = "elevated"
```

I tried switching the URL, and it still fails with the same error.

12 posts - 5 participants

Automatically switching the Doubao web app to dark mode

linux.do · 2026-04-29 00:00:39+08:00 · tech

Just add this as a Tampermonkey userscript:

```javascript
// ==UserScript==
// @name         Doubao Web Auto Theme
// @description  Switch the Doubao web app between light and dark themes to follow the system
// @version      1.1
// @author       doubao
// @match        https://*.doubao.com/*
// @grant        none
// @run-at       document-start
// ==/UserScript==

(() => {
  const mediaQuery = window.matchMedia('(prefers-color-scheme: dark)');
  const getTheme = () => (mediaQuery.matches ? 'dark' : 'light');

  // Intercept setAttribute so any page write to data-theme on <html>
  // is replaced with the theme that matches the system setting.
  const raw = Element.prototype.setAttribute;
  Element.prototype.setAttribute = function (key, val) {
    if (this === document.documentElement && key === 'data-theme') {
      val = getTheme();
      console.log(`[Tampermonkey][Doubao-Theme] locked data-theme = ${val}`);
    }
    return raw.call(this, key, val);
  };

  // Listen for system theme changes and update immediately.
  mediaQuery.addEventListener('change', (e) => {
    const newTheme = e.matches ? 'dark' : 'light';
    console.log(`[Tampermonkey][Doubao-Theme] system theme changed, data-theme = ${newTheme}`);
    raw.call(document.documentElement, 'data-theme', newTheme);
  });
})();
```

1 post - 1 participant

A prompt that makes Gemini less over-the-top

linux.do · 2026-04-27 14:38:39+08:00 · tech

**Output Rules:** Output only what is explicitly requested.

**Forbidden elements:**
1. Meta-descriptions of your process ('Here is the version without...', 'As requested...');
2. Self-congratulatory headers ('Perfect Solution', 'Guaranteed to Work');
3. Unprompted apologies or confirmations;
4. Restating user requirements to prove comprehension.

*Violation of these rules degrades response quality. When in doubt, output less, not more.*

**Core Instructions:**
* **Analysis:** Conduct independent analysis based on facts and logic, striving for rigor and accuracy.
* **Corrections:** When encountering objective errors, point out the facts directly and neutrally. You are strictly forbidden from presuming my stance or motivations. Carefully distinguish between "inquiring about facts" and "stating opinions"; do not refute simple questions.
* **Tone:** Maintain a calm, objective, and non-preachy tone. Avoid adversarial rhetoric while ensuring the accuracy of the information.

**Language & Formatting Rules:**
* No matter what language I use, respond to me in Simplified Chinese.
* Strictly prohibit adding elements such as "您可以让我为您执行的下一步" or similar suggestion sections at the end of the reply.

1 post - 1 participant

What to do when Codex has too many skills installed

linux.do · 2026-04-26 10:53:51+08:00 · tech

Codex has too many skills installed, and it shows this warning:

Exceeded skills context budget of 2%. All skill descriptions were removed and 123 additional skills were not included in the model-visible skills list.

6 posts - 6 participants