docs: add lark drive knowledge organization workflow (#1028)

Change-Id: I2343fcdf26ceefb898cc8d4faeae4b17384cfea8
YH-1600
2026-06-02 16:28:25 +08:00
committed by GitHub
parent 4710a294f5
commit 925ae5ecd6
8 changed files with 1477 additions and 0 deletions


@@ -18,6 +18,7 @@ metadata:
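## Quick decisions
- When the user wants to **organize a cloud drive / folder / document library / knowledge base / personal document library**, or asks to "inventory the directory structure, find unarchived / temporary / duplicate / empty directories, and generate an organization plan", you MUST first read [`references/lark-drive-workflow-knowledge-organize.md`](references/lark-drive-workflow-knowledge-organize.md). By default, only generate a plan; creating directories, moving resources, and requesting permissions each require separate confirmation.
- When the user wants to **search docs / Wiki / spreadsheets / Base (bitable) / cloud-space (cloud drive / cloud storage) objects**, prefer `lark-cli drive +search`. Natural-language phrases such as "recently edited by me", "created by me" (→ `--mine`, which actually has owner semantics), "xxx that I opened in the last week", or "docx owned by someone" map directly to flat flags, which avoids hand-writing nested JSON.
- When the user wants to import a local `.xlsx` / `.csv` / `.base` file as a Base / 多维表格 / bitable, the first step MUST be `lark-cli drive +import --type bitable`
- When the user wants to import a local `.md` / `.docx` / `.doc` / `.txt` / `.html` file as an online document, use `lark-cli drive +import --type docx`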


@@ -0,0 +1,222 @@
# Knowledge Organization Workflow: Analysis
Loaded by states: `CONTENT_READ`, `ISSUE_ANALYSIS`, `RULE_GENERATION`.
This file owns low-confidence partial reads, issue analysis, classification rules, and target tree generation. It MUST NOT create execution plans, ask for execution confirmation, or perform write operations.
## Required Context
Before executing rules in this file:
1. `resource_items` MUST already exist from [`lark-drive-workflow-knowledge-organize-discovery.md`](lark-drive-workflow-knowledge-organize-discovery.md).
2. For document partial reads, follow [`../../lark-doc/SKILL.md`](../../lark-doc/SKILL.md) and [`../../lark-doc/references/lark-doc-fetch.md`](../../lark-doc/references/lark-doc-fetch.md).
3. For sheet / bitable down-drill, follow [`../../lark-sheets/SKILL.md`](../../lark-sheets/SKILL.md) or [`../../lark-base/SKILL.md`](../../lark-base/SKILL.md) only when title and path are insufficient.
## State: CONTENT_READ
Entry: `resource_items` exists.
MUST:
1. Build `low_confidence_items`.
2. Apply `Low-Confidence Partial Read`.
3. Read only supported docs through `lark-doc-fetch`.
4. Switch to `lark-sheets` / `lark-base` only when sheet / bitable title and path are insufficient.
5. Record read evidence for classification.
6. Continue reading low-confidence resources in internal batches until all supported low-confidence resources in the current inventory are processed or a blocker occurs.
7. Output progress / summary without asking the user to continue between batches.
Exit: low-confidence items are classified or marked `needs_review=true`.
### Low-Confidence Partial Read
Low-confidence resources include:
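- The title is empty
- The title is `test` / `测试`, digits only, or a meaningless short word
- The title, path, and type together do not provide enough classification clues
- The same or a similar title appears under multiple candidate categories
- The user asks to group by project / customer / business line, but the title and path contain no explicit project / customer / business-line name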
| Condition | Agent MUST Do | Agent MUST NOT Do |
|-----------|---------------|-------------------|
| Title / path / type clearly determine classification | Classify directly | Do not perform content read |
| Resource is low-confidence and docs-fetch-supported | Read outline via `lark-doc-fetch` | Do not skip partial read |
| Candidate project / customer / business / document-type terms exist | After outline, run keyword partial read with candidate terms | Do not use broad generic keywords |
| Partial read returns usable block id and classification is still unclear | Read the relevant section via `lark-doc-fetch` | Do not read the full document |
| Partial read still cannot classify | Set `needs_review=true`; classify to manual confirmation target | Do not invent classification |
| Read fails or permission is insufficient | Set `needs_review=true`; record failure reason | Do not retry indefinitely |
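The escalation in the table above can be sketched as a small loop. This is a minimal illustration, not the workflow's actual implementation: `classify_with_partial_reads`, `READ_STEPS`, and the `read_step` callback are hypothetical names, and the sketch simplifies the table by always trying the keyword step even when no candidate terms exist.

```python
from typing import Callable, Optional

# Hypothetical step names, in the order the table escalates them.
READ_STEPS = ("outline", "keyword", "section")


def classify_with_partial_reads(
    item: dict,
    read_step: Callable[[dict, str], Optional[str]],
    max_attempts: int = 3,
) -> dict:
    """Escalate outline -> keyword -> section; never read the full document.

    `read_step(item, step)` is an assumed callback that returns a category
    name, or None when that step yields no usable evidence. It may raise
    OSError (including PermissionError) when the read fails.
    """
    result = dict(item, category=None, needs_review=False, needs_review_reason="")
    for step in READ_STEPS[:max_attempts]:
        try:
            category = read_step(item, step)
        except OSError as exc:
            # Read failure or missing permission: record the reason, do not retry.
            result.update(needs_review=True,
                          needs_review_reason=f"{step} read failed: {exc}")
            return result
        if category is not None:
            result["category"] = category
            return result
    # All partial reads exhausted: route to manual confirmation, do not guess.
    result.update(needs_review=True,
                  needs_review_reason="still ambiguous after partial reads")
    return result
```

The loop stops at the first step that produces evidence, so a resource that is classifiable from its outline never triggers the keyword or section read.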
### Partial Read Limits
| Limit | Default |
|-------|---------|
| `batch_size` | 20 resources per internal batch |
| `progress_report_interval` | 50 low-confidence resources |
| `max_attempts_per_resource` | 3 partial reads: outline, keyword, section |
Batching rules:
1. Sort low-confidence resources by impact before reading: root-level loose items, duplicated titles, project/customer ambiguity, then empty or meaningless titles.
2. Read supported low-confidence resources across internal batches without asking the user to continue after each batch.
3. Process reads in internal batches of `batch_size`; do not ask the user between internal batches unless auth, permission, or API errors block progress.
4. After each internal batch, update `low_confidence_items` with read evidence or `needs_review=true`.
5. After every `progress_report_interval` processed resources, output a progress summary and continue automatically.
6. If unread low-confidence resources remain because of auth, permission, API, unsupported type, or tool budget blockers, set `partial=true`, report unread count, and default remaining unread items to `needs_review=true` with target path set to manual confirmation target.
7. Never bypass these limits by reading full documents.
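As a rough sketch of the batching rules above (assuming the impact sort in rule 1 has already been applied to `items`), the loop below reads in internal batches, reports at the configured interval, and defaults unread items to manual review when a blocker occurs. `read_in_batches`, `read_one`, and `report` are hypothetical names, and treating `RuntimeError` as the blocker signal is an assumption made only for this illustration.

```python
from typing import Callable, List


def read_in_batches(
    items: List[dict],
    read_one: Callable[[dict], None],
    report: Callable[[int, int], None],
    batch_size: int = 20,
    progress_report_interval: int = 50,
) -> dict:
    """Read resources batch by batch without pausing for user confirmation."""
    done = 0
    total = len(items)
    for start in range(0, total, batch_size):
        for item in items[start:start + batch_size]:
            try:
                read_one(item)  # assumed per-resource partial read
            except RuntimeError:
                # Blocker (auth / permission / API): mark every unread item
                # for manual review and report the inventory as partial.
                for rest in items[done:]:
                    rest["needs_review"] = True
                return {"partial": True, "read_done": done, "unread": total - done}
            done += 1
            if done % progress_report_interval == 0:
                report(done, total)
    report(done, total)  # final summary once reading finishes
    return {"partial": False, "read_done": done, "unread": 0}
```

When the total is an exact multiple of the interval, the last progress report and the final summary carry the same counts; a real implementation might suppress one of them.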
### Low-Confidence Read Start Notice
When `low_confidence_total > 100`, output this notice before reading:
```text
低置信度资源较多,共 <low_confidence_total> 项。我会分批做轻量读取并定期汇报进度;不会读取全文,也不会执行移动或创建。
```
### Low-Confidence Read Summary
Use this as progress / final summary output. Do not ask the user to continue unless a blocker occurs.
```text
低置信度内容读取进度
- 低置信度资源总数:<low_confidence_total>
- 已读取:<read_done>/<low_confidence_total>
- 已补充证据并完成分类:<classified_count>
- 暂入待人工确认:<needs_review_count>
- 失败:<failed_count>
继续分析整理问题。
```
Output this summary:
- After every 50 processed low-confidence resources.
- Once after low-confidence reading finishes.
## State: ISSUE_ANALYSIS
Entry: `resource_items` and partial-read evidence are ready.
MUST:
1. Detect problems from organization perspective only. Do not generate research conclusions.
2. Generate an organization approach based on inventory, low-confidence read evidence, and detected problems.
3. Include how non-reused source containers will be handled after their contents are moved.
4. Output `Inventory And Organization Approach Decision`.
5. Stop and wait for the user to confirm the approach before `RULE_GENERATION`.
Problem rules:
| Problem | Detection Rule |
|---------|----------------|
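| Root-directory pile-up | Too many resources sit directly in the root directory, or they account for a clearly large share of all resources |
| Similar files scattered | Resources with similar titles / types are spread across multiple unrelated paths |
| Inconsistent naming | Date, customer, or project naming formats are clearly inconsistent among similar resources |
| Too much temporary content | Title / path contains `临时`, `测试`, `tmp`, `draft`, `转移`, or `未整理` |
| Empty directories | A directory-type node has no descendant resources |
| Duplicate directories | Directory names are identical or highly similar after normalization |
| Stale archive content | Resources from old years are still scattered in active directories |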
MUST output evidence count or example paths. Do not output only abstract judgment.
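Two of the detection rules above lend themselves to a short sketch that returns an evidence count and example paths rather than an abstract judgment. The function name `detect_problems` and the result keys are hypothetical; the only assumption about the data is that each item carries the `path` and `type` fields of `ResourceItem`, with `path` ending in the resource title.

```python
from typing import Dict, List

# Markers copied from the temporary-content detection rule.
TEMP_MARKERS = ("临时", "测试", "tmp", "draft", "转移", "未整理")


def detect_problems(resource_items: List[dict]) -> Dict[str, dict]:
    """Return evidence counts and up to 3 example paths per detected problem."""
    paths = [item["path"] for item in resource_items]
    # Temporary content: any marker appears in the path (which includes the title).
    temporary = [p for p in paths if any(m in p.lower() for m in TEMP_MARKERS)]
    # Empty directory: a folder with no other item located underneath its path.
    empty_dirs = [
        item["path"]
        for item in resource_items
        if item["type"] == "folder"
        and not any(p.startswith(item["path"] + "/") for p in paths)
    ]
    found = {"temporary_content": temporary, "empty_directory": empty_dirs}
    return {
        name: {"count": len(hits), "examples": hits[:3]}
        for name, hits in found.items()
        if hits
    }
```

Plain substring matching can over-report (for example, a title that merely contains `tmp` inside a longer word), so hits are best treated as candidates to show with their example paths.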
### Problem Pagination
| Output Area | Rule |
|-------------|------|
| Problem overview | Show at most 5 problem types per page |
| Problem examples | Show at most 3 example paths per problem type |
| Pagination | Affects display only; complete `issue_summary` MUST remain internal |
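One way to keep pagination display-only, sketched under the assumption that `issue_summary` is a dict mapping each problem type to its `count` and full `examples` list (that shape and the name `paginate_problems` are illustrative, not defined by this workflow):

```python
from typing import Dict


def paginate_problems(
    issue_summary: Dict[str, dict],
    page: int,
    page_size: int = 5,
    max_examples: int = 3,
) -> dict:
    """Build one display page; the complete issue_summary is left untouched."""
    names = list(issue_summary)
    total_pages = max(1, -(-len(names) // page_size))  # ceiling division
    chunk = names[(page - 1) * page_size: page * page_size]
    rows = [
        {
            "problem": name,
            "count": issue_summary[name]["count"],
            # Slicing copies the list, so the internal examples stay complete.
            "examples": issue_summary[name]["examples"][:max_examples],
        }
        for name in chunk
    ]
    return {"page": page, "total_pages": total_pages, "rows": rows}
```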
### Inventory And Organization Approach Decision
```text
盘点与整理思路
盘点结果:
| 指标 | 数量 |
|------|------|
| 总资源数 | |
| 各类型资源数 | |
| 一级目录数量 | |
| 根目录直接资源数 | |
| 空目录数量 | |
| 低置信度资源数 | |
| 已完成低置信度读取 | |
| 待人工确认 | |
| partial | |
共发现 <problem_type_count> 类问题,当前展示第 <page>/<total_pages> 页。
| 问题 | 证据数量 | 样例路径 | 说明 |
|------|----------|----------|------|
整理思路:
- <approach item 1>
- <approach item 2>
- 对证据不足、读取失败或权限不足的资源放入"待人工确认"
- 如存在不再复用的来源目录,内容迁出后将目录本体收起到 `待人工确认/待清理旧目录`,避免整理后一级目录仍杂乱
- 不删除、不重命名、不修改权限
是否基于这个整理思路生成目标目录和移动 / 创建计划?
你可以选择:
A. 基于这个思路生成目标目录和计划
B. 调整整理思路
C. 查看问题详情
D. 取消本次整理
```
## State: RULE_GENERATION
Entry: user confirms the organization approach.
MUST:
1. Generate `classification_rules`.
2. Generate `target_tree`.
3. Generate `target_tree` to at least two levels; include third level when needed for project / customer / document-type grouping.
4. Reuse existing clear structure when possible.
5. Identify reused top-level containers and non-reused source containers, and set `source_container_disposition`.
6. For non-reused source containers, ensure `target_tree` includes a source-container cleanup target, defaulting to `待人工确认/待清理旧目录`, unless the user explicitly asks to keep source containers in place.
7. Ensure target tree can contain every planned `target_path`.
8. Ensure the target tree contains a manual confirmation target named `待人工确认` unless the user explicitly provides an equivalent name.
9. Continue to `PLAN_GENERATION` without a separate target-tree-only confirmation.
### Classification
| Condition | Agent MUST Do |
|-----------|---------------|
| Existing structure is clear | Reuse existing directory names and hierarchy |
| Title / path / type is enough | Classify without content read |
| Item remains uncertain after mandatory partial read | Put into manual confirmation target and set `needs_review=true` |
| Item is temporary / test / draft | Prefer temporary / test target |
| Root has many loose resources | Prefer organizing root-level obvious items first |
| User asks project / customer grouping | Use project / customer names from title, path, and partial read evidence |
| Naming is inconsistent | Report the issue with examples only; do not generate rename actions |
### Adaptive Classification
The agent MUST NOT start from a fixed default category list. A fixed taxonomy can bias classification and confuse users when category names or numeric prefixes do not match their resources.
Derive categories from the current `resource_items` and partial-read evidence:
1. First group resources by clear signals from title, current path, type, and mandatory partial-read evidence.
2. Prefer category names that appear in the user's own content, such as project names, customer names, business lines, document types, years, or existing folder / Wiki node names.
3. Create a category only when there is enough evidence for at least one resource.
4. Do not create generic buckets such as archive, temporary, test, meeting, dashboard, or operations unless the current resources contain matching evidence.
5. Do not add numeric prefixes to category names unless the user explicitly asks for ordered naming.
6. Always keep a manual confirmation target named `待人工确认` or an equivalent user-specified name for unresolved items.
### Target Tree
`target_tree` is generated in this state but shown together with the move / create plan in `PLAN_GENERATION`. Do not stop after displaying a target tree alone.
## Analysis Failure Handling
| Failure / Blocker | Agent MUST Do | Agent MUST NOT Do |
|-------------------|---------------|-------------------|
| Missing API scope | Follow `lark-shared` permission handling and stop | Do not retry the same command repeatedly |
| Resource access denied | Stop and follow the main workflow `Permission Request Gate` | Do not request permission automatically or in batch |
| Partial document read fails for a low-confidence item | Mark item `needs_review=true`, record reason, and route to manual confirmation target | Do not classify by guessing |
| Item remains ambiguous after partial read | Mark `needs_review=true` and route to manual confirmation target | Do not invent classification |


@@ -0,0 +1,205 @@
# Knowledge Organization Workflow: Discovery
Loaded by states: `PARSE_SCOPE`, `INVENTORY`.
This file owns target parsing, scope clarification, resource inventory, ResourceItem normalization, dedupe, and partial inventory handling. It MUST NOT generate classification rules, execution plans, or perform write operations.
## Required Context
Before executing rules in this file:
1. Follow [`../../lark-shared/SKILL.md`](../../lark-shared/SKILL.md) for identity, auth, and permission handling.
2. For Wiki / personal library targets, follow [`../../lark-wiki/SKILL.md`](../../lark-wiki/SKILL.md).
3. For Drive search targets, follow [`lark-drive-search.md`](lark-drive-search.md).
4. For URL / token inspection, follow [`lark-drive-inspect.md`](lark-drive-inspect.md) and [`../../lark-wiki/references/lark-wiki-node-get.md`](../../lark-wiki/references/lark-wiki-node-get.md).
## State: PARSE_SCOPE
Entry: workflow triggered.
MUST:
1. Identify `target_scope`, `environment_profile`, and `identity`.
2. Apply `Scope Parsing`.
3. Output `Scope Confirmation`.
4. Stop and wait for user confirmation before `INVENTORY`.
Exit: user confirms target scope.
### Scope Parsing
| Condition | Agent MUST Do | Set `target_scope` | Next State |
|-----------|---------------|--------------------|------------|
| Input is `/wiki/<token>` URL | Resolve the Wiki node and preserve both node identity and object identity | Wiki node | `INVENTORY` after user confirms scope |
| Input is Wiki space name / `space_id` | Resolve the Wiki space; 0 matches -> stop and ask; 1 exact match -> continue; multiple matches -> show candidates and wait for user selection; do not treat `my_library` as a normal listed space | Wiki space | `INVENTORY` after user confirms scope |
| Input has Personal Library Intent | Treat as Wiki personal library / `my_library`; resolve real `space_id` before root write; do not treat it as Drive root or owned Drive document search | Personal doc library | `INVENTORY` after user confirms scope |
| Input is `/drive/folder/<token>` URL | Extract `folder_token` | Drive folder | `INVENTORY` after user confirms scope |
| Input has Drive Folder Intent but no concrete folder URL, token, or unique folder name | Ask for folder URL / token / name; if a concrete folder name exists, search folder candidates and wait for user selection when 0 or multiple matches exist | Unknown or Drive folder candidate | Stay in `PARSE_SCOPE` until scope is confirmed |
| Input has Broad Cloud Drive Intent without explicit owned-document search request | Ask the user to choose concrete scope: Drive folder URL / token, Drive root, owned Drive document search, or another explicit search filter; do not default to `drive +search --mine` | Unknown | Stay in `PARSE_SCOPE` until scope is confirmed |
| Input is single cloud resource URL | Resolve the resource type; if not folder / Wiki scope, do not expand automatically | Single resource | Ask whether scope is this resource, parent folder, owning Wiki, or related search results |
| Input is real keyword / name | Search with the real keyword according to `lark-drive-search` | Search scope | `INVENTORY` after user confirms scope |
| Input is range browsing / statistical description with no real keyword | Search by filters / empty-query browsing according to `lark-drive-search` | Search scope | `INVENTORY` after user confirms scope |
| Input is ambiguous | Ask the minimum clarification question and stop | Unknown | Stay in `PARSE_SCOPE` |
Personal Library Intent means the user is referring to the current user's own Feishu document library / personal document library / personal knowledge library, such as `个人文档库`, `飞书个人文档库`, `我的文档库`, `个人知识库`, `我的知识库`, `My Document Library`, or `my_library`.
When this intent is detected, use Wiki personal library semantics. Do not use Drive root, `drive +search --mine`, or broad owned-document search unless the user explicitly asks to search owned Drive documents.
Drive Folder Intent means the user wants to organize a specific Drive folder or Drive folder tree. A Drive folder scope requires a concrete folder URL, folder token, or user-selected folder candidate.
When this intent is detected without a concrete folder identity, stop in `PARSE_SCOPE` and ask for clarification. Do not use Drive root, `drive +search --mine`, or broad owned-document search unless the user explicitly asks for Drive root or owned-document search.
Broad Cloud Drive Intent means the user refers to a broad cloud-drive-level scope such as `我的飞书云盘`, `我的云盘`, `我的云空间`, `我的空间`, or `整理云盘`, without a concrete folder URL / token / unique folder name.
This intent is broader than Drive Folder Intent and MUST NOT be silently converted to owned-document search. Ask the user to choose one of:
1. A specific Drive folder URL / token.
2. Drive root, only when the user explicitly accepts root-level scope.
3. Owned Drive document search, only when the user explicitly asks to organize documents owned / managed by the current user.
4. Another explicit search filter, such as keyword, type, time range, or folder token.
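A simplified sketch of the first parsing pass described above. It covers only the URL rows and the two phrase-based intents; the function name `parse_scope`, the returned scope labels, and the token character set in the regular expressions are assumptions for illustration. Real resolution still goes through the referenced `lark-wiki` and `lark-drive` docs.

```python
import re
from urllib.parse import urlparse

# Phrases taken from the intent definitions above (lowercased for matching).
PERSONAL_LIBRARY_TERMS = ("个人文档库", "飞书个人文档库", "我的文档库", "个人知识库",
                          "我的知识库", "my document library", "my_library")
BROAD_DRIVE_TERMS = ("我的飞书云盘", "我的云盘", "我的云空间", "我的空间", "整理云盘")


def parse_scope(user_input: str) -> dict:
    """Map raw input to a tentative target_scope; never default to --mine."""
    text = user_input.strip()
    if text.startswith(("http://", "https://")):
        path = urlparse(text).path  # query string and fragment are dropped
        wiki = re.search(r"/wiki/([A-Za-z0-9_-]+)", path)  # token charset assumed
        if wiki:
            return {"target_scope": "wiki_node", "token": wiki.group(1)}
        folder = re.search(r"/drive/folder/([A-Za-z0-9_-]+)", path)
        if folder:
            return {"target_scope": "drive_folder", "token": folder.group(1)}
        # Any other cloud resource URL: do not expand the scope automatically.
        return {"target_scope": "single_resource", "needs_clarification": True}
    lowered = text.lower()
    if any(term in lowered for term in PERSONAL_LIBRARY_TERMS):
        return {"target_scope": "personal_library"}
    if any(term in lowered for term in BROAD_DRIVE_TERMS):
        # Broad Cloud Drive Intent: ask the user to pick a concrete scope.
        return {"target_scope": "unknown", "needs_clarification": True}
    return {"target_scope": "unknown", "needs_clarification": True}
```

Every branch still ends in the `Scope Confirmation` step; the sketch only decides which question to ask.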
### Stop Conditions
Stop and ask for clarification when:
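1. The user only says "整理文件夹", "整理目录", "整理资料", "整理文档", or "我的文档", and provides no URL, token, knowledge-base name, Personal Library Intent, concrete Drive folder identity, or explicit search scope.
2. The user says "我的文件夹", "我的目录", "我的空间", "我的云盘", "我的飞书云盘", or "我的云空间", but it cannot be uniquely determined whether this means a specific Drive folder, the Drive root, owned Drive document search, the personal document library, or a Wiki node.
3. The user provides a single resource URL but asks to "整理一批文档" or "整理相关资料".
4. The user's target environment is unclear and the context contains production, BOE, PRE, or multiple profiles at the same time.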
Clarification template:
```text
请提供要整理的 Drive 文件夹链接、Wiki 节点 / 知识库链接,或明确说明要整理"我的文档库";如果只想按关键词搜索整理,也请给出关键词或范围。
```
### Scope Confirmation
```text
我先确认本次整理范围。
目标:
范围:
环境 / profile:
身份:
预计操作:先盘点并生成整理方案,不执行移动或创建。
请确认是否按这个范围继续?
```
## State: INVENTORY
Entry: `target_scope` confirmed.
MUST:
1. Recursively list resources according to target type.
2. Generate `path` during traversal.
3. Normalize all results to `ResourceItem`.
4. Track pagination, depth, and item limits.
5. Set `partial=true` when limits are hit.
6. Output `Inventory Summary`.
7. Continue to `CONTENT_READ` without asking the user unless auth, permission, API, target scope, or environment blockers occur.
### Inventory Limits
| Scope | Default Limit | If Limit Is Hit |
|-------|---------------|-----------------|
| Wiki recursion | `max_depth=3`, `max_items=500`; follow `lark-wiki-node-list` pagination | Set `partial=true`; list covered paths and suggested next first-level directories |
| Drive folder recursion | `max_depth=3`, `max_items=500`, max 10 pages per folder, `page_size=50` | Set `partial=true`; list folders not drilled into |
| Search discovery | `page_size=20`, `max_items=500`; continue pages until `has_more=false` or `max_items` is reached | Set `partial=true`; report collected_count, service_total when available, page_count, and continuation information |
If the user explicitly asks for full processing, batch by first-level directory, Wiki space, or time window. Do not remove all limits in one run.
### Wiki Inventory Rules
1. Follow [`../../lark-wiki/references/lark-wiki-node-list.md`](../../lark-wiki/references/lark-wiki-node-list.md) traversal semantics.
2. Generate stable paths from parent-child traversal.
3. Preserve Wiki node identity fields needed by `ResourceItem`.
4. Treat `my_library` as Wiki personal library, not Drive root.
### Drive Inventory Rules
1. Use CLI command family `drive files list` according to `lark-drive` API rules; its schema path is `drive.files.list`.
2. Recurse only into `folder` items.
3. Use `drive metas batch_query` when URL, owner, created time, or updated time is needed.
4. Continue pages by feeding `next_page_token` into request param `page_token`.
5. Prefer explicit `folder_token`; querying root with empty `folder_token` may return broad root data and may not paginate as expected.
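The recursion, pagination, and limit handling can be sketched together as follows. `list_page(folder_token, page_token)` is a hypothetical wrapper around the `drive files list` call, and the response field names used here (`files`, `name`, `type`, `token`, `next_page_token`) are assumptions for the sketch rather than a confirmed response schema.

```python
from typing import Callable, List, Optional, Tuple


def walk_drive_folder(
    root_token: str,
    list_page: Callable[[str, Optional[str]], dict],
    max_depth: int = 3,
    max_items: int = 500,
    max_pages: int = 10,
) -> Tuple[List[dict], bool]:
    """Return (items, partial); partial is True when any limit was hit."""
    items: List[dict] = []
    partial = False
    stack = [(root_token, "", 1)]  # (folder_token, parent_path, depth)
    while stack:
        folder_token, parent_path, depth = stack.pop()
        page_token = None
        for _ in range(max_pages):
            page = list_page(folder_token, page_token)
            for f in page["files"]:
                if len(items) >= max_items:
                    return items, True  # item limit reached with more to read
                path = f"{parent_path}/{f['name']}" if parent_path else f["name"]
                items.append({"title": f["name"], "type": f["type"],
                              "token": f["token"], "path": path,
                              "depth": depth, "parent_token": folder_token})
                if f["type"] == "folder":  # recurse only into folders
                    if depth < max_depth:
                        stack.append((f["token"], path, depth + 1))
                    else:
                        partial = True  # folder listed but not drilled into
            page_token = page.get("next_page_token")
            if not page_token:
                break
        else:
            partial = True  # page cap hit while more pages remained
    return items, partial
```

Paths are built during traversal from the parent path, which matches the rule that `path` must come from recursion rather than from the title alone.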
### Search Inventory Rules
1. Search results may be normalized directly only when they include stable identity fields required by `ResourceItem`.
2. If a search result is a Wiki item and lacks `node_token`, resolve it with `drive +inspect` or `wiki +node-get` before dedupe.
3. If Wiki identity still cannot be resolved, keep the item, set `needs_review=true`, and record `needs_review_reason`.
4. For search scope, use `page_size=20` unless a lower value is required by the command.
5. Continue fetching pages until `has_more=false` or `max_items` is reached.
6. Do not stop at an arbitrary sample size such as first 5 pages unless the user explicitly asks for sampling or auth, permission, API, environment, or tool-budget blockers occur.
7. If `service_total` / result total is greater than collected items, set `partial=true` and show collected_count, service_total, page_count, and continuation information.
8. Do not present a partial search sample as complete inventory. Before generating a full organization plan from partial search results, ask whether to continue fetching more pages or proceed with sample-based planning.
## ResourceItem
Agent MUST normalize Wiki, Drive, and search results into `ResourceItem`. Later statistics, classification, and planning MUST use this model rather than raw API responses.
```json
{
"source": "wiki|drive|search",
"title": "资源标题",
"type": "doc|docx|sheet|bitable|mindnote|file|wiki|folder|slides|shortcut|catalog",
"path": "当前路径/资源标题",
"depth": 2,
"url": "https://...",
"token": "canonical_token",
"node_token": "wiki_node_token_or_empty",
"obj_token": "wiki_obj_token_or_drive_file_token",
"node_type": "origin|shortcut|empty",
"origin_node_token": "wiki_origin_node_token_or_empty",
"space_id": "wiki_space_id_or_empty",
"parent_token": "parent_node_or_folder_token",
"has_child": false,
"dedupe_key": "wiki:<space_id>:<node_token>|drive:<type>:<token>|search:<type>:<token>",
"created_at": "optional",
"updated_at": "optional",
"needs_review": false,
"needs_review_reason": ""
}
```
ResourceItem rules:
1. `path` MUST be generated by recursion. Do not use title alone as path.
2. Wiki URL token may not be the underlying document token. Preserve both `node_token` and `obj_token`.
3. `type` MUST come from API fields such as `obj_type` / `doc_type`.
4. Wiki organization is by node instance. Prefer `wiki:<space_id>:<node_token>` as `dedupe_key`.
5. MUST NOT dedupe Wiki nodes only by `obj_token`; one document can appear under different Wiki paths or shortcuts.
6. If `node_type=shortcut` or dedupe is uncertain, use `wiki +node-get` to supplement `origin_node_token`; if unavailable, leave empty and set `needs_review=true`.
7. Drive folder tree dedupes by `drive:<type>:<token>`.
8. Search results may merge with recursive results only by exact identity: Wiki by same `node_token`, Drive by same `type + token`.
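The dedupe and merge rules above can be illustrated with a short sketch. The helper names `dedupe_key`, `identity`, and `merge_search_results` are hypothetical; the key formats follow the `dedupe_key` field in the JSON model.

```python
from typing import List


def dedupe_key(item: dict) -> str:
    """Wiki items key on the node instance; Drive / search items on type + token."""
    if item["source"] == "wiki":
        return f"wiki:{item['space_id']}:{item['node_token']}"
    return f"{item['source']}:{item['type']}:{item['token']}"


def identity(item: dict) -> tuple:
    """Exact identity used only when merging search results into recursive results."""
    if item.get("node_token"):
        return ("wiki", item["node_token"])
    return ("drive", item["type"], item["token"])


def merge_search_results(recursive: List[dict], search: List[dict]) -> List[dict]:
    """Append search items whose exact identity is not already present."""
    seen = {identity(item) for item in recursive}
    merged = list(recursive)
    for item in search:
        key = identity(item)
        if key not in seen:
            seen.add(key)
            merged.append(item)
    return merged
```

Because Wiki keys use `node_token` rather than `obj_token`, the same document mounted under two Wiki paths stays as two separate items.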
## Inventory Summary
```text
已完成盘点。
| 指标 | 数量 |
|------|------|
| 总资源数 | |
| 各类型资源数 | |
| 一级目录数量 | |
| 根目录直接资源数 | |
| 空目录数量 | |
| 疑似临时 / 测试 / 未整理资源数 | |
| 低置信度待确认资源数 | |
下一步将自动读取低置信度资源并分析整理问题;不会执行移动或创建。
```
## Discovery Failure Handling
| Failure / Blocker | Agent MUST Do | Agent MUST NOT Do |
|-------------------|---------------|-------------------|
| Target scope is ambiguous | Ask the minimum scope clarification question and stop | Do not choose a whole cloud drive / personal library by default |
| Environment / profile is ambiguous | Ask user to confirm prod / BOE / PRE and profile | Do not cross environment boundaries |
| Missing API scope | Follow `lark-shared` permission handling and stop | Do not retry the same command repeatedly |
| Resource access denied | Stop and follow the main workflow `Permission Request Gate` | Do not request permission automatically or in batch |
| Pagination / depth / item limit reached | Set `partial=true`; record uncovered range and continuation command | Do not claim full coverage |


@@ -0,0 +1,200 @@
# Knowledge Organization Workflow: Execution
Loaded by states: `EXECUTE`, `VERIFY`.
This file owns confirmed write execution, PathTokenMap, progress reporting, verification, next suggestions, and execution-stage failure handling. It MUST NOT generate or revise plans.
## Required Context
Before executing rules in this file:
1. `active_plan_items` and `execution_scope` MUST already exist from [`lark-drive-workflow-knowledge-organize-planning.md`](lark-drive-workflow-knowledge-organize-planning.md).
2. `target_tree` and `resource_items` MUST already exist.
3. Use the `PlanItem` structure already produced by the planning phase. Do not regenerate or revise plans in this file.
4. Follow command syntax, scope requirements, and confirmation behavior from referenced shortcut docs.
5. Follow `Non-goals` from the main workflow entry. Do not execute excluded operations from this file.
6. Maintain internal `rollback_snapshot` and `execution_journal` during writes, but do not mention recovery on the normal successful path.
## State: EXECUTE
Entry: user explicitly confirmed execution scope.
Allowed writes only:
- Create a Drive folder: `drive +create-folder`
- Move a Drive file / folder: `drive +move`
- Create a Wiki node: `wiki +node-create`
- Move an existing Wiki node: `wiki +move --node-token`
- Resume an async move task: `drive +task_result`
- Request permission for a single resource: `drive +apply-permission`
MUST:
1. Resolve all target paths through `PathTokenMap`.
2. Build internal `rollback_snapshot` for all move items in confirmed scope before any write operation.
3. Initialize `execution_journal` before any write operation.
4. Create target folders / nodes shallow to deep.
5. Save returned tokens immediately.
6. Append an `execution_journal` entry after every create / move / async continuation.
7. Apply parent-child source move ordering.
8. Execute only confirmed scope.
9. Record success/failure per `PlanItem`.
10. Apply `Progress Reporting`.
MUST NOT:
- Execute operations listed in `Non-goals`.
- Rename or patch resource titles.
- Move using path string instead of token.
- Use `wiki +move` docs-to-wiki mode.
- Output rollback snapshot, rollback readiness, or execution journal on the normal successful path.
### Internal Recovery Hooks
These hooks are mandatory internal state maintenance. They do not create user-facing output on the normal path.
Rules:
1. Build `rollback_snapshot` before the first write command.
2. Include only compact fields needed for recovery. Do not store full API responses.
3. Append `execution_journal` immediately after each write attempt, including successful creates, successful moves, failed moves, and async continuation results.
4. If snapshot or journal cannot be maintained, stop before further writes and report the blocker.
5. If a write blocker occurs after one or more successful moves, report the blocker and ask whether the user wants to try restoring to `整理前的位置`.
6. Do not load the rollback phase or execute recovery until the user explicitly chooses to try restore.
Recovery question template:
```text
执行暂停:已成功移动 <moved_success_count> 项,失败 <failed_count> 项。
执行出现错误,已有部分资源移动成功。是否需要尝试恢复到整理前的位置?
```
### Progress Reporting
Small execution means `created_total + moved_total <= 50`.
For small executions, a final execution result is enough. For larger or long-running executions (`created_total + moved_total > 50`), the agent MUST periodically report progress by elapsed time and meaningful stage boundaries rather than by operation count alone.
Progress reports SHOULD be stage-specific. Include only fields relevant to the current stage. Do not output empty, unknown, or irrelevant fields.
Required fields by stage:
- Start: total create count, total move count, reporting cadence.
- Create stage finished: created count, failed count.
- Move stage progress / finished: current moved count as `<moved_done>/<moved_total>`, failed count, optional recent item.
- Blocked: current stage, completed count, blocker, required next action.
Examples:
- `执行开始:本次将创建 <created_total> 个目录 / 节点,移动 <moved_total> 个资源。任务较大,我会约每 60 秒汇报一次进度。`
- `执行进度:移动资源 <moved_done>/<moved_total>,失败 <failed_count>。`
- `执行暂停:<current_stage> 阶段遇到 <blocker>,已完成 <done>/<total>,需要 <required_action>。`
Rules:
1. When `created_total + moved_total > 50`, output one progress notice when execution starts.
2. After the create stage finishes, output one progress report when `created_total > 0`.
3. During the move stage, output a progress report about every 60 seconds, and once more when the move stage finishes.
4. Every move-stage progress report MUST include the current moved count as `<moved_done>/<moved_total>` and failed count, even if fewer than 50 additional moves completed since the previous report.
5. If execution is blocked by auth, permission, unresolved token, or API error, output the current progress and blocker before stopping.
6. Do not mention operation-count milestones such as "not yet reached 50 moves"; progress is time-based.
7. Do not output filler messages such as "still running", "no failure yet", or "not yet reached the next progress point" without current counts.
8. Do not report progress after every item unless the user explicitly asks for verbose execution logs.
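A minimal sketch of time-based move-stage reporting, assuming an injectable clock so the cadence can be checked without waiting. The class name `MoveProgressReporter` and its parameters are hypothetical; the message text reuses the move-progress example above, and `moved_done` counts successful moves only.

```python
import time
from typing import Callable


class MoveProgressReporter:
    """Emit move-stage progress by elapsed time, not by operation count."""

    def __init__(
        self,
        moved_total: int,
        interval_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        emit: Callable[[str], None] = print,
    ) -> None:
        self.moved_total = moved_total
        self.moved_done = 0
        self.failed_count = 0
        self._interval = interval_s
        self._clock = clock
        self._emit = emit
        self._last = clock()

    def _report(self) -> None:
        # Every report carries current counts; no filler messages.
        self._emit(f"执行进度:移动资源 {self.moved_done}/{self.moved_total},"
                   f"失败 {self.failed_count}。")
        self._last = self._clock()

    def record(self, ok: bool) -> None:
        """Record one move attempt and report if the interval has elapsed."""
        if ok:
            self.moved_done += 1
        else:
            self.failed_count += 1
        if self._clock() - self._last >= self._interval:
            self._report()

    def finish(self) -> None:
        """Report once more when the move stage finishes."""
        self._report()
```

Typical use would be one `record(...)` call after each move attempt and a single `finish()` at the end of the move stage.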
## PathTokenMap
`PathTokenMap` maps target paths to real tokens before execution.
| Scope | Mapping |
|-------|---------|
| Drive | `target_path -> folder_token` |
| Wiki | `target_path -> node_token`, with `space_id` retained |
Rules:
1. Scan existing target folders / nodes before creating new ones.
2. Create planned folders / nodes from shallow to deep.
3. Save returned `folder_token` or `node_token` immediately after each successful create.
4. Execute `move` only when `target_parent_path` resolves to a token.
5. If same-name target ambiguity exists, inspect existing children when possible; otherwise mark `needs_review=true`.
6. Before writing to `my_library` root, resolve the real `space_id` according to `lark-wiki`.
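The shallow-to-deep creation order can be sketched as below. `build_path_token_map` and the `create(parent_token, name)` callback are hypothetical stand-ins for the real create commands, and representing the root container with the empty-string path is an assumption of this sketch.

```python
from typing import Callable, Dict, Iterable


def build_path_token_map(
    existing: Dict[str, str],
    planned_paths: Iterable[str],
    create: Callable[[str, str], str],
) -> Dict[str, str]:
    """Resolve every planned target path to a token, creating shallow to deep.

    `existing` comes from scanning current folders / nodes and must contain
    the root under the key "". `create` returns the new token.
    """
    token_map = dict(existing)
    # Fewer path separators means shallower, so parents are created first.
    for path in sorted(set(planned_paths), key=lambda p: (p.count("/"), p)):
        if path in token_map:
            continue  # reuse an existing folder / node instead of duplicating it
        parent_path, _, name = path.rpartition("/")
        if parent_path not in token_map:
            # Unresolved parent: never fall back to a path string.
            raise KeyError(f"parent path has no token: {parent_path!r}")
        # Save the returned token immediately so children can resolve it.
        token_map[path] = create(token_map[parent_path], name)
    return token_map
```

A move would then run only when its `target_parent_path` is a key in the returned map.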
## State: VERIFY
Entry: execution finished.
MUST:
1. Rescan target scope.
2. Compare each executed `PlanItem` with actual path/token.
3. Verify items covered by `covered_by_parent_move=true`.
4. Output success/failure/manual-confirmation counts.
5. Report mismatches with expected vs actual path/token.
6. Verify non-reused source containers planned for cleanup are no longer left in their original top-level position.
7. Verify reused target containers remain in place.
8. If serious mismatches exist, ask whether the user wants to try restoring to `整理前的位置`.
9. Do not load the rollback phase or execute recovery until the user explicitly chooses to try restore.
Verification table:
| plan_id | 动作 | 标题 | 预期目标 | 实际目标 | 预期 token | 实际 token | 状态 | 失败原因 |
|---------|------|------|----------|----------|------------|------------|------|----------|
### Verification Result
```text
执行完成。
| 项目 | 数量 |
|------|------|
| 创建成功 | |
| 移动成功 | |
| 待人工确认 | |
| 失败 | |
| plan_id | 动作 | 预期目标 | 实际目标 | 状态 | 失败原因 |
|---------|------|----------|----------|------|----------|
```
Serious mismatch recovery question template:
```text
验证发现 <mismatch_count> 项结果与计划不一致。
是否需要尝试恢复到整理前的位置?
```
### Next Suggestions
Only output `建议下一步` when at least one trigger exists. Do not add generic suggestions when execution and verification are clean.
Triggers:
- `partial=true`: inventory or content read was incomplete.
- Manual confirmation or low-confidence items remain.
- Failed items exist.
- One target folder / Wiki node contains more than 100 direct child resources after organization.
- Root-level loose resources remain.
- Non-reused source containers remain in their original top-level position after cleanup was planned.
- Verification found mismatches.
Template:
```text
建议下一步:
- <trigger-based suggestion>
```
## Execution Failure Handling
| Failure / Blocker | Agent MUST Do | Agent MUST NOT Do |
|-------------------|---------------|-------------------|
| Missing API scope | Follow `lark-shared` permission handling and stop | Do not retry the same command repeatedly |
| Resource access denied | Stop and follow the main workflow `Permission Request Gate` | Do not request permission automatically or in batch |
| Target path cannot resolve to token | Mark affected plan item failed or `needs_review=true` | Do not execute move with a path string |
| Target path has same-name ambiguity | Read existing children if possible; otherwise mark `needs_review=true` | Do not create duplicate target blindly |
| Async move returns `ready=false` or `next_command` | Follow the returned async continuation command | Do not assume completion |
| Parent-child move conflict | Follow source-depth ordering; move divergent children before parent | Do not move parent first when child target differs |
| Verification mismatch | Report expected vs actual path/token and failure reason | Do not silently mark success |
| Write blocker after successful moves | Report current progress and ask whether to try restoring to `整理前的位置` | Do not load rollback phase or execute recovery without explicit user choice |


@@ -0,0 +1,316 @@
# Knowledge Organization Workflow: Planning
Loaded by states: `PLAN_GENERATION`, `EXEC_CONFIRM`.
This file owns plan generation, plan revision, user-facing pagination, and execution confirmation. It MUST NOT perform write operations.
## Required Context
Before executing rules in this file:
1. `resource_items`, `classification_rules`, and `target_tree` MUST already exist.
2. Follow command syntax, scope requirements, and confirmation behavior from referenced shortcut docs.
3. Follow `Non-goals` from the main workflow entry. Do not execute excluded operations from this file.
## State: PLAN_GENERATION
Entry: `target_tree` exists after the user confirmed the organization approach.
MUST:
1. Generate complete internal `plan_items`.
2. Build `DisplayItem` only for user-facing pages.
3. Apply `Plan Generation`.
4. Apply `Plan Pagination`.
5. Set `active_plan_items` to the latest complete plan.
6. Keep complete plan internally even if only one page is displayed.
7. Output `Target Tree And Plan Overview` or requested plan page, then wait.
### Plan Generation
| Condition | Agent MUST Do |
|-----------|---------------|
| Target path appears in any plan item | Ensure the path exists in `target_tree` |
| Source parent and descendants share same target subtree | Move parent only; mark descendants `covered_by_parent_move=true` |
| A child target differs from parent target | Move divergent child before parent; order by `source_depth` from deep to shallow |
| Target directory / node does not exist | Add `create_folder` / `create_node` before move |
| Resource is root-level and target path differs from current path | Add a `move` plan item; do not leave root-level resources in place by default |
| Resource has `needs_review=true` because classification evidence is insufficient | Set `target_path` to manual confirmation target, set `action=move`, and preserve `needs_review_reason` |
| Top-level folder / Wiki node has descendants that share the same target subtree | Move the parent folder / node only; descendants are covered by parent move |
| Top-level folder / Wiki node has descendants with divergent target subtrees | Move divergent descendants first; then move the parent only if it still has a target path or needs manual confirmation |
| Source container is reused as a target container | Keep the container in place; do not move it as source-container cleanup |
| Non-reused source container has descendants moved elsewhere | Add an explicit folder / node move plan item after descendant moves; target defaults to the source-container cleanup target |
| Source container handling is ambiguous | Move it to the manual confirmation target or mark `needs_review=true`; do not leave it in the root by default |
| Target parent token unresolved | Keep plan item but block execution until token is resolved |
| Resource title is poor or inconsistent | Report the naming issue only; do not create rename or title-patch plan items |
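The ordering constraints in the table above can be sketched as a sort key. This is a minimal illustration, not the workflow's implementation: it assumes plan items are plain dicts carrying the `PlanItem` fields `action`, `source_depth`, `target_path`, and `covered_by_parent_move`, and that `/` separates path segments. The helper name `order_for_execution` is hypothetical.

```python
CREATE_ACTIONS = {"create_folder", "create_node"}


def order_for_execution(plan_items):
    """Return executable plan items in a safe order (hypothetical helper)."""
    # Items kept in place or already covered by an ancestor move need no command.
    executable = [
        item for item in plan_items
        if item["action"] != "keep" and not item.get("covered_by_parent_move", False)
    ]

    def sort_key(item):
        if item["action"] in CREATE_ACTIONS:
            # Creates run first, shallow paths first, so parents exist before children.
            # Assumption: "/" is the path separator.
            return (0, item["target_path"].count("/"))
        # Moves run second, deepest source first, so divergent children
        # leave before their parent container is moved.
        return (1, -item["source_depth"])

    return sorted(executable, key=sort_key)
```

Because Python's sort is stable, items with equal keys keep their original plan order.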
## PlanItem
`PlanItem` is for internal execution. It may contain tokens and internal enums.
| Field | Meaning |
|-------|---------|
| `plan_id` | Stable unique ID for plan / verification, such as `P001` |
| `source_path` | Current path |
| `title` | Resource title |
| `type` | Resource type |
| `source_token` | Drive token or normal resource token |
| `source_node_token` | Wiki node token; empty for non-Wiki resources |
| `source_parent_token` | Current parent folder token or parent Wiki node token |
| `source_depth` | Original depth in source tree |
| `target_path` | Target path |
| `target_parent_path` | Target parent path |
| `target_parent_token` | Target parent token; may be empty during planning, MUST be resolved before execution |
| `action` | Internal enum: `keep` / `create_folder` / `create_node` / `move` |
| `covered_by_parent_move` | Whether an ancestor move already covers this item |
| `reason` | Classification reason |
| `evidence_paths` | Evidence paths |
| `evidence_count` | Evidence count or hit count |
| `confidence` | Internal enum: `high` / `medium` / `low` |
| `needs_review` | Whether human review is required |
| `needs_review_reason` | Reason requiring human review |
| `rollback_origin_kind` | Internal recovery origin marker: `drive_folder` / `drive_root` / `wiki_node` / `wiki_space_root` / `unknown` |
| `rollback_origin_token` | Original parent token when applicable; empty for root markers |
| `rollback_origin_space_id` | Original Wiki space ID when `rollback_origin_kind=wiki_space_root` |
| `rollback_supported` | Whether this move item can be restored automatically if recovery is requested |
| `rollback_blocker` | Internal reason when `rollback_supported=false` |
### Rollback Origin Readiness
This is an internal execution-safety rule. Do not expose rollback readiness on the normal user-facing execution confirmation path.
Rules:
1. `action=move` items entering execution SHOULD have `rollback_origin_kind`.
2. `rollback_origin_kind` can be:
- `drive_folder`: original Drive parent folder token is known.
- `drive_root`: original location is the Drive root.
- `wiki_node`: original Wiki parent node token is known.
- `wiki_space_root`: original location is the Wiki space root and `rollback_origin_space_id` is known.
3. If `rollback_origin_kind` is missing or `unknown`, the agent MUST try to resolve it before execution from `ResourceItem.parent_token`, traversal context, `source_path`, `space_id`, or `wiki +node-get` for Wiki resources.
4. If the origin is still unresolved, set `rollback_supported=false` and `rollback_blocker`, but do not block the entire execution solely because recovery is unsupported.
5. Target resolution remains mandatory: a move item with unresolved `target_parent_token` MUST NOT execute.
6. Internal recovery metadata MUST NOT change `DisplayItem` output on the normal successful path.
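As a rough sketch of rules 1 to 5, the check below separates the two questions: whether a move may execute (target resolved) and whether it could later be restored (origin resolved). It assumes plan items are dicts with the `PlanItem` field names; the function name `assess_move_item` and the blocker wording are illustrative only.

```python
def assess_move_item(item):
    """Return (can_execute, rollback_supported, rollback_blocker) for a move item."""
    # Rule 5: an unresolved target parent token always blocks execution.
    can_execute = bool(item.get("target_parent_token"))

    kind = item.get("rollback_origin_kind") or "unknown"
    if kind in ("drive_folder", "wiki_node") and item.get("rollback_origin_token"):
        return can_execute, True, ""
    if kind == "drive_root":
        # Root markers are valid origins even though the origin token is empty.
        return can_execute, True, ""
    if kind == "wiki_space_root" and item.get("rollback_origin_space_id"):
        return can_execute, True, ""

    # Rule 4: an unresolved origin disables recovery but does not block execution.
    return can_execute, False, f"origin unresolved (kind={kind})"
```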
## DisplayItem
`DisplayItem` is for user-facing output. It MUST NOT expose raw internal enum values.
| Display Field | Source |
|---------------|--------|
| `序号` | Page-local row number |
| `当前位置` | `source_path` |
| `标题` | `title` |
| `类型` | Human-readable `type` when possible; raw type is acceptable only when there is no clearer label |
| `目标位置` | `target_path` |
| `动作` | Convert from `action` using action display map |
| `原因` | `reason` |
| `置信度` | Convert from `confidence` using confidence display map |
| `待确认原因` | `needs_review_reason` |
Action display map:
| Internal Enum | User-Facing Label |
|---------------|-------------------|
| `keep` | 保持不变 |
| `create_folder` | 创建文件夹 |
| `create_node` | 创建知识库节点 |
| `move` | 移动到目标目录 |
`needs_review=true` is a review state, not an action. A review item MUST still use `action=move` when its target is the manual confirmation target.
### Manual Confirmation Target
Resources with insufficient classification evidence MUST be moved to the manual confirmation target after the user confirms execution.
Rules:
1. The target tree MUST include `待人工确认` or an equivalent user-specified manual confirmation path.
2. For Drive scopes, the manual confirmation target is a Drive folder.
3. For Wiki scopes, the manual confirmation target is a Wiki node.
4. Plan items for these resources MUST set `needs_review=true`, preserve `needs_review_reason`, set `target_path` to the manual confirmation target, and set `action=move`.
5. Do not leave these items in their original location by default.
Confidence display map (used for the `置信度` field of `DisplayItem`):
| Internal Enum | User-Facing Label |
|---------------|-------------------|
| `high` | 高,证据明确 |
| `medium` | 中,有依据但建议确认 |
| `low` | 低,需要人工确认 |
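The two display maps can be applied in one conversion step. The sketch below assumes a `PlanItem` is a dict with the field names listed earlier; `to_display_item` is a hypothetical helper name, and the `类型` column passes the raw type through, which the `DisplayItem` table allows only when no clearer label exists.

```python
ACTION_LABELS = {
    "keep": "保持不变",
    "create_folder": "创建文件夹",
    "create_node": "创建知识库节点",
    "move": "移动到目标目录",
}

CONFIDENCE_LABELS = {
    "high": "高,证据明确",
    "medium": "中,有依据但建议确认",
    "low": "低,需要人工确认",
}


def to_display_item(row_number, plan_item):
    """Build a user-facing row; tokens and raw enums are deliberately left out."""
    return {
        "序号": row_number,  # page-local row number
        "当前位置": plan_item["source_path"],
        "标题": plan_item["title"],
        "类型": plan_item["type"],
        "目标位置": plan_item["target_path"],
        "动作": ACTION_LABELS[plan_item["action"]],
        "原因": plan_item["reason"],
        "置信度": CONFIDENCE_LABELS[plan_item["confidence"]],
        "待确认原因": plan_item.get("needs_review_reason", ""),
    }
```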
### Plan Pagination
| Output Area | Rule |
|-------------|------|
| Plan details | Show at most 20 plan items per page |
| Plan item count > 20 | MUST paginate; do not output all details at once |
| Plan item count > 500 | First response MUST show overview and filters only; no detail rows until user asks |
| Pagination | Affects display only; complete `plan_items` MUST remain internal |
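A minimal sketch of these pagination rules, assuming the complete plan is held as a list and pages are 1-indexed. The names `paginate` and `overview_only` are illustrative, not part of the workflow.

```python
PAGE_SIZE = 20
OVERVIEW_ONLY_THRESHOLD = 500


def paginate(plan_items, page, page_size=PAGE_SIZE):
    """Slice one display page; the full plan_items list is never modified."""
    total = len(plan_items)
    total_pages = max(1, -(-total // page_size))  # ceiling division
    if not 1 <= page <= total_pages:
        raise ValueError(f"page {page} is outside 1..{total_pages}")
    start = (page - 1) * page_size
    return {
        "page": page,
        "total_pages": total_pages,
        "remaining_pages": total_pages - page,
        "rows": plan_items[start:start + page_size],
    }


def overview_only(total_count):
    """True when the first response must show only the overview and filters."""
    return total_count > OVERVIEW_ONLY_THRESHOLD
```

Slicing returns a new list, so filtering or paging the display never shrinks the internal plan.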
### Target Tree And Plan Overview
```text
建议目标目录结构
<target_tree>
移动 / 创建计划总览
本次计划共 <total_count> 项:
- 创建目录 / 节点:<create_count> 项
- 移动资源:<move_count> 项(其中来源目录本体:<source_container_move_count> 项)
- 保持不变:<keep_count> 项
- 待人工确认:<review_count> 项
- 高置信度:<high_count> 项
- 中置信度:<medium_count> 项
- 低置信度:<low_count> 项
你可以选择:
- 查看第 1 页明细
- 只看将创建的目录 / 节点
- 只看待人工确认项
- 只看高置信度移动项
- 进入执行确认
```
If `total_count > 500`, say:
```text
计划较大,我先只展示总览。
```
### Plan Revision Protocol
When the user corrects or adjusts the plan in `PLAN_GENERATION` or `EXEC_CONFIRM`, the agent MUST treat it as a full-plan revision unless the user explicitly asks to execute only the corrected items.
Revision triggers include:
- Adjusting classification rules.
- Adjusting target folder / Wiki node structure.
- Changing one or more resources' target paths.
- Excluding resources from movement.
- Restricting execution to high-confidence items.
- Moving a whole category to another target.
- Changing manual confirmation handling.
- Changing source container cleanup or retention handling.
Internal rules:
1. Record the user correction in `last_user_correction`.
2. Mark the previous `plan_items` as stale.
3. Recompute `classification_rules`, `target_tree`, and complete `plan_items` when needed.
4. Increment `plan_version`.
5. Set `active_plan_items` to the complete revised plan.
6. Append a short internal summary to `plan_revision_history`.
7. Do not execute stale `plan_items`.
8. Do not execute only the delta unless the user explicitly asks for partial execution.
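The internal rules above amount to replacing the whole plan and bumping its version. The sketch below assumes the runtime state is a dict using the `Runtime State` field names and that `plan_version` has the form `v<number>`; `apply_revision` and the `regenerate_plan` callback are hypothetical names standing in for the real regeneration logic.

```python
def apply_revision(state, correction, regenerate_plan):
    """Apply a user correction as a full-plan revision (illustrative only)."""
    state["last_user_correction"] = correction

    # Regenerating replaces the old plan wholesale, so the stale
    # plan_items are no longer reachable for execution.
    revised = regenerate_plan(state, correction)

    version_number = int(state.get("plan_version", "v1")[1:]) + 1
    state["plan_version"] = f"v{version_number}"
    state["plan_items"] = revised
    state["active_plan_items"] = revised
    state.setdefault("plan_revision_history", []).append(
        f"{state['plan_version']}: {correction}"
    )
    return state
```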
User-facing output:
```text
已按你的修改重新生成完整计划。
已应用的修改:
- <correction item 1>
- <correction item 2>
当前完整计划:
- 创建目录 / 节点:<create_count> 项
- 移动资源:<move_count> 项
- 保持不变:<keep_count> 项
- 待人工确认:<review_count> 项
说明:后续执行默认基于这份完整修正版计划,不是只执行刚才的修正项。
你可以选择:
A. 查看修正版计划总览
B. 查看本次修改涉及的资源
C. 进入执行确认
D. 继续调整
```
If the user explicitly asks to execute only the corrected items, ask for confirmation before execution:
```text
你明确要求只执行本次修改涉及的 <count> 项。其余计划项不会执行。
请确认是否只执行这些项?
```
### Plan Detail Page
```text
移动 / 创建计划,第 <page>/<total_pages> 页,每页 20 项
| 序号 | 当前位置 | 标题 | 类型 | 目标位置 | 动作 | 原因 | 置信度 | 待确认原因 |
|------|----------|------|------|----------|------|------|--------|------------|
还有 <remaining_pages> 页未展示。
你可以回复:
- 继续看下一页
- 只看待人工确认项
- 只看低置信度项
- 进入执行确认
```
## State: EXEC_CONFIRM
Entry: user asks to execute.
MUST:
1. Show write-operation summary:
- 将创建哪些目录 / 节点
- 将移动哪些资源
- 将移动哪些来源目录本体(如有)
- 哪些资源仍需人工确认
- 预计影响范围
2. Use `active_plan_items` from the latest complete plan.
3. Show `Permission Inheritance Notice`.
4. Ask for execution scope using `Execution Confirmation`.
5. Reference `Non-goals` for operations excluded from this workflow.
6. Wait for explicit confirmation.
### Permission Inheritance Notice
Before execution confirmation, MUST show this notice:
```text
权限提示:移动资源后,资源权限可能随目标位置变化,可见范围或协作权限可能变化。本 workflow 不会自动修改权限。
```
### Execution Confirmation
When the user wants execution, ask for the execution scope using the template that matches the current display state.
Option letters MUST be reassigned so that only currently available choices are listed, lettered consecutively. Do not show disabled choices, and do not ask the user to reply with skipped letters.
If a plan detail page is currently active:
```text
请确认执行范围:
A. 执行完整计划:<total_count> 项
B. 只执行当前页:<current_page_count> 项
C. 只执行高置信度项:<high_confidence_count> 项
D. 暂不执行,只保留方案
本 workflow 只执行已确认范围内的创建、移动和必要的单资源权限申请;不会重命名任何资源。
```
If no plan detail page is currently active:
```text
请确认执行范围:
A. 执行完整计划:<total_count> 项
B. 只执行高置信度项:<high_confidence_count> 项
C. 暂不执行,只保留方案
如需只执行某一页,请先查看计划明细页。
本 workflow 只执行已确认范围内的创建、移动和必要的单资源权限申请;不会重命名任何资源。
```
If there is no pagination, still state the total number of plan items covered by confirmation.
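One way to keep the option letters consecutive is to build the list of available choices first and assign letters afterwards. This is a sketch under the assumption that the only conditional choice is the current-page option; `build_scope_options` is a hypothetical helper name.

```python
def build_scope_options(total_count, high_confidence_count, current_page_count=None):
    """Return lettered execution-scope options with no skipped letters."""
    choices = [f"执行完整计划:{total_count} 项"]
    if current_page_count is not None:
        # Offered only while a plan detail page is active.
        choices.append(f"只执行当前页:{current_page_count} 项")
    choices.append(f"只执行高置信度项:{high_confidence_count} 项")
    choices.append("暂不执行,只保留方案")
    # Letters are assigned after filtering, so they are always A, B, C, ...
    return [f"{chr(ord('A') + index)}. {text}" for index, text in enumerate(choices)]
```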


@@ -0,0 +1,308 @@
# 知识整理工作流:Rollback
Loaded by states: `ROLLBACK_CONFIRM`, `ROLLBACK`, `ROLLBACK_VERIFY`, `ROLLBACK_CLEANUP_CONFIRM`, `ROLLBACK_CLEANUP`, `ROLLBACK_CLEANUP_VERIFY`.
This file owns recovery plan generation, recovery confirmation, recovery execution, recovery verification, cleanup confirmation, cleanup execution, and cleanup verification. It also defines the internal `rollback_snapshot` and `execution_journal` contracts.
It MUST NOT generate organization plans, revise classification rules, execute unconfirmed deletes, rename resources, modify permissions, or use `wiki +move` docs-to-wiki mode.
User-facing language should use "恢复到整理前的位置" / "恢复". Internal state and field names may use `rollback`.
## Required Context
Before executing rules in this file:
1. `active_plan_items`, `execution_scope`, `target_scope`, and `path_token_map` MUST already exist.
2. `rollback_snapshot` MUST have been built before the first write operation in `EXECUTE`.
3. `execution_journal` MUST contain write-operation results from `EXECUTE`.
4. Follow command syntax and risk behavior from referenced shortcut docs.
5. Follow `Non-goals` from the main workflow entry.
## Normal Path Visibility
Do not mention rollback, recovery readiness, snapshot, or journal on the normal successful execution path.
Load this file only after:
1. Execution failed after one or more successful moves and the user chose to try restoring.
2. Verification found serious mismatches and the user chose to try restoring.
3. The user explicitly asks to rollback / recover the previous organization run.
## Internal State Contracts
### RollbackSnapshot
`rollback_snapshot` records original locations before any write command.
Fields:
| Field | Meaning |
|-------|---------|
| `plan_id` | Matching `PlanItem.plan_id` |
| `source_kind` | `drive` / `wiki` |
| `title` | Resource title |
| `type` | Resource type used by move commands |
| `original_token` | Original Drive token when applicable |
| `original_node_token` | Original Wiki node token when applicable |
| `original_parent_kind` | `drive_folder` / `drive_root` / `wiki_node` / `wiki_space_root` / `unknown` |
| `original_parent_token` | Original parent token; empty for root markers |
| `original_space_id` | Original Wiki space ID when restoring to Wiki space root |
| `original_path` | Original path before organization |
| `planned_target_parent_token` | Planned target parent token |
| `planned_target_path` | Planned target path |
| `rollback_supported` | Whether this item can be restored automatically |
| `rollback_blocker` | Reason when `rollback_supported=false` |
Rules:
1. Store compact fields only. Do not store full API responses.
2. `drive_root` and `wiki_space_root` are valid origins; do not treat empty parent token as missing when the root marker is known.
3. Items without reliable origin can still execute, but MUST be marked `rollback_supported=false`.
### ExecutionJournal
`execution_journal` records every write attempt.
Fields:
| Field | Meaning |
|-------|---------|
| `journal_id` | Stable journal row ID |
| `plan_id` | Matching `PlanItem.plan_id` when applicable |
| `operation` | `create_folder` / `create_node` / `move_drive` / `move_wiki_node` / `delete_created_folder` / `delete_created_node` |
| `status` | `success` / `failed` / `pending` |
| `input_token` | Token supplied to the command |
| `input_node_token` | Wiki node token supplied to the command |
| `input_parent_token` | Source parent token when known |
| `target_parent_token` | Target parent token supplied to the command |
| `returned_token` | Token returned by the command |
| `returned_node_token` | Wiki node token returned by the command |
| `returned_parent_token` | Parent token returned by the command |
| `task_id` | Async task ID when returned |
| `next_command` | Async continuation command when returned |
| `error` | Error summary when failed |
| `created_by_workflow` | Whether the resource was created by this workflow run |
| `rollback_eligible` | Whether this successful operation can be included in `rollback_plan` |
Rules:
1. Append a journal entry immediately after each write attempt.
2. `create_folder` and `create_node` entries MUST set `created_by_workflow=true`.
3. Successful `move_drive` and `move_wiki_node` entries may set `rollback_eligible=true` only when matching snapshot origin is supported.
4. Failed or pending moves MUST NOT enter automatic recovery execution.
5. Async operations are `pending` until `drive +task_result` proves completion.
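The journal rules can be illustrated with a small append helper. It assumes the journal is a list of dicts using the field names above; the `J001`-style ID format, the `origin_supported` argument, and the function name are assumptions made for the example.

```python
MOVE_OPERATIONS = {"move_drive", "move_wiki_node"}
CREATE_OPERATIONS = {"create_folder", "create_node"}


def append_journal_entry(journal, operation, status, plan_id="",
                         origin_supported=False, **fields):
    """Record one write attempt immediately after it happens."""
    entry = {
        "journal_id": f"J{len(journal) + 1:03d}",  # hypothetical ID format
        "plan_id": plan_id,
        "operation": operation,
        "status": status,
        # Rule 2: create entries are always marked as workflow-created.
        "created_by_workflow": operation in CREATE_OPERATIONS,
        # Rules 3-4: only successful moves with a supported snapshot origin qualify.
        "rollback_eligible": (
            operation in MOVE_OPERATIONS
            and status == "success"
            and origin_supported
        ),
    }
    entry.update(fields)  # tokens, task_id, next_command, error, ...
    journal.append(entry)
    return entry
```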
## State: ROLLBACK_CONFIRM
Entry: user chose to try restoring after execution failure / verification mismatch, or explicitly asked to rollback.
MUST:
1. Generate `rollback_plan` from successful eligible move journal entries.
2. Use `execution_journal` current token / current node token as the recovery source.
3. Use `rollback_snapshot` original origin as the recovery target.
4. Generate recovery items in reverse successful move order.
5. Exclude failed, pending, and unsupported items from executable recovery.
6. Do not include delete actions in `rollback_plan`.
7. Ask for explicit restore execution confirmation.
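The steps above can be sketched as a single pass over the journal in reverse. The example assumes `execution_journal` and `rollback_snapshot` are lists of dicts with the field names from `Internal State Contracts`. Treating `returned_token` (falling back to `input_token`) as the current token is an assumption of this sketch, not something the contract states.

```python
MOVE_OPERATIONS = {"move_drive", "move_wiki_node"}


def build_rollback_plan(execution_journal, rollback_snapshot):
    """Build recovery items from eligible successful moves, newest first."""
    snapshot_by_plan = {row["plan_id"]: row for row in rollback_snapshot}
    plan = []
    for entry in reversed(execution_journal):  # undo the latest move first
        if entry["operation"] not in MOVE_OPERATIONS:
            continue  # creates and deletes never enter rollback_plan
        if entry["status"] != "success" or not entry.get("rollback_eligible"):
            continue  # failed, pending, or ineligible moves are excluded
        origin = snapshot_by_plan.get(entry["plan_id"])
        if origin is None or not origin.get("rollback_supported"):
            continue
        plan.append({
            "plan_id": entry["plan_id"],
            "operation": entry["operation"],
            # Assumption: prefer the token returned by the move, else the input token.
            "current_token": entry.get("returned_token") or entry.get("input_token", ""),
            "current_node_token": (
                entry.get("returned_node_token") or entry.get("input_node_token", "")
            ),
            "type": origin["type"],
            "original_parent_kind": origin["original_parent_kind"],
            "original_parent_token": origin.get("original_parent_token", ""),
            "original_space_id": origin.get("original_space_id", ""),
        })
    return plan
```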
Confirmation output:
```text
可恢复范围如下:
| 项目 | 数量 |
|------|------|
| 可尝试恢复到原位置 | <recoverable_move_count> |
| 无法安全自动恢复 | <unsupported_count> |
| 未完成 / 等待中的移动 | <pending_count> |
| 本次新建目录 / 节点 | <created_container_count> |
恢复操作只会尝试把已成功移动的资源移回原位置,不会删除、重命名或修改权限。是否执行恢复?
```
If no move can be restored automatically, report that no automatic restore is available and move to `DONE`.
## Recovery Command Rules
Use only these command forms:
```bash
# Drive resource back to original parent folder
lark-cli drive +move \
--file-token <current_token> \
--type <type> \
--folder-token <original_parent_token>
# Drive resource back to root
lark-cli drive +move \
--file-token <current_token> \
--type <type>
# Wiki node back to original parent node
lark-cli wiki +move \
--node-token <current_node_token> \
--target-parent-token <original_parent_token>
# Wiki node back to original space root
lark-cli wiki +move \
--node-token <current_node_token> \
--target-space-id <original_space_id>
```
MUST NOT:
- Use `wiki +move` docs-to-wiki mode.
- Move using path strings.
- Recover failed or pending moves as if they succeeded.
- Delete created folders / nodes in `ROLLBACK`.
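Selecting among the four command forms depends only on the original parent kind. The sketch below builds an argument list using exactly the flags shown in the command forms above. The input dict shape (`current_token`, `current_node_token`, `original_parent_kind`, and so on) and the function name `recovery_command` are assumptions for illustration.

```python
def recovery_command(item):
    """Return the argv for one recovery move, chosen by original parent kind."""
    kind = item["original_parent_kind"]
    if kind in ("drive_folder", "drive_root"):
        argv = [
            "lark-cli", "drive", "+move",
            "--file-token", item["current_token"],
            "--type", item["type"],
        ]
        if kind == "drive_folder":
            # Restoring to the Drive root omits --folder-token entirely.
            argv += ["--folder-token", item["original_parent_token"]]
        return argv
    if kind == "wiki_node":
        return [
            "lark-cli", "wiki", "+move",
            "--node-token", item["current_node_token"],
            "--target-parent-token", item["original_parent_token"],
        ]
    if kind == "wiki_space_root":
        return [
            "lark-cli", "wiki", "+move",
            "--node-token", item["current_node_token"],
            "--target-space-id", item["original_space_id"],
        ]
    # Unknown origins never get an automatic recovery command.
    raise ValueError(f"no automatic recovery for origin kind {kind!r}")
```

Building an argument list from tokens, rather than a shell string from paths, keeps the sketch aligned with the rule against moving by path string.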
## State: ROLLBACK
Entry: user explicitly confirmed restore execution.
MUST:
1. Execute only confirmed `rollback_plan` items.
2. Execute reverse moves in reverse successful move order.
3. Continue async move tasks with `drive +task_result` when needed.
4. Record recovery success / failure per rollback item.
5. Stop on blockers that make following recovery items unsafe.
Progress output should stay concise:
```text
恢复进度:已尝试 <done>/<total> 项,失败 <failed_count> 项。
```
## State: ROLLBACK_VERIFY
Entry: recovery execution finished.
MUST:
1. Rescan the relevant Drive folders / Wiki nodes.
2. Compare each rollback item with its original origin.
3. Mark status per item.
4. If cleanup candidates clearly remain from this workflow run, transition to `ROLLBACK_CLEANUP_CONFIRM`.
5. Do not ask for deletion confirmation in this state.
Verification table:
| plan_id | 标题 | 原位置 | 当前实际位置 | 状态 | 失败原因 |
|---------|------|--------|--------------|------|----------|
Status values:
| Status | Meaning |
|--------|---------|
| `rollback_success` | Resource is back under the original parent / root |
| `rollback_failed` | Resource is still outside the original origin |
| `missing` | Resource cannot be found |
| `needs_manual_review` | Actual state is ambiguous or affected by external changes |
Do not delete anything from this state.
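The status values can be derived from a simple comparison after the rescan. In this sketch an origin is modeled as a `(kind, token)` tuple, for example `("drive_root", "")`; that representation, the `found` / `ambiguous` flags, and the function name are assumptions made for the example.

```python
def rollback_status(expected_origin, actual_parent, found=True, ambiguous=False):
    """Classify one rollback item after the rescan (illustrative only)."""
    if not found:
        return "missing"
    if ambiguous:
        # For example, the resource changed outside this workflow run.
        return "needs_manual_review"
    if actual_parent == expected_origin:
        return "rollback_success"
    return "rollback_failed"
```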
## State: ROLLBACK_CLEANUP_CONFIRM
Entry: cleanup candidates exist after recovery, or user asks to view / perform cleanup after recovery.
Cleanup is optional and separate from recovery. It may delete resources, so it requires separate confirmation.
Candidate rules:
1. Candidate MUST have `created_by_workflow=true` in `execution_journal`.
2. Candidate MUST be a Drive folder or Wiki node created by this workflow run.
3. Candidate MUST currently be empty, or contain only workflow-created cleanup candidates that are themselves safe to delete.
4. Candidate MUST NOT contain original resources, unknown resources, rollback-failed resources, or user-created resources.
5. If child origin is uncertain, mark the candidate `cleanup_blocked`.
Generate `rollback_cleanup_plan` with:
| Field | Meaning |
|-------|---------|
| `cleanup_id` | Stable cleanup row ID |
| `type` | `drive_folder` / `wiki_node` |
| `path` | Current path |
| `token` | Folder token or node token |
| `depth` | Current path depth |
| `safe_to_delete` | Whether deletion is allowed after confirmation |
| `blocker` | Reason when deletion is blocked |
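The candidate rules can be read as a conservative check that yields `safe_to_delete` and `blocker`. The sketch assumes each child is a dict with `token` and `created_by_workflow` (where `None` means the origin is uncertain), and that `safe_child_tokens` holds candidates already judged safe, which implies evaluating the deepest candidates first. All names here are hypothetical.

```python
def evaluate_cleanup_candidate(candidate, children, safe_child_tokens):
    """Return (safe_to_delete, blocker) for one folder / node candidate."""
    if not candidate.get("created_by_workflow"):
        return False, "not created by this workflow run"
    for child in children:
        origin = child.get("created_by_workflow")
        if origin is None:
            # Rule 5: uncertain child origin blocks cleanup.
            return False, "child origin uncertain"
        if not origin or child["token"] not in safe_child_tokens:
            # Rules 3-4: only safe workflow-created candidates may remain inside.
            return False, "contains resources outside cleanup candidates"
    return True, ""
```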
Confirmation output:
```text
恢复已完成。本次整理新建的部分空目录 / 节点如下,是否需要删除?
| 项目 | 数量 |
|------|------|
| 可删除的新建空目录 / 节点 | <safe_count> |
| 不可删除,需人工确认 | <blocked_count> |
注:删除只会作用于本次 workflow 新建且当前可安全清理的空目录 / 节点。
```
If the user wants details, paginate cleanup items at 20 rows per page.
## State: ROLLBACK_CLEANUP
Entry: user explicitly confirmed cleanup deletion.
MUST:
1. Delete only `safe_to_delete=true` cleanup items.
2. Delete deepest paths first.
3. Record delete results in `rollback_cleanup_results`.
4. Continue async delete tasks with `drive +task_result` when needed.
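Steps 1 and 2 reduce to a filter plus a depth-descending sort. A minimal sketch, assuming `rollback_cleanup_plan` is a list of dicts with the `depth` and `safe_to_delete` fields defined earlier; `deletion_order` is an illustrative name.

```python
def deletion_order(rollback_cleanup_plan):
    """Return only safe cleanup items, deepest paths first."""
    safe_items = [item for item in rollback_cleanup_plan if item["safe_to_delete"]]
    # Children are deleted before their parents, so a parent is empty when reached.
    return sorted(safe_items, key=lambda item: item["depth"], reverse=True)
```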
Command forms:
```bash
# Delete workflow-created Drive folder
lark-cli drive +delete \
--file-token <folder_token> \
--type folder \
--yes
# Delete workflow-created Wiki node
lark-cli wiki +node-delete \
--node-token <node_token> \
--obj-type wiki \
--include-children=true \
--yes
```
`--yes` is allowed only after the user explicitly confirmed cleanup deletion.
MUST NOT:
- Delete original resources.
- Delete unknown resources.
- Delete rollback-failed resources.
- Delete non-empty folders / nodes that contain anything outside cleanup candidates.
- Delete a knowledge space.
## State: ROLLBACK_CLEANUP_VERIFY
Entry: cleanup deletion finished.
MUST:
1. Verify each confirmed cleanup target is gone.
2. Report failed or pending deletes.
3. Stop after reporting cleanup results.
Verification table:
| 类型 | 路径 | token | 状态 | 失败原因 |
|------|------|-------|------|----------|
Status values:
| Status | Meaning |
|--------|---------|
| `deleted` | Cleanup target was deleted |
| `delete_pending` | Async deletion is still pending |
| `delete_failed` | Delete command failed |
| `still_exists` | Target still exists after deletion attempt |
| `skipped` | Target was not safe to delete or user did not confirm it |


@@ -0,0 +1,224 @@
# 知识整理工作流
This file is the single entry point for the knowledge organization workflow. It defines the global contract, state machine, and progressive loading map. Stage-specific rules live in phase files and MUST be loaded only when the workflow reaches the corresponding state.
Phase files are references for this workflow, not independent skills. Do not route user requests directly to a phase file.
## Required Context
Before running this workflow, MUST read [`../../lark-shared/SKILL.md`](../../lark-shared/SKILL.md) for identity, authentication, permission handling, and write-operation confirmation rules.
Load other skills / references progressively:
- Wiki / personal library target: [`../../lark-wiki/SKILL.md`](../../lark-wiki/SKILL.md)
- Content read required: [`../../lark-doc/SKILL.md`](../../lark-doc/SKILL.md) and [`../../lark-doc/references/lark-doc-fetch.md`](../../lark-doc/references/lark-doc-fetch.md)
- Sheet down-drill required: [`../../lark-sheets/SKILL.md`](../../lark-sheets/SKILL.md)
- Base down-drill required: [`../../lark-base/SKILL.md`](../../lark-base/SKILL.md)
## Agent Contract
When this workflow is triggered, the agent MUST:
1. Follow the `Execution State Machine` in order.
2. Maintain the fields in `Runtime State`.
3. Before executing a state, read the phase file listed in `Progressive Load Map`.
4. Do not pre-load all phase files. Load only the current state's required phase file unless a transition requires the next state.
5. Stop and wait whenever a state has `wait_for_user=true`.
6. Keep complete internal state even when user-facing output is paginated.
7. Never perform organization write operations before `EXEC_CONFIRM`; never perform recovery or cleanup writes before the corresponding explicit confirmation state.
8. Execute only commands allowed by `Command Map`.
9. Use command syntax, scope requirements, and API parameter rules from referenced skills / shortcut docs.
10. Convert internal enum values to natural-language Chinese labels in user-facing tables.
11. Do not invent recovery behavior; follow the active phase file's failure handling.
12. Maintain internal recovery state during execution, but do not mention recovery on the normal successful path.
## Scope
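This workflow organizes knowledge within a specified Drive folder, Wiki knowledge base, personal document library, or search scope. By default it only produces a reviewable plan; it creates folders / nodes or moves resources only after the user explicitly confirms the execution scope.
Typical trigger phrases include: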
- "帮我整理我的云盘 / 文档库 / 知识库"
- "帮我盘点这个知识库,给出整理后的目录结构"
- "这个文件夹太乱了,先给我一个整理方案"
- "把知识库里的文档按项目 / 客户 / 时间 / 类型归类"
- "帮我找出未归档、临时、重复、空目录和命名混乱的内容"
## Non-goals
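Not generated by default:
- Research reports
- Comparative analyses
- Risks / conclusions / action items
- Source citation lists
- Permission governance reports
Prohibited by default:
- Deleting existing files, folders, Wiki nodes, or knowledge spaces
- Owner transfer
- Batch permission requests
- Batch changes to public-sharing permissions
- Batch changes to collaborator permissions
- Any resource rename or title change; this workflow does not perform these even if the user asks
Only in the rollback cleanup phase may the workflow delete empty Drive folders or empty Wiki nodes that were created by this workflow run and are currently safe to clean up, and only after separate user confirmation. Knowledge spaces MUST NOT be deleted.
If the user explicitly asks for another non-goal capability, hand off to the corresponding dedicated flow and confirm the risks separately. Resource renaming / title changes are not executable capabilities of this workflow.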
## Responsibility Boundary
| File | Owns | Must Not Own |
|------|------|--------------|
| `lark-drive-workflow-knowledge-organize.md` | Workflow trigger, global contract, state machine, progressive load map, command family allowlist | Stage-specific rules, output templates, execution details |
| `lark-drive-workflow-knowledge-organize-discovery.md` | `PARSE_SCOPE`, `INVENTORY`, target parsing, stop conditions, inventory limits, `ResourceItem` | Classification, plan generation, write execution |
| `lark-drive-workflow-knowledge-organize-analysis.md` | `CONTENT_READ`, `ISSUE_ANALYSIS`, `RULE_GENERATION`, low-confidence reads, issue rules, problem pagination, classification, target tree | Plan execution, write confirmation, verification |
| `lark-drive-workflow-knowledge-organize-planning.md` | `PLAN_GENERATION`, `EXEC_CONFIRM`, `PlanItem`, `DisplayItem`, plan pagination, plan revision, execution scope confirmation | Scope parsing, resource inventory, write execution, verification |
| `lark-drive-workflow-knowledge-organize-execution.md` | `EXECUTE`, `VERIFY`, `PathTokenMap`, write execution, progress reporting, verification, next suggestions, internal recovery hooks | Scope parsing, resource inventory, classification, plan generation, plan revision, rollback execution details |
| `lark-drive-workflow-knowledge-organize-rollback.md` | `ROLLBACK_CONFIRM`, `ROLLBACK`, `ROLLBACK_VERIFY`, `ROLLBACK_CLEANUP_CONFIRM`, `ROLLBACK_CLEANUP`, `ROLLBACK_CLEANUP_VERIFY`, recovery plan generation, recovery execution, cleanup verification | Scope parsing, resource inventory, classification, organization plan generation, normal write execution |
## Runtime State
Agent MUST maintain these internal fields during one workflow run:
| Field | Meaning |
|-------|---------|
| `current_state` | Current state in `Execution State Machine` |
| `target_scope` | Parsed target: Drive folder, Wiki node, Wiki space, personal doc library, single resource, or search scope |
| `environment_profile` | Current environment and CLI profile, such as prod / BOE / PRE and config profile |
| `identity` | `user` by default unless user explicitly asks for app / bot perspective |
| `resource_items` | Complete normalized resource list from discovery |
| `partial` | Whether inventory or content-read limits were hit |
| `low_confidence_items` | Items requiring mandatory partial content read |
| `issue_summary` | Problem types, counts, evidence paths, and suggested handling |
| `classification_rules` | Rules used to map resources to target paths |
| `target_tree` | Proposed target folder / Wiki node tree |
| `source_container_disposition` | Reused / retired source folders or nodes and their intended handling |
| `plan_items` | Complete internal execution plan |
| `plan_version` | Internal version of the current complete plan, such as `v1` / `v2` |
| `active_plan_items` | Latest complete valid plan used for execution confirmation |
| `plan_revision_history` | Internal summaries of user-requested plan revisions |
| `last_user_correction` | Most recent user correction that changed classification, target tree, or plan scope |
| `display_page_state` | Current page, page size, filters, and total count for user-facing pagination |
| `path_token_map` | Mapping from target path to real `folder_token` / `node_token` |
| `execution_scope` | Full plan, current page, filtered subset, or no execution |
| `verification_results` | Per-plan-item verification result after execution |
| `rollback_snapshot` | Internal pre-write snapshot used only for recovery after failure or user-requested restore |
| `execution_journal` | Internal write-operation journal used only for recovery after failure or user-requested restore |
| `rollback_plan` | Internal recovery plan generated only after user asks to restore |
| `rollback_verification_results` | Per-item recovery verification result |
| `rollback_cleanup_plan` | Optional cleanup plan for workflow-created empty folders / nodes after recovery |
| `rollback_cleanup_results` | Cleanup verification result |
## Execution State Machine
| State | Entry Condition | Agent MUST Do | User-Facing Output | wait_for_user | Next State |
|-------|-----------------|---------------|--------------------|---------------|------------|
| `PARSE_SCOPE` | Workflow triggered | Load discovery phase; parse target, environment, identity, and target type | Scope confirmation or clarification question | `true` | `INVENTORY` |
| `INVENTORY` | Scope confirmed | Load discovery phase; recursively list resources and build `resource_items` | Inventory progress / summary; continue automatically unless blocked | `false` unless blocked | `CONTENT_READ` |
| `CONTENT_READ` | Inventory complete | Load analysis phase; identify low-confidence items and perform mandatory partial read when needed | Low-confidence read summary | `false` unless auth / permission blocks | `ISSUE_ANALYSIS` |
| `ISSUE_ANALYSIS` | Resource list and partial reads ready | Load analysis phase; detect structure problems, evidence, and organization approach | Inventory result, problems, organization approach, and decision options | `true` | `RULE_GENERATION` |
| `RULE_GENERATION` | User confirms organization approach | Load analysis phase; generate classification rules and `target_tree` | No separate stop; target tree is shown with plan generation | `false` | `PLAN_GENERATION` |
| `PLAN_GENERATION` | Target tree ready | Load planning phase; generate complete internal `plan_items`; show target tree plus plan overview or page | Target tree and plan overview / paginated plan page | `true` | `EXEC_CONFIRM` |
| `EXEC_CONFIRM` | User wants execution | Load planning phase; ask user to choose execution scope | Execution options and write-operation summary | `true` | `EXECUTE` or `DONE` |
| `EXECUTE` | User explicitly confirmed execution scope | Load execution phase; execute only whitelisted write operations for confirmed scope while maintaining internal recovery state | Progress reports for large or long-running execution; if blocked after successful moves, ask whether to try restoring to `整理前的位置` | `false` unless blocked / recovery offered | `VERIFY`, `ROLLBACK_CONFIRM`, or `DONE` |
| `VERIFY` | Execution finished | Load execution phase; rescan target scope and compare actual path/token against plan | Verification table and final summary; if serious mismatches exist, ask whether to try restoring to `整理前的位置` | `false` unless recovery offered | `DONE` or `ROLLBACK_CONFIRM` |
| `ROLLBACK_CONFIRM` | User asks to restore after execution failure / verification mismatch / explicit rollback request | Load rollback phase; generate internal `rollback_plan`; ask whether to execute recovery | Recoverable scope and restore confirmation | `true` | `ROLLBACK` or `DONE` |
| `ROLLBACK` | User explicitly confirms restore execution | Load rollback phase; execute confirmed reverse moves only | Recovery progress / result | `false` | `ROLLBACK_VERIFY` |
| `ROLLBACK_VERIFY` | Recovery execution finished | Load rollback phase; verify restored locations and decide whether cleanup candidates exist | Recovery verification result | `false` | `ROLLBACK_CLEANUP_CONFIRM` or `DONE` |
| `ROLLBACK_CLEANUP_CONFIRM` | Cleanup candidates exist after recovery, or user asks to clean workflow-created empty folders / nodes | Load rollback phase; generate cleanup plan and ask for delete confirmation | Cleanup candidates and delete confirmation | `true` | `ROLLBACK_CLEANUP` or `DONE` |
| `ROLLBACK_CLEANUP` | User explicitly confirms cleanup deletion | Load rollback phase; delete only confirmed workflow-created safe-empty folders / nodes | Cleanup progress / result | `false` | `ROLLBACK_CLEANUP_VERIFY` |
| `ROLLBACK_CLEANUP_VERIFY` | Cleanup deletion finished | Load rollback phase; verify deleted cleanup targets | Cleanup verification result | `false` | `DONE` |
| `DONE` | No more action | Stop | Final answer | `false` | End |
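
The rows above can be encoded as a small transition table. The following is a minimal Python sketch, not part of the workflow itself. It covers only the states from `EXEC_CONFIRM` onward, and the names `NEXT_STATES`, `ALWAYS_WAITS_FOR_USER`, and `can_transition` are hypothetical. Reading the `true` / `false` column as "stop and wait for the user" is an assumption.

```python
# Transitions copied from the last column of the rows listed above.
# Earlier states (PARSE_SCOPE through PLAN_GENERATION) are intentionally omitted.
NEXT_STATES = {
    "EXEC_CONFIRM": {"EXECUTE", "DONE"},
    "EXECUTE": {"VERIFY", "ROLLBACK_CONFIRM", "DONE"},
    "VERIFY": {"DONE", "ROLLBACK_CONFIRM"},
    "ROLLBACK_CONFIRM": {"ROLLBACK", "DONE"},
    "ROLLBACK": {"ROLLBACK_VERIFY"},
    "ROLLBACK_VERIFY": {"ROLLBACK_CLEANUP_CONFIRM", "DONE"},
    "ROLLBACK_CLEANUP_CONFIRM": {"ROLLBACK_CLEANUP", "DONE"},
    "ROLLBACK_CLEANUP": {"ROLLBACK_CLEANUP_VERIFY"},
    "ROLLBACK_CLEANUP_VERIFY": {"DONE"},
    "DONE": set(),
}

# Assumption: rows marked plain `true` always pause for the user.
# Rows marked "`false` unless ..." are conditional and are not modeled here.
ALWAYS_WAITS_FOR_USER = {
    "EXEC_CONFIRM",
    "ROLLBACK_CONFIRM",
    "ROLLBACK_CLEANUP_CONFIRM",
}


def can_transition(current: str, target: str) -> bool:
    """Return True when `target` is a listed next state of `current`."""
    return target in NEXT_STATES.get(current, set())
```

A guard like this makes it harder to jump from `ROLLBACK` straight to `DONE` without passing through `ROLLBACK_VERIFY`.
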

## Progressive Load Map

Agent MUST read the phase file for the active state before executing that state.

| State | Required Phase File |
|-------|---------------------|
| `PARSE_SCOPE` | [`lark-drive-workflow-knowledge-organize-discovery.md`](lark-drive-workflow-knowledge-organize-discovery.md) |
| `INVENTORY` | [`lark-drive-workflow-knowledge-organize-discovery.md`](lark-drive-workflow-knowledge-organize-discovery.md) |
| `CONTENT_READ` | [`lark-drive-workflow-knowledge-organize-analysis.md`](lark-drive-workflow-knowledge-organize-analysis.md) |
| `ISSUE_ANALYSIS` | [`lark-drive-workflow-knowledge-organize-analysis.md`](lark-drive-workflow-knowledge-organize-analysis.md) |
| `RULE_GENERATION` | [`lark-drive-workflow-knowledge-organize-analysis.md`](lark-drive-workflow-knowledge-organize-analysis.md) |
| `PLAN_GENERATION` | [`lark-drive-workflow-knowledge-organize-planning.md`](lark-drive-workflow-knowledge-organize-planning.md) |
| `EXEC_CONFIRM` | [`lark-drive-workflow-knowledge-organize-planning.md`](lark-drive-workflow-knowledge-organize-planning.md) |
| `EXECUTE` | [`lark-drive-workflow-knowledge-organize-execution.md`](lark-drive-workflow-knowledge-organize-execution.md) |
| `VERIFY` | [`lark-drive-workflow-knowledge-organize-execution.md`](lark-drive-workflow-knowledge-organize-execution.md) |
| `ROLLBACK_CONFIRM` | [`lark-drive-workflow-knowledge-organize-rollback.md`](lark-drive-workflow-knowledge-organize-rollback.md) |
| `ROLLBACK` | [`lark-drive-workflow-knowledge-organize-rollback.md`](lark-drive-workflow-knowledge-organize-rollback.md) |
| `ROLLBACK_VERIFY` | [`lark-drive-workflow-knowledge-organize-rollback.md`](lark-drive-workflow-knowledge-organize-rollback.md) |
| `ROLLBACK_CLEANUP_CONFIRM` | [`lark-drive-workflow-knowledge-organize-rollback.md`](lark-drive-workflow-knowledge-organize-rollback.md) |
| `ROLLBACK_CLEANUP` | [`lark-drive-workflow-knowledge-organize-rollback.md`](lark-drive-workflow-knowledge-organize-rollback.md) |
| `ROLLBACK_CLEANUP_VERIFY` | [`lark-drive-workflow-knowledge-organize-rollback.md`](lark-drive-workflow-knowledge-organize-rollback.md) |
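
As an illustration, the load map can be expressed as a lookup from state to phase file. This is a sketch only. The file names come from the table above, while `PHASE_FILE_BY_STATE` and `required_phase_file` are hypothetical names. It assumes `DONE` needs no phase file because it has no row in the map.

```python
_PREFIX = "lark-drive-workflow-knowledge-organize-"

# Phase name -> states that must load that phase file (from the table above).
_STATES_BY_PHASE = {
    "discovery": ["PARSE_SCOPE", "INVENTORY"],
    "analysis": ["CONTENT_READ", "ISSUE_ANALYSIS", "RULE_GENERATION"],
    "planning": ["PLAN_GENERATION", "EXEC_CONFIRM"],
    "execution": ["EXECUTE", "VERIFY"],
    "rollback": [
        "ROLLBACK_CONFIRM",
        "ROLLBACK",
        "ROLLBACK_VERIFY",
        "ROLLBACK_CLEANUP_CONFIRM",
        "ROLLBACK_CLEANUP",
        "ROLLBACK_CLEANUP_VERIFY",
    ],
}

PHASE_FILE_BY_STATE = {
    state: f"{_PREFIX}{phase}.md"
    for phase, states in _STATES_BY_PHASE.items()
    for state in states
}


def required_phase_file(state: str):
    """Return the phase file for `state`, or None for DONE (assumed to need none)."""
    if state == "DONE":
        return None
    # An unknown state raises KeyError rather than silently loading nothing.
    return PHASE_FILE_BY_STATE[state]
```
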

## Command Map

Use only command families allowed for the current state. Detailed syntax belongs to referenced skills / shortcut docs.

| State | Allowed Command Families | Purpose |
|-------|--------------------------|---------|
| `PARSE_SCOPE` | `drive +inspect`, `wiki +node-get`, `wiki +space-list`, `wiki spaces get`, `drive +search` | Resolve target scope |
| `INVENTORY` | `wiki +node-list`, `drive files list` (schema path: `drive.files.list`), `drive metas batch_query` | Recursively list and enrich resources |
| `CONTENT_READ` | `docs +fetch`, plus `lark-sheets` / `lark-base` when conditionally required | Partial content read for low-confidence items |
| `ISSUE_ANALYSIS` | No write commands | Analyze `resource_items` only |
| `RULE_GENERATION` | No write commands | Generate classification rules and target tree |
| `PLAN_GENERATION` | No write commands | Generate internal plan and user-facing pages |
| `EXEC_CONFIRM` | No write commands | Ask user to confirm execution scope |
| `EXECUTE` | `drive +create-folder`, `drive +move`, `wiki +node-create`, `wiki +move` existing-node mode only, `drive +task_result`, `drive +apply-permission` only when explicitly confirmed | Execute whitelisted writes |
| `VERIFY` | `wiki +node-list`, `drive files list` (schema path: `drive.files.list`), `drive +task_result` if async result remains pending | Verify actual result |
| `ROLLBACK_CONFIRM` | No write commands | Generate internal recovery plan and ask for restore confirmation |
| `ROLLBACK` | `drive +move`, `wiki +move` existing-node mode only, `drive +task_result` | Execute confirmed reverse moves |
| `ROLLBACK_VERIFY` | `wiki +node-list`, `drive files list` (schema path: `drive.files.list`), `drive +task_result` if async result remains pending | Verify recovery result |
| `ROLLBACK_CLEANUP_CONFIRM` | No write commands | Generate cleanup plan and ask for delete confirmation |
| `ROLLBACK_CLEANUP` | `drive +delete`, `wiki +node-delete`, `drive +task_result` | Delete only confirmed workflow-created safe-empty folders / nodes |
| `ROLLBACK_CLEANUP_VERIFY` | `wiki +node-list`, `drive files list` (schema path: `drive.files.list`), `drive +task_result` if async result remains pending | Verify cleanup deletion result |
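
One way to enforce the table above is an allow-list check before each command is issued. The sketch below is illustrative and makes several assumptions: the names `ALLOWED_FAMILIES`, `WRITE_FAMILIES`, `NO_WRITE_STATES`, and `is_allowed` are hypothetical; "No write commands" is interpreted as "any family except the write families listed in this table"; and the conditional `lark-sheets` / `lark-base` commands for `CONTENT_READ` are not modeled.

```python
# Families that modify resources, collected from the EXECUTE, ROLLBACK,
# and ROLLBACK_CLEANUP rows above.
WRITE_FAMILIES = {
    "drive +create-folder", "drive +move", "wiki +node-create", "wiki +move",
    "drive +apply-permission", "drive +delete", "wiki +node-delete",
}

# States whose row says "No write commands".
NO_WRITE_STATES = {
    "ISSUE_ANALYSIS", "RULE_GENERATION", "PLAN_GENERATION", "EXEC_CONFIRM",
    "ROLLBACK_CONFIRM", "ROLLBACK_CLEANUP_CONFIRM",
}

_READ_BACK = {"wiki +node-list", "drive files list", "drive +task_result"}

ALLOWED_FAMILIES = {
    "PARSE_SCOPE": {"drive +inspect", "wiki +node-get", "wiki +space-list",
                    "wiki spaces get", "drive +search"},
    "INVENTORY": {"wiki +node-list", "drive files list", "drive metas batch_query"},
    "CONTENT_READ": {"docs +fetch"},  # conditional sheet / base reads not modeled
    # `drive +apply-permission` additionally needs explicit user confirmation.
    "EXECUTE": {"drive +create-folder", "drive +move", "wiki +node-create",
                "wiki +move", "drive +task_result", "drive +apply-permission"},
    "VERIFY": _READ_BACK,
    "ROLLBACK": {"drive +move", "wiki +move", "drive +task_result"},
    "ROLLBACK_VERIFY": _READ_BACK,
    "ROLLBACK_CLEANUP": {"drive +delete", "wiki +node-delete", "drive +task_result"},
    "ROLLBACK_CLEANUP_VERIFY": _READ_BACK,
}


def is_allowed(state: str, family: str) -> bool:
    """Check a command family against the Command Map for `state`."""
    if state in NO_WRITE_STATES:
        return family not in WRITE_FAMILIES
    return family in ALLOWED_FAMILIES.get(state, set())
```

Unknown states fall through to an empty allow-list, so the check fails closed.
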

## Wiki Move Mode Constraint

This workflow MUST NOT use `wiki +move` docs-to-wiki mode. Wiki moves MUST use existing Wiki node mode with `--node-token` only.
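
A pre-flight check for this constraint could look like the sketch below. It only verifies that a planned `wiki +move` argument list carries `--node-token`; it does not know the other flags of the command, and the helper name `is_existing_node_move` and the sample token are hypothetical. Accepting the `--node-token=value` spelling is an assumption about CLI conventions, not something this document states.

```python
def is_existing_node_move(argv: list[str]) -> bool:
    """Return True when the planned `wiki +move` arguments include `--node-token`."""
    return any(
        arg == "--node-token" or arg.startswith("--node-token=")
        for arg in argv
    )
```

For example, `is_existing_node_move(["wiki", "+move", "--node-token", "<node_token>"])` passes, while an argument list without the flag should be rejected before execution.
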

## Permission Request Gate

`drive +apply-permission` is a write operation and may notify the resource owner. If any state hits resource access denial:
1. Stop the current state.
2. Show the single target resource, requested permission, reason / remark, and owner-notification implication when known.
3. Ask the user to confirm this single permission request.
4. Only after explicit confirmation, treat the permission request as a confirmed `EXECUTE` operation.
5. After the request is submitted or skipped, return to the blocked state only when the user asks to continue.

Never request permission automatically, never batch permission requests, and never hide the owner-notification implication.
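
The gate above can be sketched as a function that handles exactly one request and submits it only after confirmation. This is a hedged illustration, not the workflow's implementation: `PermissionRequest`, `describe`, `permission_gate`, and the `confirm` / `submit` callbacks are all hypothetical, and the actual submission would go through `drive +apply-permission` as documented in its own reference.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class PermissionRequest:
    resource: str                   # the single target resource
    permission: str                 # requested permission
    reason: str                     # reason / remark shown to the user
    notifies_owner: Optional[bool]  # None when the implication is unknown


def describe(req: PermissionRequest) -> str:
    """Build the confirmation text; never hides the owner-notification line."""
    lines = [
        f"Resource: {req.resource}",
        f"Permission: {req.permission}",
        f"Reason: {req.reason}",
    ]
    if req.notifies_owner is not None:
        lines.append(
            "The owner will be notified." if req.notifies_owner
            else "The owner will not be notified."
        )
    return "\n".join(lines)


def permission_gate(
    req: PermissionRequest,
    confirm: Callable[[str], bool],
    submit: Callable[[PermissionRequest], None],
) -> bool:
    """Submit one request only after explicit confirmation; return True if sent."""
    if not confirm(describe(req)):
        return False  # skipped: nothing is submitted
    submit(req)
    return True
```

Taking a single `PermissionRequest` rather than a list keeps batching out of the interface by construction.
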

## Transition Rules

1. If `PARSE_SCOPE` cannot determine the target range, ask only for target range clarification and stop.
2. If auth or API scope is missing, follow `lark-shared` permission handling and stop.
3. If resource access permission is missing, follow `Permission Request Gate`.
4. If the user asks to inspect more pages, stay in `PLAN_GENERATION` and update `display_page_state`.
5. If the user declines execution in `EXEC_CONFIRM`, output the saved plan summary and move to `DONE`.
6. If execution fails for an item, record the failure and continue only when the failed item is independent; otherwise stop, report the blocker, and ask whether the user wants to try restoring to `整理前的位置` when any move already succeeded.
7. Do not load the rollback phase merely because a snapshot or journal exists. Load it only after execution failure, serious verification mismatch, or explicit user rollback request, and only after the user chooses to try restore.
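
Rules 6 and 7 can be read as two small decision functions. The sketch below is one interpretation under stated assumptions: the function names, the trigger labels in `ROLLBACK_TRIGGERS`, and the returned dictionary shape are hypothetical and chosen only for illustration.

```python
from typing import Optional

# Trigger labels are illustrative names for the three conditions in rule 7.
ROLLBACK_TRIGGERS = {
    "execution_failure",
    "serious_verification_mismatch",
    "explicit_user_request",
}


def after_item_failure(failed_item_is_independent: bool,
                       any_move_succeeded: bool) -> dict:
    """Rule 6: record the failure, then either continue or stop and report."""
    if failed_item_is_independent:
        return {"action": "continue", "offer_restore": False}
    # Dependent failure: stop, report the blocker, and offer a restore
    # only when at least one move already succeeded.
    return {"action": "stop_and_report", "offer_restore": any_move_succeeded}


def may_load_rollback_phase(trigger: Optional[str],
                            user_chose_restore: bool) -> bool:
    """Rule 7: an existing snapshot or journal alone is never a trigger."""
    return trigger in ROLLBACK_TRIGGERS and user_chose_restore
```

Both conditions in rule 7 must hold, so a valid trigger without the user's choice to restore still keeps the rollback phase unloaded.
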

## References

- [Discovery phase](lark-drive-workflow-knowledge-organize-discovery.md)
- [Analysis phase](lark-drive-workflow-knowledge-organize-analysis.md)
- [Planning phase](lark-drive-workflow-knowledge-organize-planning.md)
- [Execution phase](lark-drive-workflow-knowledge-organize-execution.md)
- [Rollback phase](lark-drive-workflow-knowledge-organize-rollback.md)
- [lark-shared](../../lark-shared/SKILL.md)
- [lark-drive](../SKILL.md)
- [lark-drive-search](lark-drive-search.md)
- [lark-drive-inspect](lark-drive-inspect.md)
- [lark-drive-apply-permission](lark-drive-apply-permission.md)
- [lark-drive-task-result](lark-drive-task-result.md)
- [lark-drive-delete](lark-drive-delete.md)
- [lark-wiki](../../lark-wiki/SKILL.md)
- [lark-wiki-node-delete](../../lark-wiki/references/lark-wiki-node-delete.md)
- [lark-doc](../../lark-doc/SKILL.md)
- [lark-doc-fetch](../../lark-doc/references/lark-doc-fetch.md)
- [lark-sheets](../../lark-sheets/SKILL.md)
- [lark-base](../../lark-base/SKILL.md)

@@ -24,6 +24,7 @@ metadata:
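## Quick Decisions

- When the user wants to **organize / inventory / categorize / restructure a knowledge base, personal document library, document library directory, or Wiki node structure**, or wants to generate an organization plan, target directory tree, or move plan, do not rely on the Wiki node API alone. You must first read [`../lark-drive/references/lark-drive-workflow-knowledge-organize.md`](../lark-drive/references/lark-drive-workflow-knowledge-organize.md). That workflow handles unified entry parsing, resource inventory, classification planning, pre-write confirmation, and result verification for Drive / Wiki / personal document libraries.
- When the user provides a knowledge base URL (`.../wiki/<token>`) and then wants to list, add, or remove members: first call `lark-cli wiki spaces get_node --params '{"token":"<wiki_token>"}'` to obtain the `space_id`, then use `space_id` for all subsequent member APIs.
- When the user wants to **delete** a knowledge space (`wiki +delete-space`) but provides only a name or URL, you **must not** pass the name / URL as-is to `--space-id`; resolve the real `space_id` first. Resolution methods:
  - URL (`.../wiki/<token>`): run `lark-cli wiki spaces get_node --params '{"token":"<wiki_token>"}' --format json` and read `data.node.space_id`.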