sync(sheets): pick up sheets_df.py + doc DRY cleanup from spec

Mirror of the sheet-skill-spec change that ships a 32-line helper-only
sheets_df.py (df_to_sheet + sheet_to_df) and removes the corresponding
inline `def` blocks from three reference docs.

- skills/lark-sheets/scripts/sheets_df.py (new): pandas DataFrame ↔
  one +table-put / +table-get sheet, importable as a library. Same
  helper pair the docs already taught, lifted out of the prose so
  callers can `from sheets_df import df_to_sheet, sheet_to_df`.
- lark-sheets-write-cells.md / lark-sheets-read-data.md /
  lark-sheets-workbook.md: drop the inline helper definitions; keep
  the usage examples (single/multi-sheet, round-trip) and switch them
  to import-from-script. workbook reference's +workbook-create
  --sheets section now points pandas users at the helper directly
  (was previously a textual reference back to write-cells).

End-to-end verified against PPE (--as user):
- +workbook-create with df_to_sheet for three sheets (income / balance
  / cashflow): create ok, dtypes (datetime64[ns] / float64) + formats
  (#,##0 / 0.0% / yyyy-mm-dd) survive on read-back through sheet_to_df.
- read → pandas mutate → write-back round-trip preserves both data
  and formats.
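
A rough local approximation of the read-back check described above, assuming only pandas is installed. It skips lark-cli and the PPE environment entirely: a JSON dump/load stands in for the trip through the sheet, so it exercises dtype restoration only and says nothing about whether the service preserves formats. The two helpers are copied from `sheets_df.py` in this commit; `sample_frame` and its column names are made up for illustration.

```python
import json

import pandas as pd


def df_to_sheet(df, name, formats=None):
    """Copied from scripts/sheets_df.py as added in this commit."""
    return {
        "name": name,
        **json.loads(df.to_json(orient="split", date_format="iso")),
        "dtypes": df.dtypes.astype(str).to_dict(),
        **({"formats": formats} if formats else {}),
    }


def sheet_to_df(sheet):
    """Copied from scripts/sheets_df.py as added in this commit."""
    return pd.DataFrame(sheet["data"], columns=sheet["columns"]).astype(sheet["dtypes"])


def sample_frame(scale):
    """Hypothetical statement data: one date column and two float columns."""
    return pd.DataFrame({
        "period": pd.to_datetime(["2025-01-31", "2025-02-28"]),
        "amount": [1200.5 * scale, float("nan")],  # NaN becomes null in the JSON
        "ratio": [0.31, 0.28],
    })


frames = {"income": sample_frame(1), "balance": sample_frame(2), "cashflow": sample_frame(3)}
formats = {"period": "yyyy-mm-dd", "amount": "#,##0", "ratio": "0.0%"}

# Stand-in for the network hop: serialise the payload to JSON text and back.
payload = {"sheets": [df_to_sheet(df, name, formats) for name, df in frames.items()]}
wire = json.loads(json.dumps(payload))

# Rebuild one typed DataFrame per sheet and compare dtypes with the originals.
restored = {sheet["name"]: sheet_to_df(sheet) for sheet in wire["sheets"]}
dtypes_match = {
    name: restored[name].dtypes.astype(str).to_dict() == df.dtypes.astype(str).to_dict()
    for name, df in frames.items()
}
print(dtypes_match)
```

The comparison is on dtype strings rather than on rendered cell formats, because the formats only travel as metadata here and are never interpreted locally.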

Author: xiongyuanwen-byted
Date: 2026-06-24 16:55:26 +08:00
Parent: 14cb134cac
Commit: 5fac9c39a5
5 changed files with 48 additions and 14 deletions

@@ -187,14 +187,12 @@ lark-cli sheets +table-get --url "<sheet URL>"
 lark-cli sheets +table-get --url "<sheet URL>" --sheet-name "Sales"
 ```

-#### Output → DataFrame (2-line helper)
+#### Output → DataFrame (via the `sheet_to_df` helper)

-The output shape matches pandas `split`: `columns` is the array of column names, `data` is the 2-D data, and `dtypes` is a `{column name: pandas_dtype_str}` mapping. Feed it straight into `pd.DataFrame(...).astype(...)` to restore every column's type in one pass (no per-column `to_datetime` / `to_numeric`). This is the mirror of the write-side `df_to_sheet` helper:
+The output shape matches pandas `split`: `columns` is the array of column names, `data` is the 2-D data, and `dtypes` is a `{column name: pandas_dtype_str}` mapping. Feed it straight into `pd.DataFrame(...).astype(...)` to restore every column's type in one pass (no per-column `to_datetime` / `to_numeric`). This skill packages that 2-line helper as the importable [`scripts/sheets_df.py`](../scripts/sheets_df.py), which contains `df_to_sheet` and `sheet_to_df` as a write / read-back pair:

 ```python
-import pandas as pd
-def sheet_to_df(sheet):
-    return pd.DataFrame(sheet["data"], columns=sheet["columns"]).astype(sheet["dtypes"])
+from sheets_df import sheet_to_df

 # Single sheet
 df = sheet_to_df(out["data"]["sheets"][0])
@@ -236,10 +234,12 @@ df = pd.read_feather(io.BytesIO(res.stdout))
 #### round-trip: read → modify → write back (read/write duality)

-`sheet_to_df` and the `df_to_sheet` in the write-cells reference are a pair of mirror helpers; each of the three round-trip stages (read / modify / write) takes one line:
+`sheet_to_df` and `df_to_sheet` are a pair of mirror helpers ([`scripts/sheets_df.py`](../scripts/sheets_df.py)) that reduce each of the three round-trip stages (read / modify / write) to one line:

 ```python
 import json, subprocess
+
+from sheets_df import df_to_sheet, sheet_to_df

 # 1. Read
 out = json.loads(subprocess.check_output(
     ["lark-cli","sheets","+table-get","--url",URL,"--sheet-name","Sales"]))

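@@ -229,6 +229,13 @@ python prepare.py | lark-cli sheets +workbook-create --title "Trades" --datafram

 The `--sheets` protocol is fully isomorphic with `+table-put` (for field meanings see `+table-put` in lark-sheets-write-cells; large payloads go through stdin / `@file`). `--dataframe` is the binary wire form of the same typed data (Arrow IPC; see the `--dataframe` subsection of the `+table-put` section in the same reference). Choose by the API the producer already has: pandas goes through `--dataframe`, while multi-sheet or hand-assembled JSON goes through `--sheets`. Key difference: **the default sheet of a newly created workbook is reused as the first sheet** (renamed, then filled with the data), so no empty `Sheet1` is left behind; the remaining sheets are created as needed. It folds "create workbook + typed write", which `+table-put` cannot do alone, into a single command, making it the first choice for "compute in pandas, then land the result directly in a new sheet with real dates". For read-back verification use `+table-get` (isomorphic with `--sheets` and round-trippable; pandas users can also use `--dataframe-out` to get the Arrow file directly).

+> 💡 When a pandas DataFrame goes through `--sheets`, just `from sheets_df import df_to_sheet` ([`scripts/sheets_df.py`](../scripts/sheets_df.py), the same helper `+table-put` uses); the helper pays off most in multi-sheet cases:
+> ```python
+> payload = {"sheets": [df_to_sheet(income, "Income Statement"),
+>                       df_to_sheet(balance, "Balance Sheet"),
+>                       df_to_sheet(cashflow, "Cash Flow")]}
+> ```
+
 `--styles` applies visual formatting in the same create-and-write call. Like `--sheets`, it has exactly one outer form: a top-level object holding a `styles` array; each array item corresponds to one sheet, carries a `name`, and is split by capability into four kinds of optional arrays:

 - `cell_styles`: like `+cells-set-style`, takes an A1-notation cell `range` plus flat style fields (`font_weight` / `background_color` / `horizontal_alignment` / `vertical_alignment` / `number_format`, etc.) and an optional `border_styles`; these styles are applied together with the content in the same write. For the full field list run `+workbook-create --print-schema --flag-name styles`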

@@ -506,17 +506,12 @@ lark-cli sheets +table-put --spreadsheet-token "<token>" --sheets @payload.json

 Each sheet can also carry `"allow_overwrite": false` (refuse to write onto non-empty cells, protecting existing data) and `"header": false` (write data only, no header row). For the full field list run `+table-put --print-schema --flag-name sheets`

-#### DataFrame → protocol (5-line helper)
+#### DataFrame → protocol (via the `df_to_sheet` helper)

-pandas' `df.to_json(orient="split", date_format="iso")` does all the cleaning in one step (NaN→null, Timestamp→ISO string, numpy scalars→native numbers); the helper only has to attach the dtypes. 5 lines cover both single- and multi-sheet use:
+pandas' `df.to_json(orient="split", date_format="iso")` does all the cleaning in one step (NaN→null, Timestamp→ISO string, numpy scalars→native numbers); all that remains is to attach the dtypes. This skill packages that 5-line helper as the importable [`scripts/sheets_df.py`](../scripts/sheets_df.py), which contains `df_to_sheet` and `sheet_to_df` as a write / read-back pair:

 ```python
-import json
-def df_to_sheet(df, name, formats=None):
-    return {"name": name,
-            **json.loads(df.to_json(orient="split", date_format="iso")),
-            "dtypes": df.dtypes.astype(str).to_dict(),
-            **({"formats": formats} if formats else {})}
+from sheets_df import df_to_sheet

 # Single sheet (an explicit format overrides the default display)
 payload = {"sheets": [df_to_sheet(df, "Sales", {"Revenue": "#,##0.00", "Gross Margin": "0.0%"})]}

@@ -0,0 +1,32 @@
+#!/usr/bin/env python3
+# Copyright (c) 2026 Lark Technologies Pte. Ltd.
+# SPDX-License-Identifier: MIT
+"""DataFrame ↔ Feishu Sheet typed-JSON helpers.
+
+This is the same 7-line snippet the skill docs already inline (see
+`lark-sheets-write-cells` "DataFrame → protocol (5-line helper)" and
+`lark-sheets-read-data` "Output → DataFrame (2-line helper)"), pulled out
+so callers can `import` it instead of copy-pasting:
+
+    from sheets_df import df_to_sheet, sheet_to_df
+
+Callers run lark-cli themselves; this file is a library, not a CLI.
+"""
+
+import json
+import pandas as pd
+
+
+def df_to_sheet(df, name, formats=None):
+    """Pack one DataFrame into one entry of a `+table-put --sheets` payload."""
+    return {
+        "name": name,
+        **json.loads(df.to_json(orient="split", date_format="iso")),
+        "dtypes": df.dtypes.astype(str).to_dict(),
+        **({"formats": formats} if formats else {}),
+    }
+
+
+def sheet_to_df(sheet):
+    """Restore one `+table-get` sheet dict into a typed DataFrame."""
+    return pd.DataFrame(sheet["data"], columns=sheet["columns"]).astype(sheet["dtypes"])
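
A hedged usage sketch for the write side, assuming pandas is installed. It builds a one-sheet payload, writes it to a temporary `payload.json`, and assembles (without running) a `+table-put` command using only the `--spreadsheet-token` and `--sheets @file` flags that appear in the write-cells hunk header above. The `build_put_argv` name, the `<token>` placeholder, and the sample data are illustrative and not part of the commit; the `ImportError` fallback copies `df_to_sheet` so the snippet also runs where `sheets_df.py` is not on the import path.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

try:
    from sheets_df import df_to_sheet
except ImportError:
    def df_to_sheet(df, name, formats=None):
        """Fallback copy of the helper from sheets_df.py in this commit."""
        return {
            "name": name,
            **json.loads(df.to_json(orient="split", date_format="iso")),
            "dtypes": df.dtypes.astype(str).to_dict(),
            **({"formats": formats} if formats else {}),
        }


def build_put_argv(spreadsheet_token, payload_path):
    """Hypothetical wrapper: argv for `+table-put` reading the payload from a file."""
    return [
        "lark-cli", "sheets", "+table-put",
        "--spreadsheet-token", spreadsheet_token,
        "--sheets", f"@{payload_path}",
    ]


# Illustrative data: a date column and a float column containing one NaN.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2025-03-01", "2025-03-02"]),
    "revenue": [1999.0, float("nan")],
})
payload = {"sheets": [df_to_sheet(sales, "Sales", {"revenue": "#,##0.00"})]}

# Write the payload to a file so it can be passed as `--sheets @<path>`.
payload_path = Path(tempfile.mkdtemp()) / "payload.json"
payload_path.write_text(json.dumps(payload, ensure_ascii=False), encoding="utf-8")

argv = build_put_argv("<token>", payload_path)
# To actually write, use a real token on a machine with lark-cli installed:
# subprocess.run(argv, check=True)
print(json.dumps(payload["sheets"][0], indent=2))
```

Writing the payload to a file follows the `@file` route the workbook reference suggests for large payloads; the printed entry shows the cleaned values (dates as ISO strings, NaN as `null`) alongside the attached `dtypes`.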