initial commit

2026-04-07 20:55:30 +08:00
commit 81d1fb7856
84 changed files with 11929 additions and 0 deletions
--- a/doc/models.md
+++ b/doc/models.md
@@ -0,0 +1,90 @@
+# Backend Model Handling
+
+The backend currently serves four kinds of models: depth estimation, semantic segmentation, image inpainting, and animation generation.
+
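+The frontend fetches the model list and parameter configuration via GET /models and uses them to generate the UI dynamically. The inference endpoints are: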
+
+* POST /depth
+* POST /segment
+* POST /inpaint
+* POST /animate
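
As a rough illustration of how a client might call these endpoints, the sketch below builds the GET /models request and a JSON POST to any of the four inference endpoints using only the Python standard library. Several details are assumptions made for illustration: the base URL `http://localhost:8000`, JSON encoding of the request bodies, and the helper names `build_models_request`, `build_inference_request`, and `send`. The real fields are defined by `DepthRequest`, `SegmentRequest`, `InpaintRequest`, and `AnimateRequest`, which are not shown here.

```python
import json
import urllib.request

# Hypothetical base URL: the host and port are not given in this document.
BASE_URL = "http://localhost:8000"

# The four inference endpoints listed in this document.
INFERENCE_ENDPOINTS = ("depth", "segment", "inpaint", "animate")


def build_models_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """GET /models: model list plus parameter config used to build the UI."""
    return urllib.request.Request(f"{base_url}/models", method="GET")


def build_inference_request(task: str, payload: dict,
                            base_url: str = BASE_URL) -> urllib.request.Request:
    """POST /<task> with a JSON body.

    Assumption: the body is JSON. The real fields come from DepthRequest,
    SegmentRequest, InpaintRequest or AnimateRequest, which are not shown here.
    """
    if task not in INFERENCE_ENDPOINTS:
        raise ValueError(f"unknown task: {task!r}")
    return urllib.request.Request(
        f"{base_url}/{task}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0):
    """Send the request and decode a JSON reply.

    Assumes the server answers in JSON; a binary reply (such as a GIF)
    would need response.read() handled directly instead.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send(build_models_request())` would return whatever model list and parameter config the server exposes. Building the `Request` object separately from sending it makes the payload easy to inspect before any network call is made.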
+
+## 1. Depth Estimation
+
+Takes an RGB image and outputs a relative depth value for every pixel, which drives the subsequent layering and parallax computation.
+
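+This stage is the foundation of the whole pseudo-3D effect: the quality of the depth map sets the upper limit on the quality of the final result.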
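
To make "layering and parallax" concrete, here is a minimal sketch. It normalizes a relative depth map, quantizes it into a fixed number of layers, and gives each layer a horizontal offset that shrinks with distance. The sketch assumes that larger values mean farther away. Models that output relative inverse depth (MiDaS, for example) use the opposite convention, so their output would need to be flipped first. The layer count, `max_shift_px`, and the function names are hypothetical, not values or helpers from this project.

```python
from typing import List

DepthMap = List[List[float]]


def normalize(depth: DepthMap) -> DepthMap:
    """Rescale relative depth to [0, 1]. Assumed convention: 0 = nearest, 1 = farthest."""
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # a perfectly flat map would otherwise divide by zero
    return [[(v - lo) / span for v in row] for row in depth]


def assign_layers(depth: DepthMap, num_layers: int) -> List[List[int]]:
    """Quantize normalized depth into num_layers equal bands; layer 0 is nearest."""
    norm = normalize(depth)
    # min() keeps the farthest value (exactly 1.0) in the last band
    return [[min(int(v * num_layers), num_layers - 1) for v in row] for row in norm]


def layer_shift(layer: int, num_layers: int, max_shift_px: float) -> float:
    """Parallax offset per layer: the nearest layer moves max_shift_px, the farthest stays put."""
    if num_layers == 1:
        return 0.0
    nearness = 1.0 - layer / (num_layers - 1)
    return max_shift_px * nearness
```

With three layers and `max_shift_px=10.0`, the layers move 10, 5, and 0 pixels respectively. Because near content moves more than far content, the viewer perceives depth.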
+
+Models:
+
+* ZoeDepth: https://github.com/isl-org/ZoeDepth.git
+* Depth Anything v2: https://github.com/DepthAnything/Depth-Anything-V2.git
+* MiDaS: https://github.com/isl-org/MiDaS.git
+* DPT: https://github.com/isl-org/DPT.git
+
+API
+
+HTTP: POST /depth
+
+Request body: DepthRequest
+
+Implementation: run_depth_inference in models_depth.py
+
+## 2. Semantic Segmentation
+
+Partitions the image at the pixel level to guide layering (sky / mountains / ground / buildings, etc.).
+
+In the pseudo-3D pipeline, this step mainly answers one question:
+
+Which regions can be split apart, and which must stay whole?
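
One simple way segmentation can answer that question is to give every segment a single layer. Take the depth-derived layer index of each pixel, compute the median layer for each segment ID, and assign that value to the whole segment. A building or a mountain then moves as one piece instead of being torn across layers. This is sketched here as an assumption, not as this project's actual method. The inputs are plain per-pixel grids of the same shape, and the function name `layer_per_segment` is hypothetical.

```python
from collections import defaultdict
from statistics import median_low
from typing import Dict, List


def layer_per_segment(layers: List[List[int]],
                      segments: List[List[int]]) -> List[List[int]]:
    """Give every pixel of a segment the same layer, so no segment is split.

    layers:   per-pixel layer index from depth quantization (0 = nearest)
    segments: per-pixel segment ID from a segmentation model, same shape
    """
    # Collect the layer indices that fall inside each segment.
    per_segment: Dict[int, List[int]] = defaultdict(list)
    for layer_row, seg_row in zip(layers, segments):
        for layer, seg in zip(layer_row, seg_row):
            per_segment[seg].append(layer)
    # median_low always returns one of the observed values, so the
    # result stays a valid integer layer index.
    chosen = {seg: median_low(vals) for seg, vals in per_segment.items()}
    return [[chosen[seg] for seg in seg_row] for seg_row in segments]
```

The median is used instead of the mean so that a few badly estimated depth pixels on a segment's border do not pull the whole segment into the wrong layer.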
+
+Models:
+* Mask2Former: https://github.com/facebookresearch/Mask2Former.git
+* SAM: https://github.com/facebookresearch/segment-anything.git
+
+API
+
+HTTP: POST /segment
+
+Request body: SegmentRequest
+
+Implementation: run_segmentation_inference in models_segmentation.py
+
+## 3. Image Inpainting
+
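+After the parallax transform or layering, the image contains "holes" (regions that were previously hidden behind nearer content), which must be filled in by a generative model.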
+
+This stage largely determines how realistic the final frames look.
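
The holes come from disocclusion. When near pixels shift farther than the background behind them, some target positions receive no source pixel at all. The following sketch forward-warps an image by per-pixel integer horizontal shifts and returns the hole mask that an inpainting model would be asked to fill. When two source pixels land on the same target, the larger shift wins, on the assumption that it belongs to the nearer pixel. This is a simplified illustration: `forward_warp` is a hypothetical helper, and it uses integer shifts only for clarity.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]


def forward_warp(image: Grid, shift: Grid
                 ) -> Tuple[List[List[Optional[int]]], List[List[bool]]]:
    """Move each pixel horizontally by its integer parallax shift and report the holes.

    When several source pixels land on the same target, the one with the
    larger shift (assumed to be nearer the camera) wins, like a z-buffer.
    Targets that receive no pixel are holes: the mask an inpainting model fills.
    """
    h, w = len(image), len(image[0])
    out: List[List[Optional[int]]] = [[None] * w for _ in range(h)]
    winner: List[List[float]] = [[float("-inf")] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            tx = x + shift[y][x]
            # Pixels pushed off the canvas are dropped.
            if 0 <= tx < w and shift[y][x] > winner[y][tx]:
                out[y][tx] = image[y][x]
                winner[y][tx] = shift[y][x]
    holes = [[value is None for value in row] for row in out]
    return out, holes
```

Consider `image = [[1, 2, 3, 4]]` and `shift = [[0, 0, 1, 1]]`. The pixels at positions 2 and 3 shift right by one, and the second of them leaves the frame. Position 2 therefore receives nothing and is marked `True` in the hole mask.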
+
+Models:
+* SDXL Inpainting: https://github.com/AyushUnleashed/sdxl-inpaint.git
+* ControlNet: https://github.com/lllyasviel/ControlNet.git
+
+API
+
+HTTP: POST /inpaint
+
+Request body: InpaintRequest
+
+Implementation: run_inpaint_inference in models_inpaint.py
+
+## 4. Animation Generation
+
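+Generates a short animation (GIF) from a text prompt, giving a quick preview of how a shot would look in motion based on a static description.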
+
+AnimateDiff is currently integrated for this stage and exposed through the same unified backend API as the other models.
+
+Models:
+* AnimateDiff: https://github.com/guoyww/animatediff.git
+
+API
+
+HTTP: POST /animate
+
+Request body: AnimateRequest
+
+Implementation: `python_server/model/Animation/animation_loader.py` together with `animate` in `python_server/server.py`
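
Since `/animate` returns a short GIF, a client may want to confirm it actually received one before writing it to disk. This sketch checks the standard GIF signature, which is `GIF87a` or `GIF89a` in the first six bytes. The document does not say whether the server sends raw bytes or a base64 string, so the helper accepts both. That encoding, and the names `to_gif_bytes` and `save_gif`, are assumptions.

```python
import base64
from pathlib import Path
from typing import Union

# Every GIF file starts with one of these two 6-byte signatures.
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def to_gif_bytes(data: Union[bytes, str]) -> bytes:
    """Return raw GIF bytes, decoding base64 first if given a string.

    Assumption: the /animate response is either raw bytes or a base64 string;
    the actual encoding is not documented here.
    """
    raw = base64.b64decode(data) if isinstance(data, str) else data
    if raw[:6] not in GIF_SIGNATURES:
        raise ValueError("response does not look like a GIF")
    return raw


def save_gif(data: Union[bytes, str], path: Union[str, Path]) -> Path:
    """Validate the payload and write it to disk (hypothetical helper)."""
    target = Path(path)
    target.write_bytes(to_gif_bytes(data))
    return target
```

Checking the signature before saving surfaces a server-side error (for example, a JSON error body) immediately, instead of leaving behind an unreadable `.gif` file.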