Multimodal Image Understanding

by Rytia v1.0.0

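Understands image content by calling a multimodal model. Trigger scenarios: (1) the user asks to analyze, describe, extract, or OCR information from an image, and the current model does not support image input (e.g., text-only models such as deepseek-v4 or glm 5.1); (2) the user explicitly asks to "use my vision model" or "call the multimodal API" to look at an image; (3) the user explicitly invokes this skill (/multimodal-i...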

Install Multimodal Image Understanding with One Click

Get a managed OpenClaw server and install this skill from your dashboard. No SSH, no Docker, no configuration needed.
