Hands-On and Under the Hood: Building an Image-Generation Agent with Qwen-Agent

news 2025/7/2 0:21:45

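✅ Quick to get started
✅ Register custom tools with a single decorator
✅ Generate images from natural language and process them with code
✅ A close look at Qwen-Agent's tool-calling mechanism, message-chain design, and multimodal capabilities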

🎬 Demo: Generating an Image in Conversation, Then Processing It with Code

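🧩 Background: Why Qwen-Agent?

A traditional AI image-generation service exposes only an API: you write the prompt, send the request, parse the response, and display the image yourself.

Qwen-Agent offers a higher-level abstraction, where Agent = LLM + Tool + Memory + File + GUI + Autonomy:

🔧 Tool: register any custom function (such as an image generator)
🤖 LLM: a Tongyi Qianwen (Qwen) model acts as the agent's brain
🧠 Memory: maintains the conversation history as context
🧑‍💻 Code Interpreter: a built-in code executor
📄 File: reads and writes PDF, image, CSV, and other files automatically
🧠 Autonomy: carries out complex multi-step task chains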

⚙️ Environment Setup

Python 3.9.23 is recommended.

pip install -U "qwen-agent[rag,code_interpreter,gui,mcp]"
pip install qwen_agent==0.0.27
pip install python-dotenv

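🪄 Step 1: Register an Image-Generation Tool

In Qwen-Agent, a tool is a class that wraps one piece of functionality. The code below registers a tool named my_image_gen that generates images through the Pollinations service.

import urllib.parse

import json5
from qwen_agent.tools.base import BaseTool, register_tool


@register_tool('my_image_gen')
class MyImageGen(BaseTool):
    # Tells the LLM what the tool does.
    description = 'AI painting service: takes a text description and returns an image URL'
    # Tells the LLM which arguments the tool accepts.
    parameters = [{
        'name': 'prompt',
        'type': 'string',
        'description': 'Description of the image content',
        'required': True,
    }]

    def call(self, params: str, **kwargs) -> str:
        # `params` is the JSON argument string produced by the LLM.
        prompt = urllib.parse.quote(json5.loads(params)['prompt'])
        return json5.dumps(
            {'image_url': f'https://image.pollinations.ai/prompt/{prompt}'},
            ensure_ascii=False)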
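
To check the URL-building step without starting an agent, the logic inside call can be exercised on its own. The following is a minimal sketch, assuming the standard-library json module in place of json5 (a substitution made only so the snippet has no third-party dependency) and a hypothetical helper name, build_image_url, that does not exist in Qwen-Agent:

```python
import json
import urllib.parse


def build_image_url(params: str) -> str:
    """Hypothetical stand-alone copy of the logic in MyImageGen.call."""
    # The LLM hands tool arguments over as a JSON string.
    prompt = json.loads(params)['prompt']
    # Percent-encode the prompt so it is safe inside a URL path.
    encoded = urllib.parse.quote(prompt)
    return json.dumps(
        {'image_url': f'https://image.pollinations.ai/prompt/{encoded}'},
        ensure_ascii=False,
    )


result = build_image_url('{"prompt": "a dog"}')
print(result)
```

The space in the prompt becomes %20 in the returned URL. Non-ASCII prompts such as Chinese text are percent-encoded as UTF-8 bytes in the same way.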

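🔍 Core principles:

Every tool must inherit from BaseTool and be registered under a name
The description and parameters fields are given to the LLM, which uses them to decide when to emit a function call and with which arguments
The call method holds the actual logic and can wrap any Python function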
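
To make the second point concrete, the sketch below converts a tool's description and parameters list into an OpenAI-style function schema, which is the kind of structure a model typically sees when deciding on a function call. This is an illustrative conversion only: the helper name to_function_schema is hypothetical, and the exact format Qwen-Agent builds internally may differ.

```python
import json


def to_function_schema(name: str, description: str, parameters: list) -> dict:
    """Hypothetical helper: turn a parameter list into a JSON-Schema-style spec."""
    properties = {}
    required = []
    for param in parameters:
        properties[param['name']] = {
            'type': param['type'],
            'description': param['description'],
        }
        if param.get('required'):
            required.append(param['name'])
    return {
        'name': name,
        'description': description,
        'parameters': {
            'type': 'object',
            'properties': properties,
            'required': required,
        },
    }


schema = to_function_schema(
    name='my_image_gen',
    description='AI painting service: takes a text description and returns an image URL',
    parameters=[{
        'name': 'prompt',
        'type': 'string',
        'description': 'Description of the image content',
        'required': True,
    }],
)
print(json.dumps(schema, indent=2))
```

A clear description and precise parameter descriptions matter because they are the only information the model has when choosing a tool and filling in its arguments.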

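🧠 Step 2: Configure the LLM

You can use a Qwen model served by Alibaba Cloud Bailian (recommended), or a locally deployed model behind an OpenAI-compatible API (for example vLLM + Qwen2.5).

llm_cfg = {
    'model': 'qwen-plus-2025-01-25',
    'model_type': 'qwen_dashscope',
    'api_key': 'replace with your API key',
    'generate_cfg': {'top_p': 0.8},
}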

📌 Where to get them:

Models: https://bailian.console.aliyun.com/#/model-market
API Key: https://bailian.console.aliyun.com/#/api-key
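
Hardcoding the key in source is easy to leak, so one option is to read it from the DASHSCOPE_API_KEY environment variable and fail early when it is missing. The sketch below is one possible arrangement, not part of Qwen-Agent: the helper name build_llm_cfg and its use_local_server flag are hypothetical, and the local-server values (model name, port) are the example values from the project source and may need adjusting for your deployment.

```python
import os


def build_llm_cfg(use_local_server: bool = False) -> dict:
    """Hypothetical helper that returns an llm_cfg dict for either backend."""
    if use_local_server:
        # OpenAI-compatible server such as vLLM; values are examples only.
        return {
            'model': 'Qwen2.5-7B-Instruct',
            'model_server': 'http://localhost:8000/v1',
            'api_key': 'EMPTY',
            'generate_cfg': {'top_p': 0.8},
        }
    api_key = os.environ.get('DASHSCOPE_API_KEY')
    if not api_key:
        raise RuntimeError('Set the DASHSCOPE_API_KEY environment variable first.')
    return {
        'model': 'qwen-plus-2025-01-25',
        'model_type': 'qwen_dashscope',
        'api_key': api_key,
        'generate_cfg': {'top_p': 0.8},
    }


local_cfg = build_llm_cfg(use_local_server=True)
print(local_cfg['model_server'])
```

Because python-dotenv is installed in the setup step, the variable can also be kept in a local .env file and loaded at startup before this helper is called.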

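🧬 Step 3: Create the Assistant Agent

from qwen_agent.agents import Assistant

system_instruction = '''You need to:
1. Call the drawing tool to get an image URL;
2. Download the image;
3. Use code to manipulate the image;
4. Show the result with plt.show().
Always reply to the user in Chinese.'''

bot = Assistant(
    llm=llm_cfg,
    system_message=system_instruction,
    function_list=['my_image_gen', 'code_interpreter'],
)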

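📌 How to think about Assistant:

Assistant is the core agent class in Qwen-Agent
It accepts a system_message that defines the agent's role and behavior (similar to OpenAI's system prompt)
It can combine multiple tools and decides on its own which function to call
With code_interpreter enabled, it can insert code-execution steps into the task chain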

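💬 Step 4: Chat with the Agent

from qwen_agent.utils.output_beautify import typewriter_print

messages = []  # conversation history
while True:
    query = input('\nUser request: ')
    messages.append({'role': 'user', 'content': query})
    response = []
    response_plain_text = ''
    print('Bot response:')
    for response in bot.run(messages=messages):
        # Each iteration yields the response generated so far; print it as a stream.
        response_plain_text = typewriter_print(response, response_plain_text)
    # Append the final response to the history so the next turn can see it.
    messages.extend(response)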

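📌 How the interaction works:

Qwen-Agent keeps the conversation history in messages (compatible with the OpenAI chat format)
.run(messages) drives the cycle LLM → tool selection → tool execution → output
Output can be streamed, which makes the interaction feel more responsive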
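
The LLM → tool selection → tool execution → output cycle can be pictured with a toy loop. This is a simplified sketch under stated assumptions, not Qwen-Agent's implementation: fake_llm is a stub standing in for the model, echo_upper is a made-up tool, and the function_call field and function role only loosely follow the OpenAI-style message format.

```python
import json


def fake_llm(messages: list) -> dict:
    """Stub model: request a tool after a user turn, answer after a tool result."""
    last = messages[-1]
    if last['role'] == 'user':
        return {
            'role': 'assistant',
            'content': '',
            'function_call': {
                'name': 'echo_upper',
                'arguments': json.dumps({'text': last['content']}),
            },
        }
    return {'role': 'assistant', 'content': f"Tool said: {last['content']}"}


# Tool registry: name -> callable taking the JSON argument string.
TOOLS = {'echo_upper': lambda args: json.loads(args)['text'].upper()}


def run(messages: list) -> list:
    """Loop until the model answers without asking for a tool."""
    history = list(messages)
    new_messages = []
    while True:
        reply = fake_llm(history)
        history.append(reply)
        new_messages.append(reply)
        call = reply.get('function_call')
        if call is None:
            return new_messages
        # Execute the selected tool and feed its result back to the model.
        result = TOOLS[call['name']](call['arguments'])
        tool_message = {'role': 'function', 'name': call['name'], 'content': result}
        history.append(tool_message)
        new_messages.append(tool_message)


output = run([{'role': 'user', 'content': 'hi'}])
for message in output:
    print(message['role'], '->', message['content'])
```

The returned list holds only the messages produced in this turn, which is why the chat loop extends messages with the response to keep the history complete.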

🧪 Console Demo Output

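🧠 Summary of the Underlying Mechanisms

Capability	Mechanism	Notes
Tool calling	LLM function-call reasoning + tool registration	The model forms a function call from the tool descriptions, then selects and invokes the tool itself
Multimodality	External image-generation API + Code Interpreter + plt.show	Natural-language description → image → code → displayed image
Memory	`messages` sequence	Every turn runs with the preceding conversation as context
Instruction control	`system_message`	Defines a custom workflow (draw first, then download, then process)
Model decoupling	Local vLLM or remote DashScope	Several ways to connect an LLM (OpenAI-compatible API)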

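🔚 Closing Thoughts

In this article you learned how to:

Build an agent that can draw
Combine AI, image generation, and code execution in one workflow
Reason about Qwen-Agent's tool calling and execution flow

More importantly, you have taken the first step toward building multimodal agents.

📌 This project can be extended into:

An image style-transfer tool
An intelligent PDF + image processing system
An agent copilot that can analyze, generate, and edit images on its own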

📎 Project Source Code

import urllib.parse

import json5
from qwen_agent.agents import Assistant
from qwen_agent.tools.base import BaseTool, register_tool
from qwen_agent.utils.output_beautify import typewriter_print


# Step 1 (optional): add a custom tool named `my_image_gen`.
@register_tool('my_image_gen')
class MyImageGen(BaseTool):
    # `description` tells the agent what the tool does.
    description = ('AI painting (image generation) service: takes a text description '
                   'and returns the URL of an image drawn from it.')
    # `parameters` tells the agent which input arguments the tool accepts.
    parameters = [{
        'name': 'prompt',
        'type': 'string',
        'description': 'Detailed description of the desired image content',
        'required': True,
    }]

    def call(self, params: str, **kwargs) -> str:
        # `params` is the argument string generated by the LLM agent.
        prompt = json5.loads(params)['prompt']
        prompt = urllib.parse.quote(prompt)
        return json5.dumps(
            {'image_url': f'https://image.pollinations.ai/prompt/{prompt}'},
            ensure_ascii=False)


# Step 2: configure the LLM you are using.
llm_cfg = {
    # Use the model service provided by DashScope:
    'model': 'qwen-plus-2025-01-25',
    'model_type': 'qwen_dashscope',
    # If 'api_key' is not set here, the `DASHSCOPE_API_KEY` environment variable is read.
    'api_key': 'replace with your own API key',

    # Or use an OpenAI-compatible model service such as vLLM or Ollama:
    # 'model': 'Qwen2.5-7B-Instruct',
    # 'model_server': 'http://localhost:8000/v1',  # base_url, also known as api_base
    # 'api_key': 'EMPTY',

    # (Optional) LLM hyperparameters:
    'generate_cfg': {'top_p': 0.8},
}

# Step 3: create an agent. `Assistant` is used here; it can call tools and read files.
system_instruction = '''After receiving the user's request, you should:
- first draw an image and obtain its URL,
- then run code that calls `requests.get` to download the image from that URL,
- finally pick one image operation from the given document and apply it to the image.
Show the image with `plt.show()`.
Always reply to the user in Chinese.'''
tools = ['my_image_gen', 'code_interpreter']  # `code_interpreter` is a built-in tool for running code.
# files = ['./examples/resource/doc.pdf']  # Give the agent a PDF file to read.
bot = Assistant(
    llm=llm_cfg,
    system_message=system_instruction,
    function_list=tools,
    # files=files,
)

# Step 4: run the agent as a chatbot.
messages = []  # Chat history is stored here.
while True:
    # Example request: "Draw a dog and rotate it 90 degrees".
    query = input('\nUser request: ')
    # Add the user's request to the chat history.
    messages.append({'role': 'user', 'content': query})
    response = []
    response_plain_text = ''
    print('Bot response:')
    for response in bot.run(messages=messages):
        # Streamed output.
        response_plain_text = typewriter_print(response, response_plain_text)
    # Add the bot's response to the chat history.
    messages.extend(response)