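The Qwen-VL (通義千問VL) models answer questions based on the images you pass in.
You can try the image understanding capability online in the Model Plaza (模型廣場).
How to use
You need to have obtained an API key and configured it as an environment variable (the examples below read it from DASHSCOPE_API_KEY). If you call the models through the OpenAI SDK or the DashScope SDK, you also need to install that SDK.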
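Before running any of the samples, it can help to confirm that the key is actually visible to your process. The following is a minimal sketch, not part of either SDK: get_api_key is a hypothetical helper, and the only assumption taken from this page is the variable name DASHSCOPE_API_KEY.

```python
import os


def get_api_key(env=os.environ, name="DASHSCOPE_API_KEY"):
    """Return the API key from the given mapping, or raise a clear error.

    `get_api_key` is a hypothetical helper for illustration only.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; configure it before running the samples")
    return key
```

Calling get_api_key() with no arguments reads the real process environment; passing a dict makes the check easy to exercise without touching your shell configuration.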
Simple example
OpenAI compatible
You can call the Qwen-VL models through the OpenAI SDK or the OpenAI-compatible HTTP interface.
Python
Sample code
from openai import OpenAI
import os
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)
completion = client.chat.completions.create(
model="qwen-vl-max-latest",
messages=[
{"role": "user", "content": [
{"type": "image_url", "image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"}},
{"type": "text", "text": "這是什么"}
]}
]
)
print(completion.choices[0].message.content)
Response
這是一張在海灘上拍攝的照片。照片中,一個人和一只狗坐在沙灘上,背景是大海和天空。人和狗似乎在互動,狗的前爪搭在人的手上。陽光從畫面的右側照射過來,給整個場景增添了一種溫暖的氛圍。
curl
Sample code
curl --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "qwen-vl-max",
"messages": [{
"role": "user",
"content": [
{"type": "image_url", "image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"}},
{"type": "text", "text": "這是什么"}
]
}]
}'
Response
{
"choices": [
{
"message": {
"content": "這張圖片展示了一位女士和一只狗在海灘上互動。女士坐在沙灘上,微笑著與狗握手。背景是大海和天空,陽光灑在她們身上,營造出溫暖的氛圍。狗戴著項圈,顯得很溫順。",
"role": "assistant"
},
"finish_reason": "stop",
"index": 0,
"logprobs": null
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 1270,
"completion_tokens": 54,
"total_tokens": 1324
},
"created": 1725948561,
"system_fingerprint": null,
"model": "qwen-vl-max",
"id": "chatcmpl-0fd66f46-b09e-9164-a84f-3ebbbedbac15"
}
Node.js
Sample code
import OpenAI from "openai";
const openai = new OpenAI(
{
// If the environment variable is not configured, replace the next line with your Bailian (百煉) API key: apiKey: "sk-xxx",
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
}
);
async function main() {
const response = await openai.chat.completions.create({
model: "qwen-vl-max-latest",
messages: [{role: "user",content: [
{ type: "image_url",image_url: {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"}},
{ type: "text", text: "這是什么?" }
]}]
});
console.log(response.choices[0].message.content);
}
main()
Response
這是一張在海灘上拍攝的照片。照片中,一位穿著格子襯衫的女性坐在沙灘上,與一只戴著項圈的黃色拉布拉多犬互動。背景是大海和天空,陽光灑在她們身上,營造出溫暖的氛圍。
DashScope
You can call the Qwen-VL models through the DashScope SDK or over HTTP.
Python
Sample code
import os
import dashscope
messages = [
{
"role": "user",
"content": [
{"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
{"text": "這是什么?"}
]
}
]
response = dashscope.MultiModalConversation.call(
# If the environment variable is not configured, replace the next line with your Bailian (百煉) API key: api_key="sk-xxx",
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen-vl-max-latest',
messages=messages
)
print(response.output.choices[0].message.content[0]["text"])
Response
這是一張在海灘上拍攝的照片。照片中有一位女士和一只狗。女士坐在沙灘上,微笑著與狗互動。狗戴著項圈,似乎在與女士握手。背景是大海和天空,陽光灑在她們身上,營造出溫馨的氛圍。
Java
Sample code
import java.util.Arrays;
import java.util.Collections;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
public class Main {
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
Collections.singletonMap("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"),
Collections.singletonMap("text", "這是什么?"))).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
.model("qwen-vl-max-latest")
.message(userMessage)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
Response
這是一張在海灘上拍攝的照片。照片中有一個穿著格子襯衫的人和一只戴著項圈的狗。人和狗面對面坐著,似乎在互動。背景是大海和天空,陽光灑在他們身上,營造出溫暖的氛圍。
curl
Sample code
curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen-vl-max",
"input":{
"messages":[
{
"role": "user",
"content": [
{"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
{"text": "這是什么?"}
]
}
]
}
}'
Response
{
"output": {
"choices": [
{
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": [
{
"text": "這是一張在海灘上拍攝的照片。照片中有一個穿著格子襯衫的人和一只戴著項圈的狗。他們坐在沙灘上,背景是大海和天空。陽光從畫面的右側照射過來,給整個場景增添了一種溫暖的氛圍。"
}
]
}
}
]
},
"usage": {
"output_tokens": 55,
"input_tokens": 1271,
"image_tokens": 1247
},
"request_id": "ccf845a3-dc33-9cda-b581-20fe7dc23f70"
}
Multiple image input
You can pass multiple images to a Qwen-VL model in a single request. The following code shows how to pass them in.
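When the number of images varies, the content list can be built programmatically rather than written out by hand. The sketch below assumes the OpenAI-compatible message shape used on this page (one image_url part per image, followed by one text part). build_image_content is a hypothetical helper name, not an SDK function.

```python
def build_image_content(image_urls, prompt):
    """Build an OpenAI-compatible `content` list: one image part per URL, then the text prompt.

    Hypothetical helper for illustration; it only assembles plain dicts.
    """
    parts = [{"type": "image_url", "image_url": {"url": url}} for url in image_urls]
    parts.append({"type": "text", "text": prompt})
    return parts


urls = [
    "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg",
    "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png",
]
messages = [{"role": "user", "content": build_image_content(urls, "這些是什么")}]
```

The resulting messages list can be passed as the messages argument of client.chat.completions.create in the samples that follow.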
OpenAI compatible
You can call the Qwen-VL models through the OpenAI SDK or the OpenAI-compatible HTTP interface.
Python
Sample code
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen-vl-max-latest",
messages=[
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
},
},
{
"type": "image_url",
"image_url": {
"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"
},
},
{"type": "text", "text": "這些是什么"},
],
}
],
)
print(completion.choices[0].message.content)
Response
圖1中是一位女士和一只拉布拉多犬在海灘上互動的場景。女士穿著格子襯衫,坐在沙灘上,與狗進行握手的動作,背景是海浪和天空,整個畫面充滿了溫馨和愉快的氛圍。
圖2中是一只老虎在森林中行走的場景。老虎的毛色是橙色和黑色條紋相間,它正向前邁步,周圍是茂密的樹木和植被,地面上覆蓋著落葉,整個畫面給人一種野生自然的感覺。
curl
Sample code
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen-vl-max",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
}
},
{
"type": "image_url",
"image_url": {
"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"
}
},
{
"type": "text",
"text": "這些是什么"
}
]
}
]
}'
Response
{
"choices": [
{
"message": {
"content": "圖1中是一位女士和一只拉布拉多犬在海灘上互動的場景。女士穿著格子襯衫,坐在沙灘上,與狗進行握手的動作,背景是海景和日落的天空,整個畫面顯得非常溫馨和諧。\n\n圖2中是一只老虎在森林中行走的場景。老虎的毛色是橙色和黑色條紋相間,它正向前邁步,周圍是茂密的樹木和植被,地面上覆蓋著落葉,整個畫面充滿了自然的野性和生機。",
"role": "assistant"
},
"finish_reason": "stop",
"index": 0,
"logprobs": null
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 2497,
"completion_tokens": 109,
"total_tokens": 2606
},
"created": 1725948561,
"system_fingerprint": null,
"model": "qwen-vl-max",
"id": "chatcmpl-0fd66f46-b09e-9164-a84f-3ebbbedbac15"
}
Node.js
Sample code
import OpenAI from "openai";
const openai = new OpenAI(
{
// If the environment variable is not configured, replace the next line with your Bailian (百煉) API key: apiKey: "sk-xxx",
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
}
);
async function main() {
const response = await openai.chat.completions.create({
model: "qwen-vl-max-latest",
messages: [{role: "user",content: [
{ type: "image_url",image_url: {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"}},
{ type: "image_url",image_url: {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"}},
{ type: "text", text: "這些是什么?" },
]}]
});
console.log(response.choices[0].message.content);
}
main()
Response
第一張圖片中,一個人和一只狗在海灘上互動。人穿著格子襯衫,狗戴著項圈,他們似乎在握手或擊掌。
第二張圖片中,一只老虎在森林中行走。老虎的毛色是橙色和黑色條紋,背景是綠色的樹木和植被。
DashScope
You can call the Qwen-VL models through the DashScope SDK or over HTTP.
Python
Sample code
import os
import dashscope
messages = [
{
"role": "user",
"content": [
{"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
{"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"},
{"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/rabbit.png"},
{"text": "這些是什么?"}
]
}
]
response = dashscope.MultiModalConversation.call(
# If the environment variable is not configured, replace the next line with your Bailian (百煉) API key: api_key="sk-xxx",
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen-vl-max-latest',
messages=messages
)
print(response.output.choices[0].message.content[0]["text"])
Response
這些圖片展示了一些動物和自然場景。第一張圖片中,一個人和一只狗在海灘上互動。第二張圖片是一只老虎在森林中行走。第三張圖片是一只卡通風格的兔子在草地上跳躍。
Java
Sample code
import java.util.Arrays;
import java.util.Collections;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
public class Main {
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
Collections.singletonMap("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"),
Collections.singletonMap("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"),
Collections.singletonMap("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/rabbit.png"),
Collections.singletonMap("text", "這些是什么?"))).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
.model("qwen-vl-max-latest")
.message(userMessage)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
Response
這些圖片展示了一些動物和自然場景。
1. 第一張圖片:一個女人和一只狗在海灘上互動。女人穿著格子襯衫,坐在沙灘上,狗戴著項圈,伸出爪子與女人握手。
2. 第二張圖片:一只老虎在森林中行走。老虎的毛色是橙色和黑色條紋,背景是樹木和樹葉。
3. 第三張圖片:一只卡通風格的兔子在草地上跳躍。兔子是白色的,耳朵是粉紅色的,背景是藍天和黃色的花朵。
這些圖片展示了不同的動物和自然環境。
curl
Sample code
curl --location 'https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "qwen-vl-plus",
"input":{
"messages":[
{
"role": "user",
"content": [
{"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
{"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"},
{"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/rabbit.png"},
{"text": "這些是什么?"}
]
}
]
}
}'
Response
{
"output": {
"choices": [
{
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": [
{
"text": "這張圖片顯示了一位女士和她的狗在海灘上。她們似乎正在享受彼此的陪伴,狗狗坐在沙灘上伸出爪子與女士握手或互動。背景是美麗的日落景色,海浪輕輕拍打著海岸線。\n\n請注意,我提供的描述基于圖像中可見的內容,并不包括任何超出視覺信息之外的信息。如果您需要更多關于這個場景的具體細節,請告訴我!"
}
]
}
}
]
},
"usage": {
"output_tokens": 81,
"input_tokens": 1277,
"image_tokens": 1247
},
"request_id": "ccf845a3-dc33-9cda-b581-20fe7dc23f70"
}
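Multi-turn conversation (using the conversation history)
The Qwen-VL models can take the conversation history into account when replying. The sample code below shows how to implement multi-turn conversation with a Qwen-VL model through either the OpenAI-compatible or the DashScope interface.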
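The core of multi-turn conversation is that each request carries the full history: the earlier user message, the assistant reply, and then the new question. A minimal sketch of that bookkeeping follows, assuming the OpenAI-compatible message shape; append_turn is a hypothetical helper and makes no network call.

```python
def append_turn(messages, assistant_text, next_user_text):
    """Return a new history with the assistant reply and the next user question appended.

    Hypothetical helper for illustration; the input list is left unmodified.
    """
    return messages + [
        {"role": "assistant", "content": assistant_text},
        {"role": "user", "content": [{"type": "text", "text": next_user_text}]},
    ]


history = [{
    "role": "user",
    "content": [
        {"type": "image_url",
         "image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"}},
        {"type": "text", "text": "這是什么"},
    ],
}]
# In a real call, assistant_text would be the content of the first response.
history = append_turn(history, "這是一個女孩和一只狗。", "做一首詩描述這個場景")
```

Because the image stays in the first user message, the second request can refer to "this scene" without sending the image again as a separate part.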
OpenAI compatible
You can call the Qwen-VL models through the OpenAI SDK or the OpenAI-compatible HTTP interface to try multi-turn conversation.
Python
Sample code
from openai import OpenAI
import os
client = OpenAI(
# If the environment variable is not configured, replace the next line with your Bailian (百煉) API key: api_key="sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)
messages = [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
},
},
{"type": "text", "text": "這是什么"},
],
}
]
completion = client.chat.completions.create(
model="qwen-vl-max-latest",
messages=messages,
)
print(f"第一輪輸出:{completion.choices[0].message.content}")
assistant_message = completion.choices[0].message
messages.append(assistant_message.model_dump())
messages.append({
"role": "user",
"content": [
{
"type": "text",
"text": "做一首詩描述這個場景"
}
]
})
completion = client.chat.completions.create(
model="qwen-vl-max-latest",
messages=messages,
)
print(f"第二輪輸出:{completion.choices[0].message.content}")
Response
第一輪輸出:這是一張在海灘上拍攝的照片。照片中,一位穿著格子襯衫的女士坐在沙灘上,與一只戴著項圈的金毛犬互動。背景是大海和天空,陽光灑在她們身上,營造出溫暖的氛圍。
第二輪輸出:沙灘上,陽光灑,
女子與犬,笑語嘩。
海浪輕拍,風兒吹,
快樂時光,心兒醉。
curl
Sample code
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen-vl-max",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
}
},
{
"type": "text",
"text": "這是什么"
}
]
},
{
"role": "assistant",
"content": [
{
"type": "text",
"text": "這是一個女孩和一只狗。"
}
]
},
{
"role": "user",
"content": [
{
"type": "text",
"text": "寫一首七言絕句描述這個場景"
}
]
}
]
}'
Response
{
"choices": [
{
"message": {
"content": "海風輕拂笑顏開, \n沙灘上與犬相陪。 \n夕陽斜照人影短, \n歡樂時光心自醉。",
"role": "assistant"
},
"finish_reason": "stop",
"index": 0,
"logprobs": null
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 1295,
"completion_tokens": 32,
"total_tokens": 1327
},
"created": 1726324976,
"system_fingerprint": null,
"model": "qwen-vl-max",
"id": "chatcmpl-3c953977-6107-96c5-9a13-c01e328b24ca"
}
Node.js
Sample code
import OpenAI from "openai";
const openai = new OpenAI(
{
// If the environment variable is not configured, replace the next line with your Bailian (百煉) API key: apiKey: "sk-xxx",
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
}
);
let messages = [{
role: "user", content: [
{ type: "image_url", image_url: { "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg" } },
{ type: "text", text: "這是什么?" },
]
}]
async function main() {
let response = await openai.chat.completions.create({
model: "qwen-vl-max-latest",
messages: messages
});
console.log(`第一輪輸出:${response.choices[0].message.content}`);
messages.push(response.choices[0].message);
messages.push({"role": "user", "content": "做一首詩描述這個場景"});
response = await openai.chat.completions.create({
model: "qwen-vl-max-latest",
messages: messages
});
console.log(`第二輪輸出:${response.choices[0].message.content}`);
}
main()
Response
第一輪輸出:這是一張在海灘上拍攝的照片。照片中有一個穿著格子襯衫的人和一只戴著項圈的狗。人和狗面對面坐著,似乎在互動。背景是大海和天空,陽光從畫面的右側照射過來,營造出溫暖的氛圍。
第二輪輸出:沙灘上,人與狗,
面對面,笑語稠。
海風輕拂,陽光柔,
心隨波浪,共潮頭。
項圈閃亮,情意濃,
格子衫下,心相通。
海天一色,無盡空,
此刻溫馨,永銘中。
DashScope
You can call the Qwen-VL models through the DashScope SDK or over HTTP to try multi-turn conversation.
Python
Sample code
import os
from dashscope import MultiModalConversation
messages = [
{
"role": "user",
"content": [
{
"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
},
{"text": "這是什么?"},
],
}
]
response = MultiModalConversation.call(
# If the environment variable is not configured, replace the next line with your Bailian (百煉) API key: api_key="sk-xxx",
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen-vl-max-latest',
messages=messages
)
print(f"模型第一輪輸出:{response.output.choices[0].message.content[0]['text']}")
messages.append(response['output']['choices'][0]['message'])
user_msg = {"role": "user", "content": [{"text": "做一首詩描述這個場景"}]}
messages.append(user_msg)
response = MultiModalConversation.call(
# If the environment variable is not configured, replace the next line with your Bailian (百煉) API key: api_key="sk-xxx",
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen-vl-max-latest',
messages=messages
)
print(f"模型第二輪輸出:{response.output.choices[0].message.content[0]['text']}")
Response
模型第一輪輸出:這是一張在海灘上拍攝的照片。照片中有一個穿著格子襯衫的人和一只戴著項圈的狗。人和狗面對面坐著,似乎在互動。背景是大海和天空,陽光灑在他們身上,營造出溫暖的氛圍。
模型第二輪輸出:在陽光照耀的海灘上,人與狗共享歡樂時光。
Java
Sample code
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
public class Main {
private static final String modelName = "qwen-vl-max-latest";
public static void MultiRoundConversationCall() throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage systemMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
.content(Arrays.asList(Collections.singletonMap("text", "You are a helpful assistant."))).build();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(Collections.singletonMap("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"),
Collections.singletonMap("text", "這是什么?"))).build();
List<MultiModalMessage> messages = new ArrayList<>();
messages.add(systemMessage);
messages.add(userMessage);
MultiModalConversationParam param = MultiModalConversationParam.builder()
// If the environment variable is not configured, replace the next line with your Bailian (百煉) API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model(modelName)
.messages(messages)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println("第一輪輸出:"+result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
// add the result to conversation
messages.add(result.getOutput().getChoices().get(0).getMessage());
MultiModalMessage msg = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(Collections.singletonMap("text", "做一首詩描述這個場景"))).build();
messages.add(msg);
param.setMessages((List)messages);
result = conv.call(param);
System.out.println("第二輪輸出:"+result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
MultiRoundConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
Response
第一輪輸出:這是一張在海灘上拍攝的照片。照片中有一個穿著格子襯衫的人和一只戴著項圈的狗。人和狗面對面坐著,似乎在互動。背景是大海和天空,陽光灑在他們身上,營造出溫暖的氛圍。
第二輪輸出:在陽光灑滿的海灘上,人與狗共享歡樂時光。
curl
Sample code
curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen-vl-max",
"input":{
"messages":[
{
"role": "user",
"content": [
{"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
{"text": "這是什么?"}
]
},
{
"role": "assistant",
"content": [
{"text": "這是一只狗和一個女孩。"}
]
},
{
"role": "user",
"content": [
{"text": "寫一首七言絕句描述這個場景"}
]
}
]
}
}'
Response
{
"output": {
"choices": [
{
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": [
{
"text": "海浪輕拍沙灘邊,女孩與狗同嬉戲。陽光灑落笑顏開,快樂時光永銘記。"
}
]
}
}
]
},
"usage": {
"output_tokens": 27,
"input_tokens": 1298,
"image_tokens": 1247
},
"request_id": "bdf5ef59-c92e-92a6-9d69-a738ecee1590"
}
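Streaming output
A large model does not produce its final result all at once. It generates intermediate results step by step, and the final result is the concatenation of those intermediate results. With non-streaming output, you wait until generation has finished and then receive the concatenated result. With streaming output, the intermediate results are returned in real time, so you can read them while the model is still generating, which shortens the time you wait for a reply.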
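The relationship between intermediate and final results can be illustrated without calling the API. The sketch below joins a list of incremental fragments the way the streaming samples on this page do, skipping empty (None) deltas. collect_stream and the fragment list are hypothetical and stand in for the chunks a real stream would yield.

```python
def collect_stream(deltas):
    """Join incremental text fragments into the final result, skipping None deltas.

    Hypothetical helper for illustration; `deltas` stands in for streamed chunk contents.
    """
    full = ""
    for delta in deltas:
        if delta is None:
            continue
        full += delta
    return full


fragments = ["這", "是一", None, "張", "照片"]
print(collect_stream(fragments))  # prints 這是一張照片
```

A streaming client does the same thing, except that it can also display each fragment as soon as it arrives.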
OpenAI compatible
You can call the Qwen-VL models through the OpenAI SDK or the OpenAI-compatible HTTP interface to try streaming output.
Python
Sample code
from openai import OpenAI
import os
client = OpenAI(
# If the environment variable is not configured, replace the next line with your Bailian (百煉) API key: api_key="sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen-vl-max-latest",
messages=[
{"role": "user",
"content": [{"type": "image_url",
"image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},},
{"type": "text", "text": "這是什么"}]}],
stream=True
)
full_content = ""
print("流式輸出內容為:")
for chunk in completion:
    if chunk.choices[0].delta.content is None:
        continue
    full_content += chunk.choices[0].delta.content
    print(chunk.choices[0].delta.content)
print(f"完整內容為:{full_content}")
Response
流式輸出內容為:
這
是一
張
在
海灘上拍攝的照片
。照片中,
一個人和一只狗
坐在沙灘上,
背景是大海和
天空。人和
狗似乎在互動
,狗的前
爪搭在人的
手上。陽光從
畫面的右側照射
過來,給整個
場景增添了一種
溫暖的氛圍。
完整內容為:這是一張在海灘上拍攝的照片。照片中,一個人和一只狗坐在沙灘上,背景是大海和天空。人和狗似乎在互動,狗的前爪搭在人的手上。陽光從畫面的右側照射過來,給整個場景增添了一種溫暖的氛圍。
curl
Sample code
curl --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "qwen-vl-plus",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
}
},
{
"type": "text",
"text": "這是什么"
}
]
}
],
"stream":true,
"stream_options":{"include_usage":true}
}'
Response
data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}
data: {"choices":[{"finish_reason":null,"delta":{"content":"圖"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}
data: {"choices":[{"delta":{"content":"中"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}
data: {"choices":[{"delta":{"content":"是一名"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}
data: {"choices":[{"delta":{"content":"女子和她的狗在"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}
data: {"choices":[{"delta":{"content":"沙灘上互動。狗狗坐在地上,"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}
data: {"choices":[{"delta":{"content":"伸出爪子像是要握手或者擊"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}
data: {"choices":[{"delta":{"content":"掌的樣子。這名女士穿著格子"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}
data: {"choices":[{"delta":{"content":"襯衫,似乎正在與狗狗進行親密"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}
data: {"choices":[{"delta":{"content":"的接觸,并且面帶微笑。"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}
data: {"choices":[{"delta":{"content":"他們背后的海浪拍打著海岸線"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}
data: {"choices":[{"delta":{"content":",天空看起來很明亮但有些模糊"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}
data: {"choices":[{"delta":{"content":",可能是日出或日落時"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}
data: {"choices":[{"delta":{"content":"分拍攝的照片。整體氛圍顯得非常"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}
data: {"choices":[{"finish_reason":"stop","delta":{"content":"和諧而溫馨。"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}
data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":1276,"completion_tokens":85,"total_tokens":1361},"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}
data: [DONE]
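The response above is a sequence of server-sent event lines: each data: line carries one JSON chunk, the chunk with an empty choices list carries the usage totals, and data: [DONE] ends the stream. The following is a minimal sketch of how such lines could be reassembled, assuming they have already been read as text. parse_sse_lines is a hypothetical helper, not part of any SDK.

```python
import json


def parse_sse_lines(lines):
    """Concatenate delta.content from `data:` lines and return (text, usage).

    Hypothetical helper for illustration; it stops at the `[DONE]` sentinel.
    """
    text, usage = "", None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # ignore blank lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        if event.get("usage"):
            usage = event["usage"]
        for choice in event.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                text += content
    return text, usage
```

Applied to the lines shown above, the function would return the full description as the first element and the prompt_tokens, completion_tokens, and total_tokens dictionary as the second.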
Node.js
Sample code
import OpenAI from "openai";
const openai = new OpenAI(
{
// If the environment variable is not configured, replace the next line with your Bailian (百煉) API key: apiKey: "sk-xxx",
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
}
);
const completion = await openai.chat.completions.create({
model: "qwen-vl-max-latest",
messages: [
{"role": "user",
"content": [{"type": "image_url",
"image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},},
{"type": "text", "text": "這是什么"}]}],
stream: true,
});
let fullContent = ""
console.log("流式輸出內容為:")
for await (const chunk of completion) {
if (chunk.choices[0].delta.content != null) {
fullContent += chunk.choices[0].delta.content;
console.log(chunk.choices[0].delta.content);
}
}
console.log(`完整內容為:${fullContent}`)
Response
流式輸出內容為:
這
是一
張
在
海灘上拍攝的照片
。照片中,
一個人和一只狗
坐在沙灘上,
背景是大海和
天空。人和
狗似乎在互動
,狗的前
爪搭在人的
手上。陽光從
畫面的右側照射
過來,給整個
場景增添了一種
溫暖的氛圍。
完整內容為:這是一張在海灘上拍攝的照片。照片中,一個人和一只狗坐在沙灘上,背景是大海和天空。人和狗似乎在互動,狗的前爪搭在人的手上。陽光從畫面的右側照射過來,給整個場景增添了一種溫暖的氛圍。
DashScope
You can call the Qwen-VL models through the DashScope SDK or over HTTP to try streaming output.
Python
Sample code
import os
from dashscope import MultiModalConversation
messages = [
{
"role": "user",
"content": [
{"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
{"text": "這是什么?"}
]
}
]
responses = MultiModalConversation.call(
# If the environment variable is not configured, replace the next line with your Bailian (百煉) API key: api_key="sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
model='qwen-vl-max-latest',
messages=messages,
stream=True,
incremental_output=True
)
full_content = ""
print("流式輸出內容為:")
for response in responses:
    try:
        text = response["output"]["choices"][0]["message"].content[0]["text"]
    except (KeyError, IndexError, TypeError, AttributeError):
        # Skip chunks that carry no text (for example, an empty content list).
        continue
    print(text)
    full_content += text
print(f"完整內容為:{full_content}")
Response
流式輸出內容為:
這
是一
張
在
海灘上拍攝的照片
。照片中有一位
女士和一只狗
。女士坐在沙灘
上,微笑著
與狗互動。
狗戴著項圈
,似乎在與
女士握手。背景
是大海和天空
,陽光灑在
她們身上,營造
出溫馨的氛圍
。
完整內容為:這是一張在海灘上拍攝的照片。照片中有一位女士和一只狗。女士坐在沙灘上,微笑著與狗互動。狗戴著項圈,似乎在與女士握手。背景是大海和天空,陽光灑在她們身上,營造出溫馨的氛圍。
Java
Sample code
import java.util.Arrays;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import io.reactivex.Flowable;
public class Main {
public static void streamCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
// must create mutable map.
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(new HashMap<String, Object>(){{put("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg");}},
new HashMap<String, Object>(){{put("text", "這是什么");}})).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// If the environment variable is not configured, replace the next line with your Bailian (百煉) API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen-vl-max-latest")
.message(userMessage)
.incrementalOutput(true)
.build();
Flowable<MultiModalConversationResult> result = conv.streamCall(param);
result.blockingForEach(item -> {
try {
System.out.println(item.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
} catch (Exception e){
System.exit(0);
}
});
}
public static void main(String[] args) {
try {
streamCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
Response
這
是一
張
在
海灘上拍攝的照片
。照片中,
一位穿著格子
襯衫的女士坐在
沙灘上,與
一只戴著項圈
的金毛犬
互動。背景是
大海和天空,
陽光灑在她們
身上,營造出
溫暖的氛圍。
curl
Sample code
curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
"model": "qwen-vl-plus",
"input":{
"messages":[
{
"role": "system",
"content": [
{"text": "You are a helpful assistant."}
]
},
{
"role": "user",
"content": [
{"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
{"text": "這個圖片是哪里?"}
]
}
]
},
"parameters": {
"incremental_output": true
}
}'
Response
id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"這張"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"input_tokens":1278,"output_tokens":1,"image_tokens":1247},"request_id":"8b037000-c670-94cd-88d4-13318ddce1d0"}
id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"照片"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"input_tokens":1278,"output_tokens":2,"image_tokens":1247},"request_id":"8b037000-c670-94cd-88d4-13318ddce1d0"}
......
id:10
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"拍打著海岸線以及遠處的地平"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"input_tokens":1278,"output_tokens":56,"image_tokens":1247},"request_id":"8b037000-c670-94cd-88d4-13318ddce1d0"}
id:11
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"線上有陽光照射過來。"}],"role":"assistant"},"finish_reason":"stop"}]},"usage":{"input_tokens":1278,"output_tokens":63,"image_tokens":1247},"request_id":"8b037000-c670-94cd-88d4-13318ddce1d0"}
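Using local files
The sample code below shows how to have a Qwen-VL model process a local file through either the OpenAI-compatible or the DashScope interface. The example image used in the code is test.png.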
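With the OpenAI-compatible interface, a local image is sent as a Base64 data URL, and the MIME type in that URL should match the actual file format. A minimal sketch follows, using only the Python standard library. to_data_url is a hypothetical helper, and inferring the type from the file extension is an assumption that holds only when the extension is accurate.

```python
import base64
import mimetypes


def to_data_url(image_bytes, filename):
    """Encode raw image bytes as a data URL whose MIME type is inferred from the file name.

    Hypothetical helper for illustration; raises ValueError for non-image extensions.
    """
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {filename!r}")
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

For example, reading test.png in binary mode and passing its bytes with the name "test.png" yields a URL that starts with data:image/png;base64, which can be used as the url value of an image_url part.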
OpenAI compatible
Python
Sample code
from openai import OpenAI
import os
import base64
# Encode the image file as a Base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
base64_image = encode_image("test.png")
client = OpenAI(
# If the environment variable is not configured, replace the next line with your Bailian (百煉) API key: api_key="sk-xxx",
api_key=os.getenv('DASHSCOPE_API_KEY'),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen-vl-max-latest",
messages=[
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {"url": f"data:image/png;base64,{base64_image}"},
},
{"type": "text", "text": "這是什么"},
],
}
],
)
print(completion.choices[0].message.content)
Response
這是一只飛翔的鷹。鷹是一種猛禽,通常具有強壯的翅膀和銳利的爪子,擅長在高空翱翔和捕獵。圖片中的鷹展翅高飛,背景是藍天白云,顯得非常壯觀。
HTTP
Sample code
import os
import base64
import requests
# Encode the image file as a Base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
base64_image = encode_image("test.png")
# 若沒有配置環境變量,請用百煉API Key將下行替換為:api_key="sk-xxx",
api_key = os.getenv("DASHSCOPE_API_KEY")
headers = {"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"}
payload = {
"model": "qwen-vl-max-latest",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
# test.png is a PNG file, so the data URL declares image/png
"image_url": {"url": f"data:image/png;base64,{base64_image}"},
},
{"type": "text", "text": "這是什么"},
],
}
],
}
response = requests.post(
"https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions",
headers=headers,
json=payload,
)
print(response.json()["choices"][0]["message"]["content"])
Result
這是一只飛翔的鷹。鷹是一種猛禽,通常具有強壯的翅膀和銳利的爪子,能夠在高空翱翔并捕獵獵物。圖片中的鷹展翅高飛,背景是藍天白云,顯得非常壯觀。
Node.js
Sample code
import OpenAI from "openai";
import { readFileSync } from 'fs';
const openai = new OpenAI(
{
// If the environment variable is not configured, replace the next line with your Bailian API key: apiKey: "sk-xxx",
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
}
);
const encodeImage = (imagePath) => {
const imageFile = readFileSync(imagePath);
return imageFile.toString('base64');
};
const base64Image = encodeImage("test.png")
async function main() {
const completion = await openai.chat.completions.create({
model: "qwen-vl-max-latest",
messages: [
{"role": "user",
"content": [{"type": "image_url",
// test.png is a PNG file, so the data URL declares image/png
"image_url": {"url": `data:image/png;base64,${base64Image}`},},
{"type": "text", "text": "這是什么"}]}]
});
console.log(completion.choices[0].message.content);
}
main();
Result
這是一只飛翔的鷹。鷹是一種猛禽,通常具有強壯的翅膀和銳利的爪子,能夠在高空翱翔并捕獵獵物。圖片中的鷹展翅高飛,背景是藍天白云,顯得非常壯觀。
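The OpenAI-compatible examples above embed the local image as a Base64 data URL. The sketch below shows one way to keep the declared MIME type in step with the file extension by using Python's standard `mimetypes` module. The helper name `image_to_data_url` is hypothetical, and the demo file contains only the 8-byte PNG signature rather than a real image.

```python
import base64
import mimetypes
import os
import tempfile


def image_to_data_url(image_path):
    """Return a Base64 data URL whose MIME type matches the file extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {image_path!r}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Demo with a throwaway file; the bytes are only the PNG signature, not a full image.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.png")
    with open(demo_path, "wb") as f:
        f.write(b"\x89PNG\r\n\x1a\n")
    demo_url = image_to_data_url(demo_path)
    print(demo_url.split(",")[0])  # data:image/png;base64
```

The returned string can be placed in the `url` field of an `image_url` content item in the same way as the hand-built f-strings in the examples.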
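DashScope
Use the table below to construct the file path for your SDK and operating system.
System | SDK | File path to pass in | Example |
Linux or macOS | Python SDK | file://{absolute path of the file} | file:///home/images/test.png |
Linux or macOS | Java SDK | file://{absolute path of the file} | file:///home/images/test.png |
Windows | Python SDK | file://{absolute path of the file} | file://D:/images/test.png |
Windows | Java SDK | file:///{absolute path of the file} | file:///D:/images/test.png |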
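The prefix rules in the table above can be expressed as a small helper. This is only an illustrative sketch: the function name `build_file_url` and its parameters are hypothetical, and converting backslashes to forward slashes is an assumption made so that Windows paths match the form of the table's examples.

```python
def build_file_url(absolute_path, sdk="python", windows=False):
    """Build the file:// string from the path table for a given SDK and system."""
    if sdk not in ("python", "java"):
        raise ValueError(f"unknown SDK: {sdk!r}")
    # Assumption: normalize Windows separators to match the examples in the table.
    path = absolute_path.replace("\\", "/")
    # Only the Java SDK on Windows uses three slashes before the drive letter.
    prefix = "file:///" if (windows and sdk == "java") else "file://"
    return prefix + path


print(build_file_url("/home/images/test.png"))                            # file:///home/images/test.png
print(build_file_url("D:/images/test.png", windows=True))                 # file://D:/images/test.png
print(build_file_url("D:\\images\\test.png", sdk="java", windows=True))  # file:///D:/images/test.png
```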
Python
Sample code
import os
from dashscope import MultiModalConversation
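# Per the path table above, local_path should be the absolute path of the file,
# for example /home/images/test.png on Linux or macOS.
local_path = "test.png"
image_path = f"file://{local_path}"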
messages = [{'role': 'system',
'content': [{'text': 'You are a helpful assistant.'}]},
{'role':'user',
'content': [{'image': image_path},
{'text': '這是什么'}]}]
response = MultiModalConversation.call(
# If the environment variable is not configured, replace the next line with your Bailian API key: api_key="sk-xxx",
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen-vl-max-latest',
messages=messages)
print(response["output"]["choices"][0]["message"].content[0]["text"])
Result
這是一只飛翔的鷹。鷹是一種猛禽,通常具有強壯的翅膀和銳利的爪子,能夠在高空翱翔并捕獵獵物。圖片中的鷹展翅高飛,背景是藍天白云,顯得非常壯觀。
Java
Sample code
import java.util.Arrays;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
public class Main {
public static void callWithLocalFile(String localPath)
throws ApiException, NoApiKeyException, UploadFileException {
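// Per the path table above, localPath should be the absolute path of the file;
// on Windows the Java SDK expects the prefix file:/// instead of file://.
String filePath = "file://"+localPath;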
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(new HashMap<String, Object>(){{put("image", filePath);}},
new HashMap<String, Object>(){{put("text", "這是什么?");}})).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// If the environment variable is not configured, replace the next line with your Bailian API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen-vl-max-latest")
.message(userMessage)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));}
public static void main(String[] args) {
try {
callWithLocalFile("test.png");
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
Result
這是一只飛翔的鷹。鷹是一種猛禽,通常具有強壯的翅膀和銳利的爪子,擅長在高空翱翔和捕獵。圖片中的鷹展翅高飛,背景是藍天白云,顯得非常壯觀。
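Video understanding
The qwen-vl-max-latest, qwen-vl-max-0809, qwen-vl-plus-latest, and qwen-vl-plus-0809 models support video understanding. You pass the video in as a list of images.
Pass in at least 4 and at most 768 images. For now, images can only be passed in as URLs.
To input a video file directly, submit a ticket to apply for access and obtain usage instructions.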
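The limits above (4 to 768 images, URLs only) can be checked before a request is sent. The sketch below is one possible pre-flight check; the helper name `build_video_message` is hypothetical, treating "URL" as an http:// or https:// address is an assumption, and the message shape follows the OpenAI-compatible samples in this section.

```python
MIN_FRAMES = 4
MAX_FRAMES = 768


def build_video_message(frame_urls, prompt):
    """Validate an image list and wrap it in an OpenAI-compatible user message."""
    if not MIN_FRAMES <= len(frame_urls) <= MAX_FRAMES:
        raise ValueError(
            f"expected {MIN_FRAMES} to {MAX_FRAMES} images, got {len(frame_urls)}"
        )
    for url in frame_urls:
        # Assumption: only http(s) URLs count as URLs for this purpose.
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"not an http(s) URL: {url!r}")
    return {
        "role": "user",
        "content": [
            {"type": "video", "video": list(frame_urls)},
            {"type": "text", "text": prompt},
        ],
    }


# Placeholder URLs for illustration only.
frames = [f"https://example.com/frame_{i}.jpg" for i in range(4)]
message = build_video_message(frames, "描述這個視頻的具體過程")
print(len(message["content"][0]["video"]))  # 4
```

The returned dictionary can be used as the single element of the `messages` list in a `chat.completions.create` call.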
OpenAI compatible
You can use video understanding through the OpenAI SDK or over HTTP.
Python
Sample code
import os
from openai import OpenAI
client = OpenAI(
# If the environment variable is not configured, replace the next line with your Bailian API key: api_key="sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen-vl-max-latest",
messages=[{"role": "user","content": [
{"type": "video","video": [
"https://img.alicdn.com/imgextra/i3/O1CN01K3SgGo1eqmlUgeE9b_!!6000000003923-0-tps-3840-2160.jpg",
"https://img.alicdn.com/imgextra/i4/O1CN01BjZvwg1Y23CF5qIRB_!!6000000003000-0-tps-3840-2160.jpg",
"https://img.alicdn.com/imgextra/i4/O1CN01Ib0clU27vTgBdbVLQ_!!6000000007859-0-tps-3840-2160.jpg",
"https://img.alicdn.com/imgextra/i1/O1CN01aygPLW1s3EXCdSN4X_!!6000000005710-0-tps-3840-2160.jpg"]},
{"type": "text","text": "描述這個視頻的具體過程"},
]}]
)
print(completion.choices[0].message.content)
Result
這個視頻展示了一場足球比賽的瞬間。具體過程如下:
1. **背景**:視頻是在一個大型體育場拍攝的,觀眾席上坐滿了觀眾,燈光明亮,氣氛熱烈。
2. **球員**:場上有兩隊球員,一隊穿著紅色球衣,另一隊穿著藍色球衣。守門員穿著綠色球衣。
3. **動作**:一名穿著紅色球衣的球員在禁區內準備射門。守門員試圖撲救,但未能成功。
4. **進球**:紅色球衣的球員成功將球踢入球門,球網被球擊中,顯示出進球的瞬間。
整個過程充滿了緊張和激動,展示了足球比賽中的精彩瞬間。
curl
Sample code
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen-vl-max-latest",
"messages": [{"role": "user",
"content": [{"type": "video",
"video": ["https://img.alicdn.com/imgextra/i3/O1CN01K3SgGo1eqmlUgeE9b_!!6000000003923-0-tps-3840-2160.jpg",
"https://img.alicdn.com/imgextra/i4/O1CN01BjZvwg1Y23CF5qIRB_!!6000000003000-0-tps-3840-2160.jpg",
"https://img.alicdn.com/imgextra/i4/O1CN01Ib0clU27vTgBdbVLQ_!!6000000007859-0-tps-3840-2160.jpg",
"https://img.alicdn.com/imgextra/i1/O1CN01aygPLW1s3EXCdSN4X_!!6000000005710-0-tps-3840-2160.jpg"
]},
{"type": "text",
"text": "描述這個視頻的具體過程"}]}]
}'
Result
{
"choices": [
{
"message": {
"content": "這個視頻展示了一場足球比賽的瞬間。具體過程如下:\n\n1. **背景**:視頻是在一個大型體育場內拍攝的,觀眾席上坐滿了觀眾,燈光明亮,氣氛熱烈。\n2. **球員**:場上有兩支隊伍,一支穿著紅色球衣,另一支穿著藍色球衣。守門員穿著綠色球衣。\n3. **動作**:一名身穿紅色球衣的球員在禁區內接到傳球,準備射門。守門員迅速反應,向球的方向撲去,試圖阻止進球。\n4. **射門**:紅色球衣的球員果斷射門,球飛向球門。\n5. **撲救**:守門員盡力撲救,但球還是飛進了球門,球網被球撞得晃動。\n\n整個過程充滿了緊張和刺激,展示了足球比賽中的精彩瞬間。",
"role": "assistant"
},
"finish_reason": "stop",
"index": 0,
"logprobs": null
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 1466,
"completion_tokens": 181,
"total_tokens": 1647
},
"created": 1728710375,
"system_fingerprint": null,
"model": "qwen-vl-max-latest",
"id": "chatcmpl-73b2b130-b29a-99db-9eda-4cd45f27d4e0"
}
Node.js
Sample code
// Make sure "type": "module" is set in package.json
import OpenAI from "openai";
const openai = new OpenAI({
// If the environment variable is not configured, replace the next line with your Bailian API key: apiKey: "sk-xxx",
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
});
async function main() {
const response = await openai.chat.completions.create({
model: "qwen-vl-max-latest",
messages: [{
role: "user",
content: [
{
type: "video",
video: [
"https://img.alicdn.com/imgextra/i3/O1CN01K3SgGo1eqmlUgeE9b_!!6000000003923-0-tps-3840-2160.jpg",
"https://img.alicdn.com/imgextra/i4/O1CN01BjZvwg1Y23CF5qIRB_!!6000000003000-0-tps-3840-2160.jpg",
"https://img.alicdn.com/imgextra/i4/O1CN01Ib0clU27vTgBdbVLQ_!!6000000007859-0-tps-3840-2160.jpg",
"https://img.alicdn.com/imgextra/i1/O1CN01aygPLW1s3EXCdSN4X_!!6000000005710-0-tps-3840-2160.jpg"
]
},
{
type: "text",
text: "描述這個視頻的具體過程"
}
]
}]
});
console.log(response.choices[0].message.content);
}
main();
Result
這個視頻展示了一場足球比賽的瞬間。具體過程如下:
1. **背景**:視頻是在一個大型體育場拍攝的,觀眾席上坐滿了觀眾,燈光明亮,氣氛熱烈。
2. **球員**:場上有兩隊球員,一隊穿著紅色球衣,另一隊穿著藍色球衣。守門員穿著綠色球衣。
3. **動作**:一名穿著紅色球衣的球員在禁區內準備射門。他將球踢向球門。
4. **守門員**:守門員看到球飛來,迅速做出反應,向球的方向撲去,試圖將球撲出。
5. **進球**:盡管守門員盡力撲救,但球還是飛進了球門,網子被球撞得晃動。
這個視頻捕捉到了足球比賽中進球的精彩瞬間,展示了球員的技巧和守門員的反應。
DashScope
You can use video understanding through the DashScope SDK or over HTTP.
Python
Sample code
import os
# Requires dashscope version 1.20.10 or later
import dashscope
messages = [{"role": "user",
"content": [
{"video":["https://img.alicdn.com/imgextra/i3/O1CN01K3SgGo1eqmlUgeE9b_!!6000000003923-0-tps-3840-2160.jpg",
"https://img.alicdn.com/imgextra/i4/O1CN01BjZvwg1Y23CF5qIRB_!!6000000003000-0-tps-3840-2160.jpg",
"https://img.alicdn.com/imgextra/i4/O1CN01Ib0clU27vTgBdbVLQ_!!6000000007859-0-tps-3840-2160.jpg",
"https://img.alicdn.com/imgextra/i1/O1CN01aygPLW1s3EXCdSN4X_!!6000000005710-0-tps-3840-2160.jpg"]},
{"text": "描述這個視頻的具體過程"}]}]
response = dashscope.MultiModalConversation.call(
# If the environment variable is not configured, replace the next line with your Bailian API key: api_key="sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
model='qwen-vl-max-latest',
messages=messages
)
print(response["output"]["choices"][0]["message"].content[0]["text"])
Result
這個視頻展示了一場足球比賽的瞬間。具體過程如下:
1. **背景**:視頻是在一個大型體育場內拍攝的,觀眾席上坐滿了觀眾,燈光明亮,氣氛熱烈。
2. **球員**:場上有兩支隊伍,一支穿著紅色球衣,另一支穿著藍色球衣。守門員穿著綠色球衣。
3. **動作**:一名身穿紅色球衣的球員在禁區內準備射門。他將球踢向球門。
4. **守門員**:守門員看到球飛來,迅速做出反應,向球的方向撲去,試圖將球撲出。
5. **進球**:盡管守門員盡力撲救,但球還是飛進了球門,網子被球撞得晃動。
這個視頻捕捉到了足球比賽中進球的精彩瞬間,展示了球員的技巧和守門員的反應。
curl
Sample code
curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen-vl-max-latest",
"input": {
"messages": [
{
"role": "user",
"content": [
{
"video": [
"https://img.alicdn.com/imgextra/i3/O1CN01K3SgGo1eqmlUgeE9b_!!6000000003923-0-tps-3840-2160.jpg",
"https://img.alicdn.com/imgextra/i4/O1CN01BjZvwg1Y23CF5qIRB_!!6000000003000-0-tps-3840-2160.jpg",
"https://img.alicdn.com/imgextra/i4/O1CN01Ib0clU27vTgBdbVLQ_!!6000000007859-0-tps-3840-2160.jpg",
"https://img.alicdn.com/imgextra/i1/O1CN01aygPLW1s3EXCdSN4X_!!6000000005710-0-tps-3840-2160.jpg"
]
},
{
"text": "描述這個視頻的具體過程"
}
]
}
]
}
}'
Result
{
"output": {
"choices": [
{
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": [
{
"text": "這個視頻展示了一場足球比賽的瞬間。具體過程如下:\n\n1. **背景**:視頻是在一個大型體育場拍攝的,觀眾席上坐滿了觀眾,燈光明亮,氣氛熱烈。\n2. **球員**:場上有兩隊球員,一隊穿著紅色球衣,另一隊穿著藍色球衣。守門員穿著綠色球衣。\n3. **動作**:一名穿著紅色球衣的球員在禁區內接到了隊友的傳球,準備射門。\n4. **射門**:紅色球員用右腳大力射門,球飛向球門。\n5. **撲救**:守門員迅速反應,向球的方向撲去,試圖將球撲出。\n6. **進球**:盡管守門員盡力撲救,但球還是飛進了球門,守門員未能阻止進球。\n\n整個過程充滿了緊張和激動,展示了足球比賽中的精彩瞬間。"
}
]
}
}
]
},
"usage": {
"output_tokens": 191,
"video_tokens": 1440,
"input_tokens": 1466
},
"request_id": "c728d1e0-79ad-9076-8589-7f072e96bccf"
}
Java
Sample code
// Requires DashScope SDK version 2.16.7 or later
import java.util.Arrays;
import java.util.Collections;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
public class Main {
private static final String MODEL_NAME = "qwen-vl-max-latest";
public static void videoImageListSample() throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage systemMessage = MultiModalMessage.builder()
.role(Role.SYSTEM.getValue())
.content(Arrays.asList(Collections.singletonMap("text", "You are a helpful assistant.")))
.build();
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(Collections.singletonMap("video", Arrays.asList("https://img.alicdn.com/imgextra/i3/O1CN01K3SgGo1eqmlUgeE9b_!!6000000003923-0-tps-3840-2160.jpg",
"https://img.alicdn.com/imgextra/i4/O1CN01BjZvwg1Y23CF5qIRB_!!6000000003000-0-tps-3840-2160.jpg",
"https://img.alicdn.com/imgextra/i4/O1CN01Ib0clU27vTgBdbVLQ_!!6000000007859-0-tps-3840-2160.jpg",
"https://img.alicdn.com/imgextra/i1/O1CN01aygPLW1s3EXCdSN4X_!!6000000005710-0-tps-3840-2160.jpg")),
Collections.singletonMap("text", "描述這個視頻的具體過程")))
.build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
.model(MODEL_NAME).message(systemMessage)
.message(userMessage).build();
MultiModalConversationResult result = conv.call(param);
System.out.print(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
videoImageListSample();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
Result
這個視頻展示了一場足球比賽的瞬間。具體過程如下:
1. **背景**:視頻是在一個大型體育場內拍攝的,觀眾席上坐滿了觀眾,燈光明亮,氣氛熱烈。
2. **球員**:場上有兩隊球員,一隊穿著紅色球衣,另一隊穿著藍色球衣。守門員穿著綠色球衣。
3. **動作**:一名身穿紅色球衣的球員在禁區內準備射門。他將球踢向球門。
4. **守門員**:守門員看到球飛來,迅速做出反應,向球的方向撲去,試圖將球撲出。
5. **進球**:盡管守門員盡力撲救,但球還是飛進了球門,網子被球撞得晃動。
這個視頻捕捉到了足球比賽中進球的精彩瞬間,展示了球員的技巧和守門員的反應。
Supported images
Image format | Content Type | File extension |
BMP | image/bmp | .bmp |
DIB | image/bmp | .dib |
ICNS | image/icns | .icns |
ICO | image/x-icon | .ico |
JPEG | image/jpeg | .jfif, .jpe, .jpeg, .jpg |
JPEG2000 | image/jp2 | .j2c, .j2k, .jp2, .jpc, .jpf, .jpx |
PNG | image/png | .apng, .png |
SGI | image/sgi | .bw, .rgb, .rgba, .sgi |
TIFF | image/tiff | .tif, .tiff |
WEBP | image/webp | .webp |
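Input images are subject to the following limits:
The image file size must not exceed 10MB.
For the qwen-vl-max, qwen-vl-max-latest, qwen-vl-max-0809, qwen-vl-plus-latest, and qwen-vl-plus-0809 models, a single input image may contain at most 12M pixels in total, which is enough for a standard 4K image. For the qwen-vl-max-0201 and qwen-vl-plus models, a single input image may contain at most 1048576 pixels in total, equivalent to an image whose width and height are both 1024.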
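The limits above can be checked locally before uploading. The following sketch takes the image dimensions and file size as plain numbers, so it does not depend on any imaging library. The function name `check_image` is hypothetical, and two readings are assumptions: "10MB" is taken as 10 * 1024 * 1024 bytes and "12M" as 12,000,000 pixels.

```python
MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: 10MB read as 10 * 1024 * 1024 bytes

# Per-image pixel limits by model, as listed in the text above.
MAX_PIXELS = {
    "qwen-vl-max": 12_000_000,  # assumption: "12M" read as 12,000,000 pixels
    "qwen-vl-max-latest": 12_000_000,
    "qwen-vl-max-0809": 12_000_000,
    "qwen-vl-plus-latest": 12_000_000,
    "qwen-vl-plus-0809": 12_000_000,
    "qwen-vl-max-0201": 1_048_576,
    "qwen-vl-plus": 1_048_576,
}


def check_image(model, width, height, file_size_bytes):
    """Return a list of limit violations; an empty list means the image fits."""
    if model not in MAX_PIXELS:
        raise ValueError(f"no pixel limit known for model {model!r}")
    problems = []
    if file_size_bytes > MAX_FILE_BYTES:
        problems.append("file is larger than 10MB")
    if width * height > MAX_PIXELS[model]:
        problems.append(
            f"{width * height} pixels exceeds the limit of {MAX_PIXELS[model]}"
        )
    return problems


print(check_image("qwen-vl-max-latest", 3840, 2160, 5_000_000))  # []
print(check_image("qwen-vl-plus", 1025, 1024, 100_000))
```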
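Model list, billing, and free quota
Commercial models
Qwen-VL models are billed by the total number of input and output tokens.
Image-to-token conversion rule: a 512x512-pixel image is approximately 334 tokens, and images of other resolutions are converted proportionally. The minimum unit is 28x28 pixels, that is, every 28x28 pixels corresponds to one token. If the width or height of an image is not a multiple of 28, it is rounded up to the next multiple of 28. An image counts as at least 4 tokens.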
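The conversion rule above can be sketched as a short calculation. This is a literal reading of the rule (round each side up to a multiple of 28, count one token per 28x28 block, and apply the floor of 4 tokens); the function name `estimate_image_tokens` is hypothetical, and the sketch ignores any resizing the service may apply as well as the per-image maximum. Under this reading a 512x512 image gives 361 tokens, slightly above the approximate figure of 334, which corresponds to the plain area ratio 512 × 512 / (28 × 28) ≈ 334.4.

```python
import math

PATCH = 28       # side length of the minimum unit, in pixels
MIN_TOKENS = 4   # an image counts as at least 4 tokens


def estimate_image_tokens(width, height):
    """Estimate image tokens by rounding each side up to a multiple of 28."""
    cols = math.ceil(width / PATCH)
    rows = math.ceil(height / PATCH)
    return max(MIN_TOKENS, cols * rows)


print(estimate_image_tokens(512, 512))    # 361
print(estimate_image_tokens(28, 28))      # 4
print(estimate_image_tokens(1024, 1024))  # 1369
```

Because of these simplifications, the `usage` field of an actual response remains the authoritative count for billing.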
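Model name | Version | Context length (tokens) | Max input (tokens) | Max output (tokens) | Input and output unit price (per 1,000 tokens) | Free quota |
qwen-vl-max (compared with qwen-vl-plus, further improves visual reasoning and instruction following, and provides the best performance on more complex tasks) | Stable | 32,000 | 30,000 (max 16384 per image) | 2,000 | 0.02 yuan | 1 million tokens. Validity: 30 days after activating Bailian. For users who activate Bailian after 00:00 on September 19, 2024, the free quota is valid for 180 days. |
qwen-vl-max-latest | Latest | 32,000 | 30,000 (max 16384 per image) | 2,000 | 0.02 yuan | Same as qwen-vl-max |
qwen-vl-max-2024-08-09, also called qwen-vl-max-0809 (this version extends the context to 32k and improves image understanding, with better recognition of multilingual text and handwriting in images) | Snapshot | 32,000 | 30,000 (max 16384 per image) | 2,000 | 0.02 yuan | Same as qwen-vl-max |
qwen-vl-max-2024-02-01, also called qwen-vl-max-0201 | Snapshot | 8,000 | 6,000 (max 1280 per image) | 2,000 | 0.02 yuan | Same as qwen-vl-max |
qwen-vl-plus-latest | Latest | 32,000 | 30,000 (max 16384 per image) | 2,000 | 0.008 yuan | Same as qwen-vl-max |
qwen-vl-plus-2024-08-09, also called qwen-vl-plus-0809 | Snapshot | 32,000 | 30,000 (max 16384 per image) | 2,000 | 0.008 yuan | Same as qwen-vl-max |
qwen-vl-plus (greatly improves detail recognition and text recognition, supports images of more than one million pixels and of any aspect ratio, and delivers excellent performance across a wide range of vision tasks) | Stable | 8,000 | 6,000 (max 1280 per image) | 2,000 | 0.008 yuan | Same as qwen-vl-max |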
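Given the unit prices in the table above, the charge for a single call follows from the token counts in the response's `usage` field. The sketch below assumes the price applies uniformly to the sum of input and output tokens and ignores the free quota. The function name `estimate_cost` and the grouping of models into two price families are illustrative readings of the table, not part of the API.

```python
from decimal import Decimal

# Price in yuan per 1,000 tokens, grouped by model family (a reading of the table above).
PRICE_PER_1K = {
    "qwen-vl-max": Decimal("0.02"),
    "qwen-vl-plus": Decimal("0.008"),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated charge in yuan for one call, ignoring any free quota."""
    for family, price in PRICE_PER_1K.items():
        if model.startswith(family):
            return price * (input_tokens + output_tokens) / 1000
    raise ValueError(f"no price known for model {model!r}")


# Token counts taken from the curl video sample: 1466 prompt + 181 completion tokens.
print(estimate_cost("qwen-vl-max-latest", 1466, 181))  # 0.03294
```

`Decimal` is used instead of floating point so that small per-call amounts add up without rounding drift.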
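Open-source models
Model name | Context length (tokens) | Max input (tokens) | Max output (tokens) | Input and output cost (per 1,000 tokens) | Free quota |
qwen-vl-v1 | 8,000 | 6,000 | 1,500 | Currently available for free trial only. The model cannot be called after the free quota is used up; watch for future updates. | 100,000 tokens. Validity: 180 days after activating Bailian |
qwen-vl-chat-v1 | 8,000 | 6,000 | 1,500 | Same as qwen-vl-v1 | Same as qwen-vl-v1 |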
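FAQ
Can I delete an uploaded image?
A: No manual deletion is needed. After the model finishes generating text, the Bailian server deletes the image automatically.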
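Does Qwen-VL support understanding video content?
A: Yes. The models listed in the video understanding section accept a video as a list of images. To input a video file directly, submit a ticket to apply for access and obtain usage instructions.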
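Application examples
Input example | Traditional OCR tool output example | Vision-language model output example |
(image not shown) | The content extracted by the traditional OCR tool is incomplete and poorly formatted. | The large model not only extracts the text in the image accurately but also outputs it in a well-organized form. |
Input example | Output example |
What is in the image? | (image not shown) |
What is the image trying to express? | (image not shown) |
Input example | Output example |
Solve this problem | (image not shown) |
Translate the text in the image into Chinese | (image not shown) |
Input example | Output example |
Prompt: Convert this into an interface based on HTML, CSS, and JavaScript. | To convert this interface into HTML, CSS, and JavaScript, you can follow these steps: (code not shown) This HTML, CSS, and JavaScript code creates an interface similar to the one shown in the image. You can adjust the styling and functionality as needed. The front-end interface generated from the code provided by the large model: (image not shown) |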
API reference
For the input and output parameters of the Qwen-VL models, see the Qwen (通義千問) API reference.