Vectorize text data with Alibaba Cloud DashScope

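Alibaba Cloud DashScope provides model services such as model inference and fine-tuning through standardized APIs. This topic shows how to call the general-purpose text embedding model in DashScope to vectorize business data, and then retrieve that data with kNN search in Alibaba Cloud Elasticsearch (ES).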

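Prerequisites

An Alibaba Cloud ES instance is created. This topic uses version 8.9 as an example. For details, see Create an Alibaba Cloud Elasticsearch instance.
An ECS instance is created. For details, see Custom instance purchase.
DashScope is activated and an API-KEY is obtained. For details, see Obtain and configure an API-KEY.
The DashVector vector retrieval service is activated and an API-KEY is obtained. For details, see API-KEY management.
The latest SDK is installed. See Install the DashScope SDK.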

Procedure

On the ECS instance, run the following command to set the DashScope API-KEY.
```
export DASHSCOPE_API_KEY=YOUR_DASHSCOPE_API_KEY
```
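The import script later in this topic never passes the key explicitly, so it relies on the SDK picking up `DASHSCOPE_API_KEY` from the environment. A script can verify that the variable is present before making any embedding calls. The following is a minimal sketch; the helper name `require_api_key` is hypothetical and is not part of the DashScope SDK.

```python
import os


def require_api_key(env=os.environ, name="DASHSCOPE_API_KEY"):
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper for illustration; not a DashScope SDK function.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; run 'export {name}=...' first")
    return key
```

Calling `require_api_key()` at the top of a script turns a missing key into an immediate, readable error instead of a failed embedding request later.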
Download the test data source by clicking extracted_data.json.

In Kibana for the ES instance, run the following command to create the index.

```
PUT lingji_test
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "properties": {
      "context_vector": {
        "type": "dense_vector",
        "dims": 1536,
        "index": true,
        "similarity": "l2_norm"
      },
      "context": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
```

The following table describes some of the parameters of the context_vector field. For more details, see dense-vector.

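Parameter	Description
type	The field type, which stores dense vectors of floating-point values. Set it to dense_vector.
dims	The number of vector dimensions. It must equal the length of the vectors that the embedding model returns. The maximum depends on the ES version: the limits documented for earlier 8.x releases are 1024 when index is true and 2048 when index is false.
index	Whether to build a kNN index for the field. Set it to true to run approximate kNN queries. Default: false.
similarity	The similarity metric used to compare vectors. Required when index is true. Valid values: l2_norm, dot_product, and cosine, described below.

Valid values of similarity:

l2_norm: the Euclidean distance between vectors. _score formula: `1 / (1 + l2_norm(query, vector)^2)`.
dot_product: the dot product of two vectors. The _score calculation depends on the element_type parameter. When element_type is float, all vectors must be normalized to unit length, and the _score formula is `(1 + dot_product(query, vector)) / 2`. When element_type is byte, all vectors must have the same length; otherwise the results are inaccurate. The _score formula is `0.5 + (dot_product(query, vector) / (32768 * dims))`.
cosine: the cosine similarity between vectors. The most efficient way to perform cosine similarity is to normalize all vectors to unit length and use dot_product instead. _score formula: `(1 + cosine(query, vector)) / 2`.

Important: the cosine similarity metric does not allow vectors with zero magnitude.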
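To make the _score formulas above concrete, the following sketch reproduces them in plain Python for float vectors. It only illustrates the arithmetic and is not how ES computes scores internally; the function names are hypothetical.

```python
import math


def normalize(vector):
    """Scale a vector to unit length, as dot_product requires for float vectors."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vector]


def l2_norm_score(query, vector):
    """_score = 1 / (1 + l2_norm(query, vector)^2)."""
    squared_distance = sum((q - v) ** 2 for q, v in zip(query, vector))
    return 1.0 / (1.0 + squared_distance)


def dot_product_score(query, vector):
    """_score = (1 + dot_product(query, vector)) / 2; inputs must be unit length."""
    dot = sum(q * v for q, v in zip(query, vector))
    return (1.0 + dot) / 2.0


def cosine_score(query, vector):
    """_score = (1 + cosine(query, vector)) / 2; zero vectors are rejected."""
    return dot_product_score(normalize(query), normalize(vector))
```

With these definitions, `cosine_score` is literally `dot_product_score` applied to normalized inputs, which is why normalizing once at index time and using dot_product avoids repeating the normalization for every comparison.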

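Import the data.

In the Python 3 environment on the ECS instance, call the DashScope SDK to use the general-purpose text embedding model (a multilingual unified text embedding model built by Tongyi Lab on an LLM base). A sample test script follows. For more information, see General-purpose text embedding.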

```
from http import HTTPStatus
import json

import dashscope
from elasticsearch import Elasticsearch

HOST = 'http://es-cn-g4t3l1ke60002****.public.elasticsearch.aliyuncs.com:9200'
USERNAME = 'elastic'
PASSWORD = '******'
INDEX = "lingji_test"
FILE_PATH = '/root/extracted_data.json'

# Connect to Alibaba Cloud Elasticsearch.
es = Elasticsearch(HOST, basic_auth=(USERNAME, PASSWORD))


# Convert text to a vector with the DashScope text_embedding_v1 model.
def embed_with_str(text):
    resp = dashscope.TextEmbedding.call(
        model=dashscope.TextEmbedding.Models.text_embedding_v1,
        input=text)
    # Stop early if the embedding request failed (for example, an invalid API-KEY).
    if resp.status_code != HTTPStatus.OK:
        raise RuntimeError(f"Embedding request failed: {resp.code} {resp.message}")
    return resp["output"]["embeddings"][0]["embedding"]


# Load the source data from the local file.
with open(FILE_PATH, 'r', encoding='utf-8') as file:
    data = json.load(file)

# Upload the documents to Elasticsearch.
for doc in data:
    # The source field to vectorize; adjust it to match your data.
    doc["context_vector"] = embed_with_str(doc["context"])

    response = es.index(index=INDEX, document=doc)
    print(response)
```

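Parameters:

Parameter	Description
HOST	The domain name and port of the Alibaba Cloud ES instance. Example: http://es-cn-xxxxxx.public.elasticsearch.aliyuncs.com:9200.
USERNAME	The username of the Alibaba Cloud ES instance.
PASSWORD	The password of the Alibaba Cloud ES instance.
INDEX	The name of the index that you created.
FILE_PATH	The path of the source data on the ECS instance.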
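The script sends one index request per document. For larger data sets, one option is to prepare the documents as bulk actions instead. The following is a sketch under assumptions: `to_bulk_actions` is a hypothetical helper, and `embed` stands for any function that maps text to a vector, such as `embed_with_str`.

```python
def to_bulk_actions(docs, index, embed,
                    text_field="context", vector_field="context_vector"):
    """Yield one bulk action per document, adding the embedding of text_field.

    Hypothetical helper for illustration; `embed` is any callable that
    takes a string and returns a list of floats.
    """
    for doc in docs:
        source = dict(doc)  # copy so the input records are not mutated
        source[vector_field] = embed(source[text_field])
        yield {"_index": index, "_source": source}


# Possible usage with the names from the script (not executed here):
#   from elasticsearch import helpers
#   helpers.bulk(es, to_bulk_actions(data, INDEX, embed_with_str))
```

This only reduces the number of requests sent to ES; the embedding model is still called once per document.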

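In Kibana for the ES instance, run the following command to query the data in the index. The query_vector array is left empty here as a placeholder; fill it with the embedding of your query text, generated with the same model that was used to import the data.
```
GET lingji_test/_search
{
  "_source": ["context", "title"],
  "knn": {
    "field": "context_vector",
    "query_vector": [],
    "k": 5,
    "num_candidates": 10
  }
}
```
The query results show that the text data has been vectorized.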
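The same kNN request can be assembled in Python once a query vector is available. The following sketch builds the request and checks two things that commonly cause errors: a vector whose length differs from the dims value of the mapping, and a num_candidates value smaller than k. The function `build_knn_search` is hypothetical, and `dims` must be the value used in your index mapping.

```python
def build_knn_search(query_vector, dims, field="context_vector",
                     k=5, num_candidates=10):
    """Return a kNN search request shaped like the Kibana example.

    Hypothetical helper for illustration; `dims` is the dims value
    from the index mapping.
    """
    if len(query_vector) != dims:
        raise ValueError(
            f"query vector has {len(query_vector)} dimensions, expected {dims}")
    if num_candidates < k:
        raise ValueError("num_candidates must not be less than k")
    return {
        "_source": ["context", "title"],
        "knn": {
            "field": field,
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": num_candidates,
        },
    }


# Possible usage with the names from the import script (not executed here):
#   vector = embed_with_str("your query text")
#   request = build_knn_search(vector, dims=len(vector))
#   result = es.search(index=INDEX, knn=request["knn"], source=request["_source"])
```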
