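To keep services continuously available and secure, and to avoid the security risks of certificate leakage or key compromise, we recommend that you rotate the etcd certificates of the master nodes in an ACK dedicated cluster promptly when the system reminds you. This topic describes how to rotate the etcd certificates of master nodes in an ACK dedicated cluster.
Background information
ACK dedicated clusters can be migrated to ACK Pro clusters. In an ACK Pro cluster, the etcd and Kubernetes control plane certificates are managed by Alibaba Cloud, so after an ACK dedicated cluster is migrated you no longer need to perform the rotation described below. For the migration procedure, see Hot migrate ACK dedicated clusters to ACK Pro clusters.
Usage notes
Container Service for Kubernetes (ACK) sends internal messages and text messages two months before the etcd certificates expire, and displays Update ETCD Certificate on the cluster list page.
During rotation, the system restarts the control plane components on the master nodes, including apiserver, etcd, kcm, and kubelet, one node at a time. Long-lived connections to the API server are dropped during this period, so perform the rotation during off-peak hours. The rotation is expected to finish within 30 minutes.
If you have changed the default configuration directories of etcd or Kubernetes in the dedicated cluster, create symbolic links from the original directories to the new ones before you rotate the certificates. Otherwise, the rotation fails.
If you rotate the certificates manually, the ACK console still displays the Update ETCD Certificate expiration reminder after the rotation is complete. Submit a ticket to have the reminder removed through backend configuration.
If any problem causes the rotation to fail, submit a ticket.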
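For the note about relocated configuration directories, the symbolic link could be created along the lines of the following sketch. The helper name link_default_dir and the custom path /data/etcd/cert are hypothetical; /var/lib/etcd/cert is the certificate directory that the scripts in this topic use.

```shell
#!/bin/bash
# Sketch: make a default directory resolve to a relocated one via a symlink.
# link_default_dir is a hypothetical helper, not part of the ACK tooling.
link_default_dir() {
  local custom_dir=$1 default_dir=$2
  if [ ! -d "$custom_dir" ]; then
    echo "custom directory $custom_dir does not exist" >&2
    return 1
  fi
  # Leave an existing directory or link untouched.
  if [ -e "$default_dir" ] || [ -L "$default_dir" ]; then
    echo "$default_dir already exists, nothing to do"
    return 0
  fi
  mkdir -p "$(dirname "$default_dir")"
  ln -s "$custom_dir" "$default_dir"
  echo "linked $default_dir -> $custom_dir"
}

# Example (hypothetical custom path):
# link_default_dir /data/etcd/cert /var/lib/etcd/cert
```

The function refuses to overwrite anything that already exists at the default path, so it is safe to run on nodes where the directory was never moved.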
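Scenario 1: Rotate etcd certificates that have not expired
When the etcd certificates are about to expire and you are prompted to update them, you can rotate them in either of the following two ways.
Rotate etcd certificates automatically in the console
Log on to the ACK console. In the left-side navigation pane, click Clusters.
Find the cluster whose etcd certificates are about to expire, click Update ETCD Certificate to the right of the cluster to open the Update Certificate page, and then click Update Certificate.
Note: Update ETCD Certificate appears to the right of a cluster only when the cluster certificates are due to expire within about two months.
In the confirmation dialog box, click OK.
After the certificates are updated, you can see the following:
The Update Certificate page shows that the update succeeded.
On the cluster list page, the Update ETCD Certificate prompt no longer appears to the right of the cluster.
Rotate etcd certificates manually
Use scenarios
The etcd certificates of the dedicated cluster are about to expire.
The etcd certificates cannot be rotated automatically by deploying a template.
The etcd certificates cannot be updated in the console.
In these scenarios, the cluster administrator can log on to any master node and run the following scripts to rotate the etcd certificates manually.
The following scripts must be run as the root user.
Confirm that passwordless SSH login for the root user is configured between the master nodes of the cluster.
From one master node, log on to any other master node over SSH. If you are prompted for a password, configure passwordless login between the master nodes as follows.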
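# 1. Generate a key pair. Skip this step if a login key already exists on the node.
ssh-keygen -t rsa
# 2. Use ssh-copy-id to copy the public key to every other master node. Replace $(internal-ip) with the internal IP address of the other master node.
ssh-copy-id -i ~/.ssh/id_rsa.pub $(internal-ip)
Note: If you do not configure passwordless login, you must enter the root password when the scripts run.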
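To confirm in one pass that no master node still prompts for a password, a check such as the following sketch may help. The function name check_passwordless_ssh, the SSH_BIN override, and the example IP addresses are assumptions made for illustration; BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/bash
# Sketch: report whether root can log in to each master without a password.
# SSH_BIN is a hypothetical override that only exists to make this testable.
check_passwordless_ssh() {
  local ssh_bin=${SSH_BIN:-ssh} failed=0 host
  for host in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if "$ssh_bin" -o BatchMode=yes -o ConnectTimeout=5 "root@${host}" true; then
      echo "ok: ${host}"
    else
      echo "password required or unreachable: ${host}"
      failed=1
    fi
  done
  return "$failed"
}

# Example (hypothetical internal IP addresses):
# check_passwordless_ssh 192.168.0.11 192.168.0.12
```

A non-zero return value means that at least one node would interrupt the rotation scripts with a password prompt.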
Copy the contents of the following two scripts, save them as restart-apiserver.sh and rotate-etcd.sh, and place both files in the same folder.
Note: The rotate-etcd.sh script tries to obtain the region from the metadata service of the node and pulls the rotation image from the nearest registry in that region. You can also pass the --region xxxx parameter when you run the script to specify the region.
restart-apiserver.sh:
#!/bin/bash

declare -x cmd

k8s::wait_apiserver_ready() {
  set -e
  for i in $(seq 600); do
    if kubectl cluster-info &>/dev/null; then
      return 0
    else
      echo "wait apiserver to be ready, retry ${i}th after 1s"
      sleep 1
    fi
  done
  echo "failed to wait apiserver to be ready"
  return 1
}

function check_container_runtime() {
  if command -v dockerd &>/dev/null && ps aux | grep -q "[d]ockerd"; then
    cmd=docker
  elif command -v containerd &>/dev/null && ps aux | grep -q "[c]ontainerd"; then
    cmd=crictl
  else
    echo "Neither Dockerd nor Containerd is installed or running."
    exit 1
  fi
}

function restart_apiserver() {
  # Determine the container runtime.
  if [[ $cmd == "docker" ]]; then
    # Restart the kube-apiserver Pod with docker.
    container_id=$(docker ps | grep kube-apiserver | awk '{print $1}' | head -n 1)
    if [[ -n $container_id ]]; then
      echo "Restarting kube-apiserver pod using Docker: $container_id"
      docker restart "${container_id}"
    else
      echo "kube-apiserver pod not found."
    fi
  elif [[ $cmd == "crictl" ]]; then
    # Restart the kube-apiserver Pod with crictl.
    pod_id=$(crictl pods --label component=kube-apiserver --latest --state=ready | grep -v "POD ID" | head -n 1 | awk '{print $1}')
    if [[ -n $pod_id ]]; then
      echo "Restarting kube-apiserver pod using crictl: $pod_id"
      crictl stopp "${pod_id}"
    else
      echo "kube-apiserver pod not found."
    fi
  else
    echo "Unsupported container runtime: $cmd"
  fi
  k8s::wait_apiserver_ready
}

check_container_runtime
restart_apiserver
echo "API Server restarted"
rotate-etcd.sh:
#!/bin/bash

set -eo pipefail

declare -x TARGET_TEAR
declare -x cmd

dir=/tmp/etcdcert
KUBE_CERT_PATH=/etc/kubernetes/pki
ETCD_CERT_DIR=/var/lib/etcd/cert
ETCD_HOSTS=""
currentDir="$PWD"

# Update the K8s certificates. Replace the default image region cn-hangzhou below according to the region of the cluster.
function get_etcdhosts() {
  name1=$(find "$ETCD_CERT_DIR" -name '*-name-1.pem' -exec basename {} \; | sed 's/-name-1.pem//g')
  name2=$(find "$ETCD_CERT_DIR" -name '*-name-2.pem' -exec basename {} \; | sed 's/-name-2.pem//g')
  name3=$(find "$ETCD_CERT_DIR" -name '*-name-3.pem' -exec basename {} \; | sed 's/-name-3.pem//g')
  echo "hosts: $name1 $name2 $name3"
  ETCD_HOSTS="$name1 $name2 $name3"
}

function gencerts() {
  echo "generate ssl cert ..."
  rm -rf $dir
  mkdir -p "$dir"

  local hosts
  hosts=$(echo $ETCD_HOSTS | tr -s " " ",")

  echo "-----generate ca"
  echo '{"CN":"CA","key":{"algo":"rsa","size":2048}, "ca": {"expiry": "438000h"}}' | cfssl gencert -initca - | cfssljson -bare $dir/ca -
  echo '{"signing":{"default":{"expiry":"438000h","usages":["signing","key encipherment","server auth","client auth"]}}}' >$dir/ca-config.json

  echo "-----generate etcdserver"
  export ADDRESS=$hosts,ext1.example.com,coreos1.local,coreos1,127.0.0.1
  export NAME=etcd-server
  echo '{"CN":"'$NAME'","hosts":[""],"key":{"algo":"rsa","size":2048}}' | cfssl gencert -config=$dir/ca-config.json -ca=$dir/ca.pem -ca-key=$dir/ca-key.pem -hostname="$ADDRESS" - | cfssljson -bare $dir/$NAME

  export ADDRESS=
  export NAME=etcd-client
  echo '{"CN":"'$NAME'","hosts":[""],"key":{"algo":"rsa","size":2048}}' | cfssl gencert -config=$dir/ca-config.json -ca=$dir/ca.pem -ca-key=$dir/ca-key.pem -hostname="$ADDRESS" - | cfssljson -bare $dir/$NAME

  # gen peer-ca
  echo "-----generate peer certificates"
  echo '{"CN":"Peer-CA","key":{"algo":"rsa","size":2048}, "ca": {"expiry": "438000h"}}' | cfssl gencert -initca - | cfssljson -bare $dir/peer-ca -
  echo '{"signing":{"default":{"expiry":"438000h","usages":["signing","key encipherment","server auth","client auth"]}}}' >$dir/peer-ca-config.json

  i=0
  for host in $ETCD_HOSTS; do
    ((i = i + 1))
    export MEMBER=${host}-name-$i
    echo '{"CN":"'${MEMBER}'","hosts":[""],"key":{"algo":"rsa","size":2048}}' | cfssl gencert -ca=$dir/peer-ca.pem -ca-key=$dir/peer-ca-key.pem -config=$dir/peer-ca-config.json -profile=peer \
      -hostname="$hosts,${MEMBER}.local,${MEMBER}" - | cfssljson -bare $dir/${MEMBER}
  done

  # Build the bundle CA (old and new CA certificates together).
  cat $KUBE_CERT_PATH/etcd/ca.pem >>$dir/bundle_ca.pem
  cat $ETCD_CERT_DIR/ca.pem >>$dir/bundle_ca.pem
  cat $dir/ca.pem >>$dir/bundle_ca.pem

  # Build the bundle peer CA.
  cat $ETCD_CERT_DIR/peer-ca.pem >$dir/bundle_peer-ca.pem
  cat $dir/peer-ca.pem >>$dir/bundle_peer-ca.pem

  current_year=$(date +%Y)
  TARGET_TEAR=$((TARGET_TEAR + 50))

  # chown
  chown -R etcd:etcd $dir
  chmod 0644 $dir/*
}

function etcd_client_urls() {
  local etcd_hosts=()
  # ETCD_HOSTS is a space-separated string, so rely on word splitting here.
  for ip in $ETCD_HOSTS; do
    etcd_hosts+=("https://$ip:2379")
  done
  local result
  # IFS must be set before the expansion for the comma join to take effect.
  result=$(IFS=','; echo "${etcd_hosts[*]}")
  echo "$result"
}

function check_cert_files_exist() {
  REQUIRED_CERTS=("ca.pem" "etcd-server-key.pem" "etcd-server.pem" "peer-ca-key.pem" "peer-ca.pem")
  if [ ! -d "$ETCD_CERT_DIR" ]; then
    echo "Error: Directory $ETCD_CERT_DIR does not exist"
    exit 1
  fi
  for cert_file in "${REQUIRED_CERTS[@]}"; do
    if [ ! -f "$ETCD_CERT_DIR/$cert_file" ]; then
      echo "Error: File $ETCD_CERT_DIR/$cert_file does not exist"
      exit 1
    fi
  done
  echo "All required certificate files exist"
}

function check_etcd_cluster_ready() {
  local etcd_endpoints=()
  for ip in $ETCD_HOSTS; do
    etcd_endpoints+=("https://$ip:2379")
  done
  ready=0
  for i in $(seq 300); do
    for idx in "${!etcd_endpoints[@]}"; do
      endpoint="${etcd_endpoints[$idx]}"
      local health_output=$(ETCDCTL_API=3 etcdctl --cacert=/var/lib/etcd/cert/ca.pem --cert=/var/lib/etcd/cert/etcd-server.pem --key=/var/lib/etcd/cert/etcd-server-key.pem --endpoints "$endpoint" endpoint health --command-timeout=1s 2>&1)
      if echo "$health_output" | grep -q "successfully committed proposal"; then
        unset 'etcd_endpoints[$idx]'
      else
        echo "etcdctl result: ${health_output}"
        echo "$endpoint is not ready"
      fi
    done
    # shellcheck disable=SC2199
    if [[ -z "${etcd_endpoints[@]}" ]]; then
      echo "ETCD cluster is ready"
      ready=1
      break
    fi
    # Wait between retries, as the log message below states.
    sleep 1
    printf "wait etcd cluster to be ready, retry %d after 1s,total 300s \n" "$i"
  done
}

function check_container_runtime() {
  if command -v dockerd &>/dev/null && ps aux | grep -q "[d]ockerd"; then
    cmd=docker
  elif command -v containerd &>/dev/null && ps aux | grep -q "[c]ontainerd"; then
    cmd=crictl
  else
    echo "Neither Dockerd nor Containerd is installed or running."
    exit 1
  fi
}

function rotate_etcd_ca() {
  for ADDR in $ETCD_HOSTS; do
    echo "update etcd CA on node $ADDR"
    scp -o StrictHostKeyChecking=no $dir/bundle_ca.pem root@$ADDR:$ETCD_CERT_DIR/ca.pem
    scp -o StrictHostKeyChecking=no $dir/bundle_ca.pem root@$ADDR:$KUBE_CERT_PATH/etcd/ca.pem
    scp -o StrictHostKeyChecking=no $dir/etcd-client.pem root@$ADDR:$KUBE_CERT_PATH/etcd/etcd-client.pem
    scp -o StrictHostKeyChecking=no $dir/etcd-client-key.pem root@$ADDR:$KUBE_CERT_PATH/etcd/etcd-client-key.pem
    scp -o StrictHostKeyChecking=no $dir/bundle_peer-ca.pem root@$ADDR:$ETCD_CERT_DIR/peer-ca.pem
    ssh -o StrictHostKeyChecking=no root@$ADDR chown -R etcd:etcd $ETCD_CERT_DIR
    ssh -o StrictHostKeyChecking=no root@$ADDR chmod 0644 $ETCD_CERT_DIR/*
    echo "restart etcd on node $ADDR"
    ssh -o StrictHostKeyChecking=no root@$ADDR systemctl restart etcd
    echo "etcd on node $ADDR restarted"
    # Verify that etcd started and that the cluster is healthy.
    echo "check connectivity for etcd nodes"
    check_etcd_cluster_ready
    echo "end to check connectivity for etcd nodes"
    restart_one_apiserver $ADDR
    echo "apiserver on node $ADDR restarted"
  done
}

function rotate_etcd_certs() {
  for ADDR in $ETCD_HOSTS; do
    echo "update etcd peer certs on node $ADDR"
    scp -o StrictHostKeyChecking=no \
      $dir/{peer-ca-key.pem,etcd-server.pem,etcd-server-key.pem,etcd-client.pem,etcd-client-key.pem,ca-key.pem,*-name*.pem} root@$ADDR:$ETCD_CERT_DIR/
    ssh -o StrictHostKeyChecking=no root@$ADDR chown -R etcd:etcd $ETCD_CERT_DIR
    ssh -o StrictHostKeyChecking=no root@$ADDR \
      chmod 0400 $ETCD_CERT_DIR/{peer-ca-key.pem,etcd-server.pem,etcd-server-key.pem,etcd-client.pem,etcd-client-key.pem,ca-key.pem,*-name*.pem}
    echo "restart etcd on node $ADDR"
    ssh -o StrictHostKeyChecking=no root@$ADDR systemctl restart etcd
    echo "etcd on node $ADDR restarted"
    echo "check connectivity for etcd nodes"
    check_etcd_cluster_ready
    echo "end to check connectivity for etcd nodes"
  done
}

function recover_etcd_ca() {
  # Update certs on etcd nodes.
  for ADDR in $ETCD_HOSTS; do
    echo "replace etcd CA on node $ADDR"
    scp -o StrictHostKeyChecking=no $dir/ca.pem root@$ADDR:$ETCD_CERT_DIR/ca.pem
    scp -o StrictHostKeyChecking=no $dir/ca.pem root@$ADDR:$KUBE_CERT_PATH/etcd/ca.pem
    scp -o StrictHostKeyChecking=no $dir/peer-ca.pem root@$ADDR:$ETCD_CERT_DIR/peer-ca.pem
    ssh -o StrictHostKeyChecking=no root@$ADDR chown -R etcd:etcd $ETCD_CERT_DIR
    echo "restart apiserver on node $ADDR"
    restart_one_apiserver $ADDR
    echo "apiserver on node $ADDR restarted"
    echo "restart etcd on node $ADDR"
    ssh -o StrictHostKeyChecking=no root@$ADDR systemctl restart etcd
    echo "etcd on node $ADDR restarted"
    echo "check connectivity for etcd nodes"
    check_etcd_cluster_ready
    echo "end to check connectivity for etcd nodes"
    sleep 5
  done
}

function recover_etcd_client_ca() {
  # Update certs on etcd nodes.
  for ADDR in $ETCD_HOSTS; do
    echo "replace etcd CA on node $ADDR"
    scp -o StrictHostKeyChecking=no $dir/ca.pem root@$ADDR:$KUBE_CERT_PATH/etcd/ca.pem
  done
}

function renew_k8s_certs() {
  # try to get region id from meta-server if not given in parameter
  META_REGION=$(get_region_id)
  if [[ -z "$REGION" ]]; then
    if [[ -z "$META_REGION" ]]; then
      echo "failed to get region id from ECS meta-server, please enter the region parameter."
      return 1
    fi
    REGION=$META_REGION
  elif [[ -n "${META_REGION}" && "$REGION" != "$META_REGION" ]]; then
    echo "switch to use local region id $META_REGION"
    REGION=$META_REGION
  fi

  # Update certs for k8s components and kubeconfig
  for ADDR in $ETCD_HOSTS; do
    echo "renew k8s components cert on node $ADDR"
    # compatible with containerd: both runtimes are tried, failures are ignored
    set +e
    IMAGE="registry.$REGION.aliyuncs.com/acs/etcd-rotate:v2.0.0"
    if is_vpc; then
      IMAGE="registry-vpc.$REGION.aliyuncs.com/acs/etcd-rotate:v2.0.0"
    fi
    echo "will pull rotate image $IMAGE"
    ssh -o StrictHostKeyChecking=no root@$ADDR docker run --privileged=true -v /:/alicoud-k8s-host --pid host --net host \
      $IMAGE /renew/upgrade-k8s.sh --role master
    ssh -o StrictHostKeyChecking=no root@$ADDR ctr image pull $IMAGE
    ssh -o StrictHostKeyChecking=no root@$ADDR ctr run --privileged=true --mount type=bind,src=/,dst=/alicoud-k8s-host,options=rbind:rw \
      --net-host $IMAGE cert-rotate /renew/upgrade-k8s.sh --role master
    set -e
    echo "finished renew k8s components cert on $ADDR"
  done
}

function get_region_id() {
  set +e # close error out
  local path=100.100.100.200/latest/meta-data/region-id
  for ((i = 0; i < 3; i++)); do
    response=$(curl --retry 1 --retry-delay 5 -sSL $path)
    if [[ $? -gt 0 || "x$response" == "x" ]]; then
      sleep 2
      continue
    fi
    if echo "$response" | grep -E "<title>.*</title>" >/dev/null; then
      sleep 3
      continue
    fi
    echo "$response"
    # return from metadata succeed.
    set -e
    return
  done
  set -e # open error out
  # function will return empty string when failed
}

function is_vpc() {
  # Execute the curl command and capture the network-type from ECS meta-server
  response=$(curl -s http://100.100.100.200/latest/meta-data/network-type)
  if [ "$response" = "vpc" ]; then
    return 0
  else
    return 1
  fi
}

function generate_cm() {
  echo "generate status configmap"
  cat <<-"EOF" >/tmp/ack-rotate-etcd-ca-cm.yaml.tpl
apiVersion: v1
kind: ConfigMap
metadata:
  name: ack-rotate-etcd-status
  namespace: kube-system
data:
  status: "success"
  hosts: "$hosts"
EOF
  sed -e "s#\$hosts#$ETCD_HOSTS#" /tmp/ack-rotate-etcd-ca-cm.yaml.tpl | kubectl apply -f -
}

function restart_one_apiserver() {
  ADDR=$1
  if [[ -z "${ADDR}" ]]; then
    printf "ADDR is empty,exit."
    exit 1
  fi
  printf "restart apiserver on node %s\n" "${ADDR}"
  scp -o StrictHostKeyChecking=no "${currentDir}"/restart-apiserver.sh root@"${ADDR}":/tmp/restart-apiserver.sh
  ssh -e none -o StrictHostKeyChecking=no root@"${ADDR}" chmod +x /tmp/restart-apiserver.sh
  ssh -e none -o StrictHostKeyChecking=no root@"${ADDR}" bash /tmp/restart-apiserver.sh
}

while [[ $# -gt 0 ]]; do
  key="$1"
  case $key in
  --region)
    export REGION=$2
    shift
    ;;
  *)
    echo "unknown option [$key]"
    exit 1
    ;;
  esac
  shift
done

get_etcdhosts
echo "${ETCD_HOSTS[@]}"
check_container_runtime

# Update certs on etcd nodes.
echo "---restart runtime and kubelet on master nodes---"
for ADDR in $ETCD_HOSTS; do
  if [ "$cmd" == "docker" ]; then
    echo "restart docker on node $ADDR"
    ssh -o StrictHostKeyChecking=no root@$ADDR systemctl restart docker
  fi
  ssh -e none -o StrictHostKeyChecking=no root@"${ADDR}" systemctl restart kubelet
done
sleep 5
echo "---end to restart runtime and kubelet on master nodes---"

echo "---renew k8s components certs---"
renew_k8s_certs
echo "---end to renew k8s components certs---"

echo "---check cert files exist---"
check_cert_files_exist
echo "---end to check cert files exist---"

echo "---check connectivity for etcd nodes---"
check_etcd_cluster_ready
echo "---end to check connectivity for etcd nodes---"

# Update certs on etcd nodes.
for ADDR in $ETCD_HOSTS; do
  scp -o StrictHostKeyChecking=no restart-apiserver.sh root@$ADDR:/tmp/restart-apiserver.sh
  ssh -o StrictHostKeyChecking=no root@$ADDR chmod +x /tmp/restart-apiserver.sh
done

gencerts

echo "---rotate etcd ca and etcd client ca---"
rotate_etcd_ca
echo "---end to rotate etcd ca and etcd client ca---"

echo "---rotate etcd peer and certs---"
rotate_etcd_certs
echo "---end to rotate etcd peer and certs---"

echo "check etcd cluster ready"
check_etcd_cluster_ready

echo "---replace etcd ca---"
recover_etcd_ca
echo "---end to replace etcd ca---"

generate_cm
echo "etcd CA and certs have successfully rotated!"
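Because rotate-etcd.sh copies restart-apiserver.sh from the current directory and calls several command-line tools, a quick prerequisite check before the run can save a failed attempt. The following is only a sketch: preflight is a hypothetical helper, and the default tool list is an assumption derived from the commands that appear in the two scripts.

```shell
#!/bin/bash
# Sketch: check prerequisites before running rotate-etcd.sh.
# preflight is a hypothetical helper, not part of the ACK tooling.
# Usage: preflight <script_dir> [tool ...]
preflight() {
  local script_dir=$1 missing=0 item
  shift
  local tools=("$@")
  if [ ${#tools[@]} -eq 0 ]; then
    # Commands invoked by the scripts in this topic.
    tools=(ssh scp curl cfssl cfssljson etcdctl kubectl)
  fi
  # Both scripts must sit in the same folder.
  for item in restart-apiserver.sh rotate-etcd.sh; do
    if [ ! -f "${script_dir}/${item}" ]; then
      echo "missing script: ${item}"
      missing=1
    fi
  done
  for item in "${tools[@]}"; do
    if ! command -v "$item" >/dev/null 2>&1; then
      echo "missing command: ${item}"
      missing=1
    fi
  done
  return "$missing"
}

# Example: preflight "$PWD"
```

The function prints one line per missing item and returns non-zero if anything is absent, so it can gate the actual run, for example with preflight "$PWD" && bash rotate-etcd.sh.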
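Run the following command on any master node:
bash rotate-etcd.sh
When the command output contains the following message, the etcd certificates and the Kubernetes certificates on all master nodes have been rotated:
etcd CA and certs have successfully rotated!
Verify that the certificates have been updated.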
cd /var/lib/etcd/cert
for i in `ls | grep pem| grep -v key`;do openssl x509 -noout -text -in $i | grep -i after && echo "$i" ;done
cd /etc/kubernetes/pki/etcd
for i in `ls | grep pem| grep -v key`;do openssl x509 -noout -text -in $i | grep -i after && echo "$i" ;done
cd /etc/kubernetes/pki/
for i in `ls | grep crt| grep -v key`;do openssl x509 -noout -text -in $i | grep -i after && echo "$i" ;done
Note: If the expiration times printed by the preceding commands are about 50 years in the future, the rotation is complete.
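The 50-year figure follows from the expiry value 438000h that the rotation script passes to cfssl. As a small worked check, assuming 365-day years (hours_to_years is a hypothetical helper name):

```shell
#!/bin/bash
# Sketch: convert the cfssl expiry used by the scripts (438000h) to years,
# assuming 365-day years. Integer arithmetic only.
hours_to_years() {
  local hours=$1
  echo $(( hours / 24 / 365 ))
}

hours_to_years 438000   # 438000 / 24 = 18250 days; 18250 / 365 = 50
```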
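After a successful manual rotation, the ACK control plane cannot obtain the rotation result, so the cluster still shows the update button in the cluster list of the console. Submit a ticket to have the button cleared.
Scenario 2: Rotate etcd certificates that have expired
Use scenarios
The etcd certificates have expired.
The etcd certificates must be rotated while the API server is inaccessible.
The etcd certificates cannot be rotated automatically by deploying a template.
The etcd certificates cannot be updated in the console.
In these scenarios, the cluster administrator can log on to any master node and run the following scripts to rotate the etcd certificates manually.
The following scripts must be run as the root user.
Confirm that passwordless SSH login for the root user is configured between the master nodes of the cluster.
From one master node, log on to any other master node over SSH. If you are prompted for a password, configure passwordless login between the master nodes as follows.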
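# 1. Generate a key pair. Skip this step if a login key already exists on the node.
ssh-keygen -t rsa
# 2. Use ssh-copy-id to copy the public key to every other master node. Replace $(internal-ip) with the internal IP address of the other master node.
ssh-copy-id -i ~/.ssh/id_rsa.pub $(internal-ip)
Note: If you do not configure passwordless login, you must enter the root password when the scripts run.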
Copy the contents of the following two scripts, save them as restart-apiserver.sh and rotate-etcd.sh, and place both files in the same folder.
Note: The rotate-etcd.sh script tries to obtain the region from the metadata service of the node and pulls the rotation image from the nearest registry in that region. You can also pass the --region xxxx parameter when you run the script to specify the region.
restart-apiserver.sh:
#!/bin/bash

declare -x cmd

k8s::wait_apiserver_ready() {
  set -e
  for i in $(seq 600); do
    if kubectl cluster-info &>/dev/null; then
      return 0
    else
      echo "wait apiserver to be ready, retry ${i}th after 1s"
      sleep 1
    fi
  done
  echo "failed to wait apiserver to be ready"
  return 1
}

function check_container_runtime() {
  if command -v dockerd &>/dev/null && ps aux | grep -q "[d]ockerd"; then
    cmd=docker
  elif command -v containerd &>/dev/null && ps aux | grep -q "[c]ontainerd"; then
    cmd=crictl
  else
    echo "Neither Dockerd nor Containerd is installed or running."
    exit 1
  fi
}

function restart_apiserver() {
  # Determine the container runtime.
  if [[ $cmd == "docker" ]]; then
    # Restart the kube-apiserver Pod with docker.
    container_id=$(docker ps | grep kube-apiserver | awk '{print $1}' | head -n 1)
    if [[ -n $container_id ]]; then
      echo "Restarting kube-apiserver pod using Docker: $container_id"
      docker restart "${container_id}"
    else
      echo "kube-apiserver pod not found."
    fi
  elif [[ $cmd == "crictl" ]]; then
    # Restart the kube-apiserver Pod with crictl.
    pod_id=$(crictl pods --label component=kube-apiserver --latest --state=ready | grep -v "POD ID" | head -n 1 | awk '{print $1}')
    if [[ -n $pod_id ]]; then
      echo "Restarting kube-apiserver pod using crictl: $pod_id"
      crictl stopp "${pod_id}"
    else
      echo "kube-apiserver pod not found."
    fi
  else
    echo "Unsupported container runtime: $cmd"
  fi
  k8s::wait_apiserver_ready
}

check_container_runtime
restart_apiserver
echo "API Server restarted"
rotate-etcd.sh:
#!/bin/bash

set -eo pipefail

declare -x TARGET_TEAR
declare -x cmd

dir=/tmp/rollback/etcdcert
KUBE_CERT_PATH=/etc/kubernetes/pki
ETCD_CERT_DIR=/var/lib/etcd/cert
ETCD_HOSTS=""
currentDir="$PWD"

# Update the K8s certificates. Replace the default image region cn-hangzhou below according to the region of the cluster.
function get_etcdhosts() {
  name1=$(find "$ETCD_CERT_DIR" -name '*-name-1.pem' -exec basename {} \; | sed 's/-name-1.pem//g')
  name2=$(find "$ETCD_CERT_DIR" -name '*-name-2.pem' -exec basename {} \; | sed 's/-name-2.pem//g')
  name3=$(find "$ETCD_CERT_DIR" -name '*-name-3.pem' -exec basename {} \; | sed 's/-name-3.pem//g')
  echo "hosts: $name1 $name2 $name3"
  ETCD_HOSTS="$name1 $name2 $name3"
}

function gencerts() {
  echo "generate ssl cert ..."
  rm -rf $dir
  mkdir -p "$dir"
  cd $dir

  local hosts
  hosts=$(echo $ETCD_HOSTS | tr -s " " ",")

  echo "generate ca"
  echo '{"CN":"CA","key":{"algo":"rsa","size":2048}, "ca": {"expiry": "438000h"}}' | cfssl gencert -initca - | cfssljson -bare $dir/ca -
  echo '{"signing":{"default":{"expiry":"438000h","usages":["signing","key encipherment","server auth","client auth"]}}}' >$dir/ca-config.json

  echo "generate etcd server certificates"
  export ADDRESS=$hosts,ext1.example.com,coreos1.local,coreos1,127.0.0.1
  export NAME=etcd-server
  echo '{"CN":"'$NAME'","hosts":[""],"key":{"algo":"rsa","size":2048}}' | cfssl gencert -config=$dir/ca-config.json -ca=$dir/ca.pem -ca-key=$dir/ca-key.pem -hostname="$ADDRESS" - | cfssljson -bare $dir/$NAME

  export ADDRESS=
  export NAME=etcd-client
  echo '{"CN":"'$NAME'","hosts":[""],"key":{"algo":"rsa","size":2048}}' | cfssl gencert -config=$dir/ca-config.json -ca=$dir/ca.pem -ca-key=$dir/ca-key.pem -hostname="$ADDRESS" - | cfssljson -bare $dir/$NAME

  # gen peer-ca
  echo "generate peer certificates"
  echo '{"CN":"Peer-CA","key":{"algo":"rsa","size":2048}, "ca": {"expiry": "438000h"}}' | cfssl gencert -initca - | cfssljson -bare $dir/peer-ca -
  echo '{"signing":{"default":{"expiry":"438000h","usages":["signing","key encipherment","server auth","client auth"]}}}' >$dir/peer-ca-config.json

  i=0
  for host in $ETCD_HOSTS; do
    ((i = i + 1))
    export MEMBER=${host}-name-$i
    echo '{"CN":"'${MEMBER}'","hosts":[""],"key":{"algo":"rsa","size":2048}}' | cfssl gencert -ca=$dir/peer-ca.pem -ca-key=$dir/peer-ca-key.pem -config=$dir/peer-ca-config.json -profile=peer \
      -hostname="$hosts,${MEMBER}.local,${MEMBER}" - | cfssljson -bare $dir/${MEMBER}
  done

  # chown
  chown -R etcd:etcd $dir
  chmod 0644 $dir/*

  for ADDR in $ETCD_HOSTS; do
    printf "sync the certificates of node %s\n" "${ADDR}"
    ssh -e none -o StrictHostKeyChecking=no root@"${ADDR}" mkdir -p "${dir}"
    scp -o StrictHostKeyChecking=no "${dir}"/* root@"${ADDR}":/var/lib/etcd/cert/
    scp -o StrictHostKeyChecking=no "${dir}"/ca.pem "${dir}"/etcd-client.pem "${dir}"/etcd-client-key.pem root@"${ADDR}":/etc/kubernetes/pki/etcd/
  done
}

function generate_cm() {
  echo "generate status configmap"
  cat <<-"EOF" >/tmp/ack-rotate-etcd-ca-cm.yaml.tpl
apiVersion: v1
kind: ConfigMap
metadata:
  name: ack-rotate-etcd-status
  namespace: kube-system
data:
  status: "success"
  hosts: "$hosts"
EOF
  sed -e "s#\$hosts#$ETCD_HOSTS#" /tmp/ack-rotate-etcd-ca-cm.yaml.tpl | kubectl apply -f -
}

function rotate_etcd() {
  for ADDR in $ETCD_HOSTS; do
    printf "rotate etcd's certificates on node %s\n" "${ADDR}"
    if [ "$cmd" == "docker" ]; then
      echo "restart docker on node $ADDR"
      ssh -e none -o StrictHostKeyChecking=no root@$ADDR systemctl restart docker
    fi
    ssh -e none -o StrictHostKeyChecking=no root@$ADDR systemctl restart etcd
  done
}

function rotate_apiserver() {
  echo "current dir: $currentDir"
  for ADDR in $ETCD_HOSTS; do
    printf "restart apiserver on node %s\n" "${ADDR}"
    scp -o StrictHostKeyChecking=no "${currentDir}"/restart-apiserver.sh root@"${ADDR}":/tmp/restart-apiserver.sh
    ssh -e none -o StrictHostKeyChecking=no root@"${ADDR}" systemctl restart kubelet
    ssh -e none -o StrictHostKeyChecking=no root@"${ADDR}" chmod +x /tmp/restart-apiserver.sh
    ssh -e none -o StrictHostKeyChecking=no root@"${ADDR}" bash /tmp/restart-apiserver.sh
  done
}

function check_etcd_cluster_ready() {
  local etcd_endpoints=()
  for ip in $ETCD_HOSTS; do
    etcd_endpoints+=("https://$ip:2379")
  done
  for i in $(seq 300); do
    for idx in "${!etcd_endpoints[@]}"; do
      endpoint="${etcd_endpoints[$idx]}"
      local health_output=$(ETCDCTL_API=3 etcdctl --cacert=/var/lib/etcd/cert/ca.pem --cert=/var/lib/etcd/cert/etcd-server.pem --key=/var/lib/etcd/cert/etcd-server-key.pem --endpoints "$endpoint" endpoint health --command-timeout=1s 2>&1)
      if echo "$health_output" | grep -q "successfully committed proposal"; then
        unset 'etcd_endpoints[$idx]'
      else
        echo "etcdctl result: ${health_output}"
        echo "$endpoint is not ready"
      fi
    done
    # shellcheck disable=SC2199
    if [[ -z "${etcd_endpoints[@]}" ]]; then
      echo "ETCD cluster is ready"
      break
    fi
    sleep 1
    printf "wait etcd cluster to be ready, retry %d after 1s,total 300s \n" "$i"
  done
}

function get_region_id() {
  set +e # close error out
  local path=100.100.100.200/latest/meta-data/region-id
  for ((i = 0; i < 3; i++)); do
    response=$(curl --retry 1 --retry-delay 5 -sSL $path)
    if [[ $? -gt 0 || "x$response" == "x" ]]; then
      sleep 2
      continue
    fi
    if echo "$response" | grep -E "<title>.*</title>" >/dev/null; then
      sleep 3
      continue
    fi
    echo "$response"
    # return from metadata succeed.
    set -e
    return
  done
  set -e # open error out
  # function will return empty string when failed
}

function is_vpc() {
  # Execute the curl command and capture the network-type from ECS meta-server
  response=$(curl -s http://100.100.100.200/latest/meta-data/network-type)
  if [ "$response" = "vpc" ]; then
    return 0
  else
    return 1
  fi
}

function renew_k8s_certs() {
  # try to get region id from meta-server if not given in parameter
  META_REGION=$(get_region_id)
  if [[ -z "$REGION" ]]; then
    if [[ -z "$META_REGION" ]]; then
      echo "failed to get region id from ECS meta-server, please enter the region parameter."
      return 1
    fi
    REGION=$META_REGION
  elif [[ -n "${META_REGION}" && "$REGION" != "$META_REGION" ]]; then
    echo "switch to use local region id $META_REGION"
    REGION=$META_REGION
  fi

  # Update certs for k8s components and kubeconfig
  for ADDR in $ETCD_HOSTS; do
    echo "renew k8s components cert on node $ADDR"
    # compatible with containerd: both runtimes are tried, failures are ignored
    set +e
    IMAGE="registry.$REGION.aliyuncs.com/acs/etcd-rotate:v2.0.0"
    if is_vpc; then
      IMAGE="registry-vpc.$REGION.aliyuncs.com/acs/etcd-rotate:v2.0.0"
    fi
    echo "will pull rotate image $IMAGE"
    ssh -o StrictHostKeyChecking=no root@$ADDR docker run --privileged=true -v /:/alicoud-k8s-host --pid host --net host \
      $IMAGE /renew/upgrade-k8s.sh --role master
    ssh -o StrictHostKeyChecking=no root@$ADDR ctr image pull $IMAGE
    ssh -o StrictHostKeyChecking=no root@$ADDR ctr run --privileged=true --mount type=bind,src=/,dst=/alicoud-k8s-host,options=rbind:rw \
      --net-host $IMAGE cert-rotate /renew/upgrade-k8s.sh --role master
    set -e
    echo "finished renew k8s components cert on $ADDR"
  done
}

function check_container_runtime() {
  if command -v dockerd &>/dev/null && ps aux | grep -q "[d]ockerd"; then
    cmd=docker
  elif command -v containerd &>/dev/null && ps aux | grep -q "[c]ontainerd"; then
    cmd=crictl
  else
    echo "Neither Dockerd nor Containerd is installed or running."
    exit 1
  fi
}

while [[ $# -gt 0 ]]; do
  key="$1"
  case $key in
  --region)
    export REGION=$2
    shift
    ;;
  *)
    echo "unknown option [$key]"
    exit 1
    ;;
  esac
  shift
done

get_etcdhosts
printf "ETCD_HOSTS: %s\n" "$ETCD_HOSTS"

gencerts
echo "---generate certificates successfully---"

rotate_etcd
echo "---rotate etcd successfully---"

echo "---check etcd cluster ready---"
check_etcd_cluster_ready

rotate_apiserver
echo "---restart apiserver successfully---"

echo "---renew k8s components certs---"
renew_k8s_certs
echo "---end to renew k8s components certs---"

generate_cm
echo "etcd CA and certs have successfully rotated!"
rm -rf $dir
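The way the script derives the rotation image from the region and the network type can be isolated as in the following sketch. rotate_image is a hypothetical helper name; the registry host patterns and the v2.0.0 tag are copied from the script, and cn-hangzhou is used only as an example region.

```shell
#!/bin/bash
# Sketch: build the rotation image reference the same way the script does.
# rotate_image is a hypothetical helper; hosts and tag come from the script.
rotate_image() {
  local region=$1 network_type=$2
  if [ "$network_type" = "vpc" ]; then
    # Nodes in a VPC pull through the internal registry endpoint.
    echo "registry-vpc.${region}.aliyuncs.com/acs/etcd-rotate:v2.0.0"
  else
    echo "registry.${region}.aliyuncs.com/acs/etcd-rotate:v2.0.0"
  fi
}

rotate_image cn-hangzhou vpc
```

Note that, as written, the script prefers the region reported by the metadata service: a value passed with --region takes effect only when the metadata service returns nothing or returns the same region.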
Verify that the certificates have been updated.
cd /var/lib/etcd/cert
for i in `ls | grep pem| grep -v key`;do openssl x509 -noout -text -in $i | grep -i after && echo "$i" ;done
cd /etc/kubernetes/pki/etcd
for i in `ls | grep pem| grep -v key`;do openssl x509 -noout -text -in $i | grep -i after && echo "$i" ;done
cd /etc/kubernetes/pki/
for i in `ls | grep crt| grep -v key`;do openssl x509 -noout -text -in $i | grep -i after && echo "$i" ;done
If the expiration times printed by the preceding commands are about 50 years in the future, the rotation is complete.
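Instead of reading the dates by eye, the remaining validity can also be tested programmatically. The following sketch relies on the -checkend option of openssl x509; cert_valid_for_days is a hypothetical helper name, and the 60-day threshold in the commented example is an assumption chosen to mirror the two-month reminder window.

```shell
#!/bin/bash
# Sketch: succeed only if a certificate stays valid for at least N more days.
# cert_valid_for_days is a hypothetical helper built on `openssl x509 -checkend`,
# which exits non-zero when the certificate expires within the given seconds.
cert_valid_for_days() {
  local cert=$1 days=$2
  openssl x509 -noout -checkend $(( days * 86400 )) -in "$cert" >/dev/null
}

# Example: flag certificates that expire within 60 days.
# for f in /var/lib/etcd/cert/*.pem; do
#   case "$f" in *-key.pem) continue ;; esac
#   cert_valid_for_days "$f" 60 || echo "expires soon: $f"
# done
```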
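After a successful manual rotation, the ACK control plane cannot obtain the rotation result, so the cluster still shows the expired state in the cluster list of the console. Submit a ticket to have the expired state cleared.
Roll back after a failed certificate rotation
Use scenarios
Certificate rotation in the console failed and the Kubernetes cluster must be recovered.
Certificate rotation from the command line failed and the Kubernetes cluster must be recovered.
In these scenarios, the cluster administrator can log on to any master node and run the following scripts to update the etcd certificates manually. Because the old certificates are about to expire, this operation generates a new set of etcd certificates and updates the etcd server certificates and the kube-apiserver client certificates.
The following scripts must be run as the root user.
Confirm that passwordless SSH login for the root user is configured between the master nodes of the cluster.
From one master node, log on to any other master node over SSH. If you are prompted for a password, configure passwordless login between the master nodes as follows.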
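# 1. Generate a key pair. Skip this step if a login key already exists on the node.
ssh-keygen -t rsa
# 2. Use ssh-copy-id to copy the public key to every other master node. Replace $(internal-ip) with the internal IP address of the other master node.
ssh-copy-id -i ~/.ssh/id_rsa.pub $(internal-ip)
Note: If you do not configure passwordless login, you must enter the root password when the scripts run.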
Copy the contents of the following two scripts, save them as restart-apiserver.sh and rollback-etcd.sh, and place both files in the same folder.
Note: The rollback-etcd.sh script tries to obtain the region from the metadata service of the node and pulls the rotation image from the nearest registry in that region. You can also pass the --region xxxx parameter when you run the script to specify the region.
restart-apiserver.sh:
#!/bin/bash

declare -x cmd

k8s::wait_apiserver_ready() {
  set -e
  for i in $(seq 600); do
    if kubectl cluster-info &>/dev/null; then
      return 0
    else
      echo "wait apiserver to be ready, retry ${i}th after 1s"
      sleep 1
    fi
  done
  echo "failed to wait apiserver to be ready"
  return 1
}

function check_container_runtime() {
  if command -v dockerd &>/dev/null && ps aux | grep -q "[d]ockerd"; then
    cmd=docker
  elif command -v containerd &>/dev/null && ps aux | grep -q "[c]ontainerd"; then
    cmd=crictl
  else
    echo "Neither Dockerd nor Containerd is installed or running."
    exit 1
  fi
}

function restart_apiserver() {
  # Determine the container runtime.
  if [[ $cmd == "docker" ]]; then
    # Restart the kube-apiserver Pod with docker.
    container_id=$(docker ps | grep kube-apiserver | awk '{print $1}' | head -n 1)
    if [[ -n $container_id ]]; then
      echo "Restarting kube-apiserver pod using Docker: $container_id"
      docker restart "${container_id}"
    else
      echo "kube-apiserver pod not found."
    fi
  elif [[ $cmd == "crictl" ]]; then
    # Restart the kube-apiserver Pod with crictl.
    pod_id=$(crictl pods --label component=kube-apiserver --latest --state=ready | grep -v "POD ID" | head -n 1 | awk '{print $1}')
    if [[ -n $pod_id ]]; then
      echo "Restarting kube-apiserver pod using crictl: $pod_id"
      crictl stopp "${pod_id}"
    else
      echo "kube-apiserver pod not found."
    fi
  else
    echo "Unsupported container runtime: $cmd"
  fi
  k8s::wait_apiserver_ready
}

check_container_runtime
restart_apiserver
echo "API Server restarted"
#!/bin/bash set -eo pipefail declare -x TARGET_TEAR declare -x cmd dir=/tmp/rollback/etcdcert KUBE_CERT_PATH=/etc/kubernetes/pki ETCD_CERT_DIR=/var/lib/etcd/cert ETCD_HOSTS="" currentDir="$PWD" # 更新K8s證書,根據集群Region替換下面cn-hangzhou的默認鏡像地域。 function get_etcdhosts() { name1=$(find "$ETCD_CERT_DIR" -name '*-name-1.pem' -exec basename {} \; | sed 's/-name-1.pem//g') name2=$(find "$ETCD_CERT_DIR" -name '*-name-2.pem' -exec basename {} \; | sed 's/-name-2.pem//g') name3=$(find "$ETCD_CERT_DIR" -name '*-name-3.pem' -exec basename {} \; | sed 's/-name-3.pem//g') echo "hosts: $name1 $name2 $name3" ETCD_HOSTS="$name1 $name2 $name3" } function gencerts() { echo "generate ssl cert ..." rm -rf $dir mkdir -p "$dir" cd $dir local hosts hosts=$(echo $ETCD_HOSTS | tr -s " " ",") echo "generate ca" echo '{"CN":"CA","key":{"algo":"rsa","size":2048}, "ca": {"expiry": "438000h"}}' | cfssl gencert -initca - | cfssljson -bare $dir/ca - echo '{"signing":{"default":{"expiry":"438000h","usages":["signing","key encipherment","server auth","client auth"]}}}' >$dir/ca-config.json echo "generate etcd server certificates" export ADDRESS=$hosts,ext1.example.com,coreos1.local,coreos1,127.0.0.1 export NAME=etcd-server echo '{"CN":"'$NAME'","hosts":[""],"key":{"algo":"rsa","size":2048}}' | cfssl gencert -config=$dir/ca-config.json -ca=$dir/ca.pem -ca-key=$dir/ca-key.pem -hostname="$ADDRESS" - | cfssljson -bare $dir/$NAME export ADDRESS= export NAME=etcd-client echo '{"CN":"'$NAME'","hosts":[""],"key":{"algo":"rsa","size":2048}}' | cfssl gencert -config=$dir/ca-config.json -ca=$dir/ca.pem -ca-key=$dir/ca-key.pem -hostname="$ADDRESS" - | cfssljson -bare $dir/$NAME # gen peer-ca echo "generate peer certificates" echo '{"CN":"Peer-CA","key":{"algo":"rsa","size":2048}, "ca": {"expiry": "438000h"}}' | cfssl gencert -initca - | cfssljson -bare $dir/peer-ca - echo '{"signing":{"default":{"expiry":"438000h","usages":["signing","key encipherment","server auth","client auth"]}}}' >$dir/peer-ca-config.json i=0 
    for host in $ETCD_HOSTS; do
        ((i = i + 1))
        export MEMBER=${host}-name-$i
        echo '{"CN":"'${MEMBER}'","hosts":[""],"key":{"algo":"rsa","size":2048}}' | cfssl gencert -ca=$dir/peer-ca.pem -ca-key=$dir/peer-ca-key.pem -config=$dir/peer-ca-config.json -profile=peer \
            -hostname="$hosts,${MEMBER}.local,${MEMBER}" - | cfssljson -bare $dir/${MEMBER}
    done

    # chown
    chown -R etcd:etcd $dir
    chmod 0644 $dir/*

    for ADDR in $ETCD_HOSTS; do
        printf "sync the certificates of node %s" "${ADDR}"
        ssh -e none -o StrictHostKeyChecking=no root@"${ADDR}" mkdir -p "${dir}"
        scp -o StrictHostKeyChecking=no "${dir}"/* root@"${ADDR}":/var/lib/etcd/cert/
        scp -o StrictHostKeyChecking=no "${dir}"/ca.pem "${dir}"/etcd-client.pem "${dir}"/etcd-client-key.pem root@"${ADDR}":/etc/kubernetes/pki/etcd/
    done
}

function generate_cm() {
    echo "generate status configmap"
    cat <<-"EOF" >/tmp/ack-rotate-etcd-ca-cm.yaml.tpl
apiVersion: v1
kind: ConfigMap
metadata:
  name: ack-rotate-etcd-status
  namespace: kube-system
data:
  status: "success"
  hosts: "$hosts"
EOF
    sed -e "s#\$hosts#$ETCD_HOSTS#" /tmp/ack-rotate-etcd-ca-cm.yaml.tpl | kubectl apply -f -
}

function rotate_etcd() {
    for ADDR in $ETCD_HOSTS; do
        printf "rotate etcd's certificates on node %s\n" "${ADDR}"
        if [ "$cmd" == "docker" ]; then
            echo "restart docker on node $ADDR"
            ssh -e none -o StrictHostKeyChecking=no root@$ADDR systemctl restart docker
        fi
        ssh -e none -o StrictHostKeyChecking=no root@$ADDR systemctl restart etcd
    done
}

function rotate_apiserver() {
    echo "current dir: $currentDir"
    for ADDR in $ETCD_HOSTS; do
        printf "restart apiserver on node %s\n" "${ADDR}"
        scp -o StrictHostKeyChecking=no "${currentDir}"/restart-apiserver.sh root@"${ADDR}":/tmp/restart-apiserver.sh
        ssh -e none -o StrictHostKeyChecking=no root@"${ADDR}" systemctl restart kubelet
        ssh -e none -o StrictHostKeyChecking=no root@"${ADDR}" chmod +x /tmp/restart-apiserver.sh
        ssh -e none -o StrictHostKeyChecking=no root@"${ADDR}" bash /tmp/restart-apiserver.sh
    done
}

function check_etcd_cluster_ready() {
    local etcd_endpoints=()
    for ip in $ETCD_HOSTS; do
        etcd_endpoints+=("https://$ip:2379")
    done

    for i in $(seq 300); do
        for idx in "${!etcd_endpoints[@]}"; do
            endpoint="${etcd_endpoints[$idx]}"
            local health_output=$(ETCDCTL_API=3 etcdctl --cacert=/var/lib/etcd/cert/ca.pem \
                --cert=/var/lib/etcd/cert/etcd-server.pem --key=/var/lib/etcd/cert/etcd-server-key.pem \
                --endpoints "$endpoint" endpoint health --command-timeout=1s 2>&1)
            if echo "$health_output" | grep -q "successfully committed proposal"; then
                unset 'etcd_endpoints[$idx]'
            else
                echo "etcdctl result: ${health_output}"
                echo "$endpoint is not ready"
            fi
        done
        # shellcheck disable=SC2199
        if [[ -z "${etcd_endpoints[@]}" ]]; then
            echo "ETCD cluster is ready"
            break
        fi
        sleep 1
        printf "wait etcd cluster to be ready, retry %d after 1s,total 300s \n" "$i"
    done
}

function get_region_id() {
    set +e; # close error out
    local path=100.100.100.200/latest/meta-data/region-id
    for (( i=0; i<3; i++)); do
        response=$(curl --retry 1 --retry-delay 5 -sSL $path)
        if [[ $? -gt 0 || "x$response" == "x" ]]; then
            sleep 2; continue
        fi
        if echo "$response"|grep -E "<title>.*</title>" >/dev/null; then
            sleep 3; continue
        fi
        echo "$response"
        # return from metadata succeed.
        set -e; return
    done
    set -e # open error out
    # function will return empty string when failed
}

function is_vpc() {
    # Execute the curl command and capture the network-type from ECS meta-server
    response=$(curl -s http://100.100.100.200/latest/meta-data/network-type)
    if [ "$response" = "vpc" ]; then
        return 0
    else
        return 1
    fi
}

function renew_k8s_certs() {
    # try to get region id from meta-server if not given in parameter
    META_REGION=$(get_region_id)
    if [[ -z "$REGION" ]]; then
        if [[ -z "$META_REGION" ]]; then
            echo "failed to get region id from ECS meta-server, please enter the region parameter."
            return 1
        fi
        REGION=$META_REGION
    elif [[ -n "${META_REGION}" && "$REGION" != "$META_REGION" ]] ; then
        echo "switch to use local region id $META_REGION"
        REGION=$META_REGION
    fi

    # Update certs for k8s components and kubeconfig
    for ADDR in $ETCD_HOSTS; do
        echo "renew k8s components cert on node $ADDR"
        #compatible containerd
        set +e
        IMAGE="registry.$REGION.aliyuncs.com/acs/etcd-rotate:v2.0.0"
        if is_vpc; then
            IMAGE="registry-vpc.$REGION.aliyuncs.com/acs/etcd-rotate:v2.0.0"
        fi
        echo "will pull rotate image $IMAGE"
        ssh -o StrictHostKeyChecking=no root@$ADDR docker run --privileged=true -v /:/alicoud-k8s-host --pid host --net host \
            $IMAGE /renew/upgrade-k8s.sh --role master
        ssh -o StrictHostKeyChecking=no root@$ADDR ctr image pull $IMAGE
        ssh -o StrictHostKeyChecking=no root@$ADDR ctr run --privileged=true --mount type=bind,src=/,dst=/alicoud-k8s-host,options=rbind:rw \
            --net-host $IMAGE cert-rotate /renew/upgrade-k8s.sh --role master
        set -e
        echo "finished renew k8s components cert on $ADDR"
    done
}

function check_container_runtime() {
    if command -v dockerd &>/dev/null && ps aux | grep -q "[d]ockerd"; then
        cmd=docker
    elif command -v containerd &>/dev/null && ps aux | grep -q "[c]ontainerd"; then
        cmd=crictl
    else
        echo "Neither Dockerd nor Containerd is installed or running."
        exit 1
    fi
}

while [[ $# -gt 0 ]]
do
    key="$1"
    case $key in
    --region)
        export REGION=$2
        shift
        ;;
    *)
        echo "unknown option [$key]"
        exit 1
        ;;
    esac
    shift
done

get_etcdhosts
printf "ETCD_HOSTS: %s\n" "$ETCD_HOSTS"
gencerts
echo "---generate certificates successfully---"
rotate_etcd
echo "---rotate etcd successfully---"
echo "---check etcd cluster ready---"
check_etcd_cluster_ready
rotate_apiserver
echo "---restart apiserver successfully---"
echo "---renew k8s components certs---"
renew_k8s_certs
echo "---end to renew k8s components certs---"
generate_cm
echo "etcd CA and certs have successfully rotated!"
rm -rf $dir
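To see which image reference a given Region produces before running the rotation, the selection logic in renew_k8s_certs can be isolated as in the sketch below. The function name rotate_image and the Region cn-hangzhou are illustrative assumptions and are not part of rotate-etcd.sh. The registry host pattern and the image tag are copied from the script above.

```shell
#!/bin/bash
# Hypothetical helper (not part of rotate-etcd.sh): builds the rotation image
# reference from a Region ID and the node's network type.
rotate_image() {
    local region="$1" network_type="$2"
    local registry="registry"
    # Nodes whose network type is "vpc" use the registry-vpc endpoint,
    # mirroring the is_vpc check in renew_k8s_certs.
    if [ "$network_type" = "vpc" ]; then
        registry="registry-vpc"
    fi
    echo "${registry}.${region}.aliyuncs.com/acs/etcd-rotate:v2.0.0"
}

# Example Region ID, used only for illustration.
rotate_image cn-hangzhou vpc
rotate_image cn-hangzhou classic
```

The first call corresponds to a node in a VPC, the second to any other network type.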
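Run the following command on any Master node:
bash rotate-etcd.sh
When the command output contains the following message, the etcd certificates and the Kubernetes certificates on all Master nodes have been rotated:
etcd CA and certs have successfully rotated!
Verify that the certificates have been updated.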
cd /var/lib/etcd/cert
for i in `ls | grep pem| grep -v key`;do openssl x509 -noout -text -in $i | grep -i after && echo "$i" ;done
cd /etc/kubernetes/pki/etcd
for i in `ls | grep pem| grep -v key`;do openssl x509 -noout -text -in $i | grep -i after && echo "$i" ;done
cd /etc/kubernetes/pki/
for i in `ls | grep crt| grep -v key`;do openssl x509 -noout -text -in $i | grep -i after && echo "$i" ;done
If the expiration times (Not After) printed by the commands above are 50 years in the future, the rotation is complete.
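Reading the expiration dates by eye is error-prone when many certificates are listed. The sketch below converts an expiration value into the number of whole years remaining. The helper name years_until_expiry is hypothetical, the code assumes GNU date (for date -d), and a year is counted as 365 days.

```shell
#!/bin/bash
# Hypothetical helper: prints the whole years (365-day years) remaining until
# the given expiration value. Assumes GNU date, which supports `date -d`.
years_until_expiry() {
    local enddate="${1#notAfter=}"   # strip the "notAfter=" prefix if present
    local now="${2:-$(date +%s)}"    # current time in epoch seconds, overridable
    local expiry
    expiry=$(date -u -d "$enddate" +%s) || return 1
    echo $(( (expiry - now) / 31536000 ))
}

# Fixed end date and fixed "now" (2024-01-01 00:00:00 UTC) so the result is stable.
years_until_expiry "notAfter=Jan  1 00:00:00 2074 GMT" 1704067200   # prints 50
```

On a Master node, the input can come from openssl, for example years_until_expiry "$(openssl x509 -noout -enddate -in /var/lib/etcd/cert/ca.pem)".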