kubeadm multi-master deployment question: does high availability require at least 2 masters online? - V2EX

    caicaiwoshishui · 2021-11-20 22:29:43 +08:00 · 2938 views

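    I have been fighting with this all day. There are three master nodes in total, with keepalived providing a virtual IP and lvsf enabled. In testing, shutting down any one machine leaves the other two working fine, but as soon as two are shut down the service becomes unavailable.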

    • The error is as follows
    [root@master-1 ~]# kubectl get nodes
    The connection to the server 192.168.0.8:6443 was refused - did you specify the right host or port?
    [root@master-1 ~]# netstat -ntlp |grep 6443

    Detailed logs

    • kube-apiserver
    [root@master-1 ~]# docker ps -a |grep kube-api|grep -v pause
    0c1c0042b8c2 53224b502ea4 "kube-apiserver --ad…" About a minute ago Exited (1) 54 seconds ago k8s_kube-apiserver_kube-apiserver-master-1.host.com_kube-system_464df844856c9d5461cb184edc4974c9_45
    [root@master-1 ~]# docker logs -f 0c1c0042b8c2
    I1120 14:25:26.120729 1 server.go:553] external host was not specified, using 192.168.0.11
    I1120 14:25:26.122152 1 server.go:161] Version: v1.22.3
    I1120 14:25:26.836619 1 shared_informer.go:240] Waiting for caches to sync for node_authorizer
    I1120 14:25:26.838689 1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
    I1120 14:25:26.838721 1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
    I1120 14:25:26.840979 1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
    I1120 14:25:26.841003 1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
    Error: context deadline exceeded
    • etcd: the error is RAFT NO LEADER
    [root@master-1 ~]# docker ps -a |grep etcd
    dfd6026ae3fd 004811815584 "etcd --advertise-cl…" 3 minutes ago Up 3 minutes k8s_etcd_etcd-master-1.host.com_kube-system_a23c864b52d59788909994fe31a97f5e_8
    13c6e65046d6 004811815584 "etcd --advertise-cl…" 7 minutes ago Exited (2) 3 minutes ago k8s_etcd_etcd-master-1.host.com_kube-system_a23c864b52d59788909994fe31a97f5e_7
    5ca2f134f743 registry.aliyuncs.com/google_containers/pause:3.5 "/pause" 22 minutes ago Up 22 minutes k8s_POD_etcd-master-1.host.com_kube-system_a23c864b52d59788909994fe31a97f5e_1
    [root@master-1 ~]# docker logs -n 10 13c6e65046d6
    {"level":"warn","ts":"2021-11-20T14:24:39.911Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"ad7fc708963cf6f3","rtt":"0s","error":"dial tcp 192.168.0.9:2380: i/o timeout"}
    {"level":"warn","ts":"2021-11-20T14:24:39.915Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"c68a49f4a0c3cea9","rtt":"0s","error":"dial tcp 192.168.0.10:2380: connect: no route to host"}
    {"level":"warn","ts":"2021-11-20T14:24:39.915Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"c68a49f4a0c3cea9","rtt":"0s","error":"dial tcp 192.168.0.10:2380: connect: no route to host"}
    {"level":"info","ts":"2021-11-20T14:24:40.658Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"cb18584c4f4dbfc is starting a new election at term 7"}
    {"level":"info","ts":"2021-11-20T14:24:40.658Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"cb18584c4f4dbfc became pre-candidate at term 7"}
    {"level":"info","ts":"2021-11-20T14:24:40.658Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"cb18584c4f4dbfc received MsgPreVoteResp from cb18584c4f4dbfc at term 7"}
    {"level":"info","ts":"2021-11-20T14:24:40.658Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"cb18584c4f4dbfc [logterm: 7, index: 3988] sent MsgPreVote request to ad7fc708963cf6f3 at term 7"}
    {"level":"info","ts":"2021-11-20T14:24:40.658Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"cb18584c4f4dbfc [logterm: 7, index: 3988] sent MsgPreVote request to c68a49f4a0c3cea9 at term 7"}
    {"level":"warn","ts":"2021-11-20T14:24:41.729Z","caller":"etcdhttp/metrics.go:166","msg":"serving /health false; no leader"}
    {"level":"warn","ts":"2021-11-20T14:24:41.729Z","caller":"etcdhttp/metrics.go:78","msg":"/health error","output":"{\"health\":\"false\",\"reason\":\"RAFT NO LEADER\"}","status-code":503}

    Conclusion

    etcd did not elect a leader? Can a single etcd member not be used on its own? Any advice would be appreciated.

    11 replies, last at 2021-12-13 21:53:34 +08:00
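    #1 bpf2049 · 2021-11-20 23:00:28 +08:00
    To avoid split-brain, etcd uses the Raft algorithm, which only serves requests while a majority of members is online. In other words, floor(N/2)+1 of the N members must be online before a Leader can be elected.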
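
    The majority rule above can be checked with a little arithmetic. The following is a minimal sketch; the function names `quorum` and `fault_tolerance` are made up for illustration and are not part of etcd or kubeadm.

```shell
# Majority quorum for an N-member Raft/etcd cluster: floor(N/2) + 1.
# Shell integer division already rounds down.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that may fail while a leader can still be elected.
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Print the numbers for a few cluster sizes.
for n in 1 2 3 4 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

    For the three masters in this thread the quorum is 2, so one machine may be down, but not two. Going from 3 to 4 members raises the quorum to 3 without tolerating any extra failure, which is why odd cluster sizes are the usual choice.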
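    #2 cs419 · 2021-11-20 23:35:23 +08:00
    That is simply how a highly available cluster is designed.
    While all cluster nodes are alive, they take requests in turn (round-robin) and share the load.
    Once a majority of the nodes is no longer online, the cluster refuses service.

    The reason is simple: the high-availability mechanism has been broken.
    If the cluster refuses service at that point, it can work normally again after you repair the nodes.

    But if it kept serving and the requests then overwhelmed the remaining nodes,
    the data could no longer be fully repaired.

    If you want a single node to be usable, start with a single node from the beginning and do not create a cluster.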
    #3 limao693 · 2021-11-20 23:49:04 +08:00 via iPhone
    Raft works normally as long as more than half of the members are up.
    #4 chih758 · 2021-11-21 01:15:30 +08:00 via Android
    In a test environment, use etcdctl member remove to delete the two nodes from the cluster; after that it can run as a single node.
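    #5 caicaiwoshishui (OP) · 2021-11-21 10:08:46 +08:00
    @cs419 Thank you. A follow-up question: if more than half of the nodes go down and rebooting does not bring them back, can new machines be added to the cluster? The problem is that kubectl no longer works and kubeadm cannot reach the master nodes either, so how would this be done?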
    #6 caicaiwoshishui (OP) · 2021-11-21 10:13:47 +08:00
    @chih758 I just tested this: I shut down 2 machines, leaving one, and entered the etcd container with docker exec -it.

    Configure the etcdctl certificates
    sh-5.0# export ETCDCTL_API=3
    sh-5.0# alias etcdctl='etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key'

    Run
    sh-5.0# etcdctl member list

    {"level":"warn","ts":"2021-11-21T02:11:18.722Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003a4700/#initially=[https://127.0.0.1:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
    Error: context deadline exceeded

    It times out. In other words, etcd on the one remaining machine times out, and the Docker container exits.
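
    The timeout is consistent with the majority rule from the earlier replies: with one of three members left there is no leader, so membership commands cannot complete. Nobody in the thread posts a recovery procedure. One documented etcd option for a lost majority is to restart the surviving member with the `--force-new-cluster` flag, which turns it into a one-member cluster. The sketch below is not from the thread; it assumes the kubeadm default static pod manifest at /etc/kubernetes/manifests/etcd.yaml, whose command list starts with a `- etcd` line, and the helper name `add_force_new_cluster` is hypothetical.

```shell
# Sketch only: add --force-new-cluster to a kubeadm etcd static pod manifest.
# Destructive: the surviving member drops the old membership list. Back up the
# etcd data directory (kubeadm default: /var/lib/etcd) before trying this, and
# remove the flag again once etcd reports healthy.
add_force_new_cluster() {
  manifest="$1"   # assumed path, e.g. /etc/kubernetes/manifests/etcd.yaml

  # Do nothing if the flag is already present.
  if grep -q -e '--force-new-cluster' "$manifest"; then
    return 0
  fi

  tmp="$(mktemp)" || return 1
  # Print every line; after the first "- etcd" command line, print a copy of
  # it rewritten into the new flag so it keeps the same YAML indentation.
  awk '
    { print }
    !done && /^[[:space:]]*- etcd$/ {
      sub(/- etcd$/, "- --force-new-cluster")
      print
      done = 1
    }
  ' "$manifest" > "$tmp" && cat "$tmp" > "$manifest"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

    On a test machine this would be called as `add_force_new_cluster /etc/kubernetes/manifests/etcd.yaml`; the kubelet then recreates the etcd pod from the changed manifest. Further control-plane nodes would have to be joined again afterwards, and this should be treated as a last resort for a test environment like the one described here.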
    #7 caicaiwoshishui (OP) · 2021-11-21 11:09:22 +08:00 via iPhone
    @suifengdang666 A question: in production, for a k8s cluster created with kubeadm, is etcd run separately, or do you use the etcd that kubeadm sets up itself?
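    #8 bpf2049 · 2021-11-21 21:56:12 +08:00
    @caicaiwoshishui The one kubeadm creates is fine. If you are worried that high load on the masters could make etcd misbehave, you can build a separate etcd cluster on a few dedicated VMs.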
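
    For the separate-VM option, kubeadm can be pointed at an external etcd cluster through its ClusterConfiguration. The following is a rough sketch: the three 10.0.0.x endpoints are placeholders, the certificate paths follow the kubeadm documentation and must already exist on the control-plane node, and the helper name `write_external_etcd_config` is made up for illustration. The controlPlaneEndpoint and version are taken from the logs earlier in this thread.

```shell
# Sketch only: write a kubeadm ClusterConfiguration that uses an external
# etcd cluster instead of the stacked etcd that kubeadm creates by default.
write_external_etcd_config() {
  out="$1"   # where to write the config file
  cat > "$out" <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.22.3
controlPlaneEndpoint: "192.168.0.8:6443"
etcd:
  external:
    endpoints:
      - https://10.0.0.21:2379
      - https://10.0.0.22:2379
      - https://10.0.0.23:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
EOF
}
```

    After `write_external_etcd_config kubeadm-config.yaml`, the first control-plane node would be initialised with `kubeadm init --config kubeadm-config.yaml --upload-certs`. The external etcd cluster still needs a majority of its own members online, so the quorum rule discussed above applies to it unchanged.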
    #9 pmispig · 2021-11-21 23:41:28 +08:00
    Deploy etcd and kube-apiserver separately, on different servers.
    #10 0x208 · 2021-12-13 16:45:28 +08:00
    OP, are you looking for a job? You could have a look at my hiring post.
    #11 caicaiwoshishui (OP) · 2021-12-13 21:53:34 +08:00
    @0x208 Is remote work possible? I am not in Beijing.