ETCD mvcc database space exceeded

#etcd #database 2023-01-01

Background

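During a load test, etcd's disk filled up and the etcd client started receiving a write error: mvcc database space exceeded. I then set about recovering etcd. Because this project only uses etcd for communication and the data itself is not important, I simply compacted all of the revisions.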

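Get the latest revision

The current revision is reported in the header of the JSON output, so request that format:

/usr/local/etcd/etcdctl --user=$user --password=$password --endpoints=$host endpoint status --write-out=json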
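A minimal sketch of pulling the revision number out of that JSON, similar in spirit to the grep pipeline commonly used for this. The function name extract_revision is hypothetical, and the sketch assumes the output contains a "revision":<number> field; it prints the first match only.

```shell
# Read `etcdctl endpoint status --write-out=json` output on stdin
# and print the first "revision" value found in it.
extract_revision() {
  grep -Eo '"revision":[0-9]+' | head -n 1 | grep -Eo '[0-9]+'
}

# Example use (assumes $user, $password and $host are set as in the commands above):
# revision=$(/usr/local/etcd/etcdctl --user=$user --password=$password \
#   --endpoints=$host endpoint status --write-out=json | extract_revision)
```

The resulting $revision can then be passed to the compact command.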

Compact the etcd revisions

/usr/local/etcd/etcdctl --user=$user --password=$password --endpoints=$host compact $revision

Defragment etcd

/usr/local/etcd/etcdctl --user=$user --password=$password --endpoints=$host defrag

Disarm the alarm

This step is very important. Without it, the client still cannot work even after disk usage has dropped, because the alarm has not been cleared.

/usr/local/etcd/etcdctl --user=$user --password=$password --endpoints=$host alarm disarm
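To confirm the alarm is really gone, the output of `etcdctl alarm list` can be checked. The helper below is a small sketch: the name has_nospace_alarm is hypothetical, and it assumes active alarms are printed in the form `memberID:<id> alarm:NOSPACE`.

```shell
# Exit 0 if the `etcdctl alarm list` output on stdin still contains a NOSPACE alarm,
# non-zero otherwise.
has_nospace_alarm() {
  grep -q 'alarm:NOSPACE'
}

# Example use (assumes $user, $password and $host are set as in the commands above):
# if /usr/local/etcd/etcdctl --user=$user --password=$password \
#      --endpoints=$host alarm list | has_nospace_alarm; then
#   echo "NOSPACE alarm still active"
# fi
```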

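When disk usage is at 100%, calls to the etcd server API frequently fail with context.timeout errors. I picked one node and force-restarted it, only to find that the restart timed out as well; it really is very slow. So the startup timeout of the Linux service has to be changed.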

TimeoutSec=0
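One way to apply this setting without editing the installed unit file is a systemd drop-in. The sketch below assumes the unit is named etcd.service; the function name write_timeout_dropin and the file name timeout.conf are illustrative only.

```shell
# Write a systemd drop-in that sets TimeoutSec=0 in the [Service] section.
# $1: drop-in directory, e.g. /etc/systemd/system/etcd.service.d
#     (assumption: the unit is called etcd.service)
write_timeout_dropin() {
  dropin_dir=$1
  mkdir -p "$dropin_dir" || return 1
  printf '[Service]\nTimeoutSec=0\n' > "$dropin_dir/timeout.conf"
}
```

Run as root, this would be `write_timeout_dropin /etc/systemd/system/etcd.service.d` followed by `systemctl daemon-reload`, so that systemd picks up the change before the next restart.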

Restart etcd again. The restart takes a long time, but waiting patiently is all that is needed.

service etcd restart

After the restart, requests sent only to the server on the restarted node no longer time out either.

The etcd quota size and the compaction rate need to be sized according to QPS.
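As a rough illustration, the quota and automatic compaction are set through etcd server flags. In the sketch below the helper quota_bytes is hypothetical, and the 8 GiB quota and 1h retention are placeholder values, not recommendations derived from this incident; the right numbers depend on the measured write rate.

```shell
# Convert a size in GiB to bytes, the unit expected by --quota-backend-bytes.
quota_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Example use (placeholder values, tune them against the observed QPS):
# etcd --quota-backend-bytes="$(quota_bytes 8)" \
#      --auto-compaction-mode=periodic \
#      --auto-compaction-retention=1h
```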