etcd
Consistent and highly-available key value store used as Kubernetes’ backing store for all cluster data.

etcd backup & restore

ETCDCTL_API=3 etcdctl --endpoints $ENDPOINT snapshot save snapshotdb
# exits with status 0 on success

# verify the snapshot
ETCDCTL_API=3 etcdctl --write-out=table snapshot status snapshotdb
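
The commands above cover the backup side only. A minimal sketch of the restore step might look like the following, assuming an etcdctl v3 release in which `snapshot restore` is still available (newer releases move it to `etcdutl`). The function name `etcd_restore`, the `DRY_RUN` switch, and the target directory are illustrative and not part of the original notes.

```shell
# Sketch: restore an etcd snapshot into a fresh data directory.
# etcd_restore and DRY_RUN are hypothetical names used for this example.
etcd_restore() {
  snapshot="$1"
  data_dir="$2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo "ETCDCTL_API=3 etcdctl snapshot restore $snapshot --data-dir $data_dir"
  else
    ETCDCTL_API=3 etcdctl snapshot restore "$snapshot" --data-dir "$data_dir"
  fi
}
```

Running `DRY_RUN=1 etcd_restore snapshotdb /var/lib/etcd-restored` only prints the command. After a real restore, etcd has to be started with its data directory pointing at the restored path.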

write process

  1. The client sends a write request to the leader.
  2. The leader receives the request and passes it to the KV store.
  3. The KV store submits the write to raftNode.
  4. The leader turns the write into a log entry in raft.node and passes it to raft.raft.
  5. raft.raft sends the entry to each follower.
  6. The leader writes its own Ready data to storage.
  7. Each follower persists the message and builds a Ready message as its response.
  8. Once more than half of the members have acknowledged the log entry, the leader advances the committed index.
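
Step 8 can be illustrated with a small calculation: the committed index is the highest log index that a majority of members have already persisted. The sketch below is not etcd's own code. `commit_index` is a hypothetical helper that takes each member's last persisted index (the leader's included) as arguments.

```shell
# Sketch: derive the committed index from per-member persisted indexes.
# commit_index is a hypothetical helper, not an etcd or etcdctl command.
commit_index() {
  quorum=$(( $# / 2 + 1 ))   # majority of the member count
  # Sort indexes high to low; the quorum-th value is held by a majority.
  printf '%s\n' "$@" | sort -rn | sed -n "${quorum}p"
}
```

For a three-member cluster, `commit_index 7 5 6` prints the index that at least two members have reached. The slowest member does not hold the commit back as long as a majority has the entry.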

TLS
▪ etcd supports encrypted communication through TLS
▪ TLS can be used between peers and clients
▪ To start a cluster with TLS, each cluster member should have:
  ▪ ca-client.crt: the CA (certificate authority) trusted by the server to sign client certs
    ▪ Only used to authenticate clients if the --client-cert-auth switch is set; otherwise any client can connect
  ▪ node-client.crt: the server public key certificate signed by a CA for use with clients
  ▪ node-client.key: the server private key for use with clients
  ▪ ca-peer.crt: the CA trusted by the server to sign peer certs
    ▪ Only used to authenticate peers if the --peer-client-cert-auth switch is set; otherwise any peer can connect
  ▪ node-peer.crt: the server public key certificate signed by a CA for use with peers
  ▪ node-peer.key: the private key associated with node-peer.crt for use with peers
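
As a rough illustration of how the files listed above map onto etcd's TLS options, the sketch below prints the flags for one member. The helper name `etcd_tls_flags` and the idea of keeping all six files in a single directory are assumptions made for this example. The file names come from the list above.

```shell
# Sketch: build the TLS flags for one member from a certificate directory.
# etcd_tls_flags and the single-directory layout are assumptions.
etcd_tls_flags() {
  dir="$1"
  printf '%s\n' \
    "--trusted-ca-file=$dir/ca-client.crt" \
    "--cert-file=$dir/node-client.crt" \
    "--key-file=$dir/node-client.key" \
    "--client-cert-auth" \
    "--peer-trusted-ca-file=$dir/ca-peer.crt" \
    "--peer-cert-file=$dir/node-peer.crt" \
    "--peer-key-file=$dir/node-peer.key" \
    "--peer-client-cert-auth"
}
```

A member could then be started with something like `etcd $(etcd_tls_flags /etc/etcd/pki)` plus its usual name and URL flags, where `/etc/etcd/pki` is a placeholder path without spaces. The listen and advertise URLs would also need to use https.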

etcd migrate/add/remove
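Take care to preserve quorum when migrating, adding, or removing members: a cluster of n members needs a majority (n/2 + 1, rounded down) of them healthy to accept writes.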
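
Quorum here means a majority of the members. Before changing membership, it helps to check how many failures the cluster can absorb. The sketch below does that arithmetic. `quorum` and `fault_tolerance` are hypothetical helper names, not etcdctl commands.

```shell
# Sketch: quorum size and failure tolerance for a cluster of n members.
# quorum and fault_tolerance are hypothetical helper names.
quorum() { echo $(( $1 / 2 + 1 )); }
fault_tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }
```

Comparing `fault_tolerance 3` with `fault_tolerance 4` shows that going from three to four members raises the quorum size without letting the cluster survive any additional failure, which is why odd cluster sizes are usually preferred.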

etcd maintenance

monitoring

  1. Is the member up and running?
  2. Does the cluster have a leader?
  3. Leader changes
  4. Consensus proposals
    A proposal is a request that needs to go through the Raft protocol. Proposals are tracked in four states:
    committed
    applied
    pending
    failed (two causes: a leader election is failing, or quorum has been lost)

    When the committed index is ahead of the applied index by more than 5000 entries, etcd rejects new requests and returns the error ErrTooManyRequests.

  5. disk sync duration
    wal_fsync_duration_seconds
    backend_commit_duration_seconds
    
  6. gRPC stats
    etcd serves its v3 client API over gRPC, so the gRPC request metrics show how client requests are being handled.
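
The committed-versus-applied check from item 4 can be sketched as a small filter over the Prometheus text that etcd exposes. `proposal_gap` and `gap_status` are hypothetical helpers. The metric names are the ones etcd exports for proposals, and 5000 is the limit quoted in the notes above.

```shell
# Sketch: read Prometheus text metrics on stdin and report the gap
# between committed and applied proposals.
# proposal_gap and gap_status are hypothetical helper names.
proposal_gap() {
  awk '
    $1 == "etcd_server_proposals_committed_total" { committed = $2 }
    $1 == "etcd_server_proposals_applied_total"   { applied = $2 }
    END { printf "%d\n", committed - applied }
  '
}

# Classify a gap against the 5000-entry limit.
gap_status() {
  if [ "$1" -gt 5000 ]; then echo "too-many-requests"; else echo "ok"; fi
}
```

Assuming the member's metrics are reachable at `$ENDPOINT/metrics`, a check could be run as `gap_status "$(curl -s "$ENDPOINT/metrics" | proposal_gap)"`. A cluster with client TLS enabled would also need the matching certificate options on the curl call.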