flow
- `--node-status-update-frequency`: from 10s to 4s (kubelet)
- `--node-monitor-period`: from 5s to 2s (controller manager)
- `--node-monitor-grace-period`: from 40s to 20s (controller manager). If the controller manager receives no status update from a node within this period (default 40s), it marks the node as unhealthy.
- `--pod-eviction-timeout`: from 5m to 30s (controller manager)
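
To make the effect of these timings concrete, here is a minimal sketch that estimates how long it takes from a node going silent until its pods start being evicted. It uses a simplified model (grace period plus eviction timeout) that ignores the polling granularity of `--node-monitor-period` and any kubelet retry behavior. The function name and the model are assumptions for illustration, not part of Kubernetes.

```python
def eviction_delay_seconds(grace_period_s: int, eviction_timeout_s: int) -> int:
    """Approximate delay between a node going silent and pod eviction starting.

    Simplified model (an assumption): the node is marked unhealthy once the
    grace period elapses, and its pods are evicted after the eviction timeout.
    """
    return grace_period_s + eviction_timeout_s


# Defaults: --node-monitor-grace-period=40s, --pod-eviction-timeout=5m
default_delay = eviction_delay_seconds(grace_period_s=40, eviction_timeout_s=5 * 60)

# Tuned values from the list above: 20s and 30s
tuned_delay = eviction_delay_seconds(grace_period_s=20, eviction_timeout_s=30)

print(f"default: {default_delay}s, tuned: {tuned_delay}s")
```

Under this model the tuned values cut the delay from roughly 340s to roughly 50s, at the cost of more frequent status traffic between kubelets and the control plane.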
## Nodes and kernel
- GKE (master machine type by number of nodes)
  - 1-5 nodes: n1-standard-1
  - 6-10 nodes: n1-standard-2
  - 11-100 nodes: n1-standard-4
  - 101-250 nodes: n1-standard-8
  - 251-500 nodes: n1-standard-16
  - more than 500 nodes: n1-standard-32
- kernel
```
fs.file-max=1000000
net.ipv4.neigh.default.gc_thresh1=1024
net.ipv4.neigh.default.gc_thresh2=4096
net.ipv4.neigh.default.gc_thresh3=8192
net.netfilter.nf_conntrack_max=10485760
net.core.netdev_max_backlog=10000
net.netfilter.nf_conntrack_tcp_timeout_established=300
net.netfilter.nf_conntrack_buckets=655360
fs.inotify.max_user_instances=524288
fs.inotify.max_user_watches=524288
```
## Etcd
1. HA
2. SSD
3. `--quota-backend-bytes` to increase the storage quota
4. separate etcd cluster
## Image
1. docker
   - `max-concurrent-downloads=10`
   - concurrent pull 5
   - SSD
   - preload pause image
2. kubelet
   - `--serialize-image-pulls=false`
   - `--image-pull-progress-deadline=30`
   - `--max-pods=110`
3. registry p2p
## APIServer
1. nodes 1k-3k
   - `--max-requests-inflight=1500`
   - `--max-mutating-requests-inflight=500`
2. mem
   - `--target-ram-mb=node_nums * 60`
## Pod
1. requests & limits
   - `spec.containers[].resources.limits.cpu`
   - `spec.containers[].resources.limits.memory`
   - `spec.containers[].resources.requests.cpu`
   - `spec.containers[].resources.requests.memory`
   - `spec.containers[].resources.limits.ephemeral-storage`
   - `spec.containers[].resources.requests.ephemeral-storage`
- nodeAffinity, podAffinity, podAntiAffinity
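
As an illustration of where the request and limit fields listed above sit in a Pod spec, the following sketch builds a manifest as a Python dictionary and prints it as JSON. The pod name, container name, image, and all quantities are placeholders chosen for the example, not recommended values.

```python
import json

# Hypothetical container: name, image, and quantities are placeholders.
container = {
    "name": "app",
    "image": "nginx",
    "resources": {
        # What the scheduler reserves for the container.
        "requests": {"cpu": "250m", "memory": "128Mi", "ephemeral-storage": "1Gi"},
        # Hard caps for the container at runtime.
        "limits": {"cpu": "500m", "memory": "256Mi", "ephemeral-storage": "2Gi"},
    },
}

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "resource-demo"},
    "spec": {"containers": [container]},
}

manifest = json.dumps(pod, indent=2)
print(manifest)
```

The output can be saved to a file and passed to `kubectl apply -f`, which accepts JSON manifests as well as YAML.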
## Kube-scheduler
- `--kube-api-qps=100`
## Kube-controller-manager
- `--kube-api-qps=100` and `--kube-api-burst=100`
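
Two of the sizing rules in these notes depend only on the node count: the GKE machine type tiers and `--target-ram-mb=node_nums * 60`. A small helper can compute both. This is a sketch; the function and constant names are hypothetical, and the thresholds are copied from the GKE list in these notes.

```python
# (upper bound on node count, machine type), copied from the GKE list in these notes.
GKE_MACHINE_TYPES = [
    (5, "n1-standard-1"),
    (10, "n1-standard-2"),
    (100, "n1-standard-4"),
    (250, "n1-standard-8"),
    (500, "n1-standard-16"),
]


def gke_machine_type(node_count: int) -> str:
    """Return the machine type tier for a cluster with `node_count` nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    for upper_bound, machine_type in GKE_MACHINE_TYPES:
        if node_count <= upper_bound:
            return machine_type
    # More than 500 nodes.
    return "n1-standard-32"


def target_ram_mb(node_count: int) -> int:
    """Value for --target-ram-mb using the node_nums * 60 rule."""
    return node_count * 60


nodes = 1000
print(f"machine type: {gke_machine_type(nodes)}")
print(f"--target-ram-mb={target_ram_mb(nodes)}")
```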