## Node failure detection

- `--node-status-update-frequency` from 10s to 4s (kubelet)
- `--node-monitor-period` from 5s to 2s (controller-manager)
- `--node-monitor-grace-period` from 40s to 20s (controller-manager). If the kubelet has not posted a status update within this window (default 40s), the controller manager marks the node NotReady.
- `--pod-eviction-timeout` from 5m to 30s (controller-manager)
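
A minimal sketch of where these flags go (systemd unit or static-pod args; on newer releases the kubelet value can also be set as `nodeStatusUpdateFrequency` in the kubelet config file):

```
# kubelet: report node status every 4s instead of 10s
kubelet --node-status-update-frequency=4s ...

# controller manager: probe nodes more often, declare failure sooner,
# and evict pods from failed nodes faster
kube-controller-manager \
  --node-monitor-period=2s \
  --node-monitor-grace-period=20s \
  --pod-eviction-timeout=30s \
  ...
```

Keep `--node-monitor-grace-period` a comfortable multiple of the kubelet's update frequency, so a single missed status report does not mark the node NotReady.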

## Nodes and kernel

  1. gke: master (control-plane) machine type by cluster node count
    1-5 nodes: n1-standard-1
    6-10 nodes: n1-standard-2
    11-100 nodes: n1-standard-4
    101-250 nodes: n1-standard-8
    251-500 nodes: n1-standard-16
    more than 500 nodes: n1-standard-32
    
  2. kernel: raise conntrack, ARP-cache, file-handle, and inotify limits
    ```
    fs.file-max=1000000
    net.ipv4.neigh.default.gc_thresh1=1024
    net.ipv4.neigh.default.gc_thresh2=4096
    net.ipv4.neigh.default.gc_thresh3=8192
    net.netfilter.nf_conntrack_max=10485760
    net.netfilter.nf_conntrack_buckets=655360
    net.netfilter.nf_conntrack_tcp_timeout_established=300
    net.core.netdev_max_backlog=10000
    fs.inotify.max_user_instances=524288
    fs.inotify.max_user_watches=524288
    ```
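
To persist these on every node, one option is a sysctl drop-in; the file path below is an assumption (any file under /etc/sysctl.d works):

```
# load the settings now and on every boot (file name is arbitrary)
cat >/etc/sysctl.d/90-kubernetes.conf <<'EOF'
fs.file-max=1000000
net.netfilter.nf_conntrack_max=10485760
# (remaining settings from the list above omitted here)
EOF
sysctl --system
```
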
## Etcd
1. run etcd as an HA cluster (3 or 5 members)
2. back etcd with SSDs; it is highly sensitive to disk write latency
3. raise `--quota-backend-bytes` to increase the storage quota (default 2 GB)
4. move high-churn resources (typically events) to a separate etcd cluster
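
Both knobs as a sketch; the endpoint URLs are placeholders, and the events split uses the apiserver's `--etcd-servers-overrides` flag:

```
# raise etcd's backend quota to 8 GB (default 2 GB)
etcd --quota-backend-bytes=8589934592 ...

# store events in a dedicated etcd cluster
kube-apiserver \
  --etcd-servers=https://etcd-main:2379 \
  --etcd-servers-overrides=/events#https://etcd-events:2379 \
  ...
```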

## Image
1. docker

  - max-concurrent-downloads=10: more parallel layer downloads per pull (default 3)
  - keep concurrent image pulls to roughly 5
  - put the image store on SSDs
  - preload the pause image onto every node

2. kubelet

  - --serialize-image-pulls=false: pull images in parallel (avoid on aufs storage backends)
  - --image-pull-progress-deadline=30m: do not cancel slow pulls of large images
  - --max-pods=110

3. registry: use P2P image distribution (e.g. Dragonfly or Kraken) to take pull load off the registry
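
A sketch of where these settings live, assuming dockerd is given flags directly (a daemon.json entry works equally well):

```
# docker daemon: more parallel layer downloads per image pull
dockerd --max-concurrent-downloads=10 ...

# kubelet: parallel pulls with a generous per-pull deadline
kubelet \
  --serialize-image-pulls=false \
  --image-pull-progress-deadline=30m \
  --max-pods=110 \
  ...
```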

## APIServer
1. for clusters of 1,000-3,000 nodes, raise the in-flight request limits

--max-requests-inflight=1500 (default 400)
--max-mutating-requests-inflight=500 (default 200)

2. memory: size the apiserver caches to the cluster

--target-ram-mb=node_count * 60 (roughly 60 MB per node)
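
For example, a 2,000-node cluster under the 60 MB-per-node rule of thumb (illustrative numbers, not a benchmark):

```
# 2000 nodes * 60 MB/node = 120000 MB of cache target
kube-apiserver \
  --max-requests-inflight=1500 \
  --max-mutating-requests-inflight=500 \
  --target-ram-mb=120000 \
  ...
```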


## Pod
1. requests & limits: set both on every container (see the pod sketch after this list)

```
spec.containers[].resources.requests.cpu
spec.containers[].resources.requests.memory
spec.containers[].resources.requests.ephemeral-storage
spec.containers[].resources.limits.cpu
spec.containers[].resources.limits.memory
spec.containers[].resources.limits.ephemeral-storage
```

2. nodeAffinity, podAffinity, podAntiAffinity: use inter-pod affinity sparingly; its scheduling cost grows quickly with cluster size
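
A minimal pod sketch combining both points; the name, image, and label are placeholders:

```
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: app                    # placeholder name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - {key: disktype, operator: In, values: [ssd]}  # placeholder label
  containers:
  - name: app
    image: nginx               # placeholder image
    resources:
      requests: {cpu: 100m, memory: 128Mi, ephemeral-storage: 1Gi}
      limits:   {cpu: 500m, memory: 256Mi, ephemeral-storage: 2Gi}
EOF
```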

## Kube-scheduler

1. --kube-api-qps=100 (default 50)

## Kube-controller-manager

1. --kube-api-qps=100 and --kube-api-burst=100 (defaults 20 and 30)
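
Both raises in one place, as a sketch:

```
kube-scheduler          --kube-api-qps=100 ...
kube-controller-manager --kube-api-qps=100 --kube-api-burst=100 ...
```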