Tunning
OS
page cache(Memory)
disable swapping(vm.swappiness=1, preferably 1, for minimum swapping on systems where the RHEL kernel is 2.6.32-642.el6 or higher.)
tcp connections
jvm(gc, heap, MaxGCPauseMillis, InitiatingHeapOccupancyPercent)
dirty pages
Disk(Throughput and Capacity)
JBOD VS raid 10
ext, xfs
ssd
mount with noatime, be default, atime is updated every time a file is read(access time)
Network
send and receive buffer size per socket(net.core.wmem_default, net.core.rmem_default )
tcp window scaling(tcp_window_scaling)
number of simultaneous connections to be accepted(net.core.netdev_max_backlog)
in different rack
monitoring
cpu
network metrics
file handle
disk usage
disk io
gc
zk
UnderReplicatedPartitions
PartitionCount,LeaderCount
IsrExpandsPerSec
Message in rate/Byte in rate/Byte out rate
NetworkProcessorAvgIdlePercent
RequestHandlerAvgIdlePercent
replication
replica.lag.time.max.ms
num.replica.fetchers
min.insync.replica
broker
log.retention.{ms, minutes, hours} , log.retention.bytes
message.max.bytes, replica.fetch.max.bytes
unclean.leader.election.enable
min.insync.replicas
replica.lag.time.max.ms
replica.fetch.response.max.bytes
zookeeper.session.timeout.ms
num.io.threads
log.segment.bytes(As messages are produced to Kafka , they are appended to the current log segment for the partition. A smaller log-segment +++size means that files must be closed and allocated more often, which reduces the overall efficiency of disk writes; too larger means cant be +++expired until log segment is closed.)
producer
buffer.memory
batch.size
linger.ms
max.in.flight.requests.per.connection
compression.type
acks
avoid large message
consumer
offsets.topic.replication.factor
offsets.retention.minutes
__consumer_offsets
fetch.min.bytes, fetch.max.wait.ms
max.poll.interval.ms
max.poll.records
session.timeout.ms
max.partition.fetch.bytes
auto.offset.reset
enable.auto.commit
receive.buffer.bytes and send.buffer.bytes