Prometheus

OOM

  • which metrics have the most time series (series count drives memory use)
    topk(10, count by (__name__)({__name__=~".+"}))
    topk(10, count by (__name__, job)({__name__=~".+"}))
  • which jobs have the most time series
    topk(10, count by (job)({__name__=~".+"}))
  • calculate
    refer
  • API (TSDB status)
    GET /api/v1/status/tsdb
    headStats: data about the head block of the TSDB:
      numSeries: the number of series.
      chunkCount: the number of chunks.
      minTime: the current minimum timestamp in milliseconds.
      maxTime: the current maximum timestamp in milliseconds.
    seriesCountByMetricName: a list of metric names and their series count.
    labelValueCountByLabelName: a list of label names and their value count.
    memoryInBytesByLabelName: a list of label names and the memory they use in bytes. Memory usage is calculated by adding the length of all values for a given label name.
    seriesCountByLabelValuePair: a list of label value pairs and their series count.
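Both checks above can be scripted against the HTTP API. This is a minimal sketch using only the Python standard library, assuming a server at a hypothetical base URL. The helper names (get_json, instant_query, top_entries, report) are illustrative, not part of any Prometheus client. It ranks the {name, value} lists returned by /api/v1/status/tsdb and runs the topk query through /api/v1/query.

```python
import json
import urllib.parse
import urllib.request

# The topk queries from the notes, using __name__ as the metric-name label.
TOP_METRICS = 'topk(10, count by (__name__)({__name__=~".+"}))'
TOP_JOBS = 'topk(10, count by (job)({__name__=~".+"}))'


def get_json(base_url, path, params=None):
    """GET base_url + path (with optional query params) and decode the JSON body."""
    url = base_url.rstrip("/") + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def instant_query(base_url, promql):
    """Run an instant query via /api/v1/query and return (labels, value) pairs."""
    body = get_json(base_url, "/api/v1/query", {"query": promql})
    return [(r["metric"], float(r["value"][1])) for r in body["data"]["result"]]


def top_entries(status, key, n=10):
    """Rank one {name, value} list from a /api/v1/status/tsdb response."""
    entries = status["data"].get(key, [])
    ranked = sorted(entries, key=lambda e: e["value"], reverse=True)
    return [(e["name"], e["value"]) for e in ranked[:n]]


def report(base_url, n=10):
    """Print head series, biggest metrics (status API) and biggest jobs (topk)."""
    status = get_json(base_url, "/api/v1/status/tsdb")
    print("head series:", status["data"]["headStats"]["numSeries"])
    for name, count in top_entries(status, "seriesCountByMetricName", n):
        print(f"{count:>10}  {name}")
    for labels, value in instant_query(base_url, TOP_JOBS):
        print(f"{int(value):>10}  {labels.get('job', '')}")
```

Usage, with a hypothetical address: report("http://localhost:9090"). Sorting client-side is harmless if the server already returns the lists ordered.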
    

HA (high availability)
refer

high memory consumption
refer

remove unnecessary labels
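Unneeded labels are usually dropped at scrape time with a metric_relabel_configs rule using action: labeldrop, which removes every label whose name matches regex. Prometheus anchors relabel regexes at both ends. The Python sketch below only mimics that matching on a plain label dict, so you can preview which labels would go. It also checks that label sets stay distinct afterwards, since each series from a target is expected to keep a unique label set. Function and label names are hypothetical, and Python's re syntax only matches Prometheus' RE2 syntax for simple patterns.

```python
import re


def labeldrop(labels, regex):
    """Mimic the `labeldrop` relabel action on one series' label dict.

    re.fullmatch reproduces Prometheus' anchoring: the pattern must match
    the whole label name, not just a substring of it.
    """
    pattern = re.compile(regex)
    return {k: v for k, v in labels.items() if not pattern.fullmatch(k)}


def still_unique(series, regex):
    """True if every series keeps a distinct label set after the drop."""
    dropped = [frozenset(labeldrop(s, regex).items()) for s in series]
    return len(set(dropped)) == len(dropped)
```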

go tool pprof -symbolize=remote -inuse_space https://promXXX/debug/pprof/heap
File: prometheus
Type: inuse_space
Time: Apr 24, 2019 at 4:20pm (CEST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 8839.83MB, 84.87% of 10415.77MB total
Dropped 398 nodes (cum <= 52.08MB)
Showing top 10 nodes out of 64
      flat  flat%   sum%        cum   cum%
 1628.82MB 15.64% 15.64%  1628.82MB 15.64%  github.com/prometheus/tsdb/index.(*decbuf).uvarintStr /app/vendor/github.com/prometheus/tsdb/index/encoding_helpers.go
 1233.86MB 11.85% 27.48%  1234.86MB 11.86%  github.com/prometheus/prometheus/pkg/textparse.(*PromParser).Metric /app/pkg/textparse/promparse.go
...
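To read the top table: flat is the heap held by allocations made directly in a function, cum also counts the functions it calls, and sum% is the running total of flat%. The small Python sketch below recomputes those percentages from the figures in the output above (helper names are hypothetical). The cutoff in "Dropped 398 nodes (cum <= 52.08MB)" appears to be pprof's default -nodefraction of 0.005 multiplied by the total.

```python
def pct(part_mb, total_mb):
    """Share of the total in-use heap, rounded to 2 decimals like pprof prints it."""
    return round(100 * part_mb / total_mb, 2)


def top_table(flat_mb, total_mb):
    """Recompute pprof's flat% and running sum% columns from the flat values."""
    rows, running = [], 0.0
    for flat in flat_mb:
        running += flat
        rows.append((pct(flat, total_mb), pct(running, total_mb)))
    return rows


TOTAL_MB = 10415.77           # "of 10415.77MB total" in the output above
FLAT_MB = [1628.82, 1233.86]  # flat column of the two rows shown

# Nodes whose cum is at or below nodefraction * total are hidden (default 0.005).
DROP_CUTOFF_MB = round(0.005 * TOTAL_MB, 2)
```

Here top_table(FLAT_MB, TOTAL_MB) reproduces the flat% and sum% values of the two rows shown.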

Tune
optimising startup

how to scrape metrics
refer