Visualizing Kubernetes History

Introduction

Sloop monitors Kubernetes events, records the history of events and resource state changes, and provides visualizations to help debug past incidents.

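Key features:

Lets you find and inspect resources that no longer exist (for example, the pods from a previous deployment).
Provides timeline displays that show the rollout of related resources during updates to Deployments, ReplicaSets, and StatefulSets.
Helps debug transient and intermittent errors.
Lets you see how a Kubernetes application changes over time.
Runs as a self-contained service with no dependency on distributed storage.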

Architecture

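Installation and Usage

Installing with Docker

We can install Sloop from the official image. Sloop's data files are stored in the container's /data directory.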

docker run -it -p 8080:8080 -v ~/.kube/config:/kube/config -v /data:/data -e KUBECONFIG=/kube/config sloopimage/sloop

Open http://localhost:8080 to reach the web UI.
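
The docker run command above hard-codes the host port, the kubeconfig path, and the data directory. As a minimal sketch, the helper below makes these configurable and prints the resulting command for review instead of running it. The function name sloop_docker_cmd and its argument order are illustrative assumptions; the image name and the container-side mount points are taken from the command above.

```shell
# sloop_docker_cmd: hypothetical helper that prints a docker run command for Sloop.
# Arguments (all optional): kubeconfig path, host data directory, host port.
sloop_docker_cmd() {
  local kubeconfig="${1:-$HOME/.kube/config}"
  local data_dir="${2:-/data}"
  local host_port="${3:-8080}"

  # The container side of each mapping stays fixed, as in the original command:
  # port 8080, kubeconfig at /kube/config, history data under /data.
  printf 'docker run -it -p %s:8080 -v %s:/kube/config -v %s:/data -e KUBECONFIG=/kube/config sloopimage/sloop\n' \
    "$host_port" "$kubeconfig" "$data_dir"
}

# Print the command with a custom data directory and host port.
sloop_docker_cmd "$HOME/.kube/config" /var/lib/sloop 9090
```

Printing rather than executing keeps the helper safe to try on a machine without Docker; paste the printed line into the shell once it looks right.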

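In the sidebar we can choose the time range, namespace, and resource kind to display, and filter by keyword.

The detail view shows the details of an event.

Clicking details on the page shows the details of the resource object.

Clicking the debug menu at the top of the page opens the debug page, where metrics can be viewed.

We can also configure the default view shown when the UI opens. Sloop provides the following options: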

[root@dev-tools sloop]# docker run --rm -it -p 8080:8080 -v ~/.kube/config:/kube/config  -e KUBECONFIG=/kube/config sloop  sloop -h
Usage of configFileOnly:
  -alsologtostderr
        log to standard error as well as files
  -apiserver-host string
        Kubernetes API server endpoint
  -badger-detail-log-enabled
        Turns on detailed logging of BadgerDB
  -badger-discard-ratio float
        Badger value log GC uses this value to decide if it wants to compact a vlog file. The lower the value of discardRatio the higher the number of !badger!move keys. And thus more the number of !badger!move keys, the size on disk keeps on increasing over time.
  -badger-enable-event-logging
        Turns on badger event logging
  -badger-keep-l0-in-memory
        Keeps all level 0 tables in memory for faster writes and compactions
  -badger-level-one-size int
        The maximum total size for Level 1.  0 = use badger default
  -badger-level-size-multiplier int
        The ratio between the maximum sizes of contiguous levels in the LSM.  0 = use badger default
  -badger-max-table-size int
        Max LSM table size in bytes.  0 = use badger default
  -badger-number-of-compactors int
        Number of compactors for badger
  -badger-number-of-level-zero-tables int
        Number of level zero tables for badger
  -badger-number-of-zero-tables-stall int
        Number of Level 0 tables that once reached causes the DB to stall until compaction succeeds
  -badger-sync-writes
        Sync Writes ensures writes are synced to disk if set to true
  -badger-use-lsm-only-options
        Sets a higher valueThreshold so values would be collocated with LSM tree reducing vlog disk usage
  -badger-vlog-file-size int
        Max size in bytes per value log file. 0 = use badger default
  -badger-vlog-fileIO-mapping
        Indicates which file loading mode should be used for the value log data, in memory constrained environments the value is recommended to be true
  -badger-vlog-gc-freq duration
        Frequency of running badger's ValueLogGC
  -badger-vlog-max-entries uint
        Max number of entries per value log files. 0 = use badger default
  -badger-vlog-truncate
        Truncate value log if badger db offset is different from badger db size
  -bind-address string
        Web server bind ip address.
  -cleanup-frequency duration
        Frequency between subsequent runs for the database cleanup
  -config string
        Path to a yaml or json config file
  -context string
        Use a specific kubernetes context
  -crd-refresh-interval duration
        Frequency between CRD Informer refresh
  -default-kind string
        Default UX filter kind
  -default-lookback string
        Default UX filter lookback
  -default-namespace string
        Default UX filter namespace
  -deletion-batch-size int
        Size of batch for deletion
  -disable-kube-watch
        Turn off kubernetes watch
  -disable-store-manager
        Turn off store manager which is to clean up database
  -display-context string
        Use this to override the display context.  When running in k8s the context is empty string.  This lets you override that (mainly useful if you are running many copies of sloop on different clusters) 
  -enable-delete-keys
        Use delete prefixes instead of dropPrefix for GC
  -gc-threshold float
        Threshold for GC to start garbage collecting
  -keep-minor-node-updates
        Keep all node updates even if change is only condition timestamps
  -kube-watch-resync-interval duration
        OPTIONAL: Kubernetes watch resync interval
  -log_backtrace_at string
        when logging hits line file:N, emit a stack trace
  -logtostderr
        log to standard error instead of files
  -max-disk-mb int
        Max disk storage in MB
  -max-look-back duration
        Max history data to keep
  -playback-file string
        Read watch data from a playback file
  -port int
        Web server port
  -record-file string
        Record watch data to a playback file
  -restore-database-file string
        Restore database from backup file into current context.
  -stderrthreshold int
        logs at or above this threshold go to stderr
  -store-root string
        Path to store history data
  -use-mock-badger
        Use a fake in-memory mock of badger
  -v int
        log level for V logs
  -vmodule string
        comma-separated list of pattern=N settings for file-filtered logging
  -watch-crds
        Watch for activity for CRDs
  -web-files-path string
        Path to web files
Failed to pre-parse flags looking for config file: flag: help requested
ERROR: logging before flag.Parse: I0509 10:51:23.862730       1 config.go:256] Default config set
Usage of sloop:
  -alsologtostderr
        log to standard error as well as files
  -apiserver-host string
        Kubernetes API server endpoint
  -badger-detail-log-enabled
        Turns on detailed logging of BadgerDB
  -badger-discard-ratio float
        Badger value log GC uses this value to decide if it wants to compact a vlog file. The lower the value of discardRatio the higher the number of !badger!move keys. And thus more the number of !badger!move keys, the size on disk keeps on increasing over time. (default 0.99)
  -badger-enable-event-logging
        Turns on badger event logging
  -badger-keep-l0-in-memory
        Keeps all level 0 tables in memory for faster writes and compactions (default true)
  -badger-level-one-size int
        The maximum total size for Level 1.  0 = use badger default
  -badger-level-size-multiplier int
        The ratio between the maximum sizes of contiguous levels in the LSM.  0 = use badger default
  -badger-max-table-size int
        Max LSM table size in bytes.  0 = use badger default
  -badger-number-of-compactors int
        Number of compactors for badger
  -badger-number-of-level-zero-tables int
        Number of level zero tables for badger
  -badger-number-of-zero-tables-stall int
        Number of Level 0 tables that once reached causes the DB to stall until compaction succeeds
  -badger-sync-writes
        Sync Writes ensures writes are synced to disk if set to true (default true)
  -badger-use-lsm-only-options
        Sets a higher valueThreshold so values would be collocated with LSM tree reducing vlog disk usage (default true)
  -badger-vlog-file-size int
        Max size in bytes per value log file. 0 = use badger default
  -badger-vlog-fileIO-mapping
        Indicates which file loading mode should be used for the value log data, in memory constrained environments the value is recommended to be true
  -badger-vlog-gc-freq duration
        Frequency of running badger's ValueLogGC (default 1m0s)
  -badger-vlog-max-entries uint
        Max number of entries per value log files. 0 = use badger default (default 200000)
  -badger-vlog-truncate
        Truncate value log if badger db offset is different from badger db size (default true)
  -bind-address string
        Web server bind ip address.
  -cleanup-frequency duration
        Frequency between subsequent runs for the database cleanup (default 30m0s)
  -config string
        Path to a yaml or json config file
  -context string
        Use a specific kubernetes context
  -cpuprofile string
        write profile to file
  -crd-refresh-interval duration
        Frequency between CRD Informer refresh (default 5m0s)
  -default-kind string
        Default UX filter kind (default "_all")
  -default-lookback string
        Default UX filter lookback (default "1h")
  -default-namespace string
        Default UX filter namespace (default "default")
  -deletion-batch-size int
        Size of batch for deletion (default 1000)
  -disable-kube-watch
        Turn off kubernetes watch
  -disable-store-manager
        Turn off store manager which is to clean up database
  -display-context string
        Use this to override the display context.  When running in k8s the context is empty string.  This lets you override that (mainly useful if you are running many copies of sloop on different clusters) 
  -enable-delete-keys
        Use delete prefixes instead of dropPrefix for GC
  -gc-threshold float
        Threshold for GC to start garbage collecting (default 0.8)
  -keep-minor-node-updates
        Keep all node updates even if change is only condition timestamps
  -kube-watch-resync-interval duration
        OPTIONAL: Kubernetes watch resync interval (default 30m0s)
  -log_backtrace_at value
        when logging hits line file:N, emit a stack trace
  -log_dir string
        If non-empty, write log files in this directory
  -logtostderr
        log to standard error instead of files
  -max-disk-mb int
        Max disk storage in MB (default 32768)
  -max-look-back duration
        Max history data to keep (default 336h0m0s)
  -playback-file string
        Read watch data from a playback file
  -port int
        Web server port (default 8080)
  -record-file string
        Record watch data to a playback file
  -restore-database-file string
        Restore database from backup file into current context.
  -stderrthreshold value
        logs at or above this threshold go to stderr
  -store-root string
        Path to store history data (default "./data")
  -use-mock-badger
        Use a fake in-memory mock of badger
  -v value
        log level for V logs
  -vmodule value
        comma-separated list of pattern=N settings for file-filtered logging
  -watch-crds
        Watch for activity for CRDs (default true)
  -web-files-path string
        Path to web files (default "./pkg/sloop/webserver/webfiles")
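
Among the flags listed above, -max-look-back and -max-disk-mb bound how much history Sloop keeps (the defaults are 336h0m0s, which is 14 days, and 32768 MB). As a rough sketch, the helpers below convert a retention period in days into an hour-based duration string and assemble the two arguments. The function names are hypothetical; only the flag names and defaults come from the help output.

```shell
# days_to_lookback: convert whole days to an hour-based duration string (e.g. 14 -> 336h).
days_to_lookback() {
  printf '%sh\n' "$(( $1 * 24 ))"
}

# sloop_retention_flags: hypothetical helper.
# Arguments (optional): days of history to keep, disk cap in MB.
sloop_retention_flags() {
  local days="${1:-14}"
  local max_disk_mb="${2:-32768}"
  printf '%s %s\n' \
    "-max-look-back=$(days_to_lookback "$days")" \
    "-max-disk-mb=$max_disk_mb"
}

# Keep 7 days of history and cap storage at 10240 MB.
sloop_retention_flags 7 10240
```

The printed flags can be appended after the sloop binary name in the docker run command.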

Changing the default namespace, resource kind, and lookback window

docker run --rm -it -p 8080:8080 -v ~/.kube/config:/kube/config  -e KUBECONFIG=/kube/config sloop  sloop -default-namespace=kube-system -default-kind=pod  -default-lookback=2h
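
When the same wrapper serves several clusters, these three flags can be assembled from environment variables. The sketch below is one possible way to script that; the SLOOP_* variable names and the function name are hypothetical, and the fallback values mirror the defaults shown in the help output.

```shell
# sloop_ui_flags: hypothetical helper that builds the default-filter flags.
# Reads SLOOP_NAMESPACE, SLOOP_KIND, and SLOOP_LOOKBACK (variable names are assumptions).
sloop_ui_flags() {
  local ns="${SLOOP_NAMESPACE:-default}"
  local kind="${SLOOP_KIND:-_all}"
  local lookback="${SLOOP_LOOKBACK:-1h}"
  printf '%s %s %s\n' \
    "-default-namespace=$ns" \
    "-default-kind=$kind" \
    "-default-lookback=$lookback"
}

# Reproduce the flags used in the command above.
SLOOP_NAMESPACE=kube-system SLOOP_KIND=pod SLOOP_LOOKBACK=2h sloop_ui_flags
```

With no variables set, the helper prints the same defaults Sloop would use on its own, so it is safe to include unconditionally.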

Installing from source

mkdir -p $GOPATH/src/github.com/salesforce
cd $GOPATH/src/github.com/salesforce
git clone https://github.com/salesforce/sloop.git
cd sloop
go env -w GO111MODULE=auto
make
$GOPATH/bin/sloop
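
The steps above call git, go, and make, so all three must be on the PATH. A small pre-flight check, sketched here with a hypothetical missing_tools helper, can catch a missing prerequisite before the clone starts.

```shell
# missing_tools: print the name of every listed tool that is not on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
  local rc=0
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Check the tools used by the build steps above.
if missing_tools git go make; then
  echo "all build tools found"
else
  echo "install the tools listed above before building" >&2
fi
```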

Installing with Helm

git clone https://github.com/salesforce/sloop.git
cd sloop/helm/sloop
kubectl create namespace sloop
helm template . --namespace sloop > sloop-test.yaml
kubectl -n sloop apply -f sloop-test.yaml
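
Run as written, these steps fail on a second attempt because kubectl create namespace returns an error when the namespace already exists. A minimal sketch of a rerunnable wrapper follows. The function name deploy_sloop and the final get pods check are additions for illustration, and the function assumes the current directory is the chart directory helm/sloop.

```shell
# deploy_sloop: hypothetical wrapper around the Helm steps; run it from helm/sloop.
# Arguments (optional): namespace, path for the rendered manifest.
deploy_sloop() {
  local ns="${1:-sloop}"
  local manifest="${2:-sloop-test.yaml}"

  # Create the namespace only when it does not exist yet.
  kubectl get namespace "$ns" >/dev/null 2>&1 || kubectl create namespace "$ns" || return 1

  # Render the chart locally, then apply the rendered manifest.
  helm template . --namespace "$ns" > "$manifest" || return 1
  kubectl -n "$ns" apply -f "$manifest" || return 1

  # List the pods so a failed rollout is visible right away.
  kubectl -n "$ns" get pods
}
```

Calling deploy_sloop with no arguments uses the sloop namespace and the sloop-test.yaml file name from the steps above.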

Reference: https://github.com/salesforce/sloop.git