Velero: a powerful tool for backing up and migrating Kubernetes


2021-04-19 14:58


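Have you ever run into situations like these while operating a Kubernetes cluster?

A newcomer clicks delete on a namespace, every resource under it is lost, and everything has to be redeployed step by step. Or you build a new cluster and, to keep the environments as consistent as possible, pull the YAML files out of the old cluster and frantically apply them in the new one. Each maddening moment is followed by hours of tedious grunt work.

Many tools for backing up cluster resource objects are now available, several of them open source. Put them to use and you get twice the result for half the effort, with no more miserable overtime.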


1

Cluster backup

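1. etcd backup

Backing up etcd backs up the whole Kubernetes cluster, but such a backup is global: it can return the cluster to its state at a given point in time, yet it cannot restore one specific resource object. Backup and restore are usually done with snapshots.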
    #!/usr/bin/env bash
    # Backup
    date
    CACERT="/opt/kubernetes/ssl/ca.pem"
    CERT="/opt/kubernetes/ssl/server.pem"
    KEY="/opt/kubernetes/ssl/server-key.pem"
    ENDPOINTS="192.168.1.36:2379"

    ETCDCTL_API=3 etcdctl \
      --cacert="${CACERT}" --cert="${CERT}" --key="${KEY}" \
      --endpoints="${ENDPOINTS}" \
      snapshot save "/data/etcd_backup_dir/etcd-snapshot-$(date +%Y%m%d).db"

    # Keep backups for 30 days (the pattern is quoted so the shell does not expand it)
    find /data/etcd_backup_dir/ -name '*.db' -mtime +30 -exec rm -f {} \;

    # Restore
    ETCDCTL_API=3 etcdctl snapshot restore /data/etcd_backup_dir/etcd-snapshot-20191222.db \
      --name etcd-0 \
      --initial-cluster "etcd-0=https://192.168.1.36:2380,etcd-1=https://192.168.1.37:2380,etcd-2=https://192.168.1.38:2380" \
      --initial-cluster-token etcd-cluster \
      --initial-advertise-peer-urls https://192.168.1.36:2380 \
      --data-dir=/var/lib/etcd/default.etcd
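
The retention step in the backup script can also be expressed in code. The following is a minimal Go sketch, not part of the original script: it selects snapshots by the date embedded in the etcd-snapshot-YYYYMMDD.db file name, whereas `find -mtime` looks at the file's modification time. The function names `snapshotDate` and `ExpiredSnapshots` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// dateLayout matches the %Y%m%d stamp used in the snapshot file names.
const dateLayout = "20060102"

// snapshotDate extracts the date from a name like etcd-snapshot-20210307.db.
// The second return value is false when the name does not follow that pattern.
func snapshotDate(name string) (time.Time, bool) {
	const prefix, suffix = "etcd-snapshot-", ".db"
	if !strings.HasPrefix(name, prefix) || !strings.HasSuffix(name, suffix) {
		return time.Time{}, false
	}
	stamp := strings.TrimSuffix(strings.TrimPrefix(name, prefix), suffix)
	day, err := time.Parse(dateLayout, stamp)
	if err != nil {
		return time.Time{}, false
	}
	return day, true
}

// ExpiredSnapshots returns, in sorted order, the snapshot names dated more
// than keepDays days before today (today is given as YYYYMMDD).
func ExpiredSnapshots(names []string, today string, keepDays int) ([]string, error) {
	now, err := time.Parse(dateLayout, today)
	if err != nil {
		return nil, fmt.Errorf("parse today %q: %w", today, err)
	}
	cutoff := now.AddDate(0, 0, -keepDays)
	var expired []string
	for _, name := range names {
		if day, ok := snapshotDate(name); ok && day.Before(cutoff) {
			expired = append(expired, name)
		}
	}
	sort.Strings(expired)
	return expired, nil
}

func main() {
	names := []string{
		"etcd-snapshot-20210101.db",
		"etcd-snapshot-20210301.db",
		"notes.txt", // ignored: does not match the naming pattern
	}
	expired, err := ExpiredSnapshots(names, "20210307", 30)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("would delete:", expired)
}
```

Selecting by file name keeps the decision independent of anything that might touch the file's timestamps, at the cost of trusting the naming convention.
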
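2. Resource object backup
Finer-grained backups, down to individual resource objects, are useful when a namespace or deployment is deleted by mistake, and for cluster migration. Several tools now provide this, for example Velero, PX-Backup, and Kasten.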

Velero:

Velero is an open source tool to safely backup and restore, perform disaster recovery, and migrate Kubernetes cluster resources and persistent volumes.
PX-Backup:
Built from the ground up for Kubernetes, PX-Backup delivers enterprise-grade application and data protection with fast recovery at the click of a button.
Kasten:
Purpose-built for Kubernetes, Kasten K10 provides enterprise operations teams an easy-to-use, scalable, and secure system for backup/restore, disaster recovery, and mobility of Kubernetes applications.

2

velero

Velero lets you:
1. Take backups of your cluster and restore in case of loss.
2. Migrate cluster resources to other clusters.
3. Replicate your production cluster to development and testing clusters.

Velero's own introduction lists the three capabilities above, which come down to backup, restore, and migration.

 

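1. Installation
Velero can be installed in several ways, including the velero CLI, Helm, and YAML manifests. For example:

[Figure: output of the installation command]

The output shows that many CRDs are created and that the application finally starts in the velero namespace. The CRD names alone give a rough idea of what Velero can be used for.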

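2. Scheduled backups

For operations engineers, guaranteeing the stability of a cluster that is offered to others is essential, and that calls for scheduled backups. A single command creates a schedule and specifies which namespaces to back up, how long the backup data is kept, and how often a backup runs.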

    Examples:
      # Create a backup every 6 hours
      velero create schedule NAME --schedule="0 */6 * * *"

      # Create a backup every 6 hours with the @every notation
      velero create schedule NAME --schedule="@every 6h"

      # Create a daily backup of the web namespace
      velero create schedule NAME --schedule="@every 24h" --include-namespaces web

      # Create a weekly backup, each living for 90 days (2160 hours)
      velero create schedule NAME --schedule="@every 168h" --ttl 2160h0m0s

    velero create schedule 360cloud --schedule="@every 24h" --ttl 2160h0m0s
    Schedule "360cloud" created successfully.
    [root@xxxxx ~]# kubectl get schedules --all-namespaces
    NAMESPACE   NAME       AGE
    velero      360cloud   40s
    [root@xxxxx ~]# kubectl get schedules -n velero 360cloud -o yaml
    apiVersion: velero.io/v1
    kind: Schedule
    metadata:
      generation: 3
      name: 360cloud
      namespace: velero
      resourceVersion: "18164238"
      selfLink: /apis/velero.io/v1/namespaces/velero/schedules/360cloud
      uid: 7c04af34-1529-4b48-a3d1-d2f5e98de328
    spec:
      schedule: '@every 24h'
      template:
        hooks: {}
        includedNamespaces:
        - '*'
        ttl: 2160h0m0s
    status:
      lastBackup: "2021-03-07T08:18:49Z"
      phase: Enabled
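
To see how the schedule interval and the TTL interact, here is a small Go sketch. It is an illustration only, not Velero code: `ParseEvery`, `Expiration`, and `RetainedBackups` are hypothetical helpers, and only the `@every` notation is handled, not five-field cron expressions.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// ParseEvery converts a schedule such as "@every 24h" into a duration.
// Standard five-field cron expressions are deliberately not handled here.
func ParseEvery(schedule string) (time.Duration, error) {
	const prefix = "@every "
	if !strings.HasPrefix(schedule, prefix) {
		return 0, fmt.Errorf("unsupported schedule %q", schedule)
	}
	return time.ParseDuration(strings.TrimSpace(strings.TrimPrefix(schedule, prefix)))
}

// Expiration adds a TTL such as "2160h0m0s" to an RFC 3339 start time.
func Expiration(started, ttl string) (string, error) {
	start, err := time.Parse(time.RFC3339, started)
	if err != nil {
		return "", err
	}
	life, err := time.ParseDuration(ttl)
	if err != nil {
		return "", err
	}
	return start.Add(life).Format(time.RFC3339), nil
}

// RetainedBackups estimates how many backups exist at steady state:
// one is created per interval and each one lives for the TTL.
func RetainedBackups(schedule, ttl string) (int, error) {
	every, err := ParseEvery(schedule)
	if err != nil {
		return 0, err
	}
	if every <= 0 {
		return 0, fmt.Errorf("interval must be positive, got %s", every)
	}
	life, err := time.ParseDuration(ttl)
	if err != nil {
		return 0, err
	}
	return int(life / every), nil
}

func main() {
	n, err := RetainedBackups("@every 24h", "2160h0m0s")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("backups kept at steady state:", n)

	exp, err := Expiration("2021-03-07T08:18:49Z", "2160h0m0s")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("a backup taken at lastBackup expires at:", exp)
}
```

With a 24h interval and a 2160h TTL, roughly 90 backups accumulate in object storage before the oldest ones start to expire, which is worth keeping in mind when sizing the bucket.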

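3. Backup for cluster migration

The resource objects to be migrated may not be covered by any scheduled backup, or a scheduled backup may exist but you want the latest data. In that case, take a one-off backup and migrate from it.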

    velero backup create test01 --include-namespaces default
    Backup request "test01" submitted successfully.
    Run `velero backup describe test01` or `velero backup logs test01` for more details.
    [root@xxxxx ~]# velero backup describe test01
    Name:         test01
    Namespace:    velero
    Labels:       velero.io/storage-location=default
    Annotations:  velero.io/source-cluster-k8s-gitversion=v1.19.7
                  velero.io/source-cluster-k8s-major-version=1
                  velero.io/source-cluster-k8s-minor-version=19

    Phase:  InProgress

    Errors:    0
    Warnings:  0

    Namespaces:
      Included:  default
      Excluded:  <none>

    Resources:
      Included:        *
      Excluded:        <none>
      Cluster-scoped:  auto

    Label selector:  <none>

    Storage Location:  default

    Velero-Native Snapshot PVs:  auto

    TTL:  720h0m0s

    Hooks:  <none>

    Backup Format Version:  1.1.0

    Started:    2021-03-07 16:44:52 +0800 CST
    Completed:  <n/a>

    Expiration:  2021-04-06 16:44:52 +0800 CST

    Velero-Native Snapshots: <none included>
After the backup is created, `velero backup describe` and `velero backup logs` show more detailed information.
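
The Namespaces and Resources sections of the describe output list included and excluded names, with `*` standing for everything. A simplified way to think about that filter is sketched below in Go. This is an assumption-laden illustration rather than Velero's implementation; `ShouldInclude` and `contains` are hypothetical names.

```go
package main

import "fmt"

// contains reports whether list holds value or the wildcard "*".
func contains(list []string, value string) bool {
	for _, item := range list {
		if item == "*" || item == value {
			return true
		}
	}
	return false
}

// ShouldInclude applies include/exclude lists to a single name.
// Excludes win over includes, and an empty include list means "everything".
func ShouldInclude(name string, includes, excludes []string) bool {
	if contains(excludes, name) {
		return false
	}
	return len(includes) == 0 || contains(includes, name)
}

func main() {
	// Mirrors the test01 backup: only the default namespace is included.
	for _, ns := range []string{"default", "kube-system", "velero"} {
		fmt.Printf("%-12s included=%v\n", ns, ShouldInclude(ns, []string{"default"}, nil))
	}
}
```
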
Running a restore in another cluster then recovers the data there.
    [root@xxxxx ~]# velero restore create --from-backup test01
    Restore request "test01-20210307164809" submitted successfully.
    Run `velero restore describe test01-20210307164809` or `velero restore logs test01-20210307164809` for more details.
    [root@xxxxx ~]# kubectl get pod
    NAME                     READY   STATUS              RESTARTS   AGE
    nginx-6799fc88d8-4bnfg   0/1     ContainerCreating   0          6s
    nginx-6799fc88d8-cq82j   0/1     ContainerCreating   0          6s
    nginx-6799fc88d8-f6qsx   0/1     ContainerCreating   0          6s
    nginx-6799fc88d8-gq2xt   0/1     ContainerCreating   0          6s
    nginx-6799fc88d8-j5fc7   0/1     ContainerCreating   0          6s
    nginx-6799fc88d8-kvvx6   0/1     ContainerCreating   0          5s
    nginx-6799fc88d8-pccc4   0/1     ContainerCreating   0          5s
    nginx-6799fc88d8-q2fnt   0/1     ContainerCreating   0          4s
    nginx-6799fc88d8-r9dqn   0/1     ContainerCreating   0          4s
    nginx-6799fc88d8-zqv6v   0/1     ContainerCreating   0          4s

[Figure: the backup's storage records in S3]

The restore is complete.


3

Backing up and migrating PVCs

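For storage backed by Amazon EBS Volumes, Azure Managed Disks, or Google Persistent Disks, Velero can take snapshots of the PVs as part of a backup.
Other storage types can be backed up through plugins; the command below enables the restic integration.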

velero install --use-restic

    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        backup.velero.io/backup-volumes: mypvc
      name: rbd-test
    spec:
      containers:
      - name: web-server
        image: nginx
        volumeMounts:
        - name: mypvc
          mountPath: /var/lib/www/html
      volumes:
      - name: mypvc
        persistentVolumeClaim:
          claimName: rbd-pvc-zhf
          readOnly: false

With either the opt-in or the opt-out approach, you annotate a pod to select which of its volumes are backed up.
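
The difference between the two approaches can be sketched in Go as follows. This is a simplified illustration, not Velero's code: it assumes the opt-in annotation `backup.velero.io/backup-volumes` shown in the pod above and the opt-out annotation `backup.velero.io/backup-volumes-excludes`, both holding comma-separated volume names, and it ignores the volume types that Velero skips on its own. `VolumesToBackup` and `splitList` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	// Annotation names assumed for the opt-in and opt-out approaches.
	optInAnnotation  = "backup.velero.io/backup-volumes"
	optOutAnnotation = "backup.velero.io/backup-volumes-excludes"
)

// splitList turns "a, b,c" into ["a" "b" "c"], skipping empty entries.
func splitList(value string) []string {
	var out []string
	for _, part := range strings.Split(value, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// VolumesToBackup returns the pod volumes selected for backup.
// With optOut false, only volumes named in the opt-in annotation are kept;
// with optOut true, every volume is kept unless the opt-out annotation names it.
func VolumesToBackup(annotations map[string]string, podVolumes []string, optOut bool) []string {
	key := optInAnnotation
	if optOut {
		key = optOutAnnotation
	}
	listed := make(map[string]bool)
	for _, name := range splitList(annotations[key]) {
		listed[name] = true
	}
	var selected []string
	for _, name := range podVolumes {
		// Opt-in keeps listed volumes; opt-out keeps unlisted ones.
		if listed[name] != optOut {
			selected = append(selected, name)
		}
	}
	return selected
}

func main() {
	volumes := []string{"mypvc", "cache"}
	optIn := map[string]string{optInAnnotation: "mypvc"}
	optOut := map[string]string{optOutAnnotation: "cache"}
	fmt.Println("opt-in: ", VolumesToBackup(optIn, volumes, false))
	fmt.Println("opt-out:", VolumesToBackup(optOut, volumes, true))
}
```

Opt-in is the safer default when most volumes hold disposable data; opt-out suits clusters where nearly everything should be backed up.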

    velero backup create testpvc05 --snapshot-volumes=true --include-namespaces default
    Backup request "testpvc05" submitted successfully.
    Run `velero backup describe testpvc05` or `velero backup logs testpvc05` for more details.
    [root@xxxx ceph]# velero backup describe testpvc05
    Name:         testpvc05
    Namespace:    velero
    Labels:       velero.io/storage-location=default
    Annotations:  velero.io/source-cluster-k8s-gitversion=v1.19.7
                  velero.io/source-cluster-k8s-major-version=1
                  velero.io/source-cluster-k8s-minor-version=19

    Phase:  Completed

    Errors:    0
    Warnings:  0

    Namespaces:
      Included:  default
      Excluded:  <none>

    Resources:
      Included:        *
      Excluded:        <none>
      Cluster-scoped:  auto

    Label selector:  <none>

    Storage Location:  default

    Velero-Native Snapshot PVs:  true

    TTL:  720h0m0s

    Hooks:  <none>

    Backup Format Version:  1.1.0

    Started:    2021-03-10 15:11:26 +0800 CST
    Completed:  2021-03-10 15:11:36 +0800 CST

    Expiration:  2021-04-09 15:11:26 +0800 CST

    Total items to be backed up:  92
    Items backed up:              92

    Velero-Native Snapshots: <none included>

    Restic Backups (specify --details for more information):
      Completed:  1
Delete the pod and the PVC:
    [root@xxxxxx ceph]# kubectl delete pod rbd-test
    pod "rbd-test" deleted
    [root@p48453v ceph]# kubectl delete pvc rbd-pvc-zhf
    persistentvolumeclaim "rbd-pvc-zhf" deleted
Restore the resource objects:
    [root@xxxxx ceph]# velero restore create testpvc05 --restore-volumes=true --from-backup testpvc05
    Restore request "testpvc05" submitted successfully.
    Run `velero restore describe testpvc05` or `velero restore logs testpvc05` for more details.
    [root@xxxxxx ceph]# kubectl get pod
    NAME                     READY   STATUS     RESTARTS   AGE
    nginx-6799fc88d8-4bnfg   1/1     Running    0          2d22h
    rbd-test                 0/1     Init:0/1   0          6s

Check that the data has been restored:

    [root@xxxxxx ceph]# kubectl exec rbd-test sh -- ls -l /var/lib/www/html
    total 20
    drwx------ 2 root root 16384 Mar 10 06:31 lost+found
    -rw-r--r-- 1 root root    13 Mar 10 07:11 zheng.txt
    [root@xxxxxx ceph]# kubectl exec rbd-test sh -- cat /var/lib/www/html/zheng.txt
    zhenghongfei


4

HOOK

Velero supports executing commands in the containers of a pod during a backup.
    metadata:
      name: nginx-deployment
      namespace: nginx-example
    spec:
      replicas:
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
          annotations:
            pre.hook.backup.velero.io/container: fsfreeze
            pre.hook.backup.velero.io/command: '["/sbin/fsfreeze", "--freeze", "/var/log/nginx"]'
            post.hook.backup.velero.io/container: fsfreeze
            post.hook.backup.velero.io/command: '["/sbin/fsfreeze", "--unfreeze", "/var/log/nginx"]'

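This example uses a pre hook and a post hook to freeze and unfreeze the file system. Freezing the file system helps ensure that all pending disk I/O operations have completed before the snapshot is taken.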
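
The hook command annotation carries its argv as a JSON array inside a string. A minimal Go sketch of decoding such a value is shown below. It is an assumption about how the value can be interpreted, not Velero's parser; `ParseHookCommand` is a hypothetical name, and treating a non-array value as a single command is a choice made for this sketch.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// ParseHookCommand converts the value of a hook command annotation into argv.
// A value that starts with '[' is decoded as a JSON array of strings; any
// other value is treated as a single command with no arguments.
func ParseHookCommand(value string) ([]string, error) {
	value = strings.TrimSpace(value)
	if value == "" {
		return nil, errors.New("empty hook command")
	}
	if !strings.HasPrefix(value, "[") {
		return []string{value}, nil
	}
	var argv []string
	if err := json.Unmarshal([]byte(value), &argv); err != nil {
		return nil, fmt.Errorf("invalid JSON command %q: %w", value, err)
	}
	return argv, nil
}

func main() {
	pre := `["/sbin/fsfreeze", "--freeze", "/var/log/nginx"]`
	argv, err := ParseHookCommand(pre)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Each element becomes one argument; no shell is involved.
	for i, arg := range argv {
		fmt.Printf("argv[%d] = %s\n", i, arg)
	}
}
```

Because the command is an argv array rather than a shell line, pipes and redirections need an explicit shell, for example `["/bin/sh", "-c", "..."]`.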

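The same mechanism can of course be used to back up MySQL or other files, but it is recommended only for backing up and restoring small files, on a per-pod basis.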

5

Exploring the backup implementation

First, Velero finds which resource objects need to be backed up:
    collector := &itemCollector{
        log:                   log,
        backupRequest:         backupRequest,
        discoveryHelper:       kb.discoveryHelper,
        dynamicFactory:        kb.dynamicFactory,
        cohabitatingResources: cohabitatingResources(),
        dir:                   tempDir,
    }
    items := collector.getAllItems()

It then calls the backup function for each item:

    func (kb *kubernetesBackupper) backupItem(log logrus.FieldLogger, gr schema.GroupResource, itemBackupper *itemBackupper, unstructured *unstructured.Unstructured, preferredGVR schema.GroupVersionResource) bool {
        backedUpItem, err := itemBackupper.backupItem(log, unstructured, gr, preferredGVR)
        if aggregate, ok := err.(kubeerrs.Aggregate); ok {
            log.WithField("name", unstructured.GetName()).Infof("%d errors encountered backup up item", len(aggregate.Errors()))
            // log each error separately so we get error location info in the log, and an
            // accurate count of errors
            for _, err = range aggregate.Errors() {
                log.WithError(err).WithField("name", unstructured.GetName()).Error("Error backing up item")
            }
            return false
        }
        if err != nil {
            log.WithError(err).WithField("name", unstructured.GetName()).Error("Error backing up item")
            return false
        }
        return backedUpItem
    }
The corresponding resource is fetched through a client obtained from the dynamic factory:
    client, err := ib.dynamicFactory.ClientForGroupVersionResource(gvr.GroupVersion(), resource, additionalItem.Namespace)
    if err != nil {
        return nil, err
    }
    item, err := client.Get(additionalItem.Name, metav1.GetOptions{})
Finally, the data is written to a file in the backup archive:
    log.Debugf("Resource %s/%s, version= %s, preferredVersion=%s", groupResource.String(), name, version, preferredVersion)
    if version == preferredVersion {
        if namespace != "" {
            filePath = filepath.Join(velerov1api.ResourcesDir, groupResource.String(), velerov1api.NamespaceScopedDir, namespace, name+".json")
        } else {
            filePath = filepath.Join(velerov1api.ResourcesDir, groupResource.String(), velerov1api.ClusterScopedDir, name+".json")
        }
        hdr = &tar.Header{
            Name:     filePath,
            Size:     int64(len(itemBytes)),
            Typeflag: tar.TypeReg,
            Mode:     0755,
            ModTime:  time.Now(),
        }
        if err := ib.tarWriter.WriteHeader(hdr); err != nil {
            return false, errors.WithStack(err)
        }
        if _, err := ib.tarWriter.Write(itemBytes); err != nil {
            return false, errors.WithStack(err)
        }
    }
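
The archive layout in the excerpt above can be reproduced in a self-contained Go sketch. Everything here is illustrative rather than Velero's code: the directory names `resources`, `namespaces`, and `cluster` are assumed values for `ResourcesDir`, `NamespaceScopedDir`, and `ClusterScopedDir`, the `Item` type and the function names are hypothetical, and the archive is kept in memory with no compression.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"path"
	"time"
)

// Directory names assumed to match Velero's ResourcesDir,
// NamespaceScopedDir and ClusterScopedDir constants.
const (
	resourcesDir       = "resources"
	namespaceScopedDir = "namespaces"
	clusterScopedDir   = "cluster"
)

// Item is one serialized resource (hypothetical type for this sketch).
type Item struct {
	GroupResource, Namespace, Name string
	JSON                           []byte
}

// ItemPath returns the archive path for an item; an empty namespace marks
// a cluster-scoped resource.
func ItemPath(groupResource, namespace, name string) string {
	if namespace != "" {
		return path.Join(resourcesDir, groupResource, namespaceScopedDir, namespace, name+".json")
	}
	return path.Join(resourcesDir, groupResource, clusterScopedDir, name+".json")
}

// BuildArchive writes every item into an in-memory tar archive.
func BuildArchive(items []Item) ([]byte, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, it := range items {
		hdr := &tar.Header{
			Name:     ItemPath(it.GroupResource, it.Namespace, it.Name),
			Size:     int64(len(it.JSON)),
			Typeflag: tar.TypeReg,
			Mode:     0755,
			ModTime:  time.Now(),
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(it.JSON); err != nil {
			return nil, err
		}
	}
	// Close flushes the tar footer; without it the archive is truncated.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// ListArchive returns the entry names stored in a tar archive.
func ListArchive(data []byte) ([]string, error) {
	tr := tar.NewReader(bytes.NewReader(data))
	var names []string
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, hdr.Name)
	}
}

func main() {
	data, err := BuildArchive([]Item{
		{GroupResource: "deployments.apps", Namespace: "default", Name: "nginx", JSON: []byte(`{"kind":"Deployment"}`)},
		{GroupResource: "namespaces", Name: "default", JSON: []byte(`{"kind":"Namespace"}`)},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	names, err := ListArchive(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, n := range names {
		fmt.Println(n)
	}
}
```

The sketch uses `path.Join` rather than `filepath.Join` so that entry names use forward slashes on every platform.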

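6

Other backup tools

PX-Backup is a paid product; those willing to pay get more powerful features. Kanister leans more toward data-level backup and recovery, such as etcd snapshots, MongoDB, and so on.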

References:

https://github.com/vmware-tanzu/velero
https://portworx.com/
https://www.kasten.io/
https://github.com/kanisterio/kanister
https://duyanghao.github.io/kubernetes-ha-and-bur/
https://blog.kubernauts.io/backup-and-restore-of-kubernetes-applications-using-heptios-velerowith-restic-and-rook-ceph-as-2e8df15b1487
