Offline Deployment of a Highly Available ZooKeeper Ensemble with Helm

Zookeeper Helm Chart

This helm chart provides an implementation of the ZooKeeper StatefulSet found in Kubernetes Contrib Zookeeper StatefulSet.

Prerequisites

  • Kubernetes 1.6
  • PersistentVolume support on the underlying infrastructure
  • A dynamic provisioner for the PersistentVolumes
  • Familiarity with Apache ZooKeeper 3.4.x

Chart Components

This chart will do the following:

  • Create a fixed size ZooKeeper ensemble using a StatefulSet.
  • Create a PodDisruptionBudget so kubectl drain will respect the Quorum size of the ensemble.
  • Create a Headless Service to control the domain of the ZooKeeper ensemble.
  • Create a Service configured to connect to the available ZooKeeper instance on the configured client port.
  • Optionally, apply a Pod Anti-Affinity to spread the ZooKeeper ensemble across nodes.

Installing the Chart

You can install the chart with the release name myzk as below.

$ helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
$ helm install --name myzk incubator/zookeeper

If you do not specify a name, helm will select a name for you.

Installed Components

You can use kubectl get to view all of the installed components.

$ kubectl get all -l app=zookeeper
NAME                   READY     STATUS    RESTARTS   AGE
po/myzk-zookeeper-0   1/1       Running   0          2m
po/myzk-zookeeper-1   1/1       Running   0          1m
po/myzk-zookeeper-2   1/1       Running   0          52s

NAME                           CLUSTER-IP    EXTERNAL-IP   PORT(S)             AGE
svc/myzk-zookeeper            10.3.255.86   <none>        2181/TCP            2m
svc/myzk-zookeeper-headless   None          <none>        2888/TCP,3888/TCP   2m

NAME                           DESIRED   CURRENT   AGE
statefulsets/myzk-zookeeper   3         3         2m
  1. statefulsets/myzk-zookeeper is the StatefulSet created by the chart.
  2. po/myzk-zookeeper-<0|1|2> are the Pods created by the StatefulSet. Each Pod has a single container running a ZooKeeper server.
  3. svc/myzk-zookeeper-headless is the Headless Service used to control the network domain of the ZooKeeper ensemble.
  4. svc/myzk-zookeeper is a Service that can be used by clients to connect to an available ZooKeeper server.

Configuration

You can specify each parameter using the --set key=value[,key=value] argument to helm install.
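As a minimal sketch of the --set form, the helper below joins key=value pairs into the comma-separated string that the argument expects. The function name join_set_args is hypothetical and not part of the chart or of helm.

```shell
# Hypothetical helper: join key=value pairs with commas for --set.
join_set_args() {
  # "$*" joins the positional parameters with the first character of IFS;
  # the subshell keeps the IFS change from leaking to the caller.
  ( IFS=,; printf '%s\n' "$*" )
}
```

It could then be used as: helm install --name myzk --set "$(join_set_args servers=5 heap=4G)" incubator/zookeeper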

Alternatively, a YAML file that specifies the values for the parameters can be provided while installing the chart. For example,

$ helm install --name my-release -f values.yaml incubator/zookeeper

Tip: You can use the default values.yaml
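A values file for this chart might be generated as sketched below. The keys are the parameter names documented in the sections that follow; the assumption here is that they are top-level keys in values.yaml, and the helper name write_zk_values is hypothetical.

```shell
# Hypothetical helper: write a small values file for the chart.
# Assumes servers, heap, storage and antiAffinity are top-level keys.
write_zk_values() {
  # $1: output path; $2: number of servers (default 3); $3: JVM heap (default 2G)
  cat > "$1" <<EOF
servers: ${2:-3}
heap: ${3:-2G}
storage: 50Gi
antiAffinity: hard
EOF
}
```

After running write_zk_values values.yaml 5 4G, the file can be passed to helm install with -f values.yaml.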

Resources

The configuration parameters in this section control the resources requested and utilized by the ZooKeeper ensemble.

| Parameter | Description | Default |
|---|---|---|
| servers | The number of ZooKeeper servers. This should always be 1, 3, 5, or 7. | 3 |
| minAvailable | The minimum number of servers that must be available during evictions. This should be in the interval [(servers/2) + 1, (servers - 1)]. | servers-1 |
| resources.requests.cpu | The amount of CPU to request. As ZooKeeper is not very CPU intensive, 2 is a good choice to start with for a production deployment. | 500m |
| heap | The amount of JVM heap that the ZooKeeper servers will use. As ZooKeeper stores all of its data in memory, this value should reflect the size of your working set. The JVM -Xms/-Xmx format is used. | 2G |
| resources.requests.memory | The amount of memory to request. This value should be at least 2 GiB larger than heap to avoid swapping. You may want to use 1.5 * heap for values larger than 2 GiB. The Kubernetes format is used. | 2Gi |
| storage | The amount of storage to request. Even though ZooKeeper keeps its working set in memory, it logs all transactions, and periodically snapshots, to storage media. The amount of storage required will vary with your workload, working memory size, and log and snapshot retention policy. Note that on some cloud providers, selecting a small volume size will result in sub-par I/O performance. 250 GiB is a good place to start for production workloads. | 50Gi |
| storageClass | The storage class of the storage allocated for the ensemble. If this value is present, it will add an annotation asking the PV Provisioner for that storage class. | default |
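The interval given for minAvailable can be worked out with integer arithmetic. The sketch below prints the lower bound (the quorum, servers/2 + 1 with integer division) and the upper bound (servers - 1); the function name zk_min_available_range is hypothetical.

```shell
# Hypothetical helper: print the valid minAvailable interval as "low high".
zk_min_available_range() {
  servers=$1
  lo=$(( servers / 2 + 1 ))   # quorum: a strict majority of the ensemble
  hi=$(( servers - 1 ))       # leaves room for at least one eviction
  if [ "$lo" -gt "$hi" ]; then
    echo "no valid minAvailable for $servers server(s)" >&2
    return 1
  fi
  echo "$lo $hi"
}
```

For example, zk_min_available_range 5 prints "3 4" and zk_min_available_range 3 prints "2 2". With a single server the interval is empty, so no eviction can be allowed without losing the service.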

Network

These parameters control the network ports on which the ensemble communicates.

| Parameter | Description | Default |
|---|---|---|
| serverPort | The port on which the ZooKeeper servers listen for requests from other servers in the ensemble. | 2888 |
| leaderElectionPort | The port on which the ZooKeeper servers perform leader election. | 3888 |
| clientPort | The port on which the ZooKeeper servers listen for client requests. | 2181 |
| clientCnxns | The maximum number of simultaneous client connections that each server in the ensemble will allow. | 60 |
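To illustrate how the server and leader election ports relate to the Headless Service, the sketch below prints ZooKeeper server.N entries in the standard host:serverPort:leaderElectionPort form. The Pod and Service names follow the pattern visible in the kubectl output above; numbering servers as Pod ordinal + 1 is an assumption, and zk_server_lines is a hypothetical helper, not something the chart ships.

```shell
# Hypothetical helper: print server.N lines for an ensemble.
# Assumes Pods are named <release>-zookeeper-<i> behind the Service
# <release>-zookeeper-headless, and that server ids are ordinal + 1.
zk_server_lines() {
  release=$1
  servers=${2:-3}
  server_port=${3:-2888}
  election_port=${4:-3888}
  i=0
  while [ "$i" -lt "$servers" ]; do
    printf 'server.%d=%s-zookeeper-%d.%s-zookeeper-headless:%s:%s\n' \
      "$((i + 1))" "$release" "$i" "$release" "$server_port" "$election_port"
    i=$((i + 1))
  done
}
```

Calling zk_server_lines myzk 3 prints three lines, the first being server.1=myzk-zookeeper-0.myzk-zookeeper-headless:2888:3888.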

Time

ZooKeeper uses the Zab protocol to replicate its state machine across the ensemble. The following parameters control the timeouts for the protocol.

| Parameter | Description | Default |
|---|---|---|
| tickTimeMs | The number of milliseconds in one ZooKeeper Tick. You might want to increase this value if the network latency is high or unpredictable in your environment. | 2000 |
| initTicks | The amount of time, in Ticks, that a follower is allowed to connect to and sync with a leader. Increase this value if the amount of data stored on the servers is large. | 10 |
| syncTicks | The amount of time, in Ticks, that a follower is allowed to lag behind a leader. If the follower is longer than syncTicks behind the leader, the follower is dropped. | 5 |
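Because initTicks and syncTicks are expressed in Ticks, the wall-clock limit is the tick count multiplied by tickTimeMs. A small sketch of that conversion follows; the function name ticks_to_ms is hypothetical.

```shell
# Hypothetical helper: convert a tick count to milliseconds.
ticks_to_ms() {
  # $1: number of ticks; $2: tickTimeMs
  echo $(( $1 * $2 ))
}
```

With the defaults, ticks_to_ms 10 2000 prints 20000 (a follower has 20 seconds to connect and sync) and ticks_to_ms 5 2000 prints 10000 (a follower may lag by at most 10 seconds).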

Log Retention

ZooKeeper writes its WAL (Write Ahead Log) and periodic snapshots to storage media. These parameters control the retention policy for snapshots and WAL segments. If you do not configure the ensemble to purge snapshots and logs automatically, it is important to implement such a mechanism yourself. Otherwise, you will eventually exhaust all available storage.

| Parameter | Description | Default |
|---|---|---|
| snapRetain | The number of snapshots to retain on disk. If purgeHours is set to 0 this parameter has no effect. | 3 |
| purgeHours | The amount of time, in hours, between ZooKeeper snapshot and log purges. Setting this to 0 will disable purges. | 1 |
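In ZooKeeper itself, automatic purging is configured through the autopurge.snapRetainCount and autopurge.purgeInterval settings. The sketch below shows how the two chart parameters would correspond to those settings; that the chart renders exactly these lines is an assumption, and zk_purge_cfg is a hypothetical helper.

```shell
# Hypothetical helper: print the ZooKeeper autopurge settings that
# correspond to the chart's snapRetain and purgeHours parameters.
zk_purge_cfg() {
  # $1: snapRetain (snapshots to keep); $2: purgeHours (0 disables purging)
  printf 'autopurge.snapRetainCount=%s\n' "$1"
  printf 'autopurge.purgeInterval=%s\n' "$2"
}
```

With the defaults, zk_purge_cfg 3 1 prints autopurge.snapRetainCount=3 followed by autopurge.purgeInterval=1.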

Spreading

Spreading allows you to specify an anti-affinity between ZooKeeper servers in the ensemble. This will prevent the Pods from being scheduled on the same node.

| Parameter | Description | Default |
|---|---|---|
| antiAffinity | If present, it must take the value 'hard' or 'soft'. 'hard' will cause the Kubernetes scheduler to not schedule the Pods on the same physical node under any circumstances. 'soft' will cause the Kubernetes scheduler to make a best effort to not co-locate the Pods, but if the only available resources are on the same node, the scheduler will co-locate them. | hard |
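In Kubernetes terms, 'hard' and 'soft' correspond to required and preferred Pod anti-affinity rules. The sketch below prints an illustrative stanza for each; the app: zookeeper label is taken from the kubectl selector used earlier, while the exact stanza the chart renders is an assumption and zk_anti_affinity is a hypothetical helper.

```shell
# Hypothetical helper: print an illustrative podAntiAffinity stanza.
zk_anti_affinity() {
  case "$1" in
    hard)
      # Required rule: the scheduler never co-locates matching Pods.
      cat <<'EOF'
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: zookeeper
        topologyKey: kubernetes.io/hostname
EOF
      ;;
    soft)
      # Preferred rule: the scheduler avoids co-location when it can.
      cat <<'EOF'
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: zookeeper
          topologyKey: kubernetes.io/hostname
EOF
      ;;
    *)
      echo "antiAffinity must be 'hard' or 'soft'" >&2
      return 1
      ;;
  esac
}
```

The topologyKey kubernetes.io/hostname makes the rule apply per node, which matches the description of spreading servers across nodes.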

Logging

In order to allow for the default installation to work well with the log rolling and retention policy of Kubernetes, all logs are written to stdout. This should also be compatible with logging integrations such as Google Cloud Logging and ELK.

| Parameter | Description | Default |
|---|---|---|
| logLevel | The log level of the ZooKeeper applications. One of ERROR, WARN, INFO, DEBUG. | INFO |

Liveness and Readiness

The servers in the ensemble have both liveness and readiness checks specified. These parameters can be used to tune the sensitivity of the liveness and readiness checks.

| Parameter | Description | Default |
|---|---|---|
| probeInitialDelaySeconds | The initial delay before the liveness and readiness probes will be invoked. | 15 |
| probeTimeoutSeconds | The amount of time before the probes are considered to be failed due to a timeout. | 5 |
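A common way to check a ZooKeeper server's health is the ruok four-letter command, to which a healthy server replies imok. The sketch below shows such a check; whether the chart's probes are implemented this way is an assumption, zk_ruok is a hypothetical helper, and nc must be available where it runs.

```shell
# Hypothetical helper: succeed if the server answers "imok" to "ruok".
zk_ruok() {
  # $1: host; $2: client port (default 2181)
  reply=$(echo ruok | nc "$1" "${2:-2181}")
  [ "$reply" = "imok" ]
}
```

For example, zk_ruok myzk-zookeeper-0.myzk-zookeeper-headless 2181 returns success only when the server replies imok.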

ImagePull

This parameter controls when the image is pulled from the repository.

| Parameter | Description | Default |
|---|---|---|
| imagePullPolicy | The policy for pulling the image from the repository. | Always |

Deep dive

Image Details

The image used for this chart is based on Ubuntu 16.04 LTS. This image is larger than Alpine or BusyBox, but it provides glibc, rather than uClibc or musl, and a JVM release that is built against it. You can easily convert this chart to run against a smaller image with a JVM that is built against that image's libc. However, as far as we know, no Hadoop vendor supports, or has verified, ZooKeeper running on such a JVM.

JVM Details

The Java Virtual Machine used for this chart is the OpenJDK JVM 8u111 JRE (headless).

ZooKeeper Details

The ZooKeeper version is the latest stable version (3.4.9). The distribution is installed into /opt/zookeeper-3.4.9. This directory is symbolically linked to /opt/zookeeper. Symlinks are created to simulate an RPM installation into /usr.

Failover

You can test failover by killing the leader. Insert a key:

$ kubectl exec <RELEASE-NAME>-zookeeper-0 -- /opt/zookeeper/bin/zkCli.sh create /foo bar;
$ kubectl exec <RELEASE-NAME>-zookeeper-2 -- /opt/zookeeper/bin/zkCli.sh get /foo;

Watch existing members:

$ kubectl run --attach bbox --image=busybox --restart=Never -- sh -c 'while true; do for i in 0 1 2; do echo zk-${i} $(echo stat | nc <pod-name>-${i}.<headless-service-name> 2181 | grep Mode); sleep 1; done; done';

zk-2 Mode: follower
zk-0 Mode: follower
zk-1 Mode: leader
zk-2 Mode: follower

Delete Pods and wait for the StatefulSet controller to bring them back up:

$ kubectl delete po -l app=zookeeper
$ kubectl get po --watch-only
NAME                READY     STATUS    RESTARTS   AGE
myzk-zookeeper-0   0/1       Running   0          35s
myzk-zookeeper-0   1/1       Running   0         50s
myzk-zookeeper-1   0/1       Pending   0         0s
myzk-zookeeper-1   0/1       Pending   0         0s
myzk-zookeeper-1   0/1       ContainerCreating   0         0s
myzk-zookeeper-1   0/1       Running   0         19s
myzk-zookeeper-1   1/1       Running   0         40s
myzk-zookeeper-2   0/1       Pending   0         0s
myzk-zookeeper-2   0/1       Pending   0         0s
myzk-zookeeper-2   0/1       ContainerCreating   0         0s
myzk-zookeeper-2   0/1       Running   0         19s
myzk-zookeeper-2   1/1       Running   0         41s

...

zk-0 Mode: follower
zk-1 Mode: leader
zk-2 Mode: follower

Check the previously inserted key:

$ kubectl exec myzk-zookeeper-1 -- /opt/zookeeper/bin/zkCli.sh get /foo
ionid = 0x354887858e80035, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
bar

Scaling

ZooKeeper cannot be safely scaled in versions prior to 3.5.x. There are manual procedures for scaling an ensemble, but as noted in the ZooKeeper 3.5.2 documentation, these procedures require a rolling restart, are known to be error prone, and often result in data loss.

While ZooKeeper 3.5.x does allow for dynamic ensemble reconfiguration (including scaling membership), the current status of the release is still alpha, and it is not recommended for production use.

Limitations

  • StatefulSet and PodDisruptionBudget are beta resources.
  • Only supports storage options that have backends for persistent volume claims.