Why Does Redis Cluster Use Exactly 16384 Slots?

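Yesterday I was discussing Redis clustering with colleagues, and when Redis Cluster came up I casually showed off how it works: "Redis Cluster uses virtual slot partitioning and maps each key through a hash function onto 16384 slots...", and so on.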

Then colleague A asked: "Why does Redis Cluster use 16384 slots?"

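Good question. Redis Cluster computes the slot as slot = CRC16(key) & 16383, which is the same as CRC16(key) mod 16384. The CRC16 hash is 16 bits wide, so it can take 2^16 = 65536 distinct values, from 0 to 65535. By that logic we would expect the modulus to be 65536, so why 16384?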
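For reference, the slot computation can be sketched in a few lines of C. This is a minimal illustration, not the Redis source: crc16_xmodem and key_hash_slot are hypothetical names, the CRC is computed bit by bit instead of with a lookup table, and hash tag handling ({...} inside a key) is deliberately left out. It assumes the CRC16 variant is CRC-16/XMODEM (polynomial 0x1021, initial value 0), which is what the Redis Cluster specification describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_COUNT 16384

/* Bitwise CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection. */
uint16_t crc16_xmodem(const unsigned char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Hypothetical helper: map a key to a slot in [0, SLOT_COUNT).
 * Because SLOT_COUNT is a power of two, masking with SLOT_COUNT - 1
 * gives the same result as taking the remainder modulo SLOT_COUNT. */
unsigned int key_hash_slot(const char *key, size_t len) {
    assert(key != NULL);
    return crc16_xmodem((const unsigned char *)key, len) & (SLOT_COUNT - 1);
}
```

The mask has to be 16383 (0x3FFF), not 16384: masking with 16384 would keep a single bit and yield only two possible results, 0 or 16384.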

I looked it up, and sure enough someone had already asked the same question (https://github.com/redis/redis/issues/2576), and the author gave an explanation:

The reason is:
Normal heartbeat packets carry the full configuration of a node, that can be replaced in an idempotent way with the old in order to update an old config. This means they contain the slots configuration for a node, in raw form, that uses 2k of space with 16k slots, but would use a prohibitive 8k of space using 65k slots.
At the same time it is unlikely that Redis Cluster would scale to more than 1000 master nodes because of other design tradeoffs.
So 16k was in the right range to ensure enough slots per master with a max of 1000 masters, but a small enough number to propagate the slot configuration as a raw bitmap easily. Note that in small clusters the bitmap would be hard to compress because when N is small the bitmap would have slots/N bits set that is a large percentage of bits set.

To summarize, there are two main reasons:

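First, message size: the more slots there are, the more space the slot information takes in every message, which wastes bandwidth and makes network congestion more likely.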

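To add a node to a Redis Cluster you run CLUSTER MEET <ip> <port> to complete the handshake. After that, the nodes exchange information through periodic ping/pong messages. The message header structure is as follows: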

#define CLUSTER_SLOTS 16384
typedef struct {
    char sig[4];        /* Signature "RCmb" (Redis Cluster message bus). */
    uint32_t totlen;    /* Total length of this message */
    uint16_t ver;       /* Protocol version, currently set to 1. */
    uint16_t port;      /* TCP base port number. */
    uint16_t type;      /* Message type */
    uint16_t count;     /* Only used for some kind of messages. */
    uint64_t currentEpoch;  /* The epoch accordingly to the sending node. */
    uint64_t configEpoch;   /* The config epoch if it's a master, or the last
                               epoch advertised by its master if it is a
                               slave. */
    uint64_t offset;    /* Master replication offset if node is a master or
                           processed replication offset if node is a slave. */
    char sender[CLUSTER_NAMELEN]; /* Name of the sender node */
    unsigned char myslots[CLUSTER_SLOTS/8];
    char slaveof[CLUSTER_NAMELEN];
    char myip[NET_IP_STR_LEN];    /* Sender IP, if not all zeroed. */
    char notused1[34];  /* 34 bytes reserved for future usage. */
    uint16_t cport;      /* Sender TCP cluster bus port */
    uint16_t flags;      /* Sender node flags */
    unsigned char state; /* Cluster state from the POV of the sender */
    unsigned char mflags[3]; /* Message flags: CLUSTERMSG_FLAG[012]_... */
    union clusterMsgData data;
} clusterMsg;

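The field unsigned char myslots[CLUSTER_SLOTS/8]; is a bitmap of the slots held by the sending node. Each bit represents one slot, and a bit set to 1 means that slot belongs to the node. With #define CLUSTER_SLOTS 16384, myslots takes 16384 / 8 = 2048 bytes, i.e. 2 KB. If CLUSTER_SLOTS were 65536, it would take 8 KB.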
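The bitmap layout and its size can be sketched as follows. The names slot_bitmap, slot_bitmap_set, slot_bitmap_test and bitmap_bytes are made up for this illustration and are not taken from the Redis source. Only the one-bit-per-slot idea comes from the struct above.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_COUNT 16384

/* Same layout idea as myslots: one bit per slot, 8 slots per byte. */
typedef struct {
    unsigned char bits[SLOT_COUNT / 8];
} slot_bitmap;

/* Mark a slot as owned by this node. */
void slot_bitmap_set(slot_bitmap *bm, unsigned int slot) {
    assert(slot < SLOT_COUNT);
    bm->bits[slot >> 3] |= (unsigned char)(1u << (slot & 7));
}

/* Return 1 if the slot is marked as owned, 0 otherwise. */
int slot_bitmap_test(const slot_bitmap *bm, unsigned int slot) {
    assert(slot < SLOT_COUNT);
    return (bm->bits[slot >> 3] >> (slot & 7)) & 1;
}

/* Bytes needed for a raw bitmap with one bit per slot. */
size_t bitmap_bytes(size_t slots) {
    return slots / 8;
}
```

Here bitmap_bytes(16384) is 2048 and bitmap_bytes(65536) is 8192, which is where the 2 KB versus 8 KB comparison comes from. Every heartbeat carries this bitmap, so the difference is paid on every ping and pong.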

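The message body also carries information about other nodes for exchange. The number of other nodes described is roughly 1/10 of the nodes in the cluster, with a minimum of 3. So the more nodes in the cluster, the larger each message.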
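The "one tenth, but at least 3" rule can be written down as a small function. This is only a sketch: gossip_entries_wanted is a hypothetical name, and the cap at known_nodes - 2 (excluding the sender and the receiver) is my assumption about how the count is limited in very small clusters, not something stated above.

```c
#include <assert.h>

/* Rough number of "other node" entries carried in one ping/pong,
 * given how many nodes the sender knows about. */
int gossip_entries_wanted(int known_nodes) {
    assert(known_nodes >= 0);

    int wanted = known_nodes / 10;      /* about 1/10 of the cluster */
    if (wanted < 3) wanted = 3;         /* but at least 3 entries */

    /* Assumption: never ask for more entries than there are nodes
     * besides the sender and the receiver. */
    int candidates = known_nodes - 2;
    if (candidates < 0) candidates = 0;
    if (wanted > candidates) wanted = candidates;

    return wanted;
}
```

Under these assumptions a 1000-node cluster puts 100 entries into each heartbeat on top of the fixed header, while a 20-node cluster carries only 3.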

Second, the number of master nodes in a Redis cluster is very unlikely to exceed 1000.

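The more nodes there are, the larger the exchanged messages become, and according to the author other design tradeoffs also keep a cluster from growing much beyond that size. At 1000 masters, 16384 slots still leave about 16 slots per master. On the other hand, each node's slot ownership is propagated as a raw bitmap. As the author notes, compression would not help much in small clusters: with N nodes each bitmap has about slots/N bits set, which is a large percentage of the bits when N is small, and a densely filled bitmap is hard to compress. What matters is therefore that the raw bitmap itself stays small.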
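The fill rate is easy to check numerically. The sketch below assumes the slots are split evenly and contiguously across the nodes, which real clusters are not required to do. fill_even_share is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOT_COUNT 16384

/* Fill `bitmap` (SLOT_COUNT / 8 bytes) with the slots that node number
 * `node` would own if SLOT_COUNT slots were split evenly over `nodes`
 * nodes, and return how many bits were set. */
size_t fill_even_share(unsigned char *bitmap, int node, int nodes) {
    assert(bitmap != NULL);
    assert(nodes > 0 && node >= 0 && node < nodes);

    memset(bitmap, 0, SLOT_COUNT / 8);
    size_t first = (size_t)node * SLOT_COUNT / (size_t)nodes;
    size_t last = (size_t)(node + 1) * SLOT_COUNT / (size_t)nodes;
    for (size_t slot = first; slot < last; slot++)
        bitmap[slot >> 3] |= (unsigned char)(1u << (slot & 7));

    return last - first;
}
```

With 3 masters the first node sets 5461 of 16384 bits, about a third of the bitmap. With 1000 masters it sets only 16. The bitmap is 2 KB in both cases, because it is sent in raw form.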

So, all things considered, the author decided that 16384 slots are enough in practice.
