[Deep Dive into Redis] Redis Data Structures: SDS

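Strings are among the most widely used objects in programming, and Redis is no exception. Redis is written in C, but it does not use traditional C strings. Instead it defines its own abstract type, the Simple Dynamic String (SDS), and uses SDS as its default string representation, mainly because traditional C strings cannot meet Redis's requirements for safety, efficiency, and functionality. This article walks through SDS.

Redis uses C strings only as string literals, in places where the string never needs to be modified, such as log output. In most other cases, Redis represents strings with SDS.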

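Compared with C strings, SDS has the following advantages:

Getting the string length takes constant time.
Buffer overflows are prevented.
Fewer memory reallocations are needed when the string length changes.
It is binary safe.
It is compatible with some C string functions.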

Definition of SDS

SDS is implemented mainly in two files, sds.c and sds.h. It is defined as follows:

struct __attribute__ ((__packed__)) sdshdr5 {
    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
    char buf[];
};

struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};

struct __attribute__ ((__packed__)) sdshdr16 {
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};

struct __attribute__ ((__packed__)) sdshdr32 {
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};

struct __attribute__ ((__packed__)) sdshdr64 {
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};

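As the code shows, every sdshdr except sdshdr5 consists of the following parts:

len: the number of bytes of the SDS string currently in use.
alloc: the allocated space, excluding the header and the null terminator. alloc minus len is the unused space; initially alloc equals len.
flags: only the low 3 bits are used, and they indicate the type. This splits SDS into several variants: a different SDS struct is chosen depending on the string length, and the structs differ mainly in the width of len and alloc. Because Redis holds a very large number of strings, using the narrowest header that fits saves memory.
buf: a char array that holds the variable-length string.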
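
To make the role of flags more concrete, here is a minimal sketch of how a header type might be picked from a string length, and how many bytes each header occupies. The names demo_req_type, demo_hdr_size, and the DEMO_TYPE_* tags are hypothetical and exist only for this illustration; the thresholds simply follow from the field widths in the structs above (5 bits inside flags for sdshdr5, then 8, 16, 32, and 64 bits).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags for this sketch; they stand in for the 3-bit
 * type kept in the low bits of flags. */
enum { DEMO_TYPE_5, DEMO_TYPE_8, DEMO_TYPE_16, DEMO_TYPE_32, DEMO_TYPE_64 };

/* Pick the smallest header whose length field can hold string_size. */
int demo_req_type(size_t string_size) {
    if (string_size < (1u << 5))  return DEMO_TYPE_5;   /* fits in 5 bits  */
    if (string_size < (1u << 8))  return DEMO_TYPE_8;   /* fits in uint8_t */
    if (string_size < (1u << 16)) return DEMO_TYPE_16;  /* fits in uint16_t */
    if ((unsigned long long)string_size < (1ull << 32))
        return DEMO_TYPE_32;                            /* fits in uint32_t */
    return DEMO_TYPE_64;
}

/* Bytes placed in front of buf for each type, given the packed layouts:
 * one flags byte plus two length fields (len and alloc). */
size_t demo_hdr_size(int type) {
    assert(type >= DEMO_TYPE_5 && type <= DEMO_TYPE_64);
    switch (type) {
    case DEMO_TYPE_5:  return 1;                        /* flags only */
    case DEMO_TYPE_8:  return 1 + 2 * sizeof(uint8_t);
    case DEMO_TYPE_16: return 1 + 2 * sizeof(uint16_t);
    case DEMO_TYPE_32: return 1 + 2 * sizeof(uint32_t);
    default:           return 1 + 2 * sizeof(uint64_t);
    }
}
```

Under these assumptions, a 5-byte string would need only a 1-byte header, while a 300-byte string would need a 5-byte header, which is where the memory saving across many short strings comes from.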

SDS stores the string in one contiguous block of memory. The figure below shows how the string "Redis" is laid out in memory:
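
As a rough code-level companion to that layout, the sketch below builds "Redis" behind an 8-bit header modeled on sdshdr8 and hands back a pointer to buf, so the header sits immediately before the characters. Everything prefixed with demo_ or DEMO_ is hypothetical and only illustrates the idea; in particular, the tag value 1 for the 8-bit header is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_TYPE_8    1   /* assumed tag value for the 8-bit header */
#define DEMO_TYPE_MASK 7   /* the low 3 bits of flags carry the type */

/* Same field order as sdshdr8; packed, so the header is exactly 3 bytes. */
struct __attribute__((__packed__)) demo_hdr8 {
    uint8_t len;
    uint8_t alloc;
    unsigned char flags;
    char buf[];
};

/* Allocate header + payload + '\0' in one block and return a pointer to buf. */
char *demo_sds8_new(const char *init, size_t initlen) {
    assert(initlen <= UINT8_MAX);
    struct demo_hdr8 *sh = malloc(sizeof(*sh) + initlen + 1);
    if (sh == NULL) return NULL;
    sh->len = (uint8_t)initlen;
    sh->alloc = (uint8_t)initlen;   /* no spare space at creation */
    sh->flags = DEMO_TYPE_8;
    memcpy(sh->buf, init, initlen);
    sh->buf[initlen] = '\0';
    return sh->buf;
}

/* Step back from buf to the header that precedes it. */
struct demo_hdr8 *demo_sds8_hdr(const char *s) {
    return (struct demo_hdr8 *)((char *)s - sizeof(struct demo_hdr8));
}

size_t demo_sds8_len(const char *s) { return demo_sds8_hdr(s)->len; }

/* flags is the byte right before buf, so s[-1] reaches it directly. */
unsigned char demo_sds8_type(const char *s) {
    return (unsigned char)s[-1] & DEMO_TYPE_MASK;
}

void demo_sds8_free(char *s) { if (s != NULL) free(demo_sds8_hdr(s)); }
```

With this layout, demo_sds8_new("Redis", 5) would occupy 9 contiguous bytes: len = 5, alloc = 5, flags = 1, then 'R' 'e' 'd' 'i' 's' '\0'. Because the returned pointer addresses buf, it can still be passed to functions that expect a C string.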

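Differences between SDS and C strings

Compared with C strings, SDS has advantages in safety, performance, and functionality:

Safety: prevents buffer overflows, binary safe
Performance: O(1) string length, space preallocation, lazy free
Functionality: binary safe, compatible with some C string functions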

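Buffer overflow

A buffer overflow is an anomaly in which a program, while writing data to a buffer, runs past the buffer's boundary and overwrites adjacent memory.

C strings do not record their own length and perform no automatic bounds checking, which increases the risk of overflow. Consider the following function: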

char* strcat(char* dest, const char* src);

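This function appends the contents of src to the end of dest. Suppose there are two strings stored next to each other in memory, s1 = "Redis" and s2 = "MongoDB", as shown below:

If strcat(s1, " Cluster") is executed without first allocating enough memory for s1, the data of s1 overflows into the memory occupied by s2, and the contents of s2 are modified, as shown below:

Unlike C strings, SDS rules out buffer overflows. Its APIs proceed in the following steps:

First, check whether the SDS has enough space for the modification.
If it does not, the API automatically expands the SDS to the size the modification requires.
Only then is the actual modification performed.

For an example, see sds.c/sdscatlen, the function that sdscat is built on: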

sds sdscatlen(sds s, const void *t, size_t len) {
    size_t curlen = sdslen(s);

    s = sdsMakeRoomFor(s,len);
    if (s == NULL) return NULL;
    memcpy(s+curlen, t, len);
    sdssetlen(s, curlen+len);
    s[curlen+len] = '\0';
    return s;
}
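
The same check, grow, then write sequence can be tried out without the Redis sources. The sketch below is a toy buffer, not the SDS implementation: struct demo_buf and demo_buf_cat are hypothetical names, the length fields live in an ordinary struct rather than in a header before the data, and the buffer grows to exactly the required size with no preallocation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy length-tracked buffer: len bytes used out of alloc (excluding '\0'). */
struct demo_buf {
    size_t len;
    size_t alloc;
    char *data;
};

/* Append n bytes from t. Returns 0 on success, -1 on allocation failure. */
int demo_buf_cat(struct demo_buf *b, const void *t, size_t n) {
    assert(b != NULL && (t != NULL || n == 0));

    /* Step 1: check whether the free space is enough. */
    if (b->data == NULL || b->alloc - b->len < n) {
        /* Step 2: expand to the size the modification requires. */
        if (n > SIZE_MAX - b->len - 1) return -1;   /* size would overflow */
        size_t newalloc = b->len + n;
        char *p = realloc(b->data, newalloc + 1);   /* +1 for the '\0' */
        if (p == NULL) return -1;                   /* old buffer untouched */
        b->data = p;
        b->alloc = newalloc;
    }

    /* Step 3: only now perform the actual modification. */
    if (n > 0) memcpy(b->data + b->len, t, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}
```

Starting from an empty struct demo_buf (all fields zero), appending "Redis" and then " Cluster" yields "Redis Cluster" with len 13, and at no point is memory written that the buffer does not own. The caller releases the storage with free(b.data).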

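Constant-time string length

A C string does not record its own length, so getting the length of a C string requires traversing the whole string until the first '\0' is found, which is O(n). SDS records its length in len, so getting the length is O(1).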
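
A small sketch can show both effects of storing the length: the lookup is a single field read, and the result stays correct even when the data contains a '\0' byte, which is what binary safety relies on. struct demo_str, demo_make, demo_len, and demo_scan_len are hypothetical names for this illustration, and the fixed 16-byte buffer is an arbitrary simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy length-prefixed string with a small fixed-size buffer. */
struct demo_str {
    size_t len;
    char buf[16];
};

/* Copy len raw bytes (which may include '\0') and record the length. */
struct demo_str demo_make(const void *bytes, size_t len) {
    struct demo_str s;
    assert(len < sizeof(s.buf));
    memset(&s, 0, sizeof(s));
    memcpy(s.buf, bytes, len);
    s.len = len;
    s.buf[len] = '\0';
    return s;
}

/* O(1): just read the stored field. */
size_t demo_len(const struct demo_str *s) { return s->len; }

/* O(n): scan for the first '\0', the way strlen has to. */
size_t demo_scan_len(const char *p) {
    size_t n = 0;
    while (p[n] != '\0') n++;
    return n;
}
```

For the six bytes 'R' 'e' '\0' 'd' 'i' 's', demo_len reports 6, while demo_scan_len stops at the embedded terminator and reports 2.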

Space preallocation

When an SDS API call modifies an SDS and needs to grow it, the program allocates not only the space the modification requires but also extra unused space. The exact policy is in sds.c/sdsMakeRoomFor:

sds sdsMakeRoomFor(sds s, size_t addlen) {
    void *sh, *newsh;
    size_t avail = sdsavail(s);
    size_t len, newlen;
    char type, oldtype = s[-1] & SDS_TYPE_MASK;
    int hdrlen;

    /* Return ASAP if there is enough space left. */
    if (avail >= addlen) return s;

    len = sdslen(s);
    sh = (char*)s-sdsHdrSize(oldtype);
    newlen = (len+addlen);
    if (newlen < SDS_MAX_PREALLOC)
        newlen *= 2;
    else
        newlen += SDS_MAX_PREALLOC;

    type = sdsReqType(newlen);

    /* Don't use type 5: the user is appending to the string and type 5 is
     * not able to remember empty space, so sdsMakeRoomFor() must be called
     * at every appending operation. */
    if (type == SDS_TYPE_5) type = SDS_TYPE_8;

    hdrlen = sdsHdrSize(type);
    if (oldtype==type) {
        newsh = s_realloc(sh, hdrlen+newlen+1);
        if (newsh == NULL) return NULL;
        s = (char*)newsh+hdrlen;
    } else {
        /* Since the header size changes, need to move the string forward,
         * and can't use realloc */
        newsh = s_malloc(hdrlen+newlen+1);
        if (newsh == NULL) return NULL;
        memcpy((char*)newsh+hdrlen, s, len+1);
        s_free(sh);
        s = (char*)newsh+hdrlen;
        s[-1] = type;
        sdssetlen(s, len);
    }
    sdssetalloc(s, newlen);
    return s;
}
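
The sizing rule in the listing (double the required length while it is below SDS_MAX_PREALLOC, otherwise add SDS_MAX_PREALLOC) can be isolated into a few lines to see its effect. This is only a sketch of the arithmetic: demo_new_alloc and demo_count_reallocs are hypothetical helpers, and DEMO_MAX_PREALLOC is assumed here to be 1 MB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PREALLOC (1024 * 1024)  /* assumed value: 1 MB */

/* New alloc for a string of len bytes that must grow by addlen bytes. */
size_t demo_new_alloc(size_t len, size_t addlen) {
    assert(addlen <= SIZE_MAX - len);
    size_t newlen = len + addlen;
    if (newlen < DEMO_MAX_PREALLOC)
        newlen *= 2;                     /* small strings: double */
    else
        newlen += DEMO_MAX_PREALLOC;     /* large strings: add a fixed 1 MB */
    return newlen;
}

/* Simulate repeated appends of chunk bytes to an initially empty string
 * and count how many times the buffer has to be grown. */
size_t demo_count_reallocs(size_t appends, size_t chunk) {
    size_t len = 0, alloc = 0, count = 0;
    for (size_t i = 0; i < appends; i++) {
        if (alloc - len < chunk) {       /* same test as avail >= addlen */
            alloc = demo_new_alloc(len, chunk);
            count++;
        }
        len += chunk;
    }
    return count;
}
```

Under this rule, appending 8 bytes to a 5-byte string gives an alloc of 26 rather than 13, and appending one byte at a time 1000 times grows the buffer only 9 times instead of 1000, which is the reduction in reallocations that preallocation is meant to provide.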