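With the rise of NoSQL, Redis has become one of its brightest stars and is getting more and more attention and use. I use Redis extensively at work, both as a cache and as a key-value DB. But as the data keeps growing, people start to worry: can Redis hold up? The official benchmarks look great, and you can always run your own stress tests, but reading the source is also one of the most direct ways to settle a question. For example, the question we want to answer right now is: how does Redis delete expired data?
Using an IDE that supports "find references", let's follow the setex command (Set the value and expiration of a key) and see where it leads: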
void setexCommand(redisClient *c) {
    c->argv[3] = tryObjectEncoding(c->argv[3]);
    setGenericCommand(c,0,c->argv[1],c->argv[3],c->argv[2]);
}
setGenericCommand is a generic function that implements set, setnx and setex; the three commands differ only in the arguments they pass.
void setCommand(redisClient *c) {
    c->argv[2] = tryObjectEncoding(c->argv[2]);
    setGenericCommand(c,0,c->argv[1],c->argv[2],NULL);
}
void setnxCommand(redisClient *c) {
    c->argv[2] = tryObjectEncoding(c->argv[2]);
    setGenericCommand(c,1,c->argv[1],c->argv[2],NULL);
}
void setexCommand(redisClient *c) {
    c->argv[3] = tryObjectEncoding(c->argv[3]);
    setGenericCommand(c,0,c->argv[1],c->argv[3],c->argv[2]);
}
Now look at setGenericCommand:
 1  void setGenericCommand(redisClient *c, int nx, robj *key, robj *val, robj *expire) {
 2      long seconds = 0; /* initialized to avoid an harmness warning */
 3
 4      if (expire) {
 5          if (getLongFromObjectOrReply(c, expire, &seconds, NULL) != REDIS_OK)
 6              return;
 7          if (seconds <= 0) {
 8              addReplyError(c,"invalid expire time in SETEX");
 9              return;
10          }
11      }
12
13      if (lookupKeyWrite(c->db,key) != NULL && nx) {
14          addReply(c,shared.czero);
15          return;
16      }
17      setKey(c->db,key,val);
18      server.dirty++;
19      if (expire) setExpire(c->db,key,time(NULL)+seconds);
20      addReply(c, nx ? shared.cone : shared.ok);
21  }
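Line 13 handles the "Set the value of a key, only if the key does not exist" case (setnx), line 17 inserts the key, and line 19 sets its timeout. Note that the stored timestamp is already the absolute expiration time (time(NULL)+seconds), not the number of seconds remaining. At this point we need to look at the definition of redisDb (that is, c->db):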
typedef struct redisDb {
    dict *dict;          /* The keyspace for this DB */
    dict *expires;       /* Timeout of keys with a timeout set */
    dict *blocking_keys; /* Keys with clients waiting for data (BLPOP) */
    dict *io_keys;       /* Keys with clients waiting for VM I/O */
    dict *watched_keys;  /* WATCHED keys for MULTI/EXEC CAS */
    int id;
} redisDb;
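We only care about dict and expires, which store the key-value pairs and their timeouts respectively. In other words, a key-value pair that has a timeout is stored in dict, and its key is also stored in expires, like this: dict[key]: value, expires[key]: timeout.
A key-value pair without a timeout simply has no entry in expires. The remaining two functions, setKey and setExpire, just insert data into these two dictionaries, so I won't go into them here.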
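To make the two-dictionary layout concrete, here is a minimal sketch in C. It is not Redis code: toy_db, toy_entry, toy_set, toy_setex and toy_get_expire are hypothetical names invented for illustration, and fixed-size arrays with linear search stand in for the real dict hash table. The only point is that a key with a timeout appears in both tables, while a plain key appears only in the keyspace.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define TOY_CAPACITY 16 /* hypothetical fixed size, for illustration only */

/* One slot of a toy table; stands in for a Redis dict entry. */
typedef struct {
    const char *key;   /* NULL marks an empty slot; the caller owns the string */
    const char *value; /* used by the keyspace table */
    time_t when;       /* used by the expires table: absolute deadline */
} toy_entry;

/* Mirrors the two fields of redisDb discussed in the text. */
typedef struct {
    toy_entry dict[TOY_CAPACITY];    /* key -> value */
    toy_entry expires[TOY_CAPACITY]; /* key -> absolute expiration time */
} toy_db;

/* Linear search for a key; returns NULL when the key is absent. */
static toy_entry *toy_find(toy_entry *table, const char *key) {
    for (int i = 0; i < TOY_CAPACITY; i++)
        if (table[i].key != NULL && strcmp(table[i].key, key) == 0)
            return &table[i];
    return NULL;
}

/* Returns the existing slot for key, or claims a free one; NULL if full. */
static toy_entry *toy_find_or_add(toy_entry *table, const char *key) {
    toy_entry *e = toy_find(table, key);
    if (e != NULL) return e;
    for (int i = 0; i < TOY_CAPACITY; i++) {
        if (table[i].key == NULL) {
            table[i].key = key;
            return &table[i];
        }
    }
    return NULL;
}

/* SET-like insert: only the keyspace table is touched. */
static int toy_set(toy_db *db, const char *key, const char *value) {
    toy_entry *e = toy_find_or_add(db->dict, key);
    if (e == NULL) return 0;
    e->value = value;
    return 1;
}

/* SETEX-like insert: the key goes into both tables, and the relative
 * timeout is converted to an absolute deadline, as on line 19 above. */
static int toy_setex(toy_db *db, const char *key, const char *value,
                     time_t now, long seconds) {
    assert(seconds > 0);
    if (!toy_set(db, key, value)) return 0;
    toy_entry *e = toy_find_or_add(db->expires, key);
    if (e == NULL) return 0;
    e->when = now + seconds;
    return 1;
}

/* Returns the absolute deadline, or -1 when the key has no timeout. */
static time_t toy_get_expire(toy_db *db, const char *key) {
    toy_entry *e = toy_find(db->expires, key);
    return e != NULL ? e->when : (time_t)-1;
}
```

In this sketch the current time is passed in as a parameter instead of calling time(NULL), purely so the behaviour is easy to check by hand.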
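So how does Redis delete expired keys?
Looking through the callers of dbDelete, the first function that stands out is this one, which deletes a key if it has expired: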
 1  int expireIfNeeded(redisDb *db, robj *key) {
 2      time_t when = getExpire(db,key);
 3
 4      if (when < 0) return 0; /* No expire for this key */
 5
 6      /* Don't expire anything while loading. It will be done later. */
 7      if (server.loading) return 0;
 8
 9      /* If we are running in the context of a slave, return ASAP:
10       * the slave key expiration is controlled by the master that will
11       * send us synthesized DEL operations for expired keys.
12       *
13       * Still we try to return the right information to the caller,
14       * that is, 0 if we think the key should be still valid, 1 if
15       * we think the key is expired at this time. */
16      if (server.masterhost != NULL) {
17          return time(NULL) > when;
18      }
19
20      /* Return when this key has not expired */
21      if (time(NULL) <= when) return 0;
22
23      /* Delete the key */
24      server.stat_expiredkeys++;
25      propagateExpire(db,key);
26      return dbDelete(db,key);
27  }
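The IfNeeded suffix means "delete only when deletion is allowed": line 4 does not delete when no timeout is set, line 7 does not delete while "loading", line 16 does not delete on a non-master (a slave only reports whether the key looks expired), and line 21 does not delete a key that has not yet expired. Line 25 propagates the deletion to the slaves and to the file (the append-only file).
Now let's see which functions call expireIfNeeded: lookupKeyRead, lookupKeyWrite, dbRandomKey, existsCommand and keysCommand. As the names suggest, whenever a key is accessed, Redis also checks whether it has expired and deletes it if so, which guarantees that a user can never read an expired key. But if a large number of keys expire and are never accessed again, a lot of memory is wasted. How does Redis deal with that?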
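The "check on access" idea can be sketched in a few lines of C. This is an illustration under simplifying assumptions, not the Redis implementation: lazy_db, lazy_put, lazy_get and lazy_expire_if_needed are hypothetical names, a single small array replaces the two dictionaries, a deadline of -1 means "no timeout", and the loading, slave and propagation steps of expireIfNeeded are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LAZY_CAPACITY 8 /* hypothetical fixed size, for illustration only */

typedef struct {
    const char *key;   /* NULL marks an empty slot */
    const char *value;
    long when;         /* absolute deadline, or -1 for "never expires" */
} lazy_entry;

typedef struct {
    lazy_entry slots[LAZY_CAPACITY];
    long expired_count; /* plays the role of server.stat_expiredkeys */
} lazy_db;

static lazy_entry *lazy_find(lazy_db *db, const char *key) {
    for (int i = 0; i < LAZY_CAPACITY; i++)
        if (db->slots[i].key != NULL && strcmp(db->slots[i].key, key) == 0)
            return &db->slots[i];
    return NULL;
}

/* Stores a key with an absolute deadline; returns 0 when the table is full. */
static int lazy_put(lazy_db *db, const char *key, const char *value, long when) {
    assert(key != NULL);
    lazy_entry *e = lazy_find(db, key);
    for (int i = 0; e == NULL && i < LAZY_CAPACITY; i++)
        if (db->slots[i].key == NULL) e = &db->slots[i];
    if (e == NULL) return 0;
    e->key = key;
    e->value = value;
    e->when = when;
    return 1;
}

/* Returns 1 if the key had expired and was removed, 0 otherwise. */
static int lazy_expire_if_needed(lazy_db *db, const char *key, long now) {
    lazy_entry *e = lazy_find(db, key);
    if (e == NULL || e->when < 0) return 0; /* missing, or no timeout set */
    if (now <= e->when) return 0;           /* not expired yet */
    e->key = NULL;                          /* delete the key */
    db->expired_count++;
    return 1;
}

/* Every read goes through the expiry check first, so an expired key
 * is removed before it can ever be returned to the caller. */
static const char *lazy_get(lazy_db *db, const char *key, long now) {
    lazy_expire_if_needed(db, key, now);
    lazy_entry *e = lazy_find(db, key);
    return e != NULL ? e->value : NULL;
}
```

As in expireIfNeeded, a key is still considered valid at exactly its deadline and is only removed once the current time is strictly greater.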
Among the callers of dbDelete there is also this function:
 1  /* Try to expire a few timed out keys. The algorithm used is adaptive and
 2   * will use few CPU cycles if there are few expiring keys, otherwise
 3   * it will get more aggressive to avoid that too much memory is used by
 4   * keys that can be removed from the keyspace. */
 5  void activeExpireCycle(void) {
 6      int j;
 7
 8      for (j = 0; j < server.dbnum; j++) {
 9          int expired;
10          redisDb *db = server.db+j;
11
12          /* Continue to expire if at the end of the cycle more than 25%
13           * of the keys were expired. */
14          do {
15              long num = dictSize(db->expires);
16              time_t now = time(NULL);
17
18              expired = 0;
19              if (num > REDIS_EXPIRELOOKUPS_PER_CRON)
20                  num = REDIS_EXPIRELOOKUPS_PER_CRON;
21              while (num--) {
22                  dictEntry *de;
23                  time_t t;
24
25                  if ((de = dictGetRandomKey(db->expires)) == NULL) break;
26                  t = (time_t) dictGetEntryVal(de);
27                  if (now > t) {
28                      sds key = dictGetEntryKey(de);
29                      robj *keyobj = createStringObject(key,sdslen(key));
30
31                      propagateExpire(db,keyobj);
32                      dbDelete(db,keyobj);
33                      decrRefCount(keyobj);
34                      expired++;
35                      server.stat_expiredkeys++;
36                  }
37              }
38          } while (expired > REDIS_EXPIRELOOKUPS_PER_CRON/4);
39      }
40  }
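The comment already states the intent: expire a few timed-out keys, using only a little CPU when few keys are expiring. Line 25 picks a random key from expires, and line 38 ends the loop once the hit rate is low, that is, once no more than a quarter of REDIS_EXPIRELOOKUPS_PER_CRON sampled keys in a pass turned out to be expired. The function is driven by a cron (serverCron), which in Redis versions of this era fires about every 100 milliseconds, not every millisecond. The algorithm does not remove every expired key; each call keeps sampling until the expired share of a sample drops to about 25% or less. If it is tuned to delete too large a share on a large keyspace, it needs more loop iterations and wastes CPU; if it is tuned to delete too small a share, expired keys pile up and waste memory. This is the usual time-space trade-off.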
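The sampling loop can be tried out in isolation with the sketch below. It rests on several assumptions: expire_set and active_expire_cycle are hypothetical names, a plain array of deadlines replaces db->expires, rand() replaces dictGetRandomKey, and SAMPLES_PER_PASS is a made-up stand-in for REDIS_EXPIRELOOKUPS_PER_CRON. Only the control flow (sample a bounded number of keys, repeat while more than a quarter of the budget was expired) follows the listing above.

```c
#include <assert.h>
#include <stdlib.h>

#define SAMPLES_PER_PASS 10 /* hypothetical stand-in for REDIS_EXPIRELOOKUPS_PER_CRON */

typedef struct {
    long *when; /* absolute deadlines of the keys that have a timeout */
    long count; /* number of live entries in when[] */
} expire_set;

/* Runs one adaptive cycle and returns how many entries were removed. */
static long active_expire_cycle(expire_set *set, long now) {
    long removed = 0;
    int expired;

    assert(set != NULL);
    do {
        long num = set->count;

        expired = 0;
        if (num > SAMPLES_PER_PASS) num = SAMPLES_PER_PASS;
        while (num--) {
            if (set->count == 0) break;
            long i = rand() % set->count; /* pick a random key with a timeout */
            if (now > set->when[i]) {
                /* Delete by moving the last entry into the freed slot. */
                set->when[i] = set->when[set->count - 1];
                set->count--;
                expired++;
            }
        }
        removed += expired;
        /* Keep going only while the sample was rich in expired keys. */
    } while (expired > SAMPLES_PER_PASS / 4);
    return removed;
}
```

Two extreme cases show the adaptive behaviour: if nothing has expired, the function returns after a single pass of at most SAMPLES_PER_PASS lookups; if everything has expired, every pass exceeds the threshold and the loop continues until the set is empty.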
Finally, the callers of dbDelete also include this function:
/* This function gets called when 'maxmemory' is set on the config file to limit
* the max memory used by the server, and we are out of memory.
* This function will try to, in order:
*
* - Free objects from the free list
* - Try to remove keys with an EXPIRE set
*
* It is not possible to free enough memory to reach used-memory < maxmemory
* the server will start refusing commands that will enlarge even more the
* memory usage.
*/
void freeMemoryIfNeeded(void)
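This function is too long to walk through here. Its comment says that it is only called when a maximum memory limit (maxmemory) is set in the config file; setting that parameter means you are treating Redis as an in-memory cache rather than as a key-value database.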
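Since the body of freeMemoryIfNeeded is not shown here, the following is only a rough sketch of the idea stated in its comment ("try to remove keys with an EXPIRE set" until memory use is back under the limit). Everything in it is an assumption made for illustration: the mem_entry type, the per-key byte accounting, and the choice to evict the key with the nearest deadline first by scanning all entries. The real function may select its victims differently.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    long when;    /* absolute deadline; -1 means no timeout is set */
    size_t bytes; /* memory attributed to this key (hypothetical accounting) */
    int live;     /* 1 while the key is still stored */
} mem_entry;

/* Evicts keys that have a timeout, nearest deadline first, until
 * used <= limit or no such key is left. Returns the remaining usage.
 * Assumes used is at least the sum of bytes of all live entries. */
static size_t free_memory_if_needed(mem_entry *e, size_t n,
                                    size_t used, size_t limit) {
    assert(e != NULL || n == 0);
    while (used > limit) {
        size_t best = n; /* n means "no candidate found" */

        for (size_t i = 0; i < n; i++) {
            if (!e[i].live || e[i].when < 0) continue; /* skip persistent keys */
            if (best == n || e[i].when < e[best].when) best = i;
        }
        if (best == n) break; /* nothing left that may be evicted */
        e[best].live = 0;
        used -= e[best].bytes;
    }
    return used;
}
```

Note that keys without a timeout are never touched in this sketch, so the function can return with usage still above the limit, which corresponds to the situation in which, according to the comment, the server starts refusing commands that would use more memory.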
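Of these three ways of deleting expired keys, the second, periodically deleting a share of the expired keys, is the main one. The first, "delete on read", guarantees that an expired key is never served to a client. The third is a brute-force measure for when memory exceeds the configured limit. Together they show how cleverly Redis is designed.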