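Recently I have been reading the Tokyo Cabinet source code, and I ran into a problem.
I am currently following the database file implemented with a hash table. The project's examples directory contains some demo files written by the author that show how to use the library. I used tchdbex.c from there to trace how the code runs, but added the following two lines to it: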
/* store records */
if(!tchdbput2(hdb, "foo", "hop") ||
!tchdbput2(hdb, "bar", "step") ||
!tchdbput2(hdb, "baz", "jump")){
ecode = tchdbecode(hdb);
fprintf(stderr, "put error: %s\n", tchdberrmsg(ecode));
}
+ tchdbout2(hdb, "foo");
+ tchdbput2(hdb, "foo", "hop");
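(The two lines prefixed with + are the ones I added.)
As you can see, after adding the record ("foo", "hop"), I delete it, and immediately afterwards I insert the same record again.
In Tokyo Cabinet's hash-table implementation, deleting a record puts it into an array of type free pool, which stores the deleted record's size and its offset in the file.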
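The problem is that when the record was added, its size was 32 bytes, and when it was deleted, naturally a 32-byte record was deleted as well. However, when the same record is inserted again right afterwards, the code in tchdb.c does: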
3417 rec.rsiz = HDBMAXHSIZ + ksiz + vsiz;
3418 if(!tchdbfbpsearch(hdb, &re
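Line 3417 computes the size of the newly inserted record as 38 bytes, and execution then goes into tchdbfbpsearch below to look for a free pool entry large enough for that size. The only entry freed earlier is 32 bytes, so the search fails and new space has to be allocated for the new record instead of reusing the freed space, even though the data inserted both times is exactly the same size.
In fact, in the code above, when the record is inserted the second time, once the free pool search has failed and execution carries on, the record that actually gets written to the file is still 32 bytes.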
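I emailed the author about this. His reply, roughly, was that the behavior I described does exist, but that it is deliberate, and that he will make some improvements later. I have not finished reading this part of the code, so I cannot say much more. The email exchange is attached below for the record, and as a heads-up for anyone else who runs into this.
The version I am using is 1.4.19.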
chuang
to hirarin
15:10 (6 hours ago)
Hi hirarin, recently I have been reading and tracing the Tokyo Cabinet source code.
When I used examples/tchdbex.c to trace the hash table, I found a problem.
In the file tchdbex.c, I added two lines:
/* store records */
if(!tchdbput2(hdb, "foo", "hop") ||
!tchdbput2(hdb, "bar", "step") ||
!tchdbput2(hdb, "baz", "jump")){
ecode = tchdbecode(hdb);
fprintf(stderr, "put error: %s\n", tchdberrmsg(ecode));
}
+ tchdbout2(hdb, "foo");
+ tchdbput2(hdb, "foo", "hop");
So you can see that after inserting the keys "foo", "bar" and "baz", I remove the record with the key "foo" and then insert the record ("foo", "hop") again.
But when I used gdb to trace the program, I found that when I first inserted the record ("foo", "hop"), the record size was 32.
Then, when I removed the record ("foo", "hop"), the record size was 32, which is fine, and a free block of size 32 was inserted into the free block array.
But when I inserted the record ("foo", "hop") once again, at tchdb.c:3417:
3417 rec.rsiz = HDBMAXHSIZ + ksiz + vsiz;
the record size is 38,
and on the next line it tries to find a free block that fits this size, but the only free block has size 32, so it fails to find one.
What I mean is: if I remove the record ("foo", "hop") and then insert it again, it seems it should reuse the free block of size 32. Otherwise, that free block is never reclaimed.
So, is it a bug?
BTW: the version is 1.4.19
Mikio Hirabayashi
to me
16:37 (5 hours ago)
Hi,
Thanks for the report.
It's not a bug but on purpose.
The header size is not calculated at that line; I estimate it as the
maximum size in theory.
However, I hit on an idea thanks to you. I'll change the logic to
reduce the file size.
Regards.