• <ins id="pjuwb"></ins>
    <blockquote id="pjuwb"><pre id="pjuwb"></pre></blockquote>
    <noscript id="pjuwb"></noscript>
          <sup id="pjuwb"><pre id="pjuwb"></pre></sup>
            <dd id="pjuwb"></dd>
            <abbr id="pjuwb"></abbr>
            隨筆 - 42  文章 - 3  trackbacks - 0
            <2010年4月>
            28293031123
            45678910
            11121314151617
            18192021222324
            2526272829301
            2345678

            常用鏈接

            留言簿(2)

            隨筆檔案

            文章檔案

            網(wǎng)頁收藏

            搜索

            •  

            最新評論

            閱讀排行榜

            評論排行榜


            The original post is
            http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/

            Over the past couple of years, auto-complete has popped up all over the web. Facebook, YouTube, Google, Bing, MSDN, LinkedIn and lots of other websites all try to complete your phrase as soon as you start typing.

            Auto-complete definitely makes for a nice user experience, but it can be a challenge to implement efficiently. In many cases, an efficient implementation requires the use of interesting algorithms and data structures. In this blog post, I will describe one simple data structure that can be used to implement auto-complete: a ternary search tree.

            Trie: simple but space-inefficient

            Before discussing ternary search trees, let’s take a look at a simple data structure that supports a fast auto-complete lookup but needs too much memory: a trie. A trie is a tree-like data structure in which each node contains an array of pointers, one pointer for each character in the alphabet. Starting at the root node, we can trace a word by following pointers corresponding to the letters in the target word.

            Each node could be implemented like this in C#:

            class TrieNode
            {
            public const int ALPHABET_SIZE = 26;
            public TrieNode[] m_pointers = new TrieNode[ALPHABET_SIZE];
            public bool m_endsString = false;
            }

            Here is a trie that stores words AB, ABBA, ABCD, and BCD. Nodes that terminate words are marked yellow:

             

            gif_1

             

            Implementing auto complete using a trie is easy. We simply trace pointers to get to a node that represents the string the user entered. By exploring the trie from that node down, we can enumerate all strings that complete user’s input.

            But, a trie has a major problem that you can see in the diagram above. The diagram only fits on the page because the trie only supports four letters {A,B,C,D}. If we needed to support all 26 English letters, each node would have to store 26 pointers. And, if we need to support international characters, punctuation, or distinguish between lowercase and uppercase characters, the memory usage grows becomes untenable.

            Our problem has to do with the memory taken up by all the null pointers stored in the node arrays. We could consider using a different data structure in each node, such as a hash map. However, managing thousands and thousands of hash maps is generally not a good idea, so let’s take a look at a better solution.

            Ternary search tree to the rescue

            A ternary tree is a data structure that solves the memory problem of tries in a more clever way. To avoid the memory occupied by unnecessary pointers, each trie node is represented as a tree-within-a-tree rather than as an array. Each non-null pointer in the trie node gets its own node in a ternary search tree.

            For example, the trie from the example above would be represented in the following way as a ternary search tree:

            image

            The ternary search tree contains three types of arrows. First, there are arrows that correspond to arrows in the corresponding trie, shown as dashed down-arrows. Traversing a down-arrow corresponds to “matching” the character from which the arrow starts. The left- and right- arrow are traversed when the current character does not match the desired character at the current position. We take the left-arrow if the character we are looking for is alphabetically before the character in the current node, and the right-arrow in the opposite case.

            For example, green arrows show how we’d confirm that the ternary tree contains string ABBA:

             image

            And this is how we’d find that the ternary string does not contain string ABD:

            image 

            Ternary search tree on a server

            On the web, a significant chunk of the auto-complete work has to be done by the server. Often, the set of possible completions is large, so it is usually not a good idea to download all of it to the client. Instead, the ternary tree is stored on the server, and the client will send prefix queries to the server.

            The client will send a query for words starting with “bin” to the server:

              image

            And the server responds with a list of possible words:

            image 

            Implementation

            Here is a simple ternary search tree implementation in C#:

            public class TernaryTree
            {
            private Node m_root = null;
            private void Add(string s, int pos, ref Node node)
            {
            if (node == null) { node = new Node(s[pos], false); }
            if (s[pos] < node.m_char) { Add(s, pos, ref node.m_left); }
            else if (s[pos] > node.m_char) { Add(s, pos, ref node.m_right); }
            else
            {
            if (pos + 1 == s.Length) { node.m_wordEnd = true; }
            else { Add(s, pos + 1, ref node.m_center); }
            }
            }
            public void Add(string s)
            {
            if (s == null || s == "") throw new ArgumentException();
            Add(s, 0, ref m_root);
            }
            public bool Contains(string s)
            {
            if (s == null || s == "") throw new ArgumentException();
            int pos = 0;
            Node node = m_root;
            while (node != null)
            {
            int cmp = s[pos] - node.m_char;
            if (s[pos] < node.m_char) { node = node.m_left; }
            else if (s[pos] > node.m_char) { node = node.m_right; }
            else
            {
            if (++pos == s.Length) return node.m_wordEnd;
            node = node.m_center;
            }
            }
            return false;
            }
            }

            And here is the Node class:

            class Node
            {
            internal char m_char;
            internal Node m_left, m_center, m_right;
            internal bool m_wordEnd;
            public Node(char ch, bool wordEnd)
            {
            m_char = ch;
            m_wordEnd = wordEnd;
            }
            }

            Remarks

            For best performance, strings should be inserted into the ternary tree in a random order. In particular, do not insert strings in the alphabetical order. Each mini-tree that corresponds to a single trie node would degenerate into a linked list, significantly increasing the cost of lookups. Of course, more complex self-balancing ternary trees can be implemented as well.

            And, don’t use a fancier data structure than you have to. If you only have a relatively small set of candidate words (say on the order of hundreds) a brute-force search should be fast enough.

            Further reading

            Another article on tries is available on DDJ (careful, their implementation assumes that no word is a prefix of another):

            http://www.ddj.com/windows/184410528

            If you like this article, also check out these posts on my blog:


            posted on 2012-06-25 23:26 鷹擊長空 閱讀(471) 評論(0)  編輯 收藏 引用

            只有注冊用戶登錄后才能發(fā)表評論。
            網(wǎng)站導(dǎo)航: 博客園   IT新聞   BlogJava   博問   Chat2DB   管理


            色妞色综合久久夜夜 | 精品久久久久久中文字幕人妻最新| 无码人妻精品一区二区三区久久| 99精品国产99久久久久久97| 九九久久99综合一区二区| 久久这里只有精品首页| 精品久久久久一区二区三区| 99久久99久久| 久久精品国产亚洲av麻豆色欲| 亚洲日韩欧美一区久久久久我| 国产精品美女久久久久久2018| 久久99国产精品久久久| 精品午夜久久福利大片| 欧美久久综合九色综合| 免费观看久久精彩视频| 久久精品国产亚洲AV无码娇色| 久久亚洲国产精品五月天婷| 2021国产成人精品久久| 亚洲精品无码久久一线| 精品一久久香蕉国产线看播放| 亚洲国产美女精品久久久久∴| 国产高潮久久免费观看| 国产精品一区二区久久精品| 亚洲综合久久夜AV | 国产精品嫩草影院久久| 久久99精品国产麻豆宅宅| 狠狠综合久久AV一区二区三区| 精品久久久久久无码免费| 日韩精品久久久肉伦网站| 久久人人爽人人爽AV片| 国产激情久久久久影院小草| 久久97精品久久久久久久不卡| AV无码久久久久不卡蜜桃| 亚洲人成无码www久久久| 久久久久女教师免费一区| 超级碰碰碰碰97久久久久| 久久久久99精品成人片| 99久久亚洲综合精品成人| 久久精品国产亚洲麻豆| 国产精品成人精品久久久| 国产成人香蕉久久久久|