Write An Efficient Auto-complete Program Using Trie

Miguel A. Delgado
5 min readMar 4, 2021
Trie Data Structure

An auto-complete feature can be implemented by searching a list of words and returning all words that begin with a given prefix. Easy right? You could write a simple loop that searches the list like this:

This solution works, but the running time increases as the size of your word list does. So, if you have a list of 100,000 words, that means your program would have to perform 100,000 iterations multiplied by the running time of the .substr() method before returning your word suggestions. Total time complexity of O(m*n) where m is the length of the list and n is the length of the prefix.

Don’t worry, we can do better using a trie!

So, what is this trie you speak of?

In a nutshell, a trie is a tree structure where each node is a string. If you wish to learn even more about tries, I recommend this article. My goal is to demonstrate a clean JavaScript implementation of a trie (I have seen examples in other languages but not JS), break down the code, and provide some statistics when searching a trie with large datasets. The full source code can be found here.

Before getting into the code, lets take another look at a visualization of a trie.

If you didn’t notice already, you will see that characters that are deriving from the same prefix, split into different branches. The tree contains the following words: code, cool, come, fly, space, and spark. Yes, I picked them myself.

The beauty of this structure, is that no matter how many characters are in the trie, if you are looking for words that start with “co” then you will only need to search the branches that are formed from the parent nodes, “c” and “o”. The nodes after them, are referred to as children.

The running time for searching this branch would be O(a* w* n) where a is the length of the alphabet, w is the length of the word, and n is the number of words that start with the prefix.

Time to dive into some code 😎.

The trie class

This is the class breakdown:

  • charToIndex(char) & indexToChar(index) — These are helper methods to convert a string character to its ASCII value and vice-versa. Each letter of the alphabet has a numeric value and they start at 97 (hence the need for the constant CHAR_INDEX_OFFSET). So, a = 97, b = 98, and so on. The offset is subtracted in order to get the values to start at 0 and reduce the size of the children nodes.
  • TrieNode() — Returns a new object with the properties isEndOfWord, signals that it is the last letter of a word, and children, an array filled with 26 null values. This sets the ground work for possible child objects to be inserted.
  • add(word) — Takes a word as a parameter, splits it into individual characters, then checks the trie and adds a new child node if it doesn’t exist.
  • exists(word) — Checks if a word exists in the trie.
  • predict(prefix) & crawl(currentNode, currentIndex, currentText) — These two methods are responsible for finding and generating a list of candidates for auto-completion. Predict is called first and starts off by searching the trie based on the prefix that was passed in. Once it finds the last character of the prefix, the child node is passed to the crawl method. Crawl searches the rest of the branch recursively to retrieve all child nodes until it reaches the end of the branch. As each branch is searched, the children that are found are appended to the previous node until isEndOfWord is true and at that point adds the string to the variable matches (array). For example, if we were searching a trie, like the image shown previously, we would start with prefix “co”, then search down the branch and find the nodes “d” then “e”, then go search the other children and find “m” then “e”, and finally “o” then “l”. This would create a list like this: [“de”, “me”, “ol”] and use .getMatches() to tag the prefix on to the beginning of each of these strings and return a comma separated value. The auto-complete suggestion based on the user typing “co” would be “code”, “come”, and “cool”.

Time difference

Word count matters. If there the amount of words that are being searched are not significant, then there is little difference in actual time it takes to compute the possible words. First, lets establish 2 methods to get our word predictions.

  1. Searching the word list with a loop

2. Using a trie data structure

More to the trie code in the Trie.js class here -> https://github.com/migueladelgado/auto-complete-program-with-trie/blob/master/trie.js

Alright, you want to see numbers. Lets look at a list that has 20,000 words.

The results you see above are from a word list of 2,000 words. The first method takes 2 milliseconds and some change to figure out the predictions….pretty quick, but compare it to the second method, using trie, it is under 0 milliseconds!

Not Convinced?

I get it, who cares about 2 milliseconds right?

Let’s bump it up.

2 million words.

Ok, so 2 million words and you can see the first method time go up drastically, however the trie (method 2) still stays under a millisecond!

Alright, we are still talking about a small amount of time. Using a trie over an scanning over a million words may not piss off the user that much, but I personally think its cool that we can reduce time significantly using a trie!

Don’t take my word for it, check out the repo and run it yourself. Enjoy!

https://github.com/migueladelgado/auto-complete-program-with-trie

Code on,

MAD

--

--