Trie: Data Structure Explained

If you are here for the first time and looking to start from scratch. Here's the order I have learned about Algorithms.

To properly learn something you have to start from the beginning. In this series of articles, We(yes me too) will learn about algorithms step by step. There are many books, tutorials, and blogs available on Algorithms, and it is often confusing to choose anyone from this large pool of resources. I was also confused but now I have decided that I will provide a summary of all the resources available and people don't have to look in different websites for every different topic.

Who is this article for?

This article, and the possible series I'll create from it, is meant for those who want to learn algorithms in an efficient way step by step and from a single resource or looking to revise concepts before a coding interview.

Goals

The goal of this article is to focus on understanding the Trie data structure and move toward the other topics as mentioned here.

So, let's begin 🚀

Why Trie?

You are simply typing in your chat and there will be some choices of words to choose from or if you type a wrong spelling of a word and spell checker detects it instantly. Have you ever wondered how does it happen? If you don't know the answer then you are in the right place to learn about it.

Now, our goal is to find the best data structure to implement a valid-word checker or to provide word suggestions, i.e., our vocabulary. A few points to keep in mind:

  • We only need a single copy of each word, i.e., our vocabulary is a set, from a logical point of view.
  • We need to answer the following questions for any given word:

    • Does the current character sequence comprise a valid word?
    • Are there longer words that begin with this sequence? If not, we can abandon our depth-first exploration, as going deeper will not yield any valid words.

These are a few characteristics of our use case that can lead us to better performance:

  • We can safely assume that all words are lowercase.
  • We accept only a-z letters—no punctuation, no hyphens, no accents, etc.
  • The dictionary contains many inflected forms: plurals, conjugated verbs, composite words (e.g., ride –> rider). Therefore, many words share the same stem.

Here Trie data structure comes in the picture.

What is Trie?

Trie is also known as prefix tree

There are a handful of different ways to represent something as seemingly simple as a set of words. For example, a hash or dictionary is one that we’re probably familiar with, as is a hash table. But another structure was created to solve the very problem of representing a set of words: a trie. The term “trie” comes from the word retrieval and is usually pronounced “try”, to distinguish it from other “tree” structures.

A trie is a tree-like data structure whose nodes store the letters of an alphabet. By structuring the nodes in a particular way, words and strings can be retrieved from the structure by traversing down a branch path of the tree.

Here's a trie that stores "David", "Maria", and "Mario".

Trie Tree

Notice that we only store "Mari" once, even though it appears in two strings: "Maria" and "Mario".

Each trie has an empty root node, with links (or references) to other nodes — one for each possible alphabetic value. Most tries append a special character to every word as a "flag" for the end of the word.

The shape and the structure of a trie is always a set of linked nodes, connecting back to an empty root node. An important thing to note is that each node in it contains the number of pointers equal to the number of characters of the alphabet.

For example, if we assume that all the strings are formed with English alphabet characters “a” to “z” then each node of the trie contains 26 pointers.

Let's write some code and implement a Trie data structure and see how insertion and searching work in Trie.

from collections import defaultdict

class TrieNode(Object):
    def __init__(self):
        self.nodes = defaultdict(TrieNode) # Easy to insert new node
        self.is_word = False # True for the end of the word

class Trie(object):
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """
        Insert a word into the trie
        :type word: str
        :rtype: void
        """
        curr = self.root
        for char in word:
            curr = curr.nodes[char]
        curr.is_word = True
    
    def search(self, word):
        """
        Returns if the word in trie
        :type word: str
        :rtype: bool
        """
        curr = self.root
        for char in word:
            if char not in curr.nodes:
                return False
            curr = curr.nodes[char]
        return curr.is_word
    
    def starts_with(self, prefix):
        """
        Returns if there is any word in the trie that
        with given prefix
        :type prefix: str
        :rtype: bool
        """
        curr = self.root
        for char in prefix:
            if char not in curr.nodes:
                return False
            curr = curr.nodes[char]
        return True
Performance Analysis

Big O notation of trie data structure

What makes the trie structure perform well in the above-mentioned example is that the cost of looking up a word or prefix is fixed and dependent only on the number of characters in the word and not on the size of the vocabulary.

The time complexity of searching, inserting, and deleting from a trie depends on the length of the word a that’s being searched for, inserted, or deleted, and the number of total words, n, making the runtime of these operations O(a*n)

The worst-case runtime for creating a trie structure is dependent on how many words the trie contains and how long they might potentially be which is O(m*n) where m is the longest word and n is total no. of words.

Strengths

  • Sometimes Space-Efficient. If you're storing lots of words that start with similar patterns, tries may reduce the overall storage cost by storing shared prefixes once.
  • Efficient Prefix Queries as compared to standard binary trees(2x).

Weaknesses

  • Usually Space-Inefficient. Tries rarely save space when compared to storing strings in a set.
  • Not Standard. Most languages don't come with a built-in trie implementation. You'll need to implement one yourself.

The trie is a very specialized data structure that requires much more memory than trees and lists. However, when specific, like a limited alphabet and high redundancy in the first part of the strings, it can be very effective in addressing performance optimization.

Applications
  1. Autocomplete
  2. Spell Checker
  3. IP routing (Longest prefix matching)
  4. Solving word games
Practice Problems
  1. Implement Trie (Prefix Tree)
  2. Add and Search Word - Data structure design
  3. Design Search Autocomplete System
  4. Implement Magic Dictionary
Resources

Tries are often asked in technical interview questions, in some variation of a question. If you want some helpful resources, here are a few good places to start.

  1. Lecture Notes on Tries, Professor Frank Pfenning
  2. Algorithms: Tries, Robert Sedgewick and Kevin Wayne

We have learned a new data structure today and reached at the end of this article.

Did you find this post useful?

I would be really grateful if you let me know by sharing it on Twitter!

Follow me @ParthS0007 for more tech and blogging content :)