Probably Done Before: Implementing the Dictionary Trie: Insertion and Search Algorithms: Haskell

In a previous post, I settled on a data structure for the Dictionary Trie. In this post, I present the Haskell code I used to implement the search and insertion algorithms. To understand what motivates the dictionary trie, make sure to read the following posts (including an informal discussion on the algorithms used in the insertion):

The Data Structure

Highlights:

Algebraic
Parametrically Polymorphic

As I had mentioned on my previous post about refining this data structure, the TrieNode is the essential component to the tree. I have used an enumeration type (NodeType) to indicate the node type rather than a set of booleans, in the belief that Haskell's pattern matching could make the code that much easier to understand.

The Insertion Algorithm
The insertion algorithm exposes one public API function:

addWordToTrie

which takes a string (actually an array of the parametrically typed value) and the dictionary trie.

There are several utility functions that I do not export out of the module (none of the module code at the top is shown here). These utility functions use internal utility data structures that are concerned with the type of match that two strings have towards each other -- the source string is always compared against a target string and there's always one of five different matches:

exact match
target is prefix of source
source is prefix of target
shared prefix
nothing in common

In the below code, I have inserted a graphic image as a visual description of the algorithm for the four different cases that you would actually run into during the insertion. The utility function insertNode is a direct mapping (and formal encoding) of the informal scenarios discussed in this previus post. The correspondence to the informally developed algorithms and the actual functions was quite pleasing, and was made possible by the pattern-matching capabilities of the language

Exact Match

Target Is Smaller Than Source

Source Is Smaller Than Target

Source and Target Share Same Prefix

The Querying Algorithms

Discussion
These are the subjects I want to touch on in my creation of the above code:

Keeping things functional with record syntax
Just Maybe stuff
Potentially too redundant in spots
my first use of home-grown operators
does pattern matching on record syntax get me that much?
no magic
functionally passing back self

Keeping Thing Functional with Record Syntax

In order to make a 'change' in the functional world, what you have to do is actually make a new value. For values that are aggregates of others, like our data structure, you need to make sure that the old values for the other stuff remains the same, so essentially you 'copy' the other values. I was extremely pleased to see that with the record syntax, you could code such that a 'modification' to one field in the record affected the overall change leaving the rest of the fields the same as previous. This required the use of Haskell's pattern-matching to name the overall structure/node.

Just Maybe Stuff

I know that I am not properly understanding the way for dealing with Maybe values. I feel that there has to be a better way for extracting the underlying value than the where-clause utility functions I use. Does this reflect an underlying misapprehension of the pholosophy of Maybe values? Maybe.

Potentially Too Redundant

I had two internal data structures to represent the matching of the source and target strings. One was purely a description of the matching that occurred (the type MatchType) and consisted of five nullary constructor data types. The other redundantly captured the state of the match but also encapsulated the prefix-suffix pair that resulted. I have a strong suspicion I could have done this with one pass. I will revisit.

First Use of Homegrown Operators

This is an issue of terseness and psychology. Once I had the algorithms developed and everything working according to plan, I refactored some prefix-oriented functions into operators. I turned:

insertNewChildNode into *->
takeTogetherWhile into =><=

In the end, this reduced the camelCaseClutterAndExcessinveJavaLikeNaming™, but as these functions were internal there probably was no real gain.

Pattern Matching Cost Benefit Analysis

The record syntax (as a data constructor) permits pattern matching on the left-hand side of function definitions or on case expressions. There were several times I wondered whether the use of the pattern-matching was really worth it. While it did lower the terseness on the right-hand side (no need to fetch the data out of the record with 'accessor' functions), it did raise the level of noise on the left-hand side.

No Magic

As I've been diving deep into Haskell, I've been itching to try out the magic that provides the allure to this language. I have been attracted to function composition and currying. I've desired to chain together functions in a point-free style or employ functors or applicative style idioms. I found that my code showed little of this magic. The reasons could be (and/or):

I am new to a different paradigm and still thinking idiomatically imperative
I am dealing with programming in the large which is really just massaging, manipulation, and transformation of record syntaxes.

Functionally Passing Back Self

The utility function insertNode is the heart of the insertion algorithm and is tree-recursive. As mentioned above, the functional nature of Haskell requires that in tree updates, we are actually doing tree 'copies' (the underlying implementation may share state to increase performance, but as far as the language semantics go, the values should all be new and copied). To accomplish this, functions which are tree recursive have to always do the same thing:

they need to pass back the node they are currently operating on as the result of the function
and assume that the invocations on themselves (the recursive call) are 'modifications' to the child nodes

This grew tedious after a while, and felt like unnecessary boilerplate. It was stuff like this that drew the development for higher-order functions like fold and map, and I considered exploring the benefit of casting the Trie as a functor, or of implementing a zipper data structure.

I will revisit this.

6 comments:

UnknownJanuary 1, 2015 at 11:15 PM
Thank you for the great series... Trie implementation in Haskell.
I am missing "wordNode" and "nonWordNode" definitions in the code snippets above..

Any idea ?
카지노사이트위키May 30, 2024 at 12:19 AM
Im extremely impressed with your writing skills and also with the layout in your blog
온라인카지노May 30, 2024 at 12:20 AM
Your personal stuffs is beautiful. All the time deal with it up!
카지노사이트탑May 30, 2024 at 12:20 AM
It's a very helpful section of information. We're happy that you discussed this
카지노사이트킹May 30, 2024 at 12:21 AM
I can not wait to learn far more from you. This is really a tremendous web site.
파워볼사이트May 30, 2024 at 12:22 AM
Continue on inspiring us with your writing skills! Your work inspires us