Another Digression: Tries for Capturing Pronunciation




Those linguist guys came up with an alphabet of symbols to try to capture the totality of human speech sounds as a sequence of characters. They called it the International Phonetic Alphabet (IPA). Here is the IPA for the word 'antediluvian':

 \ˌan-ti-də-ˈlü-vē-ən, -(ˌ)dī-\

I just had an idea: maybe we can put the IPA pronunciations of words into a trie and get some useful stuff out of it. For instance, 'does one word sound like another?', both at the front and at the end -- 'friendship' and 'phrenology' sound alike near the front (at least, according to me), and 'cone' and 'loan' have a similar ending sound (i.e. they rhyme -- anytime).

I am going to make a lot of assumptions, but as I am not attempting to be a linguist (not today), I can get away with doing some cool stuff while waving my hand and pronouncing that this is just an approximation. The title of this post has probably been used before, and I am sure that someone has already done this, but I am not going to google it -- I want to have fun exploring this on my own.

Plan
  1. Find a dictionary of IPA words. (Strictly speaking, I need two parallel dictionaries, or one mapping: from words of an IPA language (ˌan-ti-də-ˈlü-vē-ən, -(ˌ)dī-) to words of a written language (antediluvian).)
  2. Insert these IPA words into two tries: one with the constituent IPA characters from left to right, and one from right to left. The first trie should tell me that 'friendship' and 'phrenology' share a pronunciation prefix -- even though they don't share the same written-word prefix. The second trie should tell me that 'loan' and 'cone' rhyme (despite having different written endings). Here is where my hand-waving should come in handy.
This should be fun, and I am sure I am going to run into a LOT of problems.
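
To make the plan concrete, here is a minimal sketch of the two tries in Python. Everything in it is an assumption for illustration: the `Trie` class, the `build_tries` helper, and especially the tiny `PRONUNCIATIONS` table, which I typed by hand as rough approximations and did not take from any real IPA dictionary. Each pronunciation is a list of sound symbols rather than a raw string, so that a two-character sound like 'oʊ' counts as one step in the trie.

```python
class Trie:
    """A trie keyed on sound symbols; each node remembers the words ending there."""

    def __init__(self):
        self.children = {}   # symbol -> child Trie
        self.words = []      # written words whose pronunciation ends at this node

    def insert(self, symbols, word):
        node = self
        for symbol in symbols:
            node = node.children.setdefault(symbol, Trie())
        node.words.append(word)

    def words_under(self, symbols):
        """Return every written word whose pronunciation starts with `symbols`."""
        node = self
        for symbol in symbols:
            node = node.children.get(symbol)
            if node is None:
                return []
        found, stack = [], [node]
        while stack:
            current = stack.pop()
            found.extend(current.words)
            stack.extend(current.children.values())
        return sorted(found)


# Hand-made, approximate transcriptions (an assumption, not dictionary data).
PRONUNCIATIONS = {
    "friendship": ["f", "r", "ɛ", "n", "d", "ʃ", "ɪ", "p"],
    "phrenology": ["f", "r", "ɪ", "n", "ɒ", "l", "ə", "dʒ", "i"],
    "cone": ["k", "oʊ", "n"],
    "loan": ["l", "oʊ", "n"],
}


def build_tries(pronunciations):
    """Build a left-to-right trie and a right-to-left trie from word -> symbols."""
    forward, backward = Trie(), Trie()
    for word, symbols in pronunciations.items():
        forward.insert(symbols, word)
        backward.insert(list(reversed(symbols)), word)
    return forward, backward


forward, backward = build_tries(PRONUNCIATIONS)
print(forward.words_under(["f", "r"]))     # words that start with the sounds f, r
print(backward.words_under(["n", "oʊ"]))   # words that end with the sounds oʊ, n
```

The right-to-left trie is queried with the ending sounds in reverse order, which is why the rhyme lookup passes 'n' before 'oʊ'. How many trailing sounds have to match before two words count as rhyming is exactly the kind of thing I will be hand-waving about.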
