Fojiba-Jabba Notes
Fojiba-Jabba is the module of Cruft Alarm that supports automatic text generation.
Theoretical Foundations
Fojiba-Jabba uses techniques from the theory of Markov chains and recursive transition networks.
Markov Chains
One method of text generation uses Markov chains. In theory, a Markov chain can produce delightfully quirky text; in practice, the results are usually disappointing.
Process
The process can be summarized as follows (a sketch in code follows the list):
- The user specifies an initial word and the number of sentences desired in the text.
- Fojiba-Jabba, having previously analyzed a set of texts in order to gather statistics on which words follow which words, uses these data to generate the next word.
- This process repeats until the desired number of sentences is obtained.
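
To make the procedure concrete, here is a minimal sketch of an order-1 generator in Python. This is not Fojiba-Jabba's actual code: the corpus, the function names, and the sentence-boundary heuristic are all illustrative assumptions.

```python
import random
from collections import defaultdict

def build_chain(corpus):
    """Map each word to the list of words observed to follow it (order 1)."""
    chain = defaultdict(list)
    words = corpus.split()
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, num_sentences):
    """Random-walk the chain from `start` until enough sentences have ended."""
    word, output, sentences = start, [start], 0
    while sentences < num_sentences:
        candidates = chain.get(word)
        if not candidates:                    # dead end: no observed successor
            break
        word = random.choice(candidates)
        output.append(word)
        if word.endswith(('.', '!', '?')):    # crude sentence-boundary check
            sentences += 1
    return ' '.join(output)

corpus = "the cat sat on the mat. the dog saw the cat. the cat ran."
print(generate(build_chain(corpus), 'the', 2))
```

Note that the walk treats any word ending in '.', '!', or '?' as a sentence boundary, and a word with no observed successor ends the text early.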
Problems
There are, however, several problems with this method:
- The available corpus is too limited to attempt anything beyond an Order-1 Markov Chain: in a small corpus, most longer contexts occur only once, so each state has a single successor and the chain simply reproduces the original text.
- An Order-1 Markov Chain is usually too crude to produce anything but rather ungrammatical (and clearly fake) sentences.
Possible Solutions
- Use linguistic knowledge to improve grammaticality (e.g., a noun or an adjective must follow a determiner). A Brill Part-of-Speech Tagger or the Stanford Parser may be useful here; a sketch of this idea follows the list.
- Use Google to find likely following words, or to enlarge the dataset in some other way.
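
As an illustration of the first idea, here is a sketch of how candidate successors from the chain could be filtered by part of speech. It is an assumption-laden sketch, not Fojiba-Jabba's code: it uses NLTK's default tagger as a stand-in for the Brill tagger or Stanford Parser named above, and the hypothetical ALLOWED_AFTER table encodes only the single determiner rule from the example.

```python
import random
import nltk   # assumes the 'averaged_perceptron_tagger' data has been downloaded

# Hypothetical constraint table: tags allowed to follow a given tag.
# Only the determiner (DT) rule from the bullet above is encoded here.
ALLOWED_AFTER = {'DT': {'JJ', 'NN', 'NNS', 'NNP', 'NNPS'}}

def pick_next(previous_word, candidates):
    """Choose a successor from the chain's candidates, preferring those
    whose part of speech may grammatically follow the previous word."""
    prev_tag = nltk.pos_tag([previous_word])[0][1]
    allowed = ALLOWED_AFTER.get(prev_tag)
    if allowed:
        grammatical = [w for w, tag in nltk.pos_tag(candidates) if tag in allowed]
        if grammatical:
            return random.choice(grammatical)
    return random.choice(candidates)    # no rule applies: fall back to chance

print(pick_next('the', ['ran', 'cat', 'quickly']))   # should print 'cat'
```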
Recursive Transition Networks
Less idiosyncratic than Markov chains, since an RTN generates text from an explicit, hand-written grammar rather than from raw word statistics.
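
Since this section gives no further detail, the following is only a guess at what an RTN-based generator might look like: each named network lists alternative arc sequences, an arc may name another network (allowing recursion), and generation is a random walk starting from network S. The grammar is invented purely for illustration.

```python
import random

# A toy RTN: each named network is a list of alternative arc sequences.
# An arc that names another network calls it (recursively); any other
# arc emits a word.
NETWORKS = {
    'S':   [['NP', 'VP']],
    'NP':  [['the', 'N'], ['the', 'ADJ', 'N']],
    'VP':  [['V'], ['V', 'NP']],
    'N':   [['cruft'], ['alarm'], ['slug']],
    'ADJ': [['quirky'], ['recursive']],
    'V':   [['beeps'], ['generates'], ['ignores']],
}

def traverse(label):
    """Follow one randomly chosen path through the named network,
    descending whenever an arc names another network."""
    if label not in NETWORKS:
        return [label]                  # terminal arc: emit the word itself
    words = []
    for arc in random.choice(NETWORKS[label]):
        words.extend(traverse(arc))
    return words

print(' '.join(traverse('S')).capitalize() + '.')
```

Because every sentence follows the arcs of an explicit grammar, the output is grammatical by construction, at the cost of having to write the networks by hand.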