Fojiba-Jabba Notes
From SlugWiki
Fojiba-Jabba is the module of Cruft Alarm supporting Automatic Text Generation.
Theoretical Foundations
Fojiba-Jabba uses techniques from the theory of Markov Chains and Recursive Transition Networks.
Markov Chains
One method of text generation uses Markov Chains. In theory, Markov Chains can produce delightfully quirky text; in practice, the results are usually poor.
Process
The process can be summarized as follows:
- The user specifies an initial word and the number of sentences desired in the text.
- Fojiba-Jabba, having previously analyzed a set of texts in order to gather statistics on which words follow which words, uses these data to generate the next word.
- This process repeats until the desired number of sentences is obtained.
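The steps above can be sketched roughly as follows. This is a minimal illustration, not Fojiba-Jabba's actual code: the names `build_table` and `generate`, the toy corpus, and the convention that punctuation marks are separate tokens that end a sentence are all assumptions made for the example.

```python
import random
from collections import defaultdict

# Assumption: sentence-ending punctuation appears as its own token.
SENTENCE_END = {".", "!", "?"}

def build_table(corpus_words):
    """Map each word to the list of words observed directly after it.

    Repeated followers are kept, so a uniform random choice from the list
    is weighted by how often each follower occurred.
    """
    table = defaultdict(list)
    for current, following in zip(corpus_words, corpus_words[1:]):
        table[current].append(following)
    return table

def generate(table, first_word, num_sentences, rng=None, max_words=1000):
    """Generate words from first_word until num_sentences sentences end."""
    rng = rng or random.Random()
    words = [first_word]
    sentences = 0
    while sentences < num_sentences and len(words) < max_words:
        followers = table.get(words[-1])
        if not followers:
            break  # dead end: this word was never followed by anything
        next_word = rng.choice(followers)
        words.append(next_word)
        if next_word in SENTENCE_END:
            sentences += 1
    return words

# Hypothetical toy corpus, already split into tokens.
corpus = "the cat sat . the dog sat . the cat ran .".split()
table = build_table(corpus)
print(" ".join(generate(table, "the", 2)))
```

The `max_words` cap is only a safety net for corpora in which a sentence end might never be reached.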
Problems
There are, however, several problems with this method:
- The corpus available is too limited to attempt anything but an Order-1 Markov Chain, as anything higher results in what is essentially the original text itself.
- An Order-1 Markov Chain is often too simplistic to produce anything but rather ungrammatical (and clearly fake) sentences, because each word depends only on the one before it.
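To get a feel for the first problem, one can count how many contexts in a corpus are followed by more than one distinct word. A context with a single follower leaves the generator no choice, so it simply copies the source. The sketch below is illustrative only: `branching_fraction` is a hypothetical helper, and the two-sentence corpus is made up for the example.

```python
from collections import defaultdict

def branching_fraction(words, order):
    """Fraction of distinct order-length contexts with more than one distinct follower."""
    followers = defaultdict(set)
    for i in range(len(words) - order):
        context = tuple(words[i:i + order])
        followers[context].add(words[i + order])
    if not followers:
        return 0.0
    branching = sum(1 for f in followers.values() if len(f) > 1)
    return branching / len(followers)

# Hypothetical toy corpus, already split into tokens.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()
for order in (1, 2, 4):
    print(order, branching_fraction(corpus, order))
```

In this toy corpus no Order-4 context has more than one follower, so an Order-4 chain could only replay the original sentences.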
Possible Solutions
- Use highly advanced linguistic knowledge to improve grammaticality (e.g., a noun or an adjective must follow a determiner). A Brill Part-of-Speech Tagger or the Stanford Parser may be useful here.
- Use Google to find likely following words, or to enlarge the dataset in some other way.
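The first idea could look something like the sketch below, which discards candidate next words whose part of speech is not allowed after the previous word. Everything here is an assumption for illustration: the hand-written `TOY_TAGS` lexicon stands in for a real tagger such as the Brill tagger, and `ALLOWED_AFTER` encodes only the single determiner rule mentioned above.

```python
# Hypothetical hand-written lexicon; a real system would take tags from a
# part-of-speech tagger instead.
TOY_TAGS = {
    "the": "DET", "a": "DET",
    "cat": "NOUN", "dog": "NOUN",
    "lazy": "ADJ",
    "sat": "VERB", "ran": "VERB",
}

# Tags permitted after a given tag. Tags not listed here are unconstrained.
ALLOWED_AFTER = {"DET": {"NOUN", "ADJ"}}

def filter_followers(previous_word, candidates, tags=TOY_TAGS, rules=ALLOWED_AFTER):
    """Keep only the candidates whose tag may follow previous_word's tag."""
    allowed = rules.get(tags.get(previous_word))
    if allowed is None:
        return list(candidates)  # no rule applies to this word
    return [word for word in candidates if tags.get(word) in allowed]

print(filter_followers("the", ["cat", "sat", "lazy"]))
```

A filter like this would sit between looking up the followers of the current word and choosing one at random. If it removes every candidate, the generator needs some fallback, such as using the unfiltered list.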
Recursive Transition Networks
Less idiosyncratic than Markov Chains.