Fojiba-Jabba Notes
From SlugWiki
Revision as of 21:02, 8 June 2006
Fojiba-Jabba is the module of Cruft Alarm supporting Automatic Text Generation.
Theoretical Foundations
Fojiba-Jabba uses techniques from the theory of Markov Chains and Recursive Transition Networks.
Markov Chains
One method of text generation involves Markov Chains. In theory, Markov Chains can produce delightfully quirky text; in practice, the results tend to be disappointing.
Process
The process can be summarized as follows:
- The user specifies an initial word and the number of sentences desired in the text.
- Fojiba-Jabba, having previously analyzed a set of texts in order to gather statistics on which words follow which words, uses these data to generate the next word.
- This process repeats until the desired number of sentences is obtained.
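The steps above can be sketched roughly as follows. This is only an illustrative Order-1 sketch, not Fojiba-Jabba's actual code: the names `build_model` and `generate`, the whitespace tokenization, the rule that a sentence ends at a word ending in ".", "!" or "?", and the `max_words` safety cap are all assumptions made for the example.

```python
import random
from collections import defaultdict

# Assumption: a word ending in one of these characters closes a sentence.
SENTENCE_END = (".", "!", "?")


def build_model(text):
    """Map each word to the list of words observed directly after it."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        # Duplicates are kept on purpose, so frequent followers are
        # proportionally more likely to be chosen later.
        followers[current].append(nxt)
    return followers


def generate(followers, start_word, num_sentences, rng=None, max_words=200):
    """Walk the chain from start_word until num_sentences are produced."""
    rng = rng or random.Random()
    output = [start_word]
    sentences = 1 if start_word.endswith(SENTENCE_END) else 0
    while sentences < num_sentences and len(output) < max_words:
        candidates = followers.get(output[-1])
        if not candidates:
            break  # dead end: this word was never followed by anything
        word = rng.choice(candidates)
        output.append(word)
        if word.endswith(SENTENCE_END):
            sentences += 1
    return " ".join(output)


if __name__ == "__main__":
    corpus = "the cat sat on the mat. the dog sat on the rug."
    model = build_model(corpus)
    print(generate(model, "the", 2))
```

Storing followers as a list with repeats is one simple way to hold the "which words follow which words" statistics; a table of counts would work just as well.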
Problems
There are, however, several problems with this method:
- The corpus available is too limited to attempt anything but an Order-1 Markov Chain (anything higher results in what is essentially the original text itself).
- An Order-1 Markov Chain is often too simplistic to produce anything but rather ungrammatical (and clearly fake) sentences.
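The first problem can be made concrete by measuring how often a context actually offers a choice. The sketch below is a rough illustration on a toy corpus; the function name `branching_fraction` and the metric itself are assumptions for the example, not part of Fojiba-Jabba. When almost no context has more than one possible successor, the chain can only replay the text it was trained on.

```python
from collections import defaultdict


def branching_fraction(text, order):
    """Fraction of contexts (runs of `order` words) that were followed
    by more than one distinct word in the corpus."""
    words = text.split()
    successors = defaultdict(set)
    for i in range(len(words) - order):
        context = tuple(words[i:i + order])
        successors[context].add(words[i + order])
    if not successors:
        return 0.0
    branching = sum(1 for s in successors.values() if len(s) > 1)
    return branching / len(successors)


if __name__ == "__main__":
    corpus = "the cat sat on the mat. the dog sat on the rug."
    for order in (1, 2, 3, 4):
        print(order, branching_fraction(corpus, order))
```

On a small corpus the fraction tends to fall toward zero as the order grows, which is the "essentially the original text itself" effect described above.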
Possible Solutions
- Use highly advanced linguistic knowledge to improve grammaticality (e.g., a noun or an adjective must follow a determiner). A Brill Part-of-Speech Tagger or the Stanford Parser may be useful here.
- Use Google to find likely following words, or to enlarge the corpus somehow.
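As a rough sketch of the first idea, the candidate next words could be filtered by part of speech before one is picked. Everything here is hypothetical: the hand-written `TOY_TAGS` table stands in for the output of a real tagger, `ALLOWED_AFTER` encodes only the single determiner rule mentioned above, and `filter_candidates` is a name invented for the example.

```python
# Hypothetical hand-written tags; a real part-of-speech tagger
# would supply these instead.
TOY_TAGS = {
    "the": "DET", "a": "DET",
    "cat": "NOUN", "mat": "NOUN",
    "big": "ADJ",
    "sat": "VERB",
    "on": "PREP",
}

# Only the rule from the text: a noun or adjective must follow a determiner.
ALLOWED_AFTER = {"DET": {"NOUN", "ADJ"}}


def filter_candidates(previous, candidates, tags=TOY_TAGS, allowed=ALLOWED_AFTER):
    """Keep only candidates whose tag may follow the previous word's tag."""
    permitted = allowed.get(tags.get(previous))
    if permitted is None:
        return list(candidates)  # no rule applies to this context
    kept = [w for w in candidates if tags.get(w) in permitted]
    # Fall back to the unfiltered list rather than dead-ending the chain.
    return kept or list(candidates)


if __name__ == "__main__":
    print(filter_candidates("the", ["cat", "sat", "big", "on"]))
```

Falling back to the unfiltered candidates when nothing passes the rule is a design choice for the sketch: with a limited corpus, a strict filter would stop generation too often.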
Recursive Transition Networks
Less idiosyncratic.