Cruft Alarm Notes
Revision as of 13:34, 3 July 2006
- 1 Classes
- 2 MySQL
- 3 Natural Language Processing
- 4 Miscellaneous
- 5 Website
- 6 To Do (General)
- 7 See Also
Classes
Two classes, crufter.rb and post.rb, form the basis of Cruft Alarm. Other classes include db_reader.rb, froogle_item.rb, and scavenger.rb.
The post class manages the information in each cruft email. It cleans up the email and extracts a location and items from it.
The cruftlists_db_manager class processes information from a MySQL database.
The cruftlist class manages the information in each cruft list (id, title, description, items, and subscribers).
The subscriber class manages the information for each subscriber (id, athena username, email address, and phone number).
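As a rough illustration of the two record types described above, here is a minimal sketch using Ruby Structs. The field names follow the notes; the `wants?` helper and the sample values are hypothetical and are not taken from the actual cruftlist or subscriber classes.

```ruby
# Hypothetical sketch: field names come from the notes, everything else is assumed.
Subscriber = Struct.new(:id, :athena_username, :email_address, :phone_number)

Cruftlist = Struct.new(:id, :title, :description, :items, :subscribers) do
  # True if the given word is one of this list's items (case-insensitive).
  def wants?(word)
    items.any? { |item| item.casecmp?(word) }
  end
end

# Made-up sample data for illustration only.
alice = Subscriber.new(1, "alice", "alice@example.com", "555-0100")
food  = Cruftlist.new(1, "food", "Free food", ["pizza", "crackers"], [alice])
```

With these definitions, `food.wants?("Pizza")` is true and `food.subscribers` gives the people to notify.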
It may be desirable in the future to use froogle_item to also find the category of an item.
MySQL
Cruft Alarm uses a "cruftlists" MySQL database to store cruftlists.
"cruftlists" contains a single "cruftlist" table, with the fields 'id', 'title', 'description', 'items', and 'subscribers'.
Natural Language Processing
Cruft Alarm is rumored to have passed the Turing Test.
- Cruft: this is a list of desirable cruft items (e.g., Pentium IV).
- Food: this is a list of food items (e.g., Bertucci's pizza).
- Location: this is a list of locations at MIT (e.g., Walcott).
- Prev: this is analogous to the Next dictionary.
The necessity of the Next, Prev, and Remove dictionaries is currently in question.
- It looks for any words beginning with 'NE', 'E', or 'W' (e.g., NE42, E50, and W20).
- It looks for any words containing a hyphen (e.g., 26-100), but which are not telephone numbers.
- It checks the words in the message against the words in the location dictionary.
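The three checks above could be sketched roughly as follows. This is not the post class's actual code: the regular expressions, the `find_locations` name, and the dictionary contents other than Walcott are assumptions. In particular, the sketch requires digits after 'NE', 'E', or 'W' so that ordinary words such as "West" are not reported, and it treats 123-4567 and 123-456-7890 shapes as telephone numbers.

```ruby
require "set"

# Hypothetical stand-in for the location dictionary.
LOCATION_DICTIONARY = Set["walcott", "kresge"].freeze

BUILDING   = /\A(?:NE|E|W)\d+\z/           # assumed: prefix must be followed by digits
PHONE      = /\A(?:\d{3}-)?\d{3}-\d{4}\z/  # assumed telephone number shapes
HYPHENATED = /\A\w+-\w+\z/

# Return the words in a message that look like locations.
def find_locations(message)
  # Split on whitespace and trim punctuation from both ends of each word.
  words = message.split.map { |w| w.gsub(/\A[^\w]+|[^\w]+\z/, "") }
  words.select do |word|
    word.match?(BUILDING) ||
      (word.match?(HYPHENATED) && !word.match?(PHONE)) ||
      LOCATION_DICTIONARY.include?(word.downcase)
  end
end
```

Note that the hyphen rule, taken literally, also reports ordinary hyphenated words such as "well-known"; a real implementation would probably need a stricter pattern.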
Cruft Alarm also contains a thesaurus.
- Cruft items are often posted in list form. For example, a reuse email may go as follows:
  I have accumulated too much crap. Following items will be left in EC, in the Wood stairwell (on the West parallel, closest to the triangle building). In a box...look for it!
  - lamp with a missing foot. Comes with light bulb!
  - 50 cent bulletproof DVD with over 50 songs and 12 music videos!
  - mechanical panda bear, does all sorts of tricks. comes with a bottle
  - Maxwell House Hazelnut coffee + godiva chocolate box filled with splenda packets
  - antique hourglass ...actually an hour!
  - Blueberry shower gel
  - book: Flaubert - "Three Tales"
  - book: Martin Page - "How I Became Stupid"
  - Red Gel toothpaste
Find a way to extract items from such lists.
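One possible starting point, offered only as a sketch: treat every line of the email body that begins with a hyphen as one item. The `extract_list_items` name is hypothetical, and the approach assumes the email still has its original line breaks, which may not hold after cleanup.

```ruby
# Collect the text of every line that starts with a hyphen bullet.
def extract_list_items(body)
  body.each_line.map do |line|
    match = line.match(/\A\s*-\s+(.+?)\s*\z/)
    match && match[1]   # nil for lines that are not list items
  end.compact
end
```

Because only a leading hyphen counts, an item such as `book: Flaubert - "Three Tales"` stays in one piece. Posts that use numbers or asterisks as bullets would need additional patterns.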
Ideas to Improve
- Use WordNet.
- Implement relations such as (10 inches, height, monitor).
The post class removes HTML tags, punctuation, and signatures. To remove signatures, it uses a cheap hack: it deletes everything after a run of 5 or more spaces. (Once HTML tags and punctuation have been removed, signatures appear to be the only text that follows 5 or more spaces.)
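The cleanup steps could look roughly like the sketch below. The `clean_post` name and the regular expressions are assumptions; only the order of operations and the 5-space rule come from the description above.

```ruby
# Sketch of the cleanup described above; the regexes are assumptions.
def clean_post(text)
  text = text.gsub(/<[^>]*>/, "")      # strip HTML tags
  text = text.gsub(/[[:punct:]]/, "")  # strip punctuation
  text.sub(/ {5,}.*\z/m, "").strip     # drop the run of 5+ spaces and everything after it
end
```

For example, `clean_post("<p>Free lamp, in Walcott!</p>     John Doe, 555-1234")` keeps only the message text and discards the signature.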
Cruft Alarm ignores all replies and all messages from Helen Ray.
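A filter of this kind might be sketched as follows. How Cruft Alarm actually detects replies is not stated; the sketch assumes a subject line beginning with "Re:", and `ignore_message?` and `IGNORED_SENDERS` are hypothetical names.

```ruby
# Senders whose messages are skipped entirely.
IGNORED_SENDERS = ["Helen Ray"].freeze

# True if the message is a reply (assumed: subject starts with "Re:")
# or comes from an ignored sender.
def ignore_message?(from, subject)
  reply = subject.strip.match?(/\Are:/i)
  reply || IGNORED_SENDERS.any? { |name| from.include?(name) }
end
```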
Gmail is used as the email account of choice for several reasons:
- Using an Athena account would require hardcoding in someone's username and password.
- Yahoo! Mail and Hotmail do not appear to support POP3 access.
Cruft Alarm connects to its Gmail account through a Ruby Gmail library.
Attempts were made to add the-companion (a previous Athena mailing list) to freefood, but they ended in disaster. (It does not seem possible to blanche onto freefood, and it is not a mailman mailing list, so the only way to subscribe to it is through moira.) One consequence of these attempts is that cruftalarm-at-gmail.com is now banned from reuse.
Website
The Cruft Alarm website is located at cruftalarm.mit.edu. It allows visitors to manage their subscriptions to cruftlists, edit cruftlists, and create their own cruftlists.
A cruftlist is a list of items associated with a (non-MIT) mailing list. For example, there may be a 'food' cruftlist, with items such as "pizza", "crackers", and "vegetables", linked to a list of its subscribers.
The goal of a cruftlist is to allow one to obtain desirable reuse items, without a daily deluge of reuse posts. More specifically, Cruft Alarm checks each reuse post to see if an item from a cruftlist is contained within it; if so, it emails the cruftlist, so that each subscriber can be notified of the item. In the future, Cruft Alarm may also make a phone call to each subscriber of the list.
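The matching step could be sketched like this. It is a simplified illustration, not the crufter's real logic: `matching_cruftlists` is a hypothetical name, cruftlists are represented as a plain Hash of title to items, and only single-word items are matched, so a multi-word item would need phrase matching.

```ruby
# Return a Hash of cruftlist title => items found in the post.
# Cruftlists with no matching item are left out.
def matching_cruftlists(post_text, cruftlists)
  words = post_text.downcase.scan(/[a-z0-9']+/)
  cruftlists.each_with_object({}) do |(title, items), hits|
    found = items.select { |item| words.include?(item.downcase) }
    hits[title] = found unless found.empty?
  end
end
```

Each title in the result identifies a cruftlist whose subscribers would then be emailed about the post.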
It has not yet been decided who should be able to edit which cruftlists.
To Do (General)
- Extract items from posts written in list form (see Natural Language Processing).
- Add telephone capabilities.
- Implement stemming (using Snowball?).
- Add more to thesaurus.
- Find something better than the stupid gmailer library.