Difference between revisions of "Cruft Alarm Notes"
Revision as of 02:20, 29 May 2006
Cruft Alarm is a sophisticated computer program that screens several mailing lists for desirable items. It is written in the Ruby programming language.
Two classes, crufter.rb and post.rb, form the basis of Cruft Alarm. There is also a third class, froogler.rb.
The crufter class connects to the cruftalarm-at-gmail.com email account. If there are any new messages, it gives them to the post class, which manages the information in each message.
The post class manages the information in each cruft email. It currently cleans up the email and extracts a location and a list of items from it.
Given an item, the froogler class uses froogle.com to find the first and average prices for the item. Specifically, the first price is the first price listed when sorting by relevance, and the average price is the average of the ten prices listed on the first page (again when sorting by relevance).
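The two prices described above can be sketched roughly as follows. This is only an illustration, not the actual froogler code: price_summary is a hypothetical helper, and it assumes the prices have already been scraped from the first results page in relevance order.

```ruby
# Hypothetical helper (not part of froogler.rb). Takes the prices from the
# first results page, in relevance order, and returns the first price and
# the average of at most the first ten.
def price_summary(prices)
  return nil if prices.empty?

  first_page = prices.first(10)
  average = first_page.sum / first_page.size.to_f
  { first: first_page.first, average: average.round(2) }
end

summary = price_summary([10.0, 20.0, 30.0])
# summary[:first] is 10.0 and summary[:average] is 20.0
```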
The usefulness of this class is quite debatable, given the vague nature of reuse postings and the wide variety of items on froogle; that is, it is unlikely that the search results will match the reuse item very well.
It may be desirable in the future to use froogler to also find the category of an item.
Natural Language Processing
Cruft Alarm is rumored to have passed the Turing Test.
Cruft Alarm requires dictionaries in order to work. Cruft Alarm's current dictionaries are the following:
- Cruft: this is a list of desirable cruft items (e.g., Pentium IV).
- Food: this is a list of food items (e.g., Bertucci's pizza).
- Location: this is a list of locations at MIT (e.g., Walcott).
- Next: this is a list of words X such that if 'X Y' appears in the email, where Y is the word that follows X, then the phrase 'X Y' should be returned (e.g., if 'outside 10-250' appears in an email, and 'outside' is a word in Next, then 'outside 10-250' is returned).
- Prev: this is analogous to the Next dictionary.
- Remove: this is a list of words to remove from an email (e.g., 'of'). The reasons for removing these words may be discussed later.
The necessity of the Next, Prev, and Remove dictionaries is currently in question.
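As a rough illustration of how the Next dictionary is meant to behave, here is a minimal sketch. NEXT_WORDS and next_phrases are hypothetical names, and the sample dictionary contents are made up; the real dictionary format is not described here.

```ruby
# Hypothetical sample of the Next dictionary.
NEXT_WORDS = %w[outside near].freeze

# Returns every "X Y" pair where X is in the Next dictionary and Y is the
# word immediately after it.
def next_phrases(text, next_words = NEXT_WORDS)
  pairs = text.split.each_cons(2)
  matches = pairs.select { |x, _y| next_words.include?(x.downcase) }
  matches.map { |x, y| "#{x} #{y}" }
end

phrases = next_phrases("Free monitors outside 10-250 today")
# phrases is ["outside 10-250"]
```

A Prev lookup could presumably be written the same way with the roles of the two words swapped.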
(cf. the 'get_items' method in the post class) In order to find the cruft items in a message, the post class checks each of the words in the message against the words in the cruft dictionary.
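The word-by-word check could look something like the sketch below. It is not the real get_items method: find_items and the sample CRUFT set are hypothetical, and the sketch assumes single-word dictionary entries and case-insensitive matching.

```ruby
require "set"

# Hypothetical sample of the cruft dictionary (single-word entries only).
CRUFT = Set.new(%w[monitor keyboard pentium]).freeze

# Returns the dictionary words found in the message, without duplicates,
# in the order they first appear.
def find_items(message, dictionary = CRUFT)
  words = message.downcase.scan(/[a-z0-9]+/)
  words.select { |word| dictionary.include?(word) }.uniq
end

items = find_items("Old Pentium box and a monitor, plus another monitor")
# items is ["pentium", "monitor"]
```

Multi-word entries such as 'Pentium IV' would need phrase matching rather than the single-word lookup shown here.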
(cf. the 'get_location' method in the post class) In order to find where the items in a message have been posted, the post class does the following:
- It looks for any words beginning with 'NE', 'E', or 'W' (e.g., NE42, E50, and W20).
- It looks for any words containing a hyphen (e.g., 26-100).
- It checks the words in the message against the words in the location dictionary.
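The three checks above can be sketched as follows. This is an approximation, not the real get_location method: find_locations and the sample LOCATIONS list are hypothetical, and the sketch additionally assumes that a building prefix must be followed by a digit, so that ordinary words such as 'Every' are not reported.

```ruby
# Hypothetical sample of the location dictionary.
LOCATIONS = %w[walcott lobby].freeze

# Returns the words in the message that look like locations.
def find_locations(message, locations = LOCATIONS)
  # Strip leading and trailing punctuation from each word.
  words = message.split.map { |word| word.gsub(/\A\W+|\W+\z/, "") }
  found = words.select do |word|
    word.match?(/\A(NE|E|W)\d/) ||        # building numbers such as NE42, E50, W20
      word.include?("-") ||               # room numbers such as 26-100
      locations.include?(word.downcase)   # dictionary lookup
  end
  found.uniq
end

places = find_locations("Box left in NE42 near 26-100, also Walcott lounge.")
# places is ["NE42", "26-100", "Walcott"]
```

Note that the hyphen check also matches ordinary hyphenated words (e.g., 'well-known'), so some false positives are to be expected.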
- Cruft items are often posted in list form. For example, a reuse email may go as follows:
I have accumulated too much crap. Following items will be left in EC, in the Wood stairwell (on the West parallel, closest to the triangle building). In a box...look for it!
- lamp with a missing foot. Comes with light bulb!
- 50 cent bulletproof DVD with over 50 songs and 12 music videos!
- mechanical panda bear, does all sorts of tricks. comes with a bottle
- Maxwell House Hazelnut coffee + godiva chocolate box filled with splenda packets
- antique hourglass ...actually an hour!
- Blueberry shower gel
- book: Flaubert - "Three Tales"
- book: Martin Page - "How I Became Stupid"
- Red Gel toothpaste
Find a way to extract items from such lists.
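One possible starting point, assuming each item sits on its own line and begins with a '-' or '*' marker as in the example email, is sketched below. list_items is a hypothetical helper, not an existing method of the post class.

```ruby
# Hypothetical helper: returns the text of every line that starts with a
# '-' or '*' bullet marker, with the marker removed.
def list_items(body)
  lines = body.each_line.map(&:strip)
  bullets = lines.select { |line| line.match?(/\A[-*]\s+\S/) }
  bullets.map { |line| line.sub(/\A[-*]\s+/, "") }
end

email = "Following items will be left in EC.\n" \
        "- lamp with a missing foot\n" \
        "- Blueberry shower gel\n" \
        "Look for the box!\n"
extracted = list_items(email)
# extracted is ["lamp with a missing foot", "Blueberry shower gel"]
```

This must run on the raw message text, since removing punctuation first would also remove the bullet markers.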
Ideas to Improve
- Use databases
- Froogler can determine the category of a word. So one way to find all the items in a post may be to search froogle for every word in the post (or every two/three/four/etc. words in a post), and see which ones are in a Computer/Electronics/etc. category.
The post class removes HTML tags and punctuation (cf. the 'cleanup' method).
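A minimal sketch of this kind of cleanup is shown below. It is not the actual 'cleanup' method: cleanup_sketch is a hypothetical name, the tag-stripping regex is deliberately simple, and keeping hyphens (so that room numbers such as 26-100 survive) is an assumption.

```ruby
# Hypothetical stand-in for the cleanup step.
def cleanup_sketch(text)
  without_tags = text.gsub(/<[^>]*>/, " ")                 # drop HTML tags
  without_punctuation = without_tags.gsub(/[^\w\s-]/, " ") # drop punctuation, keep hyphens
  without_punctuation.gsub(/\s+/, " ").strip               # collapse whitespace
end

cleaned = cleanup_sketch("<p>Free <b>monitor</b>, in 26-100!</p>")
# cleaned is "Free monitor in 26-100"
```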
Cruft Alarm ignores all replies and all messages from Helen Ray.
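How replies and unwanted senders are detected is not recorded here. One plausible approach, assuming replies can be recognised by a 'Re:' subject prefix and senders by their address, is sketched below; ignore_message?, IGNORED_SENDERS, and the example address are all hypothetical.

```ruby
# Hypothetical list of sender addresses to skip.
IGNORED_SENDERS = ["ignored-sender@example.com"].freeze

# Returns true if the message is a reply or comes from an ignored sender.
def ignore_message?(subject, from, ignored = IGNORED_SENDERS)
  reply = subject.strip.match?(/\Are:/i)
  blocked = ignored.any? { |address| from.downcase.include?(address) }
  reply || blocked
end

skip_reply = ignore_message?("Re: free monitor", "someone@example.com")
skip_new   = ignore_message?("free monitor", "someone@example.com")
# skip_reply is true, skip_new is false
```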
Gmail is used as the email account of choice for several reasons:
- Using an Athena account would require hardcoding in someone's username and password.
- Yahoo! Mail and Hotmail do not support POP3 (or something like that).
By using Gmail to check for new messages, Cruft Alarm in fact receives emails considerably faster than normal Athena accounts do (contrary to what was initially believed). According to Ruth Shewmon, this is because Gmail updates its email servers more often than Athena (or something like that).
Cruft Alarm connects to its Gmail account through a Ruby Gmail library.
Cruft Alarm receives emails from the Gmail account cruftalarm-at-gmail.com. cruftalarm-at-gmail.com is subscribed to the Athena mailing list the-companion. the-companion, in turn, is subscribed to the Athena mailing lists reuse, free-food, and freefood.
To Do (General)
- Add synonym capabilities.
- Add telephone capabilities, so that Cruft Alarm can dial a telephone number if a desirable item is posted.
- Set up Cruft Alarm's visual interface (the LCDs).
- Set up version control. (?)
- Add URL-removing capabilities (e.g., for http://foo) to the cleanup method in post.rb.
- Add Markov Chain-Artificial Intelligence-Natural Language Processing-Advanced Modelling Genomic Genetic-Behavior to Cruft Alarm.
- Add client/server capabilities (or something like that)
- For some unknown reason, Ruby can no longer connect to the cruftalarm-at-gmail.com account, but can still connect to any other account.