Cruft Alarm Notes

From SlugWiki

Revision as of 19:54, 4 June 2006

Cruft Alarm is a sophisticated computer program that screens several mailing lists for desirable items. It is written in the Ruby programming language.

Classes

Two classes, crufter and post (in crufter.rb and post.rb), form the basis of Cruft Alarm. Other classes include froogle_item and scavenger (in froogle_item.rb and scavenger.rb).

crufter

The crufter class connects to the cruftster-at-gmail.com email account. If there are any new messages, it gives them to the post class, which manages the information in each message.

post

The post class manages the information in each cruft email. It currently cleans up the email and extracts a location and a list of items from it.

froogle_item

Given an item, the froogle_item class uses froogle.com to find the first and average prices for the item. Specifically, the first price is the first price listed when sorting by relevance, and the average price is the average of the ten prices listed on the first page (again when sorting by relevance).
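The two prices described above can be sketched as follows. This is only an illustration of the arithmetic, not the actual froogle_item code: the method name price_summary is hypothetical, and the prices are assumed to have already been scraped from the first results page, in relevance order.

```ruby
# Hypothetical helper: summarizes prices already taken from a results page.
# `prices` is assumed to be an array of numbers in relevance order.
def price_summary(prices, page_size = 10)
  first_page = prices.first(page_size)
  return nil if first_page.empty?

  {
    first: first_page.first,                         # first price listed
    average: first_page.sum / first_page.size.to_f   # mean of the first page
  }
end

summary = price_summary([19.99, 24.50, 15.00, 30.00])
puts summary[:first]
puts summary[:average].round(2)
```

Returning nil for an empty result page is a design choice made here so that callers can tell "no results" apart from a price of zero.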

The usefulness of this class is quite debatable, given the vague nature of reuse postings and the wide variety of items on froogle; that is, it is unlikely that the search results will match the reuse item very well.

It may be desirable in the future to use froogle_item to also find the category of an item.

scavenger

The scavenger class supports automatic claiming. Given an array of items, it can also send emails notifying others that these items have been taken.

Text generation using Markov chains may be implemented in the future. (See http://www.rubyquiz.com/quiz74.html for an example.)

Natural Language Processing

Cruft Alarm is rumored to have passed the Turing Test.

Dictionaries

Cruft Alarm requires dictionaries in order to work. Cruft Alarm's current dictionaries are the following:

  • Cruft: this is a list of desirable cruft items (e.g., Pentium IV).
  • Food: this is a list of food items (e.g., Bertucci's pizza).
  • Location: this is a list of locations at MIT (e.g., Walcott).
  • Next: this is a list of words X such that if 'X Y' appears in an email, where Y is the word that follows X, the whole phrase 'X Y' should be returned (e.g., if 'outside 10-250' appears in an email and 'outside' is a word in Next, then 'outside 10-250' is returned).
  • Prev: this is analogous to the Next dictionary.
  • Remove: this is a list of words to remove from an email (e.g., 'of'). The reasons for removing these words may be discussed later.

The necessity of the Next, Prev, and Remove dictionaries is currently in question.

Get Items

(cf. the 'get_items' method in the post class) In order to find the cruft items in a message, the post class checks each word in the message against the words in the cruft dictionary.
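A minimal sketch of this dictionary lookup is shown below. It is not the actual get_items method: the CRUFT_WORDS contents are made up, and the sketch only handles single-word entries, so a multi-word entry such as 'Pentium IV' would need phrase matching on top of it.

```ruby
require 'set'

# Hypothetical stand-in for the Cruft dictionary; the real word list is not shown here.
CRUFT_WORDS = Set.new(%w[monitor lamp keyboard]).freeze

# Returns the words of the message that appear in the dictionary,
# in order of first appearance and without duplicates.
def get_items(message, dictionary = CRUFT_WORDS)
  message.downcase
         .scan(/[a-z0-9]+/)
         .select { |word| dictionary.include?(word) }
         .uniq
end

p get_items('Free monitor and a lamp outside 10-250. The monitor works.')
```

A Set is used so that each lookup stays fast even when the dictionary grows large.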

Get Location

(cf. the 'get_location' method in the post class) In order to find where the items in a message have been posted, the post class does the following:

  • It looks for any words beginning with 'NE', 'E', or 'W' (e.g., NE42, E50, and W20).
  • It looks for any words containing a hyphen (e.g., 26-100), but which are not telephone numbers.
  • It checks the words in the message against the words in the location dictionary.
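The three checks above can be sketched as follows. This is an illustration under stated assumptions rather than the actual get_location method: the LOCATION_WORDS contents are invented, building names are assumed to be a prefix followed by digits (so that a word like 'Every' is not matched), and telephone numbers are assumed to look like 555-1234 or 617-555-1234.

```ruby
# Hypothetical stand-in for the Location dictionary.
LOCATION_WORDS = %w[walcott lobby].freeze

# Building names such as NE42, E50, or W20 (digits after the prefix are an assumption).
BUILDING = /\A(?:NE|E|W)\d+\z/
# Assumed telephone shapes: 555-1234 or 617-555-1234.
PHONE = /\A(?:\d{3}-)?\d{3}-\d{4}\z/

# Returns every word that passes one of the three location checks.
def get_location(message)
  # Trim punctuation from both ends of each word, keeping inner hyphens.
  words = message.split.map { |w| w.gsub(/\A\W+|\W+\z/, '') }
  words.select do |word|
    word.match?(BUILDING) ||
      (word.include?('-') && !word.match?(PHONE)) ||
      LOCATION_WORDS.include?(word.downcase)
  end
end

p get_location('Old chairs outside 26-100 and in NE42, call 555-1234.')
```

As in the description, the hyphen check accepts any hyphenated word, so an ordinary compound such as 'well-used' would also be reported as a location.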

Synonymy

Cruft Alarm also contains a thesaurus.

To Do

  • Cruft items are often posted in list form. For example, a reuse email may go as follows:

I have accumulated too much crap. Following items will be left in EC, in the Wood stairwell (on the West parallel, closest to the triangle building). In a box...look for it!

- lamp with a missing foot. Comes with light bulb!

- 50 cent bulletproof DVD with over 50 songs and 12 music videos!

- mechanical panda bear, does all sorts of tricks. comes with a bottle

- Maxwell House Hazelnut coffee + godiva chocolate box filled with splenda packets

- antique hourglass ...actually an hour!

- Blueberry shower gel

- book: Flaubert - "Three Tales"

- book: Martin Page - "How I Became Stupid"

- Red Gel toothpaste

Find a way to extract items from such lists.
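One possible starting point, assuming that list entries always begin with a '-' or '*' marker as in the email above, is to keep only the lines that start with such a marker. The method name extract_list_items is hypothetical, and the sketch does not yet decide which words within each entry are the actual item.

```ruby
# Lines that open with a list marker ("-" or "*") followed by whitespace.
LIST_MARKER = /\A[*\-]\s+/

# Returns the text of every list entry in the body, without its marker.
def extract_list_items(body)
  body.each_line
      .map(&:strip)
      .grep(LIST_MARKER) { |line| line.sub(LIST_MARKER, '') }
end

mail = <<~MAIL
  I have accumulated too much stuff. In a box...look for it!

  - lamp with a missing foot. Comes with light bulb!

  - Blueberry shower gel

  - book: Flaubert - "Three Tales"
MAIL

puts extract_list_items(mail)
```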

Ideas to Improve

  • Use databases
  • Froogler can determine the category of a word. So one way to find all the items in a post may be to search froogle for every word in the post (or every run of two, three, four, etc. consecutive words), and see which ones fall in a category such as Computers or Electronics.
  • Use 'wordnet' or 'extended wordnet'.
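For the froogle idea above, the candidate search phrases (single words plus runs of consecutive words) could be generated as in this sketch. The method name candidate_phrases and the default maximum length of three words are assumptions; the froogle search itself is not shown.

```ruby
# Returns every run of 1..max_len consecutive words as a candidate search phrase.
def candidate_phrases(text, max_len = 3)
  words = text.split
  (1..max_len).flat_map do |n|
    words.each_cons(n).map { |run| run.join(' ') }
  end
end

p candidate_phrases('old laser printer', 2)
```

The number of phrases grows roughly as the word count times max_len, so searching every phrase of a long post would mean a large number of froogle queries.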

Miscellaneous

Cleanup

The post class removes HTML tags and punctuation (cf. the 'cleanup' method).
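A rough sketch of such a cleanup step is shown below. It is not the actual method in post.rb. In particular, keeping hyphens is an assumption made here so that room numbers such as 26-100 remain recognizable afterwards.

```ruby
# Strips HTML tags and punctuation, then collapses runs of whitespace.
def cleanup(text)
  text
    .gsub(/<[^>]*>/, ' ')           # drop HTML tags
    .gsub(/(?!-)[[:punct:]]/, ' ')  # drop punctuation except hyphens (assumption)
    .split.join(' ')                # collapse whitespace
end

puts cleanup('<p>Free <b>lamp</b>, outside 26-100!</p>')
```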

Ignore

Cruft Alarm ignores all replies and all messages from Helen Ray.

Gmail

Gmail is used as the email account of choice for several reasons:

  • Using an Athena account would require hardcoding in someone's username and password.
  • Yahoo! Mail and Hotmail do not support POP3 (or something like that).

By using Gmail to check for new messages, Cruft Alarm in fact receives emails considerably faster than normal Athena accounts do (contrary to what was initially believed). According to Ruth Shewmon, this is because Gmail updates its email servers more often than Athena (or something like that).

Cruft Alarm connects to its Gmail account through a Ruby Gmail library.

Mailing Lists

Cruft Alarm receives emails through the Gmail account cruftster-at-gmail.com, which is subscribed to the Athena mailing list cruftalarm. The cruftalarm list, in turn, is subscribed to the Athena mailing lists reuse and free-food.

Attempts were made to add the-companion (a previous Athena mailing list) to freefood, but they ended in disaster. (It does not seem possible to blanche onto freefood, and it is not a mailman mailing list, so the only way to subscribe to it is through moira.) One consequence of these attempts is that cruftalarm-at-gmail.com is now banned from reuse.

Current mailing lists used by Cruft Alarm are cruftster, colonbrander, fojiba, and luigicasanueva. Each of these comes as a (Gmail account, Athena mailing list) pair.

cruftster is used to receive reuse and free-food emails; colonbrander (alias, Colin Brander), fojiba (alias, Ryo Fojiba), and luigicasanueva (alias, Luigi Casanueva) are used by the scavenger class to claim and take items.

Website

The Cruft Alarm website is located at cruftalarm.mit.edu. It allows visitors to see other users' cruft lists, and also allows them to create their own.

The goal of a cruft list is to allow one to obtain desirable reuse items, without the daily deluge of reuse posts. More specifically, Cruft Alarm checks each reuse post to see if an item from a cruft list is contained within it; if so, it emails (and, in the future, may also make a phone call to) the creator of the cruft list, with a notification of the post.
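The matching step described above could look roughly like the following sketch. The CRUFT_LISTS data, the example addresses, and the notifications_for name are all hypothetical; how the website stores cruft lists and how the notification email is sent are not shown.

```ruby
# Hypothetical data: each cruft list maps its creator's address to wanted items.
CRUFT_LISTS = {
  'alice@example.com' => %w[monitor lamp],
  'bob@example.com'   => %w[keyboard]
}.freeze

# Returns { creator => [matching items] } for creators with at least one match.
def notifications_for(post_items, cruft_lists = CRUFT_LISTS)
  cruft_lists.each_with_object({}) do |(creator, wanted), matches|
    hits = wanted & post_items   # items present in both the list and the post
    matches[creator] = hits unless hits.empty?
  end
end

p notifications_for(%w[lamp toaster])
```

Only creators with at least one matching item appear in the result, so the caller can send one notification per entry.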

To Do (General)

  • Add telephone capabilities, so that Cruft Alarm can dial a telephone number if a desirable item is posted.
  • Add URL-removing capabilities (e.g., for links such as http://foo) to the cleanup method in post.rb.
  • Add client/server capabilities (or something like that).
  • Implement stemming (for plural suffixes especially).
  • Get Cruft Alarm to read from databases.
  • Improve website.
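Two of the items above, URL removal and plural stemming, are small enough to sketch. Neither method exists in Cruft Alarm yet; the names strip_urls and stem_plural are hypothetical, and the stemmer is deliberately naive (it strips common plural suffixes and will mangle words such as 'bus' or 'movies').

```ruby
# Removes http:// and https:// links, then collapses leftover whitespace.
def strip_urls(text)
  text.gsub(%r{https?://\S+}, '').split.join(' ')
end

# Naive plural stemmer: handles -ies, -es after sibilants, and plain -s.
def stem_plural(word)
  case word
  when /ies\z/ then word.sub(/ies\z/, 'y')
  when /(?:s|x|z|ch|sh)es\z/ then word.sub(/es\z/, '')
  when /[^s]s\z/ then word.chomp('s')
  else word
  end
end

puts strip_urls('see http://foo for pics')
puts %w[batteries boxes lamps glass].map { |w| stem_plural(w) }.join(' ')
```

A dictionary-backed stemmer or an established algorithm such as Porter stemming would handle more cases, at the cost of extra complexity.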

Known Issues

  • For an unknown reason, Ruby can no longer connect to the cruftalarm-at-gmail.com account, but can still connect to any other account.

See Also