Cruft Alarm Notes

From SlugWiki

Latest revision as of 22:54, 25 August 2015

Cruft Alarm is a sophisticated computer program that screens several mailing lists for desirable items. It is written in the Ruby programming language.

Classes

Two classes, crufter.rb and post.rb, form the basis of Cruft Alarm. Other classes include db_reader.rb, froogle_item.rb, and scavenger.rb.

crufter

The crufter class connects to the cruftster-at-gmail.com email account. If there are any new messages, it gives them to the post class, which manages the information in each message.
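A minimal sketch of the hand-off described above. The mailbox object, its `new_messages` method, and the stand-in `Post` class are hypothetical; the real crufter talks to Gmail through a library, which is left out here so the example runs offline.

```ruby
# Hypothetical sketch: the mailbox object and this Post class are stand-ins,
# not Cruft Alarm's actual code. No network access happens here.
class Post
  attr_reader :body

  def initialize(body)
    @body = body # the real post class would clean this up and parse it
  end
end

class Crufter
  # mailbox: any object whose #new_messages returns an array of message bodies
  def initialize(mailbox)
    @mailbox = mailbox
  end

  # Fetch the new messages and hand each one to a Post.
  def check
    @mailbox.new_messages.map { |body| Post.new(body) }
  end
end

# In-memory mailbox used in place of the Gmail account.
FakeMailbox = Struct.new(:messages) do
  def new_messages
    messages
  end
end

posts = Crufter.new(FakeMailbox.new(["free lamp in E15"])).check
puts posts.first.body # prints: free lamp in E15
```

Injecting the mailbox keeps the crufter testable without a live account, which matters given how often the Gmail connection has broken.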

post

The post class manages the information in each cruft email. It cleans up the email and extracts the location and the items from it.

cruftlists_db_manager

The cruftlists_db_manager class processes information from a MySQL database.

cruftlist

The cruftlist class manages the information in each cruft list (id, title, description, items, and subscribers).

subscriber

The subscriber class manages the information for each subscriber (id, athena username, email address, and phone number).
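The two record types above can be sketched as Ruby Structs. The field names follow the lists above; the constant names and the field types (strings, arrays) are assumptions, not necessarily how the actual classes are written.

```ruby
# Hypothetical value objects mirroring the fields listed above.
# Field types (strings, arrays of items and subscribers) are assumptions.
Subscriber = Struct.new(:id, :athena_username, :email, :phone_number)
Cruftlist  = Struct.new(:id, :title, :description, :items, :subscribers)

alice = Subscriber.new(1, "alice", "alice@example.com", "555-0100")
food  = Cruftlist.new(1, "food", "Free food alerts",
                      ["pizza", "crackers", "vegetables"], [alice])
p food.subscribers.map(&:athena_username) # => ["alice"]
```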

froogle_item

Given an item, the froogle_item class uses froogle.com to find the first and average prices for the item. Specifically, the first price is the first price listed when sorting by relevance, and the average price is the average of the ten prices listed on the first page (again when sorting by relevance).
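Once the prices have been scraped, the two statistics reduce to a few lines. This sketch assumes a hypothetical `froogle_prices` helper that receives the prices as Floats, already in relevance order; fetching and parsing the Froogle results page is not shown.

```ruby
# Hypothetical helper: prices_by_relevance is an array of Floats in the
# order Froogle listed them when sorting by relevance.
def froogle_prices(prices_by_relevance)
  return nil if prices_by_relevance.empty?

  first_page = prices_by_relevance.first(10) # the ten results on page one
  {
    first: prices_by_relevance.first,
    average: first_page.sum / first_page.size
  }
end

result = froogle_prices([20.0, 30.0, 10.0])
puts result[:first]   # prints 20.0
puts result[:average] # prints 20.0
```

If fewer than ten prices come back, this version averages whatever is available rather than failing.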

The usefulness of this class is quite debatable, given the vague nature of reuse postings and the wide variety of items on froogle; that is, it is unlikely that the search results will match the reuse item very well.

It may be desirable in the future to use froogle_item to also find the category of an item.

scavenger

The scavenger class supports automatic claiming. Given an array of items, it can also send emails notifying others that these items have been taken.
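A sketch of the notification half of the scavenger, assuming the message is plain text that lists each taken item on its own line. The `taken_notice` name and the wording are hypothetical, and actual delivery (SMTP or Gmail) is omitted.

```ruby
# Hypothetical sketch: builds the "items taken" message only; sending it
# is left to whatever mail library is in use.
def taken_notice(items)
  raise ArgumentError, "no items given" if items.empty?

  lines = items.map { |item| "- #{item}" }
  "Subject: Taken\n\nThe following items have been taken:\n#{lines.join("\n")}\n"
end

puts taken_notice(["antique hourglass", "Blueberry shower gel"])
```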

MySQL

Cruft Alarm uses a "cruftlists" MySQL database to store cruftlists.

"cruftlists" contains a single "cruftlist" table, with the fields 'id', 'title', 'description', 'items', and 'subscribers'.

Natural Language Processing

Cruft Alarm is rumored to have passed the Turing Test.

Dictionaries

Cruft Alarm requires dictionaries in order to work. Cruft Alarm's current dictionaries are the following:

  • Cruft: this is a list of desirable cruft items (e.g., Pentium IV).
  • Food: this is a list of food items (e.g., Bertucci's pizza).
  • Location: this is a list of locations at MIT (e.g., Walcott).
  • Next: this is a list of words X such that if X Y appears in the email, where Y is another word, then X Y should be returned (e.g., if 'outside 10-250' appears in an email, and 'outside' is a word in Next, then 'outside 10-250' should be returned).
  • Prev: this is analogous to the Next dictionary, but for the preceding word: if Y X appears in the email and X is in Prev, then Y X should be returned.
  • Remove: this is a list of words to remove from an email (e.g., 'of'). The reasons for removing these words may be discussed later.
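The Next, Prev, and Remove rules can be sketched as follows. The dictionary contents (for instance, 'lobby' as a Prev word), the method names, and the choice to apply Remove before Next/Prev are assumptions for illustration.

```ruby
# Hypothetical dictionaries; the real ones are files of words.
NEXT_WORDS   = %w[outside near].freeze
PREV_WORDS   = %w[lobby].freeze
REMOVE_WORDS = %w[of the].freeze

# Returns the two-word phrases that the Next and Prev rules keep.
def expand_phrases(words)
  phrases = []
  words.each_with_index do |word, i|
    w = word.downcase
    # Next rule: "X Y" where X is in Next
    phrases << "#{word} #{words[i + 1]}" if NEXT_WORDS.include?(w) && words[i + 1]
    # Prev rule: "Y X" where X is in Prev
    phrases << "#{words[i - 1]} #{word}" if PREV_WORDS.include?(w) && i.positive?
  end
  phrases
end

# Drops every word found in the Remove dictionary.
def remove_words(words)
  words.reject { |w| REMOVE_WORDS.include?(w.downcase) }
end

words = remove_words("pizza outside 10-250 and in the Walker lobby".split)
p expand_phrases(words) # => ["outside 10-250", "Walker lobby"]
```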

The necessity of the Next, Prev, and Remove dictionaries is currently in question.

Get Items

(cf. the 'get_items' method in the post class) To find the cruft items in a message, the post class checks each word in the message against the words in the cruft dictionary.
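A sketch of this dictionary lookup. Because some dictionary entries span several words (e.g., Pentium IV), this version compares every run of up to N consecutive words, where N is the length of the longest entry; that extension, the tokenization, and the sample dictionary are assumptions rather than a description of the actual get_items.

```ruby
# Hypothetical cruft dictionary, stored lowercase.
CRUFT = ["pentium iv", "lamp", "monitor"].freeze

def get_items(text, dictionary = CRUFT)
  words = text.downcase.scan(/[a-z0-9]+/)            # crude tokenization
  max_len = dictionary.map { |e| e.split.size }.max || 0
  found = []
  (1..max_len).each do |n|
    # every run of n consecutive words, e.g. "pentium iv" for n = 2
    words.each_cons(n) do |gram|
      phrase = gram.join(" ")
      found << phrase if dictionary.include?(phrase)
    end
  end
  found.uniq
end

p get_items("Old Pentium IV box and a lamp, free!") # => ["lamp", "pentium iv"]
```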

Get Location

(cf. the 'get_location' method in the post class) In order to find where the items in a message have been posted, the post class does the following:

  • It looks for any words beginning with 'NE', 'E', or 'W' (e.g., NE42, E50, and W20).
  • It looks for any words containing a hyphen (e.g., 26-100) that are not telephone numbers.
  • It checks the words in the message against the words in the location dictionary.
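The three heuristics can be sketched with regular expressions. The patterns are assumptions: building numbers are taken to be NE, E, or W followed by digits, and telephone numbers to look like 253-1000 or 617-253-1000. The sample location dictionary is hypothetical.

```ruby
LOCATIONS = %w[walcott walker].freeze # hypothetical location dictionary

BUILDING   = /\A(?:NE|E|W)\d+[A-Z]?\z/      # e.g. NE42, E50, W20
HYPHENATED = /\A\w+(?:-\w+)+\z/             # e.g. 26-100
PHONE      = /\A(?:\d{3}-)?\d{3}-\d{4}\z/   # e.g. 253-1000, 617-253-1000

def get_location(text)
  # strip trailing commas etc., but keep hyphens
  words = text.split.map { |w| w.gsub(/[^\w-]/, "") }
  words.select do |word|
    word.match?(BUILDING) ||
      (word.match?(HYPHENATED) && !word.match?(PHONE)) ||
      LOCATIONS.include?(word.downcase)
  end
end

p get_location("Left in W20, room 26-100, or call 253-1000 at Walcott")
# => ["W20", "26-100", "Walcott"]
```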

Synonymy

Cruft Alarm also contains a thesaurus.
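A thesaurus can be as simple as a hash from variant to canonical term, applied before the dictionary lookup. The entries and the `canonicalize` name below are hypothetical; the page does not describe the thesaurus format.

```ruby
# Hypothetical thesaurus: each variant maps to one canonical term.
THESAURUS = {
  "couch"  => "sofa",
  "settee" => "sofa",
  "lcd"    => "monitor",
  "screen" => "monitor"
}.freeze

# Replace each word by its canonical form so that a single dictionary
# entry covers all of its synonyms.
def canonicalize(words)
  words.map { |w| THESAURUS.fetch(w.downcase, w.downcase) }
end

p canonicalize(%w[Free LCD and couch]) # => ["free", "monitor", "and", "sofa"]
```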

Ideas to Improve

  • Froogler can determine the category of a word. So one way to find all the items in a post may be to search froogle for every word in the post (or every two/three/four/etc. words in a post), and see which ones are in a Computer/Electronics/etc. category.
  • Use WordNet.
  • Implement relations such as (10 inches, height, monitor).

Miscellaneous

Cleanup

The post class removes HTML tags, punctuation, and signatures. The cheap hack it uses to remove signatures is to remove everything after a run of 5 or more spaces. (Once HTML tags and punctuation have been removed, it appears that only signatures follow 5 or more spaces.)
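The cleanup steps, in the order described, might look like this. The regexes are assumptions; in particular, hyphens are kept so that room numbers such as 26-100 survive for the location search.

```ruby
# Hypothetical sketch: strip HTML tags, strip punctuation, then keep only
# the text before the first run of five or more spaces (the signature hack).
def clean_up(email)
  text = email.gsub(/<[^>]*>/, " ")        # drop HTML tags
  text = text.gsub(/[^\w\s-]/, "")         # drop punctuation, keep hyphens
  text.split(/ {5,}/, 2).first.to_s.strip  # cut off the "signature"
end

p clean_up("<p>Free lamp in 26-100!</p>     --Ben, x3-1234")
# => "Free lamp in 26-100"
```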

Ignore

Cruft Alarm ignores all replies and all messages from Helen Ray.
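A sketch of the ignore rule. It assumes replies can be recognized by a subject starting with "Re:" and that the ignored sender is matched by email address; the address below is a placeholder, not a real one.

```ruby
# Placeholder address standing in for the ignored sender (an assumption).
IGNORED_SENDERS = ["ignored@example.com"].freeze

# True if the message is a reply or comes from an ignored sender.
def ignore?(subject, from)
  subject.to_s.match?(/\A\s*re:/i) ||
    IGNORED_SENDERS.include?(from.to_s.downcase)
end

p ignore?("Re: free lamp", "someone@mit.edu")   # => true
p ignore?("free lamp", "IGNORED@example.com")   # => true
p ignore?("free lamp", "someone@mit.edu")       # => false
```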

Gmail

Gmail is used as the email account of choice for several reasons:

  • Using an Athena account would require hardcoding in someone's username and password.
  • Yahoo! Mail and Hotmail do not support POP3 (or something like that).

By using Gmail to check for new messages, Cruft Alarm in fact receives emails considerably faster than normal Athena accounts do (contrary to what was initially believed). According to Ruth Shewmon, this is because Gmail updates its email servers more often than Athena (or something like that).

Cruft Alarm connects to its Gmail account through a Ruby Gmail library.

Mailing Lists

Cruft Alarm receives e-mails from the gmail account cruftster-at-gmail.com. cruftster-at-gmail.com is subscribed to the Athena mailing list, cruftalarm. cruftalarm, in turn, is subscribed to the Athena mailing lists reuse and free-food.

Attempts were made to add the-companion (a previous Athena mailing list) to freefood, but they ended in disaster. (It does not seem possible to blanche onto freefood, and it is not a mailman mailing list, so the only way to subscribe to it is through moira.) One consequence of these attempts is that cruftalarm-at-gmail.com is now banned from reuse.

Current mailing lists used by Cruft Alarm are cruftster, colonbrander, fojiba, and luigicasanueva. Each of these comes in a (gmail, Athena mailing list) pair.

cruftster is used to receive reuse and free-food emails; colonbrander (alias, Colin Brander), fojiba (alias, Ryo Fojiba), and luigicasanueva (alias, Luigi Casanueva) are used by the scavenger class to claim and take items.

Website

The Cruft Alarm website is located at cruftalarm.mit.edu. It allows visitors to manage their subscriptions to cruftlists, edit cruftlists, and create their own cruftlists.

A cruftlist is a list of items associated with a (non-MIT) mailing list. For example, there may be a 'food' cruftlist, with items such as "pizza", "crackers", and "vegetables", linked to a list of its subscribers.

The goal of a cruftlist is to allow one to obtain desirable reuse items, without a daily deluge of reuse posts. More specifically, Cruft Alarm checks each reuse post to see if an item from a cruftlist is contained within it; if so, it emails the cruftlist, so that each subscriber can be notified of the item. In the future, Cruft Alarm may also make a phone call to each subscriber of the list.
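The matching step can be sketched as follows, assuming the items found in a post and each cruftlist's items can be compared as case-insensitive strings. Cruftlists are plain hashes here rather than MySQL rows, and `lists_to_notify` is a hypothetical name.

```ruby
# Hypothetical sketch: returns the cruftlists with at least one item
# among the items found in a post.
def lists_to_notify(post_items, cruftlists)
  found = post_items.map(&:downcase)
  cruftlists.select do |list|
    list[:items].any? { |item| found.include?(item.downcase) }
  end
end

cruftlists = [
  { title: "food", items: %w[pizza crackers vegetables] },
  { title: "electronics", items: ["monitor", "pentium iv"] }
]
p lists_to_notify(%w[Pizza lamp], cruftlists).map { |l| l[:title] } # => ["food"]
```

Each selected cruftlist would then be emailed once, so its subscribers hear about the item without reading every reuse post.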

It is unknown who should be able to edit which cruftlists.

To Do (General)

  • Do the stupid lists thing: find a way to extract items from posts written in list form.
  • Add telephone capabilities.
  • Implement stemming (using Snowball?).
  • Add more to thesaurus.
  • Find something better than the stupid gmailer library.
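For posts written in list form, one possible approach is to treat each line starting with "-" or "*" followed by a space as a new item, append unmarked lines that directly follow an item (wrapped text), and let a blank line end the current item. The marker characters, the `extract_list_items` name, and the sample text are illustrative assumptions.

```ruby
BULLET = /\A[-*]\s+(.+)/ # assumed list markers: "- item" or "* item"

def extract_list_items(body)
  items = []
  current = nil
  body.each_line do |line|
    line = line.strip
    if (m = BULLET.match(line))
      current = m[1].strip     # start a new item
      items << current
    elsif line.empty?
      current = nil            # a blank line ends the current item
    elsif current
      current << " " << line   # wrapped continuation of the current item
    end                        # text before the first item is ignored
  end
  items
end

body = <<~EMAIL
  I have accumulated too much crap. Following items will be left in EC.

  - lamp with a missing foot. Comes with light bulb!
  - Maxwell House Hazelnut coffee + godiva chocolate box filled with splenda
  packets
  - antique hourglass ...actually an hour!
EMAIL

puts extract_list_items(body)
```

This prints three items, with "packets" joined onto the coffee item. Posts that separate every line with blank lines would need a looser continuation rule.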

Why Ruby Sucks

  • Ruby on Rails. Debian.
  • Code rot.
  • The gmailer class can no longer connect to "cruftster", "luigicasanueva", "fojiba", "colinbrander", or apparently any other gmail account created before July 4 (not to mention "cruftalarm"). It can, however, connect to the newly created "lyricifier" account.

See Also