Lab Preparation: Mallet Installation and Set Up

In preparation for Friday’s lab on text analysis, we’re going to get you set up with Mallet, the MAchine Learning for LanguagE Toolkit.

Installing

  1. Download Mallet.  You’re probably more comfortable with zip files, but the .tar.gz file works as well.
  2. Extract the download (likely automatic when you click on the file)
  3. Note the location of the extraction.
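
If the download does not extract automatically, the .tar.gz file can also be unpacked from a terminal.  The following is a minimal sketch, not the only way to do it.  The archive name (mallet-2.0.7.tar.gz), the Downloads location, and the helper name extract_archive are assumptions for illustration; substitute whatever you actually downloaded.

```shell
# Assumed file name and location; adjust both to match your download.
ARCHIVE="${ARCHIVE:-$HOME/Downloads/mallet-2.0.7.tar.gz}"
DEST="${DEST:-$HOME/Downloads}"

# Unpack a .tar.gz archive ($1) into a destination directory ($2).
# extract_archive is a hypothetical helper name, not part of Mallet.
extract_archive() {
    tar -xzf "$1" -C "$2"
}

if [ -f "$ARCHIVE" ]; then
    extract_archive "$ARCHIVE" "$DEST"
    echo "Extracted into $DEST"    # step 3: note this location
else
    echo "No archive at $ARCHIVE; check where your browser saved it."
fi
```

The -C option tells tar where to put the extracted files, which makes the location in step 3 easy to keep track of.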

Running

Mallet is a command-line driven tool.  We haven’t had a chance to talk about command-line tools, but they are common for so-called “super users”.  Creating pretty, intuitive, interactive, graphical interfaces is a difficult process.  To keep the tool simpler (in terms of development) and easier to automate, developers sometimes choose to use a command-line interface.

Windows installation: After unzipping MALLET, set the environment variable MALLET_HOME to point to your MALLET directory.  In all command-line examples, replace bin/mallet with bin\mallet.  (You may need to search for more information about setting environment variables.)

Command-line Tools

Let’s get you a little more comfortable with the command line.  On a Mac, the application you want to use is Terminal.  For Windows, you want to use a Command Prompt.

We will be in the Computer Science Department’s lab for this, so you could set this up on a Mac in that classroom.

Mac OS X

  1. You need to be in the mallet application directory within the Terminal.  Open a Terminal.  Open Finder to the mallet directory.  Type cd followed by a space and then drag the folder into your Terminal. Hit enter.
    You should now be in the mallet directory in the Terminal.   “cd” means “change directory”.  To verify that you’re in the correct directory, type “pwd”, which means “print working directory”.  (For me, the directory is /Users/sprenkle/Downloads/mallet-2.0.7)
    Unix (on which Mac OS X is built) is a bad parent.  It doesn’t tell you what you did correctly, only what you did wrong.  If a command prints nothing, it usually worked.
  2. Type ls
    You should see a list of the directory’s contents, e.g., bin, src, stoplists, etc.
  3. Run bin/mallet import-dir --input sample-data/web/* --output web.mallet
    This runs the mallet program, telling it to import the directories given by the --input argument.  The output is saved in web.mallet.  The output in the terminal should look something like:

    Labels =
       sample-data/web/de
       sample-data/web/en
  4. If you get an error about Java, install Java.  The “More Info” button in the error message will take you to the Java site.  Click Download.  Accept the license and download the appropriate version for your operating system.
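
The steps above can be collected into one short script.  This is a sketch under assumptions: MALLET_DIR is a placeholder for your own extraction location (mine is shown as the default), and the helper name looks_like_mallet is made up for illustration.  It checks for Java first, since Mallet will not run without it.

```shell
# Placeholder location; replace with the directory you noted when extracting.
MALLET_DIR="${MALLET_DIR:-$HOME/Downloads/mallet-2.0.7}"

# Succeeds only if the directory ($1) contains the Mallet launcher
# and the sample data. (Hypothetical helper, not part of Mallet.)
looks_like_mallet() {
    [ -f "$1/bin/mallet" ] && [ -d "$1/sample-data/web" ]
}

# Step 4, done early: Mallet needs Java, so report if it cannot be run.
if ! java -version >/dev/null 2>&1; then
    echo "Java not found: install it before running Mallet."
fi

if looks_like_mallet "$MALLET_DIR"; then
    cd "$MALLET_DIR"    # step 1: change directory
    pwd                 # confirm the working directory
    ls                  # step 2: list the directory's contents
    # step 3: import the sample pages; the result is saved in web.mallet
    bin/mallet import-dir --input sample-data/web/* --output web.mallet
else
    echo "No Mallet installation found in $MALLET_DIR"
fi
```

To try it with a different location, set MALLET_DIR before running the script, for example MALLET_DIR=/path/to/your/mallet.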

You should be ready to go for Friday’s lab.

Windows

  1. Use the command cd malletdir, where malletdir is the location of the Mallet installation directory, to get to the appropriate directory.
  2. Type dir
    You should see a list of the directory’s contents, e.g., bin, src, stoplists, etc.
  3. Run bin\mallet import-dir --input sample-data\web\* --output web.mallet
    This runs the mallet program, telling it to import the directories given by the --input argument.  The output is saved in web.mallet.  The output in the Command Prompt should look something like:

    Labels =
       sample-data/web/de
       sample-data/web/en

(Please let me know if this is correct or what I need to change.)

Help