Software

Natural Language Processing

NomBank API

I have written a C# .NET API for the NomBank resource. The API captures, in addition to everything captured by the TreeBank API (below), all nominalization argument information, including split and co-referential arguments. The API also includes all information from the NomLex resource, which is distributed with NomBank. A sample application is included. You can check out a working copy of the source code with the following Subversion command:

svn co http://links.cse.msu.edu:8000/svn/NLP/Source/ResourceAPIs/NomBank

Alternatively, point your TortoiseSVN client to the same URL.

TreeBank/PropBank API

I have written a C# .NET API for both the TreeBank II and the PropBank resources. The TreeBank portion of the API captures all annotated parse trees, including syntactic constituent labels, grammatical function labels, and null element instantiations. The PropBank portion of the API captures (in addition to everything captured by the TreeBank portion) all verbal argumentation information, including split and co-referential arguments. Both APIs are demonstrated with a sample application. You can check out a working copy of the source code with the following Subversion command:

svn co http://links.cse.msu.edu:8000/svn/NLP/Source/ResourceAPIs/PennBank

Alternatively, point your TortoiseSVN client to the same URL.

FrameNet API

I have written a C# .NET API for the FrameNet semantic frame resource. The API captures much of the useful content of the FrameNet project, including all frame definitions, frame and frame element relations, lexical unit annotations, and frame element bindings within those annotations. The API includes a sample application. You can check out a working copy of the source code with the following Subversion command:

svn co http://links.cse.msu.edu:8000/svn/NLP/Source/ResourceAPIs/FrameNet

Alternatively, point your TortoiseSVN client to the same URL.

Machine Learning

SVM-Light server mode classification module

I have written a modified version of Thorsten Joachims' SVM-Light classification module. The modified version runs in server mode, meaning it loads a model once and serves classifications from the loaded model without having to reload the model each time. This can dramatically reduce processing time in large-scale learning tasks that use large models. You can check out a working copy of the source code with the following Subversion command:

svn co http://links.cse.msu.edu:8000/svn/NLP/Source/ThirdParty/SvmLight/classify_server

Alternatively, point your TortoiseSVN client to the same URL.
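
The load-once idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not the modified module: the toy model format (a bias line followed by "index weight" lines) and the names load_model, parse_example, and serve are invented for the sketch, and only a linear decision function is assumed.

```python
import io

def load_model(model_file):
    """Parse a toy linear model: first line is the bias, then 'index weight' lines.

    This format is invented for the sketch; it is not SVM-Light's model format.
    """
    bias = float(model_file.readline())
    weights = {}
    for line in model_file:
        if line.strip():
            index, weight = line.split()
            weights[int(index)] = float(weight)
    return bias, weights

def parse_example(line):
    """Parse 'index:value index:value ...' into a sparse feature dict."""
    features = {}
    for token in line.split():
        index, value = token.split(":")
        features[int(index)] = float(value)
    return features

def serve(model_file, requests):
    """Load the model once, then yield one +1/-1 label per request line."""
    bias, weights = load_model(model_file)  # paid once, not once per request
    for line in requests:
        features = parse_example(line)
        score = bias + sum(weights.get(i, 0.0) * v for i, v in features.items())
        yield 1 if score >= 0 else -1

# A three-request session served from a single load of the toy model.
model = io.StringIO("-0.5\n1 1.0\n2 -2.0\n")
labels = list(serve(model, ["1:1.0", "2:1.0", "1:2.0 2:0.5"]))
print(labels)  # [1, -1, 1]
```

With a large model, the call to load_model dominates the cost of a single classification, which is why paying for it once matters.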

BXR logistic regression server mode classification module

I have written a modified version of the BXR logistic regression classification module. The modified version runs in server mode, meaning it loads a model once and serves classifications from the loaded model without having to reload the model each time. This can dramatically reduce processing time in large-scale learning tasks that use large models. You can check out a working copy of the source code with the following Subversion command:

svn co http://links.cse.msu.edu:8000/svn/NLP/Source/ThirdParty/BxrLogisticRegression/classify_server

Alternatively, point your TortoiseSVN client to the same URL.
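
To make the saving concrete, here is a small Python sketch that counts how often a model is loaded while answering several requests. It assumes a plain binary logistic model with named features; the LogisticServer class and the loader callback are hypothetical and do not reflect the BXR code or its file formats.

```python
import math

class LogisticServer:
    """Holds a loaded model in memory and answers many requests from it."""

    def __init__(self, loader):
        self.load_count = 0
        self._loader = loader
        self._weights = None

    def _model(self):
        # Load lazily on the first request, then reuse the cached weights.
        if self._weights is None:
            self._weights = self._loader()
            self.load_count += 1
        return self._weights

    def probability(self, features):
        """Return P(positive) = sigmoid(w . x) for a sparse feature dict."""
        weights = self._model()
        score = sum(weights.get(name, 0.0) * value for name, value in features.items())
        return 1.0 / (1.0 + math.exp(-score))

# The loader stands in for an expensive read of a large model file.
server = LogisticServer(lambda: {"bias": 0.0, "length": 1.5})
probabilities = [server.probability({"bias": 1.0, "length": x}) for x in (-2.0, 0.0, 2.0)]
print(server.load_count)  # 1
```

Three requests trigger one load. A classifier that is restarted for every request would pay the loading cost three times.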

MediaWiki

Wikipedia dump tools and API for .NET

I have written a C# .NET API for accessing imported dumps of Wikipedia. This is very useful if you want to run NLP tasks over the Wikipedia data. More information can be found here.

Image preparation, upload, and display

  1. prepare_images_for_wiki.pl seattle_06/ .*\.JPG 640 scaled/seattle_06/
  2. upload.pl scaled/seattle_06/ "image upload bot" <password> http://209.124.40.19/wiki
  3. create_image_gallery.pl scaled/seattle_06/ .*\.JPG

Useful help pages

MediaWiki hacks

Miscellaneous scripts

pfind

If there is one thing I dislike about the Linux command line, it is the `find' program. Too much complexity for what I want it to do (find files). For example, if I want to find all files in or below the current directory whose names contain the words foo and bar (in that order, possibly separated by other characters, ignoring case), I would use the following:

find -iname '*foo*bar*'

It might seem like a quick command, but type it a few dozen times and you will share my frustration. My solution is this trivial Perl script. The following `pfind' command is equivalent to the `find' command above:

pfind foo bar

Better, right?
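
For readers who do not use Perl, the same idea can be sketched in Python. This is not the pfind script itself, just a minimal illustration under the assumption that pfind joins its arguments with wildcards and matches base names case-insensitively; the names build_pattern and pfind below are chosen for the sketch.

```python
import fnmatch
import os

def build_pattern(words):
    """Join words into a glob such as '*foo*bar*'."""
    return "*" + "*".join(words) + "*"

def pfind(words, root="."):
    """Yield paths under root whose base name matches the words in order, ignoring case."""
    pattern = build_pattern(words).lower()
    for dirpath, dirnames, filenames in os.walk(root):
        # Like find -iname, test directory names as well as file names.
        for name in dirnames + filenames:
            if fnmatch.fnmatchcase(name.lower(), pattern):
                yield os.path.join(dirpath, name)

# Example: print every match below the current directory.
# for path in pfind(["foo", "bar"]):
#     print(path)
```

Lowercasing both the name and the pattern before calling fnmatch.fnmatchcase keeps the comparison case-insensitive on every platform.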

GMail addiction breaker

Note: The following has been made thoroughly obsolete by the LeechBlock Firefox add-on.

I am addicted to checking my GMail account. This script, in combination with the cron program, will disable GMail for a specified amount of time. Each time the script is executed it will "flip" the enabled status of the GMail site. My root crontab (the script requires root privileges) looks like this:

0 * * * * /home/gerb/.flip_gmail_enable.pl
15 * * * * /home/gerb/.flip_gmail_enable.pl

Thus, GMail is only enabled for 15 minutes of every hour. Much better!
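
The flipping logic can be sketched as follows. How the Perl script actually disables GMail is not described here, so this Python sketch simply assumes one common approach: toggling a hosts-file entry that points mail.google.com at the local machine. The BLOCK_LINE value and the function names flip and flip_file are hypothetical.

```python
BLOCK_LINE = "127.0.0.1 mail.google.com"  # assumed blocking entry, not taken from the script

def flip(hosts_text, block_line=BLOCK_LINE):
    """Return hosts_text with block_line removed if present, appended otherwise."""
    lines = hosts_text.splitlines()
    if block_line in lines:
        lines = [line for line in lines if line != block_line]
    else:
        lines.append(block_line)
    return "\n".join(lines) + "\n"

def flip_file(path="/etc/hosts"):
    """Flip the entry in place; writing to /etc/hosts is what requires root."""
    with open(path) as handle:
        flipped = flip(handle.read())
    with open(path, "w") as handle:
        handle.write(flipped)
```

Because each run only toggles the current state, the schedule depends on the starting state: GMail must be disabled before the first run at minute 0 for the 15-minute window to fall at the top of the hour.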

DISCLAIMER: Learn Perl. Read the scripts. Don't blame me.




All of my software can be used according to the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license.
