Software
Natural Language Processing
NomBank API
I have written a C# .NET API for the NomBank resource. The API captures, in addition to everything captured by the TreeBank API (below), all nominalization argument information, including split and co-referential arguments. The API also includes all information from the NomLex resource, which is distributed with NomBank. A sample application is included. You can check out a working copy of the source code with the following Subversion command:
svn co http://links.cse.msu.edu:8000/svn/NLP/Source/ResourceAPIs/NomBank
Alternatively, point your TortoiseSVN client to the same URL.
TreeBank/PropBank API
I have written a C# .NET API for both the TreeBank II and the PropBank resources. The TreeBank portion of the API captures all annotated parse trees, including syntactic constituent labels, grammatical function labels, and null element instantiations. The PropBank portion of the API captures (in addition to everything captured by the TreeBank portion) all verbal argument information, including split and co-referential arguments. Both portions are demonstrated with a sample application. You can check out a working copy of the source code with the following Subversion command:
svn co http://links.cse.msu.edu:8000/svn/NLP/Source/ResourceAPIs/PennBank
Alternatively, point your TortoiseSVN client to the same URL.
FrameNet API
I have written a C# .NET API for the FrameNet semantic frame resource. The API captures much of the useful content of the FrameNet project, including all frame definitions, frame and frame element relations, lexical unit annotations, and frame element bindings within those annotations. The API includes a sample application. You can check out a working copy of the source code with the following Subversion command:
svn co http://links.cse.msu.edu:8000/svn/NLP/Source/ResourceAPIs/FrameNet
Alternatively, point your TortoiseSVN client to the same URL.
Machine Learning
SVM-Light server mode classification module
I have written a modified version of Thorsten Joachims's SVM-Light classification module. The modified version runs in server mode, meaning it loads a model once and serves classifications from the loaded model without having to reload the model each time. This can dramatically reduce processing time in large-scale learning tasks that use large models. You can check out a working copy of the source code with the following Subversion command:
svn co http://links.cse.msu.edu:8000/svn/NLP/Source/ThirdParty/SvmLight/classify_server
Alternatively, point your TortoiseSVN client to the same URL.
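The load-once idea behind server mode can be illustrated with a toy example. This is only a sketch, not the modified SVM-Light code: the function name `classify_stream`, the model file format (one `feature_id weight` pair per line), and the linear scoring rule are all assumptions made for illustration. The model is read a single time, and then every input line (in `id:value` format) is classified against the in-memory weights.

```shell
#!/bin/sh
# Toy illustration of "load the model once, then serve many classifications".
# Hypothetical model format: one "feature_id weight" pair per line.
# Hypothetical input format: space-separated "feature_id:value" pairs per line.
classify_stream() {
    model_file="$1"
    awk -v model="$model_file" '
        BEGIN {
            # Load the model a single time, before any input is read.
            while ((getline line < model) > 0) {
                split(line, kv, " ")
                weight[kv[1]] = kv[2]
            }
            close(model)
        }
        {
            # Score each incoming example against the in-memory weights.
            score = 0
            for (i = 1; i <= NF; i++) {
                split($i, fv, ":")
                score += weight[fv[1]] * fv[2]
            }
            print (score >= 0 ? "+1" : "-1")
        }
    '
}
```

Piping many examples into one `classify_stream model.txt` process pays the model loading cost once, whereas starting a new classifier process per example would pay it every time.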
BXR logistic regression server mode classification module
I have written a modified version of the BXR logistic regression classification module. The modified version runs in server mode, meaning it loads a model once and serves classifications from the loaded model without having to reload the model each time. This can dramatically reduce processing time in large-scale learning tasks that use large models. You can check out a working copy of the source code with the following Subversion command:
svn co http://links.cse.msu.edu:8000/svn/NLP/Source/ThirdParty/BxrLogisticRegression/classify_server
Alternatively, point your TortoiseSVN client to the same URL.
MediaWiki
Wikipedia dump tools and API for .NET
I have written a C# .NET API for accessing imported dumps of Wikipedia. This is profoundly useful if you want to do any NLP tasks over the Wikipedia data. More information can be found here.
Image preparation, upload, and display
- prepare_images_for_wiki.pl seattle_06/ .*\.JPG 640 scaled/seattle_06/
- upload.pl scaled/seattle_06/ "image upload bot" <password> http://209.124.40.19/wiki
- create_image_gallery.pl scaled/seattle_06/ .*\.JPG
Useful help pages
- Images/uploads: useful tips about working with images and other file uploads within MediaWiki
- Image syntax: detailed information about the Wiki-image syntax
- Image galleries
- Picture tutorial
MediaWiki hacks
- Restrict edit access to talk pages
- Allow external links to be opened in a new browser window
- Extension for blacklisting/whitelisting specific pages
Miscellaneous scripts
pfind
If there is one thing I dislike about the Linux command line, it is the `find' program. It is too complex for what I want it to do (find files). For example, to find all files in or below the current directory whose names contain the words foo and bar (in that order, possibly separated by other characters, ignoring case), I would use the following:
find -iname '*foo*bar*'
It might seem like a quick command, but type it a few dozen times and you will share my frustration. My solution is this trivial Perl script. The following `pfind' command is equivalent to the `find' command above:
pfind foo bar
Better, right?
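The Perl script itself is not reproduced on this page. As a rough sketch of the same idea in shell, the functions below join the given words with `*` wildcards and pass the result to `find -iname`. The helper name `pfind_pattern` and the explicit `.` start directory are assumptions for illustration; the real script may behave differently.

```shell
#!/bin/sh
# Sketch of the pfind idea; not the original Perl script.

# Build a glob such as '*foo*bar*' from the words given as arguments.
# (pfind_pattern is a hypothetical helper name.)
pfind_pattern() {
    pattern='*'
    for word in "$@"; do
        pattern="${pattern}${word}*"
    done
    printf '%s\n' "$pattern"
}

# Search in or below the current directory, ignoring case.
pfind() {
    find . -iname "$(pfind_pattern "$@")"
}
```

With these functions sourced into a shell, `pfind foo bar` runs `find . -iname '*foo*bar*'`.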
GMail addiction breaker
Note: The following has been made thoroughly obsolete by the LeechBlock Firefox add-on.
I am addicted to checking my GMail account. This script, in combination with the cron program, will disable GMail for a specified amount of time. Each time the script is executed it will "flip" the enabled status of the GMail site. My root crontab (the script requires root privileges) looks like this:
0 * * * * /home/gerb/.flip_gmail_enable.pl
15 * * * * /home/gerb/.flip_gmail_enable.pl
Thus, GMail is only enabled for 15 minutes of every hour. Much better!
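How the script disables the site is not described here. One way such a flip could work, shown purely as an assumption, is to add or remove a blocking entry in a hosts-style file, which would also explain the need for root privileges. In the sketch below, the function name `flip_block`, the `BLOCK_LINE` entry, and the default path `/etc/hosts` are all hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of a "flip" script; the original Perl script may use
# an entirely different mechanism. The entry below is an assumed example.
BLOCK_LINE='127.0.0.1 mail.google.com # gmail-block'

flip_block() {
    hosts_file="${1:-/etc/hosts}"
    if grep -qxF "$BLOCK_LINE" "$hosts_file"; then
        # Entry present: remove it, which re-enables the site.
        tmp="$(mktemp)" || return 1
        grep -vxF "$BLOCK_LINE" "$hosts_file" > "$tmp" || true
        cat "$tmp" > "$hosts_file"
        rm -f "$tmp"
    else
        # Entry absent: append it, which disables the site.
        printf '%s\n' "$BLOCK_LINE" >> "$hosts_file"
    fi
}
```

Each call toggles the state, so two cron entries per hour produce one enabled window and one disabled window.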
DISCLAIMER: Learn Perl. Read the scripts. Don't blame me.
All of my software can be used according to the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license.

