NLP software
From Gwiki
The NLP community has produced many excellent resources; unfortunately, most of these resources were produced by different groups using different formats. Fortunately for you, I have taken the liberty of writing APIs for a few of them. All of my libraries are written in C# .NET and are thoroughly commented. Here's the lineup:
Penn TreeBank, PropBank, and DiscourseBank APIs
I have written C# .NET APIs for the Penn TreeBank, PropBank, and DiscourseBank resources. The TreeBank portion of the API captures all annotated parse trees, including syntactic constituent labels, grammatical function labels, and null element instantiations. The PropBank portion of the API captures (in addition to everything captured by the TreeBank portion) all verb argument information, including split and co-referential arguments. The DiscourseBank portion of the API is rather preliminary and only captures the argument nodes for each discourse connective; other information, such as features, is currently left out. The TreeBank and PropBank APIs are demonstrated with a sample application. I haven't gotten around to writing sample code for the DiscourseBank API, but, as usual, my code is meticulously commented, so you should be able to figure out how it works. You can check out a local copy with the following Subversion command:
svn co http://links.cse.msu.edu:8000/svn/NLP/Source/ResourceAPIs/PennBank
Alternatively, point your TortoiseSVN client to the same URL.
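For readers new to the TreeBank format, the following Python sketch shows what the annotation layers above look like in the raw bracketed data. It is not part of the C# API, and the names Node, parse, and null_elements are hypothetical. It assumes the usual .mrg bracketing, in which the outermost bracket carries no label, and it splits a label such as NP-SBJ-1 into its syntactic category (NP) and grammatical function tags (SBJ), dropping the numeric co-index. Null elements are the leaves tagged -NONE-, such as the trace *T*-1.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


def _is_special(label: str) -> bool:
    # Tags such as -NONE- and -LRB- begin and end with a hyphen and are kept whole.
    return len(label) > 1 and label.startswith("-") and label.endswith("-")


@dataclass
class Node:
    label: str                                   # full label, e.g. "NP-SBJ-1"
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                   # set only on preterminals, e.g. (NN cat)

    @property
    def category(self) -> str:
        # Syntactic category: the part before the first '-' or '='.
        if _is_special(self.label):
            return self.label
        return re.split(r"[-=]", self.label)[0]

    @property
    def functions(self) -> List[str]:
        # Grammatical function tags (e.g. SBJ, TMP); numeric co-indices are dropped.
        if _is_special(self.label):
            return []
        return [p for p in re.split(r"[-=]", self.label)[1:] if not p.isdigit()]


def parse(text: str) -> Node:
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        pos += 1                                 # skip "("
        label = ""
        # A label follows unless this is the unlabeled outer bracket of a .mrg tree.
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        node = Node(label)
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(parse_node())
            else:
                node.word = tokens[pos]
                pos += 1
        pos += 1                                 # skip ")"
        return node

    return parse_node()


def null_elements(node: Node) -> List[str]:
    # Null elements are leaves whose tag is -NONE- (e.g. the trace *T*-1).
    if node.label == "-NONE-":
        return [node.word or ""]
    return [w for child in node.children for w in null_elements(child)]


if __name__ == "__main__":
    tree = parse("( (SBARQ (WHNP-1 (WP What)) (SQ (VBD did) (NP-SBJ (PRP you))"
                 " (VP (VB see) (NP (-NONE- *T*-1)))) (. ?)) )")
    subject = tree.children[0].children[1].children[1]
    print(subject.category, subject.functions)  # NP ['SBJ']
    print(null_elements(tree))                   # ['*T*-1']
```

A complete reader would also keep the co-indices so that a trace such as *T*-1 can be linked to its antecedent (WHNP-1 in the example); the sketch discards them for brevity.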
In addition to the APIs, I've also written a handy GUI for generating parse tree images in a variety of formats (e.g., PNG, JPG, and EPS); it relies on GraphViz.
The grapher is included with the APIs and works with either TreeBank data or user-defined parse trees. If you only want the TreeBank grapher executable, you can check it out on its own with the following Subversion command:
svn co http://links.cse.msu.edu:8000/svn/NLP/Source/ResourceAPIs/PennBank/TreeBankGrapher/bin/Release
Alternatively, point your TortoiseSVN client to the same URL.
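The rendering step can be approximated in a few lines: the Python sketch below turns one bracketed parse into GraphViz DOT source, with one node per label and per word. It is only an assumption about how such a grapher might work, not the grapher's actual code, and bracketed_to_dot is a hypothetical name. The ordering=out graph attribute asks GraphViz to keep each node's children in sentence order.

```python
import re
from typing import List, Optional


def bracketed_to_dot(tree: str) -> str:
    """Convert one bracketed parse, e.g. "(S (NP (NN cats)) (VP (VBP sleep)))",
    into GraphViz DOT source with one node per constituent label and per word."""
    tokens = re.findall(r"\(|\)|[^\s()]+", tree)
    lines = ["digraph parse {", "  ordering=out;", "  node [shape=plaintext];"]
    stack: List[Optional[str]] = []   # open brackets; None marks an unlabeled bracket
    pending = False                   # True right after "(" until its label is read
    count = 0

    def add_node(text: str) -> str:
        nonlocal count
        node_id = f"n{count}"
        count += 1
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  {node_id} [label="{escaped}"];')
        # Attach to the innermost open bracket that actually has a node.
        parent = next((s for s in reversed(stack) if s is not None), None)
        if parent is not None:
            lines.append(f"  {parent} -> {node_id};")
        return node_id

    for tok in tokens:
        if tok == "(":
            if pending:               # previous "(" had no label (outer .mrg bracket)
                stack.append(None)
            pending = True
        elif tok == ")":
            if pending:               # "()" opened nothing on the stack
                pending = False
            else:
                stack.pop()
        elif pending:                 # constituent or part-of-speech label
            stack.append(add_node(tok))
            pending = False
        else:                         # a word: leaf under the innermost constituent
            add_node(tok)

    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(bracketed_to_dot("( (S (NP (DT The) (NN cat)) (VP (VBD sat))) )"))
```

Piping the printed DOT into GraphViz, for example with dot -Tpng -o tree.png, produces a PNG; other -T values such as jpg or eps select the other formats mentioned above.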
NomBank API
I have written a C# .NET API for the NomBank resource. The API captures, in addition to everything captured by the TreeBank API (described above), all nominalization argument information, including split and co-referential arguments. The API also includes all information from the NomLex resource, which is distributed with NomBank. A sample application is included. You can check out a local copy with the following Subversion command:
svn co http://links.cse.msu.edu:8000/svn/NLP/Source/ResourceAPIs/NomBank
Alternatively, point your TortoiseSVN client to the same URL.
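Split and co-referential arguments are easiest to understand in the raw annotation. The Python sketch below parses argument pointers in the PropBank prop.txt style, where each pointer is terminal:height (a terminal index plus the number of levels above it), a comma joins the pieces of a split argument, and an asterisk joins coreferential nodes such as a trace chain. It assumes NomBank uses the same pointer notation, and it treats each comma-separated piece as a possibly starred chain; the exact precedence should be checked against each resource's documentation. The class and function names are hypothetical and are not the C# API's.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

Pointer = Tuple[int, int]   # (terminal index in the sentence, height above that terminal)


@dataclass
class Argument:
    label: str                   # e.g. "ARG0", "ARGM-TMP", "rel"
    pieces: List[List[Pointer]]  # ','-separated split pieces; each may be a '*' chain

    @property
    def is_split(self) -> bool:
        return len(self.pieces) > 1

    @property
    def is_coreferential(self) -> bool:
        return any(len(chain) > 1 for chain in self.pieces)


def parse_argument(token: str) -> Argument:
    pointers, label = token.split("-", 1)   # "7:0-ARGM-MOD" -> "7:0", "ARGM-MOD"
    pieces = []
    for piece in pointers.split(","):
        chain = []
        for ptr in piece.split("*"):
            terminal, height = ptr.split(":")
            chain.append((int(terminal), int(height)))
        pieces.append(chain)
    return Argument(label, pieces)


ARG_TOKEN = re.compile(r"[\d:,*]+-\S+")


def arguments_in_line(line: str) -> List[Argument]:
    # Skip the header fields (file, sentence, predicate, ...) and keep pointer tokens.
    return [parse_argument(t) for t in line.split() if ARG_TOKEN.fullmatch(t)]


if __name__ == "__main__":
    # Illustrative line in the prop.txt layout.
    line = "wsj/00/wsj_0001.mrg 0 8 gold join.01 vf--a 0:2-ARG0 7:0-ARGM-MOD 8:0-rel 9:1-ARG1"
    for arg in arguments_in_line(line):
        print(arg.label, arg.pieces)
    print(parse_argument("0:1,2:1-ARG1").is_split)          # True
    print(parse_argument("3:0*5:1-ARG0").is_coreferential)  # True
```

Because arguments_in_line keeps only tokens shaped like pointers, it does not depend on how many header fields precede the arguments, which is where PropBank-style and NomBank-style lines are likely to differ.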
FrameNet API
I have written a C# .NET API for the FrameNet semantic frame resource. The API captures much of the useful content of the FrameNet project, including all frame definitions, frame and frame element relations, lexical unit annotations, and frame element bindings within those annotations. The API includes a sample application. You can check out a local copy with the following Subversion command:
svn co http://links.cse.msu.edu:8000/svn/NLP/Source/ResourceAPIs/FrameNet
Alternatively, point your TortoiseSVN client to the same URL.
WordNet API
I have written a C# .NET API for the WordNet 3.0 lexical semantics resource. The API captures much of the useful content of the WordNet project, including all synset definitions (words and glosses) and synset relations. The API offers two access methods: in-memory and disk-based. The former requires quite a bit of memory (~200MB), but is extremely fast. The latter requires essentially no memory, but is slower due to on-disk searching of the WordNet data. The API includes a sample application. You can check out a local copy with the following Subversion command:
svn co http://links.cse.msu.edu:8000/svn/NLP/Source/ResourceAPIs/WordNet
Alternatively, point your TortoiseSVN client to the same URL.
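The memory/speed trade-off between the two access methods follows from how WordNet's data files are laid out: in the WordNet 3.0 data.noun, data.verb, data.adj, and data.adv files, each synset line begins with an 8-digit synset_offset that is that line's byte offset in the file, and the license header lines start with two spaces. The Python sketch below (hypothetical class names, not the C# API) contrasts loading every line into a dictionary with seeking straight to the offset on disk. It keeps whole lines and only extracts the gloss; parsing the word and pointer fields is left out.

```python
from typing import Dict


def gloss(line: str) -> str:
    # In data.* files the gloss follows " | " at the end of the synset line.
    return line.split(" | ", 1)[1] if " | " in line else ""


class InMemoryIndex:
    """Reads every synset line up front: more memory, but each lookup is a dict hit."""

    def __init__(self, path: str):
        self.synsets: Dict[int, str] = {}
        with open(path, "rb") as f:
            for raw in f:
                # Skip license header lines (two leading spaces) and blank lines.
                if raw.startswith(b"  ") or not raw.strip():
                    continue
                line = raw.decode("latin-1").rstrip("\r\n")
                self.synsets[int(line[:8])] = line   # first field: 8-digit synset_offset

    def lookup(self, offset: int) -> str:
        return self.synsets[offset]


class OnDiskIndex:
    """Keeps only a file handle: little memory, but each lookup is a seek plus a read."""

    def __init__(self, path: str):
        self.f = open(path, "rb")   # binary mode so seek() uses raw byte offsets

    def lookup(self, offset: int) -> str:
        self.f.seek(offset)         # synset_offset is the line's byte position in the file
        return self.f.readline().decode("latin-1").rstrip("\r\n")

    def close(self) -> None:
        self.f.close()
```

For a given offset both classes should return the same line; the in-memory version pays its cost once at start-up, while the on-disk version pays a seek and a read on every lookup.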
Wikipedia dump tools and API for .NET
I have written a C# .NET API for accessing imported dumps of Wikipedia. This is very useful if you want to run NLP tasks over the Wikipedia data. More information can be found on a separate page.
