Heart of Gold - Download of Middleware and NLP Components

The Heart of Gold Middleware (Java) source code is licensed under the GNU Lesser General Public License (LGPL).

The 'Components' listed below are not part of the Middleware; they are independent natural language processing tools.

Most Heart of Gold components available for download from this web page (e.g., SProUT, TnT, Chunkie) can be obtained free of charge for scientific research and teaching.

NLP components developed by external authors, institutions or companies (e.g., RASP, LingPipe, ChaSen) are published under different licenses. Please check their web sites, which are listed below.

For a minimal system with (German/English) HPSG, you need 'Middleware', the TnT tagger and PET with a grammar. The table shows which components and linguistic resources have been integrated so far.

Component                     Default Depth   Language resources*
JTok                          10              de, en, it
ChaSen                        10              ja
TnT                           20              de, en
FreeLing                      20              ca, en, es, gc, it
TreeTagger                    20              de, en, es, fr, it
Chunkie                       30              de, en
ChunkieRMRS                   35              de, en
LingPipe                      40              en
SProUT                        40              de, el, en, ja
LoPar/Whiteboard Topoparser   50              de
RASP                          50              en
Sleepy                        50              de
PET                           100             de, el, en, ja
RMRSMerge                     110             independent
SDL                                           independent
*ISO 639-1 language codes
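
The component table can also be written down as data, which makes it easy to see what is available for a given language. The following is a minimal sketch in Python and is not part of the Heart of Gold distribution: the names COMPONENTS and components_for are hypothetical, and ordering components from lower to higher default depth is only an assumption about how the 'Default Depth' column is meant to be read.

```python
# Component table from this page as (name, default depth, language codes).
# None marks the missing depth for SDL; an empty tuple marks "independent".
COMPONENTS = [
    ("JTok", 10, ("de", "en", "it")),
    ("ChaSen", 10, ("ja",)),
    ("TnT", 20, ("de", "en")),
    ("FreeLing", 20, ("ca", "en", "es", "gc", "it")),
    ("TreeTagger", 20, ("de", "en", "es", "fr", "it")),
    ("Chunkie", 30, ("de", "en")),
    ("ChunkieRMRS", 35, ("de", "en")),
    ("LingPipe", 40, ("en",)),
    ("SProUT", 40, ("de", "el", "en", "ja")),
    ("LoPar/Whiteboard Topoparser", 50, ("de",)),
    ("RASP", 50, ("en",)),
    ("Sleepy", 50, ("de",)),
    ("PET", 100, ("de", "el", "en", "ja")),
    ("RMRSMerge", 110, ()),
    ("SDL", None, ()),
]


def components_for(lang, max_depth=None):
    """Return names of components with resources for `lang`, ordered by depth.

    Language-independent components (RMRSMerge, SDL) are skipped because
    they list no language codes.
    """
    rows = [
        (depth, name)
        for name, depth, langs in COMPONENTS
        if depth is not None
        and lang in langs
        and (max_depth is None or depth <= max_depth)
    ]
    # Sort by depth first, then by name for components sharing a depth.
    return [name for _, name in sorted(rows)]
```

For example, components_for("ja") lists the components with Japanese resources, and components_for("de", max_depth=30) restricts the German components to depth 30 or lower.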

All archives below include the root directory './', except for external systems such as RASP and LingPipe, which have to be unpacked or compiled into the directory specified under 'Location:'.
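
Because the archives share the root directory './', they can all be extracted into one installation directory. The following is a minimal sketch in Python, assuming the archives have already been downloaded into the current working directory. The function name unpack_into_root and the choice of archives for a minimal English HPSG system (middleware, TnT, PET binary, ERG) are illustrative; the archive file names are the ones listed under 'Download:' on this page. It does not replace the install script or the instructions in the User and Developer Documentation.

```python
import tarfile
from pathlib import Path

# Archive names as listed on this page for a minimal English HPSG system:
# middleware, TnT tagger, PET parser binary, and the ERG grammar.
MINIMAL_ARCHIVES = [
    "hog-1.5-bin.tar.gz",
    "components-tnt.tar.gz",
    "components-pet-binlib.tar.gz",
    "components-pet-erg.tar.gz",
]


def unpack_into_root(archives, root):
    """Extract each archive into `root`; return the names that were not found."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    missing = []
    for name in archives:
        path = Path(name)
        if not path.is_file():
            # Not downloaded yet: report it instead of failing.
            missing.append(name)
            continue
        with tarfile.open(path, "r:gz") as tar:
            # Members are stored relative to './', so they land under `root`.
            tar.extractall(root)
    return missing
```

Calling unpack_into_root(MINIMAL_ARCHIVES, "hog") would extract whatever is present into a directory named hog (a hypothetical name) and return the archives that still need to be downloaded. Only archives from a trusted source should be extracted this way.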

For installation instructions (including component-specific configurations etc.), hardware and software requirements see the User and Developer Documentation.

Currently, only Linux x86 is fully supported as a platform. The sole reason is that some components come with a Linux implementation only, or their module adapter has been implemented for Linux only. The core middleware itself, together with the components implemented in Java (e.g., JTok, LingPipe, SProUT), should run wherever JDK 1.5 is supported, e.g., Windows, Mac OS, Solaris. Contributions to ports and increased portability are welcome, as are bug reports, bug fixes, comments, and contributions of new components or resources.

If you publish on systems, software, applications, experiments or results that you have achieved with the help of the Heart of Gold middleware, we kindly ask you to cite an appropriate reference mentioned under Publications and/or Components.

Contributors (Heart of Gold middleware):
Concept: Ulrich Callmeier, Andreas Eisele, Ulrich Schäfer, Melanie Siegel
Implementation: Robert Barbey, Özgür Demir, Ulrich Schäfer
JTok, module configuration and launcher: Jörg Steffen
PET extensions (XML input chart): Bernd Kiefer
SDL: Hans-Ulrich Krieger
RMRS construction from chunks: Anette Frank and Kathrin Spreyer
RMRS merging stylesheets: Anette Frank
rmrs2html.xsl: Thomas Klöcker and Ulrich Schäfer, with ideas and styles borrowed from Stephan Oepen's JavaScript code for MRS (lkb.js)
Web demo: Özgür Demir
LoParModule, port of the Whiteboard topoparser XSLT pipeline: Daniel Contag
RASP2Module, deployment: Torsten Marek

Downloads:

Middleware

Description:Sources of the Java middleware including component adapters ('modules'), Python demo clients, stylesheets and configuration files
Institution:DFKI Language Technology Lab
License:LGPL, parts are Apache Software License (ant, log4j, xml2html.xsl)
Requirements:Java JDK 1.5; for the GUI client application: Python >= 2.2 with Python Tk and Mozilla >= 1.3 on X11
Location:./{conf,java,python,xsl,lib,components/jtok}
Download:hog-1.5-src.tar.gz, hog-1.5-bin.tar.gz
Alternative for src package (subversion):
svn checkout https://heartofgold.opendfki.de/repos/trunk hog-1.5
Installation script: install (instructions inside)
Publications:Publications

JTok

Description:Configurable Tokenizer implemented in Java
Ling. resources:en, de, it (additional languages can be added via XML configuration files)
Institution:DFKI Language Technology Lab
License:LGPL
Requirements:Java 1.5
Location:./components/jtok
Download:(included in the middleware bin archive)

TnT

Description:Statistical part-of-speech tagger
Ling. resources:en, de newspaper (other languages and genres can be trained)
Institution:Saarland University, Computational Linguistics Department
License:(free of charge for non-commercial, non-profit research purposes)
Requirements:Linux
Location:./components/tnt
Download:components-tnt.tar.gz
Publications:Publications/TnT

Chunkie

Description:Statistical Chunker
Ling. resources:en, de newspaper (other languages and genres can be trained)
Institution:DFKI Language Technology Lab, Saarland University, Computational Linguistics Department
License:DFKI LT General Research Software License (free of charge for scientific research and teaching)
Requirements:Linux, components: TnT
Location:./components/chunkie
Download:components-chunkie.tar.gz
Publications:Publications/Chunkie

ChunkieRMRS

Description:RMRS construction from shallow chunker
Ling. resources:en, de
Institution:DFKI Language Technology Lab
License:LGPL
Requirements:Java, components: SProUT, Chunkie, TnT
Location:./xsl/sdl/chunkiermrs
Download:(XSLT stylesheets and SDL files are included in the middleware src archive)
Sources of the SProUT cascade grammars: components-chunkiermrs.tar.gz
Publications:Publications/Applications

SProUT

Description:General-purpose linguistic processor, e.g., for named entity recognition, information extraction, tokenization, morphological analysis, compound segmentation, sentence boundary recognition, coreference resolution, combines finite-state and unification-based approaches
Ling. resources:de, el, en, ja
Institution:DFKI Language Technology Lab
License:DFKI LT General Research Software License (free of charge for scientific research and teaching), parts are Apache Software License and others (similar)
Requirements:Java 1.5
Location:./components/sprout
Download:runtime for Heart of Gold including language resources for de, el, en, ja:
components-sprout.tar.gz
Integrated Development Environment: http://sprout.dfki.de
Publications:Publications/SProUT

LoPar (external)/Whiteboard Topoparser

Description:Shallow part of the Whiteboard Topoparser Integration using TnT, LoPar and a Cascade of XSL transformations
Ling. resources:de
Institution:IMS Stuttgart (LoPar), DFKI Language Technology Lab (Topoparser Integration Cascade)
License:LoPar: may be freely used for education, research and other non-commercial purposes; Topoparser Integration Cascade: DFKI LT General Research Software License (free of charge for scientific research and teaching)
Requirements:Java 1.5
Location:./components/lopar, ./xsl/topoparser
Download:(external) LoPar 3.0 binary http://www.ims.uni-stuttgart.de/projekte/gramotron/SOFTWARE/LoPar.html + components-lopar.tar.gz (topoparser XSLT stylesheets are included in the middleware src archive)
Publications:Publications/LoPar Whiteboard Topoparser

SDL

Description:Description language for NLP subarchitectures
Institution:DFKI Language Technology Lab
License:LGPL
Requirements:Java
Location:./java, ./xsl/sdl
Download:(included in the middleware src/bin archives)
Publications:Publications/SDL

PET

Description:HPSG parser
Ling. resources:separate grammar dumps (grammar development with LKB), see 'Download'
Institution:Saarland University, Computational Linguistics Department, DFKI Language Technology Lab
License:LGPL, parts are Apache Software License and others
Requirements:Linux
Location:./components/pet
Download:- binary of the HPSG parser for Heart of Gold: components-pet-binlib.tar.gz, source: notes; pet.opendfki.de
Each of the following grammars comes with its own license!
- English Resource Grammar (ERG, Stanford): binary for Heart of Gold: components-pet-erg.tar.gz, source: http://lingo.stanford.edu/erg
- German Grammar (GG, DFKI): binary for Heart of Gold: components-pet-german.tar.gz; alternatively, generate it from the GG download package: unpack, run make hog, then unpack the generated gg4hog.tar.gz to ./components/pet/german/; source: http://gg.opendfki.de
- Japanese 'JACY' Grammar (DFKI): binary for Heart of Gold: components-pet-japanese.tar.gz, source: http://jacy.opendfki.de
- Modern Greek Grammar (Saarland U.): http://www.delph-in.net/mgrg/
- Spanish Grammar (UPF Barcelona): http://www.delph-in.net/srg/
Publications:Publications/PET

RMRSMerge

Description:Merge RMRS representations
Institution:DFKI Language Technology Lab
License:LGPL
Requirements:Java
Location:./java, ./xsl/sdl/rmrsmerge
Download:(included in the middleware src archive)
Publications:Publications/Applications

LingPipe (external)

Description:Statistical named entity recognition
Ling. resources:en newswire and genomics (other languages and genres can be trained)
Institution:Alias-i, Inc.
License:Alias-i royalty free license, parts are Apache Software License and others
Requirements:Java 1.5
Location:./components/lingpipe
Download:(external) http://www.alias-i.com/lingpipe
Publications:http://www.alias-i.com/lingpipe/papers.html
Install hints:see LingPipe section in the Heart of Gold User and Developer Documentation

RASP (external)

Description:Statistical parser
Ling. resources:en
Institutions:University of Sussex, University of Cambridge
License:RASP license
Requirements:Linux
Location:./components/rasp
Download:(external) http://www.informatics.susx.ac.uk/research/nlp/rasp/ + patch for HoG and a compiled RASP RMRS converter server: components-rasp2.tar.gz
Publications:http://www.informatics.susx.ac.uk/research/nlp/rasp/
Install hints:see RASP section in the Heart of Gold User and Developer Documentation

ChaSen (external)

Description:Japanese morphological analysis and PoS tagger
Ling. resources:ja
Institution:AIST Nara
License:Copyright Nara Institute of Science and Technology, may be redistributed under some conditions, see license in distribution
Requirements:Linux
Location:./components/chasen
Download:(external) http://chasen.aist-nara.ac.jp
Publications:ChaSen Publications and Manuals
Install hints:see ChaSen section in the Heart of Gold User and Developer Documentation

Sleepy (external)

Description:Statistical parser for German
Ling. resources:de
Institution:Saarland University, Computational Linguistics Department
License:unknown
Requirements:Linux
Location:./components/sleepy
Download:(external) http://www.coli.uni-saarland.de/~adubey/sleepy/
Publications:http://www.coli.uni-saarland.de/~adubey/sleepy/
Install hints:see Sleepy section in the Heart of Gold User and Developer Documentation

TreeTagger (external)

Description:Statistical part-of-speech tagger
Ling. resources:de, en, es, fr, it
Institutions:University of Stuttgart, Institute for Computational Linguistics
License:freely available for research, education and evaluation
Requirements:Linux, Mac OS X or Solaris
Location:./components/treetagger
Download:(external) http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html
Publications:http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html
Install hints:see TreeTagger section in the Heart of Gold User and Developer Documentation

FreeLing (external), preliminary integration

Description:Part-of-speech tagger, morphology, named entity recognition
Ling. resources:ca, en, es, gc, it
Institutions:Universitat Politècnica de Catalunya, TALP Research Center
License:LGPL
Requirements:Linux with Berkeley DB (version 4.1.25 or higher), pcre (version 4.3 or higher), libcfg+ (version 0.6.1 or higher)
Location:./components/freeling
Download:(external) http://www.lsi.upc.es/~nlp/freeling/ + additional LKB wrapper (required), in components-freeling-sppp.tar.gz
Publications:http://www.lsi.upc.es/~nlp/freeling/
Install hints:see FreeLing section in the Heart of Gold User and Developer Documentation, README in components-freeling-sppp.tar.gz
