Heart of Gold - Download of Middleware and NLP Components

The Heart of Gold Middleware (Java) source code is licensed under the GNU Lesser General Public License (LGPL).

The 'components' listed below are not part of the middleware; they are independent, stand-alone natural language processing tools.

Most Heart of Gold components available for download from this page can be obtained free of charge for scientific research and teaching (e.g., SProUT, TnT, Chunkie).

NLP components developed by external authors, institutions or companies (e.g., RASP, LingPipe, ChaSen) are published under their own licenses. Please check the web sites listed below.

For a minimal system with German or English HPSG parsing, you need the middleware, the TnT tagger, and PET with a grammar. The table below shows which components and linguistic resources have been integrated so far.

Component                     Default depth   Language resources*
JTok                          10              de, en, it
TnT                           20              de, en
FreeLing                      20              ca, en, es, gc, it
TreeTagger                    20              de, en, es, fr, it
Chunkie                       30              de, en
ChunkieRMRS                   35              de, en
SProUT                        40              de, el, en, ja
LoPar/Whiteboard Topoparser   50              de
PET                           100             de, el, en, ja
SDL                                           language-independent
*ISO 639-1 language codes
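To make the structure of the component table explicit, here is a minimal Java sketch that encodes it as data (SDL is left out, since it has no depth and no language resources). forLanguage lists the components that have resources for a given ISO 639-1 code, ordered by default depth (lowest first); minimalHpsgLanguages intersects the TnT and PET rows to recover the languages for which the minimal HPSG system described above can be assembled. The class and method names are hypothetical and not part of the Heart of Gold API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** The component table as data (hypothetical helper, not part of Heart of Gold). SDL is omitted. */
public class ComponentTable {

    /** One table row: component name, default depth, ISO 639-1 language codes. */
    static final class Row {
        final String name;
        final int depth;
        final List<String> languages;

        Row(String name, int depth, String... languages) {
            this.name = name;
            this.depth = depth;
            this.languages = Arrays.asList(languages);
        }
    }

    static final List<Row> ROWS = Arrays.asList(
        new Row("JTok", 10, "de", "en", "it"),
        new Row("TnT", 20, "de", "en"),
        new Row("FreeLing", 20, "ca", "en", "es", "gc", "it"),
        new Row("TreeTagger", 20, "de", "en", "es", "fr", "it"),
        new Row("Chunkie", 30, "de", "en"),
        new Row("ChunkieRMRS", 35, "de", "en"),
        new Row("SProUT", 40, "de", "el", "en", "ja"),
        new Row("LoPar/Whiteboard Topoparser", 50, "de"),
        new Row("PET", 100, "de", "el", "en", "ja"));

    /** Names of the components with resources for lang, lowest default depth first. */
    public static List<String> forLanguage(String lang) {
        List<Row> matches = new ArrayList<>();
        for (Row r : ROWS) {
            if (r.languages.contains(lang)) {
                matches.add(r);
            }
        }
        // List.sort is stable, so rows with equal depth keep their table order.
        matches.sort(Comparator.comparingInt(r -> r.depth));
        List<String> names = new ArrayList<>();
        for (Row r : matches) {
            names.add(r.name);
        }
        return names;
    }

    /** Languages covered by both TnT and PET, i.e. by the minimal HPSG setup. */
    public static List<String> minimalHpsgLanguages() {
        List<String> result = new ArrayList<>(row("TnT").languages);
        result.retainAll(row("PET").languages);
        return result;
    }

    private static Row row(String name) {
        for (Row r : ROWS) {
            if (r.name.equals(name)) {
                return r;
            }
        }
        throw new IllegalArgumentException("not in table: " + name);
    }
}
```

For example, forLanguage("ja") returns [SProUT, PET], and minimalHpsgLanguages() returns [de, en], which matches the German/English restriction for the minimal system.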

All archives below are packed relative to the root directory './' of the Heart of Gold installation, except for external systems such as RASP and LingPipe, which have to be unpacked or compiled into the directory specified under 'Location:'.

For installation instructions (including component-specific configuration) and for hardware and software requirements, see the User and Developer Documentation.

Currently, Linux x86 is the only fully supported platform. This is solely because some components are available only for Linux, or their module adapters have been implemented only for Linux. The core middleware itself (together with Java-implemented components such as JTok, LingPipe and SProUT) should run wherever JDK 1.5 is supported, e.g., Windows, Mac OS and Solaris. Contributions to porting and portability are welcome, as are bug reports, bug fixes, comments, and contributions of new components or resources!

If you publish on systems, software, applications, experiments or results achieved with the help of the Heart of Gold middleware, we kindly ask you to cite an appropriate reference listed under Publications and/or Components.

Contributors (Heart of Gold middleware):
Concept: Ulrich Callmeier, Andreas Eisele, Ulrich Schäfer, Melanie Siegel
Implementation: Robert Barbey, Özgür Demir, Ulrich Schäfer
JTok and Module configuration and Launcher: Jörg Steffen
PET extensions (XML input chart): Bernd Kiefer
SDL: Hans-Ulrich Krieger
RMRS construction from chunks: Anette Frank and Kathrin Spreyer
RMRS merging stylesheets: Anette Frank
rmrs2html.xsl: Thomas Klöcker and Ulrich Schäfer, with ideas and styles borrowed from Stephan Oepen's JavaScript code for MRS (lkb.js)
Web demo: Özgür Demir
LoParModule, port of the Whiteboard topoparser XSLT pipeline: Daniel Contag
RASP2Module, deployment: Torsten Marek



Middleware

Description: Sources of the Java middleware, including component adapters ('modules'), Python demo clients, stylesheets and configuration files
Institution: DFKI Language Technology Lab
License: LGPL; parts are under the Apache Software License (ant, log4j, xml2html.xsl)
Requirements: Java JDK 1.5 (for the GUI client application: Python >= 2.2 with Tkinter, and Mozilla >= 1.3 on X11)
Download: hog-1.5-src.tar.gz, hog-1.5-bin.tar.gz
Alternative to the src package (Subversion):
svn checkout https://heartofgold.opendfki.de/repos/trunk hog-1.5
Installation script: install (instructions inside)


JTok

Description: Configurable tokenizer implemented in Java
Ling. resources: en, de, it (additional languages can be added via XML configuration files)
Institution: DFKI Language Technology Lab
Requirements: Java 1.5
Download: (included in the middleware bin archive)


TnT

Description: Statistical part-of-speech tagger
Ling. resources: en, de newspaper (other languages and genres can be trained)
Institution: Saarland University, Computational Linguistics Department
License: free of charge for non-commercial, non-profit research purposes


Chunkie

Description: Statistical chunker
Ling. resources: en, de newspaper (other languages and genres can be trained)
Institution: DFKI Language Technology Lab; Saarland University, Computational Linguistics Department
License: DFKI LT General Research Software License (free of charge for scientific research and teaching)
Requirements: Linux; components: TnT


ChunkieRMRS

Description: RMRS construction from the output of the shallow chunker
Ling. resources: en, de
Institution: DFKI Language Technology Lab
Requirements: Java; components: SProUT, Chunkie, TnT
Download: (XSLT stylesheets and SDL files are included in the middleware src archive)
Sources of the SProUT cascade grammars: components-chunkiermrs.tar.gz
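The 'components:' requirements form a small dependency graph: ChunkieRMRS needs SProUT, Chunkie and TnT, and Chunkie itself needs TnT. The following minimal Java sketch, assuming only these two requirement lists from this page, computes which components to install and in what order, using a depth-first (post-order) traversal so that every requirement comes before the component that needs it. The class name and the notion of an install order are illustrative assumptions, not a Heart of Gold tool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: resolves component requirements into an install order. */
public class ComponentDeps {

    // Direct 'components:' requirements as listed for Chunkie and ChunkieRMRS.
    static final Map<String, List<String>> REQUIRES = new LinkedHashMap<>();
    static {
        REQUIRES.put("Chunkie", Arrays.asList("TnT"));
        REQUIRES.put("ChunkieRMRS", Arrays.asList("SProUT", "Chunkie", "TnT"));
    }

    /** The component plus all transitive requirements, each requirement before its dependents. */
    public static List<String> installOrder(String component) {
        List<String> order = new ArrayList<>();
        visit(component, order);
        return order;
    }

    // Post-order DFS: a component is appended only after everything it requires.
    // The requirements above contain no cycles, so no cycle check is done here.
    private static void visit(String component, List<String> order) {
        if (order.contains(component)) {
            return; // already scheduled
        }
        for (String dep : REQUIRES.getOrDefault(component, Collections.<String>emptyList())) {
            visit(dep, order);
        }
        order.add(component);
    }
}
```

For example, installOrder("ChunkieRMRS") yields [SProUT, TnT, Chunkie, ChunkieRMRS]; a component without listed requirements, such as PET, yields just [PET].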


SProUT

Description: General-purpose linguistic processor combining finite-state and unification-based approaches, e.g., for named entity recognition, information extraction, tokenization, morphological analysis, compound segmentation, sentence boundary recognition and coreference resolution
Ling. resources: de, el, en, ja
Institution: DFKI Language Technology Lab
Requirements: Java 1.5
Download: runtime for Heart of Gold, including language resources for de, el, en, ja:
Integrated Development Environment: http://sprout.dfki.de

LoPar (external)/Whiteboard Topoparser

Description: Shallow part of the Whiteboard Topoparser integration, using TnT, LoPar and a cascade of XSL transformations
Ling. resources: de
Institution: IMS Stuttgart (LoPar), DFKI Language Technology Lab (Topoparser integration cascade)
License: LoPar may be freely used for education, research and other non-commercial purposes; Topoparser integration cascade: DFKI LT General Research Software License (free of charge for scientific research and teaching)
Requirements: Java 1.5
Location: ./components/lopar, ./xsl/topoparser
Download: (external) LoPar 3.0 binary http://www.ims.uni-stuttgart.de/projekte/gramotron/SOFTWARE/LoPar.html + components-lopar.tar.gz (the topoparser XSLT stylesheets are included in the middleware src archive)
Publications: Publications/LoPar Whiteboard Topoparser
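The Topoparser integration is described as a cascade of XSL transformations. As a minimal sketch of that general idea, the Java class below uses only the standard JAXP API (javax.xml.transform): each stylesheet is compiled once, and each stage transforms the output of the previous one. The class name and the two toy stylesheets are hypothetical; this is not the Heart of Gold implementation, whose actual stylesheets live in ./xsl/topoparser.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical XSLT cascade: each stage transforms the output of the previous stage. */
public class XsltCascade {

    // Toy stylesheet 1 (illustration only): renames <w> elements to <token>.
    public static final String WORDS_TO_TOKENS =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/s'><s><xsl:for-each select='w'>"
        + "<token><xsl:value-of select='.'/></token></xsl:for-each></s></xsl:template>"
        + "</xsl:stylesheet>";

    // Toy stylesheet 2 (illustration only): joins the tokens with '|' as plain text.
    public static final String TOKENS_TO_TEXT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/s'><xsl:for-each select='token'><xsl:value-of select='.'/>"
        + "<xsl:if test='position() != last()'>|</xsl:if></xsl:for-each></xsl:template>"
        + "</xsl:stylesheet>";

    private final List<Transformer> stages = new ArrayList<>();

    /** Compiles the stylesheets once, in cascade order. */
    public XsltCascade(List<String> stylesheets) {
        TransformerFactory factory = TransformerFactory.newInstance();
        try {
            for (String xsl : stylesheets) {
                stages.add(factory.newTransformer(new StreamSource(new StringReader(xsl))));
            }
        } catch (TransformerException e) {
            throw new IllegalArgumentException("invalid stylesheet", e);
        }
    }

    /** Runs the input document through every stage and returns the final output. */
    public String apply(String xml) {
        String current = xml;
        try {
            for (Transformer stage : stages) {
                StringWriter out = new StringWriter();
                stage.transform(new StreamSource(new StringReader(current)), new StreamResult(out));
                current = out.toString();
            }
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
        return current;
    }
}
```

Compiling the stylesheets once in the constructor means a cascade can be reused for many input documents. For example, a two-stage cascade of WORDS_TO_TOKENS and TOKENS_TO_TEXT turns "<s><w>Der</w><w>Hund</w></s>" into "Der|Hund".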


SDL

Description: Description language for NLP subarchitectures
Institution: DFKI Language Technology Lab
Location: ./java, ./xsl/sdl
Download: (included in the middleware src/bin archives)


PET

Description: HPSG parser
Ling. resources: separate grammar dumps (the grammars are developed with the LKB), see 'Download'
Institution: Saarland University, Computational Linguistics Department; DFKI Language Technology Lab
License: LGPL; parts are under the Apache Software License and others
Download:
- binary of the HPSG parser for Heart of Gold: components-pet-binlib.tar.gz; source: notes; pet.opendfki.de
Each of the following grammars comes with its own license!
- English Resource Grammar (ERG, Stanford): binary for Heart of Gold: components-pet-erg.tar.gz; source: http://lingo.stanford.edu/erg
- German Grammar (GG, DFKI): binary for Heart of Gold: components-pet-german.tar.gz; alternatively, download and unpack the GG package, run 'make hog', and unpack the generated gg4hog.tar.gz into ./components/pet/german/; source: http://gg.opendfki.de
- Japanese 'JACY' Grammar (DFKI): binary for Heart of Gold: components-pet-japanese.tar.gz; source: http://jacy.opendfki.de
- Modern Greek Grammar (Saarland University): http://www.delph-in.net/mgrg/
- Spanish Grammar (UPF Barcelona): http://www.delph-in.net/srg/


Description: Merging of RMRS representations
Institution: DFKI Language Technology Lab
Location: ./java, ./xsl/sdl/rmrsmerge
Download: (included in the middleware src archive)

LingPipe (external)

Description: Statistical named entity recognition
Ling. resources: en newswire and genomics (other languages and genres can be trained)
Institution: Alias-i, Inc.
License: Alias-i royalty-free license; parts are under the Apache Software License and others
Requirements: Java 1.5
Download: (external) http://www.alias-i.com/lingpipe
Install hints: see the LingPipe section in the Heart of Gold User and Developer Documentation

RASP (external)

Description: Statistical parser
Ling. resources: en
Institutions: University of Sussex, University of Cambridge
License: RASP license
Download: (external) http://www.informatics.susx.ac.uk/research/nlp/rasp/ + patch for HoG and a compiled RASP RMRS converter server: components-rasp2.tar.gz
Install hints: see the RASP section in the Heart of Gold User and Developer Documentation

ChaSen (external)

Description: Japanese morphological analyzer and part-of-speech tagger
Ling. resources: ja
Institution: Nara Institute of Science and Technology (NAIST)
License: Copyright Nara Institute of Science and Technology; may be redistributed under certain conditions (see the license in the distribution)
Download: (external) http://chasen.aist-nara.ac.jp
Publications: ChaSen Publications and Manuals
Install hints: see the ChaSen section in the Heart of Gold User and Developer Documentation

Sleepy (external)

Description: Statistical parser for German
Ling. resources: de
Institution: Saarland University, Computational Linguistics Department
Download: (external) http://www.coli.uni-saarland.de/~adubey/sleepy/
Install hints: see the Sleepy section in the Heart of Gold User and Developer Documentation

TreeTagger (external)

Description: Statistical part-of-speech tagger
Ling. resources: de, en, es, fr, it
Institution: University of Stuttgart, Institute for Computational Linguistics
License: freely available for research, education and evaluation
Requirements: Linux, Mac OS X or Solaris
Download: (external) http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html
Install hints: see the TreeTagger section in the Heart of Gold User and Developer Documentation

FreeLing (external), preliminary integration

Description: Part-of-speech tagging, morphological analysis, named entity recognition
Ling. resources: ca, en, es, gc, it
Institutions: Universitat Politècnica de Catalunya, TALP Research Center
Requirements: Linux with Berkeley DB (version 4.1.25 or higher), pcre (version 4.3 or higher) and libcfg+ (version 0.6.1 or higher); additionally the LKBwrapper, included in components-freeling-sppp.tar.gz
Download: (external) http://www.lsi.upc.es/~nlp/freeling/
Publications: http://www.lsi.upc.es/~nlp/freeling/
Install hints: see the FreeLing section in the Heart of Gold User and Developer Documentation and the README in components-freeling-sppp.tar.gz
