1. Get the HDT Library
2. Using the Java Command Line Tools
3. Generating and Searching HDT files programmatically from Java
4. Compiling the Java Library / Tools
Acknowledgements: If you use our tools in your research, please acknowledge them by citing our papers.
1. Get the HDT Library
The HDT library is available as a Maven artifact. To use it in your project, add the following dependency to your pom.xml:
<dependency>
  <groupId>org.rdfhdt</groupId>
  <artifactId>hdt-java-core</artifactId>
  <version>1.1</version>
</dependency>
If you want to use the command line tools, you can download the hdt-java-package that includes the required jars, dependencies and some launcher scripts.
The code of the library is open source, distributed under the LGPL license on our HDT-java Google Code Project.
2. Using the Java Command Line Tools
Once you have downloaded the binary distribution, you can use the command line tools to convert and browse HDT files. The launch scripts (*.sh or *.bat, depending on your OS) are a convenient way to execute them. These are the typical operations that you’d probably want to perform:
- Convert your RDF data to the HDT representation. This process is memory-intensive, so to generate very big files you might need to increase the memory (the -Xmx1G option) inside the bin/java.env script. Note that the tool also accepts input files compressed with GZIP (e.g. .nt.gz). To convert a file, run:
$ bin/rdf2hdt.sh data/test.nt data/test.hdt
- Convert an HDT to another serialization format, such as NTriples:
$ bin/hdt2rdf.sh data/test.hdt data/test.hdtexport.nt
- Open a terminal to search triple patterns within an HDT file:
$ bin/hdtSearch.sh data/test.hdt
>> ? ? ?
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri4
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri5
http://example.org/uri4 http://example.org/predicate4 http://example.org/uri5
http://example.org/uri1 http://example.org/predicate1 "literal1"
http://example.org/uri1 http://example.org/predicate1 "literalA"
http://example.org/uri1 http://example.org/predicate1 "literalB"
http://example.org/uri1 http://example.org/predicate1 "literalC"
http://example.org/uri1 http://example.org/predicate2 http://example.org/uri3
http://example.org/uri1 http://example.org/predicate2 http://example.org/uriA3
http://example.org/uri2 http://example.org/predicate1 "literal1"
9 results shown.
>> http://example.org/uri3 ? ?
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri4
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri5
2 results shown.
>> exit
- Extract the Header of an HDT file:
$ bin/hdtInfo.sh data/test.hdt > header.nt
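The search terminal above treats `?` as a wildcard in any of the three positions of a triple pattern. The following is a minimal sketch of that matching rule using only the JDK. It is not how the HDT library evaluates patterns internally (this sketch is a plain linear scan), and the class and method names (`TriplePatternSketch`, `termMatches`, `search`) are hypothetical. The data is a subset of the triples printed in the sample session.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TriplePatternSketch {
    // Wildcard token, as typed in the hdtSearch terminal.
    static final String ANY = "?";

    // A subset of the triples from the sample session: {subject, predicate, object}.
    static final List<String[]> TRIPLES = Arrays.asList(
        new String[] {"http://example.org/uri3", "http://example.org/predicate3", "http://example.org/uri4"},
        new String[] {"http://example.org/uri3", "http://example.org/predicate3", "http://example.org/uri5"},
        new String[] {"http://example.org/uri4", "http://example.org/predicate4", "http://example.org/uri5"},
        new String[] {"http://example.org/uri1", "http://example.org/predicate1", "\"literal1\""},
        new String[] {"http://example.org/uri2", "http://example.org/predicate1", "\"literal1\""}
    );

    // A pattern term matches when it is the wildcard or equals the triple's term.
    static boolean termMatches(String pattern, String value) {
        return ANY.equals(pattern) || pattern.equals(value);
    }

    // Linear scan that returns every triple matching the (s, p, o) pattern.
    static List<String[]> search(String s, String p, String o) {
        List<String[]> results = new ArrayList<>();
        for (String[] t : TRIPLES) {
            if (termMatches(s, t[0]) && termMatches(p, t[1]) && termMatches(o, t[2])) {
                results.add(t);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Same pattern as the second query in the sample session.
        List<String[]> results = search("http://example.org/uri3", ANY, ANY);
        for (String[] t : results) {
            System.out.println(String.join(" ", t));
        }
        System.out.println(results.size() + " results shown.");
    }
}
```

Note that the programmatic Java API described in the next section uses the empty string, not `?`, to mean "any".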
3. Generating and Searching HDT files programmatically from Java
- Generating an HDT File (available at examples/ExampleGenerate.java):
import org.rdfhdt.hdt.enums.RDFNotation;
import org.rdfhdt.hdt.hdt.HDT;
import org.rdfhdt.hdt.hdt.HDTManager;
import org.rdfhdt.hdt.options.HDTSpecification;

public class ExampleGenerate {
    public static void main(String[] args) throws Exception {
        // Configuration variables
        String baseURI = "http://example.com/mydataset";
        String rdfInput = "/path/to/dataset.nt";
        String inputType = "ntriples";
        String hdtOutput = "/path/to/dataset.hdt";

        // Create HDT from RDF file
        HDT hdt = HDTManager.generateHDT(
            rdfInput,                     // Input RDF File
            baseURI,                      // Base URI
            RDFNotation.parse(inputType), // Input Type
            new HDTSpecification(),       // HDT Options
            null                          // Progress Listener
        );

        // OPTIONAL: Add additional domain-specific properties to the header:
        //Header header = hdt.getHeader();
        //header.insert("myResource1", "property" , "value");

        // Save generated HDT to a file
        hdt.saveToHDT(hdtOutput, null);

        // IMPORTANT: Close hdt when no longer needed
        hdt.close();
    }
}
- Searching Triple Patterns inside an HDT File (available at examples/ExampleSearch.java):
import org.rdfhdt.hdt.hdt.HDT;
import org.rdfhdt.hdt.hdt.HDTManager;
import org.rdfhdt.hdt.triples.IteratorTripleString;
import org.rdfhdt.hdt.triples.TripleString;

// Load an HDT and perform a search. (examples/ExampleSearch.java)
public class ExampleSearch {
    public static void main(String[] args) throws Exception {
        // Load HDT file.
        // NOTE: Use loadIndexedHDT() for ?P?, ?PO or ??O queries
        HDT hdt = HDTManager.loadHDT("data/example.hdt", null);

        // Search pattern: Empty string means "any"
        IteratorTripleString it = hdt.search("", "", "");
        while (it.hasNext()) {
            TripleString ts = it.next();
            System.out.println(ts);
        }

        // IMPORTANT: Close hdt when no longer needed
        hdt.close();
    }
}
You can also use HDTManager.mapHDT() and HDTManager.mapIndexedHDT() to map the file instead of loading everything into main memory. The main advantage is that mapping requires much less memory, because data is read from disk on demand, so you can open files even bigger than the machine’s main memory. It also lets the operating system keep the fragments cached even after the application closes. This results in a faster initial loading time, and allows several processes to access the same HDT file without holding multiple copies in memory. The disadvantage is that searches can be slower, especially the first ones, or when the system does not have much free memory to cache the blocks it reads.
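To make the load-versus-map trade-off concrete, here is a minimal sketch of the general operating-system technique using only `java.nio`. It is an illustration of memory-mapped file access in Java, not a description of how `HDTManager.mapHDT()` is implemented; the class and method names (`MapVersusLoadSketch`, `loadFully`, `mapReadOnly`) are hypothetical.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MapVersusLoadSketch {
    // Load: copies the entire file onto the Java heap up front.
    static byte[] loadFully(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    // Map: the OS pages file contents in as they are touched, and the pages
    // live in the OS page cache rather than on the Java heap.
    // Note: a single mapping is limited to Integer.MAX_VALUE bytes.
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("map-sketch", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[] {10, 20, 30, 40});

        byte[] loaded = loadFully(file);
        MappedByteBuffer mapped = mapReadOnly(file);

        // Both views expose the same bytes; only the memory behaviour differs.
        System.out.println(loaded[2] + " " + mapped.get(2));
    }
}
```

Because mapped pages belong to the operating system's cache, several processes mapping the same file can share them, which matches the behaviour described above.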
Please note that the HDT object is thread-safe. You can share a single HDT instance between multiple threads of your application. HDT is very concurrency-friendly, so you will notice a significant performance improvement when doing so. However, the iterator returned by hdt.search() is not thread-safe and should only be used from the thread that made the hdt.search() call.
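The sharing rule above (one shared read-only instance, one iterator per thread) can be sketched with JDK classes alone. In this sketch an unmodifiable `List` stands in for the shared HDT instance, and each worker obtains and consumes its own iterator, just as each thread should make its own hdt.search() call. The names `SharedStoreSketch`, `STORE`, `countAll` and `countInParallel` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedStoreSketch {
    // Stand-in for the single shared, read-only HDT instance (hypothetical data).
    static final List<String> STORE =
        Collections.unmodifiableList(Arrays.asList("t1", "t2", "t3"));

    // Creates a fresh iterator and consumes it on the calling thread only.
    static int countAll() {
        int count = 0;
        Iterator<String> it = STORE.iterator();
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    // Runs countAll() on several threads that all share STORE.
    static int countInParallel(int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                futures.add(pool.submit((Callable<Integer>) SharedStoreSketch::countAll));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countInParallel(4)); // prints 12 (4 threads x 3 items)
    }
}
```

The design point is that iterators are never handed from one thread to another; only the read-only store is shared.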
4. Compiling the Java Library / Tools
Normally you will just need to download the distribution packages of the hdt-java library. If you want to compile the library yourself, you can download the sources from the HDT-java Google Code Project, and compile them using Apache Maven:
$ git clone https://code.google.com/p/hdt-java/
$ cd hdt-java
$ mvn clean package