Manual of the Java HDT library

1. Get the HDT Library
2. Compiling the Java Library / Tools
3. Using the Java Command Line Tools
4. Generating and Searching HDT files programmatically from Java


1. Get the HDT Library

The Java HDT Library is available both as a binary distribution and as source code from the GitHub repository.

Acknowledgements: If you use our tools in your research, please acknowledge them by citing the associated HDT papers.

 

 

2. Compiling the Java Library / Tools

The Java library can be compiled using Apache Ant. Download the source distribution and run ant jar to generate a JAR package with the HDT library. Loading HDT files requires no additional dependencies: just add hdt-lib.jar to your application’s classpath. However, there are some optional dependencies:

  • Jena RIOT for parsing files in formats other than NTriples when generating HDT files.
  • JCommander to use the Java Command Line tools.

 

3. Using the Java Command Line Tools

Once you have compiled the library or downloaded the binary distribution, you can use the command line tools to convert and browse HDT files. The launch scripts (*.sh or *.bat, depending on your OS) are the most convenient way to run them. These are the operations you will typically want to perform:

  • Convert your RDF data to the HDT representation. This process is memory-intensive, so to generate very big files you might need to increase the memory limit (the -Xmx1G option) inside the script. The tool also accepts input files compressed with GZIP (e.g. .nt.gz). To convert a file, run:
$ ./rdf2hdt.sh data/test.nt data/test.hdt
  • Convert an HDT to another serialization format, such as NTriples:
$ ./hdt2rdf.sh data/test.hdt data/test.hdtexport.nt
  • Open a terminal to search triple patterns within an HDT file:
$ ./hdtSearch.sh data/test.hdt
>> ? ? ?
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri4
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri5
http://example.org/uri4 http://example.org/predicate4 http://example.org/uri5
http://example.org/uri1 http://example.org/predicate1 "literal1"
http://example.org/uri1 http://example.org/predicate1 "literalA"
http://example.org/uri1 http://example.org/predicate1 "literalB"
http://example.org/uri1 http://example.org/predicate1 "literalC"
http://example.org/uri1 http://example.org/predicate2 http://example.org/uri3
http://example.org/uri1 http://example.org/predicate2 http://example.org/uriA3
http://example.org/uri2 http://example.org/predicate1 "literal1"
9 results shown.

>> http://example.org/uri3 ? ?
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri4
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri5
2 results shown.

>> exit
  • Extract the Header of an HDT file:
$ ./hdtInfo.sh data/test.hdt > header.nt
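
The triple patterns in the hdtSearch session above can be read as follows: each of the three positions is either a concrete term or the wildcard ?, and a triple is returned when every non-wildcard position matches. The following is a minimal Java sketch of that matching rule over a small in-memory list. The class TriplePatternSketch, its Triple helper and the linear scan are assumptions made for illustration only; they are not part of the HDT library, which answers patterns from its own data structures rather than from a plain list.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: this is NOT the HDT library's search code.
class TriplePatternSketch {

    // Plain-string triple (hypothetical helper type for this sketch).
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public String toString() {
            return subject + " " + predicate + " " + object;
        }
    }

    // A pattern term matches when it is the wildcard "?" or equals the value.
    static boolean termMatches(String pattern, String value) {
        return "?".equals(pattern) || pattern.equals(value);
    }

    // Linear scan that keeps every triple matching all three pattern terms.
    static List<Triple> search(List<Triple> data, String s, String p, String o) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : data) {
            if (termMatches(s, t.subject)
                    && termMatches(p, t.predicate)
                    && termMatches(o, t.object)) {
                result.add(t);
            }
        }
        return result;
    }

    // A few triples taken from the session output above.
    static List<Triple> sampleData() {
        List<Triple> data = new ArrayList<>();
        data.add(new Triple("http://example.org/uri3", "http://example.org/predicate3", "http://example.org/uri4"));
        data.add(new Triple("http://example.org/uri3", "http://example.org/predicate3", "http://example.org/uri5"));
        data.add(new Triple("http://example.org/uri4", "http://example.org/predicate4", "http://example.org/uri5"));
        data.add(new Triple("http://example.org/uri1", "http://example.org/predicate1", "\"literal1\""));
        return data;
    }

    public static void main(String[] args) {
        // Same pattern as the second query in the session: fixed subject, any predicate and object.
        for (Triple t : search(sampleData(), "http://example.org/uri3", "?", "?")) {
            System.out.println(t);
        }
    }
}
```

With this reading, ? ? ? matches every triple, while a pattern such as http://example.org/uri3 ? ? restricts the results to triples with that subject.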

 

4. Generating and Searching HDT files programmatically from Java

import org.rdfhdt.hdt.enums.RDFNotation;
import org.rdfhdt.hdt.hdt.HDT;
import org.rdfhdt.hdt.hdt.HDTManager;
import org.rdfhdt.hdt.options.HDTSpecification;

public class ExampleGenerate {

	public static void main(String[] args) throws Exception {
		// Configuration variables
		String baseURI = "http://example.com/mydataset";
		String rdfInput = "/path/to/dataset.nt";
		String inputType = "ntriples";
		String hdtOutput = "/path/to/dataset.hdt";

		// Create HDT from RDF file
		HDT hdt = HDTManager.generateHDT(
				rdfInput,                     // Input RDF File
				baseURI,                      // Base URI
				RDFNotation.parse(inputType), // Input Type
				new HDTSpecification(),       // HDT Options
				null                          // Progress Listener
		);

		// OPTIONAL: Add additional domain-specific properties to the header:
		//Header header = hdt.getHeader();
		//header.insert("myResource1", "property" , "value");

		// Save generated HDT to a file
		hdt.saveToHDT(hdtOutput, null);
	}
}

// Load an HDT and perform a search. (examples/ExampleSearch.java)
import org.rdfhdt.hdt.hdt.HDT;
import org.rdfhdt.hdt.hdt.HDTManager;
import org.rdfhdt.hdt.triples.IteratorTripleString;
import org.rdfhdt.hdt.triples.TripleString;

public class ExampleSearch {

	public static void main(String[] args) throws Exception {
		// Load HDT file.
		// NOTE: Use loadIndexedHDT() for ?P?, ?PO or ??O queries
		HDT hdt = HDTManager.loadHDT("data/example.hdt", null);

		// Search pattern: Empty string means "any"
		IteratorTripleString it = hdt.search("", "", "");
		while (it.hasNext()) {
			TripleString ts = it.next();
			System.out.println(ts);
		}
	}
}

You can also use HDTManager.mapHDT() and HDTManager.mapIndexedHDT() to map the file instead of loading everything into main memory. The main advantage is that it requires much less memory, as it loads the data from disk on-demand, and therefore allows loading files even bigger than the machine’s main memory. It also allows the Operating System to keep the fragments cached even after closing the application. This results in faster initial loading time, and even allows several processes to access the same HDT file without having multiple copies in memory. The disadvantage is that searches can be slower, especially the first ones and/or when the system does not have much free memory to cache the read blocks.
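
The difference between loading and mapping can be illustrated with plain java.nio, independently of HDT. The sketch below shows the general operating-system mechanism (copying a whole file into the heap versus memory-mapping it read-only); it is not the implementation behind HDTManager.mapHDT(), and the class MapVersusLoadSketch and its method names are hypothetical.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch only: generic file loading vs. mapping, not HDT code.
class MapVersusLoadSketch {

    // Load: copy the entire file into a heap array, then read one byte.
    // Heap usage grows with the file size.
    static byte readByteLoaded(Path file, int offset) throws IOException {
        byte[] all = Files.readAllBytes(file);
        return all[offset];
    }

    // Map: ask the OS to map the file read-only. Pages are fetched from disk
    // when first touched and stay in the OS page cache, outside the Java heap.
    static byte readByteMapped(Path file, int offset) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            return buffer.get(offset);
        }
    }

    public static void main(String[] args) throws IOException {
        // Small temporary file standing in for a large data file.
        Path tmp = Files.createTempFile("map-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "hello mapped world".getBytes(StandardCharsets.US_ASCII));

        // Both strategies return the same byte; only the memory behaviour differs.
        System.out.println((char) readByteLoaded(tmp, 6));
        System.out.println((char) readByteMapped(tmp, 6));
    }
}
```

For a file of a few bytes the two approaches behave the same. The trade-off described above only becomes visible with files that approach or exceed the available main memory.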

Please note that the HDT object is thread-safe. You can share a single HDT instance between multiple threads of your application. HDT is very concurrency-friendly, so you will notice a significant performance improvement when doing so. However, the iterator returned by hdt.search() is not thread-safe and should only be used from the thread that made the hdt.search() call.
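
The usage pattern this implies (one shared read-only instance, one iterator per thread) can be sketched with standard Java types. In the example below an unmodifiable List stands in for the shared HDT instance and List.iterator() stands in for hdt.search(); both are placeholders chosen so the sketch runs without the library, and SharedSourceSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only: a List is used as a stand-in for a shared HDT instance.
class SharedSourceSketch {

    // Shared, read-only data source (placeholder for a single HDT instance).
    static final List<String> SHARED =
            Collections.unmodifiableList(Arrays.asList("t1", "t2", "t3", "t4"));

    // Each call creates its OWN iterator and consumes it on the calling thread.
    // The iterator is never handed to another thread.
    static int countAll() {
        int count = 0;
        Iterator<String> it = SHARED.iterator();
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    // Run the same scan on several threads against the single shared source.
    static List<Integer> runConcurrently(int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<Integer> task = SharedSourceSketch::countAll;
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : pool.invokeAll(Collections.nCopies(threads, task))) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runConcurrently(3));
    }
}
```

The point of the design is that the shared object is only read, while all mutable iteration state lives in an iterator owned by exactly one thread.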