HDT-java 1.1

1. Get the HDT Library
2. Using the Java Command Line Tools
3. Generating and Searching HDT files programmatically from Java
4. Compiling the Java Library / Tools

Acknowledgements: If you use our tools in your research, please acknowledge them by citing our papers.

 


 

1. Get the HDT Library

The HDT library is available as a Maven artifact. To use it in your project, add the following dependency to your pom.xml:

<dependency>
   <groupId>org.rdfhdt</groupId>
   <artifactId>hdt-java-core</artifactId>
   <version>1.1</version>
</dependency>

If you want to use the command line tools, you can download the hdt-java-package, which includes the required jars, dependencies, and launcher scripts.

The code of the library is open source, distributed under the LGPL license on our HDT-java Google Code Project.

 

2. Using the Java Command Line Tools

Once you have downloaded the binary distribution, you can use the command line tools to convert and browse HDT files. The launch scripts (*.sh or *.bat, depending on your OS) are the most convenient way to run them. These are the operations you will typically want to perform:

  • Convert your RDF data to the HDT representation. This process is memory-intensive, so to generate very big files you might need to increase the memory limit (the -Xmx1G option) inside the bin/java.env script. The tool also accepts input files compressed with GZIP (e.g. .nt.gz). To convert a file, run:
$ bin/rdf2hdt.sh data/test.nt data/test.hdt
  • Convert an HDT file to another serialization format, such as N-Triples:
$ bin/hdt2rdf.sh data/test.hdt data/test.hdtexport.nt
  • Open a terminal to search triple patterns within an HDT file:
$ bin/hdtSearch.sh data/test.hdt
>> ? ? ?
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri4
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri5
http://example.org/uri4 http://example.org/predicate4 http://example.org/uri5
http://example.org/uri1 http://example.org/predicate1 "literal1"
http://example.org/uri1 http://example.org/predicate1 "literalA"
http://example.org/uri1 http://example.org/predicate1 "literalB"
http://example.org/uri1 http://example.org/predicate1 "literalC"
http://example.org/uri1 http://example.org/predicate2 http://example.org/uri3
http://example.org/uri1 http://example.org/predicate2 http://example.org/uriA3
http://example.org/uri2 http://example.org/predicate1 "literal1"
9 results shown.

>> http://example.org/uri3 ? ?
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri4
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri5
2 results shown.

>> exit
  • Extract the Header of an HDT file:
$ bin/hdtInfo.sh data/test.hdt > header.nt

 

3. Generating and Searching HDT files programmatically from Java

// Generate an HDT file from an RDF file.
import org.rdfhdt.hdt.enums.RDFNotation;
import org.rdfhdt.hdt.hdt.HDT;
import org.rdfhdt.hdt.hdt.HDTManager;
import org.rdfhdt.hdt.options.HDTSpecification;

public class ExampleGenerate {

	public static void main(String[] args) throws Exception {
		// Configuration variables
		String baseURI = "http://example.com/mydataset";
		String rdfInput = "/path/to/dataset.nt";
		String inputType = "ntriples";
		String hdtOutput = "/path/to/dataset.hdt";

		// Create HDT from RDF file
		HDT hdt = HDTManager.generateHDT(
				rdfInput,                     // Input RDF file
				baseURI,                      // Base URI
				RDFNotation.parse(inputType), // Input type
				new HDTSpecification(),       // HDT options
				null                          // Progress listener
		);

		try {
			// OPTIONAL: Add additional domain-specific properties to the header
			// (requires import org.rdfhdt.hdt.header.Header):
			//Header header = hdt.getHeader();
			//header.insert("myResource1", "property", "value");

			// Save generated HDT to a file
			hdt.saveToHDT(hdtOutput, null);
		} finally {
			// IMPORTANT: Close hdt when no longer needed
			hdt.close();
		}
	}
}

// Load an HDT and perform a search. (examples/ExampleSearch.java)
import org.rdfhdt.hdt.hdt.HDT;
import org.rdfhdt.hdt.hdt.HDTManager;
import org.rdfhdt.hdt.triples.IteratorTripleString;
import org.rdfhdt.hdt.triples.TripleString;

public class ExampleSearch {

	public static void main(String[] args) throws Exception {
		// Load HDT file.
		// NOTE: Use loadIndexedHDT() for ?P?, ?PO or ??O queries
		HDT hdt = HDTManager.loadHDT("data/example.hdt", null);

		try {
			// Search pattern: Empty string means "any"
			IteratorTripleString it = hdt.search("", "", "");
			while (it.hasNext()) {
				TripleString ts = it.next();
				System.out.println(ts);
			}
		} finally {
			// IMPORTANT: Close hdt when no longer needed
			hdt.close();
		}
	}
}
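
The hdtSearch terminal writes a wildcard as "?", whereas hdt.search() expects an empty string for "any". The following is a minimal sketch of a conversion between the two conventions. The class PatternArgs and the method toSearchArgs are hypothetical helpers, not part of hdt-java, and the sketch assumes terms contain no whitespace (so literals with spaces are not handled).

```java
import java.util.Arrays;

public class PatternArgs {

    // Hypothetical helper (not part of hdt-java): splits a three-term,
    // whitespace-separated pattern such as "http://example.org/uri3 ? ?"
    // and replaces each "?" with "", the wildcard used by hdt.search().
    public static String[] toSearchArgs(String pattern) {
        String[] terms = pattern.trim().split("\\s+");
        if (terms.length != 3) {
            throw new IllegalArgumentException("Expected 3 terms, got " + terms.length);
        }
        for (int i = 0; i < terms.length; i++) {
            if (terms[i].equals("?")) {
                terms[i] = "";
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String[] spo = toSearchArgs("http://example.org/uri3 ? ?");
        // With a loaded HDT one would then call: hdt.search(spo[0], spo[1], spo[2])
        System.out.println(Arrays.toString(spo)); // prints [http://example.org/uri3, , ]
    }
}
```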

You can also use HDTManager.mapHDT() and HDTManager.mapIndexedHDT() to map the file instead of loading everything into main memory. The main advantage is that it requires much less memory, as it loads the data from disk on-demand, and therefore allows loading files even bigger than the machine’s main memory. It also allows the Operating System to keep the fragments cached even after closing the application. This results in faster initial loading time, and even allows several processes to access the same HDT file without having multiple copies in memory. The disadvantage is that searches can be slower, especially the first ones and/or when the system does not have much free memory to cache the read blocks.
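
To make the load-versus-map trade-off concrete, here is a small sketch that uses only the standard java.nio API on a throwaway temporary file. It illustrates the general operating-system mechanism of memory mapping and is not a description of how hdt-java implements mapHDT() internally. The class and method names (MapVersusLoad, loadAll, mapReadOnly) are made up for this illustration.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MapVersusLoad {

    // Eager approach: copy the whole file into an array on the Java heap.
    public static byte[] loadAll(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    // Mapped approach: the OS pages data in when a region is first accessed.
    public static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped-demo", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, "subject predicate object".getBytes(StandardCharsets.US_ASCII));

        byte[] loaded = loadAll(file);
        MappedByteBuffer mapped = mapReadOnly(file);

        // Random access into the mapping touches only the pages it needs.
        System.out.println((char) mapped.get(8));                 // prints p
        System.out.println(loaded.length == mapped.capacity());   // prints true
    }
}
```

The heap copy costs memory proportional to the file size up front, while the mapping defers that cost to the page cache, which the operating system can share between processes and evict under memory pressure.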

Please note that the HDT object is thread-safe. You can share a single HDT instance between multiple threads of your application. HDT is very concurrency-friendly, so you should notice a significant performance improvement when doing so. However, the iterator returned by hdt.search() is not thread-safe and should only be used from the thread that called hdt.search().
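
The usage pattern described here (one shared instance, one iterator per thread) can be sketched as follows. To keep the example runnable without the library, TripleStore is a hypothetical in-memory stand-in for an HDT instance, not an hdt-java class. With the real library, each task would call hdt.search() on the shared HDT object and consume the returned iterator itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedStoreDemo {

    // Hypothetical stand-in for a read-only, thread-safe HDT instance.
    public static final class TripleStore {
        private final List<String[]> triples;

        public TripleStore(List<String[]> triples) {
            this.triples = Collections.unmodifiableList(new ArrayList<>(triples));
        }

        // Empty string means "any", mirroring hdt.search().
        // Every call returns a fresh iterator owned by the caller.
        public Iterator<String[]> search(String s, String p, String o) {
            return triples.stream()
                    .filter(t -> matches(s, t[0]) && matches(p, t[1]) && matches(o, t[2]))
                    .iterator();
        }

        private static boolean matches(String pattern, String value) {
            return pattern.isEmpty() || pattern.equals(value);
        }
    }

    // Creates and consumes the iterator on the calling thread only.
    public static int count(TripleStore store, String s, String p, String o) {
        int n = 0;
        Iterator<String[]> it = store.search(s, p, o);
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws Exception {
        TripleStore store = new TripleStore(Arrays.asList(
                new String[] {"uri1", "predicate1", "literal1"},
                new String[] {"uri1", "predicate2", "uri3"},
                new String[] {"uri2", "predicate1", "literal1"}));

        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            tasks.add(() -> count(store, "uri1", "", ""));        // shared store, own iterator
            tasks.add(() -> count(store, "", "predicate2", ""));  // shared store, own iterator
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                System.out.println(f.get()); // prints 2, then 1
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

The key design point is that no iterator ever crosses a thread boundary: only the store is shared, and each task obtains its own iterator from it.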

 

4. Compiling the Java Library / Tools

Normally you will just need to download the distribution packages of the hdt-java library. If you want to compile the library yourself, you can download the sources from the HDT-java Google Code Project, and compile them using Apache Maven:

$ git clone https://code.google.com/p/hdt-java/
$ cd hdt-java
$ mvn clean package