Manual of the C++ HDT library

1. Get the HDT Source Code
2. Compiling the C++ Implementation
3. Using the C++ Command Line Tools
4. Generating and Browsing HDT Files programmatically

 

1. Get the HDT Source Code

You can download the C++ HDT Library from its GitHub repository.

Acknowledgements: If you use our tools in your research, please acknowledge them by citing the associated papers.

 

 

2. Compiling the C++ Implementation

To compile the library, run make in the hdt-lib directory. This builds both the library and the command-line tools.

The implementation has the following optional dependencies:

  • Raptor RDF Parser Library, version 2+ (optional): Allows importing RDF in many serialization formats, e.g. RDF/XML, Turtle or N3. To activate it, uncomment the line USE_RAPTOR=true in the Makefile. Without Raptor, the library can only load RDF in N-Triples format.
  • libz (optional): Allows loading gzip-compressed N-Triples files (e.g. file.nt.gz) and gzipped HDT files (e.g. file.hdt.gz). To activate it, uncomment the line USE_LIBZ=true in the Makefile.
  • Kyoto Cabinet (optional): Allows generating HDT files for large RDF datasets on machines with limited RAM, by creating a temporary Kyoto Cabinet database. To activate it, uncomment the line USE_KYOTO=true in the Makefile and edit the include path and library path.

 

3. Using the C++ Command Line Tools

After compiling, these are the typical operations that you will perform:

    • Create the HDT representation of your RDF Data:
$ tools/rdf2hdt data/test.nt data/test.hdt
    • Convert an HDT file to another serialization format, such as N-Triples:
$ tools/hdt2rdf data/test.hdt data/test.hdtexport.nt
    • Open a terminal to search triple patterns within an HDT file:
$ tools/hdtSearch data/test.hdt
>> ? ? ?
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri4
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri5
http://example.org/uri4 http://example.org/predicate4 http://example.org/uri5
http://example.org/uri1 http://example.org/predicate1 "literal1"
http://example.org/uri1 http://example.org/predicate1 "literalA"
http://example.org/uri1 http://example.org/predicate1 "literalB"
http://example.org/uri1 http://example.org/predicate1 "literalC"
http://example.org/uri1 http://example.org/predicate2 http://example.org/uri3
http://example.org/uri1 http://example.org/predicate2 http://example.org/uriA3
http://example.org/uri2 http://example.org/predicate1 "literal1"
9 results shown.

>> http://example.org/uri3 ? ?
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri4
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri5
2 results shown.

>> exit
    • Extract the Header of an HDT file:
$ tools/hdtInfo data/test.hdt > header.nt
    • Replace the Header of an HDT file with a new one, for example one obtained by editing the header extracted with hdtInfo:
$ tools/replaceHeader data/test.hdt data/testOutput.hdt newHeader.nt
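
In the hdtSearch session above, ? marks an unbound term, whereas the programmatic API shown in section 4 uses an empty string for the same purpose. The following is a minimal sketch of how such a pattern line could be turned into three search terms. The function parsePattern is a hypothetical helper, not part of the HDT library, and it assumes that terms are separated by whitespace and contain none themselves, so literals containing spaces are not handled.

```cpp
#include <array>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the HDT library): split a simple
// "subject predicate object" pattern into three terms and map the
// "?" wildcard to the empty string.
// Assumption: terms are whitespace-separated and contain no spaces.
std::array<std::string, 3> parsePattern(const std::string &line) {
    std::istringstream in(line);
    std::array<std::string, 3> terms;
    for (std::string &term : terms) {
        if (!(in >> term)) {
            throw std::invalid_argument("expected three terms: " + line);
        }
        if (term == "?") {
            term.clear();  // empty string means "any"
        }
    }
    std::string extra;
    if (in >> extra) {
        throw std::invalid_argument("unexpected trailing term: " + extra);
    }
    return terms;
}
```

The three resulting strings could then be passed (via c_str()) to a search call like the one in section 4.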

 

4. Generating and Browsing HDT Files programmatically

The following program generates an HDT file from an N-Triples file and saves it to disk:

#include <HDTManager.hpp>

using namespace hdt;

int main(int argc, char *argv[]) {
	HDTSpecification spec;

	// Read an RDF file and build its HDT representation in memory.
	HDT *hdt = HDTManager::generateHDT(
                     "data/test.nt",   // Input file
                     "http://example.org/test", // Base URI
                     NTRIPLES,         // Input Format
                     spec              // Additional HDT Options
                   );

	// OPTIONAL: Add additional domain-specific properties to the header
	//Header *header = hdt->getHeader();
	//header->insert("myResource1", "property", "value");

	// Save HDT to a file
	hdt->saveToHDT("data/test.hdt");

	delete hdt;
}

The following program loads an HDT file and prints all triples that match a pattern:

#include <iostream>
#include <HDTManager.hpp>

using namespace std;
using namespace hdt;

int main(int argc, char *argv[]) {

	// Load HDT file (use mapIndexedHDT if you plan to run ?p?, ?po or ??o queries).
	HDT *hdt = HDTManager::mapHDT("data/test.hdt");

	// Enumerate all triples matching a pattern ("" means any)
	IteratorTripleString *it = hdt->search("http://example.org/uri3","","");
	while(it->hasNext()){
		TripleString *triple = it->next();
		cout << triple->getSubject() <<
		", " << triple->getPredicate() <<
		", " << triple->getObject() << endl;
	}
	delete it; // Remember to delete iterator to avoid memory leaks!
	delete hdt; // Remember to delete instance when no longer needed!
}

Please note that you need to use mapIndexedHDT if you plan to run ?p?, ?po or ??o queries. The first time this method is called on an HDT file, the library generates a filename.hdt.index file in the same directory as the original file, which can take from a few seconds to several minutes depending on the size of the dataset. Subsequent calls just load the .hdt.index file and are faster. You can safely delete the .hdt.index file if you don't plan to use it; it is regenerated automatically the next time it is required.
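
To make the rule above concrete, the following sketch decides whether a given pattern falls into the ?p?, ?po or ??o group and derives the name of the index file. Both functions are hypothetical helpers, not part of the HDT library. The sketch assumes that an empty string means "any", as in the search example, and that the index name is formed by appending .index to the HDT file name; the actual suffix may differ between library versions.

```cpp
#include <string>

// Hypothetical helpers, not part of the HDT library.

// Returns true for the patterns the manual lists as requiring
// mapIndexedHDT: ?p?, ?po and ??o. An empty string means "any".
bool needsIndex(const std::string &subject,
                const std::string &predicate,
                const std::string &object) {
    // Subject unbound, and at least one of predicate/object bound.
    return subject.empty() && !(predicate.empty() && object.empty());
}

// Assumption: the index file sits next to the HDT file and its name
// is the HDT file name with ".index" appended.
std::string indexPathFor(const std::string &hdtPath) {
    return hdtPath + ".index";
}
```

A program could call needsIndex on its query terms to choose between mapHDT and mapIndexedHDT before opening the file.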