1. Get the HDT Source Code
2. Compiling the C++ Implementation
3. Using the C++ Command Line Tools
4. Generating and Browsing HDT Files programmatically
1. Get the HDT Source Code
You can download the C++ HDT Library from its GitHub repository.
Acknowledgements: If you use our tools in your research, please acknowledge them by citing the corresponding HDT papers.
2. Compiling the C++ Implementation
To compile the library, run make in the hdt-lib directory. This generates both the library and the command line tools.
The implementation has the following optional dependencies:
- Raptor RDF Parser Library, version 2 or later. Allows importing RDF in many serialization formats, e.g. RDF/XML, Turtle and N3. To activate it, uncomment the line
USE_RAPTOR=true
in the Makefile. If Raptor is not used, the library can only load RDF in NTriples format.
- libz. Allows loading NTriples files compressed with GZIP (e.g. file.nt.gz) and gzipped HDT files (file.hdt.gz). To activate it, uncomment the line
USE_LIBZ=true
in the Makefile.
- Kyoto Cabinet. Allows generating large RDF datasets on machines with limited RAM by creating a temporary Kyoto Cabinet database. To activate it, uncomment the line
USE_KYOTO=true
in the Makefile and adjust the include path and library path to point to your Kyoto Cabinet installation.
3. Using the C++ Command Line Tools
After compiling, these are the typical operations that you will perform:
- Create the HDT representation of your RDF Data:
$ tools/rdf2hdt data/test.nt data/test.hdt
- Convert an HDT to another serialization format, such as NTriples:
$ tools/hdt2rdf data/test.hdt data/test.hdtexport.nt
- Open an interactive console to search for triple patterns in an HDT file (? matches any term):
$ tools/hdtSearch data/test.hdt
>> ? ? ?
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri4
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri5
http://example.org/uri4 http://example.org/predicate4 http://example.org/uri5
http://example.org/uri1 http://example.org/predicate1 "literal1"
http://example.org/uri1 http://example.org/predicate1 "literalA"
http://example.org/uri1 http://example.org/predicate1 "literalB"
http://example.org/uri1 http://example.org/predicate1 "literalC"
http://example.org/uri1 http://example.org/predicate2 http://example.org/uri3
http://example.org/uri1 http://example.org/predicate2 http://example.org/uriA3
http://example.org/uri2 http://example.org/predicate1 "literal1"
9 results shown.
>> http://example.org/uri3 ? ?
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri4
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri5
2 results shown.
>> exit
- Extract the Header of an HDT file:
$ tools/hdtInfo data/test.hdt > header.nt
- Replace the Header of an HDT file with a new one, for example an edited copy of the header extracted with hdtInfo:
$ tools/replaceHeader data/test.hdt data/testOutput.hdt newHeader.nt
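To make the wildcard semantics of the hdtSearch session above concrete, here is a minimal in-memory sketch. It is not how HDT evaluates queries (HDT answers patterns through its compressed indexes, while this sketch scans a vector linearly), and the names Triple, termMatches, matchPattern and sampleData are hypothetical, not part of the HDT library. A pattern position matches any term when it is ? (as typed in hdtSearch) or an empty string (as passed to search() in the C++ example below); the sample data are the triples listed by the ? ? ? query in the session.

```cpp
#include <cassert>  // for the usage checks described after this block
#include <string>
#include <vector>

// Hypothetical illustration only: a plain triple of strings, not hdt::TripleString.
struct Triple {
    std::string s, p, o;
};

// A pattern term matches when it is a wildcard ("" or "?") or equal to the value.
bool termMatches(const std::string& pattern, const std::string& value) {
    return pattern.empty() || pattern == "?" || pattern == value;
}

// Linear scan over all triples, keeping those that match in every position.
std::vector<Triple> matchPattern(const std::vector<Triple>& data,
                                 const std::string& s,
                                 const std::string& p,
                                 const std::string& o) {
    std::vector<Triple> result;
    for (const Triple& t : data) {
        if (termMatches(s, t.s) && termMatches(p, t.p) && termMatches(o, t.o)) {
            result.push_back(t);
        }
    }
    return result;
}

// Sample data: the triples printed by the "? ? ?" query in the hdtSearch session.
std::vector<Triple> sampleData() {
    const std::string ex = "http://example.org/";
    return {
        {ex + "uri3", ex + "predicate3", ex + "uri4"},
        {ex + "uri3", ex + "predicate3", ex + "uri5"},
        {ex + "uri4", ex + "predicate4", ex + "uri5"},
        {ex + "uri1", ex + "predicate1", "\"literal1\""},
        {ex + "uri1", ex + "predicate1", "\"literalA\""},
        {ex + "uri1", ex + "predicate1", "\"literalB\""},
        {ex + "uri1", ex + "predicate1", "\"literalC\""},
        {ex + "uri1", ex + "predicate2", ex + "uri3"},
        {ex + "uri1", ex + "predicate2", ex + "uriA3"},
        {ex + "uri2", ex + "predicate1", "\"literal1\""},
    };
}
```

For instance, assert(matchPattern(sampleData(), "http://example.org/uri3", "?", "?").size() == 2) holds, mirroring the second query in the session.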
4. Generating and Browsing HDT Files programmatically
- Generate an HDT representation from an RDF file (examples/generate.cpp):
#include <HDTManager.hpp>

using namespace hdt;

int main(int argc, char *argv[]) {
    HDTSpecification spec;

    // Read RDF into an HDT file.
    HDT *hdt = HDTManager::generateHDT(
        "data/test.nt",             // Input file
        "http://example.org/test",  // Base URI
        NTRIPLES,                   // Input format
        spec                        // Additional HDT options
    );

    // OPTIONAL: Add additional domain-specific properties to the header
    //Header *header = hdt->getHeader();
    //header->insert("myResource1", "property", "value");

    // Save HDT to a file
    hdt->saveToHDT("data/test.hdt");

    delete hdt;
}
- Open an HDT file and search Triple patterns (examples/search.cpp):
#include <iostream>
#include <HDTManager.hpp>

using namespace std;
using namespace hdt;

int main(int argc, char *argv[]) {
    // Load HDT file (use mapIndexedHDT if you plan to run ?p?, ?po or ??o queries).
    HDT *hdt = HDTManager::mapHDT("data/test.hdt");

    // Enumerate all triples matching a pattern ("" means any)
    IteratorTripleString *it = hdt->search("http://example.org/uri3", "", "");
    while (it->hasNext()) {
        TripleString *triple = it->next();
        cout << triple->getSubject() << ", " << triple->getPredicate() << ", "
             << triple->getObject() << endl;
    }
    delete it;  // Remember to delete the iterator to avoid memory leaks!
    delete hdt; // Remember to delete the instance when no longer needed!
}
Please note that you need to use mapIndexedHDT if you plan to run ?p?, ?po or ??o queries. The first time this method is called on an HDT file, the library generates a filename.hdt.index file in the same directory as the original file, which can take from a few seconds to several minutes depending on the size of the dataset. Subsequent calls simply load the existing .hdt.index file and are therefore faster. You can safely delete the .hdt.index file if you don't plan to use it; it will be regenerated automatically the next time it is required.
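The rule in the note above can be written as a small predicate. This is a minimal, self-contained sketch that does not call the HDT library; the helper names requiresIndexedHDT and indexFileFor are hypothetical. It assumes, as in the search example, that an empty string marks an unbound position, and it encodes only what the note states: patterns with an unbound subject and a bound predicate and/or object (?p?, ?po, ??o) need mapIndexedHDT, and the index is stored next to the HDT file as filename.hdt.index.

```cpp
#include <cassert>  // for the usage checks described after this block
#include <string>

// Hypothetical helper, not part of the HDT API: true exactly for the
// ?p?, ?po and ??o patterns, i.e. subject unbound and predicate and/or object bound.
// An empty string marks an unbound position, as in the search() example.
bool requiresIndexedHDT(const std::string& s, const std::string& p, const std::string& o) {
    const bool subjectBound = !s.empty();
    const bool predicateBound = !p.empty();
    const bool objectBound = !o.empty();
    return !subjectBound && (predicateBound || objectBound);
}

// Hypothetical helper: path of the side index file described in the note,
// "filename.hdt.index", located next to the original HDT file.
std::string indexFileFor(const std::string& hdtPath) {
    return hdtPath + ".index";
}
```

A program could, for example, check the patterns it intends to run and call HDTManager::mapIndexedHDT instead of HDTManager::mapHDT when any of them makes requiresIndexedHDT return true; indexFileFor("data/test.hdt") yields "data/test.hdt.index".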