Manual of HDT integration with Jena

1. Get the HDT Library
2. Using HDT Files from Apache Jena
3. Create a SPARQL Endpoint of HDT files using Jena Fuseki


 

1. Get the HDT Library

The Jena wrapper is included in the binary distribution of HDT-java. You can also check out the latest source code of the Java HDT library from the GitHub repository.

Acknowledgements: If you use our tools in your research, please acknowledge them by citing the associated papers.

 

 

2. Using HDT Files from Apache Jena

The Jena wrapper provides a Graph implementation on top of the HDT library, so that HDT files can be accessed as a normal read-only Jena Model. To use it, include hdt-jena.jar in addition to hdt-lib.jar in your application’s classpath (see the manual of the Java HDT library).

Then, you can use this fragment of code to get the Jena Model (hdt-jena/examples/HDTSparql.java):

// Load HDT file using the hdt-java library
HDT hdt = HDTManager.mapIndexedHDT("path/to/file.hdt", null);

// Create Jena Model on top of HDT.
HDTGraph graph = new HDTGraph(hdt);
Model model = ModelFactory.createModelForGraph(graph);

// Use Jena Model as Read-Only data storage, e.g. Using Jena ARQ for SPARQL.

Because the result is a regular Jena Model, it can be passed to Jena ARQ to answer SPARQL queries.

 

3. Create a SPARQL Endpoint of HDT files using Jena Fuseki

Thanks to the Jena integration, you can use Jena Fuseki to set up a public endpoint over one or more HDT files in minutes. You only need to specify which HDT files you want to publish, by following these steps:

  1. Download Jena Fuseki. You will need to download and extract the file named jena-fuseki-XXX.zip or tar.gz.
  2. Create a Fuseki configuration file (see an example). You need to initialize the HDT assembler class using the ja:loadClass property on the fuseki:Server instance, and declare the HDT-related classes (the fragments below omit the @prefix declarations):
    [] rdf:type fuseki:Server ;
       ja:loadClass "org.rdfhdt.hdtjena.HDTGraphAssembler" .
    
    hdt:DatasetHDT rdfs:subClassOf ja:RDFDataset .
    hdt:HDTGraph rdfs:subClassOf ja:Graph .

    Then, create a Service with query and read capabilities. The service points to a dataset, which in turn contains a default graph and, optionally, one or more named graphs. Each graph can be backed by an HDT file or by any other Jena data source.

    <#service1> rdf:type fuseki:Service ;
        fuseki:name "hdtservice" ;
        fuseki:serviceQuery "query" ;
        fuseki:serviceReadGraphStore "get" ;
        fuseki:dataset <#dataset> .
    
    <#dataset> rdf:type ja:RDFDataset ;
        rdfs:label "Dataset" ;
        ja:defaultGraph <#graph1> ;
        ja:namedGraph
            [ ja:graphName <http://example.org/name1> ;
              ja:graph <#graph2> ] .
    
    <#graph1> rdfs:label "RDF Graph1 from HDT file" ;
            rdf:type hdt:HDTGraph ;
            hdt:fileName "file1.hdt" .
    
    <#graph2> rdfs:label "RDF Graph2 from HDT file" ;
            rdf:type hdt:HDTGraph ;
            hdt:fileName "file2.hdt" .
    
  3. Edit the Fuseki launch script to add hdt-lib.jar and hdt-jena.jar to the classpath. By default, Fuseki is launched with the -jar option, which makes java ignore any -classpath directive. You therefore need to remove the -jar option, put fuseki-server.jar and the HDT jars on the classpath, and append the fully qualified name of the Fuseki launcher class.

    After these changes, the line that calls java in Fuseki’s Windows launch script, fuseki-server.bat, will look something like:

    java -Xmx1200M -classpath "hdt-lib.jar;hdt-jena.jar;fuseki-server.jar" ^
      org.apache.jena.fuseki.FusekiCmd %*

    In the Mac/Linux launch script, fuseki-server:

    java $JVM_ARGS -classpath "hdt-lib.jar:hdt-jena.jar:$JAR" \
      org.apache.jena.fuseki.FusekiCmd "$@"
  4. Finally, launch Fuseki using your custom configuration file:
    $ ./fuseki-server --config=fuseki_example.ttl
  5. Try your SPARQL endpoint in a web browser: http://localhost:3030.
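
Once the server is running, the endpoint can also be queried programmatically. The following is a minimal sketch using only the Java standard library (Java 11 or later). It assumes that Fuseki exposes the query endpoint of the service configured above at http://localhost:3030/hdtservice/query, i.e. the fuseki:name followed by the fuseki:serviceQuery value; the class name HdtEndpointQuery, its two helper methods, the --send flag and the sample query are hypothetical and only serve as illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class HdtEndpointQuery {

    // Builds the GET URL for a SPARQL query against a Fuseki service.
    // ASSUMPTION: the query endpoint lives at <base>/<serviceName>/<queryEndpoint>,
    // matching fuseki:name "hdtservice" and fuseki:serviceQuery "query" above.
    public static String queryUrl(String base, String serviceName,
                                  String queryEndpoint, String sparql) {
        // The SPARQL protocol passes the query text in the "query" parameter,
        // which must be URL-encoded.
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return base + "/" + serviceName + "/" + queryEndpoint + "?query=" + encoded;
    }

    // Sends the request and returns the raw response body.
    // The Accept header asks for SPARQL results in JSON.
    public static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        String url = queryUrl("http://localhost:3030", "hdtservice", "query",
                "SELECT * WHERE { ?s ?p ?o } LIMIT 10");
        System.out.println(url);

        // Only contact the server when explicitly requested, so that the
        // program can be run without a Fuseki instance listening.
        if (args.length > 0 && args[0].equals("--send")) {
            System.out.println(fetch(url));
        }
    }
}
```

Run without arguments, the program only prints the URL, which can be pasted into a browser; run with --send, it also prints the server's response. If you change fuseki:name or fuseki:serviceQuery in the configuration file, adjust the corresponding arguments accordingly.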