HDT-java 1.1 integration with Jena

1. Using HDT Files from Apache Jena
2. Create a SPARQL Endpoint of one HDT file using Jena Fuseki (Simple)
3. Create a SPARQL Endpoint of several HDT files using Jena Fuseki (Advanced)

Acknowledgements: If you use our tools in your research, please acknowledge them by citing our papers.

1. Using HDT Files from Apache Jena

The Jena wrapper provides a Graph implementation on top of the HDT library, so an HDT file can be accessed as a regular read-only Jena Graph or Jena Model.

The Jena integration is published as a Maven artifact with artifactId hdt-jena. To use it in your project, add it as a dependency in your pom.xml:

<dependency>
   <groupId>org.rdfhdt</groupId>
   <artifactId>hdt-jena</artifactId>
   <version>1.1</version>
</dependency>

Then you can obtain the Jena Model with the following snippet:

// Load HDT file using the hdt-java library
HDT hdt = HDTManager.mapIndexedHDT("path/to/file.hdt", null);

// Create Jena Model on top of HDT.
HDTGraph graph = new HDTGraph(hdt, true);
Model model = ModelFactory.createModelForGraph(graph);

// ... Use Jena Model as Read-Only data storage.

model.close(); // Close when it's no longer needed.

This Jena Model can be used together with Jena ARQ to answer SPARQL queries:

Query query = QueryFactory.create("SELECT * WHERE {?s ?p ?o}");
QueryExecution qe = QueryExecutionFactory.create(query, model);
ResultSetFormatter.outputAsCSV(System.out, qe.execSelect());
qe.close();

You can see the full example in HDTSparql.java.

2. Create a SPARQL Endpoint of an HDT file using Jena Fuseki (Simple)

Thanks to the Jena integration, you can set up a public SPARQL endpoint serving the content of an HDT file in minutes. Follow these steps:

  1. Download our modified Jena Fuseki distribution (hdt-fuseki), which extends Fuseki to support HDT files, and extract the hdt-fuseki-<VERSION>-distribution.tar.gz file:
    $ tar zxf hdt-fuseki-<VERSION>-distribution.tar.gz

  2. Enter the extracted directory and launch the endpoint, specifying the path of the .hdt file to publish and a name for the dataset:
    $ bin/hdtEndpoint --hdt="path/to/file.hdt" /myhdt

    NOTE: The first time you open an HDT file, it will take a while to generate its .hdt.index. Subsequent launches will be faster.

  3. Try your new SPARQL endpoint in your browser at http://localhost:3030. Select Control Panel / myhdt to get a text box where you can type your SPARQL query. To connect another service that expects a SPARQL endpoint, use the full URL: http://yourdomain.com:3030/myhdt/query
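
Connecting another program to the endpoint comes down to an HTTP request with the SPARQL text in a query parameter. The following is a minimal sketch of such a client using only the Java 11+ standard library. It assumes the dataset was published as /myhdt on localhost:3030, as in the launch command above. The class name HdtEndpointClient and the helper buildQueryUri are hypothetical names chosen for illustration, not part of hdt-java or Fuseki.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class HdtEndpointClient {

    // Builds the GET form of a SPARQL protocol request:
    // <endpoint>?query=<URL-encoded SPARQL text>.
    static URI buildQueryUri(String endpoint, String sparql) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return URI.create(endpoint + "?query=" + encoded);
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: the endpoint was launched locally with dataset name /myhdt.
        String endpoint = "http://localhost:3030/myhdt/query";
        String sparql = "SELECT * WHERE { ?s ?p ?o } LIMIT 10";

        HttpRequest request = HttpRequest.newBuilder(buildQueryUri(endpoint, sparql))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();

        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP " + response.statusCode());
            System.out.println(response.body());
        } catch (IOException e) {
            // Reached when no server is listening at the assumed address.
            System.out.println("Endpoint not reachable: " + e);
        }
    }
}
```

The LIMIT clause keeps the response small, which is worth doing when probing a large HDT file with an unrestricted triple pattern.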

3. Create a SPARQL Endpoint using a custom Fuseki config file (Advanced)

Since we use Jena Fuseki as the engine, you can use all the advanced capabilities of Fuseki to set up your endpoint. For example, you can create your own config file to describe what your endpoint serves. Here’s the original Fuseki config documentation.

To serve HDT files, follow these steps, which include the additional options you need to add to the Fuseki config file:

  1. Download hdt-fuseki and decompress it if you haven’t done so.
  2. Create a Fuseki configuration file, for example starting from fuseki_hdt_example.ttl.
  3. Make sure that you import the HDT Assembler class using the property ja:loadClass on the fuseki:Server instance, and define the HDT-related classes:
    [] rdf:type fuseki:Server ;
       ja:loadClass "org.rdfhdt.hdtjena.HDTGraphAssembler" .
    
    hdt:DatasetHDT rdfs:subClassOf ja:RDFDataset .
    hdt:HDTGraph rdfs:subClassOf ja:Graph .
  4. Declare your service with query capabilities. A service is composed of one or more datasets. A dataset is composed of an optional default graph and zero or more named graphs. Each graph can be either an HDTGraph associated with one HDT file, or any other Jena data source.
    <#service1> rdf:type fuseki:Service ;
        fuseki:name "hdtservice" ;
        fuseki:serviceQuery "query" ;
        fuseki:serviceReadGraphStore "get" ;
        fuseki:dataset <#dataset> .
    
    <#dataset> rdf:type ja:RDFDataset ;
        rdfs:label "Dataset" ;
        ja:defaultGraph <#graph1> ;
        ja:namedGraph
            [ ja:graphName <http://example.org/name1> ;
              ja:graph <#graph2> ] .
    
    <#graph1> rdfs:label "RDF Graph1 from HDT file" ;
            rdf:type hdt:HDTGraph ;
            hdt:fileName "file1.hdt" .
    
    <#graph2> rdfs:label "RDF Graph2 from HDT file" ;
            rdf:type hdt:HDTGraph ;
            hdt:fileName "file2.hdt" .
    
  5. Finally, launch Fuseki, this time using your config file:
    $ ./fuseki-server --config=fuseki_example.ttl
  6. Try your SPARQL endpoint in a web browser: http://localhost:3030.
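
With the configuration above, the default graph and the named graph are addressed differently in SPARQL, and the query URL is derived from the service name. The sketch below only composes the relevant strings so the mapping is visible. It assumes Fuseki's usual layout of base URL, then the fuseki:name value, then the fuseki:serviceQuery value. The class HdtServiceQueries and its methods are hypothetical helpers written for this illustration.

```java
public class HdtServiceQueries {

    // Assumed Fuseki layout: <base>/<fuseki:name>/<fuseki:serviceQuery>.
    static String queryEndpoint(String baseUrl, String serviceName, String queryName) {
        return baseUrl + "/" + serviceName + "/" + queryName;
    }

    // Matches triples in the dataset's default graph (<#graph1>, file1.hdt).
    static String defaultGraphQuery(int limit) {
        return "SELECT * WHERE { ?s ?p ?o } LIMIT " + limit;
    }

    // Matches triples in one named graph, selected with a GRAPH clause.
    static String namedGraphQuery(String graphName, int limit) {
        return "SELECT * WHERE { GRAPH <" + graphName + "> { ?s ?p ?o } } LIMIT " + limit;
    }

    public static void main(String[] args) {
        // Values taken from the example configuration: service "hdtservice",
        // query endpoint "query", named graph http://example.org/name1.
        System.out.println(queryEndpoint("http://localhost:3030", "hdtservice", "query"));
        System.out.println(defaultGraphQuery(10));
        System.out.println(namedGraphQuery("http://example.org/name1", 10));
    }
}
```

The resulting query strings can be pasted into the Fuseki text box or sent to the composed URL by any SPARQL client.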