info.sswap.impl.empire.io
Class ClosureBuilder

java.lang.Object
  extended by info.sswap.impl.empire.io.ClosureBuilder

public class ClosureBuilder
extends java.lang.Object

Builds a closure of statements that are contained in a particular model by recursively following the URIs of Resources (until a certain depth is reached, a time/byte limit is reached, or no new statements could be added by this method). TODO: Add a note about reusing this object

Author:
Blazej Bulka

Nested Class Summary
private  class ClosureBuilder.DereferenceTask
          A dereference task that is executed in a separate thread.
 
Field Summary
private  long bytesRead
          The counter of bytes read by this closure builder.
private  int connectTimeout
           
private static java.util.Set<com.hp.hpl.jena.rdf.model.Resource> DEREFERENCED_TYPES
          A set of types that are dereferenced during the closure computation.
private  java.util.List<java.lang.String> dereferenceQueue
          The current list of URIs to be dereferenced for the current depth.
private  ExpressivityChecker expressivityChecker
           
private static java.util.Set<com.hp.hpl.jena.rdf.model.Property> HIERARCHY_PROPERTIES
           
private  java.util.List<java.lang.String> ignoredNamespaces
           
private static org.apache.log4j.Logger LOGGER
           
private  java.util.Set<java.lang.String> markedURIs
          The set of URIs that were marked during closure computation.
private  long maxBytes
          The maximum number of bytes transferred over the network that this builder should not exceed while computing the closure. Please note: the byte limits from separate threads are synchronized at the beginning and the end of each dereference operation.
private  int maxThreads
          Maximum number of concurrent threads (in addition to the caller's thread) for concurrent downloads.
private  long maxTime
          The maximum amount of time this builder is allowed to spend while computing the closure.
private  ModelCache modelCache
           
private  int readTimeout
           
private  long startTime
           
 
Constructor Summary
ClosureBuilder(long maxBytes, long maxTime, int maxThreads, ModelCache modelCache, java.util.List<java.lang.String> ignoredNamespaces)
          Creates a new closure builder.
 
Method Summary
private  void addStatements(com.hp.hpl.jena.rdf.model.Model destModel, com.hp.hpl.jena.rdf.model.Model sourceModel)
          Adds all statements from a source model to the destination model.
private  void addTypeStatements(java.lang.String url, com.hp.hpl.jena.rdf.model.Model dest, com.hp.hpl.jena.rdf.model.Model source)
          Adds only type statements from the source model that describe the specified resource.
private  boolean belongsToDereferencedProperty(com.hp.hpl.jena.rdf.model.Model model, com.hp.hpl.jena.rdf.model.Resource resource)
          Attempts to guess whether a resource is a property (mentioned in a position other than the predicate position) that should be dereferenced (as opposed to an individual).
private  boolean belongsToDereferencedType(com.hp.hpl.jena.rdf.model.Model model, com.hp.hpl.jena.rdf.model.Resource resource)
          Attempts to guess whether a resource is a type definition that should be dereferenced (as opposed to an individual).
 Closure build(com.hp.hpl.jena.rdf.model.Model baseModel, java.lang.String modelURI)
          Build closure for the given model using default values for max closure parameters.
 Closure build(com.hp.hpl.jena.rdf.model.Model baseModel, java.lang.String modelURI, int degree)
          Build closure for the given model
 Closure build(com.hp.hpl.jena.rdf.model.Model baseModel, java.lang.String modelURI, int degree, int hierarchyDegree)
           
private  com.hp.hpl.jena.rdf.model.Model dereferenceURL(java.lang.String urlString)
          Retrieves a document at the specified URL and parses it into the Jena model.
private  com.hp.hpl.jena.rdf.model.Model doClosure(com.hp.hpl.jena.rdf.model.Model sourceModel, boolean typeStatementsOnly)
          Computes one degree of closure on the passed model
private  com.hp.hpl.jena.rdf.model.Model doTypeRetrievingStep(com.hp.hpl.jena.rdf.model.Model baseModel, com.hp.hpl.jena.rdf.model.Model sourceModel)
           
private  void enqueueHierarchyURIs(com.hp.hpl.jena.rdf.model.Model model)
           
private  void enqueueURI(java.lang.String uri)
          Add a URI to the dereference queue, unless the URI has been already processed (i.e., has either been dereferenced or is awaiting dereference).
private  void enqueueURIs(com.hp.hpl.jena.rdf.model.Model model)
          Add URIs of all resources to a dereference queue (unless they have already been processed).
private  long getBytesRemaining()
          Gets the number of bytes that can still be transferred before the limit is reached
 int getConnectTimeout()
          Gets the connect timeout for a single HTTP connection
 int getReadTimeout()
          Gets the read timeout for a single HTTP connection
private  java.util.Collection<java.lang.String> getResourceURIs(com.hp.hpl.jena.rdf.model.Model model)
          Extract URIs of all resources that are non-anonymous resources and do not belong to standard ontologies (e.g., OWL, RDF etc.)
private  long getTimeRemaining()
          Gets the number of milliseconds that can still be used for closure computation before the time limit is reached.
private  long getTimeUsed()
          Gets the number of milliseconds that were already used while computing the closure
private  java.util.Collection<java.lang.String> getUpHierarchyResourceURIs(com.hp.hpl.jena.rdf.model.Model model)
           
private  boolean isIgnoredNamespace(java.lang.String uri)
          Checks whether a concept belongs to an ignored namespace.
private  boolean isOWLDL(com.hp.hpl.jena.rdf.model.Model baseModel, com.hp.hpl.jena.rdf.model.Model model, int currentDegree)
           
private  boolean isOWLMembersListOfType(com.hp.hpl.jena.rdf.model.Model model, com.hp.hpl.jena.rdf.model.Resource listResource, com.hp.hpl.jena.rdf.model.Resource typeResource)
           
private  boolean modelContainsPattern(com.hp.hpl.jena.rdf.model.Model m, com.hp.hpl.jena.rdf.model.Resource s, com.hp.hpl.jena.rdf.model.Property p, com.hp.hpl.jena.rdf.model.RDFNode o)
           
 void setConnectTimeout(int connectTimeout)
          Sets the connect timeout for a single HTTP connection
 void setReadTimeout(int readTimeout)
          Sets the read timeout for a single HTTP connection
private  boolean shouldBeEnqueued(com.hp.hpl.jena.rdf.model.Model model, com.hp.hpl.jena.rdf.model.RDFNode node, boolean property)
          Checks whether the URI of an RDFNode from the Jena model should be enqueued for retrieval.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOGGER

private static final org.apache.log4j.Logger LOGGER

DEREFERENCED_TYPES

private static java.util.Set<com.hp.hpl.jena.rdf.model.Resource> DEREFERENCED_TYPES
A set of types that are dereferenced during the closure computation.


HIERARCHY_PROPERTIES

private static java.util.Set<com.hp.hpl.jena.rdf.model.Property> HIERARCHY_PROPERTIES

maxBytes

private long maxBytes
The maximum number of bytes transferred over the network that this builder should not exceed while computing the closure. Please note: the byte limits from separate threads are synchronized at the beginning and the end of each dereference operation. Therefore, this limit should be treated as a soft limit: multiple concurrent threads may exceed it, and this will be noticed only after they have finished their individual transfers (no new transfers will occur, however).
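
The soft-limit behavior described in this note can be pictured with a minimal sketch. It is not the ClosureBuilder implementation: SoftByteLimit and its methods are hypothetical names, and the sketch assumes only what the note states, namely that the shared counter is consulted when a transfer starts and updated when it ends.

```java
// Hypothetical illustration of a soft byte limit; not the actual ClosureBuilder code.
class SoftByteLimit {
    private final long maxBytes;
    private long bytesRead; // shared counter, guarded by "this"

    SoftByteLimit(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Consulted when a transfer starts: the budget still available (0 or more). */
    synchronized long bytesRemaining() {
        return Math.max(0L, maxBytes - bytesRead);
    }

    /** Called when a transfer ends: records what that transfer actually read. */
    synchronized void transferFinished(long bytes) {
        bytesRead += bytes;
    }

    /** Total number of bytes recorded so far (may exceed maxBytes). */
    synchronized long totalBytesRead() {
        return bytesRead;
    }
}
```

In this sketch, two transfers that both start while 100 bytes remain may each read 80 bytes. The recorded total then exceeds the limit, which becomes visible only afterwards, when bytesRemaining() reports 0 and no further transfer would be started.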


maxTime

private long maxTime
The maximum amount of time this builder is allowed to spend while computing the closure.


maxThreads

private int maxThreads
Maximum number of concurrent threads (in addition to the caller's thread) for concurrent downloads.


bytesRead

private long bytesRead
The counter of bytes read by this closure builder.


startTime

private long startTime

markedURIs

private java.util.Set<java.lang.String> markedURIs
The set of URIs that were marked during closure computation. A marked URI is considered as processed, and will never be enqueued for future dereference.


dereferenceQueue

private java.util.List<java.lang.String> dereferenceQueue
The current list of URIs to be dereferenced for the current depth.


modelCache

private ModelCache modelCache

expressivityChecker

private ExpressivityChecker expressivityChecker

connectTimeout

private int connectTimeout

readTimeout

private int readTimeout

ignoredNamespaces

private java.util.List<java.lang.String> ignoredNamespaces

Constructor Detail

ClosureBuilder

ClosureBuilder(long maxBytes,
               long maxTime,
               int maxThreads,
               ModelCache modelCache,
               java.util.List<java.lang.String> ignoredNamespaces)
Creates a new closure builder. This constructor is intentionally package-private to encourage the use of ClosureBuilderFactory for creating objects of this type.

Parameters:
maxBytes - the maximum number of bytes this closure builder may transfer while computing the closure (soft limit)
maxTime - the maximum amount of time this closure builder can spend computing the closure
maxThreads - the maximum number of concurrent threads (for concurrent downloads)

Method Detail

getConnectTimeout

public int getConnectTimeout()
Gets the connect timeout for a single HTTP connection

Returns:
the connect timeout in milliseconds

setConnectTimeout

public void setConnectTimeout(int connectTimeout)
Sets the connect timeout for a single HTTP connection

Parameters:
connectTimeout - the connect timeout in milliseconds

getReadTimeout

public int getReadTimeout()
Gets the read timeout for a single HTTP connection

Returns:
the read timeout in milliseconds

setReadTimeout

public void setReadTimeout(int readTimeout)
Sets the read timeout for a single HTTP connection

Parameters:
readTimeout - the read timeout in milliseconds
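
For reference, connect and read timeouts of this kind correspond to the standard java.net.URLConnection settings. The sketch below shows how such values might be applied to a single connection. Whether ClosureBuilder does exactly this internally is an assumption; TimeoutExample and its open method are illustrative names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical helper; shows the standard JDK timeout setters only.
class TimeoutExample {
    /** Creates (but does not yet connect) a connection with the given timeouts in milliseconds. */
    static URLConnection open(String urlString, int connectTimeoutMillis, int readTimeoutMillis)
            throws IOException {
        URLConnection connection = URI.create(urlString).toURL().openConnection();
        connection.setConnectTimeout(connectTimeoutMillis); // limit for establishing the connection
        connection.setReadTimeout(readTimeoutMillis);       // limit for each blocking read
        return connection;
    }
}
```

In the JDK, a timeout of 0 is interpreted as an infinite timeout, so a bounded closure computation would normally use positive values.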

build

public Closure build(com.hp.hpl.jena.rdf.model.Model baseModel,
                     java.lang.String modelURI)
Build closure for the given model using default values for max closure parameters.

Parameters:
baseModel - the model whose closure should be computed (may be null, in such a case the model will be first dereferenced from the given URI)
modelURI - the URI of the model (may be null). If the passed model is null, this URI will be used to dereference the initial model. In all other cases, this URI will just prevent an unnecessary re-download of the initial contents of the model
Returns:
the computed closure

build

public Closure build(com.hp.hpl.jena.rdf.model.Model baseModel,
                     java.lang.String modelURI,
                     int degree)
Build closure for the given model

Parameters:
baseModel - the model whose closure should be computed (may be null, in such a case the model will be first dereferenced from the given URI)
modelURI - the URI of the model (may be null). If the passed model is null, this URI will be used to dereference the initial model. In all other cases, this URI will just prevent an unnecessary re-download of the initial contents of the model
degree - the maximum degree for the closure
Returns:
the computed closure
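
To make the notion of a maximum degree concrete, here is a minimal sketch of a degree-limited closure. It replaces network dereferencing with a lookup in a hypothetical in-memory map from a URI to the URIs its document mentions, and it assumes that the degree counts rounds of link following. DegreeLimitedClosure and closure are illustrative names, not part of this API, and the byte and time limits are omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; the real builder dereferences URIs over HTTP.
class DegreeLimitedClosure {
    /** Returns every URI reachable from start in at most degree rounds of link following. */
    static Set<String> closure(Map<String, List<String>> links, String start, int degree) {
        Set<String> marked = new LinkedHashSet<>(); // URIs already processed or queued
        marked.add(start);
        List<String> queue = new ArrayList<>();     // URIs to expand at the current depth
        queue.add(start);

        for (int depth = 0; depth < degree && !queue.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String uri : queue) {
                for (String linked : links.getOrDefault(uri, List.of())) {
                    if (marked.add(linked)) {       // add() is false for already marked URIs
                        next.add(linked);
                    }
                }
            }
            queue = next;                           // stops early when nothing new was found
        }
        return marked;
    }
}
```

The loop ends either when the requested degree is reached or when a round adds no new URIs, mirroring two of the stopping conditions named in the class description.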

build

public Closure build(com.hp.hpl.jena.rdf.model.Model baseModel,
                     java.lang.String modelURI,
                     int degree,
                     int hierarchyDegree)

doTypeRetrievingStep

private com.hp.hpl.jena.rdf.model.Model doTypeRetrievingStep(com.hp.hpl.jena.rdf.model.Model baseModel,
                                                             com.hp.hpl.jena.rdf.model.Model sourceModel)

isOWLDL

private boolean isOWLDL(com.hp.hpl.jena.rdf.model.Model baseModel,
                        com.hp.hpl.jena.rdf.model.Model model,
                        int currentDegree)

doClosure

private com.hp.hpl.jena.rdf.model.Model doClosure(com.hp.hpl.jena.rdf.model.Model sourceModel,
                                                  boolean typeStatementsOnly)
Computes one degree of closure on the passed model

Parameters:
sourceModel - the model whose closure (one level) should be computed
Returns:
the computed closure

getBytesRemaining

private long getBytesRemaining()
Gets the number of bytes that can still be transferred before the limit is reached

Returns:
the number of bytes that can still be transferred (0 or more)

getTimeUsed

private long getTimeUsed()
Gets the number of milliseconds that were already used while computing the closure

Returns:
the number of milliseconds since the startTime

getTimeRemaining

private long getTimeRemaining()
Gets the number of milliseconds that can still be used for closure computation before the time limit is reached.

Returns:
the number of milliseconds (0 or more)
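
The relationship between the time used and the time remaining can be sketched as follows. This is a hypothetical helper, not the actual implementation: it assumes that maxTime is expressed in milliseconds and takes the current time as a parameter so that the arithmetic is easy to check.

```java
// Hypothetical illustration of the time budget arithmetic; maxTime in milliseconds is an assumption.
class TimeBudget {
    private final long maxTimeMillis;
    private final long startTimeMillis;

    TimeBudget(long maxTimeMillis, long startTimeMillis) {
        this.maxTimeMillis = maxTimeMillis;
        this.startTimeMillis = startTimeMillis;
    }

    /** Milliseconds elapsed between the start time and the given current time. */
    long timeUsed(long nowMillis) {
        return nowMillis - startTimeMillis;
    }

    /** Milliseconds left in the budget, clamped so that the result is never negative. */
    long timeRemaining(long nowMillis) {
        return Math.max(0L, maxTimeMillis - timeUsed(nowMillis));
    }
}
```

A caller would typically pass System.currentTimeMillis() as the current time. The clamp is what guarantees the "0 or more" contract once the budget has been exhausted.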

dereferenceURL

private com.hp.hpl.jena.rdf.model.Model dereferenceURL(java.lang.String urlString)
                                                throws java.io.IOException,
                                                       DataAccessException
Retrieves a document at the specified URL and parses it into the Jena model. This method obeys the byte limits while downloading the URL and updates the byte counters appropriately. This method is invoked by the concurrent worker threads. Note: in case of concurrent downloads, while the closure-wide byte counters will be updated correctly, the exceeding of the byte limit may not be noticed until a new concurrent download starts. (Every stream has its own internal counter, and these are not synchronized with each other during the download.)

Parameters:
urlString - the string containing the URL to be retrieved
Returns:
the downloaded model
Throws:
ByteLimitExceededException - if the byte limit has been exceeded during the dereferencing.
java.io.IOException - if a generic I/O error occurs
DataAccessException

enqueueURI

private void enqueueURI(java.lang.String uri)
Add a URI to the dereference queue, unless the URI has been already processed (i.e., has either been dereferenced or is awaiting dereference).

Parameters:
uri - the URI to be enqueued
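
The interplay between the set of marked URIs and the dereference queue can be sketched as below. The class is a hypothetical stand-in that borrows the field names from this page. Unlike the real method, which returns void, this enqueueURI returns a boolean so that the effect is observable, and drain is an invented helper.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of "enqueue unless already processed".
class DereferenceQueueSketch {
    private final Set<String> markedURIs = new HashSet<>();
    private final List<String> dereferenceQueue = new ArrayList<>();

    /** Enqueues the URI unless it was marked before; returns true if it was enqueued. */
    boolean enqueueURI(String uri) {
        if (!markedURIs.add(uri)) {
            return false; // already dereferenced or already awaiting dereference
        }
        dereferenceQueue.add(uri);
        return true;
    }

    /** Returns the URIs queued for the current depth and empties the queue. */
    List<String> drain() {
        List<String> batch = new ArrayList<>(dereferenceQueue);
        dereferenceQueue.clear();
        return batch;
    }
}
```

Because a URI stays marked after the queue is drained, it is never enqueued a second time at a later depth.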

enqueueURIs

private void enqueueURIs(com.hp.hpl.jena.rdf.model.Model model)
Add URIs of all resources to a dereference queue (unless they have already been processed).

Parameters:
model - the model whose URIs should be added

enqueueHierarchyURIs

private void enqueueHierarchyURIs(com.hp.hpl.jena.rdf.model.Model model)

modelContainsPattern

private boolean modelContainsPattern(com.hp.hpl.jena.rdf.model.Model m,
                                     com.hp.hpl.jena.rdf.model.Resource s,
                                     com.hp.hpl.jena.rdf.model.Property p,
                                     com.hp.hpl.jena.rdf.model.RDFNode o)

isOWLMembersListOfType

private boolean isOWLMembersListOfType(com.hp.hpl.jena.rdf.model.Model model,
                                       com.hp.hpl.jena.rdf.model.Resource listResource,
                                       com.hp.hpl.jena.rdf.model.Resource typeResource)

belongsToDereferencedType

private boolean belongsToDereferencedType(com.hp.hpl.jena.rdf.model.Model model,
                                          com.hp.hpl.jena.rdf.model.Resource resource)
Attempts to guess whether a resource is a type definition that should be dereferenced (as opposed to an individual). This method is used before reasoning (which would give us a straightforward answer to this question).

Parameters:
model - the model where the Resource is mentioned
resource - the resource that should be checked
Returns:
true if the resource belongs to a type that should be dereferenced.

belongsToDereferencedProperty

private boolean belongsToDereferencedProperty(com.hp.hpl.jena.rdf.model.Model model,
                                              com.hp.hpl.jena.rdf.model.Resource resource)
Attempts to guess whether a resource is a property (mentioned in a position other than the predicate position) that should be dereferenced (as opposed to an individual). This method is used before reasoning (which would give us a straightforward answer to this question).

Parameters:
model - the model where the Resource is mentioned
resource - the resource that should be checked
Returns:
true if the resource belongs to a property that should be dereferenced.

shouldBeEnqueued

private boolean shouldBeEnqueued(com.hp.hpl.jena.rdf.model.Model model,
                                 com.hp.hpl.jena.rdf.model.RDFNode node,
                                 boolean property)
Checks whether the URI of an RDFNode from the Jena model should be enqueued for retrieval. Only non-anonymous nodes that do not belong to standard ontologies (e.g., RDF, OWL, etc.) are enqueued; the concepts in these ontologies do not follow the SSWAP convention of having every URI dereferenceable. This method takes into account the fact that this implementation of the API represents anonymous nodes internally as nodes with a special URI (such nodes, like any other anonymous nodes, are not enqueued).

Parameters:
node - the node that should be checked
Returns:
true, if the node should be enqueued

getResourceURIs

private java.util.Collection<java.lang.String> getResourceURIs(com.hp.hpl.jena.rdf.model.Model model)
Extract URIs of all resources that are non-anonymous resources and do not belong to standard ontologies (e.g., OWL, RDF etc.)

Parameters:
model - the model from which the URIs should be extracted
Returns:
a collection of URIs

getUpHierarchyResourceURIs

private java.util.Collection<java.lang.String> getUpHierarchyResourceURIs(com.hp.hpl.jena.rdf.model.Model model)

addStatements

private void addStatements(com.hp.hpl.jena.rdf.model.Model destModel,
                           com.hp.hpl.jena.rdf.model.Model sourceModel)
Adds all statements from a source model to the destination model. Additionally, it copies other relevant information (e.g., namespace prefixes from the source model to the destination model)

Parameters:
destModel - the destination model
sourceModel - the source model

addTypeStatements

private void addTypeStatements(java.lang.String url,
                               com.hp.hpl.jena.rdf.model.Model dest,
                               com.hp.hpl.jena.rdf.model.Model source)
Adds only type statements from the source model that describe the specified resource. This method is used for the final iteration of the closure computation, when there may still be untyped resources (effectively making the underlying ontology OWL Full). In an attempt to fill in the type information, the final iteration is performed by dereferencing the URI of the untyped resource, and then this method is called to retrieve the type information.

Parameters:
url - the URL of the concept in destination model whose type should be completed with information from the source model
dest - the destination model
source - the source model for the type information

isIgnoredNamespace

private boolean isIgnoredNamespace(java.lang.String uri)
Checks whether a concept belongs to an ignored namespace. Such concepts are neither enqueued nor dereferenced.

Parameters:
uri - the URI to be checked
Returns:
true if the URI belongs to an ignored namespace
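
One plausible reading of this check is a prefix match of the URI against each ignored namespace. The sketch below illustrates that reading only. The matching rule, the class name IgnoredNamespaces, and the sample namespace are assumptions, not documented behavior.

```java
import java.util.List;

// Hypothetical illustration; prefix matching is an assumed rule.
class IgnoredNamespaces {
    private final List<String> ignoredNamespaces;

    IgnoredNamespaces(List<String> ignoredNamespaces) {
        this.ignoredNamespaces = List.copyOf(ignoredNamespaces);
    }

    /** Returns true if the URI starts with any of the ignored namespace prefixes. */
    boolean isIgnoredNamespace(String uri) {
        if (uri == null) {
            return false;
        }
        for (String namespace : ignoredNamespaces) {
            if (uri.startsWith(namespace)) {
                return true;
            }
        }
        return false;
    }
}
```

With the OWL namespace http://www.w3.org/2002/07/owl# in the list, a URI such as http://www.w3.org/2002/07/owl#Class would be skipped, while a URI in any other namespace would remain eligible for dereferencing.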


Copyright (c) 2011, iPlant Collaborative, University of Arizona, Cold Spring Harbor Laboratories, University of Texas at Austin.