info.sswap.impl.empire.model
Class RDFTSVWriter

java.lang.Object
  extended by info.sswap.impl.empire.model.RDFTSVWriter
All Implemented Interfaces:
com.hp.hpl.jena.rdf.model.RDFWriter

public class RDFTSVWriter
extends java.lang.Object
implements com.hp.hpl.jena.rdf.model.RDFWriter

Writes the model to the output stream as Tab-Separated Values (TSV), following the fundamental RDF mapping: (subject, predicate, object) -> (row, col, value). The output has one column per predicate and one or more rows per subject; a subject has multiple rows when it has multiple values for the same property. URIs are URL-encoded to escape tabs; String datatype values are quoted.
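
The escaping rules in the class description can be illustrated with a minimal, self-contained sketch. It is not the class's actual code: the class name TsvCellSketch, its helper names, and the constant values (a tab delimiter and UTF-8) are assumptions made for illustration, since the values of DELIMITER_STR, DELIMITER_ENC, and CHARSET are not shown on this page. The sketch replaces only the delimiter with its URL-encoded form, which matches the example output below where URIs such as urn:subject1 appear otherwise unchanged.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class TsvCellSketch {
    // Assumed values; the real DELIMITER_STR and CHARSET constants are not shown here.
    static final String DELIMITER = "\t";
    static final String CHARSET = "UTF-8";

    /** Replaces each delimiter in a URI with its URL-encoded form so it cannot split a cell. */
    static String escapeUri(String uri) {
        try {
            String encodedDelimiter = URLEncoder.encode(DELIMITER, CHARSET); // "%09" for a tab
            return uri.replace(DELIMITER, encodedDelimiter);
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    /** Wraps a string-typed literal in double quotes (embedded quotes are left as-is). */
    static String quoteString(String value) {
        return "\"" + value + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escapeUri("urn:subject\t1"));
        System.out.println(quoteString("strValue"));
    }
}
```

How the writer treats quotes embedded inside a string value is not documented on this page, so the sketch leaves them untouched.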

Author:
Damian Gessler

Nested Class Summary
protected  class RDFTSVWriter.DataStructure
           
protected  class RDFTSVWriter.NameMapper
           
 
Field Summary
static java.lang.String BNODE_PREFIX
          Blank (anonymous) node prefix designator
protected  int BUF_SIZ
          Read/write buffer size
protected  java.lang.String CHARSET
          Default charset for URLEncoding
protected  java.lang.String COLUMN_1_HEADER
          Column header for row1, col1
protected  java.lang.String DELIMITER_ENC
           
protected  java.lang.String DELIMITER_STR
          Default field delimiter
protected  byte[] NEWLINE_BYTES
           
protected  java.lang.String NEWLINE_STR
          Newline
protected  com.hp.hpl.jena.rdf.model.RDFErrorHandler rdfErrorHandler
           
 
Fields inherited from interface com.hp.hpl.jena.rdf.model.RDFWriter
NSPREFIXPROPBASE
 
Constructor Summary
RDFTSVWriter()
           
 
Method Summary
protected  RDFTSVWriter.DataStructure makeDataStructure(com.hp.hpl.jena.rdf.model.Model model)
          The data structure for an RDF model is: a map of "rows", where the key to each row is a (unique) RDF subject; each subject points to a map of (unique) predicates; each predicate points to a list of (possibly non-unique) values.
private  java.lang.String makeKey(java.lang.String key1, java.lang.String key2, java.lang.String key3)
           
 com.hp.hpl.jena.rdf.model.RDFErrorHandler setErrorHandler(com.hp.hpl.jena.rdf.model.RDFErrorHandler rdfErrorHandler)
           
 java.lang.Object setProperty(java.lang.String arg0, java.lang.Object arg1)
          No properties are currently supported.
 void write(com.hp.hpl.jena.rdf.model.Model model, java.io.OutputStream outputStream, java.lang.String base)
          Mapping RDF (subject, predicate, object) => (row, col, value) will result in a (sparse) (n+1) x (m+1) matrix, where n is the number of (possibly replicated) subjects (rows), and m is the number of (unique) predicates (over all subjects).
 void write(com.hp.hpl.jena.rdf.model.Model model, java.io.Writer writer, java.lang.String base)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

BUF_SIZ

protected final int BUF_SIZ
Read/write buffer size

See Also:
Constant Field Values

DELIMITER_STR

protected final java.lang.String DELIMITER_STR
Default field delimiter

See Also:
Constant Field Values

DELIMITER_ENC

protected java.lang.String DELIMITER_ENC

NEWLINE_STR

protected java.lang.String NEWLINE_STR
Newline


NEWLINE_BYTES

protected final byte[] NEWLINE_BYTES

CHARSET

protected final java.lang.String CHARSET
Default charset for URLEncoding

See Also:
Constant Field Values

BNODE_PREFIX

public static final java.lang.String BNODE_PREFIX
Blank (anonymous) node prefix designator

See Also:
Constant Field Values

COLUMN_1_HEADER

protected final java.lang.String COLUMN_1_HEADER
Column header for row1, col1

See Also:
Constant Field Values

rdfErrorHandler

protected com.hp.hpl.jena.rdf.model.RDFErrorHandler rdfErrorHandler
Constructor Detail

RDFTSVWriter

public RDFTSVWriter()
Method Detail

setErrorHandler

public com.hp.hpl.jena.rdf.model.RDFErrorHandler setErrorHandler(com.hp.hpl.jena.rdf.model.RDFErrorHandler rdfErrorHandler)
Specified by:
setErrorHandler in interface com.hp.hpl.jena.rdf.model.RDFWriter

setProperty

public java.lang.Object setProperty(java.lang.String arg0,
                                    java.lang.Object arg1)
No properties are currently supported.

Specified by:
setProperty in interface com.hp.hpl.jena.rdf.model.RDFWriter

write

public void write(com.hp.hpl.jena.rdf.model.Model model,
                  java.io.Writer writer,
                  java.lang.String base)
Specified by:
write in interface com.hp.hpl.jena.rdf.model.RDFWriter

write

public void write(com.hp.hpl.jena.rdf.model.Model model,
                  java.io.OutputStream outputStream,
                  java.lang.String base)
Mapping RDF (subject, predicate, object) => (row, col, value) will result in a (sparse) (n+1) x (m+1) matrix, where n is the number of (possibly replicated) subjects (rows), and m is the number of (unique) predicates (over all subjects). The "+1" row is the header row; the "+1" column is the first (subject) column.

The first cell (row1, col1) is given the fixed header COLUMN_1_HEADER. Multiple property values for a subject generate a new row per property instance, such that the first row is the most densely populated and each successive row is populated until all property values are reported.

Fixing "columns" to non-repeating predicates ("attributes", "fields") and adding rows for a subject's additional values is akin to first normal form (1NF) in database table modeling.

Example:

 RDF triples:
 urn:subject1 urn:predicate1 strValue
 urn:subject1 urn:predicate2 numValue1
 urn:subject1 urn:predicate2 numValue2
 urn:subject2 urn:predicate1 strValue
 urn:subject2 urn:predicate3 anyURIValue
 urn:subject3 urn:predicate4 urn:subject2
 urn:subject3 urn:predicate4 _:b0
 
 TSV ((5+1) x (4+1)) = (6,5):
 Resource \t urn:predicate1 \t urn:predicate2 \t urn:predicate3 \t urn:predicate4
 urn:subject1 \t "strValue" \t numValue1 \t\t
 urn:subject1 \t\t numValue2 \t\t
 urn:subject2 \t "strValue" \t\t "anyURIValue" \t
 urn:subject3 \t\t\t\t urn:subject2
 urn:subject3 \t\t\t\t _:b0
 
 

Specified by:
write in interface com.hp.hpl.jena.rdf.model.RDFWriter
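
The row expansion described for write(Model, OutputStream, String) can be sketched in plain Java. This is an illustration under stated assumptions, not the writer's implementation: the class TsvRowsSketch and its methods are hypothetical, values are plain strings, quoting and URL-encoding are omitted, and the "Resource" header is taken from the example above rather than from the (unshown) value of COLUMN_1_HEADER.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TsvRowsSketch {
    /** Records one triple in a subject -> predicate -> values map. */
    static void add(Map<String, Map<String, List<String>>> rows, String s, String p, String o) {
        rows.computeIfAbsent(s, k -> new LinkedHashMap<>())
            .computeIfAbsent(p, k -> new ArrayList<>())
            .add(o);
    }

    /** Expands the map into TSV lines: one header line, then one or more lines per subject. */
    static List<String> toTsv(Map<String, Map<String, List<String>>> rows) {
        // Columns are the union of predicates over all subjects, in first-seen order.
        Set<String> predicates = new LinkedHashSet<>();
        for (Map<String, List<String>> props : rows.values()) {
            predicates.addAll(props.keySet());
        }

        List<String> lines = new ArrayList<>();
        StringBuilder header = new StringBuilder("Resource");
        for (String p : predicates) {
            header.append('\t').append(p);
        }
        lines.add(header.toString());

        for (Map.Entry<String, Map<String, List<String>>> entry : rows.entrySet()) {
            // A subject needs as many rows as its longest value list.
            int depth = 0;
            for (List<String> values : entry.getValue().values()) {
                depth = Math.max(depth, values.size());
            }
            for (int i = 0; i < depth; i++) {
                StringBuilder line = new StringBuilder(entry.getKey());
                for (String p : predicates) {
                    List<String> values = entry.getValue().get(p);
                    line.append('\t');
                    if (values != null && i < values.size()) {
                        line.append(values.get(i)); // otherwise the cell stays empty
                    }
                }
                lines.add(line.toString());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Map<String, List<String>>> rows = new LinkedHashMap<>();
        add(rows, "urn:subject1", "urn:predicate1", "strValue");
        add(rows, "urn:subject1", "urn:predicate2", "numValue1");
        add(rows, "urn:subject1", "urn:predicate2", "numValue2");
        for (String line : toTsv(rows)) {
            System.out.println(line);
        }
    }
}
```

Because row i of a subject takes the i-th value of every predicate, the first row is the fullest and later rows thin out, which is the behavior the description above states.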

makeDataStructure

protected RDFTSVWriter.DataStructure makeDataStructure(com.hp.hpl.jena.rdf.model.Model model)
The data structure for an RDF model is: a map of "rows", where the key to each row is a (unique) RDF subject; each subject points to a map of (unique) predicates; each predicate points to a list of (possibly non-unique) values.

Parameters:
model - the source RDF model to transform
Returns:
a data structure of rows and properties
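
The nested shape described for makeDataStructure can be approximated with standard collections. The sketch below is an assumption-laden illustration: the internals of RDFTSVWriter.DataStructure are not shown on this page, so the class RowMapSketch, its build method, and the use of string triples in place of a Jena Model are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RowMapSketch {
    /** Groups {subject, predicate, object} triples as subject -> predicate -> list of values. */
    static Map<String, Map<String, List<String>>> build(String[][] triples) {
        Map<String, Map<String, List<String>>> rows = new LinkedHashMap<>();
        for (String[] t : triples) {
            rows.computeIfAbsent(t[0], s -> new LinkedHashMap<>()) // unique subject
                .computeIfAbsent(t[1], p -> new ArrayList<>())     // unique predicate per subject
                .add(t[2]);                                        // values may repeat
        }
        return rows;
    }

    public static void main(String[] args) {
        String[][] triples = {
            {"urn:subject1", "urn:predicate1", "strValue"},
            {"urn:subject1", "urn:predicate2", "numValue1"},
            {"urn:subject1", "urn:predicate2", "numValue2"},
        };
        System.out.println(build(triples));
    }
}
```

A list (rather than a set) is used for the values because the description allows non-unique values per predicate.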

makeKey

private java.lang.String makeKey(java.lang.String key1,
                                 java.lang.String key2,
                                 java.lang.String key3)


Copyright (c) 2011, iPlant Collaborative, University of Arizona, Cold Spring Harbor Laboratories, University of Texas at Austin.