Chapter 12. Relations And A Bit Of Content [DRAFT]

Table of Contents

12.1. This is a DRAFT! Give your opinion!
12.2. Content vs. Properties
12.2.1. Blobs
12.3. The MimetypeRegistry
12.4. Improving upcoming events with photos
12.4.1. Nuxeo's internal relations
12.5. Using the annotation API with upcoming
12.5.1. Properties of relations
12.5.2. Seeing relations in action
12.6. Exercises

12.1. This is a DRAFT! Give your opinion!

If you have any comments, questions, or general-purpose harassment you would like to give us about this book, then please use the comment form at the bottom of each page! We promise that we will try to incorporate any feedback you give (minus the profanity, of course), will respond to your questions, and credit you appropriately. I certainly hope that readers have not made the cheap joke that this is the first chapter with even a bit of content!

12.2. Content vs. Properties

Although Nuxeo EP 5 is described as an "Enterprise Content Management system," none of our lessons to this point have dealt at all with the content of documents! We have shown you how to create a new document type, associate schemas with it, handle events, control access, etc., and have not discussed at all how to read even one byte of content! Why? There are two reasons. The first, and probably less important, reason we have favored discussing properties rather than content is that most developers are familiar with reading bytes from a typical filesystem and manipulating them programmatically. Thus, the features of Nuxeo that involve manipulating a document's meta-data and presentation were likely to be more interesting to the reader. The second, and probably more important, reason we have favored discussing Nuxeo's property features over content is simple: content is a property.

One of the supplied schemas in Nuxeo is file. This schema defines a filename property that is used, by convention, to store the filename of a file that has been imported into Nuxeo. The Nuxeo web UI follows this convention anytime you have the opportunity to "upload" a file. If you were to write a program that does a bulk import of files into Nuxeo, for example, you should follow this convention too. Also by convention, there is a property file:content that holds the bytes of a file that has been imported into Nuxeo. This is how "content" gets turned into a property. There is no rule that says that content must be stored on this property, but if you follow this convention your documents will work nicely with the web UI of Nuxeo. Finally, it should be clear by now that you may have multiple properties holding "the content" of a Nuxeo document. There is no notion of a distinguished property that always has "the sole content" of a document. You might, for example, have a document that has multiple translations, with their content living on properties like version_english and version_française.

12.2.1. Blobs

The type used to store content in a Nuxeo property is a Blob. A Blob represents a large collection of unstructured bytes, so it fits the idea of content well. Blobs are used in Nuxeo to give Nuxeo's infrastructure a place to introduce various types of optimizations. The number and complexity of these is beyond the scope of this book, but a couple of examples may be helpful. If you have a large amount of content, Nuxeo will not load the content from the Blob until it is actually needed (fetched via an InputStream) even if you read the property's "value." Another example is that if your Nuxeo installation is configured to have multiple servers, with the web front-end on one server and the content repository on another, Nuxeo will use a Blob that understands how to fetch data efficiently from the back-end to the front-end. All of this is hidden from you as an application developer; you simply manipulate Blob objects.

When you want to create content by means other than using the Web UI - such as in a test - there are a number of Blob implementations that will make your life easier. You can use the StringBlob to create a Blob of content from a Java String. In the tests for this lesson, we use the StringBlob to create content for a text file. Our tests for this lesson also use the FileBlob to create content from an existing file on the local filesystem.

Here is a listing of the key parts of the public API for a Blob. If you are familiar with Java's IO interfaces, much of this will be familiar to you.

public interface Blob {

    //how many bytes does the blob hold, if known
    long getLength();

    //cache the mime type
    String getMimeType();
    void setMimeType(String mimeType);

    //cache the digest (such as MD5 hash)
    String getDigest();
    void setDigest(String digest);

    //cache the originating filename
    String getFilename();
    void setFilename(String filename);

    //read content in various formats
    InputStream getStream() throws IOException;
    Reader getReader() throws IOException;
    byte[] getByteArray() throws IOException;
    String getString() throws IOException;

    //bulk transfer the entire blob content to various types of output
    void transferTo(OutputStream out) throws IOException;
    void transferTo(Writer out) throws IOException;
    void transferTo(File file) throws IOException;
}
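
To make the "load only when needed" idea from earlier in this section a little more concrete, here is a minimal sketch that uses nothing but the JDK. LazyBlobSketch is a hypothetical stand-in we made up for illustration; it is not one of Nuxeo's Blob implementations and it mirrors only a few of the methods listed above. The point is simply that constructing the object costs nothing, and the bytes are fetched the first time someone asks to read them.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

// Hypothetical stand-in for illustration only; NOT a Nuxeo class.
final class LazyBlobSketch {

    // Knows how to fetch the bytes (from disk, a remote server, ...)
    private final Supplier<byte[]> loader;

    // Stays null until the content is read for the first time
    private byte[] cached;

    LazyBlobSketch(Supplier<byte[]> loader) {
        this.loader = loader;
    }

    // True once the content has actually been fetched
    boolean isLoaded() {
        return cached != null;
    }

    // Fetch the bytes on first use and remember them afterwards
    private byte[] bytes() {
        if (cached == null) {
            cached = loader.get();
        }
        return cached;
    }

    // Reading via a stream triggers the load
    InputStream getStream() {
        return new ByteArrayInputStream(bytes());
    }

    // Reading as text also triggers the load (UTF-8 is assumed here)
    String getString() {
        return new String(bytes(), StandardCharsets.UTF_8);
    }

    long getLength() {
        return bytes().length;
    }
}
```

A caller can create the object, pass it around, and pay for the content only when calling getStream() or getString(), which is the same bargain Nuxeo's own Blobs offer you.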

12.3. The MimetypeRegistry

We have brought up the subject of content because in this lesson we will be creating relations between documents (see below) if and only if the documents are photos. A document's "mime type" is a string like "image/jpeg". The part before the slash indicates the main, or content, type of the document and the part after the slash indicates more specific information about the format of the content, or its subtype. For example, the MP3 files that contain your music typically have the mime type "audio/mpeg" and web pages are written, at least normally, using "text/html". When we do not know, or can't figure out, a document's mime type we use the mime type "application/octet-stream" which should be interpreted as "bunch of bytes we don't know the format of." A complete list of mime types is hard to write because various programs and developers are constantly creating new ones; however, the Internet Assigned Numbers Authority (IANA) maintains a registry of the officially registered ones.

Although some systems use the filename to determine the mime type of an object with simple rules, such as all files that end with ".mp3" have mime type "audio/mpeg", a more reliable method is to actually interrogate the content itself. There are various libraries available that know where to probe a file to see what mime type it is. These libraries work by knowing things of the form "files of type audio/mpeg always have a byte with content 0 at the 421st position and a byte with content 255 at the 97th position in the file."
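
The idea of probing the bytes can be sketched in a few lines. The class below is our own illustration, not part of Nuxeo or of any particular detection library. Unlike the made-up rule quoted above, the byte patterns it checks are the real leading signatures of JPEG and PNG files, and it falls back to "application/octet-stream" for anything else. Real libraries know many more formats and look well beyond the first few bytes.

```java
// Illustrative sketch only; the class and method names are hypothetical.
final class MagicNumberSniffer {

    // JPEG files start with the bytes FF D8 FF
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};

    // PNG files start with the 8-byte signature 89 'P' 'N' 'G' 0D 0A 1A 0A
    private static final byte[] PNG = {
        (byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A
    };

    // Guess a mime type from the first bytes of some content
    static String sniff(byte[] head) {
        if (startsWith(head, JPEG)) {
            return "image/jpeg";
        }
        if (startsWith(head, PNG)) {
            return "image/png";
        }
        // unknown format: just a bunch of bytes
        return "application/octet-stream";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Notice that the filename never enters the picture: a JPEG that someone has renamed to "notes.txt" is still reported as "image/jpeg".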

Nuxeo, naturally, provides a service that implements this functionality, the MimetypeRegistry. You access this service in the normal way, with Framework.getService(MimetypeRegistry.class). Once you have a reference to the registry, a block of code like the following can be used to determine (or at least try to determine) the mime type of a DocumentModel that you expect has the file schema associated with it:

    protected String mimeTypeOfFileDocument(CoreSession session,
            DocumentModel documentModel) throws Exception {
        // fallback used whenever the type cannot be determined
        final String unknownType = "application/octet-stream";
        // by convention, imported files carry their name in file:filename
        String filename = (String) documentModel.getProperty("file", "filename");
        if (filename == null) {
            return unknownType;
        }
        MimetypeRegistry registry = Framework.getService(MimetypeRegistry.class);
        // this just gets the blob from the default place that the UI puts it
        Blob content = (Blob) documentModel.getProperty("file", "content");
        if (content == null) {
            return unknownType;
        }
        // probe the bytes themselves rather than trusting the filename
        String type = registry.getMimetypeFromStreamWithDefault(
                content.getStream(), unknownType);
        log.info("Found mime type of " + documentModel.getPathAsString()
                + " is " + type);
        return type;
    }

12.4. Improving upcoming events with photos

The previous two topics, content and mime types, have been brought in so that we can discuss photos about events. In particular, we are going to improve our code for handling the event that a new document has been created. The goal is to allow members of our special group of "social directors" to add photos about events; they are the type of people who would do this! These photos can be taken before the event, as a preview and to encourage participation in the upcoming event, or they can be taken after and are considered photos taken at the event. Here, more specifically, is the set of rules we would like to encode.

Create a relation between a new document D and an upcoming event if:

  • The owner of D is a member of the group of the "socialButterflies" OR the currently logged in user is a member of that group AND

  • D and the upcoming event are in the same directory (i.e. have the same parent) in the repository AND

  • The mime type of D is a recognized image format (like JPEG or PNG)
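
The three rules above boil down to a single boolean expression. Here is a minimal sketch of it using only the JDK; the class name, the parameters, and the choice to treat any mime type beginning with "image/" as "a recognized image format" are all our assumptions for illustration, and this is not the RelationHelper from the lesson's code. We read the first rule as one unit, so the whole thing is (owner in group OR current user in group) AND same parent AND image.

```java
import java.util.Set;

// Illustrative sketch only; names and parameters are hypothetical.
final class EventPhotoRules {

    static final String SOCIAL_GROUP = "socialButterflies";

    static boolean shouldRelate(Set<String> ownerGroups,
            Set<String> currentUserGroups,
            String documentParentPath,
            String eventParentPath,
            String mimeType) {
        // Rule 1: the owner OR the logged-in user is a social butterfly
        boolean trustedUser = ownerGroups.contains(SOCIAL_GROUP)
                || currentUserGroups.contains(SOCIAL_GROUP);
        // Rule 2: D and the event have the same parent
        boolean sameParent = documentParentPath != null
                && documentParentPath.equals(eventParentPath);
        // Rule 3: D looks like an image (assumption: any image/* subtype)
        boolean isImage = mimeType != null && mimeType.startsWith("image/");
        return trustedUser && sameParent && isImage;
    }
}
```

Keeping the three checks as separate, named booleans makes it easy to log which rule rejected a document when you are debugging.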

A relation in Nuxeo is a triple, or tuple, of three items. This triple is usually written as (SUBJECT, VERB, OBJECT). These relations allow two documents, the SUBJECT and OBJECT, that would otherwise be considered quite distinct, to show they are, well... "related" by the VERB. If the verb were something like "is translation of" then you can assume that the original content is the object document and the translation is the subject document. You can create these relations by hand using the Nuxeo EP 5 web UI with the "Relations" tab that is shown when you examine a document in detail, as is highlighted in this screen capture:

You can see in the darker portion of the screen capture that the author is creating a relation between a document called "Bowie With Cigarette" and another document in the repository that will be found via search. The type of relation is shown in the Predicate field (fancier than saying plain old "Verb" field!). Nuxeo ships with a set of five basic predicates, including the "Is based on" above. The other four are "conforms to," "references," "replaces," and "requires."
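
If it helps to see the (SUBJECT, VERB, OBJECT) triple as a data structure, here is a minimal sketch. The class is our own invention for illustration and is not how Nuxeo represents relations internally; we use plain strings for the two documents, and the enum simply lists the five built-in predicates named above. "Concert Poster" in the usage note is a made-up document title.

```java
// Illustrative sketch only; not a Nuxeo class.
final class RelationTriple {

    // The five predicates the Nuxeo UI offers out of the box
    enum Predicate {
        IS_BASED_ON("is based on"),
        CONFORMS_TO("conforms to"),
        REFERENCES("references"),
        REPLACES("replaces"),
        REQUIRES("requires");

        final String label;

        Predicate(String label) {
            this.label = label;
        }
    }

    final String subject;
    final Predicate predicate;
    final String object;

    RelationTriple(String subject, Predicate predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    // A triple reads naturally as a sentence: SUBJECT VERB OBJECT
    String asSentence() {
        return subject + " " + predicate.label + " " + object;
    }
}
```

Building a triple with the subject "Bowie With Cigarette", the predicate IS_BASED_ON, and the object "Concert Poster" and then calling asSentence() gives "Bowie With Cigarette is based on Concert Poster".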

If you are interested in adding to or changing the set of relations that are shown in the Relations tab, you will need to look at the Nuxeo source code and then rebuild the Nuxeo server. This type of change is why we are open source! The Nuxeo source has a bundle called nuxeo-platform-relations-default-config that defines the default set of relations. This bundle has two files (of course!) in src/main/resources/directories that define the relations and their inverses - the inverse of a relation being the way of expressing that a document is the OBJECT of the relation. These two files reference, of course, labels like label.relation.predicate.References that are then translated to the user's preferred language. These, just as in our previous lesson, end up being defined in messages_en.properties or similar files for other languages; for the default installation these can be found in the Nuxeo source code, in the bundle nuxeo-platform-lang in the src/main/resources/nuxeo.war/WEB-INF/classes directory. We hope that this continues to hammer home the point that your bundles and Nuxeo's bundles work in the same manner.

12.4.1. Nuxeo's internal relations

Although Nuxeo exposes the relations tab, as shown above, those are the relations it expects users to create "by hand." There are two built-in "features" of Nuxeo that are actually just some user interface candy that hides relations. First, the Nuxeo comment system is implemented using relations. When you use the UI to create a comment, like "This is an example" in the snap above, the Nuxeo comment system swings into action. It creates a new document, stores the contents of the comment in a property, and then creates a relation between the new document and the one the comment is related to. Then, when it displays the user interface for a document, it is easy to find the comments on the document being displayed since the pointers (relations) are already in place.

Note

We hope it comes as no surprise to you that the Nuxeo comment system defines a new schema type (in a file called comment.xsd!) with some fields in it like comment, author, and creationTime. Further, the comment system uses an event listener to become informed about documents getting deleted. It uses this event to delete comment documents when the document they are associated with gets removed. The only part of the comment system that uses parts of the API we have not yet covered is the User Interface presented on the comments tab. You could build most of the Nuxeo comment system with the lessons you have had to this point! Nuxeo has no magic.

The other feature that makes use of relations in Nuxeo is the annotation system that was released with version 5.2 of Nuxeo EP. This system allows you to select regions of a document - either a rectangular region of an image document or some text from the document - and associate an annotation with it. Just as a comment is a subject document with a relation to the whole of an object document, an annotation is a subject document with a relation to a part of an object document. When you display an image, for example, Nuxeo's annotation system uses the relations to display an image like this:

As you can see in the image above, Nuxeo knows which part of the original image is the source of the annotation and the text that was originally entered as the annotation itself is displayed in the box to the side. (This image is from the Inauguration of President Obama, the person pictured is Chief Justice John Roberts, who goofed up his lines for the presidential oath of office.)

12.5. Using the annotation API with upcoming

We have revamped the now somewhat busy DocumentCreationListener class to include a few new or refactored functions. One of these is the "middleman" that, when given a new document model, checks to see if any event is "related" to it:

    private void checkDocumentForRelationToEvent(DocumentEventContext context,
            DocumentModel model) throws Exception {
        RelationHelper helper = new RelationHelper();
        CoreSession session = context.getCoreSession();
        log.info("Checking document for relation:" + model.getPathAsString());
        DocumentModel eventDoc = helper.isEventImage(session, model);
        if (eventDoc != null) {
            createBasicRelation(model, eventDoc, false);
        }
        eventDoc = helper.isPreviewImage(session, model);
        if (eventDoc != null) {
            createBasicRelation(model, eventDoc, true);
        }
        session.save();
    }

This method is straightforward in most areas. First we create an instance of RelationHelper, a new class for the lesson, that implements the rules for creating a relation that we explained in the previous section. The helper has two methods, isEventImage and isPreviewImage, that implement the two different cases of photos from an event or images that are previews of the event. If the helper wants to indicate success, it returns a DocumentModel that represents the Upcoming document that the new (photo) DocumentModel is related to. It returns null if no relations should be created.

So, this method should be called anytime a new document is created, to see if it meets our criteria for being an image of the right kind and from a user in the right group. When this method wants to create a relation between two documents, it calls createBasicRelation, but to discuss that function we need to explain properties of relations.

12.5.1. Properties of relations

Properties of relations can be thought of in one of two ways. The first, more straightforward, way is to think of a Map that is associated with the relation and that holds information about the relation itself. But, who wants to do things the straightforward way? The more complex (or perhaps more sophisticated?) way is to think of a relation property as another relation in which the first relation is the subject. Seem strange? Returning to our discussion above of the "is comment on" relation, if document A has a comment document B then the "creationTime" property is really about the time that the relation is created. Both documents A and B may have existed for some time in the repository and would have their own respective creation time properties (part of the dublincore schema!). To finish this example, we could write this "meta-relation" of creation time, whose verbs are "is comment on" and "created on", as: ((B is comment on A) created on Tuesday at 5pm). No matter which formulation you prefer, you should consider this code sample:

    private void createBasicRelation(DocumentModel imageDocumentModel,
            DocumentModel eventDocumentModel, boolean isPreview)
            throws Exception {

        QNameResource imageAsResource = getDocumentResource(imageDocumentModel);
        QNameResource eventDocumentAsResource = getDocumentResource(eventDocumentModel);
        ResourceImpl predFwd, predRev;
        if (isPreview) {
            predFwd = new ResourceImpl(REFERENCES_URI);
            predRev = new ResourceImpl(REFERENCES_URI);
        } else {
            predFwd = new ResourceImpl(BASED_ON_URI);
            predRev = new ResourceImpl(BASED_ON_URI);
        }

        String commentText;
        if (isPreview) {
            commentText = "About the event";
        } else {
            commentText = "From the event";
        }

        Statement fwd = new StatementImpl(imageAsResource, predFwd,
                eventDocumentAsResource);
        setProperties(fwd, "[Automatically Added]", new Date(), commentText);

        Statement rev = new StatementImpl(eventDocumentAsResource, predRev,
                imageAsResource);
        setProperties(rev, "[Automatically Added]", new Date(), commentText);
        ArrayList<Statement> stmtList = new ArrayList<Statement>();
        stmtList.add(fwd);
        stmtList.add(rev);
        getRelationManager().add(DEFAULT, stmtList);
    }

The first thing you will notice is that we immediately turn both DocumentModels, imageDocumentModel and eventDocumentModel, into named resources. In the interest of simplicity, the QNameResource type can be thought of as a URL that describes the server and location in the repository of the given documents. We then compute other "resources" that reference the verbs of the relation, one for the forward (Fwd) direction and one for the reverse (Rev). You can see that if the photo is a preview, we use the verb REFERENCES and if the photo is of the event itself we use the verb BASED_ON. These may not be ideal verbs for the relations we have, but they are known to the Nuxeo UI - each is one of the five built-in verbs - and the UI will display them correctly without modification.

You should see that we create some comments about the relations ("properties" of the relation in terms of the title of this subsection) to help the display be more informative. The critical Nuxeo type for creating relations is Statement, which we create via its StatementImpl implementation class. A Statement is the basic relation element in the Nuxeo system, so named because the SUBJECT VERB OBJECT relation can be read as a statement of fact (try it yourself!). You should see that we are adding some properties to each statement, again to help the user understand, when looking at the relation, that it was automatically created.

Finally, we retrieve the RelationManager object (not shown) as a service and add our two statements, as a list, to the graph named DEFAULT. This graph is the one displayed and manipulated by the Nuxeo EP 5 web user interface. If you are using the RelationManager to maintain multiple graphs of relations (a.k.a. statements) you should probably be reading the RDF spec, not this document! Most folks will want to stick to the DEFAULT graph of relations.
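
Returning to the two ways of thinking about relation properties from the start of this section, here is a minimal sketch that puts them side by side. SketchStatement is a hypothetical stand-in we wrote for illustration; it is not Nuxeo's Statement or StatementImpl, and it uses plain strings where Nuxeo uses resources. The first method attaches the creation time as an entry in a Map; the second expresses the same fact as a "meta-relation" whose subject is the original statement.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; not Nuxeo's relation API.
final class RelationProperties {

    static final class SketchStatement {
        final Object subject; // a document name or another statement
        final String verb;
        final Object object;
        // View 1: properties live in a Map attached to the relation
        final Map<String, String> properties =
                new LinkedHashMap<String, String>();

        SketchStatement(Object subject, String verb, Object object) {
            this.subject = subject;
            this.verb = verb;
            this.object = object;
        }

        @Override
        public String toString() {
            return "(" + subject + " " + verb + " " + object + ")";
        }
    }

    // View 1: the straightforward way, a Map entry on the relation
    static SketchStatement withMapProperty() {
        SketchStatement comment = new SketchStatement("B", "is comment on", "A");
        comment.properties.put("creationTime", "Tuesday at 5pm");
        return comment;
    }

    // View 2: a relation about a relation, the first one being the subject
    static SketchStatement asMetaRelation() {
        SketchStatement comment = new SketchStatement("B", "is comment on", "A");
        return new SketchStatement(comment, "created on", "Tuesday at 5pm");
    }
}
```

Printing the result of asMetaRelation() produces the nested form, with the inner statement in the subject position. Both views carry the same information; which one you keep in your head is a matter of taste.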

12.5.2. Seeing relations in action

Since we have used relations that are understood by the Nuxeo EP 5 Web User Interface, the effects of our modified DocumentCreationListener and its new "helper" can be seen through your web browser. In the screen snap below, someone has created an Upcoming document related to an upcoming show by (some guy named) David Bowie. A member of the socialButterflies group has helpfully added two images to the repository to convince the (skeptical) masses to attend the concert by this unknown artist. The UI depicted in this snap is reached by clicking on the Relations tab of the Upcoming event:

The event document has two relations created for the incoming case (forward) and two for the reverse or outgoing case. It should be clear that the comments/author have been created by the method createBasicRelation above. Further, if you click on the links you will be presented with the document that is the "other end" of the relation, in this case both photos that have been uploaded by one of the social directors.

12.6. Exercises

  1. We have supplied you with all the code to make this lesson work in this lesson's skeleton (lesson-relations in the usual svn repository), with one exception. We have not "hooked up" the middleman code above, in the method checkDocumentForRelationToEvent, to the event handler code. You need to call this code in the right part of the event handler to make sure that the relations get created. Follow the event handler code carefully to find the right place, and be sure not to "miss" some photo documents.