Angel “Java” Lopez on Blog

January 6, 2012

Content Repository: Links, News and Resources (1)

Filed under: AjCoRe, Content Repository, Java, Links — ajlopez @ 5:35 pm

Recently, I was working on my open source simple content repository, AjCoRe, which has nodes and properties and is based on the JSR-170 concepts. Now, I’m reading some additional resources: APIs, concrete implementations, use cases, etc. This is the list of links I have collected on the topic:

http://en.wikipedia.org/wiki/Content_repository

A content repository is a store of digital content with an associated set of data management, search and access methods allowing application-independent access to the content, rather like a digital library, but with the ability to store and modify content in addition to searching and retrieving. A content repository thus typically forms the technical underpinning of a content application, like a Content Management System or a Document Management System. It functions as the logical storage facility for content.

A content repository exposes, among others, the following facilities:

  • Read/write of content
  • Hierarchy and sort order management
  • Query / search
  • Versioning
  • Access control
  • Import / export
  • Locking
  • Life-cycle management
  • Retention and hold / records management

What is Java Content Repository
http://onjava.com/pub/a/onjava/2006/10/04/what-is-java-content-repository.html?page=1
The Java Content Repository API (JSR-170) is an attempt to standardize an API that can be used for accessing a content repository. If you’re not familiar with content management systems (CMS) such as Documentum, Vignette, or FileNet, then you must be wondering what a content repository is. Think of a content repository as a generic application "data store" that can be used for storing both text and binary data (images, word processor documents, PDFs, etc.). One key feature of a content repository is that you don’t have to worry about how the data is actually stored: data could be stored in an RDBMS or a filesystem or as an XML document. In addition to providing services for storing and retrieving your data, most content repositories provide advanced services such as uniform access control, searching, versioning, observation, locking, and more.
Introducing the Java Content Repository API

http://www.ibm.com/developerworks/java/library/j-jcr/

JCR Primer
http://jtoee.com/jsr-170/
The following pages contain a primer on the Java Content Repository specification.
JSR-170, the Java Content Repository, constitutes an extremely complex specification. Its successor specification, JCR 2.0 / JSR 283, adds even more complexity. However, the JCR represents a very generic and object-oriented content repository which touches almost all features known in the space. The content repository is not a full-fledged content management system or a content management API. It is only a small subset of a content repository, a storage engine, on top of which a content management system can be built.

Java Content Repository: The Best Of Both Worlds
http://java.dzone.com/articles/java-content-repository-best

Compact Node Type Notation in a Nutshell
http://jtoee.com/jsr-170/compact-node-type-notation-in-a-nutshell/

Content Management with Apache Jackrabbit
http://www.slideshare.net/jukka/content-management-with-apache-jackrabbit

JCR In Action
http://www.slideshare.net/cziegeler/jcr-in-action-apachecon-us-2009
Content-based Applications with Apache Jackrabbit

Apache Jackrabbit Examples
http://wiki.apache.org/jackrabbit/ExamplesPage

Introducing the Alfresco Java Content Repository API
http://wiki.alfresco.com/wiki/Introducing_the_Alfresco_Java_Content_Repository_API
This article introduces you to the Alfresco implementation of the Java Content Repository API (aka JCR or JSR-170) by designing and developing a simple wiki-like back end using both Level 1 and Level 2 JCR features.

Oracle Beehive Java Content Repository
http://docs.oracle.com/cd/E13789_01/bh.100/e13801/toc.htm

Catch Jackrabbit and the Java Content Repository API
http://www.artima.com/lejava/articles/contentrepository.html

The SharePoint content repository: It’s just a database
http://searchwinit.techtarget.com/feature/The-SharePoint-content-repository-Its-just-a-database
SharePoint’s repository — where all its content lives, is indexed, and is version-controlled — isn’t some special data construct. It’s just a database — a SQL Server database, to be specific.

Apache Sling – Bringing Back the Fun
http://sling.apache.org/site/index.html
Apache Sling in five bullet points:
- REST based web framework
- Content-driven, using a JCR content repository
- Powered by OSGi
- Scripting inside, multiple languages (JSP, server-side JavaScript, Scala, etc.)
- Apache Open Source project

Apache Jackrabbit
http://en.wikipedia.org/wiki/Apache_Jackrabbit
Apache Jackrabbit is an open source content repository for the Java platform. The Jackrabbit project was started on August 28, 2004, when Day Software licensed an initial implementation of the Java Content Repository API (JCR). Jackrabbit was also used as the reference implementation of JSR-170, specified within the Java Community Process. The project graduated from the Apache Incubator on March 15, 2006, and is now a Top Level Project of the Apache Software Foundation.

Content Repository API for Java
http://en.wikipedia.org/wiki/Content_repository_API_for_Java

Content repository like JSR-170 in .net?
http://forums.asp.net/t/1201446.aspx/1

Sensenet
http://www.sensenet.com/
Open Source SharePoint Alternative

Eclipse Enterprise Content Repository
http://www.slideshare.net/efge/eclipse-enterprise-content-repository-ecr
Overview of Nuxeo Core

IIOP enabled jackrabbit-jcr-rmi, .NET 2.0 Remoting Layer Implementation, .NET 2.0 Repository Explorer implementation, .NET 2.0 implementation of JSR-170 API
https://issues.apache.org/jira/browse/JCRRMI-24

JSR 283: Content Repository for Java™ Technology API Version 2.0
http://jcp.org/en/jsr/detail?id=283

JCR or RDBMS
http://www.scribd.com/doc/11163161/JCR-or-RDBMS-why-when-how
Why, when, how?

My own work, AjCoRe
http://ajlopez.wordpress.com/category/ajcore/

My Links
http://delicious.com/ajlopez/contentrepository

More work on AjCoRe is coming.

Stay tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez

December 14, 2011

AjCoRe, A Simple Content Repository (2) Stores

Filed under: .NET, AjCoRe, Content Repository, Open Source Projects — ajlopez @ 10:04 am

I have made some progress on my open source project AjCoRe, a simple content repository implementation written in C#:

https://github.com/ajlopez/AjCoRe

If the content repository concept is new to you, check my links at http://delicious.com/ajlopez/contentrepository and Roy Fielding’s JSR-170 overview.

Key concepts: there are Workspaces. Each workspace has a Root Node. Each Node can have Child Nodes and Properties. A Property has a Name and a Value:

In the above diagram, nodes and workspaces are represented by interfaces. The idea is to have different implementations of these abstractions. In my previous post, I presented two such implementations: one representing a file system, supporting only read-only operations, and another, more extensible one, using arbitrary nodes in memory. Since then, I have added support for saving and retrieving those arbitrary nodes using a store.

Now, the project looks like this:

where:

A: The concrete base implementation of workspace and nodes, in memory, now with optional store support

B: The read-only implementation of workspace and nodes, representing an existing file system directory

C: The new (abstract) IStore and its first concrete implementation, which uses a directory hierarchy and XML files to persist the nodes of a workspace

D: Transaction support during the lifetime of a session

Let’s see the new store capabilities. The Stores.IStore interface:

public interface IStore
{
    IEnumerable<string> GetChildNames(string path);
    PropertyList LoadProperties(string path);
    void SaveProperties(string path, PropertyList properties);
    void RemoveNode(string path);
}
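The IStore contract is small enough that another implementation is easy to sketch. The following in-memory store is a hypothetical example, not part of AjCoRe: `MemoryStore` and the simplified `PropertyList` stand-in (a plain `Dictionary<string, object>`) are assumptions, and paths are assumed to be absolute, slash-separated strings with `/` as the root.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for AjCoRe's PropertyList: property name -> value.
public class PropertyList : Dictionary<string, object>
{
    public PropertyList() { }
    public PropertyList(IDictionary<string, object> source) : base(source) { }
}

// Same shape as the IStore interface shown above.
public interface IStore
{
    IEnumerable<string> GetChildNames(string path);
    PropertyList LoadProperties(string path);
    void SaveProperties(string path, PropertyList properties);
    void RemoveNode(string path);
}

// Keeps a copy of each node's properties keyed by absolute path ("/", "/father", "/father/son").
public class MemoryStore : IStore
{
    private readonly Dictionary<string, PropertyList> nodes = new Dictionary<string, PropertyList>();

    // "/father/son" -> "/father"; "/father" -> "/"
    private static string ParentOf(string path)
    {
        int pos = path.LastIndexOf('/');
        return pos <= 0 ? "/" : path.Substring(0, pos);
    }

    public IEnumerable<string> GetChildNames(string path)
    {
        return nodes.Keys
            .Where(p => p != "/" && ParentOf(p) == path)
            .Select(p => p.Substring(p.LastIndexOf('/') + 1))
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }

    public PropertyList LoadProperties(string path)
    {
        // Return a copy so callers cannot mutate the stored state directly; null if unknown.
        return nodes.TryGetValue(path, out var props) ? new PropertyList(props) : null;
    }

    public void SaveProperties(string path, PropertyList properties)
    {
        nodes[path] = new PropertyList(properties);
    }

    public void RemoveNode(string path)
    {
        // Remove the node and every descendant stored under it.
        string prefix = path.EndsWith("/") ? path : path + "/";
        foreach (var key in nodes.Keys.Where(k => k == path || k.StartsWith(prefix, StringComparison.Ordinal)).ToList())
            nodes.Remove(key);
    }
}
```

Removing a node also removes its descendants here, which seems the natural reading of `RemoveNode` for a hierarchical store, but that is my assumption.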

Stores.Xml.Store is the first concrete implementation. Anyone could write others, using JSON, a relational database, or a NoSQL backend. You can create and remove nodes and update their properties using a session, in the same way as in the previous post. But now, the workspace can be created by injecting an IStore implementation into its constructor. Example (more details in the tests):

// Reference store in a directory
Store store = new Store("c:\\myworkspace");
// Workspace using that store to retrieve the root node and its descendants
// (lazy loading)
Workspace workspace = new Workspace(store, "myws");
// Session accessing that workspace
Session session = new Session(workspace);
// You can use the session to get the root node
// in case you have no direct workspace reference
INode root = session.Workspace.RootNode;
// Updates are made inside a transaction
using (var tr = session.OpenTransaction())
{
    // Accessing a node
    INode node = root.ChildNodes["father"];
    // Changing a property
    session.SetPropertyValue(node, "Name", "Adam");

    // Creating a node
    INode newnode = session.CreateNode(node, "newson", new Property[] {
                        new Property("Name", "Abel")
                    });

    // Removing a node
    session.RemoveNode(newnode);
    tr.Complete();
}

The tr.Complete() call is in charge of updating the XML files where node properties are saved. An example file:

<?xml version="1.0" encoding="utf-8"?>
<Properties>
  <Name>John</Name>
  <Age type="int">35</Age>
  <Male type="bool">true</Male>
  <Hired type="datetime">2000-01-01T00:00:00</Hired>
  <Height type="double">167.2</Height>
  <Salary type="decimal">120000.5</Salary>
</Properties>
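To make the role of the `type` attribute concrete, here is a sketch of reading such a file back into typed values with LINQ to XML. It is an assumption about how the format could be parsed, not the actual Stores.Xml.Store code; `XmlPropertyReader` is a hypothetical name, and an element without a `type` attribute is assumed to hold a string.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

// Hypothetical reader for the properties file format shown above.
public static class XmlPropertyReader
{
    public static Dictionary<string, object> Read(string xml)
    {
        var result = new Dictionary<string, object>();
        XElement root = XDocument.Parse(xml).Root;

        // One child element per property: element name = property name.
        foreach (XElement element in root.Elements())
        {
            string type = (string)element.Attribute("type"); // null when absent
            result[element.Name.LocalName] = ParseValue(type, element.Value);
        }

        return result;
    }

    // XmlConvert parses with XML Schema (culture-invariant) rules,
    // so "167.2" is read the same way regardless of the machine locale.
    private static object ParseValue(string type, string text)
    {
        switch (type)
        {
            case null: return text;
            case "int": return XmlConvert.ToInt32(text);
            case "bool": return XmlConvert.ToBoolean(text);
            case "datetime": return XmlConvert.ToDateTime(text, XmlDateTimeSerializationMode.Unspecified);
            case "double": return XmlConvert.ToDouble(text);
            case "decimal": return XmlConvert.ToDecimal(text);
            default: throw new NotSupportedException("Unknown property type: " + type);
        }
    }
}
```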

The key code in Transaction.Complete:

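// Nodes touched by any operation other than a removal
var nodestoupdate = 
    this.operations.Where(op => !(op is RemoveNodeOperation)).Select(op => op.Node).Distinct();
// Nodes removed during this transaction
var nodestodelete = 
    this.operations.Where(op => op is RemoveNodeOperation).Select(op => op.Node).Distinct();
// A node that was modified and then removed only needs to be removed
nodestoupdate = nodestoupdate.Except(nodestodelete);
// Persist the full property list of each remaining modified node
foreach (var node in nodestoupdate)
    this.store.SaveProperties(node.Path, node.Properties);
// Delete the removed nodes from the store
foreach (var node in nodestodelete)
    this.store.RemoveNode(node.Path);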
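The effect of the `Except` step is easy to miss: a node that is modified and later removed in the same transaction must only be removed, not saved first. The sketch below isolates that rule with hypothetical, simplified `Node` and operation types (assumptions, not AjCoRe's classes) and returns the affected paths instead of calling a store.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified types: just enough to exercise the Complete() logic above.
public class Node
{
    public Node(string path) { Path = path; }
    public string Path { get; }
}

public abstract class Operation
{
    protected Operation(Node node) { Node = node; }
    public Node Node { get; }
}

public class SetPropertyOperation : Operation { public SetPropertyOperation(Node node) : base(node) { } }
public class RemoveNodeOperation : Operation { public RemoveNodeOperation(Node node) : base(node) { } }

public static class CompletePlanner
{
    // Splits the logged operations like Transaction.Complete does:
    // nodes to save (touched but not removed) and nodes to remove.
    // Distinct/Except use reference equality, since Node does not override Equals.
    public static (List<string> Save, List<string> Remove) Plan(IEnumerable<Operation> operations)
    {
        var ops = operations.ToList();
        var toDelete = ops.Where(op => op is RemoveNodeOperation).Select(op => op.Node).Distinct().ToList();
        var toUpdate = ops.Where(op => !(op is RemoveNodeOperation)).Select(op => op.Node).Distinct().Except(toDelete);
        return (toUpdate.Select(n => n.Path).ToList(), toDelete.Select(n => n.Path).ToList());
    }
}
```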

I could improve it, taking into account that some IStore implementations might prefer to update individual properties instead of a full node (I am thinking of a database store).

I want to write another IStore implementation supporting JSON (I need to design how to save the original type of each property). Other pending work: retrieving a node from a workspace by its full path, and querying nodes (using XPath? Hmm… I'm still undecided).
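One possible answer to the type question for a JSON store, sketched under my own assumptions (this is not AjCoRe code): store each property as an object holding a type tag and an invariant-culture string value, mirroring the `type` attribute of the XML format. `TypedValue` and `JsonPropertyCodec` are hypothetical names; the sketch uses System.Text.Json, available in .NET Core 3.0 and later.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.Json;

// One JSON entry per property: a type tag plus the value as an invariant string.
public class TypedValue
{
    public string Type { get; set; }
    public string Value { get; set; }
}

public static class JsonPropertyCodec
{
    public static string Serialize(IDictionary<string, object> properties)
    {
        var entries = new Dictionary<string, TypedValue>();
        foreach (var pair in properties)
            entries[pair.Key] = ToTyped(pair.Value);
        return JsonSerializer.Serialize(entries);
    }

    public static Dictionary<string, object> Deserialize(string json)
    {
        var entries = JsonSerializer.Deserialize<Dictionary<string, TypedValue>>(json);
        var result = new Dictionary<string, object>();
        foreach (var pair in entries)
            result[pair.Key] = FromTyped(pair.Value);
        return result;
    }

    private static TypedValue ToTyped(object value)
    {
        var inv = CultureInfo.InvariantCulture;
        switch (value)
        {
            case string s: return new TypedValue { Type = "string", Value = s };
            case int i: return new TypedValue { Type = "int", Value = i.ToString(inv) };
            case bool b: return new TypedValue { Type = "bool", Value = b ? "true" : "false" };
            case DateTime d: return new TypedValue { Type = "datetime", Value = d.ToString("o", inv) };
            case double x: return new TypedValue { Type = "double", Value = x.ToString("R", inv) };
            case decimal m: return new TypedValue { Type = "decimal", Value = m.ToString(inv) };
            default: throw new NotSupportedException("Unsupported property type: " + value?.GetType());
        }
    }

    private static object FromTyped(TypedValue typed)
    {
        var inv = CultureInfo.InvariantCulture;
        switch (typed.Type)
        {
            case "string": return typed.Value;
            case "int": return int.Parse(typed.Value, inv);
            case "bool": return typed.Value == "true";
            case "datetime": return DateTime.Parse(typed.Value, inv, DateTimeStyles.RoundtripKind);
            case "double": return double.Parse(typed.Value, inv);
            case "decimal": return decimal.Parse(typed.Value, inv);
            default: throw new NotSupportedException("Unknown type tag: " + typed.Type);
        }
    }
}
```

Keeping numbers as tagged strings costs some readability in the JSON file, but it avoids JSON's single number type collapsing int, double and decimal into one.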

Stay tuned!

Angel “Java” Lopez

http://www.ajlopez.com

http://twitter.com/ajlopez

December 6, 2011

AjCoRe, a simple Content Repository (1) First Steps

Some years ago, I discovered Apache Jackrabbit, an open source project that implements JSR-170 (see my first links, from 2008, at http://delicious.com/ajlopez/jsr170), but I didn’t pay much attention to it. Last week, on a private email list, the content repository topic came up again, so I read some links:

http://en.wikipedia.org/wiki/Content_repository
http://en.wikipedia.org/wiki/Content_repository_API_for_Java
http://www.jcp.org/en/jsr/detail?id=170
http://jcp.org/en/jsr/detail?id=283

More links at

http://delicious.com/ajlopez/contentrepository

The first paper I read was Roy Fielding’s overview:

http://www.day.com/content/dam/day/whitepapers/JSR_170_White_Paper.pdf

The second one (more detailed, since it is a specification) was the JSR-283 spec:

http://download.oracle.com/otndocs/jcp/content_repository-2.0-pfd-oth-JSpec/

After quickly reading both papers, I started to think about how to implement it (using .NET; there is an open source implementation, SenseNet). I didn’t look at the code or at the API described in the JSRs: I wanted a clear, simple idea of what is essential, the core concepts to implement. Then, last weekend, I did a code kata: my first steps in AjCoRe, a simple Content Repository:

https://github.com/ajlopez/AjCoRe

using TDD to practice, as usual (you can browse the Git log to see the evolution of my ideas and tests).

The key points are:

- There are Workspaces identified by name
- Any Workspace has a Root Node
- Any Node has properties
- A Property has a name and a value (a simple one, like String, DateTime, int, not a complex object)
- A Node can have Child Nodes (an enumeration that can be empty)
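The key points above map onto a very small object model. The sketch below is a hypothetical, minimal rendering of them, not AjCoRe's actual classes (which sit behind the `INode` and `IWorkspace` interfaces and a `Session`); the check that rejects non-simple values is my own assumption about how "simple value" could be enforced.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal model of the key points (not AjCoRe's classes).
public class Property
{
    public Property(string name, object value)
    {
        // Only simple values: primitives, strings, decimals and dates (null allowed).
        if (value != null && !(value is string || value is DateTime || value is decimal || value.GetType().IsPrimitive))
            throw new ArgumentException("Property values must be simple values", nameof(value));
        Name = name;
        Value = value;
    }

    public string Name { get; }
    public object Value { get; }
}

public class Node
{
    private readonly Dictionary<string, Node> children = new Dictionary<string, Node>();
    private readonly Dictionary<string, Property> properties = new Dictionary<string, Property>();

    public Node(Node parent, string name) { Parent = parent; Name = name; }

    public string Name { get; }
    public Node Parent { get; }
    public IReadOnlyDictionary<string, Property> Properties => properties;
    public IReadOnlyDictionary<string, Node> ChildNodes => children; // may be empty

    public void SetProperty(string name, object value) => properties[name] = new Property(name, value);

    public Node AddChild(string name)
    {
        var child = new Node(this, name);
        children.Add(name, child);
        return child;
    }
}

public class Workspace
{
    public Workspace(string name) { Name = name; RootNode = new Node(null, string.Empty); }

    public string Name { get; }    // workspaces are identified by name
    public Node RootNode { get; }  // every workspace has a root node
}
```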

Initially, in my first code and tests, I could create a Node directly using a public constructor. But in the current state, I prefer to use a controlled entry point for the main operations: a Session. The client code/user should create a Session that manages a Workspace.

Now, I have TWO implementations of Workspace and Nodes (after a refactoring step, I have INode and IWorkspace interfaces, and concrete implementations of them). As a proof of concept (mentioned in Fielding’s paper), I want to represent a file system directory with read-only nodes: FileNode and DirectoryNode. Some tests:

[TestMethod]
[DeploymentItem("Files/FileSystem", "fs")]
public void RootNodeProperties()
{
    Workspace workspace = new Workspace("fs", "fs");
    INode root = workspace.RootNode;
    DirectoryInfo info = new DirectoryInfo("fs");
    Assert.AreEqual(info.Extension, root.Properties["Extension"].Value);
    Assert.AreEqual(info.FullName, root.Properties["FullName"].Value);
    Assert.AreEqual(info.Name, root.Properties["Name"].Value);
    Assert.AreEqual(info.CreationTime, root.Properties["CreationTime"].Value);
    Assert.AreEqual(info.CreationTimeUtc, root.Properties["CreationTimeUtc"].Value);
    Assert.AreEqual(info.LastAccessTime, root.Properties["LastAccessTime"].Value);
    Assert.AreEqual(info.LastAccessTimeUtc, root.Properties["LastAccessTimeUtc"].Value);
    Assert.AreEqual(info.LastWriteTime, root.Properties["LastWriteTime"].Value);
    Assert.AreEqual(info.LastWriteTimeUtc, root.Properties["LastWriteTimeUtc"].Value);
    Assert.AreEqual("fs", workspace.Name);
    Assert.IsNotNull(workspace.RootNode);
    Assert.AreEqual(string.Empty, workspace.RootNode.Name);
    Assert.IsNull(workspace.RootNode.Parent);
}

[TestMethod]
[DeploymentItem("Files/FileSystem", "fs")]
public void GetFilesFromRoot()
{
    Workspace workspace = new Workspace("fs", "fs");
    INode root = workspace.RootNode;
    Assert.IsNotNull(root.ChildNodes["TextFile1.txt"]);
}

[TestMethod]
[DeploymentItem("Files/FileSystem", "fs")]
public void GetFileProperties()
{
    Workspace workspace = new Workspace("fs", "fs");
    INode root = workspace.RootNode;
    INode file = root.ChildNodes["TextFile1.txt"];
    FileInfo info = new FileInfo("fs/TextFile1.txt");
    Assert.AreEqual(info.Extension, file.Properties["Extension"].Value);
    Assert.AreEqual(info.FullName, file.Properties["FullName"].Value);
    Assert.AreEqual(info.Name, file.Properties["Name"].Value);
    Assert.AreEqual(info.CreationTime, file.Properties["CreationTime"].Value);
    Assert.AreEqual(info.CreationTimeUtc, file.Properties["CreationTimeUtc"].Value);
    Assert.AreEqual(info.LastAccessTime, file.Properties["LastAccessTime"].Value);
    Assert.AreEqual(info.LastAccessTimeUtc, file.Properties["LastAccessTimeUtc"].Value);
    Assert.AreEqual(info.LastWriteTime, file.Properties["LastWriteTime"].Value);
    Assert.AreEqual(info.LastWriteTimeUtc, file.Properties["LastWriteTimeUtc"].Value);
}

[TestMethod]
[DeploymentItem("Files/FileSystem", "fs")]
public void GetDirectoriesFromRoot()
{
    Workspace workspace = new Workspace("fs", "fs");
    INode root = workspace.RootNode;
    Assert.IsNotNull(root.ChildNodes["Subfolder1"]);
    Assert.IsNotNull(root.ChildNodes["Subfolder2"]);
}

Some notes:

- Workspace is AjCoRe.FileSystem.Workspace class in the above code.

- Its constructor takes two arguments: the name of the workspace in the content repository, and the directory name (possibly relative) that it represents.

- The INode objects are instances of the concrete classes FileNode or DirectoryNode.

- The properties of file and directory nodes reflect the simple values found in the .NET System.IO FileInfo and DirectoryInfo objects.

- The DirectoryNode ChildNodes property is dynamic: it is rebuilt on EACH invocation (I could have adopted a lazy approach, but this way the node collection reflects the CURRENT state of the file system):

public NodeList ChildNodes
{
    get
    {
        NodeList nodes = new NodeList();
        foreach (var di in this.info.GetDirectories())
            nodes.AddNode(new DirectoryNode(this, di.Name, di));
        foreach (var fi in this.info.GetFiles())
            nodes.AddNode(new FileNode(this, fi.Name, fi));
        return nodes;
    }
}

It’s time to present the two main abstractions, INode:

public interface INode
{
    string Name { get; }
    INode Parent { get; }
    PropertyList Properties { get; }
    NodeList ChildNodes { get; }
    string Path { get; }
}

and IWorkspace:

public interface IWorkspace
{
    string Name { get; }
    INode RootNode { get; }
}

Notice that I didn’t need a unique identifier for a node in a workspace. Every Node has a path (the concatenated names of its parents, using / as separator). I should implement retrieval of a particular node by its path. I haven’t needed a NodeType yet. I’m following the YAGNI principle ;-)

The other concrete IWorkspace/INode implementation manages nodes and properties in memory. Nodes can be created and removed by code, and property values can be changed. It’s the main implementation I want to extend. The key piece to add: an IStore that can retrieve nodes from, and save modified nodes to, a persistent store (with many possible implementations: database, NoSQL, Azure blob storage, JSON files (representing node properties) in a file system (representing the node hierarchy), cloud file system, Azure tables, etc.).
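Since every node already has a slash-separated path, the pending retrieval by path can be a plain walk from the root node. A minimal sketch, assuming a hypothetical `SimpleNode` type whose children are indexed by name (as `ChildNodes["..."]` is in the tests above), and assuming the root is addressed as `/`:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type: just a name and named children.
public class SimpleNode
{
    public SimpleNode(string name) { Name = name; }

    public string Name { get; }
    public Dictionary<string, SimpleNode> ChildNodes { get; } = new Dictionary<string, SimpleNode>();

    public SimpleNode Add(string name)
    {
        var child = new SimpleNode(name);
        ChildNodes.Add(name, child);
        return child;
    }
}

public static class PathResolver
{
    // Walks from the root, one path segment at a time.
    // "/" (or an empty path) resolves to the root; returns null if a segment is missing.
    public static SimpleNode GetNodeByPath(SimpleNode root, string path)
    {
        SimpleNode current = root;
        foreach (string segment in path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (!current.ChildNodes.TryGetValue(segment, out current))
                return null;
        }
        return current;
    }
}
```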

The creation of AjCoRe.Base.Workspace:

Workspace workspace = new Workspace("ws1", null);

The second parameter is the list of properties to put in the Root Node of the new in-memory workspace.

Then, you can get a session to it:

Session session = new Session(workspace);

You can navigate the node hierarchy as in the previous example, using workspace.RootNode and the ChildNodes enumeration. BUT, to modify them, you SHOULD use a transaction:

INode node = session.Workspace.RootNode;
using (var tr = session.OpenTransaction())
{
    session.SetPropertyValue(node, "Name", "Adam");
    session.SetPropertyValue(node, "Age", 800);

    tr.Complete();
}

You MUST commit the transaction explicitly with tr.Complete(). If you omit it, the changed properties are restored to their previous values. Creation and removal of nodes and properties are also tracked during a transaction. You can create a new node with its initial properties:
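The rollback behaviour can be implemented with an undo log: each change records how to reverse itself, and disposing an uncompleted transaction replays the log backwards. This is a hypothetical sketch (`MemoryNode` and `UndoTransaction` are illustrative names, not AjCoRe's Session/Transaction API) and it only covers property changes, not node creation or removal.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node holding its properties in a dictionary.
public class MemoryNode
{
    public Dictionary<string, object> Properties { get; } = new Dictionary<string, object>();
}

// Records how to undo every property change made through it.
// Dispose() without Complete() reverts the changes in reverse order.
public class UndoTransaction : IDisposable
{
    private readonly Stack<Action> undo = new Stack<Action>();
    private bool completed;

    public void SetPropertyValue(MemoryNode node, string name, object value)
    {
        if (node.Properties.TryGetValue(name, out object previous))
            undo.Push(() => node.Properties[name] = previous);   // restore old value
        else
            undo.Push(() => node.Properties.Remove(name));       // property did not exist

        // As in the post, setting a value to null removes the property.
        if (value == null)
            node.Properties.Remove(name);
        else
            node.Properties[name] = value;
    }

    public void Complete() => completed = true;

    public void Dispose()
    {
        if (completed)
            return;
        while (undo.Count > 0)
            undo.Pop()();
    }
}
```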

INode root = session.Workspace.RootNode;
using (var tr = session.OpenTransaction())
{
    INode node = session.CreateNode(root, "person1", new List<Property>()
    {
        new Property("Name", "Adam"),
        new Property("Age", 800)
    });

    tr.Complete();
}

Or you can add, change, or remove (by setting the value to null) properties in a transaction:

INode root = session.Workspace.RootNode;
using (var tr = session.OpenTransaction())
{
    session.SetPropertyValue(root, "Name", "Adam");
    session.SetPropertyValue(root, "Age", 800);
    tr.Complete();
}

So, by using the session as the entry point for modifications, I can track the changes in a Unit of Work, without using observers on node properties (with that approach, I would probably need observers on every node the client code could traverse in the workspace). I plan to implement something like Software Transactional Memory to support concurrency (I already have code for that feature in AjTalk and AjSharp).

Next posts: implementation details (transactions, session factories, workspace registries, etc.)

Next steps: implement persistence in a store, concurrent transactions.

Stay tuned!

Angel “Java” Lopez

http://www.ajlopez.com

http://twitter.com/ajlopez
