Angel “Java” Lopez on Blog

April 22, 2009

Introducing AjProcessor (Part 1)

Filed under: .NET, AjMessages, Grid Computing, Windows Communication Foundation — ajlopez @ 7:05 pm

Last month, I was working on the AjProcessor code, part of my code katas project on Google Code:

http://code.google.com/p/ajcodekatas/source/browse#svn/trunk/AjProcessor

The idea is an evolution of some exploratory coding with AjMessages and other examples. I want to have an application, based on message passing, that can run on a grid of heterogeneous machines. The application could be partitioned into steps, and each step could run on the same host machine or on another machine. The deployment to machines should be transparent to the application code.

Some of those goals were reached with AjMessages, but this time I want a cleaner start, based on the lessons learned from my previous attempts.

First, I want to write down some basic ideas, to explain the motivation behind the initial code written for AjProcessor. The basic idea is to have message processors. A message processor is a piece of code that receives a message and processes it.

 

The message has a payload (an arbitrary object, string, anything) and additional properties, as key/value pairs.

Another essential brick in this Lego-like application is the element that publishes messages. That is, there is a publisher:

 

An outgoing message could be received by any other code. The same message can be processed by more than one processor:

 

It’s like the publish/subscribe pattern. Another pattern to take into account is a router-like component. Depending on the message (property, content), it could be sent to different targets.
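The message, publisher and router ideas above can be sketched in a few lines. This is only an illustration in Python, not the AjProcessor code: the names Message, Publisher and Router, and the "kind" property, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Message:
    """A payload plus free-form key/value properties."""
    payload: Any
    properties: Dict[str, Any] = field(default_factory=dict)


# A processor is any callable that accepts a message.
Processor = Callable[[Message], None]


class Publisher:
    """Delivers each published message to every subscribed processor."""

    def __init__(self) -> None:
        self._subscribers: List[Processor] = []

    def subscribe(self, processor: Processor) -> None:
        self._subscribers.append(processor)

    def publish(self, message: Message) -> None:
        for processor in list(self._subscribers):
            processor(message)


class Router:
    """A processor that forwards a message to a target chosen by one property."""

    def __init__(self, key: str, routes: Dict[Any, Processor],
                 default: Optional[Processor] = None) -> None:
        self._key = key
        self._routes = routes
        self._default = default

    def __call__(self, message: Message) -> None:
        target = self._routes.get(message.properties.get(self._key), self._default)
        if target is not None:
            target(message)


log, orders = [], []
publisher = Publisher()
publisher.subscribe(lambda m: log.append(m.payload))  # sees every message
publisher.subscribe(Router("kind", {"order": lambda m: orders.append(m.payload)}))

publisher.publish(Message("A-1", {"kind": "order"}))
publisher.publish(Message("hello", {"kind": "note"}))
```

Both subscribers receive both messages, but the router only forwards the one whose "kind" property has a registered route.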

 

Frequently, a component will implement both roles, message processor and message publisher. In order to call plain old .NET objects, it would be nice to have a processor that receives a message, takes some part of the message (e.g. the payload), and sends it as a parameter to one of its methods. The return value could feed a new message.
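A rough sketch of such an adapter, written in Python for brevity. MethodProcessor and Calculator are hypothetical names, and copying the incoming properties to the outgoing message is an assumption of this sketch, not something the post specifies.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Message:
    payload: Any
    properties: Dict[str, Any] = field(default_factory=dict)


class MethodProcessor:
    """Calls one method of a plain object with the payload and publishes the result."""

    def __init__(self, target: object, method_name: str,
                 publish: Callable[[Message], None]) -> None:
        self._method = getattr(target, method_name)
        self._publish = publish

    def __call__(self, message: Message) -> None:
        result = self._method(message.payload)
        # The return value feeds a new message (properties copied: an assumption).
        self._publish(Message(result, dict(message.properties)))


class Calculator:
    """A plain object that knows nothing about messages."""

    def square(self, value: int) -> int:
        return value * value


outgoing = []
processor = MethodProcessor(Calculator(), "square", outgoing.append)
processor(Message(7, {"id": "m1"}))
```

The Calculator stays free of any messaging code; only the adapter plays the processor and publisher roles.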

 

The components can be arranged as a chain, implementing a pipeline to process a message:

 

A more complex arrangement could receive a message and forward it to different pipelines, according to some property or content of the incoming message.
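As an illustration of the pipeline and the pipeline router, here is a small Python sketch. Pipeline, route_by and the "type" property are hypothetical names; each step is assumed to return the message for the next step, or None to stop processing.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Callable, Dict, Optional


@dataclass
class Message:
    payload: Any
    properties: Dict[str, Any] = field(default_factory=dict)


# A step takes a message and returns the message for the next step, or None to stop.
Step = Callable[[Message], Optional[Message]]


class Pipeline:
    """Runs a message through a chain of steps, in order."""

    def __init__(self, *steps: Step) -> None:
        self._steps = steps

    def __call__(self, message: Message) -> Optional[Message]:
        current: Optional[Message] = message
        for step in self._steps:
            current = step(current)
            if current is None:
                return None
        return current


def route_by(key: str, pipelines: Dict[Any, Step]) -> Step:
    """Builds a step that picks a pipeline according to one message property."""
    def router(message: Message) -> Optional[Message]:
        pipeline = pipelines.get(message.properties.get(key))
        return pipeline(message) if pipeline is not None else None
    return router


text_pipeline = Pipeline(
    lambda m: replace(m, payload=m.payload.strip()),
    lambda m: replace(m, payload=m.payload.upper()),
)
number_pipeline = Pipeline(lambda m: replace(m, payload=m.payload * 2))
router = route_by("type", {"text": text_pipeline, "number": number_pipeline})

text_result = router(Message("  hello ", {"type": "text"}))
number_result = router(Message(21, {"type": "number"}))
unknown_result = router(Message(None, {"type": "image"}))  # no pipeline registered
```

Because a pipeline is itself a step, pipelines and routers can be nested freely.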

 

(This concept could be mapped to an application in AjMessages, but without the idea of distributed processing.) A more interesting idea is to run that kind of pipeline router on many machines:

 

The AjProcessor infrastructure is in charge of the serialization/routing/deserialization of messages between host machines. It could be WCF or anything else. The basic idea is to have a pluggable transport.

Well, these are the seed ideas behind the project. In an upcoming post, I’ll explain some of the current code (only a few interfaces and classes, for now).

Angel “Java” Lopez
http://www.ajlopez.com/en
http://twitter.com/ajlopez

December 4, 2008

Augmented Reality with Windows HPC Server

Filed under: Grid Computing, High Performance Computing — ajlopez @ 7:09 am

These days, my team is working with Windows High Performance Computing Server 2008. During my research on HPC, I found this demo (via Twitter search about HPC):

This work is from the people at the High Performance Computing Center Stuttgart (HLRS).

Augmented reality is a kind of virtual reality that combines real images with virtual ones. You can use a transparent headset and see a 3D scheme of an engine while you are repairing it. The group at HLRS is working with the Microsoft Technical Computing Initiative in such scenarios, more info at Augmented Reality in the automotive industry

There are photos of their facilities at

Microsoft HPC Institute – HLRS – University of Stuttgart

It’s like my own hardware at home…. :-)

I found additional videos at

Augmented Reality mit Windows HPC

More videos about HPC and MPI debugging at

HLRS

More information about Augmented Reality

What Is the Metaverse and Should HPC Care?
Augmented reality – Wikipedia, the free encyclopedia
Mixed reality – Wikipedia, the free encyclopedia

International Symposium on Mixed and Augmented Reality (ISMAR)

http://www.augmented.org/
How Augmented Reality Will Work

I want my personal Holodeck!


November 10, 2008

Grid as a Service

Filed under: Grid Computing — ajlopez @ 11:21 am

Since last year, I’ve been working with technologies related to distributed computing. Currently, my work is related to Windows High Performance Computing, but I have also been in touch with DSS/CCR and WCF, and I have examined Java implementations like GridGain. I mentioned Grid as a Service as an idea to implement in my post Grid Computing Programming.

A Grid as a Service could be implemented with different base technologies. I guess there are some current implementations. Here is the idea I have in mind:

- A grid of normal computers

- Software to distribute a task in that grid

- API and Web Front End, to upload a grid application to that grid.

- API and Web interface, to launch a task

The grid application could be:

- An executable, as an .exe or .dll (if the grid is Windows-based), or .jar (in Java).

- A manifest, describing the characteristics of the application: needed parameters, expected output (file, database, XML, …)

To launch a task, the user provides the input parameters for the uploaded grid application. At the end of the execution, he or she receives a notification, possibly a URL pointing to the calculated result.
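The upload and launch steps could look roughly like this. It is a toy, in-memory sketch in Python: Manifest, GridService, the field names and the result path are all hypothetical, and nothing is actually distributed to a grid.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Manifest:
    """Describes an uploaded grid application (all field names are hypothetical)."""
    name: str
    executable: str              # e.g. an .exe, .dll or .jar file name
    parameters: Tuple[str, ...]  # names of the required input parameters
    output: str                  # expected output kind: "file", "database", "xml", ...


class GridService:
    """In-memory stand-in for the upload and launch API."""

    def __init__(self) -> None:
        self._applications: Dict[str, Manifest] = {}
        self._tasks: Dict[int, Dict[str, Any]] = {}

    def upload(self, manifest: Manifest) -> None:
        self._applications[manifest.name] = manifest

    def launch(self, application: str, **inputs: Any) -> str:
        manifest = self._applications[application]
        missing = [p for p in manifest.parameters if p not in inputs]
        if missing:
            raise ValueError("missing parameters: " + ", ".join(missing))
        task_id = len(self._tasks) + 1
        self._tasks[task_id] = inputs
        # A real service would distribute the work; here we only return
        # the (hypothetical) location where the result would appear.
        return f"/tasks/{task_id}/result"


grid = GridService()
grid.upload(Manifest("fractal", "fractal.exe", ("width", "height"), "file"))
result_url = grid.launch("fractal", width=800, height=600)
```

The manifest is what lets the service validate a launch request before any machine in the grid is involved.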

My previous work on AjMessages and AjAgents is intended to be used as a base for Grid as a Service.

The underlying grid could be expanded to borrow more power from other grids. That is, the same API that developers use could be consumed by other grid applications.

The machines could be provided by data centers that currently offer cloud services or virtual machines.

There are many details to discuss, such as security concerns. An alternative is to program in a sandbox, or in a dedicated language oriented to grid and parallel computing. Parallel computing is not the same as grid computing. In my opinion, grid computing is more flexible: a grid application could send messages to any node in the grid at any time. Parallel computing, instead, is more oriented to algorithms like map reduce, and to more synchronized message passing (like MPI implementations).


November 4, 2008

Windows High Performance Computing (HPC) and Programming Resources

Filed under: .NET, Grid Computing, High Performance Computing — ajlopez @ 9:48 am

Since last year, I have been researching distributed and grid computing. I found many useful resources and information (links at the end of this post). One of the topics I found is Microsoft’s implementation of High Performance Computing (HPC). This post is a list of resources I think are relevant.

First, the page of Windows HPC Server 2008:

http://www.microsoft.com/hpc

The first video to watch is last week’s PDC 2008 session:

HPC Session at last PDC
http://channel9.msdn.com/pdc2008/ES13/

It’s an excellent presentation, covering the new Windows HPC Server 2008, nodes, tasks and jobs, management tools, programming options, MPI and MPI.NET programming, and computer Go on HPC (a beautiful idea). The whole presentation deserves a dedicated post.

I like a short but interesting video, showing the management console, at:

http://channel9.msdn.com/shows/The+HPC+Show/Five-Minute-Intro-to-the-HPC-Server-2008-Management-Console/

THE blog to read is Windows HPC survival guide

A post as example: No scientist left behind with CRAY Supercomputer running Windows HPC Server 2008

They collected a set of resources at HPC Resource Kit

All the videos related to HPC are at:

HPC | Tags | Channel 9

(interesting topics: WCF and HPC programming, HPC Basic Profile: open web services you can invoke from Java and other languages)

There is a community site dedicated to Windows HPC:

http://www.windowshpc.net/

with files, resources, source code and examples. 

Software to use

To start writing software for HPC, install Microsoft HPC Pack (Windows). I downloaded it from:

HPC Pack 2008 SDK download

and then, install MPI.NET Software

(I installed MPI.NET SDK.msi but I expanded MPI.NET-1.0.0.zip: it has better examples, with VS solutions)

You don’t need the HPC server to run these examples.

There is an excellent tutorial, implementing a fractal application using HPC 2008, at:

Learning Parallel Programming — from shared-memory multi-threading to distributed-memory multi-processing

Additional Links

If you want to explore the HPC programming possibilities, these are the topics to research:

HPC

http://www.hpccommunity.org/ HPC Community
http://www.hpcwire.com/ High Productivity Computing
http://www.ddj.com/hpc-high-performance-computing/
YouTube – An Overview of High Performance Computing and Challenges for the Future
http://en.wikipedia.org/wiki/High-performance_computing

MPI

MPI (Message Passing Interface) is supported by Windows HPC. There is a Microsoft implementation:

Microsoft MPI (Windows)

that can be invoked from C++.

There is a .NET implementation over Microsoft MPI:

MPI.NET: High-Performance C# Library for Message Passing

It has source code and examples.

(An old .NET wrapper at Codeplex project:  MPI .Net – Home)

I posted about another .NET implementation:

MPI Message Passing Interface in .NET

More about MPI

MPI 2.0 Report
MPI Tutorials
Microsoft Messaging Passing Interface – Wikipedia, the free encyclopedia
Pure Mpi.NET

Parallel Programming

Introduction to Parallel Computing very complete resource, (thanks to jgarcia)
Microsoft Innovation Day – November 5, 2006 they are presenting something related to DryadLINQ
Multithreading and Concurrency in .NET a very complete list of technologies available in .NET
http://www.microsoft.com/ccrdss Now, CCR/DSS as a separated package (formerly in Microsoft Robotics)
Adobe Press – 9780321603944 – Software Pipelines: The Key to Capitalizing on the Multi-core Revolution
Burton Smith: On General Purpose Super Computing and the History and Future of Parallelism | Going Deep | Channel 9
Welcome to Hadoop!
Dryad – Home A Microsoft research project
YouTube – Dryad: A general-purpose distributed execution platform Presentation at Google Talks
Concurrency: What Every Dev Must Know About Multithreaded Apps
Overview of concurrency in .NET Framework 3.5 | Igor Ostrovsky Blogging
Parallel Programming with .NET
Parallel Computing Developer Center from Microsoft
Parallel Virtual Machine – Wikipedia, the free encyclopedia
http://msdn.microsoft.com/msdnmag/issues/07/10/PLINQ/default.aspx Parallel LINQ

Map Reduce

Writing An Hadoop MapReduce Program In Python
Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
Google Research Publication: MapReduce

Delicious

My delicious links about HPC, MPI, Parallel programming, Grid Computing, Map Reduce algorithms, CCR/DSS:

http://delicious.com/ajlopez/hpc
http://delicious.com/ajlopez/mpi
http://delicious.com/ajlopez/parallel
http://delicious.com/ajlopez/gridcomputing
http://delicious.com/ajlopez/mapreduce
http://delicious.com/ajlopez/ccr
http://delicious.com/ajlopez/dss

Computer Go is a fascinating topic:

http://delicious.com/ajlopez/computergo


June 22, 2008

Distributed Agents and Fractals using DSS/VPL

Last week I wrote the base of an application running distributed agents over DSS/VPL, exchanging arbitrary messages, with automatic load balancing. You can read the details at:

Distributed Agents using DSS/VPL

Today, I extended the example with a new project, Fractal:

You can download it from my Skydrive.

It has two DSS service components. One is the Calculator, which calculates a sector of the Mandelbrot fractal. The other is the Renderer, which has a form to control and show the results of the calculation. The message that carries the info of the sector to calculate is:

public class SectorInfo : MessagePayload
{
    public double RealMinimum { get; set; }
    public double ImgMinimum { get; set; }
    public double Delta { get; set; }
    public int FromX { get; set; }
    public int FromY { get; set; }
    public int Width { get; set; }
    public int Height { get; set; }
    public int MaxIterations { get; set; }
    public int MaxValue { get; set; }
}

Another class is the message that returns the calculation:

public class Sector : MessagePayload
{
    public int FromX { get; set; }
    public int FromY { get; set; }
    public int Width { get; set; }
    public int Height { get; set; }
    public int[] Values { get; set; }
}

The calculator splits the sector to calculate if it is too big. It could calculate everything in only one step, but it is interesting to split the sector to generate more messages:

 

private void Calculate(AgentMessage msg)
{
    LogInfo("Entering Calculator with Action: " + msg.Action);

    SectorInfo sectorInfo = (SectorInfo) msg.Payload;

    LogInfo(String.Format("X {0} Y {1} Width {2} Height {3}",
        sectorInfo.FromX, sectorInfo.FromY, sectorInfo.Width, sectorInfo.Height));

    if (sectorInfo.Width > 100 && sectorInfo.Height > 100)
        SplitSector(sectorInfo);
    else
        CalculateSector(sectorInfo);
}
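SplitSector and CalculateSector are not shown in the post, so the following Python sketch is only one plausible reading of them. It assumes that RealMinimum and ImgMinimum are the coordinates of the sector’s own first pixel, that values are stored row by row, that MaxValue is the escape threshold for |z|², and that a sector is split into four quadrants. A simple work list stands in for the messages between agents.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass
class SectorInfo:
    real_minimum: float   # real coordinate of the sector's first column (assumption)
    img_minimum: float    # imaginary coordinate of the sector's first row (assumption)
    delta: float          # distance between adjacent pixels
    from_x: int
    from_y: int
    width: int
    height: int
    max_iterations: int
    max_value: int        # escape threshold for |z|^2 (assumption)


def escape_count(c: complex, max_iterations: int, max_value: int) -> int:
    """Number of iterations of z = z*z + c before |z|^2 exceeds max_value."""
    z = 0j
    for n in range(max_iterations):
        if z.real * z.real + z.imag * z.imag > max_value:
            return n
        z = z * z + c
    return max_iterations


def calculate_sector(info: SectorInfo) -> List[int]:
    """Escape counts for every pixel of the sector, row by row."""
    values = []
    for y in range(info.height):
        imag = info.img_minimum + y * info.delta
        for x in range(info.width):
            real = info.real_minimum + x * info.delta
            values.append(escape_count(complex(real, imag),
                                       info.max_iterations, info.max_value))
    return values


def split_sector(info: SectorInfo) -> List[SectorInfo]:
    """Splits a sector into four quadrants (one possible splitting policy)."""
    half_w, half_h = info.width // 2, info.height // 2
    parts = []
    for dy, h in ((0, half_h), (half_h, info.height - half_h)):
        for dx, w in ((0, half_w), (half_w, info.width - half_w)):
            parts.append(replace(
                info,
                real_minimum=info.real_minimum + dx * info.delta,
                img_minimum=info.img_minimum + dy * info.delta,
                from_x=info.from_x + dx, from_y=info.from_y + dy,
                width=w, height=h))
    return parts


def render(info: SectorInfo) -> Dict[Tuple[int, int], int]:
    """Same decision as Calculate: split big sectors, calculate small ones."""
    pixels: Dict[Tuple[int, int], int] = {}
    pending = [info]  # stands in for the messages sent to the calculators
    while pending:
        current = pending.pop()
        if current.width > 100 and current.height > 100:
            pending.extend(split_sector(current))
        else:
            for i, value in enumerate(calculate_sector(current)):
                x = current.from_x + i % current.width
                y = current.from_y + i // current.width
                pixels[(x, y)] = value
    return pixels


info = SectorInfo(-2.0, -1.25, 0.03125, 0, 0, 120, 104, 30, 4)
pixels = render(info)
```

In the distributed version, each entry of the work list would be a message that any calculator agent could pick up, which is why splitting helps to spread the load.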

As in the previous example, you can run this from a VPL diagram, FractalVpl1:

There is only one renderer, and two calculator agents. If you launch this VPL program, a window appears. This is the initial window content (after pressing the Calculate button):

You can drag and move the mouse to select a new region, or use the buttons to zoom in and out. Reset goes back to the initial position. You can resize the window and invoke Calculate again. The New colors button changes the color palette in use.

There is another VPL program, FractalVpl2, that can be used to run the same example in a distributed way. It has a diagram with two AgentHosts:

and two nodes:

You must compile the VPL example and run it as distributed nodes, using rundeployer.cmd (see my previous post for details).

These are some of the drawings the system produces:

 

Enjoy!


June 21, 2008

Messages everywhere

Recently, I wrote a post about Message Passing Interface:

Message Passing Interface, CCR, DSS, and Pure MPI.NET

I used message passing between agents in my web crawler example:

Distributed Agents using DSS/VPL

The passing of messages between components, agents, and objects is a feature that deserves more study. I guess we could write new kinds of applications using one-way message passing, so we could abandon the current call-a-method way of doing things. Let’s first explore some message passing applications (not only in one-way style).
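To make the contrast with call-a-method concrete, here is a minimal Python sketch of one-way message passing. The Agent class is hypothetical: each agent owns a mailbox and a worker thread, and post returns at once without a result, so any answer has to travel as another message.

```python
import queue
import threading
from typing import Any, Callable


class Agent:
    """Processes posted messages one at a time on its own thread."""

    def __init__(self, handler: Callable[[Any], None]) -> None:
        self._mailbox: "queue.Queue[Any]" = queue.Queue()
        self._handler = handler
        threading.Thread(target=self._run, daemon=True).start()

    def post(self, message: Any) -> None:
        """One-way send: enqueue the message and return immediately."""
        self._mailbox.put(message)

    def _run(self) -> None:
        while True:
            message = self._mailbox.get()
            try:
                self._handler(message)
            finally:
                self._mailbox.task_done()

    def join(self) -> None:
        """Blocks until every message posted so far has been handled."""
        self._mailbox.join()


received = []
collector = Agent(received.append)
# The doubler never returns a value; it posts its result to another agent.
doubler = Agent(lambda n: collector.post(n * 2))

for n in (1, 2, 3):
    doubler.post(n)

doubler.join()
collector.join()
```

The sender never waits for the doubler; the join calls exist only so the example can inspect the result at the end.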

In the Wikipedia entry about Message Passing:

http://en.wikipedia.org/wiki/Message_passing

we can read Alan Kay’s opinion:

Alan Kay has suggested that he placed too much emphasis on objects themselves and not enough on the messages being sent between them. Message passing enables extreme late binding in systems.

If you develop systems with Smalltalk, Self or the like, you’ll notice that the message is a first-class citizen: in many implementations it is a full object, not only a way of calling a method.

There is another place where message passing is used. For 25 years, the QNX operating system has used the message passing paradigm to run a real-time kernel.

I found this interview at Dr. Dobb’s with Sebastien Marineau-Mes, VP of engineering at QNX:

Real Time OS in Today’s World

Sebastien talks about the use of QNX in the current market, and the challenges that multi-core machines could create for legacy code.

Remember: the whole QNX kernel is based on message passing, although its messages are not one-way, and the passing is optimized to avoid a loss of performance (for more details, see QNX at Wikipedia). I see that many of these challenges and opportunities apply not only to multi-core machines, but also to “multi-machines” in a grid. There are many forces conspiring to bring these topics to the current arena:

- We have a better understanding of agents, message passing and other ideas

- Normal hardware is cheap

- Each year, there are more applications with higher scalability needs (the user base will be the whole Internet in the near future, for non-trivial apps)

- Many applications must interact with other applications in the same enterprise or on the net, and messages are an effective way to do that.

- In order to face those challenges, we must begin to move from n-tier-only apps to a cloud, grid or “something alike” schema.

I can imagine languages and technologies based on message passing features. That is one of the reasons I’ve been exploring simple and crazy ideas with AjMessages, AjAgents, and Decentralized Software Services during the last months. I hope to write more about these subjects:

- Another example of distributed agents using DSS/VPL

- AjMessages ported to DSS (I had an implementation last year, but I never published it, I published the WCF only version)

- One-way messages implemented as primitives in AjTalk (Smalltalk-like interpreter)

- Deferred/Concurrent/LazyEvaluation/SomethingAlike implemented in AjBasic (using CCR?)

- Blog about a better-finished app running in a grid of DSS hosts (my team has been working hard on this during the last months).

Incidentally, you can read more about real-world use cases of CCR/DSS in Arvindra Sehmi’s post:

CCR/DSS Use Cases in the Enterprise

So many ideas…. only one life…. Should I begin to parallelize myself? ajlopez-in-a-grid…. ;-)


June 15, 2008

Distributed Agents using DSS/VPL

In this post, I explore some ideas to implement distributed agents, leveraging the features of Decentralized Software Services (DSS) and Visual Programming Language (VPL), included in Microsoft Robotics Developer Studio (I’m working with the CTP 2.0 version). You can download the source code from my Skydrive:

AjDssAgents-0.1.zip

In my post:

Web Crawler example using DSS (Decentralized Software Services)

I wrote DSS service components orchestrated from VPL. In that example, there are Dispatcher, Resolver, Downloader, and Harvester components.

Let’s suppose we have many machines to run the web crawling process. We want to deploy and run MANY downloaders and harvesters, in a grid of machines, using automatic load balancing. The problem with VPL orchestration is that it doesn’t support load balancing out of the box. So I wrote an example where the components communicate with each other, as agents, using special messages.

An agent, in this example, is a DSS service component, capable of receiving and processing appropriate incoming messages. It can send outgoing messages to other components. Instead of sending a message to one specific component, an agent specifies in the message the type of agent to which the message should be forwarded.

Another specialized component, AgentHost, is in charge of receiving such outgoing messages, and it forwards them to a local or remote agent, according to its type.

The solution

The solution has three projects:

AjDssAgents contains the generic agent contract and types, and the concrete AgentHost implementation.

DecrementAgent and WebCrawler contain simple agents to use in the example. The web crawler code is similar to the implementation described in the post mentioned above.

The message

Agents exchange messages, objects of type AgentMessage:

[DataContract]
public class AgentMessage
{
    [DataMember]
    public string From { get; set; }

    [DataMember]
    public string To { get; set; }

    [DataMember]
    public string Action { get; set; }

    [DataMember]
    public object Payload { get; set; }
}

The From field indicates the origin of the message (I’m not using this field yet). The To field is the physical address (DSS address) of the target agent, or its logical type. In the current example, I’m using only logical types. Why a logical type? If a message has a To with value “WebCrawler/Dispatcher”, it will be forwarded to one agent that has that logical type.

How does an agent know what other agents are running and what their logical types are? It doesn’t. The component that keeps that information is the local singleton AgentHost. Each agent posts its outgoing messages to the local AgentHost, and this component selects a target agent and forwards the message to it.
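The routing by logical type can be sketched as follows, in Python and outside DSS. This AgentHost is a hypothetical, in-process stand-in; the post does not say how the target is selected among agents of the same type, so round-robin is an assumption here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class AgentMessage:
    to: str            # logical type of the target, e.g. "WebCrawler/Downloader"
    action: str
    payload: Any = None
    sender: str = ""   # corresponds to the From field


class AgentHost:
    """Keeps the registered agents by logical type and forwards messages to them."""

    def __init__(self) -> None:
        self._agents: Dict[str, List[Callable[[AgentMessage], None]]] = defaultdict(list)
        self._next: Dict[str, int] = defaultdict(int)

    def register(self, agent_type: str, deliver: Callable[[AgentMessage], None]) -> None:
        self._agents[agent_type].append(deliver)

    def post(self, message: AgentMessage) -> None:
        candidates = self._agents.get(message.to)
        if not candidates:
            raise LookupError("no agent of type " + message.to)
        # Round-robin among agents of the same type (assumed selection policy).
        index = self._next[message.to] % len(candidates)
        self._next[message.to] += 1
        candidates[index](message)


host = AgentHost()
first, second = [], []
host.register("WebCrawler/Downloader", first.append)
host.register("WebCrawler/Downloader", second.append)

for n in range(4):
    host.post(AgentMessage("WebCrawler/Downloader", "Download", f"http://example.com/{n}"))
```

The sender only names a logical type; which downloader gets each message is decided by the host, which is what makes load balancing possible.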

The Agents

Each agent is a DSS service component, with an address assigned when it is created. During the start of the agent, it sends to its local AgentHost a DSS message, indicating its address and its logical type (e.g., WebCrawler/Dispatcher). This is how an AgentHost knows which agents are running in its local DssHost process. See the starting code for the Dispatcher agent in the WebCrawler example:

 

protected override void Start()
{
    base.Start();

    // Add service specific initialization here.
    _state.AgentType = "WebCrawler/Dispatcher";

    host.NewNode newNode = new host.NewNode(new host.AgentInfo()
    {
        Address = this.ServiceInfo.Service,
        AgentType = _state.AgentType
    });

    _hostPort.Post(newNode);
}

The type of the agent is kept in its state.

This is typical code, from the Dispatcher agent, showing the handling of an incoming message and the production of outgoing messages:

 

[ServiceHandler(ServiceHandlerBehavior.Concurrent)]
public IEnumerator<ITask> PostMessageHandler(generic.PostMessage postMessage)
{
    if (postMessage.Body.Action.Equals("Dispatch"))
        Dispatch(postMessage.Body);
    else if (postMessage.Body.Action.Equals("Resolve"))
        Resolve(postMessage.Body);

    postMessage.ResponsePort.Post(DefaultSubmitResponseType.Instance);

    yield break;
}

private void Dispatch(AgentMessage msg)
{
    LogInfo("Entering Dispatcher with Action: " + msg.Action);
    LogInfo("URL: " + msg.Payload);

    DownloadTarget target = new DownloadTarget();
    target.Uri = (string) msg.Payload;
    target.Depth = 1;

    AgentMessage postmsg = new AgentMessage()
    {
        Action = "Resolve",
        To = _state.AgentType,
        Payload = target
    };

    host.PostMessage post = new host.PostMessage(postmsg);
    _hostPort.Post(post);
}

private void Resolve(AgentMessage msg)
{
    LogInfo("Entering Dispatcher with Action: " + msg.Action);

    DownloadTarget downloadtarget = (DownloadTarget)msg.Payload;

    LogInfo("URL: " + downloadtarget.Uri + ", Depth: " + downloadtarget.Depth);

    DownloadTarget target = ProcessUrl(downloadtarget);

    if (target != null)
    {
        AgentMessage agentmsg = new AgentMessage()
        {
            To = "WebCrawler/Downloader",
            Action="Download",
            Payload = downloadtarget
        };

        host.PostMessage postmsg = new host.PostMessage(agentmsg);
        _hostPort.Post(postmsg);
    }
}

The AgentHost

There is one and only one AgentHost per running DssHost. The AgentHost receives new agent information (address and logical type), and keeps that information in its state.

It receives messages from local agents, and then forwards them to other local agents, or to a remote AgentHost. In the latter case, it serializes the payload to a string, using XML serialization (a generic object cannot be sent using the DSS generated proxies). This is the structure of a remote message:

 

[DataContract]
public class RemoteAgentMessage
{
    [DataMember]
    public string From { get; set; }

    [DataMember]
    public string To { get; set; }

    [DataMember]
    public string Action { get; set; }

    [DataMember]
    public string PayloadTypeName { get; set; }

    [DataMember]
    public string Payload { get; set; }
}

Note that the remote message has a string Payload (the XML serialization of the original payload) and its qualified type name, so the target host can deserialize the payload to reconstruct the original object.
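The round trip can be illustrated with a small Python sketch. It uses the standard xml.etree.ElementTree module instead of .NET XML serialization, a plain dictionary instead of RemoteAgentMessage, and a hypothetical registry (PAYLOAD_TYPES) to map the type name back to a class.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class DownloadTarget:
    uri: str
    depth: int


# Registry used by the receiving side to map a type name back to a class (hypothetical).
PAYLOAD_TYPES = {"DownloadTarget": DownloadTarget}


def to_remote(to: str, action: str, payload: Any) -> Dict[str, str]:
    """Builds the remote form: the payload as an XML string plus its type name."""
    root = ET.Element(type(payload).__name__)
    for f in fields(payload):
        ET.SubElement(root, f.name).text = str(getattr(payload, f.name))
    return {
        "To": to,
        "Action": action,
        "PayloadTypeName": type(payload).__name__,
        "Payload": ET.tostring(root, encoding="unicode"),
    }


def payload_from_remote(remote: Dict[str, str]) -> Any:
    """Rebuilds the original payload object on the target host."""
    cls = PAYLOAD_TYPES[remote["PayloadTypeName"]]
    root = ET.fromstring(remote["Payload"])
    # Each field is converted back using its declared type (str or int here).
    return cls(**{f.name: f.type(root.findtext(f.name)) for f in fields(cls)})


original = DownloadTarget("http://example.com/", 1)
remote = to_remote("WebCrawler/Downloader", "Download", original)
restored = payload_from_remote(remote)
```

Without the type name travelling next to the XML string, the receiving host would have no way to know which class to rebuild.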

An AgentHost supports subscription. Other AgentHosts can subscribe to new agent information. In general, if you have three machines, you start one AgentHost on each machine, and subscribe each agent host to the others. In this way, every AgentHost has all the information about the running agents, both local and remote.
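A minimal Python sketch of that subscription, with two hosts instead of three. The class and method names are hypothetical, the addresses are made up, and replaying the already known agents to a new subscriber is an assumption of this sketch.

```python
from typing import Dict, List, Tuple


class AgentHost:
    """Tracks local agents and learns about remote ones through subscriptions."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.local: Dict[str, str] = {}               # address -> logical type
        self.remote: Dict[str, Tuple[str, str]] = {}  # address -> (logical type, host)
        self._subscribers: List["AgentHost"] = []

    def subscribe(self, other: "AgentHost") -> None:
        """Registers another host to be told about this host's agents."""
        self._subscribers.append(other)
        # Replay the agents already known, so a late subscriber catches up (assumption).
        for address, agent_type in self.local.items():
            other.on_remote_agent(self.name, address, agent_type)

    def new_agent(self, address: str, agent_type: str) -> None:
        self.local[address] = agent_type
        for subscriber in self._subscribers:
            subscriber.on_remote_agent(self.name, address, agent_type)

    def on_remote_agent(self, host_name: str, address: str, agent_type: str) -> None:
        self.remote[address] = (agent_type, host_name)


host_a, host_b = AgentHost("agenthost"), AgentHost("agenthost0")
host_a.subscribe(host_b)
host_b.subscribe(host_a)

host_a.new_agent("a/dispatcher", "WebCrawler/Dispatcher")
host_a.new_agent("a/downloader", "WebCrawler/Downloader")
host_a.new_agent("a/harvester", "WebCrawler/Harvester")
host_b.new_agent("b/downloader", "WebCrawler/Downloader")
host_b.new_agent("b/harvester", "WebCrawler/Harvester")
```

After the cross subscription, each host knows its own agents as local and the other host’s agents as remote.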

A distributed Web Crawler VPL example

The WebCrawlerVpl2 VPL example contains the diagram:

There are one Dispatcher, two Downloader and two Harvester agents. The Dispatcher launches the initial URL to crawl, and keeps a list of downloaded URLs. A Downloader reads the content of each page to crawl. A Harvester examines that content and gets new links to download.

Note that there are two AgentHosts, and they are related so that each one sends new agent information to the other.

All these agents and components are distributed in two nodes:

The Windows node will run on localhost:50000/50001, and the Windows0 node uses localhost:50002/50003 as its address. You can modify these settings, and add more agents and nodes, without changing the code.

To run the distributed app, you must compile using the Build -> Compile as a Service option in the VPL menu. Note: you must change the settings in the VPL properties; right now they point to local directories on my machine:

VPL will show the compiling process:

After compiling the VPL example, go to the MRDS DOS prompt, change to the bin directory, and launch rundeployer.cmd:

I ran the deployer on my local machine. If you plan to run the example on remote machines, you must start the deployer on each one.

Now, we are ready to run the web crawler. Select the Run -> Run on distributed nodes option, and the application will start. A dialog prompts for the URL to crawl. After entering a valid URL, the process begins to retrieve the pages in the site. You can see the first AgentHost state at:

http://localhost:50000/agenthost

There are three local agents and two remote ones.

On the other side, there is another AgentHost:

http://localhost:50002/agenthost0

See the difference: here there are two local agents, and three remote ones.

To follow the progress of the process, point your browser to

http://localhost:50000/console/output

Conclusions

With these ideas, we can implement grid-like applications, running on many nodes. We lose the VPL orchestration: we can’t draw the path of the messages. But we gain load balancing and dynamic deployment. With some additional effort, we can write a controlling service to start and deploy the system on a new remote machine, even in the middle of a running process. The serialization of arbitrary objects is possible, but with custom serialization.

I could add subscription to messaging in a future version. That is, an agent could receive some messages that are not addressed to it, according to some subscription criteria. The subscriptions could be kept by the AgentHosts. When an AgentHost routes an outgoing message, it could forward it to any interested local or remote agent.


April 18, 2008

Grid Computing Resources

Filed under: Grid Computing — ajlopez @ 9:21 am

In the past weeks, I was researching Grid Computing, searching for links, resources, papers, and implementations. This post is a result of that research.

If you are new to Grid Computing, this is a good introduction

New to Grid Computing

Grid Computing according to IBM

The anatomy of the grid

The physiology of the grid

 

Recommended reading list for grid developers

 

Grid Cafe Grid Projects in the world  

Grid Cafe The place for everybody to learn about the Grid  

What is “the Grid”?  

Grid @ CERN 

 

Industry status:

http://www.gridtoday.com/ (a bit crowded, I guess)

http://www.gridblog.com/

 

The full list of links I collected is at:

http://del.icio.us/ajlopez/gridcomputing

 

Some products to see:

http://www.gridgain.com
http://www.digipede.net (more about Digipede at http://dotnetjunkies.com/WebLog/stefandemetz/archive/2006/12/09/Free_Grid_Computing_software.aspx)
http://www.gridgistics.net/

http://sourceforge.net/projects/ngrid/

 

I’ve written some articles about Grid Computing:

http://ajlopez.wordpress.com/category/grid-computing/

including some toy implementations of ideas to explore.

 


April 15, 2008

Message Passing Interface, CCR, DSS, and Pure MPI.NET

Filed under: .NET, Concurrency and Coordination Runtime, Grid Computing — ajlopez @ 11:40 am

Recently, during my research about grid computing, Microsoft Robotics Studio, DSS and CCR, I found a very interesting paper:

High Performance Multi-Paradigm Messaging Runtime Integrating Grids and Multicore Systems

The authors are Xiaohong Qiu, Geoffrey C. Fox, Huapeng Yuan, and Seung-Hee Bae, from Indiana University Bloomington, and George Chrysanthakopoulos and Henrik Frystyk Nielsen, from Microsoft Research. Nielsen and Chrysanthakopoulos are the “creators” of the Concurrency and Coordination Runtime (CCR) and Decentralized Software Services (DSS), pillar technologies of Microsoft Robotics Studio, that can be used beyond robotics. More on these technologies at:

http://www.microsoft.com/robotics

The paper abstract is:

eScience applications need to use distributed Grid environments where each component is an individual or cluster of multicore machines. These are expected to have 64-128 cores 5 years from now and need to support scalable parallelism. Users will want to compose heterogeneous components into single jobs and run seamlessly in both distributed fashion and on a future “Grid on a chip” with different subsets of cores supporting individual components. We support this with a simple programming model made up of two layers supporting traditional parallel and Grid programming paradigms (workflow) respectively. We examine for a parallel clustering application, the Concurrency and Coordination Runtime CCR from Microsoft as a multi-paradigm runtime that integrates the two layers. Our work uses managed code (C#) and for AMD and Intel processors shows around a factor of 5 better performance than Java. CCR has MPI pattern and dynamic threading latencies of a few microseconds that are competitive with the performance of standard MPI for C.

What is MPI? The acronym refers to Message Passing Interface. According to Wikipedia:

Message Passing Interface (MPI) is both a computer specification and is an implementation that allows many computers to communicate with one another. It is used in computer clusters.

There is a Microsoft Implementation:

Microsoft Message Passing Interface (MS MPI) is an implementation of the MPI2 specification by Microsoft for use in Windows Compute Cluster Server to interconnect and communicate (via messages) between High performance computing nodes. It is mostly compatible with the MPICH2 reference implementation, with some exceptions for job launch and management. MS MPI includes bindings for C and FORTRAN languages. It supports using the Microsoft Visual Studio for debugging purposes.

Oh! FORTRAN….. Those good old days! ;-). I remember working with Gregory Chaitin’s implementation of Lisp in FORTRAN, last century. But there is no going back to the past. Paraphrasing David Hilbert: out of this paradise that Java and .NET have created, nobody will expel us…. ;-). You can read the original quote in this interesting thread.

But I digress. Back to topic.

The main sites about MPI are:

http://www.mpi-forum.org/
http://www.open-mpi.org/
http://www.lam-mpi.org/

I was thinking of implementing some MPI ideas with .NET or Java, when I visited this site:

http://www.purempi.net/

PureMpi.NET is a completely managed implementation of the message passing interface. The object-oriented API is simple, and easy to use for parallel programming. It has been developed based on the latest .NET technologies, including Windows Communication Foundation (WCF). This allows you to declaratively specify the binding and endpoint configuration for your environment, and performance needs. When using the SDK, a programmer will definitely see the MPI’ness of the interfaces come through, and will enjoy taking full advantage of .NET features – including generics, delegates, asynchronous results, exception handling, and extensibility points.

PureMpi.NET allows you to create high performance production quality parallel systems, with all the benefits of .NET.

It is an implementation that you can download and use with VS2005 or VS2008. It uses generics to implement typed channels on MPI.

I downloaded the library, and installed it on a machine with Visual Studio 2008. The installation program added a new project template, Mpi.NET:

I created a project that looks like this:

I modified Program.cs to:

 

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Mpi;

namespace Mpi.NET1
{
    class Program
    {
        static void Main(string[] args)
        {
            ProcessorGroup.Process("MPIEnvironment", delegate(IDictionary<string, Comm> comms)
            {
                Comm comm = comms["MPI_COMM_WORLD"];
                Console.WriteLine(comm.Rank);

                IAsyncResult result = comm.BeginSend<string>(0, "", "Rank: " + comm.Rank, TimeSpan.FromSeconds(30), null, null);

                if (comm.Rank == 0)
                {
                    for (int i = 0; i < comm.Size; i++)
                    {
                        string receivedMsg = comm.Receive<string>(i, Constants.AnyTag, TimeSpan.FromSeconds(30));
                        Console.WriteLine(receivedMsg);
                    }
                }

                comm.EndSend<string>(result);
            });
        }
    }
}

The ProcessorGroup class is in charge of the processes to run. Note the support of delegates to specify the process. An MPI process receives a dictionary of Comm objects, the channels it uses to communicate with other MPI processes.

The ProcessorGroup class has this structure (according to metadata info):

 

namespace Mpi
{
    public class ProcessorGroup : IDisposable
    {
        public ProcessorGroup(Environment environment, Processor processor);
        public ProcessorGroup(string environment, Processor processor);

        public Environment Environment { get; }
        public ICollection<IAsyncResult> Results { get; }

        public void Dispose();
        protected virtual void Dispose(bool disposing);
        public static void Process(string environmentConfigName, Processor processor);
        public void Start();
        public void WaitForCompletion();
    }
}

The number and configuration of processors can be defined in the App.config file:

 

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <configSections>
    <section name="Mpi" type="Mpi.ConfigurationSection, Mpi"/>
  </configSections>
  <Mpi>
    <Environments>
      <Environment name="MPIEnvironment">
        <Hosts>
          <Host comms="MPI_COMM_WORLD" client="MpiClient1" service="MpiService1" />
          <Host comms="MPI_COMM_WORLD" client="MpiClient2" service="MpiService2"/>
          <Host comms="MPI_COMM_WORLD" client="MpiClient3" service="MpiService3"/>
        </Hosts>
      </Environment>
    </Environments>
  </Mpi>
  <system.serviceModel>
    <client>
      <endpoint address="net.tcp://localhost:8080/MpiService" binding="netTcpBinding" bindingConfiguration="" contract="Mpi.IMpiService" name="MpiClient1">
        <identity>
          <userPrincipalName value="" />
        </identity>
      </endpoint>
      <endpoint address="net.tcp://localhost:8081/MpiService" binding="netTcpBinding" bindingConfiguration="" contract="Mpi.IMpiService" name="MpiClient2">
        <identity>
          <userPrincipalName value="" />
        </identity>
      </endpoint>
      <endpoint address="net.tcp://localhost:8082/MpiService" binding="netTcpBinding" bindingConfiguration="" contract="Mpi.IMpiService" name="MpiClient3">
        <identity>
          <userPrincipalName value="" />
        </identity>
      </endpoint>
    </client>
    <behaviors>
      <serviceBehaviors>
        <behavior name="MpiServiceBehavior">
          <serviceDebug httpHelpPageEnabled="false" httpsHelpPageEnabled="false" includeExceptionDetailInFaults="true" />
        </behavior>
      </serviceBehaviors>
    </behaviors>
    <services>
      <service behaviorConfiguration="MpiServiceBehavior" name="MpiService1">
        <endpoint address="net.tcp://localhost:8080/MpiService" binding="netTcpBinding" bindingConfiguration="" name="MpiServiceEndpoint" contract="Mpi.IMpiService" />
      </service>
      <service behaviorConfiguration="MpiServiceBehavior" name="MpiService2">
        <endpoint address="net.tcp://localhost:8081/MpiService" binding="netTcpBinding" bindingConfiguration="" name="MpiServiceEndpoint" contract="Mpi.IMpiService" />
      </service>
      <service behaviorConfiguration="MpiServiceBehavior" name="MpiService3">
        <endpoint address="net.tcp://localhost:8082/MpiService" binding="netTcpBinding" bindingConfiguration="" name="MpiServiceEndpoint" contract="Mpi.IMpiService" />
      </service>
    </services>
  </system.serviceModel>
  <system.runtime.serialization>
    <dataContractSerializer>
      <declaredTypes>
      </declaredTypes>
    </dataContractSerializer>
  </system.runtime.serialization>
</configuration>

Oh! They use <host..>… This reminds me of AjMessages ;-)

Running the program produces:

Well, it’s not a great program, I must admit, but it’s my first MPI program. There are 3 “ranks”, according to the config file above.
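The message flow of that program (every rank sends one string to rank 0, and rank 0 receives one message from each rank in turn) can be sketched without PureMpi.NET at all. The following is only an illustration under stated assumptions: it replaces the Comm channels with one in-memory BlockingCollection per sender, runs the ranks as tasks in a single process, and the names GatherSketch and Run are hypothetical. It does not use or reproduce the PureMpi.NET API.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class GatherSketch
{
    // Runs `size` simulated ranks and returns what rank 0 received, in order.
    public static List<string> Run(int size)
    {
        // One inbox per sending rank (an assumption of this sketch),
        // so that rank 0 can receive "from rank i" specifically.
        var inboxes = new BlockingCollection<string>[size];
        for (int i = 0; i < size; i++)
        {
            inboxes[i] = new BlockingCollection<string>();
        }

        var received = new List<string>();
        var ranks = new Task[size];

        for (int r = 0; r < size; r++)
        {
            int rank = r; // copy, so each task captures its own rank
            ranks[r] = Task.Run(() =>
            {
                // Every rank, including rank 0, sends one message to rank 0.
                inboxes[rank].Add("Rank: " + rank);

                if (rank == 0)
                {
                    // Rank 0 blocks until the message from each rank arrives.
                    for (int i = 0; i < size; i++)
                    {
                        received.Add(inboxes[i].Take());
                    }
                }
            });
        }

        Task.WaitAll(ranks);
        return received;
    }

    public static void Main()
    {
        foreach (string message in Run(3))
        {
            Console.WriteLine(message);
        }
    }
}
```

With three ranks, as in the config file, Main prints "Rank: 0", "Rank: 1" and "Rank: 2" in that order, because rank 0 reads the inboxes in rank order.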

You’ll find many running examples included with the PureMpi.NET distribution. For me, it’s an interesting implementation of MPI ideas, with twists adapted from the .NET world: generics and delegates are welcome.

Grid and MPI? Maybe. I must study the references mentioned in the cited paper. Although the paper is dedicated to high performance issues, it has a good conceptual discussion of the execution model and its relations with MPI, CCR and DSS.

Angel “Java” Lopez
http://www.ajlopez.com/en

January 29, 2008

Grid Computing in the browser

Filed under: .NET, C Sharp, Grid Computing — ajlopez @ 9:35 am

Daniel Vaughan has published a very interesting project (LGPL license) at CodeProject:

Legion: Build your own virtual super computer with Silverlight

Daniel is one of the 40 CodeProject MVPs for 2008. He is a software developer based in Canberra, Australia, and Prague in the Czech Republic. (Thanks to Arvindra Sehmi, who sent me the article’s link.)

It’s a project that uses Silverlight, the new Microsoft technology that runs in the browser, exposing the .NET Framework to JavaScript and other languages (in the 2.0 version).

According to the CodeProject article:

Legion is a grid computing framework that uses the Silverlight CLR to execute user definable tasks. It provides grid-wide thread-safe operations for web clients. Client performance metrics, such as bandwidth and processor speed, may be used to tailor jobs. Also includes a WPF Manager application.

Recently, I posted about Agents in a grid. Legion puts the agent code inside the browser, using Silverlight as a host environment. The server implements a JsonGridService that is accessible via web services from the client side. It serializes the results using JSON (JavaScript Object Notation). One example method from that web service (JsonGridService.asmx.cs):

 

[WebMethod]
[ScriptMethod(ResponseFormat = ResponseFormat.Json)]
public TaskDescriptor StartNewJob(Agent agent)
{
    try
    {
        TaskDescriptor descriptor = GridManager.GetDescriptor(agent);
        return descriptor;
    }
    catch (Exception ex)
    {
        HandleException(ex.Message, agent, ex);
    }
    return null;
}

An instance of TaskDescriptor has a Type and a Job. The Type is a string describing the full name of the class and assembly to load and run in the client. The compiled assembly must reside in the ClientBin directory in the web server application. The Job instance is a bit more complex: it’s a message, containing an arbitrary object to process in the client.

The code to run in the browser must inherit from SlaveTask. The client obtains a TaskDescriptor, creates an object according to the Type in TaskDescriptor, and runs it:

 

static void LoadAndRunTask()
{
    TaskDescriptor info = gridService.StartNewJob(CreateAgent());
    if (!info.Enabled)
    {
        return;
    }
    log.Debug("LoadAndRunTask() for Job Id " + info.Job.Id);
    Type type = Type.GetType(info.TypeName, true, true);
    if (slaveTask != null)
    {
        /* Detach last task. */
        slaveTask.Complete -= taskBase_Complete;
        slaveTask.ProgressChanged -= taskBase_ProgressChanged;
    }
    slaveTask = (SlaveTask)Activator.CreateInstance(type);
    slaveTask.Initialise(info);
    OnTaskChanged(EventArgs.Empty);
    slaveTask.Complete += taskBase_Complete;
    slaveTask.ProgressChanged += taskBase_ProgressChanged;
    Thread newThread = new Thread(
        delegate
        {
            slaveTask.RunInternal();
        });
    newThread.Priority = ThreadPriority.Lowest;
    newThread.Start();
    // ThreadPool.QueueUserWorkItem(
    //     delegate
    //     {
    //         slaveTask.RunInternal();
    //     }
    // );
    progressTimer.Enabled = true;
}

Note the use of Type.GetType to load the type. Silverlight uses the ClientBin directory on the server side as one of the sources for assembly loading. That’s the trick. Activator.CreateInstance creates the instance of the SlaveTask, and a new thread is launched to run the RunInternal method on that instance (for some reason, Vaughan commented out the code that used the ThreadPool; I guess that he prefers to manage the priority in code).

In the article, more artifacts are described, such as a WPF application, the Legion Manager, that allows monitoring of the grid. But now, let’s examine some concepts and alternatives.

Gridifying the browser

The great idea, from Vaughan, is to use the browser as a host application for grid node tasks. This idea allows the use of any machine as a grid node (although Vaughan’s approach needs the support of Silverlight). In the following paragraphs, we’ll go back to basics, to analyze the full picture implied in this kind of solution.
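The load-by-name step can be tried in isolation with plain .NET reflection. The sketch below is not Legion code: SketchTask stands in for SlaveTask, UpperCaseTask is a made-up task, and TaskLoader is a hypothetical helper. It only shows the Type.GetType plus Activator.CreateInstance combination, with the same throwOnError and ignoreCase flags used in LoadAndRunTask.

```csharp
using System;

// Hypothetical stand-in for Legion's SlaveTask base class.
public abstract class SketchTask
{
    public abstract string Run(string input);
}

// A made-up task; in Legion the concrete task type is named by the server.
public class UpperCaseTask : SketchTask
{
    public override string Run(string input)
    {
        return input.ToUpperInvariant();
    }
}

public static class TaskLoader
{
    // Resolves a type from its name and creates an instance of it.
    public static SketchTask Load(string typeName)
    {
        // throwOnError: true, ignoreCase: true
        Type type = Type.GetType(typeName, true, true);
        return (SketchTask)Activator.CreateInstance(type);
    }
}

public static class LoaderDemo
{
    public static void Main()
    {
        // In a grid, this string would arrive from the server.
        // Here it is taken from the local type to keep the sketch runnable.
        string typeName = typeof(UpperCaseTask).AssemblyQualifiedName;

        SketchTask task = TaskLoader.Load(typeName);
        Console.WriteLine(task.Run("legion")); // prints LEGION
    }
}
```

The client code only knows the base class at compile time; the concrete type is chosen by a string at run time, which is what lets the server decide which task each node executes.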

In a grid computing application, we must resolve the following problems (I’m simplifying the scenario landscape: we could have inter node communications too):

- How to program the task to run (languages, technologies…)

One option for this first problem is to use a specialized language, dedicated to grid computing. Another alternative is to use a mainstream language and technology, like Java or .NET. Yet another one: use scripting/dynamic languages, now supported in Java 6 and in the .NET DLR.

- How to inject code to the grid node

If the host node application is a browser, we can program in Java (applets reloaded!), or .NET (now with Silverlight), or even in Flash. To inject the code, the technology in the browser can remotely load .jars or compiled assemblies, or it can receive a string with source code in a dynamic language, and run it as is.

- How to send task data to the grid node

Well, the data can be serialized: JSON is one method, XML is another one. But now another question emerges: who is in charge of sending the data? The server can send the data, but this implies that the client has some way of listening. Or the client can poll the server, asking for new task data to process. I don’t know if a Java applet can use a ServerSocket, or if Silverlight code can open a listening socket. One option to explore is to have a WCF duplex channel in the Silverlight client. Today, the sure option is that the client polls the server. That is the way Legion works.

- How to send the result back from the node to a server

This question is an easier one. The data is sent to a web service in the server, using JSON, XML, or any other serialization technology.
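The pull model described in the last two points can be sketched as a small loop: the node asks the server for a job, processes it, and submits the result back. Everything named here is hypothetical (IGridServer, InMemoryGridServer, PollingClient); the server is an in-memory fake, and serialization and the network are left out. It is not Legion’s JsonGridService.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical contract between a grid node and the server.
public interface IGridServer
{
    bool TryGetJob(out int jobId, out string data);
    void SubmitResult(int jobId, string result);
}

// In-memory fake server, used only to make the sketch runnable.
public class InMemoryGridServer : IGridServer
{
    private readonly Queue<KeyValuePair<int, string>> jobs =
        new Queue<KeyValuePair<int, string>>();

    public readonly Dictionary<int, string> Results = new Dictionary<int, string>();

    public void AddJob(int jobId, string data)
    {
        jobs.Enqueue(new KeyValuePair<int, string>(jobId, data));
    }

    public bool TryGetJob(out int jobId, out string data)
    {
        if (jobs.Count == 0)
        {
            jobId = 0;
            data = null;
            return false;
        }
        KeyValuePair<int, string> job = jobs.Dequeue();
        jobId = job.Key;
        data = job.Value;
        return true;
    }

    public void SubmitResult(int jobId, string result)
    {
        Results[jobId] = result;
    }
}

public static class PollingClient
{
    // Pulls jobs until the server has none left and returns how many were done.
    // A real node would wait and poll again instead of stopping.
    public static int DrainJobs(IGridServer server, Func<string, string> work)
    {
        int processed = 0;
        int jobId;
        string data;
        while (server.TryGetJob(out jobId, out data))
        {
            server.SubmitResult(jobId, work(data));
            processed++;
        }
        return processed;
    }
}

public static class PollingDemo
{
    public static void Main()
    {
        var server = new InMemoryGridServer();
        server.AddJob(1, "abc");
        server.AddJob(2, "de");

        int done = PollingClient.DrainJobs(server, s => s.ToUpperInvariant());
        Console.WriteLine(done + " jobs processed, job 1 result: " + server.Results[1]);
    }
}
```

Because the client starts every exchange, the node needs no listening socket, which is why this model fits a browser host.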

Conclusion

Each year, the browser becomes a more powerful application. Gridifying the browser, using its execution capabilities, is a promising idea that deserves more exploration. Security issues, serialization, and the decision about a push or pull model for messages are the points to research in more detail.

The next time you open the browser, ask its name. It could be “Legion”…. ;-)

Angel “Java” Lopez
http://www.ajlopez.com/en
http://www.msmvps.com/lopez
