Angel “Java” Lopez on Blog

December 3, 2007

Grid Computing Programming

Filed under: Grid Computing, Software Development — ajlopez @ 10:53 am

In previous posts, I described my projects AjMessages and AjAgents, giving some source code to play with:

AjMessages: a message processor

Agents using Concurrency and Coordination Runtime (CCR)

Thanks to my experience with the venerable (and old) project AjServer (see Hacia el AjServer (Spanish)), I could write the core of AjMessages in a day, from scratch. It was a fun day of hard coding. Since then, I have spent many hours adapting the code to use a custom message instead of a WCF message, and adding support for pluggable input and output channels, so it can achieve transport independence, to some degree.

One of the features of AjMessages is communication between many program instances, which could be running on different machines. They can send messages with dynamic configuration, so we can distribute tasks to those machines at runtime. I must still solve remote assembly distribution; there are some ideas to explore at the end of this post. The same action (an action is the minimal logical step to execute) can be handled by different machines. The AjAgents project aims to use distributed agents but, for now, it’s only a local application. I envision that any agent could be running on any machine, in a transparent way. I think that the use of agents, or arbitrary tasks, could be a more flexible way of distribution, in contrast to message passing as in AjMessages. An agent is capable of launching many subtasks and assigning them to other agents in an asynchronous way; it can send partial results to partner agents, and it can talk and negotiate with many other agents in a more organized way.

Let’s explore some ideas for the AjMessages project. It could be distributed using a server machine that controls the other instances of the system:

This scenario has similarities with the concept of grid computing. I want to enumerate some scenarios where we can use the system in a grid of nodes and servers. “Grid Computing”, like many technological buzzwords, has a wide scope, but let’s try to define it.

According to the excellent article from IBM people:

New to Grid Computing

Grid computing can use a pool of servers, storage systems, and networks as a single big system, in such a way that we can manage all those resources in the execution of a task. For the user, or for the application, the grid appears as a single system.

In the case of AjMessages, following Fabriq ideas, this behaviour can be obtained because the servers execute one or more applications, distributing each action in a transparent way.

The grid computing concept allows us to use more processing power, without the need for expensive hardware or software, using load balancing and task distribution on common machines. Scalability is reached via scale out: the more machines in the grid, the better the results. Depending on the system that organizes the distribution, we can add more nodes and obtain more throughput without touching the application logic.

I imagine a set of machines composing a grid, with this set exposed for use by users. I think that Grid as a Service is a term that can be coined to describe that arrangement.

Applications

Back to the main topic: what use cases and scenarios could we imagine for a grid?

Here is a tentative list:

- Genetic Algorithm Processing: A problem might not have a clear solution. Its complexity could grow exponentially, and then it can be intractable using conventional approaches. Using genetic algorithms, the program can test many partial solutions and, using mutation and selection, it can discover better solutions. This work can be parallelized, making it an ideal task to run in a grid. I’m collecting some candidates at:

http://del.icio.us/ajlopez/geneticalgorithms

I’m impressed by the results of http://www.darwinathome.org, although I think most of those results are not emergent, but consequences of the selected fitness function.

- Tree Search: Many artificial intelligence problems require exploring branches in a search tree. One case is the analysis of moves in a game. It can be extended to business decisions and planning. A grid can help in calculating the next move in a computer Go program, one of the hardest problems in artificial intelligence game programming.

- Web Crawler: The task that explores a site, gets its page contents, analyzes them, detects links, and continues the exploration to other linked pages is one that can be distributed over many nodes in a grid. While one node gets a page’s content, another generates tasks for other nodes, such as indexing the retrieved content and retrieving the new pages found in the process.

- Batch Processing: A network of nodes can process a great amount of information, if this information can be split into parts. The job could be to transform data from a database table, to analyze logs, or to generate statistics. If the input is divisible, each part can be sent to a different node. An example: one node could process January data, while other nodes process the other months. ETL processing in general is another example.

- Email List Distribution: A typical case. A company that offers email list distribution needs to receive, process and resend an incoming message to a list of recipients. The email could need some personalization. Then, the incoming email could be routed to one or more nodes in the grid for further processing.

- Message Processing: In the current SOA world, an application receives tons of XML messages. Each one needs control, transformation, and content routing. In a grid system, each message is routed to a node. When more throughput is needed, more nodes are added to the grid.

- Workflow Execution: As in the previous example, this is more a scale-out distributed task than a grid-specific one. A workflow can be designed, and each step can be assigned to a node or set of nodes. For example, in a SaaS application, the steps to provision a new tenant can be executed in a grid.

- Map Reduce: It’s a programming model to process big data sets. A Map function is specified to process an input key/value pair and produce intermediate key/value pairs, commonly many of them. A Reduce function is then applied to all the intermediate key/value pairs that share the same key. For example, a Map function can receive a document to process and generate word/document pairs, and the Reduce function takes the pairs with the same word to make a list of the documents that contain that word. For a more detailed explanation, see the Google Labs paper MapReduce: Simplified Data Processing on Large Clusters.

- Biology and Genetic Software Applications: I’m interested in science in general, and in biology in particular. I guessed that there were applications where a grid can be applied, and I think I’m right (recently, I reviewed the course material of Introducción a la Biología Molecular para Programadores given by Sebastian Bassi and his partners). It’s interesting to find that there are implementations like BLAST that can be ported to a grid. See one such approach in the case studies of Digipede.

- Rendering and Image Processing: Many rendering, lighting, and realistic image generation tasks can be run in parallel.

- Animation Creation: Even if an image cannot be processed in parallel, sometimes we can launch different tasks, one for each image, in order to produce an animation. A grid can be used to scale out this heavy processing.

- Media Processing: Video compression, key frame detection, and scene change detection can be partitioned to be processed in a grid.

- Simulations: A wide subject. There are systems where it’s not clear what output would be produced given an input. Many input data sets must be processed. Then, each input data set could be given to a node or nodes in the grid. With more nodes, the simulation can produce more results.
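The Map Reduce item in the list above can be sketched in a few lines. This is only a minimal, single-process illustration in Python of the word/document example; it is not the Google implementation and not part of AjMessages, and the names map_document, reduce_word and run_map_reduce are hypothetical. In a grid, each map call and each reduce call could be sent to a different node.

```python
from collections import defaultdict


def map_document(doc_name, text):
    """Map step: emit one (word, document) pair per distinct word."""
    return [(word, doc_name) for word in sorted(set(text.lower().split()))]


def reduce_word(word, doc_names):
    """Reduce step: build the sorted list of documents containing the word."""
    return word, sorted(set(doc_names))


def run_map_reduce(documents):
    """Run map, group by key, then reduce; everything runs locally here."""
    grouped = defaultdict(list)
    for doc_name, text in documents.items():
        for word, doc in map_document(doc_name, text):
            grouped[word].append(doc)
    return dict(reduce_word(word, docs) for word, docs in grouped.items())


# Two tiny in-memory "documents" invented for this sketch.
documents = {
    "a.txt": "grid computing with agents",
    "b.txt": "agents and messages",
}
index = run_map_reduce(documents)
print(index["agents"])
```

The grouping step in the middle is the part a real grid system would handle: it decides which node receives the pairs for each key.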

Software and languages

A change in point of view: a grid can be exposed using a web service. An interface can be defined to send tasks to the grid, tasks that can be written by grid programmers using a special SDK or framework. What kind of software can be sent to a grid? Some options:

- Complete Assemblies, invoking some (predetermined or not) methods.

- Scripting Language Programs, running in a “sandbox” interpreter, in order to control the security and health of the node.

- Agents, consisting of assemblies or code to run in an agent virtual machine.

- Grid Domain Specific Languages, designed to take advantage of the grid computing concept.

Such a grid can be offered as a service to other services (even other grids). The concept of Grid as a Service emerges. Renting its power, service level control, health monitoring, and more are applications to consider in the future for these scenarios.

Links and resources

I’ll write in more detail about grid computing. For now, you can read the mentioned IBM article:

New to Grid Computing

There is an interesting open source implementation in Java:

GridGain

(The drawing at the beginning of this post was “inspired” by one from GridGain; but in my version, the nodes can communicate with each other, using the location independence of each action in AjMessages).

I’ve collected links about grid computing in my del.icio.us account (del.icio.us is addictive):

http://del.icio.us/ajlopez/gridcomputing

For this post, I’ve paid attention to

http://www.gridgain.com
http://www.digipede.net
http://www.gridgistics.net/

Digipede’s implementation is very interesting. They distribute assemblies. There is a server that receives tasks and distributes them to the grid nodes, where the Digipede agents are running. The system keeps a database of the launched, pending, and terminated tasks. It exposes a web interface for control. A user application can communicate with the Digipede server using a dedicated web service.

GridGain has a feature for “gridifying” a Java method using an annotation: an interesting idea to explore.
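To give a feeling for the “gridifying” idea, here is a rough sketch in Python, where a decorator plays the role of the annotation. It is not GridGain’s API: the gridify decorator and the count_words function are hypothetical names, and a local thread pool stands in for the grid nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import wraps

# Assumption for this sketch: a local thread pool stands in for the grid nodes.
_pool = ThreadPoolExecutor(max_workers=4)


def gridify(func):
    """Hypothetical decorator: calls are submitted to the pool, not run inline."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        # The caller gets a Future and asks for the result when it needs it.
        return _pool.submit(func, *args, **kwargs)
    return wrapper


@gridify
def count_words(text):
    return len(text.split())


futures = [count_words(t) for t in ["a post a day", "keeps the doctor away"]]
results = [f.result() for f in futures]
print(results)
```

The function body does not know where it runs; only the decorator does. That separation is what makes the idea attractive.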

Some crazy ideas

I may need medication, but here is a list of crazy ideas to implement:

- Code Generation in a Grid: Generating code, using my project AjGenesis or anything else, means executing a list of steps. Not all of these steps must be executed in order: most of them could be launched in parallel, ideally in a grid. A code generation engine can consist of agents, mini expert systems specialized in completing the model, making transformations, taking decisions, and more, in order to generate a system. A grid can host all these pieces.

- Computer Go in a Grid: I mentioned this above, in relation to tree search. There is some work on gridifying GNUGo. For me, it’s a super interesting topic. Again, a community of agents, running distributed in a grid, can achieve more results than a common approach. The game of Go is not like chess: no game program can beat a professional human, yet. It merits more creative approaches to the problem. More about computer Go at:

http://del.icio.us/ajlopez/computergo
Computer Go

- “Gridified” Programming Language: I have ideas to extend AjBasic with CCR or something similar, or to implement something more oriented to functional programming, where some operators (list processing, others) could be easily gridified. It would be interesting to write such a language: its programs could run on a single machine but, transparently, could be distributed to multiple nodes on a grid. AjG# is coming… ;-)

Conclusion

As you see, grid computing is a great topic. I want to thank Gabriel Szlechtman here: he suggested many of the enumerated scenarios.

Any other applications or implementations to comment on?

Angel “Java” Lopez
http://www.ajlopez.com/en

Recipes with AjGenesis

Filed under: .NET, AjGenesis, Code Generation — ajlopez @ 10:51 am

Over the past weeks, I gave talks about my open source code generation project, AjGenesis, in two cities of my country, Tandil and Corrientes, thanks to the organization of the Microsoft User Group of Argentina. Usually, I show examples of my project in action; following the “dogfooding” principle, I use it every week. The current version under development is

AjGenesis 0.5

More about the capabilities of the project at:

Application Generation using AjGenesis
Code Generation with AjGenesis: A Hello World application

I’m experimenting with adding the capability of invoking one or more input windows from a task, to take parameters from the user during the code generation process.

Right now, everything is produced from a model or models, freely designed by users and serialized in XML files. But it could be convenient, depending on the code to generate, to take new parameters from the user in the middle of the code generation. Examples of parameters: target directory and file names, connection strings, namespace and package names, etc.

Following this idea, there is a new experimental project in the solution, named AjGenesis.Recipes, that uses the new project AjGenesis.UI. This UI project has a form dedicated to accepting data from the user.

A screenshot:

At the beginning, there is no information in the tree at left. Go to File | Open… and select an example file with recipe definitions: Recipes.xml (it is in the source code directory of the project).

The contents of this file:

 

<?xml version="1.0" encoding="utf-8" ?>
<Recipes Name="Recipes">
  <Node Name="Hello World">
    <Recipe Name="Hello World VB.NET Module" Task="Tasks/HelloWorldTask.ajg" Documentation="Tasks/HelloWorldTask.html"/>
  </Node>
  <Node Name="VB.NET">
    <Recipe Name="Entities" Task="Tasks/Task1.ajg" Documentation="Tasks/Task1.html"/>
    <Recipe Name="Services"/>
  </Node>
  <Node Name="CSharp">
    <Recipe Name="Entities"/>
    <Recipe Name="Services"/>
  </Node>
</Recipes>

Each node can contain nodes and recipes. A recipe has a name, which is displayed in the tree; a task, which is the AjBasic code to execute; and a file that documents the recipe in HTML.
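To make the node/recipe structure concrete, here is a small sketch that walks a shortened copy of the recipes file and lists each recipe with its path in the tree. It is written in Python with the standard xml.etree module, only as an illustration: AjGenesis.Recipes is a .NET program and does not load the file this way, and the walk function is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Shortened copy of Recipes.xml, embedded so the sketch is self-contained.
RECIPES_XML = """
<Recipes Name="Recipes">
  <Node Name="Hello World">
    <Recipe Name="Hello World VB.NET Module" Task="Tasks/HelloWorldTask.ajg"
            Documentation="Tasks/HelloWorldTask.html"/>
  </Node>
  <Node Name="VB.NET">
    <Recipe Name="Entities" Task="Tasks/Task1.ajg" Documentation="Tasks/Task1.html"/>
    <Recipe Name="Services"/>
  </Node>
</Recipes>
"""


def walk(element, path=()):
    """Yield (path, task, documentation) for every Recipe under element."""
    for child in element:
        name = child.get("Name")
        if child.tag == "Node":
            # Nodes can contain nodes and recipes, so recurse.
            yield from walk(child, path + (name,))
        elif child.tag == "Recipe":
            yield path + (name,), child.get("Task"), child.get("Documentation")


recipes = list(walk(ET.fromstring(RECIPES_XML)))
for recipe_path, task, doc in recipes:
    print(" / ".join(recipe_path), "->", task or "(no task yet)")
```

Recipes without a Task attribute, like Services in the example file, come back with None as the task: they are placeholders in the tree.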

If you double-click on the Hello World VB.NET Module recipe, a window appears. You can enter the file to use as a model, and the name of the .vb file to generate:

The new feature is implemented in the task file, which uses the project AjGenesis.UI and its form. Let’s see Tasks\HelloWorldTask.ajg:

form = UIManager.CreateInputForm()
form.Text = "Class Generator Wizard"
form.AddFileField("Model","Model to use","")
form.AddFileField("FileName","File to Generate","")
n = form.ShowDialog()
if n = 1 then
  ModelFile = form.GetField("Model")
  FileName = form.GetField("FileName")
  ModelManager.LoadModel(ModelFile,Environment)
  TransformerManager.Transform("Templates/HelloWorldVb.tpl",FileName,Environment)
end if

UIManager is a new manager object in this AjGenesis version. Note how a window and its fields can be defined and processed. The field values can then be retrieved, to use in the generation process.

This is work in progress, but it looks interesting enough to me to write this post.

The model is still used. Any recipe needs one or more models as a base to produce the code or text to generate. With AjGenesis Recipes, the model can be given in an interactive way. Time will tell if this feature is useful to users, as an alternative to NAnt tasks.

A recipe could also be written without using a model: the parameters could be enough to feed the process.

Using a more elaborate interface, we could define our own files of tasks, recipes and templates, invoked from this program. We could write a recipe to generate all the DAOs in a system, or a recipe that generates a set of entities.

Next steps:

  • Stabilization of the code (I must fix the treatment of some directory and file names, and their relative positions)
  • Improve exception management
  • Capture the output of tasks, and show it in one window during the process 
  • Complete AjGenesis.UI to manage a more complete set of controls (now, it only has a text field, and a directory or file chooser) and a list of windows, like a real wizard
  • Allow the use of user defined controls and windows
  • After writing some recipes, explore whether it is useful, interesting or feasible to integrate this stuff into Visual Studio (or Eclipse, why not), like GAT, or as a Visual Studio Integration Package.

Suggestions, comments, ideas, welcome!

Angel “Java” Lopez
http://www.ajlopez.com/en

A post a day keeps the doctor away

Filed under: Software Development — ajlopez @ 10:49 am

During the last half year, I wrote dozens of posts, inspired by a mindset: a post a day keeps the doctor away.

You can see the complete list of posts at my Spanish non-technical blog:

A post a day keeps the doctor away

I took inspiration from Gianpaolo Carraro’s post:

1 day 1 blog: as simple as that

Gianpaolo is one of Microsoft’s software architects: he writes a lot about Software as a Service. He describes himself as a dot com refugee: he survived the Internet bubble at the beginning of the century.

Some weeks ago, I met Gianpaolo here in Buenos Aires. He was visiting one of my customers, Southworks. He is a kind man who answered every question asked by a bunch of software developers (including me) during a private presentation.

His career includes work at Bell Laboratories, Internet companies, and now, the giant of Redmond.

Back to the topic. Not all my posts were relevant, but I think that writing them trained my communication skills, including this “non-standard” English dialect, known as Anglish, that is, “Angel’s English”… ;-)

I have begun to write more in English, so stay tuned!

Angel “Java” Lopez
http://www.ajlopez.com/en

My Week

Filed under: Uncategorized — ajlopez @ 10:47 am

These are my plans for this week:

Monday, Tuesday: I’m going to give a course on ASP.NET at Club de Programadores.

Tuesday: Philosophy with Laura Klein, more info at: Más cursos de filosofia

Friday morning: I’m going to give a speech about F#, at Microsoft Argentina.

Thursday night: Giving a course about .NET 2.x

Friday night: A speech about Spring Framework, for Java.

Weekend: Go activities, Torneo Abierto Buenos Aires (more details at Asociación Argentina de Go).

Red hours: Revamping Fabriq, at Southworks.

Angel “Java” Lopez
http://www.ajlopez.com/en
