Angel \”Java\” Lopez on Blog

November 6, 2010

Web Crawler using the new AjAgents

Filed under: .NET, AjAgents, Open Source Projects — ajlopez @ 11:13 am

I wrote a new implementation of my project AjAgents, described in:

AjAgents: a new implementation

Now, I want to describe in this post, how I used it to write (again) a Web Crawler. Previous versions:

Web Crawler Using AjAgents and AjSharp
Distributed Web Crawler using AjMessages

This new version of the web crawler use the new interface:

    public interface IAgent<T>
    {
        void Post(Action<T> action);
    }

Instead of write a type that inherits or implements an agent, the agent is a wrapper applied to a simple type.

The new implementation of AjAgents, and a web crawler example (Project AjAgents.WebCrawler), can be downloaded from:

http://code.google.com/p/ajcodekatas/source/browse/#svn/trunk/AjAgents

In one of my previous implementation, I had:

Now, I dropped dispatcher. I have three classes:

The resolver receives a link to download. It decides to process or not (maybe, the link was already processed; or it is a link that is not in the original address to process). If the link is accepted, the info is sent to the downloader. This object downloads the content, and then send it to a harvester (it could be forward to other objects, too). The harvester examines the content, and extracts new links to process, each one is sent to the resolver.

The above classes are .NET types. But the references they have are of type Agent<Downloader>, Agent<Harvester>, Agent<Resolver>. The code to build  an initial group of interconnected objects/agents is in Program.cs:

       static Agent<Resolver> MakeAgent()
        {
            Resolver resolver = new Resolver();
            Harvester harvester = new Harvester();
            Downloader downloader = new Downloader();
            Agent<Resolver> aresolver = new Agent<Resolver>(resolver);
            Agent<Harvester> aharvester = new Agent<Harvester>(harvester);
            Agent<Downloader> adownloader = new Agent<Downloader>(downloader);
            resolver.Downloader = adownloader;
            harvester.Resolver = aresolver;
            downloader.Harvester = aharvester;
            return aresolver;
        }

 

AjAgents.WebCrawler console application accepts a parameter, with the initial link:

AjAgents.WebCrawler http://ajlopez.zoomblog.com

Output:

This is a proof of concept example. I would improve it:

- Agent<IDownloader>, load balancing its work with many Agent<Downloader> instances.

- Agent<T> instantiated in different machines, implementing a distributed web crawler.

Keep tuned!

Angel “Java” Lopez

http://www.ajlopez.com

http://twitter.com/ajlopez

October 12, 2010

AjAgents: a new implementation

Filed under: .NET, AjAgents, Open Source Projects — ajlopez @ 9:51 am

Some time ago, I wrote about my project AjAgents, describing it, implementing genetic algorithms and other mini demos. The project is base in the send of messages to “agents” (I could named them actors, or something else, never mind). The agents received the messages and process them one by one. Each instance of an agent don’t need to manage concurrency: the messages are queued in an internal queue by each agent (I could reimplement this feature using a queue for many agents, if needed).

But my previous implementations relied on reflection, or CCR library by Microsoft (see my previous posts). Now, I take another way: an agent is a wrapper around a normal object. The source code can be download from my AjCodeKatas Google Code project, under trunk/AjAgents folder.

The key interface is:

    public interface IAgent<T>
    {
        void Post(Action<T> action);
    }

The new idea: to use a generic class. T is the type of our “classical” type (no agent), and Post is the method to invoke an action over the inner object: an action that is a “routine” receiving a T instance as parameter. So, if the inner object has a method

inner.DoSomething(parameter);

it can be invoked as a message from the agent as:

agent.Post(x => x.DoSomething(1));

There is a base agent implementation:

    public class Agent<T> : IAgent<T>
    {
        private T instance;
        private AgentQueue<T> queue;
        private bool running;
        public Agent(T instance)
        {
            this.instance = instance;
        }
        public void Post(Action<T> action)
        {
            if (!this.running)
                this.Start();
            this.queue.Enqueue(action);
        }
        private void Start()
        {
            lock (this)
            {
                if (this.running)
                    return;
                this.queue = new AgentQueue<T>();
                Thread thread = new Thread(new ThreadStart(this.Execute));
                thread.IsBackground = true;
                thread.Start();
                this.running = true;
            }
        }
//...
    }

Note that Post method enqueues the action in an internal queue. The process of that queue is in the method .Execute (not shown). The queue is a blocking one (I know, there is one in .NET 4.0, but my code runs under 3.x, and under 2.x versions, I guess):

    public class AgentQueue<T>
    {
        private Queue<Action<T>> queue = new Queue<Action<T>>();
        private int maxsize;
        public AgentQueue()
            : this(100)
        {
        }
        public AgentQueue(int maxsize)
        {
            if (maxsize <= 0)
                throw new InvalidOperationException("AgentQueue needs a positive maxsize");
            this.maxsize = maxsize;
        }
        public void Enqueue(Action<T> action)
        {
            lock (this)
            {
                while (this.queue.Count >= this.maxsize)
                    Monitor.Wait(this);
                this.queue.Enqueue(action);
                Monitor.PulseAll(this);
            }
        }
        public Action<T> Dequeue()
        {
            lock (this)
            {
                while (this.queue.Count == 0)
                    Monitor.Wait(this);
                Action<T> action = this.queue.Dequeue();
                Monitor.PulseAll(this);
                return action;
            }
        }
    }

I applied TDD for the development of all these classes, one example test:

        [TestMethod]
        public void InvokeIncrement()
        {
            ManualResetEvent handle = new ManualResetEvent(false);
            Counter counter = new Counter();
            Agent<Counter> agent = new Agent<Counter>(counter);
            agent.Post(c => { c.Increment(); handle.Set(); });
            handle.WaitOne();
            Assert.AreEqual(1, counter.Count);
        }

Most of the above code was derived (err… copy and paste :-) from my own work on AjSharp (channels, queues, agents… ).

I have a proof of concept demo using this new AjAgents implementation (my “classical” web crawler). But that is a topic for a future post.

Keep tuned!

Angel “Java” Lopez

http://www.ajlopez.com

http://twitter.com/ajlopez

June 21, 2008

Messages everywhere

Recently, I wrote a post about Message Passing Interface:

Message Passing Interface, CCR, DSS, and Pure MPI.NET

I used to pass message between agents in my example of a web crawler:

Distributed Agents using DSS/VPL

The passing of messages between components, agents, objects, is a feature that deserves more study. I guess we could write new kinds of applications, using one-way message passing, so we could abandon the call-a-method current way of doing things. Let’s explore first, some message passing applications (not only one-way style).

In the Wikipedia entry about Message Passing:

http://en.wikipedia.org/wiki/Message_passing

we can read Alan Kay opinion:

Alan Kay has suggested that he placed too much emphasis on objects themselves and not enough on the messages being sent between them. Message passing enables extreme late binding in systems.

If you develop systems with Smalltalk, Self or alikes, you’ll notice that the message is a first citizen, in many implementations, a full object, not only a way of call a method.

There is another place for use message passing. For 25 years, QNX operating systems uses the message passing paradigm to run a real time kernel.

I found this interview at Dr. Dobb’s to Sebastien Marineau-Mes, VP of engeneering at QNX:

Real Time OS in Today s World

Sebastien talks about the use of QNX in current market, and the challenge that multi-core machine could create on legacy code.

Remember: all the QNX kernel is based on message passing, although its messages are not one-way, and the passing is optimized to not incurr in loss of performance (more details, see QNX at Wikipedia). I see that many of these challenges and opportunities could be translated to the use, not only to multi-core, but to “multi-machines” in a grid. There many forces that are conspiring to bring these topics to current arena:

- We have a better understanding of agents, message passing and other ideas

- Normal hardware is cheap

- Each year, there are more application with higher level needs of scalabity (the user base will be the whole Internet in the near future, for non-trivial app)

- Many application must interact with other application in the same enterprise or in the net, and messages are an effective way to do that.

- In order to face that challenges, we must begin to abandon n-tier-only apps, to a cloud, grid or “something alike” schema.

I could imagine languages and technologies, based on message passing features. That is one of the reasons I’ve been involved exploring simples and crazy ideas with AjMessages, AjAgents, and Distributed Software Services during last months. I hope to write more about these subjects:

- Another example of distributed agents using DSS/VPL

- AjMessages ported to DSS (I had an implementation last year, but I never published it, I published the WCF only version)

- One-way messages implemented as primitives in AjTalk (Smalltalk-like interpreter)

- Deferred/Concurrent/LazyEvaluation/SomethingAlike implemented in AjBasic (using CCR?)

- Blog about a better-finished app running in a grid of DSS hosts (my team was working hard on this, last months).

Incidentally, you can read more about use cases applied in the real world using CCR/DDS, in Arvindra Sehmi’s post:

CCR/DSS Use Cases in the Enterprise

So many ideas…. only one life…. Should I begin to parallized myself? ajlopez-in-a-grid…. ;-)

Angel “Java” Lopez
http://www.ajlopez.com/en

April 10, 2008

Genetic Algorithms with AjAgents and Concurrency and Coordination Runtime (CCR)

Filed under: .NET, AjAgents, Concurrency and Coordination Runtime — ajlopez @ 10:19 am

In previous post, I wrote about my C# example AjAgents:

Agents using Concurrency and Coordination Runtime (CCR)

and about some ideas to explore:

Agents in a Grid

Now, I extended my example AjAgentsCCR-0.1.zip implemented two new projects. One is a console project AjAgents.Genetic01, and the other is WinForms onw, AjAgents.WinGenetic01:

The basic idea is to run a genetic algorithm using agents, that send messages using CCR ports. Then, each agents process its incomming messages, and post outcomming message to other agents, in a “parallel” way. The problem I modeled is the classical Travelling Salesman Problem. According to Wikipedia:

If a salesman starts at point A, and if the distances between every pair of points are known, what is the shortest route which visits all points and returns to point A?

The Windows form is simple:

The Genoma class represents the travel (a list of Points and a distance value):

class Genoma { public List<Point> travel = new List<Point>(); public int value; } class Point { public int x; public int y; }

But the interesting point, is the lunch of the run. First, the agents are created and connected:

best = new BestSoFar(); evaluator = new Evaluator(); mutator = new Mutator(); collector = new Collector(); evaluator.Collector = collector; collector.Mutator = mutator; collector.Evaluator = evaluator; collector.BestSoFar = best; mutator.Evaluator = evaluator;

Then, a initial genoma is created:

Genoma genoma = new Genoma(); Random rnd = new Random(); for (int k = 0; k<40; k++) { Point pt = new Point(); pt.x = rnd.Next(12); pt.y = rnd.Next(12); genoma.travel.Add(pt); } genoma.value = evaluator.CalculateValue(genoma);

The BestSoFar agent can raise an event. The form is listening to that event, to refresh the best travel drawing. The program sends the genoma to the mutator agent, many times:

best.NewBest += BestGenoma; for (int k = 0; k < 60; k++) { mutator.Post("Mutate", genoma); }

Note the use of the Post method. Every agent implements that method, that invokes a method in the agent, but using a CCR port to invoke the target method. So, the method is executed in a “parallel way”, possibly in a thread from the CCR thread pool.

A simple diagram to explain the interactions between the agents:

The mutator sends each genoma to the Evaluator. This agent calculate the value assigned to that genoma. Then, it sends the genoma to a Collector. This is a new idea, for this example: instead of having a population class, the collector receives genomas, and then when it has n or more, evaluates the best ones, and post them to a Mutator, so the process continues. There is no population concept, or, at least, not the standard population found in other implementations. The best genoma detected by the Collector is informed to the BestSoFar agent. A typical method, from Collector class:

void ProcessList(List<Genoma> list) { list.Sort(comparer); bestsofar.Post("Process",list[0]); evaluator.Post("Evaluate",list[0]); evaluator.Post("Evaluate",list[1]); for (int k = 0; k < 4; k++) { mutator.Post("Mutate",list[k]); mutator.Post("Mutate",list[k]); } }

I must improve the genetic algorithm. Now, it’s using only mutation, and it has no crossover operator. Then, its result is not optimal: it can reach a local minimum, but it could not be the best solution. But the basis of this example, is to show how AjAgents and CCR can be used in an “proof of concept” example.

Conclusions

One can see each agent as a “little cell” or “little organism” that reacts to incoming messages, and sends outcomming messages to other agents. Each agent is loosly coupled with the others. In these examples, the agents were created by code, but it’s clear that they could be created and related using dependency injection frameworks, as Spring.NET or the new Unity Application Block from Microsoft.

Instead of using only CCR ports, I could write the agents as DSS Service components (see Microsoft Robotics Studio for more information), so the algorithm could be “gridified”, and each agents is located in a transparent way to others.

But this is another story….;-)

Incidentally, yesterday this post announced the new version of Microsoft Robotics Developer Studio 2008:

http://blogs.msdn.com/msroboticsstudio/archive/2008/04/09/microsoft-robotics-developer-studio-2008-ctp-april-available.aspx

The third new feature, mentioned in that post, could be a great feature for these ideas about “miniagents”:

Support for creating applications that run on multiple DSS nodes using Visual Programming Language and DSS Manifest Editor. This makes it much simpler to create applications that run across nodes, either on the same machine or across the network. When an application containing multiple nodes is to be started, VPL creates individual deploy packages for each node and fires them up across the network.

Ooooohhh!…. ;-) ;-)

Angel “Java” Lopez
http://www.ajlopez.com/

January 3, 2008

Agents in a Grid

Filed under: AjAgents, AjMessages, Grid Computing — ajlopez @ 9:31 am

Recently, I was involved in the development of an application that runs on a grid of diskless machines. I was an exciting project, and now, after the rush of shipping it, I want to write down some ideas to explore.

In this post, I use the term “agent”, that is a bit fuzzy. I won’t define precisely the concept, I want to use as a base term to refine in a future. By now, let explore some basic (naive?) ideas to better understanding of the problems related to agents and grid applications. Yes, some ideas presented here could be viewed as naive, but I feel that this is a needed exercise, to grasp the key concepts and problems to be solved in these kind of applications. At the end of this article, I’ll present some implementation suggestions.

I described some applications to run in a grid in my previous post:

Grid Computing Programming

Agent concepts

In this post, agent is a piece of software, with behaviour and state. It runs inside an agent host, a host application that provides the base services so an agent can “live” and run. An agent will be represented as in this drawing:

Agent communication patterns

There is a lot of literature about agents communication, from simple techniques to elaborated ones as contracts, bids, and more. We can have agents with beliefs, desires and intention. In this post, an agent is simpler: only an object with state, that can send messages. It can receive stimula from other agents and from the host environment.

The simplest communication pattern is composed of an agent sending a message to another agent:

Some notes:

- The sender agent knows the receiver agent. That is, some form of identity must be implemented. The message is not send to anyone: the sender intention is to send the message to a determined receiver.

- The message carries data, and it must be understood by the receiver, possibly invoking one of its methods.

- The sender doen’t wait for a response. It’s not interested in a return message or data.

- The agents could reside in different physical machines in the grid, in a transparent way.

During its life an agent can send many messages to differents target agents, each one is a known agent for the sender:

Sometimes, the sender agent will receive a message from the receiver agents, notifying some work done, or informing some processed data. Depending on the application, the response message could carry information to identify the original message:

In this case, the sender must be prepared to receive the response in an asynchronous way. This could be an interesting problem to solve: an agent can send many messages, and it would be better if it can still run without receiving all the responses in time. For example, in an board game application, one agent could delegate some tree branch exploring to other agents, and, after some time, it should be able to take a decision, without all the information.

Clouds in the grid sky

Now, another case: an agent could be interested in send a message, without forward it to a determined receptor. Instead, it could be send to a cloud of agents, so anyone interested in the message could have the opportunity of processing it.

(Ok, I’m a bit odd creative in my drawings… ;-)

This feature could be implemented using these strategies:

- An agent could send a message to a blackboard system.

- An agent could send a message to host application, indicating a topic (like in a message queue), so any subscribed agent will receive the message. An variant: only some subscriptors receive the message, depending on application settings.

- An agent could send a message targeting some service provider definition. A service provider is an agent that declares, at the beginning of its life, its capabilities and service that is capable to provide. 

As example, we could take a chess/board game application (I prefer the game of go). An agent in such application can send a message reclaiming to solve some attack position. In a blackboard system, it will publish the request in the blackboard. In a topic system, it deposits the message on topic “attackers”. In a service provider strategy, it sends the message to one or many AttackService provider.

As in the previous pattern, an agent can receive an asynchronous response, now from “the cloud”:

 

Agent Duplication

An agent has behaviour and state. If the agent must fork its jobs, it could take the way of duplicating itself:

It’s a bit strange action, but it could be useful, depending on application at hand.

Agents and the Grid

Each agent can be hosted in a grid node. The message sending mechanism must be able to send a remote message to another node. The host application keeps a list of agents by identity, and knows which one is local or remote. An acid test: a grid application with agents must be able to run all in one machine, or in grid, without changing the code or algorithm.

 

As in another grid implementations (discussed in my previous post), a central server is in charge of distributing the tasks between the grid nodes. That server could receive tasks from outside application or even, from other grid. “Grid as a Service” is a new buzzword to apply.

Moving agent

An agent can initiate its activities in one node. But at point, it can decide to continue the work in another node (the host application in that node could take that decision, too). Then, its state should be send from one node to another.

Note that the agent behaviour (perhaps implemented in compiled code) doesn’t travel.

Injecting behaviour

The behaviour of each kind of agent could be expressed in compiled code (.jar files in Java, assemblies in .NET). Other alternatives are possible: the behaviour could be specified in a specific scripting agent language.

If the behaviour is expressed in compiled form, one or more servers have the responsability of storing and distributing that components:

Implementation ideas

Most of these ideas can be implemented in any appropiate language/technology, like Java and .NET, that supports multi threading, remote invocation, message serialization, etc…

Last months I was working on my pet projects AjMessages and AjAgents, more info at previous posts:

AjMessages: a message processor

Agents using Concurrency and Coordination Runtime (CCR)

(I have a new project, named AjGrid, not published yet). For this post, I guess that AjAgents could be an appropiate implementation. AjMessages now has remote support, but it’s more oriented to a pipeline processing: it’s difficult to implement in such disposition some of the above ideas.

I must add some features to AjAgents (actually, AjAgents is a local, proof of concept implementation):

- Configuration: Load and creation of agents in runtime, from a configuration file, or on the fly, in the middle of a run.

- Remote assembly: So a grid node can be injected with new agents.

- Agent identification: To identify an agent in a unique way (an UID could be a solution).

- Message transport: Windows Communication Foundation is a candidate.

But another approach is to take Decentralized System Services (DSS) from Microsoft Robotics Studio. An agent can be implemented as a DSS Service, running in a DSS Host. The intermachine communication would be implemented using DSSP as protocol.

Stay tuned, to see some code in the coming weeks.

Angel “Java” Lopez
http://www.ajlopez.com/en

October 17, 2007

Agents using Concurrency and Coordination Runtime (CCR)

Filed under: .NET, AjAgents, Concurrency and Coordination Runtime — Tags: , — ajlopez @ 11:05 am

Yesterday, I was writing a minimal example, to explore some ideas using the Concurrency and Coordination Runtime, the Microsoft library introduced with its Robotics Studio. More about CCR at:

Concurrent Affairs: Concurrency and Coordination Runtime

The idea is to implement passing of message between components, in an asynchronous way. The example code can be download from AjAgentsCCR-0.1.zip

The Agent

In the example, the interface to implement for each component, loosely named agent, is:

public delegate void ReturnHandler<T>(T t);  

public interface IAgent
{
    void Post(Message msg);
    void Post(string action,
        params object[] parameters);
    void Post<T>(string action,
        ReturnHandler<T> handler,
        params object[] parameters);
}

The components can receive message (more about that class below). But it can be invoked using an action (like a method name) and a variable list of parameters.

The third Post method to implements has a delegate, a method to run when the post of the message returns a value, in the future. You can send a message to another agent, without waiting for a response, or you can write a delegate to process the returned value.

The Message

It is a class with

action: a name to identified the action associated with the message

body: the message payload

response port: where to send a return value, if needed

The code:

public class Message
{
    private string action;  

    public string Action
    {
        get { return action; }
        set { action = value; }
    }  

    private object body;  

    public object Body
    {
        get { return body; }
        set { body = value; }
    }  

    private Port<object> returnport;  

    internal Port<object> ReturnPort
    {
        get { return returnport; }
        set { returnport = value; }
    }  

    public Message()
    {
    }  

    public Message(string action)
    {
        this.action = action;
    }  

    public Message(string action, object body)
    {
        this.action = action;
        this.body = body;
    }  

    public Message(string action, object[] bodyvalues)
    {
        this.action = action;
        if (bodyvalues != null && bodyvalues.Length == 1)
            this.body = bodyvalues[0];
        else
            this.body = bodyvalues;
    }
}

Note that the message can have an array on body values, that are put in the body as an object.

The implementation

In this initial example, the IAgent interface has an implementation based on invoke, via reflection, a method in the agent, that has a name equals to the action in the message, and that can receive the parameters that are the body of the message.

The message, when received, is put on a CCR port, so it could be handled asynchronously. There is a Receive arbiter to attend the incoming messages, that routes them to corresponding method in the current object.

I think this is an interesting solution: you can write a normal class, with normal methods, and make it an agent, inheriting from Agent class.

You can send a message with

agent1.Post("Decrement",20);

You can “query” some value with:

agent1.Post<int>("GetCounter", PrintCounter);

Future improvements

I must work on:

  • Write an Enviroment, or Host class, where the agents “live”. The Host will be resposible of maintain a list of agents by name.
  • Write a configuration file where I can specify what agents to load and start in the Environment, creating and relating the agents using something like dependency injection. I envision hosts of agents that can load remote agent code from assemblies in other servers in the network.
  • Use these agents in a distributed way, with Windows Communication Foundation (WCF), or Decentralized System Services (DSS). In this latter case, a DSS service component could be an agent or it could be an entry point to an agent Environment/Host
  • Implement some Subscribe/Notify pattern between agents.

Angel “Java” Lopez
http://www.ajlopez.com/

The Shocking Blue Green Theme. Create a free website or blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 57 other followers