Distributed Applications and Node.js

This month, I gave a talk about distributed applications and Node.js, thanks to Node.js Argentina, at the Microsoft User Group de Argentina, Buenos Aires.

You can see and download my presentation from my SkyDrive.

My idea was to show some concepts and some experiments I did with Node.js, porting previous work from other technologies, and to show code with concrete examples.

First, I explained what a distributed application is in the context of my talk. A distributed application is an application that executes on many computers linked by a network, which interact to solve a problem. See http://en.wikipedia.org/wiki/Distributed_computing

This goes beyond the classic client/server model. We could have many computers crawling a site, or rendering a 3D movie. The computers that participate in the run don't need to have the same software or program; some of them may work only on some step of the problem. Nor do they need to share the same platform or programming language.

Then, I described the features of Node.js plus JavaScript that help us build such applications. In other talks I said “JavaScript is butter”, to put emphasis on its flexibility. In this talk, I had the opportunity to show a topic where Node.js shines: its network support, with built-in modules or via its ecosystem and the combination of modules (see Unix philosophy and Node.js by @izs).

I mentioned an interview with Alan Kay (see his Dynabook):

I thought of objects being like biological cells and/or individual computers on a network, only able to communicate with messages

Until real software engineering is developed, the next best practice is to develop with a dynamic system that has extreme late binding in all aspects

The big idea is “messaging”

It is interesting to see Kay mentioning biological cells: our execution nodes can be taken as cells that form an organism, our distributed application. As Kay says, the big topic is “messaging”, the sending of messages between the cells. Distributed applications give us the chance to forget the concept of a function/subroutine, where we wait for a result, and even to forget the callback: an element of our distributed application sends a message without expecting an answer.

I also mentioned Richard Feynman (post with links), see:

http://boards.straightdope.com/sdmb/showthread.php?t=159936

Feynman was once asked by a Caltech faculty member to explain why spin 1/2 particles obey Fermi-Dirac statistics. He gauged his audience perfectly and said "I’ll prepare a freshman lecture on it." But a few days later he returned and said, "You know, I couldn’t do it. I couldn’t reduce it to the freshman level. That means we really don’t understand it."

Adapting his phrase, I could say “If we cannot explain it, then we don't understand it” as a personal motivation to give talks. Feynman studied the quantum physics of his time on his own; see my Spanish post Feynman por Dyson. I take his experience as a light justification for experimenting with code, instead of directly taking what is already built.

Then, I presented my experiments with Node.js for distributed applications. But why Node.js? In JavaScript, a message can be a simple object serializable to JSON (an object without cycles, that is, a tree), and the transport of the message can be supported by a myriad of modules, like require('net'), require('http') or others.
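For illustration, here is a minimal sketch (not code from the talk) of a message as a plain JSON-serializable object:

// a message is just a plain JavaScript object: no cycles, no functions
var message = { type: 'download', url: 'http://example.com/page1', depth: 2 };

// serialize it to send over any transport (socket, HTTP body, queue...)
var wire = JSON.stringify(message);

// ...and rebuild it on the other side
var received = JSON.parse(wire);
console.log(received.type, received.url); // 'download' 'http://example.com/page1'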

Questions we have to answer when we build a distributed application:

What elements of our application create messages?

What elements consume messages?

Will a message be consumed by only one element, or can it be processed by many interested elements?

Are the machines and programs to execute determined from the start, or can they change? An example: I would like to add forty machines to my distributed application at night. Can it be done? Do we need such a dynamic feature?

Is a message generated by an element on one machine consumed locally, or can it travel to another machine? Which part decides: the programmer? A supervisor program?

And given a message that should reach another machine, how does that work? What transport is used?

If a message should be processed remotely, how is the target machine chosen? By the program? By load balancing?

These questions are answered in different ways, depending on each experiment or example.

A recurring experiment is a web crawler, logically composed as follows:

To retrieve the pages of a website, first send an initial URL to a Resolver element. This element keeps a list of already visited pages. If the URL points to a page not yet processed, that URL is sent to a Downloader element. This logical element could be implemented on many machines. Its responsibility is to retrieve the content of a page. Then it sends the retrieved data to a Harvester element, which extracts the links in that page. Those links are sent back to the Resolver, which examines whether each link belongs to the initial website and whether the page was already processed.
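As an illustration (a simplified, single-process sketch, not the code of the samples linked below), the three logical elements can be written as functions that pass messages to each other:

var http = require('http');
var visited = {};

// Resolver: keeps the list of visited pages and filters links outside the initial site
function resolver(message) {
    if (message.url.indexOf('http://example.com') !== 0) return;
    if (visited[message.url]) return;
    visited[message.url] = true;
    downloader(message);
}

// Downloader: retrieves the content of a page
function downloader(message) {
    http.get(message.url, function (res) {
        var body = '';
        res.on('data', function (chunk) { body += chunk; });
        res.on('end', function () { harvester({ url: message.url, content: body }); });
    });
}

// Harvester: extracts the links of a page and sends them back to the Resolver
function harvester(message) {
    var links = message.content.match(/href="[^"]+"/g) || [];
    links.forEach(function (link) {
        resolver({ url: link.slice(6, -1) });
    });
}

resolver({ url: 'http://example.com/' });

In a distributed version, each of these functions becomes a logical element running on one or more machines, and the direct calls are replaced by messages over the network.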

To start with my experiments, I wrote a module, ObjectStream (it was not the first module I wrote; it emerged from refactoring):

https://github.com/ajlopez/ObjectStream
https://github.com/ajlopez/ObjectStream/tree/master/samples/broadcast

It allows me to send a JavaScript object, serializable to JSON, to an object stream: a stream that receives objects. This stream sends the serialized object to another stream. There is also an object stream that reads lines from a stream and deserializes objects. In this way, I can use a socket stream to send JavaScript objects between machines. In the ObjectStream repo there is a broadcast sample I showed in the talk.
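The underlying idea can be sketched with the core net module: one JSON-serialized object per line over a socket (this is only the concept; the ObjectStream repo has the actual API and the broadcast sample):

var net = require('net');

// server side: read lines and deserialize one object per line
var server = net.createServer(function (socket) {
    var buffer = '';
    socket.on('data', function (data) {
        buffer += data.toString();
        var lines = buffer.split('\n');
        buffer = lines.pop(); // keep the last, possibly incomplete, line
        lines.forEach(function (line) {
            if (line.trim()) console.log('received object', JSON.parse(line));
        });
    });
});
server.listen(3000);

// client side: serialize each object as a line of JSON
var client = net.connect(3000, function () {
    client.write(JSON.stringify({ kind: 'greeting', text: 'hello' }) + '\n');
    client.end();
});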

Based on ObjectStream, I wrote:

https://github.com/ajlopez/SimpleMessages
https://github.com/ajlopez/SimpleMessages/tree/master/samples/Broadcast

It allows two processes to communicate, using bidirectional channels to interchange messages. You can launch a server and many clients. After the connection is established, the message interchange is symmetric: either party (client or server) can send a message to the other. It is the basis for writing distributed applications.
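A rough sketch of the usage style (the function names below are only illustrative guesses, not necessarily the module's actual API; see the Broadcast sample in the repository for the real calls):

var sm = require('simplemessages'); // hypothetical usage, check the repo samples for the real API

// server side: accept connections and answer messages
var server = sm.createServer(function (channel) {
    channel.on('message', function (message) {
        console.log('server received', message);
        channel.send({ reply: 'pong' }); // either party can send at any time
    });
});
server.listen(3000);

// client side: connect and start the symmetric exchange
var client = sm.createClient(3000, 'localhost');
client.on('message', function (message) {
    console.log('client received', message);
});
client.send({ greeting: 'ping' });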

But sometimes we need more than sending a simple message, e.g. we need to call an object that is running on another machine. Then we need a Remote Procedure Call (RPC). I wrote a module:

https://github.com/ajlopez/SimpleRemote

Based on SimpleMessages, the client and the server can each expose an object, and the other party can invoke it as if it were a local object. The return value is given asynchronously; the programmer should provide a callback to process the call's answer.
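Conceptually, the call looks like this (illustrative names only, not necessarily the actual SimpleRemote API; see the repository for the real usage):

var sr = require('simpleremote'); // hypothetical usage, see the repo for the actual calls

// server side: expose a plain object
var calculator = {
    add: function (x, y, callback) { callback(null, x + y); }
};
sr.createRemoteServer(calculator, 3000); // hypothetical helper

// client side: invoke it as if it were local, but the result arrives in a callback
sr.createRemoteClient(3000, 'localhost', function (err, remote) {
    remote.add(2, 3, function (err, result) {
        console.log('2 + 3 =', result); // computed on the server
    });
});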

There are many alternatives in the Node.js module ecosystem. See:

http://stackoverflow.com/questions/5010814/whats-the-best-way-to-make-one-node-js-server-talk-to-another
http://stackoverflow.com/questions/7986088/rpc-and-messagequeues-in-node-js
https://github.com/substack/dnode
https://github.com/Flotype/now
https://code.google.com/p/protobuf-for-node/
https://code.google.com/p/protobuf/
https://github.com/Frans-Willem/IPCNode

Dnode has clients for different languages. You can check something more agnostic: http://en.wikipedia.org/wiki/JSON-RPC

I had a use case where I needed not only to send a message from machine A to machine B, but also to broadcast the message. Then, I wrote:

https://github.com/ajlopez/SimpleBroadcast
https://github.com/ajlopez/SimpleBroadcast/tree/master/samples/Broadcast

In this way, a client can send a message to a server, and then the server can distribute the message to the other connected clients. The received message is not sent back to the original client, only to the rest of the clients. You could have a “star” of servers: each server attends its clients and acts as another client to the other servers.

This arrangement allows scalability by adding servers to the star. It also has the concept of a repeater server (see code and tests). In this module I had the collaboration of Fernando Lores.
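The relay behavior can be sketched with the core net module (an illustration of the idea, not the SimpleBroadcast code):

var net = require('net');
var clients = [];

// every message from one client is forwarded to all the other connected clients
var server = net.createServer(function (socket) {
    clients.push(socket);
    socket.on('data', function (data) {
        clients.forEach(function (other) {
            if (other !== socket) other.write(data); // skip the original sender
        });
    });
    socket.on('close', function () {
        clients.splice(clients.indexOf(socket), 1);
    });
});
server.listen(3000);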

In other use cases, instead of sending messages directly, I needed the message processors to consume them from a queue. I could have used a queue written in another language, but I wanted to stick with Node.js, so I wrote:

https://github.com/ajlopez/SimpleQueue
https://github.com/ajlopez/SimpleQueue/tree/master/samples/DistributedProducerConsumer

At the beginning, I implemented an in-memory queue, totally local. Then, I exposed it to other machines using SimpleRemote.
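The local queue can be sketched like this (an illustration of the idea; the module and its samples are in the repository):

function SimpleQueue() {
    var items = [];
    var consumers = [];

    // producer side: if a consumer is waiting, feed it; otherwise store the item
    this.push = function (item) {
        if (consumers.length) consumers.shift()(item);
        else items.push(item);
    };

    // consumer side: take an item now, or wait until one arrives
    this.pop = function (callback) {
        if (items.length) callback(items.shift());
        else consumers.push(callback);
    };
}

var queue = new SimpleQueue();
queue.pop(function (item) { console.log('consumed', item); });
queue.push({ task: 'download', url: 'http://example.com' });

An instance like this, exposed through an RPC mechanism such as SimpleRemote, becomes a queue shared by producers and consumers on different machines.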

Another use case I had: I needed to publish messages without knowing what elements were interested in them. It is like Node.js events: an element emits an event, but it does not care who is subscribed to it. Thinking about this case, I wrote:

https://github.com/ajlopez/SimpleBus
https://github.com/ajlopez/SimpleBus/tree/master/samples/Market

The subscribers and the publishers can reside on different machines.

In the talk, I showed an example, Market, where the subscription to messages can be described with a “query by example”, that is, by giving an object with the field values we are interested in. Alternatively, we can give a predicate to indicate which messages we want to receive. The predicate can be defined on one machine and sent to another machine.
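The “query by example” subscription can be sketched as simple field matching (illustrative code, not the SimpleBus implementation; the Market-like field names here are invented):

// a subscription matches when every field of the example equals the message field
function matches(example, message) {
    for (var key in example)
        if (message[key] !== example[key]) return false;
    return true;
}

var subscriptions = [];

function subscribe(example, handler) {
    subscriptions.push({ example: example, handler: handler });
}

function publish(message) {
    subscriptions.forEach(function (subscription) {
        if (matches(subscription.example, message)) subscription.handler(message);
    });
}

subscribe({ kind: 'buy', symbol: 'GOOG' }, function (message) { console.log('buy order', message); });
publish({ kind: 'buy', symbol: 'GOOG', quantity: 100 }); // handled
publish({ kind: 'sell', symbol: 'GOOG', quantity: 50 }); // ignored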

Most of the use cases I tackled were motivated by implementations in other technologies. The most influential work on my experiments was Fabriq, by @asehmi:

See Remember Fabriq.

Based on that project, I wrote AjFabriqNode:

https://github.com/ajlopez/AjFabriqNode
https://github.com/ajlopez/AjFabriqNode/tree/master/samples/WebCrawler

AjFabriqNode is based on having many executing nodes, each of which declares a tree of message processors. Based on the message content, a message processor is chosen. Each node can be connected to other nodes in the network, and they exchange information about which message processors are available on each node. Then, we can run a distributed web crawler where one node hosts a Resolver, and the others run Downloaders and Harvesters.
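The content-based routing can be sketched like this (a simplified illustration, not the AjFabriqNode code):

// each node declares the logical processors it hosts; a message carries
// the name of the processor it is addressed to and is routed by that content
var processors = {
    resolver: function (message) { console.log('resolving', message.url); },
    downloader: function (message) { console.log('downloading', message.url); },
    harvester: function (message) { console.log('harvesting links from', message.url); }
};

function dispatch(message) {
    var processor = processors[message.processor];
    if (processor)
        processor(message); // handled locally
    else
        console.log('forward to a remote node that declared', message.processor);
}

dispatch({ processor: 'downloader', url: 'http://example.com/page1' });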

In other scenarios, you prefer to send the message directly to a specific element. Based on the actor implementation of Akka/Scala, see:

http://doc.akka.io/docs/akka/snapshot/general/actor-systems.html
http://doc.akka.io/docs/akka/snapshot/general/actors.html
http://doc.akka.io/docs/akka/snapshot/general/addressing.html

I implemented something simpler:

https://github.com/ajlopez/SimpleActors
https://github.com/ajlopez/SimpleActors/tree/master/samples/DistributedWebCrawler

In Akka, a message is sent to an actor, identified by its address.

There is a Java project, Storm:

http://storm-project.net/

There you can build a topology of logical elements that receive and emit tuples (messages). There are tuple producers, the Spouts, and tuple processors, the Bolts, which can also emit more tuples. Each logical element could be executed on many physical machines/processes. So, we can have a web crawler topology with Resolver, Downloader, and Harvester, but with 20 Downloaders on 10 machines, and 5 Harvesters on the same machines or others.
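In JavaScript, the Spout/Bolt idea can be sketched like this (a local, single-process illustration of the topology concept):

// bolts process tuples and may emit new tuples to the next element in the topology
function makeBolt(process, next) {
    return { emit: function (tuple) { process(tuple, next); } };
}

var harvester = makeBolt(function (tuple) {
    console.log('extract links from', tuple.url);
});

var downloader = makeBolt(function (tuple, next) {
    console.log('download', tuple.url);
    next.emit({ url: tuple.url, content: '<html>...</html>' });
}, harvester);

// the spout produces the initial tuples
var spout = { nextTuple: function () { downloader.emit({ url: 'http://example.com/' }); } };
spout.nextTuple();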

So, I wrote something along those lines for Node.js:

https://github.com/ajlopez/SimpleStorm
https://github.com/ajlopez/SimpleStorm/tree/master/samples/DistributedWebCrawler

After some experiments, I saw the needs of such applications. Many times, I want to seed code to many machines in the network and interchange messages. So, I wrote:

https://github.com/ajlopez/MultiNodes
https://github.com/ajlopez/MultiNodes/tree/master/samples/collatz

A new node added to the network dynamically receives the code to execute and the information about the other nodes.

Two final examples:

An implementation of a distributed genetic algorithm:

https://github.com/ajlopez/SimpleGA
https://github.com/ajlopez/SimpleGA/tree/master/samples/tspdistr

And a fractal that can be calculated in the browser, or using a Node.js+Express server, alone or helped by many additional worker nodes:

https://github.com/ajlopez/NodeSamples/tree/master/Fractal
https://github.com/ajlopez/NodeSamples/tree/master/Fractal/html
https://github.com/ajlopez/NodeSamples/tree/master/Fractal/server
https://github.com/ajlopez/NodeSamples/tree/master/Fractal/distributed

At the end of the talk, I mentioned many reasons to research distributed applications: it's an interesting topic, with practical uses (big data, distributed computation, parallel computation, etc.). But I put emphasis on distributed applications and artificial intelligence. Our nervous system runs as a distributed application of neurons (remember Alan Kay talking about biological cells exchanging messages).

Another topic is having dynamic languages that can be easily distributed to the worker nodes. In fact, I like JavaScript on Node.js because it is so flexible. See:

Programming Languages, Distributed Computing And Artificial Intelligence

Lenguajes de Programación, Computación Distribuida, Inteligencia Artificial
Experimentos Distribuidos

More distributed app experiments are coming.

Stay tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez
