Angel “Java” Lopez on Blog

May 25, 2008

Web Crawler example using DSS (Decentralized Software Services)

Some weeks ago, I wrote an example Scribble app using DSS. Today, I wrote another example: a web crawler application managed from Visual Programming Language (VPL). There is an additional VPL program that reads a web page aloud, using a Text to Speech control. You can download the example from my Skydrive:

DssWebCrawler2008May.zip

The solution

It has three projects: one DSS assembly, one class library with some utilities to parse HTML content, and one class library to test the HTML parser. The test library uses NUnit. You can remove it from the solution if you want: it is only used for testing and it’s not needed in the final solution.

The DSS assembly is named DssWebCrawler. I defined five DSS service components:

Dispatcher: It receives an initial URL to download, and then dispatches it to the Resolver.

Resolver: This service maintains a list of visited URLs, and checks that the new URLs to download are valid and are in the same domain as the first page. It has a hardcoded maximum depth of 3 levels of links to explore.

Downloader: This service downloads the content of a URL. The content is returned as part of the response message.

Harvester: It examines the received content and harvests new URLs to examine and download. For each of these URLs, it sends a notification to any service interested in that information.

Reader: It uses the simple HTML parser I wrote for this project. It can obtain the title of the page, or the body, discarding HTML tags and scripts.
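The Resolver rules described above can be sketched in a few lines. This is only an illustration in Python, not the DSS code from the download: the class name `Resolver`, the method `accept`, and the convention that the first page is at depth 0 are assumptions made for the sketch.

```python
from urllib.parse import urljoin, urlparse

MAX_DEPTH = 3  # the post mentions a hardcoded depth of 3 levels


class Resolver:
    """Hypothetical stand-in for the Resolver rules (not the DSS service)."""

    def __init__(self, first_url, max_depth=MAX_DEPTH):
        # Only URLs in the domain of the first page are accepted.
        self.domain = urlparse(first_url).netloc.lower()
        self.max_depth = max_depth
        self.visited = set()

    def accept(self, url, depth, base=None):
        """Return the absolute URL if it should be downloaded, else None."""
        absolute = urljoin(base, url) if base else url
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            return None  # not a valid URL for the crawler
        if parts.netloc.lower() != self.domain:
            return None  # different domain than the first page
        if depth > self.max_depth:
            return None  # too many levels away (assumes first page = depth 0)
        # Ignore the fragment so "/page#a" and "/page#b" count as one page.
        key = parts._replace(fragment="").geturl()
        if key in self.visited:
            return None  # already visited
        self.visited.add(key)
        return key
```

A caller would create one `Resolver` per crawl with the initial URL, and pass every harvested link through `accept` before asking for a download.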

The VPL Program

There is a VPL program named VPLWebCrawler. It consists of three diagrams. The first one defines the kickoff process. The first URL to download is entered in a dialog window:

The second diagram defines how Harvester notifications of new URLs are processed:

The third diagram is an extra: it processes Downloader notifications of new content, extracts the title, and forwards it to a Text to Speech component:

To launch the application, go to the Run -> Start menu. A window appears, prompting you to enter the URL of the page where crawling begins:

Enter a valid URL, and the crawling process begins:

After a few seconds, the titles of the downloaded pages are posted to the Text to Speech service: you can hear the crawling process.

Reading Pages

In another VPL program, VPLWebReader, you can read the content of a web page aloud, using Text to Speech:

It is interesting that we are using the same service components as in the previous example. With VPL composition, we can use them for another purpose.
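To give a rough idea of what the Reader has to do before a page can be sent to Text to Speech, here is a minimal sketch in Python using the standard `html.parser` module. It is not the parser included in the download: the names `PageText` and `read_page` are made up for the illustration, and real pages may need more careful handling.

```python
from html.parser import HTMLParser


class PageText(HTMLParser):
    """Collects the title and the visible body text of an HTML page."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.body_parts = []
        self._in_title = False
        self._in_body = False
        self._skip = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False

    def handle_data(self, data):
        if self._skip:
            return  # discard script and style content
        if self._in_title:
            self.title_parts.append(data)
        elif self._in_body:
            self.body_parts.append(data)


def read_page(html):
    """Return (title, body_text) with tags removed and whitespace collapsed."""
    parser = PageText()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    body = " ".join(" ".join(parser.body_parts).split())
    return title, body
```

The crawler diagram would only need the first element of the pair (the title), while a page reader would use the second one (the body text).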

You can use it to read my experiments in “Anglish” (Angel’s English) at http://ajlopezen.zoomblog.com.

Conclusions

The service components were written to be used with VPL orchestration. They don’t have partners, or direct connections with other service components in the project. This is a new way of programming: you must plan the request and response messages that will carry the communication you draw in VPL. The notification feature is a plus: you can send the same outgoing messages to different target components.

You can play a little more: put some of the components on another node/machine, using the new VPL features.

I hope you’ll find this example useful. I had fun writing it.

Thanks to Fernando Tubio for his initial ideas for a web crawler implementation.
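The idea behind notifications, one component publishing a message without knowing who receives it, can be shown with a toy example. This is plain Python, not DSS: `NotificationHub`, `subscribe`, `notify` and the topic name used below are hypothetical names chosen for the sketch.

```python
from collections import defaultdict


class NotificationHub:
    """Toy publish/subscribe hub: sources notify, any number of targets listen."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        # A target registers interest; the source does not know about it.
        self._subscribers[topic].append(handler)

    def notify(self, topic, message):
        # Deliver the same outgoing message to every subscribed target.
        handlers = list(self._subscribers[topic])
        for handler in handlers:
            handler(message)
        return len(handlers)
```

For example, a harvester-like component could call `hub.notify("url_found", url)` once, and both a download queue and a logger subscribed to `"url_found"` would receive the same URL. Adding a third target later would not require any change in the component that sends the message.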

Angel “Java” Lopez
http://www.ajlopez.com/en

19 Comments »

  1. Hi Angel,

    Nice little demo which I got working under the April CTP of MRDS and VS2008 after upgrading the VS solution and changing a number of the path references in the DssWebCrawler.csproj file. I also had to change the contract identifier (2008/04 -> 2008/05) to allow it to work alongside a previous version of the same Web Crawler contract I had on my machine. For good measure I also re-constructed the VPL diagrams from scratch because the VPL didn’t appear to ‘forget’ the old contract until I deleted and recreated the services. Most downloaders of this demo will never encounter these contract collision issues, but it is worth pointing them out.

    Thanks and good luck with your Argentina RAF presentation tomorrow!

    - Arvindra

    Comment by Arvindra Sehmi — May 26, 2008 @ 3:22 pm

  2. [...] In another weekend project down in sunny Buenos Aires the prolific Angel “Java” Lopez has (re)written a web crawler sample app we had from another project so it can be used in the VPL in MRDS. Check it out here. [...]

    Pingback by Sehmi-Conscious Thoughts : Web Crawler in VPL/DSS — May 26, 2008 @ 4:33 pm

  3. [...] Web Crawler example using DSS (Decentralized Software Services) [...]

    Pingback by Presentando Microsoft Robotics en el Regional Architect Forum 2008 - Angel "Java" Lopez — May 27, 2008 @ 7:43 am

  4. [...] as a service. For more information about Dss and Ccr you can visit the blog of my friend Angel J Lopez, that he has written two applications using Dss one of them managed from VPL (Visual Program [...]

    Pingback by Juan Manuel Moyano : About Microsoft Robotics — May 27, 2008 @ 12:35 pm

  5. [...] Web Crawler example using DSS (Decentralized Software Services) [...]

    Pingback by Distributed Agents using DSS/VPL « Angel “Java” Lopez on Blog — June 15, 2008 @ 10:36 pm

  6. [...] with ASP.NET to handle async requests showcasing iterators with yield keyword.  Another being a web crawler by Angle ‘Java’ Lopez which also leverages the DSS (decentralised software services) [...]

    Pingback by High performance Pub/Sub .NET libraries « Fluent.Interface — June 17, 2008 @ 9:54 am

  7. [...] Web Crawler example using DSS (Decentralized Software Services)Ejemplo de Web Crawler usando DSS (Decentralized Software Services) [...]

    Pingback by Agentes Distribuidos usando DSS/VPL - Angel "Java" Lopez — June 20, 2008 @ 10:44 am

  8. [...] more information about Dss and Ccr you can visit the blog of my friend Angel J Lopez, that he has written two applications using Dss one of them managed from VPL (Visual Program [...]

    Pingback by » About Microsoft Robotics Juan Manuel Moyano’s Blog — June 23, 2008 @ 6:45 pm

  9. How can I use DSS service of Microsoft Robotics Studio in VC++?

    Comment by Dina — August 22, 2008 @ 11:00 am

  10. [...] Web Crawler example using DSS (Decentralized Software Services) [...]

    Pingback by Microsoft Robotics in enterprise applications « Angel “Java” Lopez on Blog — September 24, 2008 @ 8:28 am

  11. need a web crawler? contact me, we have a lot to discuss!

    Comment by PAN — January 18, 2009 @ 4:56 pm

  12. [...] Distributed Agents using DSS/VPL Web Crawler example using DSS (Decentralized Software Services) [...]

    Pingback by Web Crawler using Agents and AjSharp « Angel “Java” Lopez on Blog — February 22, 2010 @ 10:30 am

