Some weeks ago, I wrote an example Scribble app using DSS. Today, I wrote another example: a web crawler application managed from Visual Programming Language (VPL). There is an additional VPL program that reads web pages aloud, using a Text to Speech control. You can download the example from my Skydrive:
The solution
It has three projects: one DSS assembly, one class library with some utilities to parse HTML content, and a class library to test the HTML parser. The test library uses NUnit. You can remove it from the solution if you want: it is only used for testing and is not needed in the final solution.
The DSS assembly is named DssWebCrawler. I defined five DSS service components:
Dispatcher: It receives an initial URL to download, and then dispatches it to the resolver.
Resolver: This service maintains a list of visited URLs, and checks that each new URL to download is valid and is in the same domain as the first page. It has a hardcoded maximum depth of three levels of links to explore.
Downloader: This service downloads the content of a URL. The content is returned as part of the response message.
Harvester: It examines the received content and harvests new URLs to examine and download. For each of these URLs it sends a notification to any service interested in that information.
Reader: It uses the simple HTML parser I wrote for this project. It can obtain the title of the page, or the body, discarding HTML tags and scripts.
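To make the Resolver and Harvester roles more concrete, here is a minimal sketch in Python. It is not the DSS code from the download: the names `Resolver`, `LinkCollector`, `harvest` and `MAX_DEPTH` are hypothetical, and I am assuming that the start page counts as depth 0 and that "same domain" means the same host name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

MAX_DEPTH = 3  # assumed reading of the hardcoded limit: start page is depth 0


class Resolver:
    """Remembers visited URLs and decides whether a new URL should be downloaded."""

    def __init__(self, start_url):
        # Assumption: "same domain" is taken to mean "same host name".
        self.domain = urlparse(start_url).netloc.lower()
        self.visited = set()

    def accept(self, url, depth):
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https"):
            return False  # not a downloadable web address
        if parsed.netloc.lower() != self.domain:
            return False  # outside the domain of the first page
        if depth > MAX_DEPTH or url in self.visited:
            return False  # too deep, or already seen
        self.visited.add(url)
        return True


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def harvest(base_url, html):
    """Return the absolute URLs linked from a downloaded page."""
    collector = LinkCollector()
    collector.feed(html)
    return [urljoin(base_url, href) for href in collector.hrefs]
```

In the real project these two steps live in separate services that exchange messages; here they are plain functions so the filtering rules are easy to see.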
The VPL Program
There is a VPL program named VPLWebCrawler. It consists of three diagrams. The first one defines the kickoff process. The first URL to download is entered in a dialog window:
The second diagram defines the processing of harvester notifications of new URLs:
The third diagram is a bonus: it processes downloader notifications of new content, extracts the title, and forwards it to a Text to Speech component:
To launch the application, go to the Run -> Start menu. A window appears, prompting you to enter the URL of the page where crawling begins:
Enter a valid URL and the crawling process begins:
After a few seconds, the titles of the downloaded pages are posted to the Text to Speech service, so you can hear the crawling process.
Reading Pages
In another VPL program, VPLWebReader, you can read the content of a web page aloud, using the Text to Speech service:
It is interesting that we are using the same service components as in the previous example: with VPL composition, we can reuse them for another purpose.
You can use it to read my experiments in “Anglish” (Angel’s English) at http://ajlopezen.zoomblog.com.
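The kind of text extraction the Reader service performs (title or body, without tags and scripts) can be sketched in Python as follows. This is only an illustration built on the standard `html.parser` module, not the parser included in the solution; `PageReader` and `read_page` are hypothetical names, and skipping `<style>` content is my own addition.

```python
from html.parser import HTMLParser


class PageReader(HTMLParser):
    """Gathers the title and the visible body text of an HTML page."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.body_parts = []
        self._in_title = False
        self._in_body = False
        self._skip_depth = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # discard script and style content
        if self._in_title:
            self.title_parts.append(data)
        elif self._in_body:
            self.body_parts.append(data)


def read_page(html):
    """Return (title, body_text) with tags removed and whitespace collapsed."""
    reader = PageReader()
    reader.feed(html)
    reader.close()
    title = " ".join("".join(reader.title_parts).split())
    # Text fragments are joined with a space, so a word split by an inline
    # tag would be read as two words; acceptable for a speech-oriented sketch.
    body = " ".join(" ".join(reader.body_parts).split())
    return title, body
```

The resulting plain strings are the sort of input a Text to Speech component needs: no markup, no script code, just the words to read.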
Conclusions
The service components were written to be used with VPL orchestration. They don’t have partners or direct connections with other service components in the project. This is a different way of programming: you must plan the request and response messages that the connections drawn in VPL will carry. The notification feature is a plus: the same outgoing messages can be sent to different target components.
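The idea of components that know nothing about each other, and are wired together from outside, can be illustrated with a small Python sketch. It does not use any DSS or VPL API: the `Service` class, the `new_url` topic and the handlers are all hypothetical, and the wiring code at the bottom plays the role of the VPL diagram.

```python
from collections import defaultdict


class Service:
    """A component that only publishes notifications; it references no other service."""

    def __init__(self, name):
        self.name = name
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        # Called by the orchestration layer, never by the service itself.
        self._subscribers[topic].append(handler)

    def notify(self, topic, message):
        # The same outgoing message goes to every interested target.
        for handler in self._subscribers[topic]:
            handler(message)


# Orchestration: the only place where components are connected.
harvester = Service("harvester")
queued, logged = [], []
harvester.subscribe("new_url", queued.append)
harvester.subscribe("new_url", lambda url: logged.append("found " + url))

harvester.notify("new_url", "http://example.com/a")
```

Changing what happens to a harvested URL means changing only the wiring lines, not the service, which is the property that lets the same components serve both VPL programs.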
You can play a little more: put some of the components on another node or machine, using VPL’s new features.
I hope you’ll find this example useful. I had fun writing it.
Thanks to Fernando Tubio, for his initial ideas for a web crawler implementation.
Angel “Java” Lopez
http://www.ajlopez.com/en