Angel “Java” Lopez on Blog

November 6, 2010

Web Crawler using the new AjAgents

Filed under: .NET, AjAgents, Open Source Projects — ajlopez @ 11:13 am

I wrote a new implementation of my project AjAgents, described in:

AjAgents: a new implementation

In this post, I want to describe how I used it to write (again) a Web Crawler. Previous versions:

Web Crawler Using AjAgents and AjSharp
Distributed Web Crawler using AjMessages

This new version of the web crawler uses the new interface:

    public interface IAgent<T>
    {
        void Post(Action<T> action);
    }

Instead of writing a type that inherits from or implements an agent, the agent is now a wrapper applied to a plain type.
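To make the wrapper idea concrete, here is a minimal sketch of how such an agent could work. It is not the actual AjAgents code: the queue-plus-worker-thread design, the `Counter` class, and the `AgentDemo` helper are my assumptions for illustration. Only the `IAgent<T>` interface comes from the project.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public interface IAgent<T>
{
    void Post(Action<T> action);
}

// Hypothetical minimal agent (an assumption, not the AjAgents implementation):
// posted actions are queued and executed one at a time on a worker thread,
// so the wrapped object is never touched by two threads at once.
public class Agent<T> : IAgent<T>
{
    private readonly T target;
    private readonly BlockingCollection<Action<T>> queue = new BlockingCollection<Action<T>>();

    public Agent(T target)
    {
        this.target = target;
        Thread worker = new Thread(this.Run) { IsBackground = true };
        worker.Start();
    }

    // Returns immediately; the action runs later on the worker thread.
    public void Post(Action<T> action)
    {
        this.queue.Add(action);
    }

    private void Run()
    {
        foreach (Action<T> action in this.queue.GetConsumingEnumerable())
            action(this.target);
    }
}

// A plain type with no knowledge of agents (hypothetical example class).
public class Counter
{
    public int Value { get; private set; }

    public void Increment() { this.Value++; }
}

public static class AgentDemo
{
    public static int Run()
    {
        Counter counter = new Counter();
        Agent<Counter> agent = new Agent<Counter>(counter);

        using (ManualResetEventSlim done = new ManualResetEventSlim())
        {
            for (int k = 0; k < 100; k++)
                agent.Post(c => c.Increment());

            // Actions run in posting order, so this one signals that all
            // the increments before it have finished.
            agent.Post(c => done.Set());
            done.Wait();
        }

        return counter.Value;
    }
}
```

In this sketch the `Counter` type needs no locking, because every call reaches it through the agent's single worker thread.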

The new implementation of AjAgents, and a web crawler example (Project AjAgents.WebCrawler), can be downloaded from:

http://code.google.com/p/ajcodekatas/source/browse/#svn/trunk/AjAgents

One of my previous implementations used a dispatcher.

Now I have dropped the dispatcher. I have three classes: Resolver, Downloader, and Harvester.

The resolver receives a link to download. It decides whether to process it (the link may already have been processed, or it may lie outside the original address being crawled). If the link is accepted, it is sent to the downloader. The downloader fetches the content and then sends it to a harvester (it could be forwarded to other objects, too). The harvester examines the content and extracts new links to process; each one is sent back to the resolver.
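The flow above can be sketched without any agents or network access. This is only an illustration under my own assumptions: the names `ResolverSketch`, `HarvesterSketch`, and `CrawlDemo`, the prefix test, the `href` regular expression, and the in-memory dictionary standing in for the downloader are all hypothetical, not the code in AjAgents.WebCrawler.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch of the resolver's decision (names are assumptions).
public class ResolverSketch
{
    private readonly string prefix;
    private readonly HashSet<string> seen = new HashSet<string>();

    public ResolverSketch(string initialLink)
    {
        this.prefix = initialLink;
    }

    // Accept a link only if it is under the initial address
    // and has not been accepted before.
    public bool Accept(string link)
    {
        if (!link.StartsWith(this.prefix, StringComparison.OrdinalIgnoreCase))
            return false;

        return this.seen.Add(link);
    }
}

// Hypothetical sketch of the harvester: it only finds absolute links
// written as href="..." and ignores relative ones.
public static class HarvesterSketch
{
    private static readonly Regex Href =
        new Regex("href\\s*=\\s*\"([^\"]+)\"", RegexOptions.IgnoreCase);

    public static List<string> ExtractLinks(string content)
    {
        List<string> links = new List<string>();

        foreach (Match match in Href.Matches(content))
            links.Add(match.Groups[1].Value);

        return links;
    }
}

public static class CrawlDemo
{
    public static List<string> Run()
    {
        // A fake web standing in for the downloader, so the sketch runs offline.
        Dictionary<string, string> pages = new Dictionary<string, string>
        {
            { "http://example.test/",
              "<a href=\"http://example.test/a\">A</a> <a href=\"http://other.test/\">X</a>" },
            { "http://example.test/a",
              "<a href=\"http://example.test/\">Home</a> <a href=\"http://example.test/b\">B</a>" }
        };

        ResolverSketch resolver = new ResolverSketch("http://example.test/");
        Queue<string> pending = new Queue<string>();
        List<string> accepted = new List<string>();
        pending.Enqueue("http://example.test/");

        while (pending.Count > 0)
        {
            string link = pending.Dequeue();

            if (!resolver.Accept(link))
                continue;               // already processed, or outside the site

            accepted.Add(link);

            string content;
            if (!pages.TryGetValue(link, out content))
                continue;               // "download" failed: nothing to harvest

            foreach (string found in HarvesterSketch.ExtractLinks(content))
                pending.Enqueue(found); // each new link goes back to the resolver
        }

        return accepted;
    }
}
```

Here the queue plays the role that message posting plays between agents: the harvested links go back to the resolver instead of being followed directly.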

The classes above are plain .NET types, but the references they hold are of type Agent<Downloader>, Agent<Harvester>, and Agent<Resolver>. The code that builds an initial group of interconnected objects/agents is in Program.cs:

        static Agent<Resolver> MakeAgent()
        {
            // Plain objects that do the actual work
            Resolver resolver = new Resolver();
            Harvester harvester = new Harvester();
            Downloader downloader = new Downloader();

            // Wrap each object in an agent
            Agent<Resolver> aresolver = new Agent<Resolver>(resolver);
            Agent<Harvester> aharvester = new Agent<Harvester>(harvester);
            Agent<Downloader> adownloader = new Agent<Downloader>(downloader);

            // Wire the cycle: resolver -> downloader -> harvester -> resolver
            resolver.Downloader = adownloader;
            harvester.Resolver = aresolver;
            downloader.Harvester = aharvester;

            // The resolver agent is the entry point for new links
            return aresolver;
        }

 

The AjAgents.WebCrawler console application accepts one parameter, the initial link:

AjAgents.WebCrawler http://ajlopez.zoomblog.com

Output:

This is a proof-of-concept example. I would like to improve it with:

- An Agent<IDownloader> that load-balances its work across many Agent<Downloader> instances.

- Agent<T> instances created on different machines, implementing a distributed web crawler.
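As a rough idea of what the load-balancing item could look like, here is one possible shape: an object that implements `IAgent<T>` and forwards each posted action to the next inner agent in turn. Everything except `IAgent<T>` is an assumption of mine (`RoundRobinAgent<T>`, `InlineAgent<T>`, `CountingDownloader`, and `BalanceDemo` are hypothetical names), and round-robin is just the simplest policy to show.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public interface IAgent<T>
{
    void Post(Action<T> action);
}

// Stand-in agent for the demo: runs the action immediately on the
// caller's thread (a real agent would queue it instead).
public class InlineAgent<T> : IAgent<T>
{
    private readonly T target;

    public InlineAgent(T target) { this.target = target; }

    public void Post(Action<T> action) { action(this.target); }
}

// Hypothetical balancer: forwards each posted action to the next agent in turn.
public class RoundRobinAgent<T> : IAgent<T>
{
    private readonly IList<IAgent<T>> agents;
    private int next = -1;

    public RoundRobinAgent(IList<IAgent<T>> agents)
    {
        if (agents == null || agents.Count == 0)
            throw new ArgumentException("At least one agent is required", "agents");

        this.agents = agents;
    }

    public void Post(Action<T> action)
    {
        // Interlocked keeps the counter consistent when several threads post;
        // masking the sign bit keeps the index valid if the counter wraps.
        int ticket = Interlocked.Increment(ref this.next);
        int index = (ticket & int.MaxValue) % this.agents.Count;
        this.agents[index].Post(action);
    }
}

// Hypothetical downloader that only counts the requests it receives.
public class CountingDownloader
{
    public int Requests { get; private set; }

    public void Download(string url) { this.Requests++; }
}

public static class BalanceDemo
{
    public static int[] Run()
    {
        CountingDownloader[] workers =
        {
            new CountingDownloader(), new CountingDownloader(), new CountingDownloader()
        };

        List<IAgent<CountingDownloader>> agents = new List<IAgent<CountingDownloader>>();
        foreach (CountingDownloader worker in workers)
            agents.Add(new InlineAgent<CountingDownloader>(worker));

        RoundRobinAgent<CountingDownloader> balancer =
            new RoundRobinAgent<CountingDownloader>(agents);

        for (int k = 0; k < 7; k++)
        {
            string url = "http://example.test/" + k;
            balancer.Post(d => d.Download(url));
        }

        return new[] { workers[0].Requests, workers[1].Requests, workers[2].Requests };
    }
}
```

Because the balancer exposes the same `Post` method, the object holding the reference would not need to know whether it is talking to one downloader or many.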

Stay tuned!

Angel “Java” Lopez

http://www.ajlopez.com

http://twitter.com/ajlopez

