Angel “Java” Lopez on Blog

June 15, 2015

Spark: Links and Resources (1)

Filed under: Distributed Computing, Java, Links, MapReduce, Spark — ajlopez @ 10:09 am

There are two projects named Spark in the Java world: a web framework and a distributed map-reduce engine.

Scalable Machine Learning | edX
https://www.edx.org/course/scalable-machine-learning-uc-berkeleyx-cs190-1x

Spark Programming Guide – Spark 1.3.1 Documentation
https://spark.apache.org/docs/latest/programming-guide.html

Overview – Spark 1.3.1 Documentation
https://spark.apache.org/docs/latest/

firmata/spark
https://github.com/firmata/spark

firmata/protocol
https://github.com/firmata/protocol

Top Best Tools for Java Programmers | Devzum – Its all about Design & Development
http://devzum.com/2015/01/15/10-best-java-tools-that-every-java-programmers-should-know/

Sparkling, A Clojure API for Apache Spark
https://gorillalabs.github.io/sparkling/

Getting Started with Sparkling
https://gorillalabs.github.io/sparkling/articles/getting_started.html

Developing Single Page Web Applications using Java 8, Spark, MongoDB, and AngularJS – OpenShift Blog
https://blog.openshift.com/developing-single-page-web-applications-using-java-8-spark-mongodb-and-angularjs/

Graylog2/spark
https://github.com/Graylog2/spark

Spark Framework – A tiny Java web framework
http://sparkjava.com/

Apache Spark with Scala
http://www.slideshare.net/frodriguezolivera/apache-spark-41601032

yieldbot/flambo
https://github.com/yieldbot/flambo

Windows on Devices
https://www.windowsondevices.com/

MapReduce and Spark | Cloudera VISION
http://vision.cloudera.com/mapreduce-spark/

spark-summit.org/wp-content/uploads/2013/10/Baldeschwieler-SparkSummit2013v2.pdf
http://spark-summit.org/wp-content/uploads/2013/10/Baldeschwieler-SparkSummit2013v2.pdf

Mesosphere · Learn how to use Apache Mesos
http://mesosphere.io/learn/

Got a Minute? Spin up a Spark cluster on your laptop with Docker. | AMPLab – UC Berkeley
https://amplab.cs.berkeley.edu/2013/10/23/got-a-minute-spin-up-a-spark-cluster-on-your-laptop-with-docker/

Spark Streaming Programming Guide – Spark 0.7.3 Documentation
http://spark.incubator.apache.org/docs/0.7.3/streaming-programming-guide.html

Spark: Open Source Superstar Rewrites Future of Big Data | Wired Enterprise | Wired.com
http://www.wired.com/wiredenterprise/2013/06/yahoo-amazon-amplab-spark/

Spark | Lightning-Fast Cluster Computing
http://spark-project.org/

Learning Spark – O’Reilly Media
http://shop.oreilly.com/product/0636920028512.do

http://www.cs.berkeley.edu/~matei/papers/2011/tr_spark.pdf

My Links
http://delicious.com/ajlopez/spark

Stay tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez

June 13, 2015

Computer History: Links And Resources (17)

Filed under: Computer History, Links — ajlopez @ 8:21 pm

Happy 100th birthday Hedy Lamarr, the inventor who made the wireless internet possible
http://unasinnott.com/happy-100th-birthday-hedy-lamarr-inventor-wireless-internet/

The Hypercard Legacy
https://medium.com/@blprnt/the-hypercard-legacy-e5b9eb273b6a

Oracle founder Larry Ellison resigns after 35 years as CEO | Technology | theguardian.com
http://www.theguardian.com/technology/2014/sep/18/larry-ellison-oracle-billionaire-resigns-ceo

Object Oriented Programming: A Critical Approach
https://www.udemy.com/blog/object-oriented-programming-a-critical-approach/

(507) What is the most complex/significant program created by a single programmer? – Quora
http://www.quora.com/What-is-the-most-complex-significant-program-created-by-a-single-programmer

(496) Who is considered to be the best programmer of all time? – Quora
http://www.quora.com/Who-is-considered-to-be-the-best-programmer-of-all-time

Best Business Books To Read Fall 2014 – Business Insider
http://www.businessinsider.com/best-business-books-to-read-fall-2014-2014-8?op=1

Randy Pausch – Wikipedia, la enciclopedia libre
http://es.wikipedia.org/wiki/Randy_Pausch

(492) What is a Lisp machine and what is so great about them? – Quora
http://www.quora.com/What-is-a-lisp-machine-and-what-is-so-great-about-them

Why Should I Care What Color the Bikeshed Is?
http://bikeshed.org/

GoRuCo 2014 – What We Can Learn From COBOL by Andrew Turley – YouTube
https://www.youtube.com/watch?v=sB9_hVO9Cik

Twitter ‘Buy Now’ Button Appears for First Time
http://mashable.com/2014/06/30/twitter-buy-now-button/

Waterfall | Martín Alaimo | Agile Coach & Trainer
http://www.martinalaimo.com/es/blog/waterfall-la-historia-detras-del-error

AKKA 5 Year Anniversary
http://typesafe.com/akka-five-year-anniversary

The Mouse Trap: Raising Lazarus – The 20 Year Old Bug that Went to Mars
http://blog.securitymouse.com/2014/06/raising-lazarus-20-year-old-bug-that.html

Guido van Rossum on the History of Python – YouTube
https://www.youtube.com/watch?v=ugqu10JV7dk

Google Glass and the Future of Technology – NYTimes.com
http://pogue.blogs.nytimes.com/2012/09/13/google-glass-and-the-future-of-technology/

My Links
http://delicious.com/ajlopez/computerhistory

Stay tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez

June 9, 2015

Computer History: Links And Resources (16)

Filed under: Computer History, Links — ajlopez @ 8:55 pm

Talking with Mikel Evins about the Lisp-based Newton OS from Apple
http://lispm.de/lisp-based-newton-os

I, Cringely The Decline and Fall of IBM – I, Cringely
http://www.cringely.com/2014/06/04/decline-fall-ibm/

You won’t believe how old TDD is | Arialdo Martini
http://arialdomartini.wordpress.com/2012/07/20/you-wont-believe-how-old-tdd-is/

20 Years of Beowulf Workshop Issues Call for Papers – insideHPC
http://insidehpc.com/2014/05/22/20-years-beowulf-workshop-issues-call-papers/

The Golden Age of Basic – IEEE Spectrum
http://spectrum.ieee.org/tech-talk/computing/software/the-golden-age-of-basic

General Magic – Wikipedia, the free encyclopedia
http://en.wikipedia.org/wiki/General_Magic

Scientific computing’s future: Can any coding language top a 1950s behemoth? | Ars Technica
http://arstechnica.com/science/2014/05/scientific-computings-future-can-any-coding-language-top-a-1950s-behemoth/

Injected, Inspected, Detected, Infected, Neglected and Selected | The Weekly Squeak
http://news.squeak.org/2014/04/29/injected-inspected-detected-infected-neglected-and-selected/

The Origins of Linux – Linus Torvalds – YouTube
https://www.youtube.com/watch?v=WVTWCPoUt8w

xerox :: parc :: techReports :: CSL-79-13 WFS A Simple Shared File System for a Distributed Environment : Free Download & Streaming : Internet Archive
https://archive.org/details/bitsavers_xeroxparctSASimpleSharedFileSystemforaDistributedE_1512814

Preliminary discussion of the logical design of an electronic computing instrument
http://www.cs.unc.edu/~adyilie/comp265/vonNeumann.html

In less than a decade, phone memory cards have grown from 128MB to 128GB | The Verge
http://www.theverge.com/2014/2/24/5441898/in-less-than-a-decade-phone-memory-cards-have-grown-from-128mb-to-128gb

Let Over Lambda
http://letoverlambda.com/index.cl

Un argentino ayudó a nacer al lenguaje Basic – 03.05.2014 – lanacion.com
http://www.lanacion.com.ar/1687116-un-argentino-ayudo-a-nacer-al-lenguaje-basic

How Steve Wozniak Wrote BASIC for the Original Apple From Scratch
http://gizmodo.com/how-steve-wozniak-wrote-basic-for-the-original-apple-fr-1570573636

Happy 50th Birthday, BASIC! – The Visual Basic Team – Site Home – MSDN Blogs
http://blogs.msdn.com/b/vbteam/archive/2014/05/01/happy-50th-birthday-basic.aspx

My Links
http://delicious.com/ajlopez/computerhistory

Stay tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez

June 6, 2015

Docker: Links, News And Resources (2)

Filed under: DevOps, Docker, Links — ajlopez @ 8:29 pm

3 reasons for using Docker in your open cloud architecture – Thoughts on Cloud
http://thoughtsoncloud.com/2014/12/3-keys-incorporating-docker-open-cloud-architecture/

Initial thoughts on the Rocket announcement | Docker Blog
http://blog.docker.com/2014/12/initial-thoughts-on-the-rocket-announcement/

CoreOS is building a container runtime, Rocket
https://coreos.com/blog/rocket/

Why Docker and CoreOS’ split was predictable – Daniel With Music
http://danielcompton.net/2014/12/02/modular-integrated-docker-coreos

What is Tutum? | Tutum Blog
http://blog.tutum.co/2014/09/25/what-is-tutum/

Gartner Panel Reveals Stark Differences in Container Based PaaS Options | Pivotal P.O.V.
http://pivotalblogus.wpengine.com/cloud-foundry-pivotal/p-o-v/gartner-panel-reveals-stark-differences-in-container-based-paas-options

Docker Hosting – Run Docker Containers in the Cloud – Tutum
https://www.tutum.co/

Tutum | CrunchBase
http://www.crunchbase.com/organization/tutum

Tutum, a startup building on Docker’s hip tech, reels in $2.6M | VentureBeat | Deals | by Jordan Novet
http://venturebeat.com/2014/08/19/tutum-a-startup-building-on-dockers-hip-tech-reels-in-2-6m/

Docker: Present and Future
http://www.infoq.com/articles/docker-future

Technophilia: Distrubuted JMeter testing using Docker
http://srivaths.blogspot.com.ar/2014/08/distrubuted-jmeter-testing-using-docker.html

The beginner’s guide to Docker | Application virtualization – InfoWorld
http://www.infoworld.com/t/application-virtualization/the-beginners-guide-docker-248867

Getting started with Docker | Opensource.com
http://opensource.com/business/14/7/guide-docker

Systemd: Harbinger of the Linux apocalypse
http://www.infoworld.com/print/248436

How to Docker Workshop, Step-by-Step
http://blog.harbur.io/docker-workshop/

Docker: The first true devops tool? | Application development – InfoWorld
http://www.infoworld.com/t/application-development/docker-the-first-true-devops-tool-247170

The March Towards Go – Zef.me
http://zef.me/6191/the-march-towards-go

Docker Gets Ignore, Auto-Pauses Containers On Commit
http://www.infoq.com/news/2014/07/docker-ignore-pause

My Links
https://delicious.com/ajlopez/docker

Stay tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez

June 4, 2015

New Month’s Resolutions: June 2015

Filed under: .NET, C Sharp, Java, JavaScript, Liqueed, NodeJs, Open Source Projects — ajlopez @ 9:48 am

It’s time to review my May resolutions and write the new ones:

– Improve ClojJS [pending]
– Add NPM support to ClojJS [pending]
– Write posts about JavaScript and Artificial Intelligence [pending]
– Give a talk about Meteor [partial]
– Prepare a talk about Clojure or ClojureScript [partial]
– Improve BScript [pending]
– Improve AjErl, distributed features [partial] see repo
– Improve Liqueed Project, kudos features [complete] see repo

I also worked on:

– Start SparkSharp, Apache Spark-like in C# [complete] see repo
– Improve SharpMongo, MongoDB-like in C# [complete] see repo
– Improve OStore, object store in memory, JavaScript/NodeJS [complete] see repo
– Improve PythonSharp, Python interpreter in C# [complete] see repo
– Improve RedPython, compile Python to C using JavaScript/NodeJS [complete] see repo

My new month’s resolutions:

– Give a talk about Meteor
– Write posts about JavaScript and Artificial Intelligence
– Improve ClojJS
– Add NPM support to ClojJS
– Continue work on SparkSharp
– Start Message project in C#, a la Apache Camel
– Improve Liqueed Project, kudos features
– Improve Templie, template engine in Java
– Improve BScript, Basic-like interpreter in C#
– Continue work on OStore
– Continue work on SharpMongo

More fun is coming ;-)

Stay tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez

June 3, 2015

Computer History: Links And Resources (15)

Filed under: Computer History, Links — ajlopez @ 10:27 am

Celebrating 50 years of BASIC
http://www.theguardian.com/education/2014/apr/30/celebrating-50-years-of-basic

Smalltalk (2013/1/31) – YouTube
https://www.youtube.com/watch?v=UbesO7wN1T0

PARC Movies – YouTube
https://www.youtube.com/watch?v=aqW6Sp279Z0

Before Silicon Valley got nasty, the Pirates of Analog Alley fought it out | Ars Technica
http://arstechnica.com/information-technology/2014/04/before-silicon-valley-got-nasty-the-pirates-of-analog-alley-fought-it-out/

Visual Programming Languages – Snapshots
http://blog.interfacevision.com/design/design-visual-progarmming-languages-snapshots/

The Birth & Death of JavaScript — Destroy All Software Talks
https://www.destroyallsoftware.com/talks/the-birth-and-death-of-javascript

Memories of Steve
http://donmelton.com/2014/04/10/memories-of-steve/

RailsConf 09: Robert Martin, “What Killed Smalltalk Could K – YouTube
https://www.youtube.com/watch?v=YX3iRjKj7C0

Monad Manifesto
http://www.jsnover.com/Docs/MonadManifesto.pdf

The Man Behind the Google Brain: Andrew Ng and the Quest for the New AI | Enterprise | WIRED
http://www.wired.com/2013/05/neuro-artificial-intelligence/all/

The Oberon-07 language report is 17 pages
http://www.hokstad.com/oberon

50 years of BASIC: Celebrating the programming language’s long, eventful life
http://www.networkworld.com/community/blog/50-years-basic-celebrating-resilient-programming-language

Xv6, a simple Unix-like teaching operating system
http://pdos.csail.mit.edu/6.828/2012/xv6.html

Objects are Just Objects, Aren’t they? – Rick DeNatale – Ruby Conference 2010
http://www.confreaks.com/videos/461-rubyconf2010-objects-are-just-objects-aren-t-they

How I Came to Write D | Dr Dobb’s
http://www.drdobbs.com/architecture-and-design/how-i-came-to-write-d/240165322

BBC News – Half-century milestone for IBM mainframes
http://www.bbc.com/news/technology-26886579

S-expression – Wikipedia, the free encyclopedia
http://en.wikipedia.org/wiki/S-expression

Visual Smalltalk Brochure – a set on Flickr
https://www.flickr.com/photos/garduino/sets/72157643437028004/

My Links
http://delicious.com/ajlopez/computerhistory

Stay tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez

June 1, 2015

Computer History: Links And Resources (14)

Filed under: Computer History, Links — ajlopez @ 10:05 am

Contributions of Individual Programming Languages to Software Development | Javalobby
http://java.dzone.com/articles/contributions-individual

The Art of Atari: A celebration of game packaging’s golden age | Polygon
http://www.polygon.com/2014/3/26/5482198/the-art-of-atari-a-celebration-of-game-packagings-golden-age

Conway’s Game Of Life in APL – YouTube
https://www.youtube.com/watch?v=a9xAKttWgP4

Why Language Designers Tolerate Undefined Behavior | Dr Dobb’s
http://www.drdobbs.com/cpp/why-language-designers-tolerate-undefine/240165466

Microsoft makes source code for MS-DOS and Word for Windows available to public – The Official Microsoft Blog – Site Home – TechNet Blogs
http://blogs.technet.com/b/microsoft_blog/archive/2014/03/25/microsoft-makes-source-code-for-ms-dos-and-word-for-windows-available-to-public.aspx

Emails From Eric Schmidt And Sergey Brin On Hiring Apple Workers – Business Insider
http://www.businessinsider.com/emails-eric-schmidt-sergey-brin-hiring-apple-2014-3

Systems Past: the only 8 software innovations we actually use – Technical Journal
http://davidad.github.io/blog/2014/03/12/the-operating-system-is-out-of-date/

Self: The Movie; – YouTube
https://www.youtube.com/watch?v=Ox5P7QyL774#t=13

Ruby Turns 21: 5 Major Milestones Of The Programming Language – ReadWrite
http://readwrite.com/2014/02/24/ruby-21-anniversary-milestones-programming-language-rails#awesm=~owUqyXpGHejhfW

Douglas Lenat
http://www.princeton.edu/~achaney/tmve/wiki100k/docs/Douglas_Lenat.html

artificial intelligence – How To Design Eurisko – Stack Overflow
http://stackoverflow.com/questions/2524129/how-to-design-eurisko

Eurisko, The Computer With A Mind Of Its Own | Alicia Patterson Foundation
http://aliciapatterson.org/stories/eurisko-computer-mind-its-own

EURISKO – Lesswrongwiki
http://wiki.lesswrong.com/wiki/EURISKO

http://www.aaai.org/Papers/AAAI/1980/AAAI80-047.pdf
Eurisko

Let’s reimplement EURISKO! – Less Wrong
http://lesswrong.com/lw/10g/lets_reimplement_eurisko/

Eurisko – Wikipedia, the free encyclopedia
http://en.wikipedia.org/wiki/Eurisko

James Iry’s history of programming languages (illustrated with pictures and large fonts) | The Quick Word
http://thequickword.wordpress.com/2014/02/16/james-irys-history-of-programming-languages-illustrated-with-pictures-and-large-fonts/

Robot Odyssey: The hardest computer game of all time.
http://www.slate.com/articles/technology/bitwise/2014/01/robot_odyssey_the_hardest_computer_game_of_all_time.html

_why: La historia de un genio post-moderno
http://examplelab.com.ar/_why-una-historia-de-un-genio-post-moderno/

Apple ‘1984’ Commercial Is 30 Years Old Today (VIDEO)
http://www.huffingtonpost.co.uk/2014/01/22/apple-1984-macintosh-commercial-30-years_n_4642573.html

My Links
http://delicious.com/ajlopez/computerhistory

Stay tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez

May 30, 2015

Computer History: Links And Resources (13)

Filed under: Computer History, Links — ajlopez @ 10:02 am

Remembering Apple’s “1984” Super Bowl ad – O Say Can You See?
http://blog.americanhistory.si.edu/osaycanyousee/2014/01/remembering-apples-1984-super-bowl-ad.html

Starting January 24, 2014
http://danbricklin.com/log/2014_01_24.htm#bcs1984

The Mac Turns 30
http://mashable.com/2014/01/24/apple-mac-turns-30/

ANNOUNCEMENT: The Scala Programming Language
http://article.gmane.org/gmane.comp.lang.scala/17

Rasmus Lerdorf: A look at PHP 5.4 and 5.5 – YouTube
http://www.youtube.com/watch?v=pzKrEmO8nEA

hypercard.org – Open Source HyperCard-related stuff
http://hypercard.org/

25 years of HyperCard—the missing link to the Web | Ars Technica
http://arstechnica.com/apple/2012/05/25-years-of-hypercard-the-missing-link-to-the-web/

Origins of Common UI Symbols
https://readymag.com/shuffle/ui-symbols/

Screen shots of computer code: Photo
http://moviecode.tumblr.com/image/72085828946

When Apple drops the ball: the gear that flopped | News | TechRadar
http://www.techradar.com/news/computing/apple/when-apple-drops-the-ball-the-gear-that-flopped-1208618

Hyperland
http://vimeo.com/72501076

Alan Turing, Enigma Code-Breaker and Computer Pioneer, Wins Royal Pardon – NYTimes.com
http://www.nytimes.com/2013/12/24/world/europe/alan-turing-enigma-code-breaker-and-computer-pioneer-wins-royal-pardon.html?_r=0

launch – Wikipedia’s forgotten founder Larry Sanger
http://launch.wistia.com/medias/hvv319hf81

WHAT IS FORTH
http://www.angelfire.com/in/zydenbos/WhatisForth.html

PHP Manual Masterpieces
http://phpmanualmasterpieces.tumblr.com/post/70257636397/im-crying-literally-crying-actual-tears-in-my

ComputerHistory – YouTube
http://www.youtube.com/user/ComputerHistory?feature=watch

Oral History of Adele Goldberg – YouTube
http://www.youtube.com/watch?v=IGNiH85PLVg

ALTO-Smalltalk-72
http://lively-web.org/users/Dan/ALTO-Smalltalk-72.html

blog dds: 2013.12.11 – The Birth of Standard Error
http://www.spinellis.gr/blog/20131211/

Grace Hopper: Google doodle for creator of Cobol computer language – Mirror Online
http://www.mirror.co.uk/news/technology-science/technology/grace-hopper-google-doodle-creator-2907172

Fast and Dynamic
http://www.infoq.com/presentations/dynamic-performance

DEFCON 19 (2011) – The History and Evolution of Computer Viruses – YouTube
http://www.youtube.com/watch?v=yswPIwDFYDY

My Links
http://delicious.com/ajlopez/computerhistory

Stay tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez

May 28, 2015

SparkSharp, Spark in C# (2) Implementing Map and Reduce

Filed under: .NET, C Sharp, Open Source Projects, Spark, SparkSharp — ajlopez @ 10:15 am

The project repo is at:

https://github.com/ajlopez/SparkSharp

I’m using a TDD (Test-Driven Development) workflow to write the application, so the code evolves with the use cases I add as tests. I write only the code needed to pass the tests; when I have new use cases, I will add new functionality. As an example, the original project has a Spark Context, with factory methods to create Datasets. I don’t need that class yet, so in the current code the Datasets are created as public objects using the new operator.

Born after a refactor, the abstract class for all Datasets is BaseDataset. Partial code:

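// Base class for all datasets: concrete subclasses supply the elements,
// while Map and Reduce are implemented once here.
public abstract class BaseDataset<T> : IEnumerable<T>
{
    // Implemented by each concrete dataset.
    public abstract IEnumerable<T> Elements { get; }

    // Returns a new dataset whose elements are produced lazily by applying map.
    public BaseDataset<S> Map<S>(Func<T, S> map)
    {
        return new EnumDataset<S>(this.ApplyMap(map));
    }

    // Folds all elements into a single value. The accumulator starts at
    // default(S): 0 for numeric types, null for reference types.
    public S Reduce<S>(Func<S, T, S> reduce)
    {
        S result = default(S);

        foreach (var elem in this)
            result = reduce(result, elem);

        return result;
    }

    // ...

    // Iterator block: each element is mapped only when the consumer asks for it.
    private IEnumerable<S> ApplyMap<S>(Func<T, S> map)
    {
        foreach (var elem in this)
            yield return map(elem);
    }

    // ...
}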

The enumeration of the dataset elements must be implemented by each concrete subclass. The implementations of Map and Reduce are general, valid for all datasets, so those methods are defined in the abstract class. Thanks to C#, they can receive a lambda or any other Func delegate (a function).

In the ApplyMap method I’m using the C# yield return statement to return an element, suspending the execution of the foreach. The loop resumes when the consumer needs the next element of the enumerable collection. In this way, the generation of the elements is lazy: each element is produced only on demand. A note: C# has lambdas and delegates, and they are examples of good and useful features added to a programming language. In contrast, the Java world has Scala which, in my opinion, is a bit “too much”. I prefer the evolution of C# over that of Scala.
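To make the lazy behavior visible, here is a minimal standalone sketch. It is not code from the SparkSharp repo: the LazyMapDemo class, its Trace list and the FirstIncremented method are hypothetical names introduced only for illustration, and ApplyMap is rewritten as a static method so the example compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class, not part of SparkSharp.
public static class LazyMapDemo
{
    // Records each source element at the moment the mapping function runs.
    public static readonly List<int> Trace = new List<int>();

    // Same shape as the ApplyMap method: an iterator block that yields mapped elements.
    public static IEnumerable<S> ApplyMap<T, S>(IEnumerable<T> source, Func<T, S> map)
    {
        foreach (var elem in source)
            yield return map(elem);
    }

    // Asks for a single element and returns it.
    public static int FirstIncremented()
    {
        Trace.Clear();

        // Calling ApplyMap runs no mapping code yet; it only creates the iterator.
        var mapped = ApplyMap(new[] { 1, 2, 3 }, i => { Trace.Add(i); return i + 1; });

        using (var enumerator = mapped.GetEnumerator())
        {
            // The iterator body runs up to the first yield return, then suspends.
            enumerator.MoveNext();
            return enumerator.Current;
        }
    }
}
```

After calling LazyMapDemo.FirstIncremented(), the result is 2 and Trace contains only the element 1: the elements 2 and 3 were never mapped, because nobody asked for them.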

There are no tests using the abstract class directly (it was born from a refactor), but there are tests on the concrete ones. Here is a test of the Map method using EnumDataset (a Dataset that is a wrapper around an IEnumerable collection):

[TestMethod]
public void MapIncrement()
{
    EnumDataset<int> ds = new EnumDataset<int>(new int[] { 1, 2, 3 });
    BaseDataset<int> mapds = ds.Map(i => i + 1);
    var enumerator = mapds.GetEnumerator();

    for (int k = 1; enumerator.MoveNext(); k++)
        Assert.AreEqual(k + 1, enumerator.Current);

    Assert.AreEqual(3, mapds.Count());
}

And a Reduce test:

[TestMethod]
public void ReduceSum()
{
    EnumDataset<int> ds = new EnumDataset<int>(new int[] { 1, 2, 3 });
    var result = ds.Reduce<int>((x, y) => x + y);

    Assert.IsNotNull(result);
    Assert.AreEqual(6, result);
}
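A side note on this test: it works because the Reduce shown earlier starts its accumulator at default(S), which is 0 for int, the neutral value for a sum. For other operations (a product, for example) that starting value would give a wrong result. The sketch below illustrates the point with standalone static methods. ReduceSketch and ReduceFrom are hypothetical names, and the seeded variant is only an idea, not something that exists in SparkSharp.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, not part of SparkSharp.
public static class ReduceSketch
{
    // Mirrors the Reduce shown in the post: the accumulator starts at default(S).
    public static S Reduce<T, S>(IEnumerable<T> items, Func<S, T, S> reduce)
    {
        return ReduceFrom(items, default(S), reduce);
    }

    // Assumed variant: the caller supplies the initial accumulator value.
    public static S ReduceFrom<T, S>(IEnumerable<T> items, S seed, Func<S, T, S> reduce)
    {
        S result = seed;

        foreach (var item in items)
            result = reduce(result, item);

        return result;
    }
}
```

With these definitions, ReduceSketch.Reduce<int, int>(new[] { 1, 2, 3 }, (x, y) => x * y) returns 0, because the product starts from 0, while ReduceSketch.ReduceFrom(new[] { 1, 2, 3 }, 1, (x, y) => x * y) returns 6.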

Next topics: more BaseDataset methods, concrete classes, datasets with keys, etc…

Stay tuned!

Angel “Java” Lopez

http://www.ajlopez.com

http://twitter.com/ajlopez

May 23, 2015

SparkSharp, Spark in C# (1) First Ideas

Filed under: .NET, C Sharp, Open Source Projects, Spark, SparkSharp — ajlopez @ 11:20 pm

Recently, I visited the Apache Spark project:

https://spark.apache.org/

And started to think about implementing some of its ideas in C#.

The original project has Datasets that can be consumed item by item and processed by methods like map and reduce. A dataset can read a text file, either local or distributed. The jobs that run over datasets, applying transformations, can be launched on many distributed nodes (I should review how the results are consolidated).

I started a new C# project:

https://github.com/ajlopez/SparkSharp

To me, it is important to start with small steps, using a TDD (Test-Driven Development) workflow. So, in my first commits, I wrote datasets that implement IEnumerable. They have methods like Map, Reduce, Split, Take, Skip. Those methods were implemented by first writing the tests that express the expected API and behavior.

A dataset can be a simple wrapper around any IEnumerable, or it can read a text file line by line.
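As a rough illustration of the text file case, a dataset of lines could look like the sketch below. The class name TextLinesDataset is hypothetical (the real SparkSharp class may be named and structured differently), and the sketch implements IEnumerable directly instead of deriving from the project's base class, so that it stands alone.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.IO;

// Hypothetical class name; a sketch of a dataset whose elements are the lines of a file.
public class TextLinesDataset : IEnumerable<string>
{
    private readonly string path;

    public TextLinesDataset(string path)
    {
        this.path = path;
    }

    public IEnumerator<string> GetEnumerator()
    {
        // File.ReadLines streams the file line by line instead of loading it whole,
        // and opens the file again on each new enumeration.
        return File.ReadLines(this.path).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }
}
```

Because the class implements IEnumerable, it can be consumed with a plain foreach, for example foreach (var line in new TextLinesDataset("data.txt")), where data.txt is any existing text file.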

All these datasets are local: they reside on the same machine. My idea is to implement a dataset wrapper that exposes the dataset content to other machines, and to write a client wrapper that runs on each machine. The client wrapper looks like a regular dataset, but when the client program needs the next item of the dataset, that item comes from the original remote machine.

The remote dataset gives the next item to any client. Each item is delivered only to ONE client. So, the items can be consumed and processed by n remote clients, without having an item processed twice.
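The “each item goes to only ONE client” rule can be sketched locally, without any networking, as a shared enumerator protected by a lock. This is only an assumption about how such a server side might work, not the SparkSharp implementation. ItemDispenser and TryTake are hypothetical names, and threads stand in for the remote clients.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of SparkSharp: hands out each item exactly once.
public sealed class ItemDispenser<T> : IDisposable
{
    private readonly IEnumerator<T> source;
    private readonly object sync = new object();

    public ItemDispenser(IEnumerable<T> items)
    {
        this.source = items.GetEnumerator();
    }

    // Returns false when the dataset is exhausted. The lock makes MoveNext and
    // Current one atomic step, so two callers can never receive the same item.
    public bool TryTake(out T item)
    {
        lock (this.sync)
        {
            if (this.source.MoveNext())
            {
                item = this.source.Current;
                return true;
            }
        }

        item = default(T);
        return false;
    }

    public void Dispose()
    {
        this.source.Dispose();
    }
}
```

Each consumer simply loops with while (dispenser.TryTake(out item)) and processes what it receives. In a real distributed version, the call to TryTake would be replaced by a request over the network, with the item serialized on the way.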

To implement such a server/client pair, I need serialization/deserialization of an arbitrary type T. I will use my previous work in AjErl and Aktores to get that feature. Using TDD, I can assert the expected behavior of the serialization/deserialization process. If, in the future, I have a better idea for that process, like using a robust external open source serialization library, all the TDD tests will help me make the switch.
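As an illustration of the kind of round trip those tests would assert, here is a minimal sketch. It does not use the AjErl or Aktores code mentioned above; it swaps in the standard DataContractSerializer from System.Runtime.Serialization, and the WireFormat class name is hypothetical.

```csharp
using System.IO;
using System.Runtime.Serialization;

// Hypothetical helper: serializes a value of type T to bytes and back.
public static class WireFormat
{
    public static byte[] Serialize<T>(T value)
    {
        var serializer = new DataContractSerializer(typeof(T));

        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, value);
            return stream.ToArray();
        }
    }

    public static T Deserialize<T>(byte[] data)
    {
        var serializer = new DataContractSerializer(typeof(T));

        using (var stream = new MemoryStream(data))
        {
            return (T)serializer.ReadObject(stream);
        }
    }
}
```

A TDD-style test would then assert that WireFormat.Deserialize<int>(WireFormat.Serialize(42)) gives back 42, and the same for strings and other item types. If the serializer is replaced later, the same assertions still describe the expected behavior.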

But, baby steps. Next steps: improve the current local datasets, maybe add a new variant of dataset, and write keyed datasets, created using a MapToKey method (still to be implemented).

Stay tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez
