Spark: Links and Resources (1)

There are two projects named Spark in Java: a web framework and a distributed map-reduce runner.

Scalable Machine Learning | edX
https://www.edx.org/course/scalable-machine-learning-uc-berkeleyx-cs190-1x

Spark Programming Guide – Spark 1.3.1 Documentation
https://spark.apache.org/docs/latest/programming-guide.html

Overview – Spark 1.3.1 Documentation
https://spark.apache.org/docs/latest/

firmata/spark
https://github.com/firmata/spark

firmata/protocol
https://github.com/firmata/protocol

Top Best Tools for Java Programmers | Devzum – Its all about Design & Development
http://devzum.com/2015/01/15/10-best-java-tools-that-every-java-programmers-should-know/

Sparkling, A Clojure API for Apache Spark
https://gorillalabs.github.io/sparkling/

Getting Started with Sparkling
https://gorillalabs.github.io/sparkling/articles/getting_started.html

Developing Single Page Web Applications using Java 8, Spark, MongoDB, and AngularJS – OpenShift Blog
https://blog.openshift.com/developing-single-page-web-applications-using-java-8-spark-mongodb-and-angularjs/

Graylog2/spark
https://github.com/Graylog2/spark

Spark Framework – A tiny Java web framework
http://sparkjava.com/

Apache Spark with Scala
http://www.slideshare.net/frodriguezolivera/apache-spark-41601032

yieldbot/flambo
https://github.com/yieldbot/flambo

Windows on Devices
https://www.windowsondevices.com/

MapReduce and Spark | Cloudera VISION
http://vision.cloudera.com/mapreduce-spark/

spark-summit.org/wp-content/uploads/2013/10/Baldeschwieler-SparkSummit2013v2.pdf
http://spark-summit.org/wp-content/uploads/2013/10/Baldeschwieler-SparkSummit2013v2.pdf

Mesosphere · Learn how to use Apache Mesos
http://mesosphere.io/learn/

Got a Minute? Spin up a Spark cluster on your laptop with Docker. | AMPLab – UC Berkeley
https://amplab.cs.berkeley.edu/2013/10/23/got-a-minute-spin-up-a-spark-cluster-on-your-laptop-with-docker/

Spark Streaming Programming Guide – Spark 0.7.3 Documentation
http://spark.incubator.apache.org/docs/0.7.3/streaming-programming-guide.html

Spark: Open Source Superstar Rewrites Future of Big Data | Wired Enterprise | Wired.com
http://www.wired.com/wiredenterprise/2013/06/yahoo-amazon-amplab-spark/

Spark | Lightning-Fast Cluster Computing
http://spark-project.org/

Learning Spark – O’Reilly Media
http://shop.oreilly.com/product/0636920028512.do

http://www.cs.berkeley.edu/~matei/papers/2011/tr_spark.pdf
http://www.cs.berkeley.edu/~matei/papers/2011/tr_spark.pdf

My Links
http://delicious.com/ajlopez/spark

Stay tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez

SharpDoop: Implementing Map Reduce in C# (1) The Project

I was experimenting with map-reduce in Node.js / JavaScript. It is also interesting to implement the algorithm in C#, both to practice TDD (Test-Driven Development) and to learn more about what the implementation takes. The project I started is:

https://github.com/ajlopez/SharpDoop

The current status:

It has a class library project and a test project. For now I have in mind something that is not yet distributed: all I want is to specify the map and reduce steps of the algorithm and run them in the same process.

There is a MapReduceJob class, the base class responsible for running a map (a lambda function) and a reduce (which takes a key and the list of values for that key, and outputs the result). With C#, these two operations can be expressed as lambdas, or as delegates to methods of a more complex object. For now, I cover the simple cases in the tests, and all of them are passing.
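To make the shape of such a job concrete, here is a minimal in-process sketch. It is written in JavaScript, in line with the Node.js experiments mentioned above, and it is not the SharpDoop API: `runMapReduce` and the `emit` callback are hypothetical names chosen for illustration.

```javascript
// Minimal in-process map/reduce sketch (hypothetical names, not the SharpDoop API).
// map(key, value, emit) emits intermediate (key, value) pairs;
// reduce(key, values) returns the result for one key.
function runMapReduce(input, map, reduce) {
    const groups = new Map();

    // Map phase: collect every emitted value under its key.
    for (const [key, value] of Object.entries(input)) {
        map(key, value, (k, v) => {
            if (!groups.has(k)) groups.set(k, []);
            groups.get(k).push(v);
        });
    }

    // Reduce phase: one call per key, with all the values for that key.
    const result = {};
    for (const [k, values] of groups) {
        result[k] = reduce(k, values);
    }
    return result;
}

// Usage: word count over two small documents.
const counts = runMapReduce(
    { doc1: 'a b a', doc2: 'b c' },
    (key, text, emit) => text.split(' ').forEach(word => emit(word, 1)),
    (word, ones) => ones.reduce((sum, n) => sum + n, 0)
);
console.log(counts);
```

Both operations are passed as plain functions here, which plays the role that lambdas or delegates play in C#.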

I also put together a MapProcessJob, a variant of map/reduce that deserves discussion. Instead of processing all the keys first and then rolling them into the reduce phase, it tries to do everything together. In some use cases it may be more efficient, but that is an issue to discuss in more detail.
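One possible reading of this map/process idea, sketched in JavaScript under my own assumptions: each emitted value is folded into the running result for its key as soon as it is produced, so no list of values per key is kept. The names `runMapProcess` and `step` are hypothetical, and the actual MapProcessJob may work differently.

```javascript
// Hypothetical map/process sketch: values are folded as they are emitted.
// step(key, value, current) returns the new accumulated result for key;
// current is undefined the first time a key is seen.
function runMapProcess(input, map, step) {
    const result = new Map();

    for (const [key, value] of Object.entries(input)) {
        map(key, value, (k, v) => {
            result.set(k, step(k, v, result.get(k)));
        });
    }

    return Object.fromEntries(result);
}

// Usage: word count without building a list of ones per word.
const totals = runMapProcess(
    { doc1: 'a b a', doc2: 'b c' },
    (key, text, emit) => text.split(' ').forEach(word => emit(word, 1)),
    (word, n, current) => (current === undefined ? 0 : current) + n
);
console.log(totals);
```

In this reading, memory grows with the number of distinct keys rather than with the number of emitted values. The trade-off is that it only fits reductions that can be computed incrementally, such as sums, counts, or maximums.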

Upcoming topics: a more in-depth review, and implementation ideas such as map/process.

Stay tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez

Scalability: Links, News And Resources (6)

What Does Your Webserver Do When a User Hits Refresh? — Ecommerce Blog by Shopify
http://www.shopify.com/technology/7535298-what-does-your-webserver-do-when-a-user-hits-refresh#axzz2O1D5ZhXG

VMware vFabric GemFire: High Performance Data Management for Cloud-Based Applications
http://www.vmware.com/products/application-platform/vfabric-gemfire

Indeed Engineering Blog » Blog Archive » From 1 to 1 Billion: Evolution of a Document Serving System
http://engineering.indeed.com/blog/2013/03/from-1-to-1-billion-part-1/

Scaling Node.js Applications | Colin J. Ihrig’s Blog
http://cjihrig.com/blog/scaling-node-js-applications/

Facebook kisses DRAM goodbye, builds memcached for flash — Tech News and Analysis
http://gigaom.com/2013/03/05/facebook-kisses-dram-goodbye-builds-memcached-for-flash/

Splout SQL
http://sploutsql.com/
MapReduce, using Pangool, from Spain

Pomelo home
http://pomelo.netease.com/

NetEase/pomelo · GitHub
https://github.com/NetEase/pomelo
Game development with Node.js

Scaling Facebook Engineering
http://www.infoq.com/presentations/Scaling-Facebook-Engineering

Needle in a haystack: efficient storage of billions of photos
https://www.facebook.com/note.php?note_id=76191543919

Fully Loaded Node – A Node.JS Holiday Season, part 2 ✩ Mozilla Hacks – the Web developer blog
https://hacks.mozilla.org/2012/11/fully-loaded-node-a-node-js-holiday-season-part-2/

Structure:Data | GigaOM Events
http://event.gigaom.com/structuredata/

My Links
http://delicious.com/ajlopez/scalability

Keep tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez

Code Katas in JavaScript/Node.js using TDD

These past weeks, I have been working on JavaScript/Node.js modules, using TDD at each step. Practice, practice, practice: the journey to mastery.

You can follow my progress by reviewing the commits I made at each new test. This is a summary of that work:

CobolScript: An implementation of COBOL as a compiler to JavaScript, with console program samples, dynamic web pages, and access to Node.js modules. See my posts, and the web sample that uses MySQL and SimpleWeb.

SimplePipes: A way to define message passing, using ‘pipes’ to connect different defined nodes/functions. I want to extend it to support distributed processing.

SimpleBoggle: A Boggle solver; it is better than me! See the console sample.

SimpleMemolap: Multidimensional OLAP-like processing, with an in-memory model and a SimpleWeb site. See the sample.

SimpleChess: Work in progress. It defines a board using SimpleBoard and makes moves. I’m working on SimpleGo, too, to have a board, game, and evaluators.

SimpleRules: A forward-chaining rule engine. I should add rule compilation to JavaScript. The engine works a la Rete-2, detecting changes in the current state and triggering actions.

SimpleScript: My simple language, compiled to JavaScript. See my posts. WIP.

Py2Script: Python language compiler to JavaScript, first step. WIP.

SimpleWeb: web middleware, a la Connect, with web sample.

BasicScript: My first steps to compile Basic to JavaScript. I want to use it to program and compile a game.

SimplePermissions: Today’s code kata. It implements subjects, roles, and permissions, granted by context.

SimpleFunc: Serialization of functions.

SimpleMapReduce: Exploring the implementation of a Map-Reduce algorithm.

SimpleTuring: Turing machine implementation.

Cellular: Cellular automata implementation, including a Game of Life console sample.
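For readers who have not met the Game of Life mentioned in the Cellular entry, one generation can be sketched as follows. This is an illustration under my own assumptions (a bounded grid where cells outside the edges count as dead, and a hypothetical `nextGeneration` helper), not the Cellular module's API.

```javascript
// One Game of Life generation on a bounded grid (cells outside the grid count as dead).
// grid is an array of rows; 1 = alive, 0 = dead. Hypothetical helper, not the Cellular API.
function nextGeneration(grid) {
    const rows = grid.length;
    const cols = grid[0].length;

    // Count the live cells among the (up to) eight neighbours of (r, c).
    const liveNeighbours = (r, c) => {
        let count = 0;
        for (let dr = -1; dr <= 1; dr++) {
            for (let dc = -1; dc <= 1; dc++) {
                if (dr === 0 && dc === 0) continue;
                const rr = r + dr;
                const cc = c + dc;
                if (rr >= 0 && rr < rows && cc >= 0 && cc < cols) count += grid[rr][cc];
            }
        }
        return count;
    };

    // A cell is alive next if it has exactly 3 live neighbours,
    // or if it is alive now and has exactly 2.
    return grid.map((row, r) => row.map((cell, c) => {
        const n = liveNeighbours(r, c);
        return (n === 3 || (cell === 1 && n === 2)) ? 1 : 0;
    }));
}

// Usage: a horizontal "blinker" becomes vertical after one generation.
const blinker = [
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0]
];
const flipped = nextGeneration(blinker);
console.log(flipped.map(row => row.join('')).join('\n'));
```

A function from one grid to the next, with no mutation of the input, is also convenient to cover with TDD-style tests: each known pattern becomes one small test.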

I will work on:

NodeDelicious: To retrieve my links from my Delicious account, now that the site has been revamped and no longer has pagination.

SimpleDatabase: An in-memory database; maybe I will add file persistence.

SimpleSudoku: Rewrite of my AjSudoku solver, from scratch.

I’m having a lot of fun, as usual 😉

Keep tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez

New Month’s Resolutions: January 2013

The first month of a new year! I was busy coding a lot. It’s time to review last month’s resolutions:

– Work on PythonSharp [pending]
– Work on AjTalk for C# [complete] see repo and see my posts
– Give a Node.js course [complete] Spanish post
– Start .md pages Java tutorial [pending]

Additionally, I was working on:

– Start Py2Script Python to JavaScript compiler [complete] see repo
– Update my Node.js samples [complete] see repo
– Start and publish version 0.0.1 of SimpleWeb, my middleware layer [complete] see repo
– Start BasicScript [complete] see repo
– Start and publish version 0.0.1 of CobolScript [complete] see repo and see my posts
– Update AjConsorSite [complete] see repo
– Start Inmob [complete] see repo

For this new month, these are my new resolutions (some are already started):

– Start SimpleScript
– Start SimpleBoard
– Start SimpleChess
– Start SimpleGo
– Start and publish a version of SimpleMapReduce, with local and distributed sample
– Start and publish a version of SimpleFunc, serialization of objects with functions
– Start Memolap, C# in-memory multidimensional OLAP-like library and sample
– Start SimpleMemolap, the same but in JavaScript/Node.js
– Start SimpleRules, forward-chaining rule engine, that compiles to JavaScript

A lot of fun! 😉

Keep tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez