MapReduce: Links, News and Resources (1)

Two of my favorite topics in programming are algorithms and distributed computing. You can have both with MapReduce. These are some of my links (thanks to @asehmi for his help; he sent me some of them).

http://en.wikipedia.org/wiki/MapReduce

MapReduce is a software framework introduced by Google in 2004 to support distributed computing on large data sets on clusters of computers.[1] Parts of the framework are patented in some countries.[2]

The framework is inspired by the map and reduce functions commonly used in functional programming,[3] although their purpose in the MapReduce framework is not the same as their original forms.[4]

MapReduce libraries have been written in C++, C#, Erlang, Java, OCaml, Perl, Python, PHP, Ruby, F#, R and other programming languages.
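
To make the description above concrete, here is a minimal single-process sketch of the model in Python: a word count, the usual first example. It only illustrates the map, group-by-key and reduce steps; it is not how Google's framework or any of the libraries below are implemented. The function names (map_phase, shuffle_phase, reduce_phase, word_count) are my own, chosen for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key (a real framework does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all the values collected for one key into a single result."""
    return key, reduce(lambda a, b: a + b, values)


def word_count(documents):
    """Run the three steps in sequence, in one process, over a list of strings."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
```

In a real cluster the map calls run in parallel on different machines, and the grouping step moves data across the network so that each reducer sees every value for its keys. The sketch keeps only the data flow.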

MapReduce: Simplified Data Processing on Large Clusters
http://labs.google.com/papers/mapreduce.html

Parallel Processing Using the Map Reduce Programming Model
http://blog.diskodev.com/parallel-processing-using-the-map-reduce-prog

Graph Twiddling in a MapReduce World
http://www.computer.org/portal/web/csdl/doi/10.1109/MCSE.2009.120

Cloud9: a MapReduce library for Hadoop
http://www.umiacs.umd.edu/~jimmylin/cloud9/docs/index.html

MapSharp
http://mapsharp.codeplex.com/
An implementation of Map-Reduce in C#

Twister: iterative MapReduce
http://www.iterativemapreduce.org/

MySpace Qizmt – MySpace’s Open Source Mapreduce Framework
http://code.google.com/p/qizmt/

Cascading
http://www.cascading.org/
Cascading is a Data Processing API, Process Planner, and Process Scheduler used for defining and executing complex, scale-free, and fault tolerant data processing workflows on an Apache Hadoop cluster. All without having to ‘think’ in MapReduce.

Project Daytona – Microsoft Research
http://research.microsoft.com/en-us/projects/azure/daytona.aspx
Iterative MapReduce on Windows Azure

InfoQ: Introduction to Oozie
http://www.infoq.com/articles/introductionOozie
Combine multiple Map/Reduce jobs into a logical unit of work

InfoQ: Ville Tuulos on Big Data and Map/Reduce in Erlang and Python with Disco
http://www.infoq.com/interviews/tuulos-erlang-mapreduce

Spark Cluster Computing Framework
http://www.spark-project.org/

Preview of Storm: The Hadoop of Realtime Processing – BackType Technology
http://tech.backtype.com/preview-of-storm-the-hadoop-of-realtime-proce

Hadoop in Azure – Distributed Development – Site Home – MSDN Blogs
http://blogs.msdn.com/b/mariok/archive/2011/05/11/hadoop-in-azure.aspx

MapReduce: A Soft Introduction
http://www.javacodegeeks.com/2011/05/mapreduce-soft-introduction.html

Mapreduce & Hadoop Algorithms in Academic Papers
http://atbrox.com/2011/05/16/mapreduce-hadoop-algorithms-in-academic-papers-4th-update-may-2011/

MSDN Magazine: MapReduce in F# – Parsing Log Files with F#, MapReduce and Windows Azure
http://msdn.microsoft.com/en-us/magazine/gg983490.aspx

F#: With a few lines of code entered into the powershell and analyze gigabytes of cloud data! – Systems, architecture and engineering solutions!
http://blogs.msdn.com/b/socal-sam/archive/2011/04/26/f-with-a-few-lines-of-code-entered-into-the-powershell-and-analyze-gigabytes-of-cloud-data.aspx

Data-Intensive Text Processing with MapReduce
http://www.umiacs.umd.edu/~jimmylin/MapReduce-book-final.pdf

The Geomblog: Workshop on Parallelism, and a "breakthrough" in combinatorial geometry
http://geomblog.blogspot.com/2010/11/workshop-on-parallelism-and.html

Pragmatic Programming Techniques: Designing algorithms for Map Reduce
http://horicky.blogspot.com/2010/08/designing-algorithmis-for-map-reduce.html

Mapreduce and Hadoop Algorithms in Bioinformatics Papers | Abhishek Tiwari
http://www.abhishek-tiwari.com/2010/08/mapreduce-and-hadoop-algorithms-in-bioinformatics-papers.html

Pragmatic Programming Techniques: Map/Reduce to recommend people connection
http://horicky.blogspot.com/2010/08/mapreduce-to-recommend-people.html

High Scalability – Dremel: Interactive Analysis of Web-Scale Datasets – Data as a Programming Paradigm
http://highscalability.com/blog/2010/8/4/dremel-interactive-analysis-of-web-scale-datasets-data-as-a.html

Tutorial: MapReduce with Riak « myNoSQL
http://nosql.mypopescu.com/post/849130434/tutorial-mapreduce-with-riak

High Scalability – How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data

Pregel
http://portal.acm.org/citation.cfm?id=1807167.1807184

Pregel: Google’s other data-processing infrastructure | Scalable web architectures
http://www.royans.net/arch/pregel-googles-other-data-processing-infrastructure/

Apache Mahout – Overview
http://mahout.apache.org/
The Apache Mahout™ machine learning library’s goal is to build scalable machine learning libraries.

InfoQ: Billy Newport Discusses Parallel Programming in Java
http://www.infoq.com/interviews/billy-newport-parallel

Sector/Sphere: High Performance Distributed Data Storage and Processing
http://sector.sourceforge.net/

MapReduce – The Fanfiction « Snail in a Turtleneck
http://www.snailinaturtleneck.com/blog/2010/03/15/mapreduce-the-fanfiction/

Map / Reduce – A visual explanation
http://ayende.com/Blog/archive/2010/03/14/map-reduce-ndash-a-visual-explanation.aspx

An Introduction to JavaScript Map/Reduce in Riak on Vimeo
http://vimeo.com/9188550

Graph algorithms (and MapReduce)
http://www.umiacs.umd.edu/~jimmylin/cloud-2010-Spring/session5-slides.pdf

Using MapReduce Functionality To Process Data
http://freemakelove.info/http:/freemakelove.info/html/y2010/2145_using-mapreduce-functionality-to-process-data.html

My Links
http://www.delicious.com/ajlopez/mapreduce

More links about Hadoop and other systems are coming.

Stay tuned!

Angel "MapReduced" Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez
