Angel “Java” Lopez on Blog

July 10, 2013

Scalability: Links, News And Resources (7)

Filed under: Links, Scalability, Software Architecture, Software Development — ajlopez @ 3:47 pm

What Does Your Webserver Do When a User Hits Refresh? — Ecommerce Blog by Shopify
http://www.shopify.com/technology/7535298-what-does-your-webserver-do-when-a-user-hits-refresh#axzz2O1D5ZhXG

VMware vFabric GemFire: High Performance Data Management for Cloud-Based Applications
http://www.vmware.com/products/application-platform/vfabric-gemfire

Indeed Engineering Blog » Blog Archive » From 1 to 1 Billion: Evolution of a Document Serving System
http://engineering.indeed.com/blog/2013/03/from-1-to-1-billion-part-1/

Scaling Node.js Applications | Colin J. Ihrig’s Blog
http://cjihrig.com/blog/scaling-node-js-applications/

Facebook kisses DRAM goodbye, builds memcached for flash — Tech News and Analysis
http://gigaom.com/2013/03/05/facebook-kisses-dram-goodbye-builds-memcached-for-flash/

Splout SQL
http://sploutsql.com/
MapReduce, using Pangool, from Spain

Pomelo home
http://pomelo.netease.com/

NetEase/pomelo · GitHub
https://github.com/NetEase/pomelo
Game development with Node.js

Scaling Facebook Engineering
http://www.infoq.com/presentations/Scaling-Facebook-Engineering

Needle in a haystack: efficient storage of billions of photos
https://www.facebook.com/note.php?note_id=76191543919

Fully Loaded Node – A Node.JS Holiday Season, part 2 ✩ Mozilla Hacks – the Web developer blog
https://hacks.mozilla.org/2012/11/fully-loaded-node-a-node-js-holiday-season-part-2/

Structure:Data | GigaOM Events
http://event.gigaom.com/structuredata/

My Links
http://delicious.com/ajlopez/scalability

Keep tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez

July 4, 2013

Scalability: Links, News And Resources (6)

What Does Your Webserver Do When a User Hits Refresh? — Ecommerce Blog by Shopify
http://www.shopify.com/technology/7535298-what-does-your-webserver-do-when-a-user-hits-refresh#axzz2O1D5ZhXG

VMware vFabric GemFire: High Performance Data Management for Cloud-Based Applications
http://www.vmware.com/products/application-platform/vfabric-gemfire

Indeed Engineering Blog » Blog Archive » From 1 to 1 Billion: Evolution of a Document Serving System
http://engineering.indeed.com/blog/2013/03/from-1-to-1-billion-part-1/

Scaling Node.js Applications | Colin J. Ihrig’s Blog
http://cjihrig.com/blog/scaling-node-js-applications/

Facebook kisses DRAM goodbye, builds memcached for flash — Tech News and Analysis
http://gigaom.com/2013/03/05/facebook-kisses-dram-goodbye-builds-memcached-for-flash/

Splout SQL
http://sploutsql.com/
MapReduce, using Pangool, from Spain

Pomelo home
http://pomelo.netease.com/

NetEase/pomelo · GitHub
https://github.com/NetEase/pomelo
Game development with Node.js

Scaling Facebook Engineering
http://www.infoq.com/presentations/Scaling-Facebook-Engineering

Needle in a haystack: efficient storage of billions of photos
https://www.facebook.com/note.php?note_id=76191543919

Fully Loaded Node – A Node.JS Holiday Season, part 2 ✩ Mozilla Hacks – the Web developer blog
https://hacks.mozilla.org/2012/11/fully-loaded-node-a-node-js-holiday-season-part-2/

Structure:Data | GigaOM Events
http://event.gigaom.com/structuredata/

My Links
http://delicious.com/ajlopez/scalability

Keep tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez

July 2, 2013

Scalability: Links, News And Resources (5)

Filed under: Links, Scalability, Software Architecture, Software Development — ajlopez @ 3:53 pm

Twitter’s programmers speed Hadoop development | Big Data – InfoWorld
http://www.infoworld.com/d/big-data/twitters-programmers-speed-hadoop-development-211931

Cross-Post: Windows Azure SQL Database and SQL Server — Performance and Scalability Compared and Contrasted – Windows Azure – Site Home – MSDN Blogs
http://blogs.msdn.com/b/windowsazure/archive/2013/02/01/cross-post-windows-azure-sql-database-and-sql-server-performance-and-scalability-compared-and-contrasted.aspx

Building A Node.JS Server That Won’t Melt – A Node.JS Holiday Season, part 5 ✩ Mozilla Hacks – the Web developer blog
https://hacks.mozilla.org/2013/01/building-a-node-js-server-that-wont-melt-a-node-js-holiday-season-part-5/

Disk-Locality in Datacenter Computing Considered Irrelevant
http://www.cs.berkeley.edu/~ganesha/talks/disk-irrelevant.pdf

bitly/nsq
https://github.com/bitly/nsq
realtime distributed message processing at scale

High Scalability – Switch your databases to Flash storage. Now. Or you’re doing it wrong.
http://highscalability.com/blog/2012/12/10/switch-your-databases-to-flash-storage-now-or-youre-doing-it.html

Gangnam Ons S4 Recording on 2012-11-07 1410-Vimeo1 +6db on Vimeo
http://vimeo.com/53261709

4store – Scalable RDF storage
http://4store.org/

Amazon Redshift
http://aws.amazon.com/redshift/
petabyte-scale data warehouse service in the cloud

High Scalability – Gone Fishin': Tumblr Architecture – 15 Billion Page Views A Month And Harder To Scale Than Twitter
http://highscalability.com/blog/2012/11/19/gone-fishin-tumblr-architecture-15-billion-page-views-a-mont.html?87125f76=t

Expanding the Cloud – Announcing Amazon Redshift, a Petabyte-scale Data Warehouse Service – All Things Distributed
http://www.allthingsdistributed.com/2012/11/amazon-redshift.html

High Scalability – BigData using Erlang, C and Lisp to Fight the Tsunami of Mobile Data
http://highscalability.com/blog/2012/11/26/bigdata-using-erlang-c-and-lisp-to-fight-the-tsunami-of-mobi.html

Facebook News Feed: Social Data at Scale
http://www.infoq.com/presentations/Facebook-News-Feed

SQLFire: Scalable SQL instead of NoSQL
http://www.infoq.com/presentations/SQLFire-Scalable-SQL-instead-of-NoSQL

How to Scale Your Start-up | Inc. 5000
http://www.inc.com/karl-and-bill/how-to-scale-your-start-up.html

When the Nerds Go Marching In – Alexis C. Madrigal – The Atlantic
http://www.theatlantic.com/technology/archive/2012/11/when-the-nerds-go-marching-in/265325/
How a dream team of engineers from Facebook, Twitter, and Google built the software that drove Barack Obama’s reelection

Scaling Software with Akka
http://www.infoq.com/presentations/Scalability-Akka

Erlang Scales … Do You?
http://www.infoq.com/presentations/Erlang-Scalability

RethinkDB: An open-source distributed database built with love over three years | Hacker News
http://news.ycombinator.com/item?id=4763879

Twitter survives election after Ruby-to-Java move • The Register
http://www.theregister.co.uk/2012/11/08/twitter_epic_traffic_saved_by_java/

Cases – Two Screen – Angry Bytes
http://two-screen.tv/cases/

Windows Azure’s Flat Network Storage and 2012 Scalability Targets – Windows Azure – Site Home – MSDN Blogs
http://blogs.msdn.com/b/windowsazure/archive/2012/11/02/windows-azure-s-flat-network-storage-and-2012-scalability-targets.aspx

The Startup Hangover: Supporting 15M Users
http://www.infoq.com/presentations/Scallability-SoundCloud

My Links
http://delicious.com/ajlopez/scalability

Keep tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez

June 6, 2013

Scalability: Links, News And Resources (4)

Filed under: Links, Scalability, Software Architecture, Software Development — ajlopez @ 9:37 am

Scaling with MongoDB
http://www.slideshare.net/mongodb/scaling-4868170

Big Data @ Foursquare: Slides from our recent talk
http://engineering.foursquare.com/2011/03/24/big-data-foursquare-slides-from-our-recent-talk/

Fun with MongoDB replica sets
http://engineering.foursquare.com/2011/05/24/fun-with-mongodb-replica-sets/

Scaling Rails
http://railslab.newrelic.com/scaling-rails

Voldemort is a distributed key-value storage system
http://www.project-voldemort.com/voldemort/

USING HAPROXY FOR MYSQL FAILOVER AND REDUNDANCY
http://www.alexwilliams.ca/blog/2009/08/10/using-haproxy-for-mysql-failover-and-redundancy/

Amazon’s Dynamo
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html

Sharding & IDs at Instagram
http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram

HAProxy
http://haproxy.1wt.eu/
The Reliable, High Performance TCP/HTTP Load Balancer

Ken Little on Scaling Tumblr
http://www.infoq.com/interviews/little-scaling-tumblr
Ken Little talks about scaling Tumblr to keep up with their blogging users: scaling the data model, sharding, their PHP frontend and the Scala backend, and much more.

The 4 Building Blocks Of Architecting Systems For Scale
http://highscalability.com/blog/2012/9/19/the-4-building-blocks-of-architecting-systems-for-scale.html

nsisodiya / Demo-Scalable-App
https://github.com/nsisodiya/Demo-Scalable-App/
This is small demo of Scalable JavaScript Application

Scaling to Millions of Simultaneous Connections: Rick Reed
http://vimeo.com/44312354

Drill
http://wiki.apache.org/incubator/DrillProposal
Drill is a distributed system for interactive analysis of large-scale datasets, inspired by Google’s Dremel.

NuoDB
http://www.nuodb.com/

An Easy Way to Build Scalable Network Programs
http://blog.nodejs.org/2011/10/04/an-easy-way-to-build-scalable-network-programs/

Memcached
http://memcached.org/

Redis Virtual Memory: the story and the code
http://oldblog.antirez.com/post/redis-virtual-memory-story.html

Improving Web Site Performance and Scalability While Saving Money
http://channel9.msdn.com/Events/aspConf/aspConf/Improving-Web-Site-Performance-and-Scalability-While-Saving-Money

Scalable JavaScript Design Patterns
http://www.slideshare.net/AddyOsmani/scalable-javascript-design-patterns

Cinchcast Architecture – Producing 1,500 Hours Of Audio Every Day
http://highscalability.com/blog/2012/7/16/cinchcast-architecture-producing-1500-hours-of-audio-every-d.html

The Netflix Simian Army
http://techblog.netflix.com/2011/07/netflix-simian-army.html

C Is For Compute – Google Compute Engine (GCE)
http://highscalability.com/blog/2012/7/2/c-is-for-compute-google-compute-engine-gce.html

My Links
http://delicious.com/ajlopez/scalability

Keep tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez

April 13, 2013

Scalability: Links, News And Resources (3)

Filed under: Links, Scalability, Software Architecture, Software Development — ajlopez @ 6:24 pm

The C10K problem
http://www.kegel.com/c10k.html
It’s time for web servers to handle ten thousand clients simultaneously

The Underlying Technology of Messages at Facebook
https://www.facebook.com/notes/facebook-engineering/the-underlying-technology-of-messages/454991608919

Memcached
http://memcached.org/
Free & open source, high-performance, distributed memory object caching system

An Easy Way to Build Scalable Network Programs
http://blog.nodejs.org/2011/10/04/an-easy-way-to-build-scalable-network-programs/

NuoDB
http://www.nuodb.com/

How to beat the CAP theorem
http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html
The CAP theorem states a database cannot guarantee consistency, availability, and partition-tolerance at the same time.

The LMAX Architecture
http://martinfowler.com/articles/lmax.html
LMAX is a new retail financial trading platform. As a result it has to process many trades with low latency. The system is built on the JVM platform and centers on a Business Logic Processor that can handle 6 million orders per second on a single thread.

Redis Virtual Memory: the story and the code
http://antirez.com/post/redis-virtual-memory-story.html

Improving Web Site Performance and Scalability While Saving Money
http://channel9.msdn.com/Events/aspConf/aspConf/Improving-Web-Site-Performance-and-Scalability-While-Saving-Money

Scalable JavaScript Design Patterns
http://www.slideshare.net/AddyOsmani/scalable-javascript-design-patterns

Cinchcast Architecture – Producing 1,500 Hours Of Audio Every Day
http://highscalability.com/blog/2012/7/16/cinchcast-architecture-producing-1500-hours-of-audio-every-d.html

The Netflix Simian Army
http://techblog.netflix.com/2011/07/netflix-simian-army.html

C Is For Compute – Google Compute Engine (GCE)
http://highscalability.com/blog/2012/7/2/c-is-for-compute-google-compute-engine-gce.html

How we got rid of the database
http://lostechies.com/gabrielschenker/2012/06/20/how-we-got-rid-of-the-databasepart-4/

Improving performance on twitter.com
http://engineering.twitter.com/2012/05/improving-performance-on-twittercom.html

Against the Grain: How We Built the Next Generation Online Travel Agency using Amazon, Clojure, and a Comically Small Team
http://www.colinsteele.org/post/23103789647/against-the-grain-aws-clojure-startup

Building a Website To Scale
http://www.youtube.com/watch?v=RlkCdM_f3p4&feature=g-all-u

Vert.x vs node.js simple HTTP benchmarks
http://vertxproject.wordpress.com/2012/05/09/vert-x-vs-node-js-simple-http-benchmarks/

vert.x
http://vertx.io/
Effortless asynchronous application development for the modern web and enterprise

Introducing Resque
https://github.com/blog/542-introducing-resque
Resque is our Redis-backed library for creating background jobs, placing those jobs on multiple queues, and processing them later.
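
The idea in that description (create jobs, place them on multiple queues, process them later) can be sketched in a few lines. This is only an illustration in Python using in-memory queues; it is not Resque's API (Resque is a Ruby library backed by Redis), and the queue names and the helpers `enqueue` and `worker` are made up for the example.

```python
import queue

# Hypothetical queue names; a real system would keep these in a shared store.
queues = {"high": queue.Queue(), "low": queue.Queue()}
results = []


def enqueue(queue_name, func, *args):
    """Place a job (a function plus its arguments) on the named queue."""
    queues[queue_name].put((func, args))


def worker(priority_order):
    """Run queued jobs, always checking higher-priority queues first."""
    while True:
        for name in priority_order:
            try:
                func, args = queues[name].get_nowait()
            except queue.Empty:
                continue  # nothing here, try the next queue
            results.append(func(*args))
            break  # restart from the highest-priority queue
        else:
            return  # every queue was empty, so the worker stops
```

Jobs are enqueued when the request arrives, for example `enqueue("low", str.upper, "resize image")`, and a later call to `worker(["high", "low"])` processes them. In a real deployment the worker would usually run in a separate process.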

How to take advantage of Redis just adding it to your stack
http://antirez.com/post/take-advantage-of-redis-adding-it-to-your-stack.html

PubSub with Redis and Akka Actors
http://debasishg.blogspot.com.ar/2010/04/pubsub-with-redis-and-akka-actors.html

MagLev
http://maglev.github.com/
The MagLev VM takes full advantage of GemStone/S JIT to native code performance, distributed shared cache, fully ACID transactions, and enterprise class NoSQL data management capabilities to provide a robust and durable programming platform.

My Links
http://delicious.com/ajlopez/scalability

Keep tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez

April 4, 2013

Scalability: Links, News And Resources (2)

Filed under: Links, Scalability, Software Architecture, Software Development — ajlopez @ 5:00 pm

Programming and Scaling
https://www.tele-task.de/archive/lecture/overview/5819/

Spain Scalability Group
https://sites.google.com/site/spainscalabilitygroup/

The Instagram Architecture Facebook Bought For A Cool Billion Dollars
http://highscalability.com/blog/2012/4/9/the-instagram-architecture-facebook-bought-for-a-cool-billio.html

Just how big are porn sites?
http://www.extremetech.com/computing/123929-just-how-big-are-porn-sites

Scalability at YouTube
http://www.youtube.com/watch?v=G-lGCC4KKok

7 Years Of YouTube Scalability Lessons In 30 Minutes
http://highscalability.com/blog/2012/3/26/7-years-of-youtube-scalability-lessons-in-30-minutes.html

NoSQL Data Modeling Techniques
http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/

Akka 2.0: Scalability of Fork Join Pool
http://letitcrash.com/post/17607272336/scalability-of-fork-join-pool

Scaling Erlang
http://inaka.net/blog/2011/10/07/scale-test-plan-simple-erlang-application/

Tumblr Architecture – 15 Billion Page Views A Month And Harder To Scale Than Twitter
http://highscalability.com/blog/2012/2/13/tumblr-architecture-15-billion-page-views-a-month-and-harder.html

How Facebook pushes new code live
http://agilewarrior.wordpress.com/2011/05/28/how-facebook-pushes-new-code-live/

Can Simplicity Scale?
http://blog.regehr.org/archives/663

Amazon DynamoDB – a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications
http://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html

Implementing Scalable HA Architectures with Spring Integration
http://www.infoq.com/presentations/Implementing-HA-Architectures-Spring-Integration

Arquitectura de un buscador vertical escalable con Hadoop
http://www.datasalt.es/2011/10/arquitectura-de-un-buscador-vertical-escalable-con-hadoop/

What is Zing? A Scalable, Elastic, High-Performance Java Virtual Machine (JVM)
http://www.azulsystems.com/products/zing/whatisit

Azul Making Java “Zing”
http://java.dzone.com/articles/azul-making-java-zing

Autoscaling with Enterprise Library Integration Pack for Windows Azure
http://blogs.msdn.com/b/jdom/archive/2011/12/02/autoscaling-with-enterprise-library-integration-pack-for-windows-azure.aspx

DataSift Using MySQL, HBase, Memcached to Deal With Twitter Firehose
http://nosql.mypopescu.com/post/13540746376/datasift-using-mysql-hbase-memcached-to-deal-with

DataSift Architecture: Realtime Datamining At 120,000 Tweets Per Second
http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html

Scaling at Gowalla: Databases & NoSQL
http://engineering.gowalla.com/2011/11/17/scaling-and-gowalla/

Microsoft drops Dryad; puts its big-data bets on Hadoop
http://www.zdnet.com/blog/microsoft/microsoft-drops-dryad-puts-its-big-data-bets-on-hadoop/11226

How StackOverflow Scales with SQL Server (Video)
http://www.brentozar.com/archive/2011/11/how-stackoverflow-scales-sql-server-video/

Scaling Isomorphic Javascript Code
http://blog.nodejitsu.com/scaling-isomorphic-javascript-code
Javascript is now an isomorphic language. By isomorphic we mean that any given line of code (with notable exceptions) can execute both on the client and the server.

One million!
http://blog.whatsapp.com/index.php/2011/09/one-million/

Building Scalable Systems: an Asynchronous Approach
http://www.infoq.com/presentations/Building-Scalable-Systems-Asynchronous-Approach

jdegoes / blueeyes
https://github.com/jdegoes/blueeyes
A lightweight Web 3.0 framework for Scala, featuring a purely asynchronous architecture, extremely high-performance, massive scalability, high usability, and a functional, composable design.

NOSQL Patterns
http://cloud.dzone.com/news/nosql-patterns

Wikimedia Architecture
http://highscalability.com/wikimedia-architecture

Stuff The Internet Says On Scalability For August 5, 2011
http://highscalability.com/blog/2011/8/5/stuff-the-internet-says-on-scalability-for-august-5-2011.html

Keep tuned!

My Links
http://delicious.com/ajlopez/scalability

August 15, 2011

Hadoop: Links, News and Resources (1)

Filed under: Distributed Computing, Open Source Projects, Scalability — ajlopez @ 9:54 am

After my posts with links about Scalability and MapReduce, it’s time to share my links about Hadoop (thanks to @asehmi for his links):

http://hadoop.apache.org/

The Apache™ Hadoop™ project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

The project includes these subprojects:

Other Hadoop-related projects at Apache include:

  • Avro™: A data serialization system.
  • Cassandra™: A scalable multi-master database with no single points of failure.
  • Chukwa™: A data collection system for managing large distributed systems.
  • HBase™: A scalable, distributed database that supports structured data storage for large tables.
  • Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
  • Mahout™: A Scalable machine learning and data mining library.
  • Pig™: A high-level data-flow language and execution framework for parallel computation.
  • ZooKeeper™: A high-performance coordination service for distributed applications.
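
The "simple programming model" in the description above is MapReduce, the topic of the earlier link posts. Here is a minimal single-process sketch of the idea in Python, counting words. It is not Hadoop's Java API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for the example; on a cluster the framework runs the map and reduce steps on many machines and does the grouping in between.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

For example, `word_count(["big data", "Big clusters"])` counts "big" twice and the other words once.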

http://wiki.apache.org/hadoop/

Papers – Hadoop Wiki
http://wiki.apache.org/hadoop/Papers

HDFS
http://hadoopblog.blogspot.com/

Realtime Hadoop usage at Facebook — Part 1
http://hadoopblog.blogspot.com/2011/05/realtime-hadoop-usage-at-facebook-part.html

HDFS: Realtime Hadoop usage at Facebook — Part 2 – Workload Types
http://hadoopblog.blogspot.com/2011/05/realtime-hadoop-usage-at-facebook-part_28.html

The top five most powerful Hadoop projects – SD Times: Software Development News
http://www.sdtimes.com/l/35596

How to Deploy a Hadoop Cluster on Windows Azure – Windows Azure
http://blogs.msdn.com/b/windowsazure/archive/2011/05/17/how-to-deploy-a-hadoop-cluster-on-windows-azure.aspx

Hadoop in Azure – Distributed Development
http://blogs.msdn.com/b/mariok/archive/2011/05/11/hadoop-in-azure.aspx

Radoop – It’s Like Yahoo Pipes for Hadoop | SiliconANGLE
http://siliconangle.com/blog/2011/08/11/radoop-its-like-yahoo-pipes-for-hadoop/?

Introduction to MapReduce and Hadoop
http://www.theserverside.com/discussions/thread.tss?thread_id=62376

Mapreduce & Hadoop Algorithms in Academic Papers (4th update – May 2011)
http://atbrox.com/2011/05/16/mapreduce-hadoop-algorithms-in-academic-papers-4th-update-may-2011/

Interning at Facebook: Bridging Marketing and Engineering
http://www.facebook.com/note.php?note_id=10150254305343920

High Performance Computing: Understanding What is Hadoop
http://patodirahul.blogspot.com/2011/03/understanding-what-is-hadoop.html

Microsoft adds Hadoop support to SQL Server, data warehouse
http://www.tmcnet.com/usubmit/2011/08/10/5696037.htm

Parallel Data Warehouse News and Hadoop Interoperability Plans – SQL Server Team Blog
http://blogs.technet.com/b/dataplatforminsider/archive/2011/08/08/parallel-data-warehouse-news-and-hadoop-interoperability-plans.aspx

Cascading
http://www.cascading.org/
Cascading is a Data Processing API, Process Planner, and Process Scheduler used for defining and executing complex, scale-free, and fault tolerant data processing workflows on an Apache Hadoop cluster. All without having to ‘think’ in MapReduce.

Twitter Engineering: A Storm is coming: more details and plans for release
http://engineering.twitter.com/2011/08/storm-is-coming-more-details-and-plans.html
"A Storm cluster is superficially similar to a Hadoop cluster"

Preview of Storm: The Hadoop of Realtime Processing – BackType Technology
http://tech.backtype.com/preview-of-storm-the-hadoop-of-realtime-proce

Mesos: Dynamic Resource Sharing for Clusters
http://www.mesosproject.org/
Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, MPI, Hypertable, Spark (a new framework for low-latency interactive and iterative jobs), and other applications.

Big Analytics for Big Data on Hadoop
http://karmasphere.com/

More about Big Data
http://www.bigdata.com/bigdata/blog
Good white papers

Hadoop Summit 2010 – Yahoo! Developer Network
http://developer.yahoo.com/events/hadoopsummit2010/

DBMS Musings: Hadoop’s tremendous inefficiency on graph data management (and how to avoid it)
http://dbmsmusings.blogspot.com/2011/07/hadoops-tremendous-inefficiency-on.html

Hoop – Hadoop HDFS over HTTP | Apache Hadoop for the Enterprise | Cloudera
http://www.cloudera.com/blog/2011/07/hoop-hadoop-hdfs-over-http/

Bioinformatics and the Future of Hadoop
http://www.genomeweb.com/blog/bioinformatics-and-future-hadoop

Seven Java projects that changed the world – O’Reilly Radar
http://radar.oreilly.com/2011/07/7-java-projects.html

InfoQ: Introduction to Oozie
http://www.infoq.com/articles/introductionOozie
Within the Hadoop ecosystem, there is a relatively new component Oozie, which allows one to combine multiple Map/Reduce jobs into a logical unit of work, accomplishing the larger task

The Future of Hadoop in Bioinformatics | insideHPC.com
http://insidehpc.com/2011/07/03/the-future-of-hadoop-in-bioinformatics/

HDFS: Realtime Hadoop usage at Facebook: The Complete Story
http://hadoopblog.blogspot.com/2011/07/realtime-hadoop-usage-at-facebook.html

SNA Projects Blog : Tech Talk: Anil Madan (eBay) — “Hadoop at eBay”
http://sna-projects.com/blog/2011/06/hadoop-at-ebay/

Ceph as a scalable alternative to the Hadoop Distributed File System
http://www.usenix.org/publications/login/2010-08/openpdfs/maltzahn.pdf

The elephant in the room … Hadoop and BigData!
http://mikethetechie.com/post/6822576191/the-elephant-in-the-room-hadoop-and-bigdata

Hadoop, Hive and Redis for Foursquare Analytics :: myNoSQL
http://nosql.mypopescu.com/post/3872483038/hadoop-hive-and-redis-for-foursquare-analytics

The Hadoop Distributed File System
http://storageconference.org/2010/Papers/MSST/Shvachko.pdf

IBM Jeopardy: Building Watson: An Overview of the DeepQA Project
https://www.stanford.edu/class/cs124/AIMagzine-DeepQA.pdf
"To preprocess the corpus and create fast runtime indices we used Hadoop"

Jeopardy Goes to Hadoop :: myNoSQL
http://nosql.mypopescu.com/post/3406224331/jeopardy-goes-to-hadoop

ElephantDB, a Distributed Database for Working with Hadoop
http://www.readwriteweb.com/hack/2011/02/ravendb-a-distributed-database.php

InfoQ: Hadoop Redesign for Upgrades and Other Programming Paradigms
http://www.infoq.com/news/2011/02/hadoop_redesign

Riding the Elephant | The Molecular Ecologist
http://tomato.biol.trinity.edu/blog/2011/02/riding-the-elephant/

Yahoo focusing on Apache Hadoop, discontinuing “The Yahoo Distribution of Hadoop”
http://developer.yahoo.com/blogs/hadoop/posts/2011/01/announcement-yahoo-focusing-on-apache-hadoop-discontinuing-the-yahoo-distribution-of-hadoop/

Lessons learned putting Hadoop into production « Cloudera » Apache Hadoop for the Enterprise
http://www.cloudera.com/blog/2010/12/lessons-learned-putting-hadoop-into-production/

Dimensional Reduction – Apache Mahout – Apache Software Foundation
https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction

Beyond Hadoop – Next-Generation Big Data Architectures – NYTimes.com
https://www.nytimes.com/external/gigaom/2010/10/23/23gigaom-beyond-hadoop-next-generation-big-data-architectu-81730.html

Large Scale Natural Language Processing
http://us.pycon.org/media/2010/talkdata/PyCon2010/098/large-scale-nlp-pycon-2010.pdf

Hadoop and Realtime Cloud Computing | Cloud Computing Journal
http://cloudcomputing.sys-con.com/node/1572508

Hadoop and NoSQL Downfall Parody on Vimeo
http://vimeo.com/15782414

Hadoop: The Definitive Guide, Second Edition – O’Reilly Media
http://oreilly.com/catalog/9781449389734

Hadoop Ecosystem World-Map « Sanjay Sharma’s Weblog
http://indoos.wordpress.com/2010/08/16/hadoop-ecosystem-world-map/

MapReduce, Hadoop: Young, But Worth A Look — Data Management — InformationWeek
http://www.informationweek.com/news/business_intelligence/warehouses/showArticle.jhtml?articleID=226600088

Distributed data processing with Hadoop – Part-3: App Build
http://www.gnarc.com/tutorials/distributed-data-processing-with-hadoop-part-3-app-build

HDFS: Facebook has the world’s largest Hadoop cluster!
http://hadoopblog.blogspot.com/2010/05/facebook-has-worlds-largest-hadoop.html

High Availability MySQL: Hadoop and MySQL
http://mysqlha.blogspot.com/2007/10/hadoop-and-mysql.html

High Scalability – How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data

Realtime Search for Hadoop – Scalable Log Data Management with Hadoop, Part 3 « mgm technology blog
http://blog.mgm-tp.com/2010/06/hadoop-log-management-part3/

Behind Caffeine May Be Software to Inspire Hadoop 2.0
http://gigaom.com/2010/06/11/behind-caffeine-may-be-software-to-inspire-hadoop-2-0

Hadoop in a box
http://www.slideshare.net/tim.lossen.de/hadoop-in-a-box

Scalability of the Hadoop Distributed File System (Hadoop and Distributed Computing at Yahoo!)
http://developer.yahoo.net/blogs/hadoop/2010/05/scalability_of_the_hadoop_dist.html

Introduction to Hadoop, HBase, and NoSQL
http://www.slideshare.net/xefyr/introduction-to-hadoop-hbase-and-nosql

InfoQ: Horizontal Scalability via Transient, Shardable, and Share-Nothing Resources
http://www.infoq.com/presentations/Horizontal-Scalability

Neuroph on Hadoop: Massive Parallel Neural Network System? | NetBeans Zone
http://netbeans.dzone.com/neuroph-hadoop-nb

Pushing the Limits of Distributed Processing « Cloudera » Apache Hadoop for the Enterprise
http://www.cloudera.com/blog/2010/04/pushing-the-limits-of-distributed-processing/
April Joke ;-)

My Links
http://www.delicious.com/ajlopez/hadoop
http://www.delicious.com/ajlopez/hadoop+tutorial
http://www.delicious.com/ajlopez/hadoop+video
http://www.delicious.com/ajlopez/hadoop+nosql
http://www.delicious.com/ajlopez/hadoop+distributedcomputing
http://www.delicious.com/ajlopez/hadoop+scalability
http://www.delicious.com/ajlopez/hadoop+machinelearning
http://www.delicious.com/ajlopez/hadoop+artificialintelligence

More links are coming (distributed computing? NoSQL?).

Keep tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez

August 11, 2011

Scalability: Links, News and Resources (1)

Last Monday @federicoboerr mentioned, on a customer's internal mailing list, the importance of considering scalability in most projects. So I prepared this list of links about the topic (as usual, curated from my Delicious bookmarks). Enjoy!

http://en.wikipedia.org/wiki/Scalability

In electronics (including hardware, communication and software) scalability is the ability of a system, network, or process, to handle growing amounts of work in a graceful manner or its ability to be enlarged to accommodate that growth. For example, it can refer to the capability of a system to increase total throughput under an increased load when resources (typically hardware) are added.

Scalability Rules
http://scalabilityrules.com/
via @federicoboerr

High Scalability
http://highscalability.com/

Stuff The Internet Says On Scalability For August 5, 2011
http://highscalability.com/blog/2011/8/5/stuff-the-internet-says-on-scalability-for-august-5-2011.html

Wikimedia architecture
http://highscalability.com/wikimedia-architecture
Wikimedia is the platform on which Wikipedia, Wiktionary, and the other seven wiki dwarfs are built.

HPCC Systems | Open-source. Fast. Scalable. Simple.
http://hpccsystems.com/

Tweaking WCF to build highly scalable async REST API
http://omaralzabir.com/tweaking-wcf-to-build-highly-scalable-async-rest-api

Architecture of Tankster – Scale (Part 2) | Nathan Totten
http://ntotten.com/2011/07/architecture-of-tankster-scale-part-2/

Architecture of Tankster – Introduction to Game Play (Part 1) | Nathan Totten
http://ntotten.com/2011/07/architecture-of-tankster-introduction-to-game-play-part-1/

NoSQL is a Premature Optimization « SmoothSpan Blog
http://smoothspan.wordpress.com/2011/07/22/nosql-is-a-premature-optimization/

Nati Shalom’s Blog: Scale-out vs Scale-up
http://natishalom.typepad.com/nati_shaloms_blog/2010/09/scale-up-vs-scale-out.html

Azurescope: Best Practices for Developing on Windows Azure
http://azurescope.cloudapp.net/BestPractices/

High Scalability – 35+ Use Cases for Choosing Your Next NoSQL Database
http://highscalability.com/blog/2011/6/20/35-use-cases-for-choosing-your-next-nosql-database.html

The Art of Scalability
http://theartofscalability.com/

Pusher is a hosted API for quickly, easily and securely adding scalable realtime functionality via WebSockets to web and mobile apps
http://pusher.com/

mnot’s blog: On HTTP Load Testing
http://www.mnot.net/blog/2011/05/18/http_benchmark_rules

10 rules for scalable performance in ‘simple operation’ datastores | June 2011 | Communications of the ACM
http://cacm.acm.org/magazines/2011/6/108651-10-rules-for-scalable-performance-in-simple-operation-datastores/fulltext

HDFS: Realtime Hadoop usage at Facebook — Part 1
http://hadoopblog.blogspot.com/2011/05/realtime-hadoop-usage-at-facebook-part.html

High Scalability – Zynga’s Z Cloud – Scale Fast or Fail Fast by Merging Private and Public Clouds
http://highscalability.com/blog/2011/5/19/zyngas-z-cloud-scale-fast-or-fail-fast-by-merging-private-an.html

High Scalability – Did the Microsoft Stack Kill MySpace?
http://highscalability.com/blog/2011/3/25/did-the-microsoft-stack-kill-myspace.html

High Scalability – 6 Lessons from Dropbox – One Million Files Saved Every 15 minutes
http://highscalability.com/blog/2011/3/14/6-lessons-from-dropbox-one-million-files-saved-every-15-minu.html

InfoQ: Scaling with MongoDB
http://www.infoq.com/presentations/Scaling-with-MongoDB

High Scalability – A Practical Guide to Varnish – Why Varnish Matters
http://highscalability.com/blog/2011/2/28/a-practical-guide-to-varnish-why-varnish-matters.html

Tagged Architecture – Scaling to 100 Million Users, 1000 Servers, and 5 Billion Page Views
http://highscalability.com/blog/2011/8/8/tagged-architecture-scaling-to-100-million-users-1000-server.html

Spain Scalability Group
https://sites.google.com/site/spainscalabilitygroup

Systems We Make
http://www.systemswemake.com/

High Scalability – Paper: An Experimental Investigation of the Akamai Adaptive Video Streaming
http://highscalability.com/blog/2011/2/16/paper-an-experimental-investigation-of-the-akamai-adaptive-v.html

Pensamientos ágiles: Spanish-language scalability group
http://brigomp.blogspot.com/2011/02/grupo-sobre-escalabilidad-en-espanol.html

Patterns for Building Scalable and Reliable Applications with Windows Azure
http://www.microsoftpdc.com/2009/SVC08

Windows Azure Storage Abstractions and their Scalability Targets
http://blogs.msdn.com/b/windowsazurestorage/archive/2010/05/10/windows-azure-storage-abstractions-and-their-scalability-targets.aspx

High Scalability – Pinboard.in Architecture – Pay to Play to Keep a System Small
http://highscalability.com/blog/2010/12/29/pinboardin-architecture-pay-to-play-to-keep-a-system-small.html

Agile Wiki : The open source Web Application Framework that just Scales!
http://agilewiki.org/templates

High Scalability – What the heck are you actually using NoSQL for?
http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html

Autoscaling in Windows Azure
http://convective.wordpress.com/2010/10/12/autoscaling-in-windows-azure/

Windows Azure Dynamic Scaling with AzureWatch
http://www.paraleap.com/AzureWatch/Instructions

InfoQ: LMAX – How to Do 100K TPS at Less than 1ms Latency
http://www.infoq.com/presentations/LMAX

Azul’s Pauseless Garbage Collector
http://www.artima.com/lejava/articles/azul_pauseless_gc.html

My thesis – building blocks of a scalable webcrawler – Marc’s Blog
http://blog.marc-seeger.de/2010/12/09/my-thesis-building-blocks-of-a-scalable-webcrawler

High Scalability – 7 Design Patterns for Almost-infinite Scalability
http://highscalability.com/blog/2010/12/16/7-design-patterns-for-almost-infinite-scalability.html

InfoQ: The Evolution of the Flickr Architecture
http://www.infoq.com/presentations/Flickr-Architecture

Apex Code: The World’s First On-Demand Programming Language – developer.force.com
http://wiki.developerforce.com/index.php/Apex_Code:_The_World%27s_First_On-Demand_Programming_Language

InfoQ: Abstractions at Scale–Our Experiences at Twitter
http://www.infoq.com/presentations/Abstractions-at-Scale

You don’t scale a multi tenant environment
http://ayende.com/Blog/archive/2010/12/12/you-donrsquot-scale-a-mutli-tenant-environment.aspx

InfoQ: Scaling Australia’s Most Popular Online News Sites with Ehcache
http://www.infoq.com/presentations/ehcache-newscorp-australia

High Scalability – GPU vs CPU Smackdown : The Rise of Throughput-Oriented Architectures
http://highscalability.com/blog/2010/12/3/gpu-vs-cpu-smackdown-the-rise-of-throughput-oriented-archite.html

Performance Testing – Response vs. Latency vs. Throughput vs. Load vs. Scalability vs. Stress vs. Robustness « Niraj Bhatt – Architect’s Blog
http://nirajrules.wordpress.com/2009/09/17/measuring-performance-response-vs-latency-vs-throughput-vs-load-vs-scalability-vs-stress-vs-robustness/

High Scalability – Great Introductory Video on Scalability from Harvard Computer Science
http://highscalability.com/blog/2010/11/24/great-introductory-video-on-scalability-from-harvard-compute.html

Scalability and the Relational Model « Experimental Thoughts
http://thoughts.j-davis.com/2010/03/07/scalability-and-the-relational-model/

Thinking about massively parallel Smalltalk
http://wiki.squeak.org/squeak/537

Pragmatic Programming Techniques: Scalable System Design Patterns
http://horicky.blogspot.com/2010/10/scalable-system-design-patterns.html

Yoshinori Matsunobu’s blog: Using MySQL as a NoSQL – A story for exceeding 750,000 qps on a commodity server
http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-story-for.html

How Facebook supports 500M users | jmchia.com
http://jmchia.com/2010/09/estrategias-de-escalado-facebook/

How Facebook Scales with Open Source – ReadWriteCloud
http://www.readwriteweb.com/cloud/2010/08/how-facebook-scales-with-open.php

High Scalability – Hilarious Video: Relational Database vs NoSQL Fanbois
http://highscalability.com/blog/2010/9/5/hilarious-video-relational-database-vs-nosql-fanbois.html

Designing Web Applications for Scalability
http://www.osconvo.com/post/view/2010/8/12/designing-web-applications-for-scalability

High Scalability – Designing Web Applications for Scalability
http://highscalability.com/blog/2010/8/12/designing-web-applications-for-scalability.html

High Scalability – 7 Scaling Strategies Facebook Used to Grow to 500 Million Users
http://highscalability.com/blog/2010/8/2/7-scaling-strategies-facebook-used-to-grow-to-500-million-us.html

My Links
http://www.delicious.com/ajlopez/scalability

Keep tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez
