Category Archives: Big Data

Big Data: Links, News And Resources (7)


SAMOA: A Platform for Mining Big Data Streams

DevOps Round-Up: Hadoop and Big Data Analytics Get a Boost From Splunk | DevOpsANGLE

What The Hell is… Big Data? | LinkedIn

Cloud, Big Data and Mobile: Understanding Amazon Elastic Load Balancing in Detail

Big Data and the Dilution of Politics, by Alejandro Piscitelli, on Prezi

What the ‘Internet of things’ really means | Consumerization Of IT – InfoWorld

Big Data—for better or worse

Six disruptive possibilities from big data – Strata

Spark | Lightning-Fast Cluster Computing

Learning Spark – O’Reilly Media

Reactor – a foundation for asynchronous applications on the JVM | SpringSource Team Blog

How NoSQL, MySQL and MongoDB worked together to solve a big-data problem

Big Data – Hadoop – BIDOOP | PRAGSIS Big Data Hadoop

Introduction to HCatalog, Pig scripts and heavy burdens | Alejandro Jezierski

Developing Big Data Solutions on Windows Azure, the blind and the elephant | Alejandro Jezierski

My Links

Stay tuned!

Angel “Java” Lopez

Big Data: Links, News And Resources (6)


Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing

What’s Going On with BigData in Argentina? | IT on business!

Big Data Developers in Buenos Aires (Buenos Aires) – Meetup

A programmer’s guide to big data: 12 tools to know — Tech News and Analysis

Running the Largest Hadoop DFS Cluster

Thoughts on AWS Redshift… | Database Fog Blog

Amazon preparing ‘disruptive’ big data AWS service? • The Register

Incremental computing – Wikipedia, the free encyclopedia

Realtime vs Long Term Data Analysis with Storm/Hadoop/Cassandra – storm-user | Google Groups

The history of Hadoop: From 4 nodes to the future of data — Tech News and Analysis

Understanding the Parallelism of a Storm Topology – Michael G. Noll

Big Data Lets You Profile and Recruit the Best Employees | SmartData Collective

Push Technology

BigData Spain – Home

Big Jobs

‘Big data’ is dead. What’s next? | VentureBeat

How to Build Big Data Pipelines for Hadoop Using OSS

How Netflix is turning viewers into puppets

My Links

Stay tuned!

Angel “Java” Lopez

Big Data: Links, News And Resources (5)


The Big Bang: How the Big Data Explosion Is Changing the World

Customers Rapidly Adopting Big Data Solutions — Driven By Marketing, Sales and More — Reports New Microsoft Research

Structure:Data | GigaOM Events

CERN Data Centre passes 100 petabytes | CERN

DARPA puts $3M into startup pushing big data in Python — Tech News and Analysis

Click Dataset | Center for Complex Networks and Systems Research

Python for Data Analysis: Wes McKinney: 9781449319793: Books

Iteratees in Big Data at Klout « Klout Engineering

Big Data is over the hype – can we get on with real work now? | Capping IT Off | Capgemini

Event Driven Architecture | Inside Analysis

Disk-Locality in Datacenter Computing Considered Irrelevant (and then what?)

Technical Discovery: Passing the torch of NumPy and moving on to Blaze

A Python Compiler for Big Data

GigaSpaces | High Scalability with GigaSpaces XAP & Cloudify – Your Open PaaS Stack for Business Apps

Intuit CEO: Big Data Can Be “The Great Equalizer”

Precog is a powerful analytics platform for JSON data. Stream, upload, or synchronize data into Precog, and perform advanced analytics using Labcoat or our simple REST APIs.

Amazon Redshift
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools. It is optimized for datasets ranging from a few hundred gigabytes to a petabyte or more and costs less than $1,000 per terabyte per year, a tenth the cost of most traditional data warehousing solutions.

Expanding the Cloud – Announcing Amazon Redshift, a Petabyte-scale Data Warehouse Service – All Things Distributed

High Scalability – BigData using Erlang, C and Lisp to Fight the Tsunami of Mobile Data

Keynote: Spring 2012 and Beyond
Adrian Colyer, Juergen Hoeller, Mark Pollack and Graeme Rocher present SpringSource’s Unifying Component Model, current developments regarding Big Data, and betting on Grails.

My Links

Keep tuned!

Angel “Java” Lopez

Big Data: Links, News And Resources (4)


HBase Architecture 101 – Storage

Culturomics 2.0: Forecasting large-scale human behavior using global news media tone in time and space

Smart Sales Prospecting

SQLFire: Scalable SQL instead of NoSQL

Karmasphere 2.0
Collaborative Analytics Workspace on Hadoop with Self-Service for Everyone in the Business

MapR delivers on the promise of Hadoop, making managing and analyzing Big Data a reality for more business users.

Big Data Now: Current Perspectives from O’Reilly Radar

Big Data and Human Judgment

The Petabyte Age: Because More Isn’t Just More — More Is Different

Strata New York Speaker Slides & Video

Six Provocations for Big Data

Apache Giraph
For general-purpose big data computation, the map-reduce computing model has been well adopted and the most deployed map-reduce infrastructure is Apache Hadoop. We have implemented a graph-processing framework that is launched as a typical Hadoop job to leverage existing Hadoop infrastructure, such as Amazon’s EC2. Giraph builds upon the graph-oriented nature of Pregel but additionally adds fault-tolerance to the coordinator process with the use of ZooKeeper as its centralized coordination service.
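The Pregel model that Giraph builds on can be illustrated with a toy, single-process sketch (the function and variable names below are my own, purely for illustration): vertices run a compute step each superstep, exchange messages along edges with a barrier between supersteps, and the job ends when no messages remain in flight. Real Giraph distributes vertices across Hadoop workers and coordinates them via ZooKeeper; this is only the control flow, propagating the maximum value in a graph.

```python
# Toy Pregel-style supersteps: each vertex adopts the largest value it has
# seen and notifies its neighbors only when its value changes ("votes to
# halt" otherwise). Not Giraph code; a sketch of the computation model.

def pregel_max(graph, values):
    """graph: {vertex: [out-neighbors]}, values: {vertex: initial value}."""
    values = dict(values)
    # Superstep 0: every vertex is active and sends its value to neighbors.
    inbox = {v: [] for v in graph}
    for v in graph:
        for w in graph[v]:
            inbox[w].append(values[v])
    # Subsequent supersteps: run until no messages were sent (all halted).
    while any(inbox.values()):
        outbox = {v: [] for v in graph}
        for v, msgs in inbox.items():
            if msgs and max(msgs) > values[v]:
                values[v] = max(msgs)
                for w in graph[v]:          # value changed: wake neighbors
                    outbox[w].append(values[v])
        inbox = outbox                       # barrier between supersteps
    return values

ring = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(pregel_max(ring, {"a": 1, "b": 5, "c": 2}))
# → {'a': 5, 'b': 5, 'c': 5}
```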

Video: 3 Big Data Tech Talks You Can’t Miss

Big Data Needs an End in Sight

The “Big Five” IT trends of the next half decade: Mobile, social, cloud, consumerization, and big data

Hadoop Virtual Panel

BigData Spain 2012

Nathan Marz: “Cascalog: Making Data Processing Fun Again”

Data science in the natural sciences

Big Data To Drive $232 Billion In IT Spending Through 2016

Strata NYC 2012 and PyData

Grok turns data into action

Un nuevo canal para servir grandes cantidades de datos

Big Data @ Foursquare: Slides from our recent talk

The Value of Values – Rich Hickey
Creator of Clojure and Datomic, Rich Hickey delivers this excellent JAXconf keynote about how the definition of values has changed in light of the increasing complexity of information technology and the advent of Big Data.

MapReduce and Its Discontents
Dean Wampler discusses the strengths and weaknesses of MapReduce, and the newer variants for big data processing: Pregel and Storm.
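The MapReduce model the talk examines can be sketched in-process in a few lines (helper names here are my own, not from any framework): a map phase emits key–value pairs, a shuffle groups them by key, and a reduce phase folds each group. A real job partitions these phases across many machines; the canonical word-count example shows the shape.

```python
# Minimal in-process MapReduce: map -> shuffle -> reduce over a list of
# records. Illustrative only; real frameworks distribute each phase.
from collections import defaultdict

def map_phase(records, mapper):
    # mapper(record) -> iterable of (key, value) pairs
    return [kv for r in records for kv in mapper(r)]

def shuffle(pairs):
    # group all values by key, as the framework's shuffle/sort would
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(groups, reducer):
    # reducer(key, values) -> final value for that key
    return {k: reducer(k, vs) for k, vs in groups.items()}

docs = ["big data", "big streams"]
pairs = map_phase(docs, lambda line: [(w, 1) for w in line.split()])
counts = reduce_phase(shuffle(pairs), lambda w, ones: sum(ones))
print(counts)
# → {'big': 2, 'data': 1, 'streams': 1}
```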

7 new types of jobs created by Big Data

Meet the New Boss: Big Data

The GPU “Sweet Spot” for Big Data

BigData Diagram

A beginner’s guide to streamed data from Twitter

My Links

Keep tuned!

Angel “Java” Lopez

Big Data: Links, News And Resources (3)


More links in my historical series:

Twitter Changes Its API Access Again and Challenges Its Ecosystem

The Year Ahead In Big Data? Big, Cool, New Stuff Looms Large!

Mike Stolz on NoSQL and Big Data Design Patterns

Big Data Architectures at Facebook

chrisclark / PythonForDataScience

Big Data: Extracting and Visualizing Large Volumes of Data

Adam & Greg Talk Storm, Big Data and Real Time Analytics with Dr. Matt

What is the Stratosphere System?

Big Data Architecture at LinkedIn

Factual Releases Drivers that Matter: Python, Clojure, Haskell

Just the Facts. Yes, All of Them.

Big Data, Hadoop on Azure and the elephant in the room

R Is Not Enough For “Big Data”

Big Data Counting: How To Count A Billion Distinct Objects Using Only 1.5KB Of Memory
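The technique behind counting distinct objects in a few kilobytes is a probabilistic cardinality sketch in the HyperLogLog family: hash each item, use a few leading bits to pick a register, and record the longest run of leading zeros seen per register. The class below is my own minimal sketch of the idea (it omits the small- and large-range bias corrections of the full algorithm), not the article’s implementation.

```python
# Tiny HyperLogLog-style distinct counter. With p=10 it keeps 1024 small
# registers (roughly 1 KB of state) yet estimates cardinalities of millions.
import hashlib

def _rho(w, bits):
    # 1-based position of the leftmost 1-bit in a `bits`-bit value
    return bits - w.bit_length() + 1

class HyperLogLog:
    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p                  # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash: first p bits pick a register, the rest feed rho()
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        j = h >> (64 - self.p)
        w = h & ((1 << (64 - self.p)) - 1)
        self.registers[j] = max(self.registers[j], _rho(w, 64 - self.p))

    def count(self):
        # raw harmonic-mean estimate (bias corrections omitted)
        alpha = 0.7213 / (1 + 1.079 / self.m)
        z = 1.0 / sum(2.0 ** -r for r in self.registers)
        return alpha * self.m * self.m * z

hll = HyperLogLog()
for i in range(10_000):
    hll.add(f"user-{i}")
print(round(hll.count()))  # close to 10000, within a few percent
```

The standard error of the estimate is about 1.04/sqrt(m), so 1024 registers give roughly a 3% typical error.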

Big Data Week London – Hadoop Day

IBM doing Hadoop as a service in its cloud

Dremel: Interactive Analysis of Web-Scale Datasets

Google BigQuery
Use Google BigQuery to interactively analyze massive datasets — up to billions of rows.

How Twitter is doing its part to democratize big data

HADOOP Enters the Enterprise Mainstream, and Big Data Will Never Be the Same

Big Data: Big Opportunities

My Links

Keep tuned!

Angel “Java” Lopez

Big Data: Links, News And Resources (2)


What is big data?
An introduction to the big data landscape.

Do we have the tools we need to navigate the New World of Data?

Hadoop named as most popular big data source of 2011: report

DataSift Architecture Overview

Social-data platform to enable enterprises and entrepreneurs to aggregate, filter and extract insights from Twitter in real-time

Big Data: Big Opportunities to Create Business Value

Oxford Internet Institute
Big Data Research Officer

How to “crunch” your data stored in HDFS?

STXXL: Standard Template Library for Extra Large Data Sets.
The core of STXXL is an implementation of the C++ standard template library STL for external memory (out-of-core) computations, i.e., STXXL implements containers and algorithms that can process huge volumes of data that only fit on disks.
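STXXL itself is C++, but the external-memory idea it implements can be sketched in a few lines of Python (function names below are my own): sort a dataset larger than memory by sorting chunks that do fit, spilling each sorted run to disk, then streaming a k-way merge over the runs.

```python
# External merge sort: the classic out-of-core pattern behind external-
# memory libraries. Only `chunk_size` items are ever held in memory;
# sorted runs live in temporary files and are merged as streams.
import heapq, os, tempfile

def _spill(chunk):
    # write one sorted run to a temp file, one integer per line
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{x}\n" for x in chunk)
    return path

def external_sort(items, chunk_size=3):
    runs, chunk = [], []
    for x in items:                      # stream the input, never hold it all
        chunk.append(x)
        if len(chunk) == chunk_size:
            runs.append(_spill(sorted(chunk)))
            chunk = []
    if chunk:
        runs.append(_spill(sorted(chunk)))
    files = [open(r) for r in runs]
    # heapq.merge lazily merges the sorted runs in O(k) memory
    merged = [int(line) for line in heapq.merge(*files, key=int)]
    for f in files:
        f.close()
        os.unlink(f.name)
    return merged

print(external_sort([5, 2, 9, 1, 7, 3, 8]))
# → [1, 2, 3, 5, 7, 8, 9]
```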

Hadoop and NoSQL in a Big Data Environment

Good Relationships
With Spring Data, the ever-popular Spring Framework has cultivated a new patch of ground, bringing Big Data and NoSQL technology like Neo4j to enterprise developers.

Sorting 1PB with MapReduce

Real-time feed processing with Storm


Career of the Future: Data Scientist [INFOGRAPHIC]

The Business of BIG DATA

Brave New Big Data World

The King of Big Data
One of the next big things that enterprises need to understand is Big Data, but the demands Big Data makes require a different way of looking at data.

Big crime meets big data
Data and social media are being used against us in creative new ways.

Machine learning for dummies

Embracing Uncertainty
The new machine intelligence

6 Big HealthTech Ideas That Will Change Medicine In 2012
Artificial Intelligence, Big Data …


The feedback economy
Companies that employ data feedback loops are poised to dominate their industries.

Cloudera puts the Hadoop in Oracle’s Big Data Appliance

My Links

Keep tuned!

Angel “Java” Lopez

Big Data: Links, News And Resources (1)


My first links about this topic:

Big data[1][2] is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage,[3] search, sharing, transfer, analysis,[4] and visualization. The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to “spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions.”[5][6][7]

As of 2012, limits on the size of data sets that are feasible to process in a reasonable amount of time were on the order of exabytes of data.[8][9] Scientists regularly encounter limitations due to large data sets in many areas, including meteorology, genomics,[10] connectomics, complex physics simulations,[11] and biological and environmental research.[12] The limitations also affect Internet search, finance and business informatics. Data sets grow in size in part because they are increasingly being gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies (remote sensing), software logs, cameras, microphones, radio-frequency identification readers, and wireless sensor networks.[13][14] The world’s technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s;[15] as of 2012, every day 2.5 quintillion (2.5×10¹⁸) bytes of data were created.[16] The challenge for large enterprises is determining who should own big data initiatives that straddle the entire organization.[17]

Big data is difficult to work with using most relational database management systems and desktop statistics and visualization packages, requiring instead “massively parallel software running on tens, hundreds, or even thousands of servers”.[18] What is considered “big data” varies depending on the capabilities of the organization managing the set, and on the capabilities of the applications that are traditionally used to process and analyze the data set in its domain. “For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration.”[19]

What is big data?

The “Big Data” Challenge: More Than Just Large Volumes of Data
Big data: The next frontier for innovation, competition, and productivity

Big Data

Data Science Summit

Big Data: Evolution or Revolution?

DataSift Using MySQL, HBase, Memcached to Deal With Twitter Firehose

DataSift Architecture: Realtime Datamining At 120,000 Tweets Per Second

Explaining Hadoop to Your CEO

The World’s Technological Capacity to Store, Communicate, and Compute Information

The Big Data Boom Is the Innovation Story of Our Time

MongoDB Intro & Application for Big Data

The Big Data Bottleneck In The Consumer Web

Microsoft drops Dryad; puts its big-data bets on Hadoop

Distributed Cache as a NoSQL Data Store?

Building Scalable Systems: an Asynchronous Approach

Big Data Intelligence on Hadoop

Ville Tuulos on Big Data and Map/Reduce in Erlang and Python with Disco

The elephant in the room … Hadoop and BigData!

Is Microsoft’s Future in Data-as-a-Service?

Resolving the contradictions between web services, clouds, and open source

Strata Gems: Where to find data

Big crime meets big data
Data and social media are being used against us in creative new ways.

Tech Talk: Nathan Marz — “Clojure at BackType”

Do We Need a New Programming Language for Big Data?

My Links

Keep tuned!

Angel “Java” Lopez