Monthly Archives: January 2013

Distributed Computing: Links, News And Resources (2)

Distributed OTP Applications | Learn You Some Erlang for Great Good!
Although Erlang leaves us with a lot of work to do, it still provided a few solutions. One of these is the concept of …

Inaka Networks: Scaling Erlang
One of the most common reasons why people choose Erlang is to build highly scalable systems. And Erlang does a great job helping developers reach those goals. But creating a scalable system is not a matter of just writing it in Erlang. Here at Inaka we usually have complex systems written in Erlang …

Made of Bugs » Why node.js is cool (it’s not about performance)
For the past N months, it seems like there is no new technology stack that is either hotter or more controversial than node.js. node.js is cancer! node.js cures cancer! node.js is bad ass rock star tech!. I myself have given node.js a lot of shit, often involving the phrase “explicit …

Robust Composition: Towards a Unified Approach to Access Control and Concurrency…
A dissertation submitted to Johns Hopkins University in conformity with the requirements for the degree of Doctor of Philosophy. Baltimore, Maryland, May, 2006. Copyright © 2006, Mark Samuel Miller. All rights reserved. Permission is hereby granted to make and distribute verbatim copies of this …

Open Source
Distributed Capabilities

This module is built upon Node JS and provides the user with the following features:
A network-distributed event system, similar to the standard Node JS event system.
A process pool, where objects can be added and have predefined functions run at a periodic interval.
An auto-balancing system that migrates objects in the process pool from one running instance to another, based on the load of each instance.

Sensei DB
Open-source, distributed, realtime, semi-structured database

Amazon DynamoDB – a Fast and Scalable NoSQL Database Service Designed for Intern…
Today is a very exciting day as we release Amazon DynamoDB, a fast, highly reliable and cost-effective NoSQL database service designed for internet scale applications. DynamoDB is the result of 15 years of learning in the areas of large scale non-relational databases and cloud services. Several …

Galois Tech Talk (2 of 3 next week!): Model-based Code Generation and Debugging …
Design and implementation of distributed systems often involve many subtleties due to their complex structure, non-determinism, and low atomicity as well as occurrence of unanticipated physical events such as faults. Thus, constructing correct distributed systems has always been a challenge and …

How hadoop works.

Welcome to the Jungle « Sutter’s Mill
In the twilight of Moore’s Law, the transitions to multicore processors, GPU computing, and HaaS cloud computing are not separate trends, but aspects of a single trend – mainstream computers from desktops to ‘smartphones’ are being permanently transformed into heterogeneous supercomputer clusters. …

Richard Jones | Anti-RDBMS: A list of distributed key-value stores
Whatever your reasons, there are a lot of options to choose from. At we do a lot of batch computation in Hadoop, then dump it out to other machines where it’s indexed and served up over HTTP and Thrift as an internal service (stuff like ‘most popular songs in London, UK this week’ etc).

YOW! 2011: Steve Vinoski – Riak Core, Erlang and Frisbee Freestyle | Charles | C…
Steve Vinoski is an architect at Basho Technologies in Cambridge, MA, USA. He is a senior member of the IEEE and a member of the ACM. Steve is regarded as an expert in the areas of middleware and distributed computing systems, topics for which he has authored or co-authored over 80 articles, …

Fractals in Clojure – Distributed Buddhabrot Fractal Using ClojureScript (by Nur…
This one got started because I wanted a large Buddhabrot image on my wall. A large, good-looking image takes a long time to render. Now that we have ClojureScript, I thought the easiest way to distribute the calculation among machines in the house would be to compile to JavaScript since I’ve already …

Models for distributed parallelism | Lambda the Ultimate
I’ve been reading left and right (including on this forum) looking for models of parallel computing that are relevant to what I’m dealing with, and frankly not finding much. I get the impression that most theory of parallel computing is about shared memory, and so a bunch of old ideas for SMPs gets …
Awelon aims to be a secure programming language for open, distributed systems programming – a domain where challenges include disruption, latency, adversaries and security concerns, network partitioning, efficiency, scalability, activity spikes (i.e. the so-called slashdot-effect and DDOS attacks), …

Clojure/core — Introducing Avout: Distributed State in Clojure
Today we are releasing Avout, which brings Clojure’s in-memory model of state to distributed application development by providing a distributed implementation of Clojure’s Multiversion Concurrency Control (MVCC) STM along with distributable, durable, and extendable versions of Clojure’s Atom and …

InfoQ: Distributed STM – A New Programming Model for the Cloud

devdazed/ – GitHub
P2P Distributed Workload for NodeJS

InfoQ: Distributed Cache as a NoSQL Data Store?
NoSQL data stores offer alternative data storage options for non-relational data types like document based, object graphs, and key-value pairs. Can a distributed cache be used as a NoSQL store? Greg Luck from Ehcache wrote about the similarities between a distributed cache and a NoSQL data store. …

Distributed Podcast
In this episode we interviewed David Fowler and Damian Edwards who have created a great project called SignalR. This project is a signaling (or messaging) library which can be used to establish long-running connections between the browser and web server. Side note: This episode has been a long time …

My Links

Distributed Computing: Links, News and Resources (1)

I just realized I never published a list of links on one of my favorite topics. This is the first post:

Storm – a real time Hadoop like system in Clojure

Hadoop Programming Challenge

The Design of Distributed Applications

Thoughts around REST, DDD, and CQRS: Models, Queries, and Commands

Akka 2.x roadmap…


Welcome to Apache Pig
Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.

Welcome to Hama project
Apache Hama is a distributed computing framework based on BSP (Bulk Synchronous Parallel) computing techniques for massive scientific computations, e.g., matrix, graph and network algorithms. It was inspired by Google’s Pregel, but differs in that it is a pure BSP, general-purpose model, not just for graphs.

InfoQ: Things Break, Riak Bends

HPCC Systems | Open-source. Fast. Scalable. Simple
HPCC (High Performance Computing Cluster) is a massive parallel-processing computing platform that solves Big Data problems. The platform is now Open Source!

SmartFrog is a powerful and flexible Java-based software framework for configuring, deploying and managing distributed software systems.

Mesos: Dynamic Resource Sharing for Clusters
Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, MPI, Hypertable, Spark (a new framework for low-latency interactive and iterative jobs), and other applications. Mesos is open source in the Apache Incubator.

Dryad – Microsoft Research

InfoQ: Secure Distributed Programming on ECMAScript 5 + HTML5

Ceph as a scalable alternative to the Hadoop Distributed File System

Data-driven Apps With Microsoft Velocity Distributed Caching

Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write.

Distributed computing fallacies and REST

Presentation Schedule // CS 525: Advanced Distributed Systems // Spring 2011

InfoQ: Francesco Cesarini and Simon Thompson on Erlang

Scala and Akka are deployed in production at some of the largest web properties and financial institutions in the world, and run on the battle-tested Java runtime environment. Deploy with confidence.

Introducing Riak Core

Actors: A Model of Concurrent Computation in Distributed Systems

The Hadoop Distributed File System

InfoQ: Concurrency Control in Data Replication

Build a distributed realtime tweet search system in no time. Part 1/2

Windows Azure futures: Turning the cloud into a supercomputer

Episode 1: Distributed Systems Host Introductions

Distributed Podcast

Frangipani: A Scalable Distributed File System

Systems We Make

Fault tolerance techniques for distributed systems

Swarm: A true distributed programming language

MSDN Magazine: Distributed Apps

Scalable System Design Patterns
Load Balancer, Scatter and Gather, Result Cache, Shared Space, Pipe and Filter, Map Reduce, Bulk Synchronous Parallel, Execution Orchestrator

My Links

Angel “Java” Lopez

Bioinformatics: Links, News And Resources (3)

bioinformatics – node.js modules

YOKOFAKUN: Server-side javascript: translating a DNA with Node.js


Karelman (Karelman) on Twitter

sbassi/DNAFilter · GitHub

Bioinformatics Web Servers – University of Reading

UCL-CS Bioinformatics: Introduction

Python for Bioinformatics – Sebastian Bassi – Google Books

Perl and Javascript: bioinformatics in a browser window

EMBER: Login

Web apps for bioinformatics | KurzweilAI

Bio-Javascript? – BioStar

bio-js – A bioinformatics framework in JavaScript – Google Project Hosting

The Sequence Manipulation Suite

biosmalltalk – Bioinformatics Library for Smalltalk – Google Project Hosting

BioSmalltalk: A pure object system for doing bioinformatics with Smalltalk – SEQanswers

My Links

SimpleTags (1) First Ideas

More than a decade ago, I wrote the basis of my personal site around heterogeneous items (links, pages, etc.) classified into a tree of categories. An item could be in more than one category, and one category could be an alias for another. But after using Delicious and Gmail, I now prefer to have items grouped by tags. And instead of categories and a category tree (like folders), I think a more flexible organization could be based on predicates over tags: that is, instead of having a category Programming –> C#, I could have a predicate that returns all items tagged with “programming” and “c#”. Sometimes I need key-value tags, like “author:unclebob” or “project:storm”.
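
As a minimal sketch of the predicate idea (this is not SimpleTags code: the helper names hasTag, hasKeyValue and all are hypothetical, and I am assuming key-value tags are stored as single-property objects), a “category” defined over tags could look like this:

```javascript
// Hypothetical helpers, not part of SimpleTags. A tag is either a plain
// string or a single-property object such as { author: 'unclebob' }.
function hasTag(name) {
    return function (item) {
        return item.tags.indexOf(name) >= 0;
    };
}

function hasKeyValue(key, value) {
    return function (item) {
        return item.tags.some(function (tag) {
            return typeof tag === 'object' && tag !== null && tag[key] === value;
        });
    };
}

// Combine predicates: an item matches only if every predicate accepts it.
function all() {
    var predicates = Array.prototype.slice.call(arguments);
    return function (item) {
        return predicates.every(function (predicate) {
            return predicate(item);
        });
    };
}

// Sample data, invented for illustration.
var items = [
    { data: 'post-1', tags: ['programming', 'c#'] },
    { data: 'post-2', tags: ['programming', 'javascript', { author: 'unclebob' }] },
    { data: 'post-3', tags: ['cooking'] }
];

// The old category "Programming --> C#" becomes a predicate over tags.
var programmingCSharp = all(hasTag('programming'), hasTag('c#'));
var byUncleBob = hasKeyValue('author', 'unclebob');

var csharpItems = items.filter(programmingCSharp);
var uncleBobItems = items.filter(byUncleBob);

console.log(csharpItems.map(function (item) { return item.data; }));
console.log(uncleBobItems.map(function (item) { return item.data; }));
```

Because a predicate is just a function, a new “category” needs no change to the stored items, only a new combination of existing predicates.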

So, some days ago, I started a new project, written in JavaScript/Node.js, named SimpleTags:

From the readme:

var itemId = engine.createItem('', [ 'nodejs', 'javascript', 'engine', 'programming' ]);

An item has

  • data: Arbitrary value you supplied
  • tags: An array of tags. A tag could be a non-empty string or an object with only one property with non-empty value.

Once created, the item has an associated id, supplied by the engine.

The project has an in-memory model. A set of tags can be associated with an arbitrary data item. Usually, you don’t associate all of a customer’s data with a set of tags; instead, you associate a customer id. The arbitrary data could reside in your database, or it could be a URL, or something else. The key feature is that you can associate an item with a set of tags, and retrieve the items that have given tags.
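
To make the in-memory model concrete, here is a rough sketch under my own assumptions. It is not the SimpleTags implementation: createEngine and findByTags are hypothetical names, and the inverted index (tag to item ids) is just one possible way to store the association.

```javascript
// Hypothetical sketch, not the SimpleTags implementation: an inverted
// index from tag to item ids, so lookups by tag avoid scanning every item.
function createEngine() {
    var nextId = 1;
    var items = Object.create(null);   // id -> { data, tags }
    var index = Object.create(null);   // tag key -> { id: true, ... }

    // Normalize a tag to a string key: 'nodejs' or 'region:latam'.
    function tagKey(tag) {
        if (typeof tag === 'string') {
            return tag;
        }
        var name = Object.keys(tag)[0];
        return name + ':' + tag[name];
    }

    return {
        // Store the data with its tags and return the id assigned to it.
        createItem: function (data, tags) {
            var id = nextId++;
            items[id] = { data: data, tags: tags };
            tags.forEach(function (tag) {
                var key = tagKey(tag);
                index[key] = index[key] || Object.create(null);
                index[key][id] = true;
            });
            return id;
        },

        // Return the data of every item that carries all the given tags.
        findByTags: function (tags) {
            var keys = tags.map(tagKey);
            if (keys.length === 0) {
                return [];
            }
            var candidates = Object.keys(index[keys[0]] || {});
            return candidates.filter(function (id) {
                return keys.every(function (key) {
                    return index[key] !== undefined && index[key][id] === true;
                });
            }).map(function (id) {
                return items[id].data;
            });
        }
    };
}

// The data is only a customer id; the full record lives elsewhere.
var engine = createEngine();
engine.createItem('customer-42', ['customer', 'premium', { region: 'latam' }]);
engine.createItem('customer-77', ['customer', { region: 'emea' }]);

var premium = engine.findByTags(['customer', 'premium']);
var latam = engine.findByTags([{ region: 'latam' }]);
console.log(premium, latam);
```

Storing only an id as the item data keeps the tag index small, while the record itself stays in the database or behind a URL.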

Think about this: you could have a huge link collection and use SimpleTags to organize it. Or documents instead of links. Or photos and images. You could use tagging for different purposes. It’s a powerful idea that can be applied in many domains and scenarios.

I plan to add a web site as a concrete use case, where you can add URLs and tag them, define “Categories” using tag predicates, explore the defined categories, and search items by tags. After this sample, I could implement something more concrete, like a tagged to-do list, or a list of tasks tagged by project, status (pending, closed, …), iteration, assignee, etc. I used a similar private app to track an agile project backlog for one of my customers, and I feel it’s a good test bed case to be tackled by my tag engine project.

It could be used as an excuse to learn Express or to use my SimpleWeb project. In any case, it will be fun 🙂

As usual, SimpleTags code was written using TDD.

Keep tuned!

Angel “Java” Lopez

Bioinformatics: Links, News And Resources (2)

Data mining, forecasting and bioinformatics competitions on Kaggle
Pjotr is a scientist/biologist/open source programmer,

BioTeam is a high-performance consulting practice. We are dedicated to delivering objective, technology agnostic solutions to the life science researchers. We leverage the right technologies customized to our client’s unique needs in order to enable them to reach their scientific objectives.

ANNOVAR website
Preparation of local annotation databases

biotoolbox – Tools for querying and analysis of genomic data

NCBI HomePage

Calling SNPs with Samtools

Cytoscape: An Open Source Platform for Complex Network Analysis and Visualizatio…

ROCR: Classifier Visualization in R

OpenWetWare is an effort to promote the sharing of information, know-how, and wisdom among researchers and groups who are working in biology & biological engineering.


DNA seen through the eyes of a coder

The Ensembl project produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online.

Powering the genomics revolution

Integrated system of aging biomarkers

Monte Carlo Method

The PubChem Project

RCSB Protein Data Bank

Welcome to BioConductor —


KEGG: Kyoto Encyclopedia of Genes and Genomes

Entrez cross-database search

European Bioinformatics Institute

UCSC Genome Browser Home

Uri Alon’s Molecular Cell Biology Lab

the Gene Ontology

BABA is an applet that tries to explain how some basic algorithms of bioinformatics work.

Human Genome Project Information

Main Page –
Collection of genetic parts that can be mixed and matched to build synthetic biology devices and systems.

Python course in Bioinformatics


Life Sciences Search Engine

What is Quirrel?
Quirrel is a purely declarative query language designed for performing analytics and statistics on large-scale, multi-structured data sets.


My Links

Bioinformatics: Links, News and Resources (1)

Bioinformatics has many interesting problems, algorithms, and software related to parallelism, distributed computing, and scalability. This is my first list of links about this fascinating topic; more lists are coming.

Bioinformatics and the Future of Hadoop
The Future of Hadoop in Bioinformatics |

Clojure or Scala for bioinformatics/biostatistics/medical research – Stack Overflow

Riding the Elephant | The Molecular Ecologist

Protein Structure Methods and Algorithms

HPCwire: Scientists Ratchet Up Understanding of Cellular Protein Factory

Molecular Animation – Where Cinema and Biology Meet

Microsoft Research Makes Microsoft Biology Foundation and MODISAzure-Based Environmental Service

Bioinformatics Programming Using Python

Computer gamers crack protein-folding puzzle

Mapreduce and Hadoop Algorithms in Bioinformatics Papers | Abhishek Tiwari
Gamers beat algorithms at finding protein structures

Nature paper decision | Foldit

Microsoft Biology Foundation 1.0 Released – Parallel Programming with .NET

The Molecular Programming Project – Caltech – U

Boris Schmid, PhD

Research field: Biological Sciences – Bioinformatics
Theoretical / Systems Biology: modeling of evolution, population dynamics, epidemiology, immunology, virology, networks.

bioinformatics toolkit in clojure: what would that look like? – Clojure | Google Groups

Saaien Tist: Encounter with incanter – about clojure, incanter and bioinformatics

Paul W.K. Rothemund
I am interested in how processes in biology and chemistry can actually act as computers and execute molecular algorithms

Python and databases (Mysql and SQLite) « Python for Bioinformatics

DataAllure: Hadoop for DNA sequence analysis

Multi-core Parallelization in Clojure – a Case Study

Hadoop for Bioinfomatics – Deepak Singh on Vimeo

Analyzing Human Genomes with Hadoop » Cloudera Hadoop & Big Data Blog

My Links:

Angel “YesIHaveAGenoma” Lopez 🙂

CobolScript (4) Web Pages with Templates

In the previous post I presented CobolScript generating output using templates. It can be used to generate web page output, too. The sample is at:

The launch program in JavaScript is simple:

var cobs = require('../..'),
    http = require('http'),
    fs = require('fs');

var program = cobs.compileTemplateFile('./factorial.cobp');

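http.createServer(function(req, res) {
    // Run the compiled template with a runtime context built from the current
    // request and response. The names program.run and cobs.getRuntime are
    // assumed here; the original call was lost from this listing.
    program.run(cobs.getRuntime({ request: req, response: res }));
}).listen(8000);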

console.log('Server started, listening at port 8000');

The key part is the call that compiles the template file. It produces a compiled JavaScript function to be invoked. The call inside the request handler executes the template with a given runtime context. The runtime context is built from the request and response objects of the current incoming request. That runtime redirects all the output of the CobolScript program to the response output, so the template doesn’t know about the web request and response. You can see the runtime object as a service provider for the compiled CobolScript program. Its properties can be accessed if you define a LINKAGE SECTION, as in classic COBOL, but this feature was not used in this simple sample.
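
The “runtime as service provider” idea can be sketched in plain JavaScript. This is only an illustration under my own assumptions: compiledProgram, createWebRuntime, createConsoleRuntime and display are hypothetical names, not the CobolScript API, and a fake response object stands in for the real one so the snippet runs without a server.

```javascript
// Hypothetical sketch: these names are illustrative, not the CobolScript API.
// A "compiled program" is modeled as a function that only talks to its runtime.
function compiledProgram(runtime) {
    runtime.display('<p>Hello from the program</p>');
    runtime.display('<p>It never touches the response directly</p>');
}

// Build a runtime that sends program output to an HTTP-like response object.
function createWebRuntime(context) {
    return {
        request: context.request,
        response: context.response,
        display: function (text) {
            context.response.write(text + '\n');
        }
    };
}

// Build a runtime that sends the same output to the console instead.
function createConsoleRuntime() {
    return {
        display: function (text) {
            console.log(text);
        }
    };
}

// A stand-in for http.ServerResponse so the example runs without a server.
var chunks = [];
var fakeResponse = {
    write: function (chunk) { chunks.push(chunk); },
    end: function () { chunks.push('[end]'); }
};

// Same program, two different destinations for its output.
compiledProgram(createWebRuntime({ request: {}, response: fakeResponse }));
fakeResponse.end();
compiledProgram(createConsoleRuntime());

console.log(chunks);
```

The program function is identical in both calls; only the runtime decides where the output goes, which is why the template needs no knowledge of the web layer.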

The template file

<p>Page generated by CobolScript, using templates</p>
<tr><th align='right'>n</th><th align='right'>n!
local n.
perform show-factorial using n varying n from 1 to 10.
stop run.

show-factorial section using n.
local result.
perform factorial using n giving result.
<td align='right'>${n}</td><td align='right'>${result}

factorial section using n.
local m.
if n = 1 then return n.
subtract 1 from n giving m.
perform factorial using m giving m.
multiply n by m.
return m.
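
For readers not used to COBOL-style syntax, the sections above correspond roughly to the following JavaScript. It is a hand-written equivalent for comparison, not the code that CobolScript generates, and the `<tr>` wrapper around each row is my assumption, since only the cells are visible in the template listing.

```javascript
// Rough JavaScript equivalent of the "factorial" section: recursive n!.
function factorial(n) {
    if (n === 1) {
        return n;
    }
    // "subtract 1 from n giving m" then "perform factorial using m giving m"
    var m = factorial(n - 1);
    // "multiply n by m" stores the product in m, which is then returned
    m = n * m;
    return m;
}

// Rough equivalent of "show-factorial": produce one table row per call.
function showFactorial(n) {
    var result = factorial(n);
    return "<tr><td align='right'>" + n + "</td><td align='right'>" + result + "</td></tr>";
}

// Equivalent of "perform show-factorial using n varying n from 1 to 10".
var rows = [];
for (var n = 1; n <= 10; n++) {
    rows.push(showFactorial(n));
}

console.log(rows.join('\n'));
```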

Launch the server with

node server

Navigate to localhost:8000 to see the result.

Next post: a dynamic web site accessing a MySQL database, using CobolScript.

Keep tuned!

Angel “Java” Lopez