An Overview of Riak: an Open Source NoSQL Database

Basho Riak is a document oriented open-source database system. It is shaped by the Amazon Dynamo document and provides a decentralized key-value store that supports standard Get, Put and Delete operations. Riak is a distributed, highly scalable and fault-tolerant store with map/reduce, HTTP, JSON and REST queries making it ideal for web apps. And, of course, Riak is NoSQL.

To understand why Riak is so powerful we need to know some theory. Let’s take a look at Amazon Dynamo. In Amazon Dynamo documentation we have 3 terms that describe distributed store behavior: N, R and W. N is the number of replicas of each value in store. R is the number of replicas needed for read operations. W is the number of replicas needed for write operations. Riak’s goal is to transmit N, R and W logic to an application. That’s why Riak can adapt application requirements.


The Riak ring is composed on identical nodes and data is automatically replicated to multiple nodes. So, any node can be added or removed easily without manual migration of data. Hence, since every node within the cluster is identical, there is no single point of failure or bottleneck. Riak clusters can grow into the hundreds or thousands of nodes, and naturally, there are sometimes machine failures. Riak detects machine failures and recovers when a machine is brought back into the cluster. The data (existing and new) is shared among the nodes automatically. If some nodes are unavailable, you can still read and write if the live node count is acceptable. If cluster parts that share the new version of a document become disconnected from each other, Riak returns both values and the application can choose what to do. In this scenario you can also configure Riak to use freshest variant.

Here are some tidbits about Riak:

  • Orientation: Document
  • Created: Basho
  • Implementation language: Erlang & C, some Javascript
  • Distributed: Yes – with the ability to span multiple machines, racks and data centers.
  • Storage: Bitcask, InnoStore or Google’s LevelDB.
  • CAP:  Riak stands for Availability + Partitioning with eventual Consistency.
  • Client: Basho supported Drivers are available for Erlang, Python, Java, PHP, Javascript, and Ruby.
  • Riak uses simple Rest interface so you can use any compatible client, for example cUrl.
  • Engine type:  Asynchronous ( Your heavy requests will be processed asynchronously without putting in queue others )
  • Open source: Yes (Apache License)
  • Map/Reduce: Yes (JavaScript based)
  • Best Practice: If you want something fast and scalable with simple REST interface and easy manageable, flexible schema. The most popular case is Mozilla’s bugstore database.
  • Production use: Riak has been used at Mozilla, SwingVine, WeGeo, AskSponsoredListings, AOL, GitHub, Formspring and many others.
  • Packages: Basho provides both 64 and 32 bit ,RPM and DEB packages, with extremely low amounts of dependencies. (I have installed and used Riak on Minimal Debian 6 Squeeze installation from Debian netInstaller CD)
  • Installation: Just download needed package and install it with your systems default package manager. (dpkg -i riak-XXX-XXX.deb, for Debian and Ubuntu)
  • If you need InnoStore engine, you will have to compile it from source and integrate it to Riak manually. 99% percent of Riak users do not want it.
  • Configuration: Single very well commented file /etc/riak/app-config with Erlang syntax.
  • better to do ln -s /etc/riak/app-config /etc/riak/app-config.erl and edit app-config.erl in VIM with syntax highlighting.


Links are metadata that establish one-way relationships between objects in Riak. They can be used to loosely model graph like relationships between objects.

The Link Header

The way to read and modify links via the HTTPAPI is the HTTP Link header. This header emulates the purpose of <link> tags in HTML, that is, establishing relationships to other HTTP resources. The format that Riak uses is like so:


Inside the angle-brackets (<,>) is a relative URL to another object in Riak. The “tag” portion in double-quotes is any string identifier that has a meaning relevant to your application. Objects can have multiple links by separating them with commas. For example, if an object was a participant in a doubly-linked list of objects, it might look like this:

Link: </riak/list/1>; riaktag=”previous”, </riak/list/3>; riaktag=”next”


Link-walking (traversal) is a special case of MapReduce querying and can be accessed through HTTPLinkWalking. Link-walks start at a single input object and follow links on that object to find other objects that match the submitted specifications. More than one traversal may be specified in a single request, with any number of the intermediate results returned. The final traversal in a link-walking request always returns results.

How Riak Spreads Processing

Riak’s MapReduce has an additional goal: increasing data-locality. When processing a large dataset, it’s often much more efficient to take the computation to the data than it is to bring the data to the computation. In practice, your MapReduce job code is likely less than 10 kilobytes, so it is more efficient to send the code to the gigs of data being processed than to stream gigabytes of data to your 10k of code.

It is Riak’s solution to the data-locality problem that determines how Riak spreads the processing across the cluster. In the same way that any Riak node can coordinate a read or write by sending requests directly to the other nodes responsible for maintaining that data, any Riak node can also coordinate a MapReduce query by sending a map-step evaluation request directly to the node responsible for maintaining the input data. Map-step results are sent back to the coordinating node, where reduce-step processing can produce a unified result.

Put more simply: Riak runs map-step functions right on the node holding the input data for those functions, and it runs reduce-step functions on the node coordinating the MapReduce query.

How a Link Phase Works in Riak

Link phases find links matching patterns specified in the query definition. The patterns specify which buckets and tags links must have. “Following a link” means adding it to the output list of this phase. The output of this phase is often most useful as an input to a map phase, or another reduce phase.

When you should use Riak

  • You need a reliable, extremely fast non-relational database.
  • You need private cloud storage for non-huge files which can be called directly through a browser.
  • You need a JavaScript based mapreduce engine.
  • You need to play with your database via a simple REST interface.
  • AP + Eventual consistency from CAP is OK with you.

When you can use Riak, but probably shouldn’t

  • C from CAP is needed (Real time chat server).
  • You need complex MapReduce processing with huge data.
  • You need cloud storage for big files.

When you cannot use Riak

  • You need a SQL database.
  • You need a POSIX compatible file system.
  • You need to make a cup of coffee.

Get started with 100% free, all-in-one server monitoring at