The Big Data explosion in recent years has created a number of new data storage and processing technologies. Names like NoSQL, YARN, and Hadoop are now familiar terms within this growing ecosystem. However, it’s likely that many of you haven’t heard of “triplestores.” This kind of database has been around since the early 2000s but has grown in popularity in recent years due to Big Data. In its simplest definition, a triplestore is “a purpose-built database for the storage and retrieval of triples.” In case that’s not very helpful, let’s describe what a triple actually is.
A triple is, quite simply, a single data entity composed of three ordered elements, <subject, predicate, object>, that together make up a statement.
Imagine a piece of data within a social network such as “Jack is a friend of Jill.” This information can be described as a triple in which “Jack” is the subject, “is a friend of” is the predicate, and “Jill” is the object.
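As a rough sketch of this idea, a triple can be modeled in Python as a three-field tuple, and a toy “store” as a plain set of such tuples. The names Triple, store, and match, the predicate spellings, and the second fact are all assumptions made for illustration; they are not the API of any real triplestore.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: <subject, predicate, object>."""
    subject: str
    predicate: str
    obj: str


# A toy in-memory "triplestore": just a set of triples.
store = {
    Triple("Jack", "is_friend_of", "Jill"),
    Triple("Jill", "lives_in", "Toronto"),  # hypothetical extra fact
}


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    )


# "Who is Jack a friend of?" expressed as a pattern with one unknown.
friends_of_jack = [
    t.obj for t in match(store, subject="Jack", predicate="is_friend_of")
]
```

Retrieval here is pattern matching: fix the elements you know, leave the others open, and read the answers off the matching triples.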
Over time, combining many such statements builds up a web of facts, and this web of linked statements underlies what has been called the semantic web.
The semantic web is basically a collaborative effort to make the internet ‘smarter’ and more fit for human consumption; in other words, the semantic web brings customized information to the user rather than having the user “go out” and search for a needle in the haystack. It’s the difference between the web of 1999 and the web of today.
The advantage of “triples” is that they provide a simple and flexible way of modeling data that is more like how the human brain works. Unlike structured relational models, or even key-value pairs in the NoSQL model, triples have a semantic structure that can easily represent connections between structured data and free-flowing text. As one source puts it, “Because this model is so simple, it allows structured, semi-structured and unstructured data to be mixed, exposed, and shared across different applications.” Because the semantic structure of subject, predicate, and object is embedded in every triple, triplestores differ from other databases in their ability to interpret data through reasoning and to discover new facts and relationships.
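To make “discovering new facts” a little more concrete, here is a minimal sketch of rule-based inference over triples. The rule itself (friendship is symmetric), the predicate name, and the function infer_symmetric are assumptions for illustration only; production triplestores use their own reasoning engines rather than a hand-written loop like this.

```python
def infer_symmetric(triples, predicate):
    """Return the input triples plus the reverse of every triple
    that uses the given (assumed symmetric) predicate."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == predicate:
            inferred.add((o, p, s))
    return inferred


# One stated fact...
facts = {("Jack", "is_friend_of", "Jill")}

# ...and the facts that follow from it under the symmetry rule.
all_facts = infer_symmetric(facts, "is_friend_of")
new_facts = all_facts - facts
```

Here new_facts holds the statement that Jill is a friend of Jack, which was never entered explicitly but follows from the stored triple and the rule.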
The following example illustrates the basic concepts. Here’s a triple about the person named Margaret Atwood:
And here’s the same information in a graph format (as opposed to a relational model):
The nice thing about triples is that, across an enormous “web of information,” new relations, connections, and discoveries can be made that would be difficult to uncover with relational models. For example, you might discover that an object in the web of Margaret Atwood is also a subject in the web of Bruce Cockburn. Through this triplestore data we discover that Margaret and Bruce are connected by way of Ottawa.
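The kind of discovery described above can be sketched as a search for nodes that two entities have in common. The predicates and the extra occupation facts below are assumptions chosen for illustration, and neighbors is a hypothetical helper, not part of any triplestore product.

```python
def neighbors(triples, entity):
    """Every node that shares a triple with `entity`, in either position."""
    linked = set()
    for s, _p, o in triples:
        if s == entity:
            linked.add(o)
        elif o == entity:
            linked.add(s)
    return linked


triples = {
    ("Margaret Atwood", "born_in", "Ottawa"),       # assumed predicate
    ("Ottawa", "birthplace_of", "Bruce Cockburn"),  # assumed predicate
    ("Margaret Atwood", "occupation", "Writer"),    # illustrative extra fact
    ("Bruce Cockburn", "occupation", "Musician"),   # illustrative extra fact
}

# Nodes that appear in both people's webs of facts.
shared = neighbors(triples, "Margaret Atwood") & neighbors(triples, "Bruce Cockburn")
```

Because Ottawa is an object in one triple and a subject in another, the intersection surfaces it as the link between the two people without any schema having declared that relationship in advance.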
Much more could be said here, but hopefully you get the basic idea of how triples and triplestores provide a glimpse into the future of data storage on the semantic web. This technology has the potential to revolutionize how we connect to and interact with information.
Looking into the future of the internet, we see a new paradigm in information and database management. For instance, if you haven’t done so, it’s worth taking a look at Google’s Physical Web project, which represents a fundamentally new way to interact with information and objects over the internet. According to this framework, the future of the internet will likely be URL-based rather than app-based. If this is indeed the case, there will need to be new ways to manage and store massive amounts of data more quickly and efficiently. Triplestores represent a paradigm shift in this direction, towards a form of data modeling more attuned to the semantic structure of language and to how the human brain works.
The latest advances in Big Data, the Internet of Things, the semantic web, and the Physical Web are portents of a future that is already at our doorstep. Now is the time to start getting acquainted with the new generation of Web 3.0 technologies. Put “triplestores” on your research agenda today and find ways to start leveraging this emerging and potentially disruptive technology.