Polyglot Persistence with NoSQL
Submitted by Ken
NoSQL is an umbrella term for a number of disparate data persistence patterns that meet many of the needs of today’s demanding market, giving businesses the ability to adapt their designs and get to market on time. While the term ‘NoSQL’ is new to many, its patterns of data use and persistence are about as old as the Relational Model. The models originally squared off in the late 1970s. The winner was obvious, but not because the others were inferior or lacking in design: the Relational Model simply fit the needs of its time.
In the ‘Polyglot’ world of persistence, the Relational Model is just another database that helps get a job done; it is no longer the only one. In this new world of ‘Big Dynamic Data’, massive load, horizontal and vertical scaling, sharding, replication and clusters of tiny to large machines are all expected to simply exist, because they need to exist. The Relational Model repeatedly falls short in this scenario; it was never meant to handle it. You don’t have to look far to see who is using NoSQL-style implementations to handle their ‘Big Dynamic Data’ needs:
- IBM – CouchDB – The inventors of the Relational Model and Object Data persistence (NoSQL) have been huge adopters: acquiring Cloudant (a CouchDB-based service), rolling out NoSQL implementations and adding NoSQL support to their DB2 database architecture.
- Amazon – a number of NoSQL uses and services; DynamoDB is one of the big ones for them.
- Facebook – MySQL (wait, what?)…They heavily modified their MySQL infrastructure with plugins to bolt on a NoSQL implementation.
- Google – built their own internal NoSQL store, called ‘Bigtable’
- Microsoft – CouchDB for recording data from their internal unit test framework, in a project called ‘Daylight’
- Twitter – Cassandra and a number of integrated NoSQL implementations
- Adobe – Neo4j and HBase (the SaaS Infrastructure team) with many product implementations. They also provide some very good best practice talks about these implementations and how they used them.
- Forbes – MongoDB (Online presence)
The following is a brief list of technical details associated with the different persistence data store models:
The Relational Model:
- Schema Driven
- Normalized Data
- Dependency on Joined data
- Scales vertically; inherently designed to operate on one machine
- Complicated infrastructure
- Specialized database admins are needed in advanced usage scenarios
- Transaction based
- Runs into lots of problems when used in a distributed environment and under heavy loads
- Products: Oracle, SQL Server, MySQL, etc.
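The points about normalized data and the dependency on joins can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the `customers` and `orders` tables and their sample rows are hypothetical and do not come from any of the products listed above.

```python
import sqlite3

# Hypothetical schema: one customer's data is normalized across two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        item TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
    [(1, "keyboard"), (1, "monitor")],
)
conn.commit()

# Reassembling the customer's orders requires a join across both tables.
rows = conn.execute(
    """
    SELECT c.name, o.item
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
    """
).fetchall()
print(rows)
```

The schema has to exist before any row is written, and the application only sees a whole customer again after the join.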
Key-Value, Document and Graph Databases have the following in common:
- No schema required
- Disparate data storage
- Extremely fast querying, as all data is indexed by key
- Under normal design, operations on a single record are atomic; wider ACID guarantees vary by product
- Designed for tiny to large infrastructures, spanning weak to powerful machines working in a cluster whose size and shape can change constantly to meet demand or usage
- Models data objects as they are stored in memory; no need for intermediate layers of application code to translate between the ‘Developer’ view of an object and its ‘Relational Model’
- Provides for fast integration of persistence into an application
- Can scale both vertically and horizontally
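The idea that objects are stored in the same shape they have in memory can be sketched as follows. This is a minimal illustration, assuming a plain dictionary as a stand-in for a real data store; the `Customer` class and the `save` and `load` helpers are hypothetical names, not part of any product's API.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class Customer:
    name: str
    orders: list = field(default_factory=list)


# Stand-in for a key-value or document database (an assumption for this sketch).
store = {}


def save(key, obj):
    # The whole object is serialized as one unit; no table mapping is needed.
    store[key] = json.dumps(asdict(obj))


def load(key):
    # The stored document has the same shape as the in-memory object.
    return Customer(**json.loads(store[key]))


save("customer:1", Customer("Ada", ["keyboard", "monitor"]))
restored = load("customer:1")
print(restored)
```

The object goes in and comes out whole, without a layer that splits it into rows and reassembles it.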
The Key-Value Pair
- Atomic operations, no joins. Each value is saved and retrieved as a complete whole
- Products: Amazon DynamoDB, Apache Cassandra, Riak
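A rough sketch of the key-value pattern, assuming an in-memory dictionary in place of a real server. The `KeyValueStore` class and its method names are hypothetical and only mirror the general put/get/delete shape such stores tend to expose.

```python
class KeyValueStore:
    """Toy in-memory key-value store: every value is read and written whole."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # A write replaces the entire value stored under the key.
        self._data[key] = value

    def get(self, key, default=None):
        # Lookup is by key only; there are no joins and no secondary queries.
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


kv = KeyValueStore()
kv.put("session:42", {"user": "ken", "cart": ["book"]})
kv.put("session:42", {"user": "ken", "cart": ["book", "pen"]})  # whole-value overwrite
session = kv.get("session:42")
print(session)
```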
The Document Model
- Can incorporate the concepts of ‘Joins’, but must do so carefully. Understanding one’s data is very important and in the end helps with the overall design
- Most popular choice due to its ability to handle most data scenarios with ease and power. XML, JSON, custom protocols are all easily incorporated into this model
- Products: MongoDB, Apache CouchDB, RavenDB
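The document points above can be sketched with plain dictionaries: documents need not share a schema, and a ‘Join’ is a lookup the application performs by following a stored reference. The document ids, field names and the `find` and `with_author` helpers are hypothetical and do not reflect the query API of any product listed here.

```python
# Stand-in for a document collection; note the two posts have different fields.
documents = {
    "post:1": {"type": "post", "title": "Polyglot Persistence",
               "author_id": "user:1", "tags": ["nosql"]},
    "post:2": {"type": "post", "title": "Graph Basics", "author_id": "user:1"},
    "user:1": {"type": "user", "name": "Ken"},
}


def find(**criteria):
    # Return every document whose fields match all of the given criteria.
    return [
        doc for doc in documents.values()
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


def with_author(post):
    # Application-side 'join': follow the reference stored in the document.
    author = documents.get(post["author_id"], {})
    return {**post, "author": author.get("name")}


posts = [with_author(post) for post in find(type="post")]
print(posts)
```

Every reference costs an extra lookup, which is one reason such ‘Joins’ call for care and for a good understanding of the data.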
The Graph Model
- Data is document based, stored in what are called ‘Nodes’
- Nodes are made known to each other through ‘Relationships’
- Relationships can have dynamic properties (and Nodes as well) that describe how the different data Nodes relate to each other
- Fastest querying of the three for connected data, since relationships are followed directly rather than joined
- Products: Neo4j, InfoGrid
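Nodes, Relationships and their properties can be sketched in a few lines. This is an illustration only: the sample nodes, the relationship names and the `neighbors` and `reachable` helpers are hypothetical, and a real graph database would provide its own query language for this.

```python
from collections import deque

# Nodes carry their own properties.
nodes = {
    "ken": {"label": "Person", "name": "Ken"},
    "ada": {"label": "Person", "name": "Ada"},
    "neo4j": {"label": "Product", "name": "Neo4j"},
}

# Relationships are (source, type, target, properties) and carry properties too.
relationships = [
    ("ken", "KNOWS", "ada", {"since": 2010}),
    ("ada", "USES", "neo4j", {"role": "admin"}),
]


def neighbors(node_id):
    # Outgoing relationships of a node.
    return [(rel, dst, props) for src, rel, dst, props in relationships
            if src == node_id]


def reachable(start):
    # Breadth-first traversal that follows relationships outward from start.
    seen, queue, order = {start}, deque([start]), []
    while queue:
        current = queue.popleft()
        for _rel, dst, _props in neighbors(current):
            if dst not in seen:
                seen.add(dst)
                order.append(dst)
                queue.append(dst)
    return order


path = reachable("ken")
print(path)
```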