Welcome to chapter 10 of DBS710, Advanced Database Systems. This week we dive into the world of NoSQL databases, and specifically MongoDB. MongoDB is one of the most popular NoSQL databases to rise over the past few years and is a good alternative solution to relational databases in some circumstances.
Before getting started this week, students must install and set up MongoDB and the associated Compass GUI. Instructions are here.
After this week you should be able to:
NoSQL is a generic term that refers to any non-relational data management system that does not use SQL as its standardized communication language. The basis for traditional relational database management systems is known as ACID: atomicity, consistency, isolation, and durability.
NoSQL is a potential fit for several kinds of situations, including:
There are several different NoSQL database management systems that run using slightly different principles:
Both MongoDB and CouchDB are document-based database management systems, as we will explain below. We will be using MongoDB in this course as our NoSQL DBMS.
Other types of NoSQL DBMSs include key-value, XML, column, and graph-based systems.
Factors that determine the advantages and drawbacks of using a NoSQL solution include:
Let us look at a couple of cases that involve deciding between NoSQL and relational databases.
In data science, it is typical to have a very large amount of data and in order to process it, speed is a definite requirement in addition to data consistency and integrity. Machine learning requires the most accurate data possible to come to the best conclusions. Data errors and data inconsistency can add to processing, or learning, time and cause skewed results based on incorrect data.
Machine learning and artificial intelligence, especially in unsupervised learning scenarios, rely heavily on the relationships between different entities in the data. They can traverse the data through those relationships and discover correlations that may or may not have been expected.
In this scenario, Relational Databases are the better choice.
These applications rely heavily on data consistency and data integrity, and either have a lower possibility of concurrency or manage concurrency within the system through built-in processes. Transactional applications are highly structured, with pre-defined business rules and parent-child relations, and often have a schema that will not change over time.
Data access must be consistent at all times in order to ensure transactional success through one or more statements within the transaction.
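The all-or-nothing behaviour of a transaction can be illustrated with a minimal in-memory sketch. The `accounts` object and the `transfer` function are hypothetical names invented for this illustration, and no database is involved; a relational DBMS provides this guarantee through its own transaction mechanism.

```javascript
// Hypothetical in-memory data; not a real database.
const accounts = { alice: 100, bob: 50 };

// Apply both statements of the transfer, or neither of them.
function transfer(data, from, to, amount) {
  const snapshot = { ...data };        // remember the state before the transaction
  try {
    data[from] -= amount;              // statement 1: debit
    if (data[from] < 0) {
      throw new Error("insufficient funds");
    }
    data[to] += amount;                // statement 2: credit
    return true;                       // "commit"
  } catch (err) {
    Object.assign(data, snapshot);     // "rollback": restore the earlier state
    return false;
  }
}

const ok = transfer(accounts, "alice", "bob", 30);      // both statements applied
const failed = transfer(accounts, "bob", "alice", 500); // rolled back, nothing applied
console.log(ok, failed, accounts);
```

The second transfer fails after its first statement has already run, so the sketch restores the saved state and the data is never left half-updated.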
Many NoSQL systems do not let you define and enforce relationships, offer limited or no transaction support, provide weaker guarantees than full ACID, and lack customized joins for varying the results based on the inquiry.
The relational model is preferred for transactional applications.
These small to mid-sized web applications need to scale easily and can contain a large variety of content that evolves over time as new products or new website features are added. The database must be flexible enough to change and to handle large quantities of operations and users.
When the application is evolving over time, the schema needs to be changeable, partial record access is permissible, and data consistency is a lower priority, a NoSQL database system may be an option.
MongoDB is a document-oriented NoSQL database and differs greatly from a relational one. MongoDB is a powerful, flexible, and scalable general-purpose database that will meet the needs of many small and medium-sized database requirements. The following features are available in MongoDB:
In MongoDB, the concept of a row is replaced with a document, which is much more flexible. By using documents and arrays, complex hierarchical relationships can be represented in a single record. MongoDB is schema-less, i.e. there is no pre-defined schema, which allows the structure to change on the fly. The types of a document's keys and values can vary, fields can be added or removed easily, and there are several data models to choose from.
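As an illustration, a single document can hold an array and an embedded document that a relational design would normally spread across several tables. The field names and values below are invented for this sketch, and the document is written as a plain JavaScript object rather than stored in MongoDB.

```javascript
// Hypothetical book document: one record holds the whole hierarchy.
const book = {
  title: "Hamlet",
  authors: ["William Shakespeare"],   // array value
  publisher: {                        // embedded (nested) document
    name: "Example Press",
    city: "Toronto"
  },
  tags: ["drama", "tragedy"]
};

// Fields can be added or removed at any time; no schema change is needed.
book.pages = 342;
delete book.tags;

console.log(Object.keys(book));
```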
In modern applications, the amount of data grows at a rapid pace and the database needs to scale easily. One option is to scale up by moving to more powerful servers. This option is expensive and may run into limits of technology and physical space.
The other option is to scale out: by partitioning data across more machines, capacity can be increased by adding servers to the cluster. This cheaper alternative can be effective, but managing hundreds or thousands of machines can be difficult.
MongoDB's document-oriented model makes it easier to scale out by splitting data across multiple servers.
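The idea of scaling out by partitioning can be sketched as follows. This is a simplified illustration that assigns each document to a server by hashing a key. It is not MongoDB's actual sharding implementation, and the `hashString` and `shardFor` functions and the sample documents are assumptions made for the example.

```javascript
// Simple string hash (illustrative only; real systems use stronger hashes).
function hashString(text) {
  let hash = 0;
  for (const ch of text) {
    hash = (hash * 31 + ch.codePointAt(0)) % 1000003;
  }
  return hash;
}

// Pick one of `shardCount` servers for a given key.
function shardFor(key, shardCount) {
  return hashString(key) % shardCount;
}

// Distribute hypothetical documents across three servers.
const documents = [
  { title: "Blue Sky" },
  { title: "Hamlet" },
  { title: "Macbeth" },
  { title: "Othello" }
];
const shards = [[], [], []];
for (const doc of documents) {
  shards[shardFor(doc.title, shards.length)].push(doc);
}

// Number of documents that landed on each server.
console.log(shards.map((shard) => shard.length));
```

Because the same key always hashes to the same server, a lookup by title only needs to contact one machine.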
A document is the base unit of data and is equivalent to a row of data in a relational database.
For example, the book titled "Blue Sky" is one document and the book titled "Hamlet" is another document.
A collection is a group of similar documents, like a table in a relational database, but it has a dynamic and changeable schema.
For example, the group of documents described above is a collection of book documents. The flexibility shows when one book may have a category of genre and another may have a category and a sub-category. The two documents have slightly different schemas but can be part of the same collection.
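A minimal sketch of such a collection, modelled here as a plain JavaScript array rather than a real MongoDB collection. The key names `category` and `subCategory` and their values are assumptions chosen for the illustration.

```javascript
// A collection modelled as an array of documents (plain objects).
const books = [
  { title: "Blue Sky", category: "fiction" },
  { title: "Hamlet", category: "drama", subCategory: "tragedy" }
];

// Documents in the same collection may have different sets of keys,
// so code that reads them should allow for missing fields.
for (const doc of books) {
  const sub = doc.subCategory ?? "(none)";
  console.log(`${doc.title}: ${doc.category} / ${sub}`);
}

// The set of keys differs from one document to the next.
const keySets = books.map((doc) => Object.keys(doc).join(","));
```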
A database is a set of collections grouped together. One MongoDB instance can host multiple databases.
Every document has a special key, "_id", whose value is unique within the collection. A document itself is defined by one or more key and value pairs, for example:
{"greeting": "Hello World!"}
In this document, the key is "greeting" and the value is "Hello World!".
{"greeting": "Hello World!", "signal": "wave", "count": 2}
Here an additional key, "signal", is paired with the value "wave", and a third key, "count", is paired with the value 2. Note that the values do not need to be of the same type and can vary as required.
Keys must be strings made up of UTF-8 characters and cannot contain the null character '\0'. Do not include $ in a key name.
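The key rules above can be expressed as a small check. The `isValidKey` function is a hypothetical helper written for this illustration. It is not part of MongoDB and only covers the rules listed in this section.

```javascript
// Hypothetical helper: checks a key against the rules stated above.
function isValidKey(key) {
  return (
    typeof key === "string" &&   // keys must be strings
    !key.includes("\0") &&       // no null character
    !key.includes("$")           // no $ in the key name
  );
}

console.log(isValidKey("greeting")); // true
console.log(isValidKey("$count"));   // false
console.log(isValidKey("bad\0key")); // false
console.log(isValidKey(42));         // false
```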
MongoDB is both type-sensitive and case-sensitive. For example,
{"count": 3}
{"count": "3"}
are distinct documents, as are:
{"count": 3}
{"Count": 3}
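The same distinctions can be seen in plain JavaScript, which the MongoDB shell uses. This is only a sketch with ordinary objects, not a query against a database.

```javascript
const a = { count: 3 };
const b = { count: "3" };
const c = { Count: 3 };

// Type sensitivity: the number 3 and the string "3" are different values.
console.log(a.count === b.count);            // false
console.log(typeof a.count, typeof b.count); // number string

// Case sensitivity: "count" and "Count" are different keys.
console.log("count" in c);                            // false
console.log(Object.keys(a)[0] === Object.keys(c)[0]); // false
```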
A single document must not contain duplicate key names. For example,
{"count": 3, "count": 4}
is not a legal document because the key "count" appears twice.
A collection is a group of documents and can be thought of as the equivalent of a table in a relational database. Unlike a table, a collection has a dynamic schema: different documents (rows) can have different sets of keys (columns). For example, both of the following documents could be stored in the same collection:
{"greeting": "Hello World!"}
{"count": 4}
Any document can be stored in any collection.
A collection is identified by its name. The name may not be empty, contain the null character '\0', start with a reserved prefix such as system, or include the reserved character $.
Sub-collections can be used to organize a collection and are accessed using dot notation. For example, a blog collection may have two sub-collections, blog.posts and blog.authors.
As you may have assumed by now, a database is a group of collections containing data related to a single application. A single instance of MongoDB can host multiple databases. Each database is stored in its own files on disk and is identified by name. The same naming rules as for collections apply to databases, with the additional restrictions that almost no special characters are allowed in the name and that the name must be less than 64 bytes long.
There are three reserved database names that cannot be used, as they are used for admin-level data storage. These are admin, local, and config.
A namespace is simply a fully qualified collection name that describes the hierarchy of the data storage:
database.collection.subcollection: cms.blog.posts
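A namespace can be taken apart again by splitting at the first dot: everything before it is the database name and everything after it is the collection name, which may itself contain dots for sub-collections. The `parseNamespace` function below is a hypothetical helper written for this illustration, not a MongoDB function.

```javascript
// Hypothetical helper: split a namespace into database and collection parts.
function parseNamespace(namespace) {
  const firstDot = namespace.indexOf(".");
  if (firstDot === -1) {
    throw new Error("namespace must look like database.collection");
  }
  return {
    database: namespace.slice(0, firstDot),
    collection: namespace.slice(firstDot + 1) // may itself contain dots
  };
}

const parts = parseNamespace("cms.blog.posts");
console.log(parts.database);   // cms
console.log(parts.collection); // blog.posts
```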
The MongoDB Shell is a full-featured JavaScript interpreter. This means you can use JavaScript programs in the shell to easily manipulate and access the data documents within the database.
> x = 200
200
> x / 5;
40
You can also use all the standard JavaScript libraries:
> "Hello World!".replace("World", "MongoDB");
Hello MongoDB!
You can extend this further by defining and calling your own JavaScript functions.
> function factorial(n) {
... if (n <= 1) return 1;
... return n * factorial(n-1);
... }
> factorial(5);
120
> factorial(10);
3628800