Medium 9781449303433

Scaling CouchDB

Views: 583
Ratings: (0)

This practical guide offers a short course on scaling CouchDB to meet the capacity needs of your distributed application. Through a series of scenario-based examples, this book lets you explore several methods for creating a system that can accommodate growth and meet expected demand. In the process, you learn about several tools that can help you with replication, load balancing, clustering, and load testing and monitoring.

  • Apply performance tips for tuning your database
  • Replicate data, using Futon and CouchDB’s RESTful interface
  • Distribute CouchDB’s workload through load balancing
  • Learn options for creating a cluster of CouchDB nodes, including BigCouch, Lounge, and Pillow
  • Conduct distributed load testing with Tsung

List price: $9.99

Your Price: $7.99

You Save: 20%

 

6 Slices

Format Buy Remix

1. Defining Scaling Goals

ePub

Before you can scale CouchDB, you need to define your scaling goals. Once you have defined your goals then you can design your system. You should test the scalability of your system before it is deployed. See Chapter6 for information about how to perform distributed load testing on your system using Tsung. When your system has been deployed to production, you should continue to monitor its performance and resource utilization using a tool such as Munin (a CouchDB plugin for Munin is available at https://github.com/strattg/munin-plugin-couchdb) or Nagios. You can monitor an individual node by issuing a GET HTTP request to /_stats which will return various statistics about the CouchDB node.

Its important to note the distinctions between performance, vertical scaling, and horizontal scaling. Performance typically refers to properties of a system such as response time or throughput. Vertical scaling (or scaling up) means adding computing capacity to a single node in a system. This could be through added memory, a faster CPU, or larger hard drives. While memory gets cheaper, CPUs get faster, and hard drives get larger every year, at any given moment theres an upward limit to vertical scaling. Also, the hardware for high capacity nodes is usually more expensive per unit of computing capacity (as defined by your application) than commodity hardware.

 

2. Tuning and Designing for Scale

ePub

In this chapter, we will take a look at some performance tips that you can apply when tuning your database. While not directly related to scalability, increasing performance can increase the overall capacity of your system. There are many options available when tuning CouchDB to meet your needs.

We will also discuss considerations around the design of your documents. CouchDB is a schema-less database, giving you much flexibility in designing the document boundaries for your data. However, the decisions you make around designing your documents can have an impact on the performance and scalability of your database.

The best way to increase the capacity of your database is to not send requests to it in the first place. Sometimes you cant forego a database request altogether, but you can limit the amount of work you ask the database to do. Here are a some tips to limit the amount of work you ask of CouchDB and to increase performance (the applicability of these tips to your application may vary):

 

3. Replication

ePub

Replication in CouchDB is peer-based and bi-directional, although any given replication process is one-way, from the source to the target. Replication can be run from Futon, CouchDBs web administration console, or by sending a POST request to _replicate containing a JSON object with replication parameters. Lets assume we have two databases, both running on the same CouchDB node, that we want to replicate: catalog-a and catalog-b (we can also replicate databases on different CouchDB nodes).

Using Futon:

Navigate to http://localhost:5984/_utils/ using your web browser.

Create the catalog-a and catalog-b databases.

Create a new, empty document (with only an _id field) in the catalog-a database.

Under Tools, click Replicator.

Under Replicate changes from, leave Local database selected, and select catalog-a.

Under to, leave Local database selected, and select catalog-b.

Click the Replicate button. Figure3-1 shows how everything should look (the details under Event will be different for you). Optionally, you could have checked Continuous to trigger continuous replication.

 

4. Load Balancing

ePub

Load balancing allows you to distribute the workload evenly across multiple CouchDB nodes. Since CouchDB uses an HTTP API, standard HTTP load balancing software or hardware can be used. With simple load balancing, each CouchDB node will maintain a full copy of your database through replication. Each document will eventually need to be written to every node, which is a limitation of this approach since the sustained write throughput of your entire system will be limited to that of the slowest node. You could replicate only certain documents using filter functions or by specifying document IDs, as discussed in Chapter3. This approach to clustering could get complicated very quickly. See Chapter5 for details on an alternative way to distribute your data across multiple CouchDB nodes.

In this scenario, we will set up a write-only master node and three read-only slave nodes. We will send all unsafe HTTP write requests (POST, PUT, DELETE, MOVE, and COPY) to the master node and load balance all safe HTTP read requests (GET, HEAD, and OPTIONS) across the three slave nodes. We will set up continuous replication from the write-only master to each of the read-only slave nodes. See Figure4-1 for a diagram of the configuration we will be creating in this chapter.

 

5. Clustering

ePub

In Chapter4, we looked at load balancing. Load balancing is very useful, but it alone may not provide you with the scale you need. Sometimes it is necessary to partition your data across multiple shards. Each shard lives on a CouchDB node and contains a subset of your data. You can have one or more shards on each node. CouchDB does not natively support this form of clustering. However, there are third-party tools that allow you to create a cluster of CouchDB nodes. These tools include BigCouch, Lounge, and Pillow.

An alternative to automatic partitioning is to manually partition your documents into different databases by type of document. The downside to this approach is that only documents in the same database can be included in any given view. If you have documents that dont need to be queried in the same view, putting them in separate databases can allow you to use CouchDB as-is without needing a third-party tool.

BigCouch is a fork of CouchDB that introduces additional clustering capabilities. It is available under an open source license and is maintained by Cloudant. For the most part, you can interact with a BigCouch node exactly the same way you would interact with a CouchDB node. BigCouch introduces some new API endpoints that are needed to manage its clustering features.

 

6. Distributed Load Testing

ePub

Each application has its own unique usage patterns. It can be very difficult to accurately predict these usage patterns. If you are working with an existing system, then you can take a look at log files and analytics data to get a sense of how your application is used. If this is a new system, then you can create scenarios based on how you expect the application to be used. Generic benchmarking can be of some use, but a test specifically designed for your system will be more useful.

The example load test in this chapter is intended as an illustrative example that can be helpful when you are writing your own tests. However, writing a test that is customized to your applications usage patterns can be very difficult. An appropriate test for your system will look very different than the example test in this chapter.

There are many tools available that allow you to create tests customized for your application. However, when creating a distributed system it can be difficult to actually generate enough load to push your system to its maximum capacity. In order to stress test a distributed system, you will need a distributed load testing tool. Tsung is a distributed load and stress testing tool that we will use for the example this chapter. We will be using Tsung on Ubuntu, but these steps can be easily adapted to other platforms. Tsung can generate GET and POST HTTP requests and, as of version 1.2.1, PUT and DELETE HTTP requests. Some of Tsungs features include:

 

Details

Print Book
E-Books
Slices

Format name
ePub
Encrypted
No
Sku
9781449307219
Isbn
9781449307219
File size
0 Bytes
Printing
Not Allowed
Copying
Not Allowed
Read aloud
No
Format name
ePub
Encrypted
No
Printing
Allowed
Copying
Allowed
Read aloud
Allowed
Sku
In metadata
Isbn
In metadata
File size
In metadata