Automatic Optimal Pipelining of Redis Commands

18 Jan 2012

In this post I describe different approaches for client libraries to implement Redis protocol pipelining. I will cover synchronous as well as asynchronous (event-driven) techniques and discuss their respective pros and cons: synchronous client APIs require the library user to explicitly pipeline commands, potentially yielding optimal protocol performance, but at the cost of additional bookkeeping when handling replies. Asynchronous client libraries, on the other hand, allow automatic pipelining, but are less efficient in their pipelining behavior.

After discussing the two approaches, I will show how to get the best of both worlds: a Redis client library with the "look and feel" of a synchronous API that pipelines automatically and optimally, yet with none of the downsides of either approach. The library, written in the Haskell programming language, is called Hedis. It is available on Hackage.

Pipelining

The Redis datastore server uses a request/response protocol to communicate with clients. For each request sent from a client to the server, the server will send a reply back to the client. Redis offers a feature called pipelining. It allows clients to interleave the request/response cycle of several commands in such a way that first all the requests are sent and only then the client waits to receive the replies.

The benefit of this technique is a drastically improved protocol performance. The speedup gained by pipelining ranges from a factor of five for connections to localhost up to a factor of at least one hundred over slower internet connections.

Existing client libraries can be divided into two groups, synchronous and asynchronous, according to how they handle pipelining.

Synchronous Clients

A synchronous client library is one where clients send a command to the server and then block, while waiting for the reply. Control returns to the user only after the reply is read from the network connection. Here is an example, using hiredis, the "official" Redis client library for the C language (examples are adapted from the hiredis website).

reply = redisCommand(context, "SET foo bar");

To pipeline commands, the user has to explicitly group them together, such that first all the requests are sent and afterwards the respective replies are read.

redisAppendCommand(context,"SET foo bar");
redisAppendCommand(context,"GET foo");
redisGetReply(context,&replySet); // reply for SET
redisGetReply(context,&replyGet); // reply for GET

The benefit of an explicit approach to pipelining is that commands can be pipelined optimally, i.e. by sending the fewest packets over the network. The algorithm for this is as follows:

  1. Append each request to the connection's output buffer, without flushing it.
  2. Only before reading the first reply, flush the output buffer, so that all buffered requests are sent in as few packets as possible.
  3. Read the replies, in order, one per request.

Other clients, such as the Java client Jedis, have a somewhat nicer syntax to build a pipeline. From a protocol point-of-view this is exactly the same as the hiredis approach of explicit sending and explicit receiving. It just hides some plumbing.

List<Object> results = jedis.pipelined(new PipelineBlock() {
    public void execute() {
        set("foo", "bar");
        get("foo");
    }
});

Actually, not all client libraries follow the optimal algorithm. Some flush their output buffer for every request in the pipeline, thus sending more packets than necessary. But all synchronous clients with explicit pipelining could make this optimization under the hood of their API.

Explicit pipelining has two downsides. The first one is, arguably, the explicitness itself. The library user can miss some pipelining opportunity, thus achieving suboptimal performance. The other downside is that individual commands are no longer directly linked to their respective replies. The user is required to take replies from a list by manually matching the command's position in the pipeline with the same position in the reply list, which, of course, is very error-prone. Hiredis, with separate function calls for sending and receiving, even opens the possibility of mismatching the number of sent requests and received replies.

Asynchronous Clients

Asynchronous, also called event-based, client libraries let the library user register a callback function for each request sent to the server. As soon as the reply is received, the callback function is applied to it. Here is an example adapted from the node_redis library website.

client.set("foo", "bar");
client.get("foo", function (err, reply) {
    console.log(reply.toString());
});

This approach of registering callbacks to handle the replies enables automatic pipelining. Each command function sends its request over the network, registers the given callback and then immediately returns, without waiting for the reply. So the next command can be sent right after the previous one. That means an asynchronous Redis client automatically pipelines commands as much as possible, while keeping a clear relationship between each command and its respective reply.

However, contrary to the synchronous approach, the pipelining is not optimal in the number of packets sent over the network. The reason is that it is not possible to follow the optimal algorithm and defer flushing the output buffer, since the library user never calls a blocking receive function. The drop in performance, measured in requests per second, is about a factor of two for heavily pipelined command sequences.

The Hedis library

It turns out that it is possible to build a Redis client library that does automatic and optimal pipelining.

Hedis, a Redis client library for the Haskell programming language, achieves these goals while also having the "look and feel" of a synchronous, un-pipelined API. Here is a simple example, two pipelined get commands, with an explanation of what is going on:

conn <- connect defaultConnectInfo
runRedis conn $ do
    foo <- get "foo"
    bar <- get "bar"
    liftIO $ print (foo,bar)

connect takes some information about the Redis server to connect to, such as host and port, and creates a pool of network connections to that server. The argument defaultConnectInfo means that the server is located at localhost, port 6379, and requires no authorization. The following call to runRedis takes a connection from the pool conn and sends the commands in its argument action to the server. Note that each command function returns its respective reply, so a clear relationship between request and reply is maintained.

The reason the two gets are pipelined is that their replies are not evaluated (thanks to Haskell's laziness) until the final print. In contrast, the following code cannot (and will not) be pipelined, since the reply from get "foo" is used as an argument to the following command.

conn <- connect defaultConnectInfo
runRedis conn $ do
    foo <- get "foo"
    bar <- get (either undefined id foo)
    liftIO $ print bar

A Peek Under the Hood

The connect function reads, from each socket it opens, a lazy list of all the replies the server will send over the connection. This is in principle the same as lazy I/O functions such as hGetContents, extended by some additional reply-parsing. To make sure the individual replies are read from the network on-demand and at the latest time possible, the lazy list is constructed by using unsafeInterleaveIO, which defers the actual reading from the socket until the reply is evaluated.
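The idea can be illustrated with a small, self-contained sketch. The names lazyReplies and readReply are hypothetical, and a simple counter stands in for reading and parsing a reply from the socket:

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Build a lazy, conceptually infinite list whose elements are produced
-- by an IO action. Each element is read only when it is first forced.
lazyReplies :: IO a -> IO [a]
lazyReplies readReply = go
  where
    go = unsafeInterleaveIO $ do
        r  <- readReply
        rs <- go
        return (r : rs)

main :: IO ()
main = do
    counter <- newIORef (0 :: Int)
    -- Stand-in for reading one reply from the network.
    let readReply = modifyIORef' counter (+1) >> readIORef counter
    replies <- lazyReplies readReply
    print (take 3 replies)     -- forces exactly three "reads": [1,2,3]
    readIORef counter >>= print  -- prints 3
```

Constructing the list performs no I/O at all; only forcing an element runs the deferred read, which is exactly the on-demand behavior described above.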

If you are unfamiliar with how to make use of unsafeInterleaveIO, I encourage you to read that part of the Hedis source code. Alternatively, you could have a look at hGetContents from Data.ByteString.Lazy, which demonstrates the same principle but might be easier to understand if you already know how lazy ByteStrings work.

All command functions have the monadic type Redis. The Redis monad gives actions of its type access to the network connection to the server as well as an IORef holding the lazy list of all the replies.

newtype Redis a = Redis (ReaderT (Handle, IORef [Reply]) IO a)
    deriving (Monad, MonadIO, Functor, Applicative)
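The helpers askHandle and askReplies used in the code below are simple projections out of this reader environment. A plausible sketch, with Reply left abstract (the actual Hedis definitions may differ):

```haskell
import Control.Monad.Trans.Reader (ReaderT, asks)
import Data.IORef (IORef)
import System.IO (Handle)

data Reply  -- placeholder for the parsed Redis reply type

-- Project the socket handle out of the reader environment.
askHandle :: ReaderT (Handle, IORef [Reply]) IO Handle
askHandle = asks fst

-- Project the reference to the lazy reply list.
askReplies :: ReaderT (Handle, IORef [Reply]) IO (IORef [Reply])
askReplies = asks snd
```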

Internally, each command function does two things:

  1. send the request, by writing it to the socket handle, and
  2. "receive" a reply, by taking the first element from the lazy list of replies.

The request, encoded as a list of ByteStrings, is rendered according to the Redis protocol and written to the socket handle. As determined by the algorithm for optimal pipelining, flushing of the output buffer is not done here, but in the function that reads the replies.

send :: [ByteString] -> Redis ()
send req = Redis $ do
    h <- askHandle
    liftIO $ hPut h (renderRequest req)
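For reference, the Redis protocol encodes a request as a multi-bulk string: a * line carrying the argument count, then for each argument a $ line with its byte length followed by the argument itself, all CRLF-terminated. A minimal sketch of a renderRequest along these lines (the real Hedis implementation may differ in detail):

```haskell
import qualified Data.ByteString.Char8 as B

-- Encode a request as a Redis multi-bulk string, e.g.
-- ["GET","foo"]  ->  "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"
renderRequest :: [B.ByteString] -> B.ByteString
renderRequest args = B.concat (countLine : concatMap bulk args)
  where
    countLine = crlf ("*" `B.append` B.pack (show (length args)))
    bulk a    = [crlf ("$" `B.append` B.pack (show (B.length a))), crlf a]
    crlf s    = s `B.append` B.pack "\r\n"
```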

Then, the first reply is popped from the lazy list and returned to the caller, while the reference is modified to hold the tail of the reply list.

recv :: Redis Reply
recv = Redis $ do
    rs <- askReplies
    liftIO $ atomicModifyIORef rs (tail &&& head)

Note that, in order to be lazy and non-blocking, head and tail are called instead of matching on the : list constructor. What the command returns is not (yet) its reply, but a thunk that will evaluate to it. At this point, the reply has not even been read from the socket handle.
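The difference is easy to demonstrate. With tail &&& head, atomicModifyIORef can build the result pair without ever forcing the list, so the pop succeeds even when producing the next element would block (or, in the sketch below, crash); matching on : would force the list right there. Names here are hypothetical:

```haskell
import Control.Arrow ((&&&))
import Data.IORef (IORef, atomicModifyIORef, newIORef)

-- Pop the first element without forcing the list: the IORef now holds
-- a thunk for the tail, and a thunk for the head is returned.
popLazy :: IORef [a] -> IO a
popLazy ref = atomicModifyIORef ref (tail &&& head)

main :: IO ()
main = do
    -- A list that diverges as soon as it is forced to WHNF.
    ref <- newIORef (error "would block on the network" :: [Int])
    _x <- popLazy ref     -- succeeds: nothing is evaluated here
    putStrLn "popped without forcing the list"
    -- print _x           -- this WOULD raise the error
```

A pattern-matching variant like \(r:rs) -> (rs, r) would force the first cons cell inside atomicModifyIORef, turning the pop itself into a blocking network read.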

Summary

The discussion of synchronous and asynchronous client libraries for the Redis datastore shows that each approach has its own pros and cons. The Hedis library for Haskell combines the advantages of both techniques while avoiding their disadvantages. This way it offers high performance combined with a pleasant programming model.