Upgrading hort.net to Redis, part 1

We’ve been playing with Redis at hort.net over the past year. Its speed and simplicity improve system performance and make programming easier: what used to be 30 lines of MySQL statements can be condensed into five Redis queries that shave a tenth of a second off page load times. And that’s just for one part of the site. Redis also scales to multiple servers more easily, so keeping up technically as hort.net grows won’t be painful.

After successfully moving our sHORTurl service to Redis (and getting the aforementioned performance gains), we started looking for other ways to use it here.

hort.net is entirely written in custom Perl (through mod_perl) under Apache with a MySQL backend.  Redis would eliminate the latency of database queries, but because Redis keeps everything in memory we only wanted to use it in highly repetitive cases that didn’t involve complicated MySQL statements.

Identifying MySQL queries that could shift to Redis

I restarted MySQL with the --log option so that every database command would be logged to a file.  After about 21 hours, I ran a short shell pipeline to identify the most frequently queried tables:

$ grep SELECT /var/log/mysql/query-log | awk -F'FROM' '{print $2}' | awk '{print $1}' | sort | uniq -c | sort -n | tail -10
1101 authentication
1294 account.email
2579 splits
4025 image
4383 plant
7737 vote
15248 wp_options
26336 topstory
50506 navigation

The number on the left shows how many queries the table named on the right received during that timeframe.

navigation and topstory are both displayed on every hort.net page, and navigation is actually used in multiple places on each page. wp_options is used by WordPress, and vote, plant, and image are used when images in the plant gallery are viewed (or voted on). image and plant are fairly large tables involved in some complicated queries, so they wouldn’t make sense to move to Redis. But these three tables looked like great candidates:

  • navigation
  • topstory
  • vote

topstory was very straightforward: it just contained a story_id, headline, url, date, and a bit of text, and the most common query retrieved the most recent horticultural news articles posted on hort.net.  vote was more complicated because it uses the AVG() and STDDEV() MySQL functions to display higher-voted images first, and navigation was moderately complex because it used a weighting system to sort navigation links. Although topstory didn’t account for the largest share of queries, it seemed like the easiest to migrate, so we started there.
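To see why vote is the harder case, here is a rough Python sketch (with hypothetical vote data, not the site’s actual code) of the kind of ranking AVG() and STDDEV() compute in SQL. Redis has no built-in equivalent, so this math would have to move into application code:

```python
from statistics import mean, pstdev

# Hypothetical vote data: image name -> list of vote scores.
votes = {
    "rose": [5, 4, 5, 5],
    "fern": [3, 4, 2, 3],
}

# Rank images by average vote (highest first), breaking ties toward
# lower spread -- roughly what an AVG()/STDDEV() ORDER BY would do.
ranked = sorted(votes, key=lambda i: (mean(votes[i]), -pstdev(votes[i])), reverse=True)
print(ranked)  # ['rose', 'fern']
```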

Creating the Redis schema

First I had to develop a schema. The old MySQL table had this structure and these queries:

| Field       | Type             | Null | Key | Default | Extra          |
| story_id    | int(10) unsigned | NO   | MUL | NULL    | auto_increment |
| headline    | varchar(255)     | NO   | MUL |         |                |
| url         | varchar(255)     | NO   |     |         |                |
| description | text             | YES  |     | NULL    |                |
| date        | datetime         | NO   | MUL | NULL    |                |
5 rows in set (0.00 sec)

# get last 8 headlines and URLs, displayed on every page
SELECT SQL_CACHE url, headline FROM topstory ORDER BY date DESC LIMIT 8
# display last six hours of headlines in RSS
SELECT SQL_CACHE story_id, headline, url, description, DATE_FORMAT(DATE_SUB(date, INTERVAL 6 HOUR), '%a, %d %b %Y %T GMT') FROM topstory WHERE date <= NOW() ORDER BY date DESC
# display all stories on the topstories page
SELECT SQL_CACHE headline, url, date FROM topstory WHERE date <= NOW() ORDER BY date DESC

Since speed was a primary goal, I had to make sure our Redis datastore let us easily sort stories by date, because every query relied on that. I ended up with this:

Hash s:h [story_id] [headline]
Hash s:u [story_id] [url]
Hash s:d [story_id] [description]
Hash s:e [story_id] [epoch time]
SortedSet s:t [epoch time] [story_id]
Set s:id [story_id]

The SortedSet uses the time in seconds since 1970 (epoch time) as the sorting score; with the ZREVRANGEBYSCORE command we can ask Redis to return our ids sorted by date, newest first, and limit the result to the last eight if desired.  Once the list of ids is returned, the script uses them to do an HMGET for the url, headline, and, if necessary, description.
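The read path above can be sketched without a running Redis server. This minimal Python model (hypothetical story data) uses plain dicts in place of the s:h and s:u hashes and a list of (id, score) pairs in place of the s:t sorted set:

```python
# In-memory stand-ins for the Redis structures (hypothetical data).
headlines = {"1": "First story", "2": "Second story", "3": "Third story"}
urls = {"1": "/news/1", "2": "/news/2", "3": "/news/3"}
scores = [("1", 1000), ("2", 2000), ("3", 3000)]  # (story_id, epoch time)

def zrevrangebyscore(zset, limit):
    """Return up to `limit` ids, highest score (newest) first."""
    return [sid for sid, _ in sorted(zset, key=lambda p: p[1], reverse=True)][:limit]

def hmget(h, ids):
    """Fetch several fields from a hash in one round trip."""
    return [h[i] for i in ids]

ids = zrevrangebyscore(scores, 2)  # newest two stories
print(hmget(headlines, ids))       # ['Third story', 'Second story']
print(hmget(urls, ids))            # ['/news/3', '/news/2']
```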

Before going any further, I wanted to time hort.net page loads before switching that part of the site to Redis. I threw together a quick script that did 12 page loads five seconds apart, then averaged the load time. This should take other server activity into account, and it should also leverage MySQL caching to give us a true apples-to-apples comparison. The script was run regularly throughout the day, and the values below are the averages.

#!/bin/bash
# hort.net front page load timing script

total=0
for i in {1..12}; do
   seconds=$( (time lynx -dump -source http://www.hort.net/ > /dev/null) |& grep real | sed -r 's/s$|0m//g' | awk '{print $NF}')
   echo "$seconds seconds"
   sleep 5s
   total=$(echo "$total + $seconds" | bc)
done

echo -n "Average load time: "; echo "$total / 12" | bc -l
$ ./timeit
0.183 seconds
0.195 seconds
0.198 seconds
0.202 seconds
0.162 seconds
0.172 seconds
0.188 seconds
0.185 seconds
0.189 seconds
0.193 seconds
0.184 seconds
0.175 seconds
Average load time: .18550000000000000000

As you can see, the front page loaded in 0.1855 seconds on average.

Next I wrote a script to copy the topstory table into Redis.  It was intended for one-time use, and since stories are added manually I didn’t have to worry about losing anything mid-migration. The script was fairly straightforward, and it took 0.979 seconds to move the 615 records into Redis.
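The migration script itself isn’t shown, but the translation it performed can be sketched. This Python sketch (hypothetical rows, not the actual Perl script) turns each MySQL row into the six Redis writes the schema calls for:

```python
import time

# Hypothetical rows as they might come back from the topstory table:
# (story_id, headline, url, description, date)
rows = [
    (1, "Headline one", "/news/1", "First story", "2011-05-01 12:00:00"),
    (2, "Headline two", "/news/2", "Second story", "2011-05-02 09:30:00"),
]

def to_redis_commands(rows):
    """Translate each MySQL row into the equivalent Redis write commands."""
    cmds = []
    for story_id, headline, url, desc, date in rows:
        # Convert the MySQL datetime into epoch time for the sorted set score.
        epoch = int(time.mktime(time.strptime(date, "%Y-%m-%d %H:%M:%S")))
        cmds += [
            ("HSET", "s:h", story_id, headline),
            ("HSET", "s:u", story_id, url),
            ("HSET", "s:d", story_id, desc),
            ("HSET", "s:e", story_id, epoch),
            ("ZADD", "s:t", epoch, story_id),
            ("SADD", "s:id", story_id),
        ]
    return cmds

cmds = to_redis_commands(rows)
print(len(cmds))  # 12, six writes per story
```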

Now came the hardest part: converting the scripts running on the site to use Redis instead. The queries had to be rewritten, although that wasn’t as painful as expected. The single query that displayed the eight most recent stories was replaced with three Redis calls from Perl:

my @ids = $GLOBAL::RedisHandle->zrevrangebyscore('s:t', '+INF', '-INF', 'LIMIT', 0, 8);
my @headlines = $GLOBAL::RedisHandle->hmget('s:h', @ids);
my @urls = $GLOBAL::RedisHandle->hmget('s:u', @ids);

How did it perform?

So how did the timing go?

$ ./timeit
0.178 seconds
0.168 seconds
0.162 seconds
0.175 seconds
0.175 seconds
0.191 seconds
0.174 seconds
0.167 seconds
0.169 seconds
0.167 seconds
0.179 seconds
0.177 seconds
Average load time: .17350000000000000000

It shaved off about 1/100th of a second, so the gains were minimal. I suspect that once the navbar is converted we’ll see larger gains, like we did for sHORTurl, because MySQL will no longer be used for the design framework; the main Web page and things like the mailing list archives won’t have any MySQL overhead at all. Pages like the gallery won’t see much improvement because they still rely on MySQL for things that aren’t a good fit for Redis.

We’ll tackle the navbar in Part 2 and see how much of a difference that makes.