What it takes to run Stack Overflow

http://nickcraver.com/blog/2013/11/22/what-it-takes-to-run-stack-overflow/

I like to think of Stack Overflow as running with scale but not at scale. By that I meant we run very efficiently, but I still don’t think of us as “big”, not yet. Let’s throw out some numbers so you can get an idea of what scale we are at currently. Here are some quick numbers from a 24 hour window few days ago - November 12th, 2013 to be exact. These numbers are from a typical weekday and only include our active data center - what we host. Things like hits/bandwidth to our CDN are not included, they don’t hit our network. (只能说with scale而不是at scale, 因为规模还是不那么大。下面的数据是内部数据中心的数据量,请求访问应该比这大但是CDN挡住了部分)

Here’s what runs the Stack Exchange network in that data center: (用于在线服务的服务器配置)

The rest of those servers in the nearest rack are VMs and other infrastructure for auxiliary things not involved in serving the sites directly, like deployments, domain controllers, monitoring, ops database for sysadmin goodies, etc. Of that list above, 2 SQL servers were backups only until very recently - they are now used for read-only loads so we can keep on scaling without thinking about it for even longer (this mainly consists of the Stack Exchange API). Two of those web servers are for dev and meta, running very little traffic.(其他服务器则用来做部署,域名控制,监控,OPS数据库等。其中两台MySQL原来作为backups直到最近作为read-only server, 其中两台webserver则用来开发和Meta(?))


Core Hardware

如果除去冗余的话,下面这些服务器(应该,不确定)就可以支撑整个Stack Exchange

Now there are a few VMs and such in the background to take care of other jobs, domain controllers, etc., but those are extremely lightweight and we’re focusing on Stack Overflow itself and what it takes to render all the pages at full speed. If you want a full apples to apples, throw a single VMware server in for all of those stragglers. So that’s not a large number of machines, but the specs on those machines typically aren’t available in the cloud, not at reasonable prices. Here are some quick “scale up” server notes:(后台有些VM完成一些非常轻量工作比如域名控制等,主要精力还是用来应对SO. 使用服务器都是比较高端走scale-up路线的)

Is 20 Gb massive overkill? You bet your ass it is, the active SQL servers average around 100-200 Mb out of that 20 Gb pipe. However, things like backups, rebuilds, etc. can completely saturate it due to how much memory and SSD storage is present, so it does serve a purpose.


Storage

We currently have about 2 TB of SQL data (1.06 TB / 1.63 TB across 18 SSDs on the first cluster, 889 GB / 1.45 TB across 4 SSDs on the second cluster), so that’s what we’d need on the cloud (hmmm, there’s that word again). Keep in mind that’s all SSD. The average write time on any of our databases is 0 milliseconds, it’s not even at the unit we can measure because the storage handles it that well. With the database in memory and 2 levels of cache in front of it, Stack Overflow actually has a 40:60 read-write ratio. Yeah, you read that right, 60% of our database disk access is writes (you should know your read/write workload too). There’s also storage for each web server - 2x 320GB SSDs in a RAID 1. The elastic boxes need about 300 GB a piece and do perform much better on SSDs (we write/re-index very frequently).

It’s worth noting we do have a SAN, an Equal Logic PS6110X that’s 24x900GB 10K SAS drives on a 2x 10Gb link (active/standby) to our core network. It’s used exclusively for the VM servers as shared storage for high availability but does not really support hosting our websites. To put it another way, if the SAN died the sites would not even notice for a while (only the VM domain controllers are a factor).(使用SAN来做存储底层)


Put it all together

Now, what does all that do? We want performance. We need performance. Performance is a feature, a very important feature to us. The main page loaded on all of our sites is the question page, affectionately known as Question/Show (its route name) internally. On November 12th, that page rendered in an average of 28 milliseconds. While we strive to maintain 50ms, we really try and shave every possible millisecond off your pageload experience. All of our developers are certifiably anal curmudgeons when it comes to performance, so that helps keep times low as well. Here are the other top hit pages on SO, average render time on the same 24 hour period as above: (所有指标都在100ms以下)

We have high visibility of what goes into our page loads by recording timings for every single request to our network. You need some sort of metrics like this, otherwise what are you basing your decisions on? With those metrics handy, we can make easy to access, easy to read views like this:

so-route-time-questions-list.png


Room to grow

It’s definitely worth noting that these servers run at very low utilization. Those web servers average between 5-15% CPU, 15.5 GB of RAM used and 20-40 Mb/s network traffic. The SQL servers average around 5-10% CPU, 365 GB of RAM used, and 100-200 Mb/s of network traffic. This affords us a few major things: general room to grow before we upgrade, headroom to stay online for when things go crazy (bad query, bad code, attacks, whatever it may be), and the ability to clock back on power if needed. Here’s a view from Opserver of our web tier taken just now:(所有服务器利用率都比较低,空闲资源可以应对许多重大情况) 下图是服务器的资源利用率情况

so-opserver-screens.png

The primary reason the utilization is so low is efficient code. That’s not the topic of this post, but efficient code is critical to stretching your hardware further. Anything you’re doing that doesn’t need doing costs more than not doing it, that continues to apply if it’s a subset of your code that could be more efficient. That cost comes in the form of: power consumption, hardware cost (since you need more/bigger servers), developers understanding something more complicated (to be fair, this can go both ways, efficient isn’t necessarily simple) and likely a slower page render - meaning less users sticking around for another page load…or being less likely to come back. The cost of inefficient code can be higher than you think.(性能意味着很多东西)

comments powered by Disqus