Friday, September 10, 2010

Web pages per second: A simple calculation

Takeaway: When planning a Web site, it is difficult to estimate how much traffic it will take, which makes it even harder to plan for equipment and software. Brien Posey offers this quick guide to calculating how many pages per second a site will serve.

Web traffic is notoriously unpredictable, which can make it difficult to estimate how much traffic your servers will need to handle. You may know that a particular Web site averages 500 hits per hour, but you can’t tell exactly how many hits the site will take on a given day. So when you’re planning a new Web site’s capacity, it’s important to make a reasonable estimate of the traffic you expect and then build a server that can comfortably handle that amount of traffic and more.
Calculating Web traffic
Web traffic can be a very difficult statistic to pinpoint. To estimate the number of hits a site takes, you must know what you want to estimate (unique IP addresses, total hits minus Web bot traffic, stickiness, etc.) and whether you will do dynamic (counters or banners) or static (log file analysis) monitoring.

Allowing for growth
Why build a server that’s more powerful than what you really need? Well, for starters, building a more powerful server allows for growth. After all, the whole idea behind having a Web site is to attract visitors to it. Because you’ll be trying to attract more and more visitors, the amount of traffic flowing in and out of your server will probably increase over time.

Even if you aren’t aggressively promoting your Web site, you still need to build a server that can handle more traffic than you planned for, because although your site may average a certain number of hits per hour, not all hours are the same. For example, if your Web site offers news-related content, your peak traffic times will probably be early in the morning, at lunchtime, and in the late afternoon. Of course, this assumes that your site is of purely local interest. If your site serves visitors in more than one or two time zones, there’s really no telling when your peak periods will be. Either way, your site will receive a lot of hits during some parts of the day and comparatively few during others. Be sure your server can comfortably handle the busiest parts of the busiest days and still have resources to spare.
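To see why the busiest hour matters more than the average one, here is a small Python sketch. The hourly hit counts are hypothetical, made up to mimic a local news site that averages 500 hits per hour with morning, lunchtime, and late-afternoon peaks; substitute figures from your own logs.

```python
# Compare a site's busiest hour with its average hour.
# The hourly counts below are HYPOTHETICAL, for illustration only.

def peak_vs_average(hourly_hits: list[int]) -> tuple[float, int, float]:
    """Return (average hits per hour, peak hits per hour, peak-to-average ratio)."""
    average = sum(hourly_hits) / len(hourly_hits)
    peak = max(hourly_hits)
    return average, peak, peak / average


# Hypothetical 24-hour profile (midnight through 11 p.m.) for a local news site.
hits = [100, 60, 40, 40, 60, 200,        # overnight
        900, 1400, 1100, 400, 500, 800,  # morning
        1300, 900, 300, 300, 900, 1200,  # lunchtime and late afternoon
        400, 300, 200, 300, 200, 100]    # evening

average, peak, ratio = peak_vs_average(hits)
print(f"average: {average:.0f} hits/hour, peak: {peak} hits/hour ({ratio:.1f}x)")
# Size capacity for the peak hour (plus headroom), not for the average.
print(f"peak rate: {peak / 3600:.2f} hits per second")
```

With these made-up numbers, the busiest hour carries 2.8 times the average load, which is the gap a server sized only for the average would fail to cover.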

Bandwidth and page type
Another rule of capacity planning is that not all pages are created equal. Most Internet capacity planning focuses on bandwidth needs, which determine how many pages can be transmitted per second. For example, a standard page of text with one or two simple graphics might occupy about 5 KB. However, you don’t want to base your pages-per-second calculations on 5 KB if you’ve got other pages that are full of graphics.

Bandwidth isn’t the only factor to consider. If your pages consist entirely of HTML text with the occasional graphic, bandwidth will be your primary concern. However, many Web sites these days depend on Active Server Pages (ASPs). When the server receives a request for an ASP, it must dynamically construct the Web page in memory and then transmit that page to the user who requested it. This process consumes a considerable amount of memory and processing power. It also consumes a bit more bandwidth, because the user isn’t simply requesting a static page. Instead, the user transmits all of the information the server needs to build the dynamic page (typically as part of the URL). This information usually amounts to no more than a couple of hundred bytes, but compared with the 20 or 30 bytes that might be needed to request a static page, you can see how dynamic pages add to the load on your bandwidth.

This is especially true when you consider how a large number of users would affect the process. For example, let’s ignore outbound pages and every other type of traffic except the requests generated by end users. Suppose that clicking a static link sends 30 bytes of traffic to the server, while clicking a dynamic link sends 200 bytes. If 1,000 users clicked at the same time, the static link would generate 30,000 bytes (just under 30 KB) of traffic, while the dynamic link would generate 200,000 bytes (about 195 KB). While neither of these numbers seems significant, remember that this is only the traffic being sent to the server by users clicking a link.
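The arithmetic in this example is easy to check in code. The Python sketch below uses the illustrative per-click sizes from the text (30 and 200 bytes) and the 1,024-bytes-per-KB convention used later in this article. The per-second figures at the end assume, as a simplification, that all 1,000 clicks arrive within the same second.

```python
# Inbound request traffic generated when many users click a link at once.
# The per-click sizes are the illustrative figures from the text, not measurements.

STATIC_REQUEST_BYTES = 30
DYNAMIC_REQUEST_BYTES = 200


def inbound_bytes(users: int, bytes_per_request: int) -> int:
    """Total bytes sent to the server when each user makes one request."""
    return users * bytes_per_request


def to_kilobytes(num_bytes: int) -> float:
    """Convert bytes to KB using 1 KB = 1,024 bytes."""
    return num_bytes / 1024


users = 1_000
static_total = inbound_bytes(users, STATIC_REQUEST_BYTES)
dynamic_total = inbound_bytes(users, DYNAMIC_REQUEST_BYTES)

print(f"static:  {static_total:,} bytes ({to_kilobytes(static_total):.1f} KB)")
print(f"dynamic: {dynamic_total:,} bytes ({to_kilobytes(dynamic_total):.1f} KB)")

# Assumption: every click lands within one second, so bytes * 8 = bits per second.
print(f"static:  {static_total * 8 / 1000:.0f} Kbps if sent within one second")
print(f"dynamic: {dynamic_total * 8 / 1000:.0f} Kbps if sent within one second")
```

Keeping bytes and bits separate like this avoids a common mix-up: 30,000 bytes is about 29.3 KB of data, but if it arrives in a single second it is a 240 Kbps burst.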

I’m not saying not to use ASPs. ASPs are a great technology, and I encourage their use. I’m not even saying that ASPs are going to push your server to the breaking point. I’m merely pointing out that users generate more traffic when you use ASPs, and you should at least account for it when calculating your bandwidth.

Estimating bandwidth capacity
If you’re planning to host a Web site, you probably already have an idea of what you want the site to contain and what type of Internet connection you want to use. To help you determine your server’s bandwidth capacity, let’s use my Web site as an example. The largest page on my site is my brother’s bio page, which includes a few fairly small graphics and two large JPEG photographs. The page is about 150 KB in size, which isn’t really excessive for a Web page. To keep the math simple, I’ll treat it as an even 150 KB. When in doubt, round the size of your biggest page up rather than down: overestimating the page size gives you a smaller number of pages per second than you’d actually be able to support. That underestimate is a good thing, because exact numbers would only reflect how many pages per second your server could deliver under perfect conditions. Since conditions in the real world are seldom perfect, it makes sense to build some slack into the numbers.

With that said, let’s look at some numbers. It’s tempting to simply divide your bandwidth by the page size (for example, 1.5 Mbps of bandwidth divided by a 5-KB page, or about 40,000 bits, equals roughly 37 pages per second), but there’s more to the process than that. First, you need to determine what traffic is required for a client to access the page. The client must first establish a TCP/IP session with your Web server, which requires about 180 bytes of information to flow across your connection. The next step is the GET request, in which the client asks your Web site for a specific page.
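Here is the naive shortcut expressed in Python, mostly to show the unit conversion that trips people up: bandwidth is quoted in bits per second, while page sizes are quoted in kilobytes. The function name is hypothetical, and the calculation deliberately ignores connection setup, the GET request, and packet headers.

```python
# The "tempting" shortcut: bandwidth divided by page size, ignoring
# connection setup, the GET request, and per-packet header overhead.

def naive_pages_per_second(bandwidth_bps: float, page_kilobytes: float) -> float:
    """Pages per second if every bit of bandwidth carried page content only.

    The page size is converted to bits (1 KB = 1,024 bytes, 8 bits per byte)
    so that both sides of the division use the same unit.
    """
    page_bits = page_kilobytes * 1024 * 8
    return bandwidth_bps / page_bits


print(f"{naive_pages_per_second(1_500_000, 5):.1f} pages per second")
```

Forgetting the factor of 8 would inflate the result eightfold, which is why it pays to convert everything to bits before dividing.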

The amount of data required by this process varies depending on the length of the URL (ASP URLs are typically longer than static page URLs). Unfortunately, you can’t simply count the number of bytes in the URL to come up with a number, because the request carries some overhead beyond the URL itself. Instead, let’s assume that the request requires about 256 bytes, which is a reasonable average for a typical static Web page. Next, you need to determine the size of the page you’re working with, which will vary depending on the number and size of the graphics it contains.
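To illustrate why a GET request is larger than its URL, the following Python sketch assembles a minimal HTTP/1.1 request by hand. The host name, paths, and header values are hypothetical. Real browsers usually send additional headers (cookies, language preferences, and so on), so actual requests tend to be larger than this minimal one.

```python
# Why a GET request is bigger than its URL: the request line and headers
# add overhead. The host, paths, and header values here are HYPOTHETICAL.

def build_get_request(path: str, host: str = "www.example.com") -> bytes:
    """Build a minimal HTTP/1.1 GET request as it would appear on the wire."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "User-Agent: ExampleBrowser/1.0",  # hypothetical client name
        "Accept: text/html",
        "Connection: keep-alive",
    ]
    # HTTP uses CRLF line endings, and a blank line ends the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


static_path = "/bio.htm"
dynamic_path = "/bio.asp?member=brother&section=photos&size=large"

static_req = build_get_request(static_path)
dynamic_req = build_get_request(dynamic_path)

print(len(static_path), "URL bytes ->", len(static_req), "request bytes")
print(len(dynamic_path), "URL bytes ->", len(dynamic_req), "request bytes")
```

Every extra character in a dynamic page’s query string adds a byte to each request, on top of the fixed overhead of the request line and headers.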

I said I was going to use 150 KB as the size of my page. Since all of my other numbers are in bytes, I’ll convert 150 KB to bytes by multiplying by 1,024, which gives 153,600 bytes. Using TCP/IP also involves some overhead. Remember that all data flowing to and from your Web server is encapsulated into TCP/IP packets. In addition to your data, each packet must contain header information, such as the packet’s source, destination, and sequence number.

The actual amount of overhead generated by TCP/IP varies, depending on whether you’re using a security protocol such as IPSec or just standard TCP/IP. You can estimate the overhead with a few calculations. For this example, I’ll assume that each TCP/IP packet carries a 32-byte header that tells TCP/IP how to route the packet. (In practice, the combined IPv4 and TCP headers take at least 40 bytes, so treat this figure as approximate.) An IP packet can be as large as 65,535 bytes, but because it’s impossible for me to know the exact packet structure you’re using, I’ll go with a conservative 576-byte model; 576 bytes is the datagram size that every IPv4 host is required to accept.

If a packet is 576 bytes in size and 32 bytes of it are header, that leaves 544 bytes for the actual data. If you’re downloading a 153,600-byte page and each packet can carry 544 bytes of data, it will take 283 packets (153,600 ÷ 544 ≈ 282.4, rounded up) to move the page to the user’s browser. With that said, let’s do the math (see Table A).
Table A
Byte usage                                    Byte count
TCP/IP connection                             Approx. 180 bytes
GET request                                   Approx. 256 bytes
150-KB Web page                               153,600 bytes
Protocol overhead (32 bytes × 283 packets)    9,056 bytes
Total                                         163,092 bytes (159.3 KB)
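The Table A arithmetic can be reproduced with a short Python sketch. The constants are this article’s approximations (180 bytes for connection setup, 256 bytes for the GET request, and 576-byte packets with 32-byte headers), not measured values, and the function name is hypothetical; change the constants to match your own environment.

```python
import math

# The article's approximations, not measured values.
CONNECTION_SETUP_BYTES = 180  # establishing the TCP/IP session
GET_REQUEST_BYTES = 256       # typical static-page GET request
PACKET_BYTES = 576            # assumed total packet size
HEADER_BYTES = 32             # assumed header size per packet
PAYLOAD_BYTES = PACKET_BYTES - HEADER_BYTES  # 544 bytes of page data per packet


def page_transfer_bytes(page_bytes: int) -> dict[str, int]:
    """Break down the bytes needed to deliver one page, as in Table A."""
    # Round up: a partly filled last packet still carries a full header.
    packets = math.ceil(page_bytes / PAYLOAD_BYTES)
    overhead = packets * HEADER_BYTES
    total = CONNECTION_SETUP_BYTES + GET_REQUEST_BYTES + page_bytes + overhead
    return {"packets": packets, "overhead": overhead, "total": total}


page_bytes = 150 * 1024  # 150 KB at 1,024 bytes per KB
row = page_transfer_bytes(page_bytes)
print(f"packets:  {row['packets']}")
print(f"overhead: {row['overhead']:,} bytes")
print(f"total:    {row['total']:,} bytes ({row['total'] / 1024:.1f} KB)")
```

Because the page size is a parameter, you can rerun the breakdown for each of your larger pages rather than only the biggest one.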

Now that you have an idea of how much data must actually be moved to display a page, you need to divide your connection speed (in bits per second) by the number of bits per page. Because you made your calculations in bytes, first convert the total to bits by multiplying it by eight. This gives you a total of 1,304,736 bits per page (163,092 × 8).

Table B displays the number of bits per second offered by various types of connections, along with the number of pages per second that each connection could support at this page size. Keep in mind that I’m working with approximate numbers.

Table B
Connection          Bits per second   Bits per page   Pages per second
28.8 Kbps modem              28,800       1,304,736               0.02
56 Kbps modem                56,000       1,304,736               0.04
T-1                       1,544,000       1,304,736               1.18
10 Mbps Ethernet         10,000,000       1,304,736               7.66
100 Mbps Ethernet       100,000,000       1,304,736               76.6
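Table B is one division per row, so it’s easy to recompute for your own page sizes and links. This Python sketch hard-codes the 163,092-byte total from Table A and the link speeds listed in the table; the dictionary and function names are illustrative.

```python
# Reproduce Table B: connection speed (bits/s) divided by bits per page.
BITS_PER_PAGE = 163_092 * 8  # Table A total in bytes, times 8 bits per byte

CONNECTIONS = {
    "28.8 Kbps modem": 28_800,
    "56 Kbps modem": 56_000,
    "T-1": 1_544_000,
    "10 Mbps Ethernet": 10_000_000,
    "100 Mbps Ethernet": 100_000_000,
}


def pages_per_second(bits_per_second: int, bits_per_page: int = BITS_PER_PAGE) -> float:
    """Best-case pages per second if the link carried nothing but this page."""
    return bits_per_second / bits_per_page


for name, bps in CONNECTIONS.items():
    print(f"{name:<18} {bps:>12,} bps  {pages_per_second(bps):8.2f} pages/s")
```

Run backward, the same relationship gives the bandwidth a target load requires: multiply the desired pages per second by the bits per page. For example, serving 10 of these pages per second would take roughly 13 Mbps, before any allowance for peak periods.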

As you can see, a 150-KB page isn’t such a good idea if you’re expecting to get a lot of hits. But that’s why you do capacity planning, so you can figure these things out in advance.

Courtesy :