Web Server Statistics


We have statistics for August 15 - September 1, 1998:
(hit a site's link for more detailed stats)

Site                Hits     Data sent   Clients
eco.freedom.org      2,444       19 MB       480
epi.freedom.org     11,475       63 MB       442
freedom.org         12,329       50 MB     1,138
sovereignty.net     10,047       91 MB       691
TOTALS              36,295      223 MB     2,751

What does all this mean?

(It might help to look at a report in another browser window while reading this; Ctrl-N in Netscape/Win95; I dunno how to do it with Internet Exploder)

First, the files are named by date; "980515" indicates a report for the period ending 5/15/98. The "-all" and "-top15" files for a given period are built from the same data. The "-all" files include all the data gathered from the log files, and can be quite large because of that. The "lifetime" reports are generated from the entire collection of logs for a site since its beginning.
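If you want to sort or filter report files by date, the naming scheme is easy to decode. This is a minimal sketch, and it assumes the six-digit date comes first in the name and is followed by "-all" or "-top15"; the names passed in below are made up for illustration, and the real filenames may carry more than this.

```python
from datetime import datetime

def report_period_end(filename):
    """Return the period-ending date encoded at the start of a report name."""
    # Assumed layout: YYMMDD first, then a suffix such as "-all" or "-top15".
    stamp = filename[:6]
    return datetime.strptime(stamp, "%y%m%d").date()

print(report_period_end("980515-top15"))  # 1998-05-15
```

Python's two-digit year rule puts "98" in 1998, which is what we want for these reports.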

The report files contain a couple of lines at the top, indicating which site's logs they were generated from, and what kind of report it is. The "Generated from" line(s) record the actual filenames of the logs, and are meaningless outside our archiving system. The next line shows the date and time of the first and last request processed by the statistics generator.

The next line is a quick summary of some important numbers derived from the log. A request (aka "hit") is logged when someone requests a file from our server. This page generates two hits when you look at it, one for the page itself, and one for the graphic. A URL, for the purpose of these reports, is something that's been requested. The amount of data sent is abbreviated; k = 1,024 bytes, M = 1,024k. Clients are individual IP addresses; for most of the world one IP address = one computer.
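To make those four numbers concrete, here is a rough sketch of how they could be derived from a list of logged requests. The tuple layout (client IP, URL, bytes sent) and the sample addresses are invented for the example, and rounding the data figure down is my assumption; the real statistics generator may do it differently.

```python
def abbreviate(nbytes):
    """Abbreviate a byte count using k = 1,024 bytes and M = 1,024k."""
    if nbytes >= 1024 * 1024:
        return f"{nbytes // (1024 * 1024)}M"
    if nbytes >= 1024:
        return f"{nbytes // 1024}k"
    return str(nbytes)

def summarize(requests):
    """requests: iterable of (client_ip, url, bytes_sent) tuples (made-up layout)."""
    requests = list(requests)
    hits = len(requests)                             # every logged request is a hit
    urls = len({url for _, url, _ in requests})      # distinct things requested
    data = sum(sent for _, _, sent in requests)      # total bytes sent
    clients = len({ip for ip, _, _ in requests})     # distinct IP addresses
    return hits, urls, abbreviate(data), clients

sample = [
    ("10.0.0.1", "/index.html", 4096),
    ("10.0.0.1", "/logo.gif", 2048),
    ("10.0.0.2", "/index.html", 4096),
]
print(summarize(sample))  # (3, 2, '10k', 2)
```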


The URL Summary shows statistics about what's been requested, how many times, and by how many clients. The excluded URLs are those representing graphics, so the non-excluded hit count is the number of requests logged for actual pages. The "Share" figure is calculated from the non-excluded hit count, and is meant to make it a little easier to compare individual pages' popularity. The URLs as shown in the report are pretty much what you request with your browser when you click a link. Float your pointer over a link and look at the bottom of your browser window; you should see something like "http://sovereignty.net/sovboard/WebStats/freedom/". This is the URL that gets logged when you click on that link; the reports do not show the machine name portion, since that's the same for all the URLs on a site.
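Here is a small sketch of the "Share" idea. It assumes graphics are recognized by file extension and that Share is a percentage of the non-excluded hit count; both are my guesses for illustration, not a description of the actual report generator.

```python
from collections import Counter
from urllib.parse import urlsplit

GRAPHIC_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")  # assumed exclusion list

def logged_path(url):
    """Drop the machine name, keeping only the part the reports show."""
    return urlsplit(url).path

def url_shares(paths):
    """Map each non-graphic path to its percentage of non-excluded hits."""
    pages = [p for p in paths if not p.lower().endswith(GRAPHIC_EXTENSIONS)]
    counts = Counter(pages)
    total = len(pages)  # the non-excluded hit count
    return {path: 100.0 * n / total for path, n in counts.items()}

print(logged_path("http://sovereignty.net/sovboard/WebStats/freedom/"))
print(url_shares(["/", "/", "/", "/a.html", "/logo.gif"]))
```

In the sample, the graphic is left out, so the four page hits split 75% to "/" and 25% to "/a.html".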

Most times someone out there on the net clicks a link on one of our pages, it generates a hit that gets logged on our server. That doesn't imply that the file is actually sent; most browsers maintain copies of the most recently viewed pages and graphics, and will not download them again if they have a copy and that copy isn't older than the web server's copy. That's why the amount of data transferred is included in these reports.
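The sketch below shows why hits and data can drift apart. It is a toy model, not our server: timestamps are plain numbers, and the function name and parameters are made up. In real HTTP this exchange is done with the If-Modified-Since request header and a "304 Not Modified" response.

```python
def handle_request(server_mtime, size, browser_copy_mtime=None):
    """Return (hits_logged, bytes_sent) for one request."""
    if browser_copy_mtime is not None and browser_copy_mtime >= server_mtime:
        return 1, 0   # browser's copy is current: the hit is logged, nothing is sent
    return 1, size    # no copy, or a stale one: the hit is logged and the file is sent

# The same 5,000-byte page requested three times; its file time on the server is 100.
requests = [None, 100, 90]   # no copy, current copy, stale copy
results = [handle_request(100, 5000, copy) for copy in requests]
hits = sum(h for h, _ in results)
data = sum(b for _, b in results)
print(hits, data)  # 3 10000
```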

The situation is made even more complex by the growing popularity of web cache systems, such as the one AOL uses. When AOL customer "A" looks at a page from freedom.org, his computer asks a computer at AOL for the page. AOL's computer then asks us. If customer "B" at AOL asks for the same page 10 minutes later, AOL's computer will probably give him the copy that it kept from customer "A"'s request, and not tell us about the request from "B" at all. AOL isn't the only caching system out there. There are several different ways of doing this trick, but the most popular allows for cache sharing, so that neighboring ISPs (Internet Service Providers) might have customers looking at our sites without generating any more logged hits than a single user without a cache. I've read that such systems are very widely used overseas; Australia, for example, is supposed to have a wonderful cache system that covers the whole country. We might have thousands of people looking at our sites in .au, but we'll probably never know about it.
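The customer "A" and customer "B" story can be played out in a few lines. This is a deliberately simplified sketch: the class names are invented, the cache never expires anything, and real cache systems are far more careful about freshness.

```python
class OriginServer:
    """Stands in for our web server; it logs every request it sees."""
    def __init__(self):
        self.log = []

    def fetch(self, requester, url):
        self.log.append((requester, url))
        return f"contents of {url}"

class CachingProxy:
    """Stands in for an ISP's cache sitting between customers and us."""
    def __init__(self, origin, name):
        self.origin = origin
        self.name = name
        self.store = {}

    def fetch(self, customer, url):
        if url not in self.store:                       # first request: ask the origin
            self.store[url] = self.origin.fetch(self.name, url)
        return self.store[url]                          # later requests: served from the cache

origin = OriginServer()
proxy = CachingProxy(origin, "cache.isp.example")
proxy.fetch("customer A", "/index.html")
proxy.fetch("customer B", "/index.html")
print(len(origin.log))  # 1
```

Two people looked at the page, but our log holds one hit, and it names the cache rather than either customer.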


Next is the Client Domain Summary, which is split into two parts. First we have the Top Level Domains (TLDs) summary, which records how many clients were seen for each TLD. A TLD is something like ".com", or ".net", or ".ru". I've summarized this information because it's a quick and easy way to gauge how many clients are connecting from non-US locations. The funky two-letter TLDs are ISO 3166 country codes: ".ru" = Russia and ".bh" = Bahrain; see the list for more. The "<No DNS>" entry records clients for which no reverse DNS is available, that is, clients for which we can't easily determine a name.
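Counting clients per TLD is simple once each client has a name. The sketch below assumes the reverse DNS lookups have already been done and that a failed lookup is represented as None; the hostnames are made up.

```python
from collections import Counter

def tld_summary(client_names):
    """client_names: one reverse-DNS name per client, or None when lookup failed."""
    counts = Counter()
    for name in client_names:
        if name is None:
            counts["<No DNS>"] += 1
        else:
            # The TLD is whatever follows the last dot in the name.
            tld = name.rstrip(".").rsplit(".", 1)[-1].lower()
            counts["." + tld] += 1
    return counts

names = ["proxy1.example.com", "dialup7.example.ru", None, "www.example.COM"]
print(tld_summary(names))
```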

After the TLD summary is the Domain Summary, which records in a little more detail where the people who look at our sites come from. Shown here is the total number of domains served, the number of hits from a given domain (not excluding graphics), the amount of data sent to the domain, and the number of clients from that domain seen in the logs. A domain is something like "aol.com". It's a little more complex than that; you'll see things like "lib.vt.us" in the domain lists (indicating the public library system in Vermont), as well as things like "205.188.154.x", which is a domain that doesn't provide reverse DNS. The system we use for figuring out which clients make up a domain isn't perfect, but it's as good as any other method.
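To give a feel for why this is imperfect, here is one possible heuristic for turning a client name into a domain. It is not the method our reports actually use; the rule (keep three labels under a two-letter country code, two labels otherwise, and the first three octets of a bare IP address) is an assumption chosen because it reproduces the examples above, and the hostnames are invented.

```python
def client_domain(name_or_ip):
    """Guess the domain a client belongs to from its hostname or IP address."""
    parts = name_or_ip.lower().rstrip(".").split(".")
    if len(parts) == 4 and all(p.isdigit() for p in parts):
        return ".".join(parts[:3]) + ".x"      # no reverse DNS: group by the first three octets
    keep = 3 if len(parts[-1]) == 2 else 2     # country-code names get one extra label
    return ".".join(parts[-keep:])

for client in ["dialup-12.aol.com", "pc3.lib.vt.us", "205.188.154.7"]:
    print(client, "->", client_domain(client))
```

A rule like this lumps together some clients that have nothing to do with each other and splits up others that belong to one organization, which is the kind of imprecision mentioned above.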