Odd things in my log files



jpoc
February 13th, 2001, 17:40
I run a small, low-volume website. I try to keep an eye on my log files on a regular basis, mostly so that I can check for problems like 404s and get some idea of how folks use the site and where they come from.

On a few occasions, I have seen something in the logfile that I really just do not understand. I don't mean stuff like "what does that code mean?" but things like "why or how are my visitors doing that?"

For example, part of my site includes reviews of wine. Nothing really special or famous; I try to highlight wines that are good to drink despite being at the lower end of the price range.

A couple of months ago, for a few days, ten percent of my traffic was centred on one single file: a jpg scan of the label of a bottle of French red wine. For every access, the referer field was either blank or included the phrase "you will not know", and there were no matching reads of the html file that referred to the image.

Also, every domain that made the access was from a French-speaking country or region (France, Belgium, Switzerland, Quebec etc.).

Now I suppose that perhaps some subscriber to a French-language newsgroup or email list could have found a copy of the html file in the Google cache and then emailed it round the world to a bunch of friends who _all_ use a browser that blocks the referrer.

It just doesn't sound like a likely reason.
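For anyone wanting to check for the same pattern, here is a rough sketch of how the "image hits with no matching page reads" test could be done. It assumes Apache-style Common or Combined Log Format; the function name, the paths and the sample lines are all made up for illustration and are not from my real logs.

```python
import re

# Assumes Common/Combined Log Format: host ident user [time] "request" status ...
REQUEST = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (\S+)[^"]*" \d{3} ')

def image_only_clients(lines, image_path, page_path):
    """Return clients that requested image_path but never requested page_path."""
    fetched_image, fetched_page = set(), set()
    for line in lines:
        match = REQUEST.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        client, path = match.groups()
        if path == image_path:
            fetched_image.add(client)
        elif path == page_path:
            fetched_page.add(client)
    return fetched_image - fetched_page

# Invented sample lines using documentation addresses, not real visitors.
SAMPLE = [
    '192.0.2.10 - - [01/Dec/2000:10:00:00 +0000] "GET /wine/label.jpg HTTP/1.0" 200 40000 "-" "-"',
    '192.0.2.20 - - [01/Dec/2000:10:01:00 +0000] "GET /wine/review.html HTTP/1.0" 200 5000 "-" "-"',
    '192.0.2.20 - - [01/Dec/2000:10:01:01 +0000] "GET /wine/label.jpg HTTP/1.0" 200 40000 "-" "-"',
]

if __name__ == "__main__":
    print(image_only_clients(SAMPLE, "/wine/label.jpg", "/wine/review.html"))
```

A client that shows up in the result fetched the picture directly, which fits a link or an email pointing straight at the jpg rather than at the page. It does not say where the link came from, and a visitor whose browser already had the page cached would look the same.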

Another weird episode was a few days ago, when I received several hundred hits from an address in Korea. My log did not resolve the address into a name, but a lookup showed that the IP address was registered to a company in Seoul. The odd things about this were that, again, there was no referrer field in the log, and all of the accesses were limited to one area of the site. Two addresses were used: 95% of the hits came from x.y.z.91 and the rest from x.y.z.71. Every single hit from the '91' address led to a 404 despite the fact that the name of the requested html file was valid. Every hit from the '71' address was a success.
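A split like that is easy to spot once the hits are tallied by client address and status code. The following is only a minimal sketch, again assuming Apache-style Common or Combined Log Format; the function name and the sample lines (with documentation addresses standing in for x.y.z.91 and x.y.z.71) are invented for the example.

```python
import re
from collections import Counter, defaultdict

# Captures the client address and the three-digit status code.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) ')

def status_by_client(lines):
    """Return {client_address: Counter({status_code: hits})}."""
    tally = defaultdict(Counter)
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # lines that do not fit the assumed format are skipped
            client, status = match.groups()
            tally[client][status] += 1
    return tally

# Invented sample lines, not taken from any real log.
SAMPLE = [
    '192.0.2.91 - - [10/Feb/2001:03:12:01 +0000] "GET /wine/page.html HTTP/1.0" 404 210 "-" "-"',
    '192.0.2.91 - - [10/Feb/2001:03:12:02 +0000] "GET /wine/page.html HTTP/1.0" 404 210 "-" "-"',
    '192.0.2.71 - - [10/Feb/2001:03:12:05 +0000] "GET /wine/page.html HTTP/1.0" 200 5120 "-" "-"',
]

if __name__ == "__main__":
    for client, counts in sorted(status_by_client(SAMPLE).items()):
        print(client, dict(counts))
```

The tally only shows that the two addresses were treated differently. Comparing the raw request strings from each address would be the next step, since a 404 on a valid file name could come from something small like a trailing character or a difference in case.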

I'm not expecting an explanation (though one would be nice).

(Also, I don't really want to post my log files as there are privacy issues there.)

Has anyone else come across unexplained mysteries in the log files?

jpoc

polygon
February 14th, 2001, 02:29
Log files are full of mysteries and surprises, in my experience. Nowadays I get too many hits to worry about them much.

I once got hit by an (apparently) malfunctioning robot which hit my site many thousands of times in the course of a couple of days, going through the same list of pages over and over and over again.

I have noticed that some spiders will visit the site and collect hundreds of pages in a short period and then go away for weeks; others show up for at least a page or two almost every day.

My favorite discovery (back in 1996 when the server that hosted my sites would resolve most of the IP addresses into domains) was that individual machines at Yahoo were named for diseases. There was anthrax.yahoo.com, cancer.yahoo.com, alzheimers.yahoo.com, influenza.yahoo.com, scabies.yahoo.com, herpes.yahoo.com, etc., etc., hundreds of them. I ended up going through their list of IP addresses and found that only a handful of machines like print servers had non-disease names; all the rest were diseases.

Nowadays the referers from the http log are the main way I find pages with links to my site. However, I have pretty much given up trying to list every link on my reciprocal links page. Referers from search engines often tell what people were searching for, which can be pretty amusing.
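As a rough illustration of pulling the search phrase out of a search engine referer, here is a small sketch. It assumes the phrase travels in a query-string parameter; the parameter names in QUERY_KEYS are guesses that vary from engine to engine, and the function name and example URLs are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed parameter names; adjust this tuple for the engines in your own logs.
QUERY_KEYS = ("q", "query", "p")

def search_terms(referer):
    """Return the search phrase carried in a referer URL, or None if absent."""
    query = parse_qs(urlsplit(referer).query)
    for key in QUERY_KEYS:
        if key in query:
            return query[key][0]  # first value if the key is repeated
    return None

if __name__ == "__main__":
    print(search_terms("http://www.google.com/search?q=cheap+french+red+wine"))
    print(search_terms("-"))  # a blank referer is usually logged as "-"
```

Feeding every referer from the log through a function like this and counting the results gives a quick list of what people were looking for when they arrived.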

A very small number of people have apparently configured their browsers not to give a referer address, or to give a meaningless address like www.noreferer.org (is there a simple way to do this?). Some spiders leave a URL for information about the spider, which I think is pretty cool.

jpoc
February 14th, 2001, 05:29
Originally posted by polygon

My favorite discovery (back in 1996 when the server that hosted my sites would resolve most of the IP addresses into domains) was that individual machines at Yahoo were named for diseases.


I saw something similar and very funny a while back. A user was visiting a part of my site that normally encourages a large number of page views. He was behind a bank of proxies and those machines were all named after spices.

So, I saw cinnamon.xxx.com, thyme.xxx.com, basil.xxx.com followed by several others in the same vein and then all of a sudden:

posh.xxx.com and sporty.xxx.com.

It gave me a giggle.

jpoc

polygon
February 15th, 2001, 11:45
Another issue with logs is referrer URLs which are one (or more) generations out of date. In other words, a person visits page X, follows a link there to page Y, and then follows a link from page Y to your page, but the referer listed is still page X.