Subject: [FAQ] Gathering Traffic Data for Proposed Newsgroups
This article was archived around: 30 May 2006 04:19:53 GMT
Last-modified: 10 June 2001
Posting-Frequency: Monthly (on the 1st)
Maintainer: Rob Maxwell <firstname.lastname@example.org>
Disclaimer: Approval for *.answers is based on form, not content.
Gathering Traffic Data for Proposed Newsgroups
How to use Google Groups
The traditional expectation that a newsgroup justify its existence by virtue
of existing Usenet traffic goes back to the earliest days. It precedes the
birth of alt.*, the Great Renaming that brought forth the Big 7 (later the
more familiar Big 8 with the creation of the humanities.* hierarchy in 1995),
and even the rise and eventual fall of the backbone Cabal.
In the early 1980s, if discussion of a topic became significant enough, a new
newsgroup was created to centralize the discussion. With relatively few
corporate and university mainframes providing the Unix Users' Network
(Usenet) to a similarly small number of readers, it was fairly easy to see
when a topic was worthy of its own newsgroup. Today, over three gigabytes of
text-only discussion occur daily, and abuse of the alt.* newsgroup creation
process has left a significant number of alt.* newsgroups uncarried by any
given news server. Together these have made it effectively impossible to see
when a topic becomes popular enough to warrant a newsgroup of its own.
This is where Google Groups comes into the picture. Its story starts in 1995,
when Deja began archiving Usenet text postings. By 2000 the task had become
too overwhelming and expensive. Deja tried different things, but the efforts
proved futile and ended with the sale of its archive and name to the Internet
search engine company Google. After a rough start, Google was finally able to
combine Deja's massive archive with its own recent Usenet archiving efforts
under the name Google Groups.
The journey to justification begins at Google Groups' Advanced Group Search
<http://groups.google.com/advanced_group_search>. What you will be looking
for is how often the topic is discussed in English on Usenet. The customary
method is to search for the keyword or phrase over the last ninety days. The
recommended quantity of on-topic posts is ten (10) per day on average. For
the sake of this demonstration we will be trying to justify a newsgroup for
the ABC television show "20/20".
Start by typing 20/20 into "Find Messages with all of the words". Then change
the dropdown box from "10 messages" to "100 messages", change Language from
"any language" to "English", and under Message Dates select "Return messages
posted between" and change the start date from 29 Mar 1995 to the date three
months before today's date. A visual example is available at:
<http://www.alt-config.org/20-20a.gif>
This search for "20/20" on 27 May 2001 produced these results:
Relevant English Messages for 20/20 from 28 Feb 2001 to 27 May 2001
Results 1-100 of about 12,400. <http://www.alt-config.org/20-20b.gif>
That averages out to 137.78 posts per day which clearly meets the 10 per day
recommendation, or does it?
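The arithmetic behind that figure can be sketched in a few lines of Python.
The function and constant names are illustrative assumptions; the 90-day
window and the 10 per day recommendation are the ones given above. Note that
this only averages the raw hit count and says nothing about whether the hits
are on-topic.

```python
# Illustrative sketch: average daily traffic from a raw hit count.
RECOMMENDED_POSTS_PER_DAY = 10   # recommendation quoted in this FAQ
WINDOW_DAYS = 90                 # the customary ninety-day search window


def posts_per_day(total_hits, days=WINDOW_DAYS):
    """Return the average number of posts per day over the window."""
    return total_hits / days


average = posts_per_day(12400)
print(f"{average:.2f} posts per day")         # 137.78 posts per day
print(average >= RECOMMENDED_POSTS_PER_DAY)   # True
```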
Refining the search results
Taking a closer look at the 20/20 example shows that the first on-topic
mention of the show is the 14th search result. <http://www.alt-config.org/20-
This is an extreme example, badly contaminated by "%20", which is the way a
space is represented in a URL, where literal spaces are not allowed. It often
appears in search result URLs, one of which is seen in the third search
result for 20/20.
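To see where "%20" comes from, the following sketch uses Python's standard
urllib.parse module to encode a file name containing spaces. The file names
are made up for illustration. How the search engine splits "%20" into
searchable words is not documented here, so the last line only shows that
the encoded text contains the digits "20".

```python
from urllib.parse import quote, unquote

# A space is not allowed in a URL, so it is percent-encoded as "%20".
encoded = quote("20 20 episode guide.html")   # hypothetical file name
print(encoded)            # 20%2020%20episode%20guide.html
print(unquote(encoded))   # 20 20 episode guide.html

# Any URL with an encoded space carries the digits "20", which is why
# posts quoting such URLs can pollute a keyword search for 20/20.
print("20" in quote("my page.html"))   # True
```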
Repeating the search for 20/20 and adding "abc", the network it is on,
produces radically different results:
Relevant English Messages for 20/20 abc from 28 Feb 2001 to 27 May 2001
Results 1-100 of about 374
Three hundred seventy-four averages out to a mere 4.16 posts per day, which
is less than half of the desired rate. <http://www.alt-config.org/20-
This is why your initial search results must be checked carefully before
attempting to use them. First off, there is a known glitch in the software
Google acquired from Deja, which usually gives a poor (sometimes comically
poor) estimate of "about" how many results were found. A blatant example of
this was a search for "infertility insurance":
Relevant English Messages for "infertility insurance" from 18 Feb 2001 to 18
May 2001 Results 1 - 4 of about 6. <http://www.alt-config.org/20-20e.gif>
The quick way to see the actual total, or at least enough of it to tell
whether there is justification (which would be 900 on-topic messages over 90
days), is to scroll down to the bottom of the page (or press the [End] key)
and double-click the 9 under Goooooooooogle, which will take you to the 901st
message if there is one. [Note: This is why "100 messages" is selected instead
of the default "10 messages".] The glitch is meaningless if the top line is:
Relevant English Messages for "_______" from 28 Feb 2001 to 27 May 2001
Results 901-1000 of about #,###.
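The reason for selecting "100 messages" can be checked with a little
arithmetic. This sketch assumes results are numbered from 1 and that every
page is full; the function name is illustrative, and it does not model which
link label under the Google logo leads to which page.

```python
REQUIRED_MESSAGES = 900   # 10 per day over 90 days


def first_result_on_page(page, per_page):
    """Return the 1-based number of the first result on a results page."""
    return (page - 1) * per_page + 1


# With 100 messages per page, the 901st message opens the tenth page.
print(first_result_on_page(10, per_page=100))   # 901

# With the default 10 per page, the same message opens page 91.
print(first_result_on_page(91, per_page=10))    # 901
```

If a page starting at result 901 can be displayed at all, more than 900
messages matched, whatever the "about" estimate claims.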
Things to avoid
Most of the things that can falsely inflate results show up on the last
pages. A weekly Frequently Asked Questions (FAQ) posting on the topic, or one
containing a reference to it, will produce 12-14 identical results with only
one being valid. Far worse than this is when the subject ends up in someone's
signature: if they post a few messages per day, they can create a few hundred
false hits in the 90-day period. A sig hit requires a search in the same time
frame for the author, to determine the total number of hits the sig has
caused, and then finding out how many posts the author actually made on the
subject being searched.
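The corrections described above amount to simple bookkeeping, sketched here
in Python. The function name, its parameters, and the sample numbers are all
hypothetical and chosen only to illustrate the adjustment; they are not
measurements from any real search.

```python
def adjusted_hits(raw_hits, faq_copies=0, sig_hits=0, sig_author_on_topic=0):
    """Estimate on-topic hits after discounting known false positives.

    faq_copies: identical FAQ postings found (only one counts as valid).
    sig_hits: hits caused by one author's signature.
    sig_author_on_topic: that author's posts that really discuss the topic.
    """
    faq_excess = max(faq_copies - 1, 0)
    return raw_hits - faq_excess - sig_hits + sig_author_on_topic


# Hypothetical figures: 374 raw hits, 13 copies of a weekly FAQ, and a
# signature responsible for 200 hits, of which 15 posts were on-topic.
valid = adjusted_hits(374, faq_copies=13, sig_hits=200, sig_author_on_topic=15)
print(valid)                   # 177
print(round(valid / 90, 2))    # 1.97
```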