
Subject: FAQs: A Suggested Minimal Digest Format

This article was archived around: 29 Oct 2002 06:00:02 GMT

All FAQs posted in: news.admin.misc, news.software.readers

Archive-name: faqs/minimal-digest-format
Posting-frequency: every 20 days
Last-modified: Wed Jan 25 23:54:34 EST 1995

              FAQs: A Suggested Minimal Digest Format
                            Chris Lewis
                    clewis@ferret.ocunix.on.ca

The latest edition of this FAQ can always be retrieved from:

    ftp://rtfm.mit.edu/pub/usenet/news.answers/faqs/minimal-digest-format

Changes: URLs are now documented in RFC1630.

------------------------------

Subject: 1. Introduction and Intent

The intent of this FAQ is to provide current and future FAQ
maintainers with a simple description of a minimal format for FAQs.
This minimal format is a simplification of RFC1153 digest format that
is sufficient to be compatible with common newsreader digest handling
functionality, current practise, and Thomas Fine's "FAQ digest format
to HTML" converter, which allows more sophisticated viewing on
HTML-aware systems such as Mosaic or WWW.

There are other more sophisticated formats that you can use, but this
is the simplest one that is compatible with a wide range of software
that understands digest format.

This format is entirely optional, but it is designed to give you the
biggest "bang per buck" in terms of existing software compatibility
and minimum effort.  If you believe that your FAQ can benefit from
more sophisticated formats, by all means use them.

As such, this FAQ can be considered simply a guide on how to take
advantage of some basic digest capabilities in end-user viewing
software.

Rather than confuse the issue by documenting all of the variation
allowed by existing practise and software, this FAQ documents a single
variant.  However, it can be extended by reviewing the documentation
for Thomas Fine's FAQ to HTML converter:

<http://www.cis.ohio-state.edu/hypertext/faq/usenet/faq-format/top.html>

This FAQ is written entirely in the minimal digest format, and can be
used as an example.  You can skip from one section to the next by
pressing ^G in many newsreaders, such as rn, trn and strn.

This FAQ describes only how FAQ sections should be delimited, and
offers a couple of suggestions for meta-references to such things as
FTP or WWW repositories in formats that other tools support.

Note to reader software implementors: you should not take this format
as gospel; instead, use it as a guide to one minimal format among many
more sophisticated ones.  You should really be reading RFC1153 and
Thomas Fine's material, and consulting news.answers to see how FAQs
are formatted in real life.

See "Newsreader/Converter Specifics" for descriptions of how some
newsreaders work with digest-like documents.

------------------------------

Subject: 2. Table of Contents

    1. Introduction and Intent
    2. Table of Contents
    3. What Should the Overall FAQ Look Like?
    4. What's a Section, and How is it Formatted?
    5. What is the Table of Contents Format?
    6. What are External Meta References, and What is Their Format?
    7. Where Do I need to Look for Other Information?
    8. Newsreader/Converter Specifics

------------------------------

Subject: 3. What Should the Overall FAQ Look Like?

Most FAQs lend themselves to a format like:

    <news headers>

    <news.answers required headers if the FAQ is registered>

    <title and author>

    <section>
    <section>
    <section>
    <section>

While FAQs aren't always lists of questions and answers, they usually
have "sections" of text -- whether they be sets of lists, individual
Q&A's, groups of Q&A, textual sections, whatever.  The digest format
is all about how these sections should be delimited for automatic
parsing.

Note that this FAQ doesn't attempt to explain the news headers and
news.answers subheaders.  For this, you should really consult the
FAQs on how to create news.answers postings.

It's worth noting a few things here.  You should use
Expires/Supersedes to manage the deletion of previous copies of your
FAQ.  It is also a very good idea to use References: lines to link the
parts of multi-part FAQs together so that they remain together in
Usenet newsreaders.

------------------------------

Subject: 4. What's a Section, and How is it Formatted?

A "section" is merely a block of text.  In many FAQs the sections are
simply the introduction paragraph, the table of contents, and each
question and answer.  Through the use of digest format, most
newsreaders can skip from section to section using the convention
presented here, and more sophisticated packages can hypertext them.

A "section" consists of:

    <blank line>
    <string of 30 hyphens>
    <blank line>
    Subject: <subject line>
    <additional optional RFC822-like headers>
    <blank line>
    <text>

Note that the string of hyphens and "Subject:" must start in column
one.  "Subject:" has one space or tab between it and the subject
line.  If you have to put "Subject:" in and don't want it interpreted
as a section header, just make sure that it isn't in column one (just
like above).

If your subject line is too wide to fit in 80 columns, you can
continue it onto the next lines, with whitespace at the beginning of
the following lines.  Example:

    Subject: this is a long........
        subject line

The subject can be any arbitrary string of text.  You may wish to use
a numbering scheme, for it makes it easier for your readers to "grep"
down to the precise section they want.

You can place additional RFC822-like headers after the Subject, such
as "From:", "Date:" etc.  Again, these headers should start in column
one.  There should be no blank lines in the entire set of headers in a
section.

The text is free format ASCII and may be formatted any way you wish.

Current FAQ maintainers take note: if you're already using a
consistent format for your FAQ, converting to this format will often
require only one or two global edit commands.

------------------------------

Subject: 5. What is the Table of Contents Format?

The Table of Contents simply consists of the subject lines from the
rest of the FAQ, excluding "Subject:", and preferably indented.  The
subject lines should be exact copies of the section headers.

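As a rough illustration of the exact-copy rule above, the following
Python sketch collects the subject lines from a FAQ in this format and
prints them as an indented Table of Contents.  The function names
(extract_subjects, format_toc), the four-space indent, and the sample
text are assumptions made for this example, not part of any existing
tool.  Continuation lines are folded into a single subject.

```python
import re

# A section header starts in column one: "Subject:" followed by one
# space or tab, then the subject text.
SUBJECT_RE = re.compile(r"^Subject:[ \t](.*)$")


def extract_subjects(lines):
    """Return the subject of every section header, in document order."""
    subjects = []
    in_header = False  # True while continuation lines may still follow
    for line in lines:
        match = SUBJECT_RE.match(line)
        if match:
            subjects.append(match.group(1).strip())
            in_header = True
        elif in_header and line[:1] in (" ", "\t") and line.strip():
            # Continuation line: leading whitespace, folded into the subject.
            subjects[-1] += " " + line.strip()
        else:
            in_header = False
    return subjects


def format_toc(subjects, indent="    "):
    """Indent each subject; the subject text itself is copied unchanged."""
    return [indent + subject for subject in subjects]


if __name__ == "__main__":
    # Hypothetical sample, built from a list so that no source line of
    # this example has "Subject:" or a run of hyphens in column one.
    sample = [
        "",
        "-" * 30,
        "",
        "Subject: 1. Introduction",
        "",
        "Some text.",
        "",
        "-" * 30,
        "",
        "Subject: 2. A rather long",
        "    subject line",
        "",
        "More text.",
    ]
    for entry in format_toc(extract_subjects(sample)):
        print(entry)
```

Because the entries are copied from the headers rather than retyped,
a reader's search for a Table of Contents entry will land on the
matching section header.
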
The Table of Contents format is only a suggestion.  There is no
existing software that parses this data.  The intent of using exactly
the same strings as the subjects is so that users can use search
mechanisms to find specific sections.  If the subject line is too
long to fit in a table of contents line, it is suggested that you
truncate it at a convenient point - the search will still work.

------------------------------

Subject: 6. What are External Meta References, and What is Their Format?

Many of the more sophisticated viewers can "jump" from one FAQ to the
next, retrieve data via FTP, or send email simply by "pointing at"
properly formatted "tags" in your FAQ.

This FAQ recommends "URL" ("Uniform Resource Locator") format tags.
See Section 7 for a reference.

If your FAQ refers to an FTP-able file, use this format:

    ftp://<inet>/<str>/<str>

where "<inet>" is the Internet domain name of the server, and the
rest of the "<str>/<str>" is the file name.  If you want to refer to a
directory, leave a trailing "/".  This string can be anywhere in the
document, inline with text or whatever.

Similarly, for HTML (hypertext markup language)-compatible documents,
use:

    http://<inet>/<str>/<str>

For clarity, it's best to surround the URL with angle brackets to make
it easier to parse.  This FAQ uses this convention, i.e.:

    <ftp://ftp.uunet.ca/distrib/chris_lewis/hp2pbm/>

One difficulty with URLs is that they're often quite long.  Do not
break them in the middle, or they won't work.  If the URL is too long
to fit on the current line, it is suggested that you start a new line
with the URL.  Even if it does look rather ugly, it's better than not
working, or wrapping beyond the 80th column.

------------------------------

Subject: 7. Where Do I need to Look for Other Information?

[These seemed relevant, but I need descriptions!]

<http://www.cis.ohio-state.edu/hypertext/usenet/faq-format/www/faq.html>
<http://www.cis.ohio-state.edu/hypertext/faq/usenet/faq-format/top.html>
<http://www.cis.ohio-state.edu/hypertext/faq/usenet/technical-notes/faq.html>

John E. Goodwin's <JEGOODWIN@delphi.com> "Elements of E-Text Style".

Note that the specification of URLs is now to be found in RFC1630:

    "Universal Resource Identifiers in WWW" [Jun 94]
    by Tim Berners-Lee <timbl@info.cern.ch>
    URL <ftp://ds.internic.net/rfc/rfc1630.txt>
        <ftp://ftp.isi.edu/in-notes/rfc1630.txt>
        <http://info.cern.ch/hypertext/WWW/Addressing/URL/URL_Overview.html>

------------------------------

Subject: 8. Newsreader/Converter Specifics

Rn, trn, and strn "^G" functionality skips to the next occurrence of
"Subject:" in column one.

GNUs has two "digest" parsers.  One insists on full RFC1153 compliance
(main Subject: line "digest" tokens etc.), and the other skips to
lines with (at least 8?) hyphens starting in column 1.

Tin has no digest functionality at present, though tin's author
indicates willingness to add it in a way compatible with this format.
This author suggests either the "^Subject:" or "^-*" approach.

Nn triggers on Subject: plus From:, which is often not applicable to
FAQs.  Nn "explodes" FAQs with both Subject: and From: subheaders into
individual articles.  Most nn users this author has discussed this
with do not want FAQs to behave this way, which is why this format
doesn't require "From:" lines.

Thomas Fine's FAQ to HTML conversion system uses a scoring system to
measure compliance with the:

    <blank line>
    <line of hyphens>
    <blank line>
    Subject: <subject>

format.  See the following for more detail:

<http://www.cis.ohio-state.edu/hypertext/faq/usenet/faq-format/top.html>

I would appreciate detail on digest/FAQ parsing in other newsreaders
and conversion systems.
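
For implementors who want something concrete to start from, here is a
minimal Python sketch of a section splitter that combines the hyphen
and "^Subject:" approaches: a line of hyphens in column one, then a
blank line, then a "Subject:" header starts a new section.  The names
split_sections and Section, the minimum of 8 hyphens (borrowed from
the tentative figure quoted above), and the sample text are
assumptions for illustration only.  This is not how any of the
newsreaders or converters named above is actually implemented, and it
does not check for the blank line before the hyphens.

```python
import re
from typing import Dict, List, NamedTuple

# Assumption: at least 8 hyphens in column one count as a separator.
HYPHENS_RE = re.compile(r"^-{8,}\s*$")
# An RFC822-like header in column one, e.g. "From: someone".
HEADER_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):[ \t]*(.*)$")


class Section(NamedTuple):
    subject: str
    headers: Dict[str, str]  # optional headers other than Subject
    text: str


def is_section_start(lines: List[str], i: int) -> bool:
    """True if line i is a hyphen line, then a blank line, then Subject:."""
    return (
        i + 2 < len(lines)
        and HYPHENS_RE.match(lines[i]) is not None
        and lines[i + 1].strip() == ""
        and lines[i + 2].startswith("Subject:")
    )


def split_sections(lines: List[str]) -> List[Section]:
    """Split a digest-format FAQ into sections; the preamble is skipped."""
    sections = []
    i = 0
    while i < len(lines):
        if not is_section_start(lines, i):
            i += 1
            continue
        i += 2  # now on the "Subject:" line
        headers: Dict[str, str] = {}
        last = ""
        # The header block runs up to the first blank line.
        while i < len(lines) and lines[i].strip():
            match = HEADER_RE.match(lines[i])
            if lines[i][0] in " \t" and last:
                headers[last] += " " + lines[i].strip()  # continuation
            elif match:
                last = match.group(1)
                headers[last] = match.group(2).strip()
            i += 1
        # The text runs up to the next section start (or end of input).
        body = []
        while i < len(lines) and not is_section_start(lines, i):
            body.append(lines[i])
            i += 1
        text = "\n".join(body).strip("\n")
        sections.append(Section(headers.pop("Subject", ""), headers, text))
    return sections


if __name__ == "__main__":
    # Hypothetical sample, built from a list so that this example has no
    # "Subject:" or hyphen line in column one of its own.
    sample = [
        "Title and author",
        "",
        "-" * 30,
        "",
        "Subject: 1. First",
        "",
        "Text of the first section.",
        "",
        "-" * 30,
        "",
        "Subject: 2. Second",
        "From: someone@example.invalid",
        "",
        "Text of the second section.",
    ]
    for section in split_sections(sample):
        print(section.subject, "|", section.text)
```

Requiring the full hyphens, blank line, "Subject:" sequence means an
indented "Subject:" inside the text is never mistaken for a header.
A real reader may well want to be more lenient than this.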