
Subject: Email Addressing FAQ (How to use user+box@host addresses)

This article was archived around: 4 Dec 1998 23:43:02 GMT

All FAQs posted in: comp.mail.misc, news.admin.net-abuse.email, news.software.readers
Source: Usenet Version


Archive-name: mail/addressing
Last-modified: (2 Jun 98 14:32:39)
URL: http://www.qz.to/~eli/faqs/addressing.html
Reason-for-last-modification: exim, qmail correction, trn4 update
Reason-for-previous-modification: MMDF updated, Pine updated
If you can add information PLEASE DO. This is Unix centric because I have answers for Unix, not because I am trying to shun other platforms.

Questions covered
__________

   1. What?
   2. Why?
   3. Things to consider
   4. With qmail, how?
   5. With sendmail, how?
   6. With MMDF, how?
   7. With exim, how?
   8. With other MTAs, how?
   9. With mutt, how?
  10. With elm, how?
  11. Using them in trn, how?
  12. Using them in slrn, how?
  13. Using them in tin, how?
  14. Using them in nn, how?
  15. Using them in Gnus, how?
  16. Using them in knews, how?
  17. Using them in MH, how?
  18. Using them in pine, how?
  19. Using them elsewhere, how?
  20. Any other addressing suggestions?

   * Appendix
     1. General format of a From: header
     2. Filtering
     3. A word on address types
     4. Comprehensive plan for a new account
     5. Verifying Email Addresses

A good many of the examples in this FAQ are for a generic user, Alice, who receives mail at bobs-home.com. Her "login" at that host is "alice" and she uses boxes called "mail" and "news" in the localpart of her address. (The word "localpart" comes from RFC 822, which defines the now standard internet email address. It refers to the portion of the email address before the "@".) Bobs-home.com uses sendmail, and the examples reflect that. I find this a slightly more interesting way of giving examples than the crude "localpart@fully.qualified.domain.name" and "localpart@hostname.tld" that earlier versions of this FAQ used.

Questions and answers:
__________
1. What?

Mail transfer agents (MTAs) usually deliver mail to a single location for each user on the system. There are ways of having mail sent to more than one email address delivered to the same user. The simplest conceptually is a mail alias. Aliases affect the whole machine and generally can only be set up by administrators. This is not convenient for anyone involved. So some MTAs have special ways of creating user level aliases.
To keep these from interfering with other aspects of mail on the machine, these addresses are the username with a piece of punctuation and a boxname appended.

___________________
2. Why?

Traditionally it has been useful for special sorting of mail, e.g. casanova+anna@love.org, casanova+beth@love.org, casanova+cathy@love.org, etc., so that each of Anna, Beth, and Cathy sends mail to a particular address and it gets treated specially. These days it is becoming popular as a way of providing special filtering (such as with procmail) on email addresses exposed to insecure channels such as Usenet.

Nancy McGough's Filtering Mail FAQ, available at the URLs below, is a good start to finding out how to filter once you have the use of these addresses. See also the appendix of this FAQ.

FAQ "launchers" for the Filtering Mail FAQ:
   o http://www.ii.com/internet/faqs/launchers/mail/filtering-faq/
   o http://www.best.com/~ii/faqs/archive/mail/filtering-faq/

FTP sites for the Filtering Mail FAQ:
   o ftp://rtfm.mit.edu/pub/usenet/news.answers/mail/filtering-faq (and at the rtfm mirror sites)
   o ftp://ftp.netusa.net/users/eli/mail.filtering-faq.txt (not strongly recommended)

___________________
3. Things to consider

For these to be useful you really must have several things working together. Some of these things require system administrator privileges to implement if they are not working on your system already; some do not. Some programs are as easy to configure as adding a line by hand to the headers shown; others require patching the code and recompiling. Hopefully the patches will become part of the official distributions, when needed, to prevent too much duplication of effort.

Of primary importance, your system will need a mail transfer agent (MTA) that can understand these addresses when there is an attempt to give them to the system. Installing the MTA is definitely a task for a system administrator. In some cases, configuring the MTA will be as well.
As a secondary, but not much less important, concern you will need to be able to configure the various user-level mail handling programs to use these aliases. There has not been a lot of interest in these in the past, so very few programs make it easy to do. The interest in filtering spam sent to addresses culled from Usenet has made configuring news posters my primary concern for this. I would like to add Q&As for mail user agents (MUAs), those programs that generally act as a user interface to mail, but I do not have a strong familiarity with any myself.

As a last concern, you will want to know how to filter and sort mail for maximum effectiveness. The appendix currently has that information. Stephen R. van den Berg's procmail utility, available in source form at <URL:ftp://ftp.informatik.rwth-aachen.de/pub/packages/procmail/>, is probably the best tool around. I found it difficult to learn without lots of well-documented examples, so I have tried to provide examples here.

___________________
4. With qmail, how?

Qmail is an MTA. Dave Sill <de5@sws5.ctd.ornl.gov> told me that with qmail all Alice has to do is "touch ~/.qmail-news" to indicate that she wants to receive mail at <alice-news@bobs-home.com>. Or she can put the name of a mailbox in the file and mail will be delivered there. If she doesn't have the .qmail-news file, mail will normally be bounced. She can also use a .qmail-default file to catch any other case.

If her .qmail is a single line consisting of just "#", all her mail will silently be dropped. This might be a useful way to force people to only send mail to her sub-boxes. She should be sure to have files specified for her other mail, however. Since silently discarding mail is the same as getting it and not responding (to an outside observer), arranging to have the mail bounce might be more useful. She could put a single line consisting of "| exit 100" there, which will send a bland bounce message in the typical qmail fashion.
This is a user level configuration; it creates boxes of the form <alias-boxname@bobs-home.com>.

(If your site already uses '-' in usernames, you may wish to change the delimiter to another character. This is done at compile time. In version 1.00, change the value of USEREXT_BREAK in conf-unusual.h. In version 1.01, change the file "conf-break". Without this a user called "me-too" would mask some of the suffix space for a user called "me".)

___________________
5. With sendmail, how?

Sendmail is an MTA. Phil Edwards <pedwards@cs.wright.edu> talked about how to do it with sendmail. Quoting him, because it doesn't paraphrase well, "The usual setup is in ruleset 5, consisting of a couple lines like

     R$+ + *         $#local $@ $&h $: $1
     R$+ + $*        $#local $@ $2 $: $1 + *

with another one or two later on in the same ruleset. (I'm doing this from memory; it'll probably differ from your .cf.) From there, it's up to your local delivery program." He suggests using procmail as the local delivery program; refer to the procmail installation instructions for (a little) more information.

This is an admin level configuration; it creates boxes of the form <alias+boxname@bobs-home.com>. Once sendmail is configured, usage of the mailboxes is user controlled. Recent releases of sendmail come with this working already.

(You can try to send yourself mail at login+boxname to see if this is already set up on your system. If not, you'll need to contact your system administrator.)

___________________
6. With MMDF, how?

MMDF is an MTA. Jerry Sweet, maintainer of the MMDF FAQ, has provided this detailed answer.

MMDF has long had an addressing feature of the form "mailbox=string", which causes a message so addressed to be delivered to "mailbox". The whole address, including the "=string" part, appears in the $(address) variable for use in your .maildelivery file.
An application note on how to use MMDF's "=" addressing feature to pre-sort incoming mail is available at this URL:

     http://www.irvine.com/~mmdf/auto-sort/index.html

Note that MMDF doesn't preserve the "=string" information for use following delivery, so all handling of the $(address) information must take place at delivery time, using the .maildelivery file. So, in order to take advantage of MMDF's "=" addressing feature, you must use local delivery as opposed to POP delivery. (In other words, you need a login account on the same host that runs MMDF.)

For detailed information about using the "=" feature in one's $HOME/.maildelivery file, refer to the MMDF maildelivery(5) on-line man page.

This is a user level configuration; it creates boxes of the form <alias=boxname@bobs-home.com>.

___________________
7. With exim, how?

Exim is an MTA. Dom Mitchell <hdm@demon.net> provides this method.

First, you must find the directors section of the exim configuration file. In it, you will find a director like this (it's from the default configuration file, so most people should have it):

     userforward:
       no_verify,
       driver = forwardfile;
       check_ancestor,
       file = .forward,
     # filter

You must then change it to look like this (look in the exim manual for details):

     userforward:
       no_verify,
       suffix = "-*",
       suffix_optional,
       driver = forwardfile;
       check_ancestor,
       file = .forward,
       filter

This will enable username-extension for any value of extension. But mail will still get dropped into your default mailbox. To do more with this, you can use exim's built-in filtering. You must create a ~/.forward file with the following first line:

     # Exim filter

And you can then use rules like the following to save mail into different mailboxes:

     if $local_part_suffix is "-foo" then
       save $home/Mail/foo-folder
     elif $local_part_suffix is "-bar" then
       pipe "/usr/local/mh/lib/rcvstore +bar"
     endif

___________________
8. With other MTAs, how?

Nobody responded with solutions for smail or other MTAs.
If you are using the Andrew Messaging System (AMS), which you probably are if you have any other Andrew stuff on your system, then you have all of this already. In fact AMS introduced the concept. Read the documentation on FLAMES to find out how to use this. ___________________ 9. With mutt, how? Mutt is a MUA. To change the from header, Alice in her .muttrc defines additional lines with the command "my_hdr": my_hdr From: alice+mail@bobs-home.com (Alice) To just change the Reply-to: header, she can use: my_hdr Reply-To: alice+mail@bobs-home.com (Alice) ___________________ 10. With elm, how? Elm is a MUA. Elm stores extra headers to be used in mail in a .elm/elmheaders file in her home directory. She just needs to add a From: and/or Reply-To: header in that file and it will be used automatically. The headers should be properly formatted for mail. The appendix has sample formats acceptable for a from line. ___________________ 11. Using them in trn, how? Trn is a newsreader. Trn uses two environment variables, NEWSHEADER and MAILHEADER, to control the headers that appear in posts and replies respectively. For the default values Alice can read the man page for her copy of trn. In the comprehensive plan for a new account there is a sample configuration for trn. Suffice to say that she can add a from header herself with those variables, or she can add one by hand every time she sends mail or news. (I use a combination approach myself.) Note that newer versions of trn can be controlled by a user level configuration file to do this stuff, so she doesn't need to clutter the environment. The documentation should cover this. At compile time it is possible to make the inews distributed with trn4 "lax". Just define "LAX_INEWS" in config.h and recompile. When the inews is "lax" it will check $USER and check $LOGNAME (in that order) first for username. ___________________ 12. Using them in slrn, how? Slrn is a newsreader. 
Alice can add to her .slrnrc: username "alice-news" With newer versions of slrn, she can set them per-group with the article_mode_hook in the .slrn.sl file. ___________________ 13. Using them in tin, how? Tin is a newsreader. Some versions of tin use the USER environment variable to generate the local part of the email address. Here are two aliases resetting it just while using tin: csh or tcsh alias tin "env USER=alice+news tin" ksh, bash, or zsh alias tin="env USER=alice+news tin" If you already have a tin alias, it will not be used. Both aliases will still allow you to give arguments to tin normally. Apparently there are some versions of the "unofficial" tin floating about that would allow her to set a "forged" address from the menu of configuration options. These are probably not too commonly installed. ___________________ 14. Using them in nn, how? Nn is a newsreader. As in trn, Alice can add the from line by hand while posting or configure nn to add them all the time. To get the latter, she would add this to her .nn/init file: set news-header From: Alice <alice+news@bobs-home.com> She should be careful using semicolons (;) in it, unless escaped they separate lines. The appendix has sample formats acceptable for a from line. There is a mail-header setting for replies which works the same way. In an over-zealous interpretation of RFC 1036, nn will add a sender header to her posts with her un-boxed address. I don't know any way around this. Using my inews may work, but I have not tried. ___________________ 15. Using them in Gnus, how? Gnus is a newsreader and MUA. To change them in news she adds to her .gnus: (setq message-default-news-headers "From: Alice <alice+news@bobs-home.com>\n") For news and mail she can change message-default-headers instead. The appendix describes the consideration for a From: header. 
If she also uses Gnus to read and filter her mail, she can add an entry like the following to nnmail-split-fancy:

     ("to" "alice\\+news"
      (| ("subject" "re:.*" "misc")
         ("references" ".*@.*" "misc")
         "spam"))

This says that all mail to this address is suspect, but if it has a Subject that starts with a Re: or has a References header, it's probably ok, and will be put in the "misc" group. All the rest goes to the "spam" group. This combines filters I use with those used by Tim Pierce. This will sort virtually everything into the right group. She still must check the "spam" group from time to time to check for legitimate mail, though. See the Info documentation for nnmail-split-fancy for details and variations.

Mark T. Gray offers this further advice: Also gnus rather unfortunately will insert a Sender: line if it finds that the From: line has a different username than it thinks it should. To correct this put the following function redefinition in your .gnus file:

     (defun message-make-sender ()
       "Return the \"real\" user address.
     This replaces the function in message.el which tries to ignore all
     user modifications, and give as trustworthy answer as possible."
       (cadr (mail-extract-address-components
              (message-fetch-field "from"))))

It basically will return the same name as she put in her From: field above, and gnus is then happy to leave out the Sender: line.

Era Eriksson points out that this behavior is a configurable option of Gnus:

     (nconc message-syntax-checks '((sender . disabled)) )

This will preserve any existing overrides for message-syntax-checks by merely adding (sender . disabled) to the end of any preset values. If the variable is already being set, e.g. in your site-wide initialization files, you should still be able to use this without clobbering anything.
Another very useful function to put in her .gnus is the following hook, which will query the user for the alias to add on the username whenever she sends a message:

     (defun my-address-choice ()
       "This function chooses which alias-suffix to use"
       (interactive)
       (let ((alias-suffix (read-string "Which alias-suffix: "))
             (alias))
         (setq alias (concat user-login-name "+" alias-suffix "@bobs-home.com")
               mail-default-reply-to alias
               user-mail-address alias)
         (message "%s" alias)))
     (add-hook 'message-header-setup-hook 'my-address-choice)

One person I have corresponded with is uncertain that this will work and believes it to be of dubious value anyway. Caveat emptor.

All of this question was written with Gnus 5.x in mind. I don't know how much will apply to earlier versions.

___________________
16. Using them in knews, how?

Knews is a newsreader. David Kennedy says that to use these, Alice needs to put a line in the Knews file like:

     Knews.mailName: alice+news

He also offers the warning "knews = executable, Knews = config file." David was using version 0.9.8 when he offered the advice.

___________________
17. Using them in MH, how?

Answer submitted by Philip Guenther. I had thought that MH was an MTA/MUA, but apparently it has a news section or I am confused.

If you already have a components file for comp(1), then you just need to add a Reply-To: or From: line to it. Be aware that if you add a From: line then a Sender: line with your plain address will be added by post(8), so your plain address will appear somewhere in the message.

If you don't already have a components file then create one in your MH mail directory (probably $HOME/Mail) that contains:

     To:
     Cc:
     Subject:
     --------

then follow the previous directions.
In order for replies and forwards to have the additional header you'll need to create or edit replcomp and forwcomp files, and then tell repl and forw to use them by editing your .mh_profile to include lines like:

     repl: -form replcomp
     forw: -form forwcomp

For the default entries for those files check the repl and forw manpages.

___________________
18. Using them in pine, how?

Pine is a MUA and newsreader. Pine tries to protect users from details and folly/ignorance, so it is not easy. It requires editing the source code and recompiling. If you feel up to it, here's how to do it.

Edit the file pine/osdep/os-xxx.h of the source, where xxx is the 3-letter abbreviation for your platform. Uncomment the section which applies to "ALLOW_CHANGING_FROM". Compile. Run it. Go to the pine configuration control/menu and add an entry in "customized-hdrs" for your new From: header. The appendix has sample formats acceptable for a from line.

Nancy McGough has a good web page on this that goes into more detail. See it at either <URL:http://www.ii.com/internet/messaging/pine/changing_from/> or at <URL:http://www.best.com/~ii/internet/messaging/pine/changing_from/>.

___________________
19. Using them elsewhere, how?

Well, if Alice uses sendmail directly to send mail (as I often do) it is pretty obvious how. Otherwise, I am not sure. Netscape and other programs which ask her to provide her own address will doubtlessly accept these addresses readily.

If she uses a program which has inews inject news into an NNTP server, she might find the mini inews that I have hacked up useful. It works like inews with programs that use it in one of these forms:

     + inews -h < file   (new inews)
     + inews < file      (old inews)
     + inews -h file     (new inews)
     + inews file        (old inews)

Anything that counts on command line options working is going to be disappointed. Older versions of lynx and all versions of rn, trn and nn that I know use inews in this fashion.
This mini inews is from nn by way of lynx and has a number of hacks and bug fixes by me. Of primary concern is that it is now influenced by a NEWSMAILBOX environment variable when generating the From: header. If you would like to use it, the source is kept at ftp://ftp.netusa.net/users/eli/mini-inews.tgz in a tar/gzip file.

Alice would use one of these two setups with this:

  A. In her .login or her shell rc file (.cshrc or .tcshrc) for csh or tcsh she would have:

     setenv NEWSMAILBOX +news

  B. For sh/ksh/bash/zsh she would have this in her .profile:

     NEWSMAILBOX=+news
     export NEWSMAILBOX

___________________
20. Any other addressing suggestions?

RFC822 is a long and complex document for those without strong computer science backgrounds. Parts of it will be understandable to most who read it, parts will not. Among other things it specifies how email addresses can be formatted. News and SMTP mail use RFC822 addresses. If she changes her address to one that is equivalent but written differently, she may be able to use procmail or another filtering tool to process them.

In particular the case sensitivity (upper vs. lower) of the localpart (that which comes before the @ reading left to right) is at the option of the destination machine. So if her MTA is (most are) case insensitive for those addresses, she can change the case of her name and all other mailers should preserve that capitalization change. Humans typing the name tend not to preserve it.

Many mailers will preserve any "comments" included in an email address. RFC822 defines a comment:

     comment     =  "(" *(ctext / quoted-pair / comment) ")"
     ctext       =  <any CHAR excluding "(", ")", "\" & CR,
                     & including linear-white-space>
     quoted-pair =  "\" CHAR                     ; may quote any char
     linear-white-space =  1*([CRLF] LWSP-char)  ; semantics = SPACE
                                                 ; CRLF => folding

     CHAR   7-bit character
     LWSP   linear whitespace: TAB and SPACE
     CR     carriage return, an ASCII control character
     LF     line feed, an ASCII control character

This is not very legible to the typical human.
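As a rough illustration of the comment grammar, here is a small Python sketch that checks whether a string is exactly one well-formed comment. It is only my reading of the grammar, not code from any mailer: the function name is_rfc822_comment is made up for this example, and it deliberately ignores header folding (any bare CR is rejected) and the 7-bit restriction on CHAR.

```python
def is_rfc822_comment(text: str) -> bool:
    """Return True if text is exactly one well-formed RFC 822 comment.

    Simplified sketch: tracks parenthesis nesting and backslash
    quoted-pairs only. Folding (CRLF + whitespace) is not handled.
    """
    if not text.startswith("("):
        return False
    depth = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            # quoted-pair: the backslash and the next character are literal
            if i + 1 >= len(text):
                return False  # dangling backslash at end of input
            i += 2
            continue
        if ch == "(":
            depth += 1  # start of a (possibly nested) comment
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # outermost comment just closed; nothing may follow it
                return i == len(text) - 1
        elif ch == "\r":
            return False  # bare CR is excluded from ctext
        i += 1
    return False  # ran out of input before the comment closed


if __name__ == "__main__":
    for candidate in [r"(hi-ya @lice here)", r"(\) whee)", r"(Spiff()"]:
        print(candidate, is_rfc822_comment(candidate))
```

Because comments nest, a counter (or a recursive parser) is needed; a single classical regular expression cannot do this job.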
What it means is a comment may contain any character, but some -- "(" ")" "\" CR -- must be "backslash escaped". That just means put a backslash (it is "\" and not "/") in front of characters whose special meanings need to be escaped. Mostly. There is also that bit that comments may contain other comments; this will preclude using regular expressions to match addresses. Basically this translates to mean that your parentheses must match in the normal manner or else should be backslash escaped.

So some valid comments are

     (hi-ya @lice here)
     (This is //-\\lice)
     (Let's (recurse (and (again (and (again (and (I'm (bored (now))))))))))
     (\) whee)

And some invalid comments would be

     (\/\/\/\/\)
     ()ops)
     (Spiff()

Now, if that were not ugly enough, RFC822 says that there can be whitespace or comments by any delimiter. The delimiters are

     ( ) < > @ , ; : \ " . [ ]

So the following are all legal and equivalent addresses for me:

     < eli @ netusa . net >
     <eli(jah)@netusa.net>
     < eli(Elijah)@netusa(not associated with usa.net).net >
     (Elijah) <eli@(dougs-home)netusa.net>
     < eli @ (the raw IP for mail (and thus subject to change)) [204.141.0.25] >
     < eli @ (a subtler variation on the above) [204.141.25] >
     <eli (Pogonatus (latin for <the bearded>))@ (qz (pronounced (queasy) ) \ .little-neck (I did not want that, but RFC1480 required it) .ny (New \ F%@!: York) .us (USA) or ) netusa (Located on Long Island) . net> (Elijah)

Alice will find that a lot of mailers are not RFC822 compliant by trying some of those out, especially that last one. Most mailers seem happy to accept and preserve addresses like the second one from the top, so that may be a good way for Alice to modify her address for filtering purposes. "Preserve" does not mean that the comment will be in the same place, just that it will be included.

The URL (uniform resource locator) mailto: scheme employs an "RFC822 addr-spec" address in the scheme specific part.
According to my reading of the relevant RFCs, this means you can use comments in mailto: addresses, but the "%", "+", and whitespace characters should be "percent escaped". This escaping works by having a % followed by the hexadecimal value of the character so % => %25 + => %2b <space> => %20 Note that the "<" and ">" address delimiters should not be used in a URL. The URL format itself precludes the need for them. Other people disagree with my reading of the RFCs on this issue. They think comments are not allowed at all in the mailto: scheme. And it is true that some of the more obscure web browsers incorrectly percent escape parentheses, thus destroying the address. I think all versions of Netscape, Explorer, Mosaic, and Lynx get it right, but use with caution. ___________________________________ Appendix __________ General format of a From: header Here are the three generally accepted formats for a From: line. These are actually just special cases of RFC 822 addresses appearing after the string "From: ". RFC 1036 specifies that only one of these variations should be used for news. I have tried other variations in mail and seen weird things happen when people with combined mail and news readers try to reply. From: alice+news@bobs-home.com (Alice) From: Alice <alice+news@bobs-home.com> From: alice+news@bobs-home.com localpart will be your username+box; the fully qualified domain name is the full name of a machine you receive mail at; the name field has whatever "full name" you want to use. See the other suggestions question for more information on RFC 822 addresses. __________ Filtering Well, now that Alice has a means of marking addresses for different handling, it is entirely reasonable that she might want some advice on how to use the marks effectively. Nancy McGough's Filtering Mail FAQ referenced earlier in this FAQ is a good place to start learning about filtering tools. I am going to provide some advice here for what to filter with. 
I have a filtering resource page that gives information on the filters I use, including a link to an annotated version of my procmailrc. The page is at <URL:http://www.netusa.net/~eli/filtering.html>

I have found it very effective to give addresses exposed to Usenet in the form of a From: or Reply-To: line the most stringent filtering. I use procmail to handle my filtering, and this is the exact recipe I have for it (my Usenet posts go out from <usenet-tag@qz.little-neck.ny.us>):

     :0:
     * ^TOusenet-tag@qz
     * !^Subject:(.Re:|(.*[[({ -](was|Re):))
     IN.junk

IN.junk is a mbox file I check occasionally for false positives. Maybe 1-3% of it is false positives, often very easily distinguishable from the other stuff by the subject line.

For those unfamiliar with regular expressions (REs), that recipe is probably not very clear. Unfortunately REs are quite complicated and beyond the scope of this document. Jeffrey Friedl has written a good book about them, Mastering Regular Expressions, published by O'Reilly and Associates. Many other sources cover them in less detail. Check your documentation for grep or egrep in particular. Be aware that most egreps today are much more sophisticated than the one procmail claims to be compatible with.

In English, here's what that does. All mail sent to an address containing the phrase "usenet-tag@qz" which gets delivered to me will be tested for compliance with the second rule. The second rule says any of that mail which does not have a subject which begins with "Re:" or have a subject which anywhere in it contains a "[", "(", "{", " " or "-" followed immediately by a "was:" or "Re:" is put into the junk pile. The first part of the subject rule matches regular replies to posts and so they do not get junked. The second part recognizes the standard Usenet convention for changing a subject line, which is to put the new subject in front and then leave the old one there (sometimes parenthetically) after the word "was:".
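For readers who find the recipe easier to follow as ordinary code, here is a hedged Python sketch of the same test. It is not how procmail works internally: the function names are invented for this example, the ^TO macro is approximated by a plain substring test on a single recipient header line, and Python's re module stands in for procmail's own regexp engine. procmail matches case-insensitively by default, which is why re.IGNORECASE is used.

```python
import re

# The subject condition from the recipe. The leading "[" inside the
# character class is escaped for Python; otherwise it is unchanged.
REPLY_SUBJECT = re.compile(
    r"^Subject:(.Re:|(.*[\[({ -](was|Re):))",
    re.IGNORECASE,
)


def is_reply_subject(subject_line: str) -> bool:
    """True if the Subject: line looks like a reply or a '(was: ...)' change."""
    return REPLY_SUBJECT.search(subject_line) is not None


def goes_to_junk(to_line: str, subject_line: str,
                 tag: str = "usenet-tag@qz") -> bool:
    """True if mail addressed to the tagged address lacks a reply-style subject."""
    addressed_to_tag = tag.lower() in to_line.lower()
    return addressed_to_tag and not is_reply_subject(subject_line)


if __name__ == "__main__":
    to = "To: usenet-tag@qz.little-neck.ny.us"
    for subject in ["Subject: Re: plus addressing",
                    "Subject: New topic (was: old topic)",
                    "Subject: MAKE MONEY FAST"]:
        print(subject, "->", "junk" if goes_to_junk(to, subject) else "keep")
```

Mail that is not addressed to the tagged address is never junked by this test, whatever its subject, which mirrors the first condition of the recipe.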
The types of mail this messes up on are those where someone takes the time to note the address, and uses it to send new mail later. This happens very infrequently in my experience. Tim Pierce and others have advocated checking for headers left in by newsreaders when filtering mail for Usenet addresses. This is subject to the same problem as my technique, but is additionally dependent on people not replying from copies found in DejaNews or Altavista and on relying upon newsreaders to add headers not specified in any standards document. That second bit makes it much too unreliable, in my opinion. A note about that recipe: My username is not "usenet" and that is not a submailbox for me. With sendmail (at least how I had it configured, YMMV with legacy sendmail.cf files) sub-boxing works just the same. Other MTAs might not work that way: the box name may get canonicalized out of the header. Mailers are not supposed to rewrite these headers, but it happens all the time. In any case, Philip Guenther <guenther@gac.edu> wrote on the procmail mailing list (to join, send subscription requests to <procmail-request@Informatik.RWTH-Aachen.DE>): Michael Ghens <michael@spconnect.com> writes: >I missed it, how does the + addressing get passed in > >as in michael+pgp@spconnect.com It shows up as $1 in your procmail, which you can then test with something like: ARG = $1 :0 * ARG ?? ^^pgp^^ pgp-folder :0 * ARG ?? ^^cypher-punks^^ cypher-punks-folder You can't test $1 directly, thus the assignment to ARG. ^^ is a weird procmail-specific regexp anchor. ^ and $ work just as well for what we want. (Gory details: ^ and $ match zero width conditions near line ends, while ^^ matches zero width conditions near string ends, much like \A and \Z in perl.) The main advantage of that recipe is $1 is set to the correct subbox even for Bcc'ed stuff when procmail is used as the local delivery agent from sendmail. (For certain sendmail.cf configurations at least.) 
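To make the idea of a sub-box concrete, here is a minimal Python sketch that splits an address into login, box, and host. It only illustrates the naming convention; it is not what sendmail or procmail actually run, the function name split_box is hypothetical, and the delimiter is a parameter because different MTAs use "+", "-" or "=".

```python
from typing import Tuple


def split_box(address: str, delimiter: str = "+") -> Tuple[str, str, str]:
    """Split 'login<delimiter>box@host' into (login, box, host).

    The box is the empty string when the localpart has no delimiter.
    Only the first delimiter separates login from box, so a box name
    may itself contain the delimiter.
    """
    localpart, _, host = address.rpartition("@")
    login, _, box = localpart.partition(delimiter)
    return login, box, host


if __name__ == "__main__":
    print(split_box("michael+pgp@spconnect.com"))
    print(split_box("alice-news@bobs-home.com", delimiter="-"))
```

Splitting on the first delimiter also shows why a delimiter that appears in real usernames is awkward: with "-" as the delimiter, the address of a user named "me-too" reads as box "too" for user "me".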
One can AND test together in a procmail recipe in a number of ways. Including a number of separate * tests is one way, but it does not allow if-then-else type structures easily. For those { ... } grouping can be used. Here is an example for a submailbox named "news" employing my subject rule. ARG=$1 # Rules to apply to Usenet responses. :0 * ARG ?? ^news$ { # Where to send stuff missing Re: or was: in the subject :0: * !^Subject:(.Re:|(.*[[({ -](was|Re):)) IN.junk # Where to send the rest of it. :0: IN.good } _____________________ A word on address types In the simple mail transport protocol (SMTP) there are two types of sender address and two types of recipient address. This can be used for all sorts of obfuscation, and often is. The types are, respectively, the envelope addresses and the header addresses. Within the protocol, the difference between the two is that envelope addresses are specified with protocol commands and header addresses are specified as part of the mail data. There need not be any correlation between a From: or To: in the headers and those from the envelope. Most mailer delivery agents add a Return-Path: header that has the envelope from address when delivering mail. Some delivery agents add the envelope to address to a Received: header, but they do not do it when there is more than one recipient at a site. If either of the header to or from addresses are missing they will be added as an Apparently-To: or Apparently-From: using the envelope addresses. Why the two different types? Well it turns out that this is really useful for forwarding mail, especially for mailing lists. Mail sent to a mailing list will be delivered with the envelope from set to the list, the envelope to set to each member of the list, the header from set to the sender of the message, and the header to set to the list. Note that this is a convention, not a requirement. It makes a lot of sense for a mailing list, if you think about it. 
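The mailing list convention just described can be written out as a small Python sketch. This is only a model of which address ends up where, not an SMTP implementation; the Delivery type, the expand_list function, and all of the example addresses are made up for illustration.

```python
from typing import List, NamedTuple


class Delivery(NamedTuple):
    envelope_from: str  # given with protocol commands
    envelope_to: str    # given with protocol commands
    header_from: str    # part of the mail data
    header_to: str      # part of the mail data


def expand_list(list_address: str, members: List[str],
                sender: str) -> List[Delivery]:
    """Model one message to a list as one delivery per list member."""
    return [
        Delivery(
            envelope_from=list_address,  # the list
            envelope_to=member,          # each member in turn
            header_from=sender,          # the original author
            header_to=list_address,      # the list again
        )
        for member in members
    ]


if __name__ == "__main__":
    deliveries = expand_list(
        "some-list@lists.example.org",
        ["alice+mail@bobs-home.com", "carol@example.net"],
        "dave@example.com",
    )
    for d in deliveries:
        print(d)
```

Note that a member's own address, sub-box included, appears only in the envelope recipient and never in the From: or To: headers, so a filter that looks only at headers cannot see which box the list mail was sent to.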
Similar behavior is used for blind carbon copied (BCC) mail. There is the problem however that people wishing to send bulk email (legitimately or not) can just set up a mailing list and it all gets delivered to the recipient without them knowing what address was used. The implications of this on sub-box filtering should be obvious. Unless your mail delivery agent does your filtering, all mail to desired mailing lists must be pre-filtered out and then all other Bcc'ed mail must be considered suspect. Here is a sample procmailrc for Alice if she were subscribed to several lists I read: # Process mail not specifically addressed to alice@bobs-home :0 * !^TOalice@bobs-home { # Sent by or to the quickcam driver developer list, a generic list # example. Return-Path:, assuming your mailer adds it, is another # good generic header to filter on. Some mailing lists are run by # a special list user and so all lists from that machine will have # the same Return-Path:. This will break things in some situations. :0: * ^(Sender|To):.*quickcam-drivers IN.quickcam-list # Mail from the lynx developer's list -or- a Bcc'ed personal reply # to mail from it. This is a filtering (dis)advantage of lists that # modify the subject. :0: * ^Subject:.*LYNX-DEV IN.lynx-list # From the procmail list. All smartlist mailing lists have a handy # X-Loop header to filter on. :0: * ^X-Loop: procmail@informatik.rwth-aachen.de IN.procmail-list # Remaining mail is either a Bcc or an unknown mailing list. :0: IN.bcc-suspect } With MMDF filtering or procmail as her local delivery agent and some sendmail.cf magic, Alice can filter on the envelope address. For procmail see the example above on the usage of $1. For MMDF she can add to her .maildelivery file a line like: addr = | A "/path/to/procmail -a $(address) -d alice" This will enable procmail filtering on the envelope as above. Read the man page for maildelivery(5) for more information on a MMDF .maildelivery file. 
_____________________

Comprehensive plan for a new account

This is a plan of action that Alice, mail id alice at bobs-home.com, could use. Bobs-home has sendmail configured to use + addresses, but not to use procmail as a delivery agent. (So procmail cannot know the envelope address.)

Here is what she is going to do: bounce all mail addressed to "alice@bobs-home", accept mail addressed to "alice+mail@bobs-home", and filter mail to "alice+news@bobs-home". Then she will set the reply address for all of her mail to "alice+mail@bobs-home.com" and send all of her posts from "alice+news@bobs-home.com".

This is hard to do on an old account, because too many people know the regular address. With a new account she can prevent them from ever getting into the habit of using it.

She starts off by creating this .forward file to invoke procmail on all of her mail. The quotes used are an important part of this.

    "|IFS=' '&&exec /usr/local/bin/procmail -f-||exit 75 #alice"

Here is a procmailrc that should do the filtering part. As yet this is UNTESTED. The DEFAULT variable should already be set by procmail to her system mailbox; setting it herself won't hurt, assuming she gets it right.

    # The system mailboxes at bobs-home are in /var/spool/mail/.
    DEFAULT=/var/spool/mail/alice

    # Where non-DEFAULT mailboxes live; Mutt uses $HOME/Mail/ by default.
    # Destination files in recipes will be relative to this unless they
    # start with a '/'. $HOME is set already.
    MAILDIR=$HOME/Mail

    # Where to log.
    LOGFILE=$HOME/.procmaillog

    # Set the shell, just to make sure it's /bin/sh
    SHELL=/bin/sh

    # Three line summary log for each piece of mail.
    LOGABSTRACT=yes

    # Additionally, log the To: header. formail comes with procmail.
    LOG=`formail -X To:`" "

    # Accept real mail
    :0:
    * ^TOalice\+mail@bobs-home
    $DEFAULT

    # Accept mailer daemon mail
    :0:
    * ^FROM_MAILER
    $DEFAULT

    # Filter news replies
    # (The plus character in the address is "magical" in REs
    # and needs to be protected with a backslash)
    :0
    * ^TOalice\+news@bobs-home
    {
            # Bad subject: not in reply form
            :0:
            * !^Subject:(.Re:|(.*[[({ -](was|Re):))
            IN.junk

            # Reply to an article
            :0:
            $DEFAULT
    }

    # Bounce remainder, saving a copy in junk, just in case. The EXITCODE is
    # the value procmail will return to sendmail on exit. 67 is a sendmail
    # specific value that will cause a bounce message to be sent.
    EXITCODE=67
    :0:
    IN.junk

Alice uses trn to read and post news, so she sets the NEWSHEADER and MAILHEADER environment variables. sh/ksh/bash/zsh commands that go into a .profile for this are:

    # This is the default value of NEWSHEADER in 3.6 with a from line prepended.
    NEWSHEADER='From: Alice <alice+news@bobs-home.com>
    %(%[followup-to]=^$?:%(%[followup-to]=^%n$?:X-ORIGINAL-NEWSGROUPS: %n
    ))Newsgroups: %(%F=^$?%C:%F)
    Subject: %(%S=^$?%"\n\nSubject: ":Re: %S)
    Summary:
    Expires:
    %(%R=^$?:References: %R
    )Sender:
    Followup-To:
    %(%{REPLYTO}=^$?:Reply-To: %{REPLYTO}
    )Distribution: %(%i=^$?%"Distribution: ":%D)
    Organization: %o
    Keywords: %[keywords]
    Cc:
    \n\n'

    # This is the default value of MAILHEADER in 3.6 with a from line prepended.
    MAILHEADER='From: Alice <alice+mail@bobs-home.com>
    To: %t
    Subject: %(%i=^$?:Re: %S
    %(%{REPLYTO}=^$?:Reply-To: %{REPLYTO}
    )Newsgroups: %n
    In-Reply-To: %i)
    %(%[references]=^$?:References: %[references]
    )Organization: %o
    Cc:
    Bcc:
    \n\n'

    export NEWSHEADER MAILHEADER

(The continuation lines inside the quoted values are indented here only for display; in the actual .profile they must start in the first column.)

This will cause the Originator: header to be set to "alice@bobs-home.com", but since she is bouncing that address it shouldn't matter too much. If she really wanted to clean that up, she could use a different inews and Pnews. (Stuff that is posted and mailed using the Cc: header on a post will appear to be from the "alice+news" address.)
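For readers who would rather test the plan's logic before trusting it to a live mailbox, the three-way decision in the procmailrc (accept "+mail", filter "+news" on the subject rule, bounce the rest) can be sketched in Python. This is a rough model, not a replacement for procmail: the function names are made up for this example, the recipient match is only a crude stand-in for procmail's ^TO macro, folded headers are ignored, and the ^FROM_MAILER rule is left out.

```python
import re

# Recipient headers to search; a simplification of procmail's ^TO macro.
TO_HEADERS = re.compile(r"^(To|Cc|Bcc|Apparently-To):(.*)$", re.I | re.M)

# The FAQ's subject rule; procmail matches case-insensitively by default.
REPLY_SUBJECT = re.compile(r"^Subject:(.Re:|(.*[\[({ -](was|Re):))", re.I | re.M)

def addressed_to(headers, address):
    """True if any recipient header line mentions the address."""
    return any(address.lower() in m.group(2).lower()
               for m in TO_HEADERS.finditer(headers))

def route(headers):
    """Return 'inbox', 'junk', or 'bounce' for a block of header lines."""
    if addressed_to(headers, "alice+mail@bobs-home"):
        return "inbox"
    if addressed_to(headers, "alice+news@bobs-home"):
        return "inbox" if REPLY_SUBJECT.search(headers) else "junk"
    return "bounce"   # the procmailrc also keeps a copy in IN.junk

print(route("To: alice+news@bobs-home.com\nSubject: Re: trn question\n"))
print(route("To: alice@bobs-home.com\nSubject: hello\n"))
```

Feeding it a few saved header blocks is a cheap way to see which mail the plan would bounce before the .forward file goes live.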
Alice uses mutt to read and reply to mail, so she creates a .muttrc with the following line in it:

    my_hdr From: Alice <alice+mail@bobs-home.com>

When Alice wants to subscribe to mailing lists she will have to adjust her .procmailrc accordingly. She also can't get Bcc'ed mail as sent by most programs.

As a last comment, let me state that there is nothing inherently special about her using "+mail" and "+news" as the box names; it was just convenient.

_____________________

Verifying Email Addresses

An often asked question in many newsgroups is "How can I verify an email address?" Sometimes people mean that they want to verify deliverability, sometimes they wish to verify valid syntax. The reasons for doing this vary from not wanting to accept invalid addresses in databases to wanting to discard all mail from invalid addresses.

Probably neither verification variant is worth the effort. A sanity check might be worthwhile for catching obvious problems, but that is about it. Some of the techniques and pitfalls of the deliverability variant include:

* Technique: "Verify that the destination machine name is real"
  Pitfalls: Between mail exchanger (MX record) redirections and wildcarded mail exchangers for whole domains, a real address might be incorrectly flagged as bad. Combine that with temporary domain name service (DNS) failures and other transient problems that crop up and the complexity becomes apparent. Then there are routed addresses, which contain multiple hostnames, at least one of them probably not Internet accessible. UUCP bang ("!") paths are probably the best known variety of these.

* Technique: "Use the SMTP VRFY command"
  Pitfalls: The most common pitfall with this is that most modern MTAs can be configured to not give out useful information in response to this or to the related EXPN command. Gateways, remailers, and various transient problems will further frustrate the would-be VRFYer.
* Technique: "Attempt to mail to the address and check for a bounce"
  This sometimes includes actually sending mail, and sometimes just presenting a mail envelope to the mail server.
  Pitfalls: Some sites never bounce mail, even when the address is invalid. Some addresses send faked bounces in response to some mail while still saving it for a recipient to read. Sometimes MTAs send informational transient error bounce messages to indicate that the mail is spooled and delivery will be reattempted. These do not provide any information about the validity of the address.

* Technique: "Send mail to the address asking for a reply"
  Pitfalls: Getting a reply from this is the only way to be sure that an address was valid at the time you sent the mail. Since then, of course, the address could have become invalid. Not getting a reply doesn't help too much. Getting a bounce may help.

The "check for bounce" tests are probably the best sanity check for address deliverability. Most addresses that bounce are invalid, most that don't are valid.

The other variant, checking for valid syntax in an address, is doable, even when isolated from a network. It is not done easily, however. Look at some of the valid address examples I gave earlier in this FAQ. The most pathological of the lot are probably these two:

    < eli @ [ 204 . 141 . 25 ] >

    <eli (Pogonatus (latin for <the bearded>))@ (qz (pronounced (queasy) ) \
     .little-neck (I did not want that, but RFC1480 required it) .ny (New \
     F%@!: York) .us (USA) or ) netusa (Located on Long Island) . net> (Elijah)

The first one uses a domain literal, hence the [brackets]. Many people seem to think that it is fine to just use raw IP addresses in the route specification portion of an address. This is not true. Some mailers may accept it, but the proper notation uses brackets. Even when people do know that, they don't often know that in dot quad notation the third quad or both the second and third quads may be dropped if they are zeros.
So "[127.1]" is equivalent to "[127.0.0.1]" and "[204.141.25]" is equivalent to "[204.141.0.25]".

The second example is designed to break all regexps I have seen to check for a valid RFC 822 address. Jeffrey Friedl designed the most comprehensive one I have seen as an example for his book Mastering Regular Expressions. But because regular expressions cannot match arbitrarily deep paired nestings, e.g. properly matched parentheses, his effort was only designed to match nesting two deep. To trip up other efforts to recognize addresses, that example also includes six characters that are special in email addresses ( < > % @ ! : ) inside its comments. Not to mention all that whitespace, since it's allowed and improves readability so much. :^)

The two practical options are to write a full RFC 822 address parser or to "destructively" test a copy of the address. The first is not too difficult, but there are about three pages of BNF in RFC 822 so it would probably take at least an afternoon's time to write. The second involves making a temporary copy of the address, iteratively removing comments and other troublesome things with, say, a regexp, and then feeding the result to Friedl's verifier. Here's some perl code that should work for such a destructive preprocessor:

    $copy = $address_to_test;
    $copy =~ s:\\.:a:g ;            # replace \quoted stuff
    $copy =~ s:\\\n::gs ;           # unfold lines
    $copy =~ s:"[^"]*":b:g ;        # replace "quoted" stuff
    while ( $copy =~ s:\([^()]*\)::g ) {;} # (remove ((all) comments))

The while loop removes nested comments one layer at a time. Friedl's code to check the address is available from his web pages: <URL:http://enterprise.ic.gc.ca/~jfriedl/regex/code.html>. It is about 5k. If the copy is a syntactically valid address, then the original was as well.

For a reasonable sanity check after my destructive rewriting, try this.
    $copy =~ s:\s+::g ;                     # suppress whitespace
    $lft = ($copy =~ y:<:<: );              # count <s
    $rgt = ($copy =~ y:>:>: );              # count >s
    if (( $rgt != $lft ) || ( $lft < 0 ) || ( $lft > 1 )) {
      print "Address has invalid encapsulation\n";
    } else {
      # extract from encapsulation, if any
      $copy =~ s:.*<([^>]*)>.*:$1: ;
      if ( $copy =~ m:\@\.|\.\.|\.$: ) {
        print "Address has invalid dot placement\n";
      } elsif ( $copy =~ m:\@.*\@|^\@|\@$: ) {
        # There are legal addresses this will reject, but they
        # are mind-numbingly rare in practice. The most common
        # form is illustrated in this routed address for me:
        # <@alpha.netusa.net:eli@mail.netusa.net>
        print "Address has dubious \@ usage\n";
      } elsif(!($copy =~ y:@:@:)) {
        print "Address does not have an at sign\n";
      } elsif(!($copy =~ y:.:.:)) {
        print "Address does not have a dot\n";
      } else {
        print "Address is probably good.\n";
      }
    }

People often try to make other tests, many of them ill-advised. One common one is testing for a known top level domain (TLD). There are a lot of these, so doing it right involves a lot of tests or a very large and probably slow regexp. And that ignores the fact that new ones get added every now and then. As I write this there is the case of Zaire. Until recently it was ".zr", but now that Laurent Kabila has forced Mobutu Sese Seko out, the country has been renamed the Democratic Republic of the Congo. It is not unreasonable to expect that a new TLD might be added for it.

Another common but unreliable test is checking for spaces. Besides the case I have often illustrated of whitespace being allowed around the delimiting characters, it is legitimate to have spaces in the local part of an address. Try sending mail to my "echo request"@qz.little-neck.ny.us address. Don't worry about the subject, but put just the word "ping" in the body. This toy will reply from the address at which it received your mail. Thus you can see if your mailer is breaking things, such as sending the mail to "request@qz" instead.
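For those who do not read perl, the destructive preprocessor and the sanity check above can be transliterated into Python roughly as follows. Treat it as a sketch: the function names strip_for_checking and sanity_check are invented for this example, the tests are the same loose heuristics as in the perl (so the same rare legal addresses get flagged), and it does not call Friedl's verifier.

```python
import re

def strip_for_checking(address):
    """Destructively simplify a copy of an RFC 822 address."""
    copy = re.sub(r"\\.", "a", address)        # replace \quoted characters
    copy = re.sub(r"\\\n", "", copy)           # unfold lines
    copy = re.sub(r'"[^"]*"', "b", copy)       # replace "quoted" strings
    while True:                                # remove nested comments,
        copy, count = re.subn(r"\([^()]*\)", "", copy)  # one layer per pass
        if count == 0:
            return copy

def sanity_check(address):
    """Return a short verdict string for the address."""
    copy = re.sub(r"\s+", "", strip_for_checking(address))
    lft, rgt = copy.count("<"), copy.count(">")
    if lft != rgt or lft > 1:
        return "invalid encapsulation"
    copy = re.sub(r".*<([^>]*)>.*", r"\1", copy)   # extract from <...>, if any
    if re.search(r"@\.|\.\.|\.$", copy):
        return "invalid dot placement"
    if re.search(r"@.*@|^@|@$", copy):
        return "dubious @ usage"
    if "@" not in copy:
        return "no at sign"
    if "." not in copy:
        return "no dot"
    return "probably good"

print(sanity_check('"echo request"@qz.little-neck.ny.us'))    # probably good
print(sanity_check("<@alpha.netusa.net:eli@mail.netusa.net>"))  # dubious @ usage
```

The second call shows the known weakness mentioned in the perl comments: a legal routed address is reported as dubious.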
_________________________________________________________________ Comments on this FAQ? Send me mail.