RFC # 724
NIC #37435
12 May 1977

Proposed Official Standard for the Format of ARPA Network Messages

by

Ken Pogran, MIT-LCS/CSR (Pogran at MIT-Multics)
John Vittal, BBN (Vittal at BBN-TENEXA)
Dave Crocker, RAND-ISD (DCrocker at Rand-Unix)
Austin Henderson, BBN (Henderson at BBN-TENEXD)

PREFACE

ARPA's Committee on Computer-Aided Human Communication (CAHCOM) wishes to promulgate an official standard for the format of ARPA Network mail headers which will adequately meet the needs of the various message service subsystems on the Network today. The authors of this RFC constitute the CAHCOM subcommittee charged with the task of developing this new standard; this document presents our current thoughts on the matter and a specific proposal.

This document is organized as follows: First, we present a history of the development of what has become known as the ARPA Network "mail" or "message" service, and the issues which we feel are most pressing -- problems for which solutions are lacking today, inhibiting the further development of message subsystems. We then present the specification for the new ARPA Network Message Header standard. This is followed by a References section.

Essentially, we propose a revision to Request for Comments (RFC) 561, "Standardizing Network Mail Headers", and RFC 680, "Message Transmission Protocol". This revision removes and compacts portions of the previous syntax and adds several features to network address specification. In particular, we focus on people and not mailboxes as recipients and allow reference to stored address lists. We expect this syntax to provide sufficient capabilities to meet most users' immediate needs and, therefore, give developers enough breathing room to produce a new mail transmission protocol "properly". We believe that there is enough of a consensus in the Network community in favor of such a standard syntax to make possible its adoption at this time.
We would like to make clear the status of this proposed standard: The CAHCOM Steering Committee has replaced the Message Service Committee as the ARPANET standards-setting organization in the area of message services. It is expected that the proposal of this CAHCOM subcommittee, when in its final form, will be adopted as an ARPANET standard by CAHCOM. In the interests of making this standard the best possible one, we are distributing this proposal as an RFC. Please send any comments and criticisms to any of the authors of this RFC by 15 June 1977. It is planned that the standard will be officially adopted by 1 September 1977, with hosts expected to accept its syntax by 1 January 1978.

CONTENTS

I. PROBLEMS WITH ARPANET MESSAGE STANDARDS
   A. Background and History
   B. Issues and Conclusions
   C. Message Parts
   D. Adoption of the Standard
II. STANDARD FOR THE FORMAT OF ARPA NETWORK MESSAGES
   A. Framework
   B. Syntax
   C. Semantics
   D. Examples
III. REFERENCES
APPENDIX
   A. Alphabetical Listing of Syntax Rules

I. PROBLEMS WITH ARPANET MESSAGE STANDARDS

A. BACKGROUND AND HISTORY

Today's ARPA Network "mail" or "message" service uses, for its delivery mechanism, two special commands of the File Transfer Protocol. Viewed from within the structure of FTP, the entire message, both header and text, is data for the FTP MAIL and MLFL commands. This facility was added to the File Transfer Protocol as an afterthought; it was an interim solution to be used only until a separate mail transmission protocol was specified. Several versions of such a protocol have been proposed, but none has yet received general acceptance. Meanwhile, attempts have been made to improve upon the original interim facility.
As message service subsystems on various host systems (especially TENEX) developed to the point where rudimentary parsing of incoming messages was being done, it became clear that it would be desirable to standardize the format and content of the headers of messages transmitted between hosts using these FTP commands. To this end, an ad hoc committee wrote RFC 561, which suggested a standard message header format. The committee was unofficial, so it could not legislate a standard; it could only recommend. However, the standard it suggested adequately met an urgent need, and was generally adopted. Several salient points should be noted:

1. RFC 561 defined the concept of a message header, and specified the syntax which delimited it from the actual text of a message;

2. It proposed a standard format for the most obvious and most urgently-needed header items: "From:", "Date:", and "Subject:";

3. It proposed that a general standard syntax be used for all other header items;

4. RFC 561 is still, today, an unofficial standard, adhered to by most because of its utility;

5. Its syntax was designed to allow humans to read the text easily, without the aid of special message processing systems.

As message services grew in sophistication, the need for specific header items in RFC 561's "miscellaneous" category grew: "To:" and "cc:", especially, were generated and recognized by several different message services. However, there was no specific standard for the syntax of the contents of these items. The message service subsystems on TENEX developed a particular format for these items; since more messages originated from the TENEX hosts on the Network than from any other type of host system, the TENEX format for these fields soon became a de facto standard. Message service subsystems on TENEX began to parse these fields, expecting them to be in the TENEX-generated format.
Message service subsystems on other hosts -- Multics, for example -- began to dabble with other formats for these fields, since there was no standard for them, only to receive complaints from users of TENEX message service subsystems that their "non-standard" message headers could not be parsed according to the (de facto) "standard" syntax.

Recognizing that the time had come to make an attempt to standardize the additional header fields that had come into use since RFC 561 was published, ARPA's Message Service Committee chartered a small group in 1975 to develop a revised version of RFC 561 which would define the syntax of these additional message header fields. Several things should be noted about this small group of people: first, they were TENEX-oriented; when the functionality of the message header items they desired was matched by the functionality of an already-existing message header item of the TENEX message subsystems, they adopted the syntax used by the TENEX message subsystems. Second, they based additional header items not already found on TENEX message subsystems on the deliberations of the Message Service Committee. Third, they were not familiar with the procedure for publication of a document as a Network RFC.

The document which this group produced, labelled RFC 680, "Message Transmission Protocol", received only limited distribution. Matters were further confused because its title was misleading, since it was not a protocol for the transmission of messages between ARPA Network hosts, but rather a standard for the format of messages transmitted via the standard File Transfer Protocol. Some, including the Message Service Committee, believed that RFC 680 became a Network Standard. This was not strictly true, because it never received proper distribution, and it had never been "officially blessed" by anyone, to turn it from a request for comments into an accepted official ARPA Network standard document.
Reflecting this confusion over the status of the document are the facts that the document DOES currently reside in the "official" ARPANET Protocol Handbook, and that most users and message system implementors remain unaware that this is so.

For all its shortcomings, RFC 680 has performed a needed service, just as did RFC 561 before it. It defined additional message header items at a time when this needed to be done. Unfortunately, since the group had not sought ideas and input from others, the specification did not adequately respond to a sufficient set of community needs. In addition, the manner in which the document was promulgated -- or not promulgated -- left a great deal to be desired. Implementors of message-processing subsystems who had not received RFC 680 proceeded to go their own ways, feeling justified in doing so, while those who accepted RFC 680 as a standard felt justified in complaining to -- and about -- those whom they considered to be maverick implementors of idiosyncratic message service subsystems.

Perhaps because of the ad-hoc nature of the interim mail facility, users have not, until recently, attempted to push the system to the limits of their imagination. Presently, however, several different sites are using the "interim" mail facility for purposes beyond those for which it was designed, and in ways which are incompatible both with each other and with the original intent of the facility. Mail subsystem implementors are increasingly being asked to provide for the handling of mail from idiosyncratic hosts. Also, it has become clear that there are a few very specific features, too useful to ignore, which cannot reasonably be specified within the syntax of RFC 680.

B. ISSUES AND CONCLUSIONS

At first glance, it would seem that a resolution of today's somewhat chaotic situation could best be obtained by immediately junking the existing "interim" mail facility, and adopting a true mail transmission protocol. We strongly believe that this would be ill-advised at this time, for we feel that there is no general understanding within the Network community today of how to specify and implement a full and adequate mail transmission protocol. However, we are convinced that there is, finally, a strong commitment within the Network community to attack this problem (which there was not at the time the "interim" mail transmission facility was specified and developed).

The frontal attacks on the mail protocol problem have, so far, resulted in at least two suggestions for a mail transmission protocol. Why should not one of these protocols be adopted immediately? We feel that, in general, there has been a tendency for experimental Network software to be prematurely treated as though it were adequately designed and fully operational. Typically, the system or protocol proposed is so much better than what was previously available that its experimental nature is disregarded, and it is pressed into service before it has had a chance to properly develop and mature. We are very concerned that this phenomenon not afflict the Network mail system any more than it already has.

While it is true that there are several sites in the ARPA Community which have mail systems that understand the syntax specified in RFCs 561 and 680, in addition to some of the "non-standard" syntax provided by the mail generating programs at several other sites, most mail systems do not parse much of the contents of received messages. A consideration of the syntax specified here is that messages which are sent to people should be easily read by people.
Parsers which can turn an ugly, syntactically expedient form into something which is easy to read are the exception, rather than the rule, in today's message systems. Also, the modifications to the existing "non-standard" syntax should be kept to a minimum, enhancing the probability that the requirement of small perturbations to existing software will be accepted.

With this syntax, we introduce mechanisms so that:

1. Users of mail systems can have multiple mailboxes, either on one machine or multiple machines, all of which are treated identically; the default mailbox for a user is not necessarily associated (directly) with his login name.

2. Mail for a person can be sent to other than a single, default mailbox.

3. Named groups may consist of both individuals and (possibly) other named groups (i.e., nesting within groups is permitted).

4. Address lists may contain references to other, stored, lists. The complete path with which one can retrieve the stored list may be specified in order to allow either manual or automatic retrieval of the stored list.

5. Address lists may contain references to addresses which are not accessible through the standard ARPANET message system. For example, U.S. Postal system addresses can be specified. Such addresses are, of course, expected to be ignored by the ARPANET system, although individual sites may provide services for using the information (e.g., automatically sending a copy of the message to a line printer, in preparation for transmission through the Postal system).

6. Parenthetical remarks, or comments, can be included and syntactically recognized as such within some header items.

7. Received messages are capable of being read by humans without a program having to parse the message (or parts of it) before presenting the message to the user; however, there is sufficient formal syntax to enable a parsing program to modify the appearance and content of material presented to users. Although message-display software may exercise considerable control over message appearance, the degree to which a message's actual format is PLEASANT for humans to read is entirely the responsibility of the message creation program.

No mechanism for authentication is provided, since the Network provides no mechanisms for enforcing mail security. The syntax does provide for one aspect of "correctness": a distinction is made between an address which is claimed to be a valid network address and one which is simply free text, included for the convenience of the human participants.

C. MESSAGE PARTS

Some confusion has existed over the roles played by different message parts. Einar Stefferud has suggested using the perspective of envelope, letter head, and letter content. The presence of structured portions in messages additionally requires reference to "headers".

In computer-based message systems, human users do not generally encounter "envelopes", which are often constructed automatically, to be used by the participating system(s) to deliver the message. For example, on TENEX, the envelope is the name of the file containing a message awaiting transmission. For FTP servers, it is the data portion of the MAIL or MLFL command line. Some systems attach "envelope-like" information to the message header, such as time-stamp and originating host name.

In paper-based communications, headers occur both before (e.g., "To:" and "From:") and after (e.g., "cc:" and "enclosure:") the body of the message. Within this standard, all headers occur before the body of the message, although local message display programs may choose to alter that ordering.
Wayne Hathaway has pointed out that ARPANET message format does not support specification of letterheads, since these are a type of organizational public relations symbol. Some idiosyncrasies are supported, however, by way of choosing special field names.

In general, it is important to realize that the header portion of a message plays several roles during the life of a message, variously participating in each of the three functions suggested by Stefferud.

D. ADOPTION OF THE STANDARD

During the early phases of specifying this standard, a great deal of concern was expressed over the problems which may be experienced during the transition from the current standard to this new one. We feel that the true problem is the lack of realization that THERE IS NO CURRENT OFFICIAL STANDARD. Enough systems have enough overlapping behaviors to allow the current mail environment to function, but this in no way constitutes a standard. In fact, we strongly believe that the new requirements imposed by the proposed standard involve less complexity than the ambiguities resulting from the current variations in system behaviors.

II. STANDARD FOR THE FORMAT OF ARPA NETWORK MESSAGES

This standard supersedes the informal standards specified in ARPANET Request for Comments numbers 561, "Standardizing Network Mail Headers", and 680, "Message Transmission Protocol". In this document, a general framework is described. The formal syntax is then specified, followed by a discussion of the semantics. Finally, a number of examples are given.

This specification is intended strictly as a definition of what is to be passed between hosts on the ARPANET. It is NOT intended to dictate either features which systems on the Network are expected to support, or user interfaces to message creating or reading programs.

A distinction should be made between what the specification requires and what it allows.
Certain equivalences are defined, such as between a space character and an end-of-line character, which both facilitate the formal specification and indicate what the OFFICIAL semantics are for messages. Particular implementations may wish to preserve further distinctions which the specification does not require.

A. FRAMEWORK

Since there are many message systems which exist outside the ARPANET environment, as well as those within it, it may be useful to consider the general framework, and resulting capabilities and limitations, of this standard.

Messages are expected to consist of lines of text. No special provisions are made, at this time, for encoding drawings, facsimile, speech, or structured text.

No significant consideration has been given to questions of data compression or transmission/storage efficiency. The standard, in fact, tends to be very free with the number of bits consumed. For example, field names are specified as free text, rather than special terse codes.

A general "memo" framework is used. That is, a message consists of some information, in a rigid format, followed by the main part of the message, which is text and whose format is not specified in this document. The syntax of several fields of the rigidly-formatted ("header") section is defined in this specification; some of the header fields must be included in all messages. In addition to the fields specified in this document, it is expected that other fields will gain common use. User-defined header fields allow systems to extend their functionality while maintaining a uniform framework. Our approach is similar to that of the TELNET protocol, in that we are defining a basic standard which includes a mechanism for (optionally) extending itself. The authors of this document will regulate the publishing of specifications for these extensions.
Such a framework severely constrains document "tone" and appearance and is primarily useful for most intra-organization communications and relatively structured inter-organization communication. A more robust environment might allow for multi-font, multi-color, multi-dimension encoding of information. A less robust environment, as is present in most single-machine message systems, would more severely constrain the ability to add fields and the decision to include specific fields.

Relative to paper-based communication, it is interesting to note that the RECEIVER of a message can exercise an extraordinary amount of control over the message's appearance. The amount of actual control available to message receivers is contingent upon the capabilities of their individual message systems.

B. SYNTAX

This syntax is given in four parts. The first part describes a base-level lexical analyzer which feeds the higher-level parser described in the succeeding sections. The second part gives a general syntax for messages and standard header fields. The third part specifies the syntax of addresses. A final section specifies some general syntax which supports the other sections.

1. LEXICAL ANALYSIS OF MESSAGES

a. General Description

A message consists of headers and, optionally, a body (i.e., the message text). The message text is just a sequence of ASCII characters; it is separated from the headers by a null line (i.e., a line with nothing preceding the CRLF).

1) Folding and unfolding of headers

Each header item can be viewed as a single, logical, long line of ASCII characters. For convenience, this conceptual entity can be split into a multiple-line representation (i.e., "folded"). The general rule is that wherever there can be linear-white-space characters, you can instead insert a CRLF immediately followed by AT LEAST one linear-white-space character. Thus, the single line

   To: "Joe Dokes & J. Harvey" , JJV at BBN

can be represented as

   To: "Joe Dokes & J. Harvey" ,
       JJV at BBN

and

   To: "Joe Dokes & J. Harvey"
       ,
       JJV at BBN

and

   To: "Joe Dokes
       & J. Harvey" , JJV at BBN

The process of moving from this folded multiple-line representation of a header field to its single line representation will be called "unfolding". Unfolding is accomplished by regarding a CRLF immediately followed by a linear-white-space character as equivalent to the linear-white-space character.

2) Structure of header fields

Once header fields have been unfolded, they may be viewed as being composed of a field name followed by a ":" (colon), followed by a field body. The field name must be composed of printable ASCII characters (i.e., characters which have decimal values between 33 and 126) and linear-white-space characters. The field body may be composed of any ASCII characters (other than CR and LF, which have been removed by unfolding).

Certain header fields may be interpreted according to an internal syntax which some systems may wish to parse. These fields will be referred to as structured fields. Examples include fields containing dates and addresses. Other fields, such as the subject field, are regarded simply as a single line of text.

3) Field names

To aid in the creation and reading of field names, the free insertion of linear-white-space characters is allowed in reasonable places. Rather than obscuring the syntax specification for field names with the explicit syntax for these characters, the existence of a simple "lexical" analyzer is assumed. The analyzer reinterprets the unfolded text which comprises the field name as a sequence of atoms separated by linear-white-space characters. The field name may be conveniently represented by the sequence of these atoms, separated by a single ASCII space character.

4) Field bodies

To aid in the creation and reading of structured fields, the free insertion of linear-white-space characters is allowed in reasonable places.
Rather than obscuring the syntax specifications for these structured fields with explicit syntax for these characters, the existence of another simple "lexical" analyzer is assumed. It provides an interpretation of the unfolded text comprising the body of the field as a sequence of lexical symbols. These include

   - individual special characters
   - quoted strings
   - comments
   - atoms

The first three symbols are self-delimiting. Atoms are not; they therefore are delimited by the self-delimiting symbols and by linear-white-space characters. So, for example, the folded body of an address field

   ":sysmail"@ Some-Host, Muhammed(I am the greatest)Ali at WBA

is analyzed into the following lexical symbols and types:

   ":sysmail"              quoted string
   @                       special
   Some-Host               atom
   ,                       special
   Muhammed                atom
   (I am the greatest)     comment
   Ali                     atom
   at                      atom
   WBA                     atom

b. Formal Definition

::= ":" ::= | ::= | ::= , as defined in the following sections, and consisting of combinations of , , , and tokens> ::= > ::= <"> ::= and > ::= ::= | | | ::= "(" | ")" | "<" | ">" | "@" | "," | ";" | ":" | <"> ::= "(" > ")" ::= | ::= | ::= ::= ::= ::= ::=

c. Clarifications

1) Comments

Comments may appear only within the bodies of structured fields. A comment is any set of TELNET ASCII characters, which is not within a quoted string, and which is enclosed in matching parentheses; parentheses nest, so that if a left paren occurs in a comment string, there must also be a matching right paren. Comments are NOT passed to the FTP server, as part of a MAIL or MLFL command, since comments are not part of the "formal" address.

2) "White space"

Remember that in structured fields, MULTIPLE LINEAR WHITE SPACE TELNET ASCII CHARACTERS (namely tabs and spaces) ARE TREATED AS SINGLE SPACES AND MAY FREELY SURROUND ANY SYMBOL. In all header fields, at least one linear-white-space character is REQUIRED only at the beginning of folded lines.
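
As an informal illustration only, and not part of this specification, the lexical analysis of a structured field body described in section II.B.1.a.4 might be sketched in Python as follows. The function name lex_field_body and the set name SPECIALS are hypothetical; the sketch assumes the field has already been unfolded, treats only space and tab as linear white space, and does not handle backspaces or any quoting of a quotation mark inside a quoted string.

```python
# Special characters as listed in the formal definition of this section.
SPECIALS = set('()<>@,;:"')


def lex_field_body(body):
    """Split an unfolded structured field body into (text, type) pairs."""
    symbols = []
    i, n = 0, len(body)
    while i < n:
        ch = body[i]
        if ch in " \t":
            # Linear white space only separates symbols; it is not a symbol.
            i += 1
        elif ch == '"':
            # A quoted string runs to the next quotation mark.
            j = body.find('"', i + 1)
            if j < 0:
                raise ValueError("unterminated quoted string")
            symbols.append((body[i:j + 1], "quoted string"))
            i = j + 1
        elif ch == "(":
            # Comments nest, so count parentheses until they balance.
            depth, j = 1, i + 1
            while j < n and depth:
                if body[j] == "(":
                    depth += 1
                elif body[j] == ")":
                    depth -= 1
                j += 1
            if depth:
                raise ValueError("unbalanced parentheses in comment")
            symbols.append((body[i:j], "comment"))
            i = j
        elif ch in SPECIALS:
            symbols.append((ch, "special"))
            i += 1
        else:
            # An atom ends at white space or at any self-delimiting symbol.
            j = i
            while j < n and body[j] not in SPECIALS and body[j] not in " \t":
                j += 1
            symbols.append((body[i:j], "atom"))
            i = j
    return symbols
```

Applied to the address field body used as the example in section II.B.1.a.4, this sketch yields the same nine symbols and types listed there.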
Writers of mail-sending (i.e. header generating) programs should realize that there is no Network-wide definition of the effect of horizontal tab TELNET ASCII characters on the appearance of text at another Network host; therefore, the use of tabs in message headers, though permitted, is discouraged.

Note that the contents of messages are required to conform with TELNET NVT conventions (e.g. CR must be followed by either LF, making a CRLF, or NUL, if the CR is to stand alone).

3) Quoted strings

Where permitted (i.e., in structured fields) quoted strings are treated as a single symbol (i.e. equivalent to an atom, syntactically). However, if quoted strings are to be "folded" onto multiple lines, then the syntax for folding must be adhered to. (See items II.B.1.a.1, above, and II.B.1.c.6, below.) Note that the official semantics do not encounter CRLFs in quoted strings, although particular parsing programs may wish to note their presence.

4) Bracketing characters

There are two types of brackets which must be well nested:

   - Parentheses are used to indicate comments.
   - Angle brackets ("<" and ">") are used where there is a question of the presence of machine-usable code (e.g. delimiting mailboxes).

5) Case independence of certain specials s

It should be assumed by all mail reading programs that certain s can be represented in any combination of upper and lower case. These are:

   - s,
   - "File", in a ,
   - "at", in an ,
   - s,
   - s,
   - s, and
   - s

For example, the field names "From", "FROM", "from", and even "FroM" should all be treated identically. Note that, at the level of this specification, case IS relevant to other s and s. Also see Section II.C.1.a.4, below.

6) Folding long lines

Each header item (field of the message) may be represented on exactly one line consisting of the name of the field and its body, and this is what the parser sees.
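
As an informal illustration only, and not part of this specification, the unfolding rule of section II.B.1.a.1, the field-name normalization of section II.B.1.a.3, and the case independence of field names might be sketched in Python as follows. The names unfold, split_fields, and same_field_name are hypothetical. The sketch assumes that lines end in CR LF, that only space and tab count as linear white space, and that it is handed the header portion of a message (it stops at the first null line).

```python
import re

# A CRLF immediately followed by a space or tab marks a folded line.
_FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(header_text):
    """Drop each folding CRLF, keeping the white space that followed it."""
    return _FOLD.sub("", header_text)


def split_fields(header_text):
    """Return (field name, field body) pairs for the unfolded header lines."""
    fields = []
    for line in unfold(header_text).split("\r\n"):
        if not line:
            break  # a null line ends the headers
        name, colon, body = line.partition(":")
        if not colon:
            raise ValueError("header line has no colon: %r" % line)
        # Represent the name as its atoms joined by single spaces.
        fields.append((" ".join(name.split()), body.strip()))
    return fields


def same_field_name(a, b):
    """Compare two field names without regard to case or extra white space."""
    return " ".join(a.split()).lower() == " ".join(b.split()).lower()
```

For example, under these assumptions the folded text "To: JJV at BBN," CRLF "    Pogran at MIT-Multics" CRLF unfolds to the single field ("To", "JJV at BBN,    Pogran at MIT-Multics"); the white space that began the continuation line is retained, which is harmless because multiple linear-white-space characters in structured fields are treated as a single space.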
For readability, it is recommended that the field body portion of long header items be "folded" onto multiple lines of the actual header.

7) Backspace characters

Backspace TELNET ASCII characters (ASCII BS, decimal 8) may be included in and to effect overstriking; however, any use of backspaces which effects an overstrike to the left of the beginning of the or is prohibited.

2. GENERAL SYNTAX OF MESSAGES:

NOTE: The syntax indicates that items in must be in a specific order and precede all other header items. Header fields, in fact, are NOT required to occur in any particular order. Required header items must be unique (occur exactly once). This specification permits multiple occurrences of most optional fields. However, the interpretation of such multiple occurrences is not specified here.

::= | ::= | ::= ::= | | | ::= "Date" ":" ::= "From" ":" ::= "From" ":" ::= "From" ":" ::= "Sender" ":" ::= "Reply-To" ":" ::= | ::= | ::= "To" ":" | "cc" ":" | "bcc" ":" | "Fcc" ":" ::= "In-Reply-To" ":" | "Keywords" ":" | "Message-Id" ":" | "References" ":" | "Subject" ":" | "Comments" ":" | ::= which has a not defined in this specification>

The following syntax for the bodies of various fields should be thought of as describing each field body as a single long string (or line). The section on Lexical Analysis (section II.B.1) indicated how such long strings can be represented on more than one line in the actual transmitted message.

3. SYNTAX OF GENERAL ADDRESSEE ITEMS

::= | "," ::= | | "," ::= | ":" ";" | | ::= | "<" ">" ::= ::= ::= | "," ::= ::= ":" "File" ":" ::= | "<" ">" ::= | "," ::=

4. SUPPORTING SYNTAX

::= | | "," ::= | ::= "<" ">" ::= ::= ::= "at" | "@" ::= | ::= > ::= "(" > ")" ::= ::= ::= | ::= "Date" ":" ::= ::= ::= ::= | ::= | ::= | | | ::= ":" "File" ":" ::= ::= | "," ::= | "<" ">" ::= | ::= | | "," ::= | ::= | | "," ::= <"> ::= "Reply-To" ":" ::= ::= "Sender" ":" ::= "/" "/" <2-digit-year> ::= ::= "(" | ")" | "<" | ">" | "@" | "," | ";" | ":" | <"> ::= ::= "January" | "Jan" | "February" | "Feb" | "March" | "Mar" | "April" | "Apr" | "May" | "June" | "Jun" | "July" | "Jul" | "August" | "Aug" | "September" | "Sep" | "October" | "Oct" | "November" | "Nov" | "December" | "Dec" ::= ::= and > which has a not defined in this specification> ::= |