Network Working Group                              Mike Kraley (Harvard)
Request for Comments #57                          John Newkirk (Harvard)
                                                           June 19, 1970


                Thoughts and Reflections on NWG/RFC #54

In the course of writing NWG/RFC #54 several new ideas became apparent.
Since these ideas had not previously been discussed by the NWG, or were
sufficiently imprecise, it was decided not to include them in the
official protocol proffering. We thought, however, that they might be
proper subjects for discussion and later inclusion in the second level
protocol.

I. Errors and Overflow

In line with the discussion in NWG/RFC #48, we felt that two types of
errors should be distinguished. One is a real error, such as an RFC
composed of two send sockets. This type of error can only be generated
by a broken NCP. In the absence of hardware and software bugs, these
events should never occur; the correct response upon detection of such
an event was outlined in the description of the ERR command in NWG/RFC
#54.

The other "error" is an overflow condition arising because finite
system resources are exhausted. An overflow condition could occur if an
RFC was received, but there was no room to create the requisite tables
and queues. This is not a real error, in the sense that no one has done
anything incorrect (except perhaps the system planners in not providing
sufficient table space, etc.). Further, a recovery procedure can be
well defined, and simply entails repeating the request at a future
time. Thus, we believe an overflow condition should be distinguished
from a real error.

In NWG/RFC #54 an overflow condition was reported by returning a CLS,
as if the connection had been refused. This sequence performs the
necessary functions, and leaves the connection in the correct state,
but the initiating user is misinformed. He is deluded into thinking
that he was refused by the foreign process, when, in fact, this was not
the case.
In certain algorithms this difference is crucial.

In further defining error conditions, we felt that it would be helpful
to specify why the error was detected, in addition to specifying what
caused the error. While writing the pseudo-Algol program mentioned in
NWG/RFC #55 we differentiated 9 types of errors (listed below). We
would, therefore, like to propose the extension of the ERR message to
include an 8-bit field following the op code to designate the type of
error. This would be followed by the length and text fields, as before.
We propose these error types:

    0.  UNSPECIFIED ERROR
    1.  HOMOSEX (invalid send/rcv pair in an RFC)
    2.  ILLEGAL OP CODE
    3.  ILLEGAL LEADER (bad message type, etc.)
    4.  ILLEGAL COMMAND SEQUENCE
    5.  ILLEGAL SOCKET SPECIFICATION - COMMAND
    6.  ILLEGAL COMMAND LENGTH (last command in message was too short)
    7.  CONNECTION NOT OPEN - DATA
    8.  DATA OVERFLOW (message longer than advertised available buffer
        space)
    9.  ILLEGAL SOCKET SPECIFICATION - DATA (socket does not exist)

In light of the other considerations mentioned earlier, we would also
like to propose an additional control command to signify overflow:

    +-------------+-------------------+---------------------+
    |     OVF     |     my socket     |     your socket     |
    +-------------+-------------------+---------------------+

The format of the message is similar to that of the CLS message, which
it replaces in this context. The socket numbers are 32 bits long and
correspond to the socket numbers in the RFC which is being rejected.
The semantics of an incoming OVF should be identical to those of an
incoming CLS; in addition, the user should be informed that he has not
been refused but rather has overtaxed the foreign host's resources.

An alternative to creating a separate control command can be realized
by considering the similarity between a CLS and an OVF. Conceivably, an
eight-bit field could be added to the CLS command to define its
derivation.
We believe, however, that this alternative is conceptually inferior and
practically more difficult to implement.

Overflow does not require serious consideration if it is a sufficiently
rare occurrence. We do not believe this will be the case, and we
further believe that the absence of an overflow mechanism will be an
unnecessary restriction upon the user.

II. Host Up and Host Down

Significant problems can arise when a host goes down and then attempts
to restart. Two cases can easily be distinguished. The first is a
"soft" crash, where the system has prior notice that the machine is
going down; sufficient time is available to execute pre-recovery
procedures. The other case can be termed a "hard" crash, often the
result of a system failure. Little or no warning is usually given; more
important, the state of the machine after recovery is rarely
predictable.

When a host returns from a hard crash, the network will be in an
undefined state. Very probably the NCP's data structures are destroyed
or are meaningless. The network has declared the host dead -- but only
to processes which attempted data transmission and were refused. The
only alternative for the crashed host is re-initialization of its
tables. What are the alternatives for the foreign hosts?

We would like to propose the addition of two control commands: RESET
(RST) and RESET REPLY (RSR). Each would consist solely of an op code
with no parameters. Upon receipt of an RST, a host would immediately
terminate all connections with the sending host, but would not issue
any CLS's. The receiver of the RST would also note that the originator
of the RST was alive, and would then echo an RSR to the sender. When a
host receives an RSR, he should then note that the echoing host is
alive. (The function of RST can be partially simulated if a host will
immediately close all relevant table entries upon discovering that
another host is down.)

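
The RST/RSR exchange described above might be sketched as follows. This
is only an illustration in Python, not part of the proposal. The class
name NCP, its fields, the tuple layout used for a connection, the
outbox list standing in for transmission, and the strings standing in
for the RST and RSR op codes are all assumptions made for the example;
no op code values are assigned here.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the op codes; no numeric values are assumed.
RST = "RST"
RSR = "RSR"


@dataclass
class NCP:
    """Toy model of one host's NCP tables (all names are illustrative)."""
    host: int
    # Each connection is (foreign_host, my_socket, your_socket).
    connections: set = field(default_factory=set)
    # The "host live table": hosts we currently believe to be alive.
    live_hosts: set = field(default_factory=set)
    # Commands queued for transmission as (destination_host, command).
    outbox: list = field(default_factory=list)

    def restart_after_hard_crash(self, known_hosts):
        """Re-initialize all tables, then send an RST to every other host."""
        self.connections.clear()
        self.live_hosts.clear()
        for h in known_hosts:
            if h != self.host:
                self.outbox.append((h, RST))

    def receive(self, sender, command):
        """Handle an incoming RST or RSR from the host `sender`."""
        if command == RST:
            # Terminate every connection with the sender; no CLS is issued.
            self.connections = {c for c in self.connections if c[0] != sender}
            # Note that the originator is alive, then echo an RSR.
            self.live_hosts.add(sender)
            self.outbox.append((sender, RSR))
        elif command == RSR:
            # The echoing host is alive.
            self.live_hosts.add(sender)
```

In this sketch a restarting host calls restart_after_hard_crash with
the list of hosts it knows about, and each host that answers with an
RSR is entered in live_hosts. A host that receives the RST drops only
the connections involving the sender and leaves its other connections
untouched.
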
Thus, after a hard crash, all connections and requests for connections
are terminated. The RST also informs all foreign hosts that we are
again alive, and an RSR is received from every functioning NCP. A host
live table (see NWG/RFC #55) can easily be assembled, and establishment
of connections can resume.

Related problems also crop up when we consider attempting to
synchronize the network, which may still be carrying messages generated
prior to the crash, with an NCP which has an initialized environment.
We lack the facilities for unblocking links, discarding messages, etc.
-- facilities which this proposal will necessitate. Further interaction
with BBN should resolve these difficulties.

The problems associated with "soft" crashes are not nearly as pressing,
and they demand more sophisticated (i.e., complex) solutions. Our
preliminary experimentation with the network demonstrates that a good
initialization and recovery protocol is far more necessary.

Many of the ideas presented herein were germinated and/or jelled
through conversations with Steve Crocker and Jon Postel. We would also
like to acknowledge the assistance of Jim Balter and Charles Kline of
UCLA, who devoted a great deal of effort toward helping develop the
pseudo-Algol program which was the predecessor of much of our recent
documentation.

       [ This RFC was put into machine readable form for entry ]
       [ into the online RFC archives by Katsunori Tanaka 2/98 ]