6.195 Final Project: Byte-wide High-level Data Link Control Protocol Controller

Daniel Lee
omega@mit.edu

Chia (Janet) Wu
janetwu@mit.edu


1.0 Overview

We designed a controller in VHDL that forms part of a network interface card for point-to-point communications. In the transmit direction, it takes a network data packet already processed by the Point-to-Point Protocol and maps it into a High-level Data Link Control data packet. In the receive direction, it extracts Point-to-Point Protocol packets from a byte stream.

2.0 Background

2.1 Data Networking

Data networking operates on two basic principles: protocol layering and data segmentation. Protocol layering provides the abstraction barrier for separating network services and requirements into different levels; therefore, each layer can be changed independently of other layers to reflect improved services or technology. Data segmentation takes a single, high-level block of data and divides it into multiple data "packets".

The Open Systems Interconnection (OSI) Reference Model, developed by the International Organization for Standardization (ISO), is a popular model of protocol layering. Figure 1 illustrates how the different layers relate to each other.

FIGURE 1. ISO OSI Reference Model, with examples of transport, network, data link, and physical protocols. The presentation and session layers are usually omitted.

The application layer contains the user programs that require network services (e.g. telnet, FTP). The transport layer disassembles data from the application layer into packets or reassembles data packets into data streams for the application layer. Packet delivery or forwarding is handled by the network layer. The data link layer turns a packet into a frame, a data structure that can be transmitted through the physical layer and recovered at the receiver; it also provides error detection.

In protocol layering, data packets from higher layers in the model are black-boxed and encapsulated in lower layer packets. In Figure 2, packet data in the transport layer (data with the transport header, TH) are completely abstracted away in the network layer, so that the network layer cannot differentiate the transport header from the rest of the data.

FIGURE 2. An example of how packet data gets encapsulated at successively lower protocol levels. The OSI presentation and session layers have been integrated into the application layer. Note that a trailer is added only in the data link layer.
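The nesting in Figure 2 can be sketched in a few lines of Python. This is only an illustration: the header and trailer byte strings are placeholders invented here, not real protocol fields.

```python
# Placeholder headers and trailer: hypothetical values, chosen only to make
# the nesting visible. Real headers carry addresses, lengths, and so on.
TRANSPORT_HEADER = b"TH"
NETWORK_HEADER = b"NH"
LINK_HEADER = b"LH"
LINK_TRAILER = b"LT"

def encapsulate(app_data: bytes) -> bytes:
    """Wrap application data in transport, network, and data link layers."""
    segment = TRANSPORT_HEADER + app_data        # transport layer
    packet = NETWORK_HEADER + segment            # network layer: segment is opaque payload
    frame = LINK_HEADER + packet + LINK_TRAILER  # data link layer: the only trailer
    return frame
```

Each layer treats everything handed down from above as an opaque payload and only prepends (or, at the data link layer, also appends) its own fields.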

2.2 High-level Data Link Control (HDLC) Protocol

The Point-to-Point Protocol (PPP) connects two network nodes via a single link. It is a data link layer protocol, commonly found in modern networks, that can be separated into two parts: a network packet protocol multiplexor and a packet framer.

FIGURE 3. Structure of a PPP encapsulated packet.

The multiplexing functionality comes from the PPP header. Sixteen bits long, the header identifies the type of packet in the encapsulation (Figure 3). For example, the PPP header could identify the network layer packet as an IP or an IPX packet.
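As a rough Python illustration of the multiplexing role of the 16 bit header, the sketch below prepends and parses the protocol field. The function names are made up for this example; the protocol numbers 0x0021 (IP) and 0x002B (IPX) are the values assigned for PPP, and the field is sent most significant byte first.

```python
import struct

# PPP protocol field values for the two examples named in the text.
PPP_IP = 0x0021
PPP_IPX = 0x002B

def ppp_encapsulate(protocol: int, payload: bytes) -> bytes:
    """Prepend the 16 bit protocol field, most significant byte first."""
    return struct.pack(">H", protocol) + payload

def ppp_decapsulate(packet: bytes):
    """Split a PPP packet into (protocol, payload)."""
    (protocol,) = struct.unpack(">H", packet[:2])
    return protocol, packet[2:]
```

A receiver would dispatch on the returned protocol value to hand the payload to the matching network layer.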

FIGURE 4. Structure of an HDLC encapsulated packet. At least one flag byte (0x7E) is required between HDLC packets.

After PPP encapsulation, additional headers and a frame check sequence (FCS) are added to the packet; the FCS is an error checksum calculated over the entire HDLC encapsulated packet (Figure 4). The HDLC protocol, sometimes referred to as HDLC framing, is thus defined as the addition of the address and control fields and the FCS, together with a process known as byte stuffing.
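A minimal Python sketch of the packet layout in Figure 4, before flags are added and before any byte stuffing. It assumes the default address and control values (0xFF and 0x03) and uses zlib.crc32 as a software stand-in for the 32 bit FCS, with the FCS sent least significant byte first as in RFC 1662. The function name hdlc_body is hypothetical.

```python
import struct
import zlib

ADDRESS = 0xFF  # default HDLC address field
CONTROL = 0x03  # default HDLC control field

def hdlc_body(ppp_packet: bytes) -> bytes:
    """Return address + control + PPP packet + FCS (no flags, no stuffing)."""
    covered = bytes([ADDRESS, CONTROL]) + ppp_packet
    # zlib.crc32 computes the Ethernet CRC-32; pack it low byte first.
    fcs = struct.pack("<I", zlib.crc32(covered))
    return covered + fcs
```

Running the same CRC over the whole body, FCS included, gives a fixed value regardless of the payload, which is what makes a simple receive-side check possible.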

Byte stuffing is required when the PPP encapsulated packet contains special characters, like the flag byte, 0x7E. A flag byte found in the middle of an HDLC packet would confuse the receiver and cause a false end-of-packet condition. Stuffing replaces a flag byte in the PPP packet with a two-byte sequence: 0x7D and 0x5E (0x7E XORed with 0x20). The receiver XORs any value following 0x7D with 0x20 to recover the original byte. Since this makes 0x7D a special character, it must also be stuffed, so any 0x7D found in the PPP packet is replaced with the sequence 0x7D 0x5D.
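The stuffing rule can be written down as a short Python sketch. Only the two special characters named above (0x7E and 0x7D) are handled; the function names are chosen for this example.

```python
FLAG = 0x7E
ESCAPE = 0x7D
XOR_MASK = 0x20

def stuff(data: bytes) -> bytes:
    """Replace each 0x7E or 0x7D with 0x7D followed by the byte XOR 0x20."""
    out = bytearray()
    for b in data:
        if b in (FLAG, ESCAPE):
            out += bytes([ESCAPE, b ^ XOR_MASK])
        else:
            out.append(b)
    return bytes(out)

def unstuff(data: bytes) -> bytes:
    """Undo stuff(): XOR the byte after each 0x7D with 0x20."""
    out = bytearray()
    escaped = False
    for b in data:
        if escaped:
            out.append(b ^ XOR_MASK)
            escaped = False
        elif b == ESCAPE:
            escaped = True
        else:
            out.append(b)
    return bytes(out)
```

For example, the four bytes 0x01 0x7E 0x7D 0x02 become the six bytes 0x01 0x7D 0x5E 0x7D 0x5D 0x02, and unstuff() restores the original.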

3.0 Subsystem Modules

The HDLC controller comprises two modules: a transmit module and a receive module. The transmit module performs three distinct functions: it receives packets byte-serially from a higher level (PPP) device, adds headers and trailers, and subsequently transmits the processed packet to a physical layer device. Essentially, the transmit module delivers processed packets from the data link layer to the physical layer.

The receive module, on the other hand, implements the reverse function of the transmit module. It receives a byte stream from the physical layer device. Since the physical layer has no concept of packet structure, the HDLC controller must determine the boundaries of packets in the byte stream. Once a packet has been extracted from the byte stream, HDLC headers are stripped from the packet. The stripped packet (Figure 3) is then transmitted up to the higher level (PPP) device.

FIGURE 5. Top level block diagram of HDLC controller. The PPP layer and network physical layer device are not implemented in this project but are included for illustrative purposes.

In summary, the HDLC controller handles the interaction between the data link layer and the physical layer.

3.1 Transmit Module

The transmit module is composed of two parts: a state machine and an FCS generator. The state machine receives data from a PPP source (which is outside the scope of this project) and from the FCS generator. It transmits data to a physical layer device and to the FCS generator.

3.1.1 State Machine

The state machine takes data from a PPP source and adds the address and control headers if necessary. It also performs byte-stuffing and adds the trailing FCS to the packet. During idle times, 0x7E is transmitted.

For the implementation of the state machine, one-hot encoding was used. We felt that this approach would give the best results in terms of speed and area, because the target device is an FPGA. FPGA devices tend to be rich in flip-flops, so one-hot encoding should not be expensive.

When byte-stuffing, the transmit state machine requires two clock cycles to transmit one byte of PPP data. Similarly, when the controller transmits data from the FCS generator instead of from the PPP source (there may be a packet queued up at the input to the HDLC controller), the state machine requires a way to let the PPP source know that it is not ready to receive any data. A not-ready signal, then, is asserted by the state machine when it will not clock PPP data to the physical layer (it is otherwise deasserted).

The default state is IDLE (Figure 6), where the state machine waits for the PPP source to signal the start of a new packet. While in IDLE, the state machine transmits 0x7E to the physical layer. Successive 0x7Es represent "empty" packets. When a start-of-packet is detected, the state machine also checks if default HDLC address and control fields will be used, via the default-fields signal. If default fields are used, then the PPP source only needs to provide the packet shown in Figure 3. Otherwise, the PPP source must supply the address and control fields.

In state DEFAULT FIELDS, the state machine transmits the default values for the address and control fields in the HDLC packet: 0xFF and 0x03, respectively. After they are transmitted, the state machine starts transmitting the PPP packet, performing byte-stuffing when necessary.

If default address and control fields are not used, then the PPP source must present its own address and control fields. This can happen when different values are negotiated between network nodes. In any event, the PPP source presents its own fields on the same port as PPP packet data.

FIGURE 6. Transmit state transition diagram.
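The transmit behavior described in this section can be approximated by a cycle-by-cycle Python model. This is a software sketch, not the VHDL design: the function name, the exact timing of the not-ready signal, and the choice to assert not-ready during the closing flag are assumptions made for illustration. zlib.crc32 stands in for the FCS generator, and the FCS bytes are stuffed like any other byte, as RFC 1662 requires.

```python
import struct
import zlib

FLAG, ESCAPE, XOR_MASK = 0x7E, 0x7D, 0x20

def transmit(ppp_data: bytes, default_fields: bool = True):
    """Return one (byte_out, not_ready) pair per clock cycle for one packet."""
    cycles = [(FLAG, False)]   # IDLE: a flag goes out while waiting for a packet
    covered = bytearray()      # bytes covered by the FCS

    def emit(byte, from_ppp):
        # A stuffed byte costs an extra cycle; the source is held off during it.
        if byte in (FLAG, ESCAPE):
            cycles.append((ESCAPE, True))
            cycles.append((byte ^ XOR_MASK, not from_ppp))
        else:
            cycles.append((byte, not from_ppp))

    if default_fields:         # DEFAULT FIELDS: address 0xFF, then control 0x03
        for byte in (0xFF, 0x03):
            covered.append(byte)
            emit(byte, from_ppp=False)
    for byte in ppp_data:      # packet data supplied by the PPP source
        covered.append(byte)
        emit(byte, from_ppp=True)
    for byte in struct.pack("<I", zlib.crc32(bytes(covered))):  # FCS, low byte first
        emit(byte, from_ppp=False)
    cycles.append((FLAG, True))  # closing flag (assumed: source still held off)
    return cycles
```

Calling transmit(b"\x7e") shows the two-cycle cost of stuffing: the single PPP byte appears on the output as 0x7D with not-ready asserted, followed by 0x5E with not-ready deasserted.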

3.1.2 Frame Check Sequence Generator

The frame check sequence (FCS) generator calculates a new FCS every clock cycle. The FCS algorithm is identical to the 32 bit cyclic redundancy check (CRC) used in Ethernet, with the FCS (CRC) register preset to 0xFFFFFFFF. The FCS provides a way for the receiver to check if transmission errors have corrupted the packet en route.

The generator can be implemented in a number of different ways. Our choice was a byte-oriented method using an XOR tree with a 32 bit register to hold the value of the FCS. A multiplexor selects which byte of the FCS to transmit back to the state machine.
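A small Python model of a byte-oriented FCS generator with a 32 bit register and a byte-select multiplexor is sketched below. The class and method names are hypothetical. The inner loop plays the role of the XOR tree by applying eight bit-serial CRC steps per byte, and the complement applied in fcs_byte follows the RFC 1662 convention; whether the hardware complements at the same point is an assumption.

```python
class FcsGenerator:
    """Software model: 32 bit FCS register updated one byte per clock."""

    POLY = 0xEDB88320  # CRC-32 polynomial in bit-reflected form

    def __init__(self):
        self.reg = 0xFFFFFFFF  # preset value

    def preset(self):
        self.reg = 0xFFFFFFFF

    def clock(self, byte: int) -> None:
        """Fold one data byte into the register (stands in for the XOR tree)."""
        reg = self.reg ^ byte
        for _ in range(8):
            reg = (reg >> 1) ^ (self.POLY if reg & 1 else 0)
        self.reg = reg

    def fcs_byte(self, select: int) -> int:
        """Multiplexor: select 0..3 picks one FCS byte, low byte first."""
        return ((self.reg ^ 0xFFFFFFFF) >> (8 * select)) & 0xFF
```

Clocking in the ASCII string "123456789" and reading the four bytes back yields 0x26, 0x39, 0xF4, 0xCB, the standard CRC-32 check value 0xCBF43926 in transmission order.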

FIGURE 7. Transmit FCS generator.

The XOR tree derives from a linear feedback shift register (LFSR) implementation of the 32 bit CRC, where data are shifted bit-wise into a shift register with XORs between certain flip-flops. The contents of the LFSR at any given time are the FCS of all the data shifted in since the last reset or preset. Each bit of the 32 bit register can be expressed as a chain of XORs, and the chain can be determined by shifting 8 bits of packet data into the LFSR.

FIGURE 8. Partial derivation of the XOR tree from the LFSR. Rxx are the initial values of the FCS register after preset. Txx are Rxxs XOR'ed with the message bit of that term (i.e. T00 is R00 XOR M00, where M00 is the first message bit). Each column is XOR'ed, so after 8 shifts, the contents of the 0th bit of the register is R08 XOR T02.
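The derivation can be mimicked symbolically in Python. In the sketch below each register bit is tracked as a set of R and M terms, stored as a bitmask (bits 0 to 31 for R0..R31, bits 32 to 39 for M0..M7). The bit numbering follows the right-shifting, bit-reflected form of the CRC-32 LFSR, which is an assumption and may not match the numbering used in Figure 8.

```python
POLY = 0xEDB88320  # CRC-32 polynomial, bit-reflected form

def derive_xor_tree():
    """Return 32 masks; mask k lists the R and M terms XORed into new bit k."""
    state = [1 << i for i in range(32)]        # bit i starts as the symbol R_i
    for j in range(8):                         # shift in message bits M_0..M_7
        feedback = state[0] ^ (1 << (32 + j))  # terms of the feedback bit
        state = state[1:] + [0]                # shift right by one position
        for k in range(32):
            if (POLY >> k) & 1:
                state[k] ^= feedback           # XOR taps; repeated terms cancel
    return state

def evaluate(tree, reg: int, byte: int) -> int:
    """Apply the derived XOR tree to concrete register and message values."""
    symbols = reg | (byte << 32)
    out = 0
    for k, mask in enumerate(tree):
        parity = bin(symbols & mask).count("1") & 1
        out |= parity << k
    return out
```

Each mask returned by derive_xor_tree() corresponds to one chain of XORs in the hardware, so evaluate() computes in a single step what the LFSR computes in eight shifts.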

3.2 Receive Module

The receive module also contains a state machine and an FCS generator. The state machine extracts HDLC packets from a byte stream from the physical layer device. The HDLC packets are verified with the FCS generator and passed up to the PPP layer. Packets that are too short for the HDLC protocol or contain transmission errors (detected through the FCS) are flagged as such. The address and control fields are not stripped, since the design allows for the PPP layer to negotiate different contents for the fields.

3.2.1 State Machine

The state machine takes a byte stream from the physical layer device and converts it into a series of HDLC packets, then into PPP packets; byte-stuffed data are restored. The address and control fields are preserved, because PPP may have negotiated values other than default ones.

Like the transmit state machine, the receive state machine is implemented using one-hot encoding. Unlike the transmit side, however, the receive side uses a 6-stage pipeline at its output. Since the state machine does not know a priori when a packet ends, the pipeline guarantees that the end-of-packet condition will be detected before the last 4 bytes of the packet reach the PPP layer (Figure 9). (The last 4 bytes are the FCS bytes, which are part of HDLC only and should not be transmitted to the PPP layer; see Figure 4.)

FIGURE 9. Output pipeline contents when an end of packet condition is detected. The data in stages data_out_1 through data_out_4 are not recognized as FCS bytes until the trailing flag byte of the HDLC packet is received.
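The idea behind the output pipeline can be shown with a simplified Python delay line. Only the four stages that end up holding the FCS are modelled; the remaining stages of the 6-stage hardware pipeline are left out, and the function name is hypothetical.

```python
from collections import deque

FCS_LENGTH = 4  # number of trailing bytes that must not reach the PPP layer

def strip_fcs(packet_bytes: bytes):
    """Delay bytes by FCS_LENGTH stages; return (delivered, held_back)."""
    pipeline = deque()
    delivered = []
    for byte in packet_bytes:
        pipeline.append(byte)
        if len(pipeline) > FCS_LENGTH:
            delivered.append(pipeline.popleft())  # oldest byte goes to PPP
    # End of packet (flag seen): whatever is still held is the FCS.
    return bytes(delivered), bytes(pipeline)
```

Because every byte is held for four steps before delivery, the bytes still in the pipeline when the trailing flag arrives are exactly the FCS and can be discarded.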

Starting at IDLE (Figure 10), the state machine waits for a byte other than 0x7E from the physical layer's byte stream. When this happens, the state machine starts to transmit the data to the PPP layer and the FCS generator (for error checking), de-stuffing data (converting 0x7D 0x5E to 0x7E and 0x7D 0x5D to 0x7D), if necessary. When in PACKET, if a 0x7E is received, it is considered an end-of-packet condition (END). If another 0x7E is received, then the state machine returns to IDLE.

A new packet received in END causes the state machine to go to PACKET2, because the pipeline still contains data from the previous packet. When the pipeline is flushed, the state machine resumes normal behavior, transitioning back to PACKET.

FIGURE 10. Receive state transition diagram.

In all cases, a packet error causes the state machine to go to ERROR, and all further data in the byte stream is ignored until a 0x7E is received. A packet error occurs when a packet is too short to satisfy HDLC packetization or when a byte stuffing error (the byte sequence 0x7D 0x7E in the packet) is detected.
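A simplified Python model of the receive path is sketched below. It buffers whole packets instead of using the output pipeline, so the END, PACKET2, and ERROR states collapse into the flag handling. The minimum packet length and the status strings are assumptions made for this example, and zlib.crc32 stands in for the receive FCS generator.

```python
import zlib

FLAG, ESCAPE, XOR_MASK = 0x7E, 0x7D, 0x20
MIN_LENGTH = 8             # assumed: address, control, 2 byte protocol, 4 byte FCS
GOOD_RESIDUE = 0xDEBB20E3  # FCS register contents after an error-free packet

def receive(stream: bytes):
    """Return (packet, status) pairs; packets keep address/control, drop the FCS."""
    results = []
    state, buf, escaped = "IDLE", bytearray(), False
    for byte in stream:
        if byte == FLAG:
            if state == "PACKET":
                if escaped:                      # 0x7D 0x7E: byte stuffing error
                    results.append((b"", "stuffing error"))
                elif len(buf) < MIN_LENGTH:
                    results.append((bytes(buf), "too short"))
                else:
                    # zlib.crc32 returns the complemented register; undo that.
                    residue = zlib.crc32(bytes(buf)) ^ 0xFFFFFFFF
                    status = "ok" if residue == GOOD_RESIDUE else "bad FCS"
                    results.append((bytes(buf[:-4]), status))
            state, buf, escaped = "IDLE", bytearray(), False
        else:
            state = "PACKET"
            if escaped:
                buf.append(byte ^ XOR_MASK)      # de-stuff the escaped byte
                escaped = False
            elif byte == ESCAPE:
                escaped = True
            else:
                buf.append(byte)
    return results
```

Successive flags produce no output, matching the "empty" packets sent while the transmitter is idle.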

3.2.2 Frame Check Sequence Generator

The purpose of the receive FCS is to verify that the received packet contains no transmission errors. The FCS is calculated over all of the de-stuffed packet, including the received FCS bytes and excluding only the flag bytes. If the FCS register contains 0xDEBB20E3 when the end-of-packet condition is triggered, the receiver can be reasonably sure that the packet is error-free.

FIGURE 11. Receive FCS generator.

The XOR tree and register in this implementation work the same way as those in the transmit FCS generator. The difference comes in the comparator circuit in the receive FCS (Figure 11). The device compares the register to the special residue value 0xDEBB20E3, as shown in Figure 9. A hit causes good_FCS to be asserted. The assertion is timed to occur at the same time as the last data byte is presented to the PPP layer.
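The residue check can be demonstrated with a short Python sketch. The function names and the sample packet are made up for illustration; the register update is a plain bit-serial CRC-32, and the transmitted FCS is assumed to be the complemented register sent low byte first, as in RFC 1662.

```python
import struct

POLY = 0xEDB88320          # CRC-32 polynomial, bit-reflected form
GOOD_RESIDUE = 0xDEBB20E3

def fcs_register(data: bytes, reg: int = 0xFFFFFFFF) -> int:
    """Run the FCS register over data, starting from the preset value."""
    for byte in data:
        reg ^= byte
        for _ in range(8):
            reg = (reg >> 1) ^ (POLY if reg & 1 else 0)
    return reg

def good_fcs(received: bytes) -> bool:
    """Comparator: true when the register holds the residue after the last byte."""
    return fcs_register(received) == GOOD_RESIDUE

# Hypothetical example packet: default address/control, IP protocol, payload.
packet = bytes([0xFF, 0x03, 0x00, 0x21]) + b"payload"
fcs = struct.pack("<I", fcs_register(packet) ^ 0xFFFFFFFF)  # complemented, low byte first
```

Here good_fcs(packet + fcs) is true, while flipping any single bit of the packet or the FCS makes it false.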

4.0 Testing and Synthesis

Functional verification was carried out through VHDL behavioral simulation. Command files, essentially scripts that replace typing commands by hand, accelerated the testing process.

In state machine verification, outputs are carefully monitored while inputs change. By asserting the appropriate combinations of inputs, the entire state machine (states and transitions) can be traversed. Furthermore, in traversing the entire transmit state machine, byte stuffing and packet encapsulation were tested, along with control signals sent to both the PPP module and the FCS; similarly, byte destuffing was tested in the receive state machine.

The FCS modules are straightforward to test: the Internet document Request for Comments (RFC) 1662 (PPP in HDLC-like Framing) includes a software CRC implementation that uses table lookup. To test the hardware (XOR tree) FCS, the same input vectors can be processed through both implementations and the results compared.
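The comparison strategy can be sketched in Python. The table-lookup function follows the style of the RFC 1662 example, but the table is generated here rather than copied from the RFC. The bit-serial function is only a stand-in for the hardware model, and the random test vectors and function names are assumptions made for this example.

```python
import random

POLY = 0xEDB88320  # CRC-32 polynomial, bit-reflected form

# Build the 256-entry lookup table.
TABLE = []
for n in range(256):
    c = n
    for _ in range(8):
        c = (c >> 1) ^ (POLY if c & 1 else 0)
    TABLE.append(c)

def fcs_table(data: bytes, fcs: int = 0xFFFFFFFF) -> int:
    """Table-lookup FCS: one table access per byte."""
    for byte in data:
        fcs = (fcs >> 8) ^ TABLE[(fcs ^ byte) & 0xFF]
    return fcs

def fcs_bit_serial(data: bytes, reg: int = 0xFFFFFFFF) -> int:
    """Stand-in for the hardware model: one LFSR shift per message bit."""
    for byte in data:
        for i in range(8):
            feedback = (reg ^ (byte >> i)) & 1
            reg = (reg >> 1) ^ (POLY if feedback else 0)
    return reg

def compare(num_vectors: int = 100, seed: int = 0) -> bool:
    """Feed the same random vectors to both models and compare the results."""
    rng = random.Random(seed)
    for _ in range(num_vectors):
        vector = bytes(rng.randrange(256) for _ in range(rng.randrange(1, 64)))
        if fcs_table(vector) != fcs_bit_serial(vector):
            return False
    return True
```

A mismatch on any vector points to an error in one of the two models, which narrows debugging to the XOR equations when the table-lookup version is trusted.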

Most errors found in the code resulted from carelessness in deasserting control signals. Also found was output contention between different processes.

Once individual modules have been tested, integration testing can proceed. A PPP packet generator was written for testing the transmit data path. The generator, implemented as a state machine, sent test vectors to the transmit state machine and could also receive and respond to control signals sent by the state machine back to the PPP layer. The packet generator model was required to effectively simulate the PPP-HDLC transmit interface. While not attempted, a physical layer device model could be written for the receive interface. However, the receive state machine uses far fewer control signals, and none of them route back to the physical layer.

System testing could have been done in either schematics, where VHDL entities become schematic symbols, or in VHDL, using component declaration and instantiation. We chose the latter, due to the simplicity of the process and ease in simulation. Each path was tested by linking together the state machine and the FCS generator; the transmit path also hooked up to the PPP model mentioned previously. See appendices for timing diagrams of system testing.

The final step is the synthesis and fitting of the HDLC controller design to an FPGA or CPLD. According to the Viewlogic documentation, this process would normally include:

  1. Compile VHDL files into gnl files
  2. Compile gnl files
  3. Remove synthesis-generated design data
  4. Run synthesis on top-level entity
  5. Run synthesis on other entities
  6. Run logic optimization on entities
  7. Flatten and dissolve entity hierarchies
  8. Synthesize (specific optimization) to a particular target technology (e.g. Xilinx XC3000 series FPGA).

In addition, the user would specify parameters such as the area-versus-speed trade-off and CPU effort. These options should be in synth.ini.

5.0 Conclusion

An attempt at actually synthesizing the HDLC controller gave some hints about its final size. We were able to run logic optimization on the receive state machine, the larger by far of the two state machines in the controller:

omega@sunpal2 84 % vsyn -opt e_rcv_state
ViewSynthesis - V6.0.4; Powerview 6.0 (041896)
c Copyright 1985,1996 by Viewlogic Systems, Inc.
-- ViewSynthesis built on Apr 19 1996 
 
Project directory is /u21e/students/omega/thesis_view/synth/e_rcv_state
Reading existing ref file /u21e/students/omega/thesis_view/synth/e_rcv_state/top.ref
optimizing entity e_rcv_state
   e_rcv_state: optimizing partition 1 with 3883 gates
   e_rcv_state: optimizing partition 2 with 2734 gates
   e_rcv_state: optimizing partition 3 with 32 gates
   e_rcv_state: optimizing partition 4 with 32 gates
   e_rcv_state: optimizing partition 5 with 32 gates
   e_rcv_state: optimizing partition 6 with 32 gates
   e_rcv_state: optimizing partition 7 with 952 gates

These results indicate that a 25-30 state one-hot encoded state machine of low to moderate complexity results in almost 7700 gates (the partition sizes above sum to 7697). We estimate that each FCS generator may require several hundred gates, because of the sheer size of the XOR tree (32 bits wide and 12+ gates deep without optimization).

Without a successful and complete run of the synthesis tool, we are unable to comment on the performance and actual size of the final product. Furthermore, the synthesis and timing analysis processes are both potentially time-consuming.

6.0 Appendix

Command files

Command files can be found on sunpal2.mit.edu under ~omega/project2. Files with extension .cmd are command files.

VHDL code

VHDL files are also found in ~omega/project2, with extension .vhd.

Transmit state machine
Transmit FCS generator
Receive state machine
Receive FCS generator
PPP module written to test transmit data path
Linked PPP and (transmit) HDLC modules for system testing.

Timing diagrams

ViewTrace timing diagrams are rather unwieldy and will not be included in this document.


Reformatted in html by omega@mit.edu

Revision History 971215 dtl