Data Link Frame Aggregation Protocol
Publications related to this work:
Data Link Frame Aggregation Protocol, J. Michael Meehan, Philip A. Nelson, David Engeset and Athos Pashiardis, Information: An International Interdisciplinary Journal, Volume 7 Number 1, pp. 59-68, January 2004.
Data Link Frame Aggregation Protocol, J. Michael Meehan, Philip A. Nelson, Proceedings of the 17th International Conference on Computers and Their Applications, San Francisco, Ca., April 4-6, 2002, ISBN: 1-880843-42-0.
Team Members: David Engeset (graduates), Athos Pashiardis (graduated)
Source Code:
Data Link Frame Aggregation Protocol
J. Michael Meehan, Philip A. Nelson, David Engeset and Athos Pashiardis
Computer Science Department, Western Washington University, Bellingham, WA, 98225, USA
meehan@wwu.edu
Abstract
We present a data link layer protocol that performs frame aggregation. Data link protocols have traditionally utilized a separate level two frame for encapsulation of each request from level three. This can result in an excessive header to data payload ratio in the case of small level three requests. It is possible to aggregate multiple upper-level protocol requests into a single data-link layer frame destined for a given level two address. The protocol has been implementation under the NetBSD operating system. Test results with this implementation and directions for future enhancement are discussed.
Keywords : Network Protocols, Data Link Protocols, non-1-persistent CSMA/CD.
1. Introduction
Computer communication protocols are organized into distinct layers. Each layer views the next lower layer as providing a service for it to use. The standard reference model used throughout the industry is the Open Software Interconnect (OSI) 7-layer model issued by the International Standards Organization (ISO). [1] Layer two of this model defines the data link. Data link protocols deal with how to organize data into frames and how to transmit those frames using the physical network connections between machines. Commonly used data link technologies are Ethernet (in various forms), FDDI, token-ring and ATM. Although for the purposes of this discussion we confine ourselves to an implementation using what is commonly referred to as Ethernet or IEEE 802.3 style data link layers, it should be noted that the general techniques presented are applicable to other level two technologies.
1.1 Traditional Data-link Operation
The data link layer implements a capability utilized by the network layer to accomplish the transmission of a frame on the physical network being utilized. Thus, the data link layer bridges between the network layer and the physical layer. The traditional method of implementing this is that when the network layer requests data link layer encapsulate a level three packet into the data payload of a level two frame and transmit it onto the local subnet each such level three request is considered completely distinctly from every other level three request. Consider a typical Ethernet based system with a number of applications active. There may be: several telnet sessions; an ftp file transfer in progress; remote files mounted via NFS, SMB, NCP or a combination of all three; a browser window active; remote X Windows applications running. Each application will generate traffic, which will eventually progress down through the protocol stacks and place requests to the data link layer. In addition to the traffic directly generated by these applications there may be additional network traffic resulting from supporting protocols such as DNS, SNMP or ICMP. Under the traditional approach each request will be handled separately resulting in a complete level two frame. We observe that it is highly likely in such a system that the level 2 or MAC destination address for many of these requests may be the same. This suggests that we can reduce overhead and contention by aggregating such traffic into single level two frames.
1.2 Framing Formats
In the design of layered protocol stacks it is desirable to keep the layers a distinct as possible with each layer required to have little or no knowledge of the details of operation of the other layers. At the same time, if multiple network protocols are to be able to utilize a single data link layer service there must be a mechanism which can be utilized by the data link layer to determine which network layer protocol is to receive the payload from an incoming data link frame. In the case of the widely utilized Ethernet or IEEE 802.3 type of data link system a two-byte frame type indicator is used to enable the data link layer to distinguish the contents of its data payload. Hundreds of different frame type indicators have been assigned to indicate the contents of the data payload. For example, 0x0800 indicates an IP version 4 packet is contained in the data payload. There are various schemes (framing formats) utilized to encode frames on an Ethernet system. A typical scheme is given below.
|
Preamble |
Destination Address |
Source Address |
Frame Type |
Data Payload |
CRC |
|
8 |
6 |
6 |
2 |
46-1500 |
4 |
The preamble is simply a fixed bit pattern utilized by the network interface card for the purposes of synchronizing onto the beginning of a frame. The source and destination addresses are the familiar 48-bit Ethernet address.[1] The CRC field is a 32-bit cyclic redundancy check.
There has actually been quite a bit of confusion concerning the framing formats utilized in Ethernet systems. Early standards specified a length field where we show the type field. Fortunately, the possible values, which could be used to designate an actual valid data payload length, are in a range not utilized by the standards organizations to designate valid frame types. The confusion surrounding the assignment of different frame type numbers by different standards organizations led the IEEE to issue the LLC/SNAP header standards. [2] In this widely utilized standard a 3-byte Organizationally Unique Identifier (OUI) is used to designate a given standards organization and a 2-byte type value then defines frame types as designated by that organization.
Regardless of the actual framing format utilized one must be able to eventually determine a frame type designator that indicates how to interpret the contents of the data payload. In particular, to which level three protocol should the data link deliver the payload? In the prototype implementation of the modified data link layer we present here it is necessary for us to designate a unique frame type for the frames encoded by our protocol. These numbers must ultimately be issued from a standards organization. Since the frames in a data link protocol environment never leave the local segment we can freely chose an unassigned frame type number without interfering with normal network operation. Systems that do not have level two protocol code with our extensions implemented that see a frame with our type designation will simply ignore it as an unknown frame type. Moreover, we utilize a configuration list to indicate which systems on the level two segment have the extended protocol code. Aggregated frames are only addresses to those destinations.
1.3 Minimum Frame Length and Padding
In CSMA/CD systems there is a minimum frame length that derives from the requirement that it be impossible to imprint a frame onto the network that will be involved in a collision and not be able to detect the collision. This requirement translates into a minimum frame length that requires a slightly longer amount of time to transmit than the amount of time represented by twice the maximum propagation delay on the local segment. In the case of Ethernet systems this works out to a 64-byte minimum frame size.[3] If you refer to the frame format given above you will notice that the header requires 14 bytes and that the CRC is 4 bytes for a total of 18 bytes. This is where the 46-byte minimum data payload is derived from. The bottom line is that the smallest data payload that can be transmitted it 46 bytes. If the upper level protocol packet does not require 46 bytes then pad characters must be added to the level two frame to fill it out to the minimum frame length. An example of an extreme case would be a telnet connection in character mode that must transmit single characters immediately via IP. In such a case a single byte of data would be encapsulated into an IP packet with 20 bytes of IP header overhead. These 21 bytes would in turn be encapsulated into an Ethernet frame with a 46-byte data payload, 25 of which are pad characters. Thus, a total of 64 bytes plus the preamble would be transmitted to get the actual one character payload to its destination.
Note that if multiple character mode telnet sessions are in operation concurrently to the same host destination, the scenario described above is repeated for each frame transmitted. The irony is that while this is happening it is quite possible that an ftp transfer may be taking place at the same time between the same two machines. The ftp transfer may be capable of using more bandwidth than can be provided. In any event, wasting 25 characters of pad per Ethernet frame to transmit telnet characters while the ftp application starves for bandwidth is an unfortunate situation. The telnet scenario is not unique. There are many application level protocols and support protocols which create very small packets. Examples would be ARP, RARP, ICMP and many X-Windows packets such as mouse movements. The situation then is that we may have a lot of traffic destined for the same MAC address that is being encapsulated into individual level two frames. Some of these may contain as much a 50% pad characters.
The modified data link protocol we propose encapsulates multiple level three packets destined for the same MAC address into a single data link frame. In this fashion, we not only decrease the otherwise repeated data link framing fields but also create the possibility of reusing otherwise wasted level two data payload space by removing the need to generate pad characters for small level three packet sizes.
2.1 Non 1-persistence
Frames at level two can only be aggregated if we know that there is more than one outstanding request from level three for service. Normal CSMA/CD systems are what are called 1-persistent. This means that when the data link layer desires to transmit a frame and the carrier is not busy that it will attempt to transmit with a probability of 1. This is in contrast to CSMA/CA systems in which we may apply a probability function of some sort prior to transmission in an attempt to avoid creating collisions in the first place.[4] A similar concept can be utilized for the frame aggregation data link protocol. When a request arrives from level three to transmit a frame, if there are no other frames destined for that MAC address to transmit at present, the system could be made to wait an amount of time, Δt, hoping that an additional request will arrive before transmitting. If Δt time transpires and no further requests from level three have arrived to transmit to that MAC address the system will transmit anyway.
Another consideration is the amount of data payload space that would go unused if we were to transmit the current collection of requests before waiting the Δt time interval. In our CSMA/CD example using Ethernet the maximum transmittable unit is a data payload of 1500 bytes. One approach is to scale the amount of time we will wait based on the percentage wasted capability if we were to transmit now. Other variations could be considered in which frame aggregation would only be considered during an active exponential back off cycle.
The data-link frame aggregation protocol could be implemented in a 1-persistent manner or non-1-persistent manner. Under a 1-persistent implementation the system would not wait for the possibility of aggregation to occur. Aggregation would only happen in a system which received more level-two requests than could be processed in a given time period. Obviously, a 1–persistent data-link frame aggregation would have better request/response times than a non-1-persistent implementation. The interesting question is “is it worth the increase in request/response latency of a non-1-persistent implementation to achieve improved effective bandwidth?” How significant an improvement in effective bandwidth can one expect and under what loading scenarios does it occur? Our first implementation of the data-link frame aggregation uses a selective non-1-persistent implementation to investigate these issues.
2.2 Timing Algorithm Utilized in the Current NetBSD Implementation
In the current implementation the time to wait is based on the number of packets that are waiting in the queue and the amount of data that is in the queue. As a new packet arrives we calculate a new wait time, which is essentially like having a collision trigger the exponential back off, except if we have a large enough number of packets in the queue we call the function to aggregate the packets to send them on to their destination. The other factor that we take into account is the amount of aggregatable data that is in the queue, if it is large enough compared to the data link MTU, then we will call the aggregation function to send the data off. We have implemented two ways to calculate the wait time based on this data, one is similar to the CSMA/CD algorithm in that we use a random number to determine the wait time. The other way is a straight calculation to get the wait time.
If the amount of data in the queue is greater than or equal to the current MTU minus 34 bytes, or the number of packets, N, that are in the queue is greater than or equal to 16 then the algorithm returns a signal to the protocol to aggregate the packets and send them immediately. Otherwise the algorithm will return the time to wait. The wait time is calculated according to the CSMA/CD exponential back off algorithm in reverse order. If N is less than 10, k is then assigned a random number between 0 and 10, where k is the window size. Otherwise k is assigned a random number between 0 and 52 / (10 * 2^(N – 10)). The straight calculation version of this algorithm assigns 10 to k or the value calculated and no random number is used. The facilities that we had available to implement the waiting time to aggregate the packets used an increment of time called ticks, which we found to be roughly 10ms per tick on the machines we were using. This period of time was satisfactory but a finer measurement of time would have been better to implement the wait times.
2.3 Traffic Count Algorithm
To help assist and increase the performance of the timing algorithm the Traffic Count algorithm runs in the background and calculates the number of outgoing packets for the machine, the time intervals between readings is random ranging from one tick to 100 ticks which is about one second. After 15 time intervals have passed an average is calculated of the number of packets that have been sent over the 15 intervals divided by the number of ticks to complete the 15 intervals. If this average is less than 10 we assign the average to be 10, the reason for this is if we have less than 10 packets go out over an interval then the outgoing traffic is light enough that aggregation would only degrade the Request/Response time to the user. Then we take the current number of packets that have been shipped over the past interval and see if they are greater than the average, if so then we turn aggregation on, otherwise we turn it off to help reduce the delay in sending packets when there is little traffic.
2.4 Aggregation Functionality
After a period of time has expired the packets that have been stored in a queue for a particular MAC address need to be aggregated together and sent along to their destination. The algorithm we chose to do this is a first best-fit algorithm. The first two packets in the queue are extracted and an attempt is made to combine them into a larger aggregated packet. If this is possible the two packets are aggregated together, but they are not sent. If the two packets do not fit the larger of the two is sent on unaggregated. Then the next packet is extracted from the queue and an attempt to combine it with the other packet is made. When combing the packets we have the data structures that store the data point at each other in a chain of mbufs[5], thus we reduce the need to allocate more memory. This process repeats until the queue is emptied, then the remaining packets are sent on.
2.5 Deaggregation Functionality
Upon receiving a packet over the network the data link layer identifies the frame type of the incoming frame. When an aggregate frame arrives the frame is passed on to the deaggregation function which decapsulates the packets from the frame. Each data packet is inserted into a new frame with the proper header information and then passed to the proper upper protocol layers. The new frame would be identical to the one that the packet would have arrived in if the packet were sent by itself, thus the frame aggregation is unnoticed by the other network layers.
2.6 Modified Frame Format
The standard Ethernet frame format as shown above was modified for the purposes of the aggregation algorithm. We selected an unused frame type to designate that frame aggregation is in effect. [6] When the data link receives a frame denoted via the frame type field to be an aggregate the data payload field is interpreted as follows. The first byte of the data payload is used to designate the number of level three packets aggregated together in this frame. If the first byte is n then it is followed by n-1 16 bit integers each of which designates the beginning byte in the data payload field of each of the last n-1 level three packets in this frame. The beginning byte number of the first of the level three packets in the aggregate data payload can be determined from the number n, i.e. (n-1) * 2 bytes. For each level three packet in the aggregate a type byte frame type and the data payload of the packet itself is included in the aggregate.
|
Preamble |
8 bytes |
|
Destination Address |
6 bytes |
|
Source Address |
6 bytes |
|
Aggregate Frame Type |
2 bytes (0xBBBB) |
|
n |
1 byte |
|
offset2 |
2 bytes |
|
offset3 |
2 bytes |
|
... |
|
|
offset n |
2 bytes |
|
frame type 1 |
2 bytes |
|
packet data 1 |
? |
|
frame type 2 |
2 bytes |
|
packet data 2 |
? |
|
... |
|
|
frame type n |
2 bytes |
|
packet data n |
? |
It should be noted that these modifications have been constructed in such a way that they in no way interfere with the normal operation of other systems on the network. These frame will simple appear to normal data link layers as unknown frame types.
For each level three packet included into an aggregate there is a net saving of 22 bytes of header overhead that would have been required had each packet been transmitted in a separate frame. This is in the absence of any consideration for removed pad characters. The real benefit of the approach is in the fact that it reduces the number of small packets on the network. One must observe that after the minimum frame length packet size has been successfully imprinted onto the network that any remaining larger packet size will be collision free as the time period associated with twice the maximum propagation delay has transpired. Thus, it is precisely the elimination of small frames from the network that will have the greatest influence in increasing the effective bandwidth.
It is certainly true that the data link protocol enhancements described would have the most significant effect in a true contention network such as a system based on hubs or old coaxial wire installations. Today CSMA/CD systems tend to be operated in switched environments. Switched environments, however, will still benefit from the properties of this protocol. Transmission patterns in most local area networks are not equally probable between peers. Rather, networks tend to concentrate traffic to a few server destinations. Thus, even in a switch environment to ability to reduce contention and increase the effective bandwidth remains critically important.
3 The Prototype Implementation
The prototype implementation of the data link aggregation protocol was implemented using the NetBSD operating system. [2] The results of tests conducted on this implementation environment are presented below. The implementation of the new Data Link Layer protocol occurs just before the packets are added into the queue of the physical layer, i.e. Ethernet card driver. The functionality of our implementation keeps the network layers as distinct as possible (other layers are unaware of the changes of the new protocol).
3.1 Test Results
Different tests have been set up to run over a period of time to see how the performance of the new protocol compares to the standard data link layer. The data was analyzed to determine whether our implementation had been increasing throughput. Seven clients and a server were set up and used for testing purpose. All of the machines have a uniform configuration: Intel PII 450Mhz, 192 MB of main memory and 100 MBit/s Ethernet card. All of the machines were connected into a 24-port Cisco Switch. WebStone, NetPipe, NetPerf and PostMark were used to gather statistical information about the LAN performance under the new protocol. [3] [4] [5] [6]
On a slow or congested network results seem to indicate that the idea of waiting some time to gather a lot of small packet and send them in a single network packet helps. The numbers that we gathered from the testing we have conducted show that the throughput was similar to the test we conducted where no aggregation protocol was used. The results show that the Request/Response time can be adversely affected by the selective non-1-persistent implementation.
3.2 NetPipe
The graph above was generated from the results of NetPipe. The results indicate that for smaller block sizes the selective non-1-persistent implementation of a data link frame aggregation protocol performs slightly better, whereas for larger block sizes it does not perform as well. With packets less than 1500 bytes the algorithm performs better as the wait time for potential aggregate packets pays off, thus packets are being aggregated.
This graph shows that the throughput performed by the protocol increases slightly as the time goes by.
3.3 PostMark under NFS
The Data Link Aggregation protocol did perform well under NFS. PostMark was used to gather information on the new protocols performance under a Distributed File system environment.
|
Aggregation On |
Aggregation Off |
|
Time: |
Time: |
|
46 seconds total |
124 seconds total |
|
43 seconds of transactions (46 per second) |
116 seconds of transactions (17 per second) |
|
Files: |
Files: |
|
1129 created (24 per second) |
1129 created (9 per second) |
|
Creation alone: 125 files (125 per second) |
Creation alone: 125 files (17 per second) |
|
Mixed with transactions: 1004 files (23 per second) |
Mixed with transactions: 1004 files (8 per second) |
|
1043 read (24 per second) |
1043 read (8 per second) |
|
920 appended (21 per second) |
920 appended (7 per second) |
|
1129 deleted (24 per second) |
1129 deleted (9 per second) |
|
Deletion alone: 133 files (44 per second) |
Deletion alone: 133 files (133 per second) |
|
Mixed with transactions: 996 files (23 per second) |
Mixed with transactions: 996 files (8 per second) |
|
Data: |
Data: |
|
340.93 kilobytes read (7.41 kilobytes per second) |
340.93 kilobytes read (2.75 kilobytes per second) |
|
368.14 kilobytes written (8.00 kilobytes per second) |
368.14 kilobytes written (2.97 kilobytes per second) |
PostMark was configured to use files of size 10 to 500 bytes. Seven clients mounted a directory each on the server. Each client was performing distributed file operations (read, write, append and delete).
3.4 WebStone and NetPerf
Webstone and NetPerf were used for the final series of tests that we conducted. While gathering statistics, we had each client ping the server every 0.1 seconds to simulate the typical background traffic that network applications send. After evaluating the results of this final test we did not notice much improvement of the throughput. NetPerf showed that the Request/Response time of the new protocol was worse. That is due to the wait time for aggregated packets. For Webstone the throughput was about the same with periods of higher throughput compared to no aggregation. For Request/Response time Webstone reported similar results compared to no aggregation.
4. Conclusions
The selective non-1-persistent Data Link Frame Aggregation Protocol seems to perform slightly better than the reference implementation when the network is congested and also when the majority of the outgoing packets are somewhat small in size. The difference in throughput is not extreme, and response latency is adversely affected. The weak link in the current implementation is the sensitivity of the algorithm which decides when to turn aggregation on an off. The current implementation indicates that a 1-persistent implementation would be superior although it is certainly possible that with further refinements to the timing selection algorithm better results may be obtained under the current approach.
5. Enhancements and Future Work
Rather than always waiting a small amount of time for a second level three request for service when transmitting the current single request would result in an undersized frame, it may be desirable to have the data link protocol investigate what level three protocol is making the request. In this fashion certain applications could be made exempt from the introduction of any addition transmission delays. This would be violating a long-standing principle of strict layer boundaries by allowing a level-2 protocol implementation to “peek” into the level-3 information.
An obvious extension to the data link aggregation protocol would be to add compression capabilities. If we are attempting to reduce header overhead by aggregating multiple level three packets into a single level two frame then the extension of the scheme we have outlined to incorporate a capability to perform a compression algorithm on the data payloads would seem to be a natural extension.
References
[1] Douglas Comer, Computer Networks and Internets, 3rd ed., Prentice Hall, 2001.
[2] Marshall K. McKusick, K. Bostic, M. J. Karels, J. S. Quarterman The Design and Implementation of the 4.4BSD Operating System, 1996
[3] WebStone Benchmark http://www.mindcraft.com/webstone/
[4] NetPerf http://www.netperf.org/netperf/NetperfPage.html
[5] PostMark http://www.netapp.com/tech_library/3022.html
[6] NetPIPE http://www.scl.ameslab.gov/netpipe/
[1] Each Ethernet card manufactured is assigned a unique address.
[2] Logical Link Control / SubNetwork Attachment Point
[3] The minimum frame size is actually 64 bytes plus the preamble of 8 bytes.
[4] Carrier Sensed Multiple Access / Collision Avoidance
[5] Mbufs are NetBSD kernel memory allocation units used to store the frames.
[6] The value 0xAAAA is not in current use and will be used to designate that the data payload contains an aggregate of level three payloads.