Optimized Multicast Scheduling Designs for Input Queued Switches
- Shuayb Zarar
MSR-TR-2006-205 |
Published by International Business Machines (IBM) Zurich Research Laboratory
Technical Report
Dense multicast traffic is a natural result of increased demand for network services like audio and video distributions in todays high density networks. With such consistent demands, it seems inevitable that the volume of such traffic will continue to grow for some time to come. Today, if ATM switches are to be more popular in internet circles, either as standalone switches or as cores of high performance routers, it is necessary that they be able to handle multicast traffic with minimum hardware complexity and good efficiency. From previous works, it has been concluded that, the concentrate algorithm for multicast traffic, proposed in [1] personifies near perfection in performance and fairness but is really hard to implement. Schemes like WBA [1] provide relaxed hardware complexities for a little conceded performance. Very simple hardware schemes like Round Robin Matching (RRM) lead to poor latencies at higher throughputs.
In this project, we analyze the performance and hardware behavior of the popular multicast scheduling scheme, the Weight Based Arbiter (WBA) [1], as part of our research for the OS- MOSIS [2, 3] project with strict hardware specifications[2]. An exploration to simplify the WBA with small compromises on fairness and performance led us to propose schemes which were more practical, simple and balanced although not totally perfect in performance. We propose multicast scheduling schemes with acceptable levels of performance close to the concentrate algorithm. Variations of the WBA design and novel schemes like WRRM, which are hybrids of WBA and RRM, in multi-cycle or multi-stage iterations, proposed here are hardware simplifying candidates for high performance switches. The design implementation of the WBA algorithm with behavioral hardware synthesis is shown to saturate the state-of-the-art OSMOSIS Multicast chip 1 with just a 40X40 Switch. Improving this misbehavior was the major motivation for this work. After several optimizations, a structural design for the WBA algorithm is proposed as an alternative, which incorporates a Binary Tree Comparator (BTC) and a Balanced Delay Adder (BDA) as the core design simplifiers. But, even this is seen to hog up the chip area soon. Novel modifications to the WBA are then proposed, where weight calculation methodologies itself are modified with simplified hardware complexity for a small bargain in performance. To make future explorations more organized, a Centralized Function Block (CFB) scheme with a Multicast Weighted Round Robin Matching (mWRRM) algorithm is proposed which is a hybrid implementation of the simple Round Robin scheme and the WBA. On the performance front, multiple iterations of the WRRM scheme are shown to produce results close to that of WBA although keeping implementation complexities manageable. We conclude the work with a comparative overview of the pros and cons of each of the proposed schemes.