I arrived at the Communications Developer Conference this afternoon in time to attend a talk by Dr. Huan-Yu Su of Mindspeed Technologies. His talk had the curious title "Semiconductor Design Solutions for IMS", and indeed it touched on multiple subjects and many layers, from IMS service objectives down to the evolution of semiconductor processes. But the interesting stuff was DSP related. Mindspeed makes DSP chips and DSP software, which today might best be characterized as signal processing systems on silicon (SoC).
But, despite PowerPoint bullets about packet-based systems, when Dr. Su showed a timing diagram of how individual DSP cores were able to process multiple algorithms on many channels, he showed a TDM system. Each media stream got a time slice in a 20 ms scheduling cycle. This means each incoming flow has to hit a jitter buffer and be queued up for its slice of the TDM cycle. When I questioned this, Dr. Su replied that it was a hard problem and so far they had addressed it by shortening the 20 ms scheduling cycle to 5 ms.
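To put rough numbers on that queuing delay, here is a back-of-the-envelope sketch. It assumes each stream owns exactly one fixed slice per cycle, which is my reading of the timing diagram rather than anything Mindspeed stated; the function name `tdm_wait_ms` and its parameters are made up for illustration.

```python
def tdm_wait_ms(arrival_ms, slice_offset_ms, cycle_ms):
    """Time a packet waits for its flow's next slice in a fixed TDM cycle.

    Assumes one slice per flow per cycle, starting slice_offset_ms into
    every cycle (a simplifying assumption, not a vendor specification).
    """
    return (slice_offset_ms - arrival_ms) % cycle_ms


# A packet that just misses its slice waits almost a full cycle.
for cycle in (20.0, 5.0):
    print(cycle, tdm_wait_ms(arrival_ms=0.5, slice_offset_ms=0.0, cycle_ms=cycle))
```

Under that assumption, shortening the cycle from 20 ms to 5 ms only shrinks the worst-case wait from just under 20 ms to just under 5 ms. The packet still waits for the clock rather than being processed when it arrives.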
None of this is to discredit Dr. Su, who is a smart guy and gave an interesting presentation. I have seen equivalent approaches from Texas Instruments and Freescale Semiconductor. All of the DSP vendors are driven by the TDM-to-VoIP gateway application, where 8 kHz (and 20 ms) operation is the norm and completely acceptable. No one has addressed low-latency packet processing.
It's not that hard — here is the answer!
Scheduling a DSP to handle packets, whose arrival is statistical in nature, with minimum latency while guaranteeing that all work gets done, is a lot like scheduling packet transmission over a fixed-capacity link with QoS guarantees. There was a ton of academic and practical work done on this subject as part of ATM switch development in the 90s. Much of it is directly applicable to scheduling DSP processing.
Those who are really interested might read "Leap Forward Virtual Clock: A New Fair Queuing Scheme with Guaranteed Delay and Throughput Fairness" by Suri, Varghese and Chandranmenon in the proceedings of INFOCOM '97, the Sixteenth Annual Joint Conference of the IEEE Computer and Communications Societies. This is not the only relevant paper but it is one I am familiar with in some detail. Quoting from the abstract:
We describe an efficient fair queuing scheme, Leap Forward Virtual Clock, that provides end-to-end delay bounds similar to WFQ, along with throughput fairness. Our scheme can be implemented with a worst-case time O(log log N) per packet (inclusive of sorting costs), which improves upon all previously known schemes that guarantee delay and throughput fairness similar to WFQ.
At light load, packets are scheduled immediately and experience minimum latency. Under heavy load, packets are queued but get processed in a time that matches their allocated average arrival rate. So at heavy processor load, individual flows experience some additional delay but they also have their jitter smoothed out (thus minimizing the required depth of the jitter buffer at the ultimate destination).
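To make that behavior concrete, here is a minimal sketch of the idea applied to a single DSP core. It uses plain Virtual Clock style time stamps with a binary heap, not the Leap Forward variant from the paper, so it does not have the O(log log N) cost or the full fairness guarantees. The class name, the `shares` mapping (fraction of the processor allocated to each flow) and the millisecond units are all my own assumptions for illustration.

```python
import heapq


class VirtualClockScheduler:
    """Simplified Virtual Clock style scheduler (illustrative only)."""

    def __init__(self, shares):
        # shares: flow_id -> assumed fraction of the processor (0 < share <= 1)
        self.shares = dict(shares)
        self.vclock = {flow_id: 0.0 for flow_id in self.shares}
        self.heap = []  # entries: (stamp, seq, flow_id, arrival, work)
        self.seq = 0    # tie-breaker so equal stamps keep arrival order

    def enqueue(self, arrival, flow_id, work):
        # Stamp the packet as if the flow ran alone on its allocated share.
        stamp = max(arrival, self.vclock[flow_id]) + work / self.shares[flow_id]
        self.vclock[flow_id] = stamp
        heapq.heappush(self.heap, (stamp, self.seq, flow_id, arrival, work))
        self.seq += 1

    def dequeue(self):
        # Always serve the packet with the smallest virtual time stamp.
        _stamp, _seq, flow_id, arrival, work = heapq.heappop(self.heap)
        return flow_id, arrival, work


def simulate(arrivals, shares):
    """Run one processor over (arrival_ms, flow_id, work_ms) tuples.

    Returns a list of (flow_id, arrival_ms, start_ms, finish_ms).
    """
    sched = VirtualClockScheduler(shares)
    pending = sorted(arrivals)
    now, i, log = 0.0, 0, []
    while i < len(pending) or sched.heap:
        if not sched.heap and pending[i][0] > now:
            now = pending[i][0]  # processor idle: jump to the next arrival
        while i < len(pending) and pending[i][0] <= now:
            sched.enqueue(*pending[i])
            i += 1
        flow_id, arrival, work = sched.dequeue()
        log.append((flow_id, arrival, now, now + work))
        now += work
    return log


# Flow A sends a burst of three packets while flow B sends one.
burst = [(0.0, "A", 1.0)] * 3 + [(0.0, "B", 1.0)]
for entry in simulate(burst, {"A": 0.5, "B": 0.5}):
    print(entry)
```

In this sketch a lone packet on an idle processor starts the moment it arrives. When flow A bursts, flow B's packet is served second instead of waiting behind the whole burst, and A's extra packets are spaced out according to A's share.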