CS268 Reading Summaries

Friday, October 17, 2008

TAG: a Tiny AGgregation Service for Ad-Hoc Sensor Networks (UCB)

The paper describes a data aggregation service, called TAG, that operates on TinyOS. It argues that aggregation is such central functionality in WSN that it should be structured as a service. This justifies its existence within the end-to-end arguments. It incorporates SQL as user frontend for the service. This is nice because it leverages the data aggregation research that has been done in the database community.

It has a small section on the taxonomy of aggregates. It briefly classifies some aggregate functions based on their characteristics. I would like to see more on how this info is applied. Maybe a type checker of the query compiler can take advantage of the info and prevent user from abusing the service. Or, more possibly, a query optimizer use these extra info to specialize a given aggregate function.

The basic mechanism of TAG is to arrange a hierarchical routing structure (like a tree) for each user query. The tree is then used to propagate and aggregate data from bottom-up, where the root is the node interacting with the user. At each level of the tree, nodes can use "epoch" to set up sampling intervals and perform the query-specific aggregation. The paper acknowledges that the sampling intervals is timing imprecise, influenced by timing jitter and many other timing problems. The author thus only assumes vague timing semantics.

One problem with having a tree structure is fault tolerance. A tree inherently is prone to single point of failure. If a node higher up in the hierarchy failed, much data is lost compared to the failure of lower node. Also, security is a concern. A malicious node can pretend to be very near the root and have great influence on the aggregate result.

The SQL interface is a nice idea because it gives more flexibility for application usage compare to the traditional way of hard-coding in a single app. In-network aggregation minimizes message exchange. However, the challenge seems to be on reliability. How much confidence can we put into the data extracted from the network using TAG, given its limitation on the sampling rate?

Thursday, October 16, 2008

Directed Diffusion: A Scalable and Robust Communication Paradigm for Sensor Networks (USC)

The paper presents a new approach to wireless routing called directed diffusion. The idea is borrowed somewhat from the reaction-diffusion model in biology. It's link-based protocal where nodes interact locally with neighbors. The protocol design assumes particular application semantics, so it's not build for all apps. In particular, it allows information sink nodes to send out "interest" messages. These are queries for specific info from the network.

The whole process of communication is a iteration of three steps: interest notification, gradient establishment, and reinforcement. Gradient is a tuple of data rate and direction. It's implemented by keeping states in the intermediate nodes with event entries, which basically tells the nodes where to forward a particular packet to. It's a on-demand routing protocol. However, the nodes have a choice of dropping or aggregating the packets. Reinforcement is the control mechanism to adjust the gradient. Positive reinforcement increase the communication rate, while negative reinforcement stops or decrease the comm. rate. Also, the interest request has a expiration field which specifies the duration of the comm.

The paper didn't address the problem of congestion, which i think would be a big problem for their protocol. Can it scale up to thousands of nodes, especially for randomly distributed network? The root problem is because they uses real-time to do the control for their protocol. Temporarily congestion creates timing jitters that would lead to improper reinforcement.

Mobility is also a concern. The author suggests that the localized comm. helps the protocol adjust to dynamic topology more efficiently. However, it seems everytime a node moves would require setting up the gradients all over; otherwise, you can't guarantee the packet will get to the sink. In this case, out-dated gradients would die off on their own because of timeouts. However, it seems you still need to set up new gradients to make sure there is a path from the source to sink.

The protocol isn't described as clearly as it should be. There is a lot of cross-layer design, which violates the end-to-end principle. The idea is novel but doesn't look like it's cleanly engineered. The performance results isn't too meaningful when you take congestion of the formula. It is not clear what is gained here. It's not a loss if we skip this paper next semester.

Thursday, October 9, 2008

A High-Throughput Path Metric for Multi-Hop Wireless Routing (MIT '03)

Minimum hop count is not a good metric to choose a route. First, one high-loss link along a route would create major delay and congestion. This causes retransmission and timeout which leads to decrease in overall network throughput. However, these links may be suitable for sending routing packets instead of data packets. Min hopcount is also bad when it chooses arbitrarily among multiple "equivelant" routes. It mostly likely pick a suboptimal one. Having a metric that is more refined than hop count helps to increase throughput.

ETX is one metric specification. It predicts for each link the expected transmission count per packet. Then it picks the route that has the least combined ETX count. It approximates the ETX by sending a periodic probe (every 1 sec). By dividing the actual received probes by the number sent, you get the delivery ratio. The inverse of it is ETX. It's very simple to understand and easy to implement. It adds very little to the routing states.

ETX has its own limitations. It only interacts and reacts to mobility as good as it updates the routing states, which is delayed by WST (weighed settling time). The main essense of using transmission count is a step further than hop count; however, it doesn't go into the size of packets. It would more ideal to have something like ETX/bit.

The paper presents a simple understandable routing algorithm and implementation. It makes a argument and brings attention to the problem of min hop count. Although it would be nice to see a more comprehensive paper on routing metrics, this paper is a pretty good introduction.

Monday, September 29, 2008

MACAW: A Media Access Protocol for Wireless LAN's (UCB)

Finding the right MAC protocol for wireless is not a solved problem. The paper is interested in finding a good protocol for the wireless LAN for Xerox at PARC. It has a multi-cell topology with each cell consisting of one base station and multiple nodes (the paper calls them "pads"). CSMA isn't suitable, and the paper illustrates this with the "hidden" and "exposed" terminal phenonmena. The root problem is that wireless signals have side-effects in interfering with other channels. With CSMA, implicit sensing is not enough to prevent collisions.

MACA is the predecessor of MACAW. MACA adds two new control signals (RTS and CTS) to help synchronize channel access. This helps nodes to communicate their state with nearby nodes. I.e. "I am prepared to listen to someone", or "I just started a communication stream with someone", and etc. This helps other nodes to know when to backoff. MACA assumes nodes to exchange RTS and CTS to establish connectivity. But what happens if we have rapid mobile environment where new nodes continuously comes into the cell? What is the effect of having node not hearing these control signals?

Of course, MACA is a conservative algorithm. In order to achieve more throughput and add things like fairness, this is where MACAW comes in. It adds three new control signals, Data-Send (DS), ACK, and RRTS, to increase efficiency of MACA. These signals brings even greater synchronization between nodes. However, they decrease potential contention by sending more control messages. That seems to be expensive in wireless. Power consumption overhead of setting up a transmission is incredibly demanding. Also, i don't think measuring throughput alone is the right measurement. Measuring throughput over power may be a better way to go.

In addition, MACAW enchances MACA's binary exponentially backoff (BEB) in two ways. The original BEB has problem with fairness and have starvation problems. MACAW first uses mutliplicative-increase-linear-decrease (MILD) policy to adjust backoff counter. Besides a very witty name, it's similar to the idea of AIMD. Second, it allows both the receiver and the sender to adjust the backoff counter when transmission succeeds.

The paper is a nice short read and shows a lot of the corner cases in designing a wireless link protocol. It's definitely a very involved process. On the other hand, it would also be nice to combine this reading with something about practicality issues like power consumption and environment hazards and other deployment challenges.

Sunday, September 28, 2008

A Comparison of Mechanisms for Improving TCP Performance over Wireless Links

Transport for wireless is different from wired. It requires extra care to avoid performance penalty. This is because wired assume most packet loss and performance degradation is due to congestion, which is not quite true for wireless. The way signal is transferred in wireless environment is more prone to random bit errors. TCP deals with packet loss by invoking congestion control (backing off, i.e. decrease window size, and timeout). This is inappropriate and decrease performance.

The paper surveys three main approaches to deal with this problem. The first is to do nothing special. They called it end-to-end (E2E). Second is called link-layer (LL) which acts like a snoop agent, running at a faster rate than tcp. It tries to handle non-congestion related loss before propagating to tcp. The last is a split-connection scheme that adds an extra wired link from each sender to a base station. This uses the base station as a buffer and decouples the sender and receiver; thus, loss on receiver side won't necessarily slow down the sender, and vice-visa.

Along with each approaches there are enchancement techiques to better performance. Selective acks (SACK) is one, which provide extra ack information over the cumulative ack. Explicit loss notification (ELN) is another, which gives information about the cause for packet loss. Within the SACK paradigm, SMART is a special kind SACK that uses bitmap storage the recent ack history. SACK and ELN can be added to each of these three general schemes.

The paper measures the throughout for each of the different schemes against the peak (error-free) throughput (1.5 Mbps for LAN and 1.35 Mbps for WAN). They finds LL + SMART + TCP-AWARE to be the best against bit-errors. This TCP-awareness isn't explained in the paper. It would be nice to see what it really does. In the end, it comes down to how good are these protocols in preventing tcp from timeouts and shrinking window size. There is also a question of false-negative. How well do these protocols detect real congestion? I would like to see extension of these experiments in a more general settings w/ bit errors + congestion.

Tuesday, September 23, 2008

Scaling Internet Routers Using Optics (Stanford '03)

The paper goes very in-depth in designing a 100Tb/s router. It argues for a multi-stage over single-stage design, even though signle-stage is simple. The paper specifically proposes adding a load-balancing stage that randomize the input packets before entering the VOQs. It proposes Full Ordered Frames First (FOFF) to correct mis-sequencing of packets.

The concept of partitioning a switch into multiple stages is very novel. It's similar to pipelining in a circuit. Specializing individual stages of work increase throughout. However, partitioning a switch is at a more macro level. I wonder if there is any formal study in the design of the partitioning? What are the different ways to do the partitioning? What are some of the factors that drive the division?

The paper also suggests decentalized schedulers. This increases scalability and fault tolerance. But it's not clear how this is done. The discussion about the use of optics and MEMS is also too brief for me. I would be interested in reading some follow-up papers on these topics.

I agree that the paper lacks results and evaluation. It focuses a lot on certain design details, which make it hard to understand. This is the first paper i read from this class that i feel lost, which happens very often for my other classes=). On the up side, i did get a chance to google many things i didn't know about router design.

Monday, September 22, 2008

A Fast Switched Backplane for a Gigabit Switched Router ('97)

Router design takes quite a bit of hardware organization. The internal of a router is divided into forwarding decision, backplane, and output-link scheduler. Forwarding decision takes care of the MAC header (TTL, checksum, and etc.). Output-link scheduling handles flow or priority control to meet QoS, while the backplane does the actual transfer from the input to its outgoing port. The paper analyzes the design of backplane.

Traditionally, backplane is built using bus. This is called "shared backplane". It's known to be non-scalable because at most one port can transmit at any given time. Using crossbar switch would be much more scalable, since there are multiple connections between the set of inputs and outputs. Packets can be transmitted simultaneously. There is time slot, and the between each time slot the switch is reconfigured. This means packets needs to be fixed length now in order to fit the time slot. This sounds similar to the comparison between CSMA and TDMA, only this being at the circuit level.

Crossbar-switched backplane faces three scenarios that may block signal from transmitting. First is called head-of-line (HOL) blocking, which comes from scheduling multiple FIFO queues. The top element prevents the ones after from participating. Input and output blocking are caused by contention in accessing the hardware.

The way to deal with HOL blocking is to move away from FIFO and use something called Virtual output queuing (VOQ). Input maintains a queue for each output port, and sort the incoming packet into the corresponding queue before choosing one VOQ as input to enter the crossbar switch. In a sense, it adds an additional multiplexing stage. A simple scheduling algorithm called iSLIP is used to schedule these VOQs. iSLIP is two-sided (hand-shaking agreement), where output-side uses RR and input-side uses priority.

I wish the VOQ stuffs can be illustrated more in the paper. Sometimes it's hard to picture how these things are implemented. More importantly, they don't give a sense of what the overhead and cost are, even though the paper seems to focus heavily on implementation.

The paper gives a rough overall picture of the tread in router design. It gives a good sense to beginners like me in knowing about how queuing is done in routers. I recommend keeping this paper, especially if you'll need to read the optics paper.