Monday, November 24, 2008

Map-Reduce Performance

So this is Andy and Matei's fantastic OSDI work, improving the performance of the speculative execution of Hadoop. It turns out to be pretty simple, as hadoop does everything in a very naive way. Good work, and great problem selection. Lastly, great acronym selection. Matei is great at that.

Firstly, this work lies under the huge shadow of Google. No one really knows what their system does. They published in 2002 (I think), and hadoop is just about catching up to their system at that time. It's a solid academic area, but there's an open question as to the novelty of the work because of that.

This particular work is likely pretty novel, as Google has a fairly homogeneous network. I think it's 3 types of systems at any point.

Another question revolves around map-reduce itself. I think it's telling how few research efforts actually USE the system, rather than work on the system. I've found that there's a set of problems that it solves, and it's really not a huge set. They're pretty much the uses we've found in the rad lab, log searching. It doesn't build services. It doesn't scale databases (yet). The parlab has map-reduce as one of the "dwarves", a series of solved parallel problems. It's not the key one, as some in systems seem to think.

So the key, in my mind, is to expand the map-reduce problems. My understanding is that MSR is doing this, Yahoo as well. Turning it into a non-consistent database is interesting. Generalizing the model to encompass other parallel problems is interesting. However, the original map-reduce paper never would have been published without Google backing it. It took the experience of it's deployment to get attention. How many great parallel architectures sit in dusty conference proceedings? How do we change this research area so that the simplicity of the map-reduce model is seen as an advantage?

Lastly, I think you should have people read the map-reduce paper and stonebreaker's nagging paper.

Policy Aware Switching

Firstly, I'm going to be out of class on Tuesday. I hope you guys have a good talk.

I couldn't finish it, 26 pages for something I've listened to numerous talks about. It's quite likely I've missed some details, but the idea is pretty simple:

Overlays give us a few advantages, one of which is in routing. We use that in this in the PAS to enforce routes that packets must fly by to reach their endpoint. For instance, we can enforce, in the overlay alone, that a packet must go through a firewall before hitting me.

This gives huge flexibility, all the flexibility you'd expect an overlay to allow. In my nascent graduate career, I first worked on a project that lined up very well with this. PAS gives you various abilities. You can do statistical tricks, allowing packets through some time, for active testing or load balancing.

It has one huge problem though, which is something I've talked about a few times on this blog. It's not optimizing for the common case. They want performance, and this doesn't increase it. For top-tier datacenter companies such as Google, this is totally useless. They'd much rather pay their network engineers a larger amount and get better utilization. It does make sense for smaller datacenters, but not really, as they have fewer configuration problems. Lastly, the engineers themselves are not likely to adopt technology that reduces their employment.

So this is neat work, and with it you can do some really cool things with a network. It's not targeted correctly. I can think of some good uses for this, mostly from ISPs and the like who may want more direct control of their traffic. For instance, they'd like to push all P2P people over some thinner line. That's not as sexy (or well funded) as datacenters.

Wednesday, November 19, 2008

App Layer Multicast

Ok, overlays give you multicast. I see that. This is an alright system for that, hierarchy. That's cool.

I could be that I'm tired, but I just don't have a lot to say about either of these items. Great, multicast. Cool. There's nothing elegant or useful here. Nothing mind-blowing, just "if I had to build a multicast overlay, this is what I'd do".

The related work section shows some of this. They don't really provide an argument as to why their system is superior to the others. They just give a big list of other work.


This describes Scalable, Reliable, Multicast. It uses some naming and ack tricks to get basic functionality, then they implement a whiteboard app with it.

Ok, so nothing tricky. They pointed out that Multicast doesn't fit with end to end, which is obvious enough to have been a brilliant observation. I now want to write a paper on end (to end)^n just so that it exists. It's a good point that I don't think I've entirely digested. In the overlay, you just create n^2 connections, each end-to-end. Are these equivalent? The notion of fate-sharing was what they used to differentiate the two. This is tricky, and I'm not in the right place to reason through this correctly.

There's a somewhat tangential interesting point there. Multicast is equivalent to many unicasts. If you buy the hirearchy, you can implement all of this above that. This is probably why multicast doesn't matter that much. With that, we get to keep end-to-end and the existing routing infrastructure, at the loss of performance for a very small set of applications.

Either way, they pretty much implemented bittorrent in the paper filler, explaining possible uses of multicast with caching. That's when you know it's an important work: something like a "future work" section creates an industry.

Overall, I probably don't have much to say here. It's not too surprising how they did this, and that they punted the ordering stuff to a higher layer.

Monday, November 17, 2008


This is going to be brief, as I actually work on this. I'm currently implementing version 6 of the bundle protocol and TCP convergence layer in python. I've decided to be completely negative here as well. Hopefully others will have the discussion of why this is so fucking sweet.

The biggest problem in DTN is that it doesn't solve a lot of the problems it proposes. Satellite networks? Sure, those are multihop with limited connectivity. Developing regions? Ehhh... most connections are one hop. Cell phone to tower is the only connection with disruption. Everything internal is good. Sensor networks? They don't want the abstractions, as the overhead is too high. Also, established wireless protocols seem to work fine.

The key thing about DTN is that it's so general. It applies to every network, but it isn't optimized for the characteristics of the network itself. With that, it has to compete with networks specifically designed for the properties of the network. It can't. We need graceful degradation, allowing for transitions as network conditions change. DTN is a big broad hammer, encompassing everything. The next step is to morph that into something that can compete with scapels, not just networks where nothing currently works.


DOT describes the "data-oriented transfer" service. The idea here is to separate the data transfer, and stick another layer in our fat protocol stack. It's really heavily related to my class project, actually. Lots of the same issues, more people trying to get practical value out of the DHT work.

I'm not totally sold on loving or hating it. It seems completely unreasonable, but I'm not sure I can put my finger on why. It doesn't optimize the common case, as they point out. Flexibility is nice, but only when you are solving a problem. They are not solving a problem really, rather just trying to allow for a design space where there currently isn't one. This means a system like this will never be used, but it doesn't mean it's bad work.

I should note, that one metric for success the authors had is that the SMTP client they build was robust enough for one of the two authors to use. I'm sold.

I do agree that this is more elegant. That may be the fact that I'm currently implementing a system with a lot of parallels to this. I think my biases will be laid bare in class.

In conclusion, this paper may be an argument as to why my yelling about metrics last class was wrong.

Wednesday, November 12, 2008


What's the X stand for? Extensible? Extra?

Anyhow, X-Trace is Rodrigo and George's work on pushing tracing into the appropriate layers to get a view of the entire system. With this, debugging is easier. It does this in the obvious ways, and has the obvious problems.

The big problem for this is adoption. To use X-Trace, you have to instrument it, which means writing specific code in each protocol you want to trace. That's a huge headache. However, they deal with this well, allowing (limited) incremental adoption as well as focusing on an area (datacenters) that has the amount of top-down control needed to push such a sweeping change.

Performance is another problem. They analyzed the performance hit of their daemons, and showed it's not a huge problem. However their apache trace took a 15% hit. Assuming this is high as it's research code, that's still huge. The obvious thing to me is to instrument only a limited set of things, or to be able to turn it off until you have an error, then try to reproduce it on a limited number of servers running the service. I think they eventually did this as well.

I wonder how related this is to debugging on multicores. I argued with andy about wanting a distributed GDB, and this is approaching the ballpark of such a system.