Stress and the Mythical Man-Month, as They Relate to Beta 2

So our VP Soma blogged a bit about why Beta 2 missed March, which is cool. I thought I'd give a little more detail about how the last few weeks went down. One thing we've learned is that the division as a whole can't make progress on stress while there is other churn going on in the tree. For Windows Forms, our stress tests do things like create and destroy forms on 10 threads, over and over again, for days on end. These tests expose all kinds of things: Windows Forms bugs, weird timing cases, CLR bugs, and sometimes Windows bugs. (We decided one of those had probably been there basically forever. Hmmm, is a bug really a bug if no one has ever found it before?)

A full stress run takes several days. So even when you get to the end of a milestone and have cranked the churn in the tree down, it's really hard to predict when you'll be done with stress, for several reasons.

First, stress bugs are notoriously hard to track down. If you've got a slow leak, the app may or may not die anywhere near the code that's actually causing the leak. There's kind of an art to it.

Second, by their nature stress bugs can take hours or days to reproduce, and they often reproduce erratically. So you make a fix, put it into the system, and only after a few days go by does your confidence that you've actually fixed it go up. With timing issues you sometimes have to resort to sacrifices to the Stress Gods and hope your fix works.

And finally, stress issues often hide other issues behind them. So once you fix one and let the app run for another few hours or days, it may uncover another issue.

At the end of the day, it just takes a lot of time, regardless of how many resources you throw at the problem. It's a good process from a quality point of view, but not so good for the old schedule predictability...

posted @ Monday, April 18, 2005 3:34 PM