Shippin' Ain't Easy

I think back to my first few years as a software developer, happy to lock myself in my office and bang out code until 2 or 3am on a regular basis. No feature was too big, and no bug fix too small. I thought of things purely in the context of how long it would take me to fix the issue. I didn't think about the QA cost to verify it (I knew I wasn't going to break anything, honest!), or the cost to doc it or whatever. In a very developer-centric culture, that's how I looked at the world.

One thing I couldn't understand is why software got shipped with bugs that were known about before shipping. "It's such an easy fix!" I thought. I even remember some of my first experiences with more senior and experienced people explaining to me why a particular bug I felt passionate about wouldn't get fixed for the final release. Nooooooooo!

So today I see this article on CNET about a set of customers wanting a Beta 3 of VS 2005, much of which seems to be fueled by frustration around the number of "Postponed" resolutions customers are seeing on bugs they filed through MSDN Feedback. In the user comments for the bug there seem to be three main issues:

  1. Too many postponed bugs = there must be a lot of bugs in the product = it must not be ready to ship
  2. The ___________ feature area works in a way I'm not 100% happy with or is missing a feature I really want
  3. The product doesn't seem/feel ready to ship

I'm going to tackle (1) last because I'll probably get long-winded on it (imagine that). For (2), we hear you, and appreciate the feedback. Unfortunately, with a product that's as widely scoped as Visual Studio and the .NET Framework there are always a ton of things that some set of users wish were different, or more featured, or whatever. Sometimes these things reflect shortcomings in how the feature is explained or documented. Sometimes it just takes some time to get used to a new way of doing things. This is good feedback in all of these cases but at some point you have to ship. If you keep changing/adding/reworking things you just will never get there, period. Of the comments that I saw in the MSDN Feedback report, none sounded (to me) like "ship-stopper" issues that render the whole product or major parts of it useless. Holding up the product for those things would penalize all users for incremental benefits to a few.

Issue type (3) worries me and I'd like to understand more about what that is. The comments were specific on the issues for (2) but vague in (3). Some of them were issues in CTPs which are really "what you see is what you get" releases and we try to be as transparent as possible that this-or-that may-or-may-not work in a given build. I will tell you that we've fixed a LOT of bugs since Beta 2 and while I thought Beta 2 to be one of the best Betas we've ever shipped (some people think it was better than some of the products we've shipped), we're that much further along at this point in time. There has been a lot more testing around large scenarios, end-to-end application writing, and compatibility since Beta 2 and that has flushed out a lot of issues. There was some mention of performance in here too, but, again, no specifics.

Now, to the postponed bugs issue – I think this is really about transparency. This is how you ship software, regardless of what company you're in. Let's all have a big hug and admit it: large, complex software projects always have issues, be they actual functional bugs or "I don't like/understand how this works" bugs. Eclipse has bugs. Flash has bugs. J2EE has bugs. Halo has bugs. Firefox has bugs. My Java-powered cell phone has bugs. Linux has bugs. Windows has bugs. It's super-duper-extra-whizzy-important that people remember that not every bug is noticeable by most users, or really impacts users in a significant way. This is life, and the question is how you drive out the bugs that really matter.

At some point you need to say "okay, the product is doing all the things we set out to do in our original scenarios". You have a set of "exit criteria" around performance, stress, compatibility, etc. and you start working towards those. You define a bar for what types of bugs you'll be taking, and then you triage off all the bugs that don't meet that bar. There tends to be a lot of arguing about the bar and everyone has a set of bugs that don't quite meet it but they really really really want to fix them anyway. Insert more arguing here. It's a massive balancing act: how do I ship quality software that will do the right thing for my users and still close it down and get it out the door with known issues? Is this paradoxical? For large software projects, at some point this becomes a treadmill. You could literally keep at it forever if you kept fixing all the bugs. Multiply this by the complexity factor: in DevDiv, we have about 25 teams managing all kinds of dependencies with each other that need to line up. If you keep perturbing the system, you'll never get there, especially when it comes to sensitive things like stress results.

Sounds harsh, huh? Well, it's not a ton of fun. It's gut-wrenching sometimes to come to a consensus that some bug is just too risky or too corner-case to fix and chance breaking other stuff. See, it's really about risk. For every 10 bugs you fix, you're likely to introduce 1-2 bugs; that's just how it goes. It's better if you've got a great automated test infrastructure in place, but even then it's not perfect. For example, imagine if those automated tests take a week to run end-to-end, or you've got areas of the product that aren't readily automatable (like certain kinds of UI, etc). And every change you make has to go through that verification process. It's slow, and it's expensive, so you need to make sure you're taking the right fixes through there. These are the things I didn't understand at the beginning of my career.
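
To put rough numbers on that treadmill, here's a back-of-the-envelope sketch in Python. It assumes (and this is purely an illustration, not how any real team models it) that every fix introduces a constant average fraction of new bugs, and that each of those new bugs also gets fixed. The function names and the 100-bug backlog are made up for the example; the 0.1 to 0.2 rate comes from the "1-2 per 10" rule of thumb above.

```python
def total_fixes_needed(open_bugs: float, regression_rate: float) -> float:
    """Expected fixes required to drain a backlog when every fix
    introduces `regression_rate` new bugs on average.

    This is the sum of the geometric series
    N + N*r + N*r^2 + ... = N / (1 - r), valid for 0 <= r < 1.
    """
    if not 0 <= regression_rate < 1:
        raise ValueError("regression_rate must be in [0, 1)")
    return open_bugs / (1 - regression_rate)


def backlog_after_rounds(open_bugs: float, regression_rate: float, rounds: int) -> list:
    """Backlog size after each round of 'fix everything that is open'.

    Hypothetical model: each round fixes the whole backlog, and the
    fixes themselves leave behind backlog * regression_rate new bugs.
    """
    history = [float(open_bugs)]
    for _ in range(rounds):
        open_bugs = open_bugs * regression_rate
        history.append(float(open_bugs))
    return history


if __name__ == "__main__":
    # Illustrative numbers only: 100 open bugs, 1-2 regressions per 10 fixes.
    for rate in (0.1, 0.2):
        print(rate, total_fixes_needed(100, rate), backlog_after_rounds(100, rate, 3))
```

Under those assumptions, a backlog of 100 bugs at a rate of 2 in 10 costs about 125 fixes in total, and every one of those fixes has to go through the same slow, expensive verification pass. The backlog does shrink each round in this toy model, but it never accounts for newly reported bugs, which is why the real thing feels like a treadmill.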

Okay now, back to you. What if you're a user that's come across a bug that you want to report? If it's just going to get postponed, why bother? Well, you should still report it. A known issue is always better than an unknown one. By reporting it, our triage teams get a chance to repro it and have a real conversation about the issue: How bad is it? Is it a cosmetic issue or a functional one? Is there any chance of user-data loss? Is it a security issue? Does it affect accessibility? How many users will hit it? Is there a workaround? Can we put the workaround into the bug so the user isn't blocked? Can PSS write a KB article about it? Would we issue a Hotfix for this bug?

So at the end of the day it's not really about money or just pushing the product out there and saying "to heck with it!" We know darn well that shipping a low quality piece of software is not the right thing for us or our customers; ask me some time how much each QFE (Hotfix) costs in terms of $$$, risk (to us and our customer), and resources (to us and our customer). Trust me, it's not small so we do whatever we can to prevent them in the first place. But that’s another question.

It turns out shipping software is hard, and being transparent about the process also exposes some of the rough spots. We're learning and it's a big shift for us, and for our developer community. The input we've gotten through MSDN Feedback has been absolutely invaluable and I believe that it's going to result in a product quality that's vastly superior to prior versions -- so many bugs have been found by the community and fixed (about 800 in Windows Forms runtime and designer alone), and many of these are things we would have never been able to find internally. It's goodness.

But if you're filing bugs on MSDN Feedback that you really, truly, honestly believe are ship-stopping bugs, reactivate the bug and add your justification as to why you believe that. Write the scenario that's broken and why it's so important, or ask for more information about why it doesn't meet the bar. State that well, and the right things will happen.

posted @ Monday, August 15, 2005 6:32 PM