logo design...copyright Elena Yatzeck, 2010-2015

Friday, February 20, 2015

Kryptonite Handbook, or Why You Can and Must Measure Software Development Productivity

The title of Martin Fowler's classic 2003 "bliki" post, "CannotMeasureProductivity," almost says it all.  Why "almost?"  Because, by the end of the entry and to complete the thought, Fowler concludes that we also should not even attempt it:
I can see why measuring productivity is so seductive. If we could do it we could assess software much more easily and objectively than we can now. But false measures only make things worse. (CMP, final paragraph)
Ten years later, Fowler tweets that this sentiment is "still true and still needs to be said," although he qualifies this statement to clarify that "You just have to go with whatever subjective assessment appeals to you. Just remember that it is subjective."

Productivity measures are depicted here as some kind of mystical siren, luring unwary executives onto the rocks of wanting to get best value for their investment dollar.

Ulysses and the Sirens by J. W. Waterhouse
But let's posit an imaginary world for a moment (let's call it "Krypton"), where an IT executive wants to set up a slew of non-homogeneous software teams to produce highly effective software at the lowest possible cost to her budget.  Crazy, I know, but bear with me.

In my imaginary world, she wants to squeeze the most out of every dollar she spends. 
  • She wants to be able to compare individual teams against each other using some clear rules, so she can find out what levers to push to help all of the teams do better.
  • She wants to compare teams with themselves, release on release, to see if they are improving, and what helps teams improve the fastest.
  • Where she can, she wants to reduce maintenance costs, because those do nothing but keep her in business.
  • She doesn't want to be lied to or surprised.
  • She may even want to prove that "Agile" ideas that are being applied in her organization are working well, and more people should use them. 
Is there a way to do this?  Well, yes, in my imaginary world, there is.  She should:
  • Decide on a reasonable objectively measurable common denominator, such as "thousand lines of code" or "thousand function points."  Note that "function points" are not objective if they are hand calculated.  "Story points" are never objective or fungible across teams.
  • Agree on a locally meaningful and objectively measurable portfolio of numerators.  Using readily available system-of-record data from her HR system, her CI framework, her SONAR dashboard, and her test management repository, she can calculate things such as "number of functional defects released to production," "person hours to create," "elapsed time to create," "number of production outages created after release," "cost to maintain," "cyclomatic complexity," "code duplication percentage," "new customers using the web site," "operational people laid off," or whatever data points she has that she can link in some way to the release of a piece of software.
  • Start measuring all teams on the chosen portfolio of these measures, graphing everyone together.  Make sure the data results match her gut feeling, or correct either the data collection method or her gut.  
  • Interview and observe "winning" teams to see what seems to work.  Test the hypothesis by asking other teams to try the winning strategies.  Because this is an imaginary world, she understands that the less productive teams may be saddled with decades of built-up technical debt, and she decides to measure that too.
  • Keep refining:  add dimensional data as she goes, to test hypotheses about "what makes teams write great code quickly."  She could include things like: "team uses my mandatory standard tools" or "self-described agile team," or "verifiably agile team, using my company's definition," or "team collocation," or "team pay rate," or "percentage of team which is vendors," or virtually anything else that might be posited as making a difference.
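
To make the arithmetic concrete, here is a minimal sketch of the "numerator over common denominator" idea, assuming the raw counts have already been pulled from the systems of record.  The `ReleaseStats` fields, the `per_kloc` helper, and the two teams' numbers are all hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReleaseStats:
    """Raw counts for one release (hypothetical fields, not a real schema)."""
    team: str
    lines_of_code: int        # the denominator, counted by a tool, not by hand
    functional_defects: int   # functional defects released to production
    person_hours: float       # person hours to create
    elapsed_days: float       # elapsed time to create


def per_kloc(stats: ReleaseStats) -> dict[str, float]:
    """Express each numerator per thousand lines of code (KLOC)."""
    if stats.lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    kloc = stats.lines_of_code / 1000
    return {
        "defects_per_kloc": stats.functional_defects / kloc,
        "person_hours_per_kloc": stats.person_hours / kloc,
        "elapsed_days_per_kloc": stats.elapsed_days / kloc,
    }


# Made-up example data: two teams, one release each.
releases = [
    ReleaseStats("Team A", 20_000, 10, 1_600, 40),
    ReleaseStats("Team B", 5_000, 1, 700, 30),
]

for release in releases:
    print(release.team, per_kloc(release))
```

Note that the result is a small set of measures per team, not a single score, which is what lets her graph everyone together without crowning whoever typed the most.
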
In this imaginary world, the executive would create a virtuous, self-reinforcing cycle of what she would call "Plan-Do-Check-Act," and the aggregate productivity of the whole organization would increase.  She would be happy.  But.

You can see exactly why this IT executive is imaginary, now, because in the real world, the moment I published the words "lines of code," even in small print, many readers would have stopped reading, or even started to shout at the screen.  Measurement of software productivity by lines of code is Kryptonite to most professional software developers today.

From the helpful Superman Wiki:  http://superman.wikia.com/wiki/Kryptonite
There are many legitimate reasons why software teams do not want to see their lines of code as a "product."  Here is an entire wiki page about that.  I want to make it clear that I understand:
  • Quantity is not the same as value.  The code itself is nothing until it is used to generate a profit for the organization.  
  • Profit itself must be measured over more time than just one year.
  • Less is more.  If the same coding task can be completed with fewer lines of code, the smaller code base is the better one.
  • Some languages inherently require more lines of code than others.
  • People will just game the system.
  • And so on.
But note that in this imaginary world, we aren't just counting up the lines created per day and declaring ourselves productive.  We are using the following precautions:
  • Lines of code are never used alone.  They are used as a standard denominator. 
  • We do not resolve to just one number.  We keep a set of, say, five to ten, measures, so even if people want to game the system, they can't claim the team creating the most lines is the most productive.  To impress, our teams have to simultaneously be fast AND good, measured by one or more of the 14 ISO SQuaRE dimensions.
  • We consider a set of data points which can be collected in an automated manner during the development activity, and which are not particularly subject to gaming:  duration in days to create 1000 lines, person hours needed to create the same number, number of functional and non-functional defects in the same number, etc.
  • We capture the dimensions that would prevent apples-to-apples comparisons in our environment, so we don't base policy on something stupid as we go.  If we know one application is COBOL and one is Ruby, we don't compare them directly anymore where that distinction is skewing the results.  On the other hand, if a new application is about to be built, we would want to know which languages are correlated with the fastest teams in our organization.
  • We do not insist on the exact same portfolio of measures for every organization.  Our executive (let's call her "Lara") may care about functional defects and weeks to produce, at least at first.  Her husband ("Jor-El") works for a different firm where they care more about individual hours spent and unit test coverage.  Totally fine.  Within the subjective value system of each organization, we can objectively measure just the things that matter.
  • We do not confuse this type of ongoing productivity improvement activity with "dashboarding" to keep projects on track.  It's not an either-or.  Both are needed.  Productivity measurement is a retrospective activity inherently done in the aggregate.  Our imaginary executive looks at burn-ups, burn-downs, and predictability, and she is ready to pull the plug on projects that are not going well, or to reduce scope if projects can't deliver on time, based on metrics.  She watches to make sure people are using the right tools.  But those are not productivity activities.  Those are what she calls "management."
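
As a rough sketch of the "dimensions" precaution, assume each team's measures have already been normalized per 1000 lines and tagged with a dimension such as language.  The rows, the `MEASURES` list, and the `averages_by` helper below are hypothetical and the numbers are made up; the point is only that teams are compared within a group, on more than one measure at a time.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows: one per team, already expressed per 1000 lines of code.
rows = [
    {"team": "A", "language": "COBOL", "defects_per_kloc": 1.0, "person_hours_per_kloc": 120.0},
    {"team": "B", "language": "COBOL", "defects_per_kloc": 0.5, "person_hours_per_kloc": 150.0},
    {"team": "C", "language": "Ruby", "defects_per_kloc": 0.5, "person_hours_per_kloc": 60.0},
    {"team": "D", "language": "Ruby", "defects_per_kloc": 0.25, "person_hours_per_kloc": 90.0},
]

# The portfolio of measures; a real one would hold five to ten.
MEASURES = ["defects_per_kloc", "person_hours_per_kloc"]


def averages_by(dimension: str, data: list[dict]) -> dict[str, dict[str, float]]:
    """Average every measure within each value of the chosen dimension."""
    groups = defaultdict(list)
    for row in data:
        groups[row[dimension]].append(row)
    return {
        key: {m: mean(r[m] for r in members) for m in MEASURES}
        for key, members in groups.items()
    }


by_language = averages_by("language", rows)
for language, averages in by_language.items():
    print(language, averages)
```

A COBOL team is then read against the COBOL averages rather than against a Ruby team, while the per-language averages themselves remain available for the "which language for the next application" question.
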
Could this Kryptonite be something that deserves another look here on Earth?

I will conclude with an observation from the Superman Wiki, which reports that recently, "a mining group in Serbia discovered a new mineral, called Jadarite, the chemical composition of which is sodium lithium boron silicate hydroxide, the chemical formula for Kryptonite written on a case of the substance in the film Superman Returns minus fluoride....The new mineral is white, or pinkish under UV light, hard in texture but chalky in appearance, and made of tiny crystals less than 5 microns in diameter.    It isn't thought to possess any of Kryptonite's supernatural powers."

But does it?