[info] a million lines of code
Eugen Leitl
<eugen at leitl.org> on
Mon Jan 21 10:47:41 UTC 2008
http://www.embedded.com/columns/technicalinsights/205604461?printable=true
A Million Lines of Code
Programs on the scale of a million lines of code are getting more common. But
how big is that?
By Jack Ganssle
Embedded.com
(01/14/08, 12:41:00 PM EST)
A million lines of code. It's a number bandied about more than ever as
software sizes develop overactive pituitaries. Some cell phones use upwards
of five million. Vista reputedly has 50 million. Everett Dirksen may once
have said: "A billion here, a billion there, pretty soon you're talking real
money." Well, a million lines of code here, a million there, pretty soon
you're talking about a program that is as mind-boggling and incomprehensible
as our national debt.
A million lines of code printed out would be 18,000 pages. That's a stack six
feet tall (on typical 20 pound paper). Ironically, the listing weighs in at
180 pounds while the actual operating code is mass-free; it'll live in a
fraction of a gram of silicon. Like DNA, code's human-readable description
requires tremendously more mass than its actual instantiation.
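The printout figures imply some per-page numbers worth seeing. A rough sketch in Python, using only the quantities quoted above and assuming single-sided printing (one page per sheet); the variable names are illustrative:

```python
# Figures quoted in the text.
lines_of_code = 1_000_000
pages = 18_000
stack_height_in = 6 * 12      # six feet, in inches
stack_weight_lb = 180

# Assumption: single-sided printing, so one page is one sheet.
sheets = pages

lines_per_page = lines_of_code / pages
inches_per_sheet = stack_height_in / sheets
pounds_per_sheet = stack_weight_lb / sheets

print(f"{lines_per_page:.1f} lines per page")
print(f"{inches_per_sheet:.4f} inches per sheet")
print(f"{pounds_per_sheet:.3f} pounds per sheet")
```

That works out to roughly 56 lines per page, which is about what a dense listing holds.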
A million lines of code is probably on the order of 20 million instructions,
or 600 million bits. That's not far off the 3 billion base pairs in human
DNA. Unlike DNA, which has redundancies and so-called "junk" sequences, every
single bit in the code must be perfect. A single error causes greater or
lesser failure.
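The ratios behind these estimates are easy to back out. A small sketch in Python, using the figures above; the 2 bits per base pair is an added assumption (four possible bases), not a number from the article:

```python
# Figures quoted in the text.
lines_of_code = 1_000_000
instructions = 20_000_000
code_bits = 600_000_000
base_pairs = 3_000_000_000

# Assumption (not from the article): four possible bases, so 2 bits each.
bits_per_base_pair = 2

instructions_per_line = instructions / lines_of_code
bits_per_instruction = code_bits / instructions
genome_bits = base_pairs * bits_per_base_pair
genome_to_code_ratio = genome_bits / code_bits

print(instructions_per_line, bits_per_instruction, genome_to_code_ratio)
```

Under that assumption the genome carries about ten times as many bits as the code, so the two are within an order of magnitude of each other.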
Since a typical atom is around 0.3 nm in diameter, if one had as many atoms
lined up as the number of instructions needed for a million lines of code,
they would stretch about 6 mm. That many Ebola viruses would stretch 15 meters.
A million lines of code is as long as 14 copies of War And Peace, 25 of
Ulysses, 63 copies of The Catcher in the Rye, or 66 copies of K&R's C
Programming Language.
A million lines of code is not ten times the work of 100,000. It's well known
that schedules grow faster than the code. Barry Boehm estimates the exponent
is around 1.35 for embedded software. So the schedule for developing a
million lines of code is 22 times bigger than for 100,000 LOC.
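The 22x figure follows from the exponent alone. A minimal sketch in Python, assuming the estimate scales as code size raised to the 1.35 power; the function name is illustrative and not part of Boehm's model:

```python
def scaling_factor(small_loc, large_loc, exponent=1.35):
    """Ratio of the estimate for large_loc to the estimate for small_loc,
    assuming the estimate grows as size ** exponent."""
    return (large_loc / small_loc) ** exponent

ratio = scaling_factor(100_000, 1_000_000)
print(f"{ratio:.1f}x")
```

With an exponent of 1.0 the ratio would be exactly 10; the extra 0.35 is what more than doubles it.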
In the March 1996 issue of Computer, Watts Humphrey published crude rules of
thumb for estimating software projects. Though hardly scientific, they do
give a sense of scale. Using his estimates:
A million lines of code require 40,000 pages of external documentation.
A million lines of code will typically have 100,000 bugs pre-test.
Best-in-class organizations will ship with around 1k bugs still lurking. The
rest of us will do worse by an order of magnitude.
A million lines of code will occupy 67 people (including testers, tech
writers, developers, etc.) for 40 months, or 223 person-years. Darwin needed
just 1.5 person-years to write On the Origin of Species. Scale that to the
26 copies equal in length to a million lines of code, and it appears writing
code is some 6 times more time-consuming than writing a revolutionary
scientific tome.
A million lines of code costs $20m to $40m. That's one or two 60s-era F-4
fighter jets (in today's dollars), a tenth of an F-22, a thousand cars or
more (in America), nearly 20,000 Tata Nano cars, ten million gallons of gas,
seven times the inflation-adjusted cost of the ENIAC, and a million times the
cost of the flash chips it lives in.
Think about that last analogy: A million times the cost of the flash chips.
Yet accounting screams over each added penny in recurring costs, while
chanting the dual mantras "software is free," and "hey, it's only a software
change."
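The staffing, Darwin, and cost figures above reduce to a few lines of arithmetic. A sketch in Python, using only the numbers quoted in the article; the variable names are illustrative:

```python
# Staffing estimate quoted from Humphrey's rules of thumb.
people = 67
months = 40
person_years = people * months / 12

# Darwin comparison: 1.5 person-years per book, 26 book-lengths of code.
darwin_person_years = 1.5
book_lengths = 26
darwin_equivalent = darwin_person_years * book_lengths
code_vs_darwin = person_years / darwin_equivalent

# Cost per line at the low and high ends of the quoted range.
lines_of_code = 1_000_000
cost_per_line_low = 20_000_000 / lines_of_code
cost_per_line_high = 40_000_000 / lines_of_code

print(f"{person_years:.0f} person-years")
print(f"{code_vs_darwin:.1f}x Darwin's effort")
print(f"${cost_per_line_low:.0f} to ${cost_per_line_high:.0f} per line")
```

Seen per line, the quoted range is $20 to $40 for every line shipped.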
Jack G. Ganssle is a lecturer and consultant on embedded development issues.
He conducts seminars on embedded systems and helps companies with their
embedded challenges. Contact him at jack at ganssle.com. His website is
www.ganssle.com.