[info] the failure of networked systems
Eugen Leitl
<eugen at leitl.org> on
Tue Jan 8 14:45:29 UTC 2008
http://anz.theoildrum.com/node/3377#more
The Failure of Networked Systems
Posted by aeldric on January 6, 2008 - 10:00am in TOD: Australia/New Zealand
Topic: Miscellaneous
Tags: complex systems, networks, peak oil
There are those in the Peak Oil community who suspect that we could be
facing a failure of our interdependent society that may be sudden, profound,
and complete. I have repeatedly said that I am not numbered among them. My
opinion is that our way of life will have to change significantly, but
slowly. I don’t expect to be clubbing anybody with a femur in the foreseeable
future. This opinion is on record in both print and electronic media, and I
don’t expect to be issuing a retraction any time soon, but a recent event
forced me to admit that I may have to hedge a little.
Network
Our internal network here has been having problems. My email (and more
importantly my access to TOD) has been very unreliable over the last two
days. The network regularly flicked from "working" to "failed" in the blink
of an eye. I was reminded that the speed of collapse in a network is often a
function of the natural frequency (speed) of the network, while the breadth
of failure depends on a number of factors, including load and the degree of
interdependence within the network.
The fault was eventually traced to one piece of software on one machine on
our intranet: the driver for that machine’s network interface card was
corrupt.
This raised a question in my mind: The Internet Protocol was originally
designed to be a robust, reliable, redundant system. How does one piece of
software on one machine bring down a network with thousands of nodes?
The answer is easy: Cost efficiencies.
Our intranet could have been built to be reliable, but instead it was
built to be "efficient". Far from being a network of fail-safe systems, our
network is a network of interdependencies. When the system was loaded, a
single failure brought the whole system down. "Business Efficiency" has
brought our network to its knees for two consecutive days.
I have seen this pattern a lot recently. Last year the power went out in my
city. The power transmission system was heavily loaded one afternoon, when a
single failure brought the whole system down.
Academics have studied failures of complex systems with interesting results.
One of the experiments they did will be familiar to anyone who has ever
played with sand-castles as a child. Build a sand pile by gradually adding
grains of sand. After a while, avalanches start to run down your pile.
Sometimes they are minor, while other times they affect the whole pile. There
is seemingly no way to reliably predict the outcome.
However, Per Bak, in his book “How Nature Works”, shows that there is an
instructive way to look at this question.
There is a critical angle for piles of sand: a level of steepness that the
slope cannot go beyond without sand starting to roll down the slope. Imagine
that, as you add sand, you colour red all of the areas of the pile that
reach this critical angle (and are thus on the verge of an avalanche). You
will notice that the red patches appear as tendrils running down the side of
the pile. As you add sand, the pile gets higher, wider and steeper, and more
little tendrils of red appear. Eventually you will see the tendrils of red
start to interconnect.
If you drop a grain of sand on a red area then you will precipitate an
avalanche. If the red area is interconnected with other red areas then all
these areas will be drawn into the avalanche. If the red area is isolated,
then the avalanche will be confined to one red tendril running down the side
of the pile.
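Per Bak’s sandpile is easy to simulate. The following is a minimal Python sketch, not code from the book. It uses the common textbook form of the model, in which a critical height stands in for the critical angle. A cell holding 4 grains topples and passes one grain to each of its four neighbours, and grains that fall off the edge are lost. A “red” cell holds 3 grains, one grain short of toppling. The names `drop_grain`, `build_pile` and `THRESHOLD` are hypothetical. Dropping one grain on an isolated red cell topples only that cell. Dropping it on a red tendril topples the whole tendril.

```python
# Assumed rule (common textbook sandpile): a cell holding THRESHOLD grains
# topples, sending one grain to each of its four neighbours.
THRESHOLD = 4


def drop_grain(grid, row, col):
    """Add one grain at (row, col), relax the pile, and return the set of
    cells that toppled (the avalanche). Grains pushed off the grid are lost."""
    n = len(grid)
    grid[row][col] += 1
    toppled = set()
    unstable = [(row, col)]
    while unstable:
        r, c = unstable.pop()
        if grid[r][c] < THRESHOLD:
            continue  # stale entry: this cell has already been relaxed
        grid[r][c] -= THRESHOLD
        toppled.add((r, c))
        if grid[r][c] >= THRESHOLD:
            unstable.append((r, c))  # still overloaded, topple again later
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n:
                grid[nr][nc] += 1
                if grid[nr][nc] >= THRESHOLD:
                    unstable.append((nr, nc))
    return toppled


def build_pile(size, red_cells):
    """Every cell holds 1 grain, except 'red' cells at the critical height 3."""
    grid = [[1] * size for _ in range(size)]
    for r, c in red_cells:
        grid[r][c] = THRESHOLD - 1
    return grid


# An isolated red cell: only that cell topples.
print(len(drop_grain(build_pile(7, [(3, 3)]), 3, 3)))  # 1

# A red tendril running down the middle column: the whole tendril topples.
tendril = [(r, 3) for r in range(7)]
print(len(drop_grain(build_pile(7, tendril), 3, 3)))  # 7
```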
This basic principle can be applied to my network problem. If one route on
the network gets loaded to capacity (i.e. turns red), the system detects that
it has reached maximum capacity and it delays traffic (piles it higher) or
switches traffic to other routes (spreads wider).
If the other routes were new, unloaded and redundant parts of the network
then this would not be a problem. But they are not. The other routes are
simply other parts of the old, heavily loaded network. Pretty soon all routes
are red, and they are all interconnected. So when one part of the network
fails, it passes its traffic to another part of the network, which fails in
turn, and your avalanche starts. With all routes interconnected, all of them
are vulnerable and all fail.
Our network operates at electronic speeds, and it failed with the same
rapidity.
Understanding how this happened is critically important. There are four parts
to creating the complete meltdown of a network:
1. Create a network by building connections between systems.
2. When a particular part of the network approaches overload (goes red),
recognise that this is happening and use the connections you have created to
allow you to switch load to another part of the network.
3. Continue doing this until all areas are red.
4. Now add more load.
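The four steps can be sketched as a toy cascade model in Python. This is an illustrative assumption, not a model of the author’s intranet. Every node is connected to every other node, all nodes share the same capacity, and a failed node’s load is split equally among the surviving nodes (step 2). Step 3 corresponds to starting every node close to capacity, and step 4 to adding a little extra load to one node. The function name `cascade` and all the numbers are hypothetical.

```python
def cascade(loads, capacity, extra, start=0):
    """Toy load-redistribution cascade (hypothetical model).

    Adds `extra` load to node `start`. Any node whose load exceeds `capacity`
    fails, and its load is shared equally among the surviving nodes.
    Returns the number of nodes that failed."""
    loads = list(loads)
    alive = set(range(len(loads)))
    loads[start] += extra  # step 4: add more load
    while True:
        overloaded = [i for i in alive if loads[i] > capacity]
        if not overloaded:
            return len(loads) - len(alive)
        for i in overloaded:
            alive.discard(i)
            if alive:  # step 2: switch the failed node's load elsewhere
                share = loads[i] / len(alive)
                for j in alive:
                    loads[j] += share
            loads[i] = 0.0


# Plenty of headroom: the failed node's load is absorbed by the others.
print(cascade([50.0] * 10, capacity=100.0, extra=60.0))  # 1

# Step 3 reached: every node at 95% of capacity ("all red").
print(cascade([95.0] * 10, capacity=100.0, extra=10.0))  # 10
```

With 50% headroom only the overloaded node fails. With every node at 95% of capacity, a smaller push takes down all ten, because each failure pushes the survivors over their limit.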
When we poured sand on our sand pile we allowed the sand to fall randomly,
and thus the avalanches seemed random. But once we had the ability to monitor
(see our potential “avalanche” areas coloured red), we were able to carefully
divert the sand into other areas. This delays the avalanche, but in the long
run the avalanche is going to be much worse, because it will occur when all
areas are red.
In summary: The ability to measure and monitor the system gives us the
capacity to avoid small avalanches in individual areas. However, if we keep
adding load without adding capacity we overload the entire network and thus
make an all-encompassing avalanche inevitable.
If we can’t add capacity, then it would have been better to allow a series of
small avalanches.
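The effect of diverting sand can be sketched with the same assumed sandpile rules: a cell with 4 grains topples, and a cell with 3 grains counts as red. In this hypothetical Python sketch, `run` drops grains on random cells. With `divert=True`, a grain aimed at a red cell is moved to the lowest cell on the pile, as long as some cell is still below red. The diversion policy, the function names and the grid size are assumptions chosen for illustration.

```python
import random

THRESHOLD = 4  # assumed toppling height; THRESHOLD - 1 grains means "red"


def drop_grain(grid, row, col):
    """Add a grain, relax the pile, return the set of cells that toppled."""
    n = len(grid)
    grid[row][col] += 1
    toppled, unstable = set(), [(row, col)]
    while unstable:
        r, c = unstable.pop()
        if grid[r][c] < THRESHOLD:
            continue  # stale entry
        grid[r][c] -= THRESHOLD
        toppled.add((r, c))
        if grid[r][c] >= THRESHOLD:
            unstable.append((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n:  # grains off the edge are lost
                grid[nr][nc] += 1
                if grid[nr][nc] >= THRESHOLD:
                    unstable.append((nr, nc))
    return toppled


def run(size, grains, divert, seed=1):
    """Drop grains at random cells of an empty pile; return avalanche sizes.

    With divert=True, a grain aimed at a red cell is moved to the lowest
    cell instead, provided that cell is still below red."""
    rng = random.Random(seed)
    grid = [[0] * size for _ in range(size)]
    sizes = []
    for _ in range(grains):
        r, c = rng.randrange(size), rng.randrange(size)
        if divert and grid[r][c] == THRESHOLD - 1:
            low = min(((i, j) for i in range(size) for j in range(size)),
                      key=lambda p: grid[p[0]][p[1]])
            if grid[low[0]][low[1]] < THRESHOLD - 1:
                r, c = low
        sizes.append(len(drop_grain(grid, r, c)))
    return sizes


random_sizes = run(10, 301, divert=False)
diverted_sizes = run(10, 301, divert=True)
print(max(random_sizes), sum(1 for s in random_sizes if s))  # seed-dependent
print(max(diverted_sizes[:300]), diverted_sizes[300])  # 0 100
```

On a 10×10 grid the diverted run is guaranteed to stay quiet for the first 300 grains, because no cell can exceed 3 grains while a lower cell is available. Grain 301 then topples all 100 cells: once every cell is red, any cell next to a toppled cell receives a grain and must topple too. The random run typically spreads its avalanches out along the way, and their number and sizes depend on the seed.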
A look at the financial markets at the moment might illustrate the same
point. When we look at the “sub-prime” issues that are emerging, we see that
the market created a series of “Investment Vehicles” that allowed risk to be
shared. A complex network of interdependencies was created to share this
risk, but capacity was not added to deal with the possibility of default. The
various institutions that bought these “Investment Vehicles” thought they
were buying assets, not debts. The institutions failed to recognise that they
needed to add “capacity” in the form of liquidity equal to the possible value
of defaults on this debt. As a result, now that load is being applied (in the
form of defaults) it threatens to bring down the entire network, rather than
just the single “node” that originated the debt.
The critical concept is that monitoring and networking the system allows us
to go right up to the edge of disaster, and then move load to another part of
the network until it, too, is on the edge of disaster.
Now that the networking effects have been discussed, I would like to push the
analogy a bit further and look at how this plays out from a Peak Oil
perspective.
Several years ago, sweet light crude oil started getting a bit more difficult
to obtain. In response, we stopped talking about “oil” and started talking
about “liquids”. The word “liquids” covers Liquefied Natural Gas (LNG),
ethanol, heavy oils, tar sands, and an increasing number of other
oil-substitutes.
Essentially the part of the network called “Sweet Light Crude” turned red, so
we started connecting the "Oil Network" to other networks.
We connected oil to the “food” network by turning food into ethanol. Actually,
food was already connected, because you need oil to make food in the modern
world, but now the circle is complete: previously we used oil to create
food, and now we use food (corn, sugar, palm oil, etc.) to create oil (or
oil-substitutes).
Adding LNG and CTL (Coal-To-Liquid) to the network connects oil to other
energy sources. As this connection strengthens and load starts to be applied,
a shortage of any of these sources would have an impact on each of the other
sources. To some extent, this has already started to occur.
Adding tar sands and various other oil substitutes to the network has made a
surprising [...]
[...] computer’s blue cable isn’t likely to run hot, but our finance
system is a network of networks, and it is glowing red. In addition to
monitoring and communication, the financial system provides support for
maintenance and upgrades of the energy systems, so capacity in the financial
system is critical.
When one part of the network develops a problem (say production of LNG
suddenly drops), then messages get sent via the financial system (in the form
of increased prices), and the other parts of the system accept the load, if
they can, by increasing production. When compared to an Internet Protocol
network there are many faults in this system. High latency leads to slow
responses. Poor monitoring leads to conflicting signals or a failure to
detect faults. Bad messages are often not corrected, leading to incorrect
responses, and so on.
The speed of a crash
The interesting point to note is that increasing demand past capacity will
not immediately “crash” this system. Oil facilities that are working at
capacity will not “crash” if demand exceeds the capacity, they will simply
continue working at capacity. The crash may come, but it will come because
demand heats up the financial system and crashes other systems that depend on
finances. Since the oil production system is dependent on other systems, this
could conceivably cause an eventual crash. Eventually lack of maintenance
will degrade the capacity, but this is a process that occurs over a period of
months or years.
Likewise, the process of adding capacity is exceptionally slow. Building CTL
or NGL plants takes the best part of a decade.
The oil production system can certainly crash, but it would be a crash in
slow motion.
The only part of the system that can crash quickly is the financial system.
The financial system provides monitoring, communication, maintenance and
upgrades. So a profound, complete crash in this area could conceivably bring
down the whole network.
However, could such a financial crash occur? An immediate halt to oil
production would require a crash far more profound than the Great Depression.
The response speed of our financial system has been improved by linking many
of the sub-systems electro[...]
[...] breakdown of our financial institutions is unlikely
to happen overnight.
If this system crashes overnight, it will be because the plug got pulled – a
breakdown of society external to the system.
The natural frequency for events in the oil and oil-substitute network is in
the range of months at least, or more likely years. Internal stresses cannot
cause it to crash overnight.
The breadth of a crash
The breadth of the crash depends on the degree of linkage and the degree to
which each part of the network is loaded. This is where I start to worry.
Oil appears to be at or near peak capacity: exports are dropping. As for the
food network - world grain reserves are at historic lows, and expected to
drop a little more next year. And the environment? Climate change is clearly
with us, indicating that the environment has already gone past its capacity.
When looked at in these terms it appears that the network is already in
decline. Each of these three parts of the network is at or past capacity. If
a span of years is the natural time-frame for a crash in this system, then it
seems quite plausible that we are watching a very broad-based crash of our
energy systems, right now.
Our actions in increasing the connections to the food and environment
networks will not help, and may simply speed the crash.
The signals indicating the start of a crash would be seen in the monitoring
and communication system: the financial systems. Prices for oil would go up.
Which we have seen... Prices for food would go up. Which we have seen... We
might expect perturbations, volatility, and attempts to “price” the
environment... Hmmmm.
Conclusion
I am forced to concede that a broad-based collapse is a possibility. I still
maintain that a sudden collapse is unlikely, but if it is already happening,
then it could certainly look sudden when we eventually notice it.