Monday, May 03, 2010

Software for Simulation and Learning

Software is usually written for effect. We want it to do something for us. But there is another way to look at software. A program becomes a kind of rather dry textbook on its subject, with the extra benefit of being runnable. When writing this "textbook" we need to learn a lot about the domain of the system, sometimes more deeply and precisely than even experts in the field need to. They can often get away with handwaving, but there is no way to wave your hands at a computer. Well, you can wave them all you like but it won't do you any good.

Once you've figured the domain out, the knowledge has been made explicit in code for others to read. Many intellectual fields stop there, with a textual description. A textbook. The magic of programming is that the program can also run. You can play with it, study its behavior, write unit tests illuminating standard and special cases. The program is not only a formal description, it is also a simulation, a runnable model of some domain, a toy universe. This duality is a very powerful and unusual teaching tool.
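To make the "executable textbook" idea concrete, here is a minimal sketch using the Gregorian leap-year rule as a stand-in domain (my example, not one from the post): the function is the formal description, and the assertions double as unit tests illuminating standard and special cases.

```python
def is_leap_year(year: int) -> bool:
    """A year is a leap year if divisible by 4, except century years,
    which must also be divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# Standard case: ordinary years divisible by 4 are leap years.
assert is_leap_year(2012)
# Special case: century years are NOT leap years...
assert not is_leap_year(1900)
# ...unless they are divisible by 400.
assert is_leap_year(2000)
```

Unlike the prose statement of the rule, this version can be run, probed with new cases, and caught out if it is wrong.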

Gerald Sussman, of Scheme fame, has expressed this much more eloquently than I can in his lecture "Why programming is a good medium for expressing poorly understood and sloppily formulated ideas" (unfortunately only available to ACM members).
I had the good fortune to see Professor Sussman deliver the lecture at OOPSLA 2005.
Sussman borrowed the title from a 1960s Marvin Minsky paper. The idea is that a poorly understood and sloppily formulated domain will become illuminated and stringently formulated by expressing it in code. A domain that is already well understood and can be expressed formally is also best described and taught in the form of software. Sussman has co-written two works incorporating these ideas: "The Structure and Interpretation of Computer Programs", which contains software simulations of digital electronics and of a register machine (a simple model of a computer), and, more explicitly, "The Structure and Interpretation of Classical Mechanics", which is about advanced mechanics.
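SICP builds its circuit simulator in Scheme, complete with wires and propagation delays; the following is a much-simplified sketch of the same idea in Python (gates as plain functions over 0/1 signals, my simplification), showing a half adder derived entirely from NAND.

```python
def nand(a: int, b: int) -> int:
    """The primitive gate; every other gate below is built from it."""
    return 0 if (a and b) else 1

def not_(a: int) -> int:
    return nand(a, a)

def and_(a: int, b: int) -> int:
    return not_(nand(a, b))

def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))

def xor(a: int, b: int) -> int:
    return and_(or_(a, b), nand(a, b))

def half_adder(a: int, b: int) -> tuple:
    """Returns (sum, carry) for two one-bit inputs."""
    return (xor(a, b), and_(a, b))

# Running the model recovers the truth table -- the "textbook"
# both states the design and demonstrates it.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, half_adder(a, b))
```

The claim "every gate can be built from NAND" is no longer an assertion on a page; it is checked every time the simulation runs.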
There are signs that this idea is spreading: the SEC is apparently considering expressing its regulations as Python code in the future.
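To illustrate what "regulation as code" might look like, here is a hypothetical sketch. The rule below is entirely made up for illustration (it is NOT any actual SEC regulation): a disclosure threshold expressed as an executable, testable predicate instead of legal prose.

```python
def must_disclose(ownership_fraction: float, is_insider: bool) -> bool:
    """HYPOTHETICAL rule, invented for illustration: disclosure is
    required above 5% ownership, or above 1% for insiders."""
    threshold = 0.01 if is_insider else 0.05
    return ownership_fraction > threshold

# The edge cases that legal prose leaves ambiguous ("above" vs
# "at or above") are pinned down and testable.
assert must_disclose(0.06, is_insider=False)
assert not must_disclose(0.05, is_insider=False)
assert must_disclose(0.02, is_insider=True)
```

Whatever form the SEC's actual effort takes, the appeal is the same: the code version has no room for handwaving.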

All programmers have heard that code is read many more times than it is modified, that you should code for humans, not for the compiler, etc. This advice is given with the intention of making programs maintainable, not with the idea that software is an important knowledge repository and teaching tool in itself. In fact, it is often the ONLY authoritative description of an organization's business rules, for instance, because what is in the software is what is actually executed. The only people who really understand a modern business in detail may be the programmers, who quite often are contractors on another continent. They are the only ones exposed to the learning opportunity that the duality of formal description languages and executability provides.

The world is not just about business, there are also conspiracy theories! :) I think a few of the more bizarre ones could be laid to rest if a detailed open source multi-simulation of an Apollo moon landing were available.

There are fields where simulation and modelling via software have been embraced wholeheartedly: Economics and Climatology. Unfortunately they are as "poorly understood" as it gets. The systems they study are vast, complex, full of feedback loops, and work on time scales of years and sometimes decades. Expressing a poorly understood idea formally can give the model an appearance of precision it does not deserve. "There's no sense being exact about something if you don't even know what you're talking about", as John von Neumann supposedly said.

It is so bad that even practitioners in these fields can't distinguish between model and reality. People talk about efficient markets as though they really exist and regard Modern Portfolio Theory as prescriptive, not as a model dependent on certain assumptions. Extrapolated data from climate models are used to decide the future of the world. On a smaller scale, over-reliance on models caused problems when the volcanic eruptions in Iceland in early 2010 led to most of Europe's airspace being shut down... not based on actual measurements of dust in the atmosphere but on models predicting how the dust should spread. When KLM and Lufthansa actually performed test flights in supposedly particle-filled areas there was no problem.

For a software model of a domain to be useful it must be testable against some kind of reality. When the domain is digital electronics or classical mechanics this is done by experiments. In a business environment it is achieved by constantly talking to experts and users, and demoing to them. As for climate, we'll just have to wait a few hundred years.
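As a toy illustration of testing a model against a known answer (my example, using classical mechanics where an exact solution exists), here is a crude Euler-method simulation of free fall checked against the closed-form result. In a messier domain, the "exact" value would come from experiment instead.

```python
def simulate_fall(t_end: float, dt: float = 1e-4, g: float = 9.81) -> float:
    """Distance fallen after t_end seconds, by naive Euler integration."""
    v, x, t = 0.0, 0.0, 0.0
    while t < t_end:
        x += v * dt
        v += g * dt
        t += dt
    return x

# Analytic solution: x = g * t^2 / 2
exact = 0.5 * 9.81 * 2.0 ** 2
simulated = simulate_fall(2.0)

# The model earns its keep only if it agrees with "reality".
assert abs(simulated - exact) < 0.01
```

The assertion is the point: a simulation without such checks is just an opinion with extra steps.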

In order to build a Sussmanesque executable encyclopedia of computable knowledge, I believe it would be a good idea to look to an existing successful encyclopedia: Wikipedia. Coincidentally, Jimmy Wales also gave a talk at OOPSLA 2005. Open source, backed by a strong community, representing many views, and insisting on verifiability is the way to go.