Star Ford

Essays on lots of things since 1989.

Code forests

This paper is about a layering paradigm for enterprise-scale software called a “code forest”. The paradigm is a tree-shaped database of the elements that compose the code base, with their full content and revision history. Developers edit the database rather than the file system. I will get into details on what that means, but first I want to start with a list of problems that the approach improves upon.

Why we need code forests

In my examples I’m using C#-like code and terminology, but the concept is the same with Java or any language designed for enterprise-scale software. In this context “enterprise” means possibly millions of lines of code and multiple tiers with overlapping legacy and new products.

Some problems with today’s large code bases:

  • Depending on the language there are now three or more competing naming systems permeating the code base. These include the class names and optionally hierarchical namespaces of classes; names in the file system; and names of folders, projects and “solutions”. The only reason for the complication is the history of adding tools on top of other tools; it is not needed. A code forest only has one naming scheme.
  • Developers are usually forced to deal with source files. The base unit in programming and compiling is traditionally the file, but it does not have to be that way. A source file is not a meaningful concept in the compiled product. A code forest allows you to work with named elements in the forest, not files.
  • Developers are currently forced to deal with deployment considerations when writing classes. A code forest allows deployment decisions to be made completely separately.
  • The skills and other team characteristics needed to manage a code base are different from the skills needed for writing classes and methods. A code forest helps teams do forest management as distinct from code quality management.
  • Documentation and understanding often decline as code bases get bigger. A code forest organizes code with documentation about code structure in the same place as the code itself.
  • Unwanted dependencies and friend dependencies creep into code bases as teams get bigger. A code forest makes creating dependencies an action that you have to take explicitly, so there can be no dependencies creeping in accidentally.
  • Source control is standard practice now, but technically it is an optional layer on top of a non-source-controlled system, and it can be broken, avoided, worked around, misused, or not be fully integrated with the development tool, leading to complications. A code forest is source control to begin with, so it is impossible to not use source control with it.
  • Developers can spend a great deal of time recompiling “the universe” of code when only one line changed. The compiler is usually too unaware of the code layering to optimize away unnecessary work. A code forest allows compiling to be based on changes only.
  • Visibility of class members is not flexible enough, with the options of public, private and protected members. Sometimes you need visibility in more complex ways, but we end up making too much public. Code forests control visibility exactly.
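
The point about compiling from changes only can be illustrated with a small sketch. It is written in Python rather than the C#-like notation used elsewhere in this paper, and it is only one possible approach under stated assumptions: the `dependents` map, the function name, and the element names are hypothetical, and a real tool would also need to decide whether a change actually alters what dependents see.

```python
from collections import deque


def elements_to_recompile(dependents, changed):
    """Return every element affected by a change.

    `dependents` maps an element name to the names of the elements that
    explicitly depend on it (an assumed shape for the dependency map).
    """
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for user in dependents.get(current, ()):
            if user not in affected:
                affected.add(user)
                queue.append(user)
    return affected


# Hypothetical dependency map for a tiny forest.
dependents = {
    "PersistentData.Person": ["PersistentData.Team.Members", "UI.Person"],
    "UI.Person": ["UI.ThisUser"],
}

result = elements_to_recompile(dependents, ["PersistentData.Person"])
```

Because every dependency is recorded explicitly, a change to one element touches only the elements reachable from it in the map; everything else can be left alone.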

Basic definition of the code forest

A code forest is a tree, or more precisely a set of trees with any number of roots, of nodes (which I’ll call “code elements”) along with the revision history of each element and a map of all dependencies between all elements. It can also include the concepts of code branches, commits, and other source control features.

Each element is composed of an expression in the form “visibility-spec name = element-definition” and a separate area for typing definitional or contract comments. Some example elements are shown here:

  • public A = 3
  • B = int (string s) { return s.Length; }
  • visible C = class { … }

The examples show a variable element, a function element, and a class element, respectively. The class element will have child elements inside it. The only type of element that allows fairly long definitions is the function body. Since most functions are ideally less than 20 lines, that means source control is operating on much smaller units than we are used to.
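
As a rough illustration of an element that carries its own revision history, here is a minimal sketch in Python. The class names, fields, and the `edit` method are all assumptions made for the example; the paper does not prescribe a storage format, and the definition is kept as plain text.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    definition: str  # text on the right of the equals sign
    author: str      # hypothetical metadata; a real system would store more


@dataclass
class CodeElement:
    name: str
    visibility: str = ""  # e.g. "public", "visible", or "" for the default
    comment: str = ""     # definitional or contract comments
    revisions: list = field(default_factory=list)
    children: dict = field(default_factory=dict)  # child elements by name

    @property
    def definition(self):
        """The current definition is simply the latest revision."""
        return self.revisions[-1].definition if self.revisions else None

    def edit(self, definition, author):
        """Every edit is recorded, so history cannot be bypassed."""
        self.revisions.append(Revision(definition, author))

    def expression(self):
        """Render the element as 'visibility-spec name = element-definition'."""
        prefix = f"{self.visibility} " if self.visibility else ""
        return f"{prefix}{self.name} = {self.definition}"


a = CodeElement("A", visibility="public")
a.edit("3", author="alice")
a.edit("4", author="bob")
print(a.expression())  # public A = 4
```

In this sketch the only way to change an element is through `edit`, which is one way to read the claim that a code forest is source control to begin with.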

You may be questioning the function and class syntax. I am not concerned with exact syntax in this paper. There are many function syntaxes, and the one used here is chosen simply because it puts the name on the left of the equals sign so it is consistent with all other definitions. We are assuming that the type of any element is unambiguous from the definition, so in the example, A is known to be an integer. Classes also use the name = class syntax.

To organize millions of lines of code, one can think of all those millions of lines in one giant file with a lot of nesting. Of course you would not display it that way because of its size, but that is one logical way to display it. Replacing brackets with indented bullets to indicate the tree shape, that would look like this example:

  • PersistentData =
    • public Person = class
      • Name = “”
      • IsSally = bool () { return Name == “Sally”; }
    • Team = class
      • Members = new List<Person>;
  • UI =
    • Person = PersistentData.Person
    • ThisUser = (Person)null;
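
The same example can be held as nested data and addressed by a single dotted name per element. The sketch below is in Python and is only illustrative: the tuple layout and the `lookup` and `all_paths` helpers are assumptions, and the definitions are stored as uninterpreted text.

```python
# Each node is (definition, children); organizational nodes have no definition.
forest = {
    "PersistentData": (None, {
        "Person": ("class", {
            "Name": ('""', {}),
            "IsSally": ('bool () { return Name == "Sally"; }', {}),
        }),
        "Team": ("class", {
            "Members": ("new List<Person>", {}),
        }),
    }),
    "UI": (None, {
        "Person": ("PersistentData.Person", {}),
        "ThisUser": ("(Person)null", {}),
    }),
}


def lookup(forest, path):
    """Walk a dotted path from the roots and return its (definition, children)."""
    children = forest
    node = None
    for part in path.split("."):
        if part not in children:
            raise KeyError(f"no element named {path!r}")
        node = children[part]
        children = node[1]
    return node


def all_paths(forest, prefix=""):
    """List the full dotted name of every element, depth first."""
    paths = []
    for name, (_, children) in forest.items():
        full = prefix + name
        paths.append(full)
        paths.extend(all_paths(children, full + "."))
    return paths


definition, _ = lookup(forest, "PersistentData.Person.IsSally")
names = all_paths(forest)
```

Here `names` holds one entry per element, which is the sense in which the forest has only one naming scheme: there are no file, folder, or project names alongside it.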

A team of developers can be branching, editing and merging elements all at the same time. The editable unit is the element; there is no need to “check out” or edit whole classes as a unit.

Layer views

You can also look at a code forest visually, showing boxes for the organizational classes and arrows denoting dependencies. Here is an example:

The example comes from an earlier paper, “Megaworkarounds”.

The advantage to this kind of view is that it shows how the code is layered. Tools can also allow you to draw layers and drag elements to change the structure of the code base. For example, you could draw a box around a number of functions dealing with the same thing in an overly complex class and create an encapsulating layer.


Complete and incomplete covers in engineering

I confess I have been irritated my whole life about car dashboard controls for heating and cooling because they are an incomplete cover for the complexity that is going on inside. It has been a rough few decades for user interface enthusiasts!

What is a complete cover? It is a layer or shell over some machine complexity that completely hides it and does not let any of the complexity out. A cover is incomplete if it forces you to understand what is going on underneath, or if it is confusing when you do not understand, or if the cover is insufficient to operate all aspects of the machine. A cover can be thick or thin – the thicker the cover, the more it changes the paradigm of the machine interaction. A cover is optimal when it is complete, regardless of whether it is thick, thin, or absent. Sometimes it is optimal to have no cover.

I will explain this with some examples, starting with a mechanical mercury thermostat. There are three kinds of people in relation to these devices: (1) those with a gut fear reaction when they look at dials and numbers; (2) those who understand the two exposed dials – measured temperature and set point – but do not know or care how it works inside; and (3) those who understand that the rotation of the temperature-sensitive coil, which is superimposed on the rotation of the set point, tips a mercury switch, that the bi-stable 2-lobed shape of the mercury chamber affects the temperature swing, and why mercury is used in the first place. It is a lovely thing, but not really within the scope of this paper. I am mainly concerned with the middle category of people, who are functional operators of the cover, and with what kind of cover it is.


The thermostat is a complete cover because you can operate every aspect of the heater with it, without needing to know how it works. It is also a fairly thick cover in the sense that it translates one paradigm to another. The actual heater requires an on/off switch to work, thus the only language it understands is on/off. But the thermostat exposes a set point to the user. It translates the language of on/off to the language of set points. Someone could replace the whole heater and wiring with a different inside paradigm but leave the exposed paradigm there, and the user would not need to know that anything changed, because the operation stays the same. In many systems, especially software systems, the replaceability of layers is an important design point, and complete coverage is one of the factors that makes it possible.
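
In software terms, the thermostat might be sketched as below. This is a simplified Python illustration, not a model of a real device: the class names, the `update` method, and the `swing` parameter (standing in for the temperature swing of the mercury switch) are assumptions made for the example.

```python
class Heater:
    """Inner machine: the only language it understands is on/off."""

    def __init__(self):
        self.is_on = False

    def switch(self, on):
        self.is_on = on


class Thermostat:
    """Cover: exposes a set point and translates it into on/off commands."""

    def __init__(self, heater, set_point, swing=1.0):
        self.heater = heater
        self.set_point = set_point
        self.swing = swing  # assumed band around the set point, in degrees

    def update(self, measured):
        """Compare the measured temperature with the set point."""
        if measured <= self.set_point - self.swing:
            self.heater.switch(True)
        elif measured >= self.set_point + self.swing:
            self.heater.switch(False)
        # Inside the band the heater keeps its current state.


heater = Heater()
thermostat = Thermostat(heater, set_point=20.0)
thermostat.update(18.0)  # well below the set point, so the heater turns on
```

The user of `Thermostat` touches only `set_point`. Any object with a `switch` method could replace `Heater` without the user noticing, which is the replaceability that a complete cover makes possible.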


Appliance model of computing

This longish article about what is now called cloud computing was ahead of its time in a way, written with none of the new terminology. It’s still ahead of the implementation, and I keep thinking I’ll implement some of it. In short, it describes an internet operating system (i-glue) based on blackbox standards, rich protocols, and replaceable appliances.
