If a software project is big enough to be “nontrivial” (as programmers say), a company needs not only a system for managing the quality of the engineering product, but also a system for managing the quality of the process – that is, management of management. This meta-management has become a big business in its own right, with companies that do nothing but sell processes to software companies, and it involves trying to prove that certain practices and ways of making decisions are likely to produce better outcomes than others. The work process is itself a product, complete with advertising and brand loyalty.
Trends
Here are some of the trends since the 90s that set the context for what is happening now in the industry:
- Systems are being scaled up (more simultaneous users), are more exposed to hacking, and have higher uptime requirements, so they require more people to keep them running.
- New systems don’t appear to be any more complex, but they often hold more records because companies are consolidating into near-monopolies. Databases are much larger, but the complexity is the same.
- New systems are more business-critical. Earlier, a business often had a paper backup or a way to work without the software system, but now the company relies on it with no alternative.
- The total number of people involved in automation has exploded. The nerdy prodigy type historically associated with programming is now a small minority; there are not enough of those people to go around. Most people in IT departments might not even be naturals at the job, and many are extroverted, not terribly exact about logic, and much like society at large.
- More people’s demands are perceived to be important – more stakeholders. In particular, non-technical people are now deciding on the look and feel and flow of systems, and there is a noticeable degradation in internal consistency across business applications.
- In terms of hardware, tools and platforms, the main change is that speed and memory are much, much cheaper. Contrary to popular belief, there has been little innovation in operating systems, databases, and languages compared with the 80s and 90s.
Meta-management works with these trends and also aims to prevent the worst problems of the past. Two of the most serious and common problems have been:
- A developer suddenly disappears and no one else knows how to change the system, so it can no longer be changed.
- A large project goes way over estimated time and cost, and by the time it is done (if ever), it no longer matches what the company actually needs.
In order to solve those problems, we now have “agile” processes. While there are competing variants, they all aim to solve those two major problems through teamwork (no one person should be irreplaceable) and short planning horizons (projects cannot be months long by design). The idea is to work on very small increments of improvement to a working system and leave the system in a working condition all along, instead of going into the back room and coming out months later with a big finished system. It ensures that a company is getting a continuous stream of value in return for a continuous stream of investment. It is intended to eliminate the risk of losing all the investment due to the two problems above.
Sabotage
In my observations of company productivity, I see about 20% productive people, 60% deadweight, and 20% malicious or destructive people. It does not appear to be possible to change these ratios, because useless and destructive people often have greater powers of persuasion than productive people, and thus they appear essential. Usually they honestly believe they are contributing.
One of the most important meta-management concerns is whether any one person can sabotage a project. Each of the 20% destructors will sabotage it if they can. What we want is a system that limits the misinformed or malicious to slightly slowing progress or making a feature slightly worse, but never to sabotaging it completely.
Different company organizations offer different ways to sabotage projects:
- In pyramid-shaped organizations that work primarily by delegation, every project gets subdivided into tasks and delegated, then further subdivided and delegated, and so on, making every person accountable to their superior in the pyramid. If any one person is a destructor, their part fails and the whole system fails. The destructors can be identified, but perhaps too late. For the same reason, productive people can do a lot and be recognized for it. This structure is widely considered obsolete.
- Team-based organizations are an answer to the pyramid problem: everyone is replaceable and the productive 20% accomplishes everything by going around everyone else, leveraging a great many more connections between people – not just up and down the chain of command. Sabotage is thwarted by ensuring that no one person has any substantial power, and only teams have collective power. Destructors are difficult to identify, as are producers. An especially productive person is held back to the slower pace of the team.
- Team-pyramids are a way to combine the worst aspects of teams and pyramids. Authority is delegated down through levels in a pyramid, but the engineering product is not passed back up through that pyramid. The product building is done as with teams, but the decisions are made in pyramids. This system allows anyone to sabotage the project with no accountability, since the producers are not allowed to work around the deadweight and destructors. My last job recently switched to this system (unknowingly, I suspect), and the ability of anyone to accomplish anything seized up like an engine without oil.
Other meta-management problems
A key problem of meta-management as a supposedly legitimate field of study is that it requires longer term knowledge than almost anyone has. Most people in the industry are still in their first 10-year project, and hardly anyone lives long enough to have done enough 10-year projects to be able to compare them and have personal experience of which process works better. Therefore most of the claims made about meta-management cannot really be substantiated by experience.
Related to that legitimacy problem is the bizarre worship of process that goes on. Like anything based on faith alone, people become frenetic adherents to their chosen “agile” process, and evangelize it with a giant internal vocabulary. There are normal-sounding terms like “runway”, “backlog” and “standup” – and esoteric words (sometimes bordering on religious doctrine) like “kanban”, “scrum”, “manifesto” and “epic”. The words can be used as an appeal to a higher authority to prove a point when that point cannot be supported by common sense or the normal use of language.
In that last company, a lot of people were as sure as the sky is blue that the process they were trained in was going to work, even when they had never personally done that process successfully. Many of them had never accomplished even one thing independently, since their whole careers were in teams that did not expose whose contributions were relevant. Some of the deadweight people had a very faith-based allegiance to certain aspects of process, and the less experience they had with it working, the more evangelical they seemed to be about it.
Deep in the house of mirrors, there are blatantly nonsensical and even reality-defying beliefs. My “favorite” one is the belief that software bugs fall into one of two categories: either a bug is critical and the whole team stops all other work and fixes it immediately, or it is non-critical and should never be fixed or even written down anywhere. Anyone who has used software knows there is really a whole range of severity: some bugs make a product unusable, while others only make it annoying, slow, confusing or risky in some way. Anyone who hasn’t drunk the Agile Kool-Aid can see that some bugs obviously need a medium, non-emergency priority, but that common-sense position had not been canonized and you were not supposed to believe it.
Actual agility
The term “waterfall” refers to software development processes that cannot be reversed or changed mid-course. An extremely waterfall-ish approach would do all the planning up front, then do all the development in a back room without communicating with the users, and then consider the product “done” once the contract terms are met, even if it does not work or does not meet expectations.
An agile approach is one that by contrast continuously checks in with users and is capable of adapting quickly. Ironically though, a waterfall approach can often be quicker and more agile than one labeled “Agile”, and the reason has to do with what I call “chunk size”.
A large chunk size means attempting a plan-build-test cycle that covers the whole project in a single cycle. For a large project, people are not able to plan and communicate everything accurately, and it can fail just because the chunk is too large. At the other extreme, if the chunk is too small, each plan-build-test cycle might include only one micro-feature, and it ends up taking years to develop a useful product. Very small chunks also produce inconsistency in how the product is used, as different people push the user-interface paradigm in different directions at once.
As an aside, the process of building something, no matter what size the chunks are, must have three general stages: first, understanding what we want to accomplish (requirements); second, building it in the back room; and third, showing what was built and then evaluating, testing, fixing, and integrating it. With a large chunk, the back-room part might be months long; with a small chunk it might be only hours long. In any case there must be a back-room period when no communication is occurring, because it is a technical creative process that requires focus, and because it is essential to commit to something and complete it instead of leaving everything continuously up in the air.
As I saw in my last company, the plan-build-test cycle itself can be disintegrated or defined out of existence. In those conditions, management thinks that the cycle can be so short that there is no back-room period at all, and that all work can be accomplished with continuous communication going on. But that makes the work stop completely.
The ideal chunk size is one that is about the same size as (and not more than 10% larger than) the most recent successful chunk done by the developer. So if that person is comfortable with a 2-week long chunk and has demonstrated that ability, then it will work. If the person is only ready for a 1-day chunk, that is the appropriate size. Anything larger is too waterfall-ish, and anything smaller is too slow. Picking the right size yields the most actual agility.
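To make the rule of thumb concrete, here is a minimal sketch of it as a check. The function names are hypothetical, and measuring chunks in working days is an assumption; the only figure taken from the text is the 10% ceiling. It also assumes that anything smaller than the last successful chunk counts as “too slow”, reading “about the same size” strictly.

```python
# Hypothetical illustration of the chunk-size rule of thumb.
# Assumption: chunk sizes are measured in working days.

MAX_GROWTH = 0.10  # from the text: no more than 10% larger than the last success


def max_next_chunk(last_successful_days: float) -> float:
    """Largest chunk the rule allows after a demonstrated success."""
    return last_successful_days * (1 + MAX_GROWTH)


def classify_chunk(proposed_days: float, last_successful_days: float) -> str:
    """Label a proposed chunk relative to the developer's last successful one."""
    if proposed_days > max_next_chunk(last_successful_days):
        return "too waterfall-ish"
    if proposed_days < last_successful_days:
        # Assumption: smaller than the demonstrated size wastes capacity.
        return "too slow"
    return "about right"


if __name__ == "__main__":
    # A developer who last delivered a 2-week (10 working day) chunk.
    for proposed in (5, 10, 11, 14):
        print(proposed, classify_chunk(proposed, 10))
```

For the developer in the usage example, an 11-day chunk stays within the limit, while a 14-day chunk is already too waterfall-ish.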
Can it work?
Part of my reason for writing this is to debrief myself on what happened in that last company (which was my first and only big-company job) and to ponder whether large teams can really build software at all. Most of that company’s software had been written by a few people before they started calling themselves “agile”. After that shift, they seemed to start spending more money while work slowed down. The tech giants all seem to be growing and slowing in the same way. The software I love to hate the most is healthcare.gov, a fiasco of waste ($500 million) and slowness; had it been written before Agile was doctrine, I think the contract would have been terminated without pay and a different supplier chosen, which would have been faster in the end. All of this makes me doubt the new meta-management thinking.
My prediction is that someone with clout will come along one of these years and declare Agile to be dead, and the industry will shift to something similar with new vocabulary and the same problems. But what would REALLY work?
The things we cannot change are:
- the 60% deadweight problem (the sector overpays because the demand for workers is so high, so people who are not naturals are swept in and cannot contribute much)
- the 20% destructor problem
- the fact that most people (even the contributors) value their careers over the product and will lie about their abilities and backstab anyone without allies
- the newness problem (most people have never completed a large project cycle)
The newness and non-naturals problems mean most people are in over their heads; that anxiety combined with self-protective human nature is not a great combination for making good decisions. I don’t know how anyone could change this starting point, so we need meta-management that assumes these problems will always exist.
In my last company, as productivity and quality were perceived to be declining, decision making tended to become more centralized as a stress-induced reaction, and that further slowed productivity in a vicious cycle. As someone who has actually gone around more than one 10-year cycle of large application building, I would now consider myself qualified for technical leadership, but I had to complete many small projects independently, and some large ones, before I felt ready for that. Those who did get decision-making power at that company appeared to feel qualified without having had that length of experience. There was a general sense of operating on theory alone – things were declared to be the right way because of Agile (or some other appeal to outside authority), not because the decision-maker had any practical experience with them.
The universal rule is that if sabotage is possible, someone will do it, generally by preventing those who are the most qualified to make decisions from making them. Thus it is essential that any process is built around having many channels of communication and no single person who can restrict the communication or decisions. Statements like the following must be impossible: “Everything has to go through me”. “So-and-so needs to be at that meeting”. “We need buy in/approval from so-and-so for this request.”
So, to conclude, here are three ideas for structures that might work, at least better than the “agile” ways I’ve seen:
- The pyramid of tiny teams: This is essentially an old-fashioned pyramid of delegation, but each node in the structure is a tiny team of 3-5 people who are collectively accountable for delivering their part. The most successful tiny teams are given larger chunks and get newly formed tiny teams under them, which they can delegate to. It might prevent any one node from sabotaging the whole, because each team is likely to include at least one producer.
- Decisions by duplication: Instead of holding up engineering work to resolve conflicting points of view, the company proceeds to develop the product in multiple independent teams and then uses only the most successful result.
- Chaos plus portfolio: This is a system where no one has fixed roles and the only management is periodic evaluation of what each worker has accomplished. It relies on the notions that people like to do engineering and that they will naturally learn to become more effective when given a lot of freedom.