On Conway's Law and Tech Debt
Often, I see product teams build up architectural tech debt without a clear understanding of why the original architecture choices were made in the first place, or whether the new choices will truly be sustainable. This is because effective software and system architecture depends much more on the culture and communication within the organization, and much less on the usage patterns of the product.
Melvin E. Conway once asserted that:
“Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization’s communication structure.”
In my experience, this also works in the other direction: teams that are new enough or small enough may structure their own communication around their desired technical architecture.
Regardless of which direction it flows, this alignment between team structure and technical architecture is usually intended to ease and speed up development. This implies that any organization going through a change in team structure or dynamics is likely to find a heap of tech debt on the other side of those changes. After all, an architecture designed to support rapid development on one team can often have the opposite effect on a different team.
Enough theory. Let’s dive into real-world scenarios I have seen:
A mid-to-large-sized organization is forced to divide its engineering teams into many small units. The org hears that by making each unit responsible for its own technical service, it can iterate faster, because less communication and fewer approvals are required between teams. So the org adopts a microservices architecture. At first, iteration on each individual service is very rapid, but the org eventually finds that large initiatives spanning multiple services are slowing down. Tired of this, the org decides to make its teams more cross-functional so that larger initiatives are easier to achieve. Before the org knows it, a technical architecture that once had strict boundaries between microservices begins to favor a compromise between its past and present: uniformity via monorepos, shared databases and schemas, and so on.
A small startup is given a lot of free AWS credits and, primarily for that reason, decides to build and run everything cloud-native. This forces the startup to hire folks who are deeply experienced in cloud technologies, specifically AWS. That decision then shapes how the team architects its product and how it communicates. Everybody gets used to fast iteration in the cloud. Unfortunately, the startup later realizes that regulations in new verticals it wants to enter require servers to run in specific regions where AWS does not yet operate. The startup and its engineers now need to choose among hiring folks who specialize in other cloud or even on-premises environments, retraining existing folks, or sticking with the old architecture and forgoing the new business. In the first two cases, the way the team communicates will inevitably change, because the way it operates is forced to change.
A growth-stage organization has frontend and backend teams. These teams collaborate, but the divide between them leads to slowdowns. They adopt a GraphQL API as part of their architecture because it empowers the frontend team far more, leading to quicker iteration on the customer-facing user interfaces. Eventually, both teams realize that with the frontend team empowered, the frontend folks need a bit more backend experience, and the backend folks need to be more responsive to the questions and needs of the frontend folks. The teams begin to collaborate far more than before and come to question why they decided they needed GraphQL there in the first place. Maybe they could have just communicated in a more cross-functional manner from the start?
A small organization has a team whose members work closely together, and there is very little bureaucracy around selecting and using vendors within its products. The team takes advantage of this, iterating quickly by relying on a myriad of vendors rather than coding everything from scratch. As the org grows, it realizes it needs a bit more process and control (something about security or compliance, the executives decide). The engineers begin to prefer custom-coding their own solutions over adopting new vendors. It’s not that custom-coding is the fastest path; it’s just faster than getting a new vendor vetted and approved. The architecture starts to look a lot more proprietary than before.
These are just a handful of the many scenarios I’ve seen play out. In each case, what you’ll find is that the culture and communication of one or more teams within an organization are inextricably linked to the technical architecture decisions those teams make. As the organization those teams work for changes, or as the teams themselves change, the architecture must be reevaluated. This delta is felt as a form of tech debt, but it is often expressed as the technology being too old, too difficult to change, or too fragile.