The Collaboration Leak | Article 1: The Tribal Knowledge Silo

I used to think having a “Go-To Expert” was a sign of a strong team. I felt secure knowing that if the deployment pipeline broke, I could just “Ask Dave.” If the legacy database acted up, I could just “Ask Sarah.”

I was wrong. What I actually had was a Cache Miss at organizational scale.

The Local Cache Problem
In a computer, the CPU has a “Local Cache” (L1/L2) that is incredibly fast but tiny. It’s meant for temporary data. On my team, the “Local Cache” is an engineer’s brain. It’s where they store the 3D map of the code they are currently writing.

The Tribal Knowledge Silo happens when I allow permanent system logic—the “Why” and the “How”—to stay stored in that Local Cache instead of being written to Shared Memory (the Documentation).

  • The Single Point of Failure: Dave isn’t just an expert; he is a Serial Bottleneck. When he’s in a meeting, sick, or just focused on his own work, the rest of the team hits a “Block” the moment they touch his code.
  • The Synchronous Request: Because the info isn’t in the docs, an engineer has to send a “Ping” to Dave. This is a Synchronous Blocking Call. Dave has to stop his CPU (Focus Leak) to answer, and the requester has to sit idle until he does.
  • The Data Corruption: Every time Dave explains the logic verbally, a little bit of the “truth” is lost in translation. Over time, the team’s understanding of the system starts to drift from the actual code.

Why the Throughput is Dropping
I see teams where 30% of the “work” is just people asking each other where things are. This is High-Latency Information Retrieval. We hire Senior Engineers to solve new problems, but we force them to spend half their day acting as a “Human Help Desk” for things they built two years ago.

The team feels slow not because they can’t code, but because they are constantly waiting for a “Memory Read” from a human who is busy doing something else.

The Patch: The Shared Memory Protocol
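To fix this, I have to change the team’s “Write Policy.” We have to move from “Write-Back” (keep it in the brain until something forces a flush, usually a colleague’s question) to “Write-Through” (every change is written to the docs at the same time it is made: if it isn’t documented, it isn’t finished).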
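For readers who haven't met the two write policies, here is a minimal sketch of the difference. It is a toy model, not a real cache: the class names, the `docs` dictionaries, and the sample note are all made up for illustration.

```python
class WriteBackCache:
    """Keeps writes local; shared memory only sees them after flush()."""

    def __init__(self, shared):
        self.shared = shared   # stands in for the team's documentation
        self.local = {}        # stands in for one engineer's head
        self.dirty = set()     # keys written locally but not yet shared

    def write(self, key, value):
        self.local[key] = value
        self.dirty.add(key)

    def flush(self):
        # Nothing reaches shared memory until someone forces this call.
        for key in self.dirty:
            self.shared[key] = self.local[key]
        self.dirty.clear()


class WriteThroughCache:
    """Every write lands in local and shared memory at the same time."""

    def __init__(self, shared):
        self.shared = shared
        self.local = {}

    def write(self, key, value):
        self.local[key] = value
        self.shared[key] = value


# Hypothetical example: the same piece of knowledge under each policy.
docs_write_back = {}
dave = WriteBackCache(docs_write_back)
dave.write("deploy", "run migrations before restarting workers")

docs_write_through = {}
sarah = WriteThroughCache(docs_write_through)
sarah.write("deploy", "run migrations before restarting workers")

print(docs_write_back)     # still empty: the knowledge lives only in Dave
print(docs_write_through)  # already readable by the whole team
```

In the write-back version, the docs stay empty until `dave.flush()` runs, which in team terms is the moment somebody finally asks.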

  1. The “Bus Factor” Audit: I look at my systems and ask: “If Dave were hit by a bus tomorrow, would this project survive?” If the answer is “No,” I have a critical silo.
  2. The README-First Culture: I’ve started asking for the “Interface Documentation” before the code is written. If you can’t explain the logic in a doc, you shouldn’t be writing the logic in the IDE.
  3. Killing the “Quick Questions”: When someone asks a “How-To” question in Slack, the rule is: the answer must be a link to a doc. If the doc doesn’t exist, create it, then send the link. This ensures the “CPU Cycle” spent answering the question is spent only once.
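The Bus Factor audit in step 1 can be roughed out with a few lines of code. The sketch below is one possible approach, not a standard tool: it assumes you have already collected, for each file, the list of authors of its commits (for example from your version control history), and it flags any file where one person wrote most of them. The function name, the 80% threshold, the file paths, and the engineer “lee” are all hypothetical.

```python
from collections import Counter


def find_silos(commit_authors_by_path, threshold=0.8):
    """Return {path: (author, share)} for files dominated by one author.

    commit_authors_by_path maps a file path to a list with one author
    name per commit. threshold is an assumed cutoff, not a standard one.
    """
    silos = {}
    for path, authors in commit_authors_by_path.items():
        if not authors:
            continue  # no history, nothing to judge
        top_author, top_count = Counter(authors).most_common(1)[0]
        share = top_count / len(authors)
        if share >= threshold:
            silos[path] = (top_author, share)
    return silos


# Hypothetical commit history, one author name per commit.
history = {
    "deploy/pipeline.yml": ["dave"] * 5 + ["sarah"],
    "db/legacy_schema.sql": ["sarah", "sarah", "sarah"],
    "api/handlers.py": ["dave", "sarah", "lee", "lee"],
}

silos = find_silos(history)
for path, (author, share) in sorted(silos.items()):
    print(f"{path}: {author} wrote {share:.0%} of commits")
```

Commit share is only a proxy. A file can have many authors and still be understood by one person, so treat the output as a list of places to start asking the bus question, not as the answer.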

Submit a Bug Report
How do you know if you have a Tribal Knowledge Silo? Watch your “Pings.”

If your most senior engineers are being tagged in Slack more than five times a day for “Where is…” or “How does…” questions, your system has a Cache Coherence problem: the only valid copy of the data sits in one node, and nobody else can read it. Your knowledge is stuck in “Local Cache,” and it’s starving the rest of the network.
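If you want to put a number on it, the check above is easy to sketch. The code below assumes you have exported your chat messages into simple `(day, mentioned_person, text)` tuples; that shape, the function name, and the sample messages are assumptions for illustration, and nothing here calls a real Slack API.

```python
from collections import Counter

# Question openers that signal a lookup rather than a discussion.
QUESTION_PREFIXES = ("where is", "how does")
DAILY_LIMIT = 5  # the "more than five a day" rule of thumb


def overloaded_experts(messages, limit=DAILY_LIMIT):
    """Return sorted names of people pinged with lookup questions
    more than `limit` times on any single day."""
    counts = Counter()
    for day, mentioned, text in messages:
        if text.strip().lower().startswith(QUESTION_PREFIXES):
            counts[(mentioned, day)] += 1
    return sorted({person for (person, _day), n in counts.items() if n > limit})


# Hypothetical one-day export.
messages = (
    [("mon", "dave", "Where is the deploy config?")] * 6
    + [("mon", "sarah", "How does the legacy DB handle retries?")] * 2
    + [("mon", "dave", "Lunch?")]
)

print(overloaded_experts(messages))
```

Matching on the first two words is deliberately crude. It will miss rephrased questions, so a real count is probably higher than whatever this reports.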

Stop hiring experts to be libraries. Start building a library so your experts can be engineers.

