What if the hardest root causes to analyse are those buried in risk decisions we never realised we were making?
The symptom problem
Most cybersecurity ‘root cause’ analyses stop at symptoms. They miss the implicit risk acceptances hidden in everyday business decisions. Every trade-off in budget discussions, every ‘we’ll fix it later’ in sprint planning, every ‘do more with less’ in resource allocation: these aren’t just business decisions. They’re unacknowledged risk acceptances that become tomorrow’s incidents.
The language that obscures risk
Consider how we frame these choices. Choose the cheaper vendor? We call it cost optimisation, but we’re implicitly accepting security risks. Defer redundancy funding? We celebrate efficiency while accepting single points of failure. Rush deployments? We praise agility, but we’re trading safety for speed.
The language we use obscures the risk dimension entirely.
The competence gap
The problem runs deeper than awareness. Strategic root cause analysis demands systems thinking, organisational psychology, an understanding of power dynamics, and fluency in budget politics. Most security teams haven’t developed these competencies.
Even when we have the skills, tracing an incident like ‘admin misconfigured cloud storage’ back through understaffing, inadequate training budgets, and a culture that chose speed over verification demands organisational courage and political capital that few teams possess.
The three levels of root cause
Operational: technical symptoms
At the surface, we find ‘human error’, ‘misconfiguration’, ‘inadequate access controls’. But human error is just an outcome of system design. Failures in systems that make mistakes easy aren’t random events; they’re the predictable result of how we structured the work.
Tactical: process failures
One level down: inadequate training, poor processes, insufficient tools, lack of automation. These are closer to root causes, but still symptoms. Why was training inadequate? Why were processes poor? The answers sit at the strategic level.
Strategic: implicit risk decisions
At the root, most critical risk decisions were never explicitly made. Vendor selections that prioritised cost over security capability. Organisational structures that separated security from operations. Project priorities that always delayed security work. Budget allocations that chronically understaffed security teams. Timeline pressures that normalised cutting corners.
No one said ‘we accept the risk of inadequate security.’ But that’s exactly what happened.
A real example: the configuration error
Operational root cause: ‘Administrator misconfigured cloud storage bucket permissions.’
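To make the operational level concrete, here is a minimal sketch of what such a misconfiguration often amounts to: a single API call that opens a bucket to public reads. The bucket name and the choice of AWS S3 via boto3 are illustrative assumptions, not details from the incident.

```python
# Hypothetical illustration of the operational-level failure: one API call
# that makes a storage bucket publicly readable. The bucket name and the
# choice of AWS S3 / boto3 are assumptions for this sketch.
import boto3

s3 = boto3.client("s3")

# Intended: keep the bucket private. Actual: 'public-read' grants read
# access to everyone. This is the kind of one-line slip that incident
# reports later label 'administrator misconfigured bucket permissions'.
s3.put_bucket_acl(
    Bucket="example-app-data",  # hypothetical bucket name
    ACL="public-read",          # the misconfiguration
)
```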
Tactical root cause: ‘Inadequate training on cloud security and insufficient peer review process.’
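And here is a sketch of the kind of automated guardrail the eliminated peer-review step could have backstopped: a scan that flags any bucket left without a public-access block. Again, AWS S3 via boto3 is an assumption, and the function name is hypothetical.

```python
# Hypothetical sketch of a tactical-level control: an automated check that
# flags buckets missing a public-access block, the kind of guardrail a
# peer-review process (or its automated replacement) would rely on.
import boto3
from botocore.exceptions import ClientError

def find_unguarded_buckets() -> list[str]:
    s3 = boto3.client("s3")
    flagged = []
    for bucket in s3.list_buckets()["Buckets"]:
        name = bucket["Name"]
        try:
            config = s3.get_public_access_block(Bucket=name)
            settings = config["PublicAccessBlockConfiguration"]
            # Flag buckets where any public-access protection is disabled.
            if not all(settings.values()):
                flagged.append(name)
        except ClientError as err:
            # No configuration at all means nothing blocks public access.
            if err.response["Error"]["Code"] == "NoSuchPublicAccessBlockConfiguration":
                flagged.append(name)
            else:
                raise
    return flagged

if __name__ == "__main__":
    for name in find_unguarded_buckets():
        print(f"bucket without public-access block: {name}")
```

Running a check like this in review or CI costs minutes; the strategic decisions below made even that investment unavailable.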
Strategic root cause:
- Budget allocated three cloud administrators for 50+ applications
- Training budget cut to meet quarterly targets
- Peer review process eliminated to ‘move faster’
- Security team excluded from cloud architecture decisions
- Vendor selected based on cost, not security capabilities
- Deployment timeline compressed to meet executive promises
Each of these was a risk decision made in the language of business optimisation. No one wrote ‘we accept the risk of data exposure.’ But every one of these choices implicitly made that acceptance. The incident report said ‘misconfiguration.’ The real story was six decisions deep.
The context problem
Root causes are so contextual that our industry struggles with the very concept. What’s a ‘root cause’ in one organisation is a downstream effect in another. The misconfiguration that caused one incident was enabled by understaffing, which was driven by budget constraints, which reflected leadership priorities, which stemmed from market pressures.
Where do you stop? There’s no universal answer. Some causes sit outside your organisation’s control entirely. The useful question isn’t ‘what is the root cause?’ but ‘what is the deepest cause we can actually act on?’
The courage problem
Making implicit decisions explicit is terrifying. It reveals budget trade-offs that created vulnerabilities, leadership priorities that systematically undervalued security, cultural norms that rewarded speed over safety, and organisational structures that guaranteed coordination failures.
Most organisations aren’t ready to hear that the root cause of Tuesday’s incident was a decision the board made three years ago. So the report says ‘human error’ and everyone moves on. The same conditions persist. The next incident is already taking shape.
Stop accepting ‘human error’ or ‘process failure’ as root causes. Push deeper. What business decisions enabled this failure? What trade-offs were made that nobody recognised as risk decisions? What implicit risk acceptances need to be made explicit?
The organisations that do this work are the ones that actually learn from incidents. Everyone else keeps finding ‘human error.’
Inspired by: Research on root cause analysis in cybersecurity by Sarah Fluchs and the broader industry discussion on the scarcity of actionable root cause reports.
Originally published: LinkedIn