From Panic to Process: A 3-Year Vulnerability Management Transformation
- Martin Bally
- Jan 15
- 5 min read

How we moved the Board from asking "Are we safe?" to understanding "How is the risk managed?"
At a previous organization, I walked into a boardroom that was on edge.
We were just emerging from the pandemic, which meant our reliance on VPNs and remote infrastructure was at an all-time high. Simultaneously, the headlines were dominated by the "boogeymen" of the industry: the Equifax breach was still fresh, the chaos of Log4j was unfolding, and a constant stream of VPN zero-days was leading to ransomware attacks globally.
The Board connected these dots and came to a terrifying conclusion: Every vulnerability is a potential ransomware event.
Instead of assuaging their fears, our program was reporting data in the worst possible way: "The Single Bucket." We presented a raw count of 100,000 unaddressed vulnerabilities. We would patch 15,000 in a month, but 20,000 new ones would flow in. The Board saw a mountain that never stopped growing.
They kept asking the most dangerous question in cybersecurity: "When are we going to get to zero?"
They viewed every vulnerability as equal. To them, a low-severity bug on a cafeteria menu screen carried the same weight as a critical RCE on a production server. Because we hadn't taught them otherwise, they couldn't see the risk; they could only see the failure to reach an impossible perfection.
We knew we had to change the conversation. It took us three years, three phases of evolution, and a lot of repetition, but we eventually turned a panic-induced interrogation into the biggest win a CISO can ask for: a boring meeting.
Here is how we did it.
Phase 1: Context is King (The "Where")
The first step was smashing the "One Bucket" model. We had to teach the Board that location matters.
We introduced a risk-based tiering system that forced leadership to look at where the asset lived, not just the bug itself. We broke our environment down into four clear contexts:
The Perimeter (The Front Line): We showed them that systems accessible from anywhere in the world were our highest risk. These are the open doors. For these, we set aggressive targets: vulnerabilities here had to be remediated within 15 days.
Workstations (The Roamers): These assets move in and out of the "wild." They have some protection, but high exposure. We set a tolerance of 30 days.
Servers (The Core): These systems sit behind the perimeter. They are critical, but they benefit from layers of protection that internet-facing systems don't have. Because they were shielded from direct external attack, we set a target of 45 days, giving the infrastructure teams realistic windows to test and deploy patches.
Operational Technology - OT (Ground Zero): This was the hardest conversation. We labeled our manufacturing and OT environments as "Ground Zero", the heart of the business where production actually happens. We explained that these systems were wrapped in specific compensating controls (firewalls, air gaps), but were incredibly sensitive to disruption. Because manufacturing lines often only had maintenance windows once every six months, we couldn't patch them on standard IT cycles. For Ground Zero, the target wasn't days; it was Maintenance Windows.
By layering operational constraints (downtime) against security controls (defense-in-depth), we finally gave them targets that were actually achievable.
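To make the tiering concrete, here is a minimal sketch of a context-based SLA lookup. The tier keys and the Finding structure are hypothetical; only the day counts come from the targets above.

```python
from dataclasses import dataclass

# Hypothetical mapping of asset context to remediation SLA, using the targets above.
# OT has no day-based SLA; it is remediated at the next maintenance window.
REMEDIATION_SLA_DAYS = {
    "perimeter": 15,     # internet-facing "front line"
    "workstation": 30,   # roaming endpoints
    "server": 45,        # internal, behind layered defenses
    "ot": None,          # patched at the next maintenance window
}

@dataclass
class Finding:
    asset: str
    context: str       # one of the keys in REMEDIATION_SLA_DAYS
    age_days: int

def is_overdue(finding: Finding) -> bool:
    """True when a finding has outlived the SLA for its context."""
    sla = REMEDIATION_SLA_DAYS.get(finding.context)
    if sla is None:
        # OT findings are tracked against maintenance windows, not days.
        return False
    return finding.age_days > sla
```

In a real program the context would come from the asset inventory or CMDB rather than a free-text field, but the principle is the same: the clock an asset runs on is determined by where it lives.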
Phase 2: The Evolution of "Risk" (The "What")
Context was a great start, but it wasn't enough. We quickly realized that even with better timelines, the volume of "High" and "Critical" vulnerabilities was unsustainable. We were still drowning.
We had to evolve our definition of what needed to be fixed into stages. We called this our RBVM (Risk-Based Vulnerability Management).
RBVM 1.0 (Exploitable): In phase one, we stopped chasing every "High" and "Critical." Instead, we filtered for Exploitability: if a vulnerability was Critical and had a known exploit, it made the list.
RBVM 2.0 (Weaponized): Even "Exploitable" created too much noise. A new backlog started forming because we couldn't keep up. So, we refined the filter: Is the exploit weaponized? Is there code out there right now that a script kiddie could use?
RBVM 3.0 (Actively Attacked): In the second year, we shifted to the holy grail. We integrated threat intelligence (using CrowdStrike and other sources) to ask: "Is an adversary actually using this right now?"
The logic we pitched to the Board was simple: Why spend resources patching a theoretical flaw in Snowflake if adversaries are actively hitting a flaw in ServiceNow? This allowed us to focus the team’s limited energy solely on the risks that could hurt us today.
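As a rough illustration of the progression, the three stages can be thought of as successively stricter filters. The field names below (has_exploit, weaponized, actively_exploited) are hypothetical stand-ins for whatever your scanner and threat-intel feed actually provide.

```python
# Hypothetical finding fields; real values come from the scanner and threat intel.
def rbvm_1_exploitable(f: dict) -> bool:
    return f["severity"] == "critical" and f["has_exploit"]

def rbvm_2_weaponized(f: dict) -> bool:
    return rbvm_1_exploitable(f) and f["weaponized"]

def rbvm_3_actively_attacked(f: dict) -> bool:
    return f["actively_exploited"]  # flagged by threat intelligence

findings = [
    {"id": "VULN-001", "severity": "critical",
     "has_exploit": True, "weaponized": True, "actively_exploited": True},
    {"id": "VULN-002", "severity": "critical",
     "has_exploit": True, "weaponized": False, "actively_exploited": False},
]

# Each stage shrinks the work queue to what can actually hurt us today.
work_queue = [f for f in findings if rbvm_3_actively_attacked(f)]
```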
The "Out-of-Band" Process (The Emergency Lane)
However, we knew that even the best RBVM model has a speed limit. When a massive zero-day like Log4j hit, we couldn't wait for our standard 15-, 30-, or 45-day cycles.
To address this, we established a specific Out-of-Band (OOB) Process. This was our "emergency lane." If a vulnerability was being actively campaigned against by adversaries (like Log4j or a critical VPN exploit), we suspended the standard rules. We spun up immediate war rooms, bypassed standard change windows, and remediated these specific threats in near real-time.
This distinction was crucial for the Board. It showed them that we could be methodical with the "noise" (standard patching) but incredibly agile with the "crises" (OOB events).
Phase 3: Moving from "Patching" to "Managing" (The Risk Process)
While RBVM handled the new threats, we still had the ghost of the past haunting us: that original 100,000 vulnerability backlog.
This is where the program truly matured. We stopped trying to be "patch managers" and started being "risk managers."
We drew a line in the sand, let’s say February 1st, and froze the old backlog as "Legacy Debt." We knew we couldn't patch it all, so we instituted a Formal Risk Acceptance Process. This was the key to clearing the deck.

For every stubborn legacy vulnerability, we asked a specific set of questions:
Is it patchable? (If yes, patch it).
If not, are there compensating controls? (Is it air-gapped? Is it behind a strict firewall? Is IPS monitoring active?)
Is the business willing to own the risk?
If we could prove that compensating controls neutralized the threat, we didn't force a patch that might break a manufacturing line. Instead, we documented the controls, formally accepted the risk for a set period (e.g., 12 months), and removed it from the "Vulnerable" bucket.
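One way to picture the paperwork behind that decision is a minimal, hypothetical risk-acceptance record: it captures the compensating controls, the business owner who signed off, and an expiry date that forces re-review.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class RiskAcceptance:
    vulnerability_id: str
    asset: str
    compensating_controls: list[str]  # e.g. ["air gap", "strict firewall", "IPS monitoring"]
    business_owner: str               # who formally owns the risk
    accepted_on: date
    review_after_days: int = 365      # accepted for a set period, e.g. 12 months

    @property
    def expires_on(self) -> date:
        return self.accepted_on + timedelta(days=self.review_after_days)

    def is_active(self, today: date | None = None) -> bool:
        """The finding stays out of the 'Vulnerable' bucket only while the acceptance is current."""
        return (today or date.today()) < self.expires_on
```

The point of the expiry is that accepted risk is revisited, not forgotten: when the period ends, the same questions get asked again.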
This process took 12 to 16 months of grinding, but it changed the Board’s perspective entirely. They realized that "Zero Vulnerabilities" isn't the goal. "Zero Unmanaged Risks" is the goal.
The Gigantic Win
After three years of grinding, refining, and educating, the "Win" finally happened.
It wasn't a ticker-tape parade. It wasn't a massive budget injection. It was something much better.
I walked into the Board meeting, put up the Vulnerability Management slide, and… nothing happened.
There was no panic. No one asked, "Are we safe?" or "Why isn't this zero?"
They looked at the slide. They saw that our "Actively Attacked" targets were green. They understood that the red on the "OT Backlog" was risk-accepted, controlled, and signed off. They nodded, satisfied that the risk was being managed, and we moved on to the next topic.
The slide had become a non-event.
That was the victory. We had taken the most volatile, anxiety-inducing topic in cybersecurity and turned it into just another operational metric. We didn't just patch the servers; we patched the Board's understanding of risk.
