Designing for Supportability: What Happens at 2 AM?
Every architecture looks clean at 2 PM.
It’s only at 2 AM, when something breaks in Production, that the truth appears.
- An integration silently stops
- A flow fails halfway
- An order never reaches ERP
- A customer is stuck in limbo
And someone asks the most important question in enterprise IT:
“How do we know what happened?”
If your honest answer is:
- “Check the flow run history.”
- “Enable tracing and retry.”
- “We’ll reproduce it in UAT.”
- “Let’s ask the vendor.”
…then your system is functional—but not supportable.
Supportability is not an operations problem.
It is an architectural outcome.
The Invisible System Problem
Many Power Platform solutions fail in production not because the logic is wrong, but because failures are:
- Silent
- Distributed
- Non-deterministic
- Hard to correlate
- Impossible to replay
You may have:
- Plugins in Dataverse
- Flows in Power Automate
- Azure Functions
- Service Bus queues
- External APIs
Each has its own logs.
None of them tells the story.
At 2 AM, support doesn’t need logs.
They need a narrative:
“This Opportunity was approved at 01:12.
The event was published.
ERP processing failed due to credit check.
It retried twice.
It is now in dead-letter awaiting review.”
If your architecture cannot tell that story, it is incomplete.
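As a rough illustration of how such a narrative could be assembled, the sketch below merges entries from separate component logs by a shared correlation ID and prints them in time order. The `TraceEvent` type, the `narrate` function, the component names, and the sample timestamps are all hypothetical; this is a minimal sketch under those assumptions, not a Power Platform API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TraceEvent:
    correlation_id: str   # shared by every hop of one business transaction
    timestamp: datetime
    source: str           # hypothetical component name, e.g. "Dataverse plugin"
    message: str          # human-readable description of what happened


def narrate(events: list[TraceEvent], correlation_id: str) -> list[str]:
    """Return a time-ordered story for one business transaction."""
    related = [e for e in events if e.correlation_id == correlation_id]
    related.sort(key=lambda e: e.timestamp)
    return [f"{e.timestamp:%H:%M} [{e.source}] {e.message}" for e in related]


# Entries as they might arrive from separate logs: out of order and interleaved
# with another transaction. All values are made up for illustration.
events = [
    TraceEvent("opp-1001", datetime(2024, 1, 5, 1, 14), "ERP adapter", "Processing failed: credit check"),
    TraceEvent("opp-1001", datetime(2024, 1, 5, 1, 12), "Dataverse plugin", "Opportunity approved"),
    TraceEvent("opp-2002", datetime(2024, 1, 5, 1, 13), "Dataverse plugin", "Opportunity approved"),
    TraceEvent("opp-1001", datetime(2024, 1, 5, 1, 13), "Service Bus", "Event published"),
    TraceEvent("opp-1001", datetime(2024, 1, 5, 1, 20), "Service Bus", "Moved to dead-letter, awaiting review"),
]

for line in narrate(events, "opp-1001"):
    print(line)
```

The point of the sketch is that the story only exists if every component writes the same correlation ID; without it, the filter in `narrate` has nothing to join on.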
Architect for Observability
Supportable systems are designed to speak.
That means:
- Every business event has an ID
- Every integration carries correlation
- Every failure is captured centrally
- Every async process reports state
- Every message has a lifecycle
Pattern:
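A minimal sketch of this pattern in Python, assuming a message envelope that carries a correlation ID and records every state change in one central place. The `Envelope` class, the `State` values, the `ALLOWED` transition table, and the in-memory `central_log` are illustrative assumptions; a real solution would likely persist this in a Dataverse table or a log workspace.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    PUBLISHED = "published"
    PROCESSING = "processing"
    FAILED = "failed"
    COMPLETED = "completed"
    DEAD_LETTERED = "dead-lettered"


# Allowed lifecycle transitions (an assumed model, not a platform rule).
ALLOWED = {
    State.PUBLISHED: {State.PROCESSING},
    State.PROCESSING: {State.COMPLETED, State.FAILED},
    State.FAILED: {State.PROCESSING, State.DEAD_LETTERED},  # retry or give up
    State.COMPLETED: set(),
    State.DEAD_LETTERED: {State.PROCESSING},  # manual replay
}


@dataclass
class Envelope:
    business_event: str                      # e.g. "OpportunityApproved"
    payload: dict
    correlation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    state: State = State.PUBLISHED
    history: list = field(default_factory=list)  # (state, detail) tuples

    def __post_init__(self):
        self.history.append((self.state, "created"))

    def transition(self, new_state: State, detail: str = "") -> None:
        """Move to a new state, rejecting transitions the lifecycle forbids."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} is not allowed")
        self.state = new_state
        self.history.append((new_state, detail))


# Central store keyed by correlation ID (in-memory stand-in for a real table).
central_log: dict[str, Envelope] = {}

msg = Envelope("OpportunityApproved", {"opportunity": "OPP-1001"})
central_log[msg.correlation_id] = msg
msg.transition(State.PROCESSING, "ERP adapter picked up message")
msg.transition(State.FAILED, "credit check failed")
msg.transition(State.PROCESSING, "retry 1")
msg.transition(State.FAILED, "credit check failed")
msg.transition(State.DEAD_LETTERED, "retries exhausted")

for state, detail in msg.history:
    print(f"{state.value}: {detail}")
```

Making the transitions explicit is a design choice: a message can never be in an unknown state, and a replay from dead-letter is just another recorded transition.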
Now support can answer:
- What happened?
- Where did it fail?
- Has it retried?
- Can we replay?
- Who owns it?
Without guessing.
Without redeploying.
Without waking developers.
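To make those five questions concrete, here is a small sketch that derives the answers from the recorded steps of one transaction. The `Step` fields, the outcome vocabulary, the team names, and the rule that retries equal failures minus one are simplifying assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    component: str   # where the step ran
    outcome: str     # "ok", "failed", or "dead-lettered" (assumed vocabulary)
    detail: str
    owner: str       # team responsible for the component (hypothetical)


def support_summary(steps: list[Step]) -> dict:
    """Answer the five support questions from one transaction's steps."""
    if not steps:
        raise ValueError("no steps recorded for this transaction")
    failures = [s for s in steps if s.outcome == "failed"]
    first_failure: Optional[Step] = failures[0] if failures else None
    last = steps[-1]
    return {
        "what_happened": last.detail,
        "failed_at": first_failure.component if first_failure else None,
        # Assumption: the first failure is the original attempt, the rest are retries.
        "retries": max(len(failures) - 1, 0),
        "can_replay": last.outcome == "dead-lettered",
        "owner": first_failure.owner if first_failure else last.owner,
    }


steps = [
    Step("Dataverse", "ok", "Opportunity approved", "CRM team"),
    Step("Service Bus", "ok", "Event published", "Integration team"),
    Step("ERP adapter", "failed", "Credit check failed", "ERP team"),
    Step("ERP adapter", "failed", "Credit check failed (retry 1)", "ERP team"),
    Step("ERP adapter", "failed", "Credit check failed (retry 2)", "ERP team"),
    Step("Service Bus", "dead-lettered", "Awaiting review in dead-letter queue", "Integration team"),
]

summary = support_summary(steps)
for question, answer in summary.items():
    print(f"{question}: {answer}")
```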
Functional Impact
For the business:
- Fewer “black holes”
- Faster resolution
- Clear ownership
- Trust in automation
- No mystery failures
For IT:
- Deterministic recovery
- Auditability
- Root cause analysis
- No hero debugging
- Predictable operations
Your system becomes operable, not just correct.
The Takeaway
If your architecture cannot answer:
“What happened to this business transaction?”
…in under 60 seconds, then it is not enterprise-grade.
Power Platform makes it easy to build solutions.
Enterprise architecture makes them survivable.
Because in the real world, systems are not judged by how they work at 2 PM. They are judged by how they fail at 2 AM.