Let's imagine that the job of a harvester is to use an axe to harvest trees,
and the axe will deteriorate over time. Assuming that the following's
the expected performance of the axe:
Somehow sharp axe(reasonably high effectiveness and efficiency; acceptable defect rates) -
Somehow dull axe(barely tolerable effectiveness and efficiency; alarming defect rates) -
Fully dull axe(ridiculously poor effectiveness and efficiency; obscene defect rates) -
Now, let's try to come up with some possible work schedules:
Sharpens the axe to be fully sharp as soon as it becomes somehow sharp -
Sharpens the axe to be somehow sharp as soon as it becomes somehow dull -
Sharpens the axe to be somehow dull as soon as it becomes fully dull -
Sharpens the axe to be somehow dull right before the harvester will resign -
Sharpens the axe to be fully sharp as soon as it becomes somehow dull -
Sharpens the axe to be fully sharp as soon as it becomes fully dull -
Sharpens the axe to be fully sharp right before the harvester will resign -
Sharpens the axe to be somehow sharp as soon as it becomes fully dull -
Sharpens the axe to be somehow sharp right before the harvester will resign -
So, while these work schedules clearly show that sharpening the axe's important to maintain effective and efficient long term throughput, trying to keep it to be always fully sharp is certainly going overboard(33.9% throughput), when being somehow sharp is already enough(39.5% throughput).
Then why some bosses don't let the harvester sharpen the axe even when it's somehow or even fully dull? Because sometimes, a certain amount of normal trees have to be acquired within a set amount of time.
Let's say that the axe has become from fully sharp to just somehow dull, so there should be 87 normal trees cut after 180 hours, netting the short term throughput of 48.3%.
But then some emergencies just come, and 3 extra normal trees need to be delivered within 16 hours no matter what, whereas compensating trees can be delivered later in the case of having defective trees.
In this case, there won't be enough time to sharpen the axe to be even just somehow sharp, because even in the best case, it'd cost 12 + 2 * 3 = 18 hours.
On the other hand, even if there's 1 defective tree from using the somehow dull axe within that 16 hours, the harvester will still barely make it on time, because the chance of having 2 defective trees is (1 / 10) ^ 2 = 1 / 100, which is low enough to be neglected for now, and as compensatory trees can be delivered later even if there's 1 defective tree, the harvester will be able to deliver 3 normal trees.
In reality, crunch modes like this will happen occasionally, and most harvesters will likely understand that it's probably inevitable eventually, so as long as these crunch modes won't last for too long, it's still practical to work under such circumstances once in a while, because it's just being reasonably pragmatic.
However, in supposedly exceptional cases, the situation's so
extreme that, when the harvester's about to sharpen the axe, the boss constantly requests that another tree must be acquired as soon as possible, causing the harvester to never have time to sharpen the axe for a long time, thus having to work more and more ineffectively and inefficiently in the long term.
In the case of a somehow dull axe, 12 hours are needed to sharpen it to be somehow sharp, whereas another tree's expected to be acquired within 4 hours, because the chance of having a defective tree cut is 1 / 10, which can be considered small enough to take the risk, and the expected number of normal trees over all trees being cut is 7 of out 10 for a somehow dull axe, whereas 12 hours is enough to cut 3 trees by using such an axe, so at least 2 normal trees can be expected within this period.
If this continues, eventually the axe will become fully dull, and 24 hours will be needed to sharpen it to be somehow dull, whereas another tree's expected to be acquired within 8 hours, because the chance of having a defective tree is 1 / 5, which can still be considered controllable to take the risk, especially with an optimistic estimation.
While the expected number of normal trees over all trees being cut is 1 of out 5 for a fully dull axe, whereas 24 hours is just enough to cut 3 trees by using such an axe, meaning that the harvester's not expected to make it normally, in practice, the boss will usually unknowingly apply optimism bias(at least until it no longer works) by thinking that there will be no defective trees when just another tree's to be cut, so the harvester will still be forced to continue cutting trees, despite the fact that the axe should be sharpened as soon as possible even when just considering the short term.
Also, if the boss can readily replace the current harvester with a new one immediately, the boss will rather let the current harvester resign than letting that harvester sharpening the axe to be at least somehow dull, because to the boss, it's always emergencies after emergencies, meaning that the short term's constantly so dire that there's just no room to even consider the long term at all.
But why such an undesirable situation will be reached? Other than extreme and rare misfortunes, it's usually due to overly optimistic work schedules not seriously taking the existence of defective and compensatory trees, and the importance of the sharpness of the axe and the need of sharpening the axe into the account, meaning that such unrealistic work schedules are essentially linear(e.g.: if one can cut 10 trees on day one, then he/she can cut 1000 trees on day 100), which is obviously simplistic to the extreme.
Occasionally, it can also be because of the inherent risks of sharpening the axe - Sometimes the axe won't be actually sharpened after spending 12, 24 or 36 hours, and while it's extraordinary, the axe might be actually even more dull than before, and most importantly, usually the boss can't directly judge the sharpness of the axe, meaning that it's generally hard for that boss to judge the ROI of sharpening the axe with various sharpness before sharpening, and it's only normal for the boss to distrust what can't be measured objectively by him/herself(on the other hand, normal, defective and compensatory trees are objectively measurable, so the boss will of course emphasize on these KPIs), especially for those having been opting for linear thinking.
Of course, the whole axe cutting tree model is highly simplified, at least because:
Nevertheless, this model should still serve its purpose of making this point across - There's isn't always an universal answer to when to sharpen the axe to reach which level of sharpness, because these questions involve calculations of concrete details(including those critical parts that can't be quantified) on a case-by-case basis, but the point remains that the importance of sharpening the axe should never be underestimated.
When it comes to professional software engineering:
1. The normal trees are like needed features that work well enough
2. The defective trees are like nontrivial bugs that must be fixed as soon as possible(In general, the worse the code quality of the codebase is, the higher the chance to produce more bugs, produce bugs being more severe, and the more the time's needed to fix each bug with the same severity - More severe bugs generally cost more efforts to fix in the same codebase)
3. The compensatory trees are like extra outputs for fixing those bugs and repairing the damages caused by them
4. The axe is like the codebase that's supposed to deliver the needed features(actually, the axe can also be like those software engineers themselves, when the topic involved is software engineering team management rather than just refactoring)
5. Sharpening the axe is like refactoring(or in the case of the axe referring to software engineers, sharpening the axe can be like letting them to have some vacations to recover from burnouts)
6. A fully sharp axe is like a codebase suffering from the gold plating anti pattern on the code quality aspect(diminishing returns applies to code qualities as well), as if those professional software engineers can't even withstand a tiny amount of technical debt. On the good side, such an ideal codebase is the most unlikely to produce nontrivial bugs, and even when it does, they're most likely fixed with almost no extra efforts needed, because they're usually found way before going into production, and the test suite will point straight to their root causes.
7. A somehow sharp axe is like a codebase with more than satisfactory
code qualities, but not to the point of investing too much on this
regard(and the technical debt is still doing more good than harm due to
its amount under moderation). Such a practically good codebase is still a bit unlikely to produce nontrivial bugs regularly, but it does have a small chance to let some of them leak into production, causing a mild amount of extra efforts to be needed to fix the bugs and repair the damages caused by them.
8. A somehow dull axe is like a codebase with undesirable code qualities, but it's still an indeed workable codebase(although it's still quite painful to work with) with a worrying yet payable amount of technical debt. Undesirable yet working codebases like this probably has a significant chance to produce nontrivial bugs frequently, and a significant chance for quite some of them to leak into production, causing a rather significant amount of extra efforts to be needed to fix the bugs and repair the damages caused by them.
9. A fully dull axe is like a unworkable codebase where it must be refactored as soon as possible, because even senior professional software engineers can easily create more severe bugs than needed features with such a codebase(actually they'll be more and more inclined to rewrite the codebase the longer it's not refactored), causing their productivity to be even negative in the worst cases. An effectively broken codebase like this is guaranteed to has a huge chance to produce nontrivial bugs all the time, and nearly all of them will leak into production, causing an insane amount of extra efforts to be needed to fix the bugs and repair the damages caused by them(so the professionals will be always fixing bugs instead of delivering features), provided that these recovery moves can be successful at all.
10. A broken axe is like a codebase being totally technical bankrupt, where the only way out is to completely rewrite the whole thing from scratch, because no one can fathom a thing in that codebase at that point, and sticking to such a codebase is undoubtedly a sunk cost fallacy.
11. While a codebase with overly ideal code qualities can deliver the needed features in the most effective and efficient ways possible as long as the codebase remains in this state, in practice the codebase will quickly degrade from such an ideal state to a more practical state where the code qualities are still high(on the other hand, going back to this state is very costly in general no matter how effective and efficient the refactoring is), because this state is essentially mysophobia in terms of code qualities.
12. On the other hand, a codebase with reasonably high code qualities can be rather resistant from code quality deterioration(but far from 100% resistant of course), especially when the professional software engineers are disciplined, experienced and qualified, because degrading code qualities for such codebases are normally due to quick but dirty hacks, which shouldn't be frequently needed for senior professional software engineers.
To summarize, a senior professional software engineer should strive to keep the codebase to have a reasonably high code quality, but not to the point of not even having good technical debts, and when the codebase has eventually degraded to have just barely tolerable code quality, it's time to refactor it to become having very satisfactory, but not overly ideal, code quality again, except in the case of occasional crunch modes, where even a disciplined, experienced and qualified expert will have to get the hands dirty once in a while on the still workable codebase but with temporarily unacceptable code quality, just that such crunch modes should be ended as soon as possible, which should be feasible with a well-established work schedule.