There are many articles explaining how to produce a good design and what rules to follow. In this note we will walk through a concrete example of converting a legacy design into a better one.
Many existing systems have coupling problems, and their maintainability suffers as a result. Making a change in this kind of system causes a large ripple effect.
Let's assume we have an existing process.
The system applies various algorithms to deduce the hyper-parameters of a supervised learning model.
A new requirement is requested:
To be able to see, in production, data on the performance of each strategy in real time.
Let’s see the process entry point:
<?
StrategySupervisedHelper::getInstance()->optimize($processId);
… the supervised learning class:
class StrategySupervisedHelper extends Singleton {
//..
}
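The Singleton base class is not shown in the original code. A minimal sketch of what such a base typically looks like in PHP (an assumption for context, not the system's actual code):

```php
// Hypothetical Singleton base: one shared instance per subclass.
abstract class Singleton
{
    private static $instances = [];

    public static function getInstance()
    {
        $class = static::class; // late static binding: resolves to the subclass
        if (!isset(self::$instances[$class])) {
            self::$instances[$class] = new static();
        }
        return self::$instances[$class];
    }

    protected function __construct()
    {
        // protected: prevents direct construction from client code
    }
}
```

This global access point is precisely the coupling we will have to remove later.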
and the method invoked:
<? public function optimize($processId);
In the case of a productive system, the first thing we must do is identify its current coverage. The system has a series of automated unit and functional tests.
To measure coverage we will use the Mutation testing technique.
<?
public function optimize($processId){
    throw new Exception('Testing coverage');
}
Unfortunately, not a single test fails, so we discover that the process is not covered and that Michael Feathers' maxim sadly applies:
“Legacy code is simply code without tests.”
The strategy for refactoring a legacy system is to cover the existing functionality before making any change.
Writing tests reveals good design interfaces among objects. Due to the coupling baked into the current solution, it is very difficult to write tests.
However, we cannot refactor to make the code testable without first writing the tests. It seems we are facing a vicious circle.
The possible solution to this deadlock is to write the tests declaratively, thus generating better interfaces.
We will run them manually until the coupling is resolved.
Tests can be written with a tool from the xUnit family with a false assertion (they always fail).
function testOptimizationIsGoodEnough(){
$this->assertTrue(false);
}
function testOptimizationBelowThreshold(){
$this->assertTrue(false);
}
...
After having covered (for now manually) the necessary cases we can start with the refactor.
Helpers do not exist in the real world, nor should they exist in any computable model.
Let’s think about its responsibilities in order to choose a name in MAPPER.
<?
class SupervisedLearningAlgorithm extends Singleton {
}
For now the name is good enough, and it gives us an idea of the responsibilities of its instances in the real world.
There are no valid reasons to use singletons. Besides generating all of its well-known problems, a Singleton yields a very implementation-driven invocation (coupled to getInstance()) that is not at all declarative…
<?
SupervisedLearningAlgorithm::getInstance()->optimize($processId);
which we will change to:
<?
(new SupervisedLearningAlgorithm())->optimize($processId);
leaving the class definition as follows:
<?
class SupervisedLearningAlgorithm{
}
An important design rule is:
Do not subclass concrete classes.
If the language allows this, we explicitly declare it:
<?
final class SupervisedLearningAlgorithm{
}
The object is created and then receives a magic parameter holding the identifier of the process to optimize. This argument travels through all the methods.
This is a code smell suggesting we check the cohesion between this parameter and the process.
<?
final class SupervisedLearningAlgorithm{
public function calculate($processId){}
private function analyze($processId){}
private function executeAndGetData($processId, bool $isUsingFastMethod = null){}
//... etc etc etc
}
Looking at the bijection, we conclude there can be no algorithm without a process. We do not want a class with setters that mutate it, so we will pass all the essential attributes at construction time.
The way to know whether an attribute is essential is to take away all the responsibilities associated with the object: if it can no longer carry them out, the attribute belongs to the minimal essential set.
<?
final class SupervisedLearningAlgorithm{
public function __construct($processId){}
}
In this way, the strategy is immutable in its essence, with all the benefits it brings us.
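A sketch of how the essential attribute could be stored, so the identifier stops traveling through every private method (the method bodies are illustrative, not the real algorithm):

```php
final class SupervisedLearningAlgorithm
{
    private $processId;

    public function __construct($processId)
    {
        // essential attribute: there is no algorithm without its process
        $this->processId = $processId;
    }

    public function calculate()
    {
        // the identifier no longer travels as a parameter
        return $this->analyze();
    }

    private function analyze()
    {
        // reads the immutable attribute directly
        return $this->processId;
    }
}
```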
The process, according to bijection, models a real world process. This seems to fit the Command pattern.
However, we believe that it is closer to a method object where there is an ordered sequence of executions, modeling the different steps of an algorithm.
As the name we assigned to the object according to its responsibilities suggests, this process models an execution strategy that will compete with other polymorphic strategies. This is the intention of the Strategy pattern.
<?
final class SupervisedLearningStrategy{
}
There is never a valid reason to use null. Null does not exist in real life.
It violates the principle of bijection and generates coupling between the function caller and the argument. Also, it generates unnecessary ifs as null is not polymorphic with any other object.
<?
// before: absence of the argument is modeled with null
private function executeAndGetData($processId, $isUsingFastMethod = null){
}
// after: an explicit boolean truth value
private function executeAndGetData($processId, bool $isUsingFastMethod = false){
}
We change the absence of the argument into an explicit boolean truth value.
The private function in the previous example still has a default parameter. Default parameters produce coupling and ripple effects; they exist to cater to programmer laziness. Since it is a private function, the replacement scope is this same class. We make the argument explicit, replacing all invocations:
<?
// before: callers may omit the argument
private function executeAndGetData($processId, bool $isUsingFastMethod = false){
}
// after: every caller must state it explicitly
private function executeAndGetData($processId, bool $isUsingFastMethod){
}
<?
final class SupervisedLearningStrategy {
    const CONFIDENCE_INTERVAL_THRESHOLD = 0.9;
    private function executeAndGetData($processId, bool $isUsingFastMethod){
        //...
        if ($estimatedError <= self::CONFIDENCE_INTERVAL_THRESHOLD) {}
        //..
    }
}
Constants coupled within the code will not allow us to write good tests by manipulating the environment. Remember that tests have to be in control of their entire environment, and anything global (such as the system time) is fragile to match in tests.
From now on, the threshold will be an essential parameter of object creation. (Refactoring by adding parameters is a safe task that any modern IDE can perform.)
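A sketch of the class after adding the parameter. The comparison is extracted into a method, made public here only so the example can be exercised; the names are illustrative:

```php
final class SupervisedLearningStrategy
{
    private $processId;
    private $confidenceIntervalThreshold;

    public function __construct($processId, float $confidenceIntervalThreshold)
    {
        $this->processId = $processId;
        // the threshold is no longer a hardcoded constant; tests control it
        $this->confidenceIntervalThreshold = $confidenceIntervalThreshold;
    }

    public function isGoodEnough(float $estimatedError): bool
    {
        return $estimatedError <= $this->confidenceIntervalThreshold;
    }
}
```

A test can now build the strategy with any threshold instead of being chained to 0.9.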
The log stores relevant production information about the executions of the strategy. As usual, it is reached through a Singleton acting as a global reference.
This new coupling prevents us from testing it. The Singleton lives in another module over which we have no control, so we are going to use a wrapping technique.
function logInfo(array $infoToLog) {
SingletonLogger::info($infoToLog);
}
Besides being a Singleton, the log uses static class messages.
<?
SingletonLogger::info($infoToLog);
Let’s remember that:
The only protocol a class should contain is the one related to its single responsibility (the S in SOLID): creating instances.
Since the reference is to a static method, we cannot replace the class call with a polymorphic method. Instead, we will use an anonymous function.
<?
function logInfo(array $infoToLog) {
    $loggingFunction = function () use ($infoToLog) {
        SingletonLogger::info($infoToLog);
    };
    $loggingFunction();
}
Then we can decouple the reference to the log and extract it from the class, reducing coupling, improving the cohesion of the strategy and favoring its testability.
<?
final class SupervisedLearningStrategy {
    public function __construct($processId, Closure $loggingFunction){
    }
}
With the call from the productive code:
<?
$loggingFunction = function (array $infoToLog) {
    SingletonLogger::info($infoToLog);
};
new SupervisedLearningStrategy($processId, $loggingFunction);
And the call from the tests:
<?
$loggingFunction = function (array $infoToLog) {
    $this->loggedData[] = $infoToLog;
};
new SupervisedLearningStrategy($processId, $loggingFunction);
$this->assertEquals([...], $this->loggedData);
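Putting the pieces together, a self-contained sketch of the closure acting as a test double. The optimize body is illustrative; it only shows the strategy reporting through the injected closure:

```php
final class SupervisedLearningStrategy
{
    private $processId;
    private $loggingFunction;

    public function __construct($processId, Closure $loggingFunction)
    {
        $this->processId = $processId;
        $this->loggingFunction = $loggingFunction;
    }

    public function optimize()
    {
        // ... the real algorithm would run here, then report what happened
        ($this->loggingFunction)(['processId' => $this->processId]);
    }
}

$loggedData = [];
$loggingFunction = function (array $infoToLog) use (&$loggedData) {
    $loggedData[] = $infoToLog; // test double: capture instead of writing to the log
};

(new SupervisedLearningStrategy(42, $loggingFunction))->optimize();
// $loggedData now holds the captured execution data
```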
Along the way, our refactoring finds some code dealing with persisted data. These data travel together cohesively, so it makes sense to think of them as an object with real-world responsibilities:
<?
private function getDataToPersist($runTime, $isClustering) {
return [
'processId' => $this->processId,
'date' => new Timestamp(),
'runTime' => $runTime,
'isClustering' => $isClustering
];
}
By creating the new concept, we are in danger of building an anemic model. Let’s see what hidden responsibilities it has:
<?
final class LearningAlgorithmRunData{
    // look for responsibilities related to the cohesive data
}
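A sketch of where those responsibilities might live, with the date injected so tests remain in control of time. The names and the record format are assumptions:

```php
final class LearningAlgorithmRunData
{
    private $processId;
    private $date;
    private $runTime;
    private $isClustering;

    public function __construct($processId, $date, $runTime, bool $isClustering)
    {
        $this->processId = $processId;
        $this->date = $date; // injected instead of created here: tests control time
        $this->runTime = $runTime;
        $this->isClustering = $isClustering;
    }

    // candidate responsibility: presenting itself as a persistable record
    public function asPersistableRecord(): array
    {
        return [
            'processId' => $this->processId,
            'date' => $this->date,
            'runTime' => $this->runTime,
            'isClustering' => $this->isClustering,
        ];
    }
}
```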
We have not forgotten the tests we could not write at the beginning. With a much less coupled design, they are now very easy to write.
<?
function testOptimizationIsGoodEnough(){
///
$this->assertEquals($expected, $real);
}
function testOptimizationBelowThreshold(){
///
$this->assertEquals($expected, $real);
}
And our system is much less “legacy” compared to when we found it.
After hard iterative and incremental work, through short steps, we have achieved a solution that is better in several aspects: it is covered by tests, less coupled and more declarative.
Modifying an existing system while improving its design is possible if we follow clear design rules and take small steps. We must have the professional responsibility and courage to make the relevant changes, leaving a much better solution than the one we found.
Part of the objective of this series of articles is to generate spaces for debate and discussion on software design.
We look forward to comments and suggestions on this article.