Mark Looi

Abstract

solution to an expanding forest of conditional logic is to <b>replace conditional with Polymorphism</b>. (pg. 34) This is done by creating an inheritance hierarchy that semantically captures the meanings implied by the conditionals. So, instead of logical tests and corresponding assignments of values, the values are determined by the semantic meaning associated with the hierarchy.</p><p id="ec63"><b>Changing function declaration</b> is often an intermediate step to a further refactoring; in the example, it sets up moving a function into another one, a specialized place where it is isolated from other processing. The crux of the conditional processing logic is thus moved to a new class. This new class is then refactored further using polymorphism so that the different types are subclasses (i.e., <b>replace type code with Subclasses</b>). Intermediate steps require <b>replacing a Constructor with a Factory Function</b>, then again to <b>replace conditional with Polymorphism</b>. (pg. 39)</p><h1 id="e5a5">Principles in Refactoring</h1><p id="6256">Fowler explains that the practice of <i>refactoring</i> is to <b>restructure software code so that it is easier to understand and modify without changing its observable behavior. </b>Confusingly, he uses the same word, “refactoring”, to be both verb and noun, though it of course looks like a gerund. He further stresses that refactoring is a process by which one makes small incremental changes that cumulatively effect a big change. He emphasizes that keeping the code in working order is a key part of the discipline.</p><p id="5415">Refactoring improves the architecture of the software, for example, by reducing duplicate code. One beneficiary of refactoring is other developers, who sometimes even a few weeks later are confronted with making changes; the discipline of refactoring makes it easier to understand and increases the likelihood that a future change will be completed faster and error-free. 
Refactoring can also reveal bugs, since the process requires critically analyzing and, basically, re-implementing the code. Finally, refactoring increases ongoing efficiency because adding new features is faster.</p><p id="007e">Fowler recommends refactoring in the following situations (pp. 50–54):</p><ul><li>Preparatory, to lay the groundwork for adding new features</li><li>Comprehensibility, to help understand and learn the purpose of the code</li><li>Incremental, to make small changes as you pass through the code that improve the state while still leaving substantial work for the future</li><li>Opportunistic, to always be on the lookout for cleanup and other improvements</li><li>Planned, to set aside a period of concerted effort to improve existing code</li><li>Long-term, to systematically apply some of the above tactics to make improvements</li><li>Code review driven, to make concrete improvements to the code while broadening a team’s understanding of it</li></ul><p id="6c4e">On the other hand, if code can be hidden behind an API or thrown away and rewritten, either of those might be the more practical approach. The main determinant of whether to refactor should be the potential economic benefit. (pg. 57) Other considerations against refactoring include distributed ownership of the code and the branches in which it might coexist. Sometimes branches are so long-lived that merging their changes into the mainline becomes a huge effort; if either branch is refactored (especially in different ways), the merge complexity may be unnecessarily increased. However, Fowler advocates a minimal branching duration, ideally using a Continuous Integration methodology that forces integrations frequently, sometimes daily. 
In this case, the constraints on refactoring are minimized.</p><p id="b41e">Self-testing code can assist refactoring, since the cost of verifying correctness at each incremental step is low, rendering it practical to make small improvements with confidence. A more problematic case is legacy code, particularly code that is not already self-testing. Refactoring databases might seem another steep challenge, but again an incremental approach works: in intermediate steps the legacy fields (i.e., columns) co-exist with the new ones for a time, and once no adverse side effects are observed, the legacy fields can be removed.</p><p id="b201">Traditional software architecture assumed that software would have to be designed with “flexibility” in mind. In practice, this meant more general purpose algorithms or speculatively adding parameters to functions in an attempt to “future-proof” the implementation. Usually, this is just a waste of time and adds unneeded complexity. A commitment to refactoring allows a developer to think about a simple design, then incrementally add extensions as needed; for example, if more parameters are needed, then use <i>Parameterize Function</i>.</p><p id="2372">Refactoring can also help improve software performance. There are three strategies to write software that runs fast:</p><ul><li>Time budgeting for each critical section of the code</li><li>Across the board improvement of code</li><li>Focus on the areas where most of the processing is occurring.</li></ul><p id="81de">For most software, the last approach is the best, since typically a Pareto-like Power Law applies where an application spends the majority of its execution time in a small fraction of the code (as measured by lines of code). 
But, there are special situations where the first approach (setting budgets for each part of the code) might be essential, such as for real time systems, where it might be unacceptable for even a rarely used part of the code to perform slowly.</p><h1 id="88ba">“Bad Smells” or Hints to Refactor</h1><p id="89d8">While he doesn’t prescribe strict rules for when to refactor, Fowler identifies a long list of heuristics that experienced developers can use to help pinpoint problem areas that might benefit from refactoring — what he terms “bad smells”. They are (and the specific refactorings that may apply):</p><ul><li><b>Mysterious names:</b> these sometimes suggest a lack of clarity of purpose or other ambiguity. <i>Change Function Declaration, Rename Variable, Rename Field </i>are ways to clean this up.<i> </i>(pg. 72)</li><li><b>Duplicated code:</b> duplicate or similar code is often an opportunity to consolidate or generalize. <i>Extract Function </i>can be used to isolate and reuse the code;<i> Pull Up Method </i>can be used in a similar way. (pg. 72)</li><li><b>Long functions:</b> lengthy blocks of code are often more difficult to understand. “A heuristic we follow is that whenever we feel the need to comment something, we write a function instead.”<i> Extract Function </i>is often all that’s required to simplify the code. However, sometimes the resulting function needs many parameters; if so, <i>Replace Temp with Query, Introduce Parameter Object, Preserve Whole Object, Replace Function with Command, </i>can be used to declutter.<i> Decompose Conditional, Replace Conditional with Polymorphism </i>can be used to reduce the complexity of conditional logic.</li><li><b>Long parameter lists:</b> these sometimes arise when extracting functions from a series of processing steps. 
Use queries, <i>Replace Parameter with Query,</i> to ask for another parameter given an original one; often it’s clearer to pass the original data structure rather than parts of it, using <i>Preserve Whole Object, </i>or creating a new one with<i> Introduce Parameter Object; </i>flags are sometimes confusing and may arise from an incomplete generalization of the function, so <i>Remove Flag Argument; </i>introducing a new class can consolidate parameters that are used with more than one function, so<i> Combine Functions into Class.</i></li><li><b>Global data:</b> mutable global data is dangerous. Immutable global data is probably safe, but still best avoided; for either, use <i>Encapsulate Variable.</i></li><li><b>Mutable data:</b> almost as bad as mutable globals, any variable that could change may cause failures in another part of the code that does not expect the change, particularly as the scope of the variable grows. These bugs can be subtle and difficult to detect. <i>Encapsulate Variable, Split Variable </i>can be used to isolate any updates to a single point of (potential) failure.<i> Slide Statements, Extract Function, Separate Query from Modifier, Remove Setting Method </i>can be combined as needed to reduce the amount of processing in and around changing the data.<i> Replace Derived Variable with Query </i>can be used to avoid changing data structures unnecessarily. Use <i>Combine Functions into Class </i>or<i> Combine Functions into Transform</i> to limit how much code needs to update a variable. If a variable contains some data with internal structure, it’s usually better to replace the entire structure rather than modify it in place, using <i>Change Reference to Value.</i> (pg. 76)</li><li><b>Divergent change:</b> this reveals itself “when one module is often changed in different ways for different reasons”. (pg. 76) So the software module has different roles. <i>Split Phase, Move Function</i> can be used to separate the processing. 
<i>Extract Function</i> or <i>Extract Class</i> could be helpful.</li><li><b>Shotgun surgery:</b> is the counterpoint to divergent change; that is, lots of scattered changes are required to add functionality. This is sometimes seen in code where different people work on it, making small, incremental changes, each of which cannot justify a refactoring. <i>Move Function, Move Field, Combine Functions into Class, Combine Functions into Transform </i>can be used to aggregate the dispersed code, as appropriate. <i>Split Phase</i> can organize the processing logic as well. An intermediate step to inline code (<i>Inline Function</i> or <i>Inline Class</i>) resulting in easy-to-observe repetition can then reveal a better refactoring.</li><li><b>Feature envy:</b> this results when the code is not optimally factorized so that it minimizes interaction or communication with other parts of the program. <i>Move Function</i> or <i>Extract Function</i> can get the code and its cohorts together. “The fundamental rule of thumb is to put things together that change together. Data and the behavior that references that data usually change together.” (pg. 77)</li><li><b>Data clumps:</b> these arise when data seem to appear together. <i>Extract Class, Introduce Parameter Object, Preserve Whole Object, </i>can summarize them.</li><li><b>Primitive obsession:</b> introducing structure and semantics to primitive data structures can improve readability and reduce bugs. Typical approaches are <i>Replace Primitive with Object</i>, <i>Replace Type Code with Subclasses. </i>“Primitives that commonly appear together are data clumps and should be civilized with <i>Extract Class</i> and <i>Introduce Parameter Object</i>”. (pg. 
79)</li><li><b>Repeated switches:</b> when this type of conditional processing occurs in more than one location, it is ripe for <i>Replace Conditional with Polymorphism.</i></li><li><b>Loops:</b> <i>Replace Loop with Pipeline </i>can improve readability.</li><li><b>Lazy element:</b> these arise from unneeded structure, perhaps the result of a successful refactoring. Eliminate them with <i>Inline Function, Inline Class, Collapse Hierarchy.</i></li><li><b>Speculative generality:</b> this arises from the desire to plan for the future — one that never arrives. Thin it out with <i>Collapse Hierarchy, Inline Function, Inline Class, Change Function Declaration.</i></li><li><b>Temporary field:</b> when a field is used occasionally, it can be more logically organized with <i>Extract Class, Move Function, Introduce Special Case. </i>This might be a bit of a misnomer; perhaps, in the spirit of <i>Rename Variable, </i>it should be called <b>Optional Field </b>and the refactoring is to eliminate an optional field.</li><li><b>Message chains:</b> these cascades of delegation can be addressed with <i>Hide Delegate, Extract Function </i>and <i>Move Function</i>.</li><li><b>Middle man:</b> sometimes encapsulation gets out of hand, creating code that just redirects; fix this with <i>Remove Middle Man</i>.</li><li><b>Insider trading:</b> this arises when modules are more coupled than ideal through the sharing of data. If that data is mutable, even more difficulty could arise. Address this with creating a middle man or <i>Hide Delegate</i>. Sometimes, with classes, it’s necessary to <i>Replace Subclass with Delegate </i>or <i>Replace Superclass with Delegate</i>.</li><li><b>Large class:</b> these can grow over time and reorganizing them can be useful, particularly if clients only use part of the class at any time. 
<i>Use Extract Class, Extract Superclass</i> or <i>Replace Type Code with Subclasses</i>.</li><li><b>Alternative classes with different interfaces:</b> to allow for class substitution, it’s necessary to align their interfaces; use <i>Change Function Declaration.</i></li><li><b>Data class:</b> these arise as glorified data structures masquerading as classes. Use <i>Encapsulate Record </i>and eliminate the ability for others to set them with <i>Remove Setting Method</i>.</li><li><b>Refused bequest:</b> results from an improperly designed class hierarchy, where too much uncommon code is held in the parent. Clean up the hierarchy with <i>Push Down Method</i> and <i>Push Down Field</i>. In some cases, where the subclass isn’t actually needed, <i>Replace Subclass with Delegate </i>or <i>Replace Superclass with Delegate.</i></li><li><b>Comments:</b> sometimes comments are a crutch to compensate for bad code. Rewrite it with clearer function declarations, <i>Change Function Declaration.</i></li></ul><h1 id="f496">Testing</h1><p id="e294">Fowler makes a strong case for automated, self-testing code that runs frequently as development progresses. A typical test defines a fixture (data and objects usually) and then determines if the output corresponds to that expected from the fixture. Tests are aggregated into suites and are typically organized to verify a certain scope of functionality. There is sometimes the temptation to improve efficiency by sharing some of the test fixture (e.g., a value) between tests. But this should <i>never</i> be done since it’s possible that a future iteration of the test can change this shared value during the test execution. (This is an example of mutable data.)</p><p id="58d8">Tests generally follow a flow of setup, exercise, verify, and teardown. 
He emphasizes that boundaries are the most important to probe, for example when values can be zero or negative or out of a presumed range.</p><p id="786f">Unfortunately, tests and their fixtures are still code, with all the pitfalls that bedevil software: they can atrophy, they can be poorly designed, they can have bugs. In this way, the actual volume of software increases as at least a multiple of the features added. In addition, as code evolves, the tests themselves must be maintained; worse still, refactoring can render some tests redundant, particularly if they are verifying intermediate calculations that are passed on from one module to another. Developers and managers must be realistic and anticipate this growing effort over the lifecycle of a software project.</p><h1 id="7f2f">The Refactorings</h1><p id="5e23">The refactorings are briefly summarized here. Interested readers should read the book for more details; in fact, Fowler helpfully breaks down each refactoring into the following sections: motivation, mechanics, and examples. In addition, he is adding more refactorings, post publication, to the Web version.</p><h2 id="e0c0">Extract or Inline Function</h2><p id="c52d">These refactorings are inverses of each other. Extraction can be used to simplify code (and to generalize the function). Inlining is good when functions are used as intermediaries and clarity is gained by placing the code where it is used (and not likely to be duplicated).</p><h2 id="cfd5">Extract or Inline Variable</h2><p id="9a63">These refactorings are inverses of each other. Extraction is used to simplify code. 
Inlining is an option when variables are used as intermediaries, such as calculated values.</p><h2 id="1cd6">Change Function Declaration; Renaming Variables</h2><p id="4bfd">To improve clarity and to simplify, renaming functions or variables to more meaningful appellations, adding or removing parameters, or passing a property of a parameter (rather than the whole thing), can be useful.</p><h2 id="5aa9">Encapsulate Variables</h2><p id="4bfa">Encapsulating data is one way to control how it can be modified. By making it pass through a choke point function, changes to the data can be managed centrally.</p><h2 id="782a">Parameter Object</h2><p id="af82">Particularly if the same data are fellow travelers and used in more than one function, it is convenient and beneficial to create an encompassing data structure.</p><h2 id="e4de">Combine Functions into a Class or Transform</h2><p id="b3e2">Whenever there is “a group of functions that operate closely together on a common body of data (usually passed as arguments to the function call), [there is] an opportunity to form a class. Using a class makes the common environment that these functions share more explicit, allows [simplified] function calls inside the object by removing many of the arguments, and provides a reference to pass such an object to other parts of the system.” (pg. 144) Similarly, a transform can do the same thing without creating a new class; instead it generates the calculated outputs of the functions, returning new records.</p><h2 id="1ebc">Split Phase</h2><p id="7bc8">In the introductory example, splitting the phases was a key refactoring to separate business processing logic from the presentation of results. 
Any processing of two (or more) different things is a good candidate for splitting so that future changes are easier to understand, isolate and test.</p><h2 id="1216">Encapsulate Record</h2><p id="6d0b">By converting a typical record data structure into a class, we can control the reading and writing of the data through <i>get</i> and <i>set</i> methods. This approach can be extended for nested structures such as JSON or XML.</p><h2 id="ddc2">Encapsulate Collection</h2><p id="2413">Collection variables may be encapsulated, requiring the use of <i>get</i> and <i>set </i>to access them. However, “if the getter returns the collection itself, then that collection’s membership can be altered without the enclosing class being able to intervene.” (pg. 170) Instead, he recommends providing “a getting method for the collection, but make it return a copy of the underlying collection”, just as in functional programming. (pg. 171) Access to the data in the collections will require using purpose-built methods that can be controlled centrally.</p><h2 id="f897">Replace Primitive with Object</h2><p id="0665">Often programs are built up using simple data types, such as strings. But, these data types may have a higher level semantic meaning, with a prescribed way of interacting with them, such as a phone number or zip code. It makes no sense to add or subtract phone numbers, but decomposing them might be useful. Creating an object (with approved methods) that mimics their semantic meanings can improve the model and prevent inappropriate transformations of the data.</p><h2 id="26c8">Replace Temp with Query</h2><p id="95f5">Temporary variables that store a calculated value are sometimes worth replacing with a function that implements the calculation, particularly if the calculation is done in other places. 
It can also help simplify interfaces if temporaries are used to pass parameters.</p><h2 id="c2d0">Extract Class</h2><p id="7ae8">Sometimes as development progresses, what was once a simple class with a few operations grows to become an unwieldy mess. Extracting classes that are logically or semantically separate may make sense, particularly if the operations or subtyping on the class start to affect only parts of the class.</p><h2 id="23d4">Inline Class</h2><p id="e29d">The counterpoint to Extract Class is to inline one into another, perhaps if it is just not worth carrying around or is tied almost one-to-one with the other. Still another reason to inline would be if there is an even more useful class extraction that could be found by seeing the related elements (i.e., objects and methods) together.</p><h2 id="0e20">Hide Delegate</h2><p id="808a">“One of the keys — if not <i>the</i> key — to good modular design is encapsulation. Encapsulation means that modules need to know less about other parts of the system. Then, when things change, fewer modules need to be told about the change — which makes the change easier to make.” (pg. 189) Delegates arise when the structure of a server object needs to be exposed to the client for it to access data in that structure. Hiding the delegate eliminates the need (when changes are made) for a client to know about the server object and its delegate, which can cause a ripple of changes to clients that access the information provided by the delegate. That is, create a delegate method to replace <code>y = x.delegate.y</code> with <code>y = x.y</code></p><h2 id="5019">Remove Middle Man</h2><p id="e93e">The inverse of delegate hiding is to remove the delegate and expose the structure of the true object. The rationale for reducing encapsulation could be that too much of a good thing can be cumbersome.</p><h2 id="88e6">Substitute Algorithm</h2><p id="2f71">While not always possible, it is often desirable to find a better way of doing things, especially if there are gains in clarity, simplicity, efficiency, or other benefits. Simple examples are replacing multi-pass algorithms with single pass or recursion with loops (and vice versa).</p><h2 id="8dc1">Move Function</h2><p id="c05b">Though simple to grasp, when to move a function from one location to another is not necessarily straightforward. A main motivation would be to improve modularity by co-locating it with related code or the context (such as data or other functions) or where it is or might be used. 
Examples of moves include migrating a nested function higher in its hierarchy, or moving a function between classes.</p><h2 id="9760">Move Field</h2><p id="1d9f">Again, a simple concept, motivated by a better way to organize data structures. Moving fields can apply to standard data structures as well as classes (which are data structures with dedicated functions).</p><h2 id="a99c">Move Statements into Function</h2><p id="12b8">Sometimes code more naturally resides in the function (rather than the caller of the function), especially if the same code must be repeated by the caller whenever the function is invoked. In this case, move those statements into the function.</p><h2 id="3a14">Move Statements to Callers</h2><p id="931c">The inverse of moving code to functions: sometimes it’s more natural to have statements outside the function, perhaps if there are variations in the statements that would encumber the function itself to deal with the special cases.</p><h2 id="1be9">Replace Inline Code with Function Call</h2><p id="72e5">One strategy to eliminate duplicate code or to improve clarity is to collect a block of code and turn it into a function.</p><h2 id="fb21">Slide Statements</h2><p id="c51d">To promote readability, move statements so that they’re in the proximity of related code. An example of this is to co-locate variable declarations with their first use, rather than to have a general declaration section.</p><h2 id="c24f">Split Loop</h2><p id="7338">Sometimes loops perform more than one set of iterative actions. In the introductory example, the code both calculated values and printed them. Splitting the loops can improve clarity, add modularity and reduce the potential interaction between two independent iterations. This refactoring is a special case of <b>Split Phase.</b></p><h2 id="f6ef">Replace Loop with Pipeline</h2><p id="3307">Modern languages permit developers to express iteration as a semantic intent rather than a procedural process. 
Filters and Maps can be useful replacements for raw iteration, reducing the amount of code and potential for errors.</p><h2 id="a16e">Remove Dead Code</h2><p id="f660">Dead code increases clutter and should be removed. It should not be just commented out. With version control systems, any old code will still be easily found.</p><h2 id="0d07">Split Variable</h2><p id="8144">Sometimes it’s convenient to “reuse” variables, as one might with a temporary, <i>temp</i>. Don’t do this. Set a variable once and if another is needed, define a different variable for this purpose.</p><h2 id="cc6b">Rename Field</h2><p id="af8a">Names, at least to humans, have great value. Maintain them so that they reflect a semantic intent in data structures, classes, and methods.</p><h2 id="04df">Replace Derived Variable with Query</h2><p id="248b">Variables that are used to hold mutable data can be replaced with the actual calculations that update those variables. In this way, potential unexpected changes to the variable are eliminated and the value is obtained from a function (i.e., the “query”).</p><h2 id="8ee8">Change Reference to Value</h2><p id="9577">To add immutability, use value objects, which tend to be easier to work with. On the other hand, if the object is meant to be shared, then it must be kept as a reference.</p><h2 id="21ba">Change Value to Reference</h2><p id="b275">As noted earlier, references may be more natural when sharing data objects.</p><h2 id="adc1">Decompose Conditional</h2><p id="791e">Conditional logic, especially involving cascades of compound booleans, can be confusing. One way to make the logic easier to digest is to extract the logical calculations into a function.</p><h2 id="c701">Consolidate Conditional</h2><p id="b46c">Sometimes, conditional actions are not really conditional; the outcome is the same. In this case, combine them. 
On the other hand, complex boolean expressions whose operands could change in the course of evaluation could make this refactoring more difficult (though such expressions seem dangerous in themselves!).</p><h2 id="dfec">Replace Nested Conditional with Guard Clauses</h2><p id="59a5">Guard clauses check for unusual conditions at the top of a function and return immediately, so the main logic is not buried inside a cascade of nested conditionals.</p><h2 id="b477">Replace Conditional with Polymorphism</h2><p id="2f25">Polymorphism can be a powerful way to eliminate forests of case statements by turning procedural processing into a semantic exercise. Classes are defined to express a semantically meaningful hierarchy and the conditional processing logic is then embedded into the class, so that it doesn’t have to be parsed by evaluating conditionals.</p><p id="1c01">“A common case [is] a set of types, each handling the conditional logic differently. … This is made most obvious when there are several functions that have a switch statement on a type code. … Remove the duplication of the common switch logic by creating classes for each case and using polymorphism to bring out the type-specific behavior.</p><p id="0f0e">“Another situation is where … the logic [is] a base case with variants. The base case may be the most common or most straightforward. … Put this logic into a superclass which allows me to reason about it without having to worry about the variants. I then put each variant case into a subclass, which I express with code that emphasizes its difference from the base case.” (pg. 272)</p><h2 id="e355">Introduce Special Case (Checks)</h2><p id="b1ba">Sometimes “users of a data structure check a specific value, and then … do the same thing”, resulting in duplicate code. (pg. 
289) Since this special case and the handling of it (e.g., checking for a <i>null</i> value) are repeated, it makes sense to treat it as something clients can query for.</p><h2 id="68aa">Introduce Assertion</h2><p id="d736">This practice can help identify bugs and explain the logic of the code to others.</p><h2 id="ba30">Separate Query from Modifier</h2><p id="0967">It goes without saying that a function that purports to do one thing, but does that plus something additional, is inherently more complex and likely to generate unintended side effects. So, split these up.</p><h2 id="d7e2">Parameterize Function</h2><p id="9c43">In the spirit of reuse, if adding a parameter to a function can increase its use and eliminate duplicate code, then do it!</p><h2 id="9ec8">Remove Flag Argument</h2><p id="f687">A flag is a special case of a parameter which dictates the action a function should take. Instead, create specialized functions for each instance.</p><h2 id="fe35">Preserve Whole Object</h2><p id="bc9a">Rather than cleave off a few values from a record to pass to a function, pass along the whole object and let the function derive the values itself. This reduces the parameter list and doesn’t require modification (of the function or where it’s called) should more values need to be used from that same object.</p><h2 id="e96c">Replace Parameter with Query</h2><p id="9cdd">In a similar vein to preservation of whole objects, let the function determine parameter values by interrogating the object. This is especially practical if one value can be looked up from the other.</p><h2 id="d648">Replace Query with Parameter</h2><p id="4ca8">On the other hand, if a query inside a function must do a lot of work to calculate the desired value or use global data, then determining the value for the function to use before the call may simplify matters. This refactoring enhances <i>referential transparency</i>. 
Referential transparency is a property of a function that always gives the same result when called with the same parameter values. “If a function accesses some element in its scope that isn’t referentially transparent, then the containing function also lacks referential transparency. I can fix that by moving that element to a parameter. Although such a move will shift responsibility to the caller, there is often a lot to be gained by creating clear modules with referential transparency.” (pg. 328)</p><h2 id="21e9">Remove Setting Method</h2><p id="1a45">If a field will not be changed after it is created, then don’t suggest it can be by allowing setting methods to exist.</p><h2 id="4ad7">Replace Constructor with Factory Function</h2><p id="c34a">Instead of using the standard constructors, create a <i>Factory Function</i> (that will use the constructor within it). “A factory function is any function which is not a class or constructor that returns a (presumably new) object. In JavaScript, any function can return an object. When it does so without the <code>new</code> keyword, it’s a factory function.” (Elliott)</p><p id="5844">“Factory functions have always been attractive in JavaScript because they offer the ability to easily produce object instances without diving into the complexities of classes and the <code>new</code> keyword.” (Elliott)</p><h2 id="67ce">Replace Function with Command</h2><p id="031c">In Fowler’s parlance, a <i>Command</i> is a class with a single method that acts in place of a simpler function. There’s more work and overhead in doing this, so most of the time regular functions will suffice. The case for commands is when better control over the behavior of the function and its parameters is desired, for example to create an “undo” method for the command. 
“They can easily be broken down into separate methods sharing common state through the fields; they can be invoked via different methods for different effects; they can have their data built up in stages.” (pg. 338)</p><h2 id="d33b">Replace Command with Function</h2><p id="a22d">Because of the added overhead and complexity of commands, sometimes, it’s desirable to reverse the process.</p><h2 id="e5a7">Return Modified Value</h2><p id="01e7">Often functions operate and change values without making that change explicit; <i>return</i> the changed values!</p><h2 id="78f2">Replace Error Code with Exception</h2><p id="d276">Rather than manage error codes, handle the exception directly. “Exceptions provide a separate language mechanism for error handling. When I detect an error, I throw an exception, which travels up the call stack until it finds a handler …. Exceptions mean that I don’t have to remember to check error codes anymore, nor worry about detecting and passing errors up the call stack.” (not in printed book; see supplemental materials)</p><h2 id="4172">Replace Exception with Precheck</h2><p id="f3f2">Save exceptions for unexpected behavior; instead, check for a possible error condition before it could be triggered.</p><h2 id="b5b4">Pull Up Method or Field</h2><p id="55d6">When subclasses end up with methods that do the same or similar things, it’s desirable to pull the method upwards in the hierarchy so that identical code is not duplicated. Similar to pulling up methods, fields within subclasses may be duplicated, in which case pull them up.</p><h2 id="3f94">Pull Up Constructor Body</h2><p id="85f6">As with fields and methods, common constructor code in subclasses can be pulled up into the superclass. 
But because constructors may depend on a sequence of operations, for example because the constructor uses values set in the subclass, it may be necessary to do additional refactoring to also pull up the dependency.</p><h2 id="5376">Push Down Method or Field</h2><p id="1760">The inverse of pulling up a method or field is employed when only one subclass requires the method or field.</p><h2 id="0efa">Replace Type Code with Subclass</h2><p id="b717">Type codes arise when an object has different behavior or properties based on its type. These type codes are then used to determine when to apply different programming logic to the object. For example, an employee could be an engineer, manager, or salesperson, each with different attributes. Subclasses can model the semantics of the relationship better. The resulting subclasses can then be used by replacing the conditional with Polymorphism, instead of long case statements.</p><h2 id="ee5e">Remove Subclass</h2><p id="9803">Sometimes, subclasses are no longer justified. Remove them and replace them with appropriate fields in the containing class.</p><h2 id="79fb">Extract Superclass</h2><p id="b166">When two classes do similar things that differ in a limited way, then creating a superclass from which the original classes inherit might be warranted.</p><h2 id="69ae">Collapse Hierarchy</h2><p id="f82b">Sometimes a class and its children lose much of their distinction. Pull up the subclass.</p><h2 id="238d">Replace Subclass with Delegate</h2><p id="edc2">Delegates promote code reuse among classes without using inheritance. Inheritance is a powerful, but in some ways, limited tool. It requires the class and subclass to possess a meaningful and extensible hierarchy; changing the superclass may require rework of the subclasses. When this relationship is unclear or not straightforward, a delegate might work better. These delegate classes are <i>helper</i> classes to the main one. 
“Delegation is a regular relationship between objects — so I can have a clear interface to work with, which is much less coupling than subclassing.” (pg. 382) See also <b>Hide Delegate</b>.</p><h2 id="488a">Replace Superclass with Delegate</h2><p id="fe06">Similar to replacing a subclass with a delegate, a superclass can be expurgated if the hierarchical relationship is too cumbersome to maintain. Sometimes the initial model suggests a logical hierarchy, but after further evolution, that hierarchy weakens. Delegates can still accomplish the original objective of code reuse and centralization of logic without requiring a strict hierarchical order.</p><h1 id="35a3">Criticism</h1><p id="b000">Fowler has written an important reference that should be on the bookshelf of most software developers. He has distilled less than a hundred of the most important refactoring patterns, some of which are obvious and probably used reflexively by experienced programmers. However, all too many developers are content to leave working code alone, a practice that has corrosive effects even in the short term. That Fowler has identified and highlighted the importance of even trivial or obvious improvements is a call to action for developers in their everyday activity. Code quality erodes slowly but ineluctably; a developer’s second priority (after correct implementation) is to arrest if not reverse this erosion.</p><p id="090a">Fowler states that the purpose of refactoring is to make improvements without changing a program’s “observable behavior”. He advocates aggressive and repeated testing to ensure the original behavior is retained. Still, some changes are subtle and may not be readily tested in a functional way; for example, they could affect performance or introduce unexpected, distant consequences. Avoiding regressions that are hard to find is the fear most developers live with. 
Perhaps a more nuanced discussion of what are some of the pitfalls might be warranted.</p><p id="9975">Refactoring sometimes requires database schema changes. This is not always possible (or easy) with legacy data that cannot be discarded or lost, e.g., historical records. Tactics to perhaps build interfaces around the legacy code and database so that future refactoring can occur without disrupting valuable data might be worth exploring.</p><p id="3630">Fowler refers to the functional programming approach, but only to note its novelty. Some of the refactorings he advocates are an attempt to impose the discipline of functional programming’s immutability of data onto the object-oriented paradigm (e.g., split variable). He certainly acknowledges that an undisciplined use of mutable data can introduce bugs. A more spirited discussion of the advantages of functional programming in refactoring might be useful. For example, perhaps gains in immutable data might be lost to the ability to model semantic objects using inheritance. Or that the complexities of inheritance are avoided by simply adding functions (i.e., delegates).</p><p id="06fd">An attitude that seems helpful is to program defensively; that is, to imagine at the outset all the bad cases that can generate errors. In some respects this is the ethos behind building unit tests and test-driven development. In a similar way, a pessimistic outlook about all the things that can go wrong with their code may be helpful for programmers so that they stay motivated to constantly refactor.</p><p id="44e8">While we generally have been taught to eschew second guessing compilers and precompilers, in the days of interpreted languages, developers may pay a performance penalty with some refactorings, such as split loop. With compiled languages, it would be safe to assume that split loops would be subsequently recombined by the compiler long before any code was generated. 
We can’t assume this in languages like JavaScript.</p><p id="7b49">Fowler makes a good case for replacing a function with command; perhaps it is more general to state that the refactoring is to <b>replace a function with class object</b>.</p><p id="5c4d">In conclusion, <i>Refactoring</i> is an important book that is helpful for most software professionals. It can also serve as a guide for software managers so that they understand both the simplicity of some refactorings — and the complexity of others. Through his lucid examples and handy step-by-step advice, Martin Fowler demystifies the practice of refactoring, thus encouraging all developers to do more of it!</p><h1 id="2a28">References</h1><p id="7a37">Fowler, M. (1999). <i>Refactoring: Improving the Design of Existing Code</i>. Boston, MA, USA: Addison-Wesley. ISBN: 0–201–48567–2</p><p id="d368">Fowler, M. (2019). <i>Refactoring: Improving the Design of Existing Code</i>. Boston, MA, USA: Addison-Wesley. ISBN: 978-013-475-759-9</p><p id="6f26">Fowler, M. (2020). <i>Refactoring: Web Edition.</i> (Canonical version) Accessed 28-September 2020. <a href="https://memberservices.informit.com/">https://memberservices.informit.com/</a> (only available to purchasers of the book)</p><p id="e90a">Elliott, Eric (2017). “JavaScript Factory Functions with ES6+.” <a href="https://readmedium.com/javascript-factory-functions-with-es6-4d224591a8b1">https://readmedium.com/javascript-factory-functions-with-es6-4d224591a8b1</a> (accessed October 1, 2020)</p></article></body>

Summary and Criticism of Fowler’s “Refactoring”

Martin Fowler wrote an important book in the canon of software engineering more than 20 years ago. In it, he attempted to distill the ways that existing software can be improved. Since then, he has published a second edition and now, an online one as well. Though he didn’t invent the term, his book helped popularize “refactoring” and it is used even among non-professionals to refer to any kind of maintenance or upkeep activity. In elementary algebra, the term “factoring” refers to finding a multiplier that can be extracted from an algebraic expression without changing its veracity, often as part of a process of solving an equation. Similarly, refactoring in software is the practice of identifying ways that the code can be changed for the better without altering its functionality.

Though it may seem counter-intuitive that a purely intellectual abstraction like software code can deteriorate, as if it were a physical object, in fact, software constantly atrophies. It atrophies in several ways:

  • Context change: even software that is functioning perfectly well must live in an environment that itself is changing. In this way, relative to its environment, the software has changed, often in ways that reduce its functionality or make it completely unable to perform its prescribed task.
  • Functionality change: in the evolution of a software system, often it needs to be extended; the act of doing so can lead to code that is more complex, hard to comprehend and increasingly costly to support.

So, refactoring is a never-ending activity that software organizations must budget into their development effort. That said, the more regularly it is done, particularly as part of the rhythm of feature development, the easier it is. This is because as the code atrophies, it tends to deteriorate faster and into a less maintainable state, as one extemporization is piled on top of another. Constant vigilance is the watchword; a good refactoring practice is like tending a garden: don’t let the weeds take over!

Fowler goes further to distill patterns from software that often indicate a need for refactoring; he calls these “bad smells”. Further, he distills a number of canonical refactorings that he states are widely applicable and provides the conditions under which they might apply and how to carry them out. He never fails to illustrate his concepts with examples, some in considerable detail.

First Example

Martin Fowler provides the first chapter of his revised book as a free download, https://files.thoughtworks.com/pdfs/Refactoring2-free-chapter.pdf. It is a good place to start and he uses a simple example to illustrate a few of his recommended refactoring techniques.

One basic strategy for refactoring is to organize the code around semantically meaningful groupings and to avoid intermingling processing steps. It assumes that the logic of business rules should be applied independently of the logic of presenting the results. Often, refactoring results in more code, more structure and seemingly repetitive processing. In his first example, he takes code that starts off looking like this:

function statement (invoice, plays) {
    let totalAmount = 0;
    let volumeCredits = 0;
    let result = `Statement for ${invoice.customer}\n`;
    const format = new Intl.NumberFormat("en-US",
                          { style: "currency", currency: "USD",
                            minimumFractionDigits: 2 }).format;
  
    for (let perf of invoice.performances) {
      const play = plays[perf.playID];
      let thisAmount = 0;
  
      switch (play.type) {
      case "tragedy":
        thisAmount = 40000;
        if (perf.audience > 30) {
          thisAmount += 1000 * (perf.audience - 30);
        }
        break;
      case "comedy":
        thisAmount = 30000;
        if (perf.audience > 20) {
          thisAmount += 10000 + 500 * (perf.audience - 20);
        }
        thisAmount += 300 * perf.audience;
        break;
      default:
          throw new Error(`unknown type: ${play.type}`);
      }
  
      // add volume credits
      volumeCredits += Math.max(perf.audience - 30, 0);
      // add extra credit for every ten comedy attendees
      if ("comedy" === play.type) volumeCredits += Math.floor(perf.audience / 5);
  
      // print line for this order
      result += `  ${play.name}: ${format(thisAmount/100)} (${perf.audience} seats)\n`;
      totalAmount += thisAmount;
    }
    result += `Amount owed is ${format(totalAmount/100)}\n`;
    result += `You earned ${volumeCredits} credits\n`;
    return result;
  }

Since it’s a small amount of code, it’s easy to understand. The program produces a printed report for a hypothetical theater business that summarizes what is accrued in box office receipts and an incentive plan. There are business rules for computing these amounts and two different types of plays with different corresponding rules. The computation of the values and their printing are handled in the same loop. It’s straightforward, but if more business rules or case types are added, the complexity could grow; for example, if tragedies were broken into Shakespearean and modern plays, with corresponding rules, the cases could become nested. The processing and printing would be entwined even more closely.

After considerable refactoring, Fowler turns it into a more abstract, general implementation; it is arguably less easy to see how the pieces are related, since you have to jump around to see what the various functions do:

createStatementData.js
export default function createStatementData(invoice, plays) {
    const result = {};
    result.customer = invoice.customer;
    result.performances = invoice.performances.map(enrichPerformance);
    result.totalAmount = totalAmount(result);
    result.totalVolumeCredits = totalVolumeCredits(result);
    return result;
  
    function enrichPerformance(aPerformance) {
      const calculator = createPerformanceCalculator(aPerformance, playFor(aPerformance));
      const result = Object.assign({}, aPerformance);
      result.play = calculator.play;
      result.amount = calculator.amount;
      result.volumeCredits = calculator.volumeCredits;
      return result;
    }
    function playFor(aPerformance) {
      return plays[aPerformance.playID]
    }
    function totalAmount(data) {
      return data.performances
        .reduce((total, p) => total + p.amount, 0);
    }
    function totalVolumeCredits(data) {
      return data.performances
        .reduce((total, p) => total + p.volumeCredits, 0);
    }
  }
  
  function createPerformanceCalculator(aPerformance, aPlay) {
      switch(aPlay.type) {
      case "tragedy": return new TragedyCalculator(aPerformance, aPlay);
      case "comedy" : return new ComedyCalculator(aPerformance, aPlay);
      default:
          throw new Error(`unknown type: ${aPlay.type}`);
      }
  }
  class PerformanceCalculator {
    constructor(aPerformance, aPlay) {
      this.performance = aPerformance;
      this.play = aPlay;
    }
    get amount() {
      throw new Error('subclass responsibility');
    }
    get volumeCredits() {
      return Math.max(this.performance.audience - 30, 0);
    }
  }
  class TragedyCalculator extends PerformanceCalculator {
    get amount() {
      let result = 40000;
      if (this.performance.audience > 30) {
        result += 1000 * (this.performance.audience - 30);
      }
      return result;
    }
  }
  class ComedyCalculator extends PerformanceCalculator {
    get amount() {
      let result = 30000;
      if (this.performance.audience > 20) {
        result += 10000 + 500 * (this.performance.audience - 20);
      }
      result += 300 * this.performance.audience;
      return result;
    }
    get volumeCredits() {
      return super.volumeCredits + Math.floor(this.performance.audience / 5);
    }
  }

Still, at the top level, it’s clear what the program is supposed to do; it’s expressed in createStatementData(), which takes an invoice and the plays and returns the data for an accounting statement. While some function names, like enrichPerformance(), seem unclear at the outset, the structure suppresses details to the appropriate level. As a matter of programming style, Fowler likes to prepend the string “enrich” to an entity’s name to express the concept that the function “enriches” or adds details to the entity.

Fowler describes the process for refactoring: “Read the code, gain some insight, and use refactoring to move that insight from your head back into the code. The clearer code then makes it easier to understand it, leading to deeper insights”. (pg. 43) In this case, the refactoring consisted of three steps: breaking up the big function into smaller ones; decoupling the data processing from the printing; and, generalizing the calculation using polymorphism, thus enabling an easier extension of the business logic.

To start, Fowler recommends a solid set of tests for the code that will be refactored, ideally using an automated testing framework. (pg. 5) The tests function as an implicit “checksum” in that for the code and test to both be incorrect, the error in logic has to be duplicated in both places. (pg. 5) He recommends using automation because the practice of refactoring requires making very small incremental steps, of an almost inconsequential nature, and testing after each one to ensure correctness. (pg. 5)

Extracting functions that are otherwise in-lined in the code is a prime refactoring technique. Sometimes this requires writing the extracted functions as iterator or accumulator-type functions. For example, a loop body might reduce to runningTot += iterationCalc(input); where iterationCalc() is a new function that performs the incremental calculation in the loop. (pg. 7)
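As a minimal sketch of this extraction, the pricing logic from the first example can be pulled out of the loop so that the loop keeps only the accumulation. The name amountFor follows Fowler's first chapter; totalAmount as a standalone function and the sample data are assumptions made for illustration.

```javascript
// Extract Function: the per-performance pricing moves into its own function.
function amountFor(aPerformance, play) {
  let result = 0;
  switch (play.type) {
    case "tragedy":
      result = 40000;
      if (aPerformance.audience > 30) {
        result += 1000 * (aPerformance.audience - 30);
      }
      break;
    case "comedy":
      result = 30000;
      if (aPerformance.audience > 20) {
        result += 10000 + 500 * (aPerformance.audience - 20);
      }
      result += 300 * aPerformance.audience;
      break;
    default:
      throw new Error(`unknown type: ${play.type}`);
  }
  return result;
}

// The loop now only accumulates: runningTot += iterationCalc(input).
function totalAmount(invoice, plays) {
  let total = 0;
  for (const perf of invoice.performances) {
    total += amountFor(perf, plays[perf.playID]);
  }
  return total;
}

// Hypothetical sample data, used only to exercise the sketch.
const plays = {
  hamlet: { name: "Hamlet", type: "tragedy" },
  aslike: { name: "As You Like It", type: "comedy" },
};
const invoice = {
  customer: "BigCo",
  performances: [
    { playID: "hamlet", audience: 55 },
    { playID: "aslike", audience: 35 },
  ],
};
console.log(totalAmount(invoice, plays)); // amounts are in cents
```

Because amountFor no longer touches any variable of the enclosing loop, it can be tested and later moved on its own.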

Rename variables for clarity, e.g., if a value is returned by a function, call it “result”. (pg. 9)

Replace temporary variables with queries within the new functions; temporary variables often create locally scoped names that make further extraction more difficult. Eliminate parameters that can be calculated from the other parameters. (pg. 11)
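A small sketch of replacing a temporary with a query, again borrowing the theater example. In the book the query is nested inside statement(); here plays is a module-level constant with made-up data, which is an assumption to keep the sketch self-contained.

```javascript
// Hypothetical lookup table standing in for the plays argument.
const plays = {
  hamlet: { name: "Hamlet", type: "tragedy" },
  aslike: { name: "As You Like It", type: "comedy" },
};

// The query replaces the loop-local temp `const play = plays[perf.playID]`.
function playFor(aPerformance) {
  return plays[aPerformance.playID];
}

// With the temp gone, the play no longer has to be passed as a parameter:
// it can be derived from the performance itself.
function volumeCreditsFor(aPerformance) {
  let result = Math.max(aPerformance.audience - 30, 0);
  if (playFor(aPerformance).type === "comedy") {
    result += Math.floor(aPerformance.audience / 5);
  }
  return result;
}
```

The function signature shrinks from two parameters to one, which is the "eliminate parameters that can be calculated" step in action.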

Rename or name functions so that their purpose and output are easy to surmise. Finding the right names for functions and variables can be difficult; often a second iteration is necessary to find the best name to use that expresses succinctly and accurately its purpose. (pg. 10)

Split loops where multiple items are updated into sequences of identical iterations so that each iteration can be refactored independently. (pg. 18)
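The following sketch shows one way a split loop might look. The function names and the precomputed amount field are assumptions for illustration, not code from the book.

```javascript
// Hypothetical, already-enriched performance records.
const performances = [
  { audience: 55, amount: 65000 },
  { audience: 35, amount: 58000 },
];

// Before: a single loop updates two unrelated accumulators.
function totalsCombined(perfs) {
  let totalAmount = 0;
  let volumeCredits = 0;
  for (const perf of perfs) {
    totalAmount += perf.amount;
    volumeCredits += Math.max(perf.audience - 30, 0);
  }
  return { totalAmount, volumeCredits };
}

// After: one loop per accumulator, each small enough to extract and rename.
function totalAmount(perfs) {
  let result = 0;
  for (const perf of perfs) result += perf.amount;
  return result;
}

function totalVolumeCredits(perfs) {
  let result = 0;
  for (const perf of perfs) result += Math.max(perf.audience - 30, 0);
  return result;
}
```

The data is traversed twice, which is the cost the refactoring accepts in exchange for two independent, single-purpose functions.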

Sometimes making an inline variable (the opposite of extract variable) is a better refactoring, particularly when the temporary variable adds no value. (pg. 14)

Split phase refactoring separates intertwined processing, such as calculation of a result and displaying it. (pg. 24) To do this requires factoring out the data that might be embedded in the processing and passing it as a parameter into the function that does the formatting. (pg. 26) He also recommends organizing the code so that the distinct phases are in their own files. (pg. 31)
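A reduced sketch of the two phases, assuming the amounts have already been calculated. createStatementData and renderPlainText are names used in Fowler's first chapter; the usd helper is a simplified stand-in for the Intl.NumberFormat call in the original code, and the invoice data is made up.

```javascript
// Phase 1: calculation only. Returns plain data and knows nothing about output.
function createStatementData(invoice) {
  const performances = invoice.performances.map((p) => ({ ...p }));
  return {
    customer: invoice.customer,
    performances,
    totalAmount: performances.reduce((total, p) => total + p.amount, 0),
  };
}

// Phase 2: formatting only. Receives the intermediate data as a parameter.
function renderPlainText(data) {
  let result = `Statement for ${data.customer}\n`;
  for (const perf of data.performances) {
    result += `  ${perf.name}: ${usd(perf.amount)} (${perf.audience} seats)\n`;
  }
  result += `Amount owed is ${usd(data.totalAmount)}\n`;
  return result;
}

// Simplified currency formatter (assumption: no thousands separators needed).
function usd(cents) {
  return `$${(cents / 100).toFixed(2)}`;
}

// Hypothetical invoice with precomputed amounts in cents.
const invoice = {
  customer: "BigCo",
  performances: [
    { name: "Hamlet", audience: 55, amount: 65000 },
    { name: "As You Like It", audience: 35, amount: 58000 },
  ],
};
console.log(renderPlainText(createStatementData(invoice)));
```

With the intermediate data structure in place, a second renderer (HTML, for instance) could reuse phase 1 unchanged.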

As applications accrue features, calculations and other processing become more complex, with more processing of conditions to assign different results. “But conditional logic … tends to decay as further modifications are made.” (pg. 34) The object-oriented solution to an expanding forest of conditional logic is to replace conditional with Polymorphism. (pg. 34) This is done by creating an inheritance hierarchy that semantically captures the meanings implied by the conditionals. So, instead of logical tests and corresponding assignments of values, the values are determined by the semantic meaning associated with the hierarchy.

Changing function declaration is often an intermediate step to a further refactoring; in the example, it sets up moving a function into another one, a specialized place where it is isolated from other processing. The crux of the conditional processing logic is thus moved to a new class. This new class is then refactored further using polymorphism so that the different types are subclasses (i.e., replace type code with Subclasses). Intermediate steps require replacing a Constructor with a Factory Function, then again to replace conditional with Polymorphism. (pg. 39)

Principles in Refactoring

Fowler explains that the practice of refactoring is to restructure software code so that it is easier to understand and modify without changing its observable behavior. Confusingly, he uses the same word, “refactoring”, to be both verb and noun, though it of course looks like a gerund. He further stresses that refactoring is a process by which one makes small incremental changes that cumulatively effect a big change. He emphasizes that keeping the code in working order is a key part of the discipline.

Refactoring improves the architecture of the software, for example, by reducing duplicate code. One beneficiary of refactoring is other developers, who sometimes even a few weeks later are confronted with making changes; the discipline of refactoring makes the code easier to understand and increases the likelihood that a future change will be completed faster and error-free. Refactoring can also reveal bugs, since the process requires critically analyzing and, basically, re-implementing the code. Finally, refactoring increases ongoing efficiency because adding new features is faster.

Fowler recommends refactoring in the following situations (pp. 50–54):

  • Preparatory, to lay the groundwork for adding new features
  • Comprehensibility, to help understand and learn the purpose of the code
  • Incremental, to make small changes as you pass through the code that improve the state while still leaving substantial work for the future
  • Opportunistic, to always be on the lookout for clean up and other improvements
  • Planned, to set aside a period of concerted effort to improve existing code
  • Long-term, to systematically apply some of the above tactics to make improvements
  • Code review driven, to make concrete improvements to the code while broadening a team’s understanding of it

On the other hand, sometimes if code can be hidden behind an API or thrown away and rewritten, either of those might be more practical approaches. The main determinant of whether to refactor should be the potential economic benefit. (pg. 57) Other considerations against refactoring include a distributed ownership of the code and the branches that it might coexist in. Sometimes branches are so long lived that merging their changes into the mainline becomes a huge effort; if either branch is refactored (especially in different ways), the merge complexity may be unnecessarily increased. However, Fowler advocates a minimal branching duration, ideally using a Continuous Integration methodology that forces integrations frequently, sometimes daily. In this case, the constraints on refactoring are minimized.

Self-testing code can assist refactoring, since the cost of verifying correctness at each incremental step is low, rendering it practical to make small improvements with confidence. Another problematic case is legacy code, particularly code that is not already self-testing. Refactoring databases might seem another steep challenge, but again an incremental approach works: in intermediate steps, the legacy fields (i.e., columns) co-exist with the new ones for a time, until no adverse side effects are observed, after which the legacy fields can be removed.

Traditional software architecture assumed that software would have to be designed with “flexibility” in mind. In practice, this meant more general purpose algorithms or speculatively adding parameters to functions in an attempt to “future-proof” the implementation. Usually, this is just a waste of time and adds unneeded complexity. A commitment to refactoring allows a developer to think about a simple design, then incrementally add extensions as needed; for example, if more parameters are needed, then use Parameterize Function.
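A short sketch of Parameterize Function, loosely modeled on the salary-raise example commonly associated with this refactoring. The integer percent parameter and salaries in cents are assumptions chosen here to avoid floating-point rounding in the illustration.

```javascript
// Before: the simple design, written when only one kind of raise existed.
function tenPercentRaise(aPerson) {
  aPerson.salary += (aPerson.salary * 10) / 100;
}

// After: a second kind of raise is needed, so the literal becomes a parameter.
function raise(aPerson, percent) {
  aPerson.salary += (aPerson.salary * percent) / 100;
}

// Hypothetical employees; salary is in cents.
const alice = { name: "Alice", salary: 100000 };
const bob = { name: "Bob", salary: 100000 };
raise(alice, 10); // same effect as tenPercentRaise(alice)
raise(bob, 5);
```

The parameter is added only when the second use case actually appears, rather than speculatively at design time.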

Refactoring can also help improve software performance. There are three strategies to write software that runs fast:

  • Time budgeting for each critical section of the code
  • Across the board improvement of code
  • Focus on the areas where most of the processing is occurring.

For most software, the last approach is the best, since typically a Pareto-like Power Law applies where an application spends the majority of its execution time in a small fraction of the code (as measured by lines of code). But, there are special situations where the first approach (setting budgets for each part of the code) might be essential, such as for real time systems, where it might be unacceptable for even a rarely used part of the code to perform slowly.

“Bad Smells” or Hints to Refactor

While he doesn’t prescribe strict rules for when to refactor, Fowler identifies a long list of heuristics that experienced developers can use to help pinpoint problem areas that might benefit from refactoring, which he terms “bad smells”. They are listed here, together with the specific refactorings that may apply:

  • Mysterious names: these sometimes suggest a lack of clarity of purpose or other ambiguity. Change Function Declaration, Rename Variable, Rename Field are ways to clean this up. (pg. 72)
  • Duplicated code: duplicate or similar code is often an opportunity to consolidate or generalize. Extract Function can be used to isolate and reuse the code; Pull Up Method can be used in a similar way. (pg. 72)
  • Long functions: lengthy blocks of code are often more difficult to understand. “A heuristic we follow is that whenever we feel the need to comment something, we write a function instead.” Extract Function is often all that’s required to simplify the code. However, sometimes the resulting function needs many parameters; if so, Replace Temp with Query, Introduce Parameter Object, Preserve Whole Object, Replace Function with Command, can be used to declutter. Decompose Conditional, Replace Conditional with Polymorphism can be used to reduce the complexity of conditional logic.
  • Long parameter lists: these sometimes arise when extracting functions from a series of processing steps. Use queries, Replace Parameter with Query, to ask for another parameter given an original one; often it’s clearer to pass the original data structure rather than parts of it, using Preserve Whole Object, or creating a new one with Introduce Parameter Object; flags are sometimes confusing and may arise from an incomplete generalization of the function, so Remove Flag Argument; introducing a new class can consolidate parameters that are used with more than one function, so Combine Functions into Class.
  • Global data: mutable data is dangerous. Immutable global data is probably safe, but still best avoided; for either, use Encapsulate Variable.
  • Mutable data: almost as bad as mutable globals, any variable that could change may cause failures in another part of the code that does not expect the change, particularly as the scope of the variable grows. These bugs can be subtle and difficult to detect. Encapsulate Variable, Split Variable can be used to isolate any updates to a single point of (potential) failure. Slide Statements, Extract Function, Separate Query from Modifier, Remove Setting Method, can be combined as needed to reduce the amount of processing in and around changing the data. Replace Derived Variable with Query can be used to avoid changing data structures unnecessarily. “Use Combine Functions into Class or Combine Functions into Transform to limit how much code needs to update a variable. If a variable contains some data with internal structure, it’s usually better to replace the entire structure rather than modify it in place, using Change Reference to Value.” (pg. 76)
  • Divergent change: this reveals itself “when one module is often changed in different ways for different reasons”. (pg. 76) So the software module has different roles. Split Phase, Move Function can be used to separate the processing. Extract Function or Extract Class could be helpful.
  • Shotgun surgery: this is the counterpoint to divergent change; that is, lots of scattered changes are required to add functionality. This is sometimes seen in code where different people work on it, making small, incremental changes, each of which cannot justify a refactoring. Move Function, Move Field, Combine Functions into Class, Combine Functions into Transform can be used to aggregate the dispersed code, as appropriate. Split Phase can organize the processing logic as well. An intermediate step to inline code (Inline Function or Inline Class) resulting in easy-to-observe repetition can then reveal a better refactoring.
  • Feature envy: this results when the code is not optimally factorized so that it minimizes interaction or communication with other parts of the program. Move Function or Extract Function can get the code and its cohorts together. “The fundamental rule of thumb is to put things together that change together. Data and the behavior that references that data usually change together.” (pg. 77)
  • Data clumps: these arise when data seem to appear together. Extract Class, Introduce Parameter Object, Preserve Whole Object, can summarize them.
  • Primitive obsession: introducing structure and semantics to primitive data structures can improve readability and reduce bugs. Typical approaches are Replace Primitive with Object, Replace Type Code with Subclasses. “Primitives that commonly appear together are data clumps and should be civilized with Extract Class and Introduce Parameter Object”. (pg. 79)
  • Repeated switches: when this type of conditional processing occurs in more than one location, it is ripe for Replace Conditional with Polymorphism.
  • Loops: Replace Loop with Pipeline can improve readability.
  • Lazy element: these arise from unneeded structure, perhaps the result of a successful refactoring. Eliminate them with Inline Function, Inline Class, Collapse Hierarchy.
  • Speculative generality: this arises from the desire to plan for the future — one that never arrives. Thin it out with Collapse Hierarchy, Inline Function, Inline Class, Change Function Declaration.
  • Temporary field: when a field is used occasionally, it can be more logically organized with Extract Class, Move Function, Introduce Special Case. This might be a bit of a misnomer; perhaps, in the spirit of Rename Variable, it should be called Optional Field and the refactoring is to eliminate an optional field.
  • Message chains: these cascades of delegation can be addressed with Hide Delegate, Extract Function and Move Function.
  • Middle man: sometimes encapsulation gets out of hand, creating code that just redirects; fix this with Remove Middle Man.
  • Insider trading: this arises when modules are more coupled than ideal through the sharing of data. If that data is mutable, even more difficulty could arise. Address this by creating a middle man or with Hide Delegate. Sometimes, with classes, it’s necessary to Replace Subclass with Delegate or Replace Superclass with Delegate.
  • Large class: these can grow over time and reorganizing them can be useful, particularly if clients only use part of the class at any time. Use Extract Class, Extract Superclass or Replace Type Code with Subclasses.
  • Alternative classes with different interfaces: to allow for class substitution, it’s necessary to align their interfaces; use Change Function Declaration.
  • Data class: these arise as glorified data structures masquerading as classes. Use Encapsulate Record and eliminate the ability for others to set them with Remove Setting Method.
  • Refused bequest: results from an improperly designed class hierarchy, where too much uncommon code is held in the parent. Clean up the hierarchy with Push Down Method and Push Down Field. In some cases, where the subclass isn’t actually needed, Replace Subclass with Delegate or Replace Superclass with Delegate.
  • Comments: sometimes comments are a crutch to compensate for bad code. Rewrite it with clearer function declarations, Change Function Declaration.
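To make one of the entries above concrete, here is a minimal sketch of Replace Loop with Pipeline, the remedy listed for the “Loops” smell. The function names and data are hypothetical.

```javascript
// Hypothetical performance records.
const performances = [
  { type: "tragedy", audience: 55 },
  { type: "comedy", audience: 35 },
  { type: "comedy", audience: 10 },
];

// Loop version: filtering and summing are interleaved in one body.
function comedyAudienceLoop(perfs) {
  let total = 0;
  for (const perf of perfs) {
    if (perf.type === "comedy") total += perf.audience;
  }
  return total;
}

// Pipeline version: each stage states one step of the processing.
function comedyAudiencePipeline(perfs) {
  return perfs
    .filter((perf) => perf.type === "comedy")
    .map((perf) => perf.audience)
    .reduce((total, audience) => total + audience, 0);
}
```

Both functions return the same value; the pipeline reads top to bottom as select, project, then sum.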

Testing

Fowler makes a strong case for automated, self-testing code that runs frequently as development progresses. A typical test defines a fixture (data and objects usually) and then determines if the output corresponds to that expected from the fixture. Tests are aggregated into suites and are typically organized to verify a certain scope of functionality. There is sometimes the temptation to improve efficiency by sharing some of the test fixture (e.g., a value) between tests. But this should never be done since it’s possible that a future iteration of the test can change this shared value during the test execution. (This is an example of mutable data.)

Tests generally follow a flow of setup, exercise, verify, and teardown. He emphasizes that boundaries are the most important to probe, for example when values can be zero or negative or out of a presumed range.
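
The setup, exercise, verify, and teardown flow can be sketched with Python's standard unittest module. This is only an illustration: the shortfall function and its numbers are hypothetical, not taken from the book. Each test gets a fresh fixture from setUp rather than sharing one, and two of the tests probe boundaries (zero and a negative value).

```python
import unittest


def shortfall(demand, production):
    """Hypothetical function under test: demand not met by production."""
    return demand - production


class ShortfallTest(unittest.TestCase):
    def setUp(self):
        # Setup: build a fresh fixture for every test instead of sharing one.
        self.fixture = {"demand": 30, "production": 25}

    def tearDown(self):
        # Teardown: release the fixture (trivial here, shown for completeness).
        self.fixture = None

    def test_typical_values(self):
        result = shortfall(**self.fixture)  # exercise
        self.assertEqual(result, 5)         # verify

    def test_zero_demand(self):
        # Boundary: zero
        self.fixture["demand"] = 0
        self.assertEqual(shortfall(**self.fixture), -25)

    def test_negative_demand(self):
        # Boundary: a value outside the presumed range
        self.fixture["demand"] = -1
        self.assertEqual(shortfall(**self.fixture), -26)
```

If saved to a file, the suite could be run with python -m unittest.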

Unfortunately, tests and their fixtures are still code, with all the pitfalls that bedevil software: they can atrophy, they can be poorly designed, they can have bugs. In this way, the actual volume of software increases as at least a multiple of the features added. In addition, as code evolves, the tests themselves must be maintained; worse still, refactoring can render some tests redundant, particularly if they are verifying intermediate calculations that are passed on from one module to another. Developers and managers must be realistic and anticipate this growing effort over the lifecycle of a software project.

The Refactorings

The refactorings are briefly summarized here. Interested readers should read the book for more details; in fact, Fowler helpfully breaks down each refactoring into the following sections: motivation, mechanics, and examples. In addition, he is adding more refactorings, post publication, to the Web version.

Extract or Inline Function

These refactorings are inverses of each other. Extraction can be used to simplify code (and to generalize the function). Inlining is good when functions are used as intermediaries and clarity is gained by placing the code where it is used (and not likely to be duplicated).
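
As a rough Python sketch of extraction (the invoice functions below are invented for illustration), the first version mixes the banner, the calculation, and the formatting; the second pulls the banner and the calculation into small named functions. Inlining is simply the same change read in the other direction.

```python
def format_invoice_before(customer, amounts):
    # Before: banner, calculation, and detail lines are tangled together.
    lines = ["*** Customer Owes ***"]
    outstanding = 0.0
    for amount in amounts:
        outstanding += amount
    lines.append(f"name: {customer}")
    lines.append(f"amount: {outstanding:.2f}")
    return "\n".join(lines)


def banner():
    # Extracted: the name now says what the code is for.
    return "*** Customer Owes ***"


def outstanding_total(amounts):
    # Extracted: the calculation can be reused and tested on its own.
    return sum(amounts)


def format_invoice(customer, amounts):
    # After: the function reads as a summary of its steps.
    return "\n".join([
        banner(),
        f"name: {customer}",
        f"amount: {outstanding_total(amounts):.2f}",
    ])
```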

Extract or Inline Variable

These refactorings are inverses of each other. Extraction is used to simplify code. Inlining is an option when variables are used as intermediaries, such as calculated values.

Change Function Declaration; Renaming Variables

To improve clarity and to simplify, renaming functions or variables to more meaningful appellations, adding or removing parameters, or passing a property of a parameter (rather than the whole thing), can be useful.

Encapsulate Variables

Encapsulating data is one way to control how it can be modified. By making it pass through a choke point function, changes to the data can be managed centrally.

Parameter Object

Particularly if the same data are fellow travelers and used in more than one function, it is convenient and beneficial to create an encompassing data structure.

Combine Functions into a Class or Transform

Whenever there is “a group of functions that operate closely together on a common body of data (usually passed as arguments to the function call), [there is] an opportunity to form a class. Using a class makes the common environment that these functions share more explicit, allows [simplified] function calls inside the object by removing many of the arguments, and provides a reference to pass such an object to other parts of the system.” (pg. 144) Similarly, a transform can do the same thing without creating a new class; instead it generates the calculated outputs of the functions, returning new records.

Split Phase

In the introductory example, splitting the phases was a key refactoring to separate business processing logic from the presentation of results. Any processing of two (or more) different things is a good candidate for splitting so that future changes are easier to understand, isolate and test.

Encapsulate Record

By converting a typical record data structure into a class, we can control the reading and writing of the data through get and set methods. This approach can be extended for nested structures such as JSON or XML.

Encapsulate Collection

Collection variables may be encapsulated, requiring the use of get and set to access them. However, “if the getter returns the collection itself, then that collection’s membership can be altered without the enclosing class being able to intervene.” (pg. 170) Instead, he recommends providing “a getting method for the collection, but make it return a copy of the underlying collection”, just as in functional programming. (pg. 171) Access to the data in the collections will require using purpose-built methods that can be controlled centrally.
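
A minimal Python sketch of this advice, using a hypothetical Person class with a list of courses: the getter hands back a copy, and membership changes go through dedicated methods that the class controls.

```python
class Person:
    def __init__(self, name):
        self._name = name
        self._courses = []

    @property
    def courses(self):
        # Return a (shallow) copy so callers cannot change membership directly.
        return list(self._courses)

    def add_course(self, course):
        # Purpose-built modifier: the class can validate or log here.
        self._courses.append(course)

    def remove_course(self, course):
        self._courses.remove(course)
```

Appending to the list returned by person.courses changes only the caller's copy; the person's own collection is unaffected.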

Replace Primitive with Object

Often programs are built up using simple data types, such as strings. But, these data types may have a higher level semantic meaning, with a prescribed way of interacting with them, such as a phone number or zip code. It makes no sense to add or subtract phone numbers, but decomposing them might be useful. Creating an object (with approved methods) that mimics their semantic meanings can improve the model and prevent inappropriate transformations of the data.
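
To make the phone number example concrete, here is a small Python sketch. The PhoneNumber class, its 10-digit format, and the sample number are assumptions made for illustration. The object validates its input, offers a decomposition (the area code), and does not support arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhoneNumber:
    """Hypothetical value object for a 10-digit phone number."""
    digits: str

    def __post_init__(self):
        # Reject anything that is not exactly ten digits.
        if len(self.digits) != 10 or not self.digits.isdigit():
            raise ValueError(f"not a 10-digit phone number: {self.digits!r}")

    @property
    def area_code(self):
        # Decomposition is a meaningful operation on a phone number.
        return self.digits[:3]

    def __str__(self):
        return f"({self.digits[:3]}) {self.digits[3:6]}-{self.digits[6:]}"
```

Because the class defines no addition, an attempt to add two phone numbers raises a TypeError instead of silently concatenating strings.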

Replace Temp with Query

Temporary variables that store a calculated value are sometimes worth replacing with a function that implements the calculation, particularly if the calculation is done in other places. It can also help simplify interfaces if temporaries are used to pass parameters.

Extract Class

Sometimes as development progresses, what was once a simple class with a few operations grows to become an unwieldy mess. Extracting classes that are logically or semantically separate may make sense, particularly if the operations or subtyping on the class start to affect only parts of the class.

Inline Class

The counterpoint to Extract Class is to inline one into another, perhaps if it is just not worth carrying around or is tied almost one-to-one with the other. Still another reason to inline would be if there is an even more useful class extraction that could be found by seeing the related elements (i.e., objects and methods) together.

Hide Delegate

“One of the keys — if not the key — to good modular design is encapsulation. Encapsulation means that modules need to know less about other parts of the system. Then, when things change, fewer modules need to be told about the change — which makes the change easier to make.” (pg. 189) Delegates arise when the structure of a server object needs to be exposed to the client for it to access data in that structure. Hiding the delegate eliminates the need (when changes are made) for a client to know about the server object and its delegate, which can cause a ripple of changes to clients that access the information provided by the delegate. That is, create a delegate method to replace y= x.delegate.y with y= x.y
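
In Python, the idea might look like the following sketch, with hypothetical Person and Department classes. Before the refactoring a client would write person.department.manager; afterwards it writes person.manager and never sees the department.

```python
class Department:
    def __init__(self, manager):
        self.manager = manager


class Person:
    def __init__(self, name, department):
        self.name = name
        self._department = department  # the delegate, now hidden

    @property
    def manager(self):
        # Delegating method: clients no longer need to know about Department.
        return self._department.manager
```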

Remove Middle Man

The inverse of Hide Delegate is to remove the delegating methods and let the client call the delegate directly, exposing the structure of the true object. The rationale for reducing encapsulation could be that too much of a good thing can be cumbersome.

Substitute Algorithm

While not always possible, it is often desirable to find a better way of doing things, especially if there are gains in clarity, simplicity, efficiency, or other benefits. Simple examples are replacing multi-pass algorithms with single pass or recursion with loops (and vice versa).

Move Function

Though simple to grasp, when to move a function from one location to another is not necessarily straightforward. A main motivation would be to improve modularity by co-locating it with related code or the context (such as data or other functions) or where it is or might be used. Examples of moves include migrating a nested function higher in its hierarchy, or moving a function between classes.

Move Field

Again, a simple concept, motivated by a better way to organize data structures. Moving fields can apply to standard data structures as well as classes (which are data structures with dedicated functions).

Move Statements into Function

Sometimes code more naturally resides in the function (rather than the caller of the function), especially if the same code must be repeated by the caller whenever the function is invoked. In this case, move those statements into the function.

Move Statements to Callers

The inverse of moving code to functions: sometimes it’s more natural to have statements outside the function, perhaps if there are variations in the statements that would encumber the function itself to deal with the special cases.

Replace Inline Code with Function Call

One strategy to eliminate duplicate code or to improve clarity is to replace a block of inline code with a call to an existing function that does the same thing, such as a library function.

Slide Statements

To promote readability, move statements so that they’re in the proximity of related code. An example of this is to co-locate variable declarations with their first use, rather than to have a general declaration section.

Split Loop

Sometimes loops perform more than one set of iterative actions. In the introductory example, the code both calculated values and printed them. Splitting the loops can improve clarity, add modularity and reduce the potential interaction between two independent iterations. This refactoring is a special case of Split Phase.

Replace Loop with Pipeline

Modern languages permit developers to express iteration as a semantic intent rather than a procedural process. Filters and Maps can be useful replacements for raw iteration, reducing the amount of code and potential for errors.
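
As a sketch in Python (the office records are made up), the same selection can be written as a raw loop or as a filter and map pipeline. A list comprehension would be an equally idiomatic Python alternative.

```python
offices = [
    {"city": "Alpha", "country": "IN"},
    {"city": "Beta", "country": "US"},
    {"city": "Gamma", "country": "IN"},
]


def cities_in_loop(offices, country):
    # Raw iteration: the reader must trace the loop to see the intent.
    result = []
    for office in offices:
        if office["country"] == country:
            result.append(office["city"])
    return result


def cities_in_pipeline(offices, country):
    # Pipeline: each stage names one step (select, then transform).
    in_country = filter(lambda o: o["country"] == country, offices)
    return list(map(lambda o: o["city"], in_country))
```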

Remove Dead Code

Dead code increases clutter and should be removed. It should not be just commented out. With version control systems, any old code will still be easily found.

Split Variable

Sometimes it’s convenient to “reuse” variables, as one might with a temporary such as temp. Don’t do this. Set a variable once and if another is needed, define a different variable for this purpose.

Rename Field

Names, at least to humans, have great value. Maintain them so that they reflect a semantic intent in data structures, classes, and methods.

Replace Derived Variable with Query

Variables that are used to hold mutable data can be replaced with the actual calculations that update those variables. In this way, potential unexpected changes to the variable are eliminated and the value is obtained from a function (i.e., the “query”).

Change Reference to Value

To add immutability, use value objects, which tend to be easier to work with. On the other hand, if the object is meant to be shared, then it must be kept as a reference.

Change Value to Reference

As noted earlier, references may be more natural when sharing data objects.

Decompose Conditional

Conditional logic, especially involving cascades of compound booleans, can be confusing. One way to make the logic easier to digest is to extract the logical calculations into a function.

Consolidate Conditional

Sometimes a series of conditional checks are different, yet the resulting action is the same. In this case, combine them into a single conditional. On the other hand, complex boolean expressions where some of the operands could change in the course of evaluating the boolean (that is, the checks have side effects) could make this refactoring more difficult (though this seems dangerous!).

Replace Nested Conditional with Guard Clauses

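Guard clauses check for unusual or exceptional conditions at the top of a function and return immediately when one is met, so that the main logic does not need to be buried in a cascade of nested conditionals.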
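
A small Python sketch of the difference, using an invented pay calculation in which the employee is a plain dictionary: the nested version hides the normal case at the bottom of the cascade, while the guard-clause version dispatches the special cases first and returns early.

```python
def pay_amount_nested(employee):
    # Before: the normal case is buried inside nested conditionals.
    if employee["is_separated"]:
        result = 0
    else:
        if employee["is_retired"]:
            result = 0
        else:
            result = employee["salary"]
    return result


def pay_amount(employee):
    # After: each guard handles one special case and returns immediately.
    if employee["is_separated"]:
        return 0
    if employee["is_retired"]:
        return 0
    return employee["salary"]
```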

Replace Conditional with Polymorphism

Polymorphism can be a powerful way to eliminate forests of case statements by turning procedural processing into a semantic exercise. Classes are defined to express a semantically meaningful hierarchy and the conditional processing logic is then embedded into the class, so that it doesn’t have to be parsed by evaluating conditionals.

“A common case [is] a set of types, each handling the conditional logic differently. … This is made most obvious when there are several functions that have a switch statement on a type code. … Remove the duplication of the common switch logic by creating classes for each case and using polymorphism to bring out the type-specific behavior.

“Another situation is where … the logic [is] a base case with variants. The base case may be the most common or most straightforward. … Put this logic into a superclass which allows me to reason about it without having to worry about the variants. I then put each variant case into a subclass, which I express with code that emphasizes its difference from the base case.” (pg. 272)
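
The following Python sketch illustrates the first situation with invented pricing rules (the class names and figures are assumptions, not the book's code). The switch on a type code is replaced by one subclass per type, and a small factory function picks the right subclass.

```python
def amount_with_switch(kind, audience):
    # Before: every function that cares about the type repeats this switch.
    if kind == "tragedy":
        return 400 + 10 * max(audience - 30, 0)
    if kind == "comedy":
        return 300 + 3 * audience
    raise ValueError(f"unknown type: {kind}")


class PerformanceCalculator:
    def __init__(self, audience):
        self.audience = audience

    def amount(self):
        raise NotImplementedError("subclass responsibility")


class TragedyCalculator(PerformanceCalculator):
    def amount(self):
        return 400 + 10 * max(self.audience - 30, 0)


class ComedyCalculator(PerformanceCalculator):
    def amount(self):
        return 300 + 3 * self.audience


def create_calculator(kind, audience):
    # Factory function: the only place that still looks at the type code.
    calculators = {"tragedy": TragedyCalculator, "comedy": ComedyCalculator}
    return calculators[kind](audience)
```

Callers then write create_calculator(kind, audience).amount() and no longer need their own conditional on the type.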

Introduce Special Case (Checks)

Sometimes “users of a data structure check a specific value, and then … do the same thing”, resulting in duplicate code. (pg. 289) Since this special case and the handling of it (e.g., checking for a null value) are repeated, it makes sense to treat it as something clients can query for.
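
One way to picture this in Python, with hypothetical Site and Customer classes: instead of every client checking for a missing customer and substituting a default name, a special-case object supplies the default behavior and can be queried.

```python
class Customer:
    def __init__(self, name):
        self.name = name

    @property
    def is_unknown(self):
        return False


class UnknownCustomer:
    """Special case: stands in wherever no customer is recorded."""

    @property
    def name(self):
        # The default that clients previously repeated after each null check.
        return "occupant"

    @property
    def is_unknown(self):
        return True


class Site:
    def __init__(self, customer=None):
        self._customer = customer

    @property
    def customer(self):
        # Clients always get an object they can use without checking for None.
        if self._customer is None:
            return UnknownCustomer()
        return self._customer
```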

Introduce Assertion

This practice can help identify bugs and explain the logic of the code to others.

Separate Query from Modifier

It goes without saying that a function that purports to do one thing, but does that plus something additional, is inherently more complex and likely to generate unintended side effects. So, split these up: a function that returns a value should have no observable side effects.
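
A brief Python sketch of the split, with made-up names: the original function both reports a culprit and records an alarm, so it cannot be called just to ask a question. After the split, the query has no side effects and the modifier does the recording.

```python
MISCREANTS = ("Don", "John")  # hypothetical watch list


def alert_for_miscreant_before(people, alarms):
    # Before: answers a question AND changes state in the same call.
    for person in people:
        if person in MISCREANTS:
            alarms.append(person)
            return person
    return ""


def find_miscreant(people):
    # Query: returns a value, changes nothing.
    for person in people:
        if person in MISCREANTS:
            return person
    return ""


def alert_for_miscreant(people, alarms):
    # Modifier: changes state, returns nothing.
    culprit = find_miscreant(people)
    if culprit:
        alarms.append(culprit)
```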

Parameterize Function

In the spirit of reuse, if adding a parameter to a function can increase its use and eliminate duplicate code, then do it!

Remove Flag Argument

A flag is a special case of a parameter which dictates the action a function should take. Instead, create specialized functions for each instance.
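
For instance, in this Python sketch (the delivery rules are invented), a boolean flag selects between two behaviors. Replacing it with two explicitly named functions makes each call site say what it means.

```python
def delivery_days_with_flag(distance_km, is_rush):
    # Before: a call like delivery_days_with_flag(80, True) hides its intent.
    if is_rush:
        return 1 if distance_km < 100 else 2
    return 3 if distance_km < 100 else 5


def rush_delivery_days(distance_km):
    # After: one specialized function per case.
    return 1 if distance_km < 100 else 2


def regular_delivery_days(distance_km):
    return 3 if distance_km < 100 else 5
```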

Preserve Whole Object

Rather than cleave off a few values from a record to pass to a function, pass along the whole object and let the function derive the values itself. This reduces the parameter list and doesn’t require modification (of the function or where it’s called) should more values need to be used from that same object.

Replace Parameter with Query

In a similar vein to preservation of whole objects, let the function determine parameter values by interrogating the object. This is especially practical if one value can be looked up from the other.

Replace Query with Parameter

On the other hand, if a query inside a function must do a lot of work to calculate the desired value or use global data, then determining the value for the function to use before the call may simplify matters. This refactoring enhances referential transparency. Referential transparency is a property of a function that always gives the same result when called with the same parameter values. “If a function accesses some element in its scope that isn’t referentially transparent, then the containing function also lacks referential transparency. I can fix that by moving that element to a parameter. Although such a move will shift responsibility to the caller, there is often a lot to be gained by creating clear modules with referential transparency.” (pg. 328)
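
A hedged Python sketch of the idea, with an assumed global THERMOSTAT dictionary: the first function reads global state, so its result can change between identical calls; the second takes the value as a parameter and depends only on its arguments.

```python
THERMOSTAT = {"selected": 22}  # hypothetical mutable global state


def target_temperature_before(plan_min, plan_max):
    # Before: hidden dependency on global data.
    selected = THERMOSTAT["selected"]
    return max(plan_min, min(plan_max, selected))


def target_temperature(plan_min, plan_max, selected):
    # After: same inputs always give the same output.
    return max(plan_min, min(plan_max, selected))
```

The caller now has the responsibility of looking up the selected temperature, which is the trade-off the quotation describes.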

Remove Setting Method

If a field will not be changed after it is created, then don’t suggest it can be by allowing setting methods to exist.

Replace Constructor with Factory Function

Instead of using the standard constructors, create a Factory Function (that will use the constructor within it). “A factory function is any function which is not a class or constructor that returns a (presumably new) object. In JavaScript, any function can return an object. When it does so without the new keyword, it’s a factory function.” (Elliott)

“Factory functions have always been attractive in JavaScript because they offer the ability to easily produce object instances without diving into the complexities of classes and the new keyword.” (Elliott)
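
The quotations refer to JavaScript, but the refactoring itself carries over to other languages. In this Python sketch (the Employee class and the type code "E" are hypothetical), callers use a descriptively named function instead of calling the constructor with a cryptic code.

```python
class Employee:
    def __init__(self, name, type_code):
        self.name = name
        self.type_code = type_code


def create_engineer(name):
    # Factory function: wraps the constructor and hides the type code.
    return Employee(name, "E")
```

Python has no new keyword, so the distinction Elliott draws is less sharp here; the benefit that remains is a clearer name and a single place to change how engineers are built.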

Replace Function with Command

In Fowler’s parlance, a Command is a class with a single method that acts in place of a simpler function. There’s more work and overhead in doing this, so most of the time regular functions will suffice. The case for commands is when better control over the behavior of the function and its parameters is desired, for example to create an “undo” method for the command. “They can easily be broken down into separate methods sharing common state through the fields; they can be invoked via different methods for different effects; they can have their data built up in stages.” (pg. 338)
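
As an illustration of the “undo” case, here is a minimal Python sketch. The AddToBalance class and the dictionary-based account are assumptions made for the example. The command keeps its parameters as fields, which is what makes reversing it possible.

```python
class AddToBalance:
    """Hypothetical command wrapping what could be a plain function."""

    def __init__(self, account, amount):
        # Parameters become fields, shared by execute and undo.
        self._account = account
        self._amount = amount
        self._executed = False

    def execute(self):
        self._account["balance"] += self._amount
        self._executed = True

    def undo(self):
        # Only reverse a change that was actually applied.
        if self._executed:
            self._account["balance"] -= self._amount
            self._executed = False
```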

Replace Command with Function

Because of the added overhead and complexity of commands, sometimes, it’s desirable to reverse the process.

Return Modified Value

Often functions operate and change values without making that change explicit; return the changed values!

Replace Error Code with Exception

Rather than manage error codes, handle the exception directly. “Exceptions provide a separate language mechanism for error handling. When I detect an error, I throw an exception, which travels up the call stack until it finds a handler …. Exceptions mean that I don’t have to remember to check error codes anymore, nor worry about detecting and passing errors up the call stack.” (not in printed book; see supplemental materials)

Replace Exception with Precheck

Save exceptions for unexpected behavior; instead, check for a possible error condition before it could be triggered.

Pull Up Method or Field

When subclasses end up with methods that do the same or similar things, it’s desirable to pull the method upwards in the hierarchy so that identical code is not duplicated. Similar to pulling up methods, fields within subclasses may be duplicated, in which case pull them up.

Pull Up Constructor Body

Constructor bodies might be treated in the same way as fields or methods. But because constructors may depend on a sequence of operations, for example because the constructor uses values set in the subclass, it may be necessary to do additional refactoring to also pull up the dependency.

Push Down Method or Field

The inverse of pulling up a method or field is employed when only one subclass requires the method or field.

Replace Type Code with Subclasses

Type codes arise when an object has different behavior or properties based on its type. These type codes are then used to determine when to apply different programming logic to the object. For example, an employee could be an engineer, manager, or salesperson, each with different attributes. Subclasses can model the semantics of the relationship better. The resulting subclasses can then be used by replacing the conditional with Polymorphism, instead of long case statements.

Remove Subclass

Sometimes, subclasses are no longer justified. Remove them and replace them with appropriate fields in the superclass.

Extract Superclass

When two classes do similar things that differ in a limited way, then creating a superclass from which the original classes inherit might be warranted.

Collapse Hierarchy

Sometimes a class and its children lose much of their distinction. Merge the subclass into its parent.

Replace Subclass with Delegate

Delegates promote code reuse among classes without using inheritance. Inheritance is a powerful, but in some ways, limited tool. It requires the class and subclass to possess a meaningful and extensible hierarchy; changing the superclass may require rework of the subclasses. When this relationship is unclear or not straightforward, a delegate might work better. These delegate classes are helper classes to the main one. “Delegation is a regular relationship between objects — so I can have a clear interface to work with, which is much less coupling than subclassing.” (pg. 382) See also Hide Delegate.

Replace Superclass with Delegate

Similar to replacing a subclass with a delegate, a superclass can be expurgated if the hierarchical relationship is too cumbersome to maintain. Sometimes the initial model suggests a logical hierarchy, but after further evolution, that hierarchy weakens. Delegates can still accomplish the original objective of code reuse and centralization of logic without requiring a strict hierarchical order.

Criticism

Fowler has written an important reference that should be on the bookshelf of most software developers. He has distilled fewer than a hundred of the most important refactoring patterns, some of which are obvious and probably used reflexively by experienced programmers. However, all too many developers are content to leave working code alone, a practice that has corrosive effects even in the short term. That Fowler has identified and highlighted the importance of even trivial or obvious improvements is a call to action for developers in their everyday activity. Code quality erodes slowly but ineluctably; a developer’s second priority (after correct implementation) is to arrest if not reverse this erosion.

Fowler states that the purpose of refactoring is to make improvements without changing a program’s “observable behavior”. He advocates aggressive and repeated testing to ensure the original behavior is retained. Still, some changes are subtle and may not be readily tested in a functional way; for example, they could affect performance or introduce unexpected, distant consequences. Regressions that are hard to find are the fear most developers live with. Perhaps a more nuanced discussion of some of the pitfalls might be warranted.

Refactoring sometimes requires database schema changes. This is not always possible (or easy) with legacy data that cannot be discarded or lost, e.g., historical records. Tactics to perhaps build interfaces around the legacy code and database so that future refactoring can occur without disrupting valuable data might be worth exploring.

Fowler refers to the functional programming approach, but only to note its novelty. Some of the refactorings he advocates are an attempt to impose the discipline of functional programming’s immutability of data onto the object-oriented paradigm (e.g., split variable). He certainly acknowledges that an undisciplined use of mutable data can introduce bugs. A more spirited discussion of the advantages of functional programming in refactoring might be useful. For example, perhaps the gains from immutable data come at the cost of the ability to model semantic objects using inheritance. Or perhaps the complexities of inheritance are avoided by simply adding functions (i.e., delegates).

An attitude that seems helpful is to program defensively; that is, to imagine at the outset all the bad cases that can generate errors. In some respects this is the ethos behind building unit tests and test-driven development. In a similar way, a pessimistic outlook about all the things that can go wrong with their code may be helpful for programmers so that they stay motivated to constantly refactor.

While we generally have been taught to eschew second guessing compilers and precompilers, in the days of interpreted languages, developers may pay a performance penalty with some refactorings, such as split loop. With compiled languages, it would be safe to assume that split loops would be subsequently recombined by the compiler long before any code was generated. We can’t assume this in languages like JavaScript.

Fowler makes a good case for replacing a function with a command; perhaps it is more general to state that the refactoring is to replace a function with a class object.

In conclusion, Refactoring is an important book that is helpful for most software professionals. It can also serve as a guide for software managers so that they understand both the simplicity of some refactorings — and the complexity of others. Through his lucid examples and handy step-by-step advice, Martin Fowler demystifies the practice of refactoring, thus encouraging all developers to do more of it!

References

Fowler, M. (1999). Refactoring: Improving the Design of Existing Code. Boston, MA, USA: Addison-Wesley. ISBN: 0-201-48567-2

Fowler, M. (2019). Refactoring: Improving the Design of Existing Code. Boston, MA, USA: Addison-Wesley. ISBN: 978-013-475-759-9

Fowler, M. (2020). Refactoring: Web Edition. (Canonical version) Accessed 28-September 2020. https://memberservices.informit.com/ (only available to purchasers of the book)

Elliott, Eric (2017). “JavaScript Factory Functions with ES6+.” https://readmedium.com/javascript-factory-functions-with-es6-4d224591a8b1 (accessed October 1, 2020)
