What most developers don't know about refactoring

Thursday, February 06, 2014

Say “refactor” to a corporate executive and he will answer, “talk to me about business”. What he doesn’t realize is that refactoring is more about business than it is about code. It is all about keeping the cost of change low, being able to react fast to customer requests, being able to focus on adding value to the product... Put it that way and you’ll catch the exec’s attention.

Refactoring and the cost of change

“Make a change in the software during the production phase and it will cost you a fortune compared to a change made during the analysis or design phases.” Ever heard that? Boehm wrote it in Software Engineering Economics back in the 80s. Heavily tied to the “waterfall lifecycle” way of thinking, it was the mantra used to train generations of software engineers all around the globe.

When you blindly believe this, you feel fine when things go wrong later in the lifecycle: after all, it is not your fault.

“They should have asked for it before.” Sound familiar?

“Unfortunately,” Kent Beck and the entire “agile movement” are all about changing the infamous curve: take care of your code and you will not only be a clean coder, you won’t spend hours blaming the codebase... and the cost of introducing new changes won’t grow exponentially.

The “silver bullet” behind the “agile cost of change curve” is keeping the code clean, simple, well written, easy to read, easy to extend, easy to modify… thereby making it easy to introduce changes. If you can’t prevent change… well, embrace it. Obviously it doesn’t happen by chance. You won’t arrive one day at the office and find the big mess ironed out and ready to be effortlessly modified. It takes time, it takes effort, and the technique for achieving it is known as “refactoring”.

"Refactoring" is defined as the set of operations you perform to improve your code without changing its behavior. Rewrite a piece of code to make it simpler and more readable… and after that the new features will simply flow.

When you spend time refactoring you’re not adding ‘value’ to the product but rather ‘paying down the technical debt’: spending time doing correctly what you didn’t previously have time to do.

Excuses to avoid refactoring

Despite all that I wrote above, Mr. Anti-refactor will probably say, “well, yes, refactoring sounds cool for code katas and having a good time but, you know, our team is big, we’ve got a lot of things to do and stopping to refactor would simply put us out of business”. The bad news is that he’s probably right, to some extent.

Refactoring means changing the structure of the code and, on a real project of a certain size, it will mean making changes to parts of the code that are being concurrently modified by other developers. You know the rule: 80% of the changes happen in 20% of the files, at most. So if you do concurrent/parallel development it means one of two things:

  • You stop development while you clean up the code – which is probably excuse number one not to pay the technical debt.
  • Or you simply go ahead and… well, you’ll create a nightmare of a merge later on...

21st century version control

But we are in the 21st century, aren’t we? Back in 2005 distributed version control (DVCS) arrived, and teams stopped thinking about version control as a ‘commodity’ (it doesn’t matter which one you have, they all do the same) only to find out that advanced ones (like Git, Mercurial and our own Plastic SCM) allowed them to do things that were simply unthinkable before. Working distributed? Doing real branching? Merging, anyone? It all changed.

Now, almost a decade later, DVCS is not just the new toy for cool kids: big teams are using it, and enterprises are throwing away their old iron and jumping to the now-mature distributed version control systems.

So, do they bring a better way to handle refactors?

The answer is no, they don’t. At the end of the day, brand new version control systems still depend on arcane merge tools to perform the file-by-file merges.

Let me explain in more detail. Suppose you have a class like the following (for simplicity I’ll show only the class structure, not the actual code, so we can concentrate on the refactor operation):
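The original post shows the class as a screenshot; since the text only names GetStats() and Send(), here is a hypothetical sketch of such a class (the class name, the helper methods and the stub bodies are all invented for illustration):

```java
// A hypothetical "small mess": every method is public and the ordering
// gives no hint of which one is the real entry point.
class StatsRetriever {
    // Public and socket-flavored: is this the main interface method?
    public byte[] Send(byte[] request) {
        return request; // stubbed: a real version would write to a socket
    }

    public String Decode(byte[] reply) {
        return new String(reply);
    }

    // ...or is this the entry point, as the class name suggests?
    public String GetStats(String host) {
        return Decode(Send(BuildRequest(host)));
    }

    public byte[] BuildRequest(String host) {
        return ("STATS " + host).getBytes();
    }
}
```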

The class looks like a small mess: which is the main interface method of the class? Sending bytes (possibly through a socket)? Or, as the name says, retrieving some sort of stats from a remote machine?

To start the process, first sort the methods in a more ‘expressive’ way:
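Continuing the hypothetical sketch (same invented names and stub bodies), the sorted version might read top-down, from the interface method to the plumbing, with Send() made private:

```java
// Same hypothetical class, sorted to "explain" itself: interface first,
// protocol helpers next, low-level plumbing last. Send() is now private.
class StatsRetriever {
    // The single "interface" method exposed to the outside world.
    public String GetStats(String host) {
        return Decode(Send(BuildRequest(host)));
    }

    // 'comm protocol' helpers.
    private byte[] BuildRequest(String host) {
        return ("STATS " + host).getBytes();
    }

    private String Decode(byte[] reply) {
        return new String(reply);
    }

    // Low-level plumbing; stubbed here, would normally touch a socket.
    private byte[] Send(byte[] request) {
        return request;
    }
}
```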

Now it is a little clearer that the “interface” method with the outside world is the GetStats method which probably initializes the class with data from a remote host.

The “Send()” method is no longer public and the methods have been sorted to “explain” what the class does: first the interface method, then the methods handling the ‘comm protocol’ and finally four methods that look too low level to belong to the same class… (Obviously the example is a little bit forced in order to go straight to the point.)

So you go one step further and create a subclass to isolate the “pure network” methods, as follows:
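Sketching that step too (NetworkChannel is an invented name, and the original post shows this as an image), the “pure network” plumbing might move into a class of its own, with the stats class deriving from it:

```java
// Hypothetical split: the "pure network" plumbing lives in its own class,
// while StatsRetriever keeps only the stats protocol.
class NetworkChannel {
    protected byte[] Send(byte[] request) {
        return request; // stubbed network round-trip
    }
}

class StatsRetriever extends NetworkChannel {
    public String GetStats(String host) {
        return Decode(Send(BuildRequest(host)));
    }

    private byte[] BuildRequest(String host) {
        return ("STATS " + host).getBytes();
    }

    private String Decode(byte[] reply) {
        return new String(reply);
    }
}
```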

Much cleaner. Still far from perfect but it is a good step forward.

Now, since you’re not alone in your project, what would happen if your colleague John made a modification to the “GetStats()” method while you were refactoring?

Well, from a “human” point of view you know that you just rearranged the code to make it more readable, but a traditional merge tool doesn’t know how to compare your version and John’s, so it will try to match the code line by line… and as you can see below, that won’t work, since it can’t follow the arrows between the methods the way the graphic does:

Once you’ve suffered that, you probably won’t want to go through it again.

How a semantic merge helps

But what if the diff and merge tool were actually able to understand the code? What if it knew that all you did was rearrange it? Then it would be possible for the tool to “understand” that it should just take the change John made to GetStats() and simply put it wherever the method is now located in your version of the file, wouldn’t it?
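As an illustration (John’s one-line edit and all method bodies are invented; only GetStats() and Send() come from the text), a method-aware merge would match GetStats() by name rather than by line number, producing your new ordering with John’s edit carried along:

```java
// Hypothetical merged result: the methods sit where *you* moved them,
// and the one-line change John made inside GetStats() travels with it.
class StatsRetriever {
    public String GetStats(String host) {
        if (host == null) host = "localhost"; // John's edit, relocated
        return Decode(Send(BuildRequest(host)));
    }

    private byte[] BuildRequest(String host) {
        return ("STATS " + host).getBytes();
    }

    private String Decode(byte[] reply) {
        return new String(reply);
    }

    private byte[] Send(byte[] request) {
        return request; // stubbed network round-trip
    }
}
```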

This is exactly what SemanticMerge is all about, and it is what we explain, step by step and using actual code, in the following 20-minute webinar!

In the middle of the DVCS age, we should no longer be limited to unified diffs, 2-way merges and, at best, 3-way text-based diff and merge tools that show diffs side by side without understanding the underlying code structure at all.

SemanticMerge is what comes to the mind of every programmer facing the scenario I just described above... and fortunately we took the time to implement it.


We develop Plastic SCM, a version control system that excels at branching and merging, deals with huge projects and big binary assets natively, and comes with GUIs and tools to make everything simpler.

If you want to give it a try, download it from here.

We are also the developers of SemanticMerge, and the gmaster Git client.