Solving semantic conflicts with Git and SemanticMerge

Thursday, July 14, 2016

Here is the scenario: you have a source file with a class and some methods, and you decide it is a good idea to do some cleanup. You know: sort the methods by visibility (public first), maybe create a subclass to wrap some functionality together, or place methods close to each other depending on how they are called, just to improve readability.

But someone was changing the same file concurrently (you know, it happens), and now they are less than happy to merge their fixes with your cleanup...

So, here is the deal: shouldn't we try to keep the code as clean as possible? Yes, of course, sticking to whatever style rules the team agreed on. But then, in real life, isn't it a merge killer?

This blogpost shows how SemanticMerge helps to solve this case when integrated with Git. For those of you who don't know it, SemanticMerge parses the code before calculating any merges. Unlike other 3-way merge tools, it is not just text based. It can parse Java, C#/VB.NET (Roslyn based) and C. There are also external, community-written parsers for Delphi and JavaScript.

Merge scenario

Well, I'm using a clone of the Kestrel server source code, and more specifically working on the file MemoryPool.cs.

Here are the changes the two developers are going to perform:

[Figure: Merge case]

Base is how the code was originally (in fact, the figure hides all the methods not involved in the example, so the real code is slightly more complicated than that).

Source (src) and Destination (dst) are the changes that the two developers are going to make.

Look at the icons: C stands for "changed" and M for "moved". As you can see, both methods will be changed concurrently (C on both sides) and also moved by one of the developers.

This figure is actually taken from SemanticMerge.

Diffing the code

The developer doing the cleanup (that's you in this story) can diff the changes (we're using Visual Studio Code for this example) and sees something like this:

[Figure: Classic diff]

Needless to say, "traditional" diff is not very helpful with moved code.

With SemanticMerge configured as the diff tool, you can always run git difftool from the command line to get a "move aware" diff:

[Figure: Semantic diff]
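As a rough sketch, the command-line call could look like the following. It assumes a difftool entry named "semanticmerge" already exists in your Git configuration; that name is illustrative and does not come from the original setup. Only standard git difftool options are used.

```shell
# Assumption: "semanticmerge" is a difftool entry you have already defined
# in your Git configuration; the name is illustrative.
git difftool --tool=semanticmerge --no-prompt -- MemoryPool.cs

# If the tool name is not recognized, list the diff tools Git knows about.
git difftool --tool-help
```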

Merging

To create the actual conflict, I used two branches: task001, where I refactored and reorganized the file, and master, where I just modified the methods.

Then, I checked out master and merged from task001:

[Figure: Running git merge]

And, as expected, Git detects a conflict on MemoryPool.cs.
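If you want to reproduce a conflict of this kind without cloning Kestrel, here is a minimal sketch in a scratch repository. The class body, method contents and commit messages are invented for illustration and are not the real MemoryPool.cs; only the branch names and the file name follow the example above.

```shell
#!/bin/sh
# Scratch repository; the file contents below are invented for illustration.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Base version: Return() first, Dispose() last.
cat > MemoryPool.cs <<'EOF'
class MemoryPool
{
    public void Return(Block block)
    {
        block.Reset();
        _blocks.Enqueue(block);
    }

    public void Dispose()
    {
        _disposed = true;
        _blocks.Clear();
    }
}
EOF
git add MemoryPool.cs
git commit -q -m "Base version"

# task001: the cleanup. Both methods swap places and get a small change.
git checkout -q -b task001
cat > MemoryPool.cs <<'EOF'
class MemoryPool
{
    public void Dispose()
    {
        _disposed = true;
        _blocks.Clear();
        GC.SuppressFinalize(this);
    }

    public void Return(Block block)
    {
        block.Reset();
        _blocks.Enqueue(block);
        _returned++;
    }
}
EOF
git commit -q -am "Reorder methods and tweak them"

# master: both methods are fixed in their original location.
git checkout -q master
cat > MemoryPool.cs <<'EOF'
class MemoryPool
{
    public void Return(Block block)
    {
        if (block == null) return;
        block.Reset();
        _blocks.Enqueue(block);
    }

    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;
        _blocks.Clear();
    }
}
EOF
git commit -q -am "Fix both methods in place"

# Merge the cleanup into master; a non-zero exit status means conflicts.
merge_status=0
git merge task001 || merge_status=$?

# List the files left in a conflicted (unmerged) state.
conflicted=$(git diff --name-only --diff-filter=U)
echo "merge exit status: $merge_status"
echo "conflicted files: $conflicted"
```

Master edits both methods on purpose: whichever method the text diff treats as the one that stayed put, the other one looks deleted on task001 and modified on master, so the line-based merge has to stop and ask.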

Solving it with SemanticMerge will be simpler than it seems. Just run git mergetool, provided you configured Git to use SemanticMerge as its merge tool, which is quite simple to achieve (see how to set it up).
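For reference, a configuration along these lines is one way to wire it up. The Git configuration keys and the $BASE, $LOCAL, $REMOTE and $MERGED variables are standard Git. The tool path and the -s/-d/-b/-r flags are assumptions recalled from the SemanticMerge documentation, so check them against your installed version before relying on them.

```shell
# Assumption: the path and the -s/-d/-b/-r flags are placeholders recalled
# from the SemanticMerge docs; verify them against your installed version.
git config --global merge.tool semanticmerge
git config --global mergetool.semanticmerge.cmd \
  '"/path/to/semanticmergetool" -s "$REMOTE" -d "$LOCAL" -b "$BASE" -r "$MERGED"'
git config --global mergetool.semanticmerge.trustExitCode true

# Then, with a conflicted merge in progress:
git mergetool
```

Here $REMOTE is the branch being merged in (task001, the source) and $LOCAL is the branch you are on (master, the destination).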

The key issue you face when trying to merge code that has been moved is that traditional merge tools are not "code aware". They don't parse the code, and as such they just try to match lines that are close. But that fails when the methods are reordered, as in this case.

Since SemanticMerge parses the code first, it "knows" where methods are, and calculates the conflicts on a method-by-method basis (or function by function, property by property, and so on, depending on the actual language):

[Figure: Merge tool explained]

I added red circles to the screenshot to highlight some interesting points:

  • There are only 2 conflicts to solve. Remember, only 2 methods were changed.
  • The tool starts with the first conflict, Dispose() in this case. Check how the 3 versions involved are aligned on the Dispose() method. I mean, check the line numbers. Remember Dispose() was moved up and changed, and modified in the original location by the other developer. SemanticMerge detects the conflict, but also shows it in a way that is easy to understand. Traditional "line sync" is broken to sync on actual methods.
  • Finally, check line 177 on the left and 72 on the right. These are the actual changes made to the method.

You also see the C and M icons on the method declarations. There are dropdowns there to let you run the merge of the Dispose() method. It will be an automatic merge since the lines are not colliding.

And, the same will happen for the Return() method that was moved down.

Wrapping up

Git does a great job calculating the 3 contributors (base, yours and theirs) involved in each file merge. It implements merge tracking to do that: it calculates the common ancestor and then asks an external tool to handle the job when there are manual conflicts (conflicts its algorithm can't resolve without manual intervention).
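You can inspect those three contributors yourself with plain Git commands. The sketch below uses a throwaway repository with an invented one-line file, just to show where the common ancestor and the three versions live while a merge is conflicted.

```shell
#!/bin/sh
# Throwaway repository; file name and contents are invented for illustration.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "original line" > file.txt
git add file.txt
git commit -q -m "Base"
base_commit=$(git rev-parse HEAD)

git checkout -q -b task001
echo "line changed on task001" > file.txt
git commit -q -am "Change on task001"

git checkout -q master
echo "line changed on master" > file.txt
git commit -q -am "Change on master"

# The common ancestor Git will use as the base of the merge.
ancestor=$(git merge-base master task001)
echo "common ancestor: $ancestor"

# Both branches changed the same line, so this merge stops with a conflict.
git merge task001 > /dev/null 2>&1 || true

# While the conflict is unresolved, the index keeps all three versions:
# stage 1 = base, stage 2 = ours (master), stage 3 = theirs (task001).
git ls-files -u
base_text=$(git show :1:file.txt)
ours_text=$(git show :2:file.txt)
theirs_text=$(git show :3:file.txt)
```

These three staged versions are what Git hands to an external merge tool as $BASE, $LOCAL and $REMOTE.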

By plugging SemanticMerge into Git, you extend the "merge power" you are used to down to the inside of the files. And since location-dependent conflicts are no longer conflicts, cleaning up code and reordering methods are not the root of all merge evil anymore.

But all of this works just "inside" the same file. What we really need is a merge tool that tracks moved code across files! - I hear you say. Yep, that's correct, but we need to develop our custom merge driver for Git to do that. Something we will definitely do, so stay tuned.

If you want to download the tool and give it a try, just go to www.semanticmerge.com.

Bonus track

If you found this blogpost interesting, you might want to watch the tool in action. You can see the same scenario described above here:

We develop Plastic SCM, a version control system that excels in branching and merging, can deal with huge projects and big binary assets natively, and comes with GUIs and tools to make everything simpler.

If you want to give it a try, download it from here.

We are also the developers of SemanticMerge, and the gmaster Git client.
