Thursday 21 August 2008

Simian - A Copy and Paste Code Hunter Killer!

In my new(ish) job I have to look after a legacy application with a very large code base. Over the years this has been developed with a lot of copy and paste coding practices. This is a significant contributor to our 'technical debt' and certainly increases the costs of ownership associated with the application, adversely affecting troubleshooting, maintenance, and new feature development.

I have already been making good use of NDepend and ReSharper to assist with refactoring, and still have hopes that we will invest in ANTS Profiler, but I was missing a tool that would identify where copy and paste coding might have occurred. After a little Googling I discovered Simian.

This great little command-line utility will analyse a Visual Studio solution and spot instances of the same code existing in multiple places. It was really simple to integrate into the Visual Studio IDE and will make a big difference to my team going forward. Now when working on a piece of code we can quickly check whether it exists elsewhere and refactor accordingly. This new visibility is going to be invaluable as we pay back the technical debt bit by bit.

I found a great blog post by a chap from Conchango that showed how to integrate it with Visual Studio, but in the end I mainly just used the documentation that ships with it, which is plenty good enough.

The arguments that I am currently passing to have this tool scan my solution are:

-formatter=vs:c:\temp\buks_simian.log -language=cs $(ProjectDir)/../**/*.cs $(ProjectDir)/../**/*.aspx $(ProjectDir)/../**/*.js

What this does is:

-formatter=vs:c:\temp\buks_simian.log (format the output for Visual Studio and write it to a file on my C drive)

-language=cs $(ProjectDir)/../**/*.cs (set the language to C# and search all the .cs files beneath the parent of my current project directory; the search is recursive thanks to the /**/)

$(ProjectDir)/../**/*.aspx $(ProjectDir)/../**/*.js (as above, but also include the .aspx and .js files).

The reason I start from the project directory and go up a level before initiating a recursive scan, rather than just using the solution directory, is that my solution files live outside the structure of the projects, which ensures that they don't end up in source control.
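
If you want a rough idea of which files those three patterns will pick up before running the tool, the little Python sketch below walks the same tree. It is only an approximation of my own and not part of Simian: the function name files_to_scan is made up for illustration, and I am assuming that extensions should be matched case-insensitively. Simian does its own pattern matching, so treat the result as a sanity check rather than a guarantee.

```python
from pathlib import Path

# Extensions taken from the three patterns in the argument line above.
EXTENSIONS = (".cs", ".aspx", ".js")


def files_to_scan(project_dir):
    """Approximate the files matched by $(ProjectDir)/../**/*.cs and friends.

    Hypothetical helper for illustration only; Simian itself is not involved.
    """
    # Go up one level from the project directory, as the /../ does.
    root = Path(project_dir).resolve().parent
    # Walk everything beneath that folder, as the /**/ does, and keep
    # only files with one of the three extensions.
    # Assumption: extension matching is case-insensitive.
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in EXTENSIONS
    )
```

Calling files_to_scan with the folder that Visual Studio would substitute for $(ProjectDir) returns the matching files from that project and from every sibling project under the same parent folder, which is exactly why the solution file does not need to live alongside them.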
