What’s the diff?

One of the essential tools in a programmer’s toolbox is a file comparison utility. File comparison utilities have come a long way since the original Unix diff command, but the genre is still referred to as diff tools. For Windows, I use an open source diff utility called WinMerge. WinMerge includes a shell extension that makes it easy to use from within Windows Explorer.

The thing about diff tools is that they can be repurposed for many tasks that have nothing to do with programming. For example, I use WinMerge to quickly compare AutoCAD EULAs and other legal document revisions. I also use WinMerge to preview all changes before applying an upgrade to software like WordPress or Joomla on my web server. In both cases, I can see exactly what changed.
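For example, previewing the changes in a pending upgrade from the command line might look something like this (the folder names here are hypothetical, and WinMerge’s folder compare presents the same information graphically):

diff -ru current-install new-release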

The power of a diff tool is impressive in its own right, but its true power borders on amazing when it is used as a merge tool to reconcile multiple simultaneous changes to the same files. In programming, reconciling and merging simultaneous revisions allows multiple programmers to work in parallel, thereby reducing management overhead and eliminating the inherent inefficiency of a serial workflow. I think the principles, and even the tools, could just as well be applied to any collaborative project, such as designing an airplane or building a skyscraper.
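A basic three-way merge of two people’s edits against a common original can be done with the classic command-line tools; hypothetically, something like:

diff3 -m mine original theirs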

Most modern diff tools provide an easy-to-use GUI, and they do a lot more than just compare files. Even if you’re not a programmer, you should have a diff tool in your toolbox.

Infinite Computing: Bah, Humbug!

At Autodesk University, Autodesk CEO Carl Bass introduced the term “Infinite Computing” in an attempt to define Autodesk’s perspective on “the cloud” from a unique angle. I think the term is a brilliant and effective piece of terminology because it brings an otherwise nebulous concept into focus and radiates a sense of real and immediate purpose.

Infinite computing is not really infinite, of course, and it’s certainly not infinitely accessible. However, the metaphor is apt: like the physical universe, as long as the virtual universe keeps expanding, it is essentially infinite. [I can’t resist having some fun and taking the analogy a little further: at some point, Moore’s law will encounter relativistic effects, and we’ll realize that every transistor warps the virtual space-time continuum in proportion to the square of its clock speed.]

So why am I bearish on the prospect of infinite computing?

Let’s say you buy a computer with multiple processors to run, say, AutoCAD. Two processors can produce a nice performance boost, because AutoCAD can utilize 100% of one processor while the operating system uses the other. But what happens if you quadruple that capacity to eight processors? Unless you’re running independent programs that can use the extra processors, they offer very little benefit and are essentially wasted.
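One way to put a rough number on that intuition is Amdahl’s law: if only a fraction p of a workload can run in parallel, the best possible speedup on n processors is 1 / ((1 - p) + p / n). Here is a minimal sketch, assuming (purely for illustration) that half the workload is parallelizable:

#include <cstdio>

// Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the
// fraction of the workload that can run in parallel and n is the
// number of processors.
static double speedup(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

int main()
{
    const double p = 0.5; // assumed parallel fraction, for illustration only
    std::printf("2 processors: %.2fx\n", speedup(p, 2)); // ~1.33x
    std::printf("8 processors: %.2fx\n", speedup(p, 8)); // ~1.78x
    return 0;
}

Under those assumptions, quadrupling the processor count from two to eight improves throughput by well under 2x, never mind 4x.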

The moral of the story is this: an infinite computer is ineffective and inefficient unless it has an infinite number of simultaneous tasks to perform. It costs computing power to manage parallel tasks, so the practical limitations of “infinite” computing make it obviously unrealistic for all but highly specialized tasks. Even if we give it a more accurate name like “massively parallel computing”, such a system is hardly “sustainable” (to use another modern term of art) due to the inherent inefficiencies.

A compromise is necessary. There are new ways to look at old problems that enable a more parallel approach to finding solutions, and I have no doubt that many engineering problems can be restated in a way that makes them amenable to parallel processing solutions — but that’s hardly a revolutionary concept, and it certainly does not require an infinite computer for its implementation.

In the final analysis, “the cloud” is going to be about individuals connecting to each other and to their data seamlessly and in a location-agnostic way, and the “infinite computer” will be what they use to do it. Nothing more, nothing less.

Debugging heap corruption with PageHeap

Heap corruption bugs in C++ code can present some difficult debugging challenges. Often the actual corruption goes unnoticed until some apparently random point in the future when the corrupt memory is accessed by unrelated code. In such cases, it’s almost impossible to infer the location of the bug using typical debugging tools.

Common causes of heap corruption are modifying objects after they have been destroyed, or deleting the same pointer twice. In such cases, you often find out about the corruption well after it occurs, with unhelpful errors like this:

Free Heap block NNNNNNNN modified at NNNNNNNN after it was freed
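To make the failure mode concrete, here is a minimal C++ sketch of the two bug patterns just described (the Widget type is hypothetical); both corrupt the heap silently, long before any error is reported:

#include <cstring>

struct Widget
{
    char name[16];
};

int main()
{
    // Pattern 1: modifying an object after it has been destroyed.
    Widget* w = new Widget;
    delete w;
    std::strcpy(w->name, "stale"); // writes to freed memory: heap corruption

    // Pattern 2: double deleting a pointer.
    Widget* x = new Widget;
    delete x;
    delete x; // the second delete corrupts the heap's bookkeeping
    return 0;
}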

Aside from defensive programming techniques, a common strategy for combating these types of bugs is to log and track all memory allocations so that it’s possible to backtrack and determine which code last used the memory address or range of memory that got corrupted. Unfortunately, sometimes it isn’t possible or practical to add such diagnostic capabilities. For example, ObjectARX add-ons use AutoCAD’s heap and memory allocation functions, and unless you happen to have access to the AutoCAD source code, there is no way to change them.

Luckily, there is a debugging tool designed for this precise scenario: the PageHeap heap verifier. It has been built into Windows since Windows 2000, but you have to turn it on in order to use it. I use the Global Flags Utility (gflags.exe from Debugging Tools for Windows) to turn on the PageHeap verifier when needed. PageHeap must be used in conjunction with a debugger such as WinDbg or Visual Studio.

A typical debugging use case for an ObjectARX add-on is to enable PageHeap monitoring for the entire AutoCAD process:

gflags /p /enable acad.exe /full

For monitoring only a specific ObjectARX module, use syntax like this:

gflags /p /enable acad.exe /full /dlls myarx.arx
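To confirm which programs currently have page heap verification enabled, run gflags /p with no additional options and it will list them:

gflags /p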

Once PageHeap monitoring is enabled, you simply reproduce the heap corruption under a debugger as usual. The PageHeap utility will break execution as soon as the heap corruption occurs. Examining the call stack at that point will show exactly which code is causing the corruption.

Once you’ve fixed the problem, don’t forget to disable the PageHeap utility again:

gflags /p /disable acad.exe

I have not needed to use the PageHeap utility very often, but it has been a godsend when I needed it.

Windows offline files

Do you use the Windows offline files feature? I asked this question of some IT/CAD administrator types at Autodesk University last week, and was surprised to find that most didn’t even know what it was. In fact, of the dozen or so people I asked, only one actually used the feature.

There are enterprise solutions that perform similar functions (and cost a lot more), so some of you may not have much need for a feature that has been included in the box since Windows 2000, but I’ll bet that a lot of you would use it if you knew why you needed it.

If you ever need to access remote files from a local network or beyond, you can benefit from the offline files feature. In its simplest form, the feature allows you to keep accessing network files even when your network connection is slow or unavailable. It does this by caching copies of your files locally and automatically keeping the remote and local copies synchronized in the background. If your data is in the cloud, offline files are your shelter from the storm.

For example, it’s common for AutoCAD installations to access configuration files stored on a network server shared by everyone in the organization. If these files are made available offline, the user can continue working on local drawing files without interruption even when the network is completely offline. In fact, if the network is slow or inaccessible for any reason, using offline files allows the user to continue working as if the network were still perfectly functional.

The UI for offline files has been made very simple (right-click on a remote file in Explorer and select ‘Always available offline’), but there is more power and potential optimization under the hood of this feature. If you’d like to get technical, this TechNet article gives you a taste of the nitty-gritty details.

So, do you already use offline files? If so, and especially if you also use RoboCopy, I’d like to hear from you. I use the offline files feature all the time, and I’ve developed a command line utility called RoboCache to manage offline files in Windows Vista and later. I’m looking for a few people to test the utility.

[Update: RoboCache has now been released.]