Skip to content
Archive of posts tagged programming

Simplifying the problem: plain vanilla configuration

One of the stages in simplifying an AutoCAD software problem is to rule out (or in) third party add-ons as the cause. To do that, you’ll need a “plain vanilla” test bed configuration that consists of only the out-of-the-box components with no add-ons loaded.

The best way to test in a plain vanilla AutoCAD configuration is to install it on a virtual machine. I use VMWare Workstation for this purpose (Microsoft’s Windows Virtual PC is another popular choice). Virtual machines are completely separate from the host system, so they provide an ideal testbed for isolating software problems. Unfortunately, virtual machines take time and resources to build and manage, so they won’t work for everyone.

For most common cases, it’s sufficient to start AutoCAD with a “vanilla” profile (ideally using the /p <profile_name> target argument in a shortcut) that loads only the standard AutoCAD menu and no third party add-ons, but this can be a bit tricky. The problem is that some third party add-ons are determined to load, and it’s difficult to stop them.

Lisp add-ons can be thwarted by ensuring that your “vanilla” profile uses a clean path (i.e. a folder structure containing only unmodified out-of-the-box files) for all support files. In addition, you must ensure that only the standard unmodified menu files are loaded. Finally, you can set the DEMANDLOAD system variable to zero to disable all object enablers from loading. Note that AutoCAD must be closed and restarted after any changes to see the effect.

Finally, here’s one last tip: you can enter (arx) at the AutoCAD command prompt to see a list of loaded ARX modules. Some ARX modules may automatically load at startup; those can be disabled by tweaking the registry, but I don’t recommend that unless you really know what you’re doing.

In the end, the goal is to determine whether the problem can be reproduced on a “plain vanilla” configuration. If the problem goes away, you can start adding things back one at a time to see when the problem resurfaces. If the problem still occurs on a clean configuration, you can focus attention on other possible causes.

Simplifying the problem: divide and conquer

When a customer reports a drawing-specific issue, the next stage in simplifying the problem is to eliminate all drawing content that has no bearing on the problem. Unless the drawing object that causes the problem is obvious and can be immediately ascertained, I use the divide-and-conquer method to zero in on the cause.

To divide and conquer an AutoCAD drawing, I start by getting a quick feel for what is in the drawing and how it is structured (I use Periscope for this; in fact I originally wrote Periscope for this purpose). The general approach I use is to eliminate roughly one half of the drawing at a time by erasing an arbitrary half, then checking to see whether the problem still exists in the remaining half. Doing this reliably requires some care.

If the drawing has layouts, I start by erasing all layouts except model space. I then save the new file with a new filename, close AutoCAD, restart, open the last file, and test to see whether the problem still exists. If the problem went away, I go back to the original file and start over, erasing everything in model space, then repeat the saveas/close/reopen sequence. Once I pin the problem down to a specific layout (or layouts), I start working on individual drawing entities.

A typical iteration consists of erasing roughly half the drawing entities, then checking whether the problem persists. If the problem goes away, back up and erase the other half, otherwise continue. Entities can be erased either by selection or using QSELECT. If the drawing consists of entities nested inside blocks, I first try erasing the block references; if the problem goes away, I back up and use BEDIT to erase part of the block content.

While going through these iterations, I sprinkle in a periodic purge (using SuperPurge, of course) to remove hidden and invisible objects that become unreferenced as visible entities are erased. I continue this process until I have a drawing file cleaned of everything except the bare minimum needed to reproduce the problem. Often, but not always, that means a drawing file with a single visible entity in it.

The key to success is being methodical and saving with incremented filenames so that it’s easy to back up a step and start over when the problem disappears. Closing and restarting AutoCAD between each iteration helps ensure that the test is not influenced by erased objects remaining in memory. With experience and a bit of skill in correctly guessing where to look, I’ve learned that almost all drawing-specific problems can be narrowed down to the bare minimum in 5 to 10 iterations.

The art of simplifying the problem

One of the most important skills in resolving technical problems is not problem solving, but problem definition. Stripping a problem down to its essence often makes the solution obvious. I think this is generally true, but especially true in my experience with software tech support and tracking down software bugs.

The first stage in problem simplification is to document a set of steps that consistently reproduces the problem. These steps must be detailed enough so that someone else can use them to reproduce the problem, including a description of exactly what the problem is. Often this is the most difficult step, either because the problem doesn’t happen consistently, or because the problem description lacks detail.

The second stage is to try eliminating unnecessary steps with the goal of determining the bare minimum steps needed to reproduce the problem. If the problem is drawing-specific, this stage includes stripping everything out of the drawing except the bare minimum needed. This is often very time consuming, but almost always a worthwhile investment because it can eliminate a lot of potential dead ends in tracking down the ultimate cause. Often, this stage requires some trial and error.

The third stage is determining the exact cause and source of the problem. In my experience, even problems that at first appear very complicated can almost always be boiled down to just a few steps with a minimal amount of data.

Finally, in stage four the problem has to be solved, of course!

In future posts, I’ll share some tips and techniques that I use to simplify software problems.

What’s the diff?

One of the required tools in a programmer’s toolbox is a tool for comparing files. File comparison utilities have come a long way since the original Unix diff command, but this genre of utilities is still referred to as diff tools. For Windows, I use an open source diff utility called WinMerge. WinMerge includes a shell extension that makes it easy to use from within Windows Explorer.

The thing about diff tools is that they can be re-purposed and used for many tasks that have nothing to do with programming. For example, I use WinMerge to easily and quickly compare AutoCAD EULAs and other legal document revisions. I also use WinMerge to preview all changes before applying an upgrade to software like WordPress or Joomla on my web server. In both cases, I can quickly see exactly what changed.

The power of a diff tool is impressive in its own right, but it’s true power borders on amazing when using it as a merge tool to reconcile multiple simultaneous changes to the same files. In programming, reconciling and merging simultaneous revisions allows multiple programmers to work in parallel, thereby reducing management overhead and eliminating the inherent inefficiency of a serial workflow. I think that the principles and even the tools could just as well be applied to any collaborative project such as designing an airplane or building a skyscraper.

Most modern diff tools provide an easy to use GUI, and they do a lot more than just compare files. Even if you’re not a programmer, you should have a diff tool in your toolbox.

Debugging heap corruption with PageHeap

Heap corruption bugs in C++ code can present some difficult debugging challenges. Often the actual corruption goes unnoticed until some apparently random point in the future when the corrupt memory is accessed by unrelated code. In such cases, it’s almost impossible to infer the location of the bug using typical debugging tools.

Common causes of heap corruption memory errors are modifying objects after they have been destroyed, or double deleting a pointer. In such cases, you often find out about the corruption well after it occurs with unhelpful errors like this:

Free Heap block NNNNNNNN modified at NNNNNNNN after it was freed

Aside from defensive programming techniques, a common strategy for combating these types of bugs is to log and track all memory allocations so that it’s possible to backtrack and determine which code last used the memory address or range of memory that got corrupted. Unfortunately, sometimes it isn’t possible or practical to add such diagnostic capabilities. For example, ObjectARX add-ons use AutoCAD’s heap and memory allocation functions, and unless you happen to have access to the AutoCAD source code, there is no way to change them.

Luckily there is a debugging tool designed for this precise scenario: the PageHeap heap verifier tool. This tool has been built into Windows since Windows 2000, but you have to turn it on in order to use it. I use the Global Flags Utility (gflags.exe from Debugging Tools for Windows) to turn on the PageHeap verifier when needed. The PageHeap utility must be used in conjunction with a debugger such as WinDbg or Visual Studio.

A typical debugging use case for an ObjectARX add-on is to enable PageHeap monitoring for the entire AutoCAD process:

gflags /p /enable acad.exe /full

For monitoring only a specific ObjectARX module, use syntax like this:

gflags /p /enable acad.exe /full /dlls myarx.arx

Once PageHeap monitoring is enabled, you simply reproduce the heap corruption under a debugger like normal.  The PageHeap utility will break execution as soon as the heap corruption occurs. Examining the call stack at that point will show exactly which code is causing the corruption.

Once you’ve fixed the problem, don’t forget to disable the PageHeap utility again:

gflags /p /disable acad.exe

I have not needed to use the PageHeap utility very often, but it has been a godsend when I needed it.