The MacSpin Program

In 1985 Andrew Ward Donoho, David Leigh Donoho and Miriam Gasko published MacSpin: Graphical Data Analysis, comprising a book and a computer graphics system. We note that MacSpin received the award "MacSpin - Best Scientific/Engineering Software of 1987" from MacUser Magazine.

Ellen Hirame wrote the review of the first version of MacSpin in the article Look At It This Way in the June 1986 MacUser Magazine. It appears on pages 40-44. We give a version of the article below.

MacSpin shows your data in new perspectives.

Information is the currency of the computer world. Just as money and the ability to control and use it is the source of power in the business world, information and the ability to control and use it is the real source of power in the computer world.

Computers are superb information generators and manipulators. Any of today's computers, even the smallest and cheapest, can generate or obtain profusions of data, millions of characters of information, thousands of pages of hard copy. And they do; regularly.

Data generation and acquisition technology is so cheap and so readily available that the world has been inundated with data. The term "information overload" accurately describes the situation. The data is there, and it can be manipulated, but its real information is often so buried in its mass that it can't be found. In a simpler time people called this situation "failing to see the forest for the trees."

MacSpin shows you the forest. It takes all your trees, your data, and creates a picture of the forest. The brain is still the best and quickest analytic computer in the world, and it works best with visual images. Give it a picture of complex data, and let it see this image sliced, spun, twisted and massaged as it wishes, and it will find the trends, anomalies and buried information. MacSpin is a tool to do exactly that. This dynamic graphic data analyser uses the unique properties of the Macintosh to provide dynamic data analysis capabilities that have only been found in half-million dollar lab machines up to now. Indeed, in some significant ways, MacSpin is the most powerful dynamic data analyser ever created. And certainly the cheapest and the most accessible.

Choose Your Partners

The information world has its own terminology (not surprised, are you?). The individual data items are often called objects or events (that's the term MacSpin uses). Events have attributes that distinguish them from other events. These attributes are often known as variables, and it is as variables that they are known in MacSpin. In the example used to illustrate this review, the events are the cars (or trucks) that make up the dataset. Their attributes include such things as horsepower, number of cylinders, weight and year manufactured.

A group of events can be referred to as though it was a single object or unit by declaring it a subset. This allows them to be selected and manipulated together. In our example, one subset might be all Japanese cars.

In most cases, up to about 600 discrete points can be seen in a MacSpin data cloud (that's what the image is called), before the image becomes cluttered. However, some data - particularly spatial data such as the material in the Galaxies example - is spaced so that over 2000 distinct elements can be seen. MacSpin can handle more data than it can cleanly display. The data array is limited to approximately 5500 cells, while the amount of data that the program can handle while running is directly dependent on the free memory (RAM) available. MacSpin uses advanced memory management techniques so that even 128K Macs can run and manipulate substantial datasets.

Data entry is reasonably straightforward. Either type the information directly into MacSpin's spreadsheet-like Worksheet (Figure 1) or create an ordinary text file with the data. The second option is far more attractive (and usually quicker and easier). The format of the information in the text file is essentially freeform. There are no special tabbing requirements and events are delineated by carriage returns. Thus, most text downloaded from other computers (including mainframes) should work with little or no modification. The current version of MacSpin (1.0) does not support cutting and pasting from the Clipboard, so data can't simply be selected in another application, such as Multiplan or Excel, copied and pasted into MacSpin. This major deficiency is at the top of the publisher's (D2 Software) fix list, and Clipboard support should be available in version 1.1, which is promised by midyear. Once the required data has been entered, MacSpin goes to work.
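As a rough illustration of why such freeform files are easy to produce, here is a minimal Python sketch of a reader for text of this general kind. The review does not give the exact layout MacSpin expects, so the details are assumptions made for the example: a first line of variable names, one event per line, fields separated by any run of spaces or tabs, and the event's name in the first field. The helper `parse_events` and the sample values are hypothetical.

```python
def parse_events(text):
    """Split freeform text into a header and a list of event records.

    Assumed layout (not taken from the MacSpin manual): the first line
    holds variable names, every later line is one event, fields are
    separated by any run of spaces or tabs, and the first field is the
    event's name.
    """
    # splitlines() accepts a bare carriage return as a line ending,
    # as well as "\n" and "\r\n"; blank lines are skipped.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    events = []
    for ln in lines[1:]:
        fields = ln.split()
        name = fields[0]
        values = [float(v) for v in fields[1:]]
        events.append((name, dict(zip(header[1:], values))))
    return header, events


# Made-up sample: carriage-return line endings, irregular spacing.
sample = "Name MPG Cylinders\rGremlin 21 6\rD200   11\t8\r"
header, events = parse_events(sample)
```

Because the fields are split on any whitespace, uneven spacing and stray tabs in the input make no difference, which is the practical meaning of "no special tabbing requirements".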

The Dynamics window ("Cars") occupies the left two-thirds of the screen. This is the MacSpin stage - where data comes to life. In the Dynamics window shown in Figure 2, there are several horizontal lines composed of irregularly spaced dots. Each line represents cars with a different number of cylinders, and each point corresponds to a car that actually has that number of cylinders. The top line contains a point for each car in the sample that has eight cylinders, and the bottom line contains a point for each car that has three cylinders. Several facts are immediately visible. For example, there are no cars with seven cylinders. While a search of the 418 events in the dataset would reveal the same information, here a glance was both sufficient and stronger proof. If it's there, it will be seen. When scanning numeric or alphanumeric data, there is always the danger of overlooking or misreading an item.

There's more information available in this multivariate plot. Distance along the group lines toward the left indicates increased miles per gallon. That makes it easy to see that eight-cylinder cars get worse gas mileage compared to four-cylinder cars.

The points in the data cloud represent the elements or events of the dataset. Click on any point in the Dynamics window to highlight it, and, while the mouse button is held down, display its name. If the Events window is active, the name of the car will also be highlighted there. This feature works the other way, too. Clicking on a name in the Events window causes the corresponding point in the Dynamics window to be highlighted. If you can't see the highlighted point in a busy Dynamics window, choose USE HIGHLIGHT in the Markers menu and highlighted points will cover an 8-by-8 pixel area instead of a 2-by-2 pixel area.

Holding down the OPTION key while pressing the mouse button in the Dynamics window displays the positions of each variable in terms of the currently selected variable. To obtain the most information about a point, hold down both the COMMAND and OPTION keys while selecting it. This pops up a window that contains the full set of variables and their values for that event.

At times, data points will be so close together that it is difficult to click one particular point. Solve that problem by holding down the COMMAND key and clicking. MacSpin will zoom in on that small part of the display. This high-resolution mode allows users to do everything that they could do at normal resolution.

There are three information windows on the right side of the screen. They are the Variables, Subsets and Events windows. These windows contain the numeric and text data for the dataset being plotted. Each has its own pull-down menu with the same name. The Events window contains the name of each point, or a serial number if the points don't have identifying labels. In our example the Events window contains the names of the cars.

An individual point is selected by clicking on it in the Dynamics window or on its name in the Events window. Groups of points are selected by shift-clicking in either the Dynamics or Events window or dragging a selection rectangle in the Dynamics window.

The Subsets window is initially empty, but any selected group of points can be declared a subset or added to an existing subset. Subsets also can be combined using standard set theory operations (union and intersection) to create other subsets.
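For readers unfamiliar with the set operations mentioned here, a short Python sketch shows what union and intersection do to two subsets. The subset names and their members are made up for illustration and are not taken from the Cars dataset.

```python
# Hypothetical subsets, each a set of event names (made-up membership).
japanese = {"Car A", "Car B", "Car C"}
four_cylinder = {"Car A", "Car C", "Car D"}

# Union: events that belong to at least one of the two subsets.
either = japanese | four_cylinder

# Intersection: events that belong to both subsets at once.
both = japanese & four_cylinder
```

Here `both` would play the role of a new subset such as "Japanese four-cylinder cars", built from two existing subsets rather than by selecting points again.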

Distinctive markers (for example, a cross or a plus sign) are chosen from the Markers menu and assigned to any point, group of points or subset to set them off visually. Only eight different markers are available and each can only be used once. However, a marker can apply to an entire selection, no matter how many events it contains. In Figure 2 the AMC Gremlin, Dodge D200 and Datsun PL510 each have been distinguished by a marker. Markers appear in both the Dynamics and Events windows.

The Variables window contains the names of all the variables and indicates which ones are currently assigned to which axes. A new assignment is made by dragging the variable's name over the circle at the end of the axis, at which time the old assignment is automatically undone.

Sometimes a required variable isn't in the Variables window. If the data required can be created by manipulating data already entered (that's certainly easier than entering new data!), MacSpin will do it for you. The TRANSFORM option on the Variables menu brings up a large dialog that allows the creation of new variables out of old ones (Figure 3). This dialog box is too busy. Since only one monadic or dyadic choice can be selected at a time, the large display of options could be replaced by list windows similar to the one in the lower right corner.

Missing in the TRANSFORM options is the ability to create variables that are linear combinations of other variables. However, the same results can usually be achieved by using many simple transforms, and the ability to use linear combinations is promised for the next version. How to create and use them is even covered in the current manual.
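The idea of building a linear combination out of simple transforms can be sketched in Python. This assumes that scaling one variable by a constant and adding two variables are among the simple steps available; the review does not list the actual TRANSFORM choices, so `scale`, `add` and the data values below are hypothetical.

```python
def scale(values, k):
    """One-variable step: multiply every value by a constant."""
    return [k * v for v in values]


def add(xs, ys):
    """Two-variable step: add two variables event by event."""
    return [x + y for x, y in zip(xs, ys)]


# Made-up values for two variables measured on three events.
weight = [2000.0, 3000.0, 4000.0]
horsepower = [70.0, 100.0, 150.0]

# Target: 0.5 * weight + 2 * horsepower, reached in three simple steps.
half_weight = scale(weight, 0.5)
double_hp = scale(horsepower, 2.0)
combined = add(half_weight, double_hp)
```

Each intermediate result is itself a new variable, which is why a combination that a single dialog could express takes several passes when only simple steps are on offer.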

Spin Your Data Round And Round

The primary dynamic manipulation tools are contained in a tool palette on the left side of the Dynamics window (Figure 4). The top six tools are rotation icons. Each controls one of the data cloud's six possible motions. To rotate the data cloud, click on these icons. Rotation is quick and quite smooth.

The program intelligently augments the image it presents (for example, it makes pixels that are visually toward the front of the image brighter, while those in the back are dimmed) to enhance depth perception as the 3D data cloud rotates.

Next down is the Tripod icon. This simply toggles a reference set of axes on the screen. Keep it turned off for best performance, using it every now and then to show a reference marker.

Datasets are actually calculated using four variables. The fourth variable is the animation axis, with the selected variable indicated by an A in the Variables window. MacSpin allows the user the choice of two types of animation: masking or slicing. They differ only in how the subset of events they display is formed.

Below the reference tripod toggle is an animation mode switch that controls which mode of animation is selected. In the masking mode, only data that has a value less than some threshold (user selected via the vertical animation scroll bar) is displayed. In the slicing mode, the dataset can be examined slice by slice (two-dimensional plane by two-dimensional plane). The exact plane shown is determined by the value of the animation variable selected via the scroll bar.
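The difference between the two modes comes down to which events pass a filter on the animation variable. The Python sketch below illustrates the idea only and is not MacSpin's implementation: the function names, the one-unit slice width and the sample data are assumptions, and including the threshold value itself in the mask is a choice made for the example.

```python
# Made-up events: (name, value of the animation variable), here a year.
events = [("A", 71), ("B", 71), ("C", 78), ("D", 83)]


def mask(events, threshold):
    """Masking: show every event at or below the slider value.

    Treating the boundary value as included is an assumption.
    """
    return [name for name, value in events if value <= threshold]


def slice_at(events, position, width=1):
    """Slicing: show only events inside one band of the animation variable.

    The band runs from position up to, but not including, position + width.
    """
    return [name for name, value in events
            if position <= value < position + width]


masked = mask(events, 78)       # cumulative view up to the slider value
sliced = slice_at(events, 78)   # only the band at the slider value
```

Moving the slider amounts to calling one of these filters again with a new value and redrawing the points that pass, so masking accumulates points as the slider rises while slicing replaces one band with the next.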

Animation, particularly slicing mode animation, works best using animation variables with a relatively small number of values. A variable with too many values will often show too little information in each animation frame.

In our example, slicing mode animation is illustrated in Figure 5. This mode is selected by clicking on the animation switch. The animation variable for Figure 5 is "Year" (manufactured). Moving the animation slider up and down displays data for only one axis unit (a year in our example) at a time, providing a powerful animation effect.

The three overlapped windows on the left show the trend away from producing fast, fuel-inefficient cars in 1971 through intermediate cars of 1978 to the slower accelerating economy cars of 1983. On-screen these frames appear as an animation.

If the animation switch were clicked once more, masking mode would be invoked. Now moving the animation slider will display data for all years no later than the slider-selected year.

The animation variable can be changed at will. Simply select any unused variable from the Variables window and drag it to the animation scroll bar slider.

Here's Lookin' At You

Many variables tend to be distributed more or less evenly, unrelated to other variables (a reasonable example in the sample dataset is colour), or normally [that is, they follow the normal distribution, with its characteristic bell-shaped curve, that is beloved of statisticians - it is best defined as tending towards some value, and usually near it (in our example this would be the number of drivers)].

Both the unrelated (random) and normally distributed variables are easy to visualise in a MacSpin three-dimensional image. The first is a cube and the second is an egg shape. The eye is very critical, and can detect even fairly slight deviations from these highly symmetric shapes. Users can transform their variables to make the data cloud as symmetrical as possible. The more symmetrical the cloud, the easier it is to spot anomalies. Anomalous points can be interesting for a number of reasons: they may be data capture errors (a station wagon labelled as a sedan), measurement errors (incorrect mileage readings), sampling errors, something that wasn't supposed to be captured in the first place (a truck in the cars data), or a discovery (a three-cylinder car).

MacSpin is not a true statistics program, but a graphical multivariate data analysis program. It often behaves like a statistics program, though, and the rules of good statistics apply. For meaningful results MacSpin should have at least 100 events and three to eight variables to work with, although a user well versed in statistics will, in some cases, be able to get meaningful results with much smaller datasets.

As with all statistics programs, it is easy to mislead yourself. If you know your statistics, MacSpin is the most powerful tool imaginable. But you must know where to use it and you must know what you are doing, or it can take you down the primrose path of erroneous results as well as any powerhouse statistics package.

The Props

MacSpin can save datasets in two ways. The saved files can be either ordinary text files or special (and much more compact) MacSpin binary files. The binary files are not only more compact, but they are much quicker to load. For example, the Cars file (saved as text) took 42 seconds to load in one of our configurations. Saved as a binary file, it loaded in 6 seconds, seven times faster! Programmers whose products create a lot of data can obtain (from D2) Lisa Pascal source code that can be written into their programs so that their programs can create MacSpin binary files.

MacSpin does not currently have a hard copy report generation capacity. The Dynamics window can be dumped to the ImageWriter or to an on-disk MacPaint file using standard operating system calls (COMMAND-SHIFT-3 or 4). That's it, though. This very serious deficiency should be eliminated in the first revision.

Another serious drawback is that the information in the datasets can't be printed out directly. Datasets (and subsets) can be saved as text files, exported to a database, spreadsheet or word processor, and then formatted and printed. However, that's asking a lot of users. There should be some way within the program to print this material out.

The MacSpin manual is the best manual seen to date. It not only instructs users in program usage in a clear and direct manner, it also takes the trouble to explain why the program does what it does and how it does it. Also included are sections on the history of dynamic data analysis, using the program effectively with a wide range of other programs (this section alone rates this manual a five-mouse rating), and commentary and tutorial involving the seven extensive sample datasets provided. These samples (which take up 198K of disk space) are absolutely fascinating. Reading the manual's excellent descriptions and running the complete and well-designed tutorial gives users superb insight into what MacSpin can really do. The manual is marred only by its poor glossary and index. The rest of the manual is so strong that, in spite of them, it still rates five mice.

MacSpin is copy-protected. The program asks for a key disk when a copy is used. It can be copied to a hard disk. Users are prompted to insert their MacSpin master disk either every seven calendar days or every ten program launches (whichever comes first). While this copy protection scheme is not too onerous, it has a serious flaw when used with RAM-based HFS (512K Mac and Hard Disk 20). Users running that configuration are forced to insert a write-enabled master disk (that's something to never do), which is then accessed by the disk drive. A failure here could wipe out the master disk. This situation does not occur in any of the other configurations tested [128K and 512K Macs, 2-meg (Levco Monster) Mac, Mac XL and Mac Plus]. If you do use RAM-based HFS, be sure to order your backup disk promptly. A backup disk is available for $10.

MacSpin is shipped without a System folder. Users must supply their own. This is a drawback only if you don't have an external drive or a hard disk. The System files were omitted so that more sample data can be packed onto the disk.

The first upgrade, version 1.1, has already been announced. It should be available before midyear and will cost $99.95. Owners of version 1.0 will be able to upgrade for $10. Version 1.1 will support the Clipboard fully (copy, cut and paste), have a real printing function, allow linear combinations in the TRANSFORM option and have an option that will connect the points in the data cloud to form a wireframe. It will come on two disks, one with the program and System files and the other packed full of sample data.

MacSpin runs adequately, although sometimes slowly, on a 128K Mac. The major limitation involves how much data can be loaded. The program has good memory management and utilises whatever space it has very efficiently.

Last Updated June 2024