Sam Madden
IS247: Information Visualization
Assignment 3


Question 1

Nominal x Nominal
A chart showing the distribution of a number of hazardous waste and toxic chemical sites in and around the Berkeley area. Geography
seems nominal -- there is no inherent ordering to it, although cultural preferences might give westerners a tendency to order
places in a particular way.

Nominal x Ordinal
Another map, this one showing crime by zip code (although only the 94530 zip code is highlighted here).
Crime ratings are ordered, but it's not clear what the difference between the ratings is, so they're not intervals.

Ordinal x Ordinal
I had a hard time finding rankings like this. It just isn't natural to rank things by rank. This is a very strange web site of
a strange person who felt the need to rank his favorite philosophers (by what, I do not know). He's subcategorized his rankings,
so Aristotle is the best of the greatest philosophers (apparently).

Ordinal x Interval
This chart is actually ordinal x interval x nominal (and is fairly confusing because of that). Voters are ordered
as "very worried", "somewhat worried", "not too worried", and "not at all worried" about something (not clear what).
Their votes are given as percentages (intervals).

Interval x Interval
A table showing vacancy rates in the United States by year (there's also an extra nominal axis here --
rental vs. homeowner properties).

Question 2

1. Table 2 distinguishes between "navigational" and "non-navigational" actions in a web browser. The biggest flaw seems to be the non-intuitive
nature of this distinction; why, for instance, is the "news-next" item not navigational when the "forward" item is? Reading the text carefully reveals
that navigation items are those that result in the display of a new web page, but this distinction is not clear from the table itself.

2. The purpose of the figures is to show that Open URL events (hotlist selections, hyperlink clicks, etc.) are the most frequent navigational actions
chosen by browser users. Figure 5b shows that hyperlink clicks are the most frequent Open URL event, and thus that the predominant mode of navigation
in a web browser is the hyperlink.

The biggest flaw with this pair of charts is that the major point, that most navigation is done via hyperlinks, is not shown in a single location. To determine
the exact percentage of navigations done via hyperlinks, the reader has to multiply the frequency of Open URL events by the percentage of those events
that are hyperlink clicks. It would be much more effective to show the total percentage of navigations that were hyperlink clicks alongside the other forms
of navigation. Part of the problem is that the distinction between "Open URL" events and other navigational events is somewhat arbitrary. Why is submitting a
form, in which the user clicks a button that takes him/her to a new URL, any different from clicking a hyperlink that moves to a new URL?

One possible alternative way of displaying this information is a pie graph. Since the entire percentage space is being partitioned among a small number
of categories, a pie chart could be effective here, although a bar graph doesn't seem overwhelmingly inappropriate.
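
To put every kind of navigation in a single chart, the two-level percentages from the pair of figures first have to be flattened into shares of the whole. Here is a minimal Python sketch of that step. The function name, the category names, and all of the numbers are made up for illustration; none of them are read from the paper's figures.

```python
def flatten_shares(top_level, parent, breakdown):
    """Replace `parent` in `top_level` with its sub-categories.

    top_level: dict mapping category name -> share of all navigations.
    parent:    the category in top_level that has a finer breakdown.
    breakdown: dict mapping sub-category name -> share of `parent`.
    Each sub-category's share of the whole is parent share * sub share.
    """
    flat = {name: share for name, share in top_level.items() if name != parent}
    for name, share in breakdown.items():
        flat[name] = top_level[parent] * share
    return flat


# Hypothetical values, for illustration only.
top_level = {"Open URL": 0.5, "Back": 0.3, "Other": 0.2}
open_url_breakdown = {"Hyperlink": 0.8, "Hotlist": 0.2}

flat = flatten_shares(top_level, "Open URL", open_url_breakdown)
for name, share in flat.items():
    print(f"{name}: {share:.0%}")
```

The flattened shares still add up to the whole, so they could go straight into one pie or bar chart.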

3. These graphs are extremely confusing for a number of reasons. The single largest of these is the addition of the separate action curves plotted above the main
curve. These action curves don't represent a point on the (X,Y) axis, but merely the action which the user took in transition from one URL to the next. The curves don't
have any relation to each other, and their vertical ordering is irrelevant, although at first glance it doesn't seem that it should be -- each of the action curves follows the slope
of the main curve. Furthermore, the action curves look like a scatter plot. Most readers will try to make a scatter plot out of the chart and end up horribly confused.

Other issues with these graphs are the high density of points, which makes it very hard to distinguish what action a user took in moving from one URL to the next, and the lack
of relevant symbols -- for instance, simple symbols reminiscent of the labels used in a web browser would make it much easier to tell which action curves were "back" and "reload".

These charts are effective in that they do show usage patterns -- after some study, it is possible to see the trends the big arrows are pointing out. Any new representation would
have to preserve that effectiveness. One possible choice is to plot just a single line, using a different symbol at each point to indicate the type of event that caused the transition.
Here's an example:

Notice that characters are used to represent the different possible actions. The data density here is much lower (if the plot were
extended to the size of the image used in the paper, it would show only about 200 URLs). This may be a problem, although the
data density of the original was much higher. The confusing use of the word "vocabulary" is eliminated. Access trends are still
evident -- notice, for instance, that reload and back actions never move to a new URL.
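
The single-line encoding described above can be roughed out in Python. This is only a sketch under my own assumptions: URLs are numbered in order of first appearance, the horizontal axis is the access number, and the action names and one-character symbols are hypothetical rather than taken from the paper.

```python
# Hypothetical one-character symbols for each action type.
SYMBOLS = {"link": "L", "back": "B", "reload": "R", "typed": "T"}


def plot_points(events):
    """Return (access_number, url_number, symbol) for each event.

    events: list of (url, action) pairs in the order they occurred.
    URLs are numbered in order of first appearance, so a revisit reuses
    the number the URL was given the first time it was seen.
    """
    url_numbers = {}
    points = []
    for access, (url, action) in enumerate(events, start=1):
        number = url_numbers.setdefault(url, len(url_numbers) + 1)
        points.append((access, number, SYMBOLS.get(action, "?")))
    return points


def render(points):
    """Draw the points as a character grid, highest URL number at the top."""
    width = max(p[0] for p in points)
    height = max(p[1] for p in points)
    grid = [[" "] * width for _ in range(height)]
    for access, number, symbol in points:
        grid[height - number][access - 1] = symbol
    return "\n".join("".join(row).rstrip() for row in grid)


# Made-up browsing session, for illustration only.
events = [("a", "typed"), ("b", "link"), ("c", "link"),
          ("b", "back"), ("b", "reload"), ("d", "link")]
print(render(plot_points(events)))
```

Because each point carries its own symbol, there is only one curve to read, and back and reload points always land on a row that already exists.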

4. The main part of Figure 7 is useful. It shows that most revisits to pages happen just one or two accesses after the first
access. Figure 8 shows that this corresponds to reload and back actions. Figure 8 also offers an explanation for the peak
at a distance of 4 -- users frequently move back to pages that they visited two pages ago. The inset of Figure 7 is a little confusing --
its main point, that most reaccesses occur a short distance from the first access, is shown by the main graph. The inset
appears to be approaching an asymptote at a percentage recurrence labeled "maximum" -- what is this maximum? Very confusing.
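
To make "distance" concrete, here is a small Python sketch of how revisit distances could be counted from a browsing history. It assumes that distance means the number of accesses since the most recent earlier visit to the same URL; that definition is my reading of the figures, not something confirmed by the paper, and the history below is made up.

```python
from collections import Counter


def recurrence_distances(history):
    """Return the distance of every revisit in `history`.

    history: list of URLs in the order they were accessed.
    Assumed definition: distance = current position minus the position
    of the most recent earlier access to the same URL.
    """
    last_seen = {}
    distances = []
    for position, url in enumerate(history):
        if url in last_seen:
            distances.append(position - last_seen[url])
        last_seen[url] = position
    return distances


# Made-up history: a -> b -> c, back to b, back to a, then reload a.
history = ["a", "b", "c", "b", "a", "a"]
print(Counter(recurrence_distances(history)))
```

Under this definition a reload gives a distance of 1, a single back gives 2, and backing up two pages gives 4 for the page that is reached last.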

Question 3

An amazingly bad visualization. A huge number of different ideas are being shown, using a large variety of different visualization
techniques, most of which are totally novel and unfamiliar. The large red column in the back shows the total number of
prostitutes in the US. There is no reason for it to be cylindrical, except for the vague penis association. Red is mostly arbitrary
-- perhaps it can be associated with sex, but that's what the whole chart is about. The column is partitioned into three sections,
in a somewhat arbitrary way. I'd like more information -- how many of the arrested prostitutes were under 18? What is the total age
distribution of prostitutes? A couple of simple bar charts could have shown much more information in much less space.

At the bottom of the chart is a blue and green disk, with brown ripples radiating from it. There is a distinctly green center
section, with a label 38% which is indicated to be the percentage of sexually active teenage girls. This suggests that the
larger (unlabeled) blue section is teenage girls, and that the brown ripples are non-teenage women. This is a partitioning
by area: the brown rings on the outside, representing all women, cover a larger area than the center, representing teenage girls.
This sort of partitioning is extremely misleading (as Tufte pointed out) -- people aren't very good at guessing the relative
size of two areas.
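
One reason area encodings are hard to read is that area grows with the square of the radius, so the visible width of a region is not proportional to the percentage it stands for. A quick Python check of this follows; the function name is mine, and it assumes a plain flat disk, which the chart may well not be.

```python
import math


def inner_radius(fraction, outer_radius=1.0):
    """Radius of a centered disk covering `fraction` of the outer disk's area.

    Area is proportional to radius squared, so the radius scales with
    the square root of the fraction.
    """
    return outer_radius * math.sqrt(fraction)


# 38% is the label on the chart; a flat disk is an assumption.
print(round(inner_radius(0.38), 2))
```

A center section that covers 38% of the area reaches well over halfway to the edge, which makes it look like more than 38% to anyone judging by width.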

Even worse is the little green section on the edge of the outermost brown ripple labeled "7% of women
are homosexual". Why is this on the outside edge? Why is it green? Are we honestly expected to be able to tell that this green
section represents 7% of the total area of the disk (or whatever part of this chart represents all women)?

And then I squinted at the coral slice on the far side of the disk. Looks like it says "13% of men are homosexual." There goes the theory about the
brown rings being non-teenage women. Maybe the inner ring is men, and the outer women, and the green and brown sections are unrelated? I have
no idea.

The egg-shaped protrusion at the foreground of the image is the most confusing of all. What is it doing here? Is it meant to be the percentage of
women using birth control, or to show the breakdown of birth control usage by type, or both? It seems to be unrelated to women, as one of the
choices of birth control is "male sterilization" (with a surprisingly large 11% of men being sterilized). How do the percentages shown map onto
the egg? By surface area? By volume?

In general, this visualization is full of "chart junk". A few well-designed, simple bar graphs would show all this information much more clearly. The
mapping of simple percentages onto complex two- and three-dimensional shapes is confusing and misleading. Color serves no purpose. It's very hard
to tell how the various pieces of the chart relate to each other. The labels are hard to read and unclear. New types of visualizations are introduced
for no apparent reason.