"Clear graphics aid, and show, clear thinking about what data mean"
(Valiela 2001)
NOTE: numbers in this section refer to the figures projected during the course. They are available in the PDF files accompanying these notes.
The power of figures – examples
Example 1 - Figures can make a point quickly and forcefully.
Rubbish in space: text vs. figures
…some 7,000 pieces of space debris – operating and dead satellites, explosion fragments from rocket engines, garbage bags and frozen sewage dumped by astronauts, shrapnel from antisatellite weapons tests, 34 nuclear reactors and their fuel cores, an escaped wrench and a toothbrush – now orbit our world. Only about 5 percent are working satellites. By means of extraordinary data recording and analysis, military computers identify and then track each of these 7,000 objects (>10 cm in diameter), in order to differentiate the debris from a missile attack, for which we may be thankful. Space is not totally self-cleaning; some of the stuff will be up there for centuries, endangering people and satellites working in space as well as inducing spurious astronomical observations. The risk of a damaging collision is perhaps 1 in 500 during several years in orbit. The volume of debris has doubled about every 5 years; future testing of space weapons will accelerate the trashing of space.
Example 2 - Napoleon's war in Russia - A complex story can be told by a figure
Example 3 - Barley data analysis, Minnesota, U.S.A. - Graphs can unearth aspects of data that otherwise remain hidden
See figs. 6.20, 1.1 & 6.21 for the different versions that reveal the anomaly in the dataset.
Example 4 - The Challenger disaster - Bad figures can kill
See the file, pages 6 & 7. The first contains the version of the figure used by the engineers to test the relationship between ground temperature at the time of space shuttle launch and O-ring damage recorded on the booster rocket segments. The second has ALL the data on O-rings, irrespective of the amount of damage. The conclusion is clear: no launch below 65 °F left the O-rings undamaged. In addition, the temperature expected for the launch was much colder than at any previous launch.
Example 5 (pages 8 & 9) - The Anscombe quartet - Graphical presentation of data reveals aspects that statistics hide
The table (page 8) shows the x and y values of the four datasets. The usual descriptive statistics are practically identical. The graph (page 9) shows that their distributions are strikingly different. If the data are not graphed, this difference remains unnoticed.
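The point of Example 5 can be checked numerically. The sketch below is not part of the course files: it types in Anscombe's (1973) published values and computes a few summary statistics with the Python standard library. The function names are illustrative only.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet: four data sets of 11 (x, y) pairs each.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]   # shared by sets I-III
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def summarise(x, y):
    """Mean and sample variance of x and y, plus their correlation."""
    return {
        "mean_x": mean(x), "var_x": variance(x),
        "mean_y": mean(y), "var_y": variance(y),
        "r": pearson_r(x, y),
    }


summaries = {name: summarise(x, y) for name, (x, y) in QUARTET.items()}
for name, s in summaries.items():
    # Rounded for display; the four printed lines are practically the same.
    print(name, {key: round(value, 2) for key, value in s.items()})
```

Only a plot of y against x shows that one set is roughly linear, one is curved, and two are dominated by a single outlying point.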
Overall, statistical graphics can:
- show the data
- induce the viewer to think about the substance, NOT methodology, design, or technology
- avoid distorting what the data have to say
- present many numbers in a small space
- make large data sets coherent
- encourage the eye to compare different pieces of data
- reveal data at several levels of detail
- serve a clear purpose: description, exploration, tabulation
- be closely integrated with the statistical and verbal descriptions of a data set
Terminology
See figs. 2.1 & 2.2, depicting:
- data rectangle
- axes & axis labels (in US English “scales”)
- ticks & tick labels
- key & data label
- legend / caption
Principles of designing graphs: economy, clarity, integrity
Graphs are designed to present the data in a clear, uncluttered and honest way.
Principle: a figure should be understandable without reference to the text.
Double presentation is not allowed: use either text, or a figure, or a table.
No double-coding (symbol + line).
All axes should have a label:
what is displayed?
what are the measurement units?
Economy
Data rectangle should fill the scale-line rectangle – 2.49 vs. 2.50
Do not insist on including zero – 2.59
Maximum no. of data, minimum ink – improve the data/ink ratio. Examples with a bad data-ink ratio:
Unnecessary decorative elements (page 5) – do not use them
Bad practice (chartjunk): pages 8 & 9
Tukey box plot vs. Tufte plot (page 8)
Examples of 'more with less' (erasing ink & increasing information):
- conventional axes vs. range-frame (page 9)
- indicating mean (or median), quartiles along axes (page 10)
- ticks corresponding to data (page 11)
Clarity
allow for reduction in reproduction – the fig. on 2.30 did not
no scale breaks – 2.71, 2.72
tick marks outward – they can interfere with data: bad 2.11, 2.12 & good 2.13
when data sit on an axis, move the axis away – 2.9 vs. 2.10
use visually prominent symbols to show data – fig. 2.5 (bad) & 2.6 (better)
do not allow data labels to clutter the graph – 1.6 vs. 1.7
do not clutter the interior of the data rectangle: 2.3 (bad) vs. 2.4 (good), 2.14 (bad) vs. 2.15 (good), 2.16 (bad) vs. 2.17 (good)
tick marks & data labels should not dominate – 2.19, 2.22
provide sufficient explanation: 2.34
Visual clarity:
avoid overlapping symbols. How?
a) use logarithms 3.27 vs. 3.28
b) moving 3.29 – only works if there are not too many overlapping points
c) jittering 3.31 vs. 3.32
d) symbol use – empty circles are best – see 3.33
superimposition – symbol use 3.34, 3.35, 3.36, 3.38
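Jittering (option c above) can be sketched in a few lines. This is a minimal illustration, not the method used for figs. 3.31 and 3.32: the function name, the uniform noise and the `width` parameter are assumptions. The noise is for plotting only and must never be written back into the data.

```python
import random


def jitter(values, width, seed=None):
    """Return a copy of `values` with uniform noise in [-width, +width] added.

    `width` should be small compared with the spacing of the data, so that
    points are separated visually without being displaced noticeably.
    """
    rng = random.Random(seed)  # a seeded generator keeps the figure reproducible
    return [v + rng.uniform(-width, width) for v in values]


# Hypothetical scores with many ties, which would overlap exactly on a plot.
scores = [3, 3, 3, 4, 4, 5]
plotted = jitter(scores, width=0.2, seed=1)
print(plotted)
```

If jittering is used, the legend should say so, in keeping with the principle that a figure must be understandable on its own.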
the role of reference grid 4.8, 3.41
banking to 45° – useful for trend assessment 1.1, 2.42
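One simple way to bank to 45° is to choose the aspect ratio so that the median absolute slope of the line segments, as drawn, equals 1. The sketch below implements only this median-slope heuristic, and the function name is an assumption. Cleveland describes more refined variants that are not reproduced here.

```python
from statistics import median


def bank_to_45(x, y):
    """Aspect ratio (height / width) at which the median absolute slope of
    the segments joining successive points is drawn at 45 degrees."""
    x_range = max(x) - min(x)
    y_range = max(y) - min(y)
    points = list(zip(x, y))
    slopes = [
        abs((y1 - y0) / (x1 - x0))
        for (x0, y0), (x1, y1) in zip(points, points[1:])
        if x1 != x0  # skip vertical segments
    ]
    if not slopes or median(slopes) == 0:
        raise ValueError("cannot bank: no sloping segments")
    # Slope on paper = data slope * (x_range / y_range) * aspect; set it to 1.
    return y_range / (x_range * median(slopes))


# Hypothetical series: a steady trend and a strongly oscillating curve.
steady = bank_to_45([0, 1, 2, 3], [0, 10, 20, 30])
wiggly = bank_to_45([0, 1, 2, 3, 4], [0, 10, 0, 10, 0])
print(steady, wiggly)
```

A steadily rising line comes out square, while the oscillating series needs a plot that is much wider than tall.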
Integrity
Figures are always a selective presentation of data – certain aspects remain underemphasised or hidden => careful assessment of purpose needed:
Example: time series presentation methods
symbol plot 3.53 – good for time series for long-term trend
connected plot 3.54
vertical line plot 3.55
=> Graph should be truthful to data
No pseudo-dimensions – the dimensions of the graph should match the dimensions of the data (if possible) – page 2 (bad example)
Comparison between panels: uniform or comparable scale 2.55
Provide context – pages 4-6
Do not use graphical elements to create a misleading impression – page 7, fig. 9.26
Do not exaggerate (the “lie factor”: discrepancy between data difference and representation size difference) – pages 8-14
Graphs do not have to be “alive” and “decorative” – page 15
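Tufte defines the lie factor as the size of the effect shown in the graphic divided by the size of the effect in the data, each expressed as a relative change. The sketch below follows that definition. The function names and the numbers are hypothetical and are not taken from the course figures.

```python
def relative_change(first, second):
    """Change from `first` to `second` as a fraction of `first`."""
    return (second - first) / first


def lie_factor(data_first, data_second, shown_first, shown_second):
    """Effect shown in the graphic divided by the effect in the data.

    A value of 1 means the graphic is faithful. Values well above 1
    exaggerate the change and values well below 1 understate it.
    """
    return (relative_change(shown_first, shown_second)
            / relative_change(data_first, data_second))


# Hypothetical case: a value rises from 100 to 150 (+50%).
# Drawn as bars, the heights are proportional to the values.
honest = lie_factor(100, 150, 2.0, 3.0)
# Drawn as circles whose RADIUS is proportional to the value,
# the ink area grows with the square of the value.
exaggerated = lie_factor(100, 150, 100 ** 2, 150 ** 2)
print(honest, exaggerated)
```

The second case also shows why encoding a single measured value as an area breaks the rule against pseudo-dimensions.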
Common problems with & misconceptions about graphs
Misconception: graphs have to be 'alive', 'communicatively dynamic' =>
overdecorated (chartjunk – word coined by Tufte)
exaggerated design, disguising shallow thinking
disregarding the truth about data
arrogant
Chartjunk:
- unintentional “optical art” (Moiré vibration)
- the dreaded grid – when it is too strong and interferes with data
- unnecessary decoration (e.g. p.14)
Graphical methods - Old designs to discard
Area charts
Ease and precision of estimation: differences in line length >>> differences in area >> differences in volume
Area charts violate the data dimensionality principle: a single measured value is represented by a two-dimensional object (circle) – use dot plots instead (4.23 – bad vs. 4.24 – recommended)
Pie chart
Circles, divided into "slices" (4.19)
bad design overall – the area occupied has to be assessed, so only the largest differences can be identified
the figure alone is useless – numbers are often printed beside the slices (unacceptable, double presentation of data)
Use a dot plot instead (4.20)
Bar chart
Oldest graph type – measurement value = height of a column.
Often composite – several columns attached to each other
data "solidity": heavy shading, complicated markings
classifies horizontal axis as categorical variable – check if this assumption is correct (sometimes it is not, see 2.38)
a point at the appropriate height gives the same information
Stacked bar chart (vertical or horizontal)
Columns divided into different segments
Individual measurements very difficult to perceive - base & top both variable => proper perception impossible. (2.38)
IF horizontal axis is a proper variable (data are two-dimensional) – use a scatterplot, line plot (2.39), or similar (see options e.g. 3.53 -3.55 under integrity)
IF data are one-dimensional (4.21), use a multiple dot plot (4.22)
New methods of data presentation
Dot plot & multiway dot plot for one-dimensional data
dot plot for labelled data, to replace: pie charts, bar charts
order on dot plot – according to value of measured variable, largest on top, smallest on bottom – 4.10 vs. 4.9
Multiway dot plot – common horizontal axis (4.11) & optimisation of vertical order on panels: has to be uniform, AND approximate top-to-bottom order
Think about the order of categorical variables on multiple dot plots: 4.11 vs. 4.18
superimposition possible on dot plots 1.1
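The ordering rule for dot plots can be tried out with a plain-text sketch. The code below is only an illustration (the function name and the text rendering are assumptions). The three values are the 2005 catches of Carabus nemoralis from the table at the end of these notes.

```python
def text_dot_plot(data, width=40):
    """Return the lines of a text dot plot, largest value on top.

    The scale runs from the smallest to the largest value (zero is not
    forced into the plot). A dotted guide line leads the eye from the
    label to the data symbol "o".
    """
    lo, hi = min(data.values()), max(data.values())
    span = (hi - lo) or 1  # avoid division by zero if all values are equal
    label_width = max(len(label) for label in data)
    lines = []
    ordered = sorted(data.items(), key=lambda item: item[1], reverse=True)
    for label, value in ordered:
        position = round((value - lo) / span * (width - 1))
        lines.append(f"{label:<{label_width}} |" + "." * position + "o")
    return lines


catches = {"Forest": 46, "Suburban": 170, "Urban": 85}
lines = text_dot_plot(catches)
print("\n".join(lines))
```

Sorting by value rather than alphabetically is what makes the ranking visible at a glance.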
For two-dimensional data:
Loess (lowess): locally weighted regression
Example: hibernating hamsters & life span – 3.42 – 3.44
How loess works – 3.49
Testing the effect of window width – compare 3.44 and 3.45
Testing with loess of residuals – see 3.47 & 3.48
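The core of loess can be sketched as follows: for each x value, fit a straight line to the nearest points, weighted by a tricube function of distance, and take the value of that line at x. This is a simplified sketch under stated assumptions (locally linear fit, no robustness iterations). The names `loess` and `frac` are illustrative. `frac` plays the role of the window width mentioned above.

```python
import math


def tricube(u):
    """Tricube weight: 1 at distance 0, falling to 0 at the window edge."""
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def loess(x, y, frac=0.5):
    """Locally weighted linear regression, evaluated at every x.

    `frac` is the fraction of the points inside each local window.
    """
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # points per window
    fitted = []
    for x0 in x:
        dist = [abs(xi - x0) for xi in x]
        h = sorted(dist)[k - 1]  # distance to the k-th nearest point
        if h == 0:
            # All k nearest points share the same x: use their mean y.
            near = [yi for d, yi in zip(dist, y) if d == 0]
            fitted.append(sum(near) / len(near))
            continue
        w = [tricube(d / h) for d in dist]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Sanity check on hypothetical data lying exactly on the line y = 2x + 1.
xs = list(range(10))
ys = [2 * v + 1 for v in xs]
smooth = loess(xs, ys, frac=0.5)
print(smooth)
```

Running the function with several values of `frac` on the same data mirrors the comparison of window widths. Applying it to the residuals `y - fitted` mirrors the residual check.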
Multi-dimensional data
Scatterplot matrix:
for data with > 3 variables, but relationships are always between two variables
picture every two variables against each other with shared scales 3.64
redundant (every pairing appears twice, with the horizontal and vertical positions swapped), but otherwise a synthetic comparison is not possible
Conditional plot / coplot
To picture the relationship of two variables within selected intervals of a third variable – wear of car tyres example 3.66
Colour
used to be rare in journals – expensive (USD 1000+)
Cheap in Internet-based journals – consider carefully; sometimes B&W in print, colour on the Net – be careful, not always interchangeable
Use colour to help understanding, not for decoration
Modest use of colour is very helpful
Try to use harmonious combinations
Consider colour-blindness (some combinations are indistinguishable)
Legend (caption)
An important part of the figure – should give information to help understand the figure
Legend is printed underneath the figure – but NOT in the manuscript: in the MS, legends are at the end of the text, grouped together.
Common error: not enough detail to understand the figure
Numbering – in the sequence of mentioning in the text, independently of tables
Proportion, scale and appearance:
graphs should tend towards horizontal: length > height
- eye is naturally practised in detecting deviations from horizon
- ease of labelling
- causal influence: mostly cause (independent variable) – effect (dependent variable) => horizontal depth – space to elaborate
ratio: golden section – a/b = b/(a+b) – ratio 1:1.618
smoothly-changing curves can be taller than wide; a wiggly curve needs to be wider than tall
lettering: type & size – serif fonts preferred – more readable
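The figure 1.618 quoted for the golden section follows directly from its defining proportion. A short derivation, taking a as the shorter and b as the longer side (the symbol φ is introduced here for convenience):

```latex
\begin{align}
\frac{a}{b} &= \frac{b}{a+b}, \qquad \varphi \equiv \frac{b}{a} \\
\frac{1}{\varphi} &= \frac{\varphi}{1+\varphi}
  \;\Longrightarrow\; \varphi^{2} - \varphi - 1 = 0 \\
\varphi &= \frac{1+\sqrt{5}}{2} \approx 1.618
\end{align}
```

Only the positive root is meaningful for a ratio of lengths, so a graph 10 cm high would be about 16.2 cm wide.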
Integrating figures & text
To have clear understanding, in the text:
- describe everything that is graphed
- draw attention to the important features of the data
- describe the conclusions drawn from the data on the graph
- interplay between graph, caption and text is delicate – no iron rules but hard thinking. Self-contained figures are necessary.
- error bars should be clearly explained: s.d., s.e., or confidence interval?
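The three kinds of error bar differ considerably in length, which is why the legend must say which one is drawn. The sketch below computes all three for a small hypothetical sample. The function name is illustrative, and the confidence interval uses the normal approximation. For small samples a t-based interval would be wider.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def error_bar_summary(sample, confidence=0.95):
    """Mean with s.d., s.e. and an approximate confidence interval."""
    n = len(sample)
    m = mean(sample)
    sd = stdev(sample)   # sample standard deviation: spread of the data
    se = sd / sqrt(n)    # standard error: uncertainty of the mean
    # Normal approximation (an assumption; use Student's t for small n).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return {"mean": m, "sd": sd, "se": se, "ci": (m - z * se, m + z * se)}


# Hypothetical measurements.
result = error_bar_summary([2, 4, 4, 4, 5, 5, 7, 9])
print(result)
```

For the same data the s.d. bar is always longer than the s.e. bar by a factor of √n, so an unlabelled bar can mislead by that much.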
Revising your graph
Check the following:
- no pseudo-dimensions
- maximise data-ink ratio
- erase non-data ink & redundant data ink, within reason
- is the legend appropriate?
Scientific illustration software – many types available.
Examples: Statistica, Axum, Origin, SAS (not user-friendly), S-Plus, R, SPSS, SigmaPlot
Summary
A good graph:
- is a well-designed presentation of interesting data
- communicates complex ideas with clarity, precision, and efficiency
- gives the viewer the greatest no. of ideas in the shortest time with the least ink in the smallest space
- is nearly always multivariate
- requires telling the truth about the data
Features of a friendly vs. an unfriendly graph
Friendly graph                                              Unfriendly graph
words spelled out, no mysterious coding                     abbreviations abound, requiring reference to text
words run left to right                                     words run vertically and/or in several different directions
messages help explain data                                  graphic is cryptic, requires reference to text
elaborate shadings etc. avoided, labels on graphic itself   obscure coding, frequent reference to legend needed
graphic attracts viewer, raises curiosity                   chartjunk-filled
colour used with colour-blind in mind                       design insensitive to colour-blind (10% of population)
type clear, precise, modest                                 type clotted, overbearing
type upper-lower case, with serifs                          all capitals, sans serif
Reviewing/evaluating figures (Exercise)
1. Is the figure necessary? Do the data justify a figure? Would a table do? Can the data be written in the text?
2. Is the type of figure acceptable? Is a better type of figure necessary? (dot plot, multiple dot plots, co-plot, scatterplot vs. histogram or pie chart)
3. Data/ink ratio? Can this be improved? - can ink be eliminated and information retained?
4. Appearance: axis scale, labels (clear, not too many?), symbols (contrast, recognition). Do data fill the data rectangle?
5. Format: size, font type, ratio (vertical:horizontal, banked to 45 degrees?) of figure. Is size appropriate? Do data points stand out? Does it withstand reduction?
6. Is the legend appropriate?
Useful resources:
Tufte, Edward R. 2003. The visual display of quantitative information. 2nd ed. Graphics Press, Cheshire, Connecticut, U.S.A.
Tufte, Edward R. 1990. Envisioning information. Graphics Press.
Tufte, Edward R. 1997. Visual explanations. Graphics Press.
Tufte, Edward R. 2006. Beautiful evidence. Graphics Press.
Cleveland, William S. 1993. Visualizing data. Hobart Press, U.S.A.
Cleveland, William S. 1994. The elements of graphing data. Hobart Press, U.S.A.
Edward Tufte’s website: www.edwardtufte.com
Bill Cleveland’s website: http://cm.bell-labs.com/cm/ms/departments/sia/wsc/
Photographs & drawings
Is it important? (editor will always ask)
Yes? – check journal reproduction standard – only good final quality is convincing
colour – at your expense! (US$1000 & up)
Colour possible? Provide slides (not prints)
Black & white photos: supply B&W prints – a colour photo reproduced in B&W gives no good result (grays and fading shades result)
How to control photo quality?
best quality: no reduction/enlargement
- consider journal dimensions
- reduction decreases quality – crop/frame the important part
How? Experiment with cropping possibilities:
meaningful instructions – editor/copy editor happy to oblige
In-photo information:
letters or arrows superimposed if needed
scale directly on photo (reduction!)
mark “top” on back, in soft pencil
author, fig. no. – in pencil (photos separated from MS in production)
mark position in text – someone will have to decide where to insert – why not you?
Digital photo/illustrations:
check the acceptable or preferred file format – you can contact the technical editor for clarification; this will be seen as co-operation, not an obstacle
.TIFF, .JPEG format better than EPS, etc. (but ask editor!)
Pen & ink illustrations: could be very useful, but only in good quality – by professional artist
15. TABLES
First question: Do you need a table?
- only if repetitive data must be presented
- not good science to publish data just because you measured them
- printing a table is costly
- examples of bad (unnecessary) tables:
lots of standard conditions – not variables
lots of 0s, 100%s or +/- signs
word list table
Tables should be self-explanatory (as figures)
Title: economical use of words
use footnotes (sparingly)
avoid exponents (prone to printing problems)
give details but not excessively (mention method but not recipe)
Format:
No vertical lines
Horizontal lines: column heading top (under table title),
column heading bottom
below table
Partial horizontal – sub-grouping column headings
data go in either text, OR figures, OR table – never repeat, but:
Selected data can be singled out for discussion
significant figures and “virtual exactness”: 1 vs. 1.00 vs. 1.0000
Organisation: elements/comparisons read down, not across
Marginal indicator in MS text (pencil) – helps to see if you mentioned all tables, and in sequence.
Think hard about the vertical sequence of rows and general arrangement of tables – often neglected.
Tip: read instructions before final formatting! – requirements often specific and non-intuitive
Tabulation of data – why do we put independent variable to the left?
Table 1, version 1: The effect of heating on water temperature.
t (time) = 0', 3', 6', 9', 12', 15'; T (temperature) = 25, 27, 29, 31, 32, 32 °C
Table 1, version 2: The effect of heating on water temperature.
Temperature (°C)   Time (min)
25                 0
27                 3
29                 6
31                 9
32                 12
32                 15
Table 1, version 3: The effect of heating on water temperature.
Time (min)   Temperature (°C)
0            25
3            27
6            29
9            31
12           32
15           32
individuals of Carabus nemoralis, C. hortensis, and C. coriaceus caught in 2004 and 2005, in Sorø, West Zealand, Denmark.
Year, habitat      Activity periods                                     Activity peak   No. individuals/year
                   Early            Main              Late
Carabus nemoralis
 2005
  Forest           02 May-17 May    17 May-24 Jun     24 Jun-03 Oct     27 May          46
  Suburban         02 May-28 May    28 May-08 Sept    08 Sept-03 Oct    13 Aug          170
  Urban            02-15 May        15 May-17 Aug     17 Aug-03 Oct     23 Jun          85
Carabus hortensis
 2004
  Forest           06 May-03 Aug    03 Aug-12 Sept    12 Sept-11 Oct    16 Aug          328
  Suburban         06 May-07 Aug    07 Aug-17 Sept    17 Sept-11 Oct    20 Aug          19
 2005
  Forest           02 May-16 Jul    16 Jul-14 Aug     14 Aug-03 Oct     07 Aug          237
  Suburban         02 May-10 Aug    10 Aug-04 Sept    04 Sept-03 Oct    29 Aug          89
Carabus coriaceus
 2004
  Forest           06 May-22 Aug    22 Aug-18 Sept    18 Sept-11 Oct    06 Sept         376
  Suburban         06 May-23 Aug    23 Aug-15 Sept    15 Sept-11 Oct    03 Sept         444
 2005
  Forest           02 May-09 Aug    09 Aug-07 Sept    07 Sept-03 Oct    17 Aug          121
  Suburban         02 May-07 Aug    07 Aug-03 Sept    03 Sept-03 Oct    14 Aug          86