Constructing FIGURES: a tricky art? - Scientific communication and publishing

"Clear graphics aid, and show, clear thinking about what data mean"

(Valiela 2001)

NOTE: numbers in this section refer to the figures to be projected during the course. They are available in the pdf files accompanying these notes.

The power of figures – examples

Example 1 - Figures can make a point quickly and forcefully.

Rubbish in space: text vs. figures

…some 7,000 pieces of space debris – operating and dead satellites, explosion fragments from rocket engines, garbage bags and frozen sewage dumped by astronauts, shrapnel from antisatellite weapons tests, 34 nuclear reactors and their fuel cores, an escaped wrench and a toothbrush – now orbit our world. Only about 5 percent are working satellites. By means of extraordinary data recording and analysis, military computers identify and then track each of these 7,000 objects (>10 cm in diameter), in order to differentiate the debris from a missile attack, for which we may be thankful. Space is not totally self-cleaning; some of the stuff will be up there for centuries, endangering people and satellites working in space as well as inducing spurious astronomical observations. The risk of a damaging collision is perhaps 1 in 500 during several years in orbit. The volume of debris has doubled about every 5 years; future testing of space weapons will accelerate the trashing of space.

Example 2 - Napoleon's war in Russia – a complex story can be told

Example 3 - Barley data analysis, Minnesota, U.S.A. - Graphs can unearth aspects of data that otherwise remain hidden

See figs. 6.20, 1.1. & 6.21 for the different versions that indicate the anomaly in the dataset.

Example 4 - The Challenger disaster - Bad figures can kill

See the file, pages 6 & 7. The first contains the figure version used by the engineers to test the relationship between ground temperature at the time of space shuttle launch and O-ring damage recorded on the booster rocket segments. The second has ALL the data on O-rings, irrespective of the amount of damage. The conclusion is clear: there has been no launch under 65 °F when O-rings did not suffer damage. In addition, the expected temperature was much colder than at any previous launch.

Example 5 (pages 8 & 9)– The Anscombe quartet - Graphical presentation of data reveals aspects that statistics hide

The table (page 8) shows the x and y values of the four datasets. The usual descriptive statistics (means, variances, correlation) are practically identical. The graph (page 9) indicates that their distributions are strikingly different. If the data were not graphed, this difference would remain unnoticed.
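The agreement of the summary statistics can be checked numerically. The sketch below assumes the standard published values of Anscombe's quartet (the table on page 8 is not reproduced in these notes, so the numbers are typed in here as an assumption); `pearson_r` and `summarise` are illustrative helper names.

```python
from statistics import mean, variance

# Standard published values of Anscombe's quartet
# (assumed to match the table on page 8 of the course file).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def summarise(xs, ys):
    """Mean and variance of x and y plus their correlation, rounded as in a typical table."""
    return (
        round(mean(xs), 2),
        round(variance(xs), 2),
        round(mean(ys), 2),
        round(variance(ys), 1),
        round(pearson_r(xs, ys), 2),
    )


summaries = {name: summarise(xs, ys) for name, (xs, ys) in quartet.items()}
for name, summary in summaries.items():
    print(name, summary)
```

At this rounding the four summary rows coincide, which is exactly why the numbers alone cannot reveal the different shapes.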

Overall, statistical graphics can:

- show the data

- induce the viewer to think about the substance, NOT methodology, design, or technology

- avoid distorting what the data have to say

- present many numbers in a small space

- make large data sets coherent

- encourage the eye to compare different pieces of data

- reveal data at several levels of detail

- serve a clear purpose: description, exploration, tabulation

- be closely integrated with the statistical and verbal description of a data set

Terminology

See figs. 2.1 & 2.2, depicting:

- data rectangle

- axes & axis labels (in US English "scales")

- ticks & tick labels

- key & data label

- legend / caption

Principles of designing graphs: economy, clarity, integrity

Graphs are designed to present the data in a clear, uncluttered and honest way.

Principle: a figure should be understandable without reference to the text.

Double presentation is not allowed: either text, or figure, or table.

No double-coding (symbol + line).

All axes should have a label:

- what is displayed?

- what are the measurement units?

Economy

Data rectangle should fill the scale-line rectangle: 2.49 vs. 2.50

Do not insist on including zero – 2.59

Maximum no. of data, min. ink – improve the data/ink ratio. Examples with a bad data-ink ratio:

- unnecessary decorative elements (page 5) – do not use them

- bad practice (chartjunk): pages 8 & 9

Tukey box plot – vs. Tufte plot (page 8)

Examples of 'more with less' (erasing ink & increasing information):

- conventional axes vs. range-frame (page 9)

- indicating mean (or median), quartiles along axes (page 10)

- ticks corresponding to data (page 11)
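The range-frame and the quartile marks along the axes need only five numbers per variable. A minimal sketch of how they could be computed follows; the data are hypothetical, `five_number_summary` is an illustrative name, and the quartile rule is the default ("exclusive") method of Python's `statistics.quantiles`, so other software may give slightly different quartiles.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers."""
    q1, med, q3 = quantiles(values, n=4)  # default 'exclusive' method
    return min(values), q1, med, q3, max(values)


# Hypothetical measurements, for illustration only.
data = [2.1, 3.4, 3.9, 4.4, 5.0, 5.2, 6.8, 7.5, 9.3]
low, q1, med, q3, high = five_number_summary(data)

# A range-frame would draw the axis only from `low` to `high`;
# the quartiles and the median can be marked on that same axis line.
print(low, q1, med, q3, high)
```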

Clarity

allow for reduction in reproduction – the fig. on 2.30 did not

no scale breaks: 2.71, 2.72

tick marks should point outward – inward ticks can interfere with data: bad 2.11, 2.12 & good 2.13

when data sit on an axis, move the axis away – 2.9 vs. 2.10

use visually prominent symbols to show data – fig. 2.5 (bad) & 2.6 (better)

do not allow data labels to clutter the graph: 1.6 vs. 1.7

do not clutter the interior of the data rectangle: 2.3 (bad) vs. 2.4 (good), 2.14 (bad) vs. 2.15 (good), 2.16 (bad) vs. 2.17 (good)

tick marks & data labels should not dominate: 2.19, 2.22

provide sufficient explanation: 2.34

Visual clarity:

avoid overlapping symbols. How?

a) use logarithms 3.27 vs. 3.28

b) moving: 3.29 – only works if there are not too many overlapping points

c) jittering: 3.31 vs. 3.32

d) symbol use – empty circles are best – see 3.33

superimposition – symbol use: 3.34, 3.35, 3.36, 3.38
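Jittering (option c) adds a small random displacement to the plotted positions only, so that tied points separate. A minimal sketch, assuming uniform noise and a hypothetical set of tied scores; the function name `jitter` and the amount 0.2 are illustrative choices, not values from the figures.

```python
import random


def jitter(values, amount, seed=None):
    """Return plotting positions: each value plus uniform noise in [-amount, +amount]."""
    rng = random.Random(seed)  # a fixed seed makes the figure reproducible
    return [v + rng.uniform(-amount, amount) for v in values]


# Hypothetical integer scores with many ties; the raw values stay untouched for analysis.
scores = [3, 3, 3, 4, 4, 5, 5, 5, 5]
plotted = jitter(scores, amount=0.2, seed=1)
print(plotted)
```

The amount should stay well below the spacing between neighbouring true values, otherwise the jitter itself starts to distort the data.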

the role of reference grid 4.8, 3.41

banking to 45° – useful for trend assessment: 1.1, 2.42
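Banking to 45° can be approximated numerically. The sketch below uses one simple heuristic, the median-absolute-slope rule: choose the height/width ratio of the data rectangle so that the median absolute slope of the plotted line segments is 1. This is an assumption about which banking rule to use (other variants exist), and `bank_to_45` is an illustrative name.

```python
from statistics import median


def bank_to_45(xs, ys):
    """Height/width ratio of the data rectangle that gives the plotted
    line segments a median absolute slope of 1 (45 degrees).

    Flat and vertical segments are skipped; at least one sloping segment is required.
    """
    slopes = [
        abs((y1 - y0) / (x1 - x0))
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
        if x1 != x0 and y1 != y0
    ]
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    return y_range / (x_range * median(slopes))


# Hypothetical zig-zag series: slopes of +1 and -1 in data units.
aspect = bank_to_45([0, 1, 2, 3, 4], [0, 1, 0, 1, 0])
print(aspect)  # 0.25: the plot should be four times wider than tall
```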

Integrity

Figures are always a selective presentation of data – certain aspects remain underemphasised or hidden => careful assessment of purpose needed:

Example: time series presentation methods

symbol plot 3.53 – good for showing the long-term trend of a time series

connected plot 3.54

vertical line plot 3.55

=> Graph should be truthful to data

No pseudo-dimensions – the dimensions of the graphic should match the dimensions of the data (if possible) – page 2 (bad example)

Comparison between panels: uniform or comparable scale – 2.55

Provide context – pages 4-6

Do not use graphical elements to create a misleading impression – page 7, fig. 9.26

Do not exaggerate (the "lie factor": the discrepancy between the size of a difference in the data and the size of its representation) – pages 8-14

Graphs do not have to be "alive" and "decorative" – page 15
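The lie factor can be put into numbers. The sketch below follows Tufte's definition (size of the effect shown in the graphic divided by the size of the effect in the data, each as a relative change); the values and the function names are hypothetical.

```python
def relative_change(first, second):
    """Relative change from the first to the second value."""
    return (second - first) / first


def lie_factor(data_first, data_second, shown_first, shown_second):
    """Effect shown in the graphic divided by the effect present in the data."""
    return relative_change(shown_first, shown_second) / relative_change(data_first, data_second)


# Hypothetical example: the measured value rises from 100 to 150 (+50 %),
# but the bars are drawn 2 cm and 6 cm long (+200 %).
factor = lie_factor(100, 150, 2, 6)
print(factor)  # 4.0
```

A truthful graphic has a lie factor close to 1; values well above 1 exaggerate the effect and values well below 1 understate it.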

Common problems with & misconceptions about graphs

The belief that graphs have to be 'alive', 'communicatively dynamic' => graphs that are:

- overdecorated (chartjunk – word coined by Tufte)

- exaggerated in design, disguising shallow thinking

- disregarding the truth about data

- arrogant

Chartjunk:

- unintentional "optical art" (Moiré vibration)

- the dreaded grid – when it is too strong and interferes with data

- unnecessary decoration (e.g. p.14)

Graphical methods - Old designs to discard

Area charts

Ease and precision of estimation: differences in line length >>> differences in area >> differences in volume

Area charts violate the data dimensionality principle: a single measured value is represented by a two-dimensional object (circle) – use dot plots instead (4.23 – bad vs. 4.24 – recommended)

Pie chart

Circles, divided into "slices". (4.19)

Bad design overall – the area occupied has to be assessed, so only the largest differences can be identified.

Figure useless – often the numbers are printed beside it (unacceptable, double presentation of data)

Use dot plot instead (4.20)

Bar chart

Oldest graph type –

measurement value = height of a column.

Often composite – several columns attached to each other, data "solidity": heavy shading, complicated markings

classifies horizontal axis as categorical variable – check if this assumption is correct (sometimes it is not, see 2.38)

point at the appropriate height gives the same information

Stacked bar chart (vertical or horizontal)

Columns divided into different segments

Individual measurements very difficult to perceive - base & top both variable => proper perception impossible. (2.38)

IF horizontal axis is a proper variable (data are two-dimensional) – use a scatterplot, line plot (2.39), or similar (see options e.g. 3.53 -3.55 under integrity)

IF data are one-dimensional (4.21), use a multiple dot plot (4.22)

New methods of data presentation

Dot plot & multiway dot plot for one-dimensional data

dot plot for labelled data, to replace: pie charts, bar charts

order on dot plot – according to value of measured variable, largest on top smallest on bottom – 4.10 vs. 4.9

Multiway dot plot – common horizontal axis (4.11) & optimisation of vertical order on panels: has to be uniform, AND approximate top-to-bottom order

Think about the order of categorical variables on multiple dot plots: 4.11 vs. 4.18

superimposition possible on dot plots: 1.1
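The ordering rule for a dot plot (largest value on top, smallest at the bottom) is easy to apply before plotting. A minimal plain-text sketch follows; `text_dot_plot` is an illustrative name, the scale is assumed to start at zero with positive values, and the numbers are the Carabus nemoralis 2005 catches from the table at the end of these notes.

```python
def text_dot_plot(values, width=40):
    """Return the lines of a plain-text dot plot, largest value on top.

    `values` maps a label to a positive number; the scale runs from 0 to the maximum.
    """
    top = max(values.values())
    label_width = max(len(name) for name in values)
    lines = []
    for name, value in sorted(values.items(), key=lambda item: item[1], reverse=True):
        position = round(value / top * width)
        lines.append(f"{name:<{label_width}} |{'.' * position}o  {value}")
    return lines


# Number of individuals per habitat (Carabus nemoralis, 2005).
catches = {"Urban": 85, "Forest": 46, "Suburban": 170}
lines = text_dot_plot(catches)
for line in lines:
    print(line)
```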

For two-dimensional data:

Loess (lowess): locally weighted regression

Example: hibernating hamsters & life span – 3.42 – 3.44

How loess works – 3.49

Testing the effect of window width – compare 3.44 and 3.45

Testing with loess of residuals – see 3.47 & 3.48
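The core idea of loess can be sketched in a few lines: at each point of interest, fit a straight line by weighted least squares, giving nearby observations more weight. The version below is a simplified sketch, not the full procedure: it assumes tricube weights and a local linear fit, omits the robustness iterations of lowess, and the names `loess_point` and `frac` (the fraction of points in the window, i.e. the window width) are illustrative.

```python
import math


def loess_point(xs, ys, x0, frac=0.5):
    """Locally weighted linear fit at x0, using the nearest `frac` of the points."""
    k = max(2, math.ceil(frac * len(xs)))
    dists = [abs(x - x0) for x in xs]
    h = sorted(dists)[k - 1]  # window half-width: k-th smallest distance
    if h == 0:  # the k nearest points all sit exactly at x0
        return sum(y for y, d in zip(ys, dists) if d == 0) / dists.count(0)
    # Tricube weights: 1 at x0, falling smoothly to 0 at the window edge.
    ws = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dists]
    if sum(ws) == 0:  # every neighbour lies exactly on the edge: use equal weights
        ws = [1.0 if d <= h else 0.0 for d in dists]
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


# Hypothetical check: data that lie exactly on a line are reproduced by the smoother.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
smooth = [loess_point(xs, ys, x, frac=0.5) for x in xs]
print(smooth)
```

A larger `frac` averages over more points and gives a smoother curve; a smaller one follows local wiggles more closely, which is the window-width effect compared in the figures.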

Multi-dimensional data

Scatterplot matrix:

for data with > 3 variables, but relationships are always between two variables

picture every two variables against each other with shared scales (3.64) – redundant (every pairing appears twice, with the horizontal and vertical positions swapped), but otherwise a synthetic comparison is not possible

Conditional plot / coplot

To picture the relationship of two variables within selected intervals of a third variable – wear of car tyres example: 3.66

Colour

used to be rare in journals – expensive (USD1000+)

Cheap in Internet-based journals – consider carefully; sometimes B&W in print, colour on the Net – be careful, not always interchangeable

Use colour to help understanding, not for decoration

Modest use of colour is very helpful

Try to use harmonic combinations

Consider colour-blindness (some combinations are indistinguishable)
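Whether a colour scheme survives black-and-white printing can be checked roughly before submission. The sketch below converts colours to an approximate grey level with the Rec. 601 luma weights and flags pairs that end up too close. It does not simulate colour-blindness; the threshold `min_gap=40`, the function names and the palette are all assumptions chosen for illustration.

```python
def grey_level(hex_colour):
    """Approximate grey level (0-255) of an '#RRGGBB' colour (Rec. 601 luma weights)."""
    r, g, b = (int(hex_colour[i:i + 2], 16) for i in (1, 3, 5))
    return 0.299 * r + 0.587 * g + 0.114 * b


def too_similar_in_grey(colours, min_gap=40):
    """Return neighbouring pairs (by grey level) that differ by less than min_gap."""
    levels = sorted((grey_level(c), c) for c in colours)
    return [(a, b) for (la, a), (lb, b) in zip(levels, levels[1:]) if lb - la < min_gap]


# Hypothetical red / green / blue scheme.
palette = ["#FF0000", "#00AA00", "#0000FF"]
clashes = too_similar_in_grey(palette)
print(clashes)  # the red and the green print as similar greys
```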

Legend (caption)

An important part of the figure – should give information to help understand the figure

Legend is printed underneath the figure – but NOT in the manuscript – in the MS, legends are at the end of the text, grouped together.

Common error: not enough detail to understand the figure

Numbering – in the sequence of mentioning in the text, independently of tables

Proportion, scale and appearance:

graphs should tend towards horizontal: length > height

- eye is naturally practised in detecting deviations from horizon

- ease of labelling

- causal influence: mostly cause (independent variable) – effect (dependent variable) => horizontal depth – space to elaborate

ratio: golden section – a/b = b/(a+b) – ratio 1:1.618

smoothly-changing curves can be taller than wide; wiggly curves need to be wider than tall

lettering: type & size – serif fonts preferred – more readable
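The golden section ratio follows directly from its definition: writing φ = b/a in a/b = b/(a+b) gives φ² = φ + 1, so φ = (1 + √5)/2 ≈ 1.618. A short sketch of how a figure height could be derived from a given width; the 16 cm width and the function name are hypothetical.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden section, about 1.618


def golden_height(width):
    """Height of a landscape figure whose width:height ratio is the golden section."""
    return width / PHI


width_cm = 16.0  # hypothetical page width available for the figure
height_cm = golden_height(width_cm)
print(round(height_cm, 2))  # 9.89
```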

Integrating figures & text

To have clear understanding, in the text:

- describe everything that is graphed

- draw attention to the important features of the data

- describe the conclusions drawn from the data on the graph

- interplay between graph, caption and text is delicate - no iron rules but hard thinking. self-contained figures necessary.

- error bars should be clearly explained: s.d., s.e., or confidence interval?
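Why the kind of error bar must be stated becomes obvious when the three options are computed for the same sample: they differ considerably in length. A minimal sketch with hypothetical data; the 95 % interval uses the normal approximation (z = 1.96), which is an assumption, and small samples would need a t quantile instead.

```python
from statistics import mean, stdev


def error_bar_options(sample, z=1.96):
    """Return the mean and three possible error-bar half-lengths.

    sd   – sample standard deviation (n - 1 denominator)
    se   – standard error of the mean
    ci95 – approximate 95 % confidence half-width (normal approximation)
    """
    n = len(sample)
    sd = stdev(sample)
    se = sd / n ** 0.5
    return {"mean": mean(sample), "sd": sd, "se": se, "ci95": z * se}


# Hypothetical measurements, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
bars = error_bar_options(sample)
for name, value in bars.items():
    print(name, round(value, 3))
```

For this sample the s.d. bar is almost three times as long as the s.e. bar, so an unexplained error bar cannot be interpreted.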

Revising your graph

Check the following:

- no pseudo-dimensions

- maximise data-ink ratio

- erase non-data ink & redundant data ink, within reason

- is the legend appropriate?

Scientific illustration software – many types available.

Examples: Statistica, Axum, Origin, SAS (not user-friendly), S-Plus, R, SPSS, SigmaPlot

Summary

A good graph:

- is a well-designed presentation of interesting data

- communicates complex ideas with clarity, precision, and efficiency

- gives the viewer the greatest no. of ideas in the shortest time with the least ink in the smallest space

- is nearly always multivariate

- requires telling the truth about the data

Features of

friendly graph                                              unfriendly graph
words spelled out, no mysterious coding                     abbreviations abound, requiring ref. to text
words run left to right                                     words run vertically and/or in several diff. directions
messages help explain data                                  graphic is cryptic, requires ref. to text
elaborate shadings etc. avoided, labels on graphic itself   obscure coding, frequent ref. to legend needed
graphic attracts viewer, raises curiosity                   chartjunk-filled
colour used with colour-blind in mind                       design insensitive to colour-blind (10% of popul.)
type clear, precise, modest                                 type clotted, overbearing
type upper-lower case, with serifs                          all capitals, sans serif

Reviewing/evaluating figures (Exercise)

1. Is the figure necessary? Do the data justify a figure? Table? Can be written in the text?

2. Is the type of figure acceptable? Is a better type of figure necessary? (dot plot, multiple dot plots, co-plot, scatterplot vs. histogram or pie chart)

3. Data/ink ratio? Can this be improved? - can ink be eliminated and information retained?

4. Appearance: axis scale, labels (clear, not too many?), symbols (contrast, recognition). Do data fill the data rectangle?

5. Format: size, font type, ratio (vertical:horizontal, banked to 45 degrees?) of figure. Is size appropriate? Do data points stand out? Does it withstand reduction?

6. Is the legend appropriate?

Useful resources:

Tufte, Edward R. 2003. The visual display of quantitative information. 2nd ed. Graphics Press, Cheshire, Connecticut, U.S.A.

Tufte, Edward R. 1990. Envisioning information. Graphics Press.

Tufte, Edward R. 1997. Visual explanations. Graphics Press.

Tufte, Edward R. 2006. Beautiful evidence. Graphics Press.

Cleveland, William S. 1993. Visualizing data. Hobart Press, U.S.A.

Cleveland, William S. 1994. The elements of graphing data. Hobart Press, U.S.A.

Edward Tufte’s website: www.edwardtufte.com

Bill Cleveland’s website: http://cm.bell-labs.com/cm/ms/departments/sia/wsc/

Photographs & drawings

Is it important? (editor will always ask)

Yes? – check journal reproduction standard – only good final quality is convincing

colour – at your expense! (US$1000 & up)

Colour possible? Provide slides (not prints)

Black & white photos: supply B&W prints – a colour photo reproduced in B&W does not give a good result (grays and fading shades result)

How to control photo quality?

best quality: no reduction/enlargement - consider journal dimensions

- reduction decreases quality – crop/frame the important part

How? Experiment with cropping possibilities:

meaningful instructions – editor/copy editor happy to oblige

In-photo information:

letters or arrows superimposed if needed

scale directly on photo (reduction!)

mark “top” on back, in soft pencil

author, fig. no. – in pencil (photos separated from MS in production)

mark position in text – someone will have to decide where to insert – why not you?

Digital photo/illustrations:

check acceptable or preferred file format – can contact technical editor for clarification. This will be seen as co-operation, not as an obstacle

.TIFF, .JPEG format better than EPS, etc. (but ask editor!)

Pen & ink illustrations: could be very useful, but only in good quality – by professional artist

15. TABLES

First question: Do you need a table?

- only if repetitive data must be presented

- not good science to publish data just because you measured them

- printing a table is costly

- examples of bad (unnecessary) tables:

lots of standard conditions – not variables

lots of 0s, 100% or +/- s

word list table

Tables should be self-explanatory (as figures)

Title: economic use of words

use footnotes (sparingly)

avoid exponents (prone to printing problems)

give details but not excessively (mention method but not recipe)

Format:

No vertical lines

Horizontal lines: column heading top (under table title), column heading bottom, below table

Partial horizontal – sub-grouping column headings

Data in either text, OR figures, OR table – never repeat; but selected data can be singled out for discussion

significant figures and “virtual exactness”: 1 vs. 1.00 vs. 1.0000

Organisation: elements/comparisons read down, not across

Marginal indicator in Ms text (pencil) – helps to see if you mentioned all tables and if in sequence.

Think hard about the vertical sequence of rows and general arrangement of tables – often neglected.

Tip: read instructions before final formatting! – requirements often specific and non-intuitive

Tabulation of data – why do we put independent variable to the left?

Table 1, version 1: The effect of heating on water temperature.

t (time) = 0', 3', 6', 9', 12', 15'; T (temperature) = 25, 27, 29, 31, 32, 32 °C

Table 1, version 2: The effect of heating on water temperature.

Temperature (°C)    Time (min)
25                  0
27                  3
29                  6
31                  9
32                  12
32                  15

Table 1, version 3. The effect of heating on water temperature.

Time (min)    Temperature (°C)
0             25
3             27
6             29
9             31
12            32
15            32
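The layout of version 3 (independent variable in the left column, one observation per row, so that the comparison reads down) can also be produced programmatically from the same data. A minimal sketch; `format_table` is an illustrative helper, not a function of any particular package.

```python
def format_table(headers, columns):
    """Return the lines of a plain-text table; columns are given left to right.

    Put the independent variable first so that it ends up in the left column.
    """
    widths = [
        max(len(header), *(len(str(value)) for value in column))
        for header, column in zip(headers, columns)
    ]
    lines = ["  ".join(h.rjust(w) for h, w in zip(headers, widths))]
    for row in zip(*columns):
        lines.append("  ".join(str(v).rjust(w) for v, w in zip(row, widths)))
    return lines


time_min = [0, 3, 6, 9, 12, 15]              # independent variable
temperature_c = [25, 27, 29, 31, 32, 32]     # dependent variable
table = format_table(["Time (min)", "Temperature (°C)"], [time_min, temperature_c])
for line in table:
    print(line)
```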

individuals of Carabus nemoralis, C. hortensis, and C. coriaceus caught in 2004 and 2005, in Sorø, West Zealand, Denmark.

                    Activity periods
Year, habitat       Early           Main             Late             Activity peak   No. individuals/year

Carabus nemoralis
  2005
    Forest          02 May-17 May   17 May-24 Jun    24 Jun-03 Oct    27 May          46
    Suburban        02 May-28 May   28 May-08 Sept   08 Sept-03 Oct   13 Aug          170
    Urban           02 May-15 May   15 May-17 Aug    17 Aug-03 Oct    23 Jun          85

Carabus hortensis
  2004
    Forest          06 May-03 Aug   03 Aug-12 Sept   12 Sept-11 Oct   16 Aug          328
    Suburban        06 May-07 Aug   07 Aug-17 Sept   17 Sept-11 Oct   20 Aug          19
  2005
    Forest          02 May-16 Jul   16 Jul-14 Aug    14 Aug-03 Oct    07 Aug          237
    Suburban        02 May-10 Aug   10 Aug-04 Sept   04 Sept-03 Oct   29 Aug          89

Carabus coriaceus
  2004
    Forest          06 May-22 Aug   22 Aug-18 Sept   18 Sept-11 Oct   06 Sept         376
    Suburban        06 May-23 Aug   23 Aug-15 Sept   15 Sept-11 Oct   03 Sept         444
  2005
    Forest          02 May-09 Aug   09 Aug-07 Sept   07 Sept-03 Oct   17 Aug          121
    Suburban        02 May-07 Aug   07 Aug-03 Sept   03 Sept-03 Oct   14 Aug          86
