The primary purpose of scientific documents is simply to communicate our data and our research to the world. Unfortunately, we often forget that successful communication, so crucial to the advancement of public knowledge as well as of our individual careers, depends ultimately on people outside of our control: our readers. Because readers have no way of asking authors real-time questions (at least not yet), the onus is on writers to ensure that readers understand exactly what they are meant to understand.
Luckily, the research of Dr. George Gopen and Dr. Judy Swan has provided us with clear, concrete guidelines to improve the clarity of our writing. Using studies of rhetoric, linguistics, and cognitive psychology, Gopen and Swan were able to identify a set of "reader expectations" most English readers unconsciously possess regarding how information is presented. When writers are aware of these expectations and make proper use of them, the likelihood of readers' understanding increases dramatically. Gopen and Swan's findings and advice to writers are described in detail in their article "The Science of Scientific Writing", which I highly recommend reading. However, for those running a bit short of time, here are some of the highlights.
First, the authors demonstrate the concept of reader expectations neatly and simply using the following charts:
time (min)   temperature (ºC)      vs.      temperature (ºC)   time (min)
0            25                             25                 0
3            27                             27                 3
6            29                             29                 6
9            31                             31                 9
12           32                             32                 12
15           32                             32                 15
The two charts present exactly the same information, but, as English readers, most of us will intuitively prefer the chart on the left; we expect context (in this case, the independent variable) to come first (on the left, since English is read from left to right). Without any additional explanation, most of us would reasonably infer from the first chart that someone took temperature readings every three minutes. I suspect we'd have more difficulty confidently reaching a consensus about the interpretation of the chart on the right.
With this in mind, let's see how Gopen and Swan tackle the following example text:
"The smallest of the URF's (URFA6L), a 207-nucleotide (nt) reading frame overlapping out of phase the NH2-terminal portion of the adenosinetriphosphatase (ATPase) subunit 6 gene has been identified as the animal equivalent of the recently discovered yeast H+-ATPase subunit 8 gene. The functional significance of the other URF's has been, on the contrary, elusive. Recently, however, immunoprecipitation experiments with antibodies to purified, rotenone-sensitive NADH-ubiquinone oxido-reductase [hereafter referred to as respiratory chain NADH dehydrogenase or complex I] from bovine heart, as well as enzyme fractionation studies, have indicated that six human URF's (that is, URF1, URF2, URF3, URF4, URF4L, and URF5, hereafter referred to as ND1, ND2, ND3, ND4, ND4L, and ND5) encode subunits of complex I. This is a large complex that also contains many subunits synthesized in the cytoplasm."
This passage is certainly difficult to read, but why, exactly? The sentences are long, true, with very technical terminology and abundant use of acronyms. If we remove the terminology and most of the acronyms, though, we still run into difficulty:
"The smallest of the URF's, and [A], has been identified as a [B] subunit 8 gene. The functional significance of the other URF's has been, on the contrary, elusive. Recently, however, [C] experiments, as well as [D] studies, have indicated that six human URF's [1-6] encode subunits of Complex I. This is a large complex that also contains many subunits synthesized in the cytoplasm."
Ask ten readers for the subject of the next sentence, and I'd bet you'd get five saying "URFs" and five with "Complex I". (Just to satisfy your curiosity, here it is: "Support for such functional identification of the URF products has come from the finding that the purified rotenone-sensitive NADH dehydrogenase from Neurospora crassa contains several subunits synthesized within the mitochondria, and from the observation that the stopper mutant of Neurospora crassa, whose mtDNA lacks two genes homologous to URF2 and URF3, has no functional complex I.")
Let's unpack this a bit. According to Gopen and Swan, one reason this passage seems so labyrinthine is that the subjects and verbs of sentences are frequently separated by as many as 27 words. Here, then, is our first reader expectation:
"Readers expect a grammatical subject to be followed immediately by its verb."
Once a reader finds the grammatical subject, his or her mind instantly starts looking for the verb, glossing over all intervening material as being less important. Of course, when you put 27 words between your subject and your verb, it's likely that at least some of those words will be significant to the meaning of the text! To avoid having your readers unconsciously skim over important parts of your text, then, it's best to keep subjects and verbs close together. Gopen and Swan suggest one possible revision:
"The smallest of the URF's is URFA6L, a 207-nucleotide (nt) reading frame overlapping out of phase the NH2-terminal portion of the adenosinetriphosphatase (ATPase) subunit 6 gene; it has been identified as the animal equivalent of the recently discovered yeast H+-ATPase subunit 8 gene."
Of course, it's possible that the "interrupting" phrase between the subject and the verb is not all that important, which is exactly what its position between the subject and the verb signals to the reader. In this case, Gopen and Swan advise that it's best to omit this phrase for the sake of clarity:
"The smallest of the URF's (URFA6L) has been identified as the animal equivalent of the recently discovered yeast H+-ATPase subunit 8 gene."
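To put a rough number on the difference, here is a minimal sketch that counts the words sitting between a subject and its verb. It is only an illustration: the function name `words_between` is made up for this post, the subject ("smallest") and the start of the verb ("has") are picked by hand rather than found by any grammatical parser, and "word" simply means anything separated by whitespace.

```python
def words_between(sentence: str, subject_head: str, verb_start: str) -> int:
    """Count whitespace-separated words strictly between the subject head
    and the first word of its verb. Both are supplied by hand (an
    assumption of this sketch); nothing here parses grammar."""
    tokens = sentence.split()
    subject_index = tokens.index(subject_head)
    # Look for the verb only after the subject.
    verb_index = tokens.index(verb_start, subject_index + 1)
    return verb_index - subject_index - 1


# First sentence of the original passage.
original = (
    "The smallest of the URF's (URFA6L), a 207-nucleotide (nt) reading frame "
    "overlapping out of phase the NH2-terminal portion of the "
    "adenosinetriphosphatase (ATPase) subunit 6 gene has been identified as "
    "the animal equivalent of the recently discovered yeast H+-ATPase "
    "subunit 8 gene."
)

# The shortened revision, with the interrupting phrase omitted.
shortened = (
    "The smallest of the URF's (URFA6L) has been identified as the animal "
    "equivalent of the recently discovered yeast H+-ATPase subunit 8 gene."
)

original_gap = words_between(original, "smallest", "has")
shortened_gap = words_between(shortened, "smallest", "has")
print(original_gap, shortened_gap)  # prints: 23 4
```

By this crude count, the original sentence makes the reader hold the subject in mind for 23 words before the verb arrives, while the shortened revision needs only 4.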
Unfortunately, as readers we don't know which of these two revised sentences most accurately reflects the authors' intentions, and we would have to ask them to be sure. The two potential meanings of this sentence lead Gopen and Swan to their next reader expectation:
"Each unit of discourse, no matter what the size, is expected to serve a single function, to make a single point. In the case of a sentence, the point is expected to appear in a specific place reserved for emphasis."
As readers, we unconsciously emphasize the end of a sentence and assume that the most important part of any given sentence will be at the end; we build tension as we read in anticipation of an exciting finish. As writers, we can use this tendency to make it more likely that readers emphasize the material we want them to emphasize. For any given sentence, then, we should strive to put the most important material in the "stress position" at the end. By applying their guidelines about subject/verb separation and stress positions, Gopen and Swan revised the rest of the passage as follows:
"Recently, however, several human URF's have been shown to encode subunits of rotenone-sensitive NADH-ubiquinone oxido-reductase. This is a large complex that also contains many subunits synthesized in the cytoplasm; it will be referred to hereafter as respiratory chain NADH dehydrogenase or complex I. Six subunits of Complex I were shown by enzyme fractionation studies and immunoprecipitation experiments to be encoded by six human URF's (URF1, URF2, URF3, URF4, URF4L, and URF5); these URF's will be referred to subsequently as ND1, ND2, ND3, ND4, ND4L and ND5."
Note that each sentence makes one point, sentences with closely related points are joined by a semi-colon, and subjects and verbs appear together. However, Gopen and Swan freely admit that they had to make assumptions about what material from the original passage was worthy of emphasis, so while the text might read more smoothly, it may or may not represent the authors' desired message. There are several other possible interpretations that could work, but that only reinforces their argument: the original text was worded in such a way that many interpretations were likely, and it therefore failed to clearly communicate its meaning.
I'll continue with my summary of Gopen and Swan's article over the next couple of posts, but I highly recommend making the time to read the original in its entirety! More to come....
References:
Gopen, G. and J. Swan. "The Science of Scientific Writing." American Scientist. Nov-Dec. 1990. Accessed from http://www.americanscientist.org/issues/pub/the-science-of-scientific-writing/.