Composition strand asymmetries in prokaryotic genomes: mutational

Transcription

Composition strand asymmetries in prokaryotic genomes: mutational
C.R. Acad. Sci. Paris, Sciences de la vie / Life Sciences 324 (2001) 201–208
© 2001 Académie des sciences/Éditions scientifiques et médicales Elsevier SAS. Tous droits réservés
S0764446900012981/FLA
Genetics / Génétique
Composition strand asymmetries in prokaryotic
genomes: mutational bias and biased gene
orientation
Philippe Lopez, Hervé Philippe*
Équipe phylogénie, bio-informatique et génome, UMR 7622, bâtiment B, 6e étage, case 24, 9, quai SaintBernard, 75252 Paris cedex 5, France
Received 27 November 2000; accepted 4 December 2000
Communicated by André Adoutte
Abstract – Most prokaryotic genomes display strand compositional asymmetries, but
the reasons for these biases remain unclear. When the distribution of gene orientation is
biased, as it often is, this may induce a bias in composition, as codon frequencies are not
identical. We show here that this effect can be estimated and removed, and that the
residual base skews are the highest at third base codon positions and lower at first and
second positions. This strongly suggests that compositional asymmetries result from 1) a
replication-related mutational bias that is filtered through selective pressure and/or from
2) an uneven distribution of gene orientation. In most cases, the mutational bias alters the
codon usage and amino acid frequencies of the leading and the lagging strand. However,
these features are not ubiquitous amongst prokaryotes, and the biological reasons for
them remain to be found. © 2001 Académie des sciences/Éditions scientifiques et
médicales Elsevier SAS
strand composition asymmetry / GC skew / prokaryotic genomes / transcription sense / mutational bias
Résumé – Asymétries de composition des génomes procaryotes: biais mutationnel et orientation non aléatoire des gènes. La plupart des génomes procaryotes présente une asymétrie de composition nucléotidique entre les deux brins, mais ce biais
reste largement inexpliqué. Quand l’orientation des gènes est distribuée de façon
hétérogène, on peut s’attendre à un biais de composition, puisque les fréquences des
codons ne sont pas identiques. Nous montrons ici que cet effet peut être estimé et
corrigé, et que le biais résiduel est plus prononcé en troisième base, plus faible en
première et deuxième bases. Cela suggère que les asymétries de composition sont le
résultat 1) d’un biais mutationnel lié à la réplication filtré par les pressions de sélection
et/ou 2) d’une distribution hétérogène de l’orientation des gènes. Dans la plupart des
cas, le biais mutationnel modifie l’utilisation des codons et les fréquences des acides
aminés des deux brins. Cependant, ces deux phénomènes ne sont pas communs à tous
les procaryotes, et leur explication biologique reste à découvrir. © 2001 Académie des
sciences/Éditions scientifiques et médicales Elsevier SAS
asymétrie de composition / biais de GC / génome procaryote / sens de transcription / biais
mutationnel
* Correspondence and reprints.
E-mail address: [email protected] (H. Philippe).
201
P. Lopez, H. Philippe / C.R. Acad. Sci. Paris, Sciences de la vie / Life Sciences 324 (2001) 201–208
Version abrégée
Les génomes procaryotes présentent généralement
des asymétries à grande échelle: utilisation des codons
ou composition en base et en oligonucléotides. Même
si ces biais ont été utilisés avec succès pour détecter in
silico les origines de réplication, les processus biologiques qui en sont responsables restent mal connus. On
sait que les gènes sont préférentiellement codés sur le
brin avancé, ce qui peut générer une asymétrie de
composition, puisque tous les codons ne sont pas
utilisés à fréquence égale. De plus, l’existence d’un
biais mutationnel, pendant la réplication par exemple,
reste à étudier. Nous avons donc calculé pour tous les
génomes procaryotes disponibles le biais d’orientation
des gènes, ainsi que le biais de GC (excès de G par
rapport à C) et d’AT en distinguant les trois bases du
codons et les régions intergéniques. La majorité des
gènes est codée sur le brin avancé et la répartition de
ces gènes est parfois remarquablement régulière, ce
qui suggère une pression de sélection favorisant cette
distribution. Les asymétries observées sont parfois
contre-intuitives: biais plus fort en première base, ou
biais en deuxième et troisième bases opposés. Nous
avons développé une méthode pour estimer le biais de
GC et d’AT moyen attendu, connaissant l’orientation
des gènes, l’utilisation des codons et la fréquence des
acides aminés. Ce biais moyen peut alors être soustrait
1. Introduction
The recent achievements in complete genome sequencing have provided a wealth of material for studying largescale strand asymmetries in genomes. Among these biases,
the focus has been set on demonstrating a wide range of
skews: base composition [1, 2], oligonucleotide composition [3, 4], codon usage [5], or amino acid composition [6,
7]. An important reason for this was that these analyses
allow in silico prediction of replication origins [4, 7–9],
which have then been confirmed in vivo [10, 11]. Yet, the
processes responsible for strand asymmetries remain rather
unclear, especially because not much is known about the
constraints that a genome is under. Neutralist explanations
have been advocated, like strand-specific mutational
mechanisms during replication and repair processes [1,
12], or fluctuation of the dNTP-pool composition during
the cell cycle [13]. On the other hand, selectionist arguments, like selective pressures on gene orientation, amino
acid composition or codon usage can be hypothesized
[14, 15]. Both kinds of explanations are likely to be true,
which even puzzled Frank and Lobry in a recent review,
stating that “base composition skews could be influenced
by differences between coding and non-coding strand
because of biased gene orientation” [16]. The recent work
202
aux biais observés. Cette analyse est détaillée ici pour
deux organismes, Bacillus subtilis et Borrelia burgdorferi. La plupart du temps, le biais résiduel se révèle être
de même signe pour les trois bases, mais plus fort en
troisième base qu’en première et deuxième bases, ce
qui est compatible avec le filtrage d’un biais mutationnel par les pressions de sélection associées à ces
positions. Ce résultat suggère fortement l’existence
d’un biais mutationnel favorisant la présence de G et de
T sur le brin avancé, parfois suffisamment fort pour
modifier l’utilisation des codons et la composition en
acides aminés entre les deux brins. Chez les procaryotes, la répartition biaisée des gènes et un biais mutationnel expliquent donc totalement les asymétries
observées. Il peut cependant suffire d’une de ces deux
composantes pour expliquer les asymétries, comme
chez les Mycoplasma, qui ne présentent pas de biais
mutationnel résiduel. Les asymétries de composition
semblent pourtant se maintenir, alors qu’on sait que les
génomes procaryotes sont soumis à de vastes réarrangements au cours de l’évolution. L’intensité relative des
pressions de sélection mises en évidence et des biais
mutationnels observés, ainsi que leur cinétique, pourraient expliquer la variabilité des asymétries observées
chez les procaryotes. Le séquençage des génomes
complets d’espèces proches reste la meilleure approche pour répondre à ces questions.
of Tillier and Collins has shown by ANOVA approaches
that the directions of replication and transcription sometimes played significant and separated roles on the
observed asymmetries [17]. We provide here a simpler
method that confirms and extends these results and displays graphically the previous effects.
Coding regions generally account for about 90 % of the
genome in prokaryotes. This feature has two main consequences for base strand asymmetries. First, there seems to
be a selective pressure for having a majority of genes
transcribed in the replication sense in most prokaryotic
genomes [14, 18], as will be detailed below. Although no
biological reason has been demonstrated for this, except
perhaps avoidance of head-to-head collisions between
replication and transcription units [19], the distribution of
gene orientation is skewed between the leading and lagging strand. Second, since the genetic code is degenerated, it is well known that the probability of fixing mutations is the highest when nucleotide mutations occur at
third codon positions, lower at first and second codon
positions. If there was an overall mutational bias [1, 12], its
amplitude should thus be the highest at third base codon
positions and lower at first and second ones. However,
composition asymmetries are sometimes stronger in first
and second codon positions in prokaryotic genomes [14].
P. Lopez, H. Philippe / C.R. Acad. Sci. Paris, Sciences de la vie / Life Sciences 324 (2001) 201–208
Yet, a biased gene orientation is bound to induce strand
base asymmetries in coding regions (i.e. in about 90 % of
the genome) on codon positions 1 and 2. For example,
Leucine is the most frequent amino acid in the genome of
Bacillus subtilis, accounting for 10.5 % of the proteome.
As all Leucine codons bear a U at the second position, it is
likely to produce an excess of T over A at second base
codon positions, since the majority of genes are coded on
the leading strand. Similarly, McLean et al. [14] have
suggested that an unevenly distributed codon usage would
have an impact on codon position 3. Taking into account
the role of gene orientation, we shall investigate the effect
of amino acid frequencies and codon usage in strand
compositional asymmetries of prokaryotic genomes. We
will then demonstrate that once the effect of gene orientation is removed, the residual skew can be attributed to a
mutational bias.
2. Materials and methods
For every available prokaryotic genome, we have computed raw GC skews (G – C/G + C), i.e. the relative excess
of G over C, in windows of 1/50th of the genome, incremented by values of 1/240th of the genome. These computations were performed for subsets of the genome,
namely the intergenic regions (GCi), and the positions
corresponding to first, second and third base positions in
coding regions (GC1, GC2, GC3). The same analyses were
performed on AT skews, leading to ATi, AT1, AT2 and AT3.
To reflect the organization of gene orientation along the
genome, i.e. the colinearity of transcription and replication, we computed transcription skews (TS), as the relative
excess of genes that were transcribed in the arbitrary
positive sense over those transcribed in the opposite sense
(see figure legends for details about the method). For better
readability, these skews were all displayed in a cumulative
way, meaning the raw value is integrated from the arbitrary
start of the genome [2].
To study the impact of gene orientation, average
expected base skews were computed by taking into
account the local proportion of gene encoded in the
arbitrary positive sense (Pforward) as well as the amino acid
composition and codon usage. It has been shown that
some organisms, like the spirochete Borrelia burgdorferi,
harbor significantly different subsets of codon usage
depending on whether the leading or lagging strand is
considered [5]. Codon usage was thus globally estimated
over the whole genome (global), or split into two subsets
by distinguishing between leading and lagging strands
(distinct). The same estimations were performed for amino
acid frequencies. This led to four possible expected local
base compositions, depending on whether codon usage
and/or amino acid frequencies are considered global or
distinct. The average local proportion of nucleotide base
at posth codon position was then estimated using the
following generic formula, where forward (backward)
means leading (lagging) if computed on the leading strand,
and lagging (leading) if computed on the lagging strand :
Pbase,pos =
61
兺共 Fi
i=1
forward
. RFSCiforward . Pforward . δbase,pos 共 i 兲
+ Fibackward . RFSCibackward . 共 1 − Pforward 兲 . δbase,pos共 i 兲 兲
The term pbase,pos is the frequency of nucleotide base at
posth codon position and base stands for the complement
of base. Fi stands for the frequency of amino acid corresponding to codon i, and RFSCi is the relative frequency of
synonymous codon i. The value of δbase,pos(i) is set to 1 if
codon i has the base nucleotide at posth position, or 0 if
otherwise. When global codon usage (or amino acid frequencies) is used, then forward and backward terms are
set to the same global value, computed without distinguishing the strands. Corrected skews are then defined as
the raw observed skews minus the expected ones, and
indicated by the suffix cor followed by two letters (G or D)
indicating whether amino acid frequencies and/or codon
usage are considered global or distinct. For instance,
GC1corDG refers to G excess over C at first codon position,
corrected by distinct amino acid frequencies and global
codon usage.
By removing the average effect of gene content, GG
corrections should display any other bias that affects strand
composition, such as a mutational bias for instance. However, it is known that both strands have distinct compositional properties in some organisms. In such a case, DG
corrections should remove this effect on first and second
codon positions (i.e. the positions essentially affected by
amino acid frequencies), while GD corrections should do
the same on third codon positions (i.e. the ones affected by
codon usage). As a consequence, DD corrections should
show any residual bias other than gene content and global
strand properties, such as a bias correlated to the distance
to the origin of replication for example [6].
3. Results and discussion
3.1. Transcription sense in prokaryotes
While it has often been observed that genes are preferentially transcribed on the leading strand in prokaryotes,
the evenness of this distribution must also be considered.
In the genome of B. subtilis [20], gene orientation was
remarkably biased, as 74 % of its genes were transcribed
on the leading strand, and the transcription (TS) skew
displayed a very steady trend (figure 1). Such an organization was unlikely to have occurred by chance alone, and
some selective pressure had to be responsible for it, all the
more so as quite a few other organisms display the same
kind of organization. For instance, the genome of Mycobacterium tuberculosis also displayed a very regular TS
skew, although with only 59 % of its genes on the leading
strand (figure 1). Similarly, the spirochete Borrelia burg-
203
P. Lopez, H. Philippe / C.R. Acad. Sci. Paris, Sciences de la vie / Life Sciences 324 (2001) 201–208
dorferi [21] showed only minor irregularities on its TS
skew (figure 1). However, such a selective pressure seemed
either very weak or absent in many other prokaryotes. For
instance, no clear-cut trend could be found in the TS skew
of the genomes of Chlamydia trachomatis MoPn [22] or
Thermotoga maritima [23], even if they transcribe more
than 50 % of their genes on the leading strand. In some
cases, such as the archaeon Pyrococcus furiosus [24], the
majority of the genes were transcribed on the lagging
strand and TS skew was rather irregular. An interesting
feature about closely related genomes, such as the Chlamydia or Pyrococcus species, is that they can display very
different TS skews, confirming that massive genomic rearrangements occur in prokaryotes. Gene orientation in
prokaryotic genomes has thus to be described with at least
two parameters: the percentage of genes transcribed in the
replication sense and the regularity of this distribution, as
seen on TS skews. As far as could be told from the
available genomes, there did not seem to be a correlation
between these two factors (figure 1). For instance, the
genomes of Helicobacter pylori J99 and Mycobacterium
tuberculosum displayed a similar percentage of genes
transcribed in the replication sense, although their TS
skews were quite different. As for a possible selective
pressure for having genes transcribed in the replication
sense, this was likely true for most, but not all, prokaryotes
(especially Gram-positive ones).
3.2. The case of B. subtilis
The genome of B. subtilis has already been thoroughly
analyzed, and we focused on base skews at different
codon positions. GC1 skew was found to be the highest,
while GC3 and GCi skews were less but similarly pronounced (figure 2.A). The most striking feature was that
second codon positions were oppositely biased, harboring
more C than G on the leading strand. AT skews were quite
regular as well, but AT1 was the most pronounced, while
AT2 and ATi displayed lower amplitudes. Unexpectedly,
AT3 exhibited the opposite behavior, being negative on the
leading strand. An overall mutational bias thus could not
at first sight explain the raw excess of G over C in the B.
subtilis genome, as the order of the amplitudes of the
skews was GC1 > GC3 > 0 > GC2 [14].
The corrections we designed (see Materials and methods) allow the removal of the effect of gene orientation,
which is generally biased (figure 1). GC1corGG was reduced
to a 15-fold lower amplitude, and GC2corGG was reestablished to the same sign as GC1corGG and GC3corGG (figure
2.B). Because the degeneracy of the genetic code mainly
concerns third base codon positions, GC3 was rather
unaffected by the correction. Moreover, corrected GC
skews still displayed regular trends, whose order was GCi
∼ GC3corGG >> GC1corGG ∼ GC2corGG. Such a feature is fully
congruent with the hypothesis of an overall mutational
bias related to replication (favoring G over C on the
leading strand) that is filtered in the coding regions through
the genetic code and the selective pressure acting on
204
proteins. In the case of B. subtilis, this mutational bias was
weaker than the average base skew due to gene orientation.
Corrections on AT skews produced similar results. The
very marked AT1 skew, which argued for an excess of A
over T on the leading strand, was reduced to mere stochastic noise after correction. AT3corGG and AT2corGG still displayed a residual skew, but they were reduced by an
approximately two-fold factor. Considering the low amplitudes, no decisive conclusion could be drawn about a
mutational bias on A and T nucleotides in the B. subtilis
genome. Surprisingly, ATi and AT3corGG displayed opposite trends, which we shall discuss later on.
The correction can be refined in order to take into
account differences between strands for amino acid frequencies and codon usage. Skews on codon positions 1
and 2 were reduced to stochastic noise by DG corrections,
i.e. with distinct amino acid frequency subsets (figure 2.C).
This was not so surprising since amino acid frequencies
are to a great extent related to those positions. Similarly,
skews on codon position 3 were reduced to an extremely
low amplitude by GD corrections (figure 2.D), as this
subset of position is mostly concerned by codon usage.
When both kinds of corrections were applied (figure 2.E),
no residual trend could be observed for any codon position. Such results strongly suggest that there is indeed a
strong mutational bias in the B. subtilis genome, at least for
nucleotides G and C, and that this bias modifies the amino
acid frequencies and the codon usage of the two strands.
Most of all, DD corrections showed that strand composition asymmetries in this genome were fully explained by
the combined effects of nonrandom gene orientation and a
mutational bias that is filtered through selective pressure.
3.3. The case of Borrelia burgdorferi
The same analyses were performed on the genome of B.
burgdorferi [21], for which codon usage is significantly
different between the two strands [5]. In this case, the
order of raw base composition skews could be interpreted
as the result of a mutational bias only (figure 3.A). Yet,
some incongruities challenged this interpretation: GC1
amplitude similar to that of GC3 and considerably higher
than GC2, no skew for AT1 whereas one was displayed by
AT2. However, once the average effect of gene orientation
was removed (figure 3.B), GC1corGG was brought to a
much likelier level as well as GC2corGG, while AT2corGG
was slightly reduced and also smoothed. AT1corGG displayed a pronounced and regular residual trend, where
AT1 had none. Moreover, the correction recovered
GC3corGG >> GC1corGG ∼ GC2corGG and AT3corGG >>
AT1corGG ∼ AT2corGG, which was again compatible with the
hypothesis that was expressed for B. subtilis. In this case
the mutational bias seemed to affect both GC and AT
skews, nucleotides G and T being the most numerous on
the leading strand. As for B. subtilis, refining the correction
showed a mutational bias in this genome that altered both
codon usage and amino acid frequencies on the two
strands ( figure 3.C–E). Once these effects were removed,
P. Lopez, H. Philippe / C.R. Acad. Sci. Paris, Sciences de la vie / Life Sciences 324 (2001) 201–208
Figure 1. Transcription skew (TS) is computed as the relative excess of genes transcribed in the arbitrary positive sense over those transcribed in the
opposite sense.
All skews are measured in a sliding window of 1/50th of the genome for smoothness of the curves, and this window is incremented by 1/240th of
the genome for precision. For better readability, skews are displayed in a cumulative way, meaning the values are integrated along the genome. For
instance, when the curve goes upwards, it means that locally there are more genes transcribed in the positive sense than in the negative sense. For
comparison, the origin of replication is chosen as an arbitrary start for the genome. Here displayed prokaryotic genomes are those whose origin of
replication is known. The name of the organism is followed by the percentage of codons transcribed in the replication sense.
205
P. Lopez, H. Philippe / C.R. Acad. Sci. Paris, Sciences de la vie / Life Sciences 324 (2001) 201–208
Figure 2. Different skews for the genome of Bacillus subtilis.
GC (resp. AT) skews stand for the excess of G over C (resp. A over T)
in the window, computed as [G – C]/[G + C] (resp. [A – T]/[A + T]).
Base skews have been estimated for the first, second and third bases
of the codons (leading to GC1, GC2, GC3 and AT1, AT2, AT3) and the
intergenic regions (leading to GCi and ATi). Details concerning computation and display are given in figure 1. Four different estimated
local base skews are computed by taking into account gene orientation, amino acid frequencies and codon usage (see Materials and
methods for details). Here, the skews displayed are (A) uncorrected
skews, (B) GG corrected skews, (C) DG corrected skews, (D) GD
corrected skews and (E) DD corrected skews (see Materials and
methods for details).
no residual trends could be observed. For these two
prokaryotic genomes, we have thus shown that the raw
strand composition asymmetries could be explained in
coding regions by two factors: a nonrandom gene orientation combined with genetic code, amino acid and codon
usage constraints, and an overall mutational bias, be it GC
and/or AT.
3.4. Asymmetries in intergenic regions
However, intergenic regions, which account for approximately 10 % of the prokaryotic genomes, also displayed
strand asymmetries, for which we have little explanation.
If intergenic regions were constraint-free, they should be
exposed to mutational bias only and GCi and ATi would be
expected to display biases similar to (or even greater than)
GC3corGG and AT3corGG, respectively. This was roughly
observed in the genome of B. burgdorferi, even if AT3corGG
had a higher amplitude than ATi. In contrast, it was clearly
rejected in the genome of B. subtilis, as ATi displayed a
206
Figure 3. Same analysis as figure 2 for Borrelia burgdorferi.
As this genome is linear, it has been circularized for this study. For
comparison with figure 1, we have chosen the origin of replication
(located at about 455 kb) for an arbitrary start.
pronounced skew opposite to AT3corGG, and GC3corGG
displayed a lower amplitude than GCi. Such results are in
agreement with the fact that intergenic regions are also
exposed to heavy functional constraints that yield strand
asymmetries, such as Shine-Dalgarno sequences, promoters or terminators. For example, the fact that the Bacillus
genome is eight times richer in transcription terminators
than that of Borrelia (Claude Thermes, pers. com.) might
explain why skews in intergenic regions and third base
positions are so different in the genome of Bacillus. In this
respect, the skews are much more pronounced in the
25-bp extremities of intergenic regions, where most terminators are supposed to be (data not shown). The extremities of intergenic regions harbor the greater part of the
asymmetries, although they represent but 20 % of these
regions. The inside of intergenic regions only shows a
weak bias, still opposed to that of third codon position,
suggesting that other but weaker constraints are at work.
Consequently, a better understanding of the constraints
affecting these regions is required before a similar correction can be designed [25].
4. Conclusion
We have so far extended our study to all prokaryotic
genomes sequenced (the corresponding curves can be
P. Lopez, H. Philippe / C.R. Acad. Sci. Paris, Sciences de la vie / Life Sciences 324 (2001) 201–208
found at http://sorex.snv.jussieu.fr/gcskew/main.htm), as
well as a few eukaryotic complete chromosomes (data not
shown). For most of these genomes, the proposed method
readily demonstrated an overall residual mutational bias,
that behaves congruently with the selective pressures that
are imposed on codon positions. However, for Mycoplasma genitalium, M. pneumoniae, and chromosome I of
the eukaryote Leishmania major, the remarkable nonrandomness of gene orientation was enough to explain all the
base skews, as the residues from GG corrections proved to
be very weak and probably random. In this respect, the
finding of the origin of replication of these two Mycoplasma genomes through GC skew [8] was due to the very
pronounced nonrandomness in gene orientation, and not
to a residual mutational bias.
This feature was all the more interesting as the majority
of available prokaryotic genomes displayed a mutational
bias (21 genomes out of 28, see Web site). Whenever
clear-cut results were obtained, residual base skews
showed an excess of G over C on the leading strand, and
also an excess of T over A. However, this last bias was
weaker than GC bias in Eubacteria, or even absent. This
proved to be especially true for B. subtilis and Helicobacter pylori. In contrast, AT bias seemed more pronounced than GC bias in Archaea such as Pyrococcus
species or Methanobacterium thermoautotrophicum. As
the replication machinery is very different between Eubacteria and Archaea [26], it is not unlikely that the mutational bias is different as well.
We have thus demonstrated that strand compositional
asymmetries in prokaryotic genomes can be readily
explained by 1) an overall mutational bias and/or 2) a
biased gene orientation distribution. Even if so far no
biological mechanisms have been proven responsible for
the mutational bias, there has recently been some controversy as to whether it could be associated with replication
or transcription processes [15]. Considering that our
method took gene orientation into account and revealed
asymmetries that depended only on the considered strand,
our results show that the transcriptional hypothesis could
probably be ruled out. Moreover, the residual bias was
remarkably constant along the strand, suggesting that the
mutational bias is replication-associated.
Depending on the organisms, the two previous biases
can show different intensities. Yet, prokaryotic genomes
are known to undergo rearrangements [27] and this plasticity is likely to upset skews induced by mutational bias.
An explanation for the conservation of the asymmetries
could be that rearrangements that do not upset the skews
(such as translocation without inversion on the same replichore), which we call synonymous rearrangements, are
preferentially selected. This is, for example, the case
between the Pyrococcus abyssi and P. horikoshii genomes,
where only one nonsynonymous rearrangement has been
demonstrated. Interestingly, the mutational bias had the
AT3 skew restored, while GC3 was still not, meaning that
in this archaeon the AT bias was greater than the GC bias
(unpublished data). However, little if anything is known
about the relative kinetics of genomic rearrangements and
of the mutational bias. If the latter is faster, then some
compositional asymmetries will be observable. In some
organisms, such as the four completely sequenced Chlamydia, a high plasticity has been shown [22], but the corresponding genomes still display very pronounced GC
skews, meaning that the mutational bias has had enough
time to settle. However, it is not unlikely that some
genomes have too high a plasticity for asymmetries to be
observable, even if a mutational bias exists. For instance,
the genomes of Synechocystis sp. and Methanococcus
jannaschii did not conserve the major part of their operon
structures [28], suggesting a very high rate of genomic
rearrangement, and they are also remarkable for the
absence of clear trends in strand composition asymmetries. The understanding of the relative kinetics of rearrangements and mutational bias will require the analyses
of numerous closely related genomes.
Acknowledgements. The authors would like to thank
A. Viari and E. Rocha for helpful discussions and ideas
about this paper, and D. Moreira and A. Germot for
careful reading of the manuscript.
References
[6] Mackiewicz P., Gierlik A., Kowalczuk M., Dudek M.R., Cebrat S.,
How does replication-associated mutational pressure influence amino
acid composition of proteins? Genome Res. 9 (1999) 409–416.
[1] Lobry J.R., Asymmetric substitution patterns in the two DNA strands
of bacteria, Mol. Biol. Evol. 13 (1996) 660–665.
[2] Grigoriev A., Analyzing genomes with cumulative skew diagrams,
Nucleic Acids Res. 26 (1998) 2286–2290.
[3] Blattner F.R., Plunkett G., Bloch C.A., Perna N.T., Burland V.,
Riley M., Collado-Vides J., Glasner J.D., Rode C.K., Mayhew G.F., Gregor J., Davis N.W., Kirkpatrick H.A., Goeden M.A., Rose D.J., Mau B.,
Shao Y., The complete genome sequence of Escherichia coli K-12, Science 277 (1997) 1453–1474.
[4] Lopez P., Philippe H., Myllykallio H., Forterre P., Identification of
putative chromosomal origins of replication in Archaea, Mol. Microbiol.
32 (1999) 883–886.
[7] Rocha E.P., Danchin A., Viari A., Universal replication biases in
bacteria, Mol. Microbiol. 32 (1999) 11–16.
[5] McInerney J.O., Replicational and transcriptional selection on
codon usage in Borrelia burgdorferi, Proc. Natl. Acad. Sci. USA 95 (1998)
10698–10703.
[8] Lobry J.R., Origin of replication of Mycoplasma genitalium, Science 272 (1996) 745–746.
[9] Salzberg S.L., Salzberg A.J., Kerlavage A.R., Tomb J.F., Skewed
oligomers and origins of replication, Gene 217 (1998) 57–67.
[10] Picardeau M., Lobry J.R., Hinnebusch B.J., Physical mapping of an
origin of bidirectional replication at the centre of the Borrelia burgdorferi
linear chromosome, Mol. Microbiol. 32 (1999) 437–445.
[11] Myllykallio H., Lopez P., Lopez-Garcia P., Heilig R., Saurin W.,
Philippe H., Zivanovic Y., Forterre P., Bacterial mode of replication with a
minimal eukaryotic-like machinery in a hyperthermophilic archaeon,
Science 288 (2000) 2212–2215.
[12] Francino M.P., Ochman H., Strand asymmetries in DNA evolution, Trends Genet. 13 (1997) 240–245.
207
P. Lopez, H. Philippe / C.R. Acad. Sci. Paris, Sciences de la vie / Life Sciences 324 (2001) 201–208
[13] Thomas D.C., Svoboda D.L., Vos J.M., Kunkel T.A., Strand specificity of mutagenic bypass replication of DNA containing psoralen
monoadducts in a human cell extract, Mol. Cell Biol. 16 (1996) 2537–
2544.
[14] McLean M.J., Wolfe K.H., Devine K.M., Base composition skews,
replication orientation, and gene orientation in 12 prokaryote genomes,
J. Mol. Evol. 47 (1998) 691–696.
[15] Mrazek J., Karlin S., Strand compositional asymmetry in bacterial
and large viral genomes, Proc. Natl. Acad. Sci. USA 95 (1998) 3720–
3725.
[16] Frank A.C., Lobry J.R., Asymmetric substitution patterns: a review
of possible underlying mutational or selective mechanisms, Gene 238
(1999) 65–77.
[17] Tillier E.R., Collins R.A., The contributions of replication orientation, gene direction, and signal sequences to base-composition asymmetries in bacterial genomes, J. Mol. Evol. 50 (2000) 249–257.
[18] Brewer B.J., When polymerases collide: replication and the transcriptional organization of the E. coli chromosome, Cell 53 (1988) 679–
686.
[19] French S., Consequences of replication fork movement through
transcription units in vivo, Science 258 (1992) 1362–1365.
[20] Kunst F., Ogasawara N., Moszer I., Albertini A.M., Alloni G.,
Azevedo V., Bertero M.G., Bessieres P., Bolotin A., Borchert S., Borriss R.,
Boursier L., Brans A., Braun M., Brignell S.C., Bron S., Brouillet S.,
Bruschi C.V., Caldwell B., Capuano V., Carter N.M., Choi S.K., Codani J.J.,
Connerton I.F., Danchin A., The complete genome sequence of the
gram-positive bacterium Bacillus subtilis, Nature 390 (1997) 249–256.
[21] Fraser C.M., Casjens S., Huang W.M., Sutton G.G., Clayton R.,
Lathigra R., White O., Ketchum K.A., Dodson R., Hickey E.K., Gwinn M.,
Dougherty B., Tomb J.F., Fleischmann R.D., Richardson D., Peterson J.,
Kerlavage A.R., Quackenbush J., Salzberg S., Hanson M., van Vugt R.,
208
Palmer N., Adams M.D., Gocayne J., Venter J.C., Genomic sequence of a
Lyme disease spirochaete, Borrelia burgdorferi, Nature 390 (1997) 580–
586.
[22] Read T.D., Brunham R.C., Shen C., Gill S.R., Heidelberg J.F.,
White O., Hickey E.K., Peterson J., Utterback T., Berry K., Bass S.,
Linher K., Weidman J., Khouri H., Craven B., Bowman C., Dodson R.,
Gwinn M., Nelson W., DeBoy R., Kolonay J., McClarty G., Salzberg S.L.,
Eisen J., Fraser C.M., Genome sequences of Chlamydia trachomatis MoPn
and Chlamydia pneumoniae AR39, Nucleic Acids Res. 28 (2000) 1397–
1406.
[23] Nelson K.E., Clayton R.A., Gill S.R., Gwinn M.L., Dodson R.J.,
Haft D.H., Hickey E.K., Peterson J.D., Nelson W.C., Ketchum K.A.,
McDonald L., Utterback T.R., Malek J.A., Linher K.D., Garrett M.M.,
Stewart A.M., Cotton M.D., Pratt M.S., Phillips C.A., Richardson D.,
Heidelberg J., Sutton G.G., Fleischmann R.D., Eisen J.A., Fraser C.M.,
Evidence for lateral gene transfer between Archaea and bacteria from
genome sequence of Thermotoga maritima, Nature 399 (1999) 323–329.
[24] Maeder D.L., Weiss R.B., Dunn D.M., Cherry J.L., Gonzalez J.M.,
DiRuggiero J., Robb F.T., Divergence of the hyperthermophilic archaea
Pyrococcus furiosus and P. horikoshii inferred from complete genomic
sequences, Genetics 152 (1999) 1299–1305.
[25] Rocha E.P., Danchin A., Viari A., Translation in Bacillus subtilis:
roles and trends of initiation and termination, insights from a genome
analysis, Nucleic Acids Res. 27 (1999) 3567–3576.
[26] Cann I.K., Ishino Y., Archaeal D.N., A replication. Identifying the
pieces to solve a puzzle, Genetics 152 (1999) 1249–1267.
[27] Watanabe H., Mori H., Itoh T., Gojobori T., Genome plasticity as
a paradigm of eubacteria evolution, J. Mol. Evol. 44 (Suppl. 1) (1997)
S57–S64.
[28] Itoh T., Takemoto K., Mori H., Gojobori T., Evolutionary instability
of operon structures disclosed by sequence comparisons of complete
microbial genomes, Mol. Biol. Evol. 16 (1999) 332–346.

Documents pareils