<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Duplication, divergence and persistence in the Phytochrome photoreceptor gene family of cottons (Gossypium spp.)</title>
<meta name="Subject" content="BMC Plant Biology 2010, 10:119. doi: 10.1186/1471-2229-10-119"/>
<meta name="Author" content="Ibrokhim Y Abdurakhmonov, Zabardast T Buriev, Carla Logan-Young, Abdusattor Abdukarimov, Alan E Pepper"/>
<meta name="Creator" content="FrameMaker 8.0"/>
<meta name="Producer" content="Acrobat Distiller 7.0 (Windows)"/>
<meta name="CreationDate" content=""/>
</head>
<body>
<pre>
Abdurakhmonov et al. BMC Plant Biology 2010, 10:119
http://www.biomedcentral.com/1471-2229/10/119
RESEARCH ARTICLE
Open Access
Duplication, divergence and persistence in the Phytochrome photoreceptor gene family of cottons (Gossypium spp.)
Ibrokhim Y Abdurakhmonov1, Zabardast T Buriev1, Carla Jo Logan-Young2, Abdusattor Abdukarimov1 and Alan E Pepper*2
Abstract
Background: Phytochromes are a family of red/far-red photoreceptors that regulate a number of important
developmental traits in cotton (Gossypium spp.), including plant architecture, fiber development, and photoperiodic
flowering. Little is known about the composition and evolution of the phytochrome gene family in diploid (G.
herbaceum, G. raimondii) or allotetraploid (G. hirsutum, G. barbadense) cotton species. The objective of this study was to
obtain a preliminary inventory and molecular-evolutionary characterization of the phytochrome gene family in cotton.
Results: We used comparative sequence resources to design low-degeneracy PCR primers that amplify genomic
sequence tags (GSTs) for members of the PHYA, PHYB/D, PHYC and PHYE gene sub-families from A- and D-genome
diploid and AD-genome allotetraploid Gossypium species. We identified two paralogous PHYA genes (designated
PHYA1 and PHYA2) in diploid cottons, the result of a Malvaceae-specific PHYA gene duplication that occurred
approximately 14 million years ago (MYA), before the divergence of the A- and D-genome ancestors. We identified a
single gene copy of PHYB, PHYC, and PHYE in diploid cottons. The allotetraploid genomes have largely retained the
complete gene complements inherited from both of the diploid genome ancestors, with at least four PHYA genes and
two genes encoding PHYB, PHYC and PHYE in the AD-genomes. We did not identify a PHYD gene in any cotton
genome examined.
Conclusions: Detailed sequence analysis suggests that phytochrome genes retained after segmental duplication and
allopolyploidy are evolving independently under a birth-and-death process with strong purifying selection. Our study
provides a preliminary phytochrome gene inventory that is necessary for further characterization of the biological
functions of each of the cotton phytochrome genes, and for the development of 'candidate gene' markers that are
potentially useful for cotton improvement via modern marker-assisted selection strategies.
* Correspondence: apepper@bio.tamu.edu
2 Department of Biology, Texas A&M University, College Station, Texas 77843, USA
Full list of author information is available at the end of the article
© 2010 Abdurakhmonov et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Background
Phytochromes are specialized photoreceptors that perceive and interpret light signals from the environment to
regulate virtually all aspects of plant development, including seed germination, chloroplast development,
tropisms, shade avoidance responses, floral initiation, circadian rhythms, pigmentation, and senescence [1-3].
The phytochromes have a primary role in sensing red (R) and far-red (FR) light, and also play a role in the
perception of blue (B) and ultraviolet (UV) light [4]. The active phytochrome molecule consists of a large
(~125 kDa) apoprotein bound to a phycobilin chromophore [5,6]. The phytochrome apoproteins are encoded by a small
gene family in all plant taxonomic divisions, including parasitic plants, mosses, cryptogams, and green algae [7-13].
In angiosperms, the phytochrome apoprotein genes have been classified into four or five gene sub-families based
on sequence similarity to the five phytochrome genes of Arabidopsis: PHYA, PHYB, PHYC, PHYD, and PHYE [14,15].
All five Arabidopsis phytochromes share 46-56% amino acid sequence similarity, with the exception of PHYB and
PHYD, which are the result of a recent gene duplication and share ~80% amino acid identity [14,16]. Thus, the five
Arabidopsis genes are often assigned to four subfamilies: PHYA, PHYB/D, PHYC, and PHYE [17]. The Arabidopsis
PHYB/D subfamily is more closely related to the PHYE gene (~55% nt identity) than to the PHYA and PHYC genes
(~47% nt identity), which together form a separate ancient evolutionary clade [13,14].
Having presumably arisen by gene duplication and subsequent subfunctionalization and/or neofunctionalization, the phytochrome gene family in toto performs a
complex network of redundant, partially redundant, nonoverlapping, and in some cases antagonistic regulatory
functions throughout plant development [18-35]. For
example, all Arabidopsis phytochromes play diverse and
interacting roles in photoperiodic regulation of floral initiation. PHYA, PHYB, PHYD and PHYE act partially
redundantly in the light-dependent entrainment of the
circadian clock [35,36], which in turn regulates transcription of the floral inducer CONSTANS (CO) in a circadian
manner [37]. In Arabidopsis, PHYA, in conjunction with the
blue-light-dependent cryptochrome photoreceptors
CRY1 and CRY2, promotes flowering by inhibiting the
degradation of CO protein, while PHYB acts antagonistically to stimulate CO degradation [38]. In addition,
PHYB, PHYD and PHYE act partially redundantly as
repressors of flowering that are dependent on R/FR ratio
[19,28,30,39]. In this role, PHYB also acts downstream of
CO as a negative regulator of transcription of the 'florigen' molecule FT (the target of CO) in a tissue-specific
manner [40]. Mutant analyses indicate that PHYC also
plays a role in photoperiodic flowering [31,41]. Further,
genetic variation at the PHYC locus underlies some of the
natural phenotypic variation in flowering time in Arabidopsis [42,43].
In angiosperms, the composition of the phytochrome gene family varies significantly among taxonomic lineages.
Although a single PHYA gene is present in most flowering plants, some plant families, such as carnation
(Caryophyllaceae) and legumes (Fabaceae), have two distinct PHYA genes [10]. Similarly, several plant lineages have
gained multiple PHYB-like genes through independent gene duplications of PHYB [10,14,16,44-47]. For example,
tomato has two PHYB genes (designated PHYB1 and PHYB2) that are not directly orthologous to Arabidopsis
PHYB and PHYD, respectively [44]. While most angiosperms have a single PHYC gene, species in some families
such as Fabaceae and Salicaceae appear to have lost PHYC during evolution [10,47]. Although a single PHYE-like
gene is present in most flowering plants, PHYE is completely absent in poplar (Salicaceae), in the Piperales,
and in some monocots such as maize [10,47]. Finally, the novel PHYF subfamily, which groups with the PHYA/C clade,
has been identified in tomato [44].
Little is known about the composition of the phytochrome gene family in cultivated cottons or their wild relatives (Gossypium spp.) in the Malvaceae family. This is
despite the fact that physiological experiments suggest
that phytochromes regulate economically important
aspects of cotton development, including drought resistance, seed germination, plant architecture, photoperiodic flowering, and fiber elongation [48-51]. For example,
R/FR photon ratio influences the length and diameter of
developing seed fiber; fibers exposed to a high R/FR photon ratio during development were longer than those that
received lower R/FR ratio, implicating the involvement of
a phytochrome [50,51].
While modern domesticated varieties of the major cultivated cottons G. hirsutum L. and G. barbadense L.
exhibit photoperiod-independent flowering, wild and
'primitive' accessions of G. hirsutum and G. barbadense
flower under short-day photoperiodic control [52,53]. An
understanding of the molecular-genetic basis of differences in photoperiodic flowering in cottons will accelerate strategies for improvement of cultivated varieties
through the introgression of valuable genetic traits from
wild germplasm [52,53]. In this regard, it is important to
note that mutational changes in phytochrome function
have been implicated in the loss of photoperiod sensitivity in several major crops including sorghum, barley, rice,
and soy [54-57].
A thorough characterization of the phytochrome gene
family in cotton species is necessary for understanding
the molecular basis of photoperiodic flowering, the influences of light quality on cotton fiber elongation, and
other aspects of cotton development. Any inventory of
phytochrome genes of cottons is complicated by the fact
that the major cultivated species, G. hirsutum and G. barbadense are allotetraploids. Diploid species in the genus
Gossypium are categorized into eight genome groups
(designated A through G, and K) based on cytogenetic
and phylogenetic criteria [58-62]. The Old World A-genome
group and the New World D-genome group
diverged from each other on the order of 5-10 MYA [61],
then underwent hybridization and polyploidization, creating an AD allopolyploid lineage ancestral to G. hirsutum (designated AD1) and G. barbadense (designated
AD2) on the order of 1 MYA [62,63].
In this study, we utilized a PCR-based approach with
low-degeneracy primers to obtain gene fragments, or
genomic sequence tags (GSTs), that yield an initial
description of the composition and evolution of the phytochrome gene family in the New World allotetraploid
cottons Gossypium hirsutum and G. barbadense, and in
the diploids G. herbaceum L. and G. raimondii Ulbr.,
which are considered to be extant relatives of the
A- and D-genome diploid ancestors (respectively) of the
allotetraploid lineage. This study provides a necessary
foundation for studies of the specific biological functions
of each of the phytochrome genes in cotton species, and
helps to illuminate the evolutionary patterns of duplicated genes in complex genomes, as well as the evolutionary history of the world's most important fiber crop
species.
Results
Because our results were derived from PCR, our inventory of the phytochrome gene family in Gossypium spp. is
provisional. All sequences have been submitted to GenBank (accession numbers HM143735-HM143763).
Phytochrome hinge amplification using 'universal' primers
Between the N-terminal 'photoperception domain' and the C-terminal 'signaling domain' of the phytochrome apoprotein is a short 'hinge region' (Figure 1) that shows relatively high sequence variation, and has proven useful for
characterization of the phytochrome gene complement in
a variety of plant species, and for robust phylogenetic
analyses [10]. To amplify the hinge region of all cotton
phytochromes, we used an alignment of eudicot phytochrome sequences to design a 768-fold degenerate PCR
primer (designated PHYdeg-F) based on the conserved
HYPATDIP peptide in the N-terminal domain, and a
16,384-fold degenerate PCR primer (designated PHYdeg-R) based on the conserved PFPLRYAC peptide in the C-terminal domain (Table 1).
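The degenerate primers above were designed from conserved peptides such as HYPATDIP. As a minimal sketch of that reverse-translation step, the Python below assumes the standard genetic code and collapses each codon position to one IUPAC symbol; the function name back_translate is hypothetical and this is not the authors' design tool. The published primers differ from this naive output (for example, they use inosine (I) at some positions and end partway through a codon), so the result is a starting point rather than a reconstruction of PHYdeg-F.

```python
from itertools import product

# Standard genetic code (assumption), codons enumerated in TCAG order: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC symbol for each set of bases (keys are the bases sorted alphabetically).
IUPAC_CODE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "AC": "M", "GT": "K", "CG": "S", "AT": "W",
    "ACT": "H", "CGT": "B", "ACG": "V", "AGT": "D", "ACGT": "N",
}


def back_translate(peptide):
    """Degenerate DNA sequence covering every codon choice for `peptide`.

    Each codon position is collapsed independently, so for the six-codon residues
    (L, S, R) the result also matches some codons for other amino acids.
    """
    degenerate = []
    for aa in peptide.upper():
        codons = [codon for codon, encoded in CODON_TABLE.items() if encoded == aa]
        if not codons:
            raise ValueError(f"unknown amino acid: {aa}")
        for position in range(3):
            bases = "".join(sorted({codon[position] for codon in codons}))
            degenerate.append(IUPAC_CODE[bases])
    return "".join(degenerate)


print(back_translate("HYPATDIP"))  # CAYTAYCCNGCNACNGAYATHCCN
```

Collapsing positions independently is exact for HYPATDIP but over-degenerate for L, S and R, which is one reason peptides rich in those residues make less attractive primer targets.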
Amplification across the hinge region using Taq DNA
polymerase yielded PCR products from all taxa. We
cloned the amplification products from each taxon into
an E. coli vector, then sequenced ~40 clones for each
taxon. For all taxa, a majority (>60%) of clones showed
the highest similarity in BLAST searches to Arabidopsis
PHYE (E value ~ 1e-40). For each taxon, only a minority
of clones showed high-scoring similarity to Arabidopsis
PHYA or PHYB. This apparently skewed distribution of
amplification products -- observed across all taxa -- suggested an amplification bias in favor of PHYE amplicons.
No clones were obtained from any taxon that had high-scoring similarity to Arabidopsis PHYC or PHYD. No
new phytochrome sub-families were observed.
Amplification of the PHYA gene sub-family
Because of possible biased amplification, we designed
new less-degenerate hinge-region primer sets for the
PHYA, PHYB/D, and PHYC sub-families (Table 1) using
available phytochrome sequences from species in the
rosid clade, which includes both cotton and Arabidopsis
[64,65].
The hinge regions of PHYA genes were amplified using
PHYABnondeg-F and PHYAdeg-R (Table 1), yielding a
~360 bp amplification product from all accessions. In
BLAST database searches, all clones had a high-scoring
pair relationship with Arabidopsis PHYA (E value ~ 2e-63). Sequences from a total of more than 200 clones across
all taxa yielded two distinct consensus contigs from each
of the diploids G. herbaceum and G. raimondii, and four
distinct contigs from each of the allotetraploids G. barbadense
and G. hirsutum. When aligned across all taxa, these contigs yielded a 315 bp consensus alignment that had an
average pairwise sequence similarity of 94.6%, with 282
sites (89.5%) identical across all taxa, and no stop codons
or indels in any taxa. Distance analysis (Figure 2) showed
two well-separated gene sub-clades (100% bootstrap support). These sub-clades were designated PHYA1 and
PHYA2. The level of hinge-region differentiation between
these two sub-clades was far greater than that seen in
other cotton phytochrome gene sub-families (discussed
below), with an uncorrected "p" distance of 0.086, corresponding to 28 nt changes (9%) based on parsimony.
These data indicated that a single PHYA gene underwent duplication after the divergence of the cotton and
Arabidopsis lineages, but prior to the divergence of the A-genome and D-genome lineages, leaving each of the modern diploids in our study (and presumably the ancestors
of the AD allotetraploids) with a complement of two
PHYA paralogs (PHYA1 and PHYA2). Indeed, four distinct contigs were observed both in the inbred G. hirsutum cultivar TM-1 and in the doubled-haploid line G.
barbadense 3-79. For each allotetraploid taxon, two contigs fell into each of the PHYA1 and PHYA2 clades (Figure 2). A conservative inventory of available EST
sequences indicated that at least two distinct PHYA loci
are expressed in G. hirsutum (Additional file 1).
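The alignment statistics quoted above (identical sites, average pairwise similarity and the uncorrected "p" distance) are simple column counts. The Python sketch below shows one common way to compute them from already aligned sequences; the function names and toy sequences are hypothetical, skipping gap columns in p is an assumption, and the paper does not state which software produced its values, so other tools may give slightly different numbers.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected p: fraction of compared sites that differ (columns with a gap are skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        differences += x != y
    return differences / compared


def alignment_summary(sequences):
    """Invariant columns and mean pairwise similarity (1 - p) for a set of aligned sequences."""
    length = len(sequences[0])
    identical = sum(1 for column in zip(*sequences) if len(set(column)) == 1)
    pairs = list(combinations(sequences, 2))
    mean_similarity = sum(1 - p_distance(a, b) for a, b in pairs) / len(pairs)
    return {
        "length": length,
        "identical_sites": identical,
        "percent_identical": 100 * identical / length,
        "mean_pairwise_similarity": 100 * mean_similarity,
    }


toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT"]  # hypothetical sequences
print(alignment_summary(toy))
print(round(100 * 282 / 315, 1))  # 89.5, the percent identical sites reported for PHYA
```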
Within each of the PHYA1 and PHYA2 clades, the level
of nucleotide diversity was very low, with at most four
parsimonious nucleotide changes separating each contig.
However, within the PHYA1 clade, the contigs resolved
into two subclades (74% bootstrap support), each
containing a single contig from one of the diploid taxa and
one contig from each of the allotetraploids. For example,
G. raimondii (D-genome) PHYA1 grouped with a single
contig from each of G. hirsutum and G. barbadense.
Based on this grouping, the latter contigs were assigned
the provisional designation PHYA1.D. Similarly, G.
herbaceum (A-genome) PHYA1 grouped with G. hirsutum
PHYA1.A and G. barbadense PHYA1.A. Based on similar
criteria, the PHYA2 clade was also divided into PHYA2.A
and PHYA2.D subclades (90% bootstrap support). The
phylogenetic resolution of A- and D-genome subclades
supported the hypothesis that each of the A- and D-genome diploids contributed both PHYA1 and PHYA2 to
the allotetraploid lineage. Thus, although hinge-region
nucleotide diversity within each of the PHYA1 and
PHYA2 clades was low, it was sufficient to resolve a tentative PHYA gene complement for each taxon, as well as the
pattern of gene inheritance through the allopolyploidization event.

Table 1: Primers used to amplify cotton phytochrome gene family.
Primer name      Sequence 5' to 3'            Fold-degeneracy
PHYdeg-F         CAYTAYYCIGCIACIGAYATHCC      768
PHYdeg-R         CRCAIGCRTAICKARIGGRWAIGG     16,384
PHYABnondeg-F    GCATTATCCTGCTACTACTGATATT    0
PHYAdeg-R        CAWGCATACCTWAGMGGRAAI        64
PHYBdeg-R        AACAACIAIICCCCAIAGCCTCAT     64
1010-F           GTTYTTGTTTAAGCARAACCG        4
1910-R           GAGTCWCKCAGAATAAGC           4
1910-F           AGCTTATTCTGMGWGACTC          4
2848-R           TAACCCKCTTRTTTGCAGTCA        2
PHYC-1R-DFCI     GGTCCGCCTGATTGAGACTGC        0
I corresponds to inosine. R, Y, M, K, S, W correspond to the IUPAC-IUB ambiguity set.
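Fold-degeneracy in Table 1 is the product, over all positions, of the number of bases each symbol can stand for. The Python sketch below computes it; the function name is hypothetical, and counting inosine as fourfold is an assumption. With that convention it reproduces the listed values for PHYdeg-R (16,384) and PHYAdeg-R (64), but not every row (PHYdeg-F, for example, comes out as 3,072 rather than 768, and a fully specified primer comes out as 1 where the table lists 0), so some table entries may follow a different convention.

```python
# Number of nucleotides each IUPAC symbol represents.
BASES_PER_SYMBOL = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "M": 2, "K": 2, "S": 2, "W": 2,
    "H": 3, "B": 3, "V": 3, "D": 3, "N": 4,
}


def fold_degeneracy(primer, inosine_fold=4):
    """Number of distinct oligonucleotides a degenerate primer represents.

    Inosine (I) is a single modified base, not a mixture; `inosine_fold` sets how many
    bases it is counted as (assumption: 4, i.e. treated like N).
    """
    total = 1
    for symbol in primer.upper():
        total *= inosine_fold if symbol == "I" else BASES_PER_SYMBOL[symbol]
    return total


print(fold_degeneracy("CRCAIGCRTAICKARIGGRWAIGG"))  # PHYdeg-R: 16384
print(fold_degeneracy("CAWGCATACCTWAGMGGRAAI"))     # PHYAdeg-R: 64
```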
Amplification of the PHYB/D gene sub-family
A ~320 bp fragment from the PHYB/D hinge region was
obtained by amplification using primers PHYABnondeg-F and PHYBdeg-R (Table 1). Sequences from a total of 80
clones yielded a single consensus contig from each of the
diploid cottons G. herbaceum and G. raimondii, and from
the allotetraploid G. hirsutum. Two distinct contigs were
assembled from clones derived from the allotetraploid G.
barbadense. These clone sequences shared ~85% nucleotide identity with the Arabidopsis PHYB gene and ~78%
nt identity with Arabidopsis PHYD. All clones had a high-scoring pair relationship with the Arabidopsis PHYB gene
(E value ~ 1e-71) as well as significant similarity to the
Arabidopsis PHYD gene (E value ~ 3e-55). Consensus
sequences were aligned across all taxa, yielding a 319 bp
alignment with an average pairwise sequence similarity of
99.8%, with 317 sites (99.4%) identical across all taxa, no
stop codons and no indels. Although these data indicated
the presence of at least one PHYB gene in each of the A- and D-genome diploid plants and in G. hirsutum, and of at
least two PHYB genes in G. barbadense, the low
level of nucleotide differentiation observed within the
hinge region yielded insufficient phylogenetic information to characterize the PHYB gene complement in any of
the study taxa.
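Throughout this section, many sequenced clones are reduced to one consensus contig per locus. The Python sketch below illustrates that idea on toy data, assuming the clones are already aligned, gap-free and of equal length; the greedy clustering rule, the distance cutoff and the function names are hypothetical choices, not the authors' assembly procedure. A strict-majority consensus removes an isolated polymerase error carried by a single clone.

```python
from collections import Counter


def p_distance(a, b):
    """Fraction of positions that differ (clones assumed aligned, gap-free, equal length)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_clones(clones, max_p=0.02):
    """Greedy grouping: each clone joins the nearest cluster seed within max_p, else seeds a new cluster."""
    seeds, clusters = [], []
    for clone in clones:
        distances = [p_distance(clone, seed) for seed in seeds]
        if distances and max_p >= min(distances):
            clusters[distances.index(min(distances))].append(clone)
        else:
            seeds.append(clone)
            clusters.append([clone])
    return clusters


def majority_consensus(clones, min_fraction=0.5):
    """Per-column most common base, or 'N' when it does not exceed min_fraction of the clones."""
    consensus = []
    for column in zip(*clones):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) > min_fraction else "N")
    return "".join(consensus)


# Hypothetical clones from two loci; one clone per locus carries a single PCR error.
clones = [
    "ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAA",
    "ACGAACCTAT", "ACGAACCTAT", "TCGAACCTAT",
]
for group in cluster_clones(clones, max_p=0.1):
    print(len(group), majority_consensus(group))
# 3 ACGTACGTAC
# 3 ACGAACCTAT
```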
To obtain better resolution of the PHYB gene complement, additional low-degeneracy primers 1010-F, 1910-F,
1910-R, and 2848-R (Table 1) were used along with
primer PHYABnondeg-F to create a 2.1 kb series of
overlapping amplicons corresponding to approximately
1.8 kb of the Arabidopsis PHYB gene and extending from
the hinge, through the first intron and into the second
exon (Figure 1). After amplification, cloning and sequencing, the amplicons were assembled for each taxon. In all
Gossypium taxa examined, the first intron was ~300 bp
longer than the first intron of PHYB from Arabidopsis.
Unlike the other phytochrome amplicons, we detected
a high frequency of putative PCR-mediated recombination events [66] within the PHYB 2.1 kb fragment from
amplifications using G. barbadense as template. The
recombination detection algorithm RDP3 [67] identified
a number of clones resulting from apparent recombination between the A-genome-derived and D-genome-derived
homeologous sequences, with predicted breakpoints (P =
0) between nucleotides 1000 and 1700 of the alignment.
After omission of these recombinant clones, composite
amplicon sequences from each taxon were aligned, creating a consensus alignment of 2,061 bp with 98.8% average
pairwise similarity and 2,007 identical sites (97.4%).
Overall, the cotton PHYB genes shared 65% nucleotide
identity with the Arabidopsis PHYB ortholog. No stop
codons or indels were detected in exon sequences. A 2 bp
putative deletion was observed in one contig (designated
PHYB.D) from G. hirsutum. In addition, a 1 bp indel was
polymorphic between the PHYB.A and PHYB.D clades.
Finally, PHYB of G. raimondii had an additional 1 bp
insertion. All indel polymorphisms were located within the
first intron.

[Figure 1 diagram: model gene structures of PHYA, PHYB, PHYC and PHYE (exons, coding and non-coding regions, hinge) with the amplicons used: PHYA hinge, PHYB hinge, PHYB 2.1 kb, PHYC 1.0 kb and PHYE hinge; scale bar 1 kb]
Figure 1 Gene diagrams and PCR amplicons used in this study. Model gene structures are derived from Arabidopsis thaliana annotations
(At1g09570, At2g18790, At5g35840, At4g18130). The PHYB 2.1 kb sequenced fragment represents a composite of several overlapping amplicons.
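RDP3 combines several statistical tests, and its methods are not reproduced here. As a much simpler illustration of what a PCR-mediated A/D chimera looks like, the Python sketch below (hypothetical function names, toy sequences) labels each column where the two homeologs differ by the parent the clone matches, then finds the single breakpoint that best explains those labels. A breakpoint model always fits at least as well as a non-recombinant one, so a real analysis needs a significance test such as those in RDP3; this sketch only locates a candidate breakpoint interval.

```python
def informative_sites(parent_a, parent_d):
    """Columns where the two parental (homeologous) sequences differ, ignoring gaps."""
    return [i for i, (x, y) in enumerate(zip(parent_a, parent_d)) if x != y and "-" not in (x, y)]


def single_breakpoint_scan(clone, parent_a, parent_d):
    """Best single-breakpoint description of a clone as an A/D chimera (simplified; not RDP3).

    Returns (best, non_recombinant): `best` holds the number of informative sites explained,
    the columns flanking the breakpoint and the parental order; `non_recombinant` is the
    number explained by the better single parent.
    """
    sites = informative_sites(parent_a, parent_d)
    labels = []
    for i in sites:
        if clone[i] == parent_a[i]:
            labels.append("A")
        elif clone[i] == parent_d[i]:
            labels.append("D")
        else:
            labels.append("?")  # matches neither parent
    non_recombinant = max(labels.count("A"), labels.count("D"))
    best = None
    for k in range(1, len(sites)):  # breakpoint between sites[k - 1] and sites[k]
        for left, right in (("A", "D"), ("D", "A")):
            score = labels[:k].count(left) + labels[k:].count(right)
            if best is None or score > best["score"]:
                best = {"score": score, "after": sites[k - 1], "before": sites[k], "order": (left, right)}
    return best, non_recombinant


# Hypothetical toy homeologs and a clone that switches from A-like to D-like.
parent_a = "ACGTTGCAAC"
parent_d = "ACATTGTAAT"
clone = "ACGTTGTAAT"
best, non_recombinant = single_breakpoint_scan(clone, parent_a, parent_d)
print(best["after"], best["before"], best["order"])  # 2 6 ('A', 'D')
print(best["score"], non_recombinant)  # 3 2
```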
Detailed phylogenetic analyses of the 2,061 bp contigs
from A-, D-, and AD-genome cottons (Figure 3) indicated
the presence of at least one PHYB locus in the two diploid
cottons, G. herbaceum and G. raimondii, and at least two
PHYB loci in both allotetraploid cottons. The G. hirsutum
and G. barbadense sequence contigs each grouped into
two sub-clades (tentatively designated PHYB.A and
PHYB.D). The single PHYB contig from G. herbaceum
was used to define the PHYB.A cluster (99% bootstrap
support), while the single PHYB contig from G. raimondii
anchored the PHYB.D cluster. From these results, we
concluded that PHYB.A and PHYB.D, which shared ~98%
nucleotide sequence identity, arose as orthologs at the
time of divergence of the A- and D-genome diploid lineages. We surmised that PHYB.A was contributed to the
allotetraploids via the A-genome ancestor and PHYB.D
was contributed via the D-genome ancestor. Available
EST sequences indicated that at least one PHYB locus is
expressed in G. hirsutum (Additional file 1).
Amplification from the PHYC gene sub-family
Several sets of degenerate primer pairs that were
designed on the basis of the conserved HYPATDIP and
PFPLRYAC regions -- including several designed from
rosid PHYC nucleotide sequences -- failed to produce
detectable PCR amplification products from the Gossypium species tested (data not shown). However, the identification of a small EST clone (GenBank CO121409) with
similarity to Arabidopsis PHYC (E value = 7e-119) in a
library from G. raimondii floral tissue [68] allowed us to
design the primer PHYC_1R_DFCI within the C-terminal
domain (Table 1). When used in combination with PHYdeg-F, this primer amplified a ~1 kb fragment composed
entirely of coding sequence from the first exon of PHYC,
including the hinge (Figure 1). All clones obtained using
this primer pair had a high-scoring similarity to Arabidopsis PHYC (E value ~ 1e-172). From these clones, we
assembled a single consensus contig from each of the diploid species G. herbaceum and G. raimondii, and two distinct consensus contigs from each of the allotetraploids
G. hirsutum and G. barbadense. Consensus sequences for
each of the putative PHYC contigs were aligned across all
taxa, yielding a 1,022 bp alignment with an average pairwise sequence similarity of 99.1%, 1,002 sites (98.0%)
identical across all taxa, with no indels or stop codons in
any taxa.
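Statements such as "no indels or stop codons in any taxa" can be checked mechanically once the exon sequence and its reading frame are known. The Python sketch below is a minimal version of such a check, assuming the standard genetic code, an aligned sequence that contains only coding sequence, and a known starting frame; the function name is hypothetical. It reports every gap run (flagging those whose length would shift the frame) and every in-frame stop codon.

```python
from itertools import product

# Standard genetic code (assumption), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reading_frame_report(aligned_cds, frame=0):
    """Gap runs and in-frame stop codons in one aligned coding sequence.

    Returns tuples ("gap", alignment_column, gap_length, shifts_frame) and
    ("stop_codon", position_in_ungapped_sequence, codon).
    """
    problems = []
    run_start = None
    for i, base in enumerate(aligned_cds + "X"):  # "X" is a sentinel that closes a trailing gap run
        if base == "-" and run_start is None:
            run_start = i
        elif base != "-" and run_start is not None:
            length = i - run_start
            problems.append(("gap", run_start, length, length % 3 != 0))
            run_start = None
    ungapped = aligned_cds.replace("-", "").upper()[frame:]
    for i in range(0, len(ungapped) - 2, 3):
        codon = ungapped[i:i + 3]
        if CODON_TABLE.get(codon) == "*":
            problems.append(("stop_codon", frame + i, codon))
    return problems


print(reading_frame_report("ATGGCCTGA"))  # [('stop_codon', 6, 'TGA')]
print(reading_frame_report("ATG---CAA"))  # [('gap', 3, 3, False)]
```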
In phylogenetic analyses (Figure 4), the PHYC consensus sequences grouped into two major clades (100% bootstrap support). One of these clades contained the G.
herbaceum contig and one contig from each of G. hirsutum and G. barbadense. This clade was designated
PHYC.A. The other clade, designated PHYC.D, included
the G. raimondii contig along with the other of the two
contigs from each of G. hirsutum and G. barbadense.
These data indicated that both the A- and D-genome
ancestors had one PHYC gene, and that upon hybridization and polyploidization, this gene was contributed from
each diploid to the allotetraploid ancestor of G. hirsutum
and G. barbadense.

[Figure 2 tree: PHYA1 and PHYA2 clades, each with an A-genome subclade (G. herbaceum, G. hirsutum .A, G. barbadense .A) and a D-genome subclade (G. raimondii, G. hirsutum .D, G. barbadense .D); scale bar 0.005 substitutions/site]
Figure 2 Unrooted NJ tree of Gossypium spp. PHYA-related genes based on a ~315 bp consensus alignment of amplification products from
the hinge region. Distances (uncorrected "p") and most parsimonious numbers of nt changes are indicated for each branch (to the left and to the
right of the "/", respectively). Branch lengths of less than 0.001 substitutions per site are not shown. Bootstrap support (500 replicates) is indicated where
>50%.
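Figures 2 to 4 are unrooted neighbor-joining (NJ) trees. The paper does not name the software used, so the Python sketch below is simply a textbook implementation of the Saitou and Nei algorithm, with hypothetical function names and a toy five-taxon distance matrix rather than the cotton data. At each step it joins the pair that minimizes Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of distances from i to all other current nodes, and replaces the pair with a new node. Ties are broken by input order, and bootstrap support would require repeating the analysis on resampled alignment columns.

```python
def neighbor_joining(names, matrix):
    """Saitou-Nei neighbor joining on a symmetric distance matrix.

    Returns a Newick string (rooted at the midpoint of the last edge, for display only)
    and the joins as (taxon_1, taxon_2, branch_1, branch_2). New nodes are named node1,
    node2, ..., so input taxa should not use those names.
    """
    d = {a: {b: matrix[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    newick = {a: a for a in names}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or best[0] > q:
                    best = (q, a, b)
        _, a, b = best
        branch_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        branch_b = d[a][b] - branch_a
        u = f"node{len(joins) + 1}"
        d[u] = {}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        newick[u] = f"({newick[a]}:{branch_a:g},{newick[b]}:{branch_b:g})"
        joins.append((a, b, branch_a, branch_b))
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    half = d[a][b] / 2
    return f"({newick[a]}:{half:g},{newick[b]}:{half:g});", joins


names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree, joins = neighbor_joining(names, matrix)
print(tree)  # ((c:4,(a:2,b:3):3):1,(d:2,e:1):1);
```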
[Figure 3 tree: PHYB.A clade (G. herbaceum PHYB, G. hirsutum PHYB.A, G. barbadense PHYB.A) and PHYB.D clade (G. raimondii PHYB, G. hirsutum PHYB.D, G. barbadense PHYB.D); scale bar 0.001 substitutions/site]
Figure 3 Unrooted NJ tree of Gossypium spp. PHYB-related genes based on the consensus alignment of the ~2.1 kb merged amplicons. Distances (uncorrected "p") and most parsimonious numbers of nucleotide changes are indicated for each branch (to the left and to the right of the "/",
respectively). Branch lengths of less than 0.001 substitutions per site are not shown. Bootstrap support (500 replicates) is indicated where >50%.
[Figure 4 tree: PHYC.A clade (G. herbaceum PHYC, G. hirsutum PHYC.A, G. barbadense PHYC.A) and PHYC.D clade (G. raimondii PHYC, G. hirsutum PHYC.D, G. barbadense PHYC.D); scale bar 0.005 substitutions/site]
Figure 4 Unrooted NJ tree of Gossypium spp. PHYC-related genes based on a 1022 bp consensus alignment of amplification products from
primers PHYdeg-F and PHYC_1R_DFCI. Distances (uncorrected "p") and most parsimonious numbers of nucleotide changes are indicated for each
branch (to the left and to the right of the "/", respectively). Bootstrap support (500 replicates) is indicated where >50%.
For comparison with the other phytochromes, we also
analyzed a portion of the PHYC alignment corresponding
to the hinge region only. This alignment was 296 nucleotides in length, with a pairwise sequence similarity of
99.0%, 290 sites (98.0%) identical across all taxa, with no
indels. Although it encompassed fewer variable nucleotides, NJ analysis of the hinge region alone could be used
to differentiate the PHYC.A and PHYC.D clades (100%
bootstrap support) and to infer the composition and evolutionary inheritance of the PHYC gene family in cottons
(data not shown).
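Bootstrap support, reported for the clades in these trees, is the percentage of resampled alignments (columns drawn with replacement) that recover a clade. For exactly four taxa, NJ selects the pairing whose two within-pair distances have the smallest sum, so the bootstrap can be sketched without a full tree builder. The Python below does that on toy sequences; the taxon labels, sequences and function names are hypothetical, and with more than four taxa a full NJ tree must be rebuilt for every replicate.

```python
import random


def differences(a, b):
    """Mismatch count; all sequences have equal length, so counts rank pairs exactly like p."""
    return sum(x != y for x, y in zip(a, b))


def quartet_split(seqs):
    """Split of four taxa chosen by NJ: the pairing with the smallest within-pair distance sum.

    Returns a frozenset of two frozensets, or None when the best pairings tie.
    """
    w, x, y, z = sorted(seqs)

    def d(p, q):
        return differences(seqs[p], seqs[q])

    candidates = [
        (d(w, x) + d(y, z), frozenset({frozenset({w, x}), frozenset({y, z})})),
        (d(w, y) + d(x, z), frozenset({frozenset({w, y}), frozenset({x, z})})),
        (d(w, z) + d(x, y), frozenset({frozenset({w, z}), frozenset({x, y})})),
    ]
    best = min(score for score, _ in candidates)
    winners = [split for score, split in candidates if score == best]
    return winners[0] if len(winners) == 1 else None


def bootstrap_support(seqs, split, replicates=500, seed=1):
    """Percentage of column-resampled replicates in which `split` is recovered."""
    rng = random.Random(seed)
    names = sorted(seqs)
    length = len(seqs[names[0]])
    hits = 0
    for _ in range(replicates):
        columns = [rng.randrange(length) for _ in range(length)]
        replicate = {name: "".join(seqs[name][c] for c in columns) for name in names}
        hits += quartet_split(replicate) == split
    return 100 * hits / replicates


# Toy alignment (hypothetical sequences): two A-genome and two D-genome homeologs.
seqs = {
    "hirsutum_A": "ACGTACGTACGTACGTACGT",
    "barbadense_A": "ACGTACGTACGTACGTACGA",
    "hirsutum_D": "GTACGTGTACGTACGTACGT",
    "barbadense_D": "GTACGTGTACGTACGTACGC",
}
a_vs_d = frozenset({
    frozenset({"hirsutum_A", "barbadense_A"}),
    frozenset({"hirsutum_D", "barbadense_D"}),
})
print(quartet_split(seqs) == a_vs_d)  # True
print(bootstrap_support(seqs, a_vs_d))
```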
Our failure to obtain PHYC hinge amplification with
several sets of both universal (e.g. PHYdeg-F/PHYdeg-R)
and rosid-specific primers was likely due to substantial
nucleotide differentiation in PHYC, particularly within
the hinge region. For example, the 24 nt long PHYdeg-R
primer had six nucleotide mismatches with the cotton
PHYC genes, including three transitions and three transversions. Five of the six mismatches occurred at what are
considered to be invariant (i.e., non-degenerate) nucleotide positions. It should be noted that these divergent
nucleotides in the conserved primer-binding site did not
alter the amino acid sequence (PFPLRYAC).
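The mismatch tally above (six mismatches, three transitions and three transversions, five at non-degenerate positions) comes from the kind of comparison sketched below in Python. The function names are hypothetical, and because the cotton PHYC primer-binding sequence is not reproduced in this section, the example uses toy sequences. Treating inosine as matching any base is a simplifying assumption. A mismatch at a degenerate position is given a substitution class only when every base the symbol allows yields the same class.

```python
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
    "I": "ACGT",  # assumption: inosine treated as pairing with any base
}
PURINES = set("AG")


def substitution_type(x, y):
    """Transition (purine-purine or pyrimidine-pyrimidine) or transversion."""
    return "transition" if (x in PURINES) == (y in PURINES) else "transversion"


def primer_mismatches(primer, site):
    """Compare a degenerate primer with a template site written in the same 5'-3' orientation.

    For a reverse primer, `site` is the reverse complement of the sense-strand sequence.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    report = []
    for position, (p, t) in enumerate(zip(primer.upper(), site.upper()), start=1):
        allowed = IUPAC_BASES[p]
        if t in allowed:
            continue
        kinds = {substitution_type(b, t) for b in allowed}
        report.append({
            "position": position,
            "primer": p,
            "template": t,
            "degenerate_position": len(allowed) > 1,
            "type": kinds.pop() if len(kinds) == 1 else "ambiguous",
        })
    return report


for mismatch in primer_mismatches("ACGRTA", "ACGCTG"):  # hypothetical toy sequences
    print(mismatch)
```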
The PHYE gene sub-family
PHYE hinge region consensus contigs from our study
taxa formed a 270 bp alignment with an average pairwise
similarity of 98.9%, with 264 (97.8%) invariant sites, no
indels, and no stop codons in any taxa. The consensus of
the aligned PHYE sequences had 80% nucleotide similarity to the corresponding fragment of the Arabidopsis
PHYE gene. Based on maximum parsimony, nucleotide
diversity in the cotton PHYE hinge sequences could be
explained by a minimum of six nucleotide changes, all of
which were synonymous. NJ analysis of the cotton PHYE
hinge region showed two distinct clades (97% bootstrap
support) corresponding to the A- and D-genome derived
orthologs (designated PHYE.A and PHYE.D), a finding
consistent with a hypothesis in which each diploid ancestor contributed a single PHYE ortholog to the allotetraploid lineage (Figure 5). Interestingly, while two distinct
PHYE contigs were obtained from G. hirsutum, only a
single contig, which grouped with the D-genome clade,
was obtained from G. barbadense. Available EST
sequences indicated that at least one PHYE locus is
expressed in G. hirsutum (Additional file 1).
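Figures 5 and 6 are neighbor-joining (NJ) trees. The sketch below shows how NJ builds an unrooted tree from a pairwise distance matrix, using the standard algorithm of Saitou and Nei. It is not the software used in the study. The function name is hypothetical, and the five-taxon additive matrix is a common textbook example rather than cotton data.

```python
def neighbor_joining(names, dist):
    """Build an unrooted neighbor-joining tree.

    names: taxon labels; dist: symmetric matrix (list of lists) with zeros on
    the diagonal. Returns (node, child, branch_length) edges. Internal nodes
    are named node0, node1, ... Negative branch lengths are not clipped.
    """
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    active = list(names)
    edges = []
    next_id = 0
    while len(active) > 2:
        n = len(active)
        # Net divergence of each active node from all other active nodes.
        r = {i: sum(d[i][k] for k in active) for i in active}
        # Join the pair minimising the Q-criterion (n - 2) * d(i, j) - r(i) - r(j).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = active[x], active[y]
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        u = f"node{next_id}"
        next_id += 1
        # Branch lengths from the new internal node u to i and to j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        edges += [(u, i, li), (u, j, lj)]
        # Distances from u to every remaining active node.
        d[u] = {u: 0.0}
        for k in active:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        active = [k for k in active if k not in (i, j)] + [u]
    a, b = active
    edges.append((a, b, d[a][b]))
    return edges


taxa = ["a", "b", "c", "d", "e"]
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
for parent, child, length in neighbor_joining(taxa, D):
    print(parent, child, length)
```

Because these distances are additive, NJ recovers the generating tree exactly. The terminal branches are a = 2, b = 3, c = 4, d = 2 and e = 1, and the internal branches are 3 and 2. With real sequence data, the input matrix would hold p-distances (as in Figure 5) or K-2P distances (as in Figure 6). Support for clades such as PHYE.A and PHYE.D would then come from bootstrap resampling of alignment columns, which this sketch does not do.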
A global hinge-based alignment of Arabidopsis and cotton phytochromes
PHYA, PHYB, PHYC and PHYE hinge regions from Arabidopsis and Gossypium spp. were aligned to create a
global phytochrome alignment 358 nucleotides in length,
with an average pairwise similarity of 69.4% and 123 identical sites (34.4%). The gene phylogeny generated from
this alignment (Figure 6) reflected divergence of PHYA,
PHYB, PHYC and PHYE as a result of speciation (nodes
1A, 1B, 1C and 1E, respectively) and gene duplication
(nodes 2 and 3). The level of nucleotide divergence of
each of the gene sub-families after nodes 1A, 1B, 1C and
1E (Kimura 2-parameter distances) was similar, with a
mean of 0.297 ± 0.21 nucleotide substitutions per site.
However, the synonymous (KS) and non-synonymous (KA) substitution rates were both significantly more variable among the gene sub-families defined by nodes 1A, 1B, 1C and 1E than were simple nucleotide distances (Table 2). Despite this variation, all sub-families showed a KA/KS ratio below 0.1, implying that each remains under purifying selection for function. Further, the excessively long branches often found in pseudogenes were not observed. In the PHYB, PHYC and
PHYE clades, the branch lengths leading to the Arabidopsis orthologs, which have known biological functions,
were longer than the branches leading to their respective
cotton orthologs. Considered together, these lines of evidence indicate that each of the phytochrome sub-families
retains some biological function in Gossypium, as they do
in Arabidopsis [14-16,18-31]. Further, our topology supports the conclusion that PHYD is the result of a relatively
recent gene duplication that may be exclusive to the Brassicaceae family [16].
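The Kimura 2-parameter (K-2P) distances used here and in Figure 6 correct the raw proportion of differing sites for multiple substitutions. They treat the proportions of transitional (P) and transversional (Q) differences separately: d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). Below is a minimal sketch of that formula. It assumes gaps and ambiguity codes are skipped, which the study's software may handle differently. The name `k2p_distance` is hypothetical.

```python
import math

PURINES = set("AG")


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        sites += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine<->pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites    # P: proportion of transitional differences
    q = transversions / sites  # Q: proportion of transversional differences
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("distance undefined: sequences too divergent (saturation)")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))  # one transition: 0.1116
print(round(k2p_distance("AAAAAAAAAA", "CAAAAAAAAA"), 4))  # one transversion: 0.1085
```

In both examples the uncorrected p-distance (the measure used in Figure 5) is 0.1. The K-2P correction raises each value by an amount that depends on the substitution type.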
Discussion
Resolution of the phytochrome gene family
In three out of four cases, we were able to successfully
resolve the inventory and evolutionary relationships of
the phytochrome genes in diploid and allotetraploid cottons using the hinge region only. This finding supports
the general utility of employing the hinge region for identifying GSTs for phytochromes. In only one case (PHYB)
was additional gene sequence required for sufficient phylogenetic resolution. In another case (PHYC), nucleotide
divergence at a commonly used primer-binding site prevented the characterization of the hinge region by the
typical strategy of using primers based on conserved
flanking peptides HYPATDIP and PFPLRYAC. However,
nucleotide diversity within the PHYC hinge region itself
was sufficiently informative to resolve the pattern of evolutionary inheritance through the allotetraploidization event.
The sequencing of phytochrome gene fragments from
A- and D-genome diploids, as well as from AD allotetraploid taxa, provides an essential foundation for all subsequent analysis of phytochrome function and evolution in
Gossypium. The sequenced fragments provide sufficient
information (at least two diagnostic nucleotide characters) to unequivocally identify or 'tag' various orthologs,
homeologs and paralogs, as well as to monitor their patterns of nucleotide divergence and trace their evolutionary inheritance through the allopolyploidization event. This information will serve as a foundation for further sequence assembly and annotation, and will be used to design locus-specific primer sets for quantitative RT-PCR assays that will measure transcript levels for each gene family member. In some cases (e.g. PHYA1 vs. PHYA2), levels of sequence divergence are high enough to support studies of gene function using RNAi or amiRNA approaches to create gene-specific knockouts [69]. The use of well-characterized 'candidate genes' of agronomic interest is becoming an integral component of marker-assisted selection efforts in plants [70]. Several SNP-based molecular markers [71,72] are now being developed using the diagnostic nucleotide characters identified in this study, and are being mapped in experimental cotton populations that show segregation of phytochrome-controlled traits such as fiber length and flowering time.

Figure 5 Unrooted NJ tree of Gossypium spp. PHYE-related genes based on a 270 bp consensus alignment of amplification products from the hinge region. Distances (uncorrected "p") and most parsimonious number of nucleotide changes are indicated for each branch. Branch lengths of less than 0.001 substitutions per site are not indicated. Bootstrap support (500 replicates) is indicated where >50%.

Table 2: Nucleotide divergence in phytochrome genes in comparisons of Arabidopsis and cotton

Node      K-2P    S Dif   KS      NS Dif   KA      KA/KS
Node 1A   0.291   46.5    1.82    27.4     0.123   0.068
Node 1B   0.296   36.5    1.00    17.5     0.086   0.090
Node 1C   0.274   41.0    1.55    30.3     0.147   0.095
Node 1E   0.326   49.0    >2.0    23.0     0.122   <0.061
Node 3    0.094   17.0    0.309   11.8     0.050   0.163

Nodes refer to the NJ tree in Figure 6. K-2P indicates the mean Kimura 2-parameter distance between Arabidopsis and cotton gene sequences; S Dif and NS Dif are the numbers of synonymous and non-synonymous differences.
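As a consistency check on Table 2, the KA/KS column can be recomputed from the KA and KS columns. The sketch below simply transcribes the table. It assumes the ">2.0" KS entry for node 1E is treated as 2.0, so that node's ratio is an upper bound, matching the "<0.061" in the table. Recomputed ratios round to 0.068, 0.086, 0.095, 0.061 and 0.162. The small differences from the tabled values (0.090 for node 1B, 0.163 for node 3) presumably reflect rounding of the published KA and KS values.

```python
from statistics import mean

# Table 2 values: node -> (K-2P, KS, KA).
TABLE2 = {
    "1A": (0.291, 1.82, 0.123),
    "1B": (0.296, 1.00, 0.086),
    "1C": (0.274, 1.55, 0.147),
    "1E": (0.326, 2.0, 0.122),  # KS reported as ">2.0"; 2.0 is a lower bound
    "3": (0.094, 0.309, 0.050),
}
SPECIATION_NODES = ["1A", "1B", "1C", "1E"]  # Arabidopsis vs. cotton splits

# KA/KS per node; for 1E this is an upper bound.
ratios = {node: ka / ks for node, (_, ks, ka) in TABLE2.items()}
for node, ratio in ratios.items():
    print(f"node {node}: KA/KS = {ratio:.3f}")

# Mean K-2P distance across the four speciation nodes (~0.297 in the text).
mean_k2p = mean(TABLE2[n][0] for n in SPECIATION_NODES)
print(f"mean K-2P over speciation nodes: {mean_k2p:.3f}")

# Purifying-selection criterion used in the text: KA/KS < 0.1 for every sub-family.
print("all speciation-node KA/KS < 0.1:", all(ratios[n] < 0.1 for n in SPECIATION_NODES))
```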
The ancestral phytochrome gene complement of the Malvales and Brassicales
Our study indicated that the diploid ancestors to the
world's major fiber crops (G. hirsutum and G. barbadense) had a complement of phytochrome apoprotein
genes that was very similar to that of the model plant
Arabidopsis thaliana. This was not entirely unexpected
given the relatively close phylogenetic relationship of the
two lineages [64,65]. The simplest evolutionary scenario is that the last common ancestor of Arabidopsis and
cotton, possibly an arborescent species in the late Cretaceous period [65], had a phytochrome gene complement
consisting of one functional gene in each of the PHYA,
PHYB/D, PHYC and PHYE subfamilies.
PHYA duplication in Gossypium
After the divergence of the Malvales and Brassicales, the
ancestral PHYA gene underwent duplication resulting in
the observed PHYA-1 and PHYA-2 paralogs of modern
Gossypium spp. As the A- and D-genome diploids have
both paralogs, the duplication event occurred prior to the
divergence of the A- and D-genome lineages. Using 85 MYA (range 68 MYA to 96 MYA) as a rough estimate of the time of divergence of the Malvales and Brassicales [64,73], along with our observed KS of 1.82 in the PHYA hinge region over this interval (Table 2, node 1A), we can derive a crude estimate of 0.011 substitutions per synonymous site per million years (KS/2T, because substitutions accumulate along both diverging lineages). Applying this rate to the KS of 0.309 between the PHYA paralogs (Table 2, node 3) yields an estimate of ~14 MYA for the time of PHYA duplication. This estimate places the duplication well
within the crown group of Malvales and the Malvaceae
family [65]. Given our time estimate, the PHYA duplication may be exclusive to the genus Gossypium, but would
have occurred prior to the estimated time of divergence
of the A and D genome groups [62]. As neither we nor
others [58,62,74] have observed evidence of additional
nuclear gene duplications or chromosomal duplications
in this time period, the PHYA event was likely a tandem or segmental duplication rather than a whole-genome duplication.
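The arithmetic behind the ~14 MYA estimate can be sketched as follows. The sketch assumes two things, both consistent with the numbers in the text: KS accumulates along both diverging lineages (rate = KS / 2T), and the paralog divergence is the node 3 KS of 0.309 from Table 2. The function names are hypothetical. The loop also propagates the 68 to 96 MYA range for the Malvales/Brassicales split.

```python
def synonymous_rate(ks: float, split_myr: float) -> float:
    """Substitutions per synonymous site per lineage per million years.

    Two lineages diverge from a common ancestor, so KS accumulates along
    both branches: rate = KS / (2 * T).
    """
    return ks / (2 * split_myr)


def divergence_time(ks: float, rate: float) -> float:
    """Time (MYA) since two sequences diverged, given their KS and a rate."""
    return ks / (2 * rate)


KS_SPECIATION = 1.82    # Arabidopsis vs. cotton PHYA (Table 2, node 1A)
KS_DUPLICATION = 0.309  # PHYA1 vs. PHYA2 (Table 2, node 3)

for split in (68, 85, 96):  # Malvales/Brassicales split (MYA): range and point estimate
    rate = synonymous_rate(KS_SPECIATION, split)
    t_dup = divergence_time(KS_DUPLICATION, rate)
    print(f"split {split} MYA: rate {rate:.4f}, PHYA duplication ~{t_dup:.1f} MYA")
```

With the 85 MYA point estimate, this gives a rate of about 0.0107 (the 0.011 quoted above) and a duplication time of about 14.4 MYA. Across the 68 to 96 MYA range, the estimate spans roughly 11.5 to 16.3 MYA. These figures carry all the caveats of a single-locus, saturation-prone KS clock.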
After a gene duplication event, one of the two newly
duplicated genes is theoretically unconstrained by selection for function, and is thus free to accumulate mutations leading to a pseudogene fate, subfunctionalization,
or neofunctionalization [75-80]. Although we did not
obtain definitive evidence of pseudogenic sequences in
any of the phytochromes or taxa studied (e.g. no stop
codons or frameshift mutations), we did observe significant variation in KA/Ks ratios in pairwise interspecific
comparisons (discussed below), leaving open the possibility of pseudogene outcomes. Alternatively, one of the
duplicated genes may undergo positive selection to gain a
novel function (neofunctionalization). Further, duplicated gene pairs may subdivide the function of the ancestral gene (subfunctionalization). Perhaps the most intriguing fate, which has been observed empirically but not yet explained in theory, is the situation in which both gene copies are retained for a lengthy period under what appears to be purifying (negative) selection [79,80]. One
approach to understanding the evolutionary fates of
duplicated genes is through an analysis of the signature of
natural selection on amino acid encoding sequences.
Although the hinge regions of phytochromes display
relatively high levels of nucleotide diversity [81], they do
not evolve under neutrality. The hinge region participates
in inter-domain communication in phytochrome molecules [82]. For example, phosphorylation of a serine residue in the PHYA hinge likely plays a role in regulating protein-protein interactions between phytochrome and downstream signal-transducing molecules [83]. Compared with the wild type, Arabidopsis PHYB carrying a mutation in the hinge region is deficient in localization to distinct nuclear bodies [84]. Further, a single nucleotide polymorphism (SNP) in the hinge of one of two PHYB genes in aspen (Populus tremula, Salicaceae) was associated with natural geographic variation in the timing of bud-set [85].

Figure 6 Unrooted NJ tree of phytochrome genes of A. thaliana and cottons (Gossypium spp.) based on a 358 bp consensus alignment of amplification products from the hinge regions. Kimura two-parameter distances are shown for each branch. All internal branches had 100% bootstrap support (500 replicates). 1A, 1B, 1C and 1E denote gene divergence events likely resulting from speciation. 2 and 3 denote gene divergence events likely resulting from gene duplication. All Gossypium phytochrome genes are included within clusters (indicated by ovals).
In comparisons between cotton and Arabidopsis (Table 2), the KA/KS ratio for the PHYA hinge region was 0.068, a value typical of genes under purifying selection [86]. In contrast, the KA/KS ratio for PHYA after gene duplication (node 3) was 0.163, or ~2.4-fold higher. This value is also ~2.1-fold greater than the mean KA/KS ratio of all phytochrome hinge regions (nodes 1A, 1B, 1C and 1E in Figure 6), approximately 0.079 ± 0.014. This significantly elevated KA/KS ratio after the PHYA duplication could be attributed to a relaxation of stabilizing selection and/or subfunctionalization of the nascent PHYA paralogs (these two alternative possibilities are remarkably difficult to distinguish on the basis of sequence information alone).
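The fold changes and the mean ± SD quoted above can be reproduced from the Table 2 ratios. The sketch assumes the "<0.061" entry for node 1E is taken at face value as 0.061. Under that assumption, the population standard deviation reproduces the reported 0.014, while the sample standard deviation would be about 0.017.

```python
from statistics import mean, pstdev, stdev

# KA/KS ratios at the speciation nodes 1A, 1B, 1C and 1E (Table 2).
# Node 1E is reported as "<0.061"; 0.061 is used here as an upper bound.
speciation_ratios = [0.068, 0.090, 0.095, 0.061]
phya_node_1a = 0.068
after_duplication = 0.163  # node 3: PHYA1 vs. PHYA2

avg = mean(speciation_ratios)
print(f"fold change vs. PHYA (node 1A): {after_duplication / phya_node_1a:.1f}")  # 2.4
print(f"mean KA/KS: {avg:.4f} ± {pstdev(speciation_ratios):.3f} (population SD)")  # 0.0785 ± 0.014
print(f"sample SD instead: {stdev(speciation_ratios):.3f}")  # 0.017
print(f"fold change vs. mean: {after_duplication / avg:.1f}")  # 2.1
```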
The possible functional divergence of PHYA1 and
PHYA2 may be more pronounced after the separation of
the A- and D-genome lineages (Table 3). A comparison of
PHYA2 in the two diploids yields a KA/Ks ratio of ~8.2,
primarily due to amino acid substitutions in PHYA2.D,
while PHYA1 has a KA/Ks ratio of 0.000 in the same taxonomic comparisons. Although this difference is suggestive of possible differential rates of functional evolution in
the paralogs, it is not statistically significant in Fisher's exact test (P = 0.2485). It will be of interest to determine whether the cotton PHYA paralogs have distinct functions. Experiments are underway to determine the respective biological functions of PHYA-1 and PHYA-2 in G. hirsutum and G. barbadense using paralog-specific RT-PCR, RNAi gene knockout, and tests for genetic associations between phytochrome-controlled phenotypic traits and PHYA-1- and PHYA-2-specific molecular markers. A 'candidate gene' approach has recently been used in soybean (Glycine max) to uncover a genetic linkage between the photoperiod insensitivity locus E4 and one of the two PHYA genes, designated
GmphyA1 and GmphyA2 [57]. Loss of photoperiodic
the major deleterious phenotypic effects that would have
been caused by complete deficiency of PHYA gene function.
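The P value reported above (P = 0.2485) comes from Fisher's exact test. The underlying counts are not given in this section, so the sketch below does not reproduce that value. It only shows how a two-sided Fisher's exact test on a 2×2 table can be computed. One plausible layout, which is an assumption here, would be synonymous vs. non-synonymous differences in each paralog. The sketch follows the common convention of summing the probabilities of all tables with the same margins that are no more likely than the observed one. The function name and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are included.
    return sum(
        p for p in (table_prob(x) for x in range(lo, hi + 1))
        if p <= p_observed * (1 + 1e-7)
    )


print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

Exact tests like this one have little power with the handful of substitutions found in a short hinge fragment. That is consistent with a striking KA/KS contrast still failing to reach significance.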
Persistence and loss of phytochrome paralogs after allopolyploidization
All phytochromes underwent gene duplication by polyploidization at the time of formation of the AD allotetraploids, on the order of 0.5-2.0 MYA [59,61,63,87]. For
example, in G. hirsutum, we detected a minimum set of
ten distinct phytochrome genes, including four PHYA
genes. In order to assess the evolutionary trajectory of
these recently duplicated genes, we examined the synonymous and non-synonymous divergence rates of A- and D-
Table 3: Nucleotide divergence in phytochrome genes in comparisons of A- and D-genome derived homeologs in diploid and allotetraploid cottons.
Sequence
Comparison
PHYA1 Hinge
Ks
2.25