<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>1471-2164-10-399.fm</title>
<meta name="Author" content="hjgy"/>
<meta name="Creator" content="FrameMaker 7.1"/>
<meta name="Producer" content="Acrobat Distiller 7.0 (Windows)"/>
<meta name="CreationDate" content=""/>
</head>
<body>
<pre>
BMC Genomics
BioMed Central
Open Access
Research article
Comparative 454 pyrosequencing of transcripts from two olive
genotypes during fruit development
Fiammetta Alagna1, Nunzio D'Agostino2, Laura Torchia3, Maurizio Servili4,
Rosa Rao2, Marco Pietrella5, Giovanni Giuliano5, Maria Luisa Chiusano2,
Luciana Baldoni*1 and Gaetano Perrotta*3
Address: 1CNR – Institute of Plant Genetics, Via Madonna Alta 130, 06128 Perugia, Italy, 2Department of Soil, Plant, Environmental and Animal
Production Sciences, University of Naples 'Federico II', Via Università 100, 80055 Portici, Italy, 3ENEA, TRISAIA Research Center, S.S. 106 Ionica,
75026 Rotondella (Matera), Italy, 4Department of Economical and Food Science, University of Perugia, Via S. Costanzo, 06126 Perugia, Italy and
5ENEA, Research Center CASACCIA, S.M. Galeria 00163, Rome, Italy
Email: Fiammetta Alagna - fiammetta_a@hotmail.com; Nunzio D'Agostino - nunzio.dagostino@gmail.com;
Laura Torchia - laura.torchia@enea.it; Maurizio Servili - servimau@unipg.it; Rosa Rao - rosa.rao@unina.it;
Marco Pietrella - marco.pietrella@enea.it; Giovanni Giuliano - giovanni.giuliano@enea.it; Maria Luisa Chiusano - chiusano@unina.it;
Luciana Baldoni* - luciana.baldoni@igv.cnr.it; Gaetano Perrotta* - gaetano.perrotta@enea.it
* Corresponding authors
Published: 26 August 2009
BMC Genomics 2009, 10:399
doi:10.1186/1471-2164-10-399
Received: 15 April 2009
Accepted: 26 August 2009
This article is available from: http://www.biomedcentral.com/1471-2164/10/399
© 2009 Alagna et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Background: Despite its primary economic importance, genomic information on olive tree is still
lacking. 454 pyrosequencing was used to enrich the very few sequence data currently available for
the Olea europaea species and to identify genes involved in expression of fruit quality traits.
Results: Fruits of Coratina, a widely cultivated variety characterized by a very high phenolic
content, and Tendellone, an oleuropein-lacking natural variant, were used as starting material for
monitoring the transcriptome. Four different cDNA libraries were sequenced, respectively at the
beginning and at the end of drupe development. A total of 261,485 reads were obtained, for an
output of about 58 Mb. Raw sequence data were processed using a four step pipeline procedure
and data were stored in a relational database with a web interface.
Conclusion: Massively parallel sequencing of different fruit cDNA collections has provided large
scale information about the structure and putative function of gene transcripts accumulated during
fruit development. Comparative transcript profiling allowed the identification of differentially
expressed genes with potential relevance in regulating the fruit metabolism and phenolic content
during ripening.
Background
Better knowledge of gene composition
and expression is essential to investigate the molecular
basis of fruit ripening and to define the gene pool
involved in lipid and phenol metabolism in an oil crop
species such as olive, which is characterized by a peculiar fatty acid and
antioxidant composition.
The availability of complete genome sequences and large
sets of expressed sequence tags (ESTs) from several plants
has recently triggered the development of efficient and
informative methods for large-scale and genome-wide
analysis of genetic variation and gene expression patterns.
The ability to monitor simultaneously the expression of a
large set of genes is one of the most important objectives
of genome sequencing efforts. In this respect, the 454
pyrosequencing technology [1] is a rather novel method
for high-throughput DNA sequencing, allowing gene discovery and parallel efficient and quantitative analysis of
expression patterns in cells, tissues and organs.
In the past few years, several studies based on comparative
high throughput sequencing of plant transcriptomes
have, indeed, allowed the identification of new gene functions, contaminant sequences from other organisms,
alterations of gene expression in response to genotype, tissue or physiological changes, as well as large scale discovery of SNPs (Single Nucleotide Polymorphisms) in a
number of model and non-model species, such as maize,
grapevine and eucalyptus [2-5].
Olive is the sixth most important oil crop in the world,
presently spreading from the Mediterranean region of origin to new production areas, due to the beneficial nutritional properties of olive oil and to its high economic
value.
It belongs to the family Oleaceae, order Lamiales,
which includes about 10 families for a total of about
11,000 species. Members of this order are important
sources of fragrances, essential oils and phenolics credited with numerous health benefits, and provide valuable
commercial products, such as wood or ornamentals.
Information on the genome sequence and transcript profiles of the entire clade is completely lacking.
Olive is a diploid species (2n = 2x = 46), predominantly
allogamous, with a genome size of about 1,800 Mb [6,7].
In spite of its economic importance and metabolic peculiarities, very few data are available on gene sequences
controlling the main metabolic pathways.
Olive accumulates oil mainly in the drupe mesocarp and
its content can reach up to 28–30% of total mesocarp
fresh weight. Olive oil shows a peculiar acyl composition,
particularly enriched in the monounsaturated fatty acid
oleate (C18:1), deriving from the desaturation of stearate.
Oleate can reach percentages up to 75–80% of total fatty
acids, while linoleate (C18:2), palmitate (C16:0), stearate
(C18:0) and linolenate (C18:3) represent minor components. The final acyl composition of olive oil varies enormously among varieties. Environmental factors, such as
temperature and light during fruit ripening, can deeply
influence the balance between saturated and unsaturated
fatty acids [8].
The chemistry of phenolic oleosides is attracting
increasing interest from pharmacological research and agri-food biotechnology, and the biochemical pathway leading to their biosynthesis and regulation has recently been
evaluated in depth [9], even though its genetic control
remains completely unknown.
Secoiridoids represent the most important class of phenolics and they range from simple structures, like tyrosol and
hydroxytyrosol, to quantitatively more important conjugated forms like oleuropein, demethyloleuropein, 3,4-DHPEA-EDA and ligstroside [10]. Oleuropein is the
main secoiridoid, representing up to 82% of total
biophenols; it is known as the bitter principle of olives and is
responsible for major effects on human health and for
releasing phytoalexins against plant pathogens [10].
Another secoiridoid with relevant health functions is oleocanthal (deacetoxy ligstroside aglycone) [11].
Developing olives contain active chloroplasts capable of
photosynthesis, thus representing significant sources of
photoassimilates. While chlorophyll is localized mostly
in the epicarp, the mesocarp contains significant amounts
of other photosynthetic pathway components, such as
phosphoenol pyruvate carboxylase [12].
Olive fruit development and ripening take place in
about 4–5 months and include the following phases: i)
fruit set after fertilization, ii) seed development, iii) pit
hardening, iv) mesocarp development and v) ripening.
During the ripening process, fruit tissues undergo physiological and biochemical changes that include cell division
and expansion, oil accumulation, metabolite storage, softening, phenol degradation and colour change (due to
anthocyanin accumulation in outer mesocarp cells). Oil
synthesis starts after pit hardening, reaching a plateau
after 75–90 days, while the phenolic fraction is maximum
at fruit set and decreases rapidly during fruit development.
This work is aimed at defining the transcriptome of olive
drupes and at identifying ESTs involved in phenolic and
lipid metabolism during fruit development. Drupes from
two cultivars have been used: a widely cultivated variety
characterized by a very high phenolic content, and an
oleuropein-lacking natural variant; two developmental
stages, at completed fruit set and at mesocarp development, representing diverse sets of expressed genes, were
analyzed using 454 pyrosequencing.
Results
Sequencing output
The starting materials to explore the olive fruit transcriptome were fruit pools from two Olea europaea cultivars,
Coratina and Tendellone (C and T), showing striking differences in their biophenol accumulation pattern (Figures 1 and 2).
C is cultivated in the Apulia region and represents the
most widely cultivated variety of Italy, while T is a minor
cultivar locally spread in Central Italy. A previous SSR
analysis reported a very high genetic distance between
them [13]. These cultivars also differ markedly in their
oleuropein concentration (272.9 mg g-1 dw in C, decreasing strongly during fruit ripening, and 0.3 mg g-1 dw in T,
decreasing only slightly during ripening). In contrast, the
content of the 3,4-DHPEA-EDA intermediate compound
was similar between genotypes (data not shown).
Figure 1
Plant material. Fruit mesocarp and epicarp of cvs. Coratina (high phenol content)
and Tendellone (low phenol content) at 45 DAF (completed fruit set) and
135 DAF (mesocarp development). DAF: days after flowering.
Four enriched full-length ds-cDNA collections (see Methods) were obtained and their 454 pyrosequencing provided a total of 261,485 sequence reads, corresponding to
58.08 Mb, with an average read length of 217–224 nt,
depending on the cDNA sample (Table 1). The 4-step procedure adopted by the ParPEST pipeline to process the
454 EST reads is summarised in Figure 3.
ESTs were masked to eliminate sequence regions that
would cause incorrect clustering. Targets for masking
include simple sequence repeats (SSR, also referred to as
microsatellites), low complexity sequences (including
poly-A tails) and other DNA repeats. The number of ESTs
masked for each category, as well as the total nucleotides
that were masked, are shown in Table 2.
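As an illustration of the masking step, the following Python sketch replaces poly-A/poly-T runs and simple sequence repeats with N and reports how many nucleotides were masked, in the spirit of the bracketed counts of Table 2. It is not the software used in this work: the function name mask_est and both thresholds are assumptions chosen for the example, and matching against RepBase is not attempted.

```python
import re

# Hypothetical thresholds, chosen only for this example.
MIN_HOMOPOLYMER = 10   # shortest poly-A or poly-T run that is masked
MIN_SSR_COPIES = 5     # fewest tandem copies of a 2-6 nt motif that is masked

POLY_AT = re.compile(r"A{%d,}|T{%d,}" % (MIN_HOMOPOLYMER, MIN_HOMOPOLYMER))
# A 2-6 nt motif followed by at least (MIN_SSR_COPIES - 1) further copies.
SSR = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (MIN_SSR_COPIES - 1))


def mask_est(seq):
    """Return (masked sequence, number of nucleotides replaced by N)."""
    seq = seq.upper()
    masked_positions = set()
    for pattern in (POLY_AT, SSR):
        for match in pattern.finditer(seq):
            masked_positions.update(range(match.start(), match.end()))
    masked = "".join(
        "N" if i in masked_positions else base for i, base in enumerate(seq)
    )
    return masked, len(masked_positions)
```

Masking with N rather than deleting the region keeps the read length and coordinates unchanged, so that later clustering ignores the repeat without shifting the rest of the sequence.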
The most frequent DNA repeats, identified using RepBase
as the filtering database, were ribosomal RNA (both SSU
and LSU); LTR retrotransposons from the BEL type family,
Gypsy and Copia; non-LTR retrotransposons from the
CR1 superfamily and, finally, a batch of retro pseudogenes (CYCLO, L10, L31, L32) (data not shown).
In order to assess EST redundancy in the whole collection
and provide a survey of the Olea europaea drupe transcriptome, masked EST sequences were pair-wise compared
and grouped into clusters, based on shared sequence similarity. As a consequence, the obtained clusters are ESTs
which are most likely products of the same gene. Each
cluster was then assembled into one or more tentative
consensus sequences (TCs), which were derived from
multiple EST alignments. As described in Methods, TCs
within a cluster shared at least 90% identity within a window of 100 nucleotides. Therefore, the presence of multiple TCs in a cluster could be due to possible alternative
transcripts, to paralogy or to domain sharing. In addition,
all the ESTs that, during the clustering/assembling process, did not meet the match criteria to be clustered/assembled with any other EST in the collection, were defined as
singleton ESTs. Together, TCs and singletons
are referred to as unique transcripts.
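The grouping rule described above can be sketched in Python as follows. This is a simplified illustration, not the ParPEST implementation: it assumes ungapped comparisons and single-linkage grouping, and the function names share_window and cluster are hypothetical. Only the 100-nucleotide window and the 90% identity threshold come from the text. Under these assumptions, reads shorter than the window always remain singletons.

```python
WINDOW = 100            # window length quoted in the text
MIN_IDENTITY_PCT = 90   # identity threshold quoted in the text


def share_window(a, b, window=WINDOW, min_pct=MIN_IDENTITY_PCT):
    """True if some ungapped overlap of a and b contains a stretch of
    `window` positions with at least `min_pct` percent identical bases."""
    needed = -(-window * min_pct // 100)  # ceiling of window * min_pct / 100
    for offset in range(window - len(b), len(a) - window + 1):
        # a[i] is compared with b[i - offset]
        start = max(0, offset)
        end = min(len(a), len(b) + offset)
        if end - start >= window:
            hits = [a[i] == b[i - offset] for i in range(start, end)]
            count = sum(hits[:window])
            if count >= needed:
                return True
            # Slide the window one position at a time along the overlap.
            for k in range(window, len(hits)):
                count += hits[k] - hits[k - window]
                if count >= needed:
                    return True
    return False


def cluster(seqs):
    """Single-linkage clusters of sequence indices (union-find)."""
    parent = list(range(len(seqs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if find(i) != find(j) and share_window(seqs[i], seqs[j]):
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Groups with a single member correspond to singleton ESTs. The all-against-all comparison is quadratic in the number of reads, so a collection of this size would need an indexed similarity search in practice.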
Figure 2
Changes in polyphenol content between cv. Coratina
and Tendellone in the course of fruit ripening. Arrows
indicate the dates of sample collection: at 45 and 135 DAF.
(Plot axes: x, days after flowering, 45 to 180; y, polyphenols in mg g-1 DW.)
The total number of clusters generated was 22,904. They
were assembled into 26,563 TCs comprising 185,913 EST
reads. The TC length ranged between 102 (min) and
4,916 (max) nucleotides, while the average TC length was 355
nucleotides. A total of 2,406 clusters were assembled into multiple TCs (from 2 to 14 per cluster). The total number of singleton ESTs (sESTs) was 75,570, with an average length of
179 nucleotides (Tables 3 and 4).
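The assembly figures can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the text and in Tables 3 and 4; the variable names are illustrative.

```python
tcs = 26_563            # tentative consensus sequences
singletons = 75_570     # singleton ESTs (sESTs)
ests_in_tcs = 185_913   # EST reads assembled into TCs

unique_transcripts = tcs + singletons
clustered_reads = ests_in_tcs + singletons
reduction_pct = 100 * (1 - unique_transcripts / clustered_reads)
mean_ests_per_tc = ests_in_tcs / tcs

print(unique_transcripts)            # 102133
print(clustered_reads)               # 261483
print(round(reduction_pct, 1))       # 60.9
print(round(mean_ests_per_tc, 1))    # 7.0
```

The sums reproduce the 102,133 unique transcripts of Table 3 and the reduction of about 61% quoted in the Discussion. The 261,483 reads accounted for here are two fewer than the 261,485 quality-passed reads of Table 1.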
Table 1: 454 sequencing raw data
SAMPLE               HQ READS    HQ BASES      AVERAGE LENGTH (bp)
Coratina 45 DAF      51,659      11,215,346    217.10
Coratina 135 DAF     61,488      13,769,788    223.94
Tendellone 45 DAF    71,112      15,963,353    224.48
Tendellone 135 DAF   77,226      17,127,266    221.78
Total                261,485     58,075,754    222.82
HQ READS: Quality passed reads, HQ BASES: Quality passed bases.
The analysis of the full EST collection from this work
revealed an average GC-content of 42.5%, ranging from
less than 16% to more than 63%.
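For reference, GC-content can be computed per sequence and then summarised over a collection, as in the Python sketch below. How the average was obtained here (mean of per-sequence values or pooled over all bases) is not stated, so the unweighted mean used in mean_gc is an assumption; both function names are illustrative.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        return 0.0  # nothing but ambiguous or masked positions
    return 100.0 * gc / (gc + at)


def mean_gc(seqs):
    """Unweighted mean of the per-sequence GC percentages."""
    values = [gc_content(s) for s in seqs]
    if not values:
        return 0.0
    return sum(values) / len(values)
```

Masked positions (N) are excluded from the denominator, so a heavily masked read does not appear artificially GC-poor.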
Database web interface
The OLEA EST database consists of a main relational database (MySQL) which collects raw as well as processed data
generated by ParPEST. This is supported by three local satellite databases: myENZYME, a local copy of the ENZYME
repository which was built by parsing the enzclass.txt and
the enzyme.dat files (release 04 Nov 2008) retrieved from
the ExPASy FTP site; myGO, a mirror of the Gene Ontology database, which was built by running the seqdblite
MySQL script (version 20081102) downloaded from the
GO database archives; myKEGG, which was built by parsing XML files of the KEGG pathways (release 21 October
2008), retrieved from the KEGG [14] FTP site. A PHP-based web application provides user-friendly data querying, browsing and visualization.
The web interface http://454reads.oleadb.it/ includes Java
tree-views for easy object navigation as well as the possibility to highlight on-the-fly the enzymes in the pathway
image files retrieved from the KEGG FTP site.
Figure 3
EST processing work-flow. (Diagram steps: 454 EST reads; repeat masking of simple repeats,
low complexity regions and RepBase targets; masked ESTs; clustering into clusters and singletons;
assembling into TCs; functional annotation from the BLAST report, using UniProtKB, myENZYME,
myKEGG and myGO.)
Functional annotation
In order to identify Olea unigenes coding for proteins with
a known function, we used a BlastX-based annotation that
provided 12,560 TCs with significant similarities to proteins in the UniProtKB database; the remaining 14,003
(52.7%) had no function assigned. A higher number of
sESTs with no function was obtained (58,835), representing 77.85% of the total.
When considering annotated TCs and sESTs with respect
to the origin of the protein data source, the bulk of the
identifications (73% – 75%) concerned proteins of plant
origin, as expected.
Table 2: Number of ESTs masked for each mask sequence category
Cultivar     Developmental stage        Simple repeats     Low complexity    Match in RepBase    Total
Coratina     45 Days After Flowering    2,683 (97,992)     683 (22,961)      1,781 (298,882)     5,147 (419,835)
Coratina     135 Days After Flowering   3,586 (129,807)    779 (23,895)      2,959 (450,297)     7,324 (603,999)
Tendellone   45 Days After Flowering    3,168 (113,243)    814 (25,082)      2,663 (468,078)     6,645 (606,403)
Tendellone   135 Days After Flowering   4,199 (146,670)    894 (28,066)      3,176 (559,319)     8,269 (734,055)
*The total number of nucleotides masked is given in brackets
TCs and sESTs coding for enzymes with an assigned EC
number were 5,040 and 5,864, respectively (Figure 4A),
following the ENZYME classification scheme http://www.expasy.org/enzyme/.
The majority of the enzyme-related unigenes encode transferases (3,982), hydrolases (2,628) and oxidoreductases (1,895). Of particular
relevance for fruit metabolism are those TCs and sESTs
involved in the biosynthesis of secondary metabolites
(761) and lipids (1,005) (Figure 4B, C). Most frequently,
the same enzymatic function is redundantly encoded by
several unigenes; this may be the result of different proteins referenced with the same EC number or the effect of
different transcripts encoding specific enzyme subunits.
Given the limited sequence length typically provided by
454 pyrosequencing, it is also plausible that in some cases
different TCs and sESTs cover non-matching fragments of
the enzyme transcript coding frame.
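The enzyme classes named above correspond to the first field of the EC number, so a tally per class can be sketched in Python as below. The identifiers and EC assignments in the example are hypothetical and the function name is illustrative; only the six top-level class names are taken from the EC nomenclature as it stood for the ENZYME release cited above.

```python
from collections import Counter

# Top-level EC classes keyed by the first field of the EC number.
EC_CLASSES = {
    "1": "oxidoreductases",
    "2": "transferases",
    "3": "hydrolases",
    "4": "lyases",
    "5": "isomerases",
    "6": "ligases",
}


def count_enzyme_classes(ec_by_unigene):
    """Count unigenes per top-level EC class.

    ec_by_unigene maps a unigene identifier to its EC number string.
    """
    counts = Counter()
    for ec_number in ec_by_unigene.values():
        first_field = ec_number.split(".")[0].strip()
        counts[EC_CLASSES.get(first_field, "unclassified")] += 1
    return counts


# Hypothetical identifiers and assignments, for illustration only.
example = {
    "TC00001": "2.3.1.74",
    "TC00002": "3.1.1.3",
    "sEST00001": "2.4.1.91",
    "sEST00002": "1.14.19.2",
}
print(count_enzyme_classes(example))
```

Because several unigenes can carry the same EC number, counts of this kind measure unigenes per class rather than distinct enzymes.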
Changes in transcript abundance
In principle, the higher the number of ESTs assembled in
a specific TC, the higher the number of mRNA molecules
encoding that particular gene in a given tissue sample.
However, differences in transcript abundance may reflect
sampling errors rather than genuine differences in gene
expression. Hence, in order to identify differentially
expressed genes in the four sequenced fruit cDNA collections, the statistical R test [15] was applied, as a measure
of the extent to which the observed differences in the gene
transcription among samples reflect their actual heterogeneity. Applying this test and further filtering criteria to
select differentially expressed TCs among the four sets (see
Methods), we selected 2,942 differentially expressed TCs,
1,627 of them with a predicted annotation and 1,315 with
no similarity with other sequences in the public databases
[see Additional file 1].
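A statistic of this kind can be sketched in Python as follows. The sketch assumes that the R test cited as [15] is the log-likelihood ratio commonly used to compare EST counts across several cDNA libraries, R = sum over libraries of x_i * ln(x_i / (N_i * f)), where x_i is the number of ESTs of a TC in library i, N_i the library size and f the pooled frequency of that TC. That correspondence, the use of the natural logarithm and the function name are assumptions; the library sizes are the quality-passed read counts of Table 1, and the further filtering criteria are not reproduced.

```python
import math

# Quality-passed reads per library, from Table 1 (C45, C135, T45, T135).
LIBRARY_SIZES = (51_659, 61_488, 71_112, 77_226)


def r_statistic(counts, library_sizes=LIBRARY_SIZES):
    """Log-likelihood ratio of the observed EST counts against the counts
    expected if the transcript were equally frequent in every library."""
    total = sum(counts)
    if total == 0:
        return 0.0
    frequency = total / sum(library_sizes)  # pooled frequency f
    r = 0.0
    for x, n in zip(counts, library_sizes):
        if x > 0:  # a zero count contributes nothing to the sum
            r += x * math.log(x / (n * frequency))
    return r


# Hypothetical EST counts of one TC in the four libraries.
print(r_statistic([40, 2, 35, 3]))
```

R is zero when the counts are exactly proportional to the library sizes and grows as they depart from that proportion, which is why library size must enter the comparison alongside the raw counts.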
Clustering of differentially expressed TCs distinguished
gene transcripts differentially expressed during fruit development from those differentially expressed between genotypes, and showed that the former were more numerous
than the latter. C was the genotype showing the largest
expression differences between the two stages (Figure 5A).
This result was confirmed by PCA, where transcripts from the second stage of C were significantly divergent from the remaining ones (Figure 5B).
Transcript differences affect several important physiological processes that promote fruit growth and development.
Transcripts identified as differentially expressed between
45 and 135 DAF in both genotypes and between genotypes, grouped in 13 categories on the basis of their predicted annotations, showed that different biological
processes are modulated at the molecular level by stringent
genetic and developmental signals (Figure 6). Transcripts involved in photosynthesis, in biosynthesis of
structural proteins (histones, aquaporins, ribosomal proteins, tubulins, pollen allergens), in terpenoid and flavonoid biosynthesis, in cell wall metabolism, in cellular
communication (hormone biosynthesis and regulation,
cascades of signal transduction) and responses to biotic
and abiotic stresses, were mainly expressed at 45 DAF,
whereas the majority of gene transcripts related to different primary metabolic pathways (carbohydrate, lipid,
amino acid and protein metabolism) as well as transcription factors and regulators and genes involved in vitamin
biosynthesis, were more expressed at 135 DAF (Figure 6).
A wide set of genotype-specific TCs, mainly related to hormone biosynthesis and signalling, responses to abiotic
and biotic stresses, and biosynthesis of terpenes and phenylpropanoids, was also observed (Table 5).
Furthermore, TCs encoding structural enzymes synthesizing terpenoids and terpenoid precursors (such as dimethylallyl diphosphate (DMAPP) and isopentenyl
diphosphate (IPP)) fluctuated between developmental
stages (Figures 7 and 8). Transcripts involved in the
mevalonate (MVA) pathway for isoprenoid biosynthesis,
occurring in the cytoplasm, were predominantly not regulated, while six out of seven genes coding for the main
enzymes of the plastidial non-mevalonate (non-MVA)
pathway presented TCs more abundant at 45 DAF.
Finally, transcripts involved in flavonoid biosynthesis
were also regulated between developmental stages and
genotypes (Figure 9).
Table 3: Summary of the EST assembly
Nr. of clusters                      22,904
Nr. of clusters with multiple TCs    2,406
Nr. of sESTs                         75,570
Nr. of TCs                           26,563
Nr. of unique transcripts            102,133
Table 4: Composition of the assembled dataset
               Number of sequences    Average length (nts)    Min seq length    Max seq length
TCs            26,563                 354.93                  102               4,916
ESTs in TCs    185,913                239.47                  101               412
sESTs          75,570                 179.35                  36                446
Figure 4
Depiction of enzyme-encoding TCs grouped by classes, each describing the main enzymatic activity (A). B and
C represent TCs encoding enzymes involved in biosynthesis of plant metabolites and lipids, respectively. The number of TCs
encoding each enzyme class is reported on the x axis of each graph.
(Panel titles: Enzymes, metabolites, Lipids; axis labels: number of TCs and sESTs encoding enzymes, number of TCs and sESTs encoding different enzymes.)
Figure 5
A: Hierarchical Clustering Analysis (HCA) of differentially expressed TCs. Color codes for expression values are
reported on the top. B: Principal component analysis (PCA) of differentially expressed TCs. The percentage of variance
explained by each component is shown within brackets. C45 = Coratina 45 DAF; C135 = Coratina 135 DAF; T45 = Tendellone
45 DAF; T135 = Tendellone 135 DAF.
Discussion
Sequencing output
This is the first report of a large-scale and comparative EST
analysis from olive fruit. Olive is one of the most important
oil crops in the world. It belongs to the Asterid clade
of angiosperms, that includes thousands of economically
important crops for which genomic information is still
scarce. The massive EST characterization described here
can be considered an initial platform for the functional
genomics of Olea europaea and will be a starting point for
the establishment of molecular tools for improving the
major quality traits in Olea species. Massively parallel EST
sequencing provided more than 102,000 unigenes consisting of 26,563 TCs and 75,570 singletons from four
fruit libraries. Considering 27 available data on expressed
genes of other plant species, such as Arabidopsis [16], it is
possible that the reported unigene set of Olea is an
overestimate of the actual number of transcripts expressed in
the fruit. This could in part be the result of unassembled
segments of TCs and sESTs pertaining to the same transcript unit. A certain amount of incomplete EST assembly
is expected as a result of the short reads provided by the
454 pyrosequencing technology.
Despite the fact that cDNA samples were prepared without any normalization process, we only found a moderate
degree of redundancy. Clustering of ESTs has indeed
reduced the number of sequences by 61%, from 261,483
quality passed reads to 26,563 TCs plus 75,570 sESTs.
RepBase masking analysis has revealed a surprising
amount of short repeats and transposable elements (TEs),
which could represent a valuable resource to develop TE-derived molecular markers [17] and to investigate Olea
genome size evolution. Also, the GC-content of 42.5%,
ranging from less than 16% to more than 63%, can contribute to studies of evolution and gene
transfer dynamics within the Oleaceae taxon.
Figure 6
Transcripts identified as differentially expressed between 45 DAF and 135 DAF in both genotypes grouped in
functional categories (listed on the x axis) on the basis of their predicted biological process.
Functional annotation
The percentage of TCs and singletons with no putative
function assigned was considerably elevated, possibly as a
result of gene functions specifically evolved in Olea europaea and quite divergent from those of other plant species.
The Olea fruit, indeed, presents a number of exclusive
traits, like, above all, oil and biophenol accumulation.
These traits are encoded at genomic level. On the other
hand, the high incidence of unigenes with no assigned
function (about 70%), could be due to the poor annotation that still affects protein databases. Also, it is possible
that many TCs and sESTs could not be reliably annotated
because they did not cover the entire length of the transcript or because they represent untranslated regions
(UTRs). This could be particularly the case of our dataset
given that the 454 sequencing technology typically provides short sequence reads.
The identification in the Olea genome of transcribed
sequences similar to a wide range of phylogenetically distant organisms raises intriguing questions about the evolution of their physiological roles and about whether or
not these sequences and the related functions are the
result of recent gene transfer or the relic of an ancient past.
Table 5: TCs assembled from ESTs exclusively present in one of the two cultivars
Biological Process
N. of specific TCs for cv. Coratina
N. of specific TCs for cv. Tendellone
Hormone metabolism and regulation
Abiotic and biotic stress
Cell wall metabolism
Lipid metabolism
Steroid metabolism
Phenylpropanoid metabolism
13
10
9
7
9
7
5
4
3
7
0
3
Figure 7
Partial representation of the metabolic pathway for terpenoid biosynthesis (KEGG map 00900). TCs more
expressed at 45 DAF are boxed green, while those more expressed at 135 DAF are boxed purple. In black boxes are those
TCs expressed at all fruit ripening stages and in both genotypes. Genotype-specific TCs more expressed at 45 DAF, more
expressed at 135 DAF and with unchanged expression are included in green, purple and black circles, respectively, and
reported with C (Coratina) and T (Tendellone) when they are exclusively present in one of the two cultivars.
It is important to note that about 25% of the annotated
enzyme-coding transcripts are involved in biosynthesis of
lipids and fruit metabolites. The availability of the genetic
information related to these enzyme functions represents,
in our view, a fundamental tool for understanding the
molecular basis of the expression of traits related to fruit
phenotype and for establishing new strategies of metabolic engineering to improve the overall quality of olive
fruit.
Changes in transcript abundances
Large scale random sequencing of different fruit cDNA
collections has provided information on relative large
scale variation of gene expression. However, it should be
noted that no further experimental validation has been
performed on differentially expressed TCs passing the R
test [15].
Analysis of differentially expressed gene transcripts evidenced large differences in key genes involved in a
number of metabolic pathways that can potentially alter
most quality traits in olive fruits. In some cases, different
TCs with identical predicted annotation showed a contrasting accumulation pattern between developing stages
or between genotypes; this implies that similar, although
not identical, proteins and enzymes may undergo different expression patterns, determining a fine regulation of
metabolic pathways and the accumulation of alternative
metabolites.
Figure 8
Partial representation of the metabolic pathway for biosynthesis of steroids (KEGG map 00100). Color codes for
boxes are the same as in Figure 7. (Diagram panels: Ac-MVA pathway, in cytoplasm; non-MVA pathway, in plastids.)
It is interesting to note that the C cultivar underwent a
larger degree of transcriptional modulation during fruit
development. It is possible that this is related to the very
high content of phenolic compounds at the beginning of
fruit development in this cultivar.
Comparison between fruit developmental stages
Expression differences were found for transcripts involved
in several physiological processes that promote fruit