<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange DTD v2.3 20070202//EN" "archivearticle.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" article-type="research-article"><?properties open_access?><front><journal-meta><journal-id journal-id-type="nlm-ta">Nucleic Acids Res</journal-id><journal-id journal-id-type="publisher-id">Nucleic Acids Research</journal-id><journal-title>Nucleic Acids Research</journal-title><issn pub-type="ppub">0305-1048</issn><issn pub-type="epub">1362-4962</issn><publisher><publisher-name>Oxford University Press</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="pmid">16822859</article-id><article-id pub-id-type="pmc">PMC1488885</article-id><article-id pub-id-type="doi">10.1093/nar/gkl418</article-id><article-categories><subj-group subj-group-type="heading"><subject>Article</subject></subj-group></article-categories><title-group><article-title>Genomic overview of mRNA 5′-leader <italic>trans-</italic>splicing in the ascidian <italic>Ciona intestinalis</italic></article-title></title-group><contrib-group><contrib contrib-type="author"><name><surname>Satou</surname><given-names>Yutaka</given-names></name><xref rid="au1" ref-type="aff">1</xref><xref ref-type="corresp" rid="cor1">*</xref></contrib><contrib contrib-type="author"><name><surname>Hamaguchi</surname><given-names>Makoto</given-names></name><xref rid="au1" ref-type="aff">1</xref></contrib><contrib contrib-type="author"><name><surname>Takeuchi</surname><given-names>Keisuke</given-names></name><xref rid="au1" ref-type="aff">1</xref></contrib><contrib contrib-type="author"><name><surname>Hastings</surname><given-names>Kenneth E. 
M.</given-names></name><xref rid="au2" ref-type="aff">2</xref></contrib><contrib contrib-type="author"><name><surname>Satoh</surname><given-names>Nori</given-names></name><xref rid="au1" ref-type="aff">1</xref><xref rid="au3" ref-type="aff">3</xref></contrib><aff id="au1"><sup>1</sup><institution>Department of Zoology, Graduate School of Science, Kyoto University</institution><addr-line>Sakyo, Kyoto 606-8502, Japan</addr-line></aff><aff id="au2"><sup>2</sup><institution>Montreal Neurological Institute and Department of Biology, McGill University</institution><addr-line>3801 University St. Montreal, Quebec, Canada H3A 2B4</addr-line></aff><aff id="au3"><sup>3</sup><institution>CREST, Japan Science Technology Agency</institution><addr-line>Kawaguchi, Saitama, 330-0012, Japan</addr-line></aff></contrib-group><author-notes><corresp id="cor1"><sup>*</sup>To whom correspondence should be addressed. Tel: +81-75-753-4095; Fax: +81-75-705-1113; Email: <email>yutaka@ascidian.zool.kyoto-u.ac.jp</email></corresp></author-notes><!--For NAR: both ppub and collection dates generated for PMC processing 1/27/05 beck--><pub-date pub-type="collection"><year>2006</year></pub-date><pub-date pub-type="ppub"><year>2006</year></pub-date><pub-date pub-type="epub"><day>5</day><month>7</month><year>2006</year></pub-date><volume>34</volume><issue>11</issue><fpage>3378</fpage><lpage>3388</lpage><history><date date-type="received"><day>25</day><month>3</month><year>2006</year></date><date date-type="accepted"><day>19</day><month>5</month><year>2006</year></date></history><copyright-statement>© 2006 The Author(s)</copyright-statement><copyright-year>2006</copyright-year><license license-type="openaccess"><p>This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/2.0/uk/"/>) which permits unrestricted non-commercial use, distribution, and reproduction 
in any medium, provided the original work is properly cited.</p></license><abstract><p>Although spliced leader (SL) <italic>trans-</italic>splicing in the chordates was discovered in the tunicate <italic>Ciona intestinalis</italic> there has been no genomic overview analysis of the extent of <italic>trans-</italic>splicing or the make-up of the <italic>trans-</italic>spliced and non-<italic>trans-</italic>spliced gene populations of this model organism. Here we report such an analysis for <italic>Ciona</italic> based on the oligo-capping full-length cDNA approach. We randomly sampled 2078 5′-full-length ESTs representing 668 genes, or 4.2% of the entire genome. Our results indicate that <italic>Ciona</italic> contains a single major SL, which is efficiently <italic>trans</italic>-spliced to mRNAs transcribed from a specific set of genes representing ∼50% of the total number of expressed genes, and that individual <italic>trans-</italic>spliced mRNA species are, on average, 2–3-fold less abundant than non-<italic>trans-</italic>spliced mRNA species. Our results also identify a relationship between <italic>trans-</italic>splicing status and gene functional classification; ribosomal protein genes fall predominantly into the non-<italic>trans</italic>-spliced category. In addition, our data provide the first evidence for the occurrence of polycistronic transcription in <italic>Ciona</italic>. An interesting feature of the <italic>Ciona</italic> polycistronic transcription units is that the great majority entirely lack intercistronic sequences.</p></abstract></article-meta></front><body><sec><title>INTRODUCTION</title><p>The ascidian tunicate <italic>Ciona intestinalis</italic> is a chordate whose 160 Mb genome, with ∼16 000 genes, is considerably simpler than those of vertebrates, such as man, mouse and pufferfish, which contain ∼30 000 genes (<xref ref-type="bibr" rid="b1">1</xref>–<xref ref-type="bibr" rid="b3">3</xref>). 
The smaller number of genes, comparatively short intergenic distances, and the robust and experimentally accessible nature of its early development have made the ascidian an important model system for genetic analysis of chordate development (<xref ref-type="bibr" rid="b4">4</xref>,<xref ref-type="bibr" rid="b5">5</xref>). In-depth knowledge of the ascidian genome will also contribute to our understanding of chordate evolution and the origins of the vertebrate genome. This developmental and evolutionary relevance has driven extensive molecular genetic studies so that <italic>Ciona</italic> has become one of the better-characterized animals in terms of genomics resources (<xref ref-type="bibr" rid="b6">6</xref>,<xref ref-type="bibr" rid="b7">7</xref>).</p><p>Recent studies have uncovered an unexpected and striking genomic difference between tunicates and vertebrates. This difference concerns mRNA 5′-leader <italic>trans</italic>-splicing, or spliced leader (SL) <italic>trans</italic>-splicing. In SL <italic>trans</italic>-splicing, the original 5′ ends of some pre-mRNAs are discarded and are replaced, in a spliceosomal mechanism, by the 5′-region of a small, specialized donor RNA, the SL RNA (<xref ref-type="bibr" rid="b8">8</xref>). Because multiple pre-mRNAs are <italic>trans</italic>-spliced by the same SL RNA species, SL sequences are found as common sequences at the 5′ ends of diverse mRNA species. SL <italic>trans</italic>-splicing occurs in tunicates, including <italic>Ciona</italic> (<xref ref-type="bibr" rid="b9">9</xref>,<xref ref-type="bibr" rid="b10">10</xref>). However, despite intensive genetics research, it has not been observed in any vertebrate, and presumably does not occur in that group. From its patchy distribution among the eukaryotic kingdoms and phyla, it is not clear if SL <italic>trans</italic>-splicing is an ancestral eukaryotic mechanism that has been secondarily lost in several lineages, e.g. 
the vertebrates, arthropods, plants, fungi, or if it was absent from the ancestral eukaryote and arose independently within each of the several phyla in which it is now known to occur: nematodes, flatworms, chordates, cnidarians, rotifers and protist euglenozoans (<xref ref-type="bibr" rid="b11">11</xref>–<xref ref-type="bibr" rid="b14">14</xref>). In-depth genomic studies of SL <italic>trans-</italic>splicing organisms, e.g. tunicates, and of related non-<italic>trans-</italic>splicing organisms, e.g. vertebrates, are likely to generate insight into the evolution of SL <italic>trans-</italic>splicing, and the implications of its evolutionary gain or loss for other aspects of genome organization and function.</p><p>The functions of SL <italic>trans</italic>-splicing are partly, but not entirely, understood. Its best-known role is to resolve polycistronic transcripts into individual 5′-capped monocistronic mRNAs, a process that has been extensively studied in nematodes (<xref ref-type="bibr" rid="b15">15</xref>,<xref ref-type="bibr" rid="b16">16</xref>) flatworms (<xref ref-type="bibr" rid="b17">17</xref>) and in euglenozoan protists including kinetoplastids [trypanosomes, where it is the dominant mechanism of gene expression (<xref ref-type="bibr" rid="b18">18</xref>–<xref ref-type="bibr" rid="b22">22</xref>)]. Most known SL-resolved polycistronic transcription units, or operons, include short intercistronic sequences that in the nematode <italic>Caenorhabditis</italic> include <italic>cis</italic>-elements playing an active role in directing <italic>trans</italic>-splicing of downstream cistrons (<xref ref-type="bibr" rid="b23">23</xref>). In <italic>Caenorhabditis</italic>, genes within operons are in some cases functionally related (<xref ref-type="bibr" rid="b16">16</xref>), and have a significant overall tendency to show similar patterns of mRNA accumulation (<xref ref-type="bibr" rid="b24">24</xref>). 
Thus operons could represent a mechanism for coordinating gene expression. However, because many operons contain genes that have no obvious functional relationship and/or are not coordinately expressed (<xref ref-type="bibr" rid="b16">16</xref>), additional non-specific factors, such as genome compaction, may also contribute to operon evolution. Apart from its role in operons it is likely that SL-<italic>trans-</italic>splicing has additional functions, e.g. in regulating mRNA stability or translation (<xref ref-type="bibr" rid="b25">25</xref>,<xref ref-type="bibr" rid="b26">26</xref>) or in removing potentially deleterious sequences from pre-mRNA 5′-untranslated regions (5′-UTR) (<xref ref-type="bibr" rid="b27">27</xref>), because in nematodes and flatworms, and possibly all <italic>trans-</italic>splicing metazoa, only a modest fraction of <italic>trans-</italic>spliced genes are in operons; the majority are transcribed mono-cistronically.</p><p>SL-<italic>trans-</italic>splicing occurs on mono- or poly-cistronic pre-mRNA targets only at splice acceptor sites that do not have a partner donor site upstream in the transcript, i.e. unpaired acceptor sites. Acceptor sites that are paired with an upstream partner donor site preferentially undergo <italic>cis</italic>-splicing with removal of the intervening intron, rather than <italic>trans-</italic>splicing (<xref ref-type="bibr" rid="b27">27</xref>). All known <italic>trans</italic>-splicing metazoa appear to have a significant class of conventionally-expressed genes that do not contain unpaired acceptor sites and hence do not undergo <italic>trans-</italic>splicing. 
However, there have been no sequence-based overview studies of both the <italic>trans-</italic>spliced and non-<italic>trans-</italic>spliced gene classes in any metazoan and the biological implications of the division of the genome into these two major gene classes remain unexplored.</p><p>The first report of SL <italic>trans-</italic>splicing in the chordates identified a 16 nt <italic>trans</italic>-spliced leader transferred to at least seven mRNA species in <italic>Ciona</italic> (<xref ref-type="bibr" rid="b9">9</xref>). A distinct 40 nt SL was subsequently reported in another tunicate, <italic>Oikopleura dioica</italic>, which belongs to the distantly-related class Appendicularia (<italic>Larvaceae</italic>), whose morphology, behavior and developmental and ecological strategies differ markedly from ascidians, and whose genome evolution has featured a marked overall compaction (<xref ref-type="bibr" rid="b10">10</xref>). In <italic>Oikopleura</italic> 12–24% of genes give rise to <italic>trans-</italic>spliced mRNAs, including some genes in SL-resolved operons (<xref ref-type="bibr" rid="b10">10</xref>). In <italic>Ciona</italic> the overall extent of <italic>trans-</italic>splicing is unknown, and polycistronic transcription has not been reported.</p><p>In order to advance our understanding of the <italic>Ciona</italic> genome in relation to SL-<italic>trans</italic>-splicing, we have carried out a global overview analysis of the <italic>trans</italic>-spliced and non-<italic>trans</italic>-spliced gene populations. Our goals were to answer the following questions: What fraction of <italic>Ciona</italic> genes gives rise to <italic>trans</italic>-spliced mRNAs? Are the <italic>trans-</italic>spliced and non-<italic>trans-</italic>spliced gene classes specialized in terms of gene function? Does the <italic>Ciona</italic> genome contain operons that are resolved by SL <italic>trans</italic>-splicing? 
Is the currently-known 16 nt SL the only one, or are additional, novel, SL sequences also used, as in <italic>Caenorhabditis</italic>, which contains a second SL RNA, SL2, devoted to polycistron resolution? Our study reports the first broad samplings of both the <italic>trans</italic>-spliced and non-<italic>trans</italic>-spliced mRNA subpopulations of any organism, and reveals the differential distribution of a functional gene class (ribosomal protein genes) between these mRNA populations. It also provides the first evidence for polycistronic transcription in <italic>Ciona</italic>. Moreover our results reveal an interesting feature of polycistronic transcription in <italic>Ciona</italic>, i.e. a predominance of operons entirely lacking intercistronic sequences.</p></sec><sec sec-type="materials|methods"><title>MATERIALS AND METHODS</title><sec><title>Ascidian eggs and embryos</title><p><italic>Ciona intestinalis</italic> adults were cultivated at the Maizuru Fisheries Research Station of Kyoto University, Maizuru city, facing the Sea of Japan. They were maintained in aquaria in our laboratory at Kyoto University under constant light to induce oocyte maturation. Eggs and sperm were obtained surgically from gonoducts. After fertilization, embryos were reared at ∼18°C in Millipore-filtered seawater containing 50 μg/ml streptomycin sulfate.</p></sec><sec><title>Construction of a full-length enriched cDNA library</title><p>Total RNA was isolated independently from four different developmental stages (eggs, tailbud embryos, larvae and young adults) of <italic>Ciona intestinalis</italic> by the acid guanidinium thiocyanate–phenol–chloroform method (<xref ref-type="bibr" rid="b28">28</xref>). Oligo-capping of a mixture of equal amounts of isolated RNAs was performed using a commercially available kit (GeneRacer kit, Invitrogen). 
Oligo-capped RNA was reverse-transcribed with a tagged oligo-(dT) primer, and the resultant cDNA was amplified by 15 cycles of PCR with Pfu DNA polymerase using primers for the capping-oligo and the tag in the oligo-(dT) primer. The amplified cDNAs were size-fractionated by gel chromatography. After treating with <italic>Taq</italic> DNA polymerase for 5 min at 72°C, the cDNA was cloned into the pGEM-T vector (Promega).</p></sec><sec><title>5′ End sequencing of oligo-capped cDNA clones</title><p>cDNA inserts were PCR amplified using M13 reverse and forward primers. Successful amplifications were confirmed by agarose gel electrophoresis. After purification of the PCR products with Montage-PCR Filter Units (Millipore), their sequences were determined by conventional procedures using the big-dye terminator kits on an ABI PRISM 3700 DNA Analyzer (Applied Biosystems), and the same primers used for amplification.</p></sec><sec><title>Informatics analyses</title><p>5′-Terminal sequences for oligo-capping cDNA clones, full-length enriched ESTs (simply termed full-length ESTs hereafter), were BLAST-searched against themselves, and the results were used for clustering with a threshold score of 150. The clustering result was evaluated based on mapping information of the full-length ESTs onto the genome. We found that one cluster contained eight different actin genes because of high conservation of their nucleotide sequences, and we manually corrected this problem. A unique number was assigned to each cluster. To find spliced-leaders, 5′ end sequences of length 20 nt were compared with each other using the CROSSMATCH program (<xref ref-type="bibr" rid="b29">29</xref>).</p><p>Analysis of polycistronic transcription units in the <italic>Ciona</italic> genome was based on a set of gene models, called Kyotograil2004, recently predicted by the grailexp program based on ∼680 000 ESTs and ∼6500 full insert cDNA sequences (<xref ref-type="bibr" rid="b30">30</xref>). 
Three independent approaches were used to uncover candidate operons: analysis of gene models upstream of mapped SL-full-length ESTs, genome-wide search for closely-spaced head-to-tail gene models, and search among conventional ESTs for dicistronic transcripts.</p><p><italic>Gene models upstream of genome-mapped full-length ESTs:</italic> All full-length ESTs were aligned with the genome sequence using the BLAT program (<xref ref-type="bibr" rid="b31">31</xref>) and the genomic distance between the 5′ end of each full-length EST and the nearest end of the nearest upstream non-overlapping gene model was tabulated.</p><p><italic>Genome search for closely-spaced head-to-tail gene models:</italic> All cases where non-overlapping neighbouring gene models in the same transcriptional orientation were separated by less than or equal to 100 nt were recovered. In some cases more than two gene models in a row satisfied the criteria. Each such group of two or more gene models was assigned a unique group identification number. In addition, groups described in more detail in the present paper were given an independent operon identification number. All gene groups were manually examined on the genome browser (<xref ref-type="bibr" rid="b30">30</xref>) on which full-length ESTs and an extensive collection of conventional ESTs were also mapped. Only groups in which both gene models were supported by ESTs and/or full-length ESTs were considered to be candidate operons.</p><p><italic>Search for dicistronic conventional ESTs:</italic> All non-redundant conventional ESTs were mapped onto the genome by BLAT and coordinates were compared with a list of grailexp gene model coordinates. ESTs were recovered that mapped to two non-overlapping neighbouring gene models having the same transcriptional orientation.</p><p><italic>GO assessment:</italic> The proteins deduced from the gene models were searched against human proteins with the BLAST program (<xref ref-type="bibr" rid="b32">32</xref>). 
The human protein set used was a group of 11 632 proteins annotated with GO terms in the molecular function category among the reference sequences in NCBI (release 9). The cut-off value for significant similarities was set to <italic>e</italic> =1E − 15. GO terms in the molecular function category for each top-scoring hit were compared to determine whether genes within each pair or group were functionally related. GO terms in the fourth rank and their children were treated as their parent GO terms in the third rank for efficient comparisons. Among 179 GO terms in the third rank of the molecular function category, 42 GO terms were associated with genes having significant human/<italic>Ciona</italic> similarity.</p><p><italic>Determination of operon intercistron boundaries:</italic> SL-full-length EST sequence data precisely localized the downstream gene's <italic>trans-</italic>splice acceptor site. The 3′ end, poly(A)-adjacent sequences, of mRNAs derived from upstream genes were identified through various means. Because our major conventional cDNA 3′ EST sequencing approach in prior studies (<xref ref-type="bibr" rid="b33">33</xref>) had been based on priming with an oligo(dT)-containing primer, most 3′ EST sequencing runs were lacking several bases immediately adjacent to the poly(A). However for some short-insert cDNAs, conventional 5′-EST sequences provided poly(A)-adjacent mRNA sequences. In addition, our collection of full-insert cDNAs (<xref ref-type="bibr" rid="b6">6</xref>), previously sequenced by primer walking and vector-based priming, also provided poly(A)-adjacent sequence for several mRNAs. 
Finally, where necessary for the present study, conventional EST clones were resequenced using a vector-based primer that permitted determination of poly(A)-adjacent mRNA sequence.</p></sec></sec><sec><title>RESULTS</title><sec><title>5′-ESTs from a full-length enriched cDNA library</title><p>In the initial discovery of SL <italic>trans</italic>-splicing in <italic>Ciona</italic>, seven <italic>trans</italic>-spliced mRNAs were identified but the overall genomic extent of <italic>trans</italic>-splicing was not established (<xref ref-type="bibr" rid="b9">9</xref>). In the present study, we used a full-length cDNA cloning/DNA sequencing approach to identify a significant and representative fraction of the mRNA species in the <italic>trans</italic>-spliced and non-<italic>trans</italic>-spliced subpopulations.</p><p><italic>Trans</italic>-spliced and non-<italic>trans</italic>-spliced mRNAs can be recognized by the presence or absence of a 5′-SL. The presence/absence of the known <italic>Ciona</italic> 16 nt SL at mRNA 5′-termini cannot be determined from existing <italic>Ciona</italic> EST data because these ESTs are based on cDNA molecules produced by conventional cloning methods, which inevitably lose 8–21 nt of mRNA 5′-sequence information in the final double-stranded cDNA (<xref ref-type="bibr" rid="b34">34</xref>). In order to produce cDNA clones that do contain the extreme mRNA 5′-sequence, we used the oligo-capping method (<xref ref-type="bibr" rid="b35">35</xref>). To obtain a broad and representative gene sampling, we determined mRNA 5′-terminal sequences for 2078 randomly-picked oligo-capping cDNA clones generated from a mixture of egg, tailbud embryo, larva and young adult mRNA (DDBJ accession nos.: BW648671–BW650748). 
Within this study, we term these 5′ end sequences full-length ESTs to discriminate them from conventional ESTs, termed simply ESTs, in order to avoid confusion when we compare these two types of EST data, and in recognition of the unique genomic applications of 5′-complete, as opposed to 5′-incomplete, mRNA sequence data. Based on sequence similarities, the 2078 full-length ESTs were organized into 668 clusters of related clones, each cluster representing a different mRNA species/gene (Supplementary Table S1). Because 15 852 genes are predicted in the <italic>Ciona</italic> draft genome sequence (<xref ref-type="bibr" rid="b7">7</xref>), the number of genes covered by the full-length ESTs corresponds to 4.2% of the whole gene set. With a sample of this size the relative numbers of <italic>trans</italic>-spliced and non-<italic>trans</italic>-spliced genes in the genome can be estimated to within ±5 percentage points at the 99% confidence level. As discussed below, the evaluation of the full-length EST set by comparison with a non-biased conventional EST set, which we had obtained by independent experiments previously (<xref ref-type="bibr" rid="b6">6</xref>), suggested that the present full-length EST set does not contain a strong bias for or against either the <italic>trans</italic>-spliced or the non-<italic>trans</italic>-spliced mRNA subpopulation.</p></sec><sec><title><italic>Ciona</italic> has only one major SL</title><p>To identify <italic>trans</italic>-spliced leaders as 5′-terminal sequences shared by diverse mRNA species, we cross-compared the first 20 nt of the full-length EST set using the CROSSMATCH program (<xref ref-type="bibr" rid="b29">29</xref>). 
This identified a single common 5′-sequence of 16 nt, identical to the previously-reported SL sequence (<xref ref-type="bibr" rid="b9">9</xref>), that was shared, with a low level of microheterogeneity (<xref ref-type="table" rid="tbl1">Table 1</xref>), by 563 full-length ESTs (termed SL-full-length ESTs) representing 332 genes. Many <italic>Ciona</italic> SL genes were found in the unassembled part of the genome sequence, which was set aside from the main assembly because of high repetitiveness (<xref ref-type="bibr" rid="b7">7</xref>), and such microheterogeneity was actually found there (Supplementary Figure S1). Consistent with a <italic>trans</italic>-splicing origin for the SL in these SL-full-length EST mRNA sequences, the SL sequence itself was not present in the genomic DNA regions encoding the mRNAs. The 5′-sequences of the remaining 1515 full-length ESTs (non-SL-full-length ESTs), representing 350 genes, were unique in that each was associated with only one mRNA species (fourteen genes were represented by both SL-full-length ESTs and non-SL-full-length ESTs). The absence of additional shared 5′-sequences strongly suggests that <italic>Ciona</italic> has only one SL or that any additional SL that might exist could be associated with at most a very minor fraction of the mRNA population (the probability that an additional SL, associated with even as few as 1% of mRNA molecules, would not have been sampled twice or more in our dataset is &lt;1%). In <italic>Caenorhabditis</italic> [and other Clade V nematode species (<xref ref-type="bibr" rid="b36">36</xref>,<xref ref-type="bibr" rid="b37">37</xref>)] a second SL exists, SL2, which is <italic>trans</italic>-spliced to 8.4–9.2% of mRNA species, and which is used only for resolving polycistronic transcripts (<xref ref-type="bibr" rid="b38">38</xref>). 
In other organisms, namely flatworms (<xref ref-type="bibr" rid="b17">17</xref>), <italic>Oikopleura</italic> (<xref ref-type="bibr" rid="b10">10</xref>) and probably more-distantly-related nematodes (<xref ref-type="bibr" rid="b16">16</xref>), one and the same SL is used both to <italic>trans</italic>-splice monocistronic genes and to resolve polycistronic transcripts, and our results strongly indicate that <italic>Ciona</italic> also has a single major SL.</p></sec><sec><title>Global parameters of the <italic>trans</italic>-spliced and non-<italic>trans</italic>-spliced mRNA populations</title><p>The fact that SL-full-length ESTs comprised 27% (563/2078) of the total full-length EST population suggests that 27% of the total population of mRNA molecules are <italic>trans</italic>-spliced. However, this assumes that <italic>trans</italic>-spliced and non-<italic>trans</italic>-spliced mRNAs were amplified and cloned with similar efficiency by the oligo-capping procedure. We were able to independently confirm that this was indeed the case by additional analysis of existing EST data from unbiased conventional (and hence 5′-incomplete) cDNA libraries (<xref ref-type="bibr" rid="b6">6</xref>) representing the same developmental stages we had used to make the oligo-capping full-length EST library. Extensive linear overlap of conventional ESTs with our full-length EST sequences allowed us to identify and count, in the EST dataset, the numbers of molecules corresponding to the <italic>trans</italic>-spliced (SL-full-length EST) and non-<italic>trans</italic>-spliced (non-SL-full-length EST) mRNAs. The 668 genes in our full-length EST dataset were represented by a total of 19 242 cDNA molecules in the conventional EST libraries, and, of these, 5984, or 31%, correspond to <italic>trans</italic>-spliced mRNAs. 
The excellent agreement of this figure with the 27% estimated from direct analysis of the full-length EST library indicates that, compared with conventional cDNA cloning, the oligo-capping procedure did not introduce a strong bias for or against the <italic>trans</italic>-spliced mRNA subpopulation. This finding differs from that of a study of the cestode flatworm <italic>Echinococcus</italic>, which reported that the oligo-capping method was strongly biased against <italic>trans</italic>-spliced mRNAs (<xref ref-type="bibr" rid="b39">39</xref>). Presumably this bias is based on a feature of <italic>trans</italic>-spliced <italic>Echinococcus</italic> mRNAs that is not shared with <italic>trans</italic>-spliced <italic>Ciona</italic> mRNAs.</p><p>The gene populations represented by SL-full-length ESTs and non-SL-full-length ESTs were almost entirely distinct; only 14/668 genes (2.1%) were represented by both full-length EST types (Supplementary Table S2; Supplementary Figure S2). Even among those genes represented by 3 or more full-length ESTs, the vast majority (130/140 = 92%) were represented either entirely by SL-full-length ESTs or entirely by non-SL-full-length ESTs. This fact establishes several points. First, because they formed a distinct sequence set, the multiply-represented non-SL-full-length ESTs were not simply failed 5′-incomplete copies of <italic>trans</italic>-spliced mRNAs, but apparently represent a distinct population of bona fide non-<italic>trans</italic>-spliced mRNA molecules. 
Moreover, several lines of evidence indicated that at most a very small fraction of even singly-represented non-SL-full-length ESTs in our library could be 5′-incomplete copies of <italic>trans</italic>-spliced mRNAs: (i) a superabundant <italic>trans</italic>-spliced mRNA was represented by 123 SL-full-length ESTs and zero non-SL-full-length ESTs (Supplementary Table S3) and (ii) only a small minority (3/44) of non-SL-full-length ESTs derived from the 14 dual <italic>trans</italic>-spliced/non-<italic>trans</italic>-spliced genes had a structure that could be compatible with an artifactual origin as 5′-incomplete copies of the corresponding <italic>trans</italic>-spliced mRNA (Supplementary Figure S2E and F). A second point is that <italic>trans</italic>-splicing in <italic>Ciona</italic> is largely efficient; most of the mRNA molecules derived from <italic>trans</italic>-spliced genes are in fact <italic>trans</italic>-spliced (from genes represented by three or more full-length ESTs including at least one SL-full-length EST, we obtained a total of 256 full-length ESTs of which 215 were SL-full-length ESTs and 41 were non-SL-full-length ESTs, suggesting a <italic>trans</italic>-splicing efficiency for <italic>trans</italic>-spliced genes of at least 84%). A third point is that the <italic>trans</italic>-spliced and non-<italic>trans</italic>-spliced gene populations each represent approximately one-half of the total gene number: 332/668 full-length EST-represented genes were <italic>trans</italic>-spliced, including the small number of dual <italic>trans</italic>-spliced/non-<italic>trans</italic>-spliced genes in this category. 
This corresponds to 50%, with the 99% confidence interval being 45–55%.</p><p>The difference between the proportion of expressed genes that give rise to <italic>trans</italic>-spliced mRNAs (∼50%) and the proportion of accumulated mRNA molecules that are <italic>trans-</italic>spliced (27–31%, as estimated above) indicates that, on average, individual <italic>trans</italic>-spliced mRNA species are 2–3-fold less abundant than individual non-<italic>trans</italic>-spliced mRNA species. This unexpected difference was not due to the presence of a small number of unusually-abundant non-<italic>trans</italic>-spliced mRNAs, but appeared to reflect general population features (Supplementary Table S3).</p><p>The biological importance of SL <italic>trans</italic>-splicing is not entirely understood, and it has sometimes been suggested to be related to gene function. As our study identified a significant number of both <italic>trans</italic>-spliced and non-<italic>trans</italic>-spliced genes within the same species, it provided a unique opportunity to assess whether these might represent distinct functional classes. We assessed gene function through Gene Ontology (GO) annotations (<xref ref-type="bibr" rid="b40">40</xref>). Only one out of 42 assigned GO terms in the third rank of the molecular function category showed a clear difference in representation in the two gene sets, suggesting extensive functional overlap of <italic>trans</italic>-spliced and non-<italic>trans</italic>-spliced genes. However, we noted a significant differential representation of ribosomal protein genes (GO:0003735, structural constituent of ribosome), which fell preferentially into the non-<italic>trans</italic>-spliced class. Detailed inspection showed that 76 of the 79 ribosomal protein genes we could identify in the <italic>Ciona</italic> genome were represented in our full-length EST set (Supplementary Table S4). 
None were exclusively <italic>trans</italic>-spliced, 5 were dual <italic>trans</italic>-spliced/non-<italic>trans</italic>-spliced genes and the remaining 71 were non-<italic>trans</italic>-spliced genes. The biological significance of this marked preference of <italic>Ciona</italic> ribosomal protein genes for non-<italic>trans</italic>-spliced gene expression is unclear.</p></sec><sec><title>Polycistronic transcription units in <italic>Ciona</italic></title><p>Our studies also revealed evidence for SL-resolved operons in the <italic>Ciona</italic> genome. In SL-resolved operons in other organisms, the member genes are transcribed in the same direction and are very close neighbours in the genome [intercistronic regions are most often ∼100 nt in the nematode <italic>Caenorhabditis</italic> (<xref ref-type="bibr" rid="b16">16</xref>), and 23–30 nt in <italic>Oikopleura</italic> (<xref ref-type="bibr" rid="b10">10</xref>)]. In addition, whereas the 5′-most cistron may be <italic>trans</italic>-spliced or not, all downstream cistrons are <italic>trans</italic>-spliced. We assessed the possible presence of similarly organized genes in <italic>Ciona</italic> by mapping our SL-full-length ESTs onto the genome and asking whether any of them were located very close to upstream gene models. For this analysis, we used an improved set of gene models recently predicted by the grailexp program (<xref ref-type="bibr" rid="b41">41</xref>) based on ∼680 000 ESTs and ∼6500 full-insert cDNA clone sequences (<xref ref-type="bibr" rid="b30">30</xref>). We were able to identify upstream gene models for 310 of the 332 SL-full-length EST-represented <italic>trans</italic>-spliced genes. In 173 of these gene pairs (group I) the genes were in the same transcriptional orientation and in 137 pairs they were in opposite orientation (group II). 
Thus operons would be expected to be found in group I but not in group II, nor in same-orientation (group III) or opposite-orientation (group IV) gene pairs made up of non-SL-full-length EST-represented genes and their upstream neighbours. Indeed, as shown in <xref ref-type="fig" rid="fig1">Figure 1</xref>, we found that very short intergenic regions (&lt;100 nt) were moderately common in group I (9.2%, or 16 gene pairs, termed candidate operons 1–16) but were rare in groups II, III and IV (&lt;1.1%), consistent with the expectation that some group I pairs might represent SL-resolved operons. As shown below (see Figure 3 and Supplementary Figure S3), detailed inspection demonstrated that these candidate operons lacked intercistronic regions, which strongly supports the hypothesis of polycistronic transcription.</p><p>We also carried out a full-length EST-independent whole-genome computational search for neighbouring gene-model pairs having operon-like properties, i.e. in the same orientation and separated by &lt;100 nt. This yielded a total of 352 candidate operons, including operons 1–12 previously identified in the full-length EST-based analysis (Supplementary Table S5; operons 13–16 were missed in this screen because their downstream member genes were not accurately represented by the gene model set we used). Most candidate operons (328/352 = 93%) consisted of two genes, but some appeared to contain three (21 cases) or four (3 cases) genes (global average 2.08 genes per operon). Recognition of candidate operons depends on accuracy in the gene model predictions. 
Hence, the population of 352 candidate operons is likely to be incomplete, although our gene model accuracy estimates suggest that the missing fraction would be a minority.</p><p>To obtain direct evidence for polycistronic transcription of candidate operons, we surveyed conventional EST data for cDNA clones representing incompletely processed, unresolved dicistronic transcripts; such experimentally obtained dicistronic ESTs are, in principle, exactly equivalent to RT–PCR-amplified dicistronic transcripts. Although, in <italic>Caenorhabditis</italic>, unresolved precursors are rare and in many cases undetectable (<xref ref-type="bibr" rid="b16">16</xref>), the great depth of available <italic>Ciona</italic> EST data raised the possibility that even rare RNA species might be found. In a grailexp model-based scan of the EST dataset, we found eight putative dicistronic transcripts (operons 33–40, Supplementary Table S6), all corresponding to candidate operons previously found in the whole-genome scan. In the course of other studies, we also found three additional putative dicistronic transcripts (operons 41–43, Supplementary Table S6), which had not been discovered in the grailexp model-based dicistronic EST search or whole-genome scan because one or both genes were not accurately represented by grailexp models.</p><p>In the case of operon 41 the evidence for SL-resolved polycistronic transcription is most extensive and is summarized in <xref ref-type="fig" rid="fig2">Figure 2</xref>. Operon 41 consists of two adjacent genes in the same transcriptional orientation. The downstream gene encodes a homologue of the GTP-binding nuclear protein Ran, while the upstream gene encodes a different protein similar to hypothetical proteins in other animals. 
The adjacent positioning of the genes is not a genome assembly artifact; the raw genome shotgun sequence data, which were obtained in the previous study (<xref ref-type="bibr" rid="b7">7</xref>), included eight separate reads across the intercistron boundary (3 of which are indicated in <xref ref-type="fig" rid="fig2">Figure 2</xref>). The accuracy of intron–exon structure of these genes, and the fact that they are independent genes at the protein level, is attested by the existence in the EST dataset of hundreds of cDNA clones corresponding to separate monocistronic polyadenylated mature mRNAs (summarized in <xref ref-type="fig" rid="fig2">Figure 2</xref>). In addition to the monocistronic mRNAs, the EST data included two cDNA molecules, citb076c21 and cima003i16, that correspond to an unresolved dicistronic transcript of operon 41, thereby providing direct experimental evidence for polycistronic transcription. In both cases the 3′-EST sequencing run encoded Ran, while the 5′-EST encoded the other protein. Finally, Ran mRNA was represented by three SL-full-length ESTs in our full-length EST analysis (<xref ref-type="fig" rid="fig2">Figure 2</xref>) and is therefore a <italic>trans</italic>-spliced mRNA species. Thus operon 41 has every feature expected of an SL-resolved dicistronic locus. Moreover, detailed features of the intercistron boundary in this and other putative incompletely-processed operons (see below) make it very unlikely that the dicistronic transcripts could represent aberrant readthrough transcripts linking two transcriptionally independent genes (see Discussion).</p><p>Operon 43 is supported by all of the types of evidence available for operon 41, while other individual operons may lack one or more of these pieces of evidence. However, the collective data strongly argue that candidate operons form a coherent set of genomic entities, most or all of which are SL-resolved polycistrons. 
We found dicistronic ESTs for nine candidate operons in addition to operons 41 and 43, and in each case the accuracy of the genome assembly was confirmed by multiple shotgun genome sequence reads, and multiple cDNAs representing the separate monocistronic mRNAs were found among the EST data. In these cases the downstream genes did not happen to be among the 4.2% of genes sampled in our full-length EST set, so we could not confirm that the downstream mRNAs were <italic>trans</italic>-spliced. However, the following data showed that most, if not all, downstream genes in candidate operons are <italic>trans</italic>-spliced.</p><p>Our full-length EST dataset included 19 genes corresponding to downstream genes in candidate operons and in all 19 cases the full-length ESTs were SL-full-length ESTs (<xref ref-type="table" rid="tbl2">Table 2</xref>). The absence of non-SL-full-length ESTs is not informative in 7 of these cases because some operons were identified solely (operons 13–16) or perhaps partly (operons 41–43) on the basis of having a downstream <italic>trans</italic>-spliced gene. However, the remaining 12 cases correspond to operons identified in the unbiased whole-genome gene model screen, and if there were no special relationship between candidate operons and <italic>trans</italic>-splicing, e.g. if these were simply pairs of transcriptionally independent genes that just happened to be unusually close together and in the same orientation, we would expect that the downstream genes would be <italic>trans</italic>-spliced or not in the proportions of the corresponding gene classes in the genome as a whole, i.e. half and half. Our finding that all (12/12) downstream genes are <italic>trans</italic>-spliced rules out the latter hypothesis (χ<sup>2</sup> test: <italic>P</italic> ≪ 0.01), strongly supporting SL-resolved polycistronic transcription. Moreover, it indicates that erroneously-identified operons can be no more than a small minority of the candidate operon set. 
[As summarized in <xref ref-type="table" rid="tbl2">Table 2</xref>, the upstream genes in candidate operons included both <italic>trans</italic>-spliced (6 cases) and non-<italic>trans</italic>-spliced (14 cases) genes, which is consistent with all hypotheses, including SL-resolved operon gene expression].</p></sec><sec><title>Intercistron boundaries in operons</title><p>Detailed analysis of intercistron boundaries gave further evidence consistent with SL-resolved polycistronic transcription, and also indicated an unusual aspect of operon expression in <italic>Ciona</italic> as compared with other organisms. The downstream genes of operons 1–16 and 41–43 were represented by SL-full-length ESTs, which precisely localized the <italic>trans</italic>-splice acceptor sites. In each of these 19 cases, the 3′ end of the mRNA derived from the upstream gene could also be precisely localized by determining poly(A)-adjacent sequences in oligo(dT)-primed cDNA clones (see Materials and Methods). This precise localization of 3′ ends and <italic>trans</italic>-splice acceptor sites revealed that in each of these operons the upstream and downstream cistrons were directly juxtaposed, with no intercistronic DNA. [An apparent exception was operon 42 in which the downstream gene was represented by a single SL-full-length EST in which the SL was linked to exon 2; however, a gene-specific 5′-RACE experiment confirmed that in other mRNA molecules derived from the downstream gene, <italic>trans</italic>-splicing did occur precisely at the intercistron boundary marked by the upstream mRNA's 3′ end (data not shown)]. Two examples are shown in <xref ref-type="fig" rid="fig3">Figure 3</xref> and the remaining 16 cases are shown in Supplementary Figure S3. 
In general, the first nucleotide of the downstream cistron, to which the SL was linked, was immediately adjacent to the last nucleotide of the upstream cistron, to which poly(A) was linked, so that the G residue of the AG dinucleotide immediately upstream of the <italic>trans</italic>-splice acceptor site served as the residue to which poly(A) was added on the upstream mRNA. This relationship applied to all operons shown in <xref ref-type="fig" rid="fig3">Figure 3</xref> and Supplementary Figure S3, and was reflected in the majority (41/55) of upstream gene cDNAs. In eight operons, there was microheterogeneity of upstream cDNA 3′ ends; in some molecules, poly(A) addition had occurred at alternative sites 2–6 nt upstream—exceptionally in operon 15 as far as 67 nt. This 3′ end microheterogeneity could reflect partial nucleolytic processing of the upstream mRNA 3′ end prior to poly(A) addition, or a 2 nt mispriming shift by oligo(dT) on the …AGAAAAAAAA… sequence during the first strand synthesis (‘AG’ corresponds to the acceptor site for the SL <italic>trans</italic>-splicing of the downstream cistron and the following ‘AAAAAAAA…’ is the poly(A) tail of the upstream gene transcript).</p><p>In no other <italic>trans</italic>-splicing organism are the majority, or even a significant fraction, of operons known to lack intercistronic DNA. 
However it is interesting to note that a minor class of <italic>Caenorhabditis</italic> operons lack intercistronic sequences and are resolved by <italic>trans</italic>-splicing with SL1 rather than SL2 (<xref ref-type="bibr" rid="b42">42</xref>,<xref ref-type="bibr" rid="b43">43</xref>) (see Discussion).</p></sec></sec><sec><title>DISCUSSION</title><p>Our study showed that the genome of <italic>Ciona</italic> is composed of two nearly equal gene subsets with little overlap, one of which undergoes efficient pre-mRNA <italic>trans</italic>-splicing with the single major SL, while the other undergoes conventional non-<italic>trans</italic>-splicing expression. The ∼50% <italic>trans</italic>-spliced gene fraction we estimate for <italic>Ciona</italic> is lower than the 70–90% estimate for nematodes (<xref ref-type="bibr" rid="b25">25</xref>,<xref ref-type="bibr" rid="b44">44</xref>), but higher than the 12–24% estimate for <italic>Oikopleura</italic> (<xref ref-type="bibr" rid="b10">10</xref>). The significance of lineage-specific differences in overall <italic>trans</italic>-splicing levels is not clear, although it is likely that such differences could reflect, and/or contribute to, lineage-specific features of genome evolution.</p><p>The marked preferential encoding of ribosomal proteins by non-<italic>trans</italic>-spliced mRNAs in <italic>Ciona</italic> is the first correlation to be established in any organism between <italic>trans</italic>-splicing and gene functional classification. It is of interest that a different relationship appears to exist in <italic>Oikopleura</italic> where at least 43 ribosomal protein genes are <italic>trans</italic>-spliced (<xref ref-type="bibr" rid="b10">10</xref>). 
Ribosomal protein gene organization strategies also differ; at least 14 <italic>Oikopleura</italic> ribosomal protein genes are associated with candidate operons (<xref ref-type="bibr" rid="b10">10</xref>), but we found only 3 of the 79 identified <italic>Ciona</italic> ribosomal protein genes in candidate operons (data not shown). In <italic>Caenorhabditis</italic>, the <italic>trans</italic>-splicing status for most of the 115 known ribosomal protein genes has not been established, although a high proportion, ∼40%, have been shown to be associated with operons (<xref ref-type="bibr" rid="b16">16</xref>).</p><p>An unexpected finding of our study was that individual <italic>trans</italic>-spliced mRNA species in <italic>Ciona</italic> are, on average, 2–3-fold less abundant than non-<italic>trans</italic>-spliced mRNAs. This issue has not yet been investigated in any other organism and it will be of interest for future studies to establish if a similar abundance relationship holds in other species. A related question to be investigated is whether the mRNA abundance difference observed in <italic>Ciona</italic> is compensated by increased translational efficiency of <italic>trans</italic>-spliced mRNAs. (Differing studies in nematodes have reported that the translational efficiency of <italic>trans-</italic>spliced mRNAs is higher than (<xref ref-type="bibr" rid="b25">25</xref>) or similar to (<xref ref-type="bibr" rid="b26">26</xref>) that of non-<italic>trans-</italic>spliced mRNAs.)</p><sec><title>Polycistronic transcription units</title><p>Our study led to the recognition of a set of 352 operons (mostly dicistronic) that are resolved by <italic>trans</italic>-splicing with the same SL used for monocistronic pre-mRNAs in the <italic>Ciona</italic> genome. 
In the best characterized cases, operons 41 and 43, the evidence for SL-resolved polycistronic transcription includes: (i) cDNA clones representing the unresolved dicistronic transcript, (ii) SL-full-length ESTs identifying the downstream member gene as being <italic>trans</italic>-spliced and (iii) an unusual genomic structure featuring the complete absence of intercistronic DNA. Because of the complete absence of intercistronic DNA, it is unlikely that the dicistronic transcripts could be aberrant readthrough transcripts of independent genes. Production of the 3′ end of the upstream mRNA by primary transcript cleavage, or by <italic>trans</italic>-splicing (see below), requires transcription across the intercistron boundary to generate the cleavage site. Likewise, production of the 5′ end of the mature <italic>trans</italic>-spliced downstream mRNA requires transcription across the intercistron boundary to generate the <italic>trans</italic>-splice acceptor site. Thus transcription across the intercistron boundary is not aberrant for an intercistronless operon, but is essential for the production of any mature monocistronic mRNA from the locus.</p><p>We found 19 candidate operons in which the downstream genes were represented by full-length ESTs (SL-full-length ESTs in all cases). Given that the full-length EST set represents a 4.2% sampling of the entire genome, it can be estimated that the genome should contain a total of ∼100/4.2 × 19 = 452 such intercistronless gene pairs. This number is higher than the 352 candidate operons identified in the whole genome gene-model scan, but the latter number was expected to be an underestimate because it made no allowance for inaccurate gene models. 
These findings together form a compelling case for the existence of a coherent set of ∼350–450 SL-resolved (mostly dicistronic) operons in the <italic>Ciona</italic> genome.</p></sec><sec><title>Operon structure and resolution</title><p>Precise juxtaposition of upstream and downstream cistrons in <italic>Ciona</italic> operons suggests the possibility that the <italic>trans</italic>-splicing reaction itself could generate the 3′ end of the upstream mRNA. Indeed it seems very unlikely that an independent transcript cleavage mechanism could, in many independent cases, target precisely the same phosphodiester bond that is cleaved in the <italic>trans</italic>-splicing reaction. However, because the <italic>trans</italic>-splice acceptor site branch-point must necessarily reside within the upstream mRNA, the latter would presumably be released as a branched nucleic acid structure. Unless it was rapidly debranched by an unknown mechanism, this structure could have negative implications for mRNA function. Moreover, it seems certain that prior formation of the upstream mRNA's 3′ end (by a distinct mechanism) would preclude subsequent <italic>trans</italic>-splicing of the downstream cistron because the <italic>trans</italic>-splice branch-point, and acceptor site AG dinucleotide, will have been lost with the upstream mRNA. Thus expression of genes in operons that lack intercistronic sequences may be mutually exclusive in the sense that any given transcript molecule may be capable of producing either an upstream or a downstream mRNA, but not both [see also Ref. (<xref ref-type="bibr" rid="b42">42</xref>)]. As indicated by mRNA accumulation measured by EST counts (<xref ref-type="bibr" rid="b45">45</xref>), we found that operon member genes appeared to be independently expressed in overlapping patterns that were neither tightly coordinated nor markedly mutually exclusive, at least at the macroscopic level (data not shown). 
Further biochemical studies will be required to establish the mechanism of operon resolution in <italic>Ciona</italic> and the mechanisms that regulate the differential accumulation of mRNAs encoded by operons.</p><p>An observation that may be relevant to the mechanism of operon resolution is that most <italic>Ciona</italic> operons (16/18 cases shown in <xref ref-type="fig" rid="fig3">Figure 3</xref> and Supplementary Figure S3) lack the canonical polyadenylation signal AAT(U)AAA within 40 nt upstream of the upstream cistron polyadenylation site [although this signal is used in a substantial fraction of <italic>Ciona</italic> genes (Y. Satou, unpublished data)]. <italic>Oikopleura</italic> operon upstream cistrons also lack the AATAAA signal (<xref ref-type="bibr" rid="b10">10</xref>). It can be noted that, whereas it might be desirable to have 100% efficient 3′ end formation in monocistronic genes, this is not the case for <italic>Ciona</italic> operon upstream cistrons because this would almost certainly preclude <italic>trans</italic>-splicing and expression of the downstream gene (see above). It might then be reasonable to expect mechanistic differences in 3′ end formation in monocistronic and operon gene classes.</p><p>Our findings add <italic>Ciona</italic> to the number of organisms, including <italic>Oikopleura</italic>, in which the same SL is used both for monocistronic and SL-resolved polycistronic expression. It seems increasingly likely that the occurrence in Clade V nematodes, such as <italic>Caenorhabditis</italic>, of an additional SL, SL2, specifically devoted to operon resolution is a highly specialized, lineage-specific feature (<xref ref-type="bibr" rid="b16">16</xref>). One of the functions of the intercistronic regions in <italic>Caenorhabditis</italic> operons is the specific recruitment of SL2 to downstream cistrons, in preference to the more abundant SL1 (<xref ref-type="bibr" rid="b23">23</xref>). 
However, it is of great interest that a small minority of operons (at least 25 operons) in <italic>Caenorhabditis</italic> have little or no intercistronic sequence and are resolved by SL1, not SL2 (<xref ref-type="bibr" rid="b42">42</xref>,<xref ref-type="bibr" rid="b43">43</xref>; T. Blumenthal, personal communication), similar to <italic>Ciona</italic> operons. <italic>Ciona</italic> is the first organism known in which the majority of operons entirely lack intercistronic DNA. Whether the intercistron-less operons of <italic>Caenorhabditis</italic> and <italic>Ciona</italic> represent convergent evolution or a common ancestral character is presently unknown, but is clearly a very important question. The presence of the canonical AATAAA cleavage/polyadenylation signal in upstream cistrons in the three fully characterized SL1-type operons in <italic>Caenorhabditis</italic> (<xref ref-type="bibr" rid="b42">42</xref>) suggests a possible mechanistic difference from the majority of <italic>Ciona</italic> operons. Further studies of the mechanisms of operon resolution in <italic>Ciona</italic> and in SL1-type operons of <italic>Caenorhabditis</italic>, and additional studies of operon structure and resolution in nematodes that lack SL2, would be of great interest in this regard.</p></sec><sec><title>SL <italic>trans</italic>-splicing evolution in chordates</title><p>The presence of SL <italic>trans</italic>-splicing and its use to resolve polycistronic transcripts in both <italic>Ciona</italic> and distantly-related <italic>Oikopleura</italic> suggest that these were ancestral tunicate features. However this does not establish whether <italic>trans</italic>-splicing and operons arose early within the tunicate lineage following the divergence of the vertebrate lineage, or were present in the ancestral chordate and were secondarily lost in the vertebrate lineage. It is interesting that despite overall similarities, there are numerous differences between SL <italic>trans</italic>-splicing in <italic>Ciona</italic> and <italic>Oikopleura</italic>. 
These include the length and sequence of the SL itself (<xref ref-type="bibr" rid="b9">9</xref>,<xref ref-type="bibr" rid="b10">10</xref>), the proportion of genes that are <italic>trans-</italic>spliced, the presence in <italic>Oikopleura</italic>, but not in <italic>Ciona</italic>, of 23–30 nt intercistronic sequences in candidate operons, and the preferential encoding of ribosomal proteins by <italic>trans-</italic>spliced and/or operon associated genes (<italic>Oikopleura</italic>) versus non-<italic>trans-</italic>spliced and non-operon-associated genes (<italic>Ciona</italic>). These differences, and other evolutionary genomic differences, such as the small genome size and short intron lengths in <italic>Oikopleura</italic> (<xref ref-type="bibr" rid="b46">46</xref>) suggest that further comparative studies of tunicates may be particularly informative about evolutionary aspects of <italic>trans</italic>-splicing and its relationship to genome evolution.</p></sec></sec><sec><title>SUPPLEMENTARY DATA</title><p>Supplementary Data are available at NAR online.</p></sec>
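<sec><title>Worked check of the reported proportions</title><p>The following is a minimal illustrative sketch, not part of the original analysis, that re-derives several figures quoted in the Results and Discussion from the reported counts. It assumes a normal-approximation (Wald) interval for the 99% confidence interval and a one-degree-of-freedom χ<sup>2</sup> goodness-of-fit test against a half-and-half expectation; the article does not state which formulas were used, and all function and variable names are hypothetical.</p>
<preformat>
```python
import math


def wald_interval(successes, total, z):
    """Normal-approximation (Wald) confidence interval for a proportion.

    Assumption: the article does not say how its interval was computed.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p - half_width, p + half_width


Z_99 = 2.5758  # two-sided 99% quantile of the standard normal distribution

# 332 of 668 full-length EST-represented genes were trans-spliced.
ci_low, ci_high = wald_interval(332, 668, Z_99)

# 215 of 256 full-length ESTs from trans-spliced genes carried the SL.
efficiency = 215 / 256

# 19 operons found in a 4.2% sample of the genome, scaled to the whole genome.
estimated_pairs = 19 / 0.042

# 328 two-gene, 21 three-gene and 3 four-gene candidate operons.
mean_genes = (328 * 2 + 21 * 3 + 3 * 4) / 352

# Chi-squared goodness of fit: 12 trans-spliced and 0 non-trans-spliced
# downstream genes observed, against an expected 6 and 6.
chi2 = (12 - 6) ** 2 / 6 + (0 - 6) ** 2 / 6
# Survival function of the chi-squared distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2.0))

print(f"99% CI: {ci_low:.3f} to {ci_high:.3f}")
print(f"trans-splicing efficiency: {efficiency:.3f}")
print(f"estimated intercistronless pairs: {estimated_pairs:.0f}")
print(f"mean genes per operon: {mean_genes:.2f}")
print(f"chi-squared = {chi2:.1f}, P = {p_value:.2e}")
```
</preformat>
<p>Under these assumptions the sketch gives values that round to the 45–55% interval, the 84% efficiency, the ∼452 gene pairs and the 2.08 genes per operon quoted in the text, and a χ<sup>2</sup> <italic>P</italic>-value well below 0.01.</p></sec>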
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material id="PMC_1" content-type="local-data">
<caption>
<title>[Supplementary Material]</title>
</caption>
<media mimetype="text" mime-subtype="html" xlink:href="nar_34_11_3378__index.html"/>
<media xlink:role="associated-file" mimetype="application" mime-subtype="pdf" xlink:href="nar_34_11_3378__1.pdf"/>
</supplementary-material>
</sec>
</body><back><ack><p>The authors appreciate the technical assistance of Mr Daisuke Miyamura and Dr Takeshi Kawashima. This research was supported by Grants-in-Aid from the Ministry of Education, Science, Sports, Culture and Technology, Japan to Y.S. (17071020 and 17687022). This research was also supported in part by CREST project to N.S. K.E.M.H. was supported by a grant from NSERC. Funding to pay the Open Access publication charges for this article was provided by the Ministry of Education, Science, Sports, Culture and Technology, Japan.</p><p><italic>Conflict of interest statement.</italic> None declared.</p></ack><ref-list><title>REFERENCES</title><ref id="b1"><label>1</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aparicio</surname><given-names>S.</given-names></name><name><surname>Chapman</surname><given-names>J.</given-names></name><name><surname>Stupka</surname><given-names>E.</given-names></name><name><surname>Putnam</surname><given-names>N.</given-names></name><name><surname>Chia</surname><given-names>J.M.</given-names></name><name><surname>Dehal</surname><given-names>P.</given-names></name><name><surname>Christoffels</surname><given-names>A.</given-names></name><name><surname>Rash</surname><given-names>S.</given-names></name><name><surname>Hoon</surname><given-names>S.</given-names></name><name><surname>Smit</surname><given-names>A.</given-names></name><etal/></person-group><article-title>Whole-genome shotgun assembly and analysis of the genome of <italic>Fugu rubripes</italic></article-title><source>Science</source><year>2002</year><volume>297</volume><fpage>1301</fpage><lpage>1310</lpage><pub-id pub-id-type="pmid">12142439</pub-id></citation></ref><ref id="b2"><label>2</label><citation citation-type="journal"><person-group 
person-group-type="author"><name><surname>Waterston</surname><given-names>R.H.</given-names></name><name><surname>Lindblad-Toh</surname><given-names>K.</given-names></name><name><surname>Birney</surname><given-names>E.</given-names></name><name><surname>Rogers</surname><given-names>J.</given-names></name><name><surname>Abril</surname><given-names>J.F.</given-names></name><name><surname>Agarwal</surname><given-names>P.</given-names></name><name><surname>Agarwala</surname><given-names>R.</given-names></name><name><surname>Ainscough</surname><given-names>R.</given-names></name><name><surname>Alexandersson</surname><given-names>M.</given-names></name><name><surname>An</surname><given-names>P.</given-names></name><etal/></person-group><article-title>Initial sequencing and comparative analysis of the mouse genome</article-title><source>Nature</source><year>2002</year><volume>420</volume><fpage>520</fpage><lpage>562</lpage><pub-id pub-id-type="pmid">12466850</pub-id></citation></ref><ref id="b3"><label>3</label><citation citation-type="journal"><collab>International human genome sequencing consortium</collab><article-title>Finishing the euchromatic sequence of the human genome</article-title><source>Nature</source><year>2004</year><volume>431</volume><fpage>931</fpage><lpage>945</lpage><pub-id pub-id-type="pmid">15496913</pub-id></citation></ref><ref id="b4"><label>4</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Satoh</surname><given-names>N.</given-names></name></person-group><article-title>The ascidian tadpole larva: comparative molecular development and genomics</article-title><source>Nature Rev. 
Genet.</source><year>2003</year><volume>4</volume><fpage>285</fpage><lpage>295</lpage><pub-id pub-id-type="pmid">12671659</pub-id></citation></ref><ref id="b5"><label>5</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Satoh</surname><given-names>N.</given-names></name><name><surname>Satou</surname><given-names>Y.</given-names></name><name><surname>Davidson</surname><given-names>B.</given-names></name><name><surname>Levine</surname><given-names>M.</given-names></name></person-group><article-title><italic>Ciona intestinalis</italic>: an emerging model for whole-genome analyses</article-title><source>Trends Genet.</source><year>2003</year><volume>19</volume><fpage>376</fpage><lpage>381</lpage><pub-id pub-id-type="pmid">12850442</pub-id></citation></ref><ref id="b6"><label>6</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Satou</surname><given-names>Y.</given-names></name><name><surname>Yamada</surname><given-names>L.</given-names></name><name><surname>Mochizuki</surname><given-names>Y.</given-names></name><name><surname>Takatori</surname><given-names>N.</given-names></name><name><surname>Kawashima</surname><given-names>T.</given-names></name><name><surname>Sasaki</surname><given-names>A.</given-names></name><name><surname>Hamaguchi</surname><given-names>M.</given-names></name><name><surname>Awazu</surname><given-names>S.</given-names></name><name><surname>Yagi</surname><given-names>K.</given-names></name><name><surname>Sasakura</surname><given-names>Y.</given-names></name><etal/></person-group><article-title>A cDNA resource from the basal chordate <italic>Ciona intestinalis</italic></article-title><source>Genesis</source><year>2002</year><volume>33</volume><fpage>153</fpage><lpage>154</lpage><pub-id pub-id-type="pmid">12203911</pub-id></citation></ref><ref id="b7"><label>7</label><citation citation-type="journal"><person-group 
person-group-type="author"><name><surname>Dehal</surname><given-names>P.</given-names></name><name><surname>Satou</surname><given-names>Y.</given-names></name><name><surname>Campbell</surname><given-names>R.K.</given-names></name><name><surname>Chapman</surname><given-names>J.</given-names></name><name><surname>Degnan</surname><given-names>B.</given-names></name><name><surname>De Tomaso</surname><given-names>A.</given-names></name><name><surname>Davidson</surname><given-names>B.</given-names></name><name><surname>Di Gregorio</surname><given-names>A.</given-names></name><name><surname>Gelpke</surname><given-names>M.</given-names></name><name><surname>Goodstein</surname><given-names>D.M.</given-names></name><etal/></person-group><article-title>The draft genome of <italic>Ciona intestinalis</italic>: insights into chordate and vertebrate origins</article-title><source>Science</source><year>2002</year><volume>298</volume><fpage>2157</fpage><lpage>2167</lpage><pub-id pub-id-type="pmid">12481130</pub-id></citation></ref><ref id="b8"><label>8</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nilsen</surname><given-names>T.W.</given-names></name></person-group><article-title><italic>Trans-</italic>splicing of nematode premessenger RNA</article-title><source>Annu. Rev. 
Microbiol.</source><year>1993</year><volume>47</volume><fpage>413</fpage><lpage>440</lpage><pub-id pub-id-type="pmid">8257104</pub-id></citation></ref><ref id="b9"><label>9</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vandenberghe</surname><given-names>A.E.</given-names></name><name><surname>Meedel</surname><given-names>T.H.</given-names></name><name><surname>Hastings</surname><given-names>K.E.</given-names></name></person-group><article-title>mRNA 5′-leader <italic>trans-</italic>splicing in the chordates</article-title><source>Genes Dev.</source><year>2001</year><volume>15</volume><fpage>294</fpage><lpage>303</lpage><pub-id pub-id-type="pmid">11159910</pub-id></citation></ref><ref id="b10"><label>10</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ganot</surname><given-names>P.</given-names></name><name><surname>Kallesoe</surname><given-names>T.</given-names></name><name><surname>Reinhardt</surname><given-names>R.</given-names></name><name><surname>Chourrout</surname><given-names>D.</given-names></name><name><surname>Thompson</surname><given-names>E.M.</given-names></name></person-group><article-title>Spliced-leader RNA <italic>trans</italic> splicing in a chordate <italic>Oikopleura dioica</italic> with a compact genome</article-title><source>Mol. Cell. 
Biol.</source><year>2004</year><volume>24</volume><fpage>7795</fpage><lpage>7805</lpage><pub-id pub-id-type="pmid">15314184</pub-id></citation></ref><ref id="b11"><label>11</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nilsen</surname><given-names>T.W.</given-names></name></person-group><article-title>Evolutionary origin of SL-addition <italic>trans-</italic>splicing: still an enigma</article-title><source>Trends Genet.</source><year>2001</year><volume>17</volume><fpage>678</fpage><lpage>680</lpage><pub-id pub-id-type="pmid">11718904</pub-id></citation></ref><ref id="b12"><label>12</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stover</surname><given-names>N.A.</given-names></name><name><surname>Steele</surname><given-names>R.E.</given-names></name></person-group><article-title><italic>Trans-</italic>spliced leader addition to mRNAs in a cnidarian</article-title><source>Proc. Natl Acad. Sci. USA</source><year>2001</year><volume>98</volume><fpage>5693</fpage><lpage>5698</lpage><pub-id pub-id-type="pmid">11331766</pub-id></citation></ref><ref id="b13"><label>13</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hastings</surname><given-names>K.E.</given-names></name></person-group><article-title>SL <italic>trans-</italic>splicing: easy come or easy go?</article-title><source>Trends Genet.</source><year>2005</year><volume>21</volume><fpage>240</fpage><lpage>247</lpage><pub-id pub-id-type="pmid">15797620</pub-id></citation></ref><ref id="b14"><label>14</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pouchkina-Stantcheva</surname><given-names>N.N.</given-names></name><name><surname>Tunnacliffe</surname><given-names>A.</given-names></name></person-group><article-title>Spliced leader RNA mediated <italic>trans-</italic>splicing in phylum Rotifera</article-title><source>Mol. Biol. 
Evol.</source><year>2005</year><volume>22</volume><fpage>1482</fpage><lpage>1489</lpage><pub-id pub-id-type="pmid">15788744</pub-id></citation></ref><ref id="b15"><label>15</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Spieth</surname><given-names>J.</given-names></name><name><surname>Brooke</surname><given-names>G.</given-names></name><name><surname>Kuersten</surname><given-names>S.</given-names></name><name><surname>Lea</surname><given-names>K.</given-names></name><name><surname>Blumenthal</surname><given-names>T.</given-names></name></person-group><article-title>Operons in <italic>C.elegans</italic>: polycistronic mRNA precursors are processed by trans-splicing of SL2 to downstream coding regions</article-title><source>Cell</source><year>1993</year><volume>73</volume><fpage>521</fpage><lpage>532</lpage><pub-id pub-id-type="pmid">8098272</pub-id></citation></ref><ref id="b16"><label>16</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Blumenthal</surname><given-names>T.</given-names></name><name><surname>Gleason</surname><given-names>K.S.</given-names></name></person-group><article-title><italic>Caenorhabditis elegans</italic> operons: form and function</article-title><source>Nature Rev. Genet.</source><year>2003</year><volume>4</volume><fpage>112</fpage><lpage>120</lpage><pub-id pub-id-type="pmid">12560808</pub-id></citation></ref><ref id="b17"><label>17</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Davis</surname><given-names>R.E.</given-names></name><name><surname>Hodgson</surname><given-names>S.</given-names></name></person-group><article-title>Gene linkage and steady-state RNAs suggest <italic>trans</italic>-splicing may be associated with a polycistronic transcript in <italic>Schistosoma mansoni</italic></article-title><source>Mol. Biochem. 
Parasitol.</source><year>1997</year><volume>89</volume><fpage>25</fpage><lpage>39</lpage><pub-id pub-id-type="pmid">9297698</pub-id></citation></ref><ref id="b18"><label>18</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Johnson</surname><given-names>P.J.</given-names></name><name><surname>Kooter</surname><given-names>J.M.</given-names></name><name><surname>Borst</surname><given-names>P.</given-names></name></person-group><article-title>Inactivation of transcription by UV irradiation of <italic>T. brucei</italic> provides evidence for a multicistronic transcription unit including a VSG gene</article-title><source>Cell</source><year>1987</year><volume>51</volume><fpage>273</fpage><lpage>281</lpage><pub-id pub-id-type="pmid">3664637</pub-id></citation></ref><ref id="b19"><label>19</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Muhich</surname><given-names>M.L.</given-names></name><name><surname>Boothroyd</surname><given-names>J.C.</given-names></name></person-group><article-title>Polycistronic transcripts in trypanosomes and their accumulation during heat shock: evidence for a precursor role in mRNA synthesis</article-title><source>Mol. Cell. 
Biol.</source><year>1988</year><volume>8</volume><fpage>3837</fpage><lpage>3846</lpage><pub-id pub-id-type="pmid">3221866</pub-id></citation></ref><ref id="b20"><label>20</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tschudi</surname><given-names>C.</given-names></name><name><surname>Ullu</surname><given-names>E.</given-names></name></person-group><article-title>Polygene transcripts are precursors to calmodulin mRNAs in trypanosomes</article-title><source>EMBO J.</source><year>1988</year><volume>7</volume><fpage>455</fpage><lpage>463</lpage><pub-id pub-id-type="pmid">3366120</pub-id></citation></ref><ref id="b21"><label>21</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vanhamme</surname><given-names>L.</given-names></name><name><surname>Pays</surname><given-names>E.</given-names></name></person-group><article-title>Control of gene expression in trypanosomes</article-title><source>Microbiol Rev.</source><year>1995</year><volume>59</volume><fpage>223</fpage><lpage>240</lpage><pub-id pub-id-type="pmid">7603410</pub-id></citation></ref><ref id="b22"><label>22</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Campbell</surname><given-names>D.A.</given-names></name><name><surname>Thomas</surname><given-names>S.</given-names></name><name><surname>Sturm</surname><given-names>N.R.</given-names></name></person-group><article-title>Transcription in kinetoplastid protozoa: why be normal?</article-title><source>Microbes Infect.</source><year>2003</year><volume>5</volume><fpage>1231</fpage><lpage>1240</lpage><pub-id pub-id-type="pmid">14623019</pub-id></citation></ref><ref id="b23"><label>23</label><citation citation-type="journal"><person-group 
person-group-type="author"><name><surname>Huang</surname><given-names>T.</given-names></name><name><surname>Kuersten</surname><given-names>S.</given-names></name><name><surname>Deshpande</surname><given-names>A.M.</given-names></name><name><surname>Spieth</surname><given-names>J.</given-names></name><name><surname>MacMorris</surname><given-names>M.</given-names></name><name><surname>Blumenthal</surname><given-names>T.</given-names></name></person-group><article-title>Intercistronic region required for polycistronic pre-mRNA processing in <italic>Caenorhabditis elegans</italic></article-title><source>Mol. Cell. Biol.</source><year>2001</year><volume>21</volume><fpage>1111</fpage><lpage>1120</lpage><pub-id pub-id-type="pmid">11158298</pub-id></citation></ref><ref id="b24"><label>24</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lercher</surname><given-names>M.J.</given-names></name><name><surname>Blumenthal</surname><given-names>T.</given-names></name><name><surname>Hurst</surname><given-names>L.D.</given-names></name></person-group><article-title>Coexpression of neighboring genes in <italic>Caenorhabditis elegans</italic> is mostly due to operons and duplicate genes</article-title><source>Genome Res.</source><year>2003</year><volume>13</volume><fpage>238</fpage><lpage>243</lpage><pub-id pub-id-type="pmid">12566401</pub-id></citation></ref><ref id="b25"><label>25</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Maroney</surname><given-names>P.A.</given-names></name><name><surname>Denker</surname><given-names>J.A.</given-names></name><name><surname>Darzynkiewicz</surname><given-names>E.</given-names></name><name><surname>Laneve</surname><given-names>R.</given-names></name><name><surname>Nilsen</surname><given-names>T.W.</given-names></name></person-group><article-title>Most mRNAs in the nematode <italic>Ascaris lumbricoides</italic> are <italic>trans-</italic>spliced: a 
role for spliced leader addition in translational efficiency</article-title><source>RNA</source><year>1995</year><volume>1</volume><fpage>714</fpage><lpage>723</lpage><pub-id pub-id-type="pmid">7585256</pub-id></citation></ref><ref id="b26"><label>26</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lall</surname><given-names>S.</given-names></name><name><surname>Friedman</surname><given-names>C.C.</given-names></name><name><surname>Jankowska-Anyszka</surname><given-names>M.</given-names></name><name><surname>Stepinski</surname><given-names>J.</given-names></name><name><surname>Darzynkiewicz</surname><given-names>E.</given-names></name><name><surname>Davis</surname><given-names>R.E.</given-names></name></person-group><article-title>Contribution of <italic>trans-</italic>splicing, 5′-leader length, cap-poly(A) synergism, and initiation factors to nematode translation in an <italic>Ascaris suum</italic> embryo cell-free system</article-title><source>J. Biol. 
Chem.</source><year>2004</year><volume>279</volume><fpage>45573</fpage><lpage>45585</lpage><pub-id pub-id-type="pmid">15322127</pub-id></citation></ref><ref id="b27"><label>27</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Blumenthal</surname><given-names>T.</given-names></name></person-group><article-title><italic>Trans-</italic>splicing and polycistronic transcription in <italic>Caenorhabditis elegans</italic></article-title><source>Trends Genet.</source><year>1995</year><volume>11</volume><fpage>132</fpage><lpage>136</lpage><pub-id pub-id-type="pmid">7732590</pub-id></citation></ref><ref id="b28"><label>28</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chomczynski</surname><given-names>P.</given-names></name><name><surname>Sacchi</surname><given-names>N.</given-names></name></person-group><article-title>Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction</article-title><source>Anal. Biochem.</source><year>1987</year><volume>162</volume><fpage>156</fpage><lpage>159</lpage><pub-id pub-id-type="pmid">2440339</pub-id></citation></ref><ref id="b29"><label>29</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ewing</surname><given-names>B.</given-names></name><name><surname>Hillier</surname><given-names>L.</given-names></name><name><surname>Wendl</surname><given-names>M.C.</given-names></name><name><surname>Green</surname><given-names>P.</given-names></name></person-group><article-title>Base-calling of automated sequencer traces using phred I. 
Accuracy assessment</article-title><source>Genome Res.</source><year>1998</year><volume>8</volume><fpage>175</fpage><lpage>185</lpage><pub-id pub-id-type="pmid">9521921</pub-id></citation></ref><ref id="b30"><label>30</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Satou</surname><given-names>Y.</given-names></name><name><surname>Kawashima</surname><given-names>T.</given-names></name><name><surname>Shoguchi</surname><given-names>E.</given-names></name><name><surname>Nakayama</surname><given-names>A.</given-names></name><name><surname>Satoh</surname><given-names>N.</given-names></name></person-group><article-title>An integrated database of the ascidian, <italic>Ciona intestinalis</italic>: towards functional genomics</article-title><source>Zoolog. Sci.</source><year>2005</year><volume>22</volume><fpage>837</fpage><lpage>843</lpage><pub-id pub-id-type="pmid">16141696</pub-id></citation></ref><ref id="b31"><label>31</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kent</surname><given-names>W.J.</given-names></name></person-group><article-title>BLAT–the BLAST-like alignment tool</article-title><source>Genome Res.</source><year>2002</year><volume>12</volume><fpage>656</fpage><lpage>664</lpage><pub-id pub-id-type="pmid">11932250</pub-id></citation></ref><ref id="b32"><label>32</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Altschul</surname><given-names>S.F.</given-names></name><name><surname>Gish</surname><given-names>W.</given-names></name><name><surname>Miller</surname><given-names>W.</given-names></name><name><surname>Myers</surname><given-names>E.W.</given-names></name><name><surname>Lipman</surname><given-names>D.J.</given-names></name></person-group><article-title>Basic local alignment search tool</article-title><source>J. Mol. 
Biol.</source><year>1990</year><volume>215</volume><fpage>403</fpage><lpage>410</lpage><pub-id pub-id-type="pmid">2231712</pub-id></citation></ref><ref id="b33"><label>33</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Satou</surname><given-names>Y.</given-names></name><name><surname>Takatori</surname><given-names>N.</given-names></name><name><surname>Yamada</surname><given-names>L.</given-names></name><name><surname>Mochizuki</surname><given-names>Y.</given-names></name><name><surname>Hamaguchi</surname><given-names>M.</given-names></name><name><surname>Ishikawa</surname><given-names>H.</given-names></name><name><surname>Chiba</surname><given-names>S.</given-names></name><name><surname>Imai</surname><given-names>K.</given-names></name><name><surname>Kano</surname><given-names>S.</given-names></name><name><surname>Murakami</surname><given-names>S.D.</given-names></name><etal/></person-group><article-title>Gene expression profiles in <italic>Ciona intestinalis</italic> tailbud embryos</article-title><source>Development</source><year>2001</year><volume>128</volume><fpage>2893</fpage><lpage>2904</lpage><pub-id pub-id-type="pmid">11532913</pub-id></citation></ref><ref id="b34"><label>34</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>D'Alessio</surname><given-names>J.M.</given-names></name><name><surname>Gerard</surname><given-names>G.F.</given-names></name></person-group><article-title>Second-strand cDNA synthesis with <italic>E. coli</italic> DNA polymerase I and RNase H: the fate of information at the mRNA 5′ terminus and the effect of E. 
coli DNA ligase</article-title><source>Nucleic Acids Res.</source><year>1988</year><volume>16</volume><fpage>1999</fpage><lpage>2014</lpage><pub-id pub-id-type="pmid">2833725</pub-id></citation></ref><ref id="b35"><label>35</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Suzuki</surname><given-names>Y.</given-names></name><name><surname>Taira</surname><given-names>H.</given-names></name><name><surname>Tsunoda</surname><given-names>T.</given-names></name><name><surname>Mizushima-Sugano</surname><given-names>J.</given-names></name><name><surname>Sese</surname><given-names>J.</given-names></name><name><surname>Hata</surname><given-names>H.</given-names></name><name><surname>Ota</surname><given-names>T.</given-names></name><name><surname>Isogai</surname><given-names>T.</given-names></name><name><surname>Tanaka</surname><given-names>T.</given-names></name><name><surname>Morishita</surname><given-names>S.</given-names></name><etal/></person-group><article-title>Diverse transcriptional initiation revealed by fine large-scale mapping of mRNA start sites</article-title><source>EMBO Rep.</source><year>2001</year><volume>2</volume><fpage>388</fpage><lpage>393</lpage><pub-id pub-id-type="pmid">11375929</pub-id></citation></ref><ref id="b36"><label>36</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Blaxter</surname><given-names>M.L.</given-names></name><name><surname>De 
Ley</surname><given-names>P.</given-names></name><name><surname>Garey</surname><given-names>J.R.</given-names></name><name><surname>Liu</surname><given-names>L.X.</given-names></name><name><surname>Scheldeman</surname><given-names>P.</given-names></name><name><surname>Vierstraete</surname><given-names>A.</given-names></name><name><surname>Vanfleteren</surname><given-names>J.R.</given-names></name><name><surname>Mackey</surname><given-names>L.Y.</given-names></name><name><surname>Dorris</surname><given-names>M.</given-names></name><name><surname>Frisse</surname><given-names>L.M.</given-names></name><etal/></person-group><article-title>A molecular evolutionary framework for the phylum Nematoda</article-title><source>Nature</source><year>1998</year><volume>392</volume><fpage>71</fpage><lpage>75</lpage><pub-id pub-id-type="pmid">9510248</pub-id></citation></ref><ref id="b37"><label>37</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname><given-names>K.Z.</given-names></name><name><surname>Sommer</surname><given-names>R.J.</given-names></name></person-group><article-title>Operon structure and <italic>trans</italic>-splicing in the nematode <italic>Pristionchus pacificus</italic></article-title><source>Mol. Biol. 
Evol.</source><year>2003</year><volume>20</volume><fpage>2097</fpage><lpage>2103</lpage><pub-id pub-id-type="pmid">12949121</pub-id></citation></ref><ref id="b38"><label>38</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Blumenthal</surname><given-names>T.</given-names></name><name><surname>Evans</surname><given-names>D.</given-names></name><name><surname>Link</surname><given-names>C.D.</given-names></name><name><surname>Guffanti</surname><given-names>A.</given-names></name><name><surname>Lawson</surname><given-names>D.</given-names></name><name><surname>Thierry-Mieg</surname><given-names>J.</given-names></name><name><surname>Thierry-Mieg</surname><given-names>D.</given-names></name><name><surname>Chiu</surname><given-names>W.L.</given-names></name><name><surname>Duke</surname><given-names>K.</given-names></name><name><surname>Kiraly</surname><given-names>M.</given-names></name><etal/></person-group><article-title>A global analysis of <italic>Caenorhabditis elegans</italic> operons</article-title><source>Nature</source><year>2002</year><volume>417</volume><fpage>851</fpage><lpage>854</lpage><pub-id pub-id-type="pmid">12075352</pub-id></citation></ref><ref id="b39"><label>39</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fernandez</surname><given-names>C.</given-names></name><name><surname>Gregory</surname><given-names>W.F.</given-names></name><name><surname>Loke</surname><given-names>P.</given-names></name><name><surname>Maizels</surname><given-names>R.M.</given-names></name></person-group><article-title>Full-length-enriched cDNA libraries from <italic>Echinococcus granulosus</italic> contain separate populations of oligo-capped and <italic>trans</italic>-spliced transcripts and a high level of predicted signal peptide sequences</article-title><source>Mol. Biochem. 
Parasitol.</source><year>2002</year><volume>122</volume><fpage>171</fpage><lpage>180</lpage><pub-id pub-id-type="pmid">12106871</pub-id></citation></ref><ref id="b40"><label>40</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Harris</surname><given-names>M.A.</given-names></name><name><surname>Clark</surname><given-names>J.</given-names></name><name><surname>Ireland</surname><given-names>A.</given-names></name><name><surname>Lomax</surname><given-names>J.</given-names></name><name><surname>Ashburner</surname><given-names>M.</given-names></name><name><surname>Foulger</surname><given-names>R.</given-names></name><name><surname>Eilbeck</surname><given-names>K.</given-names></name><name><surname>Lewis</surname><given-names>S.</given-names></name><name><surname>Marshall</surname><given-names>B.</given-names></name><name><surname>Mungall</surname><given-names>C.</given-names></name><etal/></person-group><article-title>The Gene Ontology (GO) database and informatics resource</article-title><source>Nucleic Acids Res.</source><year>2004</year><volume>32</volume><fpage>D258</fpage><lpage>D261</lpage><pub-id pub-id-type="pmid">14681407</pub-id></citation></ref><ref id="b41"><label>41</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Uberbacher</surname><given-names>E.C.</given-names></name><name><surname>Xu</surname><given-names>Y.</given-names></name><name><surname>Mural</surname><given-names>R.J.</given-names></name></person-group><article-title>Discovering and understanding genes in human DNA sequence using GRAIL</article-title><source>Meth. 
Enzymol.</source><year>1996</year><volume>266</volume><fpage>259</fpage><lpage>281</lpage><pub-id pub-id-type="pmid">8743689</pub-id></citation></ref><ref id="b42"><label>42</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Williams</surname><given-names>C.</given-names></name><name><surname>Xu</surname><given-names>L.</given-names></name><name><surname>Blumenthal</surname><given-names>T.</given-names></name></person-group><article-title>SL1 <italic>trans</italic> splicing and 3′-end formation in a novel class of <italic>Caenorhabditis elegans</italic> operon</article-title><source>Mol. Cell. Biol.</source><year>1999</year><volume>19</volume><fpage>376</fpage><lpage>383</lpage><pub-id pub-id-type="pmid">9858561</pub-id></citation></ref><ref id="b43"><label>43</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hengartner</surname><given-names>M.O.</given-names></name><name><surname>Horvitz</surname><given-names>H.R.</given-names></name></person-group><article-title><italic>C. elegans</italic> cell survival gene ced-9 encodes a functional homolog of the mammalian proto-oncogene bcl-2</article-title><source>Cell</source><year>1994</year><volume>76</volume><fpage>665</fpage><lpage>676</lpage><pub-id pub-id-type="pmid">7907274</pub-id></citation></ref><ref id="b44"><label>44</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zorio</surname><given-names>D.A.</given-names></name><name><surname>Cheng</surname><given-names>N.N.</given-names></name><name><surname>Blumenthal</surname><given-names>T.</given-names></name><name><surname>Spieth</surname><given-names>J.</given-names></name></person-group><article-title>Operons as a common form of chromosomal organization in <italic>C. 
elegans</italic></article-title><source>Nature</source><year>1994</year><volume>372</volume><fpage>270</fpage><lpage>272</lpage><pub-id pub-id-type="pmid">7969472</pub-id></citation></ref><ref id="b45"><label>45</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Satou</surname><given-names>Y.</given-names></name><name><surname>Kawashima</surname><given-names>T.</given-names></name><name><surname>Kohara</surname><given-names>Y.</given-names></name><name><surname>Satoh</surname><given-names>N.</given-names></name></person-group><article-title>Large scale EST analyses in <italic>Ciona intestinalis</italic>: its application as Northern blot analyses</article-title><source>Dev. Genes. Evol.</source><year>2003</year><volume>213</volume><fpage>314</fpage><lpage>318</lpage><pub-id pub-id-type="pmid">12736826</pub-id></citation></ref><ref id="b46"><label>46</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Seo</surname><given-names>H.C.</given-names></name><name><surname>Kube</surname><given-names>M.</given-names></name><name><surname>Edvardsen</surname><given-names>R.B.</given-names></name><name><surname>Jensen</surname><given-names>M.F.</given-names></name><name><surname>Beck</surname><given-names>A.</given-names></name><name><surname>Spriet</surname><given-names>E.</given-names></name><name><surname>Gorsky</surname><given-names>G.</given-names></name><name><surname>Thompson</surname><given-names>E.M.</given-names></name><name><surname>Lehrach</surname><given-names>H.</given-names></name><name><surname>Reinhardt</surname><given-names>R.</given-names></name><etal/></person-group><article-title>Miniature genome in the marine chordate <italic>Oikopleura dioica</italic></article-title><source>Science</source><year>2001</year><volume>294</volume><fpage>2506</fpage><pub-id pub-id-type="pmid">11752568</pub-id></citation></ref></ref-list><sec sec-type="display-objects"><title>Figures and 
Tables</title><fig id="fig1" position="float"><label>Figure 1</label><caption><p>Distances between 5′ ends of genomically-mapped full-length ESTs and the nearest end of their 5′-neighbouring gene models, plotted in 100-base windows. (<bold>A</bold>) SL-full-length ESTs whose 5′-neighbouring gene is transcribed in the same direction (group I), (<bold>B</bold>) SL-full-length ESTs whose 5′-neighbouring gene is transcribed in the opposite direction (group II), (<bold>C</bold>) non-SL-full-length ESTs whose 5′-neighbouring gene is transcribed in the same direction (group III) and (<bold>D</bold>) non-SL-full-length ESTs whose 5′-neighbouring gene is transcribed in the opposite direction (group IV). Note that only in (A) (group I) are gene pairs separated by fewer than 100 bases (first data point) well represented. The number of gene pairs representing each distance interval is reported as a percentage of the total number of gene pairs in each group.</p></caption><graphic xlink:href="gkl418f1"/></fig><fig id="fig2" position="float"><label>Figure 2</label><caption><p>Dicistronic and monocistronic conventional ESTs, and SL-full-length ESTs, representing operon 41. The top part of the figure is a schematic depiction of the genomic DNA in terms of the intron/exon (lines/boxes) structures for two adjacent genes. The genes, coding for a protein similar to hypothetical proteins in other animals and a GTP-binding nuclear protein Ran, are immediately adjacent, so they are shown on separate lines for clarity. Protein-coding and non-coding regions are shown in grey and green, respectively. Below the genomic DNA depiction are diagrams showing conventional EST cDNA clones aligning with exons in this region. Each cDNA clone is represented by two EST sequencing runs, a 5′-EST (red) and a 3′ EST (blue), which are joined by dashed lines and which in some cases overlap. 
The first cDNA clone depicted, citb076c21, is a dicistronic transcript whose 5′-EST corresponds to the upstream gene and whose 3′ EST corresponds to the downstream gene (intron sequences were not present in the ESTs). An additional cDNA clone, cima003i16, was similar. Other cDNA clones depicted represent mature monocistronic mRNAs corresponding to either the upstream or downstream gene. The bottom depiction represents SL-full-length ESTs corresponding to the downstream gene, showing it to be <italic>trans-</italic>spliced. The genomic juxtaposition of these two cistrons is not due to an artifact of genome assembly, because eight raw whole-genome shotgun reads, of which three are depicted by black arrows at the bottom, can be aligned across the intercistron boundary.</p></caption><graphic xlink:href="gkl418f2"/></fig><fig id="fig3" position="float"><label>Figure 3</label><caption><p>Cistrons in <italic>Ciona</italic> candidate operons are immediately juxtaposed with no intervening DNA. Genomic sequences (assembly version 1.0) and coordinates are shown on the top line of each panel, with scaffold name and nucleotide position indicated. ESTs mapping to this site are shown below: conventional ESTs for the upstream genes (lower-case roman letters; poly(A) tail, of which 10 residues are shown, shaded in red) and SL-full-length ESTs for the downstream genes (lower-case italic letters; SL sequence shaded in grey). The intercistron boundaries are indicated by arrows. 
In addition to the two examples shown here, 16 similarly-organized operons are shown in Supplementary Figure S3.</p></caption><graphic xlink:href="gkl418f3"/></fig><table-wrap id="tbl1" position="float"><label>Table 1</label><caption><p>SL sequence variants observed in SL-full-length ESTs</p></caption><table frame="hsides" rules="groups"><thead><tr><th align="left" colspan="1" rowspan="1">SL sequence</th><th align="left" colspan="1" rowspan="1">Number of full-length ESTs</th><th align="left" colspan="1" rowspan="1">Number of clusters/genes</th></tr></thead><tbody><tr><td align="left" colspan="1" rowspan="1">ATTCTATTTGAATAAG</td><td align="left" colspan="1" rowspan="1">517</td><td align="left" colspan="1" rowspan="1">307</td></tr><tr><td align="left" colspan="1" rowspan="1">_TTCTATTTGAATAAG</td><td align="left" colspan="1" rowspan="1">14</td><td align="left" colspan="1" rowspan="1">14</td></tr><tr><td align="left" colspan="1" rowspan="1">ATTCTATTT<underline>A</underline>AATAAG</td><td align="left" colspan="1" rowspan="1">4</td><td align="left" colspan="1" rowspan="1">4</td></tr><tr><td align="left" colspan="1" rowspan="1">A<underline>T</underline>TTCTATTTGAATAAG</td><td align="left" colspan="1" rowspan="1">4</td><td align="left" colspan="1" rowspan="1">4</td></tr><tr><td align="left" colspan="1" rowspan="1">_CTATTTGAATAAG</td><td align="left" colspan="1" rowspan="1">3</td><td align="left" colspan="1" rowspan="1">3</td></tr><tr><td align="left" colspan="1" rowspan="1"><underline>A</underline>ATTCTATTTGAATAAG</td><td align="left" colspan="1" rowspan="1">3</td><td align="left" colspan="1" rowspan="1">3</td></tr><tr><td align="left" colspan="1" rowspan="1"><underline>GT</underline>ATTCTATTTGAATAAG</td><td align="left" colspan="1" rowspan="1">3</td><td align="left" colspan="1" rowspan="1">3</td></tr><tr><td align="left" colspan="1" rowspan="1">ATTCTA<underline>A</underline>TTGAATAAG</td><td align="left" colspan="1" rowspan="1">3</td><td align="left" colspan="1" 
rowspan="1">2</td></tr><tr><td align="left" colspan="1" rowspan="1">_TCTATTTGAATAAG</td><td align="left" colspan="1" rowspan="1">2</td><td align="left" colspan="1" rowspan="1">2</td></tr><tr><td align="left" colspan="1" rowspan="1">ATTCTATT<underline>A</underline>GAATAAG</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">1</td></tr><tr><td align="left" colspan="1" rowspan="1">ATTCTA_TTGAATAAG</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">1</td></tr><tr><td align="left" colspan="1" rowspan="1">ATTCTATTT<underline>C</underline>AATAAG</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">1</td></tr><tr><td align="left" colspan="1" rowspan="1">ATTCTATTTGAA<underline>A</underline>AAG</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">1</td></tr><tr><td align="left" colspan="1" rowspan="1">ATTCTATTTGAA<underline>C</underline>AAG</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">1</td></tr><tr><td align="left" colspan="1" rowspan="1">ATTCTATTTGAA<underline>G</underline>AAG</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">1</td></tr><tr><td align="left" colspan="1" rowspan="1">ATTCTATTTGA_____</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">1</td></tr><tr><td align="left" colspan="1" rowspan="1">ATTCT<underline>G</underline>TTTGAATAAG</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">1</td></tr></tbody></table></table-wrap><table-wrap id="tbl2" position="float"><label>Table 2</label><caption><p>Full-length EST representation of upstream and downstream genes in candidate operons</p></caption><table frame="hsides" rules="groups"><thead><tr><th colspan="1" rowspan="1"/><th colspan="4" align="left" rowspan="1">Number of full-length 
ESTs</th></tr><tr><th colspan="1" rowspan="1"/><th colspan="2" align="left" rowspan="1">Upstream gene</th><th colspan="2" align="left" rowspan="1">Downstream gene</th></tr><tr><th align="left" colspan="1" rowspan="1">Operon ID<sup>a</sup></th><th align="left" colspan="1" rowspan="1">SL-full-length EST</th><th align="left" colspan="1" rowspan="1">Non-SL-full-length EST</th><th align="left" colspan="1" rowspan="1">SL-full-length EST</th><th align="left" colspan="1" rowspan="1">Non-SL-full-length EST</th></tr></thead><tbody><tr><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">2</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">3</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">4</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">5</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">6</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">5</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td 
align="left" colspan="1" rowspan="1">7</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">2</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">8</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">2</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">9</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">10</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">11</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">12</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">13</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">14</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" 
rowspan="1">15</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">16</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">17</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">18</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">19</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">20</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">21</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">22</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">6</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">23</td><td align="left" 
colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">24</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">25</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">11</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">26</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">27</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">5</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">28</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">5</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">29</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">30</td><td align="left" colspan="1" rowspan="1">2</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">31</td><td align="left" colspan="1" rowspan="1">1</td><td 
align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">32</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">33</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">34</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">35</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">36</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">37</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">38</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">39</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" 
rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">40</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">41</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">3</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">42</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">43</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">0</td><td align="left" colspan="1" rowspan="1">1</td><td align="left" colspan="1" rowspan="1">0</td></tr><tr><td align="left" colspan="1" rowspan="1">Total</td><td align="left" colspan="1" rowspan="1">7</td><td align="left" colspan="1" rowspan="1">38</td><td align="left" colspan="1" rowspan="1">26</td><td align="left" colspan="1" rowspan="1">0</td></tr></tbody></table><table-wrap-foot><fn><p><sup>a</sup>ID numbers assigned to the candidate operons described in detail in the text. Genes within each operon are listed in Supplementary Table S5, except for operons 13–16 and 41–43. The downstream genes in candidate operons 13–16 are not accurately represented by gene models but were represented by the cDNA clones cicx006d13 (operon 13), cicx007d17 (operon 14), cicx008h05 (operon 15) and cicx009d07 (operon 16). Operons 41–43 were found in other, unrelated studies, and their detailed features are shown in Supplementary Table S6.</p></fn></table-wrap-foot></table-wrap></sec></back></article>