<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>1471-2148-5-23.fm</title>
<meta name="Author" content="csproduction"/>
<meta name="Creator" content="FrameMaker 7.0"/>
<meta name="Producer" content="Acrobat Distiller 5.0.5 (Windows)"/>
<meta name="CreationDate" content=""/>
</head>
<body>
<pre>
BMC Evolutionary Biology
BioMed Central
Open Access
Research article
Comparative analysis of the Saccharomyces cerevisiae and
Caenorhabditis elegans protein interaction networks
Ino Agrafioti¹, Jonathan Swire¹, James Abbott², Derek Huntley²,
Sarah Butcher² and Michael PH Stumpf*¹
Address: ¹Theoretical Genomics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College London, Wolfson Building,
SW7 2AZ, London, UK and ²Bioinformatics Support Service, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College
London, Wolfson Building, SW7 2AZ, London, UK
Email: Ino Agrafioti - ino.agrafioti@imperial.ac.uk; Jonathan Swire - j@robberfly.com; James Abbott - j.abbott@imperial.ac.uk;
Derek Huntley - d.huntley@imperial.ac.uk; Sarah Butcher - s.butcher@imperial.ac.uk; Michael PH Stumpf* - m.stumpf@imperial.ac.uk
* Corresponding author
Published: 18 March 2005
BMC Evolutionary Biology 2005, 5:23
doi:10.1186/1471-2148-5-23
Received: 09 December 2004
Accepted: 18 March 2005
This article is available from: http://www.biomedcentral.com/1471-2148/5/23
© 2005 Agrafioti et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Background: Protein interaction networks aim to summarize the complex interplay of proteins
in an organism. Early studies suggested that the position of a protein in the network determines its
evolutionary rate but there has been considerable disagreement as to what extent other factors,
such as protein abundance, modify this reported dependence.
Results: We compare the genomes of Saccharomyces cerevisiae and Caenorhabditis elegans with
those of closely related species to elucidate the recent evolutionary history of their respective
protein interaction networks. Interaction and expression data are studied in the light of a detailed
phylogenetic analysis. The underlying network structure is incorporated explicitly into the
statistical analysis. The increased phylogenetic resolution, paired with high-quality interaction data,
allows us to resolve the way in which protein interaction network structure and abundance of
proteins affect the evolutionary rate. We find that expression levels are better predictors of the
evolutionary rate than a protein's connectivity. Detailed analysis of the two organisms also shows
that the evolutionary rates of interacting proteins are not sufficiently similar to be mutually
predictive.
Conclusion: It appears that meaningful inferences about the evolution of protein interaction
networks require comparative analysis of reasonably closely related species. The signature of
protein evolution is shaped by a protein's abundance in the organism and its function and the
biological process it is involved in. Its position in the interaction networks and its connectivity may
modulate this but they appear to have only minor influence on a protein's evolutionary rate.
Background
Studies of the evolutionary history of protein interaction
network (PIN) data have produced an almost bewildering
range of (partially) contradictory results [1-6,8-12]. While
PIN data is notoriously prone to false positive and negative
results [5,13], reasons for disagreements are probably
more diverse than just the quality of the interaction data.
Failure to account for protein abundance – as measured
by gene expression levels, or by proxy, the codon-adaptation index – has been criticized [3]; the choice of species
for comparative analysis will also affect any evolutionary
inferences as shown by Hahn et al. [12]. This may either
be due to loss of power (e.g. fewer reliably identified
orthologues between more distantly related species) or to
differences in underlying PINs in distantly related species.
Below, for example, we will show that results obtained
from a comparison between the two hemiascomycetes
Saccharomyces cerevisiae and Candida albicans differ considerably from those obtained using a distant S. cerevisiae –
Caenorhabditis elegans comparison. Finally, it has recently
been shown that many studies may have suffered from the
fact that present network data, and this is in particular true
for PINs, are random samples from much larger networks.
Unless these subnets are adequate representations of the
overall network, their structural properties (such as node
connectivity) may differ quite substantially from that of
the nodes in the global network. This is, for example, the
case for so-called scale-free network models [14].
Moreover, many studies have ignored the underlying network structure [15] in the statistical analysis. The network,
however, introduces dependencies between connected
proteins which should not be ignored. Fraser et al. [2] for
example find that (i) there is a negative correlation
between a protein's evolutionary rate and its connectivity
k (the number of its interactions), (ii) connected proteins
have positively correlated evolutionary rates, and (iii)
connected proteins do not have correlated connectivities.
All three statements cannot, of course, be strongly true
simultaneously. Here we observe only relatively weak –
though statistically significant – correlations between connectivity and evolutionary rate. We will argue that in a
regression framework [16] some of these quantities contain very little information indeed about the corresponding properties of their interaction partners. Furthermore,
we will demonstrate that when analyzing network data
the network structure must be included into the analysis
from the outset. Here we will first perform an evolutionary analysis of the yeast and nematode PIN data available
in the DIP database [7], a hand-curated dataset combining
information from a wide range of sources, followed by a
comparison of the two datasets. When making comparisons between yeast species and between nematode species, we use only a single PIN dataset – for S. cerevisiae and
C. elegans, respectively – and take comfort from the observation of Hahn et al. [12] who find that evolutionary analysis involving closely related reference taxa produces
consistent results. Previously, topological comparisons of
biological network data from different species have been
made [17] but here we focus on shared evolutionary characteristics of PINs in the two species. We would expect at
least some level of similarity of biological networks
between species; but the more distantly related two organisms are, the more changes can have accumulated in their
respective molecular networks. Thus, the depth of the
phylogeny can affect the evolutionary analysis of PINs; it
is, for example, unlikely that PINs have been conserved
over large evolutionary time-scales.

Figure 1: Phylogeny of the organisms used in the study. The
evolutionary relationship of the organisms used in this study.
The last common ancestor of the ascomycetes in this phylogeny has been estimated to have lived approximately 330 million years ago. For the nematodes only two annotated
genomes were available: their last common ancestor is
believed to have lived approximately 100 million years ago.
Results
Evolutionary analysis of the S. cerevisiae PIN
For the evolutionary analysis of the yeast PIN we use a
panel of related yeast species: Saccharomyces mikatae, Saccharomyces bayanus, Saccharomyces castellii, Saccharomyces
kluyveri, C. albicans and Schizosaccharomyces pombe (see
Methods section); the evolutionary relationship between
these species is shown in figure 1. We thus focus on relatively recent evolutionary change which allows us to study
the effects of the network structure on the rate of evolution more directly than e.g. distant comparisons of S. cerevisiae and C. elegans, which may, after all, have different
PINs.
Connectivity, expression and evolutionary rates in the S.
cerevisiae PIN
For most protein sequences we have not been able to
identify orthologues in all yeast species used in this analysis. We therefore defined the averaged relative evolutionary rate R (see Eqn. (1) in the Methods section) which
allows us to make comparisons for 4124 out of the 4773
yeast genes for which we have interaction data.

Figure 2: Dependence of evolutionary rate in ascomycetes on the number of protein interactions and expression level.
[Two scatter plots: average relative rate against the number of interactions, and average relative rate against the expression level.]
The averaged relative rate R decreases with increasing number of interaction partners (τ ≈ -0.06) and the expression level (τ ≈
-0.23). The 95% bootstrap intervals for Kendall's τ values obtained from the six species comparisons are always negative (see
table 1). The linear regression curves (red) appear concave on the log-transformed x-axis.

In figure 2 we show the dependence between inferred evolutionary rates and connectivities and expression levels,
respectively. Our comparative analysis found statistically
significant, though small, negative correlation, measured
by Kendall's τ, between estimated evolutionary rates and
a protein's number of interactions. In table 1 and figure 3
we observe that comparisons with all species support this
notion. We furthermore estimated approximate confidence intervals for τ from 1000 bootstrap replicates [18]
(shown in table 1).

Table 1: Evolutionary analysis of S. cerevisiae. Correlations between evolutionary rate, number of connections and expression level of
proteins and the confidence intervals for Kendall's τ statistic obtained for the different ascomycete species. Values of τ that have
associated p-values < 0.01 are highlighted in bold. X1 denotes correlation with evolutionary rate obtained from a pairwise sequence
comparison between S. cerevisiae and species X; X2 differs from X1 only in that the evolutionary rate was obtained using a maximum
likelihood estimate. M denotes a rate obtained with respect to S. mikatae, B to S. bayanus, C to S. castellii, K to S. kluyveri, A to C.
albicans, and P to S. pombe.

Species        Connectivity                Expression
comparison     τ       2.5-%   97.5-%      τ       2.5-%   97.5-%
M1             -0.13   -0.17   -0.11       -0.25   -0.28   -0.22
M2             -0.16   -0.19   -0.13       -0.30   -0.33   -0.27
B1             -0.13   -0.16   -0.11       -0.26   -0.28   -0.24
B2             -0.14   -0.17   -0.12       -0.29   -0.30   -0.27
C1             -0.11   -0.13   -0.08       -0.29   -0.31   -0.27
C2             -0.12   -0.14   -0.09       -0.32   -0.34   -0.30
K1             -0.12   -0.15   -0.08       -0.28   -0.32   -0.25
K2             -0.14   -0.17   -0.10       -0.32   -0.35   -0.29
A1             -0.08   -0.11   -0.06       -0.28   -0.31   -0.26
A2             -0.08   -0.11   -0.05       -0.28   -0.30   -0.25
P1             -0.10   -0.14   -0.07       -0.30   -0.33   -0.27
P2             -0.10   -0.13   -0.07       -0.28   -0.31   -0.25

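
The rank correlation and its bootstrap interval can be illustrated with a short sketch. It assumes the tau-b variant of Kendall's statistic and a plain percentile bootstrap over proteins; the text does not state which variant or interval construction the authors used, and the function names are hypothetical.

```python
import math
import random
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-b rank correlation of two equal-length sequences."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue              # tied in both variables: ignored by tau-b
        if dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + only_x_tied) * (untied + only_y_tied))
    # tau is undefined when one variable is constant; this sketch returns 0.0
    return (concordant - discordant) / denom if denom else 0.0


def bootstrap_tau_interval(x, y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for tau (assumed scheme: resample proteins)."""
    rng = random.Random(seed)
    n = len(x)
    taus = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        taus.append(kendall_tau([x[i] for i in idx], [y[i] for i in idx]))
    taus.sort()
    lo = taus[round((1 - level) / 2 * n_boot)]
    hi = taus[round((1 + level) / 2 * n_boot) - 1]
    return lo, hi
```

Calling `bootstrap_tau_interval(rates, connectivities)` on two equal-length lists returns the lower and upper percentile bounds. The pairwise loop is quadratic in the number of proteins, so a library routine would be preferable for several thousand genes.
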
Observed negative correlations between estimated evolutionary rates and the expression level – which have been
reported previously by Pal et al. [19] – are more pronounced. Equally, k, the number of a protein's interactions, and expression levels are also correlated (τ = 0.09).
There has been considerable controversy as to whether the
effect of a protein's connectivity can be studied independently of expression levels (see e.g. [3,4]). The observed values of τ suggest that expression levels are better predictors
of the evolutionary rate than are connectivities. Calculating partial rank correlation coefficients, τp, provides further evidence for this: correcting for expression reduces
the correlation between the evolutionary rate R (or any of
the individual rates) and the number of interactions, as is
apparent from figure 3. As the phylogenetic distance
between species increases, the negative partial correlation
between evolutionary rate and connectivity decreases
compared to the uncorrected rank correlation measure τ.
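
As a sketch of how correcting for expression changes the rate-connectivity correlation, the snippet below uses the standard definition of Kendall's partial τ. That the authors used exactly this formula is an assumption, and the inputs are the rounded values quoted in the text, so the result is only indicative.

```python
import math


def partial_tau(tau_xy, tau_xz, tau_yz):
    """Kendall's partial tau between x and y, correcting for z (standard formula)."""
    return (tau_xy - tau_xz * tau_yz) / math.sqrt(
        (1 - tau_xz ** 2) * (1 - tau_yz ** 2)
    )


# Rounded values quoted in the text (illustrative plug-in, not a re-analysis):
tau_rate_connectivity = -0.06    # averaged rate R vs. number of interactions
tau_rate_expression = -0.23      # averaged rate R vs. expression level
tau_connectivity_expression = 0.09

corrected = partial_tau(tau_rate_connectivity,
                        tau_rate_expression,
                        tau_connectivity_expression)
print(f"partial tau (rate, connectivity | expression) = {corrected:.3f}")
```

With these inputs the corrected coefficient is about -0.04, smaller in magnitude than the uncorrected -0.06, which is the direction of the effect described above.
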
In the supplementary tables S1-S3 [see Additional file 1]
we show the evolutionary rates for the different functional
categories, processes and cellular compartments (taken
from Gene Ontology (GO) [20]). Interestingly, once the
effects of expression and protein function on the estimated evolutionary rate are taken into account the
dependence of the latter on connectivity in a generalized
linear regression model [16] (where we log-transformed
the expression level to obtain an approximately normal
distribution) is considerably reduced. This can be assessed
formally using the Akaike information criterion (AIC)
[21] on the sub-models where one of the terms has been
dropped (see methods). For the full model we obtain AIC
= -407.4. Dropping expression from the model results in
AIC = -196.9, indicating that a substantial amount of
information about the evolutionary rate is contained in
the expression levels. Dropping the other terms individually while retaining the rest results in: AIC = -392.9 if the
connectivity is dropped from the statistical model, and
AIC = -352.6 (process), -250.1 (function) and -392.7
(compartment). We thus have the following order of statistically inferred impact on the evolutionary rate (with a
slight abuse of the notation): expression>function>process>connectivity≈compartment. Using the rates obtained
from comparisons with the individual species results in
the same ordering.
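
The ordering above follows from simple arithmetic on the reported AIC values: the larger the increase in AIC when a term is dropped, the more information that term carries. A minimal sketch, using only the numbers quoted in the text (the variable and function names are hypothetical):

```python
# AIC values reported in the text for the S. cerevisiae regression model.
# AIC = 2k - 2 ln(L); lower values indicate a better trade-off of fit and size.
aic_full = -407.4
aic_without = {
    "expression": -196.9,
    "function": -250.1,
    "process": -352.6,
    "compartment": -392.7,
    "connectivity": -392.9,
}


def rank_terms(aic_full, aic_without):
    """Sort terms by the AIC increase caused by dropping them (largest first)."""
    delta = {term: aic - aic_full for term, aic in aic_without.items()}
    return sorted(delta.items(), key=lambda item: item[1], reverse=True)


for term, increase in rank_terms(aic_full, aic_without):
    print(f"{term:13s} AIC increase on dropping = {increase:6.1f}")
```

Dropping expression raises the AIC by 210.5, whereas dropping connectivity or compartment raises it by less than 15, which is why these two are treated as roughly equivalent.
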
Evolution of interacting proteins in S. cerevisiae
So far we have treated nodes/proteins as independent
(using only their connectivities in the analysis) but we will
now consider the extent to which interactions introduce
dependencies into the data. It is intuitively plausible that
interacting proteins have similar evolutionary rates, and
this has indeed been reported by Fraser et al. [2,22] and
studied by others, too, e.g. [3,12]. Just like them we find
that evolutionary rate decreases with connectivity; we also
observe that the connectivities of interacting proteins are
anti-correlated in yeast (τ ≈ -0.03 with p < 10^-8). This is
well explained from the statistical theory of networks
[14,23], as well as structural analyses of PIN data, where it
is found that highly connected proteins form hubs which
connect sparsely connected proteins.
Taken together this would mean that the evolutionary
rates of connected proteins should also be anti-correlated.
This is, however, not the case when we look at the yeast
PIN, where we find that evolutionary rates of interacting
proteins are positively correlated as measured by Kendall's
τ. The correlations we observe are only relatively weak, even though they are significant (τ ≈ 0.05 – 0.10 with p <
10^-8). In figure 4 we show the distribution of the τ rank correlation under the correct network Null model (see methods)
for rates, expression levels and connectivities of
interacting proteins. The observed value always lies outside the distribution of the expected values. Also shown in
the figure are the probabilities that two interacting proteins
have identical GO-classifications for function, process and cellular compartment, respectively. Again the
observed probabilities lie outside the distribution under
the Null model.

Figure 3: Correlation and partial correlation between evolutionary rate, number of interactions and expression level.
[Two panels, "Correlation between rate and connectivity" and "Correlation between rate and expression", each showing values for S. mikatae, S. bayanus, S. castellii, S. kluyveri, C. albicans and S. pombe.]
Kendall's rank correlation (blue) and partial rank correlation coefficients (red) between R and the number of interactions (correcting the partial τ for expression level) and expression (correcting for the number of interactions).

Figure 4: Statistical dependencies of interacting proteins in S. cerevisiae.
[Six histogram panels: evolutionary rate (S. cerevisiae − S. mikatae), expression level and number of interactions, each against Kendall's τ; protein function, biological process and cellular compartment, each against the probability of an identical classification.]
Bootstrap distributions of Kendall's τ between evolutionary rates, expression levels and numbers of interactions and probabilities that protein function and the processes and cellular compartments by which proteins are classified are identical for a pair of interacting proteins. The grey histograms show
the distribution of the statistics obtained from 1000 bootstrap replicates and the red vertical lines indicate the observed value.
The bootstrap procedure was constrained such that each sample reproduced the degree distribution of the observed PIN.

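
A Null model that reproduces the degree distribution of the observed PIN can be sketched with double-edge swaps. This is a stand-in technique chosen for illustration; the authors' own procedure is described in their Methods section and may differ. The function name and the retry limit are hypothetical.

```python
import random


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Randomise an undirected simple graph while keeping every node's degree.

    Repeatedly picks two edges (a, b) and (c, d) and rewires them to
    (a, d) and (c, b), rejecting swaps that would create a self-loop
    or a duplicate edge.
    """
    edges = [tuple(e) for e in edges]            # work on a copy
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:   # assumed retry cap
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:                   # randomise edge orientation
            c, d = d, c
        if len({a, b, c, d}) < 4:
            continue                             # would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue                             # would duplicate an edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges
```

Computing a statistic such as Kendall's τ between the rates of interacting proteins on many shuffled networks then gives a reference distribution against which the observed value can be compared.
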
Correlation, even partial correlation, may, however, be an
inadequate statistical measure if the data is structured (as
in a network); one should then rather focus on the power
of a factor such as expression level or connectivity to predict evolutionary rates. We assess this formally through
the use of statistical regression models which describe the
evolutionary rate of one protein as a function of the rate
of its interacting partner, as well as of its expression level,
number of interactions, function, process and cell compartment. The AIC, which for the full model is AIC = -2397.6, allows us to order the factors by the information
they contain about a protein's evolutionary rate. The order
(and the respective AIC value on dropping the factor from
the model) is as follows: Expression (AIC = -1399.6),
function (AIC = -1445.9), process (AIC = -1956.6), cellular compartment (AIC = -2226.6), connectivity (AIC = -2316.8), and the rate of one of its interaction partners
(AIC = -2397.0). Note that, measured by the AIC, the evolutionary rate of an interaction partner provides virtually
no additional information about a protein's own evolutionary rate, once the protein's own expression level, function and process have been taken into account.
Thus, in summary, we observe that the evolutionary rate
of yeast proteins is inversely related both to their connectivity in the PIN and to their expression levels, with
expression levels having a greater impact on a protein's
evolutionary rate than connectivities. Finally, while there
is statistically significant correlation between the rates of
interacting proteins, the rate of one interaction partner
carries very little information about the rate of the other
protein if other factors are taken into account.
Evolutionary analysis of the C. elegans PIN
In the evolutionary analysis of C. elegans we use C.
briggsae, the only other congeneric nematode for which
high quality whole-genome data is available. Since nematodes are multicellular, care has to be taken when
analysing the effects of gene expression on evolutionary
rate, as expression levels will vary considerably between
tissues and, indeed, between different stages of the nematode life cycle. Because codon usage bias as a selective
response increasing translational efficiency should be
driven by the overall expression level of a protein integrated over both tissue and time, the codon-adaptation
index (CAI; see Methods and [24]) can serve as a meaningful averaged quantity reflecting overall integrated expression levels better than a direct measurement of mRNA
expression level data obtained from any single tissue type.
Connectivity, expression and evolutionary rates in the C.
elegans PIN
The correlation of evolutionary rate and connectivity is
somewhat reduced compared to S. cerevisiae with a point
estimate of τ = -0.05 with a 95% bootstrap CI of [-0.097,
-0.017]. Anti-correlation between the CAI measure of
expression and evolutionary rates is again much more
pronounced with τ ≈ -0.30 and approximate bootstrap CIs
of [-0.333, -0.264]. The resulting scatter plots of rate vs.
connectivity and rate vs. CAI are shown in figure 5.
Partial correlation coefficients again show that the influence of expression is greater than that of connectivity: τp ≈
-0.03 for the partial correlation measure between rate and
connectivity, while τp ≈ -0.30 if the correlation between
expression (CAI) and rate is corrected for connectivity.
This is confirmed by performing an ANOVA [25] on the
regression between rate, CAI and connectivity where no
significant correlation can be found between rate and connectivity (p ≈ 0.62). Generalized linear regression modelling shows that measured by the AIC a model in which the
rate depends only on the CAI but not on the connectivity
(AIC = -660.5) is more powerful than a model in which
the rate depends on both connectivity and CAI (AIC = -618.4). In the absence of extensive GO data we find that
the CAI is the only statistically significant predictor for a
protein's evolutionary rate.
Evolution of interacting proteins in C. elegans
Comparing properties of interacting proteins we again
find a negative correlation between their respective connectivities (τ = -0.07) and a weaker positive correlation
between their evolutionary rates (τ = 0.03).
The corresponding 95% bootstrap CI for τ does, however,
include 0 and negative values; thus there is no statistical
basis for concluding that evolutionary rates of interacting
proteins are correlated in C. elegans even if we consider
only the rank correlation measure. In figure 6 the
distribution of τ under the correct Null model (see methods) confirms this result as the observed correlation
between the evolutionary rates of interacting proteins falls
into the 95% confidence interval obtained from the Null
model. Expression levels are, however, significantly correlated and connectivities remain significantly anti-correlated. Regression models, equivalent to those performed
for yeast, confirm the negligible information a protein's
evolutionary rate contains about the evolutionary rate of
an interacting protein.
In summary, for C. elegans we find that expression, even if
measured indirectly through the CAI, is a better predictor
of a protein's evolutionary rate than connectivity and
GO classifications. The evolutionary rates of connected
proteins do not appear to be correlated.
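
The comparison of an observed correlation with the Null-model distribution can be sketched as a simple interval check. The central-percentile rule and the function name are assumptions made for illustration; the values in the usage note are synthetic, not the paper's data.

```python
def within_null_interval(observed, null_values, level=0.95):
    """Check whether an observed statistic lies in the central part of a
    Null-model distribution (percentile interval; assumed decision rule)."""
    ordered = sorted(null_values)
    n = len(ordered)
    alpha = (1 - level) / 2
    lower = ordered[round(alpha * n)]
    upper = ordered[round((1 - alpha) * n) - 1]
    return lower <= observed <= upper, (lower, upper)
```

For example, with 1000 Null-model replicates stored in a list `null_taus`, `within_null_interval(0.03, null_taus)` returns a flag and the interval bounds. A flag of `True` means the observed value is compatible with the Null model, as reported here for the evolutionary rates of interacting C. elegans proteins.
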
Comparing the PINs of S. cerevisiae and C. elegans
It is instructive to compare the PINs of the two model
organisms, yeast and worm, directly. We have therefore
used our earlier approach of identifying and analysing
orthologues to the yeast and nematode PIN data. While
we are, of course, aware that this may be problematic
given the two or three billion years of evolutionary history
separating the two organisms, it should serve as a useful
illustration of the amount of information one model organism is likely to provide about another (including, of
course, humans). Using this approach we found a total of
524 pairs of orthologues. These we aligned and from the
alignments we estimated evolutionary rates. For all of
these proteins we have PIN data and for most we also have
information about their expression levels in the two species. The results are summarized in tables 2 and 3.

Figure 5: Dependence of evolutionary rate in nematodes on the number of protein interactions and CAI.
[Two scatter plots: evolutionary rate against the number of interactions, and evolutionary rate against the codon adaptation index.]
The estimated
evolutionary rate decreases with increasing number of interaction partners (τ ≈ -0.03) and the expression level (τ ≈ -0.30). We
have again transformed the x-axis in the scatterplot of rate vs. connectivity which leads to the concave shape of the regression
line (red).

Figure 6: Statistical dependencies of interacting proteins in C. elegans.
[Three histogram panels: evolutionary rate, expression level and number of interactions, each against Kendall's τ.]
Bootstrap distributions of Kendall's τ between evolutionary rates, expression levels and numbers of interactions. The grey histograms show the distribution of τ obtained in 1000 bootstrap replicates and the red lines indicate the observed value. The bootstrap procedure was constrained such that each sample
reproduced the degree distribution of the observed PIN.

Table 2: Correlations obtained from a direct comparison of S. cerevisiae with C. elegans. Orthologues in the S. cerevisiae and C. elegans
PINs were identified by reciprocal BLAST searches and evolutionary rates, estimated previously (see table 1), were analysed for
correlation between evolutionary rate, the number of interactions and expression levels. We also performed an analysis with
evolutionary rates estimated directly from the distant S. cerevisiae and C. elegans comparison.

                       Evolutionary rate obtained from     Evolutionary rate obtained from
                       closely related species             S. cerevisiae - C. elegans
Comparison             S. cerevisiae    C. elegans         S. cerevisiae    C. elegans
Nr. of Interactions    -0.11            -0.13              -0.20            -0.10
  2.5-percentile       -0.20            -0.23              -0.26            -0.19
  97.5-percentile      -0.02            -0.04              -0.14            -0.03
Expression             -0.33            -0.44              -0.25            -0.42
  2.5-percentile       -0.41            -0.50              -0.32            -0.47
  97.5-percentile      -0.24            -0.36              -0.19            -0.37

Table 3: Correlations between orthologous proteins in the S.
cerevisiae and C. elegans PINs. Observed rank correlations
(measured by Kendall's τ) for evolutionary rates (measured with
respect to S. mikatae and C. briggsae, respectively), connectivity
and protein expression level (estimated by mRNA expression
level in S. cerevisiae and CAI in C. elegans).

Quantity             Observed τ    95% CI
Evolutionary Rate    0.24          0.12–0.35
Connectivity         0.07          0.001–0.14
Expression           0.32          0.26–0.39

Although they essentially agree with the earlier results,
they do suggest that the choice of species used for inferring the evolutionary rate can influence the analysis. For
example, the partial correlation between interaction and
evolutionary rate (calculated directly from the S. cerevisiae
– C. elegans amino acid sequence comparison) accounting
for expression is much less reduced compared with the
simple correlation coefficient (τp = -0.20 in S. cerevisiae,
and τp = -0.10 in C. elegans) than when evolutionary rates
are calculated using more closely related target species.
Over long evolutionary distances it appears as if
connectivity and expression level act almost independently. However, the more reliable comparisons of the previous section suggest that this is not the case.
Comparing properties of orthologous proteins we find
that their expression levels (using the CAI as a proxy in C.
elegans) show the strongest correlation while their
respective PIN connectivities show the lowest value for
Kendall's τ statistic. This may be due to the noise in the
PIN data or the incomplete nature of present PIN data
sets. We expect that the relatively small proportion of C.
elegans proteins included in the DIP database will also