© 2007 Guo et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License () which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Several studies have investigated the relationships between selective constraints in introns and their length, GC content and location within genes. To date, however, no such investigation has been done in plants. Studies of selective constraints in noncoding DNA have generally involved interspecific comparisons under the assumption of the same selective pressures acting in each lineage. Such comparisons are limited to cases in which the noncoding sequences are not too strongly diverged, so that reliable sequence alignments can be obtained. Here we investigate selective constraints in a recent segmental duplication that includes 605 paralogous intron pairs that occurred about 7 million years ago in rice (
and mammals; (2) there is a signature of strong purifying selection at splice control sites; (3) first introns are significantly longer and have a higher GC content than other introns; (4) the divergences of first and non-first introns are not significantly different from one another, a pattern that differs from
and mammals; and (5) short introns are more diverged than four-fold degenerate sites suggesting that selection reduces divergence at four-fold sites.
Our observation of stronger selective constraints in long introns suggests that functional elements subject to purifying selection may be concentrated within long introns. Our results are consistent with the presence of strong purifying selection at splicing control sites. In rice, unlike in other species, selective constraints are not significantly stronger in first introns.
Noncoding intronic and intergenic DNA of multicellular organisms typically comprises a large fraction of their genomes. Comparative genomic studies have revealed extensive evolutionary conservation of noncoding DNA in several mammalian and other species and are beginning to reveal the extent of potentially functional noncoding DNA [-]. Several lines of evidence have suggested that introns harbor a variety of untranslated RNAs (for example []) that are involved in mRNA processing, editing and translation [
]. In plants, conserved noncoding sequences were first identified in the grasses [-], and evidence of regulatory elements or binding sites in these noncoding sequences has been obtained [
based on a well-documented recent genome duplication event, intragenomic conserved noncoding sequences have also been investigated, and a unique set of noncoding DNA sequences enriched for function has been uncovered []. The above observations suggest that at least some functional regions in introns are likely to be under the influence of natural selection in plants in general.
Selective constraint (also known as functional or evolutionary constraint) is defined here as the fraction by which evolutionary divergence of a functional sequence is reduced relative to a neutrally evolving sequence due to the action of purifying selection []. Several methods for estimating evolutionary constraints have been proposed and applied to coding and noncoding DNA of invertebrates and mammals [-]. Shabalina and Kondrashov [] proposed a method to define the proportion of bases that are subject to strong purifying selection by comparing the genomes of distantly related species. It is assumed that homologous segments that show significant similarity are under strong functional constraints, and that all other segments are evolving free from functional constraints.
Another approach to determine functional regions in the genome is to analyse sequences from species showing lower levels of divergence that are far from saturation []. The basis of the method is to compare the divergence of putatively constrained segments of the genome with that of linked, putatively neutrally evolving sequences. In the selectively constrained segments, nucleotides are assumed to fall into two classes: neutral, which evolve at the same rate as the neutral sequence; or strongly constrained, in which mutations are eliminated unconditionally by natural selection. Selective constraint is then the proportion of new mutations that are strongly deleterious and removed by purifying selection [,
]. It should be noted that the presence of adaptive substitutions tends to lead to underestimation of constraint since this leads to divergence of functional regions.
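The definition above can be illustrated with a minimal sketch, assuming constraint is computed as one minus the ratio of divergence in the putatively constrained class to divergence in the neutral standard. The function name and the example numbers are hypothetical and are not taken from the study.

```python
def selective_constraint(div_test, div_neutral):
    """Fraction by which divergence is reduced relative to a neutral standard.

    div_test:    divergence (substitutions per site) in the putatively
                 constrained class of sites
    div_neutral: divergence in the linked, putatively neutral class
    A negative value means the test class diverged faster than the
    neutral standard.
    """
    if div_neutral <= 0:
        raise ValueError("neutral divergence must be positive")
    return 1.0 - div_test / div_neutral


# Hypothetical illustration only: a test class with divergence 0.06
# compared against a neutral standard with divergence 0.10.
example = selective_constraint(0.06, 0.10)
```

Under this formulation, adaptive substitutions in the test class raise its divergence and therefore lower the estimate, which is the underestimation effect noted above.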
One difficulty in analyzing evolutionary constraints in noncoding DNA is the inference of the correct sequence alignment. If the sequence alignment method tends to miss genuine similarities, then functional elements could be misassigned as non-functional. This uncertainty largely arises from the unknown pattern of indels (gaps) between the pair of sequences []. A solution to this problem is to compute probabilities of alternative alignments according to explicit models of indel evolution. Based on this method, MCALIGN2 has been developed to address the problem of aligning noncoding DNA [].
mammals and other animals [-,]. Several patterns of nucleotide divergence, polymorphism and selective constraints have been uncovered (described in our results and discussion section). Until recently, no such investigation had been done in plants.
The methodology chosen to study the pattern of noncoding DNA evolution depends heavily on the dataset investigated. In general, noncoding DNA sequences must not be too diverged, so that they can be aligned reliably. On the other hand, sequences should not be too similar, otherwise there may be insufficient statistical power for comparative genomic analysis. Until now, all studies of evolutionary constraints have compared different lineages under the assumption of the same selective pressures acting on them (e.g.
). The duplication event encompasses a ~3 Mb segmental pair with perfect synteny between chromosomes 11 and 12 []. The duplication is estimated to have occurred about 7 million years ago (mya) [-], although an alternative date of 21 mya has also been proposed []. The evolutionary divergence is comparable with estimates for human-chimpanzee (5–7 mya []) and members of the
for which the genomes have been sequenced. However, the two subspecies separated within about 0.5 mya [
], so their sequence similarity is too high and the power to infer constraints is low. The divergence time of rice and other cereals is estimated to be about 50 mya [], and alignment of noncoding sequences between them is usually problematic.
After intron alignment and some necessary masking, a dataset of 605 intron pairs (i.e., 1210 introns) was generated. The 605 pairs come from 272 duplicated gene pairs (which excluded genes that are part of a transposable element) from a recent duplication of rice chromosomes 11 and 12 (Fig. ; a chromosomal alignment between chromosomes 11 and 12 is provided in Additional file ; a list of the 272 duplicated gene pairs is provided as Additional file ). Among the 1210 introns, median length was 122 bp (mean length 232 bp; this excludes sites overlapping alignment gaps). The dataset included 85 first introns of median length 159 bp (mean length 357 bp), whereas non-first introns had median length 118 bp (mean 210 bp). It should be noted that only pairs in which both introns were first introns were counted as first intron pairs, and the same criterion was used for non-first introns. First introns are significantly longer than non-first introns (Wilcoxon two-sample test, W = 4961,
= 0.013) which parallels findings for other species investigated [-,]. Our dataset of 272 duplicated gene pairs is similar to that investigated by Wang
Synteny of segments from a recent duplication between chromosomes 11 and 12 of rice. A total of 272 duplicate gene pairs (lines) from the duplicate segments were collected and used in this study. The physical position (bp) of the syntenic segment is based on TIGR (Release 5); see Additional file.
In this study we employed several methods to minimize the frequency of incorrect alignments. These included amino acid-guided methods (see methods section) to anchor the coding regions of a paralogous gene pair (T-COFFEE), alignment using explicit models of indel evolution (MCALIGN2), and the use of two masking protocols for nonhomologous sites (for details see methods section). Our final sample size of 605 intron pairs from 272 loci is comparable with other similar studies. For example, 200–300 loci were used by Keightley and Gaffney [], 24 loci by Halligan
= 0.006) (Fig. ). This evidence therefore suggests that regulatory elements may be more common in long than in short introns. A significant negative correlation between divergence and intron length has also been observed in other species that have been investigated (such as rodents and
To further investigate the negative correlation between divergence and intron length described above, we divided our dataset into two subsets of first and non-first introns and calculated correlation coefficients between length and divergence for each subset separately. The results indicate that the negative correlation between divergence and intron length is significant in first introns, while the test statistic for non-first introns is marginally significant (first:
= 0.046). If introns are divided into two sets according to their length, there is a significant difference in divergence between short and long introns for first introns, whereas the difference is non-significant for non-first introns (Table ). In some other taxa, first introns appear to have a higher frequency of regulatory elements []. It has thus been suggested that a relationship between intron size and divergence might only be expected for first introns []. Our results in rice seem to support this point.
= 0.458) (Table ). This indicates that divergence does not decay slowly and regularly with the intronic ordinal position in a gene, which contrasts with the trends observed in the human-chimpanzee comparison [].
In addition to single nucleotide mutations, we also investigated the frequency distribution of indels in first and non-first introns. A total of 1,398 indels were identified in our dataset, and no significant difference in frequencies of indel lengths between first and non-first introns was observed (non-parametric Wilcoxon test, Z = -0.052,
= 0.95). However, significant differences between indel numbers and lengths per base or gene pair were observed (Wilcoxon test,
< 0.002), with more indels in first than in non-first introns. This result indicates that the evolutionary pattern of indels seems to be somewhat different from that of nucleotide divergence in introns in rice. Whether this trend exists in other plant or animal species needs further investigation.
In summary, selective constraints appear not to be specific to first introns in rice, so our results are similar to those previously reported in
[] found that first introns evolve at similar rates to other introns. In rodents and mammals, however, it has been reported that divergence varies along introns and depends on their ordinal position within genes. Gaffney and Keightley [] observed a negative correlation between mean intronic selective constraint and intron ordinal number in rodents, implying that first introns are more conserved than other introns. Studies of intronic divergence between humans and closely related species suggest that divergence also depends on intronic ordinal number []. The above results suggest that the pattern of high constraint at first introns is not common to all taxonomic groups. Whether the phenomenon is present in other plants needs further investigation.
We next examined constraints near the 5' and 3' ends of introns, which contain splice control motifs []. As expected, there is a strong signal of purifying selection in the sequences within 6 bp of the 5' and 3' ends, particularly at the dinucleotides adjacent to the 5' and 3' splice sites (Table ). Similar observations have been reported in rodents [,] and
[,]. The distribution of constraints in introns moving away from the splice sites, however, indicates that the regions under strong constraints in rice are quite short, only about 10 bp at the 5' end and even shorter at the 3' end (Fig. ). This situation is similar to what has been inferred in
genes GC content is relatively high and there is a gradient of GC content along the direction of transcription []. In our previous study we investigated GC content evolution in coding regions []. Here we focused on GC content evolution of intronic regions. GC content shows a significant difference between first introns and non-first introns, even in subgroups with different length (Table ). There is also a negative gradient of GC content with intronic ordinal position, which is similar to that seen in coding sequences with transcriptional direction. These results suggest that a mutational mechanism may act on first introns to elevate their GC content. Although we observed a specific pattern of nucleotide substitution in first introns (see next section), in contrast no significant relationship between GC content and divergence (
= 0.993) was observed (Fig. ). We also calculated the relationship between GC content and divergence and intron length in the two datasets (first and non-first introns). Similarly, no significant relationships were detected (data not shown). This result suggests that intron length and divergence are not confounding causes of GC content in rice. In other words, GC content depends on the ordinal position of introns but not on divergence or length. This result is dissimilar to studies on
and mammals [,] in which divergence is negatively correlated with GC content. Mammalian first introns are richer in GC content and higher in divergence than other introns. In rice, first introns are also GC-rich but do not have a significantly higher divergence than other introns.
We used nucleotides from the fastest evolving intronic (FEI) sites as putatively neutral standards to infer constraint. Although exonic four-fold degenerate (4-fold) sites are often used as a standard against which to test for deviations from neutrality, sites in short introns evolve faster in our data set (Table ), so are more appropriate as a neutral standard (Table ). The FEI sites refer to those nucleotides not close to exon boundaries (or intron splice control regions) and outside of first introns. Similar regions have previously been used to define functional constraints in noncoding DNA [].
In general, fractions of nucleotide differences at FEI sites are consistently higher than at 4-fold sites and in first introns. The transition events A↔G and T↔C are expected to be the most common substitutional changes in all categories of sites (Table ). The situation at 4-fold sites has previously been observed in rice coding sequences, where the two changes A↔G and T↔C are predominantly from A/T to G/C and thereby increase GC content []. Besides the transition T↔C, the fractions of transversion C↔G changes are relatively higher than the other four types of nucleotide changes in first introns compared to introns in general.
We analyse selective constraints in a recent segmental duplication that includes 605 paralogous intron pairs that occurred about 7 million years ago in rice. Our observation of stronger selective constraints in long introns suggests that functional elements subject to purifying selection may be concentrated within long introns. Our results are consistent with the presence of strong purifying selection at splicing control sites. In rice, unlike in other species, selective constraints are not significantly stronger in first introns.
within a distance of 100 kb between collinear gene pairs []. A total of 272 pairs of non-transposable element-derived duplicated genes were obtained between chromosomes 11 and 12. A chromosomal alignment between chromosomes 11 and 12 is shown in Additional file and a list of the 272 duplicated gene pairs is provided as Additional file.
Following the methods of Coghlan and Wolfe [], duplicated protein pairs were re-aligned using the T-COFFEE program [], then used as a guide to check the quality of the alignments around the intron splice sites. An unambiguously aligned region was defined as one with at least 5 conserved amino acids and no alignment gaps in the 10 positions on each side of the splice site (20 positions in total) [
]. A homologous intron was identified if the location and phase were identical in the alignment of the two paralogs and if there were no other introns within 5 amino acids of this position on either side. A total of 730 pairs of introns were identified by this approach.
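The flank criterion described above can be sketched as follows. This is an illustrative reading only: it assumes that "at least 5 conserved amino acids" means at least 5 identical residues counted over the whole 20-position window, and that the intron falls between alignment columns `intron_col - 1` and `intron_col`. The function and parameter names are hypothetical, and the intron location, phase, and neighbouring-intron checks are not covered.

```python
def flank_is_unambiguous(prot_a, prot_b, intron_col, flank=10, min_conserved=5):
    """Check the aligned protein flanks around one intron position.

    prot_a, prot_b: two aligned protein sequences of equal length,
                    with '-' marking alignment gaps
    intron_col:     alignment column immediately after the intron
    Returns True only if the 2 * flank window is fully inside the
    alignment, contains no gaps, and has enough identical residues.
    """
    if len(prot_a) != len(prot_b):
        raise ValueError("aligned sequences must have equal length")
    start, end = intron_col - flank, intron_col + flank
    if start < 0 or end > len(prot_a):
        return False  # not enough flanking sequence to judge
    win_a, win_b = prot_a[start:end], prot_b[start:end]
    if "-" in win_a or "-" in win_b:
        return False  # any gap makes the region ambiguous
    conserved = sum(1 for x, y in zip(win_a, win_b) if x == y)
    return conserved >= min_conserved
```

Returning False for introns too close to the alignment ends is a conservative design choice made for this sketch, not a rule stated in the text.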
Intronic DNA sequences were aligned using MCALIGN2 which aligns noncoding DNA sequences based on explicit models of indel evolution []. To infer an appropriate indel frequency model we first aligned the dataset with an indel model for
= 0.081) were estimated from 400 paralogous intron sequences in which nucleotide and indel divergence are sufficiently low as to make the alignments practically unambiguous. In order to minimize the possibility of nonhomologous sites contributing to estimates of divergence, two simple masking protocols were implemented: 1) Regions that contained short aligned blocks surrounded by large gaps (>40 bp) were considered unlikely to be truly homologous and were masked off. A total of 608 pairs identified by this criterion were included for further analysis. 2) A moving window of 40 bp was used to analyse the degree of divergence in each alignment. Pairs containing more than 25 putatively nonparalogous sites in a window were excluded from further analyses. A total of 3 pairs were identified and excluded according to this criterion. Taken together, the final dataset used in this study contained 605 intron pairs. (Sequence alignments of the 605 intron pairs are provided as Additional file ).
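The second masking protocol can be sketched as a sliding-window count. This is a minimal illustration, assuming that a "putatively nonparalogous site" is any alignment column where the two sequences differ, with gap columns counted as differences. The text does not define this precisely, so both assumptions and the function names are hypothetical.

```python
def max_window_differences(aln_a, aln_b, window=40):
    """Largest number of differing columns in any window of the alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    diffs = [1 if a != b else 0 for a, b in zip(aln_a.upper(), aln_b.upper())]
    if len(diffs) <= window:
        return sum(diffs)  # alignment shorter than one window
    current = sum(diffs[:window])
    best = current
    # Slide the window one column at a time, updating the running count.
    for i in range(window, len(diffs)):
        current += diffs[i] - diffs[i - window]
        best = max(best, current)
    return best


def passes_window_filter(aln_a, aln_b, window=40, max_sites=25):
    """Keep a pair unless some window holds more than max_sites differences."""
    return max_window_differences(aln_a, aln_b, window) <= max_sites
```

The running-sum update keeps the scan linear in alignment length, which matters little for introns of a few hundred base pairs but keeps the logic simple.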
Introns were either analyzed as complete sequences or as partial sequences after removal of putative splice control sequences (i.e. excluding the 6 bp and 16 bp at the 5' and 3' ends of the intron, respectively). The exact limits of the control sequence are somewhat arbitrary []. Divergence estimates (
) were generated for each alignment by applying the Jukes-Cantor correction to the number of substitutions per intronic site using the distmat program from the EMBOSS package [].
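For reference, the Jukes-Cantor correction converts the observed proportion of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The sketch below applies that standard formula to a pairwise alignment; it is not the distmat program itself, and the choice to skip columns containing gaps or ambiguous bases is an assumption of this example.

```python
import math


def jukes_cantor_distance(aln_a, aln_b):
    """Jukes-Cantor corrected divergence for two aligned DNA sequences.

    Only columns where both sequences carry A, C, G or T are compared;
    gap and ambiguity columns are skipped (an assumption of this sketch).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    bases = set("ACGT")
    compared = 0
    mismatches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a in bases and b in bases:
            compared += 1
            if a != b:
                mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = mismatches / compared
    if p >= 0.75:
        raise ValueError("divergence is saturated; correction is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The corrected value is always at least as large as the raw proportion p, because the correction accounts for multiple substitutions at the same site.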
In order to estimate selective constraint, a variation of the method of Kondrashov and Crow was employed, as in previous studies [,
]. For each sequence, observed substitution rates were compared to those expected under neutrality. Here we used substitution rates at FEI sites to predict expected numbers (
) of substitutions in adjacent intronic sequences under the assumption that point mutation rates of each possible kind are equal at FEI sites, 4-fold sites and adjacent intronic DNA sites. The FEI sites are defined as sequences in introns, excluding first introns, introns of length > 232 bp, and the 6 bp/16 bp at the 5'/3' end of each intron. FEIs were treated as independent observations in the data sets and were used to estimate six different substitution rate parameters (A↔T, A↔C, A↔G, T↔C, T↔G, C↔G), which were calculated as the rate of substitution expected under neutrality. For each possible substitution type, let
Proportions of nucleotide differences in FEIs, 4-fold sites and intronic sites were each treated as independent observations and were calculated with six different substitution rate parameters (A↔T, A↔C, A↔G, T↔C, T↔G, C↔G). Standard errors and confidence intervals for mean divergence were also calculated by bootstrapping the results by FEIs, 4-fold sites and intronic sites.
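A bootstrap of mean divergence can be sketched as below. This is a generic illustration, assuming each element of `values` is the divergence of one independent unit (for example one FEI), that units are resampled with replacement, and that the interval is a simple percentile interval. The number of replicates, the seed, and the function name are hypothetical choices, not details from the study.

```python
import random


def bootstrap_mean(values, n_boot=1000, seed=0):
    """Bootstrap standard error and 95% percentile interval of the mean."""
    if not values:
        raise ValueError("need at least one observation")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = []
    for _ in range(n_boot):
        # Resample n units with replacement and record the replicate mean.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    grand_mean = sum(means) / n_boot
    variance = sum((m - grand_mean) ** 2 for m in means) / (n_boot - 1)
    std_error = variance ** 0.5
    lower = means[int(0.025 * n_boot)]
    upper = means[int(0.975 * n_boot) - 1]
    return std_error, (lower, upper)
```

Resampling whole units rather than individual sites respects the stated treatment of FEIs, 4-fold sites and intronic sites as independent observations.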
Related article:
http://www.biomedcentral.com/1471-2148/7/208