@ARTICLE{TreeBASE2Ref19545,
author = {Soowon Cho and Andreas Zwick and Jerome C. Regier and Charles Mitter and Michael P Cummings and Jianxiu Yao and Zaile Du and Hong Zhao and Akito Y Kawahara and Susan J Weller and Donald R Davis and Joaqu{\'i}n Baixeras and John W Brown and Cynthia Parr},
title = {Can Deliberately Incomplete Gene Sample Augmentation Improve a Phylogeny Estimate for the Advanced Moths and Butterflies (Hexapoda: Lepidoptera)?},
year = {2011},
keywords = {Hexapoda, Lepidoptera, Ditrysia, nuclear genes, molecular phylogenetics, gene sampling, taxon sampling, missing data},
doi = {10.1093/sysbio/syr079},
url = {http://dx.doi.org/10.1093/sysbio/syr079},
journal = {Systematic Biology},
volume = {60},
number = {6},
pages = {782--796},
abstract = {This paper addresses the question of whether one can economically improve the robustness of a molecular phylogeny estimate by increasing gene sampling in only a subset of taxa, without having the analysis invalidated by artifacts arising from large blocks of missing data. Our case study stems from an ongoing effort to resolve poorly-understood deeper relationships in the large clade Ditrysia (>150,000 species) of the insect order Lepidoptera (butterflies and moths). Seeking to remedy the overall weak support for deeper divergences in an initial study based on five nuclear genes (6.6 kb) in 123 exemplars, we nearly tripled the total gene sample (to 26 genes, 18.4 kb) but only in a third (41) of the taxa. The resulting partially augmented data matrix (45% intentionally missing data) consistently increased bootstrap support for groupings previously identified in the five-gene (nearly) complete matrix, while introducing no contradictory groupings of the kind that missing data have been predicted to produce. Our results add to growing evidence that data sets differing substantially in gene and taxon sampling can often be safely and profitably combined. The strongest overall support for nodes above the family level came from including all nucleotide changes, while partitioning sites into sets undergoing mostly non-synonymous versus mostly synonymous change. In contrast, support for the deepest node for which any persuasive molecular evidence has yet emerged (78-85% bootstrap) was weak or non-existent unless synonymous change was entirely excluded, a result plausibly attributed to compositional heterogeneity. This node (Gelechioidea + Apoditrysia), tentatively proposed by previous authors on the basis of four morphological synapomorphies, is the first major subset of ditrysian superfamilies to receive strong statistical support in any phylogenetic study. A ``more-genes-only'' data set (41 taxa $\times$ 26 genes) also gave strong signal for a second deep grouping (Macrolepidoptera) that was obscured, but not strongly contradicted, in more taxon-rich analyses.}
}
Citation for Study 11299
Citation title:
"Can Deliberately Incomplete Gene Sample Augmentation Improve a Phylogeny Estimate for the Advanced Moths and Butterflies (Hexapoda: Lepidoptera)?".
Study name:
"Can Deliberately Incomplete Gene Sample Augmentation Improve a Phylogeny Estimate for the Advanced Moths and Butterflies (Hexapoda: Lepidoptera)?".
This study is part of submission 11289
(Status: Published).
Citation
Cho S., Zwick A., Regier J., Mitter C., Cummings M.P., Yao J., Du Z., Zhao H., Kawahara A.Y., Weller S.J., Davis D.R., Baixeras J., Brown J.W., & Parr C. 2011. Can Deliberately Incomplete Gene Sample Augmentation Improve a Phylogeny Estimate for the Advanced Moths and Butterflies (Hexapoda: Lepidoptera)? Systematic Biology, 60(6): 782–796.
Authors
- Cho S.
- Zwick A.
- Regier J.
- Mitter C. (submitter), 301 405 3912
- Cummings M.P.
- Yao J.
- Du Z.
- Zhao H.
- Kawahara A.Y.
- Weller S.J.
- Davis D.R.
- Baixeras J.
- Brown J.W.
- Parr C.
Abstract
This paper addresses the question of whether one can economically improve the robustness of a molecular phylogeny estimate by increasing gene sampling in only a subset of taxa, without having the analysis invalidated by artifacts arising from large blocks of missing data. Our case study stems from an ongoing effort to resolve poorly-understood deeper relationships in the large clade Ditrysia (>150,000 species) of the insect order Lepidoptera (butterflies and moths). Seeking to remedy the overall weak support for deeper divergences in an initial study based on five nuclear genes (6.6 kb) in 123 exemplars, we nearly tripled the total gene sample (to 26 genes, 18.4 kb) but only in a third (41) of the taxa. The resulting partially augmented data matrix (45% intentionally missing data) consistently increased bootstrap support for groupings previously identified in the five-gene (nearly) complete matrix, while introducing no contradictory groupings of the kind that missing data have been predicted to produce. Our results add to growing evidence that data sets differing substantially in gene and taxon sampling can often be safely and profitably combined. The strongest overall support for nodes above the family level came from including all nucleotide changes, while partitioning sites into sets undergoing mostly non-synonymous versus mostly synonymous change. In contrast, support for the deepest node for which any persuasive molecular evidence has yet emerged (78–85% bootstrap) was weak or non-existent unless synonymous change was entirely excluded, a result plausibly attributed to compositional heterogeneity. This node (Gelechioidea + Apoditrysia), tentatively proposed by previous authors on the basis of four morphological synapomorphies, is the first major subset of ditrysian superfamilies to receive strong statistical support in any phylogenetic study. A "more-genes-only" data set (41 taxa × 26 genes) also gave strong signal for a second deep grouping (Macrolepidoptera) that was obscured, but not strongly contradicted, in more taxon-rich analyses.
Keywords
Hexapoda, Lepidoptera, Ditrysia, nuclear genes, molecular phylogenetics, gene sampling, taxon sampling, missing data
About this resource
- Canonical resource URI: http://purl.org/phylo/treebase/phylows/study/TB2:S11299
- Other versions: Nexus, NeXML
- Show RIS reference
TY - JOUR
ID - 19545
AU - Cho,Soowon
AU - Zwick,Andreas
AU - Regier,Jerome C.
AU - Mitter,Charles
AU - Cummings,Michael P
AU - Yao,Jianxiu
AU - Du,Zaile
AU - Zhao,Hong
AU - Kawahara,Akito Y
AU - Weller,Susan J
AU - Davis,Donald R
AU - Baixeras,Joaquín
AU - Brown,John W
AU - Parr,Cynthia
T1 - Can Deliberately Incomplete Gene Sample Augmentation Improve a Phylogeny Estimate for the Advanced Moths and Butterflies (Hexapoda: Lepidoptera)?
PY - 2011
KW - Hexapoda
KW - Lepidoptera
KW - Ditrysia
KW - nuclear genes
KW - molecular phylogenetics
KW - gene sampling
KW - taxon sampling
KW - missing data
UR - http://dx.doi.org/10.1093/sysbio/syr079
N2 - This paper addresses the question of whether one can economically improve the robustness of a molecular phylogeny estimate by increasing gene sampling in only a subset of taxa, without having the analysis invalidated by artifacts arising from large blocks of missing data. Our case study stems from an ongoing effort to resolve poorly-understood deeper relationships in the large clade Ditrysia (>150,000 species) of the insect order Lepidoptera (butterflies and moths). Seeking to remedy the overall weak support for deeper divergences in an initial study based on five nuclear genes (6.6 kb) in 123 exemplars, we nearly tripled the total gene sample (to 26 genes, 18.4 kb) but only in a third (41) of the taxa. The resulting partially augmented data matrix (45% intentionally missing data) consistently increased bootstrap support for groupings previously identified in the five-gene (nearly) complete matrix, while introducing no contradictory groupings of the kind that missing data have been predicted to produce. Our results add to growing evidence that data sets differing substantially in gene and taxon sampling can often be safely and profitably combined. The strongest overall support for nodes above the family level came from including all nucleotide changes, while partitioning sites into sets undergoing mostly non-synonymous versus mostly synonymous change. In contrast, support for the deepest node for which any persuasive molecular evidence has yet emerged (78–85% bootstrap) was weak or non-existent unless synonymous change was entirely excluded, a result plausibly attributed to compositional heterogeneity. This node (Gelechioidea + Apoditrysia), tentatively proposed by previous authors on the basis of four morphological synapomorphies, is the first major subset of ditrysian superfamilies to receive strong statistical support in any phylogenetic study. A "more-genes-only" data set (41 taxa × 26 genes) also gave strong signal for a second deep grouping (Macrolepidoptera) that was obscured, but not strongly contradicted, in more taxon-rich analyses.
L3 - 10.1093/sysbio/syr079
JF - Systematic Biology
VL - 60
IS - 6
ER -