MIC-Phy 2021

The virtual MIC-Phy meeting will be happening here!


Virtual Posters

1. A novel approach combining diffusion approximation and Bayesian skyline plots for inferring demographic histories from SNP data

Ronja Jessica Billenstein [1,2], Sebastian Höhna [1,2]

[1] GeoBio-Center LMU, Ludwig-Maximilians-Universität München, Richard-Wagner-Str. 10, 80333 Munich, Germany, [2] Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig-Maximilians-Universität München, Richard-Wagner-Str. 10, 80333 Munich, Germany

Reconstructing demographic histories from genome sequences is one of the key problems in population genetics. Many analyses use site frequency spectra (SFS) to infer parameters, such as the effective population size and changes in population size, that reflect historical events. SFS are derived from single-nucleotide polymorphism (SNP) data and describe the distribution of allele frequencies in a population sample. A widely used software package for demographic inference from SFS is dadi (Diffusion Approximation for Demographic Inference), which uses a diffusion approximation to calculate the expected frequency spectrum under different demographic models. The model parameters are then optimized by maximizing a composite likelihood that measures how well the model spectrum fits the empirical data. Here, we present an alternative method that combines the diffusion approach, used to generate model spectra, with a Bayesian skyline plot to infer a posterior distribution of the demographic history of a population. To test the method, we simulated SNP data under the coalescent with a mutation process, generating SFS for various demographic scenarios.
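As a minimal illustration of the kind of input these methods work with, the sketch below tallies an unfolded SFS from per-SNP derived-allele counts and computes the textbook expectation under a constant-size neutral coalescent, E[ξ_i] = θ/i. It is not the diffusion/skyline method described above; the function names and the toy counts are hypothetical.

```python
from collections import Counter


def unfolded_sfs(derived_counts, n):
    """Tally an unfolded SFS from per-SNP derived-allele counts.

    Entry i-1 of the returned list holds the number of SNPs whose derived
    allele is carried by exactly i of the n sampled sequences (i = 1..n-1).
    Monomorphic sites (count 0 or n) are ignored.
    """
    tally = Counter(derived_counts)
    return [tally.get(i, 0) for i in range(1, n)]


def expected_neutral_sfs(theta, n):
    """Expected unfolded SFS under the standard neutral coalescent with
    constant population size: E[xi_i] = theta / i for i = 1..n-1."""
    return [theta / i for i in range(1, n)]


# Six SNPs scored in a sample of n = 5 sequences.
observed = unfolded_sfs([1, 1, 2, 3, 1, 4], n=5)   # [3, 1, 1, 1]
expected = expected_neutral_sfs(theta=12.0, n=5)   # [12.0, 6.0, 4.0, 3.0]
```

Departures of an observed spectrum from this constant-size expectation (for example, an excess of rare variants) are the signal that demographic inference methods exploit.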

2. Higher Order Substitution Models - Mixture Models

Killian Smith [1,2], Sebastian Höhna [1,2]

[1] Geo-Bio Center LMU München, [2] Paläontologie & Geobiologie LMU München

Long-branch attraction is a well-known and problematic artifact in phylogenetic reconstruction. One proposed way to mitigate this issue is to relax the assumption that all sites in the alignment share the same base-frequency distribution, and to use models that can account for structural constraints (e.g., CAT, C10-C60, and EDCluster). We have implemented mixture models in RevBayes in a Bayesian framework and performed a simulation study to analyze the costs and benefits of these models. We find that increasing the number of mixture components produces higher log-likelihood values, but this effect approaches a limit. We also find that using an excessive number of components does not change the results of the analysis. Our general recommendation from this study is to use mixture models and to set the number of categories to a generous value (subject to hardware and time constraints).
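To make the mixture idea concrete, here is a minimal sketch, not the RevBayes implementation, of how a profile-mixture likelihood is formed: each site's likelihood is a weighted sum over components that differ only in their equilibrium base frequencies. For brevity it assumes nucleotide data, an F81 substitution model, and just two sequences separated by a single branch of length t; the profiles, weights, sites and function names are hypothetical illustrative choices.

```python
import math


def f81_pair_prob(x, y, pi, t):
    """Joint probability of states x and y at the two ends of a branch of
    length t (expected substitutions per site) under F81 with frequencies pi."""
    beta = 1.0 / (1.0 - sum(p * p for p in pi))  # normalizes the rate matrix
    decay = math.exp(-beta * t)
    p_xy = (decay if x == y else 0.0) + (1.0 - decay) * pi[y]
    return pi[x] * p_xy  # stationary start probability times transition probability


def mixture_site_likelihood(x, y, profiles, weights, t):
    """Site likelihood under a profile mixture: sum_k w_k * L_k(site)."""
    return sum(w * f81_pair_prob(x, y, pi, t) for w, pi in zip(weights, profiles))


def mixture_log_likelihood(site_pairs, profiles, weights, t):
    """Log-likelihood of a two-sequence alignment, summed over sites."""
    return sum(math.log(mixture_site_likelihood(x, y, profiles, weights, t))
               for x, y in site_pairs)


# States are indexed A=0, C=1, G=2, T=3. Two illustrative components:
# an AT-rich profile and a GC-rich profile, with equal weights.
profiles = [[0.4, 0.1, 0.1, 0.4], [0.1, 0.4, 0.4, 0.1]]
weights = [0.5, 0.5]
sites = [(0, 0), (3, 3), (1, 2), (0, 3)]
print(mixture_log_likelihood(sites, profiles, weights, t=0.1))
```

In a real analysis the likelihood is evaluated over a full tree (e.g., with Felsenstein's pruning algorithm), and the profiles and weights are either estimated or taken from precomputed sets.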

3. Phylogenomic analysis reveals a new microsporidian species as the most basal member of the Ordospora clade

de Albuquerque, N. R. M. [1], Pombert, J. [2], Haag, K. L. [1], Ebert, D. [3]

[1] Department of Genetics and Post-Graduation Program of Genetics and Molecular Biology, Federal University of Rio Grande do Sul, Av. Bento Gonçalves 9500, Porto Alegre, RS 91501-970, Brazil., [2] Department of Biology, Illinois Institute of Technology, Chicago, IL 60616, USA., [3] Department of Environmental Sciences, Zoology, Basel University, Vesalgasse 1, 4051 Basel, Switzerland.

Microsporidia are obligate intracellular parasites characterized by streamlined genomes. Their rapid evolutionary rates and physiological simplification made it difficult to position them in the tree of life because of methodological artifacts such as long-branch attraction. Phylogenomics made it possible to show that microsporidia are derived from early-diverging fungi. Encephalitozoon and Ordospora species are closely related, and their genomes are models for genomic reduction and compaction, as the smallest known eukaryotic genome belongs to the genus Encephalitozoon. Here we used SPAdes to assemble the genome of a new microsporidian species, not yet described and provisionally named FI-F-10. The assembled genome is 2.20 Mb, and protein annotation was performed with Prodigal. A preliminary phylogenomic analysis of 24 microsporidian lineages, performed with RAxML with 500 bootstrap replicates for support, placed FI-F-10 in a basal position relative to the Ordospora colligata clade. We are currently performing a phylogenomic analysis of all microsporidian lineages with proteomes available in databases, totaling 43 lineages.

4. Phylogeny and species limits in African grass rats (Muridae: Arvicanthis) using genomic-scale data

V. Bartáková [1], D. Mizerovská [1,2], O. Mikula [1], A. Bryjová [1], J. Bryja [1,2]

[1] Institute of Vertebrate Biology of the Czech Academy of Sciences, Brno, Czech Republic, [2] Department of Botany and Zoology, Faculty of Science, Masaryk University, Brno, Czech Republic

African grass rats of the genus Arvicanthis are an important group of rodents of open habitats in sub-Saharan Africa and are major agricultural pests. The genus is composed of two major groups with different evolutionary histories, but most relationships among species are still not satisfactorily resolved. Until now, the most comprehensive phylogeny was based on mtDNA and six nuclear markers. The nuclear phylogeny did not correspond to the mtDNA phylogeny, which indicates reticulate evolution in the Ethiopian highlands, as in other montane Ethiopian taxa. The main aims of our study are: (1) to reconstruct the evolutionary history using genomic data and resolve the most problematic nodes of the previous study; (2) to identify genomic pools (i.e., species delimitation), with the main focus on Ethiopian taxa, where reticulate evolution is expected; and (3) to resolve the taxonomy of problematic groups using an integrative approach (genomic species delimitation and geometric morphometrics). We have already obtained ddRAD sequencing data for 117 individuals of the genus Arvicanthis, with a total length of 258,170 bp.

5. Impact of the Paleocene–Eocene Thermal Maximum on diversification dynamics in Paederinae rove beetles

Katarzyna Koszela [1]

[1] Museum and Institute of Zoology, Polish Academy of Sciences, Wilcza 64, 00-679 Warsaw, Poland

The Paleocene-Eocene Thermal Maximum (PETM) was the most influential climatic warming event of the Cenozoic, but its impact on insect diversification is poorly known. The main aim of our project is to study the evolutionary response of a mega-diverse insect lineage, the Paederinae rove beetles, to this event. Their rich fossil record after the PETM suggests that warming could have been a driving force in the evolution of this group. To test this, we will build the first large-scale phylogeny of Paederinae using morphological data and genomic data from ultraconserved elements (UCEs), and use it to explore evolutionary patterns. A total-evidence dating approach will be used to jointly estimate phylogenetic relationships and divergence times. Timing lineage turnover along the branches of the tree and testing diversification models will allow us to assess the impact of the PETM on the diversification dynamics of the group and to ask whether the event is one of the main causes of its current mega-diversity.

6. Molecular Evolution of Olfactory Receptors through jawless and jawed fishes

Liliana Silva [1,2], Tito Mendes [1,2], Agostinho Antunes [1,2]

[1] CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, [2] Department of Biology, Faculty of Sciences, University of Porto

Olfactory receptors (ORs) play a major role in detecting odors, which is crucial for intra- and interspecific communication and species survival. The several available OR classification systems are full of ambiguities, mainly because the gene repertoire is highly diverse and expanded across vertebrate species (from a single intact gene in elephant sharks to more than 1,000 genes in some mammalian species). In fishes, the OR family has been characterized in only a few species, and little is known about the gene repertoire of most fishes. Our aim was to explore the diversity of OR genes among fish lineages. For that purpose, we analyzed 37 genomes of jawless and jawed fishes, applying a gene extraction protocol followed by maximum-likelihood (ML) phylogenetic reconstruction. We expect new approaches to phylogenetic inference to be powerful tools for untangling the OR classification system and providing an adequate nomenclature, since the high diversity of the OR gene repertoire across species is a nightmare for traditional phylogenetic inference methods.

7. Assessment of intra- and inter-outbreak diversity of Paenibacillus larvae by core- and whole-genome multilocus sequence typing

Bojan Papić [1], Darja Kušar [1]

[1] University of Ljubljana, Veterinary Faculty, Institute of Microbiology and Parasitology, Gerbičeva 60, 1000 Ljubljana

Paenibacillus larvae is the causative agent of American foulbrood (AFB), a serious disease of honeybees. Here, 59 P. larvae isolates associated with an extensive AFB outbreak in Slovenia in 2019–2020 underwent whole-genome sequencing (Illumina). Of these, 40 isolates originated from three apiaries maintained by the same beekeeper. A newly developed cg/wgMLST scheme for P. larvae, implemented in the BioNumerics software, was used for cluster analysis. By combining genetic and epidemiological data, two ST11-ERIC II outbreak clusters were identified on minimum-spanning trees by applying thresholds of 34 wgMLST allele differences (AD) and 24 cgMLST AD. The clusters were separated by a minimum of 51 cgMLST AD and 63 wgMLST AD. Isolates from a single beekeeping practice could be linked by thresholds of 7 cgMLST AD and 11 wgMLST AD. Phylogenetic trees generated by both approaches were generally concordant and provided sufficient discriminatory power to delineate the outbreak clusters. This study improves our understanding of the intra- and inter-outbreak diversity of P. larvae, which is necessary for improving the epidemiological surveillance of AFB.
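The threshold-based clustering described above can be illustrated with a small sketch. Assuming allele profiles are lists of allele numbers (None for a missing call), it counts pairwise allele differences and groups isolates by single linkage at a given threshold, which yields the same groups as cutting every minimum-spanning-tree edge longer than that threshold. This is not the BioNumerics implementation: skipping missing calls, treating the threshold as inclusive, and the toy profiles are all assumptions.

```python
from itertools import combinations


def allele_differences(p, q):
    """Count loci with different allele calls, skipping loci that are
    missing (None) in either profile; tools differ in this choice."""
    return sum(1 for a, b in zip(p, q) if a is not None and b is not None and a != b)


def threshold_clusters(profiles, threshold):
    """Single-linkage clusters: two isolates are linked when they differ by
    at most `threshold` alleles; clusters are the connected components."""
    names = list(profiles)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if allele_differences(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical 4-locus profiles for four isolates.
profiles = {
    "A": [1, 1, 1, 1],
    "B": [1, 1, 1, 2],
    "C": [2, 2, 2, 2],
    "D": [2, 2, None, 2],
}
print(threshold_clusters(profiles, 1))  # [['A', 'B'], ['C', 'D']]
```

Raising the threshold merges clusters: at 3 allele differences, all four toy isolates fall into a single cluster.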

8. New World primate phylogeny and species boundaries with novel genomic data

Mareike C. Janiak [1], Dorien de Vries [1], Ian B. Goodhead [1], Jean P. Boubli [1], Robin M. D. Beck [1]

[1] School of Science, Engineering and Environment, University of Salford, Salford, UK

New World primates (NWPs) are a classic adaptive radiation of >100 living species, derived from an ancestor that reached South America from Africa ~35 million years ago. They are therefore an ideal system for studying the processes and drivers of speciation. However, genomic data for NWPs have been limited to four whole genomes and a few mitochondrial markers. Thus, key aspects of their phylogeny remain unresolved, and it is unclear exactly how many NWP species there are, which impedes conservation efforts. Using a novel genomic dataset from over 200 individual NWPs, including fully assembled mitochondrial genomes, ~3000 UCE loci, and whole-genome shotgun sequences, together with revised fossil calibrations, we are producing a comprehensive, dated NWP phylogeny that will serve as a framework for macroevolutionary analyses. We are also implementing coalescent-based species delimitation to clarify species boundaries and create a stable list of NWP species.

9. Phylogenetic relationships and evolution of Paederinae rove beetles (Staphylinidae) based on genomic and morphological data

Dagmara Żyła [1,2,3], Alexandra Tokareva [1]

[1] Museum and Institute of Zoology, Polish Academy of Sciences, Wilcza 64, 00-679 Warsaw, Poland, [2] Department of Invertebrate Zoology and Parasitology, University of Gdańsk, 80-308, Gdańsk, Poland, [3] Department of Ecology, Evolution and Organismal Biology, Iowa State University, Ames, IA 50011, USA

The project concerns the phylogeny of Paederinae, one of the largest subfamilies of rove beetles, which includes more than 7,600 species in 225 genera. Despite some recent attempts to shed light on Paederinae phylogeny, most questions about their relationships remain open. At the same time, the few published phylogenetic reconstructions were enough to show how artificial the current classification of the group is. This motivated my PhD advisor to set out to build the first genus-level phylogeny of the group using both genomic and morphological data. Genomic data will be generated using next-generation sequencing (NGS) targeting ultraconserved elements (UCEs), while the existing morphological matrix will be expanded with more taxa and characters, including fossils. We will then perform a Bayesian total-evidence analysis with divergence-time estimation, as well as a maximum-likelihood analysis, to build the phylogeny of the group.

10. Reconstructing the evolutionary history of a rapid species radiation: the Andean Lupins

Bruno Nevado [1], Colin Hughes [2]

[1] CE3C – Faculty of Sciences, University of Lisbon, Portugal; [2] Institute of Systematic Botany, University of Zurich, Switzerland.

With over 80 species described and an estimated age of 2 Myr, Andean Lupinus are among the fastest species radiations found in plants. To better understand the evolutionary processes driving this rapid diversification, we are reconstructing the evolutionary history of this clade using genome-wide data obtained with nextRAD sequencing. So far, we have collected and sequenced around 400 specimens representing 75-80 Andean species. We have also assembled a draft genome of one species in this clade, which will allow us to compare reference-free and reference-based assembly of RAD loci for phylogenetic reconstruction. Reconstructing the evolutionary history of this clade will provide insight into both temporal and geographical patterns of diversification during this rapid radiation and will allow us to test further hypotheses about the role of demography during rapid diversification.

11. Phylogenomic analyses of large-shelled cones from Cabo Verde, West Africa

Rocha S [1,2], Pérez-Figueroa A [1,2], Lemmon AR [4], Lemmon EM [5], Tenorio MJ [6], Afonso CM [7], Zardoya R [8], Posada D [1,2,3]

[1] CINBIO, Universidade de Vigo, 36310 Vigo, España, [2] Instituto de Investigación Sanitária Galicia Sur, Hospital Álvaro Cunqueiro, 36213 Vigo, España, [3] Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, España, [4] Department of Scientific Computing, Florida State University, Tallahassee, Florida, [5] Department of Biological Sciences, Florida State University, Tallahassee, Florida, [6] Departamento CMIM y Química Inorgánica – Instituto de Biomoléculas (INBIO), Facultad de Ciencias, Torre Norte, 1ª Planta, Universidad de Cadiz, 11510 Puerto Real, Cadiz, Spain, [7] CCMAR, Centre of Marine Sciences, Universidade do Algarve, Campus de Gambelas, 8005-139 Faro, Portugal, [8] MNCN-CSIC, Museo Nacional de Ciencias Naturales, José Gutiérrez Abascal 2, 28006, Madrid, Spain.

Conidae is one of the most speciose families of marine animals, and many of its subgroups have a convoluted taxonomic history and little or no molecular data available. We focus on the Cabo Verde archipelago clade of “large-shelled” cones, for which we produced target capture data. We sampled all known geographic and morphological diversity, and individuals were genotyped for 1 mtDNA and 1293 nuclear loci. Species-tree methods that account for incomplete lineage sorting, such as ASTRAL and revPoMo, were applied to populations defined by SNP clustering. These yielded strongly contradictory inferences regarding basal relationships: ASTRAL recovers an (A, (B, C)) topology with several sets of gene trees (the full 1293-locus set and several subsets based on informative sites), whereas revPoMo, applied to all positions of the 1293 loci, infers a discordant ((A, B), C) topology. Interestingly, when applied only to the subset of variable positions (~17K of ~1M total sites), revPoMo infers the same topology as ASTRAL. In all cases the inferences are strongly supported. I will present the details of the analyses and discuss possible reasons for these contradictory inferences.

12. Patterns of genomic divergence and phylogenomic analyses in a young marine radiation

Martin Helmkampf [1], Kosmas Hench [1,2], Oscar Puebla [1,3]

[1] Leibniz Centre for Tropical Marine Research (ZMT), Bremen, Germany, [2] Max Planck Institute of Animal Behavior, Radolfzell, Germany, [3] Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany

Adaptive radiations are a major source of biodiversity, yet few have been studied in the sea, where barriers to gene flow are less pronounced than on land. Hamlets, a group of Caribbean reef fish that have recently radiated into a brilliant diversity of color patterns, provide an excellent opportunity to investigate a marine radiation from a genomics perspective. Generating and analyzing 170 whole-genome sequences of 13 hamlet species, we found that genetic divergence between species was generally low. In contrast, several narrow genomic intervals harboring genes involved in pigmentation and color vision showed differentiation peaks, underscoring that speciation in hamlets is driven by natural and sexual selection on color patterns. We also found evidence of hybridization, and thus ongoing gene flow, between several species. Preliminary phylogenetic analyses suggest that some species may not be monophyletic and that some phenotypes may have evolved convergently. However, the combination of low phylogenetic signal, hybridization, and presumably incomplete lineage sorting severely limited the reliability of the reconstruction. Additional analyses are therefore needed to obtain a more complete picture of the hamlet radiation.

13. Congruent birth-death models can be collapsed using Bayesian inference

Bjørn Tore Kopperud [1], Sebastian Höhna [1,2]

[1] GeoBio-Center Ludwig-Maximilians-Universität München, 80333 Munich, Germany

The discovery of congruent sets of birth-death processes has raised a series of questions about which diversification patterns we can infer from phylogenetic trees. Many phylogenetic trees are estimated solely from extant samples, without direct information about extinction. For such trees, there exists a class of models that are all equally likely and thus statistically unidentifiable. However, the general behaviour of such classes is not well known. We simulate a series of phylogenetic trees, both within and across congruence classes, and explore their properties using state-of-the-art Bayesian inference methods. Our results show that diversification-rate inference with Bayesian shrinkage priors does not return an arbitrary model from the congruence class. Instead, shrinkage priors collapse the congruence class to a single, simplest model in accordance with the prior expectations. Thus, diversification rates can be inferred from molecular phylogenies when realistic priors are used.

14. Phylogenomic analysis of New Zealand polyploid Azorella (Apiaceae)

Weixuan Ning [1], Heidi Meudt [2], Jennifer Tate [1]

[1] School of Fundamental Sciences, Massey University, [2] Museum of New Zealand Te Papa Tongarewa, New Zealand

When high-copy nuclear markers such as ITS (internal transcribed spacer, nuclear ribosomal DNA, nrDNA) or plastid regions are used, phylogenetic inference of closely related plant species with reticulate evolutionary histories is challenging. This is because ITS can be subject to concerted evolution and commonly used plastid markers have few informative sites. Newer approaches use target enrichment sequencing of low/single-copy nuclear genes (LCNG) with the universal Angiosperm353 bait set, which provides a way to capture all copies of the desired genes at low cost. This approach can be particularly useful for resolving the phylogenetic relationships of genera that contain many closely related polyploids. Many New Zealand genera have a complex evolutionary history involving polyploidy. For example, the current sections Schizeilema and Stilbocarpa of the genus Azorella comprise a subalpine lineage of 17 species in New Zealand (16 species) and Australia (1 species) whose ploidy levels may be 4x, 6x or 10x. Unpublished ITS and plastid phylogenetic trees offer low resolution of New Zealand Azorella species relationships and suggest that these species originated from diploid Azorella species in Chile and Argentina. New phylogenomic data are being generated from low/single-copy nuclear genes using the Angiosperm353 bait set. By analyzing data from the Angiosperm353 markers, we aim to 1) set up pipelines to extract the multiple copies of the targeted genes, which is the critical step yet to be solved; 2) interpret the origins of the polyploid species from the extracted homoeologous sequences; and 3) understand the biogeographic history of Azorella sections Schizeilema and Stilbocarpa.

15. A comparison of the utility of Target Enrichment (TE) and Double-Digest Restriction Site Associated DNA (ddRAD) sequencing in systematics – a case study of a parapatric species pair of butterflies

Mukta Joshi [1], Marianne Espeland [2], Vlad Dinca [1], Roger Vila [3], Mohadeseh Tahami [4], Kyung Min Lee [5], Marko Mutanen [1]

[1] Ecology and Genetics research Unit, University of Oulu, Finland, [2] Zoological Research Museum Alexander Koenig, Leibniz Institute for Animal Biodiversity, Adenauer Allee 160, 53113 Bonn, Germany, [3] Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Passeig Marítim de la Barceloneta, 37, 08003 Barcelona, Spain, [4] Department of Biological and Environmental Sciences, University of Jyväskylä, Finland, [5] Finnish Museum of Natural History, University of Helsinki, Helsinki, Finland

Target enrichment (TE) and Double-Digest Restriction Site Associated DNA (ddRAD) sequencing are two widely used reduced-representation approaches for recovering large amounts of genomic data. While RAD sequencing targets random loci throughout the genome, in target enrichment the loci to be obtained are fixed a priori. TE was originally developed for resolving deeper phylogenetic relationships, mainly by targeting exons, but is now also applied at shallow scales. Many studies have investigated systematic, phylogeographic and population genetic questions using one of these two methods, but to our knowledge none has compared both methods in the same study system. In the present study, we explore species delimitation problems and genomic admixture in the parapatric sibling species pair Melitaea athalia and Melitaea celadussa. The ranges of these two species overlap narrowly in a part of the Alps where specimens with intermediate genitalia morphology have been detected. Initial RADseq data indicated the presence of a completely new lineage, the Balkan athalia. We then compared a subset of these data with newly obtained target enrichment data, comparing the phylogenetic trees and STRUCTURE results from both datasets. The phylogenetic analyses recovered nearly identical tree topologies. In both cases, the admixed individuals from the hybrid zone formed part of a monophyletic celadussa clade. These results were further validated by species tree inference using ASTRAL. STRUCTURE analysis of SNPs extracted from the TE data identified two clusters as the most likely number, the same as estimated from the ddRAD data. Further species delimitation analyses using TE data are ongoing, and further insights are expected.
This kind of comparison could have implications for future studies addressing species delimitation in taxonomically difficult cases and could aid the choice of genomic method. We would like to highlight that the fixed set of loci used in the TE approach, together with its low levels of missing data, could help standardize the principles of species delimitation.

16. Here, there and everywhere: the ubiquity of hybridization in an orchid group

Cecilia F. Fiorini [1,2], Eric de Camargo Smidt [3], L. Lacey Knowles [4], Eduardo Leite Borba [2]

[1] Departamento de Genética, Ecologia e Evolução, Universidade Federal de Minas Gerais, [2] Departamento de Botânica, Universidade Federal de Minas Gerais, [3] Departamento de Botânica, Universidade Federal do Paraná, [4] Department of Ecology and Evolutionary Biology, University of Michigan

Genetic data show that cryptic hybrids are more common than previously thought and that hybridization and introgression are widespread processes in nature. Although common in plants, hybridization is not universal, and its occurrence shows strong phylogenetic signal. Orchidaceae is a group with a high propensity for hybridization, and several artificial orchid hybrids are known. Despite this, hybridization has not been considered one of the main drivers of diversification in this plant family. Bulbophyllum is one of the largest Orchidaceae genera, including 2,200 species, and presents many examples of recent radiations, in which hybridization is theoretically more frequent. However, only three natural Bulbophyllum hybrids are currently recognized, all of them recently described based on morphological evidence. Both B. ×cipoense and B. ×guartelae are hybrids between species of B. sect. Didactyle, and here we investigate the occurrence of hybridization in this section, leveraging next-generation sequence data and model-based analyses. We found that five of the seven species of the Neotropical B. sect. Didactyle are involved in hybridization. Despite the occurrence of hybridization, there are no signs of backcrossing. Given the high propensity for hybridization across many taxa and its common occurrence during the evolutionary history of Bulbophyllum, it is time to account for and examine the evolutionary role of hybridization in orchids.

17. Phylogenomic GWAS approach to study adaptive divergence in bottlenose dolphin (Tursiops spp.) ecotypes

M. Harazim [1,2,3], M. Dromby [1], A. E. Moura [1]

[1] Museum and Institute of Zoology, Polish Academy of Sciences, Gdansk, Poland, [2] Institute of Vertebrate Biology, Czech Academy of Sciences, Brno, Czechia, [3] Faculty of Science, Masaryk University, Brno, Czechia

Integrating next-generation sequencing with the Genome-Wide Association Study (GWAS) approach is an efficient strategy to detect candidate genes undergoing adaptive evolution, particularly when they are correlated with environmental variables of interest. Here, we present a phylogenomic-aware GWAS analysis in a cetacean genus (Tursiops spp.) to identify potential environmental drivers of adaptive changes between various ecotypes worldwide. Double-digest restriction site associated DNA sequencing (ddRAD) was used to genotype 6,615 biallelic SNPs in 54 individuals belonging to offshore and coastal ecotypes, sampled from regions with different environmental conditions. Local environments were characterized using 14 physico-chemical variables obtained from remote sensing, with annual averages and ranges for each variable calculated in a GIS. Using the phylogenomic-aware GWAS, we identified 12 SNP loci significantly associated with environmental conditions, namely temperature and dissolved O2. Annotation of the reference genome in the vicinity of these SNPs identified the genes MTCL1, TRAF4, TLCD1, RAB34, NAV2, CHCHD3 and AVPR1A. Functions associated with these genes include cerebral development, social behavior and energy metabolism, and these genes could be involved in adaptive evolution between coastal and offshore populations. These results demonstrate the potential of phylogenomic GWAS to identify genetic variants underlying trait variation associated with adaptive divergence between ecologically disparate lineages, at both the population and species level.

18. CloudForest: An integrated and dynamic phylogenomic toolset for the modern age

Benjamin S. Toups [1], Jeremy M. Brown [1], Kyle A. Gallivan [2], Zhifeng Deng [2], Reid Wagner [3], Thomas McGowan [3], James C. Wilgenbusch [3]

[1] Louisiana State University, [2] Florida State University, [3] University of Minnesota

Advances in sequencing technologies have allowed us to collect larger datasets, which in turn allow us to address phylogenomic questions with greater statistical power. One of the most striking observations from this deluge of data is that variation across inferred gene trees is the rule rather than the exception. Gene tree variation remains poorly understood, largely because robust, efficient, and flexible workflows to investigate it are lacking. To address this, we are developing a portable cyberinfrastructure for phylogenomic analysis called CloudForest, which aims to provide researchers with a set of streamlined tools to analyze large tree sets from phylogenomic studies. Using these tools and customizable workflows, researchers can perform some of the most pressing tasks in phylogenomics, such as visualizing variation across gene trees, revealing structure in sets of trees, and detecting outlier genes, among others. CloudForest is a dockerized instance of the Galaxy framework configured with all the tools needed to execute phylogenomic analyses, together with many intuitive visualization schemes for exploring, in a centralized way, results from cutting-edge analyses such as nonlinear dimensionality reduction (NLDR), bipartition-covariance analysis, and community detection. This containerized approach makes the system highly portable, so it can be run on a wide range of computing platforms, from personal computers to HPC resources. By allowing researchers to address some of the most prominent questions in modern phylogenomics in a dynamic and customizable way, CloudForest will revolutionize the way we analyze sets of trees of all sizes.
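Summarizing variation across gene trees typically starts from each tree's bipartitions (splits). As an illustration only, and not code from CloudForest, the sketch below extracts the nontrivial splits of unrooted trees written as nested Python tuples (a simplifying assumption standing in for Newick parsing) and computes the Robinson-Foulds distance, a standard measure of topological disagreement between two trees. The example gene trees are hypothetical.

```python
def bipartitions(tree):
    """Nontrivial splits of an unrooted tree written as nested tuples of leaf
    names (the nesting is an arbitrary rooting). Each split is stored as the
    side that excludes a fixed reference leaf, so equal splits compare equal."""

    def collect(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(collect(child) for child in node))

    all_leaves = collect(tree)
    ref = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = clade if ref not in clade else all_leaves - clade
        if 1 < len(side) < len(all_leaves) - 1:  # skip trivial splits
            splits.add(side)
        return clade

    visit(tree)
    return splits


def robinson_foulds(t1, t2):
    """Number of splits found in exactly one of the two trees."""
    return len(bipartitions(t1) ^ bipartitions(t2))


gene_tree_1 = (("A", "B"), ("C", "D"), "E")
gene_tree_2 = (("A", "C"), ("B", "D"), "E")
gene_tree_3 = (("A", "B"), "C", ("D", "E"))
print(robinson_foulds(gene_tree_1, gene_tree_2))  # 4
print(robinson_foulds(gene_tree_1, gene_tree_3))  # 2
```

Because splits are canonicalized against a reference leaf, the same unrooted tree written with a different nesting has distance 0 to itself. A pairwise matrix of such distances is one common input for the kind of tree-set visualizations mentioned above.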

19. Using nearest neighbours to combine distance and tree-search based tree reconstruction methods

Florian Pflug [1], Alina Leuchtenberger [1], Simon Haendeler [1], Veronika Bošková [1], Arndt von Haeseler [1]

[1] Center for Integrative Bioinformatics Vienna (CIBIV), Joint Institute of the University of Vienna and Medical University of Vienna, Max Perutz Labs, A-1030 Vienna, Austria.

Methods to reconstruct the phylogenetic tree from a multiple sequence alignment (MSA) fall into two broad categories. Distance-based methods use the MSA to infer a matrix of evolutionary distances between pairs of taxa, and then rely on these distances to reconstruct the tree. Tree-search methods on the other hand search the space of all possible trees for the “best” tree under some scoring paradigm, typically either maximum-likelihood or maximum-parsimony. Tree-search methods usually outperform distance-based methods when it comes to accuracy of the reconstructed tree, but are computationally much more costly.
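As a concrete picture of the first step of a distance-based method, the sketch below turns a small alignment into a matrix of pairwise evolutionary distances, using p-distances with the Jukes-Cantor (JC69) correction d = -3/4 ln(1 - 4p/3). The choice of JC69, the gap handling and the toy alignment are illustrative assumptions; a tree would then be built from such a matrix by, for example, neighbour joining.

```python
import math
from itertools import combinations


def p_distance(s1, s2):
    """Proportion of differing sites, skipping columns where either
    sequence has a gap or an ambiguous base."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(p):
    """Jukes-Cantor (JC69) correction of a p-distance."""
    if p >= 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(msa):
    """Pairwise JC69 distances for an alignment given as {name: sequence}."""
    d = {name: {name: 0.0} for name in msa}
    for a, b in combinations(msa, 2):
        d[a][b] = d[b][a] = jc69_distance(p_distance(msa[a], msa[b]))
    return d


msa = {
    "t1": "ACGTACGTAC",
    "t2": "ACGTACGTTC",
    "t3": "ACGAACGTTC",
    "t4": "TCGAACCTTG",
}
dist = distance_matrix(msa)
```

The correction matters because the raw proportion of differences saturates: multiple substitutions at the same site are invisible in p but accounted for (under the model's assumptions) in the corrected distance.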

We present a mathematical framework for combining these two approaches, with the goal of reducing the computational costs of tree-search based methods without sacrificing their accuracy. We consider pairs of taxa (a, b) for which the distance between a and b is smaller than the distance between a and any other taxon, and show how such nearest-neighbour pairs constrain the set of possible tree topologies. We then present an algorithm to reliably infer nearest neighbours from empirically observed distances and discuss how the resulting topological constraints could be integrated into a tree-search procedure. To bound the possible performance benefits, we derive a formula for the size of the restricted search space and show that the achievable size reductions are considerable.
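The nearest-neighbour definition above is easy to state in code. The sketch below simply applies it to a given distance matrix (a dict of dicts, an assumed format); it is not the authors' algorithm for inferring nearest neighbours reliably from noisy empirical distances, and their formula for the size of the restricted search space is not reproduced. For scale, it also computes the size of the unrestricted search space, the (2n-5)!! unrooted binary topologies on n labelled taxa, which such constraints aim to shrink. The toy matrix is hypothetical.

```python
def nearest_neighbour_pairs(d):
    """Ordered pairs (a, b) where d[a][b] is strictly smaller than d[a][c]
    for every other taxon c (d is a symmetric dict of dicts)."""
    pairs = []
    for a in d:
        others = [c for c in d if c != a]
        b = min(others, key=lambda c: d[a][c])
        if all(d[a][b] < d[a][c] for c in others if c != b):
            pairs.append((a, b))
    return pairs


def num_unrooted_topologies(n):
    """Number of unrooted binary topologies on n >= 3 labelled taxa, (2n-5)!!."""
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


d = {
    "A": {"A": 0, "B": 1, "C": 3, "D": 4},
    "B": {"A": 1, "B": 0, "C": 3, "D": 4},
    "C": {"A": 3, "B": 3, "C": 0, "D": 2},
    "D": {"A": 4, "B": 4, "C": 2, "D": 0},
}
print(nearest_neighbour_pairs(d))   # [('A', 'B'), ('B', 'A'), ('C', 'D'), ('D', 'C')]
print(num_unrooted_topologies(10))  # 2027025
```

Ties are deliberately excluded: if a taxon is equally close to two others, it has no nearest neighbour under the strict definition and contributes no pair.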