Draft genome and reference transcriptomic resources for the urticating pine defoliator Thaumetopoea pityocampa

Information
Authors: Gschloessl, B., Dorkeld, F., Berges, H., Beydon, G., Bouchez, O., Branco, M., Bretaudeau, A., Burban, C., Dubois, E., Gauthier, P., Lhuillier, E., Nichols, J., Nidelet, S., Rocha, S., Sauné, L., Streiff, R., Gautier, M. & Kerdelhué, C.
Journal: Molecular Ecology Resources
Journal publication date: 2018
DOI: http://dx.doi.org/10.1111/1755-0998.12756
Abstract

The pine processionary moth Thaumetopoea pityocampa (Lepidoptera: Notodontidae) is the main pine defoliator in the Mediterranean region. Its urticating larvae cause severe human and animal health concerns in the invaded areas. This species shows high phenotypic variability for various traits, such as phenology, fecundity, and tolerance to extreme temperatures. This study presents the construction and analysis of extensive genomic and transcriptomic resources, which are a prerequisite for understanding the genetic architecture underlying these traits. Using a well-studied population from Portugal with peculiar phenological characteristics, the karyotype was first determined and a first draft genome of 537 Mb total length was assembled into 68 292 scaffolds (N50 = 164 kb). From this genome assembly, 29 415 coding genes were predicted. To circumvent some limitations for fine-scale physical mapping of genomic regions of interest, a 3X coverage BAC library was also developed. In particular, 11 BACs from this library were individually sequenced to assess the assembly quality. Additionally, de novo transcriptomic resources were generated from various developmental stages sequenced with Illumina HiSeq and MiSeq technologies. The reads were de novo assembled into 62 376 and 63 175 transcripts, respectively. Then, a robust subset of the genome-predicted coding genes, the de novo transcriptome assemblies, and previously published 454/Sanger data were clustered to obtain a high-quality and comprehensive reference transcriptome consisting of 29 701 bona fide unigenes. These sequences covered 99% of the CEGMA and 88% of the BUSCO highly conserved eukaryotic genes and 84% of the BUSCO arthropod gene set. Moreover, 90% of these transcripts could be localized on the draft genome.
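
For readers unfamiliar with the scaffold N50 quoted above, the following is a minimal sketch of how this statistic is commonly computed: it is the length of the shortest scaffold in the smallest set of longest scaffolds that together span at least half of the assembly. The function name and the toy lengths are illustrative assumptions. They are not taken from the study, and the sketch does not represent the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Illustrative helper (hypothetical name): walks the lengths from
    longest to shortest and returns the first length at which the
    running total reaches at least half of the summed length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy scaffold lengths (made-up values, not data from the paper).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # 70: 80 + 70 = 150 is half of the 300 total
```

Applied to the real assembly, the input would be the lengths of all 68 292 scaffolds, for example as read from a FASTA index.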