GENERAL INFORMATION

Title
Replication data for: Association genetics and genomic prediction for resistance to root rot in a diverse collection of Pisum sativum L.

Data description
This repository contains genotypic and phenotypic data from a collection of 254 genotypes of pea (Pisum sativum L.). These data were used to identify genomic regions in the pea genome associated with root rot resistance.

Principal investigators
- Daniel Ariza-Suarez (daniel.arizasuarez@usys.ethz.ch) (1)
- Bruno Studer (bruno.studer@usys.ethz.ch) (1)
- Valentin Gfeller (valentin.gfeller@fibl.org) (2)
- Monika M. Messmer (monika.messmer@fibl.org) (2)

(1) Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Zurich, Switzerland.
(2) Department of Crop Sciences, Research Institute of Organic Agriculture (FiBL), Frick, Switzerland.

Data collection
The genotypic matrices included in this repository were obtained in March and October 2022 from sequencing data generated in January 2021. The phenotypic data were originally reported by Wille et al. (2020; https://doi.org/10.3389/fpls.2020.542153) and are reproduced here for reference.

DATA AND FILE OVERVIEW
- Cameor_anntd_q30_dp3_s_bial_maf02_oh05_n55.vcf.gz: Genotypic matrix in Variant Call Format (VCF) with single nucleotide polymorphism (SNP) data using the pea reference genome of cultivar 'Caméor' (https://doi.org/10.1038/s41588-019-0480-1). This matrix is annotated using snpEff (v5.0e; https://doi.org/10.4161/fly.19695) and filtered to retain biallelic SNPs, genotype calls supported by more than 3 reads, minor allele frequency greater than 0.02, maximum heterozygosity rate per SNP of 0.05, and at least 55 accessions genotyped per SNP. This file is in plain text with gzip compression. This file was generated in March 2022.
- Cameor_anntd_q30_dp3_s_bial_maf02_oh05_n55.gds: Same genotypic matrix as above, converted to Genomic Data Structure (GDS) format using the 'seqVCF2GDS()' function of the R package SeqArray (v1.42.4). This file was generated in March 2022.
- ZW6_anntd_q30_dp3_s_bial_maf02_oh05_n55.vcf.gz: Genotypic matrix in Variant Call Format (VCF) with SNP data using the pea reference genome of cultivar 'Zhongwan 6' (https://doi.org/10.1038/s41588-022-01172-2). This matrix is annotated using snpEff (v5.0e; https://doi.org/10.4161/fly.19695) and filtered to retain biallelic SNPs, genotype calls supported by more than 3 reads, minor allele frequency greater than 0.02, maximum heterozygosity rate per SNP of 0.05, and at least 55 accessions genotyped per SNP. This file is in plain text with gzip compression. This file was generated in October 2022.
- ZW6_anntd_q30_dp3_s_bial_maf02_oh05_n55.gds: Same genotypic matrix as above, converted to Genomic Data Structure (GDS) format using the 'seqVCF2GDS()' function of the R package SeqArray (v1.42.4). This file was generated in October 2022.
- Phenotypic_data.csv: List of plant material and phenotypic data used to identify genomic regions associated with root rot resistance. This table is in plain text CSV format.

SHARING AND ACCESS INFORMATION
This work is licensed under a Creative Commons Attribution 4.0 International license.

METHODOLOGICAL INFORMATION
These data were derived from plant material reported by Wille et al. (2020; https://doi.org/10.3389/fpls.2020.542153). Briefly, a panel of 254 pea genotypes was assembled and tested for root rot resistance. The panel contained full-leaf and semi-leafless genotypes comprising 177 genebank accessions from the USDA-ARS GRIN Pea Core Collection, 47 advanced breeding lines from a private organic breeding organization (Getreidezüchtung Peter Kunz, Switzerland) and 34 registered cultivars from Europe. The population was genotyped using genotyping-by-sequencing (GBS) following the protocol proposed by Poland et al.
(2012; https://doi.org/10.1371/journal.pone.0032253) using a combination of PstI and MspI as restriction enzymes. GBS libraries were prepared at the plateforme d’analyses génomiques of the Institut de Biologie Intégrative et des Systèmes (IBIS, Université Laval, Québec, Canada) with the following modifications: a BluePippin (Sage Scientific, Beverly, MA) was used to size the libraries before PCR amplification (elution set between 50 and 65 min, on a 2% gel). Libraries were normalized, pooled, and then denatured in 0.02 N NaOH and neutralized using HT1 buffer.

Plate barcoding was used to enable sequencing on a shared Illumina NovaSeq S4 lane. Sequencing was performed at the Centre d’expertise et de services Genome Québec in Canada. The pool was loaded at 225 pM on an Illumina NovaSeq S4 lane using the Xp protocol according to the manufacturer’s recommendations. The run was performed for 2x150 cycles (paired-end mode). A phiX library was used as a control and mixed with the libraries at a 1% level. Base calling was performed using RTA software (v3). The bcl2fastq2 software (v2.20) was then used to demultiplex samples and generate FASTQ reads.

Sequence demultiplexing was performed with Stacks (v2.60; https://doi.org/10.1111/mec.12354), allowing up to one mismatch in the adapter sequence. Adapter tails were clipped with HTStream (v1.3.3; https://github.com/s4hts/HTStream). Using Bowtie 2 (v2.4.4; https://doi.org/10.1038/nmeth.1923), the processed reads were mapped to the reference genomes of P. sativum cv. ‘Caméor’ (https://doi.org/10.1038/s41588-019-0480-1) and ‘Zhongwan 6’ (https://doi.org/10.1038/s41588-022-01172-2). The mapped reads were used for single nucleotide polymorphism (SNP) calling using NGSEP (v4.1.0; https://doi.org/10.1111/1755-0998.13737). The variant call format (VCF) matrix was filtered for genotype calls with a quality score above 30, minor allele frequency above 0.02, and a maximum observed heterozygosity rate of 0.05 per SNP marker.
Finally, variants genotyped in fewer than 22% of the samples were removed to reduce the proportion of missing data in the genotypic matrix to approximately 30%. The predicted effect of these sequence variants on the gene models of the reference genomes was annotated with snpEff (v5.0e; https://doi.org/10.4161/fly.19695).

DATA-SPECIFIC INFORMATION
The 'Phenotypic_data.csv' file encodes missing data as 'NA' values and contains the following columns:
- Genotype_ID: Unique genotype identifier. This is the same as the genotype identifier used in the VCF and GDS files.
- Alternative_ID: Alternative identifier. This includes the accession, cultivar or breeding line identifier.
- BioSample: NCBI Sequence Read Archive (SRA) BioSample identifier for the raw sequencing data of each genotype.
- SRA_run: NCBI Sequence Read Archive (SRA) run identifier for the raw sequencing data of each genotype.
- Leaf_type: Leaf morphology of the pea accession. It can be full leaf type (0) or semi-leafless (1).
- Plant_height_NS: Plant height in centimetres of plants grown under non-sterile soil conditions.
- Plant_height_S: Plant height in centimetres of plants grown under sterile soil conditions.
- SDW_NS: Shoot dry weight in grams of plants grown under non-sterile soil conditions.
- SDW_S: Shoot dry weight in grams of plants grown under sterile soil conditions.
- Emergence_NS: Emergence rate of seedlings 14 days after sowing under non-sterile soil conditions.
- SDW_NS/S: Ratio of shoot dry weight between non-sterile and sterile conditions.
- RRI_NS: Root rot severity index. It takes values between 1 (no disease symptoms) and 6 (complete disintegration of the root system).
- RDW_SDW_NS: Ratio of root to shoot dry weight of plants grown under non-sterile soil conditions.
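
EXAMPLE: READING THE PHENOTYPIC TABLE
The following is a minimal sketch, not part of the original analysis, of how the column layout described above could be read in Python using only the standard library. It assumes the file is comma-delimited with a header row matching the column names listed in this README. The helper names (parse_value, read_phenotypes, sdw_ratio) are hypothetical, and the two rows in EXAMPLE_CSV are made-up placeholders, not values from the dataset.

```python
import csv
import io

# Numeric columns described in this README; the other columns are identifiers.
NUMERIC_COLUMNS = [
    "Leaf_type", "Plant_height_NS", "Plant_height_S", "SDW_NS", "SDW_S",
    "Emergence_NS", "SDW_NS/S", "RRI_NS", "RDW_SDW_NS",
]


def parse_value(text):
    """Convert a CSV cell to float; 'NA', empty or absent cells become None."""
    if text is None or text.strip() in ("NA", ""):
        return None
    return float(text)


def read_phenotypes(handle):
    """Read phenotype rows from an open text handle into a list of dicts."""
    rows = []
    for record in csv.DictReader(handle):
        for column in NUMERIC_COLUMNS:
            if column in record:
                record[column] = parse_value(record[column])
        rows.append(record)
    return rows


def sdw_ratio(row):
    """Recompute SDW_NS / SDW_S; None if a value is missing or SDW_S is zero."""
    ns, s = row.get("SDW_NS"), row.get("SDW_S")
    if ns is None or s is None or s == 0:
        return None
    return ns / s


# Made-up rows (NOT real data) using only a subset of the documented columns.
EXAMPLE_CSV = """Genotype_ID,Leaf_type,SDW_NS,SDW_S,RRI_NS
G001,0,0.50,1.00,3.5
G002,1,NA,0.80,NA
"""

example_rows = read_phenotypes(io.StringIO(EXAMPLE_CSV))
example_ratios = [sdw_ratio(row) for row in example_rows]
print(example_ratios)
```

To apply the same functions to the real table, the file could be opened with open("Phenotypic_data.csv", newline="") and the handle passed to read_phenotypes(). Mapping 'NA' to None keeps missing values explicit, so a ratio is never computed from an absent measurement.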