[Objective] To identify and correct the overestimation of contamination in metagenome-assembled genomes (MAGs) caused by broken marker genes. [Methods] We first analyzed the impact of broken genes on genome quality assessment using simulated genomes generated by randomly fragmenting the complete genomes of isolates. We then designed a correction pipeline that identifies pairs of broken genes derived from the same "source" gene according to their taxonomic annotation against the nr database, and corrects genome contamination by removing the resulting redundant marker genes. [Results] Genome contamination was positively correlated with the degree of genome fragmentation in both the simulated genomes and the MAGs obtained by genome binning. On the simulated genomes, the correction pipeline brought contamination back to the level of the complete genome. When tested on 760 MAGs with contamination from gut and soil samples, the pipeline reduced contamination for nearly half of the MAGs, with 43 of them dropping to 0. [Conclusion] Our pipeline can, to some extent, correct the overestimation of genome contamination caused by broken genes and improve the usability of MAGs. We expect it to be applicable to the quality assessment of the growing number of MAGs.
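The correction idea described in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the `MarkerHit` record, the `same_source` rule (both fragments partial, same top nr hit, near-disjoint aligned ranges on that hit), the `max_overlap` threshold, and the simplified per-marker contamination formula are all assumptions introduced here for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerHit:
    """One predicted marker gene in a MAG (hypothetical record layout)."""
    marker: str     # marker gene family
    gene_id: str    # identifier of the predicted gene
    partial: bool   # True if the gene is truncated at a contig edge
    best_hit: str   # accession of the top nr hit (assumed to be available)
    ref_start: int  # aligned range on the reference protein, 1-based inclusive
    ref_end: int


def same_source(a, b, max_overlap=10):
    """Assumed rule: two partial genes hitting the same reference protein
    with (nearly) non-overlapping aligned ranges are fragments of one gene."""
    if not (a.partial and b.partial) or a.best_hit != b.best_hit:
        return False
    overlap = min(a.ref_end, b.ref_end) - max(a.ref_start, b.ref_start) + 1
    return overlap <= max_overlap


def count_copies(hits, merge_broken):
    """Count copies per marker, optionally merging broken fragments."""
    by_marker = defaultdict(list)
    for hit in hits:
        by_marker[hit.marker].append(hit)
    copies = {}
    for marker, group in by_marker.items():
        clusters = []  # each cluster is treated as one "source" gene
        for hit in group:
            for cluster in clusters:
                if merge_broken and all(same_source(hit, m) for m in cluster):
                    cluster.append(hit)
                    break
            else:
                clusters.append([hit])
        copies[marker] = len(clusters)
    return copies


def contamination(copies, n_markers):
    """Simplified contamination: extra marker copies as a percentage of the
    number of expected markers (an assumption, not a marker-set model)."""
    extra = sum(c - 1 for c in copies.values() if c > 1)
    return 100.0 * extra / n_markers


hits = [
    MarkerHit("A", "g1", True, "REF_1", 1, 120),    # fragment of one gene
    MarkerHit("A", "g2", True, "REF_1", 121, 300),  # other fragment
    MarkerHit("B", "g3", False, "REF_2", 1, 250),   # genuine duplicate
    MarkerHit("B", "g4", False, "REF_3", 1, 250),
    MarkerHit("C", "g5", False, "REF_4", 1, 180),
]
raw = contamination(count_copies(hits, merge_broken=False), n_markers=4)
corrected = contamination(count_copies(hits, merge_broken=True), n_markers=4)
print(raw, corrected)
```

In this toy input, the two fragments of marker A are merged into one copy, while the two complete copies of marker B still count as contamination.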
Hao Li, Dongxu Yang, Linran Wen, Wei Zheng, Feng Guo. Marker gene broken caused overestimation on the contamination of metagenome-assembled genomes and its correction[J]. Acta Microbiologica Sinica, 2021, 61(9): 2921-2933.