EukDetect: marker-gene-based annotation of eukaryotic microbes
The paper "Challenges in capturing the mycobiome from shotgun metagenome data: lack of software and databases" (Microbiome, full text) rates EukDetect as relatively accurate.
Installation
cd Software
git clone https://github.com/allind/EukDetect.git
cd EukDetect
# Download the database archive from https://figshare.com/articles/dataset/Eukdetect_database/12670856/8?file=34880610
tar -xzvf eukdetect_database_v2.tar.gz
conda env update --name eukdetect -f environment.yml
conda activate eukdetect
# install eukdetect
python setup.py install
# To test the installation, edit tests/configfile_for_tests.yml and enter the path to the installation directory and the path to the EukDetect database.
python tests/test_eukdetect.py
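Usage
# Copy default_configfile.yml to a new file, your_configfile.yml, and set every parameter in it as described in its comments.
# To decide what to enter for readlen, the following command prints the mean read length over the first 10,000 lines (2,500 reads) of a FASTQ file:
# gzip -dc test.fastq.gz | head -n 10000 | awk '{ if (NR%4==2){count++; bases += length}} END{printf "%3.0f\n", bases/count}'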
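Because a single readlen value applies to every sample in the config, it can help to run the same calculation over several files. The sketch below wraps the one-liner above in a function. The name avg_read_length and the default of 2,500 reads are illustrative choices, not part of EukDetect.

```shell
# avg_read_length FILE [N_READS]
# Hypothetical helper: prints the mean read length (rounded to an integer)
# over the first N_READS reads (default 2500) of a gzipped FASTQ file.
avg_read_length() {
    local fastq_gz="$1"
    local n_reads="${2:-2500}"
    # Each FASTQ record spans 4 lines; the sequence is the 2nd line of each record.
    gzip -dc "$fastq_gz" \
        | head -n "$((n_reads * 4))" \
        | awk 'NR % 4 == 2 { count++; bases += length($0) }
               END { if (count > 0) printf "%.0f\n", bases / count; else print "NA" }'
}

# Example usage (the directory is a placeholder):
# for f in /path/to/fastq/*_clean_1.fastq.gz; do
#     printf '%s\t%s\n' "$(basename "$f")" "$(avg_read_length "$f")"
# done
```

If the per-file values differ noticeably, the reads were probably trimmed, which the config comments advise against.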
The configfile.yml file
#Default config file for eukdetect. Copy and edit for analysis

#Directory where EukDetect output should be written
output_dir: "/home/zhongpei/diarrhoea/xjs_FJ_metagenomic/drep_bin/all_bin/fungi/eukdetect/"

#Indicate whether reads are paired (true) or single (false)
paired_end: true

#filename excluding sample name. no need to edit if paired_end = false
fwd_suffix: "_clean_1.fastq.gz"

#filename excluding sample name. no need to edit if paired_end = false
rev_suffix: "_clean_2.fastq.gz"

#file name excluding sample name. no need to edit if paired_end = true
se_suffix: ".fastq.gz"

#length of your reads. pre-trimming reads not recommended
readlen: 150

#full path to directory with raw fastq files
fq_dir: "/home/zhongpei/diarrhoea/xjs_FJ_metagenomic/metaMIC_contigs"

#full path to folder with eukdetect database files
database_dir: "/home/zhongpei/hard_disk_sda2/zhongpei/Software/EukDetect/database/"

#name of database. Default is original genomes only database name
database_prefix: "ncbi_eukprot_met_arch_markers.fna"

#full path to eukdetect installation folder
eukdetect_dir: "/home/zhongpei/hard_disk_sda2/zhongpei/Software/EukDetect"

#list sample names here. fastqs must correspond to {samplename}{se_suffix} for SE reads or {samplename}{fwd_suffix} and {samplename}{rev_suffix} for PE
#each sample name should be preceded by 2 spaces and followed by a colon character
samples:
  F1:
  F2:
  F3:
  F4:
  F5:
  F6:
  F7:
  F8:
  F9:
  F10:
  F11:
  F12:
  F13:
  F14:
  F15:
  F16:
  F17:
  F18:
  F19:
  F20:
  F21:
  F22:
  F23:
  F24:
  F25:
  F26:
  F27:
  F28:
  F29:
  F30:
  F31:
  F32:
  F33:
  F34:
  F35:
  F36:
  F37:
  F38:
  F39:
  F40:
  F41:
  F42:
  F43:
  F44:
  F45:
  F46:
  F47:
  F48:
  F49:
  F50:
  J1:
  J2:
  J3:
  J4:
  J5:
  J6:
  J7:
  J8:
  J9:
  J10:
  J11:
  J12:
  J13:
  J14:
  J15:
  J16:
  J17:
  J18:
  J19:
  J20:
  J21:
  J22:
  J23:
  J24:
  J25:
  J26:
  J27:
  J28:
  J29:
  J30:
  J31:
  J32:
  J33:
  J34:
  J35:
  J36:
  J37:
  J38:
  J39:
  J40:
  J41:
  J42:
  J43:
  J44:
  J45:
  J46:
  J47:
  J48:
  J49:
  J50:
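With a hundred samples, typing the samples block by hand is error-prone. A minimal sketch, assuming every sample has a forward-read file named {samplename}{fwd_suffix} in fq_dir, prints the block in the layout the config comments ask for: two leading spaces and a trailing colon per sample. The function name list_samples is hypothetical and not part of EukDetect.

```shell
# list_samples FQ_DIR FWD_SUFFIX
# Hypothetical helper: prints a "samples:" block with one entry per
# forward-read file found in FQ_DIR.
list_samples() {
    local fq_dir="$1"
    local fwd_suffix="$2"
    local f name
    echo "samples:"
    for f in "$fq_dir"/*"$fwd_suffix"; do
        # Skip the unexpanded glob pattern when no file matches.
        [ -e "$f" ] || continue
        # Strip the directory and the suffix to recover the sample name.
        name="$(basename "$f" "$fwd_suffix")"
        printf '  %s:\n' "$name"
    done
}

# Example usage (the directory is a placeholder); paste the output into your config:
# list_samples /path/to/fastq "_clean_1.fastq.gz"
```

The entries come out in the shell's glob order (F1, F10, F11, ...), which differs from numeric order but should not matter for a list of mapping keys.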
Running the analysis
eukdetect --mode runall --configfile ~/your_configfile.yml --cores 32
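A mismatch between the sample names and the files on disk only surfaces once the run has started, so a quick pre-flight check before launching can save time. The following is a sketch under the paired-end naming rule from the config comments ({samplename}{fwd_suffix} and {samplename}{rev_suffix}). The function check_inputs is a hypothetical helper, not an EukDetect command.

```shell
# check_inputs FQ_DIR FWD_SUFFIX REV_SUFFIX SAMPLE...
# Hypothetical helper: reports every expected paired-end FASTQ file that is
# missing and returns non-zero if any file is absent.
check_inputs() {
    local fq_dir="$1"
    local fwd_suffix="$2"
    local rev_suffix="$3"
    shift 3
    local missing=0
    local sample suffix
    for sample in "$@"; do
        for suffix in "$fwd_suffix" "$rev_suffix"; do
            if [ ! -f "$fq_dir/$sample$suffix" ]; then
                echo "missing: $fq_dir/$sample$suffix" >&2
                missing=$((missing + 1))
            fi
        done
    done
    [ "$missing" -eq 0 ]
}

# Example usage (the directory and sample names are placeholders):
# check_inputs /path/to/fastq "_clean_1.fastq.gz" "_clean_2.fastq.gz" F1 F2 J1 \
#     && eukdetect --mode runall --configfile ~/your_configfile.yml --cores 32
```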