Intro
A research paper published on bioRxiv identified a new coronavirus subgenus, and I would like to find out whether there are any changes in its protease. However, the sequence data has not been published.
Fortunately, a similar sequence is available on NCBI; unfortunately, only as RNA-seq data.
So I need to assemble the RNA-seq reads first, and then BLAST the sequence of interest against the assembled data.
TL;DR
- Set up the environment with conda:
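A minimal sketch of the environment step. The environment name `rnaseq-assembly` and the tool list (sra-tools, FastQC, fastp, Trinity, BLAST+) are assumptions about what the later steps need; adjust them to your own setup.

```shell
# Hypothetical environment name; pick whatever you like.
ENV_NAME="rnaseq-assembly"

# Create one environment holding the tools used in the later steps.
# The package names are the ones published on the bioconda channel.
create_env() {
    conda create -y -n "$ENV_NAME" -c conda-forge -c bioconda \
        sra-tools fastqc fastp trinity blast
}
```

Run `create_env` once, then `conda activate rnaseq-assembly` in your interactive shell.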
- Fetch the data:
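Fetching could look like the sketch below, assuming the reads live in the NCBI SRA and are paired-end. `SRRXXXXXXX` is a placeholder, not a real accession; the thread count is arbitrary.

```shell
# Placeholder accession: replace it with the SRA run you want to assemble.
SRR_ID="SRRXXXXXXX"

# Download the run, then convert it to FASTQ.
# --split-files writes paired reads to <run>_1.fastq and <run>_2.fastq.
# Usage: fetch_reads <run_id>
fetch_reads() {
    prefetch "$1"
    fasterq-dump --split-files --threads 4 "$1"
}
```

Call it as `fetch_reads "$SRR_ID"` inside the conda environment.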
- Data quality check
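The post does not say which tool does the quality check; the sketch below assumes FastQC, a common choice. The output directory and file names in the usage line are hypothetical.

```shell
# Run FastQC on any number of FASTQ files.
# FastQC expects the output directory to exist, so create it first.
# Usage: run_fastqc <outdir> <fastq>...
run_fastqc() {
    outdir="$1"
    shift
    mkdir -p "$outdir"
    fastqc --threads 4 --outdir "$outdir" "$@"
}
```

For example, `run_fastqc qc_raw SRRXXXXXXX_1.fastq SRRXXXXXXX_2.fastq`. The same function can be pointed at the cleaned reads for the post-cleaning check.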
- Quality control using fastp
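A sketch of the fastp call, assuming paired-end input named `<run>_1.fastq` and `<run>_2.fastq`. The `.clean.fastq` output names and the thread count are assumptions, and fastp's default filters are left unchanged.

```shell
# Trim adapters and filter low-quality reads from a paired-end run.
# Usage: clean_reads <run_id>
clean_reads() {
    fastp \
        --in1 "${1}_1.fastq" --in2 "${1}_2.fastq" \
        --out1 "${1}_1.clean.fastq" --out2 "${1}_2.clean.fastq" \
        --detect_adapter_for_pe \
        --thread 4 \
        --html "${1}.fastp.html" --json "${1}.fastp.json"
}
```

The HTML report summarises what was filtered, which is useful to skim before rerunning the quality check.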
- Data quality check (post-cleaning data)
- Assemble with Trinity
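The assembly call might look like this, assuming cleaned paired-end FASTQ files. The CPU and memory values are placeholders to size for your machine.

```shell
# De novo assembly of cleaned paired-end reads.
# Trinity requires the output directory name to contain "trinity".
# Usage: assemble <left.fastq> <right.fastq>
assemble() {
    Trinity --seqType fq \
        --left "$1" --right "$2" \
        --CPU 8 --max_memory 16G \
        --output trinity
}
```

With `--output trinity`, recent Trinity releases write the final assembly next to that directory as `trinity.Trinity.fasta`, which appears to be where the file name used later in this list comes from.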
- Check the Trinity result:
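One quick way to sanity-check the assembly is to count the contigs and find the longest one. This is a sketch using only `grep` and `awk`, not necessarily the check used originally; the function names are made up for illustration.

```shell
# Count assembled contigs: every FASTA record starts with ">".
# Usage: count_contigs <fasta>
count_contigs() {
    grep -c '^>' "$1"
}

# Print the length of the longest contig (sequence lines may be wrapped).
# Usage: longest_contig <fasta>
longest_contig() {
    awk '/^>/ { if (len > max) max = len; len = 0; next }
         { len += length($0) }
         END { if (len > max) max = len; print max + 0 }' "$1"
}
```

For example, `count_contigs trinity.Trinity.fasta` followed by `longest_contig trinity.Trinity.fasta`.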
- BLAST the sequence of interest
- Put your sequence in query.fasta.
- Make the BLAST database and run:
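A sketch of the database and search step. It assumes `query.fasta` holds a protein sequence (the protease), so it uses `tblastn` against a nucleotide database of the transcripts. For a nucleotide query, `blastn` accepts the same options. The database name, output file name, and E-value cutoff are assumptions.

```shell
ASSEMBLY="trinity.Trinity.fasta"
DB_NAME="trinity_db"   # hypothetical database name

# Index the assembled transcripts as a nucleotide BLAST database.
make_db() {
    makeblastdb -in "$ASSEMBLY" -dbtype nucl -out "$DB_NAME"
}

# Search the protein query against the translated transcripts.
# -outfmt 6 gives the 12-column tab-separated table.
run_search() {
    tblastn -query query.fasta -db "$DB_NAME" \
        -evalue 1e-5 -outfmt 6 -out blast_result.tsv
}
```

Run `make_db` once, then `run_search`; the hits land in `blast_result.tsv`.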
- Check the BLAST result:
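If the search was written in tabular format (`-outfmt 6`, an assumption here), the best hits can be listed by sorting on the bit score in column 12. The function name is hypothetical.

```shell
# Show the best hits from a tabular (-outfmt 6) BLAST result.
# Columns: qseqid sseqid pident length mismatch gapopen
#          qstart qend sstart send evalue bitscore
# Usage: top_hits <result.tsv> [n]   (n defaults to 5)
top_hits() {
    sort -k12,12nr "$1" | head -n "${2:-5}"
}
```

The second column of each row is the ID of the matching transcript, which is what you need to pull the sequence out of the assembly.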
- Extract the sequence from trinity.Trinity.fasta
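Extraction by ID can be done with plain `awk`, which avoids an extra dependency. This is a sketch, and the transcript ID in the usage line is made up.

```shell
# Print one FASTA record by ID (the first word after ">").
# Usage: extract_seq <id> <fasta>
extract_seq() {
    awk -v id="$1" '
        /^>/ { keep = (substr($1, 2) == id) }   # header line: does the ID match?
        keep                                    # print every line of the match
    ' "$2"
}
```

For example, `extract_seq TRINITY_DN1_c0_g1_i1 trinity.Trinity.fasta > hit.fasta`. The match is exact, so `TRINITY_DN1_c0_g1_i1` does not also pull in `TRINITY_DN10_c0_g1_i1`.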
Tail
- You can also BLAST with the predicted sequence:
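The post does not say how the predicted sequence was produced. One common option is TransDecoder (bioconda package `transdecoder`), so the sketch below assumes that tool; treat it as one possible route rather than the method used here.

```shell
# Predict coding regions in the assembled transcripts with TransDecoder.
# Usage: predict_proteins <transcripts.fasta>
predict_proteins() {
    TransDecoder.LongOrfs -t "$1"   # step 1: extract long open reading frames
    TransDecoder.Predict -t "$1"    # step 2: keep the likely coding regions
}
```

TransDecoder typically writes the predicted peptides to a file named after the input with a `.transdecoder.pep` suffix; check your working directory for the exact name.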
- Make the BLAST database and run:
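For predicted protein sequences, the database type becomes `prot` and the search program becomes `blastp`. The sketch assumes the predicted peptides are in a FASTA file and that `query.fasta` holds a protein query; every file and database name below is hypothetical.

```shell
# Hypothetical file names: adjust them to your own prediction output.
PEP="trinity.Trinity.fasta.transdecoder.pep"
PEP_DB="trinity_pep_db"

# Index the predicted peptides as a protein BLAST database.
make_pep_db() {
    makeblastdb -in "$PEP" -dbtype prot -out "$PEP_DB"
}

# Protein-vs-protein search, tabular output.
search_pep() {
    blastp -query query.fasta -db "$PEP_DB" \
        -evalue 1e-5 -outfmt 6 -out blast_pep_result.tsv
}
```

Run `make_pep_db`, then `search_pep`; the result table has the same 12 columns as any other `-outfmt 6` output.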
This post is synced to xLog by Mix Space.
The original link is https://xxu.do/posts/academic/De-novo-assemble-RNA-seq-sequence