.. highlight:: console

Extract ncRNA sequences
===================================================

All Rfam ncRNA sequences are made available on the `FTP site `_ with every new release. The following tutorial shows how to extract sequences using the public instance of the `MySQL database `_ and the ``esl-sfetch`` tool.

**Requirements:**

1. MySQL Community Server, freely available `here `_
2. ``esl-sfetch`` from Infernal's Easel `miniapps`

=====================================================

1. Download and install the `Infernal software `_. You can find additional information in the `Infernal User's Guide `_.

   .. seealso:: `Genome annotation `_ section

2. Add the Infernal tools to your **$PATH** using the following command:

   ::

      > export PATH="/path/to/infernal-1.1.x/bin:$PATH"

3. Download ``Rfam.fa.gz`` (a combined file of all the fasta files) from the FTP site using ``wget`` and then decompress it:

   ::

      > wget ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/fasta_files/Rfam.fa.gz
      > gunzip Rfam.fa.gz

4. Index the unified sequence file using ``esl-sfetch``:

   ::

      > esl-sfetch --index Rfam.fa

   .. note:: If the above command is successful, you should see a ``.ssi`` index file (``Rfam.fa.ssi``) generated in your current directory.

5. Create a ``.sql`` file containing a SQL query that fetches the regions of interest.

   Example query to retrieve all human ncRNAs:

   ::

      select concat(fr.rfamseq_acc,'/',fr.seq_start,'-',fr.seq_end)
      from full_region fr, genseq gs
      where gs.rfamseq_acc=fr.rfamseq_acc
      and fr.is_significant=1
      and fr.type='full'
      and gs.upid='UP000005640' -- human upid
      and gs.version=14.0;

   ..

   Example query to retrieve all human snoRNAs:

   ::

      select concat(fr.rfamseq_acc,'/',fr.seq_start,'-',fr.seq_end)
      from full_region fr, genseq gs, family f
      where gs.rfamseq_acc=fr.rfamseq_acc
      and f.rfam_acc=fr.rfam_acc
      and fr.is_significant=1
      and fr.type='full'
      and gs.upid='UP000005640' -- human upid
      and f.type like '%snoRNA%'
      and gs.version=14.0;

   Example query to retrieve all mammalian 5S ribosomal RNAs (RF00001):

   ::

      select concat(fr.rfamseq_acc,'/',seq_start,'-',seq_end)
      from full_region fr, rfamseq rs, taxonomy tx
      where fr.rfamseq_acc=rs.rfamseq_acc
      and tx.ncbi_id=rs.ncbi_id
      and fr.rfam_acc='RF00001'
      and tx.tax_string like '%Mammalia%'
      and is_significant=1;

   .. note:: In order for ``esl-sfetch`` to work with the Rfam fasta file, the regions need to be in the format: **rfamseq_acc/seq_start-seq_end**.

6. Run the query against the public MySQL database and save the resulting list of regions in a ``.txt`` file:

   ::

      > mysql -u rfamro -h mysql-rfam-public.ebi.ac.uk -P 4497 --skip-column-names --database Rfam < query.sql > accessions.txt

   ..

7. Extract the ncRNA sequences listed in the ``.txt`` file generated in **step 6** from the unified Rfam fasta file from **step 3** using ``esl-sfetch``:

   ::

      > esl-sfetch -f Rfam.fa /path/to/accessions.txt > Rfam_ncRNAs.fa
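
As noted in step 5, ``esl-sfetch`` expects every region in the form **rfamseq_acc/seq_start-seq_end**. Before running step 7, it may be worth checking that the file produced by the MySQL query matches that pattern. The following is a minimal sketch of such a check, not part of Rfam or Infernal: the function name ``check_regions`` is hypothetical, and the pattern assumes that accessions contain no whitespace or ``/`` characters and that coordinates are plain integers.

.. code-block:: bash

   # Hypothetical helper (not part of Rfam or Infernal): report lines that do
   # not look like rfamseq_acc/seq_start-seq_end.
   # Assumption: accessions contain no whitespace or '/', coordinates are integers.
   check_regions() {
       local file="$1"
       local bad
       # -n: prefix offending lines with their line number
       # -v: keep only lines that do NOT match the expected pattern
       # -E: use extended regular expressions
       bad=$(grep -nvE '^[^/[:space:]]+/[0-9]+-[0-9]+$' "$file" || true)
       if [ -n "$bad" ]; then
           printf 'Malformed region lines in %s:\n%s\n' "$file" "$bad" >&2
           return 1
       fi
       return 0
   }

Once the function is defined in the current shell, it could be chained with the extraction step, for example ``check_regions accessions.txt && esl-sfetch -f Rfam.fa accessions.txt > Rfam_ncRNAs.fa``, so that ``esl-sfetch`` only runs when every line has the expected shape.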