Rfam API¶
Most data in Rfam can be accessed programmatically using a RESTful API allowing for integration with other resources.
Hint
You can also access the data using a Public MySQL Database that contains the latest Rfam release.
Data access¶
The data can be accessed in several formats which can be specified in the URL:
Using curl¶
Here is how to retrieve an XML description of an Rfam family using curl:
curl https://rfam.org/family/snoZ107_R87?content-type=text%2Fxml
Output:
<?xml version="1.0" encoding="UTF-8"?>
<!-- information on Rfam family RF00360 (snoZ107_R87), generated: 12:57:01 31-Oct-2016 -->
<rfam xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://rfam.sanger.ac.uk/"
xsi:schemaLocation="http://rfam.sanger.ac.uk/
http://rfam.sanger.ac.uk/static/documents/schemas/entry_xml.xsd"
release="12.1"
release_date="2016-04-26">
<entry entry_type="Rfam" accession="RF00360" id="snoZ107_R87">
<description>
<![CDATA[
Small nucleolar RNA Z107/R87
]]>
</description>
<comment>
<![CDATA[
Z107 and R87 are members of the C/D class of snoRNA which contain the C (UGAUGA) and D (CUGA) box motifs. Most of the members of the box C/D family function in directing site-specific 2'-O-methylation of substrate RNA
]]>
</comment>
<curation_details>
<author>Moxon SJ</author>
<seed_source>Moxon SJ</seed_source>
<num_seqs>
<seed>9</seed>
<full>144</full>
</num_seqs>
<num_species>37</num_species>
<type>Gene; snRNA; snoRNA; CD-box;</type>
<structure_source>Predicted; RNAfold; Moxon SJ, Daub J, Gardner PP</structure_source>
</curation_details>
<cm_details num_states="">
<build_command>cmbuild -F CM SEED</build_command>
<calibrate_command>cmcalibrate --mpi CM</calibrate_command>
<search_command>cmsearch --cpu 4 --verbose --nohmmonly -T 19 -Z 549862.597050 CM SEQDB</search_command>
<cutoffs>
<gathering>50.0</gathering>
<trusted>50.2</trusted>
<noise>49.8</noise>
</cutoffs>
</cm_details>
</entry>
</rfam>
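The XML output above declares a default namespace on the root element, which is easy to trip over when parsing. The following is a minimal sketch, using only the Python standard library, of how a few fields might be pulled out of such a response. The function name `parse_family_xml` and the choice of fields are illustrative assumptions, not part of the Rfam API.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on the root <rfam> element in the output above.
NS = {"rfam": "http://rfam.sanger.ac.uk/"}


def parse_family_xml(xml_text):
    """Extract a few fields from a family description (illustrative helper)."""
    root = ET.fromstring(xml_text)
    entry = root.find("rfam:entry", NS)
    seed = entry.findtext(
        "rfam:curation_details/rfam:num_seqs/rfam:seed", namespaces=NS
    )
    gathering = entry.findtext(
        "rfam:cm_details/rfam:cutoffs/rfam:gathering", namespaces=NS
    )
    return {
        "accession": entry.get("accession"),
        "id": entry.get("id"),
        "release": root.get("release"),
        # The description is wrapped in CDATA with surrounding newlines.
        "description": entry.findtext(
            "rfam:description", default="", namespaces=NS
        ).strip(),
        "num_seed": int(seed),
        "gathering_cutoff": float(gathering),
    }
```

Every element lookup has to carry the namespace prefix; a bare `root.find("entry")` would return nothing for this document.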
Using a script¶
The Rfam API can also be used from a script written in any programming language, such as Python or Perl.
Python example script
import requests
# Request the JSON description of the family and print its accession
r = requests.get('https://rfam.org/family/RF00360?content-type=application/json')
print(r.json()['rfam']['acc'])
Perl example script
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->env_proxy;
my $res = $ua->get( 'https://rfam.org/family/snoZ107_R87?content-type=text%2Fxml' );
if ( $res->is_success ) {
print $res->content;
}
else {
print STDERR $res->status_line, "\n";
}
Endpoints¶
Family¶
Family description¶
Returns general information about an Rfam family, such as curation details, search parameters, etc.
Examples:
ID to accession¶
Returns the accession for the family with the given Rfam ID or accession.
Example:
https://rfam.org/family/snoZ107_R87/acc
Example output:
RF00360
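As a rough sketch, the same lookup could be done from Python with the standard library. The helper names `accession_url` and `fetch_accession` are hypothetical; the URL pattern is the one shown in the example above.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://rfam.org/family"


def accession_url(family_id):
    """Build the URL that resolves a family ID to its accession."""
    return f"{BASE_URL}/{quote(family_id, safe='')}/acc"


def fetch_accession(family_id, timeout=30):
    """Return the plain-text response body with surrounding whitespace removed."""
    with urllib.request.urlopen(accession_url(family_id), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

Calling `fetch_accession("snoZ107_R87")` performs the network request; `accession_url` alone only builds the address.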
Secondary structure images¶
Returns the schematic secondary structure image for the family. The following types of secondary structure diagrams are supported:
cons (sequence conservation)
fcbp (basepair conservation)
cov (covariation)
ent (relative entropy)
maxcm (maximum CM parse)
norm (normal)
rscape-cyk (secondary structure predicted by R-scape 1 based on Rfam SEED alignment)
Examples:
Covariance models¶
Returns the covariance model for the specified family.
Example: https://rfam.org/family/RF00360/cm
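Covariance model files can be large, so a script may prefer to stream the response to disk rather than hold it in memory. A minimal sketch under that assumption (the helper names `cm_url` and `download_cm` are illustrative):

```python
import shutil
import urllib.request


def cm_url(family):
    """Build the covariance model URL for a family accession or ID."""
    return f"https://rfam.org/family/{family}/cm"


def download_cm(family, path, timeout=60):
    """Stream the covariance model for `family` into the file at `path`."""
    with urllib.request.urlopen(cm_url(family), timeout=timeout) as response, \
            open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path
```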
Sequence regions¶
Returns the list of all sequence regions for the specified families in tab-delimited format.
Note
Some families have too many regions to list. The server will return a status of 403 Forbidden in these cases.
Examples:
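A script that requests sequence regions should be prepared for the 403 Forbidden case described in the note. The sketch below takes the regions URL as a parameter instead of assuming a URL pattern. It also skips blank lines and lines starting with `#`, which is an assumption about the file layout, not documented behaviour. `TooManyRegions`, `parse_tab_delimited` and `fetch_regions` are hypothetical names.

```python
import urllib.error
import urllib.request


class TooManyRegions(Exception):
    """Raised when the server answers 403 Forbidden for a regions request."""


def parse_tab_delimited(text):
    """Split tab-delimited text into rows of fields.

    Blank lines and lines starting with '#' are skipped (assumed to be
    comments or headers).
    """
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        rows.append(line.split("\t"))
    return rows


def fetch_regions(url, timeout=60):
    """Fetch and parse a regions file, translating 403 into TooManyRegions."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return parse_tab_delimited(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 403:
            raise TooManyRegions(url) from err
        raise
```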
Phylogenetic trees¶
Tree image¶
Returns a PNG image showing the phylogenetic tree for the specified family based on seed alignment. The image can be labelled either using species names or sequence accessions.
Examples:
Tree image map¶
Returns the HTML image map that is used in conjunction with the tree image to highlight tree nodes in the Rfam website.
Example:
Note
The HTML snippet contains an <img> tag that automatically loads the tree image.
Structure mapping¶
Returns the mapping between an Rfam family, EMBL sequence regions and PDB residues. The plain text file has a tab-delimited format.
Examples:
Alignments¶
The following methods can be used to return family alignments in various formats.
Hint
You can request a compressed version of the alignment by adding gzip=1 to the URL.
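One way to add the `gzip=1` parameter without disturbing other query parameters, and to unpack the result, is sketched below. It assumes the compressed response is a standard gzip stream, which the parameter name suggests but which is not confirmed here. `with_gzip` and `decompress_alignment` are illustrative names.

```python
import gzip
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_gzip(url):
    """Return `url` with gzip=1 set in its query string."""
    parts = urlsplit(url)
    # Keep existing parameters, dropping any earlier gzip setting.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "gzip"
    ]
    query.append(("gzip", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


def decompress_alignment(payload):
    """Decompress a gzip-compressed response body into text."""
    return gzip.decompress(payload).decode("utf-8")
```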
Stockholm-format alignment¶
Returns the Stockholm-format seed alignment for the specified family.
Examples:
Formatted alignment¶
Returns the seed alignment for the specified family in one of the following formats:
stockholm (standard Stockholm format - default)
pfam (Stockholm with each sequence on a single line)
fasta (gapped FASTA format)
fastau (ungapped FASTA format)
Examples:
Sequence searches¶
In addition to a sequence search user interface, it is possible to run single-sequence Rfam searches programmatically.
Running a search is a two-step process:
submit the search sequence
retrieve search results
The operation is split into two steps, rather than performed in one, because the time taken to perform a sequence search varies with the length of the sequence searched. Most web clients, whether browsers or scripts, will simply time out if a response is not received within a short period, usually less than a minute. By submitting a search, waiting and then retrieving results as a separate operation, we avoid the risk of a client reaching a time-out before the results are returned.
The following example uses simple command-line tools to submit the search and retrieve results, but the whole process is easily transferred to a single script or program.
Save your sequence to file¶
It is usually most convenient to save your sequence into a plain text file, something like this:
$ cat test.seq
AGTTACGGCCATACCTCAGAGAATATACCGTATCCCGTTCGATCTGCGAA
GTTAAGCTCTGAAGGGCGTCGTCAGTACTATAGTGGGTGACCATATGGGA
ATACGACGTGCTGTAGCTT
The sequence should contain only valid sequence characters. You can break the sequence across multiple lines to make it easier to handle.
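A script can join the lines and check the characters before submitting. The sketch below assumes that "valid sequence characters" means the IUPAC nucleotide codes; that set is an assumption, and the server may accept a different alphabet. `clean_sequence` is a hypothetical helper.

```python
# Assumption: IUPAC nucleotide codes are treated as the valid alphabet.
VALID_CHARACTERS = set("ACGTUNRYKMSWBDHV")


def clean_sequence(text):
    """Join a sequence broken across lines and check its characters.

    Whitespace (including line breaks) is removed; case is preserved.
    """
    sequence = "".join(text.split())
    if not sequence:
        raise ValueError("sequence is empty")
    invalid = sorted(set(sequence.upper()) - VALID_CHARACTERS)
    if invalid:
        raise ValueError("invalid sequence characters: " + "".join(invalid))
    return sequence
```

For a file such as test.seq, the contents can be read with `open("test.seq").read()` and passed straight to `clean_sequence`.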
Submit the search¶
When you send a request to the server, you can specify the format of the response. The server supports JSON (application/json) and XML (text/xml) output. In the examples below we’ll use the JSON output format by adding an Accept header to the request, specifying the media type application/json. Alternatively, you could use the “content-type” parameter in the URL rather than setting a header.
curl -H 'Expect:' -F seq='<test.seq' -H "Accept: application/json" https://rfam.org/search/sequence
Example output:
{
"resultURL": "https://rfam.org/search/sequence/d9b451d8-96e6-4234-9dbb-aa4806925353",
"opened": "2016-10-31 13:19:06",
"estimatedTime": "3",
"jobId": "d9b451d8-96e6-4234-9dbb-aa4806925353"
}
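The curl command above sends the sequence as a `seq` form field in a multipart/form-data POST. A rough Python equivalent using only the standard library is sketched below. It builds the multipart body by hand, on the assumption that the server treats it the same way as the curl request. `build_multipart` and `submit_search` are illustrative names, not part of the API.

```python
import json
import urllib.request
import uuid

SEARCH_URL = "https://rfam.org/search/sequence"


def build_multipart(fields):
    """Encode text fields as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(value)
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def submit_search(sequence, url=SEARCH_URL, timeout=60):
    """POST the sequence and return the decoded JSON submission response."""
    body, content_type = build_multipart({"seq": sequence})
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

The returned dictionary should carry the same keys as the example output, so `submission["resultURL"]` is the address to poll.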
Wait for the search to complete¶
Having submitted the search, you now need to check the resultURL given in the response, which will be the URL that you used for submitting the search, but with a job identifier appended.
Although you can check for results immediately, if you poll before your job has completed you won’t receive a full response. Instead, the HTTP response will have its status set appropriately and the body of the response will contain only a string giving the status. You should ideally check the HTTP status of the response, rather than relying on the body of the response. See below for a table showing the response status codes that the server may return.
When writing a script to submit searches and retrieve results, please add a short delay between the submission and the first attempt to retrieve results. Most search jobs are returned within four to five seconds of submission, depending greatly on the length of the sequence to be searched. The estimatedTime given in the response provides a very rough estimate of how long your job should take. You may want to wait for this period before polling for the first time.
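The waiting strategy described above could be sketched as a small polling loop: wait for roughly estimatedTime, then check the HTTP status at intervals. The poll interval, the maximum number of polls and the one-second minimum initial delay are arbitrary assumptions, and `http_get` and `wait_for_results` are hypothetical helpers. The loop treats 202 as "still pending" and 200 as "finished".

```python
import time
import urllib.error
import urllib.request


def http_get(url, timeout=60):
    """Return (status_code, body_text) for a GET request asking for JSON."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions; report them as statuses.
        return err.code, err.read().decode("utf-8")


def wait_for_results(fetch, result_url, estimated_time=0,
                     poll_interval=2.0, max_polls=30, sleep=time.sleep):
    """Poll `result_url` with `fetch` until the job finishes.

    `fetch(url)` must return (status_code, body). Returns the body of the
    first 200 response.
    """
    # Initial delay before the first poll, based on the server's estimate.
    sleep(max(float(estimated_time), 1.0))
    for _ in range(max_polls):
        status, body = fetch(result_url)
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"search failed with HTTP {status}: {body}")
        sleep(poll_interval)
    raise TimeoutError(f"no results after {max_polls} polls")
```

Passing `fetch` and `sleep` in as parameters keeps the loop testable without network access; in normal use `wait_for_results(http_get, result_url, estimated_time)` is enough.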
Retrieve results¶
The response that was returned from the first query includes a URL from which you can now retrieve results:
curl -H "Expect:" -H "Accept: application/json" https://rfam.org/search/sequence/01d3c704-591a-4a85-b7c1-366496c5a63
{
"closed": "2016-10-31 13:20:29",
"searchSequence": "AGTTACGGCCATACCTCAGAGAATATACCGTATCCCGTTCGATCTGCGAAGTTAAGCTCTGAAGGGCGTCGTCAGTACTATAGTGGGTGACCATATGGGAATACGACGTGCTGTAGCTT",
"hits": {
"5S_rRNA": [{
"score": "104.9",
"E": "2.7e-24",
"acc": "RF00001",
"end": "119",
"alignment": {
"user_seq": "#SEQ 1 AGUUACGGCCAUACCUCAGAGAAUAUACCGUAUCCCGUUCGAUCUGCGAAGUUAAGCUCUGAAGGGCGUCGUCAGUACUAUAGUGGGUGACCAUAUGGGAAUACGACGUGCUGUAGCUU 119 ",
"hit_seq": "#CM 1 gccuGcggcCAUAccagcgcgaAagcACcgGauCCCAUCcGaACuCcgAAguUAAGcgcgcUugggCcagggUAGUAcuagGaUGgGuGAcCuCcUGggAAgaccagGugccgCaggcc 119 ",
"ss": "#SS (((((((((,,,,<<-<<<<<---<<--<<<<<<______>>-->>>>-->>---->>>>>-->><<<-<<----<-<<-----<<____>>----->>->-->>->>>))))))))): ",
"match": "#MATCH :: U:C:GCCAUACC ::G:GAA ::ACCG AUCCC+U+CGA CU CGAA::UAAGC:C:: +GGGC: :G AGUACUA +UGGGUGACC+ UGGGAA+AC:A:GUGC:G:A ::+ ",
"pp": "#PP *********************************************************************************************************************** ",
"nc": "#NC "
},
"strand": "+",
"id": "5S_rRNA",
"GC": "0.49",
"start": "1"
}]
},
"opened": "2016-10-31 13:19:06",
"numHits": 1,
"started": "2016-10-31 13:20:08",
"jobId": "99676096-9F6C-11E6-9647-5251D1B96DDE"
}
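In the JSON results shown above, hits are grouped by family ID and numeric values arrive as strings. A minimal sketch of flattening that structure into typed rows follows. The function name `summarize_hits` and the selection of fields are illustrative choices.

```python
def summarize_hits(results):
    """Flatten the 'hits' object of a search result into a list of dicts.

    Numeric fields are converted from strings; rows are sorted by
    descending bit score.
    """
    rows = []
    for family_id, hits in results.get("hits", {}).items():
        for hit in hits:
            rows.append({
                "family": family_id,
                "acc": hit["acc"],
                "start": int(hit["start"]),
                "end": int(hit["end"]),
                "strand": hit["strand"],
                "score": float(hit["score"]),
                "evalue": float(hit["E"]),
            })
    rows.sort(key=lambda row: row["score"], reverse=True)
    return rows
```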
Warning
Old search results are regularly cleared out but results will be visible for one week after completion of the original search.
Server responses¶
Server responses include a standard HTTP status code giving information about the current state of your job. These are the possible status codes:
HTTP method | HTTP status code | Status description | Response body | Notes |
---|---|---|---|---|
POST | 202 | Accepted | PEND / RUN | The job has been accepted by the search system and is either pending (waiting to be started) or running. After a short delay, your script should check for results again. |
POST | 502 | Bad gateway | Error message | There was a problem scheduling or running the job. The job has failed and will not produce results. There is no need to check the status again. |
POST | 503 | Service unavailable | Error message | Occasionally the search server may become overloaded. If the error message suggests that the search queue is full, try submitting your search later. |
GET | 200 | OK | Search results | The job completed successfully and the results are included in the response body. |
GET | 410 | Gone | DEL | Your job was deleted from the search system. This status will not be assigned by the search system, but by an administrator. There was probably a problem with the job and you should contact the help desk for assistance with it. |
GET | 503 | Service unavailable | HOLD | Your job was accepted but is on hold. This status will not be assigned by the search system, but by an administrator. There is probably a problem with the job and you should contact the help desk for assistance with it. |
GET, POST | 500 | Internal server error | Error message | There was some problem accepting or running your job, but it does not fall into any of the other categories. The body of the response will contain an error message from the server. Contact the help desk for assistance with the problem. |
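A script might encode the table as a lookup from (method, status) to the next step. The sketch below is one possible reading of the notes column. The action labels are invented for illustration, and 202 is treated as "poll again" for either method, which is an assumption that goes beyond the table.

```python
# Action labels are illustrative; they summarise the "Notes" column.
ACTIONS = {
    ("POST", 502): "failed",
    ("POST", 503): "resubmit_later",
    ("GET", 200): "done",
    ("GET", 410): "contact_help_desk",
    ("GET", 503): "contact_help_desk",
    ("GET", 500): "contact_help_desk",
    ("POST", 500): "contact_help_desk",
}


def next_action(method, status):
    """Map an HTTP method and status code to a suggested next step."""
    if status == 202:
        # Job is pending or running: wait briefly, then check again.
        return "poll"
    return ACTIONS.get((method.upper(), status), "unknown")
```

Note that 503 means different things for the two methods: an overloaded queue on submission, but a job placed on hold when retrieving results.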