PACCMIT-CDS: Searching the coding region for microRNA targets

PACCMIT–CDS

Finding microRNA targets in the coding region is difficult because of the overwhelming signal encoding the amino acid sequence. PACCMIT–CDS [1] is an algorithm that finds potential microRNA targets within coding sequences (CDS) by searching for conserved motifs that are complementary to the microRNA seed region and also overrepresented in comparison with a background model [2,3] preserving both codon usage and amino acid sequence. Thanks to the properly constructed background, the new algorithm achieves a lower rate of false positives and better ranking of predictions than do currently available algorithms, which were designed to find microRNA targets within 3′UTRs.

References:

[1] Marín RM, Šulc M, Vaníček J. 2013. RNA 19: 467–474.
[2] Fuglsang A. 2004. Biochem. Biophys. Res. Commun. 316: 755–762.
[3] Robins H, Krasnitz M, Barak H, Levine AJ. 2005. J. Bacteriology 187: 8370–8374.

PACCMIT

In summary, PACCMIT [1] is a flexible algorithm that filters potential miRNA binding sites in 3'UTR regions by their conservation, accessibility, or both, and then ranks the predictions according to the over-representation with respect to a random background based on a Markov model.

The ranking of conserved target predictions according to the over-representation of oligomer sequences in the 3'UTR that are complementary to the microRNA seed was proposed by Robins and Press [2] and used by Murphy et al. [3] to discover the roles of microRNAs in the reactivation and latency of herpesviruses. The PACMIT algorithm [4] applies this ranking only to partially accessible complementary sites, i.e., sites that contain a 4–mer that appears in a single-stranded RNA region in at least 20% of the Boltzmann ensemble of RNA secondary structures. This accessibility filter was motivated by a method used by Robins, Li, and Padgett [5] for incorporating RNA structure into target prediction. Marín and Vaníček [4] showed that filtering predictions by accessibility and ranking them by over-representation results in much higher precision than that obtained by other free–energy based methods.

Finally, the flexible algorithm that we call PACCMIT (Prediction of ACcessible and/or Conserved MIcroRNA Targets) [1] was obtained by combining PACMIT with a general conservation filter.

References:

[1] Marín RM, Vaníček J. 2012. PLoS ONE 7: e32208.
[2] Robins H, Press WH. 2005. Proc. Natl Acad. Sci. USA 102: 15557–15562.
[3] Murphy E, Vaníček J, Robins H, Shenk T, Levine AJ. 2008. Proc. Natl Acad. Sci. USA 105: 5453–5458.
[4] Marín RM, Vaníček J. 2011. Nucleic Acids Research 39: 19–29.
[5] Robins H, Li Y, Padgett RW. 2005. Proc. Natl Acad. Sci. USA 102: 4006–4009.

Unless stated otherwise, this website uses PACCMIT [1] to predict targets in 3'UTRs.

Contact

If you have any questions, please either contact us by e-mail or visit the web page of our group, the Laboratory of Theoretical Physical Chemistry, at the Ecole polytechnique fédérale de Lausanne (EPFL).

E-mail jiri.vanicek@epfl.ch
WWW http://lcpt.epfl.ch

Predictions

Options

Algorithm & Database

Please select the version of our algorithm you wish to use.

Algorithm

Use conservation

Use accessibility

Please select the genome, assembly version, and track you want to study.

Genome

Assembly

Track

microRNAs

Accession numbers

Please specify a list of valid microRNA accession numbers you wish to test.

Input file

You can also paste microRNA accession numbers directly below. Note that both inputs are merged. If left empty, all microRNAs from the corresponding database are analyzed.

For an example, try out:

mRNAs

Transcript IDs

Please specify a list of valid transcript IDs you wish to test.

Input file

You can also paste transcript IDs directly below. Note that both inputs are merged. If left empty, all transcripts from the corresponding database are analyzed.

For an example, try out:

Output

Output specification

Here you can limit the number of returned microRNA–mRNA pairs and also specify the desired False Discovery Rate (FDR) in the sense of Benjamini-Hochberg.

Maximum number of pairs

FDR threshold

Please specify an output file name and format. If no file name is provided, the output is displayed as a table in a new browser tab or window.

Output file

PACCMIT-CDS [1]

How are the resulting predictions ranked?

The PACCMIT–CDS algorithm is based on the assumption that functional binding sites that have avoided deleterious mutations should be overrepresented with respect to the composition of the surrounding sequence. Let $c$ denote the number of conserved seed matches (i.e., sites complementary to the seed sequence) in a given coding sequence and for a given microRNA. PACCMIT–CDS ranks its predictions according to an approximate probability $P_{\text{SH}}$ that a randomly generated sequence (based on a model preserving both the amino acid sequence and codon usage) would contain at least $c$ conserved seed matches. As usual, lower values of $P_{\text{SH}}$ suggest that the interaction is more likely to be functional.
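As a rough illustration of the idea behind this ranking, the sketch below estimates by Monte Carlo how often a random sequence with the same amino acid sequence and the same codon usage contains at least $c$ occurrences of a site. It is not the formula for $P_{\text{SH}}$ from [1]: the background here is built by permuting codons among positions that encode the same amino acid (one simple way to preserve both properties, assumed for illustration), conservation is ignored, and the names `shuffle_synonymous`, `count_sites`, and `empirical_p` are hypothetical.

```python
import random
from collections import defaultdict

# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def shuffle_synonymous(cds, rng):
    """Permute codons among positions that encode the same amino acid.

    The protein sequence and the codon counts of the input are both unchanged.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    groups = defaultdict(list)
    for idx, codon in enumerate(codons):
        groups[CODON_TABLE[codon]].append(idx)
    shuffled = list(codons)
    for positions in groups.values():
        pool = [codons[i] for i in positions]
        rng.shuffle(pool)
        for i, codon in zip(positions, pool):
            shuffled[i] = codon
    return "".join(shuffled)


def count_sites(seq, site):
    """Count (possibly overlapping) occurrences of site in seq."""
    n = len(site)
    return sum(1 for i in range(len(seq) - n + 1) if seq[i:i + n] == site)


def empirical_p(cds, site, n_samples=1000, seed=0):
    """Fraction of shuffled sequences with at least as many sites as the real CDS."""
    rng = random.Random(seed)
    c = count_sites(cds, site)
    hits = sum(
        count_sites(shuffle_synonymous(cds, rng), site) >= c
        for _ in range(n_samples)
    )
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_samples + 1)
```

The sequence and the site must use the same alphabet (DNA letters in this sketch). A smaller value indicates that the observed number of sites is harder to explain by codon usage and protein sequence alone.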

How do you define conservation?

We employ the “Any–species” approach used by Marín and Vaníček [2]: a seed match is considered conserved if it is present in the aligned sequences of at least $S=12$ species (including the reference), regardless of their phylogenetic distance from the reference.
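A minimal sketch of this criterion, assuming the multiple alignment is available as a mapping from species name to an aligned sequence of equal length and that the seed match must occur in the same alignment columns in every species. The function name `is_conserved` and the data layout are hypothetical and not taken from the server.

```python
def is_conserved(alignment, site, start, min_species=12):
    """Return True if `site` occurs at column `start` in at least `min_species` rows.

    alignment: dict mapping species name -> aligned sequence (reference included).
    start: 0-based alignment column where the seed match begins.
    """
    end = start + len(site)
    present = sum(
        1 for seq in alignment.values() if seq[start:end].upper() == site.upper()
    )
    return present >= min_species
```

Because the count is taken over any species in the alignment, the test does not depend on which species carry the site or on their phylogenetic distance from the reference.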

How did you assess precision and/or sensitivity?

Precision and sensitivity of PACCMIT–CDS were evaluated using PAR-CLIP and proteomics data sets. See the section “Data sets of validated targets” in [1].

References:

[1] Marín RM, Šulc M, Vaníček J. 2013. RNA 19: 467–474.
[2] Marín RM, Vaníček J. 2012. PLoS ONE 7: e32208.

PACCMIT [1]

How does PACCMIT score microRNA–3'UTR interactions?

The PACCMIT algorithm [1] is based on the assumption that functional binding sites that have avoided deleterious mutations should be overrepresented with respect to the composition of the surrounding sequence. Let $c$ denote the number of conserved and/or accessible seed matches (i.e., sites complementary to the seed sequence) in a given 3'UTR for a given microRNA. PACCMIT ranks its predictions according to an approximate probability $P_{\text{SH}}$ that a randomly generated sequence (based on a Markov model) would contain at least $c$ conserved and/or accessible seed matches. As usual, lower values of $P_{\text{SH}}$ suggest that the interaction is more likely to be functional.
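To make the notion of over-representation with respect to a Markov background concrete, the sketch below fits a first-order Markov model to a 3'UTR, computes the probability of the site at a single position under that model, and uses a binomial tail as the probability of seeing at least $c$ sites. The exact definition of $P_{\text{SH}}$ is given in [1]; the first-order model, the binomial approximation, the omission of the conservation and accessibility filters, and all function names are assumptions made for this example.

```python
from collections import Counter


def markov_site_probability(utr, site):
    """Probability of `site` at one position under a first-order Markov model of `utr`."""
    mono = Counter(utr)
    di = Counter(utr[i:i + 2] for i in range(len(utr) - 1))
    p = mono[site[0]] / len(utr)
    for a, b in zip(site, site[1:]):
        from_a = sum(v for pair, v in di.items() if pair[0] == a)
        if from_a == 0:
            return 0.0
        p *= di[a + b] / from_a
    return p


def prob_at_least(c, n, p):
    """Binomial tail P(X >= c) for n trials with success probability p."""
    if c <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0 if c <= n else 0.0
    term = (1.0 - p) ** n  # P(X = 0)
    below = 0.0
    for k in range(min(c, n + 1)):
        below += term
        term *= (n - k) / (k + 1) * p / (1.0 - p)  # P(X = k + 1) from P(X = k)
    return max(0.0, 1.0 - below)


def over_representation_p(utr, site, c):
    """Approximate probability that a random sequence has at least c sites."""
    n = len(utr) - len(site) + 1  # number of possible start positions
    return prob_at_least(c, n, markov_site_probability(utr, site))
```

The tail is accumulated term by term rather than with binomial coefficients so that long 3'UTRs do not cause floating-point overflow.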

What is the effect of the various filters (accessibility/conservation) within PACCMIT?

We showed [1] that while the conservation filter is more effective than the accessibility filter for predicting targets of highly conserved microRNAs, accessibility performs better than conservation in the case of weakly conserved microRNAs. Moreover, in the case of highly conserved microRNAs, the performance is improved even further, especially among the top predictions, by using the combined filter.

How do you evaluate accessibility within PACCMIT/PACMIT?

The algorithm for calculating accessibility is common to both PACMIT [2] and PACCMIT [1]. A 7–mer in the 3'UTR sequence is classified as accessible if it contains at least one 4–mer located in a single-stranded region in at least 20% of RNA secondary structures. The probability that a 4–mer appears in a single-stranded region is evaluated with the program RNAplfold [3] using a window $W=80$ and a maximum pairing distance $L=40$ [4].
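The accessibility test itself can be sketched as follows, assuming the unpaired probabilities of all 4–mers have already been computed (for example with RNAplfold) and stored in a list indexed by the 0-based start position of each 4–mer. Parsing the RNAplfold output and converting its indexing convention are left to the caller; the function name `is_accessible` and the input layout are hypothetical.

```python
def is_accessible(unpaired4, site_start, site_len=7, k=4, threshold=0.2):
    """Return True if any k-mer inside the site is unpaired with probability >= threshold.

    unpaired4[i]: probability that the k-mer starting at position i is fully unpaired.
    site_start: 0-based start position of the 7-mer seed match.
    """
    # A 7-mer contains site_len - k + 1 = 4 overlapping 4-mers.
    n_kmers = site_len - k + 1
    window = unpaired4[site_start:site_start + n_kmers]
    return any(p >= threshold for p in window)
```

With the default values, the function inspects the four 4–mers that start at positions `site_start` to `site_start + 3`.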

How do you treat conservation in PACCMIT?

The conservation filter in PACCMIT was optimized using a training dataset constructed from the proteomics data of Baek et al. [5] and Selbach et al. [6]. We computed [1] the precision and the number of true targets per microRNA as functions of the number of predictions per microRNA for varying stringency of the conservation filter. The filter with four selected species (human, chimp, rhesus, mouse) outperformed the filter with any twelve species and is therefore used by default in all PACCMIT analyses.

References:

[1] Marín RM, Vaníček J. 2012. PLoS ONE 7: e32208.
[2] Marín RM, Vaníček J. 2011. Nucleic Acids Research 39: 19–29.
[3] Bernhart SH, Hofacker IL, Stadler PF. 2006. Bioinformatics 22: 614–615.
[4] Tafer H, Ameres SL, Obernosterer G, Gebeshuber CA, Schroeder R, et al. 2008. Nat. Biotechnol. 26: 578–583.
[5] Baek D, Villén J, Shin C, Camargo F, Gygi S, et al. 2008. Nature 455: 64–71.
[6] Selbach M, Schwanhausser B, Thierfelder N, Fang Z, Khanin R, et al. 2008. Nature 455: 58–63.

Tutorial

The purpose of this short tutorial is to demonstrate a typical use of this web server. The recipe below guides the user through the four steps of the wizard accessible via the link Predictions in the main menu.

Choosing the algorithm & database

In our example, we select the PACCMIT algorithm and enable the conservation and accessibility filters. Note also that we explicitly request Ensembl genes.

Selecting microRNAs of interest

Let us assume that we are interested in the targets of the microRNAs listed in the figure below. In general, the accession numbers (not the IDs) can be pasted into the window by hand, but the specific microRNA accession numbers shown in the figure can be loaded automatically by clicking the Load sample data button.

Selecting mRNAs of interest

In this step, we specify the mRNAs. Since we selected Ensembl genes, we have to paste the identifiers in the corresponding format. As in step 2, the specific set of mRNA IDs shown in the figure can be loaded automatically by clicking the Load sample data button.

Imposing output parameters

Finally, we request at most 200 microRNA–mRNA pairs. This limit has no effect in our example, since at most $3\times4=12$ microRNA–mRNA pairs can be returned. Moreover, we set the (Benjamini-Hochberg) FDR threshold to 5%. Since the input field Output file is left empty, the output is shown in a new browser window.
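For readers unfamiliar with the Benjamini-Hochberg procedure, the sketch below shows how an FDR threshold and a maximum number of pairs could be combined. It is a generic implementation of the standard step-up procedure, not the server's code; how the server defines the set of tested pairs is not specified here, and the function name `select_pairs` is hypothetical.

```python
def select_pairs(pvalues, fdr=0.05, max_pairs=200):
    """Return indices of accepted pairs, best first.

    Applies the Benjamini-Hochberg step-up rule at level `fdr`,
    then keeps at most `max_pairs` of the accepted pairs.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Largest rank whose p-value lies under the BH line fdr * rank / m.
        if pvalues[i] <= fdr * rank / m:
            cutoff = rank
    return order[:min(cutoff, max_pairs)]
```

With 12 candidate pairs, as in the tutorial example, the limit of 200 never truncates the list, so only the FDR threshold determines which pairs are reported.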

Structure of the output

The top part of the output, labeled “Input overview,” shows which user-supplied miRNA accession numbers and mRNA IDs were discarded and which were used in the search. It also lists the miRNAs from the input that have at least one potential target among the used mRNAs, and the mRNAs from the input that are potential targets of at least one used miRNA. A potential target is an mRNA that contains a seed match (i.e., a 7-mer complementary to the seed of a given miRNA) satisfying the required accessibility and/or conservation filters. The web server automatically provides IDs and nucleotide sequences for the miRNAs as well as gene names and descriptions for the mRNAs.
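The notion of a seed match used above can be illustrated with a short sketch. It assumes that the seed consists of miRNA nucleotides 2 to 8 (a common convention, not stated on this page) and that only Watson-Crick pairs count; the function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_match_site(mirna, seed_start=1, seed_len=7):
    """Return the 7-mer on the mRNA that pairs with the miRNA seed.

    seed_start is 0-based, so the default covers miRNA nucleotides 2-8 (assumed).
    """
    seed = mirna.upper().replace("T", "U")[seed_start:seed_start + seed_len]
    # The mRNA site is the reverse complement of the seed.
    return "".join(COMPLEMENT[n] for n in reversed(seed))


def find_seed_matches(mrna, mirna):
    """Return 0-based start positions of all seed matches in the mRNA."""
    site = seed_match_site(mirna)
    seq = mrna.upper().replace("T", "U")
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]
```

For example, `find_seed_matches(utr, "UGAGGUAGUAGGUUGUAUAGUU")` lists the positions in `utr` that match the seed of the given miRNA sequence; filtering those positions by conservation and/or accessibility would then yield the potential targets described above.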
