The input files
To run the analysis, two inputs must be provided: 1) the gene-profile, and 2) one or more collections of gene-sets.
Gene-profile
A gene-profile is a tab-separated value .txt file where the first column is the gene symbol and the second column is a numeric value (e.g. differential expression level). No particular ordering is required, as the tool automatically computes the rank associated with each gene symbol. As an example, see this file.
Collection of gene-sets
A gene-sets collection is a file in .gmt format (Gene Matrix Transposed file format) from the Broad Institute. A clear description of the format is available on this page, and an example is available from the Ma'ayan Laboratory. The tool allows multiple .gmt files to be uploaded. The source of each collection is shown in the result table.
Setting the parameters of the analysis
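Before any parameter is set, the two input formats described above have to be read. The following is a minimal Python sketch of how that could be done, not the tool's own code. The function names, the sample gene values and the example URL are made up for illustration. The ranking direction (rank 1 for the smallest value) and the average rank for ties are assumptions, since the text only says that ranks are computed automatically.

```python
import csv
import io


def read_gene_profile(handle):
    """Read a two-column, tab-separated gene-profile into {symbol: value}."""
    profile = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        try:
            profile[row[0]] = float(row[1])
        except ValueError:
            continue  # non-numeric second column, e.g. a header line
    return profile


def rank_genes(profile):
    """Rank 1 goes to the smallest value; tied values share the average rank."""
    ordered = sorted(profile, key=profile.get)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and profile[ordered[j + 1]] == profile[ordered[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for gene in ordered[i:j + 1]:
            ranks[gene] = average
        i = j + 1
    return ranks


def read_gmt(handle):
    """Read a .gmt file: set name, description (or link), then gene symbols."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # a gene-set needs a name, a description and one gene
        name, description, genes = fields[0], fields[1], fields[2:]
        gene_sets[name] = (description, {g for g in genes if g})
    return gene_sets


# Made-up inputs, held in memory instead of files.
profile = read_gene_profile(io.StringIO("TP53\t2.5\nBRCA1\t-1.0\nMYC\t2.5\n"))
ranks = rank_genes(profile)
sets = read_gmt(io.StringIO("SET_A\thttp://example.org/a\tTP53\tMYC\tEGFR\n"))
```

In this sketch the overlap between a gene-set and the profile is simply `sets["SET_A"][1] & profile.keys()`.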
Three parameters can be set to run the analysis: 1) the alternative hypothesis, 2) the level of significance of the test with respect to the p-value or its corrected versions (Bonferroni and BH), and 3) the threshold value of logit2NES. Each parameter has a default value that the user can change before running the analysis.
The alternative hypothesis
The choice of the alternative hypothesis depends on the logic of the gene-profile. The tool offers three options: Greater (upper tail), corresponding to $\mathcal H_1: F_{out}(x) > F_{in}(x)$; Less (lower tail), corresponding to $\mathcal H_1: F_{out}(x) < F_{in}(x)$; and Two sided (upper and lower tails), corresponding to $\mathcal H_1: F_{out}(x) \neq F_{in}(x)$. When the scheme behind the gene-profile is treatment group versus control group, the greater alternative is recommended; when the scheme is treatment group 1 versus treatment group 2, the two sided alternative is more appropriate. The choice of the alternative hypothesis affects the computation of the p-value.
The level of significance of the test
The user can choose any value between 0 and 1 as the level of significance of the enrichment test. This value can be applied to the p-values, to the p-values adjusted according to the Benjamini-Hochberg rule (BH-value), or to the p-values adjusted according to Bonferroni's method (B-value). This choice affects only the tabular results displayed on screen (and the network-map); the downloadable tables always contain one row per gene-set.
$logit2NES$ threshold value
The $logit2NES$ threshold selects only those enrichments having a minimum probability of being associated with the treatment group. $logit2NES$ is defined as
$logit2NES = \log_2\frac{NES}{1-NES}$
which is a non-linear monotone transformation of the NES. The table below shows the correspondence between values of the NES, the odds, and the logit2NES.
The default value is 0.9, which means that we consider only those gene-sets having a probability greater than about 65% of being associated with the treatment group.
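As a quick check of the 65% figure, the definition of $logit2NES$ can be inverted. Here $t$ denotes the chosen threshold, a symbol introduced only for this check. Setting $\log_2\frac{NES}{1-NES} = t$ gives $\frac{NES}{1-NES} = 2^{t}$, hence
$NES = \frac{2^{t}}{1 + 2^{t}}$
For the default $t = 0.9$ this yields $2^{0.9} \approx 1.866$ and therefore $NES \approx \frac{1.866}{2.866} \approx 0.651$, i.e. a probability of roughly 65%.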
When the alternative hypothesis is "two sided" (treatment group no. one versus treatment group no. two), the threshold is applied to
$abs\_logit2NES = |\log_2\frac{NES}{1-NES}|$.
With this version of the index we symmetrically select the gene-sets having the same minimum probability of being associated either with the treatment group no. one (when $logit2NES > 0$) or with the treatment group no. two (when $logit2NES < 0$).
As an example, if we require $abs\_logit2NES > 0.9$, we select the gene-sets with $NES > 0.65$ and those with $NES < 0.35$. The former are the gene-sets associated with the treatment group no. one.
The gene-sets satisfying $NES < 0.35$ are instead those having a probability $1 - NES > 0.65$ of being associated with the treatment group no. two.
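The selection rule just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's implementation: the function names, the group labels and the NES values are made up, and the use of a strict inequality follows the example above.

```python
import math


def logit2nes(nes):
    """log2 of the odds NES / (1 - NES); requires 0 < NES < 1."""
    return math.log2(nes / (1.0 - nes))


def select(nes_by_set, threshold=0.9, two_sided=False):
    """Return {gene-set name: group label} for the sets passing the threshold."""
    selected = {}
    for name, nes in nes_by_set.items():
        score = logit2nes(nes)
        if two_sided:
            # abs_logit2NES: keep both tails and label them by the sign.
            if abs(score) > threshold:
                selected[name] = "group one" if score > 0 else "group two"
        elif score > threshold:
            selected[name] = "treatment"
    return selected


# Made-up NES values, for illustration only.
example = {"SET_A": 0.80, "SET_B": 0.50, "SET_C": 0.20}
two_sided_selection = select(example, two_sided=True)
one_sided_selection = select(example)
```

With these values the two-sided rule keeps SET_A (group one) and SET_C (group two), while the one-sided rule keeps only SET_A.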
NES | odds | logit2NES |
0.2 | 0.25 | -2 |
0.3 | 0.43 | -1.22 |
0.4 | 0.67 | -0.58 |
0.5 | 1.0 | 0.0 |
0.6 | 1.5 | 0.58 |
0.65 | 1.86 | 0.89 |
0.75 | 3.0 | 1.58 |
0.9 | 9.0 | 3.17 |
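The two corrections available for the level of significance, Bonferroni (B-value) and Benjamini-Hochberg (BH-value), have standard textbook definitions. The sketch below implements those definitions in Python as an illustration. It is not taken from the tool, and the function names, the p-values and the chosen level are made up.

```python
def bonferroni(pvalues):
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Visit the p-values from the largest to the smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]  # made-up p-values, one per gene-set
alpha = 0.05                      # made-up level of significance
b_values = bonferroni(pvals)
bh_values = benjamini_hochberg(pvals)
kept_by_p = [i for i, p in enumerate(pvals) if p <= alpha]
kept_by_bh = [i for i, q in enumerate(bh_values) if q <= alpha]
```

In this made-up case three gene-sets pass on the raw p-value but only the first one passes on the BH-value, which shows how the choice of statistic changes the rows displayed.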
Analysis results
When the analysis is completed, the results appear in a table below. Some summary statistics and the chosen parameters are shown at the top of the table. The title of the table can be set before the run using the "caption" box. The table shows the gene-sets whose statistics satisfy the constraints on the level of significance and the threshold value of $logit2NES$. Results can be exported as: 1) a comma-separated value text file, 2) a tab-separated value text file, and 3) an HTML file of the table. Text files 1 and 2 contain the enrichments associated with every gene-set in the collection (no constraint is applied), while the HTML file is the same table shown on screen. The table contains 11 columns for each gene-set (one gene-set per row). The rows can be sorted by any column by clicking on the column name.
column name | description |
gene-set | This is the gene-set name as it appears in the first column of the .gmt file. If the second column of the .gmt file contains a link, the name links to the description of the gene-set, as in the .gmt collections from the Broad Institute. |
collection | This is the name of the file from which the gene-set is taken. |
size | This is the number of genes in the gene-set. |
actualSize | This is the number of genes in the gene-set that are also present in the gene-profile. |
NES | This is the Normalized Enrichment Score, that is $P\left[\mbox{the gene-set is associated with the treatment group}\right]$. |
odds | This is the odds corresponding to the NES, i.e. $odds = \frac{NES}{1-NES}$ $=\frac{P\left[\mbox{the gene-set is associated with the treatment group}\right]}{P\left[\mbox{the gene-set is not associated with the treatment group}\right]}$ |
logit2NES | This is the base-2 $logit$ transformation of the NES, i.e. $logit2NES = \log_2 (odds)$ $= \log_2\frac{NES}{1-NES}$ $=\log_2 \frac{P\left[\mbox{the gene-set is associated with the treatment group}\right]}{P\left[\mbox{the gene-set is not associated with the treatment group}\right]}$ |
p‑value | This is the p-value of the Mann-Whitney test, computed using the normal approximation (Central Limit Theorem). |
BH‑value | This is the p-value adjusted according to the Benjamini-Hochberg procedure. |
B‑value | This is the p-value adjusted according to the Bonferroni method. |
relevance | This is the ordering variable (see the statistical details for more info) |
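
To make the p-value column and the three alternative hypotheses described earlier more concrete, here is a minimal Python sketch of a Mann-Whitney test with the normal approximation. Several points are assumptions rather than facts about the tool: that $F_{in}$ and $F_{out}$ refer to the values of the genes inside and outside the gene-set, that no tie or continuity correction is applied, and that the second value returned (U divided by the number of pairs) is comparable to the NES. The function names and the sample values are made up.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mann_whitney(inside, outside, alternative="greater"):
    """Return (U, U / (n_in * n_out), p-value) for two samples of values."""
    n_in, n_out = len(inside), len(outside)
    # U counts the (inside, outside) pairs in which the inside value is larger;
    # a tied pair counts one half.
    u = 0.0
    for a in inside:
        for b in outside:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    mean = n_in * n_out / 2.0
    sd = math.sqrt(n_in * n_out * (n_in + n_out + 1) / 12.0)  # no tie correction
    z = (u - mean) / sd
    if alternative == "greater":      # upper tail
        p = 1.0 - normal_cdf(z)
    elif alternative == "less":       # lower tail
        p = normal_cdf(z)
    elif alternative == "two.sided":  # both tails
        p = 2.0 * (1.0 - normal_cdf(abs(z)))
    else:
        raise ValueError("unknown alternative: " + alternative)
    return u, u / (n_in * n_out), p


# Made-up values for genes inside and outside a gene-set.
inside = [3.1, 2.7, 1.9]
outside = [0.2, -0.5, 1.0, 2.0, -1.3]
u, share, p_greater = mann_whitney(inside, outside, "greater")
_, _, p_less = mann_whitney(inside, outside, "less")
_, _, p_two = mann_whitney(inside, outside, "two.sided")
```

Here 14 of the 15 pairs have the larger value inside the gene-set, so the upper-tail p-value is small and the lower-tail p-value is close to 1. The quadratic pair loop keeps the sketch short; a rank-based computation would be preferable for full-size gene-profiles.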