Welcome to the pLocalScore homepage!

Last update: June 20, 2006

What is it?

Given a sequence X=X1,X2,...,Xn over an alphabet and a scoring function S() defined on the same alphabet, the local score H of this sequence is the maximum total score over all segments (contiguous subsequences) of X.

For example, for the sequence X=ababbbabbbbaabaaaba with the score function S(a)=-2, S(b)=+1, the best segment is 4-bbbabbbb-11 (positions 4 to 11: seven b and one a, so 7-2) and H=+5.
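
The example above can be checked with a short script. This is a minimal sketch of the classic linear-time maximum-segment scan, not the pLocalScore source code; the function name `local_score` and the 1-based, inclusive positions are assumptions chosen to match the example.

```python
def local_score(sequence, score):
    """Return (H, begin, end) for the best-scoring segment (1-based, inclusive).

    The empty segment scores 0, so H is never negative; if no segment has a
    positive score, begin and end are returned as None.
    """
    best, best_begin, best_end = 0, None, None
    running, begin = 0, 1  # score and start of the best segment ending here
    for position, symbol in enumerate(sequence, start=1):
        running += score[symbol]
        if running <= 0:
            # A non-positive prefix can never help: restart after this position.
            running, begin = 0, position + 1
        elif running > best:
            best, best_begin, best_end = running, begin, position
    return best, best_begin, best_end


if __name__ == "__main__":
    X = "ababbbabbbbaabaaaba"
    S = {"a": -2, "b": +1}
    H, begin, end = local_score(X, S)
    print(H, begin, X[begin - 1:end], end)  # 5 4 bbbabbbb 11
```

The scan never needs a window size: the segment boundaries come out of the running score itself.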

The local score can be used for many purposes: searching for hydrophobic domains in proteins, searching for DNA regions with high G-C content and, more generally, any kind of "sliding window" problem, with the advantage that no window size has to be specified.

For any of these uses, the problem of assigning a p-value to an observation (assuming the sequence is random according to an independent or Markov model) quickly arises. Despite the simplicity of the local score, its distribution is quite hard to compute.

The purpose of pLocalScore is to provide efficient, state-of-the-art p-values for the local score problem using the widest range of statistical approaches available.
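
To make the notion of a p-value concrete, here is a naive Monte Carlo estimate of P(H >= h) under an independent model. Plain simulation is not one of the methods pLocalScore implements; it is shown only as a baseline sketch. The function names and the letter frequencies (0.5 each) are assumptions made for illustration.

```python
import random


def local_score_value(sequence, score):
    """Local score H of a sequence (maximum segment score, at least 0)."""
    best = running = 0
    for symbol in sequence:
        running = max(0, running + score[symbol])
        best = max(best, running)
    return best


def monte_carlo_p_value(h, n, probs, score, trials=10_000, seed=0):
    """Estimate P(H_n >= h) for i.i.d. sequences of length n by simulation."""
    rng = random.Random(seed)  # fixed seed: the estimate is reproducible
    symbols = list(probs)
    weights = [probs[s] for s in symbols]
    hits = 0
    for _ in range(trials):
        sample = rng.choices(symbols, weights=weights, k=n)
        if local_score_value(sample, score) >= h:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    S = {"a": -2, "b": +1}
    probs = {"a": 0.5, "b": 0.5}  # assumed letter frequencies
    print(monte_carlo_p_value(h=5, n=19, probs=probs, score=S))
```

Simulation cannot resolve p-values much smaller than 1/trials, which is one reason approximate or exact methods are preferable for highly significant observations.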

List of features

List of current features:

  • compute the local score for any custom score function
  • compute approximate p-values with Karlin's approximation (independent sequences only)
  • compute exact p-values using Finite Markov Chain Imbedding (FMCI) (independent sequences only)
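
The idea behind an exact computation in the independent case can be sketched with a small absorbing Markov chain. This is an illustration under stated assumptions (integer scores, i.i.d. letters), not the FMCI implementation shipped in pLocalScore; the function name `exact_p_value` is hypothetical.

```python
def exact_p_value(h, n, probs, score):
    """Return P(H_n >= h) for an i.i.d. sequence of length n with integer scores.

    The running score W_k = max(0, W_{k-1} + S(X_k)) is a Markov chain on
    {0, ..., h}. State h is made absorbing, so the probability of being in
    state h after n steps is exactly P(H_n >= h).
    """
    if h <= 0:
        return 1.0  # the empty segment already scores 0
    dist = [0.0] * (h + 1)  # dist[w] = P(chain is in state w)
    dist[0] = 1.0
    for _ in range(n):
        new = [0.0] * (h + 1)
        new[h] = dist[h]  # absorbed mass stays absorbed
        for w in range(h):
            if dist[w] == 0.0:
                continue
            for symbol, p in probs.items():
                target = min(h, max(0, w + score[symbol]))
                new[target] += dist[w] * p
        dist = new
    return dist[h]


if __name__ == "__main__":
    S = {"a": -2, "b": +1}
    probs = {"a": 0.5, "b": 0.5}  # assumed letter frequencies
    print(exact_p_value(h=5, n=19, probs=probs, score=S))
```

The cost grows as n times h times the alphabet size, so this direct form is only practical for moderate integer scores.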

List of upcoming features:

  • add Markov support for the exact case
  • add refinements to Karlin's approximation
  • add Markov support for Karlin's approximation
  • add finite-size correction for Karlin's approximation

How to get it?

Source distribution: latest

Building the program:

tar zxvf plocalscore-xxx.tar.gz
cd plocalscore-xxx
make
make check
make install

A simple example

Here is a simple command-line example:

$ plocalscore sample.fasta -S KD.score -m 0
1:104K_THEPA_(P15711)   924     905     924     49.00   0.762365
2:108_LYCES_(Q43495)_   102     14      101     62.20   2.746254
3:10KD_VIGUN_(P18646)   75      5       24      47.80   2.125093
4:110KD_PLAKN_(P13813   296     99      122     13.50   0.001387
5:11S3_HELAN_(P19084)   493     4       20      36.70   0.480048
6:11SB_CUCMA_(P13744)   480     5       23      34.70   0.407269
7:128UP_DROME_(P32234   368     66      152     43.60   0.915202
8:12AH_CLOS4_(P21215)   29      0       28      22.90   1.126437
9:12KD_FRAAN_(Q05349)   111     0       4       13.70   0.062176
10:12KD_MYCSM_(P80438)  23      0       16      24.00   1.407677
11:12S1_ARATH_(P15455)  472     3       18      36.50   0.487989
12:12S2_ARATH_(P15456)  455     3       18      34.60   0.422355
[...]

The first column gives the sequence id (prefixed with its number), the second the sequence length, the third the begin position of the best segment, the fourth its end position, the fifth the local score, and the last -log10(p-value).
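
For downstream processing, the columns described above can be read with a few lines of Python. This is a sketch that assumes whitespace-separated columns as in the listing above; the names `Hit` and `parse_line` are hypothetical and not part of pLocalScore.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    seq_id: str
    length: int
    begin: int
    end: int
    score: float
    neg_log10_p: float

    @property
    def p_value(self) -> float:
        """Convert the reported -log10(p-value) back to a p-value."""
        return 10.0 ** (-self.neg_log10_p)


def parse_line(line: str) -> Hit:
    # Split from the right so that only the five numeric columns are split off.
    seq_id, length, begin, end, score, neg_log10_p = line.strip().rsplit(None, 5)
    return Hit(seq_id, int(length), int(begin), int(end),
               float(score), float(neg_log10_p))


if __name__ == "__main__":
    hit = parse_line("2:108_LYCES_(Q43495)_   102     14      101     62.20   2.746254")
    print(hit.seq_id, hit.score, hit.p_value)
```

Larger values in the last column mean smaller p-values: 2.746254 corresponds to a p-value of about 0.0018, while 0.001387 corresponds to a p-value close to 1.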

References

  • G. Nuel (2006) Exact distribution of local score using Finite Markov Chain Imbedding: an effective approach. ICAM 2006, Santiago, Chile.
  • G. Nuel (2006) Effective p-value computations using Finite Markov Chain Imbedding (FMCI): application to local score and to pattern statistics. Algo. Mol. Biol. 1(5).