Frequently Asked Questions

What does SPatt stand for?
Why should I use SPatt?
What is a p-value?
What is a Markov model?
How do I choose the Markov model order?
What are large deviations?
Where can I find simple examples of SPatt command lines?

What does SPatt stand for?

Statistics for Patterns.

Why should I use SPatt?

If you want to study pattern occurrences in a text (a DNA sequence, for example), SPatt can help you compute the p-value of a given observation with respect to a Markov model.
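To make the idea concrete, here is a brute-force sketch in Python. It is only an illustration and is not meant to reflect how SPatt computes p-values: it estimates the probability of seeing at least the observed number of (possibly overlapping) occurrences of a pattern by simulating many random sequences from an order 0 Markov model (independent letters). The function names, the letter probabilities and the simulation approach are all assumptions made for this example.

```python
import random


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, overlaps included."""
    m = len(pattern)
    return sum(1 for i in range(len(text) - m + 1) if text[i:i + m] == pattern)


def monte_carlo_pvalue(observed: int, pattern: str, length: int,
                       letter_probs: dict, trials: int = 2000,
                       seed: int = 0) -> float:
    """Estimate P(N >= observed), where N counts pattern occurrences in a
    random text of the given length drawn from an order 0 Markov model
    (independent letters with probabilities letter_probs)."""
    rng = random.Random(seed)
    letters = list(letter_probs)
    weights = [letter_probs[a] for a in letters]
    hits = 0
    for _ in range(trials):
        text = "".join(rng.choices(letters, weights=weights, k=length))
        if count_occurrences(text, pattern) >= observed:
            hits += 1
    # Add-one correction: never report a p-value of exactly 0.
    return (hits + 1) / (trials + 1)
```

For instance, `monte_carlo_pvalue(5, "GATC", 1000, {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25})` estimates how surprising five occurrences of GATC would be in a uniform random DNA sequence of length 1000. Simulation cannot resolve p-values much smaller than 1/trials, which is one reason dedicated methods are needed for very rare events.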

What is a p-value?

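The p-value of an observation is the probability, under the reference model, of obtaining a result at least as extreme as the one actually observed. The lower the p-value of a given observation, the less likely this observation is under the reference model. Observations with a low p-value therefore behave unexpectedly (with respect to the chosen model) and may be interesting to study.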
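As a minimal sketch (hypothetical function name, deliberately simple model), suppose each position of a sequence of length n is the letter G with probability p, independently of the others. The number X of Gs is then Binomial(n, p), and for an over-representation test the p-value of observing x Gs is P(X >= x):

```python
from math import comb


def upper_tail_pvalue(observed: int, n: int, p: float) -> float:
    """P(X >= observed) for X ~ Binomial(n, p): the probability, under the
    model, of a count at least as large as the observed one."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(observed, n + 1))
```

Comparing `upper_tail_pvalue(25, 100, 0.25)` with `upper_tail_pvalue(40, 100, 0.25)` shows the second observation, far above the expected count of 25, receiving a much smaller p-value.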

What is a Markov model?

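An order m Markov model assumes that the letter at a given position i of a sequence depends only on the m letters immediately preceding it (positions i-m, ..., i-1), and not on anything older. If k is the size of the alphabet, such a model has k^{m+1} transition probabilities: one for each of the k^m possible contexts of length m and each of the k possible next letters. Since the probabilities attached to each context must sum to 1, only k^m(k-1) of them are free parameters.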
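The following Python sketch (hypothetical function names) shows what the Markov assumption means in practice: when scoring position i, only the context `seq[i-m:i]` is looked up. It computes the log-probability of a sequence given its first m letters, together with the parameter counts of an order m model. Passing the transition probabilities as a plain dictionary keyed by (context, letter) is an assumption of this example.

```python
from math import log


def markov_log_likelihood(seq: str, m: int,
                          trans: dict[tuple[str, str], float]) -> float:
    """Log-probability of seq[m:] given seq[:m] under an order m model.

    trans maps (context, letter) to P(letter | context); the context is
    the m letters just before the position (empty string when m == 0).
    """
    total = 0.0
    for i in range(m, len(seq)):
        context = seq[i - m:i]  # only positions i-m, ..., i-1 matter
        total += log(trans[(context, seq[i])])
    return total


def parameter_counts(k: int, m: int) -> tuple[int, int]:
    """(transition probabilities, free parameters) for k letters, order m."""
    return k ** (m + 1), k ** m * (k - 1)
```

For DNA (k = 4), `parameter_counts(4, 1)` gives 16 transition probabilities, 12 of them free.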

How do I choose the Markov model order?

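If the user does not provide the parameters of the Markov model, they are estimated by maximum likelihood on the given sequence. The estimated probability of letter a following the context w is then the number of occurrences of the word wa divided by the total number of occurrences of (m+1)-letter words starting with w, so a Markov model of order m reflects the frequencies of words of length m+1 in the sequence. The choice of order therefore depends on what the user wants the model to take into account: an order 0 Markov model takes into account the letter composition of the sequence, an order 1 Markov model its composition in words of length 2, and so on. To avoid parameter estimation problems, high-order Markov models (say, of order greater than 5 or 6) should not be used: on the DNA alphabet (k = 4), an order 6 model already has 4^7 = 16384 transition probabilities, and each of its 4^6 = 4096 contexts must occur often enough in the sequence to be estimated reliably.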
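Maximum likelihood estimation of an order m Markov model boils down to counting words. Here is a minimal Python sketch (hypothetical function name, not SPatt's own code) that returns a dictionary mapping (context, letter) pairs to estimated probabilities:

```python
from collections import Counter


def estimate_markov(seq: str, m: int) -> dict[tuple[str, str], float]:
    """Maximum likelihood estimate P(a | w) = N(wa) / sum_b N(wb),
    where N counts (possibly overlapping) words of length m+1 in seq."""
    words = Counter(seq[i:i + m + 1] for i in range(len(seq) - m))
    context_totals: Counter = Counter()
    for word, n in words.items():
        context_totals[word[:m]] += n
    # (m+1)-letter words never seen in seq get no entry (probability 0).
    return {(word[:m], word[m]): n / context_totals[word[:m]]
            for word, n in words.items()}
```

On `"ACGTACGT"` with m = 1, every A is followed by C, so the estimate of P(C | A) is 1. Words that never occur simply get no entry, i.e. an estimated probability of 0, which illustrates why high orders need long sequences.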

What are large deviations?

The central limit theorem shows that, under mild conditions, the sum of n random variables (for instance, the number of occurrences of a pattern in a sequence of length n) typically deviates from its expectation by an amount of the order of the square root of n. When much larger deviations are observed, the Gaussian approximation it provides is no longer reliable, and dedicated tools should be preferred. Large deviations theory provides such tools for deviations of order n. When dealing with events with very low p-values (10^-20, 10^-200, 10^-2000, ...), large deviation approximations are known to be far more reliable than other approximation methods.
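The gap between the two regimes can be seen on a toy case where everything is computable: the count X of a given letter in n positions of an order 0 model is Binomial(n, p). The Python sketch below (hypothetical function names) compares the exact log10 p-value P(X >= x) with the central limit (Gaussian) approximation and with the simplest large-deviation estimate, the Chernoff bound exp(-n I(a)) with a = x/n and I the binomial rate function. The Chernoff bound is used here only as an illustration; it is not necessarily the approximation SPatt uses.

```python
from math import erfc, exp, lgamma, log, sqrt


def log10_binomial_tail(x: int, n: int, p: float) -> float:
    """Exact log10 P(X >= x), X ~ Binomial(n, p), via log-sum-exp."""
    logs = [lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
            + j * log(p) + (n - j) * log(1 - p) for j in range(x, n + 1)]
    top = max(logs)
    return (top + log(sum(exp(v - top) for v in logs))) / log(10)


def log10_gaussian_tail(x: int, n: int, p: float) -> float:
    """Central limit approximation of log10 P(X >= x).
    Fails (log of 0) once erfc underflows for very large deviations."""
    z = (x - n * p) / sqrt(n * p * (1 - p))
    return log(0.5 * erfc(z / sqrt(2))) / log(10)


def log10_chernoff_bound(x: int, n: int, p: float) -> float:
    """Large-deviation (Chernoff) bound log10 exp(-n I(a)), a = x / n.
    Valid for p < a < 1; it is an upper bound on the exact tail."""
    a = x / n
    rate = a * log(a / p) + (1 - a) * log((1 - a) / (1 - p))
    return -n * rate / log(10)
```

With n = 1000, p = 0.25 and x = 400, the Chernoff estimate lands closer to the exact value than the Gaussian one, which is off by more than two orders of magnitude. For much smaller p-values, `erfc` underflows to 0 in double precision and the Gaussian version fails outright, while the large-deviation exponent remains easy to compute.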

Where can I find simple examples of SPatt command lines?

Take a look here