The new SPatt branch 2.x uses DFA (deterministic finite automata) both to count occurrences and to perform statistical computations. The technique relies on PMC (Pattern Markov Chains, see the reference for more details) and makes it possible to handle highly degenerate patterns such as gapped ones (e.g. atgtg.(12-15).tggat) or even Prosite ones.
Right now, this branch is still under heavy development and not all features of the 1.x branch are implemented yet. However, SPatt 2.x is already fully functional for exact computations (including repartition distribution, a new feature) and Gaussian approximations.
Please note that there is only one program, called "spatt", in the 2.x branch. The different statistical methods are now available through command-line options rather than through specific programs (e.g. "spatt --gaussian" rather than "gspatt").
Let us consider the pattern "aba.(0-3)baa" over the binary alphabet {a,b}.
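To make the notation concrete, here is a minimal Python sketch of what an occurrence of this pattern is: the word "aba", then between 0 and 3 arbitrary letters, then the word "baa". The function name and its parameters are hypothetical and are not part of SPatt. The sketch assumes that occurrences are identified by their end position, so that two gap lengths matching at the same end position count only once.

```python
def occurrence_ends(seq, left="aba", right="baa", gap_min=0, gap_max=3):
    """Return the 1-based end positions of occurrences of left.(gap_min-gap_max)right.

    Assumption: an end position is reported once, even if several gap
    lengths match there (overlapping occurrences are all kept).
    """
    ends = []
    for end in range(len(right), len(seq) + 1):
        # The occurrence must finish with the right-hand word.
        if seq[end - len(right):end] != right:
            continue
        # Try every allowed gap length between the two words.
        for gap in range(gap_min, gap_max + 1):
            start = end - len(right) - gap - len(left)
            if start >= 0 and seq[start:start + len(left)] == left:
                ends.append(end)
                break
    return ends


if __name__ == "__main__":
    print(occurrence_ends("ababaabaa"))
```

In "ababaabaa" the occurrence ending at position 9 can be read with a gap of 1 or of 3 letters, but it is listed only once.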
We can build the DFA associated with this pattern and the corresponding PMC (in the M00 model) with the following command:
$ spatt -a ab -p "aba.(0-3)baa" -m -1 --dfa dfa.dot
(note that adding the option "-r" to this command line makes the program study renewal occurrences of the pattern rather than overlapping ones).
We can then visualize the DFA using the dot program from the Graphviz project:
$ dot -Grankdir=LR -Nfontsize=40 -Efontsize=40 -Tps dfa.dot -o dfa.eps
which gives a drawing of the automaton in dfa.eps.

It is then possible to study the distribution of this pattern with several methods:
$ spatt -a ab -p "aba.(0-3)baa" -m -1 -S ab1000.fasta
distribution:
P(N=0)=1.451793e-24
P(N=1)=8.311068e-23
[output truncated]
P(N=39)=8.389673e-03
P(N>=40)=9.733052e-01
pattern=aba.(0-3)baa Nobs=39 P(N<=Nobs)=2.669477e-02
gives the exact distribution of the pattern.
$ spatt -a ab -p "aba.(0-3)baa" -m -1 -S ab1000.fasta --repartition
16 6.137695e-01 1
51 1.918259e-01 1
66 3.837891e-01 1
101 1.918259e-01 1
116 3.837891e-01 1
[output truncated]
951 1.918259e-01 1
966 3.837891e-01 1
1000 1.918259e-01 0
gives the occurrence positions and associates a waiting time p-value with each observation.
$ spatt -a ab -p "aba.(0-3)baa" -m -1 -S ab1000.fasta --gaussian
pattern=aba.(0-3)baa Nobs=39 mean=52.402344 sd=6.872593 z-score=-1.950115 P(N<=Nobs)=2.558123e-02
performs a Gaussian approximation.
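The idea behind these computations can be sketched in a few lines of Python: run an automaton over the random sequence and propagate the joint probability of (automaton state, number of occurrences so far). This is only an illustration under stated assumptions, not SPatt's implementation. It assumes that the M00 model means independent, uniformly distributed letters, that the sequence has 1000 letters, and that occurrences are counted by end position. The automaton used here simply remembers the last eight letters, which is far from the minimal DFA that SPatt builds, and all function names are hypothetical.

```python
from collections import defaultdict

LEFT, RIGHT, GAP_MIN, GAP_MAX = "aba", "baa", 0, 3
SPAN = len(LEFT) + GAP_MAX + len(RIGHT)  # longest possible occurrence: 9 letters


def ends_occurrence(window):
    """True if an occurrence of aba.(0-3)baa ends at the last letter of window."""
    if not window.endswith(RIGHT):
        return False
    body = window[:-len(RIGHT)]
    return any(body.endswith(LEFT, 0, len(body) - gap)
               for gap in range(GAP_MIN, GAP_MAX + 1) if gap <= len(body))


def count_distribution(n, alphabet="ab"):
    """Exact law of the number of occurrences in n i.i.d. uniform letters.

    Practical for short sequences only: the naive state space is large.
    """
    p = 1.0 / len(alphabet)
    # dist[(state, k)] = P(last letters are `state` and k occurrences so far)
    dist = {("", 0): 1.0}
    for _ in range(n):
        nxt = defaultdict(float)
        for (state, k), prob in dist.items():
            for letter in alphabet:
                window = state + letter
                k_new = k + ends_occurrence(window)
                # Keep only the letters that can still matter later on.
                nxt[(window[-(SPAN - 1):], k_new)] += prob * p
        dist = nxt
    law = defaultdict(float)
    for (_, k), prob in dist.items():
        law[k] += prob
    return dict(law)


def expected_count(n, alphabet="ab"):
    """Expected number of occurrences, summing P(occurrence ends at i) over i."""
    p = 1.0 / len(alphabet)
    states = {"": 1.0}
    mean = 0.0
    for _ in range(n):
        nxt = defaultdict(float)
        for state, prob in states.items():
            for letter in alphabet:
                window = state + letter
                if ends_occurrence(window):
                    mean += prob * p
                nxt[window[-(SPAN - 1):]] += prob * p
        states = nxt
    return mean


if __name__ == "__main__":
    print(expected_count(1000))
```

Under these assumptions expected_count(1000) evaluates to 52.40234375, which agrees with the mean=52.402344 reported by the Gaussian run above. The same propagation with count_distribution gives the full exact law, although this naive version is only practical for short sequences.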
Please note that it is possible to use a Markov model of order m>=0 but, unlike in SPatt branch 1.x, the parameters must be provided through the "-M" option. If you want to use parameters estimated over a sequence, the simplest way to do this is to use SPatt branch 1.x to perform the estimation.
Here is an example: DNA pattern "g.tggtgg.(0-20)g.tggtgg" on the complete genome of Escherichia coli K12
$ sspatt U00096.fna -m 3 -M tmp.markov
$ spatt -a acgt -p "g.tggtgg.(0-20)g.tggtgg" -m 3 -M tmp.markov -S U00096.fna --gaussian --over
pattern=g.tggtgg.(0-20)g.tggtgg Nobs=14 mean=2.173635 sd=1.510378 z-score=7.830070 P(N>=Nobs)=2.437999e-15
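The z-score and p-value printed by the Gaussian runs can be checked by hand from Nobs, mean and sd. The following Python sketch does so with the standard normal distribution. The function name and the "over" argument are hypothetical (the latter only mirrors the idea of the "--over" option), and this is not SPatt's own code.

```python
from math import erfc, sqrt


def gaussian_pvalue(n_obs, mean, sd, over=False):
    """Return (z, p) with p = P(N >= n_obs) if over, else P(N <= n_obs)."""
    z = (n_obs - mean) / sd
    # erfc is used for both tails: it stays accurate for very small
    # p-values, where computing 1 - cdf would lose most significant digits.
    p = 0.5 * erfc(z / sqrt(2)) if over else 0.5 * erfc(-z / sqrt(2))
    return z, p


if __name__ == "__main__":
    # Values copied from the two Gaussian outputs shown in this document.
    print(gaussian_pvalue(39, 52.402344, 6.872593))
    print(gaussian_pvalue(14, 2.173635, 1.510378, over=True))
```

Both calls should agree with the z-score and p-value of the corresponding output line, up to the rounding of the displayed mean and sd.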