How To

How to run experimental tests

The main command, smart, is used to run experimental tests.
The easiest way to use smart is to run a single search for a custom pattern in a custom text. For this purpose, use the -simple parameter followed by the pattern and the text. For instance, the command

$ ./smart -simple let sampletext

will search the text sampletext for occurrences of the pattern let. Note that the pattern length is limited to 100 characters and the text length to 1000 characters.
The simple mode does not output any experimental map.
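As a rough illustration of what a single search in -simple mode computes, here is a brute-force sketch in Python (the naive approach behind the BF algorithm that smart reports). The function simple_search and the way it enforces the 100/1000 character bounds are hypothetical; this is not smart's own implementation.

```python
# Hypothetical helper: brute-force count of pattern occurrences,
# mirroring what "./smart -simple let sampletext" searches for.
MAX_PATTERN_LEN = 100   # pattern bound stated for -simple mode
MAX_TEXT_LEN = 1000     # text bound stated for -simple mode


def simple_search(pattern: str, text: str) -> int:
    """Return the number of (possibly overlapping) occurrences of pattern in text."""
    if len(pattern) > MAX_PATTERN_LEN or len(text) > MAX_TEXT_LEN:
        raise ValueError("pattern or text exceeds the -simple mode bounds")
    m, n = len(pattern), len(text)
    count = 0
    # Try every alignment of the pattern against the text.
    for i in range(n - m + 1):
        if text[i:i + m] == pattern:
            count += 1
    return count


print(simple_search("let", "sampletext"))  # 1
```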
In the standard (non-simple) mode, you select the corpus used to compute the experimental results with the parameter -text (this parameter is mandatory in this mode). For instance, the command

$ ./smart -text englishTexts

will compute experimental results on the englishTexts corpus. The directory data/, located in the smart main directory, contains all the corpora that can be selected in smart. See this page for the list of all corpora included in smart. Alternatively, use the parameter -text all to run experimental tests on all corpora.

$ ./smart -text all

In this case, the corpora are processed one after the other.
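If you script around smart, it can help to know which corpus names are valid values for -text. The sketch below assumes, based on the path data/genome/ecoli.txt in the sample output further down, that each corpus is a subdirectory of data/; the helper names are hypothetical, and the alphabetical order used for "all" is our choice, not necessarily the order smart uses.

```python
import os
from typing import List


def list_corpora(data_dir: str = "data") -> List[str]:
    """Return the corpus names found in data_dir, sorted alphabetically.

    Assumption: each corpus is a subdirectory of data/ (e.g. data/genome/).
    """
    return sorted(
        entry for entry in os.listdir(data_dir)
        if os.path.isdir(os.path.join(data_dir, entry))
    )


def corpora_for(text_arg: str, data_dir: str = "data") -> List[str]:
    """Expand a -text argument: "all" means every corpus, processed one after the other."""
    if text_arg == "all":
        return list_corpora(data_dir)
    return [text_arg]
```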
You can set an upper bound on the size of the text used to test the string matching algorithms. By default, this bound is 1 MB (1,048,576 bytes), which means that at most the first 1,048,576 bytes of the selected corpus are used for testing. You can change the default bound with the parameter -tsize, followed by an integer giving the size in MB. For instance, the command

$ ./smart -text englishTexts -tsize 4

will perform the tests on at most 4 MB of the englishTexts corpus.
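The arithmetic behind -tsize is simple: K MB means K × 1,048,576 bytes, which matches the "Text buffer of dimension 1048576 byte" line printed with the default setting. The Python sketch below reads at most that many bytes from a single corpus file; load_text_buffer is a hypothetical name, and how smart combines several files of one corpus is not covered here.

```python
MB = 1024 * 1024  # 1 MB = 1,048,576 bytes


def load_text_buffer(path: str, tsize_mb: int = 1) -> bytes:
    """Read at most the first tsize_mb megabytes of a corpus file."""
    with open(path, "rb") as f:
        # read(n) stops early if the file is shorter than n bytes.
        return f.read(tsize_mb * MB)
```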
The text buffer is stored in shared memory, so if you set the upper bound to K MB you must make sure that your system allows the allocation of at least K MB of shared memory.
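On Linux, assuming smart uses System V shared memory (an assumption, not stated on this page), the per-segment limit is exposed in /proc/sys/kernel/shmmax; the total limit kernel.shmall can also matter. The sketch below reads shmmax and checks whether a given -tsize would fit. The helper names are hypothetical, and on systems without that file read_shmmax returns None.

```python
from typing import Optional

SHMMAX_PATH = "/proc/sys/kernel/shmmax"  # Linux only
MB = 1024 * 1024


def fits_in_shared_memory(tsize_mb: int, shmmax_bytes: int) -> bool:
    """True if a text buffer of tsize_mb megabytes fits in one segment of shmmax_bytes."""
    return tsize_mb * MB <= shmmax_bytes


def read_shmmax(path: str = SHMMAX_PATH) -> Optional[int]:
    """Return the per-segment shared-memory limit in bytes, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    limit = read_shmmax()
    if limit is None:
        print("shmmax not readable here; check your system's shared-memory settings")
    else:
        print("-tsize 4 fits:", fits_in_shared_memory(4, limit))
```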
By default, for each input file, smart generates sets of 500 patterns of fixed length, randomly extracted from the text (500 copies of the same pattern in -simple mode). The pattern length ranges over the values 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048 and 4096. For each set of patterns, the tool reports the mean running time over the 500 runs. Running times are expressed in milliseconds.
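The measurement scheme described above (patterns extracted at random positions of the text, one timed run per pattern, mean reported in milliseconds) can be sketched in Python as follows. extract_patterns and mean_search_ms are hypothetical names, time.perf_counter is our choice of clock, and bytes.count merely stands in for a real matching algorithm; smart's own timing code may differ.

```python
import random
import time
from typing import Callable, List

# Default pattern lengths: 2, 4, 8, ..., 4096 (12 values).
DEFAULT_LENGTHS = [2 ** k for k in range(1, 13)]


def extract_patterns(text: bytes, m: int, pset: int = 500, seed: int = 0) -> List[bytes]:
    """Return pset substrings of length m taken at random positions of text.

    Requires len(text) >= m. Every pattern occurs in text at least once.
    """
    rng = random.Random(seed)
    starts = [rng.randrange(len(text) - m + 1) for _ in range(pset)]
    return [text[s:s + m] for s in starts]


def mean_search_ms(search: Callable[[bytes, bytes], int], text: bytes,
                   patterns: List[bytes]) -> float:
    """Run search(pattern, text) once per pattern and return the mean time in ms."""
    total = 0.0
    for p in patterns:
        start = time.perf_counter()
        search(p, text)
        total += time.perf_counter() - start
    return 1000.0 * total / len(patterns)


if __name__ == "__main__":
    text = b"ACGT" * 10_000
    for m in DEFAULT_LENGTHS[:4]:
        pats = extract_patterns(text, m, pset=100)
        # bytes.count stands in for a real string matching algorithm.
        print(m, f"{mean_search_ms(lambda p, t: t.count(p), text, pats):.3f} ms")
```

Because every pattern is cut out of the text itself, each search is guaranteed to find at least one occurrence.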
Use the parameter -pset to change the size of the set of patterns generated by the tool. For instance, the command

$ ./smart -text genome -pset 100

will run experimental tests on the genome corpus generating sets of 100 patterns of fixed length (the default value is 500).
You can use the parameter -short to perform experimental tests on short patterns. In particular, the command

$ ./smart -text genome -pset 100 -short

performs experimental tests by generating sets of 100 patterns of fixed length ranging over the values 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 and 32.
If you want to restrict the tests to a given range of pattern lengths, use the parameter -plen followed by a lower bound and an upper bound on the pattern length. For instance, the command

$ ./smart -text genome -pset 100 -plen 16 128

performs experimental tests over the length values 16, 32, 64 and 128. If you want to test the algorithms on a single pattern length (for instance 128), use the command

$ ./smart -text genome -pset 100 -plen 128 128
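The effect of -short and -plen on the set of tested lengths can be modelled with a small Python function. pattern_lengths is hypothetical, and it assumes that -plen simply keeps the lengths of the active list (default or -short) that fall between the two bounds, which reproduces the two examples above; how smart handles -plen combined with -short, or bounds that are not in the list, is not covered by this page.

```python
from typing import List, Optional, Tuple

DEFAULT_LENGTHS = [2 ** k for k in range(1, 13)]  # 2, 4, 8, ..., 4096
SHORT_LENGTHS = list(range(2, 33, 2))             # 2, 4, 6, ..., 32 (-short)


def pattern_lengths(short: bool = False,
                    plen: Optional[Tuple[int, int]] = None) -> List[int]:
    """Pattern lengths tested for a combination of -short and -plen LOW HIGH.

    Assumption: -plen keeps only the lengths of the active list lying in [LOW, HIGH].
    """
    lengths = list(SHORT_LENGTHS if short else DEFAULT_LENGTHS)
    if plen is not None:
        low, high = plen
        lengths = [m for m in lengths if low <= m <= high]
    return lengths


print(pattern_lengths(plen=(16, 128)))   # [16, 32, 64, 128]
print(pattern_lengths(plen=(128, 128)))  # [128]
```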

During the execution of the tests, the tool prints the running time of each algorithm (this is the default setting). Use the parameter -occ to also print the number of occurrences found by each algorithm during the runs. Since patterns are randomly extracted from the text, the number of occurrences is at least 1.
Finally, the parameter -h prints a help message.
The smart tool associates each experimental test with a unique 13-character alphanumeric code, consisting of the prefix EXP followed by 10 digits. The execution of the following command

$ ./smart -text genome -pset 100 -occ

starts the test on the genome corpus, using sets of 100 patterns. The first lines of output should look like the following:

Try to process archive genome
Loading the file data/genome/ecoli.txt
Text buffer of dimension 1048576 byte
Starting experimental tests with code EXP1306868286

SMART EXP1306868286
Experimental results on genome
Searching for a set of 100 patterns with length 2
Testing 4 algorithms
- [1/4] BF ..................[OK] 10.34 ms 66513
- [2/4] BNDM ................[OK] 12.68 ms 66513
- [3/4] FS ..................[OK] 9.13 ms 66513
- [4/4] HASH3 ...............[--]

The code associated with the test is EXP1306868286.
At the end of the test, the experimental results are saved in the directory results/EXP1306868286.
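When scripting around smart (for example, to collect result directories after several runs), the code format described above can be checked with a regular expression. The helper names are hypothetical; the only facts used are the EXP prefix, the 10 digits and the results/ directory.

```python
import os
import re

# "EXP" followed by exactly 10 decimal digits: 13 characters in total.
EXP_CODE = re.compile(r"EXP[0-9]{10}")


def is_experiment_code(code: str) -> bool:
    """True if code has the format of a smart experiment code."""
    return EXP_CODE.fullmatch(code) is not None


def results_dir(code: str, root: str = "results") -> str:
    """Directory where the results of experiment `code` are saved."""
    if not is_experiment_code(code):
        raise ValueError(f"not an experiment code: {code!r}")
    return os.path.join(root, code)
```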
The last lines of output should look like the following:

SMART EXP1306868286
Experimental results on genome
Searching for a set of 100 patterns with length 4096
Testing 4 algorithms

- [1/4] BF ..................[OK] 10.71 ms 1
- [2/4] BNDM ................[OK] 4.26 ms 1
- [3/4] FS ..................[OK] 4.04 ms 1
- [4/4] HASH3 ...............[OK] 3.13 ms 1

Saving data on EXP1306868286/genome.txt
Saving data on EXP1306868286/genome.xml
Saving data on EXP1306868286/genome.html
Writing EXP1306868286/index.html
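If you want to post-process the console output directly (instead of the files saved under results/), a line-by-line parser is enough. The regular expression below is inferred only from the sample lines shown on this page, so treat it as an assumption: lines printed without -occ end after the time, and a status such as [--] (as for HASH3 with length 2) carries no time. parse_result_line is a hypothetical name.

```python
import re
from typing import Optional

# Format inferred from the sample output, e.g.
#   "- [1/4] BF ..................[OK] 10.34 ms 66513"
#   "- [4/4] HASH3 ...............[--]"
RESULT_LINE = re.compile(
    r"^- \[(?P<idx>\d+)/(?P<total>\d+)\] (?P<algo>\S+) \.*\[(?P<status>[^\]]*)\]"
    r"(?: (?P<ms>\d+(?:\.\d+)?) ms(?: (?P<occ>\d+))?)?\s*$"
)


def parse_result_line(line: str) -> Optional[dict]:
    """Parse one algorithm result line; return None for any other line."""
    m = RESULT_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "algorithm": m["algo"],
        "status": m["status"],
        # The time is absent when no timing is reported (e.g. status "--").
        "ms": float(m["ms"]) if m["ms"] else None,
        # The occurrence count is printed only with -occ.
        "occurrences": int(m["occ"]) if m["occ"] else None,
    }
```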