The main command smart is used for running experimental tests.
The easiest way to use smart is to run a single search for a custom pattern and a custom text.
For this purpose, use the -simple parameter followed by the pattern and the text.
For instance the command
$ ./smart -simple let sampletext
will search the text sampletext for occurrences of the pattern let.
Observe that the input pattern size is bounded by 100 characters, while the text size is bounded by 1000 characters.
The simple mode does not output any experimental map.
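The size bounds above can be checked before launching a -simple run. The following is a minimal sketch, not part of smart: the helper name simple_mode_args is hypothetical, and treating both bounds as inclusive is an assumption.

```python
MAX_PATTERN_LEN = 100   # bound on the pattern size stated in the text
MAX_TEXT_LEN = 1000     # bound on the text size stated in the text


def simple_mode_args(pattern, text):
    """Build the argument list for a -simple run (hypothetical helper).

    Raises ValueError when a bound is exceeded. Whether the bounds are
    inclusive is an assumption of this sketch.
    """
    if not 0 < len(pattern) <= MAX_PATTERN_LEN:
        raise ValueError("pattern must have between 1 and 100 characters")
    if not 0 < len(text) <= MAX_TEXT_LEN:
        raise ValueError("text must have between 1 and 1000 characters")
    return ["./smart", "-simple", pattern, text]
```

The returned list can be passed to subprocess.run to start the tool.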
Otherwise you can select the corpus used to compute the experimental results with the parameter -text (this parameter is mandatory). For instance the command
$ ./smart -text englishTexts
will compute experimental results on the englishTexts corpus. The directory data/, located in the smart main directory, contains all the corpora that can be selected in smart.
See this page for the list of all corpora included in smart.
Alternatively, use the parameter -text all to run experimental tests on all corpora.
$ ./smart -text all
In this case the corpora are processed one after the other.
You can set an upper bound on the size of the text used for testing the string matching algorithms.
By default this upper bound is set to 1 MB (1048576 bytes). This means that (at most) the first 1048576 bytes
of the selected corpus will be used for testing. You can change the default upper bound by using the
parameter -tsize, followed by an integer value indicating the number of MB to use.
For instance the command
$ ./smart -text englishTexts -tsize 4
will perform the tests on at most 4 MB of the englishTexts corpus.
The text buffer is stored in shared memory, so if you set the upper bound to a value K
you must ascertain that your system allows the allocation of at least K MB of shared memory.
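The arithmetic behind the -tsize bound can be sketched as follows. The 1 MB = 1048576 bytes conversion matches the "Text buffer of dimension 1048576 byte" line printed by the tool; the function names are hypothetical, and the shared memory limit is passed in as a plain number because how to query it depends on the system (on Linux, the System V segment limit is usually readable from /proc/sys/kernel/shmmax, but whether smart is subject to that particular limit is an assumption).

```python
BYTES_PER_MB = 1024 * 1024  # 1048576, as in "Text buffer of dimension 1048576 byte"


def text_buffer_bytes(tsize_mb=1):
    """Size in bytes of the text buffer for a given -tsize value (default 1 MB)."""
    if tsize_mb < 1:
        raise ValueError("-tsize expects a positive integer number of MB")
    return tsize_mb * BYTES_PER_MB


def fits_in_shared_memory(tsize_mb, shm_limit_bytes):
    """True if a buffer of tsize_mb MB fits within the given limit.

    shm_limit_bytes is supplied by the caller; this sketch does not
    query the operating system itself.
    """
    return text_buffer_bytes(tsize_mb) <= shm_limit_bytes
```

For example, text_buffer_bytes(4) gives the byte count that must be allocatable for a run with -tsize 4.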
By default, for each input file smart generates sets of 500 patterns of fixed length, randomly extracted from the text (500 copies of the same pattern in the case of the -simple mode).
The length of the patterns ranges over the values 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048 and 4096. For each set of patterns the tool reports the mean of the running times of the 500 runs. Running times are expressed in milliseconds.
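The procedure described above (extract a set of random patterns of one length, run each search, report the mean time in milliseconds) might look roughly like this. It is an illustrative sketch, not the tool's implementation: the function names are hypothetical, the timer is Python's time.perf_counter, and the search function is supplied by the caller.

```python
import random
import time


def extract_patterns(text, length, count=500, seed=None):
    """Pick `count` substrings of the given length at random positions of text."""
    last_start = len(text) - length
    if last_start < 0:
        raise ValueError("pattern length exceeds text length")
    rng = random.Random(seed)
    starts = [rng.randint(0, last_start) for _ in range(count)]
    return [text[s:s + length] for s in starts]


def mean_search_time_ms(text, patterns, search):
    """Mean running time, in milliseconds, of search(pattern, text) over the set."""
    total = 0.0
    for pattern in patterns:
        start = time.perf_counter()
        search(pattern, text)
        total += time.perf_counter() - start
    return 1000.0 * total / len(patterns)
```

A call such as mean_search_time_ms(text, extract_patterns(text, 8), lambda p, t: t.find(p)) would time Python's built-in search on 500 patterns of length 8.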
Use the parameter -pset to modify the size of the set of patterns generated by the tool. For instance the command
$ ./smart -text genome -pset 100
will run experimental tests on the genome corpus generating sets of 100
patterns of fixed length (the default value is 500).
You can use the parameter -short to perform experimental tests on short patterns. In particular the command
$ ./smart -text genome -pset 100 -short
performs experimental tests by generating sets of 100 patterns of fixed length
ranging over the values 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 and 32.
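The two sets of pattern lengths follow simple rules: the default set doubles from 2 up to 4096, and the -short set steps by 2 from 2 up to 32. A small sketch that reproduces both lists (the constant names are hypothetical):

```python
# Default pattern lengths: powers of two from 2 to 4096.
DEFAULT_LENGTHS = [2 ** k for k in range(1, 13)]

# Lengths used with -short: even values from 2 to 32.
SHORT_LENGTHS = list(range(2, 33, 2))
```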
If you want to restrict the tests to a given range of pattern lengths, use the parameter -plen followed by a lower bound and an upper bound for the length of the pattern.
For instance the command
$ ./smart -text genome -pset 100 -plen 16 128
performs experimental tests over the length values 16, 32, 64, 128.
If you want to test the algorithms for a single length of the pattern (for instance 128) use the command
$ ./smart -text genome -pset 100 -plen 128 128
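The effect of -plen on the default length values can be pictured as a filter that keeps the lengths lying between the two bounds. The sketch below reproduces the two examples above; the function name is hypothetical, and how -plen interacts with -short is not covered here.

```python
def lengths_in_range(lower, upper, lengths=None):
    """Keep the pattern lengths m with lower <= m <= upper.

    By default the powers of two from 2 to 4096 are filtered.
    """
    if lengths is None:
        lengths = [2 ** k for k in range(1, 13)]
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return [m for m in lengths if lower <= m <= upper]
```

With this sketch, lengths_in_range(16, 128) yields 16, 32, 64 and 128, while lengths_in_range(128, 128) yields the single length 128.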
During the execution of the tests the tool prints out the running times of each algorithm (this is the default setting). Use the parameter -occ to also print the number of occurrences found by the algorithm during the runs. Since patterns are randomly extracted from the text, the number of occurrences is at least 1.
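The claim that the count is at least 1 can be checked with a brute-force counter: any pattern cut out of the text matches at least at the position it was taken from. The following sketch is illustrative only and is not the code of any algorithm shipped with smart; overlapping occurrences are counted.

```python
def count_occurrences(pattern, text):
    """Count all (possibly overlapping) occurrences of pattern in text
    by comparing the pattern against every alignment."""
    m, n = len(pattern), len(text)
    count = 0
    for i in range(n - m + 1):
        if text[i:i + m] == pattern:
            count += 1
    return count
```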
Finally, the parameter -h prints a help list.
The smart tool associates with each experimental test a unique alphanumeric code of 13 characters, consisting of EXP followed by 10 digits.
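A script that collects results may want to recognise such codes. A minimal sketch based only on the format described above (the helper name is hypothetical):

```python
import re

# "EXP" followed by exactly 10 decimal digits, 13 characters in total.
EXP_CODE = re.compile(r"EXP[0-9]{10}")


def is_experiment_code(value):
    """True if value has the form of an experiment code."""
    return EXP_CODE.fullmatch(value) is not None
```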
The execution of the following command
$ ./smart -text genome -pset 100 -occ
starts the test on the genome corpus, using sets of 100 patterns.
The first lines of the execution should be the following
Try to process archive genome
Loading the file data/genome/ecoli.txt
Text buffer of dimension 1048576 byte
Starting experimental tests with code EXP1306868286
____________________________________________________________
SMART EXP1306868286
Experimental results on genome
Searching for a set of 100 patterns with length 2
Testing 4 algorithms
- [1/4] BF ..................[OK] 10.34 ms 66513
- [2/4] BNDM ................[OK] 12.68 ms 66513
- [3/4] FS ..................[OK] 9.13 ms 66513
- [4/4] HASH3 ...............[--]
The code associated with the test is EXP1306868286.
At the end of the test experimental results are saved in the directory
results/EXP1306868286
The last lines of the execution should be the following
____________________________________________________________
SMART EXP1306868286
Experimental results on genome
Searching for a set of 100 patterns with length 4096
Testing 4 algorithms
- [1/4] BF ..................[OK] 10.71 ms 1
- [2/4] BNDM ................[OK] 4.26 ms 1
- [3/4] FS ..................[OK] 4.04 ms 1
- [4/4] HASH3 ...............[OK] 3.13 ms 1
____________________________________________________________
OUTPUT RUNNING TIMES EXP1306868286
Saving data on EXP1306868286/genome.txt
Saving data on EXP1306868286/genome.xml
Saving data on EXP1306868286/genome.html
Writing EXP1306868286/index.html
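The per-algorithm lines of the console output have a regular shape and can be parsed by a script. The sketch below is written against the sample lines shown above only: the function name is hypothetical, the trailing integer is assumed to be the occurrence count printed because of -occ, and a [--] line is simply treated as a line that carries no timing.

```python
import re

# Matches lines such as "- [1/4] BF ..................[OK] 10.34 ms 66513"
# and "- [4/4] HASH3 ...............[--]".
RESULT_LINE = re.compile(
    r"^\s*- \[(\d+)/(\d+)\]\s+(\S+)\s+\.+\[(OK|--)\]"
    r"(?:\s+([0-9.]+) ms(?:\s+(\d+))?)?\s*$"
)


def parse_result_line(line):
    """Return a dict describing one algorithm line, or None if it does not match."""
    match = RESULT_LINE.match(line)
    if match is None:
        return None
    index, total, name, status, time_ms, occ = match.groups()
    return {
        "index": int(index),
        "total": int(total),
        "algorithm": name,
        "status": status,
        "time_ms": float(time_ms) if time_ms is not None else None,
        # Assumed to be the occurrence count shown when -occ is given.
        "occurrences": int(occ) if occ is not None else None,
    }
```

Lines that are not algorithm results, such as the separators or the "Saving data on" messages, make the function return None.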