After processing with PrePer, you can segment and tokenize a Persian document using SeTPer.
SeTPer uses the Uplug framework.
So, to use SeTPer, we have to know how to use Uplug.
(It took me a few hours to understand how to use Uplug.)
You can download Uplug from SourceForge, at the site below.
At the time of writing, the latest version is 0.2.0d.
I use this version.
After downloading, unpack the tar archive with the following command.
$ tar -vxf uplug-0.2.0d.tar.gz
Change into the extracted directory.
$ cd uplug
Uplug doesn't need configure, make, or make install.
In other words, Uplug is a standalone program.
Details of the program and its usage are written in
The main program is "uplug".
You call this program first and give it the module you want to run.
So the main Uplug command is
$ ./uplug (other module)
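The "main program plus module" pattern above can be wrapped in a small shell function. This is only a sketch: run_uplug and the UPLUG_DIR variable are names I made up for illustration, they are not part of Uplug. The default of ".." and the systems/ subdirectory are assumptions taken from the relative paths used in the conversion commands of this post.

```shell
# Hypothetical wrapper around the "main program + module" pattern.
# UPLUG_DIR is an assumed variable, not something Uplug itself defines.
# The default ".." matches calling ../uplug from a project subdirectory.
UPLUG_DIR="${UPLUG_DIR:-..}"

run_uplug() {
    # $1 is the module path under systems/ (for example pre/basic);
    # all remaining arguments are passed through unchanged.
    module="$1"
    shift
    if [ ! -x "$UPLUG_DIR/uplug" ]; then
        echo "uplug not found in $UPLUG_DIR" >&2
        return 1
    fi
    "$UPLUG_DIR/uplug" "$UPLUG_DIR/systems/$module" "$@"
}

# Example call (not run here):
#   run_uplug pre/basic -ci 'iso-8859-1' -in 1988sv.txt > 1988sv.xml
```

The only thing the wrapper adds is an early error message when the main program cannot be found, which is an easy mistake when you change directories.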
Now I will walk through my own example, following QUICKSTART.
My environment is as follows.
OS: Ubuntu 12.04 LTS
Uplug location: ~/uplug/
Make a new directory "myproject" in ~/uplug/ and copy the example files into it.
$ mkdir ~/uplug/myproject
$ cd ~/uplug/myproject/
$ cp ../example/1988sv.txt .
$ cp ../example/1988en.txt .
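The setup steps above can also be collected into one function that stops with a message if an example file is missing. This is a minimal sketch. setup_project is a hypothetical helper name, and it assumes the example files live in the example/ directory of the Uplug tree, as in the cp commands above.

```shell
# Hypothetical helper: create a project directory inside the Uplug tree
# and copy the two example files into it.
setup_project() {
    uplug_dir="$1"      # for example "$HOME/uplug"
    project="$2"        # for example "myproject"
    mkdir -p "$uplug_dir/$project" || return 1
    for f in 1988sv.txt 1988en.txt; do
        if [ -f "$uplug_dir/example/$f" ]; then
            cp "$uplug_dir/example/$f" "$uplug_dir/$project/"
        else
            echo "missing example file: $uplug_dir/example/$f" >&2
            return 1
        fi
    done
}

# Example call (not run here):
#   setup_project "$HOME/uplug" myproject && cd "$HOME/uplug/myproject"
```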
Convert the text files to XML.
You are now in the directory ~/uplug/myproject.
$ ../uplug ../systems/pre/basic -ci 'iso-8859-1' -in 1988sv.txt > 1988sv.xml
$ ../uplug ../systems/pre/basic -ci 'iso-8859-1' -in 1988en.txt > 1988en.xml
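The value given to -ci in the commands above looks like the character encoding of the input file, so for your own files (Persian text is often UTF-8, for example) it may help to check the encoding first. The following is a rough sketch using the standard iconv tool. guess_encoding is a hypothetical helper name, and the result is only a guess: it can tell plain ASCII and valid UTF-8 apart from everything else, but it cannot prove that a file is ISO-8859-1.

```shell
# Hypothetical helper: make a rough guess at a text file's encoding
# before choosing a value for -ci. It relies only on iconv exit codes.
guess_encoding() {
    if iconv -f US-ASCII -t UTF-8 "$1" >/dev/null 2>&1; then
        # Only 7-bit characters: ASCII, which is valid in most encodings.
        echo "ascii"
    elif iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
        # Contains non-ASCII bytes that form valid UTF-8 sequences.
        echo "utf-8"
    else
        # Not valid UTF-8; possibly a single-byte encoding such as ISO-8859-1.
        echo "not-utf8"
    fi
}

# Example call (not run here):
#   guess_encoding 1988sv.txt
```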
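You can now view the text in XML format!
You are still in the directory ~/uplug/myproject.
You can view the alignment with the following command (this assumes the alignment file 1988sven.xml has already been created; that step is not shown here).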
$ ../tools/readalign 1988sven.xml | less
Other functions can be used in a similar way.