How to Use "Persian Pre-processor: PrePer"

This article explains how to use "Persian Pre-processor: PrePer".

PrePer is a text normalizer for Persian.
An overview of PrePer is available at http://stp.lingfil.uu.se/~mojgan/preper.html

Before using PrePer, a little (or more) preparation is needed.
I explain the steps below, using my own setup as an example.


Step 1: Install Ruby (skip this step if Ruby is already installed)

PrePer is written in Ruby, so I needed to install Ruby.

My OS is Ubuntu, so Ruby is available via apt-get.

The commands for installing Ruby via apt-get are as follows:

$ sudo apt-get -yV install ruby-full

$ sudo apt-get -yV install ruby1.9.1-full

$ sudo apt-get -yV install rubygems

Now you can use the gem command.

$ sudo gem install rubygems-update

$ sudo gem1.9.1 install rubygems-update

$ sudo /var/lib/gems/1.8/bin/update_rubygems
$ sudo /var/lib/gems/1.9.1/bin/update_rubygems
(An error message may be shown, but this is not a problem.)


$ sudo gem update
$ sudo gem1.9.1 update


Now you can use Ruby.
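
To confirm that the installation worked, a short script such as the following can print the versions of Ruby and RubyGems that are actually in use. This is only a minimal sketch; the helper name environment_summary is my own and is not part of PrePer.

```ruby
require 'rubygems'

# Hypothetical helper (not part of PrePer): collect the versions of
# the running Ruby interpreter and of RubyGems.
def environment_summary
  { :ruby => RUBY_VERSION, :rubygems => Gem::VERSION }
end

summary = environment_summary
puts "Ruby:     #{summary[:ruby]}"
puts "RubyGems: #{summary[:rubygems]}"
```

If both lines print a version number, Ruby and gem are ready for the next step.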



Step 2: Install Virastar

To use PrePer, a library named Virastar is needed.

Mojgan Seraji does not mention that Virastar is required, but I am sure that it is.
(Virastar is described here: https://github.com/aziz/virastar )

The command for installing Virastar is as follows:

$ gem install virastar



Now you can also use Virastar.
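
Before running PrePer, it may help to check that Ruby can really load the library. The sketch below assumes that the gem is loaded with require 'virastar' (the gem's own name); the helper name virastar_available? is my own and is not part of Virastar or PrePer.

```ruby
require 'rubygems'

# Hypothetical helper: returns true if the virastar gem can be loaded,
# false if Ruby cannot find it.
def virastar_available?
  require 'virastar'
  true
rescue LoadError
  false
end

if virastar_available?
  puts 'virastar is installed'
else
  puts 'virastar is missing; run: gem install virastar'
end
```

If the script reports that the gem is missing, the gem may have been installed for a different Ruby version than the one running the script.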

Step 3: Execute PrePer

PrePer is just a Ruby script, so it is executed in the usual way for Ruby scripts.
The first argument is the document you want to process.

So the command is as follows:

$ ruby pre_per2.rb (documentname)

The parentheses are only for explanation; do not type them in practice.


For example, I extracted a sample sentence from the Hamshahri news site
and saved it to a file named hamshahri1.

So the command is:

$ ruby pre_per2.rb hamshahri1
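
If there are several documents to process, the same command can be repeated from a small wrapper script. The following is only a sketch under my own assumptions: the helper name preper_command is hypothetical, and pre_per2.rb is assumed to be in the current directory.

```ruby
# Hypothetical helper: build the command line used above for one document.
# Assumes pre_per2.rb is in the current directory.
def preper_command(document)
  ['ruby', 'pre_per2.rb', document]
end

# Run PrePer once for every file name given on the command line.
ARGV.each do |document|
  ok = system(*preper_command(document))
  warn "PrePer failed for #{document}" unless ok
end
```

Saved under a name of your choice, the wrapper takes the document names as its arguments and runs PrePer on each of them in turn.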



The result is:

                                                                                                    • -

< نماینده پرتاب نیزه کشورمان در پرتاب نیزه F42 پارالمپیک 2012 به مدال نقره دست یافت.


> نماینده پرتاب نیزه کشورمان در پرتاب نیزه F۴۲ پارالمپیک ۲۰۱۲ به مدال نقره دست یافت.

                                                                                                    • -

PrePer rewrote F42 and 2012 with Persian digits (F۴۲ and ۲۰۱۲).
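
The digit conversion seen in this result can be illustrated with a few lines of Ruby. This is not PrePer's or Virastar's actual code, only a sketch of the same mapping; the names LATIN_DIGITS, PERSIAN_DIGITS and to_persian_digits are my own.

```ruby
# Illustration only; this is not how PrePer itself is implemented.
LATIN_DIGITS = '0123456789'

# Persian digits occupy the code points U+06F0 to U+06F9.
PERSIAN_DIGITS = (0x06F0..0x06F9).map { |cp| [cp].pack('U') }.join

# Replace every Latin digit with the Persian digit of the same value.
# All other characters are left unchanged.
def to_persian_digits(text)
  text.tr(LATIN_DIGITS, PERSIAN_DIGITS)
end

puts to_persian_digits('F42')
puts to_persian_digits('2012')
```

Only the digits change; the letter F and the rest of the text are left as they are, which matches the sample result above.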