Center for Language Engineering

KICS-UET

Urdu Word Segmentation System

Word segmentation is the process of determining word boundaries in a given text. Urdu text is written as a sequence of ligatures, and writers do not mark word boundaries consistently. The word segmentation system converts a sequence of ligatures into the best sequence of words: it takes a sequence of ligatures as input and outputs a space-separated sequence of words with 97.9% accuracy. The system is statistically trained on a corpus of one million words.
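
The page does not describe the underlying model, so the following is only a minimal sketch of one common way to turn a ligature sequence into a "best" word sequence: a unigram word model searched with dynamic programming (Viterbi-style). It assumes a hypothetical dictionary of word counts standing in for statistics gathered from a training corpus. The function name `segment`, the parameters `max_ligs` and `unk_logprob`, and the toy counts are illustrative assumptions, not the CLE system's actual design or data.

```python
import math
from typing import Dict, List


def segment(ligatures: List[str], counts: Dict[str, int],
            max_ligs: int = 6, unk_logprob: float = -20.0) -> List[str]:
    """Join consecutive ligatures into the highest-scoring word sequence.

    Hypothetical sketch: each candidate word is scored by its unigram
    log-probability from `counts`; unseen strings are allowed only as
    single-ligature words with the fixed penalty `unk_logprob`.
    """
    total = sum(counts.values())
    n = len(ligatures)
    best = [-math.inf] * (n + 1)  # best[i]: best log-probability of ligatures[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        # Assumption: a word spans at most max_ligs ligatures.
        for start in range(max(0, end - max_ligs), end):
            word = "".join(ligatures[start:end])
            c = counts.get(word, 0)
            if c:
                score = math.log(c / total)
            elif end - start == 1:
                score = unk_logprob  # unknown single ligature kept as its own word
            else:
                continue  # unknown multi-ligature strings are not considered
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    # Follow the back-pointers from the end to recover the word sequence.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append("".join(ligatures[start:end]))
        end = start
    return words[::-1]


if __name__ == "__main__":
    # Toy counts, invented for illustration only (not from the CLE corpus).
    counts = {"پاکستان": 50, "پا": 5, "اردو": 40}
    ligatures = "پا کستا ن ا ر د و".split()
    print(" ".join(segment(ligatures, counts)))
```

With these toy counts, the ligatures of پاکستان and اردو should be rejoined into those two words, because every other split leaves unknown pieces that pay the large penalty. The sketch scores words independently of their neighbours; a word n-gram model is a common extension when context matters.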

In the ان پٹ (Input) field, enter the Urdu ligature strings separated by spaces, then press the “تخصیص کریں” button. The system will process these ligatures and display the best sequence of words in the آؤٹ پٹ (Output) text field.
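
The input field expects text that is already split into ligatures. As a hedged illustration of what that means: in Urdu script a ligature ends after any letter that does not connect to the letter following it, such as ا, د, ر and و. The sketch below, built around a hypothetical helper `to_ligatures`, splits text at whitespace and after such letters while keeping diacritics (Unicode category `Mn`) attached to their base letter. Its letter set `NON_LEFT_JOINING` is a partial, assumed list; it omits, for example, the standalone hamza ء and the zero-width non-joiner.

```python
import unicodedata
from typing import List

# Partial, assumed list of letters that do not join to the following letter.
NON_LEFT_JOINING = set(
    "\u0622\u0623\u0625\u0627"  # alef variants: آ أ إ ا
    "\u062F\u0630\u0688"        # dal, thal, ddal: د ذ ڈ
    "\u0631\u0632\u0691\u0698"  # reh, zain, rreh, jeh: ر ز ڑ ژ
    "\u0624\u0648"              # waw with hamza, waw: ؤ و
    "\u06D2\u06D3"              # yeh barree, yeh barree with hamza: ے ۓ
)


def to_ligatures(text: str) -> List[str]:
    """Split Urdu text into ligatures (hypothetical helper, simplified rules)."""
    ligatures = []
    for token in text.split():
        current = ""
        prev_letter = None  # last non-diacritic character seen in this token
        for ch in token:
            is_mark = unicodedata.category(ch) == "Mn"  # diacritics such as zabar
            # Start a new ligature before a letter whose predecessor does not join forward.
            if not is_mark and prev_letter in NON_LEFT_JOINING:
                ligatures.append(current)
                current = ""
            current += ch
            if not is_mark:
                prev_letter = ch
        if current:
            ligatures.append(current)
    return ligatures


if __name__ == "__main__":
    # Print the ligatures space-delimited, the form the input field asks for.
    print(" ".join(to_ligatures("پاکستان اردو")))
```

For پاکستان this yields the three ligatures پا, کستا and ن; for اردو every letter is a separate ligature, since none of ا, ر or د joins forward.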

(This file has been accessed: times, since 14 November 2011)

webmaster@cle.org.pk