\documentclass{ijclclp}
% This template is intended to be used with the XeLaTeX compiler to support CJK fonts
% Title Information
\title{Improve Parsing Performance by Self-Learning}
\author{Yu-Ming Hsieh\textsuperscript{1},
Duen-Chi Yang\textsuperscript{1}, and
Keh-Jiann Chen\textsuperscript{1}}
\affiliation{Institute of Information Science, Academia Sinica, Taipei, Taiwan}
\email{\texttt{\{morris, ydc, kchen\}@iis.sinica.edu.tw}}
% Document Start
\begin{document}
\maketitle
\thispagestyle{firstpage}
% Abstract Section
\begin{abstract}
Many methods have been proposed to improve the performance of statistical parsers, and resolving structural ambiguity is a major task for them.
This paper presents a self-learning method for resolving such ambiguities. In the proposed approach, the parser produces a set of $n$-best trees based on a feature-extended PCFG grammar and then selects the best tree structure based on the association strengths of dependency word-pairs.
The constructed structure evaluation model improved the bracketed F-score from 83.09\% to 86.59\%.
We believe that this iterative learning process can improve parsing performance automatically by continuously learning word-dependence information from the web.
\\
\\
% Keywords Section
\textbf{Keywords:}
Parsing, Word Association, Knowledge Extraction, PCFG, PoS Tagging, Semantic
剖析,詞彙關聯,知識萃取
\end{abstract}
% Sections
\section{Introduction}
Resolving structural ambiguity is an important task in building a high-performance statistical parser, particularly for Chinese~\citep{black1991,charniak2005}.
Since Chinese is an analytic language, words can serve different grammatical functions without inflection.
Parsers would therefore produce a great many ambiguous structures if no structure evaluation were applied. Our approach to disambiguating these structures has three main steps. First, the parser produces the $n$-best structures. Second, we extract word-to-word associations from large corpora and build semantic information from them. Finally, we build a structure evaluator that selects the best tree from the $n$-best candidates.
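
One simple way to formalize the third step is given below, purely as an illustrative sketch and not as the exact evaluation model. Let $T_1,\ldots,T_n$ be the $n$-best candidates for a sentence, let $P_{\mathrm{rule}}(T)$ be the probability that the feature-extended PCFG assigns to a tree $T$, and let $\mathrm{WA}(h,d)$ be the association strength of a dependency word-pair $(h,d)$. The set $D(T)$ of word-pairs in $T$, the interpolation weight $\lambda \in [0,1]$, and the log-linear combination itself are assumptions introduced for this sketch.
\begin{equation}
\hat{T} = \arg\max_{T \in \{T_1,\ldots,T_n\}} \Bigl[ \lambda \log P_{\mathrm{rule}}(T) + (1-\lambda) \sum_{(h,d) \in D(T)} \log \mathrm{WA}(h,d) \Bigr]
\end{equation}
Under this formulation, $\lambda = 1$ reduces the evaluator to the original PCFG ranking, while smaller values of $\lambda$ give more weight to the word associations.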
\section{Feature Extension of PCFG Grammars for Producing the N-best Trees}
Treebanks provide not only instances of phrasal structures and word dependencies but also their statistical distributions...
% Subsections
\subsection{Coverage Rates of the Word Associations}
Data sparseness is always a problem for statistical evaluation methods. The five levels of word associations derived from Figure 1 are...
\begin{figure}[h]
\centering
\includegraphics[width=0.8\textwidth]{figure1.pdf}
\caption{WA coverage rate of Level-6.}
\end{figure}
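
A coverage rate of this kind can be read as the proportion of dependency word-pairs in the test data for which an association is available. The following is a sketch of that reading only. The symbols $D_{\mathrm{test}}$ (the word-pairs occurring in the test data) and $\mathrm{WA}_k$ (the set of associations available at level $k$) are assumptions introduced for illustration.
\begin{equation}
\mathrm{Coverage}_k = \frac{\bigl|\{(h,d) \in D_{\mathrm{test}} : (h,d) \in \mathrm{WA}_k\}\bigr|}{\bigl|D_{\mathrm{test}}\bigr|}
\end{equation}
A low value of $\mathrm{Coverage}_k$ signals that many word-pairs would receive no evidence at level $k$, which is the data sparseness problem described above.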
\subsubsection{Title}
From the results shown in Table 5...
% Table Example
\begin{table}[h]
\centering
\begin{tabular}{lllrrr}
\toprule
Testing Data & Sources & Hardness & Rule type-1 & Rule type-2 & Rule type-3\\
\midrule
Sinica & Balanced corpus & Moderate & 92.97 & 94.84 & 96.25 \\
Sinorama & Magazine & Difficult & 90.01 & 91.65 & 93.91\\
Textbook & Elementary school & Easy & 93.65 & 95.64 & 96.81 \\
\bottomrule
\end{tabular}
\caption{The 50-best oracle performances from the different grammars.}
\end{table}
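
For reference, an $n$-best oracle score is conventionally obtained by selecting, for each test sentence, the candidate that is closest to the gold-standard tree. The notation below is a sketch of that standard definition. The symbols $T_{s,i}$ (the $i$-th candidate for sentence $s$), $G_s$ (its gold-standard tree), and $F$ (a per-sentence bracketing score) are assumptions introduced for illustration.
\begin{equation}
T_s^{\mathrm{oracle}} = \arg\max_{1 \le i \le n} F(T_{s,i}, G_s)
\end{equation}
The oracle performance is then the score computed over the trees $T_s^{\mathrm{oracle}}$. With $n = 50$, it indicates the ceiling that a structure evaluator choosing among the same 50 candidates could reach.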
\section*{Acknowledgments}
This research was supported in part by the National Science Council under Grant NSC 95-2422-H-001-008- and by the National Digital Archives Program under Grant 95-0210-29-戊-13-09-00-2.
% References
\bibliography{references}
\end{document}