Since artificial neural networks can model nonlinear processes, they have become a popular and useful tool for solving many problems, such as classification, clustering, regression, pattern recognition, dimension reduction, structured prediction, machine translation, anomaly detection, decision making, visualization, computer vision, and others. This wide range of abilities makes it possible to use artificial neural networks in many areas. In this article, we discuss applications of artificial neural networks to Natural Language Processing (NLP) tasks.
NLP includes a wide set of syntax, semantics, discourse, and speech tasks. We will describe the main tasks in which neural networks have demonstrated state-of-the-art performance.
1. Text Classification and Categorization
Text classification is an essential part of many applications, such as web searching, information filtering, language identification, readability assessment, and sentiment analysis. Neural networks are actively used for these tasks.
In Convolutional Neural Networks for Sentence Classification, Yoon Kim presented a series of experiments with Convolutional Neural Networks (CNNs) built on top of word2vec vectors. The suggested model was tested against several benchmarks. In Movie Reviews (MR) and Customer Reviews (CR), the task was to detect positive or negative sentiment. In Stanford Sentiment Treebank (SST-1), there were five classes to predict: very positive, positive, neutral, negative, and very negative. In the Subjectivity data set (Subj), sentences were classified as subjective or objective. In TREC, the goal was to classify a question into one of six question types (whether the question is about a person, a location, numeric information, etc.). The results of the numerous tests described in the paper show that with little hyperparameter tuning the model performs excellently, suggesting that the pre-trained vectors are universal feature extractors and can be utilized for various classification tasks [1].
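To make the idea of a CNN on top of word vectors more concrete, here is a minimal sketch of a single convolution filter followed by max-over-time pooling. It is not the model from [1]: the three-dimensional vectors in TOY_VECTORS, the filter weights, and the function name conv_max_pool are all made up for illustration, and the ReLU non-linearity is an assumption. A real model would use pre-trained word2vec vectors, many filters of several window sizes, and a trained classifier on top of the pooled features.

```python
def conv_max_pool(embedded_sentence, conv_filter, bias=0.0):
    """Apply one convolution filter to every window of consecutive word
    vectors, pass each response through ReLU, and keep the maximum."""
    window = len(conv_filter)  # number of words covered by the filter
    if len(embedded_sentence) < window:
        raise ValueError("sentence is shorter than the filter window")
    responses = []
    for start in range(len(embedded_sentence) - window + 1):
        total = bias
        for offset in range(window):
            word_vector = embedded_sentence[start + offset]
            total += sum(w * x for w, x in zip(conv_filter[offset], word_vector))
        responses.append(max(0.0, total))  # ReLU non-linearity (assumed)
    return max(responses)  # max-over-time pooling


# Hypothetical 3-dimensional "pre-trained" vectors; real word2vec vectors
# have hundreds of dimensions.
TOY_VECTORS = {
    "the": [0.1, 0.0, 0.2],
    "movie": [0.3, 0.1, 0.0],
    "was": [0.0, 0.2, 0.1],
    "great": [0.9, 0.8, 0.1],
}

sentence = "the movie was great".split()
embedded = [TOY_VECTORS[word] for word in sentence]

# Hypothetical filter over two-word windows; it only looks at the first two
# dimensions of the second word in each window.
bigram_filter = [[0.0, 0.0, 0.0], [1.0, 1.0, 0.0]]

feature = conv_max_pool(embedded, bigram_filter)
print(round(feature, 2))
```

With these toy values, the strongest response comes from the window "was great", so pooling keeps that single number as the feature for the whole sentence, regardless of where in the sentence the window occurs.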
The article Text Understanding from Scratch by Xiang Zhang and Yann LeCun shows that it’s possible to apply deep learning to text understanding from character-level inputs all the way up to abstract text concepts with the help of temporal Convolutional Networks (ConvNets). The authors assert that ConvNets can achieve excellent performance without knowledge of words, phrases, sentences, or any other syntactic or semantic structures of a human language [2]. To support this assertion, they conducted several experiments. The model was tested on the DBpedia ontology classification data set with 14 classes (company, educational institution, artist, athlete, office holder, means of transportation, building, natural place, village, animal, plant, album, film, written work). The results indicate both good training (99.96%) and testing (98.40%) accuracy, with some improvement from thesaurus augmentation. In addition, a sentiment analysis test was performed on the Amazon Review data set. For this test, the researchers constructed a sentiment polarity data set with two negative and two positive labels. The result is 97.57% training accuracy and 95.07% testing accuracy. The model was also tested on the Yahoo! Answers Comprehensive Questions and Answers data set with 10 classes (Society & Culture, Science & Mathematics, Health, Education & Reference, Computers & Internet, Sports, Business & Finance, Entertainment & Music, Family & Relationships, Politics & Government) and on AG’s corpus, where the task was news categorization into four categories (World, Sports, Business, Sci/Tech). The results confirm that ConvNets require a large corpus to learn good text understanding from scratch.
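Character-level input means that the network sees a sequence of encoded characters instead of words. The sketch below shows one common way to do this, one-hot encoding over a fixed alphabet with a fixed sequence length. The alphabet, the length of 16, and the function name quantize are illustrative assumptions, not the exact settings used in [2].

```python
import string

# Illustrative alphabet (an assumption, not the one used in the paper).
ALPHABET = string.ascii_lowercase + string.digits + " .,!?'"
CHAR_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def quantize(text, max_len=16):
    """Encode text as max_len one-hot rows over ALPHABET.

    Characters outside the alphabet and padding positions become
    all-zero rows; text longer than max_len is truncated.
    """
    rows = []
    for ch in text.lower()[:max_len]:
        row = [0] * len(ALPHABET)
        index = CHAR_INDEX.get(ch)
        if index is not None:
            row[index] = 1
        rows.append(row)
    while len(rows) < max_len:  # pad short inputs with zero rows
        rows.append([0] * len(ALPHABET))
    return rows


encoded = quantize("Good film!")
print(len(encoded), len(encoded[0]))
```

The resulting matrix has one row per character position and one column per alphabet symbol, which is the kind of input a temporal (one-dimensional) ConvNet can slide its filters over.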
Siwei Lai, Liheng Xu, Kang Liu, and Jun Zhao introduced recurrent convolutional neural networks for text classification without human-designed features in their paper Recurrent Convolutional Neural Networks for Text Classification [3]. The team tested their model using four data sets: 20Newsgroups (with four categories: computers, politics, recreation, and religion), Fudan Set (a Chinese document classification set that consists of 20 classes, including art, education, and energy), ACL Anthology Network (with five languages: English, Japanese, German, Chinese, and French), and Sentiment Treebank (with Very Negative, Negative, Neutral, Positive, and Very Positive labels). After testing, the model was compared to existing text classification methods such as Bag of Words, Bigrams + LR, SVM, LDA, Tree Kernels, RecursiveNN, and CNN. It turned out that neural network approaches outperform traditional methods on all four data sets, and the proposed model outperforms CNN and RecursiveNN.
2. Named Entity Recognition (NER)
The main task of named entity recognition (NER) is to classify named entities, such as Guido van Rossum, Microsoft, London, etc., into predefined categories like persons, organizations, locations, time, dates, and so on. Many NER systems have already been created, and the best of them use neural networks.
In the paper Neural Architectures for Named Entity Recognition, two models for NER were proposed. The models rely on character-based word representations learned from the supervised corpus and unsupervised word representations learned from unannotated corpora [4]. Numerous tests were carried out using different data sets, such as CoNLL-2002 and CoNLL-2003, in English, Dutch, German, and Spanish. The team concluded that, without requiring any language-specific knowledge or resources such as gazetteers, their models show state-of-the-art performance in NER.
3. Part-of-Speech Tagging
Part-of-speech (POS) tagging has many applications, including parsing, text-to-speech conversion, information extraction, and so on. The work Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Recurrent Neural Network presents a recurrent neural network with word embeddings for the POS tagging task [5]. The model was tested on the Wall Street Journal data from the Penn Treebank III data set and achieved 97.40% tagging accuracy.
4. Semantic Parsing and Question Answering
Question answering systems automatically answer different types of questions asked in natural languages, including definition questions, biographical questions, multilingual questions, and so on. Neural networks make it possible to develop high-performing question answering systems.
In Semantic Parsing via Staged Query Graph Generation: Question Answering with Knowledge Base, Wen-tau Yih, Ming-Wei Chang, Xiaodong He, and Jianfeng Gao describe a semantic parsing framework they developed for question answering using a knowledge base. The authors say their method uses the knowledge base at an early stage to prune the search space and thus simplifies the semantic matching problem [6]. It also applies an advanced entity linking system and a deep convolutional neural network model that matches questions and predicate sequences. The model was tested on the WebQuestions data set and outperforms previous methods substantially.
5. Paraphrase Detection
Paraphrase detection determines whether two sentences have the same meaning. This task is especially important for question answering systems since there are many ways to ask the same question.
Detecting Semantically Equivalent Questions in Online User Forums suggests a method for identifying semantically equivalent questions based on a convolutional neural network. The experiments were performed using data from the Ask Ubuntu Community Questions and Answers (Q&A) site and Meta Stack Exchange. It was shown that the proposed CNN model achieves high accuracy, especially when the word embeddings are pre-trained on in-domain data. The authors compared their model’s performance with Support Vector Machines and a duplicate detection approach. They demonstrated that their CNN model outperforms the baselines by a large margin [7].
In the study Paraphrase Detection Using Recursive Autoencoder, a novel recursive autoencoder architecture is presented. It learns phrasal representations using recursive neural networks. These representations are vectors in an n-dimensional semantic space where phrases with similar meanings are close to each other [8]. To evaluate the system, the Microsoft Research Paraphrase Corpus and the English Gigaword Corpus were used. The model was compared to three baselines, and it outperforms them all.
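The notion of phrases being "close to each other" in a semantic space can be illustrated with cosine similarity between phrase vectors. The sketch below uses made-up three-dimensional vectors and an arbitrary threshold of 0.8; both are assumptions for illustration. It is a simple stand-in, not the recursive autoencoder or the classifier described in [8], which learns the vectors from data.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def is_paraphrase(u, v, threshold=0.8):
    """Treat two phrases as paraphrases when their vectors are close.
    The threshold is an arbitrary illustrative value."""
    return cosine_similarity(u, v) >= threshold


# Made-up phrase vectors; a trained model would produce these.
install_a = [0.9, 0.1, 0.3]  # "how do I install a package"
install_b = [0.8, 0.2, 0.4]  # "what is the way to install software"
screen = [0.1, 0.9, 0.2]     # "why is my screen flickering"

print(is_paraphrase(install_a, install_b))
print(is_paraphrase(install_a, screen))
```

Here the two installation questions point in almost the same direction and are flagged as paraphrases, while the unrelated question about the screen is not.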
6. Language Generation and Multi-document Summarization
Natural language generation has many applications, such as automated writing of reports, generating texts based on analysis of retail sales data, summarizing electronic medical records, producing textual weather forecasts from weather data, and even producing jokes.
In a recent paper, Natural Language Generation, Paraphrasing and Summarization of User Reviews with Recurrent Neural Networks, researchers describe a recurrent neural network (RNN) model capable of generating novel sentences and document summaries. The model is described and evaluated on a database of 820,000 consumer reviews in Russian. The design of the network gives users control over the meaning of generated sentences. By choosing a sentence-level feature vector, it is possible to instruct the network; for example, “Say something good about a screen and sound quality in about ten words” [9].
Language generation also makes it possible to produce abstractive summaries of multiple user reviews, and these summaries often have reasonable quality. Such a summary lets users quickly obtain the information contained in a large cluster of documents.
7. Machine Translation
Machine translation software is used around the world despite its limitations. In some domains, the quality of translation is not good. To improve the results, researchers try different techniques and models, including the neural network approach. The purpose of the study Neural-based Machine Translation for Medical Text Domain is to inspect the effects of different training methods on a Polish-English machine translation system used for medical data. The European Medicines Agency parallel text corpus was used to train neural network-based and statistical translation systems. It was demonstrated that a neural network requires fewer resources for training and maintenance. In addition, a neural network often substituted words with other words occurring in a similar context [10].
8. Speech Recognition
Speech recognition has many applications, such as home automation, mobile telephony, virtual assistance, hands-free computing, video games, and so on. Neural networks are widely used in this area.
In Convolutional Neural Networks for Speech Recognition, scientists explain how to apply CNNs to speech recognition in a novel way, such that the CNN’s structure directly accommodates some types of speech variability, like varying speaking rate [11]. TIMIT phone recognition and a large-vocabulary voice search task were used for evaluation.
9. Character Recognition
Character recognition systems also have numerous applications, such as receipt character recognition, invoice character recognition, check character recognition, legal billing document character recognition, and so on.
The article Character Recognition Using Neural Network presents a method for the recognition of handwritten characters with 85% accuracy [12].
10. Spell Checking
Most text editors let users check if their text contains spelling mistakes. Neural networks are now incorporated into many spell-checking tools.
In Personalized Spell Checking using Neural Networks, a new system for detecting misspelled words was proposed. This system is trained on observations of the specific corrections that a typist makes [13]. It overcomes many of the shortcomings of traditional spell-checking methods.
Summary
In this article, we described Natural Language Processing problems that can be solved using neural networks. As we showed, neural networks have many applications, such as text classification, information extraction, semantic parsing, question answering, paraphrase detection, language generation, multi-document summarization, machine translation, and speech and character recognition. In many cases, neural network methods outperform other methods.
Resources: