Motivated by the human cognitive process, we design a two-pass decoder (Deliberation Decoder) to improve context coherence and knowledge correctness.
Recent research has achieved impressive results in single-turn dialogue modelling.
We experiment with 10 different types of perturbations on 4 multi-turn dialog datasets and find that commonly used neural dialog architectures like recurrent and transformer-based seq2seq models are rarely sensitive to most perturbations such as missing or reordering utterances, shuffling words, etc.
Empirical results show that our approach can substantially improve the diversity and relevance of the responses generated by all base models, as shown by both objective measurements and human evaluation.
Response selection plays an important role in fully automated dialogue systems.
By utilizing prior knowledge of logical form structures, we propose a novel reward signal at the surface and semantic levels that encourages the generation of complete and reasonable logical forms.
We investigate the power of mechanisms for compositional semantic parsing to describe relations between sentences and semantic representations.
Our experimental results surpass all previously reported SMATCH scores, on both AMR 2.0 (76.3% on LDC2017T10) and AMR 1.0 (70.2% on LDC2014T12).
Structured information about entities is critical for many semantic parsing tasks.
The methods that have been introduced are evaluated across a wide range of downstream tasks, where the binarized sentence embeddings are demonstrated to degrade performance by only about 2% relative to their continuous counterparts, while reducing the storage requirement by over 98%.
In this paper we aim to answer the question: How much structural context are the BiLSTM representations able to capture implicitly?
We conduct extensive parsing experiments with detailed discussion; in addition to existing benchmark datasets on (1) biomedical texts and (2) question sentences, we create experimental datasets of (3) speech conversation and (4) math problems.
Yet the relations between all tags are provided in a tag hierarchy, covering the test tags as a combination of training tags.
This setting suffers from poor transfer, particularly from distant languages.
Without sufficient contexts, the embeddings of rare words are usually less reliable than those of common words. However, current models typically trust all word embeddings equally regardless of their reliability, and thus may introduce noise and hurt performance. Since names often contain rare and uncommon words, this problem is particularly critical for name tagging.
Specifically, in the best case, LTR achieves an improvement of 5.58 BLEU points over the conventional direct unsupervised method.
We propose a technique to quantitatively estimate this assumption of isometry between two embedding spaces and empirically show that this assumption weakens as the languages in question become increasingly etymologically distant.
In this paper, we identify and address several deficiencies of existing unsupervised SMT approaches by exploiting subword information, developing a theoretically well-founded unsupervised tuning method, and incorporating a joint refinement procedure.
Thus, this paper investigates the effectiveness of several possible configurations of applying the adversarial perturbation and reveals that the adversarial regularization technique can significantly and consistently improve the performance of widely used NMT models, such as LSTM-based and Transformer-based models.
It has been shown that the performance of neural machine translation (NMT) drops sharply in low-resource conditions, underperforming phrase-based statistical machine translation (PBSMT) and requiring large amounts of auxiliary data to achieve competitive results.
We adapt sequentially across two Spanish-English and three English-German tasks, comparing unregularized fine-tuning, L2, and Elastic Weight Consolidation.
We collect high-quality training data by distant supervision with co-reference resolution and paraphrase detection.
Existing approaches that employ rule-based hard-pruning strategies to select relevant partial dependency structures may not always yield optimal results.
Our experiments on 4.7 million tweets collected during Hurricane Sandy in 2012 show that spatial and temporal aggregation allows rapid discovery of relevant spatial and temporal topics during that period.
Despite their successful performance, existing bilinear forms overlook the modeling of relation compositions, resulting in a lack of interpretability for reasoning on KGs.
We find that overall, a combination of BERT, BPEmb, and character representations works best across all of the languages and tasks.
Using world knowledge to inform a model while retaining the ability to perform end-to-end training remains an open question.
Not all types of supervision signals are created equal: different types of feedback have different costs and effects on learning.
To this end, we propose Tree Transformer, a model that captures phrase-level syntax for constituency trees as well as word-level dependencies for dependency trees by performing recursive traversal with attention alone.
We analyze generalization on English and Chinese corpora, and in the process obtain state-of-the-art parsing results for the Brown, Genia, and English Web treebanks.
We propose a novel self-attention mechanism that can learn its optimal attention span.
In this paper, we propose a neural news recommendation approach which can learn both long- and short-term user representations.
In this research, we automatically create sentiment dictionaries for predicting financial outcomes. We compare three approaches: (i) manual adaptation of the domain-general dictionary H4N, (ii) automatic adaptation of H4N and (iii) a combination consisting of first manual, then automatic adaptation.
We evaluate our approach in corpus-based experiments and in a user study with 60 participants. We find that both strategies are able to generate C-tests with the desired difficulty level.
Text classification aims at mapping documents into a set of predefined categories.
Our methods can easily be transferred outside the clinical domain by using domain-appropriate resources, providing effective neural text simplification for any domain without the need for costly annotation.
There is anecdotal evidence that CEOs’ vocal features, such as emotions and voice tones, can reveal a firm’s performance.
Motivated by infamous cheating scandals in the media industry, the wine industry, and political campaigns, we address the problem of detecting hidden information in technical settings.
This paper develops a general framework for estimating the trustworthiness of information sources in an environment where multiple sources provide claims and supporting evidence, and each claim can potentially be produced by multiple sources.
We achieve high performance in terms of transfer accuracy, content preservation, and language fluency, in comparison to various previous approaches.
In this paper, we address these serious limitations of existing approaches and improve strong neural encoder-decoder models by appropriately modeling wider contexts.
We create the first dataset for this task and find that email subject line generation favors extremely abstractive summaries, which differentiates it from news headline generation or news single-document summarization.
We show that, trained on a diachronic corpus, the skip-gram with negative sampling architecture with temporal referencing outperforms alignment models on a synthetic task as well as a manual testset.
In this paper, we propose a neural network-based approach, namely Adversarial Attention Network, to the task of multi-dimensional emotion regression, which automatically rates multiple emotion dimension scores for an input text.
On its basis, global fusion is conducted in the ‘combine’ stage to explore the interconnection across local interactions, via an Attentive Bi-directional Skip-connected LSTM that directly connects distant local interactions and integrates two levels of attention mechanisms.
Our results show that earnings calls are moderately predictive of analysts’ decisions even though these decisions are influenced by a number of other factors including private messages with company executives and market conditions.
While easier to develop, such an approach does not fully exploit joint information from the two subtasks and does not use all available sources of training information, such as document-level labeled sentiment corpora.
Propositions are connected iteratively to form a graph structure.
MELD contains about 13,000 utterances from 1,433 dialogues from the TV series Friends.
However, such a formulation suffers from problems such as a huge search space and emotional inconsistency.
We then extend the dynamic routing approach to adaptively couple the semantic capsules with the class capsules under a transfer learning framework.
However, such a mechanism tends to excessively focus on a few frequent words with sentiment polarities, while ignoring infrequent ones.
For argument classification, we improve the state-of-the-art for the UKP Sentential Argument Mining Corpus by 20.8 percentage points and for the IBM Debater - Evidence Sentences dataset by 7.4 percentage points.
Our experiments show this approach helps constrain the learning process and can reduce the required supervision effort.
Most existing algorithms address them as two separate tasks and solve them one by one, or only perform one task, which can be complicated in real applications.
They have been much less successful in fostering studies of the effect of “user” traits (characteristics and beliefs of the participants) on the debate/argument outcome, as this type of user information is generally not available.
Moreover, we feed the two pairs of representations to two factored tensor networks, respectively, to capture both the sentence-level interactions and the topic-level relevance using multi-slice tensors.
Our experiments on reference game data show that back-to-back pragmatic training produces more accurate utterance interpretation models, especially when data is sparse and language is complex.
We assess the extent to which our framework generalizes to different domains and prediction tasks, and demonstrate its effectiveness not only on standard binary coherence evaluation tasks, but also on real-world tasks involving the prediction of varying degrees of coherence, achieving a new state of the art.
Snorkel’s attractive promise of creating a large amount of annotated data from a smaller set of training data by unifying the output of a set of heuristics has yet to be used for computationally difficult tasks, such as discourse attachment, in which one must decide where a given discourse unit attaches to other units in a text in order to form a coherent discourse structure.
Finally, ablation studies show that structured attention provides little benefit, sometimes even hurting performance.
Zero-shot learning in Language & Vision is the task of correctly labelling (or naming) objects of novel categories.
Specifically, we modify the state-of-the-art higher-order mention ranking approach in Lee et al. (2018) into a reinforced policy gradient model by incorporating the reward associated with a sequence of coreference linking actions.
We first propose a method to automatically extract implicit discourse relation argument pairs and labels from a dataset of dialogic turns, resulting in a novel corpus of discourse relation pairs; the first of its kind to attempt to identify the discourse relations connecting the dialogic turns in open-domain discourse.
Our approach, which also employs BERT embeddings, achieves new state-of-the-art results on the CoNLL-2012 coreference resolution task, improving average F1 by 3.6%.
Coherence is an important aspect of text quality and is crucial for ensuring its readability.
This allows us to perform empirical studies on several classification tasks such as (i) binary discrimination of Moldavian versus Romanian text samples, (ii) intra-dialect multi-class categorization by topic and (iii) cross-dialect multi-class categorization by topic.
Our automatically-generated data consistently lead a supervised WSD model to state-of-the-art performance when compared with other automatic and semi-automatic methods.
In this work, we take the first step towards a comprehensive evaluation of CLE models: we thoroughly evaluate both supervised and unsupervised CLE models, for a large number of language pairs, on BLI and three downstream tasks, providing new insights concerning the ability of cutting-edge CLE models to support cross-lingual NLP.
Three representative SP acquisition methods based on pseudo-disambiguation are evaluated with SP-10K.
Our work addresses the superficiality and lack of comparison in assessing models of diachronic lexical change, by bringing together and extending benchmark models on the most common state-of-the-art evaluation task.
Second, to avoid spurious conclusions, a large set of instances should be analyzed, including both positive and negative examples; Errudite enables systematic grouping of relevant instances with filtering queries.
Multiple entities in a document generally exhibit complex inter-sentence relations, and cannot be well handled by existing relation extraction (RE) methods that typically focus on extracting intra-sentence relations for single entity pairs.
We carefully study how the design of candidate idioms and the representation of idioms affect the performance of state-of-the-art models.
We propose a task designed to elicit human judgments of token-level topic assignment.
In this paper, we present a method for identifying markables for coreference annotation that combines high-performance automatic markable detectors with checking via a Game-With-A-Purpose (GWAP) and aggregation using a Bayesian annotation model.
Our model is composed of an utterance encoder, a slot gate, and a state generator, which are shared across domains.
Such task groups represent supervised information at the between-task level and can be encoded into the model.
The semantic representations used, however, are often not very well-specified, which places a higher burden on the generation model for sentence planning, and also limits the extent to which generated responses can be controlled in a live system.
For this study, we collect a new Open-ended Dialog <-> KG parallel corpus called OpenDialKG, where each utterance from 15K human-to-human role-playing dialogs is manually annotated with ground-truth reference to corresponding entities and paths from a large-scale KG with 1M+ factoids.
In this paper, we present an approach to incorporate retrieved datapoints as supporting evidence for context-dependent semantic parsing, such as generating source code conditioned on the class environment.
Experimental results on two datasets from different domains demonstrate the validity and effectiveness of our model, which outperforms state-of-the-art baselines by a large margin.
We propose two probabilistic methods to build models that are more robust to such biases and better transfer between the different datasets.
To alleviate this issue, we propose a graph-based evidence aggregating and reasoning (GEAR) framework which enables information to transfer on a fully-connected evidence graph and then utilizes different aggregators to collect multi-evidence information.
Due to our candidate selection process based on strong distributional evidence, SherLIiC is much harder than existing testbeds because distributional evidence is of little utility in the classification of InfCands.
From inter-rater agreement, we find that the task is inherently difficult.
Our approach examines participants’ turn-by-turn interaction, their linguistic alignment, the feelings expressed by speakers during the conversation, as well as the different topics being discussed.
For example, prior literature suggests that experience might not convert into consequential changes in counselor behavior.
Preliminary results indicate that LSTMs trained end-to-end perform best, with a test accuracy of 62.13% and a recall@5 of 89.56%, and demonstrate that we can reduce response time by several orders of magnitude.
Negative medical findings are prevalent in clinical reports, yet discriminating them from positive findings remains a challenging task for information extraction.
The questions come from exams to access specialized positions in the Spanish healthcare system, and are challenging even for highly specialized humans.
With the advancement of argument detection, we suggest paying more attention to the challenging task of identifying the more convincing arguments.
We focus on relations between Wikipedia concepts, and show that they differ from well-studied lexical-semantic relations such as hypernym, hyponym and antonym.
Then, we frame the task as a multi-view learning problem to induce semantic information from a multimodal model into our acoustic-only network using a contrastive loss function.
In this work, we propose a new task: emotion-cause pair extraction (ECPE), which aims to extract the potential pairs of emotions and corresponding causes in a document.
To the best of our knowledge, this is the first time that this approach to argument invention is formalized and made explicit in the context of NLP.
The most important obstacles facing multi-document summarization include excessive redundancy in source descriptions and the shortage of training data.
GOLC increases the probabilities of generating summaries that have high evaluation scores, ROUGE in this paper, within a desired length.
In this paper, we seek to better understand how neural extractive summarization systems could benefit from different types of model architectures, transferable knowledge and learning schemas.
We argue that establishing theoretical models of Importance will advance our understanding of the task and help to further improve summarization systems.