Deep Learning for Semantic Composition

Xiaodan Zhu
National Research Council Canada
zhu2048@gmail.com

Edward Grefenstette
DeepMind
etg@google.com

1. Tutorial Description

Learning representations to model the meaning of text has been a core problem in natural language understanding (NLU). The last several years have seen extensive interest in distributional approaches, in which text spans of different granularities are encoded as continuous vectors. If properly learned, such representations have been shown to help achieve state-of-the-art performance on a variety of NLP problems.

In this tutorial, we will cover the fundamentals of, and selected research topics on, neural network-based modeling of semantic composition, which aims to learn distributed representations for larger spans of text, e.g., phrases (Yin and Schütze, 2014) and sentences (Zhu et al., 2016; Chen et al., 2016; Zhu et al., 2015b,a; Tai et al., 2015; Kalchbrenner et al., 2014; Irsoy and Cardie, 2014; Socher et al., 2012), from the representations of their parts, e.g., word embeddings.

We begin by briefly introducing traditional approaches to semantic composition, including logic-based formal semantic approaches and simple arithmetic operations over vectors based on corpus word counts (Mitchell and Lapata, 2008; Landauer and Dumais, 1997). Our main focus, however, will be on distributed representation-based modeling, whereby the representations of words and the operations composing them are jointly learned from a training objective. We cover the generic ideas behind neural network-based semantic composition and dive into the details of three typical composition architectures: convolutional composition models (Kalchbrenner et al., 2014; Zhang et al., 2015), recurrent composition models (Zhu et al., 2016), and recursive composition models (Irsoy and Cardie, 2014; Socher et al., 2012; Zhu et al., 2015b; Tai et al., 2015). After that, we will discuss several unsupervised approaches (Le and Mikolov, 2014; Kiros et al., 2015; Bowman et al., 2016; Miao et al., 2016).

We will then move on to several selected topics. We first cover models that combine compositional and non-compositional (e.g., holistically learned) semantics (Zhu et al., 2016, 2015a). Next, we discuss composition models that integrate multiple neural network architectures. We also discuss semantic composition and decomposition (Turney, 2014). Finally, we briefly discuss sub-word, neural network-based composition models (Zhang et al., 2015; Sennrich et al., 2016).

We will conclude by summarizing the tutorial, discussing the limitations of current approaches, and outlining future directions that we find interesting.
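As a minimal sketch of the contrast between composition functions, the snippet below compares simple additive (bag-of-words) composition with a plain recursive, TreeRNN-style composition over a binary parse. It is a toy illustration only, not part of the tutorial materials: the embeddings, dimensions, parse, and the tree_rnn helper are all assumed for the example, and the parameters are random rather than learned from a training objective as in the models covered above.

```python
# Toy sketch: additive vs. recursive (TreeRNN-style) composition.
# All names, dimensions, and data here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Toy word embeddings; in the models covered in this tutorial they are
# learned jointly with the composition function, not fixed random vectors.
vocab = {w: rng.normal(size=dim) for w in ["the", "movie", "was", "not", "good"]}

def additive_composition(words):
    """Bag-of-words baseline: the phrase vector is the sum of its word vectors."""
    return np.sum([vocab[w] for w in words], axis=0)

# Plain recursive composition: each parse node combines its two children with
# one shared affine map and a nonlinearity. (Socher et al., 2012, use a richer
# matrix-vector variant; this is the simplest member of the family.)
W = rng.normal(scale=0.1, size=(dim, 2 * dim))
b = np.zeros(dim)

def tree_rnn(node):
    """`node` is either a word (str) or a pair (left, right) of sub-trees."""
    if isinstance(node, str):
        return vocab[node]
    left, right = node
    children = np.concatenate([tree_rnn(left), tree_rnn(right)])
    return np.tanh(W @ children + b)

words = ["the", "movie", "was", "not", "good"]
parse = (("the", "movie"), ("was", ("not", "good")))

print(additive_composition(words))  # order-insensitive: ignores syntax entirely
print(tree_rnn(parse))              # depends on the parse structure
```

The additive model discards word order and syntax, whereas the recursive model's output depends on how the sentence is bracketed; the convolutional, recurrent, and recursive architectures discussed in this tutorial replace the random W above with richer parameterizations trained end-to-end.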
2. Tutorial Outline

Introduction
- Definition of semantic composition
- Conventional and basic approaches
  * Formal semantics
  * Bag of words with learned representations (additive, learned projection)

Parametrising Composition Functions
- Convolutional composition models
- Recurrent composition models
- Recursive composition models
  * TreeRNN/TreeLSTM
  * SPINN and RL-SPINN
- Unsupervised models
  * Skip-thought vectors and paragraph vectors
  * Variational auto-encoders for text

Selected Topics
- Incorporating compositional and non-compositional (e.g., holistically learned) semantics
- Integrating multiple composition architectures
- Semantic composition and decomposition
- Sub-word composition models

Summary

3. Speakers/Instructors

Xiaodan Zhu, Researcher, National Research Council Canada
zhu2048@gmail.com
http://www.xiaodanzhu.com

Xiaodan Zhu is a Research Officer at the National Research Council Canada. His research interests are in Natural Language Processing and Machine Learning. His recent work has focused on deep learning, semantic composition, sentiment analysis, and natural language inference. Xiaodan taught a tutorial at EMNLP 2014.

Edward Grefenstette, Senior Research Scientist, DeepMind
etg@google.com
http://www.egrefen.com

Edward Grefenstette is a Senior Research Scientist at DeepMind. His research covers the intersection of Machine Learning, Computer Reasoning, and Natural Language Understanding. Recent publications cover the topics of neural computation, representation learning at the sentence level, recognising textual entailment, and machine reading.

4. References

Samuel R. Bowman, Jon Gauthier, Abhinav Rastogi, Raghav Gupta, Christopher D. Manning, and Christopher Potts. 2016. A fast unified model for parsing and sentence understanding. In ACL.

Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei, and Hui Jiang. 2016. Enhancing and combining sequential and tree LSTM for natural language inference. arXiv:1609.06038.

Ozan Irsoy and Claire Cardie. 2014. Deep recursive neural networks for compositionality in language. In NIPS.

Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom. 2014. A convolutional neural network for modelling sentences. In ACL.

Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard Zemel, Antonio Torralba, Raquel Urtasun, and Sanja Fidler. 2015. Skip-thought vectors. arXiv:1506.06726.

Thomas K. Landauer and Susan T. Dumais. 1997. A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review 104(2):211–240.

Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In ICML.

Yishu Miao, Lei Yu, and Phil Blunsom. 2016. Neural variational inference for text processing. In ICML.

Jeff Mitchell and Mirella Lapata. 2008. Vector-based models of semantic composition. In ACL.

Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In ACL.

Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng. 2012. Semantic compositionality through recursive matrix-vector spaces. In EMNLP.

Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. In ACL.

Peter Turney. 2014. Semantic composition and decomposition: From recognition to generation. arXiv:1405.7908.

Wenpeng Yin and Hinrich Schütze. 2014. An exploration of embeddings for generalized phrases. In ACL 2014 Student Research Workshop.

Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. In NIPS.

Xiaodan Zhu, Hongyu Guo, and Parinaz Sobhani. 2015a. Neural networks for integrating compositional and non-compositional sentiment in sentiment composition. In *SEM.

Xiaodan Zhu, Parinaz Sobhani, and Hongyu Guo. 2015b. Long short-term memory over recursive structures. In ICML.

Xiaodan Zhu, Parinaz Sobhani, and Hongyu Guo. 2016. DAG-structured long short-term memory for semantic compositionality. In NAACL.