Home

Semester 2021-10

Course name (Spanish): Procesamiento Lenguaje Natural
Course name (English): Natural Language Processing
Credits: 4
Professor: Ruben Manrique

Description

Natural Language Processing (NLP) is a discipline of Artificial Intelligence that deals with the formulation and investigation of computational mechanisms for communication between people and machines through the use of natural language. The main objective of the course is to develop a deep understanding of the algorithms available for processing linguistic information and of the underlying computational properties of natural languages.

Syllabus

  • Introduction
  • Languages and Grammars
    • Automata: finite-state automata (FSA) and regular languages.
    • Morphology and Transducers: inflectional and derivational morphology, finite-state morphological parsing, combining an FST lexicon with rules. Lexicon-free FSTs: the Porter stemmer.
    • Part-of-Speech Tagging: rule-based tagging, lookup-based tagging, Hidden Markov Models and the Viterbi algorithm (a minimal Viterbi sketch appears after the syllabus).
    • Grammars: context-free rules and trees; finite-state and context-free grammars.
    • Parsing with context-free grammars: top-down parsing, the Earley algorithm, finite-state parsing methods.
    • Probabilistic context-free grammars (PCFGs).
  • Language modelling and Vector Space Representations using Machine Learning and Probabilistic Models
    • N-gram language models (a bigram sketch appears after the syllabus)
    • Neural Language Models
      • RNN: Recurrent Neural Networks
      • Tagging using LSTM (Long short-term memory architecture).
    • The concept behind distributional semantics
    • Embedding models.
      • Word2vec and doc2vec (a gensim sketch appears after the syllabus).
        • CBOW Model
        • Skip-Gram model
      • Topic Modelling (a scikit-learn LDA sketch appears after the syllabus)
        • Latent semantic analysis (LSA)
        • Latent Dirichlet allocation (LDA)
  • Sequence models for machine translation and summarization systems
    • Machine translation problem
    • Encoder-decoder architectures
      • The attention mechanism (a NumPy sketch appears after the syllabus)
      • Sequence to Sequence Learning with Neural Networks
    • Summarization with pointer-generator networks
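
The short sketches below illustrate a few of the topics above; they are minimal examples written for this page, not course material. First, Viterbi decoding for HMM part-of-speech tagging: the tag set and the transition and emission probabilities are toy values invented for the example.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` under a simple HMM."""
    # best[t][tag]: probability of the best tag path ending in `tag` at position t
    best = [{tag: start_p[tag] * emit_p[tag].get(words[0], 0.0) for tag in tags}]
    back = [{}]
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for tag in tags:
            # pick the previous tag that maximizes the path probability into `tag`
            prev = max(tags, key=lambda p: best[t - 1][p] * trans_p[p][tag])
            best[t][tag] = (best[t - 1][prev] * trans_p[prev][tag]
                            * emit_p[tag].get(words[t], 0.0))
            back[t][tag] = prev
    # recover the best path by following back-pointers from the best final tag
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Toy HMM, invented for illustration only.
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET":  {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1,  "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5,  "NOUN": 0.4, "VERB": 0.1},
}
emit_p = {
    "DET":  {"the": 0.9, "a": 0.1},
    "NOUN": {"dog": 0.5, "walk": 0.2, "park": 0.3},
    "VERB": {"walks": 0.6, "dog": 0.1, "barks": 0.3},
}
print(viterbi(["the", "dog", "walks"], tags, start_p, trans_p, emit_p))
# -> ['DET', 'NOUN', 'VERB']
```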
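
Next, a bigram language model with add-one (Laplace) smoothing; the corpus, counts, and vocabulary are illustrative only.

```python
from collections import Counter

corpus = [["<s>", "the", "dog", "barks", "</s>"],
          ["<s>", "the", "cat", "sleeps", "</s>"],
          ["<s>", "a", "dog", "sleeps", "</s>"]]

unigram = Counter(w for sent in corpus for w in sent)
bigram = Counter((w1, w2) for sent in corpus for w1, w2 in zip(sent, sent[1:]))
V = len(unigram)  # vocabulary size, used by add-one smoothing

def prob(w2, w1):
    """P(w2 | w1) with add-one smoothing."""
    return (bigram[(w1, w2)] + 1) / (unigram[w1] + V)

def sentence_prob(sent):
    """Probability of a sentence as a product of bigram probabilities."""
    p = 1.0
    for w1, w2 in zip(sent, sent[1:]):
        p *= prob(w2, w1)
    return p

print(prob("dog", "the"))                                   # seen bigram
print(sentence_prob(["<s>", "a", "cat", "barks", "</s>"]))  # unseen bigrams still get mass
```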
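
For the word2vec item, a sketch using the gensim library; parameter names follow gensim 4.x (e.g. vector_size), and the toy corpus is far too small to learn meaningful embeddings, so this only shows the shape of the API.

```python
from gensim.models import Word2Vec

sentences = [["the", "dog", "barks"], ["the", "cat", "meows"],
             ["a", "dog", "chases", "a", "cat"]]
# sg=1 selects the skip-gram objective; sg=0 would select CBOW
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)
print(model.wv["dog"].shape)             # one 50-dimensional vector per word
print(model.wv.most_similar("dog", topn=2))
```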
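
For topic modelling, a minimal LDA sketch with scikit-learn; the documents, the number of topics, and the hyperparameters are illustrative assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["the dog barks at the cat",
        "dogs and cats make good pets",
        "stocks rise as markets open",
        "investors trade stocks and bonds"]
X = CountVectorizer(stop_words="english").fit_transform(docs)  # bag-of-words counts
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
print(lda.transform(X))  # per-document topic proportions, one row per document
```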
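
Finally, for the attention mechanism, a NumPy sketch of one common formulation, scaled dot-product attention; shapes and values are arbitrary.

```python
import numpy as np

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for a single (unbatched) sequence."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])      # (len_q, len_k) similarities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)           # row-wise softmax: attention weights
    return w @ V                                 # weighted sum of value vectors

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))  # 2 decoder positions, model dimension 4
K = rng.normal(size=(3, 4))  # 3 encoder positions
V = rng.normal(size=(3, 4))
print(attention(Q, K, V).shape)  # (2, 4): one context vector per decoder position
```

In the encoder-decoder setting of this unit, Q would come from decoder states while K and V come from encoder states, so each output row is a context vector over the source sequence.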

References

  • Dan Jurafsky and James H. Martin: Speech and Language Processing (3rd ed. draft), available at https://web.stanford.edu/~jurafsky/slp3/
  • Li Deng and Yang Liu: Deep Learning in Natural Language Processing.
  • Yoav Goldberg and Graeme Hirst: Neural Network Methods in Natural Language Processing.
  • Delip Rao and Brian McMahan: Natural Language Processing with PyTorch: Build Intelligent Language Applications Using Deep Learning.
  • Christopher M. Bishop: Pattern Recognition and Machine Learning (Information Science and Statistics).
  • Kevin P. Murphy: Machine Learning: A Probabilistic Perspective.
  • Peter F. Brown, et al.: Class-Based n-gram Models of Natural Language, 1992.
  • Tomas Mikolov, et al.: Efficient Estimation of Word Representations in Vector Space, 2013.
  • Tomas Mikolov, et al.: Distributed Representations of Words and Phrases and their Compositionality, NIPS 2013.
  • Quoc V. Le and Tomas Mikolov: Distributed Representations of Sentences and Documents, 2014.
  • Jeffrey Pennington, et al.: GloVe: Global Vectors for Word Representation, 2014.
  • Ryan Kiros, et al.: Skip-Thought Vectors, 2015.
  • Piotr Bojanowski, et al.: Enriching Word Vectors with Subword Information, 2017.
  • Thomas Hofmann: Probabilistic Latent Semantic Indexing, SIGIR 1999.
  • David Blei, Andrew Y. Ng, and Michael I. Jordan: Latent Dirichlet Allocation, Journal of Machine Learning Research, 2003.
  • Yoon Kim: Convolutional Neural Networks for Sentence Classification, 2014.
  • Christopher Olah: Understanding LSTM Networks, 2015.
  • Matthew E. Peters, et al.: Deep contextualized word representations, 2018.