Deep Learning for Speech and Language

Winter Seminar UPC TelecomBCN (January 24-31, 2017)

The aim of this course is to train students in methods of deep learning for speech and language. Recurrent Neural Networks (RNN) will be presented and analyzed in detail to understand the potential of these state-of-the-art tools for time series processing. Engineering tips and scalability issues will be addressed to solve tasks such as machine translation, speech recognition, speech synthesis or question answering. Hands-on sessions will provide development skills so that attendees can become competent in contemporary data analytics tools.

Course Instructors

Antonio Bonafonte (AB) Jose Adrián Rodríguez Fonollosa (JA) Marta Ruiz Costa-jussà (MR) Javier Hernando (JH) Santiago Pascual (SP) Elisa Sayrol (ES) Xavier Giro-i-Nieto (XG)

Teaching assistants

Ahmad Mel Carlos Escolano Albert Aparicio Manel Baradad


UPC ETSETB TelecomBCN UPC TALP Group UPC Image Processing Group Universitat Politècnica de Catalunya (UPC)


Social event funded by Tradeheader.
Computational resources have been provided by the Amazon Web Services Educate Program.
Code repos and website provided by the GitHub Education Program.

If your company or organization is interested in sponsoring this course, please contact Professor Antonio Bonafonte.

Slides and Videos

Session Topic Speaker Slideshare YouTube
D1L2 The Perceptron SP Slides Video
D1L3 Convolutional Neural Networks ES Slides  
D1L4 Basic Deep Architectures XG Slides Video
D1L5 Backpropagation ES Slides Video
D1L6 Training ES Slides Video
D2L1 Deep Belief Networks ES Slides Video
D2L2 Recurrent Neural Networks I SP Slides Video
D2L3 Recurrent Neural Networks II SP Slides Video
D2L4 Word Embeddings AB Slides Video
D2L5 Generative Adversarial Networks SP Slides Video
D2L6 Advanced Deep Architectures XG Slides Video
D3L1 Language Model MR Slides Video
D3L2 Speech Recognition I AR Slides Video
D3L3 Speaker Identification I JH Slides Video
D3L4 Neural Machine Translation I MR Slides Video
D3L5 Speech Synthesis I AB Slides Video
D3L6 Speech Recognition II JA Slides Video
D4L1 Speaker Identification II JH Slides Video
D4L2 Neural Machine Translation II MR Slides Video
D4L3 Speech Synthesis II: WaveNet AB Slides Video
D4L4 Multimodal Deep Learning XG Slides Video
D5 Music Data Processing JP Slides Video

Invited talks

This 2017 edition of the seminar will include two invited talks:

Joan Serrà, from Telefónica Research.

Title: Facts and myths about deep learning.

Abstract: Deep learning has revolutionized the traditional machine learning pipeline, with impressive results in domains such as computer vision, speech analysis, or natural language processing. The concept has gone beyond research/application environments and permeated the mass media, news blogs, job offers, startup investors, or big company executives’ meetings. But what is behind deep learning? Why has it become so mainstream? What can we expect from it? In this talk, I will highlight a number of facts and myths that will provide a shallow answer to the previous questions. While doing so, I will also highlight various applications we have worked on at our lab. Overall, the talk aims to lay out a series of basic concepts while giving ground for reflection and discussion on the topic.

Jordi Pons, from the Music Technology Group of the Universitat Pompeu Fabra (UPF).

Title: Deep learning for Music Informatics Research

Abstract: A brief review of the state of the art in music informatics research and deep learning reveals that such models have achieved competitive results for several tasks in a relatively short amount of time. Due to these promising results, some researchers declare that it is time for a paradigm shift: from hand-crafted features and shallow classifiers to deep processing models. In the past, introducing machine learning for global modeling (i.e., classification) resulted in a significant advance of the state of the art. Now, some researchers think that another advance could be made by using data-driven feature extractors based on deep learning instead of hand-crafted features. However, deep learning for music informatics research is still in its early days: current systems are based on solutions proposed for computer vision or speech. We will present our work describing how to adapt these technologies to the music case.


Student Projects

During the course week, master and bachelor students developed a practical project. Summary slides and source code are publicly available.

Master Students

Team Project Web Slides Repo
Team 1 Sentiment analysis of Movie Reviews   Slides Repo
Team 2 Smart text Web Slides Repo
Team 3 Sentiment analysis for IMDB database   Slides Repo
Team 4 (award) Text to Phonemes   Slides Repo
Team 5 Phonetic Transcription   Slides  

Bachelor Students

Team Slides Repo
Team 1 Slides Repo
Team 2 Slides Repo
Team 3 Slides Repo
Team 4 Slides Repo
Team 5 (award) Slides Repo


Photo album available from Google Photos.


When Tuesday 24 Wednesday 25 Thursday 26 Friday 27 Tuesday 31
10:00-10:20 Welcome DNN/DBN LM SpeakerId II Project Expo 1
10:20-10:40 Perceptron Recurrent I ASR Translation II Project Expo 2
10:40-11:00 Convolutional Recurrent II     Project Expo 3
11:00-11:20 Architectures I Embeddings SpeakerId I Joan Serrà Project Expo 4
11:20-11:40 Backpropagation Keras Translation I   Project Expo 5
11:40-12:00 Training Keras      
12:00-12:20 Keras Keras Synthesis I Synthesis II  
12:20-12:40 Keras Generative ASR II Multimodal Jordi Pons
12:40-13:00 Keras Architectures II      
13:00-14:00 Project (MSc) Project (MSc) Project (MSc) Project (MSc) Closing


  • Course on Piazza.
  • Course code: 230362 (PhD & master) / 230325 (bachelor)
  • ECTS credits: 2.5 (PhD & master) / 2 (bachelor) (corresponding to full-time dedication during the course week)
  • Teaching language: English
  • The course is offered for both master and bachelor students, but under two study programmes adapted to each profile.
  • Class Dates: 24, 25, 26, 27 and 31 January 2017 (there are no sessions on January 30).
  • Class Schedule: 4 hours a day, from 10am until 2pm (you will need 6 extra hours a day for homework during the course week).
  • Capacity: 15 MSc/PhD students + 15 BSc students
  • Location: Campus Nord UPC, Module D5, Room 010


If you have any general questions about the course, please use the public issues section of this repo. Otherwise, you can send an e-mail to Xavier Giro-i-Nieto.

Our Computer Vision Seminar

If you liked this seminar, you may want to check the Deep Learning for Computer Vision seminar we organised in 2016, as well as enrol in the new edition we are organising in 2017.