Deep Learning for Speech and Language
Winter Seminar UPC TelecomBCN (January 24-31, 2017)
The aim of this course is to train students in deep learning methods for speech and language. Recurrent Neural Networks (RNNs) will be presented and analyzed in detail to understand the potential of these state-of-the-art tools for time series processing. Engineering tips and scalability issues will be addressed to solve tasks such as machine translation, speech recognition, speech synthesis, and question answering. Hands-on sessions will provide development skills so that attendees can become competent in contemporary data analytics tools.
Course Instructors
Antonio Bonafonte (AB) | Jose Adrián Rodríguez Fonollosa (JA) | Marta Ruiz Costa-jussà (MR) | Javier Hernando (JH) | Santiago Pascual (SP) | Elisa Sayrol (ES) | Xavier Giro-i-Nieto (XG) |
Teaching assistants
Ahmad Mel | Carlos Escolano | Albert Aparicio | Manel Baradad |
Organizers
UPC ETSETB TelecomBCN | UPC TALP Group | UPC Image Processing Group | Universitat Politecnica de Catalunya (UPC) |
Sponsorship
Social event funded by Tradeheader. | |
Computational resources have been provided by the Amazon Web Services Educate Program. | |
Code repos and website provided by the Github Education Program. |
If your company or organization is interested in sponsoring this course, please contact Professor Antonio Bonafonte.
Slides and Videos
Topic | Speaker | Slideshare | YouTube |
---|---|---|---|
D1L2 The Perceptron | SP | Slides | Video |
D1L3 Convolutional Neural Networks | ES | Slides | |
D1L4 Basic Deep Architectures | XG | Slides | Video |
D1L5 Backpropagation | ES | Slides | Video |
D1L6 Training | ES | Slides | Video |
D2L1 Deep Belief Networks | ES | Slides | Video |
D2L2 Recurrent Neural Networks I | SP | Slides | Video |
D2L3 Recurrent Neural Networks II | SP | Slides | Video |
D2L4 Word Embeddings | AB | Slides | Video |
D2L5 Generative Adversarial Networks | SP | Slides | Video |
D2L6 Advanced Deep Architectures | XG | Slides | Video |
D3L1 Language Model | MR | Slides | Video |
D3L2 Speech Recognition I | JA | Slides | Video |
D3L3 Speaker Identification I | JH | Slides | Video |
D3L4 Neural Machine Translation I | MR | Slides | Video |
D3L5 Speech Synthesis I | AB | Slides | Video |
D3L6 Speech Recognition II | JA | Slides | Video |
D4L1 Speaker Identification II | JH | Slides | Video |
D4L2 Neural Machine Translation II | MR | Slides | Video |
D4L3 Speech Synthesis II:WaveNet | AB | Slides | Video |
D4L4 Multimodal Deep Learning | XG | Slides | Video |
D5 Music Data Processing | JP | Slides | Video |
Invited talks
This 2017 edition of the seminar will include two invited talks:
Joan Serrà, from Telefonica Research.
Title: Facts and myths about deep learning.
Abstract: Deep learning has revolutionized the traditional machine learning pipeline, with impressive results in domains such as computer vision, speech analysis, or natural language processing. The concept has gone beyond research/application environments, and permeated into the mass media, news blogs, job offers, startup investors, or big company executives’ meetings. But what is behind deep learning? Why has it become so mainstream? What can we expect from it? In this talk, I will highlight a number of facts and myths that will provide a shallow answer to the previous questions. While doing that, I will also highlight various applications we have worked on at our lab. Overall, the talk wants to place a series of basic concepts, while giving ground for reflection or discussion on the topic.
Jordi Pons from the Music Technology Group of the Universitat Pompeu Fabra (UPF)
Title: Deep learning for Music Informatics Research
Abstract: A brief review of the state of the art in music informatics research and deep learning reveals that such models achieved competitive results for several tasks in a relatively short amount of time. Due to these promising results, some researchers declare that it is time for a paradigm shift: from hand-crafted features and shallow classifiers to deep processing models. In the past, introducing machine learning for global modeling (i.e., classification) resulted in a significant state-of-the-art advance. And now, some researchers think that another advance could be made by using data-driven feature extractors based on deep learning instead of using hand-crafted features. However, deep learning for music informatics research is still in its early stages: current systems are based on solutions proposed for computer vision or speech. We will present our work describing how to adapt these technologies for the music case.
Student Projects
Master's and bachelor's students developed a practical project during the week of the course. Summary slides and source code are publicly available.
Master Students
Team | Project | Web | Slides | Repo |
---|---|---|---|---|
Team 1 | Sentiment analysis of Movie Reviews | | Slides | Repo |
Team 2 | Smart text | Web | Slides | Repo |
Team 3 | Sentiment analysis for IMDB database | | Slides | Repo |
Team 4 (award) | Text to Phonemes | | Slides | Repo |
Team 5 | Phonetic Transcription | | Slides | |
Bachelor Students
Team | Slides | Repo |
---|---|---|
Team 1 | Slides | Repo |
Team 2 | Slides | Repo |
Team 3 | Slides | Repo |
Team 4 | Slides | Repo |
Team 5 (award) | Slides | Repo |
Pics
Photo album available from Google Photos.
Schedule
When | Tuesday 24 | Wednesday 25 | Thursday 26 | Friday 27 | Tuesday 31 |
---|---|---|---|---|---|
10:00-10:20 | Welcome | DNN/DBN | LM | SpeakerId II | Project Expo 1 |
10:20-10:40 | Perceptron | Recurrent I | ASR | Translation II | Project Expo 2 |
10:40-11:00 | Convolutional | Recurrent II | | | Project Expo 3 |
11:00-11:20 | Architectures I | Embeddings | SpeakerID | Joan Serrà | Project Expo 4 |
11:20-11:40 | Backpropagation | Keras | Translation I | | Project Expo 5 |
11:40-12:00 | Training | Keras | | | |
12:00-12:20 | Keras | Keras | Synthesis I | Synthesis II | |
12:20-12:40 | Keras | Generative | ASR II | Multimodal | Jordi Pons |
12:40-13:00 | Keras | Architectures II | | | |
13:00-14:00 | Project (MSc) | Project (MSc) | Project (MSc) | Project (MSc) | Closing |
Practical
- Course on Piazza.
- Course code: 230362 (PhD & master) / 230325 (bachelor)
- ECTS credits: 2.5 (PhD & master) / 2 (bachelor) (corresponds to full-time dedication during the course week)
- Teaching language: English
- The course is offered for both master and bachelor students, but under two study programmes adapted to each profile.
- Class Dates: 24, 25, 26, 27 and 31 January 2017 (there are no sessions on January 30).
- Class Schedule: 4 hours a day, from 10am until 2pm (you will need 6 extra hours a day for homework during the course week).
- Capacity: 15 MSc/PhD students + 15 BSc students
- Location: Campus Nord UPC, Module D5, Room 010
Contact
If you have any general questions about the course, please use the public issues section of this repo. Otherwise, you can send an e-mail to Xavier Giro-i-Nieto.
Our Computer Vision Seminar
If you liked this seminar, you may want to check out the Deep Learning for Computer Vision seminar we organized in 2016, as well as enrol in the new edition we are organizing in 2017.
Related courses
- Phil Blunsom et al, “Oxford Deep NLP 2017 course”. Oxford University 2017. [videos]
- Richard Socher, “CS224d: Deep Learning for Natural Language Processing”. Stanford University 2016.
- Thang Luong, Kyunghyun Cho, Christopher Manning, “Neural Machine Translation”. Tutorial ACL 2016.
- Aaron Courville and Yoshua Bengio, “Deep Learning Summer School”. Montreal 2016.
- Hugo Larochelle, “Neural Networks”. Université de Sherbrooke.
- Joan Bruna, “Stats212b: Topics on Deep Learning”. UC Berkeley. Spring 2016.
- Yann LeCun, “Deep Learning: Nine Lectures at Collège de France”. Collège de France, Spring 2016. [Facebook page]
- Dhruv Batra, “ECE 6504: Deep learning for perception”. Virginia Tech, Fall 2015.
- Vincent Vanhoucke, Arpan Chakraborty, “Deep Learning”. Google 2016.
- Jeremy Howard, “Practical Deep Learning for Coders”. Fast AI 2016.
- Deep Learning TV on YouTube, Facebook and Twitter.