Deep Learning for Speech and Language

Winter Seminar UPC TelecomBCN (January 24-31, 2017)

The aim of this course is to train students in methods of deep learning for speech and language. Recurrent Neural Networks (RNNs) will be presented and analyzed in detail to understand the potential of these state-of-the-art tools for time series processing. Engineering tips and scalability issues will be addressed to solve tasks such as machine translation, speech recognition, speech synthesis or question answering. Hands-on sessions will provide development skills so that attendees can become competent in contemporary data analytics tools.

Course Instructors


Antonio Bonafonte (AB)	Jose Adrián Rodríguez Fonollosa (JA)	Marta Ruiz Costa-jussà (MR)	Javier Hernando (JH)	Santiago Pascual (SP)	Elisa Sayrol (ES)	Xavier Giro-i-Nieto (XG)

Teaching assistants


Ahmad Mel	Carlos Escolano	Albert Aparicio	Manel Baradad

Organizers


UPC ETSETB TelecomBCN	UPC TALP Group	UPC Image Processing Group	Universitat Politècnica de Catalunya (UPC)

Sponsorship


Social event funded by Tradeheader.
Computational resources have been provided by the Amazon Web Services Educate Program.
Code repositories and website provided by the GitHub Education Program.

If your company or organization is interested in sponsoring this course, please contact Professor Antonio Bonafonte.

Slides and Videos

Topic	Speaker	Slideshare	YouTube
D1L2 The Perceptron	SP	Slides	Video
D1L3 Convolutional Neural Networks	ES	Slides
D1L4 Basic Deep Architectures	XG	Slides	Video
D1L5 Backpropagation	ES	Slides	Video
D1L6 Training	ES	Slides	Video
D2L1 Deep Belief Networks	ES	Slides	Video
D2L2 Recurrent Neural Networks I	SP	Slides	Video
D2L3 Recurrent Neural Networks II	SP	Slides	Video
D2L4 Word Embeddings	AB	Slides	Video
D2L5 Generative Adversarial Networks	SP	Slides	Video
D2L6 Advanced Deep Architectures	XG	Slides	Video
D3L1 Language Model	MR	Slides	Video
D3L2 Speech Recognition I	AR	Slides	Video
D3L3 Speaker Identification I	JH	Slides	Video
D3L4 Neural Machine Translation I	MR	Slides	Video
D3L5 Speech Synthesis I	AB	Slides	Video
D3L6 Speech Recognition II	JA	Slides	Video
D4L1 Speaker Identification II	JH	Slides	Video
D4L2 Neural Machine Translation II	MR	Slides	Video
D4L3 Speech Synthesis II: WaveNet	AB	Slides	Video
D4L4 Multimodal Deep Learning	XG	Slides	Video
D5 Music Data Processing	JP	Slides	Video

Invited talks

This 2017 edition of the seminar will include two invited talks:

Joan Serrà, from Telefónica Research.

Title: Facts and myths about deep learning.

Abstract: Deep learning has revolutionized the traditional machine learning pipeline, with impressive results in domains such as computer vision, speech analysis, or natural language processing. The concept has gone beyond research/application environments, and permeated into the mass media, news blogs, job offers, startup investors, or big company executives’ meetings. But what is behind deep learning? Why has it become so mainstream? What can we expect from it? In this talk, I will highlight a number of facts and myths that will provide a shallow answer to these questions. Along the way, I will also present various applications we have worked on at our lab. Overall, the talk aims to lay out a series of basic concepts while providing ground for reflection and discussion on the topic.

Jordi Pons, from the Music Technology Group of the Universitat Pompeu Fabra (UPF).

Title: Deep learning for Music Informatics Research

Abstract: A brief review of the state of the art in music informatics research and deep learning reveals that such models have achieved competitive results for several tasks in a relatively short amount of time. Due to these promising results, some researchers declare that it is time for a paradigm shift: from hand-crafted features and shallow classifiers to deep processing models. In the past, introducing machine learning for global modeling (i.e., classification) resulted in a significant state-of-the-art advance. Now, some researchers think that another advance could be made by using data-driven feature extractors based on deep learning instead of hand-crafted features. However, deep learning for music informatics research is still in its early days: current systems are based on solutions proposed for computer vision or speech. We will present our work describing how to adapt these technologies to the music case.

[Slides]

Student Projects

During the course week, master and bachelor students developed practical projects. Summary slides and source code are publicly available.

Master Students

Team	Project	Web	Slides	Repo
Team 1	Sentiment analysis of Movie Reviews		Slides	Repo
Team 2	Smart text	Web	Slides	Repo
Team 3	Sentiment analysis for IMDB database		Slides	Repo
Team 4 (award)	Text to Phonemes		Slides	Repo
Team 5	Phonetic Transcription		Slides

Bachelor Students

Team	Slides	Repo
Team 1	Slides	Repo
Team 2	Slides	Repo
Team 3	Slides	Repo
Team 4	Slides	Repo
Team 5 (award)	Slides	Repo

Pics

Photo album available from Google Photos.

Schedule

When	Tuesday 24	Wednesday 25	Thursday 26	Friday 27	Tuesday 31
10:00-10:20	Welcome	DNN/DBN	LM	SpeakerID II	Project Expo 1
10:20-10:40	Perceptron	Recurrent I	ASR	Translation II	Project Expo 2
10:40-11:00	Convolutional	Recurrent II			Project Expo 3
11:00-11:20	Architectures I	Embeddings	SpeakerID	Joan Serrà	Project Expo 4
11:20-11:40	Backpropagation	Keras	Translation I		Project Expo 5
11:40-12:00	Training	Keras
12:00-12:20	Keras	Keras	Synthesis I	Synthesis II
12:20-12:40	Keras	Generative	ASR II	Multimodal	Jordi Pons
12:40-13:00	Keras	Architectures II
13:00-14:00	Project (MSc)	Project (MSc)	Project (MSc)	Project (MSc)	Closing

Practical

Course on Piazza.
Course code: 230362 (PhD & Master) / 230325 (Bachelor)
ECTS credits: 2.5 (PhD & Master) / 2 (Bachelor), corresponding to full-time dedication during the course week
Teaching language: English
The course is offered to both master and bachelor students, under two study programmes adapted to each profile.
Class Dates: 24, 25, 26, 27 and 31 January 2017 (there are no sessions on January 30).
Class Schedule: 4 hours a day, from 10am until 2pm (you will need 6 extra hours a day for homework during the course week).
Capacity: 15 MSc/PhD students + 15 BSc students
Location: Campus Nord UPC, Module D5, Room 010

Contact

If you have any general questions about the course, please use the public issues section of this repo. Otherwise, you can send an e-mail to Xavier Giro-i-Nieto.

Our Computer Vision Seminar

If you liked this seminar, you may want to check out the Deep Learning for Computer Vision seminar we organised in 2016, as well as enrol in the new edition we are organising in 2017.

Related courses

Phil Blunsom et al., “Oxford Deep NLP 2017 course”. Oxford University 2017. [videos]
Richard Socher, “CS224d: Deep Learning for Natural Language Processing”. Stanford University 2016.
Thang Luong, Kyunghyun Cho, Christopher Manning, “Neural Machine Translation”. Tutorial, ACL 2016.
Aaron Courville and Yoshua Bengio, “Deep Learning Summer School”. Montreal 2016.
Hugo Larochelle, “Neural Networks”. Université de Sherbrooke.
Joan Bruna, “Stats212b: Topics on Deep Learning”. UC Berkeley, Spring 2016.
Yann LeCun, “Deep Learning: Nine Lectures at Collège de France”. Collège de France, Spring 2016. [Facebook page]
Dhruv Batra, “ECE 6504: Deep learning for perception”. Virginia Tech, Fall 2015.
Vincent Vanhoucke, Arpan Chakraborty, “Deep Learning”. Google 2016.
Jeremy Howard, “Practical Deep Learning for Coders”. fast.ai 2016.
Deep Learning TV on YouTube, Facebook and Twitter.