Lecture Times | TuTh 5:30-7pm |
Lecture Location | AP&M 2452 |
Class webpage | http://grammar.ucsd.edu/courses/lign256/ |
Instructor | Roger Levy (rlevy@ling.ucsd.edu) |
Instructor's office | AP&M 4220 |
Instructor's office hours | Th 3:15-5:15pm |
The goal of this course is to train you, the students, to do research in natural language processing — work that can potentially be published in the leading conferences and journals of the field. In addition to helping you succeed academically in this field (and related fields including AI, machine learning, and psycholinguistics), this is also great training if you are interested in doing NLP work in industry, either in a research lab (Google, Microsoft, Powerset, Yahoo, etc.) or in a startup.
This course is intended for graduate students in linguistics, computer science, engineering, cognitive science, psychology, and any other discipline who are interested in how to process natural language by computer. Highly motivated undergraduates are also welcome, but please talk to the instructor before enrolling.
We're going to be using the two premier textbooks in the field for this course:
There will also be at least one reading from the following new textbook:
Finally, we may occasionally read recent papers from the literature, and I have some alpha-version book chapters that we may also make use of.
Working in natural language processing requires putting several different types of skills together:
You may not have all of these skills yet, but hopefully you have a substantial subset of them. You may need to do a bit of extra work to strengthen your background in any area where you're deficient; the focus within the class will be on how to put these skills together.
The mailing list for the class is ligncse256@ling.ucsd.edu. Sign up for the mailing list here.
Week | Day | Topic | Readings | Materials | Homework Assignments |
---|---|---|---|---|---|
Week 1 | 6 Jan | Class Introduction | M&S Chapters 1, 2 | Lecture 1 | |
Week 1 | 8 Jan | No class, Roger out of town | | | |
Week 2 | 13 Jan | Language Modeling I | M&S Chapter 6, J&M Chapter 4 | Lecture 2 | Programming Assignment 1 (due 27 Jan) |
Week 2 | 15 Jan | Language Modeling II | Chen and Goodman 1998 (an absolute classic) | Lecture 3; Kneser-Ney mini-example | |
Week 3 | 20 Jan | Text Categorization: supervised methods | MRS 2008, Chapter 13 | Lecture 4 | |
Week 3 | 22 Jan | Unsupervised Learning I: topic models for text categorization | Griffiths & Steyvers, 2004 | Lecture 5 | |
Week 4 | 27 Jan | Unsupervised Learning II: Word segmentation | Goldwater et al., 2006 | Lecture 6 | Written assignment 1; Mini example of unsupervised word segmentation |
Week 4 | 29 Jan | Formalisms: weighted finite state automata & context-free grammars | J&M Chapter 3; Levy, 2008 | Final project guidelines go out; Short intro to directed graphical models | |
Week 5 | 3 Feb | Part-of-speech Tagging | M&S Chapter 9, J&M Chapter 6 | Lecture 8 | Programming Assignment 2 (due 19 Feb); HMM Viterbi inference mini-example |
Week 5 | 5 Feb | Syntax | M&S Chapter 10, J&M Chapter 12 | Lecture 9 | |
Week 6 | 10 Feb | Parsing I | J&M Chapter 13, M&S Chapter 11 | Lecture 10 | |
Week 6 | 12 Feb | Parsing II | J&M Chapter 14, M&S Chapter 12 | Lecture 11 | |
Week 7 | 17 Feb | Computational Psycholinguistics and Incremental Parsing | Hale 2001; Levy et al., 2009 | Lecture 12 PDF component; Lecture 12 PPT component | |
Week 7 | 19 Feb | Word-sense disambiguation and semantic roles | M&S Chapter 7, J&M Chapters 19 & 20 | Lecture 13 | You should show me a draft final project proposal by this point |
Week 8 | 24 Feb | Compositional Semantics | J&M Chapter 18, handout | Lecture 14 | Short written homework assignment to be handed out |
Week 8 | 26 Feb | Discourse Processing | J&M Chapter 21 | Lecture 15 | |
Week 9 | 3 Mar | Unsupervised Learning III: POS induction, morphology | Clark 2000, Goldsmith 2001, Goldwater & Griffiths, 2007 | Lecture 16 | |
Week 9 | 5 Mar | Unsupervised Learning IV: Syntactic acquisition | Klein & Manning 2002, 2004; Johnson et al., 2007 | Lecture 17 | |
Week 10 | 10 Mar | Finish grammar induction, talk a bit about machine translation | | Lecture 18 | |
Week 10 | 12 Mar | No class, Roger out of town | | | |
Finals | 20 Mar | Final Projects Due | | | |
Your grade will be based on the following criteria:
Collaboration is encouraged for homework assignments and final projects, but you must be explicit about who you collaborated with and what the division of labor was.
You have seven late days to use on your assignments, at your discretion. No more than five days can be used per assignment. After those days are used up, you lose 10% of your grade for that assignment per day late. I reserve the right to increase the number of late days if that seems appropriate (it can be challenging to correctly assess the difficulty and length of an assignment), but don't count on it!
Computational linguistics/NLP is a very conference-oriented field; many of the classic articles in the literature never wind up getting published in journals. The top conferences include:
There are also some excellent workshops and conferences run regularly by "special interest groups", including
and others. Finally, excellent work in computational linguistics/NLP also appears in machine learning, artificial intelligence, and other conferences, notably the Conference on Neural Information Processing Systems (NIPS).
The flagship and leading journal of the field is Computational Linguistics. Other excellent journals in the field include: