Linguistics 251: Probabilistic Methods in Linguistics (Fall 2012)

1 Instructor info

Instructor: Roger Levy (rlevy@ucsd.edu)
Office: Applied Physics & Math (AP&M) 4220
Office hours: Wednesdays and Thursdays 10am-11am (subject to change)
Class Time: TuTh 2:00-3:50pm (in general, Tuesdays 3-3:50pm will be practicum times for the first part of the quarter)
Class Location: AP&M 4301
Class webpage: http://grammar.ucsd.edu/courses/lign251/

2 Course Description

This course is about probabilistic approaches to language knowledge, acquisition, and use. Today, studying language from a probabilistic perspective requires mastery of the fundamentals of probability and statistics, as well as familiarity with more recent developments in probabilistic modeling. In this course we'll move quickly through basic probability theory, then cover fundamental ideas in statistics: parameter estimation and hypothesis testing. We'll then cover a fundamental class of probabilistic models, the linear model, which as a side effect will familiarize you with the most widely used tools in statistics: linear regression, analysis of variance (ANOVA), and generalized linear models (including logistic regression). We'll cover these topics using both frequentist methods (what you need to use in order to write publishable data analyses) and Bayesian methods (which are becoming increasingly popular in all sorts of settings, especially in cognitive modeling of language). We'll then move on to the more advanced topic of hierarchical (a.k.a. multilevel or mixed-effects) modeling, and perhaps even a bit of probabilistic grammars if we have a chance.

The course will involve a hands-on approach to data and modeling, and we'll be using the open-source R programming language (and a bit of JAGS, which interfaces nicely with R, for Bayesian modeling). You'll learn the basics of data visualization and statistical analysis in R, and the class will involve periodic programming practica to ensure that your R programming questions are adequately addressed. Transcripts of programming practica will also be put up online. I encourage you to download R here as soon as you can, get it running on your own computer, and go through the R tutorial found in Chapter 1 of Harald Baayen's book, or this hands-on introduction to R. You can also download JAGS here.

3 Target audience

The course assumes no expertise in linguistics, quantitative methods, or programming, but background in one or more of these areas will be useful. We'll start from elementary probability theory and build up briskly. We will make a fair amount of use of high school algebra, and also a bit of calculus and linear algebra; there's an appendix in the book that provides you with what you need for the latter two.

4 Reading material

The main reading material will be draft chapters of a textbook-in-progress, Probabilistic Models in the Study of Language, that I am writing. These draft chapters can be found here. There are also a number of other reference texts that may be of use in the course, including:

Finally, we may supplement these with additional readings, both from statistics/NLP texts and pertinent linguistics articles.

5 Syllabus

| Week | Day | Topic | Reading | Materials | R practicum? | Homework Assignments |
|---+---+---+---+---+---+---|
| Week 0 | 27 Sep | Introduction and motivating material; Fundamentals, conditional probability, Bayes' rule, discrete random variables | Chapter 2.1-2.5 | Intro/Motivation Slides; Lect. 1 slides | | Homework 1 |
| Week 1 | 2 Oct | Continuous random variables; the uniform distribution; expectation and variance; the normal distribution | Chapter 2.6-2.10 | Lecture 2 | Yes! Transcript | Homework 2 |
| | 4 Oct | Estimating probability densities | Chapter 2.11 | Lecture 3 | | Homework 3; Peterson & Barney dataset |
| Week 2 | 9 Oct | Joint probability distributions; marginalization; introduction to graphical models | Chapter 3.1-3.2, Appendix C.1-C.2, 3.3.1, 3.4 | Lecture 4 | Yes! Transcript | |
| | 11 Oct | Covariance, correlation, linearity of expectation; the binomial distribution | Chapter 4.1-4.3 | Lecture 5 | Ad-hoc practicum transcript | |
| Week 3 | 16 Oct | Intro. parameter estimation; consistency, bias, variance; max. likelihood; Bayesian parameter & density estimation | Chapter 4.4-4.5 | Lecture 6 | | Homework 4 |
| | 18 Oct | Bayesian confidence intervals and hypothesis testing | Chapter 5.1-5.2 | Lecture 7 | Yes! Transcript; Raw R code | |
| Week 4 | 23 Oct | Bayesian confidence intervals and hypothesis testing II | Chapter 5.2 | | | Homework 5; spillover word rts file |
| | 25 Oct | Frequentist confidence intervals and hypothesis testing | Chapter 5.3-5.4 | | Yes! Transcript; Raw R code | |
| Week 5 | 30 Oct | Intro to generalized linear models: linear models (incl. covariance, correlation, multivariate normal distribution) | Chapter 6.1-6.2 | Lecture 10 | | |
| | 1 Nov | Linear models II | Chapter 6.3-6.5 | Lecture 11 | Yes! Transcript; Raw R code | |
| Week 6 | 6 Nov | Linear models III | Chapter 6.6 | Lecture 12 | Yes! Raw R code; Norms dataset | Homework 6; elp.txt; ELP readme |
| | 8 Nov | Finish up linear models; logistic regression I | Chapter 6.7 | Lecture 13 | Yes! Raw R code | |
| Week 7 | 13 Nov | Roger out of town, no class | | | | |
| | 15 Nov | Logistic regression II | Chapter 6.8-6.9 | Lecture 14 | | |
| Week 8 | 20 Nov | Hierarchical models I | Chapter 8.1-8.2 | Lecture 15 | Yes! Raw R code | Homework 7 |
| | 22 Nov | Thanksgiving, no class | | | Yes! | |
| Week 9 | 27 Nov | Hierarchical models II | Chapter 8.3 | Lecture 16 | Yes! Files for practicum | |
| | 29 Nov | Hierarchical models III | Chapter 8.4 | Lecture 17 | Yes! Files for practicum | |
| Week 10 | 4 Dec | Estimating n-gram language models | Chapter 4.6 (to appear) | Lecture 18 | | |
| | 6 Dec | Probabilistic grammars | Chapter 10 | Lecture 19 | | |
| Finals | 11 Dec | Final projects due! | | | | |

6 Requirements

If you are taking the course for credit, there are four things expected of you:

  1. Regular attendance in class.
  2. Doing the assigned readings and coming ready to discuss them in class.
  3. Doing several homework assignments to be assigned throughout the quarter. Email submission of the homework assignments is encouraged, but please send it to lign251-homework@ling.ucsd.edu instead of to me directly. If you send it to me directly I may lose track of it.

    You can find some guidelines on writing good homework assignments here. The source file to this PDF is here.

  4. A final project which will involve computational modeling and/or data analysis in some area relevant to the course. Final project guidelines are here.

7 Mailing List

There is a mailing list for this class, lign251-l@mailman.ucsd.edu. Please make sure you're subscribed to it by filling out the form at https://mailman.ucsd.edu/mailman/listinfo/lign251-l. We'll use it to communicate with each other.

8 Programming help

For this class I'll be maintaining an FAQ. Read the FAQ here.

I also run the R-lang mailing list. I suggest that you subscribe to it; it's a low-traffic list and is a good clearinghouse for technical and conceptual issues that arise in the statistical analysis of language data.

In addition, the searchable R mailing lists are likely to be useful.

Date: 2012-12-10 17:51:55 PST

Author: Roger Levy
