Syllabus

Students in Stat 131A are expected to have read the syllabus in its entirety by the second week of the course.

Course Details

Description

Stat 131A is an upper-division course that follows Data 8 or STAT 20. The course will teach a broad range of statistical methods that are used to solve data problems, including group comparisons, standard parametric statistical models, multivariate data visualization, multiple linear regression and classification, classification and regression trees, and random forests. Students will be introduced to the widely used R statistical language and will obtain hands-on experience in implementing a range of statistical methods on numerous real-world datasets.

In short, Stat 131A will provide you with a Swiss army knife of foundational statistical methods to use for data science projects!

Lectures

MWF 2-3pm @ Morgan 101

Lecture attendance is mandatory. See below for more details.

Labs

Lab 101: Tuesday and Thursday, 11am-12pm @ Evans 330

Lab 102: Tuesday and Thursday, 4-5pm @ Hearst Mining Bldg 310 (Hearst is next to Evans Hall)

You should only be signed up for one lab group.

Lab attendance is optional, but highly encouraged.

Office hours (OH)

Josh’s Office Hours:

  • MWF 1:45pm in lobby outside Morgan 101 (before lecture)
  • T+Th 7-8pm @ Evans 334

Van’s Office Hours:

  • Wednesdays 10am-12pm @ Evans 434
  • Fridays 9am-11am on Zoom
  • For security reasons, Zoom links for OH are posted on bcourses.

15-minute coffee chats with Josh on the phone (experimental!):

  • One slot every weekday at 10am.
  • Designed for individual advising, not logistical concerns or coursework help.
  • For example, we can talk about career plans or life advice.
  • Please limit to no more than one chat per month.
  • Book at this link

We may add or reschedule OH if needed.

Coming to office hours does not send a signal that you are behind or need extra help. In fact, the students who come to OH are often the most successful in the course.

  • OH is a great opportunity to discuss not only topics directly related to the course, but also anything else that’s on your mind.
  • We also welcome questions about career trajectories and research opportunities at UC Berkeley and beyond.
  • Keep in mind that you do not need to come to office hours with an agenda. Listening in is welcomed and encouraged!
  • Finally, attending and participating in office hours is a great way to set yourself up for a terrific letter of recommendation. This is true for most courses!
  • If you don’t already, I highly recommend that you attend the instructors’ office hours in other classes from time to time.

Study groups

We encourage you to work together in groups to solidify your understanding of the course material.

If you would like assistance forming a study group, please complete the study group form by Monday, September 2nd at 11:59pm PT.

Our goal is to form the study groups ASAP, so students can begin discussing the first homework assignment.

Concurrent enrollment and auditing

Concurrent enrollment students wishing to register for the class should fill out this Google Form to give me information about their previous coursework so we can assess whether they have satisfied the pre-requisites.

UPDATE: Stat 131A will not be enrolling concurrent enrollment students in Fall 2024. However, concurrent enrollment students are welcome to audit the course.

Students who wish to audit the course can also fill out the top portion of this form to get their email added to bcourses as a guest. Only name and email are needed for auditors.

Course platforms

bcourses will only be used for secure course material, like exam solutions, grades, and office hour Zoom links.

All other course materials will be posted on the public course homepage.

Assignments should be submitted via Gradescope.

All course communication will take place via Ed.

The only acceptable large language model (LLM) for use in Stat 131A is PingPong.

  • Unless otherwise indicated, all other LLMs (e.g., ChatGPT) are prohibited and considered cheating in this course.
  • Enrolled students will receive an invitation to PingPong via email shortly after the semester begins.

Grades

Grades are calculated as follows:

  • Lecture attendance and participation: 10%
  • Labs: 10%
  • Homework: 20%
  • Final project: 15%
  • Midterm 1 (during class): 10%
  • Midterm 2 (during class): 15%
  • Final: 20%

Grades will not be curved.

  • In other words, there is no limit to the proportion of students with an A, B, etc. You are incentivized to help each other learn and succeed.
  • You are guaranteed an A if you score 93% or higher, an A- if 90% or higher, a B+ if 87% or higher, a B if 83% or higher, and so on.
  • A+ grades are awarded rarely, and only for truly exceptional performance.
  • Grade cutoffs may be adjusted downward at the end of the semester, but this is not guaranteed.
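
As a rough illustration of the arithmetic above, here is a minimal R sketch that combines category scores using the listed weights and looks up the guaranteed letter grade. The helper names (course_percent, guaranteed_letter) and the example scores are made up for illustration. Only the four cutoffs stated above are encoded, since the syllabus does not list the lower ones, and the sketch ignores extra credit and any end-of-semester adjustment of cutoffs.

```r
# Category weights as listed in the syllabus (they sum to 1).
weights <- c(
  attendance    = 0.10,
  labs          = 0.10,
  homework      = 0.20,
  final_project = 0.15,
  midterm1      = 0.10,
  midterm2      = 0.15,
  final_exam    = 0.20
)

# Hypothetical helper: weighted course percentage from a named vector of
# category scores, each on a 0-100 scale.
course_percent <- function(scores) {
  stopifnot(setequal(names(scores), names(weights)))
  sum(weights * scores[names(weights)])
}

# Hypothetical helper: guaranteed letter grade for a course percentage.
# Cutoffs below B are not stated in the syllabus, so they are not guessed here.
guaranteed_letter <- function(pct) {
  if (pct >= 93) {
    "A"
  } else if (pct >= 90) {
    "A-"
  } else if (pct >= 87) {
    "B+"
  } else if (pct >= 83) {
    "B"
  } else {
    "below B (cutoff not listed)"
  }
}

# Made-up example scores, not real data.
example_scores <- c(
  attendance = 100, labs = 100, homework = 90, final_project = 85,
  midterm1 = 80, midterm2 = 88, final_exam = 92
)

pct <- course_percent(example_scores)
print(pct)
print(guaranteed_letter(pct))
```

Because the final exam and homework carry the largest weights, a weak first midterm (10%) moves the total much less than the same gap on the final (20%).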

See attendance policy below for an opportunity to earn up to two percentage points of extra credit.

Lecture technology policy

Most lectures will consist of an interactive problem-solving session, followed by a hands-on demo or coding session.

  • Laptops and tablets with attached keyboards are not allowed during the problem-solving session, though you are permitted to use a tablet to take handwritten notes.

  • If you need to use technology for accessibility reasons, the previous bullet does not apply to you.

  • Laptop use is permitted (and encouraged!) during the hands-on demo and coding sessions.

  • Phones are allowed during lecture. It is preferable to use a phone to submit conceptual questions and neighbor discussion answers during lecture.

  • This article explains why we have the laptop policy. Long story short, laptop use can negatively impact the learning of nearby students (i.e., this policy does not punish you; the policy prevents you from punishing others).

  • The course staff reserves the right to reduce your lecture attendance grade for violating the technology policy.

Lecture recordings

Lectures will be recorded automatically.

  • The course staff cannot guarantee audio or video quality.
  • Lecture recordings are posted on bcourses.

Labs and office hours are not recorded.

The homework assignments may occasionally ask you to watch additional recordings to supplement the lecture material (e.g., if we run out of time covering an essential topic).

Attendance and participation

In-person lecture attendance is mandatory.

  • It is critically important to practice learning in a live setting.
  • Difficulty with paying attention in live meetings is a common hurdle for new grads.

Lecture attendance is a substantial component of your grade.

  • Lecture cannot be attended remotely.
  • You are allowed three unexcused lecture absences. Each additional absence will impact your lecture attendance grade.

If you cannot attend a lecture due to an extenuating circumstance, please complete the lecture attendance excusal form before the lecture starts.

  • This form can be completed months, weeks, or days in advance of lecture.

Acceptable extenuating circumstances include:

  • Illness. DO NOT come to class if you are sick! Even a sniffle!
  • Personal emergencies.
  • Important life events (e.g., weddings).
  • Pre-planned collegiate athletic events in which you are a participant.
  • This list is not exhaustive. If you think an absence should be excused, complete this form and explain your reasoning. We cannot guarantee that your absence will be excused, but we will be reasonable.

Concept checks

We will use in-class concept checks and neighbor discussions to track attendance.

  • Concept checks are not graded.
  • Concept checks are answered via this form.
  • Submitting a concept check outside of standard lecture time is considered cheating and an honor code violation. We will use your seat number and submission time to validate that your responses were entered during lecture time. We reserve the right to photograph the lecture hall to verify attendance.

Neighbor discussions

In addition to concept checks, there may be one or more neighbor discussions during each lecture.

  • Neighbor discussion answers are submitted via this form.
  • Neighbor discussion answers are not graded.

To encourage discussion among all classmates, we will award up to two percentage points of extra credit for having a variety of neighbors.

  • The students with the highest number of unique neighbors will receive the full two percentage points of extra credit.
  • Everyone else will receive, at the minimum, a fraction of extra credit proportional to their number of unique neighbors.
  • For example, if you sit next to the same person all semester, you can receive full participation credit for neighbor discussions, but you will receive very little extra credit.
  • The extra credit policy will only take effect if at least one student has spoken to at least 20 unique neighbors over the course of the semester.
  • As above, submitting a neighbor discussion answer outside of standard lecture time is considered cheating and an honor code violation.
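
One way to read the extra credit rule is sketched below in R. The function name extra_credit and the neighbor counts are hypothetical, and the sketch assumes that "proportional" means proportional to the highest count in the class. The syllabus describes the proportional amount as a minimum, so the actual credit awarded could be higher.

```r
# Hypothetical helper: extra credit (in percentage points) for each student,
# given their number of unique neighbors over the semester.
# Assumption: credit scales linearly relative to the top count in the class.
extra_credit <- function(unique_neighbors, max_points = 2, threshold = 20) {
  top <- max(unique_neighbors)
  # The policy only takes effect if at least one student reached the threshold.
  if (top < threshold) {
    return(rep(0, length(unique_neighbors)))
  }
  max_points * unique_neighbors / top
}

# Made-up counts for three students.
print(extra_credit(c(25, 10, 1)))

# Nobody reached 20 unique neighbors, so no extra credit is awarded.
print(extra_credit(c(19, 5)))
```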

Homework

There are 6 homework assignments planned, though the exact number may change.

  • Homework will be a combination of computational exercises and data analysis using the computer, as well as conceptual questions.
  • Homework assignments are weighted equally.

HW is generally due every other week.

  • Homework assignments will be posted to the course website at least one week before the HW deadline.
  • All homework assignments will be submitted via Gradescope and are due by 11:59 pm of the due date.

We will not drop your lowest-scoring homework assignment.

  • Instead, we will raise your lowest homework score to 80% of its maximum score, regardless of what you actually scored.
  • We will not change the grade of your lowest-scoring homework assignment if its score is above 80%.
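
The adjustment can be sketched in a few lines of R. The helper name adjust_lowest_hw and the scores are made up for illustration, and scores are assumed to be expressed as a percentage of each assignment's maximum.

```r
# Hypothetical helper: raise the single lowest homework score to 80 if it is
# below 80; otherwise leave every score unchanged.
adjust_lowest_hw <- function(hw) {
  lowest <- which.min(hw)            # index of the (first) lowest score
  hw[lowest] <- max(hw[lowest], 80)  # only ever raises, never lowers
  hw
}

hw <- c(95, 60, 88, 100, 72, 91)     # made-up scores for six assignments
adjusted <- adjust_lowest_hw(hw)
print(adjusted)

# Assignments are weighted equally, so the homework grade is a plain mean.
print(mean(adjusted))
```

Note that only one score is adjusted: in this example the 60 becomes 80, while the 72 is left as it is.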

Poorly organized assignments will be docked points at the discretion of the grader.

  • It is critical to have empathy for the person who will be reviewing your work, whether a member of the course staff, another student providing feedback, or your future manager.

Late HW

You are allotted five slip days for labs and homework assignments.

  • Each slip day adds 24 hours to the deadline.
  • Slip days cannot be used on the final project or exams.
  • Slip days are intended to account for unexpected delays, like minor illness or homework overload.
  • There is no extra credit awarded for unused slip days.
  • You cannot use partial slip days.

You are allowed to use, at most, two slip days per assignment.

  • In other words, assignments will not be accepted more than 48 hours after the original due date.
  • This policy ensures that we can grade all assignments in a timely fashion.
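
The slip day rules above can be sketched in R as follows. The helper name slip_days_needed and the lateness values are hypothetical. The syllabus does not say what happens once all five slip days are used, so this sketch only counts usage and does not model that case.

```r
# Hypothetical helper: whole slip days needed for a submission that arrives
# `hours_late` hours after the original deadline. Partial slip days are not
# allowed, so any fraction of a day rounds up.
slip_days_needed <- function(hours_late) {
  ifelse(hours_late <= 0, 0, ceiling(hours_late / 24))
}

max_per_assignment <- 2  # from the syllabus
total_budget <- 5        # from the syllabus

# Made-up lateness (in hours) for four lab and homework submissions.
hours_late <- c(hw1 = 0, lab3 = 5, hw2 = 30, lab7 = 49)

needed <- slip_days_needed(hours_late)

# Anything more than 48 hours late would need three slip days and is not accepted.
accepted <- needed <= max_per_assignment

used <- sum(needed[accepted])
remaining <- total_budget - used

print(needed)
print(accepted)
print(remaining)
```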

If you plan to use slip days, do not contact the course staff.

  • We will automatically account for slip days when calculating grades.

Extensions will only be granted if required by a Letter of Accommodation (LoA), or in extraordinary circumstances (e.g., medical emergencies).

Labs

During lab sessions, a GSI will review conceptual material and help you work through lab coding assignments.

  • Lab sections meet twice a week.
  • There are no lab sections the first week of classes.

We plan to have 12 lab assignments.

  • Each assignment will teach you how to perform the analyses shown in class using R.
  • Labs are intended to be finished or mostly finished during section.
  • HW assignments may build on the exercises covered in lab.

Lab assignments are generally due on Mondays at 11:59 pm and should be submitted via Gradescope.

  • Labs are graded on completion, not correctness.

While there is no Week 1 lab, there is a self-paced Lab 0 for you to work through independently. This lab is not graded.

  • Before the first lab section on Tuesday of Week 2, work through Lab 0.

  • You do not have to turn in this lab, but we will assume that you have worked through this lab and understand the code.

  • If you have questions about this lab, please come to office hours or post on Ed.

Final project

The final project will be due on the last day of reading week, Friday December 13.

More details on the final project will be provided later in the semester.

Exams

The first midterm is scheduled for Wednesday October 9 and will take place during lecture in Morgan 101.

The second midterm is scheduled for Wednesday November 13 and will take place during lecture in Morgan 101.

The final exam will take place Thursday December 19, 3-6pm (scheduled by the registrar). Location TBD.

All exams are cumulative, with emphasis on more recent material.

  • To acknowledge maturation over the course of the semester, exams are increasingly weighted in your course grade.
  • If you do not perform well on the first midterm, you can still do very well in 131A!

If you cannot attend an exam due to an extenuating circumstance, please contact the course staff ASAP to determine whether your circumstance qualifies for a make-up exam.

Textbooks and resources

Everything you need to know for Stat 131A will be covered in lectures, labs, and assignments.

  • It is possible to do very well in Stat 131A without ever referring to an outside textbook or resource.

However, most of the course material is covered by the online textbook developed specifically for 131A.

  • You can find the textbook here.

The StatQuest YouTube Channel is an excellent resource.

  • StatQuest provides videos on many of the topics we will cover in class. The instructor is very entertaining!

If you would like some additional optional reading, you can try the following books.

  • Theory Meets Data by Ani Adhikari. This is the online book for STAT 88 that covers introductory probability at the level of Stat 20.
  • R for Data Science, by Garrett Grolemund and Hadley Wickham. This is a free online book that covers the tidyverse set of R packages.
  • The Statistical Sleuth: A Course in Methods of Data Analysis by Ramsey and Schafer
  • Introductory Statistics with R by Peter Dalgaard

None of these books covers all of the topics we will cover in 131A, nor do they necessarily have the same perspective and focus as this class. But for those students wanting some additional structure or R assistance, these books may be helpful and should be at the right level for this class.

Stat 20 and Data 8 are similar courses, but each covers some subjects that the other does not. While we will cover these topics in class, you may find the following useful background if you are seeing them for the first time (more to follow):

This is the online book used by Data 8. These chapters introduce hypothesis testing using only resampling ideas, ideas which are not necessarily covered in Stat 20.

Policy on Large Language Models (LLMs)

LLMs (e.g., ChatGPT) are becoming increasingly essential in the workplace.

  • To that end, the use of LLMs is not only permitted in this course, but encouraged.
  • Use this course as an opportunity to learn where LLMs are most useful, and where they fall short.

Potential uses of LLMs in Stat 131A:

  • Generating practice quiz questions
  • Explaining course concepts
  • Helping you code

Unless otherwise indicated, you can only use the course-approved LLM, PingPong, for help on labs, homework, and the final project.

  • Furthermore, if you use PingPong to help you, you must submit a PDF of the relevant PingPong conversation along with your assignment.
  • You are responsible for understanding every line of code that you submit in 131A. Exams may ask you to explain specific lines of code used in lab and HW solutions. So, don’t use LLMs to write code without taking time to understand the code.
  • Of course, LLMs cannot be used on exams.

It is often easy to spot default LLM text output.

  • If you copy and paste answers directly from PingPong, be warned that your grader may interpret your answer as lacking in effort and you may lose points on the assignment.
  • Take the time to understand and paraphrase the information LLMs provide you.

If you find an especially interesting use case of an LLM for any component of the course, please share it with the course staff! We are excited to hear what you find.

Course communication

We use the Ed platform to manage course questions and discussion, and to make announcements.

In general, do not email the course staff.

  • Exception: You are welcome to email individual members of the course staff if you have a private concern that you do not want shared with the entire course staff.

Please post publicly when possible.

  • Public posts benefit many more students than private posts.
  • We may ask you to change your private post to a public post if the answer could be of use to other students.
  • You are always allowed to remain anonymous!

If you include code in your Ed post, please use the code editing fonts:

Standard font is hard to read:

── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ── ✓ ggplot2 3.3.2 ✓ purrr 0.3.4 ✓ tibble 3.0.3 ✓ dplyr 1.0.2 ✓ tidyr 1.1.2 ✓ stringr 1.4.0 ✓ readr 1.3.1 ✓ forcats 0.5.0 ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ── x dplyr::filter() masks stats::filter() x dplyr::lag() masks stats::lag()

# here’s my plot code

x <- ggplot(df) + geom_point(aes(x = year, y = count))

Code font is easier to read:

── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
✓ ggplot2 3.3.2     ✓ purrr   0.3.4
✓ tibble  3.0.3     ✓ dplyr   1.0.2
✓ tidyr   1.1.2     ✓ stringr 1.4.0
✓ readr   1.3.1     ✓ forcats 0.5.0
── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()

# here's my plot code
x <- ggplot(df) + geom_point(aes(x = year, y = count))

Computing environment

The official course materials use the R programming language.

  • As in Data 8 and Stat 20, labs and assignments will be distributed via DataHub.

You do not need to know anything about R to take this course.

  • We will provide resources for you to learn everything you need to know.

The concepts taught in this course are language-agnostic.

  • In other words, everything you learn in this class can be readily implemented using a combination of other tools (e.g., Python, SQL, etc.).
  • Note that LLMs are an excellent aid for translating your knowledge across different programming languages and software.

Weekly topics

The following is a rough and optimistic guideline for the material we will cover in the semester.

  • The actual topics may vary as the semester goes along.
  • It is likely that we will proceed more slowly than this schedule indicates.
  • Relevant sections of the textbook are linked for each week, though you are only responsible for the material we cover in lecture, lab, or HW assignments.

Week 1. Principles of visualization.

Week 2. Boxplots and histograms. Discrete and continuous distributions. 2.1, 2.3.

Week 3. Probability. Bayes’ theorem. Naive Bayes algorithm. 2.2.

Week 4. Sampling distributions. Bootstrapping. 2.4.

Week 5. Confidence intervals. Parametric hypothesis testing. 3.

Week 6. Non-parametric hypothesis testing. Type I and II errors. Power. Study design. Multiple testing. 3.

Week 7. Midterm 1 (Weeks 1-5). Linear regression. 4.1-4.3, 6

Week 8. More linear regression. Feature generation. Transformations. 6

Week 9. Cross-validation. Bias-variance tradeoff. Logistic regression. Classification error metrics. 7 7.5

Week 10. Buffer for Week 1-9 catch-up.

Week 11. Intro to non-parametric methods. Kernel density estimation (KDE). LOESS. 2.5. 4.4-4.5

Week 12. Midterm 2 (Weeks 1-10). Principal component analysis (PCA). Clustering. 5.4

Week 13. Decision trees. Random Forests. 8

Week 14. Buffer for Week 10-13 catch-up.

Week 15. Most likely, catch-up and review. If time permits, intro to causal inference.

Academic Honesty Policy

Homework and projects must be completed independently, with the following exceptions:

  • You may discuss specific issues/questions you have about the homework at a high level, but you must not sit down and do the assignment jointly.
  • Giving advice about code or coding tips is also not cheating, but you cannot directly share code with other classmates.

For exams, cheating includes, but is not limited to, using electronic materials in an exam beyond that allowed, copying off another person’s exam or quiz, allowing someone to copy off of your exam or quiz, and having someone take an exam or quiz for you.

Requesting, obtaining, and/or using solutions from previous years or from the internet or other sources, if such happen to be available, is considered cheating.

In fairness to students who put in an honest effort, cheaters will be harshly treated.

  • Any evidence of cheating will result in a score of zero (0) on the entire assignment or examination, and perhaps a failing grade in the class.
  • We will always report incidents of cheating to the Office of Student Conduct, which may administer additional punishment.

Accommodations

UC Berkeley is committed to creating a learning environment that meets the needs of its diverse student body including students with disabilities.

  • If you anticipate or experience any barriers to learning in this course, please feel welcome to discuss your concerns with Josh, whether after class, in office hours, via Ed, or via email.

If you already have a Letter of Accommodation, please open a private Ed post ASAP and attach your LoA.

  • We can accommodate you more easily if you provide this information early in the semester.
  • We cannot guarantee that last-minute requests for accommodation will be provided.

If you have a disability, or think you may have a disability, you can work with the Disabled Students’ Program (DSP) to determine any accommodations you may need to have equal access in this course.

  • The Disabled Students’ Program (DSP) is the campus office responsible for authorizing disability-related academic accommodations, in cooperation with the students themselves and their instructors.
  • You can find more information about the DSP application process here.
  • Josh is available if you have any questions or concerns about your accommodations.
  • In the event of a disagreement, the proper procedure is for you to work with your DSP Specialist, and for your DSP Specialist to work with Josh toward a resolution.

Accessible DS education for all

In support of our commitment to making Data Science education inviting, engaging, and respectful for people of diverse identities, backgrounds, experiences, and perspectives, I want to relay the following three items from Data Science Undergraduate Studies (DSUS):

Device Lending options

Students can access device lending options through the Student Technology Equity Program (STEP).

Data Science Student Climate

Data Science Undergraduate Studies faculty and staff are committed to creating a community where every person feels respected, included, and supported. We recognize that incidents may happen, sometimes unintentionally, that run counter to this goal. There are many things we can do to try to improve the climate for students, but we need to understand where the challenges lie. If you experience a remark, or disrespectful treatment, or if you feel you are being ignored, excluded or marginalized in a course or program-related activity, please speak up. Consider talking to your instructor, but you are also welcome to contact Executive Director Christina Teller at cpteller@berkeley.edu or report an incident anonymously through this online form.

Community Standards

Ed is a formal, academic space. Posts in this forum must relate to the course and be in alignment with Berkeley’s Principles of Community and the Berkeley Campus Code of Student Conduct. We expect all posts to demonstrate appropriate respect, consideration, and compassion for others. Please be friendly and thoughtful; our community draws from a wide spectrum of valuable experiences. Posts that violate these standards will be removed.