📝 HW1

Due date: Friday, September 13 at midnight.

⏳ We recommend reading through each problem ASAP so you can accurately estimate the time needed to complete the assignment.

Unless otherwise stated, assignments in STAT 131A are to be done individually.

Some components of this assignment have not been seen by a previous cohort of STAT 131A students, so there may be some unforeseen hiccups.

We will use this Ed post to track errors and clarifications on HW1.

📮 Submission

Submit your assignment via Gradescope. The Gradescope will be live at least a few days before the HW deadline.

  • Make sure to tag your answers properly on Gradescope, or else you may be docked points for making the grading process more time-consuming.

You must submit a PDF of any PingPong chats that include code you used in your submission.

  • This should take the form of one long PDF. One way to do this is to copy all of your relevant PingPong chats into a Google Doc, and then print the doc as a PDF.

  • You are responsible for understanding all the code you submit, regardless of whether or not you used PingPong for help.

For coding components, you will produce both (1) a .Rmd file with your code and (2) an HTML file containing the code and output.

  • On Gradescope, you will submit a single ZIP file containing both the .Rmd and HTML files.
  • You can generate an HTML file in RStudio by pressing Knit and then Knit to HTML.
  • The knitting process will not work if there are errors in your code, so be sure to leave plenty of time to knit your lab notebooks before the deadline.
  • Proofread your HTML to make sure all of your answers and plots are visible. If your HTML file is really long, it is possible that your code is printing out a large dataset or a really long vector. Make sure to comment out any code that prints more information than each question asks you for.

For the plot presentation, there are two submission steps.

  • First, submit a public link to your screencast via Gradescope. We will dock points and/or dock slip days if the link is not publicly accessible during grading. Test your link in an incognito window to ensure it’s public.
  • Second, submit the same public link to your screencast using this Google Form. This second step prepares us for HW2, when another student will provide anonymous feedback on your screencast.

For math problems, prepare a photo of your handwritten answers to each problem, convert the photo to a PDF, and submit the PDF to Gradescope.

  • Alternatively, you can use \(\LaTeX\) to typeset your answers within a .Rmd file within RStudio, or using another \(\LaTeX\) editor like Overleaf.
  • The basics of \(\LaTeX\) are useful to learn if you ever plan to include a mathematical expression in a presentation or document in the future.
  • Here’s a nice guide for getting started.
  • We can also help with \(\LaTeX\) in office hours or via Ed.

Forms to complete (0% of the HW1 grade)

Complete the new student survey.

Complete the office hour survey.

📈 Data manipulation and plotting with R (50% of the HW1 grade)

Complete the HW1 coding problems: DataHub

  • The HW1 coding problems are located in 131a-labs-fall-2024 on DataHub.

  • These problems build on the concepts covered in Lab 1, so be sure to complete Lab 1 before attempting these questions.

🗣️ Plot presentation (30%)

In their 2013 research paper titled The Missing “One-Offs”: The Hidden Supply of High-Achieving, Low-Income Students, Caroline Hoxby and Chris Avery investigate the behavior of high-achieving low-income applicants to undergraduate programs in the United States.

Here is Figure 10 of their research paper:

Your task: Record a 60-90 second screencast describing the plot above.

  • Your audience is a UC Berkeley undergraduate who has not taken a statistics or data science course, and is unfamiliar with the research paper. You do not need to read the research paper in full to understand the plot, but you might find it helpful to reference.
  • In your screencast, you must explain the minimum necessary background required to understand the plot, along with the key takeaway(s) of the plot.
  • Your name and face should not be in the screencast. In other words, the plot should take up the entire window of the screencast, with your voice playing in the background. Do not introduce yourself or use any identifying information.
  • You should use your mouse pointer to indicate particular points of interest on the plot. Alternatively, you can verbally direct the viewer to points of interest (e.g., “In the top right corner, you can see that…”).
  • You should use an engaging tone that sounds as though you are presenting to a live audience.
  • You are welcome to read off a script that you have written ahead of time, but try not to make it too obvious. In fact, it can be very helpful to write a script for a presentation ahead of time, even if you do not actually read the script when presenting.
  • Remember the three key guiding questions you should address before digging into the details: (1) What’s on the X axis? (2) What’s on the Y axis? (3) What does a specific point/line/feature on your plot mean in context?
  • There are lots of free tools for recording screencasts. For example, QuickTime is a useful tool for recording screencasts on a Mac. Feel free to post on Ed if you cannot identify a way to record a screencast anonymously.
  • If you need more than 60-90 seconds to record for accessibility reasons, please let the teaching staff know via Ed.

As part of HW2, you will provide anonymous feedback on another student’s submission for this problem.

  • Your screencast, along with the quality of feedback you provide to another student, will be graded by the teaching staff. At no point will other students grade your work.

💾 Upload your screencast to your UC Berkeley Google Drive account, or another place that allows you to share a public link.

  • You will submit a public link to your screencast in two places. See submission details at the top of this page.
  • Before submitting, make sure your screencast is publicly accessible with the link. One easy way to do this: Open the link in an incognito window in Google Chrome.
  • If your video is private or unviewable when we try to access it, we will not be able to confirm that you submitted your video before the HW1 deadline. You may lose points and/or have to use slip day(s).

Why complete this problem? Communication is a critical, but often under-appreciated, component of the data science life cycle. This exercise helps develop your data storytelling ability, which is essential for getting anyone to actually care about your statistical analyses!

🔔 Working with probability density functions (PDFs) (20%)

For the problem below, please show all steps. You are welcome to use a scientific calculator for arithmetic (e.g., no need to do long division by hand).

Consider the function \(f\) defined on the interval \([0,1]\) : \(f(x) = cx^\frac{1}{3}\).

  1. Find the value of \(c\) that makes \(f\) a valid PDF for an r.v. \(X\).

  2. What is \(\Pr(0.1\leq X \leq 0.5)\)?

  3. What is \(\Pr(0 \leq X \leq x)\), for any \(x\) in \((0,1)\)?

  4. What is \(\frac{d}{dx} \Pr(0\leq X \leq x)\)? How does your answer to this problem compare to your answer in the first problem?

  5. What is \(\mathbb{E}(X)\)?

  6. What is \(\text{Var}(X)\)?

Why complete this problem? This problem will solidify your foundational understanding of probability density functions, which are a bread-and-butter building block of applied statistics.