
Assignments

Assignments 1, 2, and 3 are to be completed individually. These assignments are meant to reinforce the concepts covered in the lectures. Please remember that brevity and clarity in writing are more important than length. No single assignment question should require more than a paragraph.

Assignment 1

Assignment 1 must be completed on Gradescope, where you can also find the instructions.

  1. How do I find the number of unique words a singer used for part I?

    Hover over the singer's profile picture on the scatter plot, and the number will show up. To select a singer by name, open the dropdown list in the top-left corner and pick the singer you are interested in.

  2. Does each of us need to collect at least 10 data points for part II?

    No. The group as a whole needs at least 10 data points, but more than two people must take part in collecting them.

  3. What kind of plots should we use for our data points?

    Look back at the data visualization lecture slides and think about what kinds of variables you’re working with. For example, if you want to show the count for each of two or more categories, a bar plot may be the best way to visualize your data.

  4. For the scatterplots, do we need to show a best-fit line, and are there any other expectations?

    No. For the scatterplots on this assignment, make sure you have a clear title and labeled axes, and that the data points are spread evenly across the plot. We only want to see the trend in the data points, so no lines or additional modeling are required, but feel free to add them if they help you.

  5. Are there any resources to research token analysis?

    We won’t go into token analysis much in this class, but NLP (natural language processing) is a unique form of data analysis that involves categorizing words or phrases. If you would like to learn more about natural language processing, this video provides a great breakdown of token analysis with a Python Jupyter Notebook demo.
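
As a rough illustration of what token analysis means for question 5 (and of the "unique words" count from question 1), here is a minimal Python sketch. It is not how the assignment's scatter plot was built. The sample text, the function names `tokenize` and `unique_word_count`, and the simple rule of lowercasing and keeping only letters and apostrophes are all assumptions made for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption for this sketch: a token is a run of letters and
    apostrophes; real NLP tokenizers are more careful than this.
    """
    return re.findall(r"[a-z']+", text.lower())


def unique_word_count(text):
    """Count distinct tokens by collecting them into a set."""
    return len(set(tokenize(text)))


# Hypothetical sample text, made up for illustration only.
sample = "The sun is up, the sky is blue. The sun is warm!"

tokens = tokenize(sample)
print(len(tokens), "tokens in total")
print(unique_word_count(sample), "unique words")
```

Lowercasing before counting means "The" and "the" are treated as the same word. Whether that is the right choice depends on the question you are asking of the data.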

Assignment 2

Assignment 2 must be completed on Gradescope, where you can also find the instructions.

  1. How many parameters are required to find the p-values?

    You can choose any combination of parameters as long as the p-value is less than 0.05. Just make sure you write down the parameters clearly enough that your results can be easily replicated.

  2. How much detail do we need to go into for the justification when picking our first set of variables?

    Not much, as this is not a political science course. There just needs to be some justification for what you selected. This may be a bit challenging, especially if you are less familiar with US politics and the economy. We understand this, so your justification does not need to be true, and we do not need to agree with it. But try to make it as logical as possible to support your hypothesis.
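
To make the advice in question 1 concrete, the sketch below shows one generic way a p-value can be computed and why recording every parameter matters for replication. It is not the tool or method used in the assignment. The two-sided permutation test, the function names, the two data lists, and the parameter values are all hypothetical choices made for this example.

```python
import random


def mean_difference(a, b):
    """Difference between the means of two lists of numbers."""
    return sum(a) / len(a) - sum(b) / len(b)


def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Shuffles the pooled values many times and counts how often a random
    split gives a difference at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result can be replicated
    observed = abs(mean_difference(a, b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean_difference(pooled[:len(a)], pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical measurements, made up for illustration only.
group_a = [2.1, 2.5, 2.8, 3.0, 3.2, 3.6]
group_b = [4.0, 4.4, 4.9, 5.1, 5.5, 5.8]

# Write down every parameter that affects the result.
params = {"n_permutations": 10_000, "seed": 0}
p_value = permutation_p_value(group_a, group_b, **params)
print("parameters:", params)
print("p-value below 0.05:", p_value < 0.05)
```

Running the function again with the same recorded parameters gives the same p-value, which is the sense in which the write-up should let someone else replicate your result.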

Assignment 3

Assignment 3 must be completed on Gradescope, where you can also find the instructions.