Programming Homework Help
R Coding with LARGE Dataset
Use the LARGE R dataset of congressional documents: https://drive.google.com/file/d/1Q_-EUwW2I_C9WrYCgpxlcIrYfoMTx3os/view?usp=sharing
RESEARCH Question: Does how often, and how, a speaker or state discusses UNIONS differ by geography or by party?
PROVE Hypothesis: The substantive effect of geography has a stronger influence than party affiliation on how favorably or unfavorably members of Congress discuss collective labor rights regarding unions.
R Code MUST include the following:
-Load in 2018-2020 Workspace R Data included from Google Drive
-Create a DFM / CORPUS
-Report how many documents are included
-Determine TOKENS and FREQUENCY
-DFM_LOOKUP Function for keywords: UNION, Scab, Strike, Collective Bargaining, TAFT
-Segment the text by Speaker
-Create KWIC Keyword Windows utilizing UNION, Collective Bargaining, TAFT
-Generate a relatively large kwic window around the key word UNION
-Create GloVe word embeddings code
-Run a REGEX code to view ALL instances of UNION or UNIONS
-Run a Sentiment Analysis through coding
-Use STM to TOPIC MODEL the text in the kwic windows, using a prevalence variable of your choosing
-Use estimateEffect, barplots, and any other methods to illustrate differences in topic proportions across your speeches
-Include UNION Strength statistical information to pair with the DFM: Union Membership from the US Bureau of Labor Statistics (https://www.bls.gov/news.release/union2.nr0.htm) as well as pertinent data from http://unionstats.gsu.edu/
-Run any other code & models that address the hypothesis, UNION STRENGTH, and UNION MEMBERSHIP, including histograms and visualizations
After the R code above is written and the data frame is set up, complete the following:
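The first few steps in the list above (corpus, tokens, DFM, dictionary lookup, speaker grouping, KWIC, regex) could be sketched roughly as follows with the quanteda package. This is only an illustration under stated assumptions: the object and column names in the actual workspace are not known, so the toy data frame `speeches` and its columns `speaker`, `party`, `state`, and `text` are hypothetical stand-ins. Because "Collective Bargaining" is a two-word phrase, the sketch applies the dictionary to the tokens with `tokens_lookup` before building the DFM; `dfm_lookup` alone matches single features only.

```r
library(quanteda)

# Hypothetical toy data. Replace with the object loaded from the workspace,
# e.g. via load("<your file>.RData"); the real object names are unknown here.
speeches <- data.frame(
  speaker = c("A", "B", "C"),
  party   = c("D", "R", "D"),
  state   = c("OH", "TX", "MI"),
  text    = c(
    "The union called a strike after collective bargaining failed.",
    "Unions in my state oppose changes to the Taft-Hartley Act.",
    "Farm subsidies matter more to my district than any union."
  ),
  stringsAsFactors = FALSE
)

# Build the corpus; the remaining columns become document variables.
corp <- corpus(speeches, text_field = "text")
ndoc(corp)                      # how many documents are included

# Tokens and document-feature matrix, then the most frequent features.
toks  <- tokens_tolower(tokens(corp, remove_punct = TRUE))
dfmat <- dfm(toks)
topfeatures(dfmat, 10)

# Dictionary for the assignment's keywords (glob patterns by default).
labor_dict <- dictionary(list(
  union                 = c("union", "unions"),
  scab                  = "scab*",
  strike                = "strike*",
  collective_bargaining = "collective bargaining",
  taft                  = "taft*"
))

# Multi-word entries need token-level lookup before the DFM is built.
labor_counts <- dfm(tokens_lookup(toks, labor_dict))
labor_counts

# Single-word entries can also be counted directly on the DFM.
dfm_lookup(dfmat, labor_dict[c("union", "scab", "strike", "taft")])

# Aggregate the DFM by speaker.
dfm_by_speaker <- dfm_group(dfmat, groups = docvars(dfmat, "speaker"))

# KWIC windows: a relatively large window around UNION, plus a phrase match.
kw_union <- kwic(toks, pattern = "union*", window = 20)
kw_cb    <- kwic(toks, pattern = phrase("collective bargaining"), window = 5)

# Regex match for every instance of "union" or "unions".
kw_regex <- kwic(toks, pattern = "^unions?$", valuetype = "regex")
nrow(kw_regex)
```

The KWIC results can be converted with `as.data.frame()` and turned into a new corpus if the windows are to be passed on to later steps such as topic modeling.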
-Include code and explanatory #comments for the following steps:
- Using either a continuous or binary dependent variable, run the appropriate regression, generate an output table, and interpret the results within the context of your research question.
- Now run regression for predictive purposes. Generate the relevant measure(s) of predictive performance and assess how well your model performed. Experiment with using more/different x’s and observe the difference, if any, in predictive performance.
- Add interpretation and explanation throughout your code with liberal use of #comments.
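The two regression steps above (explanation first, then prediction) might look roughly like the following base R sketch. All data here are simulated and every variable name (`sentiment`, `party`, `region`, `union_density`) is a hypothetical placeholder, not a column from the assignment's dataset; the coefficients used to simulate the outcome are arbitrary and say nothing about the real answer. The sketch assumes a continuous dependent variable; for a binary one, `glm(..., family = binomial)` would replace `lm`, with a measure such as classification accuracy in place of RMSE.

```r
set.seed(42)
n <- 200

# Simulated stand-in data; replace with the speaker-level data frame
# that merges text measures with the union-strength statistics.
dat <- data.frame(
  party         = factor(sample(c("D", "R"), n, replace = TRUE)),
  region        = factor(sample(c("Midwest", "Northeast", "South", "West"),
                                n, replace = TRUE)),
  union_density = runif(n, 3, 25)   # hypothetical percent union membership
)
# Arbitrary data-generating process, for illustration only.
dat$sentiment <- 0.05 * dat$union_density - 0.4 * (dat$party == "R") + rnorm(n)

# Step 1: explanatory regression on the full data.
# Interpret each coefficient holding the other predictors constant.
fit <- lm(sentiment ~ party + region + union_density, data = dat)
summary(fit)

# Step 2: predictive regression with a held-out test set.
train_idx <- sample(seq_len(n), size = 0.8 * n)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Root mean squared error on new data: lower means better prediction.
rmse <- function(model, newdata) {
  sqrt(mean((newdata$sentiment - predict(model, newdata))^2))
}

# Compare a party-only model against one with more x's.
m_party <- lm(sentiment ~ party, data = train)
m_full  <- lm(sentiment ~ party + region + union_density, data = train)

rmse_party <- rmse(m_party, test)
rmse_full  <- rmse(m_full, test)
c(party_only = rmse_party, full = rmse_full)
```

Evaluating on held-out rows rather than on the training data is the design choice that matters here: in-sample fit never gets worse when predictors are added, so it cannot show whether the extra x's actually help prediction.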
**R Code & Reporting of Results w/ Visualizations**