Sometimes you dream big, but you just don't have the data…

I had this plan for text analysis by character from the Parks & Recreation series to celebrate Galentine's Day, but getting the data was a struggle. Subtitles are available for pretty much every episode, but those don't contain data about the character who said the lines. I needed scripts, which are a bit harder to come by. I found 6 scripts of episodes on the web from the first 3 seasons in pdf format that were usable. Some pdfs were image scans, so data was not extractable. That's not even covering 5% of the series episodes, but let's use what we have.

But first, what is Galentine's Day, I hear you ask? On Friday February 13th, the day before Valentine's Day, Leslie Knope from the Pawnee Parks department celebrates Galentine's Day with her… gals.

Extracting the data from pdf

    library(pdftools)
    #removing the first lines on every page that are just title text
    str_replace("\\(.+\\)", "") %>% #removes anything between brackets
    str_replace("\r\n\\s+", "") #removes the new line and extra spaces

    #  3 1x02 Canvassing LESLIE And the large meeting room for tomorrow night?
    #  5 1x02 Canvassing LESLIE Tomorrow night is our very first public forum ~
    #  9 1x02 Canvassing MARK   Its the most important meal of the day.
    # 10 1x02 Canvassing ANN    I think lunch is the most important.

I'm an avid reader of Julia Silge's blog on text analysis, and I noticed her using two functions from a personal package by David Robinson that are pretty nifty when you want to make ordered faceted plots with words that appear multiple times. I did not load the entire package, but just copied those two functions.

    mutate(word = reorder_within(word, tf_idf, episode)) %>%
    ggplot(aes(word, tf_idf, fill = episode)) +
    geom_col(alpha = 0.8, show.legend = FALSE) +
    facet_wrap(~ episode, scales = "free_y", ncol = 3) +

I wasn't really interested in episode descriptions, but in the characters - who says what a lot in comparison to others?

    filter(speaker %in% c("LESLIE", "TOM", "ANN", "RON", "APRIL", "BEN", "ANDY")) %>%
    mutate(word = reorder_within(word, tf_idf, speaker)) %>%
    ggplot(aes(word, tf_idf, fill = speaker)) +
    facet_wrap(~ speaker, scales = "free", nrow = 2) +

The tf_idf scores show some nice things: Leslie obviously is the most community oriented, with words like festival, forum, our, community, Pawnee and park, but her arch nemesis Greg Pikitis also features. In the early seasons Ben still gets a lot of explaining to do for when he became mayor at 18 and built Ice Town. But although not entirely readable, Andy's set of words is a true star for me - what a set of random words: sugar rush, rock, walrus, …

What about sentiments? Who is the most positive person in the bunch?

    ratio_posneg %
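To make the per-character tf-idf idea concrete, here is a minimal sketch rather than the code used for the plots in this post. It assumes the `dplyr` and `tidytext` packages and reuses the four script lines visible in the printed output; the object names `script_lines`, `word_scores` and `top_words` are made up for illustration. Each speaker is treated as one "document", so a word scores high when one character uses it a lot and the others don't.

```r
library(dplyr)
library(tidytext)

# Four lines copied from the printed script data in this post. The second
# LESLIE line is cut off in that printout, so only its visible part is used.
script_lines <- tibble(
  speaker = c("LESLIE", "LESLIE", "MARK", "ANN"),
  text = c(
    "And the large meeting room for tomorrow night?",
    "Tomorrow night is our very first public forum",
    "Its the most important meal of the day.",
    "I think lunch is the most important."
  )
)

word_scores <- script_lines %>%
  unnest_tokens(word, text) %>%   # one lowercased word per row, punctuation dropped
  count(speaker, word) %>%        # how often each speaker says each word
  bind_tf_idf(word, speaker, n)   # adds tf, idf and tf_idf, one "document" per speaker

# Up to three highest-scoring words per speaker
top_words <- word_scores %>%
  group_by(speaker) %>%
  slice_max(tf_idf, n = 3, with_ties = FALSE) %>%
  ungroup()

print(top_words)
```

Words that every speaker uses, such as "the" here, get an idf of zero and drop out, which is exactly why tf-idf is handy for comparing characters. For the ordered faceted plots, recent versions of `tidytext` export `reorder_within()` and `scale_x_reordered()` themselves, so copying the two helper functions by hand may no longer be needed.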