Project Overview
- We have a list of keywords in an Excel sheet.
- We want to check every video title and see whether any of the keywords appear in it.
- To do this, we first load the keywords into a DataFrame using the pandas package (the code below reads them from a CSV file).
- Then we collapse the keyword column into one long string, using the tolist function and "|".join.
- pandas has a string method, str.contains(<YOUR PATTERN>), that checks whether a string matches any of the alternatives separated by "|".
- "|" works as "or" syntax in the pattern. For example, take the long string "2022 bolsonaro|2022 doria|brazil":
if the title of the video contains any of the 3 phrases, it is considered a match.
- The pattern should not start or end with "|", because an empty alternative matches every title.
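As a small illustration of the "or" matching described above, here is a minimal sketch that runs str.contains on a handful of titles. The titles are made up for the example and are not taken from the project data.

```python
import pandas as pd

# Hypothetical titles, invented for illustration only.
titles = pd.Series([
    "2022 bolsonaro rally highlights",
    "cooking pasta at home",
    "travel vlog: brazil beaches",
])

# "|" separates the alternatives; a title matches if it contains any of them.
pattern = "2022 bolsonaro|2022 doria|brazil"
is_match = titles.str.contains(pattern)
print(is_match.tolist())

# A leading or trailing "|" adds an empty alternative, which matches every title.
matches_everything = titles.str.contains("|brazil|")
print(matches_everything.tolist())
```

The first print shows which titles contain at least one phrase; the second shows why stray "|" characters at the ends of the pattern are worth avoiding.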
Code
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'
import pandas as pd
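# load the keyword sheet and keep the first two columns
key = pd.read_csv("/home/tiger/br_election/brkeywords.csv")
key = key.iloc[:,0:2]
key.head(10)
# join the 'Name' column into one "|"-separated pattern string
keywords = '|'.join(key['Name'].tolist())
# preview a slice of the pattern string
keywords[1:20]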
- The column is now concatenated into one long string.
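One caveat: str.contains treats the joined string as a regular expression, so a keyword containing characters such as "+", "(" or "?" would be read as pattern syntax rather than literal text. The sketch below shows one possible guard, assuming the keyword column is called 'Name' as above. The sample keywords and titles are made up, and the use of dropna and re.escape is an addition that is not part of the original code.

```python
import re
import pandas as pd

# Hypothetical keyword table standing in for brkeywords.csv.
key = pd.DataFrame({"Name": ["2022 bolsonaro", "c++ brazil", None, "lula (pt)"]})

# Drop empty cells, then escape regex metacharacters so each keyword is matched literally.
names = key["Name"].dropna().astype(str)
keywords = "|".join(re.escape(name) for name in names)

# Hypothetical titles, invented for illustration only.
titles = pd.Series(["learn c++ brazil edition", "lula (pt) speech", "cat video"])
safe_match = titles.str.contains(keywords)
print(safe_match.tolist())
```

Without the escaping step, a keyword like "c++ brazil" would not be a valid pattern, and "lula (pt)" would be read as a capture group instead of literal parentheses.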
# find matches
elec = pd.read_csv("/home/tiger/br_election/election_video.csv")
# keep rows whose title matches any keyword; na=False treats missing titles as non-matches
elec = elec[elec['title'].str.contains(keywords, na=False)]
elec = elec.drop_duplicates()
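If it is also useful to know which keyword triggered each match, one option is str.extract with the pattern wrapped in a capture group. This is only a sketch on made-up data: the column name matched_keyword and the sample rows are hypothetical, and this step is not part of the original workflow.

```python
import pandas as pd

# Hypothetical video table standing in for election_video.csv.
elec = pd.DataFrame({
    "title": ["2022 doria interview", "brazil news today", "cat video", None],
})
keywords = "2022 bolsonaro|2022 doria|brazil"

# Wrap the pattern in a capture group so str.extract returns the first matched phrase.
elec["matched_keyword"] = elec["title"].str.extract(f"({keywords})", expand=False)

# Rows with no match (or a missing title) get NaN and are dropped here.
matched = elec.dropna(subset=["matched_keyword"])
print(matched)
```

Only the first matching phrase per title is recorded this way; a title that contains several keywords would still appear once.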
- Done!!
Edited by Jiashu Miao :) 08/12/2022