- Who is the data set about? Who was sampled in this data set? Who was oversampled or undersampled? Are they representative of the main characters in Assignment 1? Is there any identifiable information, or is there any risk of disclosing identifiable information? This is fundamentally about the sampling issue and anonymity.
- We use the datasets to collect reviews from hotel guests. These reviews help us understand what guests need and add better features to our application.
- What events, activities, behaviors, and observations etc. are recorded by the data set? Does the data set record the targeted events, activities, behaviors, etc. in Assignment 1? This is fundamentally about the variables.
- We extract the reviews from the CSV files of the two datasets and run sentiment analysis on them using the VADER sentiment analysis tool. We then focus on the negative reviews to identify missing or new features that hotel guests are looking for.
- When did the event, activity, behavior, and observation, etc. take place? When were the data collected? Is it longitudinal or cross-sectional? Are they real-time data? How old or fresh are the data? To what extent can generalizations be made across time to inform Assignment 1? This is fundamentally about the temporal structure of the data set, and the external validity of the data set across time.
- We use two datasets, both CSV files, which provide data on 1,000 hotels and their reviews, including hotel location, name, rating, review date, title, username, and more.
- Where did the event, activity, behavior, and observation, etc. take place? Where were the data collected, if the information is available? What does the geographical coverage of the data set look like? Does the data set contain geographical information (GIS)? Is this a local, regional, national, or global data set? To what extent can generalizations be made across settings to inform Assignment 1? This is fundamentally about geographic variables in the data set, and the external validity of the data set across settings.
- We plan to collect review statistics to understand guests' responses and experiences by hotel, location, and facilities offered.
- Why did the event, activity, behavior, or observation etc. take place? Why were the data collected?
- We collected data from two datasets: Datafiniti's Hotel Reviews and Sentiment Analysis with Hotel Reviews from Kaggle.
- How: If you would like, you can add a dimension of how. How did it happen? Sometimes, the answer to how can be covered by what, when and where.
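The review-extraction and VADER step from the answer on recorded events (extract reviews from the CSV files, score them, keep the negative ones) can be sketched in Python as follows. This is a minimal sketch under assumptions: the `vaderSentiment` package is installed, and the review text sits in a column named `reviews.text` (a hypothetical default; each dataset's header may differ). The scorer is passed in as a function so the filtering logic can be checked without VADER. The -0.05 cutoff follows VADER's documented convention for negative compound scores.

```python
import csv
from typing import Callable, Iterable

# VADER's documented convention: compound <= -0.05 counts as negative.
NEGATIVE_THRESHOLD = -0.05


def review_texts(lines: Iterable[str], text_column: str = "reviews.text") -> list[str]:
    """Pull non-empty review texts out of CSV lines (an open file works too).

    The column name is an assumption; set it to match each dataset's header.
    """
    return [row[text_column] for row in csv.DictReader(lines) if row.get(text_column)]


def vader_compound_scorer() -> Callable[[str], float]:
    """Return a function mapping text to VADER's compound score in [-1, 1].

    Assumes the `vaderSentiment` package is installed (pip install vaderSentiment).
    """
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return lambda text: analyzer.polarity_scores(text)["compound"]


def negative_reviews(
    texts: Iterable[str], score: Callable[[str], float]
) -> list[tuple[str, float]]:
    """Keep reviews scored as negative, most negative first."""
    scored = [(text, score(text)) for text in texts]
    negative = [pair for pair in scored if pair[1] <= NEGATIVE_THRESHOLD]
    # Sort ascending by score so the strongest complaints come first.
    return sorted(negative, key=lambda pair: pair[1])
```

A possible usage, with a hypothetical file name: `with open("hotel_reviews.csv", newline="", encoding="utf-8") as f: complaints = negative_reviews(review_texts(f), vader_compound_scorer())`. Reading the most negative reviews first is one simple way to surface recurring complaints that point to missing features.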
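To check the hotel and review counts stated for the datasets, and to answer the temporal question of how old or fresh the reviews are, a short summary pass over each CSV file may help. The sketch below assumes hypothetical column names `name`, `city` and `reviews.date`, and assumes dates begin with an ISO `YYYY-MM-DD` prefix so they compare correctly as strings. It treats the same hotel name in two different cities as two hotels.

```python
import csv
from typing import Iterable


def summarize_reviews(
    lines: Iterable[str],
    hotel_column: str = "name",
    city_column: str = "city",
    date_column: str = "reviews.date",
) -> dict:
    """Count reviews and distinct hotels, and find the review date range.

    Column names are assumptions about the CSV header. Dates are assumed to
    start with 'YYYY-MM-DD', so string min/max gives the earliest/latest day.
    """
    hotels = set()
    dates = []
    n_reviews = 0
    for row in csv.DictReader(lines):
        n_reviews += 1
        # Identify a hotel by (name, city) so same-named hotels stay distinct.
        hotels.add((row[hotel_column], row[city_column]))
        date = (row.get(date_column) or "")[:10]
        if date:  # skip rows with no recorded date
            dates.append(date)
    return {
        "reviews": n_reviews,
        "hotels": len(hotels),
        "earliest": min(dates) if dates else None,
        "latest": max(dates) if dates else None,
    }
```

An open file object can be passed directly, for example `with open("hotel_reviews.csv", newline="", encoding="utf-8") as f: print(summarize_reviews(f))`, where the file name is hypothetical. The earliest and latest dates give a rough sense of the time span the reviews cover.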
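The planned review statistics by hotel and location could start from a simple per-group aggregation like the sketch below. It assumes hypothetical column names `city` and `reviews.rating`, and it skips rows whose rating is missing or non-numeric. Passing `group_column="name"` groups by hotel instead. Statistics by facilities are not covered here, since that would depend on how each dataset records them.

```python
import csv
from collections import defaultdict
from typing import Iterable


def rating_stats_by(
    lines: Iterable[str],
    group_column: str = "city",
    rating_column: str = "reviews.rating",
) -> dict[str, tuple[int, float]]:
    """Return {group: (number of rated reviews, mean rating)}.

    Column names are assumptions; rows with a missing or non-numeric
    rating are skipped rather than counted as zero.
    """
    totals: defaultdict[str, float] = defaultdict(float)
    counts: defaultdict[str, int] = defaultdict(int)
    for row in csv.DictReader(lines):
        try:
            rating = float(row[rating_column])
        except (KeyError, TypeError, ValueError):
            continue  # missing column, short row, or unparsable rating
        key = row.get(group_column) or "unknown"
        totals[key] += rating
        counts[key] += 1
    return {key: (counts[key], totals[key] / counts[key]) for key in counts}
```

Keeping the count next to the mean makes it easier to spot locations whose average rests on only a few reviews.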