- Who is the data set about?
- The data set concerns customer reviews of Amazon, a very competitive company in the online business industry. The source has been available in an AWS database since 2017 at the following link: https://s3.amazonaws.com/amazon-reviews-pds/readme.html. According to the information attached to it, this dataset has three dimensions. The first covers all the customer reviews available in the Amazon.com marketplace from 1995 to 2015. The second shows how customers writing in multiple languages all over the world rate the products offered on the Amazon platform. The third displays the reviews that do not conform to Amazon's ethical framework. The data set itself is a list of over 28,000 consumer reviews for Amazon products such as the Kindle and Fire TV Stick, and had been analyzed by Datafiniti. It includes basic product information, rating, review text, and more for each product, forming a total of 24 columns.
- Who were sampled in this data set?
- The sample consists of reviews written by users of Amazon products. The information provided by these users can be categorized as follows:
0 id
1 dateAdded
2 dateUpdated 28332
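A listing like the one above (position, column name, number of filled-in values) could be produced along the following lines. This is only a minimal sketch: the three rows in `SAMPLE_CSV` are made up for illustration, the helper `column_summary` is a hypothetical name, and only the column names `id`, `dateAdded`, and `dateUpdated` come from the data set.

```python
import csv
import io

# Hypothetical three-row sample; only the column names come from the data set.
SAMPLE_CSV = """id,dateAdded,dateUpdated
A1,2015-10-30T08:59:32Z,2019-04-25T09:08:16Z
A2,2016-03-08T20:21:53Z,2018-02-17T04:10:05Z
A3,2017-01-12T11:02:44Z,
"""

def column_summary(csv_text):
    """Return (position, column name, non-empty count) for each column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            if row[name]:  # an empty string means the value is missing
                counts[name] += 1
    return [(i, name, counts[name]) for i, name in enumerate(reader.fieldnames)]

for position, name, non_empty in column_summary(SAMPLE_CSV):
    print(position, name, non_empty)
```

Run against the real file instead of the sample, the same function would list all 24 columns with their counts.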
- Who were over sampled or under sampled?
- There is no oversampling issue with our data. The data obtained were sufficient to reach all the objectives set for the project.
- What events, activities, behaviors, and observations etc. are recorded by the data set? Does the data set record the targeted events, activities, behaviors, etc. in Assignment 1? This is fundamentally about the variables.
The behavior observed in this dataset is the reaction of a sample of customers to a business. To capture it, the dataset records whether the reviewer actually recommends the product and whether the reviewer purchased the product, which together collect those insightful impressions.
- When did the event, activity, behavior, and observation, etc. take place? When were the data collected?
The data collection was spread over 4 years. Specifically, the data source information reveals that the data were collected from 2015 to 2019 in the Amazon database, which gives enough material to support the data analysis.
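The collection period stated above could be checked directly from a timestamp column. The sketch below is illustrative only: the three timestamps are invented, their ISO 8601 format is an assumption about how the dates are stored, and `collection_span` is a hypothetical helper name.

```python
from datetime import datetime

# Hypothetical timestamps, assumed to be ISO 8601 strings in UTC.
SAMPLE_DATES = [
    "2015-10-30T08:59:32Z",
    "2017-06-14T12:00:00Z",
    "2019-04-25T09:08:16Z",
]

def collection_span(timestamps):
    """Return the earliest date, the latest date, and the difference in calendar years."""
    parsed = [datetime.strptime(t, "%Y-%m-%dT%H:%M:%SZ") for t in timestamps]
    first, last = min(parsed), max(parsed)
    return first.date(), last.date(), last.year - first.year

first, last, years = collection_span(SAMPLE_DATES)
print(first, last, years)
```

Applied to the real date column, the first and last dates returned would confirm or correct the 2015 to 2019 range.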
- Is it longitudinal or cross-sectional?
- As far as it can be analyzed, this study can be classified as cross-sectional.
- Are they real time data? How old or fresh are the data?
Because the collection was spread over 4 years, it is very difficult to claim that these are real-time data. In fact, we can conclude that they are not real-time data because they are
- Where did the event, activity, behavior, and observation, etc. take place? Where were the data collected if the information is available?
The events took place on the Amazon online platform. However, as the company is a multinational, the data collection could be extended to geographical areas other than the USA. According to the sources, the dataset is associated with five countries.
- What does the geographical coverage of the data set look like? Does the data set contain geographical information (GIS)? Is this a local, regional, national, or global data set?
Rather than taking into account the whole data set available online, our analysis focuses only on the elements required to achieve the product prediction. For this purpose, the dataset selected for our analysis does not contain any geographical component.
- To what extent can generalization be made across settings to inform Assignment 1? This is fundamentally about geographic variables in the data set, and the external validity of the data set across settings.
- Considering the information provided in the previous section, since the dataset does not contain geographical variables, it is very difficult to generalize across any geographical setting.
- Why did the event, activity, behavior, or observation etc. take place? Why were the data collected?
This observation was carried out to help the e-commerce website improve its customer service. To some extent, Amazon sales managers could learn from it how to adjust their sales practices according to the consumers' ratings or reviews on their website. Gathering feedback from a company's product purchasers is a valuable activity that helps the sales force evaluate its strengths and weaknesses from the simple ratings or reviews of its products. To conclude, the main objective of the data collection is clearly to give the marketing team a clear view of consumer sentiment toward the company's products.
- How: If you would like, you can add a dimension of how. How did it happen? Sometimes, the answer to how can be covered by what, when and where.
- This event may have been prompted by a desire to increase sales. Consumer analysis is not only expected to increase sales; it may also help to prevent a decrease. Besides, reports to the shareholders may help to evaluate the performance of the marketing strategies. Moreover, this type of analysis could encourage financial institutions to entrust the company with additional financial resources.