Who is the data set about?
The dataset contains images of people's faces taken from different angles, each annotated with 15 keypoints: three for each eye, two for each eyebrow, one for the tip of the nose and four for the mouth. The CSV file stores each keypoint as (x, y) pixel coordinates on the image, and the column names use "left" and "right" for the left and right sides of the face (from the subject's point of view). The image itself is also stored in the CSV file, as a 96×96-pixel grayscale image.
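The following minimal Python sketch decodes one row, which makes the CSV layout concrete. It assumes the column naming of the Kaggle `training.csv` file: a `<keypoint>_x` and a `<keypoint>_y` column per keypoint, plus an `Image` column that holds space-separated grayscale intensities from 0 to 255, ordered row by row. `parse_image` and `parse_keypoints` are hypothetical helper names, not part of any library.

```python
import numpy as np

IMAGE_SIZE = 96  # each image is 96x96 grayscale

def parse_image(pixel_string):
    """Decode the space-separated 'Image' field into a (96, 96) float array scaled to [0, 1]."""
    pixels = np.array(pixel_string.split(), dtype=np.float32)
    if pixels.size != IMAGE_SIZE * IMAGE_SIZE:
        raise ValueError(f"expected {IMAGE_SIZE * IMAGE_SIZE} pixels, got {pixels.size}")
    # pixels are stored row by row, so a plain reshape restores the 2-D image
    return pixels.reshape(IMAGE_SIZE, IMAGE_SIZE) / 255.0

def parse_keypoints(row, names):
    """Collect (x, y) pairs from the '<name>_x' / '<name>_y' columns of one row (a dict of strings).

    Assumes every listed keypoint is present in this row.
    """
    return {name: (float(row[f"{name}_x"]), float(row[f"{name}_y"])) for name in names}
```

For example, `parse_keypoints({"nose_tip_x": "48.0", "nose_tip_y": "60.5"}, ["nose_tip"])` gives `{"nose_tip": (48.0, 60.5)}`.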
Who were sampled in this data set?
Here are the 15 keypoints in our dataset: left_eye_center, right_eye_center, left_eye_inner_corner, left_eye_outer_corner, right_eye_inner_corner, right_eye_outer_corner, left_eyebrow_inner_end, left_eyebrow_outer_end, right_eyebrow_inner_end, right_eyebrow_outer_end, nose_tip, mouth_left_corner, mouth_right_corner, mouth_center_top_lip, mouth_center_bottom_lip. In some examples, some of the target keypoint positions are missing (encoded as empty entries in the CSV, i.e., with nothing between two commas).
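Missing keypoints appear as empty cells, so any loader has to decide how to handle them. The sketch below uses only the standard library, and its function names are hypothetical. It turns empty cells into NaN, counts the missing values in each column, and keeps only fully annotated rows. Dropping incomplete rows is just the simplest option, and it reduces the amount of training data.

```python
import csv
import io
import math

def load_rows(csv_text):
    """Read CSV text; empty keypoint cells (nothing between two commas) become NaN."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = {}
        for column, value in record.items():
            if column == "Image":
                row[column] = value  # keep the pixel string untouched
            else:
                row[column] = float(value) if value != "" else math.nan
        rows.append(row)
    return rows

def missing_counts(rows):
    """Count the missing (NaN) values in each keypoint column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            if column != "Image":
                counts[column] = counts.get(column, 0) + (1 if math.isnan(value) else 0)
    return counts

def complete_rows(rows):
    """Keep only rows in which every keypoint coordinate is present."""
    return [r for r in rows if all(not math.isnan(v) for k, v in r.items() if k != "Image")]
```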
Who were oversampled or undersampled? Are they representative of the main characters in Assignment 1?
The dataset is balanced, so there is no problem of oversampling or undersampling.
Is there any identifiable information, or is there any risk of disclosing identifiable information? This is fundamentally about the sampling issue, and anonymity.
Yes, there is identifiable information in the dataset: it contains images of people's faces.
What events, activities, behaviors, and observations etc. are recorded by the data set?
The dataset contains images of the faces of both men and women, with keypoints marked on each face. The data is stored in CSV format, and the keypoints cover the eyes, eyebrows, nose and lips.
Does the data set record the targeted events, activities, behaviors, etc. in Assignment 1? This is fundamentally about the variables.
Yes, the dataset contains face images that can show people's emotions (or behaviors).
When did the event, activity, behavior, and observation, etc. take place? When were the data collected?
The images were collected randomly and show people of different generations. The x and y coordinates of the keypoints were then marked on these images.
Is it longitudinal or cross-sectional?
The data is cross-sectional.
Are they real-time data?
No, the training data is not real-time. For testing purposes, we used images of our team members, but the model could also be used in mobile apps with a front camera.
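Before our own photos can be used for testing, they have to match the format of the training images. The minimal numpy sketch below assumes the photo has already been cropped so that the face fills the frame roughly as in the training images (face detection is not shown). It converts color to grayscale with the common BT.601 luminance weights, resizes to 96×96 by nearest-neighbour sampling, and scales intensities to [0, 1]. The helper names are hypothetical, and a real app would more likely call an image library's resize function.

```python
import numpy as np

TARGET = 96  # training images are 96x96

def to_model_input(photo):
    """Convert an (H, W) or (H, W, 3) uint8 photo into a (96, 96) float array in [0, 1]."""
    photo = np.asarray(photo, dtype=np.float32)
    if photo.ndim == 3:
        # grayscale via BT.601 luminance weights, to match the grayscale training images
        photo = photo[..., :3] @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    h, w = photo.shape
    # nearest-neighbour resize: choose one source row/column for each target pixel
    rows = np.arange(TARGET) * h // TARGET
    cols = np.arange(TARGET) * w // TARGET
    resized = photo[rows[:, None], cols[None, :]]
    return resized / 255.0

def to_photo_coords(x, y, h, w):
    """Map a keypoint predicted on the 96x96 input back onto the original h x w photo."""
    return x * w / TARGET, y * h / TARGET
```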
How old or fresh are the data? To what extent generalization can be made across time to inform Assignment 1? This is fundamentally about the temporal structure of the data set, and the external validity of the data set across time.
The training dataset comes from the Kaggle competition at https://www.kaggle.com/c/facial-keypoints-detection. However, we can feed any new test data to the model to validate it, and we did pass our own images to test it.
Where did the event, activity, behavior, and observation, etc. take place? Where were the data collected if the information is available? What does the geographical coverage of the data set look like?
The events, activities or behaviors recorded in the dataset are unknown. However, it includes images of people with different skin tones, so it covers a wide range of people.
Does the data set contain geographical information (GIS)? Is this a local, regional, national, or global data set?
No, the dataset doesn’t contain any geographical information.
To what extent generalization can be made across settings to inform Assignment 1? This is fundamentally about geographic variables in the data set, and the external validity of the data set across settings.
The model generalizes well because we implemented data augmentation to account for the variation in real-world images. As the technology improves, it could even operate through car windshields. That opens up the possibility of using it alongside traditional automatic number-plate recognition (ANPR) enforcement to identify the driver of a vehicle, detect a person at an unauthorized site, or read a person's mood from their facial features.
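This section does not say which augmentations were used. A horizontal flip is a common choice for this kind of data, and it shows why augmentation must transform the labels along with the image. After mirroring, each x coordinate becomes `width - 1 - x`. Every `left_*` keypoint must also be renamed to the matching `right_*` keypoint (and vice versa), or the model would learn swapped labels. Here is a minimal sketch with a hypothetical `flip_horizontal` function:

```python
import numpy as np

def flip_horizontal(image, keypoints):
    """Mirror an (H, W) image left-to-right and update its keypoints to match.

    keypoints maps names such as 'left_eye_center' to (x, y) pixel coordinates.
    """
    width = image.shape[1]
    flipped_image = np.fliplr(image).copy()  # column c moves to column width - 1 - c
    flipped = {}
    for name, (x, y) in keypoints.items():
        # the subject's left side now appears where the right side was, so swap the labels
        if "left" in name:
            new_name = name.replace("left", "right")
        elif "right" in name:
            new_name = name.replace("right", "left")
        else:
            new_name = name  # nose_tip and the mouth-center points keep their names
        flipped[new_name] = (width - 1 - x, y)
    return flipped_image, flipped
```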
Why did the event, activity, behavior, or observation etc. take place? Why were the data collected?
The data was collected to train models that detect facial keypoints, which is why the dataset contains only the images and the corresponding coordinates of the facial keypoints.
How: If you would like, you can add a dimension of how. How did it happen? Sometimes, the answer to how can be covered by what, when and where.
Once facial keypoint detection is working, we plan to add an interactive filter like Snapchat's.
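One way such a filter could use the detected keypoints is to position an overlay, such as a pair of sunglasses, from the two eye centers. The midpoint gives the position, the distance between the eyes gives the size, and the angle of the line between them gives the head tilt. This is a hypothetical sketch: `glasses_placement` and its `scale` factor are illustrative assumptions, and drawing the overlay itself is left out.

```python
import math

def glasses_placement(left_eye, right_eye, scale=2.0):
    """Return (center_x, center_y, width, angle_degrees) for a hypothetical glasses overlay.

    left_eye / right_eye are the (x, y) values of left_eye_center / right_eye_center.
    Left and right are assumed to follow the subject's point of view, so the left eye
    usually has the larger x. scale is an illustrative overlay-width / eye-distance ratio.
    """
    (lx, ly), (rx, ry) = left_eye, right_eye
    center_x, center_y = (lx + rx) / 2, (ly + ry) / 2
    eye_distance = math.hypot(lx - rx, ly - ry)
    # angle of the line from the right eye to the left eye; 0 means the head is level
    angle = math.degrees(math.atan2(ly - ry, lx - rx))
    return center_x, center_y, scale * eye_distance, angle
```

Image y coordinates grow downward, so a positive angle corresponds to a clockwise tilt on screen.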