Data is the most important aspect of Machine Learning and Artificial Intelligence.
A good dataset is critical in getting meaningful prediction results.
Obviously AI requires a dataset to be in a tabular format with rows and columns.
Each row represents an instance and each column represents an attribute.
For example, consider a customer churn dataset: each row represents a customer, and the columns hold that customer's attributes, such as name, phone number, current plan, and total charges.
Types of Data for Machine Learning Models
Numerical Data: Numerical data consists of numbers, that is, quantities that can be measured. You can perform arithmetic operations on it, such as addition, subtraction, and averaging. Examples include height, weight, and age.
Categorical Data: Categorical data consists of non-numeric labels. You can't perform arithmetic operations on them. Examples include gender, race, state, and country.
Date/Time: This data contains entries that represent a date or a time. Examples include 12/11/2020 and 12:30:00.
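To make the three types concrete, here is a minimal Python sketch that guesses the type of a column from its values. It is only an illustration, not how Obviously AI detects types: the function names and the list of date/time layouts in DATE_FORMATS are assumptions made for this example.

```python
from datetime import datetime

# Hypothetical date/time layouts to try; real data may use others.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%H:%M:%S")


def is_number(value):
    """True if the string parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def is_datetime(value):
    """True if the string matches one of the assumed date/time layouts."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def infer_column_type(values):
    """Return 'numerical', 'date/time', or 'categorical' for a list of strings."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return "categorical"  # nothing to go on, so treat the column as labels
    if all(is_number(v) for v in non_empty):
        return "numerical"
    if all(is_datetime(v) for v in non_empty):
        return "date/time"
    return "categorical"
```

For instance, infer_column_type(["170", "65.5"]) is classified as numerical, while a column that mixes numbers and text falls back to categorical.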
Obviously AI requires a structured dataset to produce meaningful predictions. The dataset needs to be structured, but not necessarily clean: it can contain inconsistencies such as text values in number columns or empty cells.
We made a quick DIY checklist to ensure your data is well structured and machine learning ready.
For CSV files
Below is the checklist of prerequisites for CSV files.
The file size is less than 25 MB.
The first row is column names.
The first column is an ID column.
File has a minimum of 1,000 rows and 5 columns.
File has very few empty cells.
File is in .CSV format.
Here is a sample file for your reference.
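If you want to run the CSV checklist yourself before uploading, the following Python sketch is one way to do it. It is not an official Obviously AI tool. Several details are assumptions made for this example: the function name check_csv, counting 1 MB as 1024 * 1024 bytes, not counting the header row toward the 1,000 rows, treating an ID column as one with a unique, non-empty value in every row, and the 5% cutoff in MAX_EMPTY_RATIO standing in for "very few empty cells".

```python
import csv
import os

MAX_BYTES = 25 * 1024 * 1024  # "less than 25 MB", assuming 1 MB = 1024 * 1024 bytes
MIN_ROWS = 1000
MIN_COLUMNS = 5
MAX_EMPTY_RATIO = 0.05  # hypothetical cutoff for "very few empty cells"


def check_csv(path):
    """Return a list of checklist problems found in the file (empty if none)."""
    problems = []
    if not path.lower().endswith(".csv"):
        problems.append("file extension is not .csv")
    if os.path.getsize(path) >= MAX_BYTES:
        problems.append("file is 25 MB or larger")

    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # first row is expected to hold column names
        rows = list(reader)

    if len(header) < MIN_COLUMNS:
        problems.append(f"only {len(header)} columns, need at least {MIN_COLUMNS}")
    if any(not name.strip() for name in header):
        problems.append("first row contains blank column names")
    if len(rows) < MIN_ROWS:
        problems.append(f"only {len(rows)} data rows, need at least {MIN_ROWS}")

    # Assumption: an ID column has a non-empty, unique value in every row.
    ids = [row[0].strip() if row else "" for row in rows]
    if "" in ids or len(set(ids)) != len(ids):
        problems.append("first column does not look like an ID column")

    # Count empty cells, treating cells missing from short rows as empty.
    width = len(header)
    empty = 0
    for row in rows:
        padded = (row + [""] * width)[:width]
        empty += sum(1 for cell in padded if not cell.strip())
    total = len(rows) * width
    if total and empty / total > MAX_EMPTY_RATIO:
        problems.append("too many empty cells")
    return problems
```

Calling check_csv("customers.csv") (a hypothetical file name) returns an empty list when every check passes, and otherwise one message per failed checklist item.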
For databases
Below is the checklist of prerequisites for connecting your database.
Ensure Obviously AI's IP address is whitelisted on your firewall. This can be found under Connection Requirements when adding the dataset.
The first column in your table is an ID column.
The table has a minimum of 1,000 rows and 5 columns.
The table has very few empty cells.
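The table-level items in this checklist can be verified with a few SQL queries. The Python sketch below uses the standard-library sqlite3 module purely as a stand-in for your own database, so the connection, the function name check_table, and the 5% cutoff in MAX_EMPTY_RATIO are all assumptions made for this example. It does not check the firewall item, which has to be configured on your side.

```python
import sqlite3

MIN_ROWS = 1000
MIN_COLUMNS = 5
MAX_EMPTY_RATIO = 0.05  # hypothetical cutoff for "very few empty cells"


def quote(identifier):
    """Quote a table or column name for use in a SQL statement."""
    return '"' + identifier.replace('"', '""') + '"'


def check_table(conn, table):
    """Return a list of checklist problems found in the table (empty if none)."""
    problems = []
    name = quote(table)

    # Read the column names without fetching any rows.
    cursor = conn.execute(f"SELECT * FROM {name} LIMIT 0")
    columns = [description[0] for description in cursor.description]
    row_count = conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]

    if len(columns) < MIN_COLUMNS:
        problems.append(f"only {len(columns)} columns, need at least {MIN_COLUMNS}")
    if row_count < MIN_ROWS:
        problems.append(f"only {row_count} rows, need at least {MIN_ROWS}")

    if columns and row_count:
        # Assumption: an ID column has a distinct, non-NULL value in every row.
        first = quote(columns[0])
        distinct_ids = conn.execute(
            f"SELECT COUNT(DISTINCT {first}) FROM {name}"
        ).fetchone()[0]
        if distinct_ids != row_count:
            problems.append("first column does not look like an ID column")

        # COUNT(column) skips NULLs, so the difference is the number of NULL cells.
        # Empty strings are not counted as empty here.
        empty = 0
        for column in columns:
            non_null = conn.execute(
                f"SELECT COUNT({quote(column)}) FROM {name}"
            ).fetchone()[0]
            empty += row_count - non_null
        if empty / (row_count * len(columns)) > MAX_EMPTY_RATIO:
            problems.append("too many empty cells")
    return problems
```

With a real database you would pass a connection from that database's own driver, and the identifier quoting may need to change to match its SQL dialect.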