Data is the most important aspect of Machine Learning and Artificial Intelligence.

A good dataset is critical for Machine Learning because it makes accurate predictions possible. A good dataset should have few missing values and must include columns that can influence the prediction column (the target variable). Obviously AI requires the dataset to be in a tabular format with rows and columns.

Let us take the example of a Customer Churn dataset.
Customer churn is influenced by various parameters such as usage, income, age, location and payments, so it is important to have columns that carry demographic, usage and transactional information.

Customer Dataset

We also have certain data requisites to make sure the predictions are accurate and not biased. We expect the data to be stored in a single CSV file or a table. The data should have few missing values and must have at least 100 rows and 5 columns.

Dataset Requirements

Obviously AI requires a structured dataset to produce meaningful predictions. The dataset needs to be structured, but not necessarily clean: it can contain inconsistencies such as text values in number columns or empty cells.
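To make "structured, but not necessarily clean" concrete, here is a minimal Python sketch that profiles each column of a small CSV and counts numeric, text and empty cells. The function name profile_columns and the sample data are hypothetical and only for illustration; this is not how Obviously AI inspects data internally.

```python
import csv
import io


def profile_columns(csv_text):
    """Count numeric, text and empty cells per column of a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    profile = {name: {"numeric": 0, "text": 0, "empty": 0} for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            # Short rows yield None for missing cells; treat them as empty.
            value = (row[name] or "").strip()
            if not value:
                profile[name]["empty"] += 1
                continue
            try:
                float(value)
                profile[name]["numeric"] += 1
            except ValueError:
                profile[name]["text"] += 1
    return profile


# Hypothetical sample: structured (rows and columns) but not clean.
sample = """customer_id,age,income,churned
1,34,52000,no
2,unknown,61000,yes
3,45,,no
"""

profile = profile_columns(sample)
for column, counts in profile.items():
    print(column, counts)
```

In this sample the age column holds one text value among numbers and the income column has one empty cell, which are the kinds of inconsistencies a structured dataset may still contain.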

We made a quick DIY checklist to ensure your data is well structured and machine learning ready.

For CSV files

Below is the checklist of prerequisites for CSV files.

The file size is less than 25 MB.
The first row contains the column names.
The first column is an ID column.
The file has a minimum of 1,000 rows and 5 columns.
The file has very few empty cells.
The file is in .CSV format.

Here is a sample file for your reference.
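The CSV checklist above can be run locally before uploading. Below is a minimal sketch using only the Python standard library. The function name check_csv is hypothetical, and two checks rest on assumptions the checklist does not spell out: "very few empty cells" is approximated with a 5% threshold, and "ID column" is taken to mean that every value in the first column is non-empty and unique.

```python
import csv
import os


def check_csv(path, max_bytes=25 * 1024 * 1024, min_rows=1000, min_cols=5,
              max_empty_ratio=0.05):
    """Return a list of checklist problems for the CSV at `path` (empty list = all checks passed)."""
    problems = []
    if not path.lower().endswith(".csv"):
        problems.append("file extension is not .csv")
    if os.path.getsize(path) >= max_bytes:
        problems.append("file is not smaller than the size limit")

    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    if not rows:
        return problems + ["file is empty"]

    header, data = rows[0], rows[1:]
    if any(not name.strip() for name in header):
        problems.append("first row has blank column names")
    if len(header) < min_cols:
        problems.append(f"fewer than {min_cols} columns")
    if len(data) < min_rows:
        problems.append(f"fewer than {min_rows} rows")

    # Assumption: an ID column has a non-empty, unique value in every row.
    ids = [row[0].strip() if row else "" for row in data]
    if any(not value for value in ids) or len(set(ids)) != len(ids):
        problems.append("first column does not look like an ID column")

    # Assumption: "very few empty cells" means at most max_empty_ratio of all cells.
    total = len(data) * len(header)
    empty = sum(
        1
        for row in data
        for i in range(len(header))
        if i >= len(row) or not row[i].strip()
    )
    if total and empty / total > max_empty_ratio:
        problems.append("too many empty cells")

    return problems
```

Calling check_csv("customers.csv") (a hypothetical file name) returns an empty list when every check passes, or a list of short messages naming the items that failed.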

For Databases

Below is the checklist of prerequisites for connecting your database.

Ensure Obviously AI's IP address is whitelisted on your firewall. The address can be found under Connection Requirements when adding the dataset.
The first column in your table is an ID column.
The table has a minimum of 1,000 rows and 5 columns.
The table has very few empty cells.