This article explains how to navigate the platform after you have successfully uploaded your data to your Obviously AI account.
Uploading your data to the platform is covered in detail in our Adding Data articles. Please review them to make sure your data uploads successfully.
The first thing you see after uploading your dataset is the Review section. It gives a quick overview of the dataset, such as its size, the datatype of each column, and the percentage of empty values (% emptiness) in each column. It also shows the distribution graph of each column on the right. You can use the dropdown to look through the distribution of each column in your dataset, and you can change the datatype of a column using the dropdown as well.
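If you want to sanity-check the same figures before uploading, the row count and % emptiness can be approximated offline. The sketch below is only an illustration, not the platform's implementation: the function name `review_columns` and the sample data are hypothetical, and it assumes that a cell counts as empty when it is blank or contains only whitespace.

```python
import csv
import io

def review_columns(csv_text):
    """Return (row_count, {column: percent_empty}) for CSV text with a header row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    emptiness = {}
    for col in columns:
        values = [row[col] for row in rows]
        # Assumption: a missing or whitespace-only cell counts as empty.
        empty = sum(1 for v in values if v is None or v.strip() == "")
        emptiness[col] = round(100.0 * empty / len(values), 1)
    return len(rows), emptiness

# Hypothetical sample: 4 rows, with gaps in the age and city columns.
sample = "id,age,city\n1,34,Paris\n2,,Rome\n3,51,\n4,,Oslo\n"
n_rows, emptiness = review_columns(sample)
print(n_rows, emptiness)
```

Comparing these numbers with the Review section is a quick way to confirm that the file was parsed the way you expected.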
If you choose “Save and Close”, the dataset is saved and you can find it under “My Datasets”. Otherwise, click the Continue button to go to the simple view of your dataset (Advanced View is OFF by default) before you start building models on the platform. The main thing to check here is whether the Prediction column is correct. By default, the platform chooses the last column in your dataset as the Prediction column. If that is not the right one, choose the correct Prediction column using the dropdown. Whichever column is chosen, its distribution graph appears on the right. If the last column is indeed your prediction column, you can start building your model right away by clicking the Start Predicting button.
If you toggle the Advanced View ON, you’ll be directed to the advanced view of your data on the platform. The three main things to look out for here are:
The top section shows toggles such as Remove Outliers, Normalization, and Upsample/Downsample. These toggles are ON by default, because it is standard industry practice to remove outliers from the data, normalize the data so that all columns are on the same scale, and upsample the data when a dataset is imbalanced for classification problems.
The middle section displays all the columns present in your dataset. You can choose your identifier column here. By default, the platform chooses the first column as the ID column and the last column as the prediction column, so they are prefilled as you can see below. All the other feature columns are listed under Include.
If you want to remove a feature column, you can drag it from Include and drop it on Exclude. If you have any date columns in your dataset, you can drag them to the Date Column area. Additionally, each column has filter and pin options. The filter option lets you choose a range of values for that particular column. The pin option lets you force the inclusion of a column that is otherwise marked as not fit for use.
The Similar columns section shows the correlation between the columns in your data. If the correlation between two columns is too high, keeping both is redundant, so it is advised to keep only one of them. Finally, you can click the Start Predicting button to move on to building the models.
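To make the Advanced View options more concrete, here is a rough sketch of what outlier removal, normalization, upsampling, and a similar-columns check can look like. The platform does not document which methods it uses, so everything below is an assumption chosen for illustration: the IQR rule for outliers, min-max scaling for normalization, random duplication for upsampling (downsampling is not shown), Pearson correlation with a 0.9 threshold for similar columns, and all function names and sample values.

```python
import math
import random
import statistics

def remove_outliers(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (an assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if q1 - k * iqr <= v <= q3 + k * iqr]

def normalize(values):
    """Min-max scale values into [0, 1] so columns share one scale."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def upsample(rows, label_of, seed=0):
    """Duplicate random minority-class rows until every class is equally large."""
    rng = random.Random(seed)
    groups = {}
    for row in rows:
        groups.setdefault(label_of(row), []).append(row)
    target = max(len(g) for g in groups.values())
    balanced = []
    for g in groups.values():
        balanced.extend(g)
        balanced.extend(rng.choice(g) for _ in range(target - len(g)))
    return balanced

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def similar_pairs(columns, threshold=0.9):
    """List column pairs whose absolute correlation reaches the assumed threshold."""
    names = list(columns)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(pearson(columns[a], columns[b])) >= threshold
    ]

# Hypothetical sample data for each step.
cleaned = remove_outliers([10, 12, 11, 13, 12, 11, 95])   # 95 is far from the rest
scaled = normalize([10, 15, 20])
rows = [("a", "yes"), ("b", "no"), ("c", "no"), ("d", "no")]
balanced = upsample(rows, label_of=lambda row: row[1])
columns = {
    "height_cm": [150, 160, 170, 180],
    "height_in": [59.1, 63.0, 66.9, 70.9],   # same measurement in other units
    "age": [30, 22, 41, 25],
}
redundant = similar_pairs(columns)
print(cleaned, scaled, len(balanced), redundant)
```

In this sample, `height_cm` and `height_in` carry the same information, which is the kind of pair where you would keep only one column under Include.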
Time series datasets are typically assumed to have only two columns: a sequential Date column and a numeric prediction column. However, our platform is flexible. If you have more feature columns, you can choose the required prediction column on the platform. This is also useful if your dataset has multiple numeric columns that could serve as the prediction column.
Once you upload your time series data on the platform, you are directed to the Review section (the screenshots here cover time series datasets with two columns and with more than two columns). As with AutoML, a brief review of the dataset size appears on the left. In the middle, all the columns are listed with their % emptiness and whether each column is fit for use, and the distribution of each column is shown on the right. You can always use the dropdown to check each column.
Once you hit Continue, you’ll be directed to the next page. You can use the dropdown to choose a particular column as your prediction column. If you have only one numeric column, it is automatically chosen as the prediction column by default.
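The default described above can be pictured with a small sketch. It is an illustration under assumptions, not the platform's actual logic: the helper names are hypothetical, and a column is treated as numeric when every non-empty value parses as a number.

```python
def is_numeric(values):
    """True if the column has at least one value and all non-empty values parse as floats."""
    filled = [str(v) for v in values if str(v).strip() != ""]
    if not filled:
        return False
    try:
        for v in filled:
            float(v)
    except ValueError:
        return False
    return True

def default_prediction_column(columns):
    """Return the only numeric column, or None when the choice is ambiguous."""
    numeric = [name for name, values in columns.items() if is_numeric(values)]
    return numeric[0] if len(numeric) == 1 else None

# Hypothetical two-column time series: one date column, one numeric column.
two_col = {"date": ["2023-01-01", "2023-02-01"], "sales": ["120", "135.5"]}
# Hypothetical dataset with two numeric columns: the user has to pick one.
multi_col = dict(two_col, visits=["10", "12"])

print(default_prediction_column(two_col))
print(default_prediction_column(multi_col))
```

When the helper returns nothing, the situation corresponds to picking the prediction column yourself from the dropdown.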
After checking your date and prediction columns, you are required to choose the Data Level, Aggregate Function, and Seasonality of your data. These depend on how the dataset is originally structured. The default values for Data Level, Aggregate Function, and Seasonality are Month, Sum, and 12, respectively.
Data Level: This lets you choose between different levels of data for time series, such as hourly, daily, weekly, monthly, or quarterly. For example, if your original dataset is structured as monthly, you can set the data level to month or quarter.
Aggregate Function: The aggregate function is either sum or average, and it is used to aggregate your data to the chosen data level. If you use sum, the data for consecutive dates is summed. Otherwise, it is averaged.
Seasonality: The value for seasonality depends on the data level and is auto-filled on the platform whenever you choose a particular data level. Some data levels have multiple options for seasonality, which you can choose from the corresponding dropdown.
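To see how Data Level and Aggregate Function interact, the sketch below rolls dated values up to a month or quarter using either sum or average. It is a minimal illustration under assumptions: the function names, the level and function labels, and the sample records are hypothetical, and only two data levels are shown. The defaults mirror the Month and Sum defaults mentioned above.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean

def period_key(d, level):
    """Map a date to its period: (year, month) or (year, quarter)."""
    if level == "month":
        return (d.year, d.month)
    if level == "quarter":
        return (d.year, (d.month - 1) // 3 + 1)
    raise ValueError(f"unsupported data level: {level}")

def aggregate(records, level="month", func="sum"):
    """Group (date, value) records by period and combine each group with sum or average."""
    if func not in ("sum", "average"):
        raise ValueError(f"unsupported aggregate function: {func}")
    buckets = defaultdict(list)
    for d, value in records:
        buckets[period_key(d, level)].append(value)
    combine = sum if func == "sum" else fmean
    return {key: combine(values) for key, values in sorted(buckets.items())}

# Hypothetical records: two in January, one in February, one in April.
records = [
    (date(2023, 1, 5), 10),
    (date(2023, 1, 20), 30),
    (date(2023, 2, 3), 5),
    (date(2023, 4, 1), 8),
]
monthly_sum = aggregate(records)                      # Month + Sum
monthly_avg = aggregate(records, func="average")      # Month + Average
quarterly_sum = aggregate(records, level="quarter")   # Quarter + Sum
print(monthly_sum, monthly_avg, quarterly_sum)
```

With the same records, switching from sum to average changes January from 40 to 20, which is why the choice should reflect whether your values are totals (such as sales) or levels (such as temperature).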
On the right is the distribution of the prediction column. Once everything looks correct, click the Start Predicting button to create your model.
To see Obviously AI in action, check out this demo video or enroll in the No-Code AI University for free to become a certified no-code AI expert.