Obviously AI includes a variety of publicly available datasets that you can use for predictions. These datasets help when you are short on data, and they also give you an idea of the features/columns to look for when building your own dataset for your use case.

Superstore Retail Dataset

Artificial intelligence (AI) and machine learning are set to transform the retail industry, driving deeper insights into customer behavior, operations, finances, and human resources. Obviously AI gives your retail organization the tools needed to accurately forecast demand and inventory, better understand customer behavior, and optimize staffing, helping you dominate your market and delight customers. This data was collected from a global superstore and covers the past 4 years. It is useful if you want to predict sales and forecast demand.

It contains 50,000 rows and 23 columns.

The various columns that are included in this dataset are as follows:

1. row_id

2. order_id

3. order_date

4. ship_date

5. ship_mode

6. customer_id

7. customer_name

8. segment

9. city

10. state

11. country

12. market

13. region

14. product_id

15. category

16. sub_category

17. product_name

18. order_priority

19. quantity

20. discount

21. profit

22. shipping_cost

23. sales
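
If you export this dataset and want to inspect it outside Obviously AI, a common first step before forecasting demand is to total sales per month. The snippet below is only a minimal sketch: the sample rows are invented for illustration, and it assumes order_date is stored as YYYY-MM-DD, which may not match the actual file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows, invented for illustration only; they are not
# taken from the actual dataset. Assumes order_date is formatted YYYY-MM-DD.
SAMPLE_CSV = """order_date,category,quantity,sales
2014-01-05,Furniture,2,120.50
2014-01-20,Technology,1,300.00
2014-02-03,Furniture,4,80.25
2014-02-14,Office Supplies,3,45.75
"""


def monthly_sales(csv_file):
    """Sum the sales column per calendar month (keyed as YYYY-MM)."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        month = row["order_date"][:7]  # "2014-01-05" -> "2014-01"
        totals[month] += float(row["sales"])
    return dict(totals)


totals = monthly_sales(io.StringIO(SAMPLE_CSV))
for month in sorted(totals):
    print(month, round(totals[month], 2))
```

To run this against a real export, pass an open file object instead of the in-memory sample and adjust the date slicing if the date format differs.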

Airbnb Homes Dataset

This dataset contains information about Airbnb home listings in New York City for the year 2019. It covers details such as the location of the listing, host name, price, geographical coordinates, and reviews. This dataset is useful for predicting the price of a listing given its location, reviews, and room type.

The dataset consists of 48,895 rows and 13 columns.

The features that are included in this dataset are as follows:

  1. id

  2. name

  3. host_name

  4. neighbourhood_group

  5. neighbourhood

  6. room_type

  7. last_review

  8. availability_365

  9. minimum_nights

  10. number_of_reviews

  11. reviews_per_month

  12. calculated_host_listings_count

  13. price
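
Before predicting price, it can help to check how price varies across one of the columns above. The following is a rough sketch, not part of Obviously AI itself: the rows and category values are made up for illustration, and the helper name median_price_by is hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Hypothetical sample rows, invented for illustration only; they are not
# taken from the actual dataset.
SAMPLE_CSV = """neighbourhood_group,room_type,number_of_reviews,price
Brooklyn,Private room,9,60
Manhattan,Entire home/apt,45,200
Manhattan,Private room,0,100
Brooklyn,Entire home/apt,12,150
Queens,Entire home/apt,3,130
"""


def median_price_by(csv_file, column):
    """Group listings by the given column and return the median price per group."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row[column]].append(float(row["price"]))
    return {key: median(prices) for key, prices in groups.items()}


by_room = median_price_by(io.StringIO(SAMPLE_CSV), "room_type")
for room_type, price in sorted(by_room.items()):
    print(room_type, price)
```

The median is used here rather than the mean because listing prices tend to include a few very expensive outliers; swapping in another column name such as neighbourhood_group gives the same summary by area.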

Marketing Campaign Dataset

A marketing campaign promotes products through channels such as newspapers, promotions, and television ads. Marketing is extremely important for a product to be successful, but targeting the right, high-value customers is a challenge. Predictive analytics and machine learning help tackle this by finding patterns in buying behavior and customer demographics, which helps identify high-value customers and retain them. Customer and marketing analytics help increase growth and profitability.

The data consists of 9,134 rows and 24 columns.

The features/columns included in this dataset are:

  1. Customer

  2. State

  3. Customer Lifetime Value

  4. Response

  5. Coverage

  6. Education

  7. Effective To Date

  8. EmploymentStatus

  9. Gender

  10. Income

  11. Location Code

  12. Marital Status

  13. Monthly Premium Auto

  14. Months Since Last Claim

  15. Months Since Policy Inception

  16. Number of Open Complaints

  17. Number of Policies

  18. Policy Type

  19. Policy

  20. Renew Offer Type

  21. Sales Channel

  22. Total Claim Amount

  23. Vehicle Class

  24. Vehicle Size

Avocado Prices

Avocados are one of the most popular fruits and are cultivated in tropical and Mediterranean climates throughout the world. According to Transparency Market Research (TMR), the global avocado market was valued at $13.64 billion in 2018 and is predicted to attain an overall value of $21.56 billion by 2026.

This dataset has 18,249 rows and 11 columns.

The features/columns included in this dataset are:

  1. ID

  2. Date

  3. AveragePrice

  4. Total Volume

  5. Total Bags

  6. Small Bags

  7. Large Bags

  8. XLarge Bags

  9. type

  10. year

  11. region

FIFA Players

Football is one of the most popular sports and is widely played in Europe and South America. The use of artificial intelligence and machine learning in sports analytics has been increasing. Sports analytics is a field that applies data science techniques to analyze various components of the sports industry, such as player performance, business performance, recruitment, and more.

This data contains information about players, such as their clubs, height and weight, and various performance parameters. It is useful for comparing the performance of players and predicting which players are in good form and likely to perform well in a game.

This dataset consists of 18,207 rows and 52 columns.

Some of the features/columns included in this dataset are:

  1. id

  2. player_name

  3. age

  4. nationality

  5. overall

  6. potential

  7. club

  8. wage

  9. special

  10. preferred_foot

  11. international_reputation

  12. weak_foot

  13. skill_moves

  14. body_type

  15. position

  16. jersey_number

  17. height

  18. weight

  19. crossing

  20. finishing


Nifty 500 Dataset

The NIFTY is a benchmark stock market index that represents the largest companies listed on the National Stock Exchange (NSE). It is one of the main stock indexes used in India. NIFTY 500 represents the top 500 companies on India's National Stock Exchange (NSE) based on market capitalization and average daily turnover. It represents 94% of the free-float market capitalization of stocks listed on the NSE.

The Nifty 500 Dataset consists of 500 rows and 14 columns.

The features/columns included in this dataset are:

  1. company

  2. industry

  3. symbol

  4. category

  5. market_cap

  6. current_value

  7. high_52week

  8. low_52week

  9. book_value

  10. price_earnings

  11. dividend_yield

  12. roce

  13. roe

  14. sales_growth_3yr

Restaurant Data

This dataset contains information about restaurants based in Bengaluru, India. Bengaluru is considered the Silicon Valley of India and has a large number of restaurants serving cuisines from different parts of the world. This data is helpful for gaining insights into the kind of food popular in different neighborhoods, the cuisines people prefer, and the relationship between affordability and popularity.

This dataset consists of 10,000 rows and 14 columns.

The features/columns included in this dataset are:

  1. restaurant_name

  2. address

  3. location

  4. phone

  5. type

  6. cost

  7. online_order

  8. book_table

  9. rating

  10. votes

  11. dish_liked

  12. cuisines

  13. meal_type

  14. meal_city

Log in to your account to use the Data Store.
