Obviously AI includes a variety of publicly available datasets that you can use for predictions. These datasets not only help when you are short on data, but also give you an idea of the features/columns to look for when building your own dataset for your use case.

Superstore Retail Dataset

Artificial intelligence (AI) and machine learning are set to transform the retail industry, driving deeper insights into customer behavior, operations, finances, and human resources. Obviously AI gives your retail organization the tools needed to forecast demand and inventory, better understand customer behavior, and optimize staffing, helping you lead your market and delight customers. This data was collected from a global superstore and covers the past 4 years. It is useful if you want to predict sales and forecast demand.

It contains 50,000 rows and 23 columns.

The various columns that are included in this dataset are as follows:

  1. row_id
  2. order_id
  3. order_date
  4. ship_date
  5. ship_mode
  6. customer_id
  7. customer_name
  8. segment
  9. city
  10. state
  11. country
  12. market
  13. region
  14. product_id
  15. category
  16. sub_category
  17. product_name
  18. order_priority
  19. quantity
  20. discount
  21. profit
  22. shipping_cost
  23. sales
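
For demand forecasting, order-level rows like these are usually rolled up into a time series first. The following is a minimal sketch of that step, not how Obviously AI processes the data internally. It assumes the dataset has been exported to CSV and that `order_date` is formatted as `YYYY-MM-DD`; the sample rows and the `monthly_sales` helper are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows using column names from the list above.
# The values and the date format are assumptions, not real data.
SAMPLE_CSV = """order_id,order_date,quantity,sales
A-1,2014-01-05,2,100.0
A-2,2014-01-20,1,50.5
A-3,2014-02-03,4,200.0
"""

def monthly_sales(csv_text):
    """Sum the `sales` column per calendar month of `order_date`."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        month = datetime.strptime(row["order_date"], "%Y-%m-%d").strftime("%Y-%m")
        totals[month] += float(row["sales"])
    # Return months in chronological order.
    return dict(sorted(totals.items()))

totals = monthly_sales(SAMPLE_CSV)
print(totals)
```

The same grouping could be done per `category` or `region` if you want a separate forecast for each.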

Airbnb Homes Dataset

This dataset contains information about Airbnb home listings in New York City for the year 2019. It includes columns such as the location of the listing, host name, price, geographical coordinates, and reviews. It is useful for predicting the price of a listing given its location, reviews, and room type.

The dataset consists of 48,895 rows and 13 columns.

The features that are included in this dataset are as follows:

  1. id
  2. name
  3. host_name
  4. neighbourhood_group
  5. neighbourhood
  6. room_type
  7. last_review
  8. availability_365
  9. minimum_nights
  10. number_of_reviews
  11. reviews_per_month
  12. calculated_host_listings_count
  13. price
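
Before predicting `price`, it can help to compute a simple baseline to compare a model against. The sketch below groups listings by one column and takes the median price per group. The sample listings, their values, and the `median_price_by` helper are hypothetical and only reuse column names from the list above.

```python
import statistics
from collections import defaultdict

# Made-up listings for illustration; only the column names come from the dataset.
listings = [
    {"neighbourhood_group": "Manhattan", "room_type": "Entire home/apt", "price": 200},
    {"neighbourhood_group": "Manhattan", "room_type": "Entire home/apt", "price": 240},
    {"neighbourhood_group": "Brooklyn", "room_type": "Private room", "price": 60},
    {"neighbourhood_group": "Brooklyn", "room_type": "Private room", "price": 80},
    {"neighbourhood_group": "Brooklyn", "room_type": "Private room", "price": 100},
]

def median_price_by(rows, key):
    """Return the median `price` for each distinct value of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["price"])
    return {value: statistics.median(prices) for value, prices in groups.items()}

baseline = median_price_by(listings, "room_type")
print(baseline)
```

A prediction that cannot beat the per-group median is a sign that the chosen features add little beyond `room_type` alone.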

Marketing Campaign Dataset

A marketing campaign promotes products through channels such as newspapers, promotions, and television ads. Marketing is extremely important for a product to be successful, but targeting the right, high-value customers can be a challenge. Predictive analytics and machine learning help tackle this issue by finding patterns in buying behavior and customer demographics, which helps you identify high-value customers and retain them. Customer and marketing analytics help increase growth and profitability.

The data consists of 9,134 rows and 24 columns.

The features/columns included in this dataset are:

  1. Customer
  2. State
  3. Customer Lifetime Value
  4. Response
  5. Coverage
  6. Education
  7. Effective To Date
  8. EmploymentStatus
  9. Gender
  10. Income
  11. Location Code
  12. Marital Status
  13. Monthly Premium Auto
  14. Months Since Last Claim
  15. Months Since Policy Inception
  16. Number of Open Complaints
  17. Number of Policies
  18. Policy Type
  19. Policy
  20. Renew Offer Type
  21. Sales Channel
  22. Total Claim Amount
  23. Vehicle Class
  24. Vehicle Size

Avocado Prices

Avocados are one of the most popular fruits and are cultivated in tropical and Mediterranean climates throughout the world. According to Transparency Market Research (TMR), the global avocado market was valued at $13.64 billion in 2018 and is predicted to attain an overall value of $21.56 billion by 2026.

This dataset has 18,249 rows and 11 columns.

The features/columns included in this dataset are:

  1. ID
  2. Date
  3. AveragePrice
  4. Total Volume
  5. Total Bags
  6. Small Bags
  7. Large Bags
  8. XLarge Bags
  9. type
  10. year
  11. region

FIFA Players

Football is one of the most popular sports and is widely played in Europe and South America. The use of artificial intelligence and machine learning has been increasing in sports analytics, a field that applies data science techniques to analyze various components of the sports industry, such as player performance, business performance, and recruitment.

This data contains information about players and their demographics, such as their clubs, height and weight, and various performance parameters. It can be useful for comparing the performance of players and predicting which players are in good form and likely to perform well in a game.

This dataset consists of 18,207 rows and 52 columns.

Some of the features/columns included in this dataset are:

  1. id
  2. player_name
  3. age
  4. nationality
  5. overall
  6. potential
  7. club
  8. wage
  9. special
  10. preferred_foot
  11. international_reputation
  12. weak_foot
  13. skill_moves
  14. body_type
  15. position
  16. jersey_number
  17. height
  18. weight
  19. crossing
  20. finishing

NIFTY 500

The NIFTY is a benchmark stock market index that represents the largest companies listed on the National Stock Exchange (NSE). It is one of the main stock indexes used in India. NIFTY 500 represents the top 500 companies on the NSE based on market capitalization and average daily turnover. It represents 94% of the free-float market capitalization of stocks listed on the NSE.

The NIFTY 500 dataset consists of 500 rows and 14 columns.

The features/columns included in this dataset are:

  1. company
  2. industry
  3. symbol
  4. category
  5. market_cap
  6. current_value
  7. high_52week
  8. low_52week
  9. book_value
  10. price_earnings
  11. dividend_yield
  12. roce
  13. roe
  14. sales_growth_3yr

Restaurant Data

This dataset contains information about restaurants based in Bengaluru, India. Bengaluru is considered the Silicon Valley of India and has a large number of restaurants serving cuisines from different parts of the world. This data is helpful for getting insights into the kind of food that is popular in different neighborhoods, the cuisines people prefer, and the relationship between affordability and popularity.

This dataset consists of 10,000 rows and 14 columns.

The features/columns included in this dataset are:

  1. restaurant_name
  2. address
  3. location
  4. phone
  5. type
  6. cost
  7. online_order
  8. book_table
  9. rating
  10. votes
  11. dish_liked
  12. cuisines
  13. meal_type
  14. meal_city
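
One way to look at the relationship between affordability and popularity is to bucket restaurants by `cost` and compare the average `rating` per bucket. This is a rough sketch under stated assumptions: the sample rows are made up, `cost` and `rating` are assumed to be numeric, and the band thresholds in the hypothetical `cost_band` helper are arbitrary.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration; only the column names come from the dataset.
restaurants = [
    {"restaurant_name": "A", "cost": 300, "rating": 3.5},
    {"restaurant_name": "B", "cost": 450, "rating": 4.0},
    {"restaurant_name": "C", "cost": 800, "rating": 4.0},
    {"restaurant_name": "D", "cost": 1500, "rating": 4.5},
    {"restaurant_name": "E", "cost": 1200, "rating": 4.0},
]

def cost_band(cost):
    """Map a cost to a coarse band; the thresholds are arbitrary assumptions."""
    if cost < 500:
        return "budget"
    if cost < 1000:
        return "mid"
    return "premium"

def mean_rating_by_band(rows):
    """Average the `rating` column within each cost band."""
    ratings = defaultdict(list)
    for row in rows:
        ratings[cost_band(row["cost"])].append(row["rating"])
    return {band: mean(values) for band, values in ratings.items()}

by_band = mean_rating_by_band(restaurants)
print(by_band)
```

The same grouping could use `votes` instead of `rating` if you prefer to treat the number of votes as the measure of popularity.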

Log in to your account to use the Data Store.