Obviously AI includes a variety of publicly available datasets that you can use for predictions. These datasets not only help when you are short on data, but also give you an idea of the features/columns to look for when building a dataset for your own use case.
Superstore Retail Dataset
Artificial intelligence (AI) and machine learning are set to transform the retail industry, driving deeper insights into customer behavior, operations, finances, and human resources. Obviously AI gives your retail organization the tools needed to accurately forecast demand and inventory, better understand customer behavior, and optimize staffing, helping you lead your market and delight customers. This dataset was collected from a global superstore and covers the past 4 years. It is well suited to predicting sales and forecasting demand.
It contains 50,000 rows and 23 columns.
The various columns that are included in this dataset are as follows:
1. row_id
2. order_id
3. order_date
4. ship_date
5. ship_mode
6. customer_id
7. customer_name
8. segment
9. city
10. state
11. country
12. market
13. region
14. product_id
15. category
16. sub_category
17. product_name
18. order_priority
19. quantity
20. discount
21. profit
22. shipping_cost
23. sales
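As a rough illustration of how these columns can feed a demand forecast, the sketch below sums `sales` per calendar month using `order_date`. The sample rows are invented for the example, and the ISO date format (`YYYY-MM-DD`) is an assumption; check the format in your own export before reusing it.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Invented sample rows that reuse three of the column names listed above.
# DATE_FORMAT is an assumption; adjust it to match the actual order_date values.
SAMPLE_CSV = """order_id,order_date,sales
A-1,2014-01-05,120.50
A-2,2014-01-20,79.50
A-3,2014-02-11,300.00
"""
DATE_FORMAT = "%Y-%m-%d"


def monthly_sales(csv_file):
    """Sum the sales column per calendar month, keyed as 'YYYY-MM'."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        month = datetime.strptime(row["order_date"], DATE_FORMAT).strftime("%Y-%m")
        totals[month] += float(row["sales"])
    # Sort by month so the result reads as a time series.
    return dict(sorted(totals.items()))


totals = monthly_sales(io.StringIO(SAMPLE_CSV))
print(totals)
```

A monthly series like this is one possible starting point for a forecast; weekly or per-category totals can be built the same way by changing the grouping key.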
Airbnb Homes Dataset
This dataset contains information about Airbnb home listings in New York City for the year 2019. It includes columns such as the listing's location, host name, price, geographical coordinates, and reviews. It can be used to predict the price of a listing given its location, reviews, and room type.
The dataset consists of 48,895 rows and 13 columns.
The features that are included in this dataset are as follows:
- id
- name
- host_name
- neighbourhood_group
- neighbourhood
- room_type
- last_review
- availability_365
- minimum_nights
- number_of_reviews
- reviews_per_month
- calculated_host_listings_count
- price
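To make the price-prediction idea concrete, here is a minimal baseline sketch: it predicts a listing's price as the average price of listings with the same `neighbourhood_group` and `room_type`. The sample listings and the function names `fit_group_means` and `predict_price` are hypothetical, and this is not how Obviously AI builds its models; it is only a simple reference point that a trained model should beat.

```python
from collections import defaultdict
from statistics import mean

# Invented listings that reuse three of the column names listed above.
listings = [
    {"neighbourhood_group": "Manhattan", "room_type": "Entire home/apt", "price": 200},
    {"neighbourhood_group": "Manhattan", "room_type": "Entire home/apt", "price": 250},
    {"neighbourhood_group": "Brooklyn", "room_type": "Private room", "price": 60},
    {"neighbourhood_group": "Brooklyn", "room_type": "Private room", "price": 80},
]


def fit_group_means(rows):
    """Average price for each (neighbourhood_group, room_type) pair."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["neighbourhood_group"], row["room_type"])].append(row["price"])
    return {key: mean(prices) for key, prices in groups.items()}


def predict_price(group_means, neighbourhood_group, room_type, default=None):
    """Look up the group average; return default for unseen combinations."""
    return group_means.get((neighbourhood_group, room_type), default)


baseline = fit_group_means(listings)
print(predict_price(baseline, "Brooklyn", "Private room"))
```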
Marketing Campaign Dataset
A marketing campaign promotes products through channels such as newspapers, promotions, and television ads. Marketing is extremely important for a product to be successful, but targeting the right, high-value customers can be a challenge. Predictive analytics and machine learning help tackle this issue by finding patterns in buying behavior and customer demographics, which in turn helps identify high-value customers and retain them. Customer and marketing analytics help increase growth and profitability.
The data consists of 9,134 rows and 24 columns.
The features/columns included in this dataset are:
- Customer
- State
- Customer Lifetime Value
- Response
- Coverage
- Education
- Effective To Date
- EmploymentStatus
- Gender
- Income
- Location Code
- Marital Status
- Monthly Premium Auto
- Months Since Last Claim
- Months Since Policy Inception
- Number of Open Complaints
- Number of Policies
- Policy Type
- Policy
- Renew Offer Type
- Sales Channel
- Total Claim Amount
- Vehicle Class
- Vehicle Size
Avocado Prices
Avocados are one of the most popular fruits and are cultivated in tropical and Mediterranean climates throughout the world. According to Transparency Market Research (TMR), the global avocado market was valued at $13.64 billion in 2018 and is predicted to attain an overall value of $21.56 billion by 2026.
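For a sense of scale, the two market figures above imply a compound annual growth rate (CAGR) of roughly 5.9% over the eight years from 2018 to 2026. The short sketch below shows the arithmetic; the function name `cagr` is just an illustrative helper.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above, in billions of US dollars.
rate = cagr(13.64, 21.56, 2026 - 2018)
print(f"{rate:.1%}")
```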
This dataset has 18,249 rows and 11 columns.
The features/columns included in this dataset are:
- ID
- Date
- AveragePrice
- Total Volume
- Total Bags
- Small Bags
- Large Bags
- XLarge Bags
- type
- year
- region
FIFA Players
Football is one of the most popular sports and is widely played in Europe and South America. The use of artificial intelligence and machine learning in sports analytics has been increasing. Sports analytics is a field that applies data science techniques to analyze various components of the sports industry, such as player performance, business performance, recruitment, and more.
This data contains information about the players and their demographics, such as their clubs, height and weight, and various performance parameters. It can be used to compare the performance of different players and predict which players are in good form and likely to perform well in a game.
This dataset consists of 18,207 rows and 52 columns.
Some of the features/columns included in this dataset are:
- id
- player_name
- age
- nationality
- overall
- potential
- club
- wage
- special
- preferred_foot
- international_reputation
- weak_foot
- skill_moves
- body_type
- position
- jersey_number
- height
- weight
- crossing
- finishing
NIFTY 500
The NIFTY is a benchmark stock market index that represents the largest companies listed on the National Stock Exchange (NSE). It is one of the main stock indexes used in India. NIFTY 500 represents the top 500 companies on the NSE based on market capitalization and average daily turnover, and it covers 94% of the free-float market capitalization of stocks listed on the NSE.
The NIFTY 500 dataset consists of 500 rows and 14 columns.
The features/columns included in this dataset are:
- company
- industry
- symbol
- category
- market_cap
- current_value
- high_52week
- low_52week
- book_value
- price_earnings
- dividend_yield
- roce
- roe
- sales_growth_3yr
Restaurant Data
This dataset contains information about restaurants based in Bengaluru, India. Bengaluru is considered the Silicon Valley of India and has a large number of restaurants serving cuisines from different parts of the world. This data can provide insights into the kind of food that is popular in different neighborhoods, the cuisines people prefer, and the relationship between affordability and popularity.
This dataset consists of 10,000 rows and 14 columns.
The features/columns included in this dataset are:
- restaurant_name
- address
- location
- phone
- type
- cost
- online_order
- book_table
- rating
- votes
- dish_liked
- cuisines
- meal_type
- meal_city
Log in to your account to use the Data Store.