
Real-World Datasets




All Dataset Categories


🏥 HEALTHCARE & PUBLIC HEALTH

1. CDC WONDER Database

  • Link: https://wonder.cdc.gov/
  • What it is: Real mortality data, birth data, cancer statistics, environmental health data
  • Project ideas:
  • Analyze opioid death trends by county
  • Compare maternal mortality across demographics
  • Build a public health dashboard for your state
  • Why employers care: By some estimates, healthcare analytics is a $50B+ industry
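For county comparisons like the opioid idea above, raw death counts usually get converted to a rate per 100,000 residents first, so small and large counties can be compared fairly. Here is a minimal Python sketch of that step. It assumes you have already exported deaths and population per county; the county names and numbers are made up for illustration and are not CDC figures.

```python
def crude_rate_per_100k(deaths: int, population: int) -> float:
    """Crude death rate: deaths per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths * 100_000 / population


# Hypothetical rows (county, deaths, population); not real CDC data.
rows = [("County A", 30, 150_000), ("County B", 12, 40_000)]

rates = {name: crude_rate_per_100k(d, p) for name, d, p in rows}

# Counties ordered from highest to lowest rate.
ranked = sorted(rates, key=rates.get, reverse=True)
```

Keep in mind that WONDER suppresses very small counts, so some counties may be missing from an export and should not be treated as zero.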

2. WHO Global Health Observatory

  • Link: https://www.who.int/data/gho
  • What it is: Health data from 194 countries - disease outbreaks, life expectancy, healthcare access
  • Project ideas:
  • COVID-19 response effectiveness comparison
  • Clean water access vs. disease prevalence
  • Mental health resource allocation analysis

3. Medicare Provider Utilization & Payment Data

4. MIMIC-III Clinical Database

  • Link: https://physionet.org/content/mimiciii/1.4/
  • What it is: De-identified health data from 40,000+ ICU patients
  • Requirements: Free PhysioNet account, completion of a human-subjects research ethics course (CITI training), and a signed data use agreement
  • Project ideas:
  • Patient readmission prediction
  • Treatment outcome analysis
  • Length-of-stay forecasting

5. HealthData.gov

  • Link: https://healthdata.gov/
  • What it is: 3,000+ healthcare datasets from US government
  • Includes: Hospital capacity, vaccination rates, nursing home data

💰 FINANCE & ECONOMICS

6. FRED (Federal Reserve Economic Data)

  • Link: https://fred.stlouisfed.org/
  • What it is: 800,000+ economic time series - GDP, unemployment, inflation, interest rates
  • Project ideas:
  • Recession prediction model
  • Inflation impact analysis
  • Housing market forecasting
  • API available: Yes, free (requires a free API key)
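To give a feel for the FRED API, here is a minimal Python sketch that builds a request URL for the series/observations endpoint and parses the JSON it returns. The sample payload is hand-written in the shape of a FRED response (the values are illustrative, not real data), and "KEY" stands in for your own API key.

```python
import json
from urllib.parse import urlencode

FRED_OBS_URL = "https://api.stlouisfed.org/fred/series/observations"


def build_fred_url(series_id: str, api_key: str) -> str:
    """Build a request URL for one series' observations as JSON."""
    params = {"series_id": series_id, "api_key": api_key, "file_type": "json"}
    return f"{FRED_OBS_URL}?{urlencode(params)}"


def parse_observations(payload: dict) -> list[tuple[str, float]]:
    """Return (date, value) pairs, skipping missing values (FRED marks them '.')."""
    return [
        (obs["date"], float(obs["value"]))
        for obs in payload["observations"]
        if obs["value"] != "."
    ]


# Hand-written sample in the shape of a FRED response; values are illustrative.
sample = json.loads(
    '{"observations": ['
    '{"date": "2020-01-01", "value": "3.5"},'
    '{"date": "2020-02-01", "value": "."}]}'
)
series = parse_observations(sample)
url = build_fred_url("UNRATE", "KEY")  # UNRATE is the unemployment rate series
```

Values arrive as strings, so the conversion to float (and the check for ".") is worth doing once, right at the parsing step.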

7. SEC EDGAR Filings

  • Link: https://www.sec.gov/edgar/searchedgar/companysearch
  • What it is: Every public company's financial filings
  • Project ideas:
  • Executive compensation trends
  • Industry financial ratio comparisons
  • Insider trading pattern analysis
  • Pro tip: sec-api.io (a third-party service) can make data extraction easier; the SEC also publishes free JSON APIs at data.sec.gov
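If you would rather go straight to the source, the SEC's own data.sec.gov endpoints (not listed above, so treat this as an optional alternative) serve XBRL company facts as JSON. A minimal Python sketch of building such a request is below. The contact string is a placeholder, and this only constructs the URL and headers; it does not fetch anything.

```python
def companyfacts_url(cik: int) -> str:
    """URL for a company's XBRL facts; the CIK is zero-padded to 10 digits."""
    return f"https://data.sec.gov/api/xbrl/companyfacts/CIK{cik:010d}.json"


def sec_headers(contact: str) -> dict[str, str]:
    """The SEC asks automated clients to identify themselves via User-Agent."""
    return {"User-Agent": contact}


url = companyfacts_url(320193)  # 320193 is Apple Inc.'s CIK
headers = sec_headers("Your Name your.email@example.com")  # placeholder contact
```

Pass both to whatever HTTP client you prefer, and keep your request rate modest.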

8. World Bank Open Data

  • Link: https://data.worldbank.org/
  • What it is: Development indicators for every country
  • Project ideas:
  • GDP growth factor analysis
  • Education spending vs. outcomes
  • Climate finance tracking
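For the GDP growth idea, the World Bank's public API returns each indicator as a two-element JSON list: paging metadata first, then the data rows. The Python sketch below builds a request URL and turns a response into a year-to-value mapping. The sample payload is hand-written in that shape, with illustrative numbers only.

```python
import json


def indicator_url(country: str, indicator: str) -> str:
    """Request URL for one country and one indicator, as JSON."""
    return (
        f"https://api.worldbank.org/v2/country/{country}"
        f"/indicator/{indicator}?format=json"
    )


def parse_indicator(payload: list) -> dict[str, float]:
    """Map year -> value from [metadata, rows], dropping null values."""
    _meta, rows = payload
    return {r["date"]: r["value"] for r in (rows or []) if r["value"] is not None}


# NY.GDP.MKTP.KD.ZG is the indicator code for GDP growth (annual %).
url = indicator_url("BR", "NY.GDP.MKTP.KD.ZG")

# Hand-written sample in the shape of an API response; numbers are illustrative.
sample = json.loads(
    '[{"page": 1}, [{"date": "2021", "value": 5.9}, {"date": "2022", "value": null}]]'
)
growth = parse_indicator(sample)
```

Null values are common for recent years, so dropping them early keeps later averages and plots honest.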

9. Bureau of Labor Statistics

  • Link: https://www.bls.gov/data/
  • What it is: Employment, wages, prices, productivity data
  • Project ideas:
  • Wage growth by industry analysis
  • Automation impact on employment
  • Cost of living comparisons

10. Yahoo Finance API (via yfinance)

  • Link: https://pypi.org/project/yfinance/
  • What it is: Historical stock prices, dividends, financial statements, pulled via an unofficial open-source library that is not affiliated with Yahoo
  • Project ideas:
  • Portfolio optimization
  • Sector performance analysis
  • Volatility prediction
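Before predicting volatility you need to measure it. A common starting point is annualized historical volatility: the standard deviation of daily log returns, scaled by the square root of the number of trading days. The Python sketch below assumes 252 trading days per year and uses made-up closing prices.

```python
import math
from statistics import stdev


def annualized_volatility(closes: list[float], trading_days: int = 252) -> float:
    """Sample standard deviation of daily log returns, scaled to one year."""
    if len(closes) < 3:
        raise ValueError("need at least 3 prices (2 returns)")
    log_returns = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    return stdev(log_returns) * math.sqrt(trading_days)


# Made-up closing prices, for illustration only.
closes = [100.0, 102.0, 101.0, 103.0]
vol = annualized_volatility(closes)
```

In practice the list of closes could come from something like `yfinance.Ticker("AAPL").history(period="1y")["Close"]`, converted to a plain list.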

11. European Central Bank Data

  • Link: https://sdw.ecb.europa.eu/
  • What it is: Eurozone economic and financial data
  • Project ideas:
  • Euro exchange rate analysis
  • Cross-country interest rate comparisons

🌍 ENVIRONMENT & CLIMATE

12. NASA Earthdata

  • Link: https://earthdata.nasa.gov/
  • What it is: Satellite imagery, climate data, atmospheric measurements
  • Project ideas:
  • Deforestation tracking
  • Urban heat island analysis
  • Air quality monitoring

13. NOAA Climate Data

  • Link: https://www.ncdc.noaa.gov/cdo-web/
  • What it is: Historical weather data, storm events, climate normals
  • Project ideas:
  • Extreme weather event trends
  • Agricultural impact modeling
  • Energy demand forecasting

14. EPA Environmental Dataset Gateway

  • Link: https://edg.epa.gov/
  • What it is: Air quality, water quality, toxic releases, Superfund sites
  • Project ideas:
  • Environmental justice analysis (pollution vs. income)
  • Water quality trend monitoring
  • Industrial pollution mapping

15. Global Forest Watch

  • Link: https://www.globalforestwatch.org/
  • What it is: Near-real-time forest monitoring data
  • Project ideas:
  • Deforestation prediction
  • Fire risk modeling
  • Carbon sequestration analysis

16. OpenAQ

  • Link: https://openaq.org/
  • What it is: Real-time air quality data from global monitoring stations
  • API available: Yes, free (API key required)
  • Project ideas:
  • City pollution comparison dashboards
  • Health impact correlation analysis

🚗 TRANSPORTATION & URBAN

17. NYC Open Data

  • Link: https://opendata.cityofnewyork.us/
  • What it is: 2,000+ datasets - 311 calls, taxi trips, crime, permits
  • Project ideas:
  • Uber/Lyft vs. taxi analysis
  • 311 complaint prediction
  • Restaurant inspection analysis
  • BONUS: NYC Taxi Trip Data (billions of records)
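NYC Open Data runs on Socrata, so most datasets can be filtered server-side with SoQL query parameters such as `$where` and `$limit` instead of downloading everything. The Python sketch below only builds the URL. The dataset ID `erm2-nwe9` (believed to be the 311 service requests dataset) and the `complaint_type` column are assumptions; confirm both on the portal before relying on them.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def soda_url(dataset_id: str, where: str, limit: int = 1000) -> str:
    """Build a Socrata (SODA) query URL with a SoQL filter and a row limit."""
    base = f"https://data.cityofnewyork.us/resource/{dataset_id}.json"
    query = urlencode({"$where": where, "$limit": limit})
    return f"{base}?{query}"


# Assumed dataset ID and column name; verify them on the portal.
url = soda_url("erm2-nwe9", "complaint_type = 'Noise - Residential'", limit=500)

# Decode the query string again to check what will actually be sent.
decoded = parse_qs(urlsplit(url).query)
```

Filtering on the server matters here: several of these datasets have tens of millions of rows.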

18. Chicago Data Portal

  • Link: https://data.cityofchicago.org/
  • What it is: Crime, transportation, building permits, city services
  • Project ideas:
  • Crime hotspot prediction
  • Public transit optimization
  • Food inspection failure prediction

19. UK Road Safety Data

20. Bureau of Transportation Statistics

  • Link: https://www.bts.gov/
  • What it is: Airline on-time performance, traffic data, freight statistics
  • Project ideas:
  • Flight delay prediction
  • Airline performance ranking
  • Supply chain analysis

21. Divvy Bike Share Data (Chicago)

  • Link: https://divvybikes.com/system-data
  • What it is: Every bike share trip with timestamps and stations
  • Project ideas:
  • Demand forecasting
  • Station rebalancing optimization
  • User behavior analysis
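Demand forecasting usually starts with a simple aggregation: how many trips start in each hour. Here is a minimal Python sketch using only the standard library. The column name `started_at` and the timestamp format are assumptions about the trip files, and the three sample rows are invented.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def trips_per_hour(csv_text: str) -> Counter:
    """Count trips by the hour of day (0-23) in which they started."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumed format; adjust if the files use fractional seconds.
        started = datetime.strptime(row["started_at"], "%Y-%m-%d %H:%M:%S")
        counts[started.hour] += 1
    return counts


# Invented sample rows, for illustration only.
sample = """ride_id,started_at,start_station_name
a1,2024-05-01 08:05:00,Station X
a2,2024-05-01 08:40:00,Station Y
a3,2024-05-01 17:15:00,Station X
"""
hourly = trips_per_hour(sample)
```

The same counting pattern works per station, which is the input you would need for the rebalancing idea.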

22. Citi Bike NYC Trip Data

  • Link: https://citibikenyc.com/system-data
  • What it is: Millions of bike trips with full details
  • Project ideas:
  • Commuting pattern analysis
  • Tourism vs. commuter usage
  • Expansion planning analysis

⚽ SPORTS (Yes, Sports Analytics is Real)

23. FBref (Football/Soccer)

  • Link: https://fbref.com/
  • What it is: Comprehensive soccer statistics
  • Project ideas:
  • Player valuation model
  • Team performance prediction
  • Transfer market analysis

24. Basketball Reference

  • Link: https://www.basketball-reference.com/
  • What it is: Complete NBA/WNBA/international basketball stats
  • Project ideas:
  • Player efficiency analysis
  • Salary cap optimization
  • Draft pick value analysis

25. Lahman Baseball Database

26. NFL Big Data Bowl

27. Understat (Expected Goals)

  • Link: https://understat.com/
  • What it is: Advanced soccer metrics with shot-level data
  • Project ideas:
  • xG model building
  • Player overperformance analysis

🗳️ GOVERNMENT & CIVIC

28. Data.gov (US Government)

  • Link: https://data.gov/
  • What it is: 300,000+ datasets from federal agencies
  • Categories: Agriculture, climate, education, energy, finance, health, public safety

29. data.gov.uk (UK Government)

  • Link: https://www.data.gov.uk/
  • What it is: UK government data across all departments
  • Project ideas:
  • Brexit impact analysis
  • NHS performance metrics
  • Crime and policing data

30. European Data Portal

  • Link: https://data.europa.eu/
  • What it is: Open data from EU institutions and member states
  • Project ideas:
  • Cross-country policy comparisons
  • EU funding allocation analysis

31. USAspending.gov

  • Link: https://www.usaspending.gov/
  • What it is: Every federal contract, grant, loan, and payment
  • Project ideas:
  • Government contractor analysis
  • Spending trend visualization
  • Regional funding distribution

32. OpenSecrets (Campaign Finance)

  • Link: https://www.opensecrets.org/open-data
  • What it is: Political donations, lobbying, PAC spending
  • Project ideas:
  • Donation pattern analysis
  • Industry lobbying trends
  • Election outcome correlation

📚 EDUCATION

33. IPEDS (Integrated Postsecondary Education Data System)

  • Link: https://nces.ed.gov/ipeds/
  • What it is: Data on every US college and university that participates in federal student aid programs
  • Project ideas:
  • ROI by major analysis
  • Graduation rate prediction
  • Tuition trend analysis

34. PISA (Programme for International Student Assessment)

  • Link: https://www.oecd.org/pisa/data/
  • What it is: Student performance across 80+ countries
  • Project ideas:
  • Education system comparison
  • Equity in education analysis
  • Factor analysis of student success

35. edX/Coursera Research Data


🛒 E-COMMERCE & BUSINESS

36. Instacart Market Basket Data

37. Yelp Dataset

  • Link: https://www.yelp.com/dataset
  • What it is: Business reviews, user data, check-ins
  • Project ideas:
  • Sentiment analysis
  • Review helpfulness prediction
  • Business success factors

38. Amazon Product Reviews

39. Google Merchandise Store (GA4 Demo)


🌐 SOCIAL MEDIA & WEB

40. Reddit Comments Dataset

  • Link: https://pushshift.io/
  • What it is: Historical Reddit data (posts, comments, subreddits); Pushshift access has been restricted since 2023, so check current availability first
  • Project ideas:
  • Community sentiment analysis
  • Topic modeling
  • Viral post prediction

41. Wikipedia Page Views

  • Link: https://dumps.wikimedia.org/
  • What it is: Wikipedia article traffic, edits, content
  • Project ideas:
  • Trending topic detection
  • Edit war analysis
  • Knowledge graph building
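For trending topic detection on a small scale, you do not need the full dumps: the Wikimedia REST API (an alternative to the dumps link above) serves daily page views per article. The Python sketch below builds a request URL and applies a deliberately simple spike rule. The 2x threshold and the sample view counts are arbitrary choices for illustration.

```python
from statistics import mean
from urllib.parse import quote


def pageviews_url(article: str, start: str, end: str,
                  project: str = "en.wikipedia") -> str:
    """Daily per-article page views; start and end are YYYYMMDD strings."""
    title = quote(article.replace(" ", "_"), safe="")
    return (
        "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
        f"{project}/all-access/user/{title}/daily/{start}/{end}"
    )


def is_trending(daily_views: list[int], factor: float = 2.0) -> bool:
    """True if the latest day exceeds `factor` times the mean of earlier days."""
    if len(daily_views) < 2:
        raise ValueError("need at least two days of data")
    *history, latest = daily_views
    return latest > factor * mean(history)


url = pageviews_url("Data science", "20240101", "20240107")
spike = is_trending([100, 110, 90, 400])  # invented view counts
```

A real detector would account for weekly seasonality, but this rule is enough to surface obvious spikes.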

42. Common Crawl

  • Link: https://commoncrawl.org/
  • What it is: Petabytes of web crawl data
  • Project ideas:
  • Web structure analysis
  • Content trend analysis
  • Note: Very large - requires cloud computing

43. Internet Archive


🏠 REAL ESTATE & HOUSING

44. Zillow Housing Data

  • Link: https://www.zillow.com/research/data/
  • What it is: Home values, rents, inventory by region
  • Project ideas:
  • Housing market prediction
  • Affordability analysis
  • Investment opportunity identification

45. Inside Airbnb

46. HMDA (Home Mortgage Disclosure Act)

  • Link: https://ffiec.cfpb.gov/data-browser/
  • What it is: Loan-level records of mortgage applications reported by covered US lenders
  • Project ideas:
  • Lending discrimination analysis
  • Approval rate prediction
  • Market trend analysis

🔬 SCIENCE & RESEARCH

47. NASA Exoplanet Archive

48. CERN Open Data

  • Link: http://opendata.cern.ch/
  • What it is: Particle physics data from Large Hadron Collider
  • Project ideas:
  • Particle classification
  • Event detection

49. GenBank (Genetic Sequences)

50. Protein Data Bank

  • Link: https://www.rcsb.org/
  • What it is: 3D structures of proteins
  • Project ideas:
  • Structure prediction
  • Drug target analysis

🎵 ENTERTAINMENT & MEDIA

51. Spotify Million Playlist Dataset

52. MovieLens

53. IMDb Datasets

  • Link: https://www.imdb.com/interfaces/
  • What it is: Movies, TV shows, cast, crew, ratings
  • Project ideas:
  • Box office prediction
  • Career trajectory analysis
  • Genre trend analysis

54. Last.fm Dataset

55. The Movies Dataset


🚨 CRIME & SAFETY

56. FBI Crime Data Explorer

57. NYPD Complaint Data

58. Police Data Initiative

59. Global Terrorism Database

  • Link: https://www.start.umd.edu/gtd/
  • What it is: Terrorist events since 1970
  • Project ideas:
  • Pattern analysis
  • Geographic trend analysis
  • Attack type classification

🔄 APIs FOR REAL-TIME DATA

60. OpenWeatherMap

61. News API

  • Link: https://newsapi.org/
  • Free tier: 100 requests/day
  • Project ideas: Sentiment analysis, topic tracking
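For topic tracking, a reasonable first step is counting how many articles mention a term per day. The Python sketch below builds a URL for News API's `everything` endpoint and counts articles by publication date. The sample payload is hand-written in the shape of a response, and "KEY" stands in for your own API key.

```python
from collections import Counter
from urllib.parse import urlencode


def everything_url(query: str, api_key: str) -> str:
    """Build a search URL for the /v2/everything endpoint."""
    params = urlencode({"q": query, "apiKey": api_key})
    return f"https://newsapi.org/v2/everything?{params}"


def articles_per_day(payload: dict) -> Counter:
    """Count articles by the date part (YYYY-MM-DD) of 'publishedAt'."""
    return Counter(a["publishedAt"][:10] for a in payload["articles"])


# Hand-written sample in the shape of a response; not real articles.
sample = {
    "status": "ok",
    "articles": [
        {"title": "Example one", "publishedAt": "2024-05-01T09:00:00Z"},
        {"title": "Example two", "publishedAt": "2024-05-01T15:30:00Z"},
        {"title": "Example three", "publishedAt": "2024-05-02T08:00:00Z"},
    ],
}
daily = articles_per_day(sample)
url = everything_url("inflation", "KEY")
```

With a free tier of 100 requests per day, it pays to cache each response locally rather than re-querying.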

62. Twitter API v2

63. Alpha Vantage (Stocks)

64. CoinGecko (Crypto)

65. Ergast (Formula 1)