Quant Macro Investing

Risk Taking Disciplined

30 Resources to Find the Data You Need

http://flowingdata.com/2009/10/01/30-resources-to-find-the-data-you-need/

Let’s say you have this idea for a visualization or application, or you’re just curious about some trend. But you have a problem. You can’t find the data, and without the data, you can’t even start. This is a guide and a list of sources for where you can find that data you’re looking for. There’s a lot out there.

Universities

Being a graduate student, I always look to the library for books and resources. Many libraries are amping up their technology and have some expansive data archives. Many statistics departments also tend to keep a list of data somewhere.

DATA SOURCES:

News Sources

I’m sure you’ve seen a graphic in the paper or, I guess more likely, on a news site, and wondered about another aspect of the data. Major news organizations usually put their sources somewhere on the graphic or mention them in the accompanying article. It’s usually not a direct link, but a quick online search will get you to the right place. Sometimes, you’ll have to email someone to get the same data, but those people are usually happy that you’re interested in their data or analysis.

DATA SOURCES:

Geographic Data

Got some mapping software, but no geographic data? You’re in luck. There are plenty of shapefiles, etc. at your disposal.

DATA SOURCES:

  • TIGER – From the US Census Bureau, detailed data about roads, railroads, rivers, and ZIP codes. Probably the most extensive you’re going to find.
  • OpenStreetMap – One of the best examples of data and community effort.
  • Geocommons – Both data and a map maker.
  • Flickr Shapefiles – Boundaries as defined by Flickr users.

Sports

America loves its sports, and thus, has decades of sports data. You’ll find it on Sports Illustrated or the sports organizations’ sites, but you’ll also find more on sites dedicated to the data.

DATA SOURCES:

World

There are several noteworthy international organizations that keep data about the world, mainly health and development indicators. It does take some sifting though, because a lot of the data sets are pretty sparse. It’s not easy to get standardized data across countries with varied methods.

DATA SOURCES:

Government and Politics

With the new administration, there’s been a fresh emphasis on data and transparency, so there are a lot of government organizations that supply data. They’ve been doing this for a while, but with the launch of data.gov, much of the data is ending up in one place. There are also plenty of non-governmental sites that aim to make politicians more accountable.

DATA SOURCES:

General Sources

You’re usually going to find the best data straight from the source, but there are lots of applications and sites that try to make all data easier to find or easier to access.

DATA SOURCES:

  • Freebase – Free data and a community effort. For some types, the data are kind of sparse, but it continues to get better.
  • Numbrary
  • Many Eyes – More of a visualization and exploratory site than for data, but they do have a data section.
  • Infochimps – Did you get your invite?
  • Swivel
  • Amazon Public Data Sets
  • DBpedia – Allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data.
  • Wikipedia – Lots of HTML tables. Copy and paste in Excel.

Get it From an API

Plenty of sites and applications make their data freely available via APIs. Twitter has an API (duh). Google has lots of APIs. Yahoo does too. So on and so forth. Visit Programmable Web for a detailed catalog of what’s available.
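As a minimal sketch of what working with such an API usually looks like: build a query URL, request it, and parse the JSON that comes back. The endpoint `https://api.example.com/search`, its `q` and `count` parameters, and the shape of the response are all hypothetical placeholders, not any real service's API. The response body is canned so the sketch runs offline.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only (not a real service).
BASE_URL = "https://api.example.com/search"


def build_url(query, count=10):
    """Return the request URL for a search query."""
    return BASE_URL + "?" + urlencode({"q": query, "count": count})


def parse_results(body):
    """Pull (user, text) pairs out of a JSON response body."""
    payload = json.loads(body)
    return [(item["user"], item["text"]) for item in payload["results"]]


# Made-up response standing in for what
# urllib.request.urlopen(url).read() would return from a live API.
SAMPLE_BODY = (
    '{"results": ['
    '{"user": "alice", "text": "hello data"}, '
    '{"user": "bob", "text": "more data"}]}'
)

url = build_url("open data", count=2)
rows = parse_results(SAMPLE_BODY)
```

Against a real API you would swap in the documented endpoint and field names, and most likely add an API key and some rate limiting.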

Scrape the Data

When all else fails, you can always find a site that serves the data through HTML pages, and then scrape the data with Python, JavaScript, or whatever language you’re comfortable with. I use Python with the Beautiful Soup library, which makes parsing pretty easy.

For example, I scraped weather data from Weather Underground a while back (although I don’t think the script works anymore). I also used it to gather television sizes from CNet.
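Here is a rough sketch of that kind of scraping: pulling the rows out of an HTML table. It is not the original weather script. To run without extra installs it uses the standard library's `html.parser` rather than Beautiful Soup (whose `find_all` does the same job in fewer lines). The class name `TableScraper` and the sample page, including its values, are made up for illustration.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Invented sample page; a real script would download the HTML first,
# for example with urllib.request.urlopen(url).read().decode().
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>High (F)</th></tr>
  <tr><td>2009-10-01</td><td>71</td></tr>
  <tr><td>2009-10-02</td><td>68</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(SAMPLE_HTML)
header, *records = scraper.rows
```

From here, `records` can be written out with the `csv` module. Scrapers like this break whenever the site changes its markup, which is probably why old scripts stop working.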

I’m still figuring out how to scrape AJAX-based sites though. I’d be happy to hear any tips from anyone who has experience with that.

Did I miss anything? Where do you get your data from?

October 16, 2009 - Posted by | Uncategorized

2 Comments »

  1. The Census Bureau is a valuable treasure when it comes to data gathering. In addition, the official websites of government departments, political parties, and international organizations all provide an abundant amount of readily accessible information for free. Some of my favorite sites include those of the IMF, WTO, and FAO.
    Bloomberg.com is already a staple in economic research, and stockcharts.com offers impressive charts for stock and market analysis.

    Comment by evening | October 16, 2009

  2. Hi guys, thanks for Vicktor Capital’s 30 sources of data. Very interesting and useful. I’ll also share my 8 favorites with you here:

    IMF
    http://www.imf.org/external/pubs/ft/weo/2009/02/index.htm
    Economic Outlook 2009: Highly recommended

    China Government Statistics
    http://www.stats.gov.cn/tjsj/

    BIS
    http://www.bis.org/statistics/index.htm
    Provides plenty of data on financial institutions ACROSS different countries! Extremely useful in research, especially when people were talking about financial stability in the midst of the credit crisis last year!

    US Fed
    http://www.federalreserve.gov/econresdata/researchdata.htm

    There is plenty of official, authoritative information here, maybe quicker to reach than through your Bloomberg terminal! My favorite when I prepare US market perspectives.

    Bloomberg
    http://www.bloomberg.com/markets/index.html?Intro=intro_markets
    Provides charts and cross-border asset prices. I find the name search quite user-friendly when you don’t know the Bloomberg code of a certain asset. Have a try!

    Fundsupermart.com’s weekly updated leading market indicators

    http://www.fundsupermart.com.hk/hk/main/research/viewHTML.tpl?articleNo=3140

    These indicators are very useful, free, and updated periodically. Some of them have become inputs to my trading rules.

    OPEC
    http://www.opec.org/home/
    In the right column you will find free monthly reports. They contain almost all the critical data about supply and demand in the oil sector, and they describe the industry’s supply and demand players very comprehensively.

    Science / Technology

    I am interested in nanotechnology research and I bookmarked this academic journal:

    http://www.springerlink.com/content/121100/?p=267adc4913a64ff7b32f6595df440372&pi=0

    Advantage: Recent nano-science breakthrough, free of charge! Open Access!

    If you guys have time, type my name “Angelo” into the search engine for that journal and you will find my paper published there as well (a paper about the development of a catalyst for automobiles, cerium dioxide)!

    Comment by Angelo | October 16, 2009

