Models for API driven startups built around public data
By Kin Lane
I had a conversation with a venture capitalist recently who was looking for information on startups that had APIs and had built their company around public data.
The two companies referenced in the original contact email were Eligible API and Clever API: two similar, yet very different, approaches to aggregating public data into a viable startup (Clever used to aggregate data from school districts, but now just provides login services).
During our conversation, we talked about the world of APIs and open data, both in government and across the private sector. I spent 30 minutes helping them understand the landscape, and told them that when I was done I would generate a list of APIs I thought were interesting and that I would categorize into a similar space as Eligible and Clever, something that was much more difficult to quantify than I expected. Nonetheless, I learned a lot and, as I do with all my research, I wanted to share the story of my experience.
I started with the companies that, off the top of my head, had built interesting businesses around publicly available facts and data, a definition that would expand as I continued.
I started with a couple of APIs I know provide some common data sources (via APIs):
Next, I wanted to look at a couple of the business data APIs I depend on daily, and while I was searching I found a third. What I think is interesting about these business data providers is their very different business models and approaches to gathering and making the data available.
Immediately after looking through Crunchbase and OpenCorporates, I queried my brain for other leading APIs that are pulling content or data from public sources and developing a business around it. It makes sense to look at the social data and content realm, but this is where I stop. I don’t want to venture too far down the social media rabbit hole, but I think that these two providers are important to consider.
While Twitter data isn’t the usual business, industry, product, and other common facts, it has grown to be the new data that is defining not just Twitter’s business, but a whole ecosystem of aggregators and other services that are built on consuming, aggregating and often publishing public raw or enriched social data.
I also wanted to step back again and look at Clever, and think about their pivot from aggregating school data to being a login service. There was another API I was tracking that offered a similar service to Clever, aggregating school data, that I think is important to list alongside Clever.
As far as I know, both Clever and LearnSprout are adjusting to find the sweet spot in what they do, but I keep them on the list because of what they did when I was originally introduced to their services. I think we can safely say that there will be no shortage of startups to come, following in Clever and LearnSprout’s footsteps, unaware of their predecessors and the challenges they faced when aggregating data across school districts.
After taking another look at Clever, I also took another walk through the Eligible API, and spent time looking for similar data-driven APIs in the healthcare space. I think that Eligible is a shining example of what this particular VC was looking for, and a good model for startups looking to not just build a company and API around public data, but do it in a way that makes a significant impact on an industry.
I know there are more healthcare data platforms out there, but these are a handful of the ones with APIs that I track. Healthcare is one of those heavily regulated industries where there is a huge opportunity to aggregate data from multiple public and private sector sources and build an API-driven business from it.
After healthcare, my mind immediately moved into the world of energy data, because there is a task on my task list to study open data licensing as part of a conversation I’m having with Genability. I think this latest wave of energy API providers, and the work they do with the data of individual customers, as well as with wider power company, state, and federal data, is very interesting.
When I was in Maryland this last May, moderating a panel with folks from the Department of Energy, the conversation turned to the value of Department of Energy data to the private sector. I’d say that the Department of Energy is among the top five agencies when it comes to the viability of its data for use in the private sector and for making a significant economic impact.
Pushing the boundaries of this definition again, I stumbled onto the concept of launching APIs for libraries, built around public or private collections. While not an exact match to the other APIs in this story, I think what DPLA is doing reflects what we are talking about: building a platform around public and private datasets (collections in this case).
Just like government agencies, public and private institutions possess an amazing amount of data, content and media that is not readily available online and provides a pretty significant opportunity to build API driven startups and organizations around these collections.
There is a wealth of valuable scientific data being made available via public APIs, from various organizations, and institutions. I’m not sure where these groups are gathering their data from, but I’m sure there is a lot of public funding and sources included in some of the APIs I track.
These are just two of the numerous scientific data APIs I keep an eye on, and I agree that this is a little outside the bounds of exactly what we are looking for. However, I think that the opportunity for designing, deploying and managing high-performing, high-value APIs from publicly and privately owned scientific data is pretty huge.
As I look at these energy, and scientific APIs across my monitoring system, I’m presented with other government APIs that are consumer focused and often have the look and feel of a private sector startup, while also having a significant impact on private sector industries.
While all of these APIs are clearly .gov initiatives, they provide clear value to consumers, and I think there is an opportunity for startups to play around in offering complementary, or even competing, services with this government-generated, industry-focused open data, going further than these platforms do.
Alongside those very consumer- and industry-oriented government efforts, I can’t help but look at the quasi-government APIs I’m familiar with that are providing similar data-driven APIs to the government ones above.
While these may not be models for startups, I think they provide potential ideas that private sector non-profit groups can take action on. Providing mortgage, energy, environmental, or even healthcare services, developed around public and private sector data, will continue to grow as a viable business model for startups and organizations in coming years.
Watchers of government data
Adding another layer to government data, I have to include the organizations that keep an eye on government, a segment of organizations that have evolved around building operational models for aggregating, generating meaning from, then republishing data that helps us understand how government is working (or not working).
These are all nonprofit organizations doing this work, but when it comes to journalism, politics, and other areas, there are some viable services that can be offered surrounding, and on top of the valuable open data being liberated, generated and published by the watchers of our government.
School data again
One interesting model for building a business around government data is Great Schools. There are some high-value datasets available at the Department of Education as well as the Census Bureau, and using these sources has presented a common model for building a company around public data:
I’m not exactly a fan of Great Schools, but I think it is worth noting. I’ve talked with them, and they don’t really have as open a business model and platform as I would like to see. I feel it is important to “pay it forward” when building a for-profit company around public data. I don’t have a problem with building businesses and generating revenue around public data, but if you don’t contribute to it being more accessible than you found it, I have a problem.
After spending time looking through the APIs I monitor, I remembered the use of public data by leading news sources. These papers are using data from federal, state and city data sources, and serving them via APIs, right alongside the news.
These news sources don’t make money off the APIs themselves. As with software-as-a-service providers, the APIs provide a value-add to their core business. Census surveys, congressional voting, economic numbers, and other public data are extremely relevant to the everyday news that impacts us.
Been doing this for a while
When we talk about building businesses around publicly available data, there are some companies who have been doing this for a while. The concept really isn’t that new, so I think it is important to bring these legacy providers into the conversation.
Most of these data providers have been doing it for over a decade. They all get the API game and offer a wide range of API services for developers, providing access to data that is taken directly from, derived from, or enhanced from public sources. When it comes to building a business around public data, I don’t think these four have the exact model I’m looking for, but there are many lessons in how to do it right, and wrong.
Weather is an age-old model
When you think about it, one of the original areas we built services around government data is weather. Weather data is a common reference you will experience when you hear any official talk about the potential of government data. There are numerous weather API services available that are doing very well when it comes to digesting public data and making it relevant to developers.
Weather is the most relevant API resource I know of in the space. Weather impacts everyone, making it a resource all web and mobile applications will need. With the growing concern around climate change, this model for using public data, and generating valuable APIs will only grow more important.
Time zone data
Right there behind weather, I would say that time and date information is something that impacts everyone. Time shapes our world, and government sets the tone of the conversation when it comes to date and time data, something that is driving many API-driven business models.
What I like about time and date APIs is that they provide an essential ingredient in all application development. It is an example of how government can generate and guide data sources, while allowing the private sector to help manage vital services around this data, that developers will depend on for building apps.
Along with time zone data, currency conversion is a simple, valuable, API-driven service that is needed across our economy. You have to know what time it is in different time zones, and know the conversion rate between different currencies, to do business in the global API economy.
In our increasingly global, online world, currency conversion is only going to grow more important. Workforces will be spread across the globe, and paying employees, buying goods and services will increasingly span the globe, requiring seamless currency conversion in all applications.
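To make those two ingredients concrete, here is a minimal sketch of what a time and currency lookup might look like inside an application. Everything in it is an assumption for illustration: the `OFFSETS` and `RATES_PER_USD` tables, the place names, and the rates are hypothetical hard-coded values, where a real application would pull time zone rules from the IANA time zone database and rates from one of the API providers discussed here.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical fixed UTC offsets (assumption: daylight saving time is ignored).
OFFSETS = {
    "UTC": timezone.utc,
    "Tokyo": timezone(timedelta(hours=9)),
    "New York (EST)": timezone(timedelta(hours=-5)),
}

# Hypothetical exchange rates, expressed as units of each currency per 1 USD.
RATES_PER_USD = {
    "USD": Decimal("1"),
    "EUR": Decimal("0.90"),
    "JPY": Decimal("150"),
}


def local_time(moment: datetime, place: str) -> datetime:
    """Express a timezone-aware moment in the offset configured for a place."""
    return moment.astimezone(OFFSETS[place])


def convert(amount: Decimal, source: str, target: str) -> Decimal:
    """Convert an amount between currencies by going through USD."""
    in_usd = amount / RATES_PER_USD[source]
    converted = in_usd * RATES_PER_USD[target]
    # Round to two decimal places, the way most invoices would.
    return converted.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    meeting = datetime(2024, 1, 15, 14, 0, tzinfo=timezone.utc)
    for place in OFFSETS:
        print(place, local_time(meeting, place).strftime("%Y-%m-%d %H:%M"))
    print(convert(Decimal("100"), "USD", "EUR"), "EUR")
```

Using `Decimal` rather than floats keeps the money math exact, and keeping both tables as plain data is meant to show where an API call would slot in, not how any particular provider works.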
Another important area of APIs increasingly impacting our everyday lives is transit APIs, which provide real-time bus, train, subway and other public transit data to developers.
Transit data will always be a tug of war between the public and private sector. Some data will be generated in each sphere, with some projects incentivized by the government, where the private sector is unwilling to go. Establishing clear models for public and private sector partnerships around transit data will be critical to society functioning.
While I won’t be covering every example of building a business around public data in this story, I would be remiss if I didn’t talk about the real estate industry, one of the oldest businesses built on public data and facts.
I’m not a big fan of the real estate industry. One of my startups in the past was built around aggregating MLS data, and I can safely say that the real estate industry is one of the shadiest industries I know of that is built on top of public data. I don’t think this industry is a model that we should be following, but again, I do think there are huge lessons to be learned from the space as we move forward building business models around public data.
That is as far as I’m going to go in exploring API-driven businesses built on public data. My goal wasn’t to be comprehensive; I was just looking to answer some questions for myself around who else is playing in the space.
This list of businesses came out of my API monitoring system, so it is somewhat limited in its focus, requiring the company that is building on top of public data to also have an API, which creates quite a blind spot for this research. However, this is a blind spot I’m willing to live in, because I think my view represents the good in the space, and where we should be headed.
Open Data 500
Their description from the site sums up the project:
The Open Data 500 is the first comprehensive study of U.S. companies that use open government data to generate new business and develop new products and services. Open Data is free, public data that can be used to launch commercial and nonprofit ventures, do research, make data-driven decisions, and solve complex problems.
I’m pretty convinced that we have a lot of work to do in making machine-readable government data at the federal, state, county and city level more available before we can fully realize the potential of the API economy.
Without high-quality, real-time, valuable public data, we won’t be able to satisfy the needs of the next wave of web, single-page, mobile and Internet of Things application developers. I’m also hoping we can work to establish some healthy blueprints for developing private sector businesses and organizations around public data, by reviewing some of the existing startups that are finding success with this model, and build on, or complement, this existing work rather than reinvent the wheel.