Weekly Insight

Understanding Data

There are few topics that make people's eyes glaze over faster than discussions about data. Even for those interested in the subject, conversations quickly shift into acronyms and lingo that are neither intuitive nor easy for laypeople to follow.

I find discussions about data frustrating, particularly within the realm of government and public policy. Even defining the term is a difficult task. The word “data” has a vast range of meanings, and it is difficult to understand which one is applicable.

Is today’s presenter talking about “data” in the scientific sense of collected observations? Or does she mean “data” in the computing sense of encoding information for efficient transmission? Or does she really mean “data analysis,” the process of examining data to evaluate problems and shape policymaking?

And just when it seems we have a complete understanding of “data,” something shifts and we are again in muddy waters. 

Primer of the Primers

The initial challenge of defining “data” is complicated by the word’s ongoing evolution, which keeps adding subtle new variations to its meaning. Apparently I am not alone in struggling with it: the word ranks among Merriam-Webster’s top 1 percent of lookups.

For those looking for a good primer on data, here are some useful options:

  • The Data Journalism Handbook provides a good framework for becoming data-literate. It is a guide to the numbers-and-statistics side of data, and it also serves as a useful reference.
  • DataONE, an organization that aggregates all sorts of information related to the earth, provides a more technical primer focused on data management.

Each of these resources provides some general orientation around data, but I struggled to get a sense of how to better relate to data and its effects on policy. I reached out to Zach Ambrose, a consultant on public data and former chief of staff to former Governor Bev Perdue, to get a better understanding of how public entities use data. Ambrose offered several frames that give useful perspective on the state of public data in North Carolina.

Governments have different roles and strengths in the public data framework

Local, state, and federal governments each have a different role to play in the public data framework.

Local governments have access to some of the most fine-grained data sets. They know where bus stops are located and where trash is picked up. They hold information on specific citizens and properties that allows for useful combinations of data. The larger universe of local governments also includes both the most and the least sophisticated collectors of public data, which makes interlocal collaboration challenging.

The state and federal governments are well situated to collect broad swaths of information that is easier to understand in the aggregate than at the individual level. The federal government has steadily improved its data collection capabilities and maintains a robust data portal for accessing a wide variety of data types. The Census Bureau, the Bureau of Labor Statistics, and the Bureau of Economic Analysis are heavily used sources of public data and have evolved to take advantage of improvements in technology.

At the state level, the Log Into North Carolina (LINC) portal provides easy access to a wide variety of state and federal data sets. In addition, the NC OneMap program is an important effort to create common standards for geospatial data (i.e., map-making).

The Hamilton Project, an economic policy initiative at Brookings, recently produced a report and hosted a conference discussing how federal data sets are used in both the public and private sectors.

At the state level, there are solid accomplishments and low-hanging opportunities

One of the areas where North Carolina has developed a strong data presence is the Criminal Justice Law Enforcement Automated Data Services (CJLEADS) project. Partnering with SAS, the state’s Department of Information Technology combines criminal justice data from different sources and helps law enforcement professionals learn more about an offender’s background.

A significant impetus for CJLEADS came from the 2008 murder of UNC Student Body President Eve Carson. The perpetrators in that tragedy had been on probation, and supervising officials did not know their full criminal records. CJLEADS was designed to enable more efficient sharing of that kind of criminal information.

As one Franklinton Police Department officer put it: “CJLEADS combines, in a way never before, multiple pieces of information that at one time could have taken days to find. This can never be translated into dollars, but it can certainly translate into a safer state.”

CJLEADS is administered within the N.C. Government Data Analytics Center (GDAC), a multi-agency enterprise that facilitates the sharing of information across agencies in ways that can create efficiencies and lead to better policy results. GDAC is the chassis on which many information-sharing activities can be built.

Ambrose noted that health policy is an area where multi-agency coordination can have a big impact fairly quickly.  Through GDAC, the state is administering data-sharing pilots on Medicaid and child protective services.

Government data efforts face consistent hurdles

When I asked Ambrose about the challenges facing North Carolina’s advancement in data-related activities, he noted that there are four hurdles that governments regularly face:

Funding is a challenge, though perhaps the simplest to overcome.

Many data-related initiatives take time, well beyond two- and four-year political cycles. That can make it hard for a project to rise on the priority list.

Government lacks the necessary skill sets, both technical and policy-related.

The effective use of data in decision-making requires expertise on the front end: understanding how to build systems that will produce the intended results. While these skill sets are increasingly taught in education curricula, they are not yet widespread in the state’s workforce. The positions require training and command higher salaries, which can make them hard to fill.

Once the systems are built and agencies can run reports on the newly combined information, there must be people on the receiving end who know how to analyze and interpret it. Data-driven decision-making is not always intuitive and requires employees to have or develop a distinct skill set.

Data quality issues and record keeping standards are a problem.

Developing common standards and then reorganizing information in a consistent way is time-intensive and expensive. Many state agencies have moved toward digital record keeping and information storage. That is a clear first step, but how agencies store data and record information consistently is also a challenge. Often the systems used to collect the data differ, and the information is organized and gathered in different ways. Many agency digitization efforts predate GDAC, so they were not built on common standards that can be applied easily across agencies.

Data sharing has risks that need to be carefully managed.

Some of the information that could be most helpful to policymaking is not fully public. Personal health records, for example, could be useful in addressing the opioid problem and understanding how certain drugs are being prescribed and filled. But that information is highly protected. Personal financial information could also be useful in certain policy contexts but sharing that information raises privacy concerns. At its root, the improved use of data is about sharing information, but the sharing itself can be the source of concern and requires thoughtful risk mitigation protocols.

Open data exercises are enhancing community and democracy 

“Open data” is an area that is probably worthy of its own column. Put simply, open data endeavors are ones in which entities that collect data make it freely available to the public, often around projects or events specifically designed to inspire new ideas for using it. At the project level, Wake County has provided restaurant sanitation grades to the rating service Yelp for inclusion in its platform. As a result, when you are considering that highly rated greasy spoon, you can quickly check its sanitation record.

Similarly, competition events like NC Datapalooza give organizers space to provide pre-selected data sets to participants in a competitive format, to see who can come up with the most innovative way to analyze and use them. These exercises are win-win propositions. For the government providing the data, they create an incentive to digitize and format the data in a way others can use, and they allow the provider to leverage outside talent to think about problems at low cost. For participants, the events are civic engagement in its rawest form and often provide distinctive opportunities for innovation and product creation.

US Department of Education Datapalooza, 2014


My insecurity about data knowledge does not seem to be unique. Education curricula at every level are evolving to teach data literacy and to help citizens become more attuned to the resources that technology and computing have made available. The potential for data in public policy is vast, but unlike some infrastructure issues, implementation can be iterative: we can keep taking measured steps toward discrete, manageable data projects as we move toward realizing the full potential of public information.


Andrew Holton is a board member and contributor to the N.C. Center for Public Policy Research.

by Andrew Holton on May 26, 2017