Towards an Open Data Framework

by noospheer

Today, an accelerating amount of data is released as ‘open access’. This trend is led by scholarly, scientific and governmental sources, and it increasingly includes corporate information. Hundreds of institutions have now signed the 2003 Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities.

The Directory of Open Access Journals (DOAJ) indexes over 7,000 journals containing more than 700,000 peer-reviewed articles. In addition, Cornell University Library provides full access to over 700,000 physics, math and computer science articles. These and other open resources are invaluable tools for cross-disciplinary research. By contrast, articles behind paywalls generally cost $15 to $40 each, and without payment a reader sees only the abstract and perhaps a preview. Freedom of information accelerates innovation and breaks down traditional barriers to education.

Besides scholarly data, governments (primarily municipal ones) are opening their data sets. The World e-Governments Organization (WeGO) currently has more than 50 member cities. In North America, Washington, DC, Ottawa, San Francisco, Edmonton, Vancouver and Toronto all publish raw data sets online. Furthermore, the US, UK and Canada each provide free federal information portals.

Despite the proliferation of open data, there is no common platform for distributing it, apart from the Internet itself. Government data sets are generally raw files such as .xml or .csv, and each set sits in its own silo. These files are not integrated with one another and are cumbersome to update regularly. Users cannot cross-reference one data set against another, for example election results against voting locations and income demographics.
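To illustrate what cross-referencing siloed files involves, here is a minimal Python sketch. It assumes two hypothetical municipal files, a CSV of per-ward election turnout and an XML file of per-ward median income, that happen to share a ward identifier. The sample values, field names and the `join_on_ward` helper are invented for illustration; real portals rarely offer such a clean common key, which is part of the integration problem.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for two separately published municipal files.
ELECTION_CSV = """ward,turnout_pct
1,41.5
2,57.2
"""

INCOME_XML = """<wards>
  <ward id="1"><median_income>48000</median_income></ward>
  <ward id="2"><median_income>71000</median_income></ward>
</wards>"""


def load_csv(text, key):
    """Index CSV rows by the given key column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def load_xml(text):
    """Index <ward> elements by their id attribute, flattening child elements into a dict."""
    root = ET.fromstring(text)
    return {
        ward.get("id"): {child.tag: child.text for child in ward}
        for ward in root.findall("ward")
    }


def join_on_ward(*tables):
    """Inner join (hypothetical helper): keep wards present in every table, merging their fields."""
    shared = set(tables[0]).intersection(*tables[1:])
    return {
        ward: {k: v for table in tables for k, v in table[ward].items()}
        for ward in sorted(shared)
    }


joined = join_on_ward(load_csv(ELECTION_CSV, "ward"), load_xml(INCOME_XML))
for ward, fields in joined.items():
    print(ward, fields["turnout_pct"], fields["median_income"])
```

Because `join_on_ward` accepts any number of indexed tables, a third source such as a list of voting locations could be added the same way, provided it is keyed by the same ward identifier.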

In the academic case, open access indices offer direct .pdf downloads of full articles plus some searchable metadata, such as abstract, author and field. In data-intensive research such as the particle physics experiments at the European Organization for Nuclear Research (CERN), scientists on site hold primary access to most experimental data, whereas only findings and conclusions are published online.

A decentralized, open-source, high-performance and browser-accessible data integration platform is necessary to feed our knowledge-hungry age.

Since access and use rights are a non-issue with open data, the key challenges are software-related: bringing together disparate sources, making all data structured and searchable, operating as a network rather than a centralized service, and providing granular access permissions for personal data in this potential ‘cloud’. These challenges span a range of domains in information science, including distributed systems and human-computer interaction.
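One way to picture granular permissions for personal data is as per-field read grants attached to each record. The following is a minimal Python sketch under that assumption; the `Record` class, its `share` and `view` methods, and the sample values are hypothetical. A networked system would also need authentication and a way to enforce grants across peers, which this sketch does not attempt.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A hypothetical personal-data record whose fields carry individual read grants."""
    owner: str
    fields: dict
    # Maps field name -> set of user ids allowed to read it (the owner can always read everything).
    grants: dict = field(default_factory=dict)

    def share(self, field_name, user):
        """Grant one user read access to a single field."""
        self.grants.setdefault(field_name, set()).add(user)

    def view(self, user):
        """Return only the fields this user is permitted to read."""
        if user == self.owner:
            return dict(self.fields)
        return {
            name: value
            for name, value in self.fields.items()
            if user in self.grants.get(name, set())
        }


record = Record(owner="alice", fields={"postal_code": "K1A 0B1", "income": 52000})
record.share("postal_code", "researcher")
print(record.view("researcher"))  # -> {'postal_code': 'K1A 0B1'}
print(record.view("stranger"))    # -> {}
```

Keeping grants at the field level, rather than per record, lets an owner contribute coarse data (a postal code) to public research while withholding sensitive values (income).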

As an R&D effort, noospheer (from noosphere) has focused on solving a range of technical problems in these domains. Its goal is to provide laypeople, researchers, activists, government and industry with a GPL-compatible means of data sharing, discovery and creation. We hope that such a system, once realized, will foster live and direct collaboration across the traditional divides of income, position and nation.

We aim to achieve this goal by stitching together existing open technologies and creating novel ones.