Motivation

The recent financial crisis led the world into what is now called the Great Recession, in reference to the Great Depression of the 1930s. Economic recovery is still nascent in some parts of Europe. Shortcomings in the working of capital markets undermined corporate investment and spread unemployment. They thereby increased inequalities, harmed the well-being of citizens, and bred mistrust vis-à-vis decision makers and scientists. These fallouts affected society’s innovation and openness. The European Commission has identified investment, growth, and job creation as the key objectives of its agenda. To reach these objectives, it has put forward policy initiatives such as the EU Capital Markets Union to improve access to capital for businesses, especially SMEs.

The European Commission has identified sound scientific evidence as a key element of policy-making at all levels of the European process. Yet Europe’s huge research potential in the social sciences has not been fully realised, owing to a lack of empirical work. The weak empirical foundations of the analytical models used to analyse structural and cyclical changes have become obvious in the fierce debate among scholars following the financial crisis. One of the main reasons for this shortcoming is the scarcity of detailed, high-quality historical firm-level data for Europe available to test these models, data that are crucial for understanding the interactions between financial, economic, and social developments. This scarcity is particularly glaring at the European level: policy for the future must be aware both of the dynamics inherited from the past and of the directions in which these dynamics are shaping the present.

The “big data” revolution in the historical social sciences will bring crucial progress in, and fundamental revisions of, current knowledge of the EU economy. The scaling-up in both the variety and the size of available high-quality historical economic and financial data for Europe has the potential for a major epistemological rupture. On the one hand, datasets are usually built to test models and to answer specific questions, but in the new paradigm trends and patterns will emerge from the data through data mining techniques, independently of preliminary research questions: history is a boundless natural laboratory for a host of economic and financial experiences. On the other hand, the building of historical “born-on-paper” big data – as opposed to “born-digital” data – represents a unique opportunity to bring down barriers not only between the social sciences but also between the social and “hard” sciences, in the context of interdisciplinary digital humanities.

The USA has been investing enormous resources in building and linking databases suited for long-run research. The Collaborative for Historical Information and Analysis (CHIA) links academic and research institutions to sustain a Human System Data Resource, connecting variables to analyse many areas of human experience. The Wharton Research Data Services (WRDS) provides users with a single location to access over 250 terabytes of data across multiple disciplines, including accounting, banking, economics, healthcare, insurance, and marketing. The Center for Research in Security Prices (CRSP), the most widely used database in finance, contains prices and dividends for shares listed on the New York Stock Exchange since 1926. A Google Scholar search for scientific papers that have used CRSP data returns nearly 46,000 hits, including many papers by Nobel laureates. But the use of US data precludes any understanding of the specific features of the European economy. Because of the USA’s dominant position in data production, American companies are frequently and implicitly deemed “representative” or “the norm”. Lessons drawn from their behaviour are consequently assumed to be applicable everywhere, generating many biases.

Today, only a very few large stand-alone databases on Europe have been built, by both the academic community (e.g. the London Share Prices Database of the London Business School) and private companies (e.g. Datastream), without any concern for interoperability. Within academia, considerable resources have been devoted to data building, very often with the aim of studying very specific issues. Such datasets are built without any systematic comparative or diachronic analytical purpose and with no concern for cumulativity or sustainability. Hence, it is nearly impossible to compare the existing fragmented datasets. Moreover, continued access is not guaranteed, data dissemination often being left to the discretion of individual researchers. Consequently, with no permanent infrastructure responsible for harmonisation and access, these data are in most cases lost to the community.

Conversely, the very few historical series contained in some commercial databases are sometimes unsuitable for research, yet they are used daily by business and academia. They can give rise to serious errors, because they are poorly documented and flawed, being based on easy-to-find but inappropriate sources. The building of such data requires sharp interdisciplinary skills, some of which are specific to a country, or even to a region, because of the heterogeneity of historical business rules and practices. These peculiarities call for an ad hoc Research Infrastructure (RI) able to connect to other, already existing infrastructures.