Open Bioinformation in the Life Sciences as a Gatekeeper for Innovation and Development

Despite increasing advocacy for the "openness" of science and research data, openness is still far from being a widespread practice. The goal of this paper is to identify the most pressing obstacles (in terms of funding, technology, Intellectual Property Rights, contracts, data protection, and social norms) that are hindering the development of Open Science and Open Research Data, with particular attention to the situation of developing countries. The innovative aim of this paper, which is the first essay of a broader research project, is to prepare the epistemological basis for a Law and Technology theory of "Open Bioinformation" (OB), where bioinformation stands for research data in the life sciences. We argue that the literature has so far addressed the promotion of openness in science and research data only in a sectorial manner, taking into account just one or a few of the factors affecting openness as if they were unrelated and did not influence one another. Consequently, the suggested solutions are limited to a single perspective and fail to consider the dynamics of information control. In our view, a holistic approach, one that zooms out from the specific disciplines and takes the whole picture into account, would help determine an effective policy for promoting OB. For this reason, we have to consider the technological, legal, and sociological aspects, in order to assess whether and how changes in one domain might affect the others.
Roberto Caso is the author of paragraphs 3 and 6, and co-author of paragraphs 1 and 7; Rossana Ducato is the author of paragraphs 2, 4, and 5, and co-author of paragraphs 1 and 7.
R. Caso, Faculty of Law, University of Trento, Trento, Italy; e-mail: roberto.caso@unitn.it; URL: http://www.lawtech.jus.unitn.it/index.php/people/roberto-caso
R. Ducato, e-mail: rossana.ducato@unitn.it; URL: http://www.lawtech.jus.unitn.it/index.php/people/rossana-ducato
© Springer International Publishing Switzerland 2016. G. Bellantuono and F.T. Lara (eds.), Law, Development and Innovation, SxI – Springer for Innovation / SxI – Springer per l'Innovazione 13, DOI 10.1007/978-3-319-13311-9_7


Introduction
Everyone says that data sharing is an imperative in science. Everyone agrees that free and immediate access to genetic information and medical data is crucial for the progress of life sciences research. To paraphrase James Boyle in one of his most famous writings, such statements are so obvious that we should be able to make them in a law article without having to add footnotes. 1 Open access (OA) to research data, as a gatekeeper for innovation and development, is of paramount importance in the so-called "Global South" (GS). In the field of medical and biotechnological research, developing countries lag considerably behind, a gap exacerbated by the chronic lack of funds for building research infrastructures and investing in education and training, as well as by the widespread recourse to secrecy and/or to strong Intellectual Property Rights (IPRs), which hinder access to and circulation of scientific knowledge. 2 A possible way out of this situation has been identified in open models for sharing the building blocks of the life sciences, i.e. research data. In this paper we refer to this heterogeneous category as "bioinformation", a term that subsumes a composite set of digitized biological information relating to an individual and commonly used for the purposes of biomedical and biotechnological research. 3 The following section will be specifically dedicated to framing the definition of bioinformation and explaining the importance of sharing for current scientific research. The third section will be devoted to understanding the dynamics, the content, and the tools of "openness" in the life sciences. Open models for the free access and reuse of scientific knowledge are commonly catalogued under the labels of "Open Science" (OS) and "Open Research Data" (ORD), but in the literature their meaning is still vague and polysemous. For this reason, we will try to untangle some ambiguities by clarifying the terms of our discourse and presenting the legal transposition of OS and ORD.
1 The reference is to Boyle (1997, p. 87).
Our analysis grew out of the realisation that, despite the increasing advocacy for the "openness" of science and research data, openness is still far from being a widespread practice. 4 The goal of this paper is to identify the most pressing obstacles (in terms of funding, technology, IPRs, contracts, data protection, and social norms) that are blocking the development of OS and, in particular, of ORD, with special attention to the situation of developing countries. 5 The innovative aim of this paper, which is the first essay of a broader research project, is to prepare the epistemological basis for a Law and Technology theory of "Open Bioinformation" (OB), where bioinformation stands for research data in the life sciences. We argue that the literature has so far addressed the promotion of openness in science and research data only in a sectorial manner, taking into account just one or a few of the factors affecting openness as if they were unrelated and did not influence one another. Consequently, the suggested solutions are limited to a single perspective and fail to consider the dynamics of information control. In our view, a holistic approach, one that zooms out from the specific disciplines and takes the whole picture into account, would help determine an effective policy for promoting OB. For this reason, we have to consider the technological, legal, and sociological aspects, in order to assess whether and how changes in one domain might affect the others.
Once the causes of the problem have been identified, we will recommend some strategies and solutions that could make OB a more viable option. In particular, we will discuss two examples ("open through licenses" and "open through social norms") in which openness can be achieved by combining different strategies and legal tools.

There's Something About Bioinformation: A Short Premise on Research Data for Life Sciences
If information is the blood and fuel of our world, then bioinformation is indeed the vital principle of current research methods in the life sciences. 6 "Bioinformation" is an umbrella term we use to refer to information that is: (a) biological, i.e. of cellular and molecular human origin; (b) related to the βίος, the existential sphere of a person's life; (c) bioinformatic, since computer programming is applied to the processing of biological data, which are digitized or born-digital; (d) bioMedTech, in the sense that it can be used for the purposes of medical or biotech research. This includes all information derived from biological samples or consisting of data generated by the individual or by other subjects involved in the care/research process (physicians, researchers, nurses, etc.). This can be, inter alia, data relating to the molecular or biochemical characteristics of the sample, genetic information, data generated in clinical trials, diagnoses, prescriptions, medical history, eating habits, etc.
4 David and Foray (2002) and Pampel and Dallmeier-Tiessen (2014).
5 From a comparative perspective, we must specify that no particular geographic area will be the object of the analysis: we will mention some general trends shared by the countries of the GS.
6 Quoting James Gleick: "We can see now that information is what our world runs on: the blood and the fuel, the vital principle" (Gleick 2011).
The availability of these data is not only crucial for personalized medicine but is also a fundamental resource in many fields of bioscience research: by linking genomic data or biochemical interactions with environmental factors and information on the long-term course of an illness, we can improve our understanding of the causes or development of certain diseases (this idea underlies current research methods in, for example, genome-wide association studies, drug discovery, cancer research, translational medicine, pharmacogenomic investigations, etc.). 7 Advances in technology and the convergence of different disciplines (computer science, biology, engineering, mathematics, and medicine) have helped to shape this kind of information as a new commodity: 8 nowadays, genome sequencing is faster and cheaper than at the end of the Human Genome Project; 9 data are more accurately annotated and can be stored on more widely available high-quality devices, such as computers, smartphones, and wireless devices; infrastructures like the new generation of research biobanks linked to electronic health records allow for professional and systematized collection; 10 the huge amount of data generated can be gathered in new kinds of storage spaces like the cloud; 11 data and information can be easily copied and transferred through digitization; 12 and so on.
Technology has greatly contributed to the potential of scientific progress, developing tools and infrastructures that allow for more and better information. Nevertheless, the data collected by a researcher or a single institution, even a large one, are not sufficient to conduct a genome-wide association study or an evidence-based medicine project: 13 firstly, because data-intensive scientific discovery needs a huge amount of information from diverse sources; secondly, because such investigations are intrinsically interdisciplinary, thus requiring collaboration among experts from different disciplines; thirdly, because skills, equipment and know-how are distributed among stakeholders in both the public and private sectors, making it necessary to overcome the traditional boundaries between the different players and build new forms of partnership. 14 Thus, progress in research requires a vast pool of scientifically reliable data, as well as expertise from different fields of knowledge and industry. This need has made data sharing not an option but a categorical imperative, both for promoting scientific progress (in the public interest) and for surviving in a highly specialized and competitive market (in the interest of private companies). 15 This is confirmed by the creation of networks of international research consortia that adopt collaborative policies and open access rules. The latter have been codified in soft law instruments such as the Bermuda Principles (1996), 16 the Fort Lauderdale Agreement (2003), 17 the Amsterdam Principles, 18 and the Toronto Statement. 19 Over the last few years, many other initiatives from governments, international organizations and civil society have supported OA to scientific data. To mention a few of them: the OECD Principles and Guidelines for Access to Research Data from Public Funding (2007); 20 the EU Commission Communication on Scientific information in the digital age: access, dissemination and preservation (2007).
7 West (2006).
8 On the commodification of information caused by the expansion of the IPRs domain and the new possibilities opened up by technology, see Boyle (2003) and Hess and Ostrom (2003). With a specific focus on developing countries, see Forero-Pineda (2006).
9 The Human Genome Project (http://www.genome.gov/10001772) was a collaborative research program started in 1990 and aimed at sequencing the entire human genome. The first draft was published in 2001 (International Human Genome Sequencing Consortium: Lander et al. 2001), while the complete sequence was released in April 2003. At the end of the Human Genome Project the cost of sequencing a genome was around $100 million; in 2014 it was estimated at $5,000. See Hayden (2014).
10 Kohane (2011), Jensen et al. (2012), Scott et al. (2012) and Guarda (2013).
11 Rosenthal et al. (2010) and Stein (2010).
12 Topol (2013).
13 Floca (2014, p. 298).
14 In drug discovery, collaboration among industry, academia, and other funders has been advocated by Weigelt (2009). See also Krumholz et al. (2014).
15 Hagedoorn et al. (2000) and Edwards et al. (2009).
Despite the spread of an "open culture" and the common understanding of the need for data sharing in science, there is still confusion around terms like "Open Science" and "Open Research Data". In fact, they are not clearly defined from a legal perspective. The next section aims to provide a coherent legal framework for these concepts.

"Open Science" and "Open Research Data": Finding the Definitions
Open science is a very popular concept in the current scientific debate, but its meaning is defined and interpreted with different nuances. 28 An oft-cited definition comes from Stephen Maurer, who described the features of OS in terms of three pillars: "(a) full, frank, and timely publication of results, (b) absence of intellectual property restrictions, and (c) radically increased pre- and post-publication transparency of data, activities, and deliberations within research groups". 29 More broadly, OS has been described as: "not only accessibility to research objects such as articles, data, code, protocols and workflows that people are free to use, re-use and distribute without legal, technological or social restrictions, but also the opening up of the entire research process, right from agenda-setting, data generation and data analysis, to dissemination and use". 30 For an overview of the OS phenomenon, it is useful to refer to the study by Fecher and Friesike, who, on the basis of a literature review, identified at least five "schools of thought": 31 (1) the so-called "Public School" emphasizes the need to make science understandable for the general public and the research process accessible to scientists; (2) the "Democratic School" stresses the importance of gaining access to the products of research (not only publications and data, but also source materials, digital representations, and multimedia materials); (3) the "Pragmatic School" promotes OS as a mechanism for making research more efficient; (4) the "Infrastructure School" deals with the challenges raised by the technical infrastructures that enable collaborative research projects through the web; (5) the "Measurement School" argues in favour of alternative measures of scientific impact suited to the digital age.
Adopting a strict notion of OS would be pointless by definition, given the "open" nature of the concept. Rather than as five parallel lines, we see the different schools outlined by Fecher and Friesike as diverse points of view on the same phenomenon, showing various ways of approaching it; they necessarily complement each other. The argument for "openness" is rooted in the idea of Mertonian communalism, 32 but OS can alternatively be justified in light of utilitarian theories (it is better because it is more efficient). Sharing and collaboration among researchers should be enabled through suitable online (and common) platforms and infrastructures. At the same time, such a system for sharing and disseminating results can only endure if scientists are given the right incentives. An open and wide diffusion of scientific materials is not only beneficial to professionals, but also has to engage society more generally, empowering citizens. Sharing should not be confined to scientific publications or materials, but should extend to research data. The latter, in particular, are the object of the "Open Research Data" movement, a subcategory of the broader OS. Research data, such as those previously outlined as bioinformation, "form[s] the basis for the quantitative analysis underpinning many scientific publications", 33 and they represent the fundamental building block of basic research. 34 OS and ORD certainly emerged as extra-legal phenomena, but they have begun to take on a legal dimension. Therefore, it is crucial to understand how they fit into legal categories and what sources of law can be found in this field.
28 Grubb and Easterbrook (2011) and Frischmann et al. (2014).
29 Maurer (2003). In the same sense, Nielsen (2011).
30 Open Knowledge Foundation (2014, p. 15).
31 Fecher and Friesike (2014).
32 Merton (1942).
We can find some general normative indicators in the mandate to share scientific knowledge and the benefits derived from it, affirmed by Article 27 of the Universal Declaration of Human Rights (1948), Article 15 of the International Covenant on Economic, Social and Cultural Rights (1966), Articles 2 and 19 of the UNESCO Universal Declaration on the Human Genome and Human Rights (1997), and Articles 2, 15, and 24 of the UNESCO Universal Declaration on Bioethics and Human Rights (2005), which explicitly take into consideration the importance of scientific data sharing for developing countries; 35 meanwhile, the relevance of broad access to biological materials and genetic data has been affirmed by Articles 18 and 19 of the UNESCO International Declaration on Human Genetic Data (2003).
Despite the principles they affirm, these international documents only have programmatic value: their provisions are declarations of principle rather than binding, operative rules. Furthermore, they are not decisive for our discussion because they do not resolve the main critical tension, namely the balance between free access to the benefits flowing from scientific knowledge and the exclusive rights granted by intellectual property law. To use the terms of the UNESCO Universal Declaration on the Human Genome and Human Rights, such soft law statements echo, but do not prejudice, the international instruments governing the IPRs framework. 36 The conflict with the TRIPS Agreement was pointed out in the Report on Ethics, Intellectual Property and Genomics, issued by the International Bioethics Committee (IBC) in 2002. 37 It is interesting to note that this document explicitly mentions the term "open science", understood in a narrow sense as the antithesis of strong intellectual property protection on pharmaceutical developments that can affect the right to life and health of millions of people, especially in the South of the world; we should also note, however, that the concerns it expressed were not implemented in the subsequent UNESCO declarations.
33 European Commission (2012), point 3.
34 The definition of research data is hard to find in the literature. According to some authors, because there is no consensus on the notion of data itself, it would be preferable to adopt a very broad approach: the term research data should "include any kind of data produced in the course of scientific research, such as databases of raw data, tables, graphics, pictures or whatever else". Dietrich and Wiebe (2013, p. 17). In the same sense, see also the EU Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020, which state: "Research data refers to information, in particular facts or numbers, collected to be examined and considered and as a basis for reasoning, discussion, or calculation. In a research context, examples of data include statistics, results of experiments, measurements, observations resulting from fieldwork, survey results, interview recordings and images" (footnote 5, p. 3). See also Leonelli (2013b), according to whom: "scientific data can be defined as material artifacts that are collected and used as empirical evidence for the plausibility of claims about the nature of reality ('the earth revolves around the sun') and/or the efficacy of specific interventions ('500 milligrams of paracetamol help to relieve headache')".
35 Caulfield et al. (2012).
The top-down approach does not solve our problem of finding legal definitions. In fact, openness started to become familiar in legal discourse from the bottom up, in particular with the advent of open source software and, later on, of the open access movement. 38
Open source software, born in the computer programming environment, is characterized by decentralized production and a collaborative effort among everyone who wants to contribute to the programming of a piece of software. 39 Here, openness concerns the source code of the software (i.e. its human-readable form), which is freely distributed. In this way, the program can be run for any purpose; studied and modified as desired; redistributed as such; and distributed with modifications. 40 In order to keep the code open, a "viral" license is applied, which allows software to be freely used, modified, and shared, but requires both the code and any enhancement or derivative work to be shared on the same license terms. 41
Open Access refers to research publications, and its core has been recognized (and shaped) by the "Three Bs", three declarations issued between 2002 and 2003 as the result of three different initiatives: the Budapest Open Access Initiative Declaration (2002), the Bethesda Statement on Open Access Publishing (2003), and the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities (2003). Its main features have been effectively summarized by Peter Suber, who has described OA as literature that is "digital, online, free of charge, and free of most copyright and licensing restrictions". 42 The OA philosophy thus calls for unrestricted access to and reuse of digital content through the Internet, and relies on contractual tools as the operative means of achieving it. In this sense, the Creative Commons licenses, developed since 2002, have been a valuable instrument for implementing OA in a concrete way. 43 These modular and user-friendly licenses have radically changed the way digital content is shared, including content that is not necessarily creative (it is possible, for example, to waive the sui generis right on databases), because authors are free to choose their own copyright settings. 44 Distilling what open source and open access have in common in order to infer a legal meaning of openness, we can observe at least two aspects in which the law can operate: firstly, openness involves a limitation of IPRs; secondly, access to a specific resource is managed through licenses or contracts.
38 Caso and Ducato (2014).
39 See Di Bona and Ockman (1999), Raymond (2000) and Stallman (2002).
40 These are the four fundamental freedoms established by the General Public License manifesto: https://www.gnu.org/gnu/manifesto.en.html. Accessed 18.10.2014.
41 Probably the best known example is the GNU GPL license, created by Richard Stallman. Stallman (1998).
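The "viral" mechanism of copyleft licensing described above can be illustrated with a deliberately simplified model. The following Python sketch is our own toy illustration, not a description of how the GNU GPL or any other license is actually interpreted: the `Work` type, the `required_license` function and the license labels are hypothetical. It assumes that a work must be shared under a copyleft license whenever the work itself, or any component at any depth, carries that license, and it treats two different copyleft labels as conflicting, ignoring the compatibility provisions that real licenses may contain.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Work:
    """A hypothetical piece of software or content in a toy licensing model."""

    name: str
    # Label of a copyleft ("share-alike") license, or None for permissive terms.
    copyleft: Optional[str] = None
    # Earlier works incorporated into this one (the derivative-work relation).
    components: tuple[Work, ...] = ()


def required_license(work: Work) -> Optional[str]:
    """Return the copyleft license under which `work` must be shared, if any.

    Toy rule (an assumption): copyleft terms attached to the work or to any
    component, at any depth, propagate to the whole work. Two different
    copyleft labels are treated as a conflict; real licenses may be compatible.
    """
    found = {work.copyleft} if work.copyleft else set()
    for part in work.components:
        inherited = required_license(part)  # recursion makes propagation transitive
        if inherited is not None:
            found.add(inherited)
    if len(found) > 1:
        raise ValueError(f"{work.name}: conflicting copyleft terms {sorted(found)}")
    return found.pop() if found else None


# Usage: a copyleft library reused two levels down still binds the final app.
library = Work("parser", copyleft="GPL")
toolkit = Work("toolkit", components=(library,))
app = Work("app", components=(toolkit, Work("icons")))

print(required_license(Work("icons")))  # None: purely permissive work
print(required_license(app))            # GPL: inherited through the toolkit
```

The recursion is what makes the model "viral" in the sense used above: the share-alike obligation is not limited to direct reuse of the licensed code but follows it through successive derivative works.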
The foregoing observations hopefully clarify the terms of our analysis. We can now proceed to analyse the dynamics of ORD in its operational reality, paying particular attention to the situation of developing countries.

Open Bioinformation in the Developing World: An Overview
If ORD is crucial for the promotion of innovation and development in technologically advanced countries, it is even more so for the developing world, where openness is now considered a possible way of lifting the traditional barriers between the North and the South of the globe. 45 In the public health field, the promotion of access to scientific information as a means of overcoming the inadequate institutional, infrastructural and regulatory capacity to conduct high-quality investigations in Africa has been strongly affirmed by the Algiers Declaration on "Narrowing the Knowledge Gap to Improve Africa's Health" (2008). 46 Beyond such declarations, some projects based in the developing world are starting to promote collaborative science and ORD/OB in a concrete way. This is the case of the Human Heredity and Health in Africa (H3Africa) Consortium, 47 which aims at building a network for engaging African countries in the genomic revolution; MalariaGEN, 48 a data-sharing community studying malaria by integrating epidemiology with genomics; the Gambian National DNA Bank, 49 the first biobank created in Africa, in collaboration with the Jean Dausset Foundation-CEPH, which promotes the sharing of collected information; the Malaysian Oral Cancer Database and Tissue Bank System (MOCDTBS), 50 which makes data and specimens available to researchers; and the Datos Científicos Abiertos Program, 51 launched by the Comisión Nacional de Investigación Científica y Tecnológica (CONICYT) of Chile to promote best practices and the creation of a policy for sharing scientific data. 52 Several initiatives are also emerging from the bottom up. A paradigmatic example is the OpenSciDev Group, linked to the Open Knowledge Foundation, whose goal is to set the agenda for the realization of open and collaborative science in the developing world. 53 OS, ORD and, in particular, OB are becoming extremely popular because they can potentially solve some age-old problems of the GS, in primis the availability and equal distribution of information and knowledge. As pointed out by the OpenSciDev Group with reference to publications, academic and commercial journals are inaccessible to most researchers and institutions in developing countries due to the high cost of subscriptions. 54 Such a situation creates a vicious circle, because limited access to research resources reduces the chances of authors from the GS being published in international journals, and their underrepresentation has at least two important consequences: 55 (1) the limited visibility and low impact factor of developing-country (DC) researchers (who, as a result, have little chance of spreading their ideas, being cited, being involved in collaborative research projects, having access to opportunities for training abroad, etc.); (2) a reduced ability of institutions in both the North and the South to know the research generated in a given DC, which hinders both North-South and South-South collaborations.
42 Suber (2012). For a complete overview of the OA movement, see Frosio (2014); for a specific focus on academic publications, see Moscon (2015).
Lessig (1999), Carroll (2006) and Goss (2007).
45 Rahman (2012, p. 7).
46 The Algiers Declaration was issued by the ministers of health and heads of delegation of African countries during the Ministerial Conference on Research for Health in the African Region, held in June 2008.
47 Ramsay et al. (2014).
The same applies to scientific data produced in life sciences research. These fields are highly expensive, requiring huge investments in the gathering of samples and data and in their analysis: the cost of laboratories, chemicals, reagents, machinery, equipment, and specialized, trained personnel is unaffordable for most DCs. 56 To give an idea of the costs, we can mention the well-known example of the Human Genome Project. 57 The US government invested about $2.7 billion from 1990 to 2003 in this collaborative research program aimed at sequencing the entire human genome. To put the scale in perspective, the cost of this single research project is approximately equivalent to the GDP of Burundi in 2013. 58 Almost all research in DCs is conducted with scant public funding, and partnerships with industry are underdeveloped, so the sharing of bioinformation is of paramount importance for carrying out data-intensive research in those DCs that would otherwise be cut off from the research network. 59 An open approach, supported by a decent ICT infrastructure and sufficient expertise, could offer a cost-effective solution for performing research with limited resources. 60 OB can also foster participation and engagement in a research project. 61 This is of particular importance in life sciences research, where the success of an investigation may depend on the collaboration of a group of people or, in some cases, of an entire population. Democratizing the whole process through so-called "partnership governance", 62 which incorporates research participants and gives them decision-making power, would empower citizens and increase trust in the organization conducting the research. 63
48 http://www.malariagen.net/. Accessed 18.10.2014. For an overview of their data-release policy, see Parker et al. (2009). MalariaGEN is a network that includes several participants from different countries, thus enacting a North-South collaboration.
62 Winickoff (2009). The model for realizing such partnership governance could be found in the charitable trust, according to Winickoff and Winickoff (2003).
63 According to Frischmann, Madison, and Strandburg: "commons governance offers a defense against potential privatization of commonly useful shared resources and the possibility that an individual IP rights owner would "hold up" the enterprise as a whole. Examples of such arrangements might include "open source" commons constructed for basic biological building blocks such as the Single Nucleotide Polymorphism (SNP) consortium or the publicly available databases of genomic sequences that are part of the Human Genome Project. Formal licenses and related agreements assure that participants become part of what amounts to a mutual nonaggression pact that is necessary precisely because of the possibility that intellectual resources may be propertized" (Frischmann et al. 2014, p. 26).
Thus, strong altruistic and economic arguments support the promotion of OB, but there is a further point to consider, which has ethical implications. Many DCs represent a sort of new 'goldmine' for biotechnologically advanced countries. Populations from low-income countries can be the source of a valuable pool of data, because of the genetic peculiarities of a certain ethnic group or, sadly, because patients affected by the diseases under study live there. 64 After the collection of biological samples and information, research is conducted in developed countries, and the results (new drugs, treatments, diagnostic methods, vaccines, etc.) do not always flow back to research participants, thus raising several ethical and benefit-sharing concerns. 65 It would be fair, and consistent with the international principles mentioned above, to make at least the data and the analyses generated from the screening of DCs' populations freely available, allowing local scientists to reuse them for the needs and priorities of local research. 66
Even though OB represents a new hope for the GS, it is not a common practice and it faces several obstacles. From a literature review, we have identified six variables that affect the openness of data and, in particular, of bioinformation: (1) public investment; (2) technology; (3) intellectual property; (4) contracts; (5) privacy; (6) social norms.
(1) The origin of every problem related to OA can be traced back to funding and to sustainable long-term business plans. 67 In the GS, basic research is carried out with an insufficient amount of public money. 68 As already outlined, the lack of public-private partnerships does not help overcome this impasse. The result can be inadequate lab equipment, resources, and libraries, a lack of educational and training programs for specialized personnel, a weak ICT infrastructure, etc. 69
(2) OB may be hampered by technology: the lack of ICT infrastructures, or their inability to share and re-use information, which hinders database interoperability and data portability, is a serious weakness that undermines the very possibility of data sharing. 70 Integrating data depends on the adoption of standards that document the provenance of the data (metadata) and ensure their curation. 71 In the GS the problem is exacerbated by the poor digitization of information and limited access to the Internet. 72
64 Sgaier et al. (2007).
65 Costello and Zumla (2000), Cambon-Thomsen (2004), Dickenson (2004), Knoppers (2005) and Parker et al. (2009). For an overview of the main critical issues of such a practice, see also de Vries et al. (2011).
66 Knoppers (2000).
67 Bastow and Leonelli (2010). The study by Halla Thorsteinsdóttir, Uyen Quach, Abdallah S. Daar and Peter A. Singer shows that political will and public investment have been crucial for the development of health biotechnology in seven developing countries (Brazil, China, Cuba, Egypt, India, South Africa, and South Korea), which were taken as case studies (Thorsteinsdóttir et al. 2004).
68 Muñoz Palma (2012), Mboera (2012) and Inyang (2012).
69 Sirugo et al. (2004), Hardy et al. (2008), Mboera (2012) and Rahman (2012, p. 8).
70 De Roure et al. (2003), Altunay et al. (2010) and Leonelli (2013a).
71 Ankeny and .
72 Kahn (2012), Mboera (2012), Leonelli (2013b) and Open Knowledge Foundation (2014, p. 37).
(3) The complex landscape of intellectual property rights and the uncertain legal status of data are a serious disincentive to collaborative research. 73 The commodification and enclosure of data may take the form of copyright and sui generis database right protection. Such IPRs, although designed for databases, ultimately end up affecting the contents of the database itself. 74 In particular, the sui generis right has been strongly criticized for its potentially negative consequences, such as the danger of creating monopolies, increased transaction costs, interference with data aggregation, and a negative impact on the cooperative ethos. 75 (4) The private control of bioinformation is also exercised through contracts, as in the case of Data Transfer Agreements (DTA). These can be effectively enforced through technological measures designed to manage and protect the rights of access to and use of digital content, including through the immediate sanction of any violation of the contract conditions. 76 Mastering the jungle of contractual terms is far from trivial, and it inevitably involves transaction costs, 77 which are incompatible with the timelines of scientific research. 78 (5) The rationale of OB potentially conflicts with the rights to privacy and confidentiality. 79 To mention just the two biggest legal models for data protection: in Europe, Directive 95/46/EC 80 and Directive 2002/58/EC 81 frame the general rules, which will be profoundly affected by the new Regulation, with particular reference to the processing of personal data for scientific research; 82 meanwhile, the US has sector-specific federal legislation (HIPAA; the Federal Drug and Alcohol Confidentiality Statute; the Common Rule; the GINA Act, etc.). 83 The basic principle in both jurisdictions is obtaining the data subject's informed consent. Such rules, designed to protect a fundamental right, pose de facto (legitimate) restrictions on, and exemptions to, OB. 84 Another fundamental point stressed in the literature is that such protection only comes into play if information relates to an identified or identifiable person (ex multis, Article 2, Directive 95/46/EC; Article 5, § LXXII, Brazilian Constitution; Article 2, Ley de Argentina 25326/2000; Article 4(d), Ley de Uruguay 18331/2008; Article 3(b), Ley de Costa Rica 8968/2011; Chap. 1, § 55, South Africa Act 4/2013) or, under HIPAA, if it is protected health information (PHI), which excludes health information that "does not identify an individual" and with respect to which there is no "reasonable basis to believe that the information can be used to identify an individual". This objective scope is critical because it rests on a technological premise that is now failing, the premise on which all data protection legislation has relied to strike a balance between the protection of the individual and the free movement of information: anonymization. 85 Several studies show the increasing possibility of re-identifying individuals from anonymized data, 86 suggesting that, in the digital environment, anonymization is a promise that cannot be kept in absolute terms. 87 (6) Finally, it is fundamental to take into consideration social norms and, in this case, the scientific ethos.

73 Guibault and Wiebe (2013). See also Reichman and Uhlir (2003). 74 Trosow (2004) and Davison and Hugenholtz (2005). 75 Reichman and Samuelson (1997), Reichman and Uhlir (1999), David (2000), Reichman and Uhlir (2003, pp. 396 ff.), David (2004) and Trosow (2004). 76 Dusollier (2002), Caso (2004) and Ginsburg (2005). 77 Guibault and Margoni (2013). 78 Reichman and Uhlir (2003, pp. 402-404), Streitz and Bennett (2003) and Margoni (2013). 79 Kaye (2012).
Despite the Mertonian principles, 88 researchers are not inherently inclined to share their data, for a number of reasons: 89 creating a dataset costs time, money and labour, and researchers are unwilling to share it without some form of compensation; sharing would eliminate their competitive advantage; and the quality of a dataset might determine how grants are awarded, with consequent benefits in terms of career advancement and the livelihood of the research group. 90 The lack of adequate economic or reputational incentives risks inhibiting the informal exchange of information within the scientific community. In the GS, this problem seems to be one of the most difficult to address, since several scholars denounce secrecy as common behaviour in the community of peers, together with the lack of a culture of sharing. 91

88 Merton (1942). 89 Borgman (2007). 90 Gitter (2013). 91 Mboera (2012) and Rahman (2012, p. 8).
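Point (5) above turns on the claim that anonymization can be defeated. The Python sketch below illustrates one common mechanism, a linkage attack: quasi-identifiers left in a de-identified release (here postcode, birth year and sex) are joined against a named public list. All records, field names and helper functions are hypothetical and invented purely for illustration; real re-identification studies work on far larger datasets with more sophisticated matching.

```python
from collections import Counter

# Hypothetical toy data, invented for illustration only.
# "Anonymized" research release: names removed, quasi-identifiers kept.
RELEASED = [
    {"zip": "38122", "birth_year": 1975, "sex": "F", "finding": "BRCA1 variant"},
    {"zip": "38122", "birth_year": 1980, "sex": "M", "finding": "no variant"},
    {"zip": "38123", "birth_year": 1975, "sex": "F", "finding": "no variant"},
]

# Hypothetical public register (e.g. an electoral roll) that carries names.
REGISTER = [
    {"name": "Alice", "zip": "38122", "birth_year": 1975, "sex": "F"},
    {"name": "Bruno", "zip": "38122", "birth_year": 1980, "sex": "M"},
    {"name": "Carla", "zip": "38123", "birth_year": 1975, "sex": "F"},
    {"name": "Dora", "zip": "38123", "birth_year": 1975, "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def qi_key(record: dict) -> tuple:
    """Project a record onto its quasi-identifiers."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def k_anonymity(records: list[dict]) -> int:
    """Size of the smallest group of records sharing the same quasi-identifiers."""
    return min(Counter(qi_key(r) for r in records).values())


def link(released: list[dict], register: list[dict]) -> dict[str, str]:
    """Attach a name to every released record whose quasi-identifiers
    match exactly one person in the register."""
    names_by_key: dict[tuple, list[str]] = {}
    for person in register:
        names_by_key.setdefault(qi_key(person), []).append(person["name"])
    reidentified = {}
    for record in released:
        candidates = names_by_key.get(qi_key(record), [])
        if len(candidates) == 1:  # unique match: the record is no longer anonymous
            reidentified[candidates[0]] = record["finding"]
    return reidentified


if __name__ == "__main__":
    print("k =", k_anonymity(RELEASED))
    print(link(RELEASED, REGISTER))
```

In this toy case two of the three "anonymized" records are re-identified; the third survives only because two people in the register share its quasi-identifiers. Removing names alone does not prevent linkage, which is why the anonymization premise is described above as failing.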
All these factors should be considered in order to design an effective policy for OB, because they mutually influence each other. In order to provide a preliminary analysis of these complex dynamics, we will examine two cases in which we can observe the interactions among some of the abovementioned variables for achieving openness of bioinformation: the first one touches upon the limitation of IPRs through licenses and social norms, while the second one focuses on how to shape the social norms of the scientific community by using incentives and legal tools.

IPRs in Data?
In order to solve the first set of issues, a premise is needed: we have to understand which types of IPRs can apply to data. In contrast to secret information or end-products (publications or inventions), the application of an exclusive right in factual data is highly problematic. There is neither a legal definition of data nor a specific regulation for them. 92 The word "data" is the plural of the Latin datum ("something given") and corresponds to the ancient Greek dedomena, which literally means "things given". According to a general notion, a datum is ultimately a difference, and data are uninterpreted variables not processed by a cognitive intervention. 93 If there is no human intervention, strictly speaking, the necessary precondition for intellectual or industrial property is missing. 94 Nevertheless, IPRs can indirectly affect data management and circulation through the legal regime applicable to collections of data. Compilations and databases, in fact, can be protected by copyright and, in some jurisdictions, also by the so-called sui generis right (SGR). 95 Collections of data are eligible for copyright protection if they constitute, as a whole, an original work of authorship, whose creativity is expressed through the selection, coordination or arrangement of data and materials. 96

92 The only type of data legally defined and expressly regulated is personal data, which is protected under national and international data protection rules. 93 Floridi (2010, pp. 25-28). 94 As is well known, copyright protects original works of authorship, but not facts or ideas; meanwhile, patent law grants a temporary monopoly for an invention that is new, involves an inventive step and is susceptible of industrial application. A property right in data can also be found in the provisions protecting certain types of information, as in the case of know-how (see Article 39 TRIPS). 95 For a general overview, see Derclaye (2014). 96 This principle applies on both sides of the Atlantic. The US system, in fact, protects compilations "as a work formed by the collection and assembling of pre-existing materials or of data that are selected, coordinated, or arranged in such a way that the resulting work as a whole constitutes an original work of authorship" (17 U.S.C. § 101); meanwhile, Directive 96/9/EC on the legal protection of databases states that "databases which, by reason of the selection or arrangement of their contents, constitute the author's own intellectual creation shall be protected as such by copyright. No other criteria shall be applied to determine their eligibility for that protection" (Article 3). The case law has confirmed the legislative text in the leading case Feist v. Rural, 499 U.S. 340 (1991) for the US system and in ECJ Case C-5/08 Infopaq.

In some jurisdictions, non-creative databases can also be protected: 97 this is the case of the sui generis right recognized in the EU and Mexico, and of the sweat of the brow doctrine accepted in South Africa. 98 The EU legal system grants a 15-year period of protection to "the maker of a database which shows that there has been qualitatively and/or quantitatively a substantial investment in either the obtaining, verification or presentation of the contents to prevent extraction and/or re-utilization of the whole or of a substantial part, evaluated qualitatively and/or quantitatively, of the contents of that database" (Article 7, Directive 96/9/EC).
97 For historical accuracy, we have to mention that before the introduction of Directive 96/9/EC, a similar right, namely the "catalogue rule", already existed in Scandinavian countries (Karnell 1997). The US and Australian systems also used to protect non-creative databases, applying the sweat of the brow doctrine, according to which copyright rewards the effort and work that go into a compilation of facts. This principle was rejected in the US in the well-known case Feist v. Rural (1991), where the Court affirmed: "Without a doubt, the 'sweat of the brow' doctrine flouted basic copyright principles. Throughout history, copyright law has 'recognized a greater need to disseminate factual works than works of fiction or fantasy'. Harper & Row, 471 U.S., at 563. […] But 'sweat of the brow' courts took a contrary view; they handed out proprietary interests in facts and declared that authors are absolutely precluded from saving time and effort by relying upon the facts contained in prior works. In truth, 'it is just such wasted effort that the proscription against the copyright of ideas and facts… [is] designed to prevent' […] Protection for the fruits of such research… may in certain circumstances be available under a theory of unfair competition. But to accord copyright protection on this basis alone distorts basic copyright principles in that it creates a monopoly in public domain materials without the necessary justification of protecting and encouraging the creation of 'writings' by authors". For a comment, see Fulwood (1991), Ginsburg (1992) and Strong (1994). For the sake of completeness, it should be noted that after the enactment of Directive 96/9/EC, the US Congress tried to reintroduce an exclusive right model for database protection similar to the SGR through legislative proposals in 1996 and 2000. See Reichman and Uhlir (2003). Australian case law reached the same conclusion in IceTV Pty Ltd v. Nine Network Australia Pty Ltd (2009) and Telstra Corporation Limited v. Phone Directories Company (2010) (Lindsay 2012). 98 For a general overview of the sui generis right in Europe, see Stamatoudi (1997) and Derclaye (2008, 2014). An introduction to the Mexican provisions regarding the legal protection of databases can be found in Ovilla Bueno (1998), Caballero Leal (2000) and De La Parra Trujillo (2004).

The substantial investment, and not creativity, is the precondition for the exercise of the SGR; furthermore, the maker's investment must be directed at the obtaining, verification or presentation of data. The interpretation of these requirements has proved controversial. In particular, the meaning of "obtaining" and "verification" has been at the centre of a hermeneutical dispute before the European Court of Justice (ECJ) in British Horseracing Board Ltd v. William Hill Organization Ltd (2004) 99 and the three Fixtures cases (2004). 100 The European Court distinguished between "obtaining" and "creation": the database is eligible for sui generis protection only if the aim of the investment is to "seek out existing independent materials and collect them", 101 but not if the effort is directed at the "resources used for the creation as such of independent materials". 102 The activity of verification consists in ensuring the reliability of the information contained in a database. Thus, according to the ECJ, the substantial investment has to be evaluated only with regard to the resources used "to monitor the accuracy of the materials collected when the database was created and during its operation" 103 and not those "used for verification during the stage of creation of data or other materials which are subsequently collected in a database", 104 because the latter belong to the creation of the data themselves. 105 In other words, the ECJ tried to "domesticate" 106 the SGR by recalling the utilitarian rationale of the directive, that is, the protection of data storage and the encouragement of the development of processing systems, not the creation of new informational resources such as data and materials. 107 The exclusive right attributed to the maker of the database is particularly pervasive because it allows the maker to prevent even a lawful user from extracting and/or re-utilizing substantial parts of its contents, evaluated qualitatively and/or quantitatively, and to prevent the repeated and systematic extraction and/or re-utilization of insubstantial parts of the contents of the database where this conflicts with a normal exploitation of that database or with the legitimate interests of its maker (see, in particular, Articles 7 and 8, Directive 96/9/EC).
Ibid, para 34. 104 Ibidem. 105 Although the ECJ seems to make a clear distinction, in several cases it can be very hard to tell the obtaining of scientific data apart from their creation. The terms of the debate can be usefully summarized by referring to the two points of view expressed by Derclaye (2004) and Davison and Hugenholtz (2005). 106 Davison and Hugenholtz (2005). 107 As the Court reasons, in fact: "the purpose of the protection by the sui generis right provided for by the directive is to promote the establishment of storage and processing systems for existing information and not the creation of materials capable of being collected subsequently in a database". British Horseracing Board v. William Hill, para 34.

The directive tempers the control described above by allowing Member States to introduce specific exceptions to the SGR, such as extraction for the purposes of illustration for teaching or scientific research, as long as the source is indicated and to the extent justified by the non-commercial purpose to be achieved (Article 9, Directive 96/9/EC). This (shiny) attempt at openness has not been transposed uniformly across the Union, and it remains a dead letter in many legal systems, such as those of Italy and Spain. 108 Furthermore, we have to consider the duration of the SGR: it arises automatically from the date of completion of the database, but the period of protection begins to run afresh after any substantial change to the contents of the database, evaluated qualitatively or quantitatively, including any substantial change resulting from the accumulation of successive additions, deletions or alterations, which would result in the database being considered a substantial new investment, evaluated qualitatively or quantitatively. In that case, the database resulting from that investment qualifies for its own term of protection (Article 10, Directive 96/9/EC).
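The interaction between the 15-year term and the restart rule of Article 10 can be made concrete with a small date calculation. The Python sketch below assumes the wording of Article 10(1) of Directive 96/9/EC, under which the right expires fifteen years from 1 January of the year following completion, and treats every listed change as a substantial new investment that starts a fresh term. Whether a change is actually "substantial", and how far the new term reaches pre-existing contents, are legal questions the sketch does not attempt to answer. The function names are hypothetical.

```python
from datetime import date

# Assumption based on Article 10(1), Directive 96/9/EC: the right expires
# fifteen years from 1 January of the year following completion.
SGR_TERM_YEARS = 15


def sgr_expiry(completed: date) -> date:
    """Date on which protection ends for a database completed
    (or substantially changed) on `completed`."""
    return date(completed.year + 1 + SGR_TERM_YEARS, 1, 1)


def rolling_expiry(completed: date, substantial_changes: list[date]) -> date:
    """Latest expiry when each substantial change, treated as a substantial
    new investment (Article 10(3)), gives the resulting database its own term.
    Simplifying assumption: every listed change qualifies as substantial."""
    return max(sgr_expiry(d) for d in [completed, *substantial_changes])


if __name__ == "__main__":
    print(sgr_expiry(date(2000, 6, 30)))  # 2016-01-01
    print(rolling_expiry(date(2000, 6, 30),
                         [date(2010, 3, 1), date(2020, 9, 15)]))  # 2036-01-01
```

With regular substantial updates, the latest expiry date keeps moving forward, which is what critics have in mind when they speak of the "rolling" duration of the SGR.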
The vagueness of the European sui generis right and of its scope has raised several legal concerns. The "rolling" duration, the difficulty of distinguishing in practice between the "obtaining" and the "creation" of data, the unclear policy on publicly funded databases, 109 and the limited scope of the SGR exceptions make this right "one of the least balanced and most potentially anti-competitive intellectual property rights ever created". 110 Similar policy considerations apply to the Mexican SGR, although we should point out that this legal model has shortcomings and has been poorly developed. Article 108 of the Ley Federal del Derecho de Autor (1996) only states that: "Las bases de datos que no sean originales quedan, sin embargo, protegidas en su uso exclusivo por quien las haya elaborado, durante un lapso de 5 años" [Databases that are not original are nevertheless protected, for the exclusive use of whoever compiled them, for a period of 5 years]. Interpreting this provision systematically, we can infer that all non-creative databases, regardless of any evaluation of the effort involved in establishing them, are protected by the Mexican SGR for a period of 5 years. Furthermore, in contrast to the European solution, the SGR cannot be cumulated with copyright: original databases are covered by the derecho de autor, while non-original databases can be protected through the SGR. 111 Even though the duration is shorter than that of the European SGR, the objective requirements are broader, and the SGR extends to all non-creative Mexican databases without any further conditions.

108 Ducato (2013) and Guibault and Wiebe (2013). 109 Only The Netherlands has explicitly denied public authorities the ability to exercise the SGR (Article 8, Dutch Database Act). See Guibault (2013). Although this is not expressly recognized by legislation, the same conclusion can be reached in the Italian legal system. Legal scholars have, in fact, observed an irresolvable contradiction between the industrial or commercial rationale protected by the Directive and the public goals pursued by a public administration, rejecting the application of the SGR to publicly funded databases. See Cardarelli (2002). The same principle has also been confirmed by case law, namely by Tribunale di Roma, Sez. IP, ordinanza 5 giugno 2008, Edizioni Cierre s.r.l. v. Poste Italiane s.p.a., in AIDA, 2010, 688. 110 Reichman and Samuelson (1997).

In South Africa, the sweat of the brow doctrine is still a cornerstone of copyright protection. 112 In contrast to the holding in the Feist case, the South African High Court recently found copyright infringement in Board of Healthcare Funders v. Discovery Health Medical Scheme and Others (2012), since the respondents used, published and adapted the contents of the applicants' Practice Code Numbering System ("PCNS"). The PCNS is a database that includes personal data related to medical practitioners (name, address, bank account details, preferred payment methods, etc.) and codes for medical service providers, assigning each a unique identifying number. In finding a violation of the Copyright Act, the South African Court adopted a very low standard of originality: "There is little doubt if regard be had to the work and energy put in over the three phases of the development of the PCNS that indeed while some of the component parts may not necessarily be original in its totality the work could be said to be original. It would be cynical to suggest that no effort or skill was expended in the development of the system over the years and in my view the respondents' stance that the work lacks originality must be dismissed in the light of the meaning that has come to be attached to the concept of originality in the case law developed over the years". 113

Open Through Licenses
The limits imposed by IPRs on scientific databases, through the long arm of control offered by copyright and the SGR, and the uncertainty about the legal status of a dataset (as seen in Europe, Mexico, and South Africa) may hinder both the regional and the transnational circulation of information. In these circumstances, a viable route towards open models is a legal agreement: "since the legal status of scientific databases and their content is more difficult to assess […], the use of standard licenses would eliminate the need for the user to look for the rights owner and to negotiate the terms of use". 114 Several models of standard licenses, in the form of user-friendly web tools, have been developed over the last few years, allowing rightholders to exercise IPRs on digital content according to their needs and wishes. Probably the best-known examples are the Creative Commons (CC) licenses. 115 These instruments, the brainchild of Lawrence Lessig, offer both professionals and laypeople a simple way to manage copyright and, as far as we are concerned, database rights. CC licenses are, in fact, designed in three main layers: (1) the Legal Code, that is, the full text of the license; (2) the Commons Deed, or "human-readable" version, which effectively summarizes (also through icons) the main conditions of the license; (3) the "machine-readable" version of the license, written in a format that software can understand. 116

There are essentially three types of CC license that can promote the principles of data openness, with different nuances: 117
• CC0 ("No Rights Reserved"). 118 Rather than a license, it is a waiver by which the author dedicates the work to the public domain, giving up all of his or her rights in the work worldwide. 119 In our case, this means that, for example, anyone can copy, modify, or distribute a substantial part of a database, even for commercial purposes, without asking permission and before the expiration of the 15-year period.
• CC-BY-4.0 ("Attribution"). 120 Filling a gap in the previous versions, the latest version (4.0) of this license also applies to data, since it expressly covers database copyright and the SGR. 121 Under the terms of this agreement, the licensor grants a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to reproduce and share his/her creation, in whole or in part, and to produce, reproduce, and share any modification of it. The only obligation of the user is to give credit to the creator in any reasonable manner requested by the licensor, provide a link to the license, and indicate whether changes were made. 122 Another innovation of version 4.0, attractive for researchers, relates to the attribution requirements: in addition to the obligation to indicate the URI (Uniform Resource Identifier), the new CC-BY also allows the user to provide, to the extent reasonably practicable, a hyperlink to the licensed material. Credit attribution thus becomes flexible and allows easier compliance, especially for datasets. 123
• CC-BY-SA-4.0 ("Attribution-Share Alike"). 124 In addition to the clauses already seen for CC-BY, Share Alike adds the so-called viral effect: every modification, remix or transformation of the original work must be licensed under the BY-SA conditions or under a compatible license.

114 Guibault and Margoni (2013, p. 148). See also Aliprandi (2011) and Leucci (2014). 115 Creative Commons (CC) is a charitable corporation that promotes the sharing and circulation of knowledge in compliance with copyright law. Although it offers standardized models, its modular licenses (attribution, non-commercial, no derivative works, share alike) and their combinations provide flexibility in setting the interests of the parties. http://creativecommons.org/. 116 https://creativecommons.org/licenses/. 117 Creative Commons provides two other options, namely "non-commercial" and "no-derivatives". See Guibault (2013). Aliprandi (2011, p. 33). Guibault (2013). For a critical analysis of the previous exclusion of the database SGR from the scope of the CC licenses, see Guibault (2011). 122 http://creativecommons.org/licenses/by/4.0/legalcode. 123 On the other hand, this possibility brings with it the problem of link expiration, which can de facto frustrate the attribution obligation. For a general overview of the problem in digital publication, see Kling and Callahan (2003).
Another set of licenses, specifically crafted for managing the bundle of rights in databases, has been created by the Open Data Commons (ODC) project. 125 The standard agreements it has developed are: (1) the ODC Public Domain Dedication and Licence (PDDL); 126 (2) the Open Data Commons Attribution License (ODC-By); 127 (3) the Open Data Commons Open Database License (ODbL). 128 Their function and content mirror those of the CC licenses described above, with two main differences: ODC licenses do not cover every kind of intellectual work but only databases, and they are not expressed in "machine-readable" form. 129 Even though the ODC licenses are database-specific and should be considered the legal tool best tailored to data, some authors have located the Achilles' heel of these agreements precisely in their sector-specific nature. Since they cover only databases and not the content itself, a research repository would have to use different types of licenses (one for the scientific publication and another for the dataset supporting it), thus creating inconsistencies within the system. 130

124 http://creativecommons.org/licenses/by-sa/4.0/. 125 The Open Data Commons was one of the first projects to draft, in 2008, a specific open license for databases (http://opendatacommons.org/). ODC is now part of the Open Knowledge Foundation, a not-for-profit organization whose goal is to promote the openness and sharing of knowledge in all its forms. See Pollock and Walsh (2012). 126 The ODC-PDDL is an irrevocable dedication to the public domain through which the rightholder waives all rights and claims in copyright or sui generis database rights over a database, in every medium and format now known or created in the future. If the waiver is not valid in a particular jurisdiction, the PDDL includes a worldwide, royalty-free, non-exclusive licence to use the work for any purpose for the duration of any applicable copyright and database rights. See more at: http://opendatacommons.org/licenses/pddl/1.0/. 127 The ODC-By allows users to freely share, modify, and use the database, subject only to the attribution requirements specified in the license. According to the license, the rights of the user consist of: (1) extraction and re-utilisation of the whole or a substantial part of the Contents; (2) creation of derivative databases; (3) creation of collective databases; (4) creation of temporary or permanent reproductions by any means and in any form, in whole or in part, including any derivative databases or as a part of collective databases; (5) distribution, communication, display, lending, making available, or performance to the public by any means and in any form, in whole or in part, including any derivative database or as a part of collective databases. Even if tailored to database rights, this license resembles the CC-BY in content and aim. See: http://opendatacommons.org/licenses/by/1.0/. 128 The ODC-ODbL is a license agreement intended to allow users to freely share, modify, and use a database while maintaining this same freedom for others. This is achieved through the following clause: "4. Any Derivative Database that You Publicly Use must be only under the terms of: i. This License; ii. A later version of this License similar in spirit to this License; or iii. A compatible license". See: http://opendatacommons.org/licenses/odbl/1.0/. 129 Aliprandi (2011, pp. 35-36), Guibault and Margoni (2013, p. 155) and Leucci (2014, p. 12). 130 Guibault and Margoni (2013, p. 158).
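The choice among the open licenses discussed above can be summarized as a small feature table. The Python sketch below encodes, in simplified form, the features the text attributes to each license (scope limited to databases, attribution, share-alike); the attribution flag for the ODbL is an assumption reflecting its notice requirements, which the text does not discuss. The class and function names are hypothetical, and the table is no substitute for reading the legal codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpenLicense:
    name: str
    databases_only: bool  # ODC licenses cover only databases; CC licenses cover any work
    attribution: bool     # the user must give credit to the licensor
    share_alike: bool     # derivatives must carry the same or a compatible license


# Simplified features as described in the text; not a legal analysis.
LICENSES = [
    OpenLicense("CC0", databases_only=False, attribution=False, share_alike=False),
    OpenLicense("CC-BY-4.0", databases_only=False, attribution=True, share_alike=False),
    OpenLicense("CC-BY-SA-4.0", databases_only=False, attribution=True, share_alike=True),
    OpenLicense("ODC-PDDL", databases_only=True, attribution=False, share_alike=False),
    OpenLicense("ODC-By", databases_only=True, attribution=True, share_alike=False),
    OpenLicense("ODbL", databases_only=True, attribution=True, share_alike=True),
]


def credit_preserving(licenses: list[OpenLicense], allow_share_alike: bool = False) -> list[str]:
    """Licenses that secure credit for the data creator, optionally excluding share-alike ones."""
    return [lic.name for lic in licenses
            if lic.attribution and (allow_share_alike or not lic.share_alike)]


def covers_paper_and_data(licenses: list[OpenLicense]) -> list[str]:
    """Licenses usable for both an article and the dataset supporting it."""
    return [lic.name for lic in licenses if not lic.databases_only]


if __name__ == "__main__":
    print(credit_preserving(LICENSES))      # ['CC-BY-4.0', 'ODC-By']
    print(covers_paper_and_data(LICENSES))  # ['CC0', 'CC-BY-4.0', 'CC-BY-SA-4.0']
```

Filtering for attribution without share-alike leaves exactly CC-BY-4.0 and ODC-By, while only the CC licenses can cover a publication and its dataset under a single instrument, which is the inconsistency problem raised against the ODC licenses.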
"Open" licenses are a paradigmatic example of the interaction among different variables: they operate within the copyright and sui generis database right domain, but allow rightholders to customize their preferences. Thus, these legal tools help to mitigate the shortcomings of strong and all-encompassing IP protection.
Furthermore, they internalize, in a simple and standardized way, some norms of the scientific community: in particular, the "attribution" option reflects a form of reputational reward. This is particularly important, considering that one problem with current credit attribution mechanisms is that they are essentially based on the authorship of journal articles. 131 Thanks to their user-friendly features, open licenses have been successfully adopted in several data access and sharing policies.

The use of the Internet and Web 2.0 has also affected scientific culture, enhancing the possibilities of information disclosure and networking. Nowadays, a researcher has a number of tools (such as blogs, thematic social networks, wikis, etc.) that enable real-time sharing of his/her thoughts, datasets, analyses, and small and negative findings with potentially everybody, without waiting for traditional publication in a scientific journal. 135 This can produce several advantages: data can circulate more broadly and faster than in the paper-based context; partial results can be cross-checked and validated by several experts; communication increases the chance of receiving feedback from a larger community; and the disclosure of so-called "blind alleys" (negative findings), which are usually not published because they yield no results, can guide other scientists in their investigations or, at least, avoid duplicated research in the same dead-end field. 136 However, a favourable attitude towards sharing is not widespread among researchers, especially in the GS. 137 We should probably dismiss the Mertonian idea of an investigator moved by high values and/or the public benefit, and the concept of the scientist as a rational individual acting in the interest of the scientific body. 138 More cynically, we have to admit that building a dataset requires huge intellectual effort, and in the end those data constitute the scientist's "little treasure", which will be used for publishing any significant result. Sharing such information would mean losing a significant competitive advantage and running the concrete risk of favouring the priority of someone else's publication or invention. 139 Where it has occurred, sharing has been practised in the scientific community as a means of cementing a relationship between two researchers or labs. 140 In other words, it has been conceived as a "gift relationship": 141 a courtesy occasionally extended within a small community of peers, presumably in the hope that it will be reciprocated in time of need.

Many of the considerations developed in this paragraph were already expressed in Caso and Ducato (2014). 135 Bartling and Friesike (2014, p. 8) and Rinaldi (2014). 136 Boggio (2008, p. 10) and Bartling and Friesike (2014, p. 9). 137 Mboera (2012).
The lack of openness has been challenged, to some extent, by the data sharing policies adopted by several public funding bodies in Europe and the US. 142 Many grant agreements oblige researchers to "grant back" their results and to make their datasets available for re-use. Such conditions are generally fulfilled by uploading research data to a public repository. These policies are an important recognition of the value of data collection; nevertheless, they have only a limited scope (see, for example, the opt-out mechanisms in the Horizon 2020 Open Data Pilot) and face a gigantic enforcement problem. The lack of strict controls and effective sanctions dilutes the innovative significance of this institutional effort. 143 We argue that one of the possible solutions for encouraging data sharing lies in creating special incentives for researchers that internalize reputational rewards. 144 As Ankeny and Leonelli have outlined, current credit attribution mechanisms are shaped around the traditional outcome of research: the publication. 145 Traditional metrics fail to measure the value of the effort spent in collecting and sharing data, leaving this type of work out of their evaluation grids. 146 After all, why should a researcher be forced to share his/her dataset with someone else? Why should he/she compromise his/her career? The labour behind such tasks is far from automatic: it requires time and professional skills, but nowadays it receives no recognition. If we want to make "openness" effective and fair for its players, the challenge is to devise novel mechanisms that bring to the surface the "undeclared work" of: (1) collecting reliable data; (2) sharing them.

138 As in Polanyi's view (1962).
In the context of biobanks, for example, Anne Cambon-Thomsen has proposed the creation of a BRIF (Bioresource Research Impact Factor), a special citation impact factor for biorepositories. 147 Such a metric should "trace the quantitative use of a bioresource, the kind of research using it and the efforts of the people and institutions that construct it and make it available", 148 giving credit to those who created and maintained a valid resource.
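To make the idea of tracing "the quantitative use of a bioresource" and "the kind of research using it" more concrete, the following Python sketch tallies, for each bioresource, how many publications acknowledge it and for which kind of research. It is not the BRIF formula proposed by Cambon-Thomsen and colleagues: the `Citation` record, its fields and the resource names are hypothetical, and the sketch assumes that uses are reported as citations of a resource identifier. Crediting the people and institutions behind each resource would require a further step not shown here.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical record: one publication acknowledging use of a bioresource."""
    resource_id: str
    research_type: str  # kind of research, e.g. "genomics"


def usage_profile(citations):
    """Return (uses per resource, uses per resource broken down by research type)."""
    totals = Counter(c.resource_id for c in citations)
    by_type = defaultdict(Counter)
    for c in citations:
        by_type[c.resource_id][c.research_type] += 1
    return totals, by_type


if __name__ == "__main__":
    cites = [
        Citation("biobank-A", "genomics"),
        Citation("biobank-A", "epidemiology"),
        Citation("biobank-A", "genomics"),
        Citation("biobank-B", "genomics"),
    ]
    totals, by_type = usage_profile(cites)
    print(totals["biobank-A"], dict(by_type["biobank-A"]))
    # prints: 3 {'genomics': 2, 'epidemiology': 1}
```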
In life sciences research, which also depends on access to biological samples (a scarce resource), we have proposed a "sharing-index", 149 which would measure a scientist's contribution in making his/her dataset available worldwide and reward him/her with priority access to the material resources of a biorepository or with a total or partial waiver of its cost-recovery fees.
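A minimal sketch of how such a sharing-index might be operationalized, in Python. The scoring rule (one point per openly shared dataset plus one per documented reuse by others), the two thresholds and the tiered fee waiver are all hypothetical assumptions chosen for illustration; they are not a specification of the proposal, and a real scheme would be calibrated by the biorepository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedDataset:
    """Hypothetical record: a dataset a scientist has deposited in a public repository."""
    dataset_id: str
    reuses: int  # documented reuses by other groups


# Hypothetical thresholds, for illustration only.
PARTIAL_WAIVER_AT = 5
FULL_WAIVER_AT = 20


def sharing_index(datasets):
    """Credit each shared dataset once, plus once for every reuse by others."""
    return sum(1 + d.reuses for d in datasets)


def fee_waiver(index):
    """Fraction of the biorepository's cost-recovery fee that is waived."""
    if index >= FULL_WAIVER_AT:
        return 1.0
    if index >= PARTIAL_WAIVER_AT:
        return 0.5
    return 0.0


def has_priority_access(index):
    """Priority access to biological samples for the most active sharers."""
    return index >= FULL_WAIVER_AT


if __name__ == "__main__":
    shared = [SharedDataset("d1", reuses=3), SharedDataset("d2", reuses=0)]
    idx = sharing_index(shared)
    print(idx, fee_waiver(idx), has_priority_access(idx))  # prints: 5 0.5 False
```

Counting reuse, and not only deposit, is one possible way to discourage uploading low-value datasets merely to raise the index; whether to weight reuse at all is a policy choice.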
Recognizing the contribution made in creating a dataset would also serve accountability purposes, making it possible to assess potential responsibilities. 150 Information is valuable only if truthful, 151 so evaluating the accuracy and integrity of data would trigger a race to the top and, as a positive externality, improve the overall quality of informational resources.
Contractual tools can also play a role in this context: for example, a license that requires attribution would help build a reputational reward for the researcher who has decided to share his/her data collection. In this sense, we argue that data sharing policies should preferably adopt a CC-BY-4.0 or an ODC-By license rather than CC0 or the PDDL. The latter, in fact, would not allow the original contributor to gain credit.
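The licensing recommendation can be expressed as a simple check that a data sharing policy might run over the licenses attached to submitted datasets. The Python sketch below uses the SPDX identifiers of the four instruments mentioned (CC-BY-4.0, ODC-By-1.0, CC0-1.0, PDDL-1.0); the function names are hypothetical, and the table records only whether each instrument legally requires attribution. This is a simplification: CC0 and the PDDL waive rights, although community norms may still encourage citation.

```python
# Whether each license legally requires attributing the original contributor.
# Keys are SPDX license identifiers.
ATTRIBUTION_REQUIRED = {
    "CC-BY-4.0": True,    # Creative Commons Attribution 4.0 International
    "ODC-By-1.0": True,   # Open Data Commons Attribution License v1.0
    "CC0-1.0": False,     # Creative Commons Zero public domain dedication
    "PDDL-1.0": False,    # Open Data Commons Public Domain Dedication & License
}


def preserves_credit(license_id: str) -> bool:
    """Return True if the license requires attribution; reject unknown identifiers."""
    try:
        return ATTRIBUTION_REQUIRED[license_id]
    except KeyError:
        raise ValueError(f"unknown license identifier: {license_id}") from None


def credit_preserving(license_ids: list[str]) -> list[str]:
    """Keep only the licenses under which the original contributor gains credit."""
    return [lid for lid in license_ids if preserves_credit(lid)]


if __name__ == "__main__":
    print(credit_preserving(["CC0-1.0", "CC-BY-4.0", "PDDL-1.0", "ODC-By-1.0"]))
    # prints: ['CC-BY-4.0', 'ODC-By-1.0']
```

Rejecting unknown identifiers, rather than silently treating them as non-attributing, forces a policy to take an explicit position on every license it accepts.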

Conclusions
With the advances in science and technology of the last few years, bioinformation has acquired unprecedented importance and value, not only for the individual to whom it relates, given its possible consequences for personalized medicine, but also for the various stakeholders interested in what has become an exploitable resource. Current research methods in the life sciences are, in fact, characterized by the massive, cross-cutting analysis of bioinformation, which is collected, indexed, verified, made available or sold like a new commodity. The boundaries of IPRs have gradually expanded. Scientists tend to protect, through secrecy and IPRs, resources that until a few years ago were informally exchanged. The privatization of bioinformation is critical because the enclosure movement now tends to encompass the "raw material" of every investigation. Given the cumulative nature of knowledge, such commodification can create a dangerous impasse for scientific progress.
147 Cambon-Thomsen et al. (2011). It represents the evolution of the BIF (Biobank Impact Factor) proposed by Cambon-Thomsen (2003). See also De Castro et al. (2013).
In the current information economy, the possibility of accessing and using such data is crucial for innovation and development, and even more so for developing countries. For them, openness means access to resources they cannot create for lack of funds, the chance of not being cut off from the international research network, and the hope of narrowing the knowledge gap with the Global North.
Yet the open philosophy is universally preached but little practised. As outlined in this paper, legal, technological and social obstacles can explain this situation: (1) the lack of public investment; (2) the absence of ICT infrastructures, or their inability to share and re-use information, which hinders database interoperability and data portability; (3) pervasive private control of data through strong IPRs, contracts and technological protection measures; (4) the potential conflict between data protection and open access to bioinformation; (5) the lack of adequate economic or reputational incentives to share information within the scientific community and society in general. An effective OB policy must consider the interaction of all these factors in order to create a virtuous circle of sharing and a new knowledge commons. 152 We have presented examples of how two of these obstacles (pervasive private control of data and the lack of adequate economic or reputational incentives to share information) can be mitigated by combining solutions from different domains.