How can you discover and reuse existing research data?

There is so much data out there that we want to help you find your way more easily.

Re-using Existing Data

Anything that can be used for analysis can be considered data, or a dataset. Many national and international organisations provide free access to large datasets; data made freely available in this way are known as Open Data.

Datasets may contain different kinds of data files, e.g. raw or edited (cleaned) data, and micro or macro data. Raw data are the data as originally collected, including every value, even missing or mismatched entries. Edited or cleaned data have been tidied up for analysis and publication. Micro data describe individual units, such as persons or households; macro data and statistics are aggregated from these micro data units and give a general overview of them. Although datasets can contain data of varying types and levels of aggregation, and these definitions may overlap, each kind can contain valuable information.
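To make these distinctions concrete, here is a small Python sketch. The records are invented for illustration, and the cleaning rule (drop records whose age is missing or not a number) is one hypothetical choice among many; in practice, cleaning decisions should be documented together with the dataset.

```python
"""Toy illustration of raw vs. cleaned micro data, and macro data derived from it.

The records below are invented for illustration only; they do not come from
any real dataset.
"""
from statistics import mean

# Raw micro data: one record per individual (the micro unit), exactly as
# collected, including a missing value (None) and a mismatched entry ("n/a").
raw_micro = [
    {"person": 1, "region": "North", "age": 34},
    {"person": 2, "region": "North", "age": None},   # missing value
    {"person": 3, "region": "South", "age": 51},
    {"person": 4, "region": "South", "age": "n/a"},  # mismatched type
    {"person": 5, "region": "South", "age": 29},
]


def clean(records):
    """Keep only records whose age is a usable integer (a hypothetical cleaning rule)."""
    return [r for r in records if isinstance(r["age"], int)]


def to_macro(records):
    """Aggregate micro records into macro statistics: count and mean age per region."""
    by_region = {}
    for r in records:
        by_region.setdefault(r["region"], []).append(r["age"])
    return {
        region: {"n": len(ages), "mean_age": mean(ages)}
        for region, ages in sorted(by_region.items())
    }


cleaned_micro = clean(raw_micro)   # edited/cleaned micro data
macro = to_macro(cleaned_micro)    # macro data: one summary row per region
print(macro)
```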

When re-using research data, scientists must be familiar with the rules and regulations governing data copyright, intellectual property rights, and laws governing sensitive or personal information. SURF has compiled a report on the legal status of raw data including information on the types of consent required for the re-use of data. Your Privacy Champion can answer questions about the use of personal data. IXA can provide legal help with the re-use of data.

See also the ZonMw explanation of different kinds of property rights in the Netherlands (text available in Dutch only).

Sources for Finding Existing Datasets

The number of available datasets is growing rapidly. Datasets are made available in many formats, by many people and organisations. Some are raw files, while others are organised and formatted as databases that require a licence or subscription. The library of the Vrije Universiteit Amsterdam has collected links to commonly used data repositories and has licensed several databases.

Popular free and licensed databases can be found with LibSearch Advanced.

If you need help finding and using free or licensed sources, you can contact the Research Data Services Helpdesk. A separate LibGuide helps students and staff in economics, finance, and organisation science find and re-use data.

You can also start looking for data in these four places:

  1. The literature. Research articles may point you to the data they are based on. Sometimes (part of) the data are attached to the article as supplementary files; sometimes the data are published separately in a data repository, in which case the article usually gives a clear reference to the published dataset. Some datasets are even published specifically in data journals.
  2. Scientific data repositories. Data repositories are platforms for archiving and accessing research data. Universities often provide a repository for archiving data, but there are also platforms organised by discipline or by country. Some repositories are accessible only to consortium members, whereas others are free of charge. Many universities in the Netherlands use DataverseNL to archive datasets for the medium term. Long-term archiving is provided by the national research data archives DANS and 4TU.ResearchData. At the European level, B2SHARE and Zenodo are platforms for accessing research data. You can find repositories by topic or country in re3data, a registry of research data repositories. The VU has its own research portal, PURE, where researchers register their datasets; instructions for registering your own dataset in PURE are on the Dataset Registration page of this LibGuide.
  3. Data search engines. Search engines let you quickly browse datasets and supplementary data files published by researchers. Because they cover datasets from many sources, they are useful for a quick orientation on a topic. Examples are DataCite and Google Dataset Search.
  4. Data portals of (governmental) organisations. Organisations that regularly collect (statistical) data sometimes offer these data through their own portal. An example is Eurostat, which collects and disseminates statistics at the European level, by country and by theme. Some of these websites have been linked in the Finding data LibGuide.
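Some data search engines can also be queried from a script. The Python sketch below assumes the public DataCite REST API endpoint `https://api.datacite.org/dois` with its `query`, `resource-type-id` and `page[size]` parameters, and reads the DOI, first title and publication year from each result in the JSON:API response. Check the current DataCite API documentation before relying on these names. The functions `build_query_url`, `summarise` and `search` are hypothetical helpers, not part of any library.

```python
"""Sketch: query the DataCite REST API for datasets matching a search term.

Assumes the endpoint https://api.datacite.org/dois and its JSON:API response
shape (a top-level "data" list whose items carry "id" and "attributes").
"""
import json
from urllib.parse import urlencode
from urllib.request import urlopen

DATACITE_DOIS = "https://api.datacite.org/dois"


def build_query_url(term, page_size=5):
    """Build a search URL restricted to resources of general type 'dataset'."""
    params = {
        "query": term,
        "resource-type-id": "dataset",
        "page[size]": page_size,
    }
    return f"{DATACITE_DOIS}?{urlencode(params)}"


def summarise(response):
    """Extract DOI, first title and publication year from a parsed response dict."""
    results = []
    for item in response.get("data", []):
        attrs = item.get("attributes", {})
        titles = attrs.get("titles") or [{}]  # guard against a missing title list
        results.append({
            "doi": item.get("id"),
            "title": titles[0].get("title"),
            "year": attrs.get("publicationYear"),
        })
    return results


def search(term, page_size=5):
    """Fetch and summarise results (requires internet access)."""
    with urlopen(build_query_url(term, page_size), timeout=30) as resp:
        return summarise(json.load(resp))
```

Calling `search("twins")` needs internet access; the URL-building and parsing steps are kept separate so they can be checked offline.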

Data Sources for VU Researchers

Researchers at the Vrije Universiteit Amsterdam have also built databases from data collected during their research. Some examples:

  • Nederlands Tweelingenregister (Netherlands Twin Register) - database with data on twins and their families, created to study the relationship between genetics and growth, development, personality, behaviour, diseases, mental health, and many kinds of risks.
  • Geoplaza VU - the portal for all matters related to GIS (Geographical Information Systems) and geodata at the Vrije Universiteit Amsterdam. It offers students and employees a platform to exchange, examine, and download digital map material.
  • Dutch monasteries - database with information about Dutch monasteries of the Middle Ages.
  • Slave owners in Amsterdam 1863 - the places of residence of slave owners in Amsterdam in 1863, visualized in GeoPlaza.
  • Deaths at the Borders Database - collection of official, state-produced evidence on people who died while attempting to reach southern EU countries from the Balkans, the Middle East, and North & West Africa, and whose bodies were found in or brought to Europe.
  • Datasets published by VU researchers can be found in the VU Research Portal.

Citation Elements

Citing data works much like citing a journal publication: it helps to give and receive credit, and it shows the impact of the original source.

When writing an article for a specific academic journal, check the journal's rules on how to cite data. Whatever the journal, a data citation should contain at least the following elements:

  • Author(s): Name of the author (creator) of the dataset
  • Title: Name of the dataset
  • Date of publication
  • Publisher: Repository or archive where the dataset is stored
  • Persistent Identifier: Unique identifier; the most common is the DOI (see section Data Publication).

Optional elements that may be included in the reference are:

  • File Type: e.g. codebook, movie, software
  • Version: Version number of the edition
  • Creation Date
  • Date of Consultation: date you last accessed the dataset
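If you cite many datasets, the elements above can be assembled programmatically. In this minimal Python sketch the `DataCitation` class and its field names are hypothetical; the element order (author, year, title, DOI link, publisher, version) mirrors the DataverseNL-style example citation in this guide, and a journal may prescribe a different order or punctuation.

```python
"""Sketch: assemble a data citation from its required and optional elements."""
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    author: str                    # creator of the dataset, "Family, Given"
    year: int                      # year of publication
    title: str                     # name of the dataset
    publisher: str                 # repository/archive where the dataset is stored
    doi: str                       # persistent identifier: bare DOI, "doi:" form or doi.org URL
    version: Optional[str] = None  # optional element

    def doi_url(self):
        """Normalise the DOI to a resolvable https://doi.org/ link."""
        doi = self.doi.strip()
        for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
            if doi.lower().startswith(prefix):
                doi = doi[len(prefix):]
                break
        return f"https://doi.org/{doi}"

    def render(self):
        """Join the elements in DataverseNL order; the title gets curly quotes."""
        parts = [self.author, str(self.year), f"\u201c{self.title}\u201d",
                 self.doi_url(), self.publisher]
        if self.version:
            parts.append(self.version)
        return ", ".join(parts) + "."


cite = DataCitation(author="Stephens, William", year=2020,
                    title="Resiliences to Radicalisation - QSort Data",
                    publisher="DataverseNL", doi="10.34894/35MTMN", version="V1")
print(cite.render())
# Stephens, William, 2020, “Resiliences to Radicalisation - QSort Data”, https://doi.org/10.34894/35MTMN, DataverseNL, V1.
```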

Example data citation

Stephens, William, 2020, “Resiliences to Radicalisation - QSort Data”, https://doi.org/10.34894/35MTMN, DataverseNL, V1.
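For DataCite and Crossref DOIs you can also ask doi.org itself for a formatted reference through DOI content negotiation: a request to `https://doi.org/<DOI>` with an `Accept: text/x-bibliography` header (optionally naming a citation style and locale) returns the citation as text. This Python sketch only builds and sends such a request; the function names are hypothetical, and whether a given style such as `apa` is available for a particular DOI is an assumption to check against the content negotiation documentation. Compare the result with the journal's rules before using it.

```python
"""Sketch: request a formatted citation from doi.org via content negotiation."""
from urllib.request import Request, urlopen


def citation_request(doi, style="apa", locale="en-US"):
    """Build the request object; no network traffic happens here."""
    accept = f"text/x-bibliography; style={style}; locale={locale}"
    return Request(f"https://doi.org/{doi}", headers={"Accept": accept})


def fetch_citation(doi, style="apa"):
    """Send the request (requires internet access) and return the citation text.

    urllib follows doi.org's redirect and keeps the Accept header.
    """
    with urlopen(citation_request(doi, style), timeout=30) as resp:
        return resp.read().decode("utf-8").strip()


# Usage (needs internet access):
# print(fetch_citation("10.34894/35MTMN"))
```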


For more information, see the following guidelines:

The Citation File Format (CFF) is also relevant.
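A CFF file is a small YAML file, usually named `CITATION.cff`, that holds machine-readable citation metadata. It is most common for software, but version 1.2.0 of the format also allows `type: dataset`. The Python sketch below generates a minimal file with the keys `cff-version`, `message`, `type`, `title`, `authors`, `doi` and `version`; the function `citation_cff` is hypothetical. Values are written as JSON strings, which YAML accepts as double-quoted scalars, so no YAML library is needed. A `date-released` field (a full YYYY-MM-DD date) could be added when the exact release date is known.

```python
"""Sketch: generate a minimal CITATION.cff (CFF 1.2.0) describing a dataset."""
import json


def citation_cff(title, family, given, doi, version=None):
    """Return CITATION.cff text; json.dumps quotes and escapes each value."""
    q = json.dumps  # a JSON string is also a valid YAML double-quoted scalar
    lines = [
        "cff-version: 1.2.0",
        f"message: {q('If you use this dataset, please cite it as below.')}",
        "type: dataset",
        f"title: {q(title)}",
        "authors:",
        f"  - family-names: {q(family)}",
        f"    given-names: {q(given)}",
        f"doi: {q(doi)}",
    ]
    if version:
        lines.append(f"version: {q(version)}")
    return "\n".join(lines) + "\n"


text = citation_cff("Resiliences to Radicalisation - QSort Data",
                    "Stephens", "William", "10.34894/35MTMN", version="V1")
print(text)
```

To save it, write the string to a file, for example with `Path("CITATION.cff").write_text(text, encoding="utf-8")` after `from pathlib import Path`.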