Clinical practice now takes place in a highly data-rich environment. Information on clinical care may be routinely collected through much of a person’s lifetime, and collected quite intensively during periods of illness and treatment. Beyond the familiar everyday forms of clinical data collection and storage, less familiar entities, such as clinical registries and administrative health datasets, may hold enormous quantities of potentially useful information. Strategies to link the various data entities are complicated by issues of consent, data ownership, governance and confidentiality [1]. Nonetheless, successful data linkage can provide valuable insights into clinical care. Evaluation of linked data in cancer has shed light on aspects such as practice patterns and quality of lung cancer staging in the SEER database in the USA [2], on survival related to multidisciplinary team (MDT) care in the national Cancer Registry in Taiwan [3] and on outcome inequalities identified through the UK National Lung Cancer Audit (NLCA) [4]. However, without considerable epidemiological experience, practising clinicians may find it challenging to identify the full range of data sources relevant to particular questions, let alone interrogate them effectively. Even a single institution is likely to have multiple data repositories linked to each patient, including the clinical electronic health record, demographic datasets, the Oncology Management Information System (OMIS) (in the case of cancer patients) and subspecialty institutional registries. The CancerLINQ programme [5] in the USA employs innovative, wide-ranging data linkage strategies, attempting to use nationally collected cancer care data to influence clinical management and outcomes.
In the Australian setting, identifying the range of data sources relevant to cancer care, mapping linkage pathways and clarifying the steps required for data access may help clinicians to collect and use clinical information to better inform care and potentially improve outcomes.