In Reply: Big Data Research in Neurosurgery: A Critical Look at this Popular New Study Design

2018 · 42 citations · Neurosurgery


Abstract

To the Editor: We thank the authors for their review and discussion1 of our recent article, "Big Data Research in Neurosurgery: A Critical Look at this Popular New Study Design."2 To begin, their thoughtful comments are based on a definition of "Big Data" that differs significantly from ours. We defined "Big Data" as any non-neurosurgery-specific clinical data repository being used for neurosurgical research, the administrative database being a prime example. In contrast, the definition of "Big Data" used by the authors signifies large but narrow volumes of data, such as the vital signs, intracranial pressure, and electroencephalogram (EEG) recordings collected per patient, combined and placed in a "black box" (ie, artificial intelligence, machine learning [ML], and deep learning) in an effort to accurately predict a future event. That said, we would like to respond to some of the comments put forth by the authors.

The 4 components of big data (volume, variety, velocity, and veracity) are simple but useful. Volume is self-explanatory and the most touted advantage of these data repositories. Variety is dictated by the database being used but in general lacks sufficient detail and outcomes specific to neurosurgical pathologies. Regarding the velocity and veracity with which the datasets are compiled, there is also room for improvement.
A study comparing natural language processing (NLP) to administrative data found NLP superior in identifying postoperative complications; the administrative analysis based its indicators of complications on discharge codes.3 Because NLP can detect and extract valuable information from unstructured data (ie, the free text in a physician note),3 this finding also suggests that direct analysis of the source data provides more reliable conclusions than data that are first transformed into a code, then transferred to an external database, and then statistically analyzed, which is the manner in which administrative data are currently compiled and studied. Thus, the velocity and veracity of big data may be better served in the future by using source data instead of administrative data altogether.

One of the more interesting applications of big data is artificial intelligence (eg, artificial neural networks and ML), which requires large datasets and is making its way into neurosurgical research but has not yet been widely implemented clinically.4,5 Although national datasets have been successfully used to construct artificial neural networks,6-8 it has been noted that "the performance of ML is highly dependent on the quality of input data."5 Given the vast amount of digital healthcare data generated daily at hospitals and clinics across the nation, it would be ideal to responsibly incorporate those data directly into computational analyses rather than have the administrative database serve as the proverbial "middle man." Murdoch and Detsky9 point out that most information in any given electronic medical record (EMR) is "currently perceived as a byproduct of health care delivery, rather than a central asset to improve its efficiency."9 This point is further emphasized by the fact that administrative databases are essentially an afterthought application of EMRs, and there is, as of yet, no prevailing effort to make health data and big data one and the same.

Incorporating computational methods into the clinical arena is not without obstacles, as mentioned by the authors as well as by Murdoch and Detsky.9 They name challenges including a lack of incentives (though these could evolve as the US healthcare system focuses more intently on value-based care), privacy and digital data safety concerns (which must be met with reliable security solutions), and the fragmented state of the EMRs and record-keeping systems in use at most hospitals, which prevents straightforward transfer or integration of data from multiple sources. Bellazzi10 also discusses the limited reproducibility of big data analytics, which stems from the inherently complex nature of data collection and analysis as well as the extreme variety and questionable reliability of the data itself; he even states that the greatest challenge is "data quality and results evaluation." None of these barriers is necessarily insurmountable, and an open-source dataset, together with a solid understanding of the components of the datasets, may help provide a solution. Another answer may lie in improved quality control efforts by the institutions overseeing the datasets, though this may well require a paradigm shift within those organizations toward viewing the dataset as rigorously as any prospective research study. Unless a study is intended to function fully electronically, such quality control standards typically consist of regular in-person site visits from the data monitor (ie, every few weeks to every few months), as well as ongoing training of team members and audits of site practices and data capture processes.11 Indeed, this is a cumbersome and potentially cost-prohibitive task.9 Higher standards for data entry may also improve the use of administrative data for teaching neural networks and creating predictive models; but then again, so would using the nontransformed data directly from the EMR.
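The contrast between code-based and source-text detection of complications described above can be sketched in a few lines. This is a toy illustration only: the cited study3 used real NLP models, whereas the keyword matching, discharge codes, and patient records below are all invented for demonstration.

```python
import re

# Toy stand-in for the NLP-vs-administrative-data comparison: the codes,
# terms, and records here are invented, and simple keyword matching is a
# deliberately crude substitute for real NLP.

# Administrative pipeline: complications are inferred only from coded diagnoses.
DISCHARGE_CODE_COMPLICATIONS = {"998.59", "997.02"}  # hypothetical code set

# "Source data" pipeline: scan the free text of the physician note directly.
COMPLICATION_TERMS = re.compile(
    r"\b(wound infection|csf leak|stroke|hematoma)\b", re.IGNORECASE
)

def flagged_by_codes(record):
    """Administrative-style detection: relies on discharge codes alone."""
    return bool(DISCHARGE_CODE_COMPLICATIONS & set(record["codes"]))

def flagged_by_text(record):
    """NLP-style detection: inspects the unstructured note text."""
    return COMPLICATION_TERMS.search(record["note"]) is not None

records = [
    # Complication documented in the note but never coded: the coded
    # pipeline misses it, the source-text pipeline catches it.
    {"codes": ["V45.89"], "note": "POD 3: superficial wound infection noted."},
    # No complication in either source.
    {"codes": ["V45.89"], "note": "Recovering well, no new deficits."},
]

for r in records:
    print(flagged_by_codes(r), flagged_by_text(r))
```

The first record illustrates the letter's point: information present in the source text can be lost once the data are transformed into codes and moved to an external database.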
We certainly support the use of databases designed by neurosurgeons and would like to see more national datasets intended specifically to study neurosurgical outcomes. An open-source designation system for coding and complications that could be easily accessed and employed would also help reduce coding variability; hopefully, it would also address the issue of financial incentives driving billing and coding practices. Despite all of these proposed approaches to improving administrative datasets, we agree that they still "cannot establish causality" but may retain utility as "hypothesis-generating tools." However, we continue to question the role and value of repeatedly disclosing their inability to show anything more than correlation while offering no apparent intention to confirm or refute the findings via follow-up studies of sound scientific design. Virtually none of the studies that we evaluated compared the results obtained from big data analysis with their own institutional experience. In this vein, we would like to see more studies that test the hypotheses generated by the exploratory results of administrative data analyses at a single- or multi-institutional level. Alternatively, it would be appropriate to relegate these datasets to certain unique realms of study, such as geographic cost comparisons or resource utilization. The goal of our study was to draw the reader's attention to the current application of big datasets to clinical neurosurgical research. We appreciate the opportunity to discuss alternative approaches and ways to improve big data studies, and we look forward to future applications of computational technologies in the health care sector.

Disclosure

The authors have no personal, financial, or institutional interest in any of the drugs, materials, or devices described in this article.
