This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Overcoming Computational Resource Limitations in Deep Learning for Healthcare: Strategies Targeting Data, Model, and Computing
Citations: 6
Authors: 1
Year: 2025
Abstract
Deep learning has been identified as an indispensable backbone in health data science. However, the computational constraints faced by many healthcare providers, who may lack access to high-performance computing resources, must be considered. This commentary illustrates three representative strategies from the perspective of data, model, and computing to mitigate computational constraints in resource-limited settings.

Deep learning (DL) has been identified as an indispensable backbone in health data science [1], driven by exponential growth in the scale of medical data and by its remarkable modeling capability, which is powered by increasingly complex architectures with parameter counts now surpassing hundreds of billions [2]. Supercomputing centers have been established in industry settings, but restrictions on sharing private patient data with external entities persist [3]. As a result, academic institutions and hospitals remain the primary venues for developing DL models in healthcare. However, the computational constraints faced by many healthcare providers, who may lack access to high-performance computing resources, must be considered.

From the perspective of the DL lifecycle, we identify three key factors that contribute to computational resource demands: data, model, and computing. Accordingly, we illustrate three representative strategies in Figure 1: informative data subset selection, model compression, and low-precision computation. These strategies offer actionable solutions for healthcare providers to mitigate computational constraints when leveraging DL models in resource-limited settings.

Figure 1. Representative techniques toward efficient deep learning for healthcare.

Data is the driving force behind the success of DL models. However, the rapid increase in both the dimensionality of individual samples and the overall size of datasets has made model development on entire datasets increasingly cost-prohibitive for hospitals and medical research institutes.
A practical solution to this challenge is informative data subset selection, which aims to identify the samples that contribute most to model training, thereby reducing computational costs and mitigating predictive errors caused by noisy or irrelevant data [4]. Sample selection can occur either before or during model training. Prior to training, a common approach is representativeness-based selection, in which sample representations are first obtained and then clustered to identify a core set of data points closest to the cluster centers [5]. For selection during training, Katharopoulos and Fleuret [6] proposed a theory suggesting that many samples become redundant after a few epochs and can be excluded from subsequent training. The informativeness of a sample is measured by its contribution to the variance reduction in model parameter updates, and large batches of data are replaced by smaller subsets that maximize variance reduction. Compared with the default solution, their proposed strategy reduced the test error by 8.0% and 5.0% on the CIFAR-10 and CIFAR-100 image classification tasks, respectively. Coleman et al. [7] expanded on these ideas by introducing a proxy model with fewer parameters, derived from the original backbone model, to accelerate the calculation of sample informativeness. The proposed method removed 20.0% of the dataset with only a 0.1% increase in the top-1 error on the Amazon Review Polarity benchmark.

Moreover, unlike traditional DL approaches that require large datasets for training, recent foundation models across various medical domains, such as radiology [8], pathology [9], and ophthalmology [10], have demonstrated the capability to address tasks effectively with limited fine-tuning samples.
This sample efficiency stems from the ability of foundation models to leverage generalizable knowledge acquired from extensive external data on related tasks, and it further mitigates the potential model convergence issues caused by data subset selection [11].

The success of DL stems not only from large-scale datasets but also from model parameter scalability and the corresponding capability to fit complex decision functions [12]. At the same time, optimizing these large-scale parameters imposes significant requirements on computational resources. Model compression aims to identify compact architectures with minimal parameters to reduce computational demands while preserving performance. Among common techniques, parameter pruning and low-rank factorization have been suggested because they can be easily deployed in both the training and inference stages [13]. Additionally, they are more efficient than techniques that require additional training, particularly in resource-constrained settings. Parameter pruning eliminates parameters that are not important to the target task, whereas low-rank factorization leverages matrix decomposition to approximate the essential parameters [14, 15].

For parameter pruning, Srinivas and Babu [16] proposed a seminal approach for neuron elimination. Their method calculates the saliency of each neuron and iteratively removes those with the lowest saliency. On the MNIST dataset, a LeNet-like architecture with 83.5% parameter compression achieved an accuracy of 98.4%, a decrease of less than 1.0 percentage point compared with the full-parameter model, which attained 99.1% accuracy. This approach can be applied to any pre-trained model without additional training, aligning with our goal of reducing computational resource consumption.

By contrast, low-rank factorization targets the matrix multiplications that are inherent in DL, which account for a substantial portion of the computational load [17].
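As a rough illustration of low-rank factorization, the sketch below approximates a dense weight matrix by the product of two thin matrices obtained from a truncated singular value decomposition. It is a minimal NumPy example under stated assumptions, not the procedure of any cited work: the layer shape, the chosen rank, the synthetic near-low-rank weights, and the helper name `low_rank_factorize` are all hypothetical.

```python
import numpy as np


def low_rank_factorize(weight, rank):
    """Approximate `weight` (m x n) as a @ b, with a (m x rank) and b (rank x n)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # fold the top singular values into the left factor
    b = vt[:rank, :]
    return a, b


rng = np.random.default_rng(0)
m, n, rank = 256, 512, 8

# Hypothetical layer weights: an exactly rank-8 matrix plus small noise,
# chosen so that a rank-8 approximation is expected to work well.
weight = rng.standard_normal((m, rank)) @ rng.standard_normal((rank, n))
weight += 0.01 * rng.standard_normal((m, n))

a, b = low_rank_factorize(weight, rank)

x = rng.standard_normal(n)
y_full = weight @ x   # one large matrix-vector product
y_low = a @ (b @ x)   # two small products replace it

params_full = weight.size        # 256 * 512 = 131072
params_low = a.size + b.size     # 8 * (256 + 512) = 6144
relative_error = np.linalg.norm(y_full - y_low) / np.linalg.norm(y_full)

print(f"parameters: {params_full} -> {params_low}")
print(f"relative output error: {relative_error:.4f}")
```

Real network weights are rarely this close to low rank, so in practice the rank is a tuning knob that trades parameter count against approximation error.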
This approach replaces the original matrix operations with decomposed matrix multiplications, thereby reducing calculation redundancy. For instance, Denton et al. [18] used singular value decomposition to approximate convolutional filter matrices with low-rank matrices, significantly reducing the number of parameters. Validated on ImageNet classification, their method reduced the number of weights by a factor of 2.4 to 13.4, with a maximum error increase of 0.9%. However, for applications with sufficient computational resources, where the primary goal is to reduce inference latency, alternative methods such as knowledge distillation should also be considered [19]. For readers interested in a detailed exploration of model compression techniques, we recommend the survey in [20].

In addition to data volumes and model parameters, the resource demands on hardware, storage infrastructure, and data transmission networks are significantly influenced by the computing approach adopted throughout the DL lifecycle [21]. Low-precision computing represents individual parameters, activations, and gradients with fewer bits than the 32-bit floating-point format that is standard for processing deep neural networks [22]. This approach accelerates computations and reduces memory usage, thereby lowering the overall computational resource demands [23]. Low-precision computation is particularly useful during model inference [24]. Taking high-dimensional medical image data as an example, multiple images can be spliced into low-precision batches for processing by a low-precision model [24]. The batch prediction results are then separated to retrieve the output for each individual image [24]. Compared with the inference stage, using low-precision computation during training often leads to convergence-related issues, such as gradient divergence, vanishing gradients, or entrapment in local minima [25].
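To make both the memory saving and one source of the training difficulty concrete, the following minimal NumPy sketch casts values from 32-bit to 16-bit floating point. The weight values and the gradient magnitude are hypothetical and chosen only for illustration; the example says nothing about how a particular model behaves.

```python
import numpy as np

# Hypothetical weights stored in the standard 32-bit floating-point format.
weights_fp32 = np.linspace(-1.0, 1.0, 1000, dtype=np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

bytes_fp32 = weights_fp32.nbytes  # 4 bytes per value
bytes_fp16 = weights_fp16.nbytes  # 2 bytes per value

# Rounding error introduced by the cast, measured back in 32-bit precision.
max_rounding_error = float(
    np.max(np.abs(weights_fp32 - weights_fp16.astype(np.float32)))
)

# A very small (hypothetical) gradient value is representable in 32 bits
# but lies below the smallest positive 16-bit value, so it rounds to zero.
tiny_gradient = np.float32(1e-8)
tiny_gradient_fp16 = tiny_gradient.astype(np.float16)

print(f"memory: {bytes_fp32} bytes -> {bytes_fp16} bytes")
print(f"max rounding error on weights: {max_rounding_error:.2e}")
print(f"tiny gradient in 16-bit: {tiny_gradient_fp16}")
```

The weights in [-1, 1] survive the cast with a small rounding error, which is why inference tolerates low precision comparatively well, whereas a gradient that underflows to zero contributes nothing to the parameter update.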
Mixed-precision computation has gained prominence as an effective strategy to address these challenges by combining low precision for less sensitive operations with sufficient numerical resolution for critical computations. Hayford et al. [25] demonstrated a practical application of mixed-precision training for DL models. In their experiments, model parameters were stored in full precision, whereas loss and gradient parameters were computed and stored in low precision. When model parameters were updated, these low-precision values were temporarily converted back to full precision. Furthermore, loss scaling was used to shift the update parameters from a wide range in low precision to a narrower range in full precision because most activated values during training were typically < 1. The experimental results showed that mixed-precision training not only led to a relative error increase of < 1.0% compared with full-precision training but also significantly reduced computational resource consumption, with training time reduced by 10.6% to 43.1%. The trade-off between model performance and compression cannot be fully eliminated; however, various strategies have been proposed to mitigate it. For example, progressively decreasing the bitwidth achieves comparable or superior performance while substantially reducing the memory requirements for model parameters, compared with uniform bitwidth reduction [26]. For readers seeking an in-depth understanding of mixed-precision training and inference techniques, we recommend the review in [27].

In healthcare scenarios, researchers should strike a balance between computational efficiency and the rigorous demand for accuracy and reliability: overlooking computational efficiency can lead to impractical, resource-intensive systems, whereas sacrificing accuracy compromises patient safety and clinical utility.
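The mixed-precision recipe described above (full-precision master parameters, low-precision loss and gradient computation, conversion back to full precision for the update, and loss scaling) can be sketched on a toy linear regression problem. This is a minimal NumPy illustration under stated assumptions and not the implementation of Hayford et al. [25]: the synthetic data, the constant `loss_scale`, the learning rate, and the number of steps are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 256, 4

# Synthetic regression data (hypothetical), generated in full precision.
x = rng.standard_normal((n_samples, n_features)).astype(np.float32)
true_w = np.array([0.5, -0.25, 0.125, 1.0], dtype=np.float32)
y = x @ true_w

# Low-precision copies of the data for the forward and backward passes.
x16 = x.astype(np.float16)
y16 = y.astype(np.float16)

master_w = np.zeros(n_features, dtype=np.float32)  # full-precision parameters
loss_scale = 1024.0                                # assumed constant scale
learning_rate = np.float32(0.1)

for step in range(200):
    # Forward pass in 16-bit using a low-precision copy of the parameters.
    w16 = master_w.astype(np.float16)
    residual16 = x16 @ w16 - y16

    # Gradient of the *scaled* loss (loss_scale / 2) * mean(residual**2),
    # computed in 16-bit. The scale is folded in before the summation so
    # that the accumulated values stay within the 16-bit range.
    scaled_grad16 = x16.T @ (residual16 * np.float16(loss_scale / n_samples))

    # Convert to 32-bit and undo the scaling before the update.
    grad32 = scaled_grad16.astype(np.float32) / np.float32(loss_scale)
    if not np.all(np.isfinite(grad32)):
        continue  # skip the step if the 16-bit computation overflowed

    master_w -= learning_rate * grad32

max_abs_error = float(np.max(np.abs(master_w - true_w)))
print("learned parameters:", master_w)
print(f"max abs error vs. true parameters: {max_abs_error:.2e}")
```

Production frameworks usually adjust the loss scale dynamically rather than fixing it, so the constant used here should be read as a simplification.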
This commentary elucidates the three primary factors of data, models, and computing that influence computational requirements when using DL in healthcare and presents representative solutions to reduce computational resource burdens. For future work on efficient DL, we recommend that researchers conduct rigorous evaluations of models, particularly compressed or low-precision models, in both out-of-sample and out-of-time scenarios. Such assessments should use diverse metrics tailored to specific objectives, including sensitivity and specificity, to accurately identify patients with and without a given disease [28, 29]. For high-level screening, such as infectious disease detection in a large population, models with high sensitivity are prioritized to minimize false negatives and mitigate the risk of disease spread [30]. By contrast, for critical diagnoses, such as confirming cancer prior to radiotherapy, models with high specificity are crucial to prevent unnecessary interventions, reduce patient anxiety, and avoid substantial costs [31]. If the performance of efficient DL solutions is validated to meet the real-world deployment standard, researchers are then encouraged to focus on computational feasibility, energy efficiency, and environmental sustainability [32, 33].

Han Yuan: conceptualization, data curation, formal analysis, investigation, methodology, visualization, writing–original draft, writing–review and editing.

The author has nothing to report.

This study is exempted from review by the ethics committee as it does not involve human participants, animal subjects, or the collection of sensitive data.

The author has nothing to report.

The author declares no conflicts of interest.

The author has nothing to report.
Related works
"Why Should I Trust You?"
2016 · 14,307 citations
A Comprehensive Survey on Graph Neural Networks
2020 · 8,679 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,207 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,607 citations
Artificial intelligence in healthcare: past, present and future
2017 · 4,411 citations