Annals of Thoracic Medicine Official publication of the Saudi Thoracic Society, affiliated to King Saud University
Year : 2017  |  Volume : 12  |  Issue : 2  |  Page : 95-100
Systematic analysis of measurement variability in lung cancer with multidetector computed tomography

1 Department of Radiology, Sir Run Run Hospital Affiliated with Nanjing Medical University, Nanjing, China
2 Department of Radiology, BenQ Medical Center, Nanjing Medical University, Nanjing, China
3 Department of Cell Biology, Collaborative Innovation Center for Cancer Personalized Medicine, Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Cancer Center, Key Laboratory of Human Functional Genomics of Jiangsu Province, Nanjing Medical University, Nanjing, China

Date of Submission: 01-Nov-2016
Date of Acceptance: 04-Dec-2016
Date of Web Publication: 4-Apr-2017

Correspondence Address:
Jichen Wang
Department of Radiology, BenQ Medical Center, Nanjing Medical University, No.71, Hexi Street, Jianye District, 210019, Nanjing
Yujie Sun
Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, No.101, Longmian Avenue, Jiangning District, 211166, Nanjing

Source of Support: None, Conflict of Interest: None

DOI: 10.4103/1817-1737.203750



Objective: To systematically analyze the nature of measurement variability in lung cancer with multidetector computed tomography (CT) scans.
Methods: Multidetector CT scans of 67 lung cancer patients were analyzed. Unidimensional (Response Evaluation Criteria in Solid Tumors), bidimensional (World Health Organization criteria), and volumetric measurements were performed independently by ten radiologists and were repeated after at least 5 months. Repeatability and reproducibility were assessed statistically through reliability, agreement, the variation coefficient, and misclassification rates. The relationship between measurement variability and its potential sources was also analyzed.
Results: Analysis of 69 lung tumors measuring 1.1–12.1 cm (mean, 4.3 cm) showed that the volumetric technique had the lowest measurement variability compared with the unidimensional and bidimensional techniques. Tumor characteristics (object effect) appeared to be the primary factor influencing measurement variability, whereas the effect of raters (subjective effect) was small. Tumor segmentation and size were associated with measurement variability, and a nonlinear function was fitted between volumetric variability and tumor size.
Conclusion: The volumetric technique has the lowest variability in measuring lung cancer, and measurement variability is related to tumor size by a nonlinear function.

Keywords: Computed tomography, lung cancer, measurement variability

How to cite this article:
Jiang B, Zhou D, Sun Y, Wang J. Systematic analysis of measurement variability in lung cancer with multidetector computed tomography. Ann Thorac Med 2017;12:95-100


Tumor imaging plays a fundamental role in the clinical care of lung cancer and in clinical trials, where computed tomography (CT)-based tumor measurement is the preferred technique.[1] Compared with the World Health Organization (WHO) criteria, the Response Evaluation Criteria in Solid Tumors (RECIST) have performed better in determining response to therapy.[2],[3] More recently, volumetric measurement obtained with automated segmentation tools has been reported to improve the accuracy of assessment.[4],[5]

Because of measurement variability, however, measurements of lung tumor size on CT scans are often inconsistent and can lead to incorrect interpretation of tumor growth or response. Although a number of significant factors contributing to measurement variability have been documented,[4],[5],[6],[7],[8],[9],[10],[11],[12],[13] the factors of primary importance, and the quantitative relationship between these factors and variability, have yet to be determined. The purpose of this study was to systematically analyze measurement variability in CT interpretation of non-small cell lung cancer.

Methods

This retrospective study was approved by our institutional ethics committee, and the requirement for informed consent was waived because of its retrospective nature.

Patients' characteristics

Patients were identified in the Picture Archiving and Communication System between January 2014 and December 2015. All identified patients had solid pulmonary nodules or masses diagnosed as non-small cell lung cancer by biopsy or surgical specimen, and all tumors were imaged by CT with a collimation of 2.0 mm or thinner.

We identified 67 patients (20 women and 47 men; mean age ± standard deviation [SD], 67.1 ± 12.2 years) with 69 lung tumors. Sixty-five patients had one tumor each, and two patients had two tumors each.

Computed tomography data acquisition

Patients were imaged with a 64-detector CT scanner (LightSpeed VCT; GE Healthcare, Chicago, IL, USA) with 64 × 0.625 mm collimation or a 16-detector CT scanner (Sensation 16; Siemens Medical Systems, Forchheim, Germany) with 16 × 0.75 mm collimation. Scans were obtained with the patients at full inspiration. Exposure settings were 50–80 mAs at 120 kVp. Axial images of 1.25 mm or 1.5 mm thickness were reconstructed with a 512 × 512 matrix. Air calibration was performed every morning before CT scanning.

Tumor measurement

Pulmonary tumors were analyzed independently by 10 raters (with 2–10 years of experience in radiology) on a workstation (Leonardo; Siemens Medical Systems) using a lung window (width, 1500 HU; center, −500 HU); raters were allowed to adjust the window settings if necessary. After being instructed to measure tumors on preselected images, the raters performed measurements on transverse slices with a digital caliper according to the RECIST and WHO criteria and obtained the volume of each tumor with computer-aided semi-automated evaluation software (LungCare; Siemens Medical Solutions). Four measurements were generated: the longest diameter on the native axial slice (RECIST criteria), the longest perpendicular diameter on the same image, the product of these two diameters (WHO criteria), and the volumetric quantification of the tumor. The raters were not aware of each other's selected slices. At least 5 months later, each rater repeated the measurement procedure for all tumors.
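The WHO bidimensional measure and a voxel-based tumor volume can each be expressed in a few lines. The sketch below is only an illustration: LungCare's segmentation algorithm is proprietary and is not reproduced here, so the code assumes that a binary tumor mask has already been produced by some segmentation step. The helper names `who_product` and `mask_volume_mm3` and the example values are hypothetical.

```python
import numpy as np


def who_product(longest_mm: float, perpendicular_mm: float) -> float:
    """WHO bidimensional measure: longest diameter times its longest perpendicular (mm^2)."""
    return longest_mm * perpendicular_mm


def mask_volume_mm3(mask: np.ndarray, spacing_mm: tuple) -> float:
    """Volume of a binary segmentation mask: number of tumor voxels times voxel volume.

    mask: 3D boolean array ordered (slice, row, column), assumed to come from a
          separate segmentation step that is not shown here.
    spacing_mm: (slice thickness, row spacing, column spacing) in millimetres.
    """
    voxel_mm3 = float(np.prod(spacing_mm))
    return int(np.count_nonzero(mask)) * voxel_mm3


# Hypothetical example: a 2 x 2 x 2 voxel lesion on 1.25 mm slices with 0.5 mm pixels
mask = np.zeros((4, 4, 4), dtype=bool)
mask[1:3, 1:3, 1:3] = True
print(who_product(30.0, 20.0))                  # 600.0
print(mask_volume_mm3(mask, (1.25, 0.5, 0.5)))  # 2.5
```

Counting voxels makes the volume depend directly on where the segmentation boundary is drawn, which is why boundary delineation matters so much for volumetric variability.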

In addition, two experienced raters (D. Z. and B. J., with 20 and 10 years of experience in radiology, respectively) visually assessed tumor morphological characteristics by consensus. Tumors were classified into two pairs of subgroups: regular (well-defined boundary) versus irregular (ill-defined boundary), and isolated (nearly no interface between the tumor and adjacent structures) versus nonisolated (interface ≥45°).

Statistical analysis

Statistical analysis was performed with SPSS software (PASW Statistics 18; SPSS Inc., Chicago, IL, USA), and a two-tailed P < 0.05 was considered statistically significant. The sample size required to detect a significant association at α = 0.05 with 90% power was estimated to be 60. Continuous variables are expressed as mean ± SD.

We estimated intraobserver reliability as

$$R_{\text{intra}} = \frac{\sigma_{s}^{2} + \sigma_{o}^{2}}{\sigma_{s}^{2} + \sigma_{o}^{2} + \sigma_{e}^{2}}$$

and interobserver reliability as

$$R_{\text{inter}} = \frac{\sigma_{s}^{2}}{\sigma_{s}^{2} + \sigma_{o}^{2} + \sigma_{e}^{2}},$$

where $\sigma_{s}$, $\sigma_{o}$, and $\sigma_{e}$ are the between-subject SD, the between-observer SD, and the measurement-error SD, respectively. Both expressions are derived from the general definition given by Bartlett and Frost,[14]

$$R = \frac{(\text{SD of subjects' true values})^{2}}{(\text{SD of subjects' true values})^{2} + (\text{SD of measurement error})^{2}}.$$

Agreement was assessed with Bland–Altman plots. The variation coefficient (VC), defined as the ratio of the SD to the mean, was also calculated. The sources of variation in tumor measurements were modeled with analysis of variance.[7] We also explored the relationship between measurement variability and potential factors by curve estimation.
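The reliability formulas need estimates of the between-subject, between-observer, and measurement-error variances. The paper does not state which estimator was used in SPSS, so the following is a minimal sketch under stated assumptions: a balanced tumor × rater × replicate design, an additive model $y_{ijk} = \mu + s_i + o_j + e_{ijk}$ with no tumor-by-rater interaction, and classical ANOVA (method-of-moments) estimators with negative variance estimates truncated at zero. Function names and the simulated data are hypothetical.

```python
import numpy as np


def variance_components(y):
    """ANOVA estimates for y[i, j, k] = mu + s_i + o_j + e_ijk (balanced design assumed).

    y has shape (n_subjects, n_raters, n_replicates).
    Returns (var_subject, var_rater, var_error).
    """
    y = np.asarray(y, dtype=float)
    n, m, r = y.shape
    grand = y.mean()
    subj_means = y.mean(axis=(1, 2))
    rater_means = y.mean(axis=(0, 2))
    ms_subj = m * r * np.sum((subj_means - grand) ** 2) / (n - 1)
    ms_rater = n * r * np.sum((rater_means - grand) ** 2) / (m - 1)
    # Residual after removing the subject and rater main effects
    resid = y - subj_means[:, None, None] - rater_means[None, :, None] + grand
    ms_err = np.sum(resid ** 2) / (n * m * r - n - m + 1)
    # E[MS_subj] = var_e + m*r*var_s and E[MS_rater] = var_e + n*r*var_o
    var_subj = max((ms_subj - ms_err) / (m * r), 0.0)
    var_rater = max((ms_rater - ms_err) / (n * r), 0.0)
    return var_subj, var_rater, ms_err


def reliabilities(y):
    """Intra- and inter-observer reliability as defined in the preceding paragraph."""
    vs, vo, ve = variance_components(y)
    total = vs + vo + ve
    return (vs + vo) / total, vs / total


# Simulated example (hypothetical): 30 tumors, 10 raters, 2 replicates, sizes in mm
rng = np.random.default_rng(0)
true_size = rng.uniform(10, 120, size=30)
rater_bias = rng.normal(0, 1.0, size=10)
y = true_size[:, None, None] + rater_bias[None, :, None] + rng.normal(0, 1.5, size=(30, 10, 2))
intra, inter = reliabilities(y)
print(f"intra-observer {intra:.3f}, inter-observer {inter:.3f}")
```

Because the between-observer variance appears in the numerator of the intraobserver formula only, intraobserver reliability can never be lower than interobserver reliability under this model.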

Results

Tumor size ranged from 1.1 to 12.1 cm (mean, 4.3 cm) by unidimensional measurement, from 1.1 to 104.9 cm² (mean, 19.3 cm²) by bidimensional measurement, and from 0.6 to 553.4 cm³ (mean, 66.2 cm³) by volumetric measurement [Table 1].
Table 1: Results from tumor measurements


Misclassification rates

Because no response criteria are currently available for the volumetric technique, we used the RECIST thresholds as the reference for volumetric measurement. Misclassification rates illustrate the potential impact of measurement variability. For each rater and each tumor, the difference between the smallest and largest measurements was computed. Each difference was expressed relative to the smaller measurement and compared with the criteria for progressive disease (RECIST >20%, WHO >25%), and relative to the larger measurement and compared with the criteria for response (RECIST >30%, WHO >50%). A misclassification was recorded in each group whenever the relative change exceeded these thresholds. For inter-rater misclassification, only the first replication was used. The volumetric technique showed the lowest misclassification rates [Table 2].
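The misclassification rule can be written as a short function. This is a sketch under stated assumptions: each inner sequence holds measurements of one tumor that should not differ (for example, one rater's two replicates for the intra-rater analysis, or all raters' first readings for the inter-rater analysis), and, as described above, the RECIST thresholds would also be applied to volumes. The function name and example values are hypothetical.

```python
# Thresholds from the text: progression is judged relative to the smaller value,
# response relative to the larger value.
PD_THRESHOLD = {"RECIST": 0.20, "WHO": 0.25}
RESPONSE_THRESHOLD = {"RECIST": 0.30, "WHO": 0.50}


def misclassification_rates(measurements, criterion="RECIST"):
    """Return (progression rate, response rate) falsely implied by measurement spread.

    measurements: list of sequences; each sequence holds repeated measurements of one
    tumor with no true change, so any threshold crossing counts as a misclassification.
    """
    n_pd = n_response = 0
    for reps in measurements:
        smallest, largest = min(reps), max(reps)
        spread = largest - smallest
        if spread / smallest > PD_THRESHOLD[criterion]:
            n_pd += 1
        if spread / largest > RESPONSE_THRESHOLD[criterion]:
            n_response += 1
    n = len(measurements)
    return n_pd / n, n_response / n


# Hypothetical longest diameters (mm) from repeated readings of three tumors
diameters = [[40, 45], [30, 38], [50, 51]]
print(misclassification_rates(diameters, "RECIST"))
```

Dividing by the smaller value for progression and by the larger value for response mirrors the asymmetric way the criteria are applied to a real baseline and follow-up pair.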
Table 2: Measurement variability and the corresponding misclassification


Agreement and reliability

For the repeatability (intra-rater) study, the 95% limits of agreement ranged from −12.1 mm (−26.9%) to 12.9 mm (28.9%) for unidimensional, from −984.0 mm² (−45.1%) to 960.3 mm² (47.6%) for bidimensional, and from −6666.4 mm³ (−11.2%) to 7221.8 mm³ (11.6%) for volumetric measurement [Table 1]. Significant differences were found between RECIST and WHO (P < 0.001), RECIST and volumetric (P < 0.001), and WHO and volumetric measurements (P < 0.001). For the reproducibility (inter-rater) study, the 95% limits of agreement ranged from −13.7 mm (−31.2%) to 13.9 mm (31.2%) for unidimensional, from −1095.0 mm² (−52.4%) to 1153.4 mm² (53.6%) for bidimensional, and from −19593.2 mm³ (−23.9%) to 22622.5 mm³ (25.8%) for volumetric measurement, again with significant differences between RECIST and WHO (P < 0.001), RECIST and volumetric (P < 0.001), and WHO and volumetric measurements (P < 0.001). In the long run, we therefore expect the difference between two volumetric measurements of the same tumor to lie between −11.2% and 11.6% for repeatability and between −23.9% and 25.8% for reproducibility on 95% of occasions [Figure 1]. Increases or decreases smaller than these limits may be caused by inherent measurement variability alone, cannot be distinguished from it, and are unproven as markers of efficacy in clinical trials.
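Limits of agreement of this kind are usually obtained from the mean and SD of paired differences. Because the volumes in Figure 1 were log-transformed, a common approach is to compute the limits on the log scale and back-transform them to percentages. The paper does not spell out its exact computation, so the sketch below, with a hypothetical function name and made-up volumes, illustrates the standard method rather than reproducing the authors' numbers.

```python
import math
import statistics


def log_limits_of_agreement(first, second):
    """95% Bland-Altman limits for paired positive measurements, as percent change.

    Differences are taken on the natural-log scale (log(second) - log(first)),
    and the limits bias +/- 1.96 SD are back-transformed to percentages.
    """
    diffs = [math.log(b) - math.log(a) for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD; needs at least two pairs
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    return 100 * (math.exp(lower) - 1), 100 * (math.exp(upper) - 1)


# Hypothetical volumes (mm^3) from two readings of the same five tumors
reading_1 = [1200.0, 5400.0, 23000.0, 86000.0, 310000.0]
reading_2 = [1150.0, 5900.0, 21500.0, 90500.0, 298000.0]
print(log_limits_of_agreement(reading_1, reading_2))
```

Back-transformed limits are not symmetric around zero, which is one reason percentage limits such as −11.2% and 11.6% need not have equal magnitude.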
Figure 1: Bland–Altman plots of logarithmically transformed volume showing the agreement of intra-rater (repeatability) and inter-rater (reproducibility) measurements. Agreement is significantly higher for intra-rater than for inter-rater measurements


The intra-rater and inter-rater reliabilities were 0.998 and 0.971 for unidimensional measurements, 0.998 and 0.982 for bidimensional measurements, and 1.000 and 0.997 for volumetric measurements. In addition, the volumetric technique had the smallest VC [Table 1].
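The variation coefficient is simple to compute per tumor. The following is a minimal sketch, assuming (the paper does not say) that per-tumor SD/mean ratios are averaged across tumors and reported in percent; the function name and values are hypothetical.

```python
import statistics


def mean_variation_coefficient(measurements):
    """Average per-tumor VC (SD / mean), in percent.

    measurements: list of sequences, each holding at least two repeated
    measurements of one tumor (e.g. the readings of all raters).
    """
    vcs = [statistics.stdev(reps) / statistics.mean(reps) for reps in measurements]
    return 100 * statistics.mean(vcs)


# Hypothetical readings: the first tumor varies between readings, the second does not
print(mean_variation_coefficient([[9.0, 11.0], [100.0, 100.0]]))
```

Because the VC divides by the mean, it allows variability to be compared across techniques measured in different units (mm, mm², mm³).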

Sources of variation

In the analysis of variance, the dependent variable was the measured tumor size, and the independent variables were tumor, rater, and replication. Both the tumor effect (measurement variability resulting from tumor characteristics alone) and the rater effect (measurement variability resulting from rater characteristics alone) contributed significantly to measurement variability, and the vast majority of variability was attributable to the tumor effect [Table 3].
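For a balanced tumor × rater × replication layout, the main-effect sums of squares partition the total variation, which makes the dominance of the tumor effect easy to see. The sketch below, with hypothetical names and simulated data rather than the study measurements, computes each source's share of the total sum of squares. The F tests and P values reported in Table 3 would additionally require the F distribution (for example from scipy.stats) and are omitted.

```python
import numpy as np


def anova_ss_shares(y):
    """Percentage of the total sum of squares explained by each main effect.

    y: balanced array of shape (n_tumors, n_raters, n_replications).
    The residual term absorbs all interactions and pure error.
    """
    y = np.asarray(y, dtype=float)
    n, m, r = y.shape
    grand = y.mean()
    ss = {
        "tumor": m * r * np.sum((y.mean(axis=(1, 2)) - grand) ** 2),
        "rater": n * r * np.sum((y.mean(axis=(0, 2)) - grand) ** 2),
        "replication": n * m * np.sum((y.mean(axis=(0, 1)) - grand) ** 2),
    }
    ss_total = np.sum((y - grand) ** 2)
    ss["residual"] = ss_total - sum(ss.values())
    return {source: 100 * value / ss_total for source, value in ss.items()}


# Simulated data: 69 tumors, 10 raters, 2 replications (hypothetical sizes in mm)
rng = np.random.default_rng(42)
size = rng.uniform(11, 121, size=69)
rater_offset = rng.normal(0, 1.0, size=10)
y = size[:, None, None] + rater_offset[None, :, None] + rng.normal(0, 1.5, size=(69, 10, 2))
for source, share in anova_ss_shares(y).items():
    print(f"{source:12s} {share:6.2f}%")
```

With tumors spanning roughly 1 to 12 cm, differences between tumors dwarf rater offsets and noise, so a large tumor share is expected even when the rater effect is statistically significant.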
Table 3: Analysis of variance of tumor measurements


Influence of tumor characteristics

Compared with the unidimensional and bidimensional techniques, the volumetric technique had the lowest misclassification rate and VC and the highest agreement and reliability. The volumetric technique was therefore considered optimal for assessing therapeutic response in lung cancer, and the influence of tumor characteristics was analyzed for volumetric variability [Table 4].
Table 4: Influence of tumor characteristics on volumetric variability


For the repeatability (intra-rater) study, tumor size (P < 0.001) and interface (P = 0.001) influenced volumetric measurement variability: variability was lower in isolated tumors with an interface of <45°, and the fitted function $Y = 0.001X^2 - 0.114X + 7.524$, where $Y$ is variability (%) and $X$ is tumor size (mm), gave the lowest variability at a tumor size of 57 mm [Figure 2]. For the reproducibility (inter-rater) study, variability was associated only with tumor size (P < 0.001), and the fitted function $Y = 0.004X^2 - 0.317X + 16.079$ gave the lowest variability at 40 mm [Figure 2].
Figure 2: Fitted curves of variability (%) by tumor size (mm). The lowest variability appeared at a tumor size of 5.7 cm for the repeatability study and at 4.0 cm for the reproducibility study
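The size at which each fitted curve reaches its minimum follows from the vertex of a parabola, $X = -b/(2a)$. Using the rounded coefficients reported above, a short check recovers the stated minima (57 mm, and 39.6 mm, which rounds to 40 mm); the helper names are hypothetical.

```python
def quadratic(x, a, b, c):
    """Fitted variability (%) at tumor size x (mm): a*x**2 + b*x + c."""
    return a * x**2 + b * x + c


def vertex(a, b):
    """Tumor size (mm) at which a*x**2 + b*x + c is minimal; requires a > 0."""
    if a <= 0:
        raise ValueError("curve has no minimum unless a > 0")
    return -b / (2 * a)


fits = {
    "repeatability": (0.001, -0.114, 7.524),
    "reproducibility": (0.004, -0.317, 16.079),
}
for name, (a, b, c) in fits.items():
    x_min = vertex(a, b)
    print(f"{name}: minimum at {x_min:.1f} mm, fitted variability {quadratic(x_min, a, b, c):.2f}%")
```

Because the published coefficients are rounded to three decimals, the computed vertices are approximate.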


Discussion

Compared with the unidimensional and bidimensional techniques, the volumetric technique had the lowest variability in measuring lung cancer on CT scans in our study, and the vast majority of variability was produced by the tumor effect. Furthermore, variability was related to tumor size through a nonlinear function. To clarify the significance of these results, we address the following key points:

  1. Why should reliability be introduced into the analysis of measurement variability in lung cancer?
  2. Is conventional inter-observer variability really a result of observer (rater) heterogeneity or subjective effect?
  3. Is the relationship between measurement variability and tumor size linear or nonlinear?

Repeatability (intra-rater) refers to the variation in repeat measurements made on the same subject under identical conditions. This means that measurements are made by the same instrument or method, the same observer (or rater), and that the measurements are made over a short period, over which the underlying value can be considered to be constant. Reproducibility (inter-rater) refers to the variation in measurements made on a subject under changing conditions. The changing conditions may be due to different measurement methods or instruments being used, measurements being made by different observers or raters, or measurements being made over a period, within which the “error-free” level of the variable could undergo non-negligible change.[14]

Reliability and agreement

Repeatability and reproducibility are characterized by the concepts of agreement and reliability. Agreement quantifies how close two measurements made on the same subject are and is measured on the same scale as the measurements themselves. Reliability relates the magnitude of the measurement error in observed measurements to the inherent variability in the “error-free,” “true,” or underlying level of the quantity between patients.

In previous studies,[4],[5],[6],[7],[8],[9],[10],[12],[13] agreement has been emphasized, and most of these studies used Bland–Altman plots to demonstrate it. Reliability, by contrast, is rarely reported. Reliability is critical for the evaluation of therapeutic response because it reflects the validity of a measurement.[14],[15] Agreement tells how close the first and second observed measurements are, whereas reliability tells how close the observed measurements are to the true size. For a tumor with a true size of 5.0 cm, if the first observed measurement were 3.0 cm and the second 2.9 cm, agreement would intuitively be considered good because the two measurements differ by only 0.1 cm, but reliability would be poor because both measurements (3.0 cm and 2.9 cm) are far from the true size of 5.0 cm.

We compared the agreement and reliability of the unidimensional, bidimensional, and volumetric techniques, and the volumetric technique had the best agreement and reliability, indicating that volumetric measurements were optimal both in the consistency of repeated measurements (agreement) and in the closeness of observed to true measurements (reliability). In addition, given the increasing interest in quantitative tumor measurement, it is important to understand which measurement changes are meaningful rather than a result of measurement variability.[12] Our results showed that the 95% limits of agreement of the volumetric technique were −11.2% to 11.6% for the repeatability study and −23.9% to 25.8% for the reproducibility study. An observed change can therefore be regarded as a true change only when it falls outside these limits, because about 95% of differences arising from measurement variability alone are expected to lie within them.
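As an illustration of how these limits might be used as a rule of thumb, the sketch below flags whether an observed volume change exceeds the reported measurement variability. It is hypothetical: it assumes percent change is computed relative to the baseline volume, which may differ from how the limits themselves were derived, and it is not a validated response criterion, since no volumetric response threshold was established in this study.

```python
# 95% limits of agreement (%) reported for volumetric measurement
REPEATABILITY_LIMITS = (-11.2, 11.6)    # same rater
REPRODUCIBILITY_LIMITS = (-23.9, 25.8)  # different raters


def beyond_measurement_variability(baseline_mm3, followup_mm3, limits):
    """True if the percent volume change lies outside the given limits of agreement."""
    change_pct = 100.0 * (followup_mm3 - baseline_mm3) / baseline_mm3
    lower, upper = limits
    return change_pct < lower or change_pct > upper


# A 15% increase exceeds same-rater variability but not between-rater variability
print(beyond_measurement_variability(10000.0, 11500.0, REPEATABILITY_LIMITS))    # True
print(beyond_measurement_variability(10000.0, 11500.0, REPRODUCIBILITY_LIMITS))  # False
```

The wider reproducibility limits mean that a change is easier to confirm when the same rater measures both time points.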

Is conventional inter-observer variability really a result of subjective effect?

Our results indicated that both the object effect (measurement variability resulting from tumor characteristics alone) and the subjective effect (measurement variability resulting from rater characteristics alone) influenced inter-rater variability. However, the vast majority of variability resulted from the object effect rather than the subjective effect. This means that inter-observer variability is not primarily a result of the subjective effect. If inter-observer variability were intrinsic to the observers, it would change closely as the observer changed; otherwise, its source would be extrinsic. For example, consider one regular tumor and one irregular tumor: different observers obtain different measurements for both, but for every observer the differences between measurements would be expected to be smaller for the regular tumor than for the irregular tumor.

Before advanced volumetric techniques became available, Erasmus et al.[7] concluded that measurements of lung tumor size on CT scans were often inconsistent and that consistency could be improved if the same reader performed serial measurements for each patient. With the development of computer-aided and automated techniques, measurement variability resulting from observers should be minimized or eliminated. We therefore think that future efforts should focus on consistent delineation of tumor boundaries, which would make measurement more convenient and accurate in clinical practice.

Mathematical functions between variability and tumor size

Although the effects of pulmonary nodule characteristics on measurement, including nodule morphology, location, size, inspiration level, and segmentation, have been reported in a number of studies,[8],[9],[10],[11],[13] data on object characteristics in pulmonary masses are limited. Our study showed that tumor segmentation, that is, how the boundary of a tumor is delineated, was related to volumetric measurement variability, which is in accordance with a previous study reporting that segmentation is the most important factor contributing to measurement variability.[8] With the development of computer-aided and automated techniques, segmentation is likely to become one of the most important aspects of tumor measurement.

Notably, the relationship between tumor size and volumetric variability in our study was significant and nonlinear. Oxnard et al.[12] reported that larger tumors tend to show larger measurement changes in millimeters, but that the opposite relationship holds for relative change (percent increase or decrease). Our results, however, showed that a nonlinear relationship fitted better than a linear one. The nonlinear relationship (a quadratic, U-shaped curve) indicates that medium-sized tumors tended to have the smallest variability. This interesting finding suggests that medium-sized lesions are measured more reliably, whereas very small and very large lesions are more difficult to measure accurately.
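Because a quadratic model contains the linear model as a special case, its ordinary R² can never be lower, so a fair comparison should penalize the extra term, for example with adjusted R². The sketch below illustrates such a comparison on simulated data generated from the repeatability curve reported above plus random noise; it is not the study data, and the function name is hypothetical. Note that `numpy.polyfit` returns coefficients from the highest degree down, which is the order `numpy.polyval` expects.

```python
import numpy as np


def adjusted_r_squared(x, y, degree):
    """Adjusted R^2 of a least-squares polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    fitted = np.polyval(coeffs, x)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    n = len(x)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - degree - 1)


# Simulated sizes spanning 11-121 mm; variability from the repeatability curve + noise
rng = np.random.default_rng(1)
size_mm = np.linspace(11, 121, 69)
variability = 0.001 * size_mm**2 - 0.114 * size_mm + 7.524 + rng.normal(0, 0.3, size_mm.size)
print("linear    adj. R^2:", round(adjusted_r_squared(size_mm, variability, 1), 3))
print("quadratic adj. R^2:", round(adjusted_r_squared(size_mm, variability, 2), 3))
```

An F test for the added quadratic term would be a more formal alternative to comparing adjusted R² values.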

Although volumetric quantification produced promising results, accurate determination of response may require functional and molecular techniques.[16],[17] In addition, we did not determine a threshold for evaluating therapeutic response with the volumetric technique.

Conclusion

The volumetric technique has the lowest variability in measuring lung cancer on CT, and the vast majority of variability results from the object effect (tumor characteristics). Moreover, according to the fitted U-shaped curves between variability and tumor size, medium-sized lesions are measured more reliably.


We would like to thank our research assistants Huiming Wu, Zhenzhen He, Ting Teng, Lei Jiang, Xiang Gao, Yujiao Xu, Jie Deng, Xiaohui Wang, and Yandui Sai for their excellent effort in the collection of the data for this study.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.

References

1. Miller AB, Hoogstraten B, Staquet M, Winkler A. Reporting results of cancer treatment. Cancer 1981;47:207-14.
2. Therasse P, Arbuck SG, Eisenhauer EA, Wanders J, Kaplan RS, Rubinstein L, et al. New guidelines to evaluate the response to treatment in solid tumors. European Organization for Research and Treatment of Cancer, National Cancer Institute of the United States, National Cancer Institute of Canada. J Natl Cancer Inst 2000;92:205-16.
3. Eisenhauer EA, Therasse P, Bogaerts J, Schwartz LH, Sargent D, Ford R, et al. New response evaluation criteria in solid tumours: Revised RECIST guideline (version 1.1). Eur J Cancer 2009;45:228-47.
4. Revel MP, Lefort C, Bissery A, Bienvenu M, Aycard L, Chatellier G, et al. Pulmonary nodules: Preliminary experience with three-dimensional evaluation. Radiology 2004;231:459-66.
5. Prasad SR, Jhaveri KS, Saini S, Hahn PF, Halpern EF, Sumner JE. CT tumor measurement for therapeutic response assessment: Comparison of unidimensional, bidimensional, and volumetric techniques – initial observations. Radiology 2002;225:416-9.
6. Dinkel J, Khalilzadeh O, Hintze C, Fabel M, Puderbach M, Eichinger M, et al. Inter-observer reproducibility of semi-automatic tumor diameter measurement and volumetric analysis in patients with lung cancer. Lung Cancer 2013;82:76-82.
7. Erasmus JJ, Gladish GW, Broemeling L, Sabloff BS, Truong MT, Herbst RS, et al. Interobserver and intraobserver variability in measurement of non-small-cell carcinoma lung lesions: Implications for assessment of tumor response. J Clin Oncol 2003;21:2574-82.
8. Gietema HA, Schaefer-Prokop CM, Mali WP, Groenewegen G, Prokop M. Pulmonary nodules: Interscan variability of semiautomated volume measurements with multisection CT – Influence of inspiration level, nodule size, and segmentation performance. Radiology 2007;245:888-94.
9. Gietema HA, Wang Y, Xu D, van Klaveren RJ, de Koning H, Scholten E, et al. Pulmonary nodules detected at lung cancer screening: Interobserver variability of semiautomated volume measurements. Radiology 2006;241:251-7.
10. Goodman LR, Gulsun M, Washington L, Nagy PG, Piacsek KL. Inherent variability of CT lung nodule measurements in vivo using semiautomated volumetric measurements. AJR Am J Roentgenol 2006;186:989-94.
11. Iwano S, Okada T, Koike W, Matsuo K, Toya R, Yamazaki M, et al. Semi-automatic volumetric measurement of lung cancer using multi-detector CT: Effects of nodule characteristics. Acad Radiol 2009;16:1179-86.
12. Oxnard GR, Zhao B, Sima CS, Ginsberg MS, James LP, Lefkowitz RA, et al. Variability of lung tumor measurements on repeat computed tomography scans taken within 15 minutes. J Clin Oncol 2011;29:3114-9.
13. Wang Y, van Klaveren RJ, van der Zaag-Loonen HJ, de Bock GH, Gietema HA, Xu DM, et al. Effect of nodule characteristics on variability of semiautomated volume measurements in pulmonary nodules detected in a lung cancer screening program. Radiology 2008;248:625-31.
14. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: Analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.
15. Farrell T, Cairns M, Leslie J. Reliability and validity of two methods of three-dimensional cervical volume measurement. Ultrasound Obstet Gynecol 2003;22:49-52.
16. Zhao B, Oxnard GR, Moskowitz CS, Kris MG, Pao W, Guo P, et al. A pilot study of volume measurement as a method of tumor response evaluation to aid biomarker development. Clin Cancer Res 2010;16:4647-53.
17. Nishino M, Dahlberg SE, Cardarella S, Jackman DM, Rabin MS, Ramaiya NH, et al. Volumetric tumor growth in advanced non-small cell lung cancer patients with EGFR mutations during EGFR-tyrosine kinase inhibitor therapy: Developing criteria to continue therapy beyond RECIST progression. Cancer 2013;119:3761-8.




