How do you evaluate clinical image quality with statistical methods?

Description
How do you evaluate clinical image quality with statistical methods? SK course, Medical Radiation Physics, 8 Oct 2013. Context: examination, choice of treatment, treatment effect, health effect. Örjan Smedby, Radiology.

Transcript
How do you evaluate clinical image quality with statistical methods?
SK course, Medical Radiation Physics, 8 Oct 2013
Örjan Smedby, Radiology, IMH/CMIV, Linköpings universitet

Context
- Examination
- Choice of treatment
- Treatment effect
- Health effect

Efficacy of Diagnostic Methods (Fryback DG, Thornbury JR. Med Decis Making 1991)
- Level 1: Technical efficacy. Technical quality, resolution, noise...
- Level 2: Diagnostic accuracy efficacy. How often is the diagnosis correct?
- Level 3: Diagnostic thinking efficacy. How is the referring physician's diagnostic thinking affected?
- Level 4: Therapeutic efficacy. How is the choice of treatment affected?
- Level 5: Patient outcome efficacy. How is the patient's health affected?
- Level 6: Societal efficacy. Benefits and costs for society.

Image quality vs. diagnostic accuracy
- Entire diagnostic process: reliable ground truth, ROC study
- Physical parameters: physical measuring tools, classical statistical tools

Receiver operating characteristics (ROC)
- Generalization of sensitivity and specificity
- How are sensitivity and specificity affected as the threshold is changed?

A diagnostic test

            Pos test   Neg test   Total
Diseased        25          5        30
Healthy         15        105       120
Total           40        110       150

- What is the chance that a diseased person is classified correctly? Sensitivity: 25/30 = 83%
- What is the chance that a healthy person is classified correctly? Specificity: 105/120 = 88%
- What is the probability that a patient with a positive test really is diseased? Positive predictive value: 25/40 = 63%
- What is the probability that a patient with a negative test really is healthy? Negative predictive value: 105/110 = 95%

Threshold level
Higher limit for pathology:
- sensitivity decreases
- specificity increases
Lower limit for pathology:
- sensitivity increases
- specificity decreases

ROC curve
- Axes: sensitivity against 1 - specificity
- Area under ROC curve (AUROC): 1 = perfect, 0.5 = worthless
- Requires gold standard
- Requires large material
- Much work, large costs

Image quality vs. diagnostic accuracy
- Entire diagnostic process: reliable ground truth, ROC study
- Visual image quality concept: visual grading experiment?
- Physical parameters: physical measuring tools, classical statistical tools

Study types
- Single images: rate image A on a scale from 1 to 5
- Image pairs: rate the difference between image A and B on a scale from -2 to +2
- Typical: visibility of an anatomical structure

Criteria & rating scale
Example criterion: Visually sharp reproduction of the thoracic aorta
1. Criterion is fulfilled
2. Criterion is probably fulfilled
3. Indecisive
4. Criterion is probably not fulfilled
5. Criterion is not fulfilled

Situation
[Diagram labels: Patient (P1, P2, P3, P4, ...); Im1, Im2; Postprocessing (PP1, PP2, PP3); Observer; score]

Types of data
- Interval: numerical, continuous (example: a measurement)
- Ordinal: ordered categories (example: a rating score)
- Nominal: individual categories, no order (example: persons A, B, C, D)

Visual grading characteristics (VGC) (Båth & Månsson, BJR 2007)
For each quality level: what proportion fulfils the requirement with method A and with method B, respectively?
[Figure axes: Method A, Method B]
Figure 2. The visual grading characteristic (VGC) curve from the data presented in Tables 1 and 2. The boxes represent the operating points corresponding to the observer's interpretation of the scale steps of the rating scale.
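The four measures from the diagnostic-test slides, and the threshold effect described after them, can be checked with a few lines of Python. This is a minimal sketch: the 2x2 counts are the ones on the slide, while the function names and the two small score lists used for the threshold demonstration are hypothetical and only illustrate the direction of the effect.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Return the four classical measures of a binary diagnostic test."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased classified correctly
        "specificity": tn / (tn + fp),  # healthy classified correctly
        "ppv": tp / (tp + fp),          # positive test that is truly diseased
        "npv": tn / (tn + fn),          # negative test that is truly healthy
    }


def sens_spec_at_threshold(diseased_scores, healthy_scores, threshold):
    """Call a case positive when its score is >= threshold."""
    sens = sum(s >= threshold for s in diseased_scores) / len(diseased_scores)
    spec = sum(s < threshold for s in healthy_scores) / len(healthy_scores)
    return sens, spec


# Counts from the slide: 25 true positives, 5 false negatives,
# 15 false positives, 105 true negatives.
metrics = diagnostic_metrics(tp=25, fn=5, fp=15, tn=105)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")

# Hypothetical test scores (not from the lecture) to show the threshold effect.
diseased = [2, 3, 4, 4, 5]
healthy = [1, 1, 2, 2, 3]
low = sens_spec_at_threshold(diseased, healthy, threshold=3)
high = sens_spec_at_threshold(diseased, healthy, threshold=4)
print("threshold 3:", low)
print("threshold 4:", high)
```

Raising the threshold from 3 to 4 in the hypothetical data lowers sensitivity and raises specificity, which is the trade-off an ROC curve traces out over all thresholds.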
Statistical model
[Diagram labels: Patient; system settings (Im1, Im2, Im3); Postprocessing; Observer]

Strengths of VGC
Discussion (slide excerpt from M Båth and L G Månsson)

IC is a visual grading method for which valid statistical methods have been used most often previously. The dissatisfaction from the fact that the observer can only use a two-step rating scale in IC (criterion fulfilled/criterion not fulfilled) often leads to the use of VGA, enabling the use of multiple scale steps, although invalid statistical methods are often used. The use of VGC analysis can hopefully satisfy the needs for both a valid statistical method and freedom for the observer. Furthermore, VGC analysis can be used directly on the image criteria defined by the European Commission giving statements of the needed levels of reproduction for certain anatomical landmarks without the need for extracting the relevant structures from the criteria and grading the visibility of these structures. This has the potential of leading to an increased validity in the use of the image criteria in multiple-choice grading studies.

However, VGC analysis is not limited to the use of European criteria. Modifications of the original criteria have been proposed for chest radiography [16], lumbar spine radiography [33] and mammography [23, 34] and these modified criteria as well as other relevant criteria may meritoriously be used. Furthermore, the grading task is not limited to normal anatomy. If applicable, grading of image criteria based on pathology may also be used.

VGC analysis consists of elements from both IC and relative and absolute VGA as well as from ROC analysis. The concept of VGC analysis can be interpreted as "IC meets ROC" with the VGC curve presenting the ICS_B (the proportion of images rated as fulfilling a criterion for modality B) as a function of the ICS_A (the proportion of images rated as fulfilling a criterion for modality A) for a grading task, just like the ROC curve describes the TPF (the proportion of images rated as containing a signal for the positive images) as a function of the FPF (the proportion of images rated as containing a signal for the negative images) for a detection task. (One important difference between the two curves being that the ROC curve describes an observer's ability to separate the signal and noise distributions belonging to one modality from each other, whereas the VGC curve describes the observer's opinion about the separation of the image quality distributions from two modalities.) For the observer, the resulting study is similar to absolute VGA with the use of a multistep scale for grading the image quality. The resulting measure of image quality, AUC_VGC, is finally, like in relative VGA, a relative measure of image quality, describing the image quality for modality B in comparison with modality A.

Using the statistical methods of ROC analysis, VGC analysis presents a solution to the need of nonparametric rank-invariant statistical methods for analysing the data from visual grading studies. The use of the ROC technique for comparing data from studies other than detection tasks has been proposed previously. Sonn and Svensson [25] studied changes in activities of daily living (ADL) measured by a 10-level scale, the Staircase of ADL, in rehabilitation medicine and used the ROC curve to analyse the systematic change in ADL levels between two age groups. The use of the ROC technique, enabling a statistically valid analysis of data, can probably be applied to many other rating tasks.
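The construction described in the excerpt (ICS_B as a function of ICS_A, one operating point per scale step) can be sketched in Python. Several things here are assumptions rather than the authors' procedure: the rating counts are hypothetical, a lower rating is taken to mean "more clearly fulfilled" as on the five-step scale in this lecture, and the area is obtained with the simple trapezoidal rule rather than with dedicated ROC curve-fitting software.

```python
from itertools import accumulate


def vgc_points(counts_a, counts_b):
    """Operating points (ICS_A, ICS_B), one per scale step, plus the origin.

    counts_x[k] is the number of images from modality x given rating k + 1.
    For each step n, the point is the proportion of images rated <= n.
    """
    total_a, total_b = sum(counts_a), sum(counts_b)
    cum_a = [c / total_a for c in accumulate(counts_a)]
    cum_b = [c / total_b for c in accumulate(counts_b)]
    return [(0.0, 0.0)] + list(zip(cum_a, cum_b))


def trapezoid_area(points):
    """Area under the piecewise-linear curve through the points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical rating counts for ratings 1..5 (20 images per modality).
counts_a = [2, 3, 5, 6, 4]
counts_b = [6, 6, 4, 3, 1]

points = vgc_points(counts_a, counts_b)
auc_vgc = trapezoid_area(points)
print(points)
print(f"AUC_VGC = {auc_vgc:.3f}")
```

An area of 0.5 means the two rating distributions are indistinguishable; with the hypothetical counts above the area exceeds 0.5, meaning modality B more often fulfils the criterion than modality A.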
Weaknesses of VGC
(Slide excerpt: The British Journal of Radiology, March 2007, p. 174)

The value of the AUC_VGC can be criticized for the same reason as the A_z can be questioned in ROC analysis. The index A_z is useful in most cases because it reflects accuracy in general through a range of possible operating points [35]. However, doubts have been expressed by some investigators concerning the fact that a large part of the area comes from the rightmost part of the curve and thereby includes false positive fractions of limited or no clinical relevance. Also, crossing curves can cause confusion; one curve may have higher TPFs than another in the region of relevant FPFs, but if the curves cross for higher FPF values, the superiority for the first curve may be lost or even reversed if the area under each curve is used as an index of accuracy [27, 36]. In the same way, a large part of the area of the VGC curve comes from a part of the curve which corresponds to a very low threshold of the observer for judging a criterion as being fulfilled, possibly corresponding to an unacceptable image quality.

The VGC curve
It is important to realise that a VGC curve is completely determined by the two underlying distributions of the modalities being studied (in the same way as ...

Statistical model: logistic regression
[Diagram labels: Patient; logistic regression; score]
Logit function: $\operatorname{logit}(p) = \log\frac{p}{1-p}$
Regression equation: $\operatorname{logit}(p) = ax + b$, which gives $p = \frac{1}{1 + \exp(-(ax + b))}$

Statistical model: ordinal regression
The same logit function and regression equation apply.

VGR model (Smedby & Fredrikson, British Journal of Radiology 2010)
$\operatorname{logit}(P(y\ \_\ n)) = a_1\,\mathrm{Im1} + a_2\,\mathrm{Im2} + b_1\,\mathrm{PP1} + b_2\,\mathrm{PP2} + b_3\,\mathrm{PP3} + D_P + E_O - C_n$
(The comparison symbol between $y$ and $n$ is missing in the source.)
[Diagram labels: system settings (Im1, Im2, Im3); Postprocessing (PP1, PP2); Patient; Observer; regression; score]
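The logit function and its inverse, and the way a set of thresholds turns one linear predictor into probabilities for each step of a five-step rating scale, can be sketched as follows. The parametrization is an assumption, not taken from the slides: cumulative probabilities are written as P(y <= n) = inv_logit(C_n - eta), a common convention for ordinal logistic regression. The cutpoint values and the names `eta`, `cutpoints` and `category_probabilities` are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Inverse of logit: maps any real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(eta, cutpoints):
    """Probabilities of ratings 1..K for linear predictor eta.

    Assumes P(y <= n) = inv_logit(C_n - eta) with increasing cutpoints C_n,
    so a lower eta shifts probability towards lower ratings.
    """
    cumulative = [inv_logit(c - eta) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


cutpoints = [-2.0, -0.5, 0.5, 2.0]  # hypothetical thresholds C_1..C_4

probs_ref = category_probabilities(0.0, cutpoints)
probs_shift = category_probabilities(-1.0, cutpoints)
print([round(p, 3) for p in probs_ref])
print([round(p, 3) for p in probs_shift])
```

With rating 1 meaning "criterion is fulfilled", a negative contribution to the linear predictor under this assumed convention makes a rating of 1 more likely, which is how a fixed effect such as a post-processing method would shift the whole rating distribution.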
Statistical model
- Patient: random effect
- Im1, Im2, Im3 (system settings): fixed effect
- PP1, PP2 (postprocessing): fixed effect
- Observer: random effect
- Outcome: score, analysed by regression

Empirical data (Jakob De Geer)
- Coronary CTA
- 24 patients (P1-P24)
- Standard dose (310 mAs ref.) and reduced dose (62 mAs ref.)
- Reduced-dose images post-processed with 2D adaptive filter (Sharpview)
- Filtered and unfiltered reduced-dose images viewed by 9 radiologists (R1-R9)

Criteria
- Criterion 1: Visually sharp reproduction of the thoracic aorta.
- Criterion 2: Visually sharp reproduction of the wall of the thoracic aorta.
- Criterion 3: Visually sharp reproduction of the heart.
- Criterion 4: Visually sharp reproduction of the left main coronary artery (LMA).
- Criterion 5: The image noise in relevant regions is sufficiently low for diagnosis.

Rating scale
1. Criterion is fulfilled
2. Criterion is probably fulfilled
3. Indecisive
4. Criterion is probably not fulfilled
5. Criterion is not fulfilled

Results: filter effect
[Diagram labels: Patient; Postprocessing (unfiltered, filtered); Observer; ordinal regression (GLLAMM); score]
[Table: regression coefficient and p value per criterion (1: thoracic aorta, 2: aortic wall, 3: heart, 4: LMA, 5: noise sufficiently low for diagnosis); the only value surviving in the transcript is 0.96, in the last row]

Including mAs effect
Both standard-dose and reduced-dose images were viewed, reduced-dose images with and without filtering.
[Diagram labels: Patient; log mAs setting; Postprocessing (unfiltered, filtered); Observer; regression; score]

Statistical model with mAs etc.
[Diagram labels: Dose reduction; Weight; Criterion; Postprocessing (unfiltered, filtered); Patient; log mAs setting; Observer; Education; regression (GLLAMM); score]
[Figure: probability of a score of 1 or ... plotted against mAs ref. setting, for unfiltered and filtered images]

Results with mAs
Regression coefficients (95% confidence limits) and estimated mAs reduction

Criterion                                           log (mAs)              adaptive filter        Estimated mAs reduction
1: Visually sharp reproduction of the thoracic aorta  -2.52 (-2.88; -2.16)   -0.45 (-0.78; -0.11)   16% (6%; 27%)
2: Visually sharp reproduction of the aortic wall     -2.53 (-2.82; -2.24)   -0.75 (-1.07; -0.44)   26% (17%; 34%)
3: Visually sharp reproduction of the heart           -2.54 (-2.91; -2.18)   -0.74 (-1.12; -0.36)   25% (15%; 36%)
4: Visually sharp reproduction of the LMA             -2.52 (-2.81; -2.24)   -0.61 (-0.91; -0.30)   21% (13%; 30%)
5: Noise sufficiently low for diagnosis               -2.74 (-3.04; -2.44)   -0.77 (-1.07; -0.46)   24% (17%; 32%)

Conclusion
- For analyzing diagnostic accuracy, ROC studies are superior, but costly and cumbersome.
- Visual grading experiments describe visual image quality.
- Simple comparisons can be made with VGC.
- Ordinal regression (VGR) makes it possible to obtain direct numeric estimates of the potential for dose reduction.
- Particularly useful when testing and optimising acquisition/post-processing protocols.
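The estimated mAs reductions in the results table are consistent with a simple calculation from the two regression coefficients, sketched below. The derivation is an assumption and is not stated on the slides: if the linear predictor is a·ln(mAs) + b·filter, then a filtered image at mAs m1 scores like an unfiltered image at m0 when a·ln(m1) + b = a·ln(m0), so m1/m0 = exp(-b/a) and the reduction is 1 - exp(-b/a). Only the ratio b/a enters, so the shared sign of the two coefficients does not affect the result. The function name is hypothetical, and the confidence limits of the reduction are not reproduced here.

```python
import math


def mas_reduction(coef_log_mas, coef_filter):
    """Fraction of mAs that the filter effect is worth, assuming a linear
    predictor of the form coef_log_mas * ln(mAs) + coef_filter * filter."""
    return 1.0 - math.exp(-coef_filter / coef_log_mas)


# Point estimates per criterion: (log(mAs) coefficient, filter coefficient),
# taken as negative values.
coefficients = {
    1: (-2.52, -0.45),
    2: (-2.53, -0.75),
    3: (-2.54, -0.74),
    4: (-2.52, -0.61),
    5: (-2.74, -0.77),
}

reductions = {c: mas_reduction(a, b) for c, (a, b) in coefficients.items()}
for criterion, value in reductions.items():
    print(f"Criterion {criterion}: estimated mAs reduction {value:.1%}")
```

The agreement with the tabulated 16% to 26% only holds when the logarithm is the natural logarithm, which suggests, but does not prove, that this is how the log(mAs) covariate was defined.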