Welcome to the documentation for IQDM-Analytics!
IQDM-Analytics
What does it do?
IQDM Analytics is a desktop application that mines IMRT QA reports with IQDM-PDF and performs statistical analysis.
Executables
Single-file executables are available; see the attachments in the latest release.
Other information
This library is part of the IMRT QA Data Mining (IQDM) project for the AAPM IMRT Working Group (WGIMRT).
Free software: MIT license
Documentation: Read the docs
Dependencies
iqdmpdf - Mine IMRT QA PDFs
wxPython Phoenix - Build a native GUI on Windows, Mac, or Unix systems
Bokeh - Interactive Web Plotting for Python
NumPy - The fundamental package for scientific computing with Python
selenium - A browser automation framework and ecosystem
PhantomJS - A headless web browser scriptable with JavaScript
pypubsub - A Python publish-subscribe library
Install and Run
If you prefer to run from source:
$ git clone https://github.com/IQDM/IQDM-Analytics.git
$ cd IQDM-Analytics
$ python iqdma_app.py
Note that you may need to use pythonw instead of python, depending on your Python version.
TODO
Ability to cancel the PDF-Miner thread
Unit testing (non-GUI)
Set up continuous integration
Consolidate/clean up UserSettings dialog code
User Manual
This application is part of the IMRT QA Data Mining (IQDM) project for the AAPM IMRT Working Group (WGIMRT).
Introduction
IQDM Analytics is a desktop application designed to make IQDM-PDF more user-friendly. IQDM-PDF is a Python library used to mine data from IMRT QA PDF reports for the purpose of generating control charts, as recommended by AAPM TG-218.
Usage
The easiest way to use this application is to download an executable from the attachments in the latest release of IQDM Analytics.
Once you’ve launched the application, click the PDF Miner icon in the toolbar. From there, you can select a directory to scan and another in which to store a CSV file of mined data. Once mining is complete (or if you already have an IQDM-PDF CSV file), click Open in the main window’s toolbar to import the CSV file.
The visuals are created with Bokeh and can be exported to HTML, SVG, or PNG files. Clicking the Save icon in the toolbar will open a window allowing you to apply temporary visual customizations prior to export. Alternatively, you can edit these visuals in Settings to store the changes permanently.
Supported Vendors
IQDM-PDF currently supports the following IMRT QA vendors / reports:
Sun Nuclear: SNC Patient
Scandidos: Delta4
PTW: Verisoft
Methods
PDF Mining
Generally speaking, the text from IMRT QA reports is extracted with pdfminer.six and sorted into boxes with coordinates. IQDM-PDF then searches for keywords to locate the boxes containing data of interest. For more details, see the IQDM-PDF: How It Works page.
Although IQDM-PDF is thoroughly tested, it is still prudent to manually inspect the generated CSV file. If you find an error, please submit an issue to IQDM-PDF. If you provide an anonymized report that reproduces the error, it can be included in the automated tests.
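The keyword-anchoring idea can be illustrated with a short Python sketch. This is not IQDM-PDF's implementation: `TextBox`, `extract_text_boxes`, `value_right_of` and the 2-point same-row tolerance are hypothetical. Only `extract_pages` and `LTTextContainer` come from pdfminer.six, and that import is deferred so the search logic can be tried without a PDF.

```python
from typing import List, NamedTuple, Optional


class TextBox(NamedTuple):
    """Hypothetical container: page index, bounding box in PDF points, and text."""
    page: int
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def extract_text_boxes(pdf_path: str) -> List[TextBox]:
    """Collect every text container (with its coordinates) using pdfminer.six."""
    # Imported here so value_right_of can be used without pdfminer.six installed
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    boxes = []
    for page_index, page_layout in enumerate(extract_pages(pdf_path)):
        for element in page_layout:
            if isinstance(element, LTTextContainer):
                x0, y0, x1, y1 = element.bbox
                boxes.append(TextBox(page_index, x0, y0, x1, y1, element.get_text().strip()))
    return boxes


def value_right_of(boxes: List[TextBox], keyword: str, tol: float = 2.0) -> Optional[str]:
    """Return the text of the nearest box to the right of the first box containing keyword.

    "Same row" means the vertical centers differ by at most tol points (an assumption).
    """
    anchors = [b for b in boxes if keyword in b.text]
    if not anchors:
        return None
    anchor = anchors[0]
    row_center = (anchor.y0 + anchor.y1) / 2
    candidates = [
        b for b in boxes
        if b.page == anchor.page
        and b.x0 >= anchor.x1
        and abs((b.y0 + b.y1) / 2 - row_center) <= tol
    ]
    if not candidates:
        return None
    # The closest box to the right on the same row is taken as the value
    return min(candidates, key=lambda b: b.x0).text
```

Real reports need more than one rule (values below a label, multi-line labels, vendor-specific layouts), which is why IQDM-PDF keeps separate parsers per vendor.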
Data Parsing
Output from IQDM-PDF will be sorted in the following order:
Patient name and ID (and plan ID/name/SOPInstanceUID, if available)
Analysis parameters (e.g., dose, distance, threshold)
Measurement date and time (or report date)
If multiple reports share the same values for all of these fields, IQDM Analytics can be configured to select either the first or last report (by file creation timestamp), or to use the min, mean, or max value (calculated per charting variable). See “Duplicate Value Policy” in Settings.
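A minimal Python sketch of how such a duplicate value policy could be applied, assuming each report is a dict with a hypothetical `file_created` timestamp. The function name and row layout are illustrative, not the IQDM Analytics API.

```python
from statistics import mean
from typing import Dict, List, Tuple


def apply_duplicate_policy(
    rows: List[dict],
    uid_keys: List[str],
    y_key: str,
    policy: str,
    timestamp_key: str = "file_created",  # hypothetical field name
) -> Dict[Tuple, float]:
    """Collapse reports sharing the same uid into one value per uid."""
    groups: Dict[Tuple, List[dict]] = {}
    for row in rows:
        uid = tuple(row[k] for k in uid_keys)
        groups.setdefault(uid, []).append(row)

    result = {}
    for uid, group in groups.items():
        if policy in ("first", "last"):
            # first/last are chosen by file creation timestamp
            ordered = sorted(group, key=lambda r: r[timestamp_key])
            result[uid] = (ordered[0] if policy == "first" else ordered[-1])[y_key]
        elif policy in ("min", "mean", "max"):
            # min/mean/max are computed per charting variable
            values = [r[y_key] for r in group]
            result[uid] = {"min": min, "mean": mean, "max": max}[policy](values)
        else:
            raise ValueError(f"Unknown duplicate value policy: {policy}")
    return result
```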
Control Charts
A control chart is simply a plot of chronological data with a center line and upper and lower control limits. The center line, \(\bar{x}\), is the mean value of all points. IQDM Analytics calculates the average 2-point moving range, \(\overline{MR}\), from the \(n\) chronologically sorted values \(x_i\):
\[\overline{MR} = \frac{1}{n-1}\sum_{i=2}^{n} \left| x_i - x_{i-1} \right|\]
Control limits (\(CL\)) are calculated with
\[CL = \bar{x} \pm 3\,\frac{\overline{MR}}{1.128}\]
where \(3\) is the number of standard deviations, which can be customized in Settings. Since the chart is based on a 2-point moving range, \(1.128\) is used (i.e., the value of \(d_2\) for a subgroup size of 2). Note that control limits are bounded if the population they describe is also bounded. For example, the UCL of a gamma pass-rate will not exceed 100%.
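The control-limit formulas can be sketched in plain Python. This mirrors the equations in this section rather than the `iqdma.stats.ControlChart` code; the function names are hypothetical, and returning NaN for fewer than two points is an assumption.

```python
import math
from typing import List, Optional, Sequence, Tuple

D2 = 1.128  # d2 bias-correction constant for moving ranges of 2 consecutive points


def control_limits(
    y: Sequence[float],
    std: float = 3.0,
    ucl_limit: Optional[float] = None,
    lcl_limit: Optional[float] = None,
) -> Tuple[float, float, float]:
    """Return (center_line, lcl, ucl) for an individuals chart."""
    if len(y) < 2:
        # A moving range needs at least two points
        return math.nan, math.nan, math.nan
    center_line = sum(y) / len(y)
    moving_ranges = [abs(curr - prev) for prev, curr in zip(y, y[1:])]
    avg_mr = sum(moving_ranges) / len(moving_ranges)
    sigma = avg_mr / D2
    lcl = center_line - std * sigma
    ucl = center_line + std * sigma
    # Bound the limits when the charted quantity itself is bounded (e.g., 0-100 %)
    if ucl_limit is not None:
        ucl = min(ucl, ucl_limit)
    if lcl_limit is not None:
        lcl = max(lcl, lcl_limit)
    return center_line, lcl, ucl


def out_of_control(y: Sequence[float], lcl: float, ucl: float) -> List[int]:
    """Indices of observations outside [lcl, ucl]."""
    return [i for i, value in enumerate(y) if value < lcl or value > ucl]
```

For example, `control_limits([98.0, 99.0, 97.0, 99.0], ucl_limit=100, lcl_limit=0)` gives a center line of 98.25 and a UCL capped at 100, since the unbounded UCL would exceed 100%.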
The control chart in the main view uses the following acronyms:
IC: In Control
OOC: Out Of Control
UCL: Upper Control Limit
LCL: Lower Control Limit
CSV Parsing
If you are opening a CSV file generated by IQDM-PDF, its format will be automatically detected and loaded based on instructions from its matching JSON file found in ~/Apps/iqdm_analytics/csv_templates. If you develop your own data mining script, you can still use IQDM-Analytics if you create a CSV template (JSON formatted). Below is a simple example:
{
"columns": [
"patient",
"plan",
"field id",
"image type",
"date",
"DD(%)",
"DTA(mm)",
"Threshold(%)",
"Gamma Pass Rate(%)"
],
"analysis_columns": {
"uid": [0, 1, 2],
"date": 4,
"criteria": [5, 6, 7],
"y": [
{
"index": 8,
"ucl_limit": 100,
"lcl_limit": 0
}
]
}
}
columns
This is a list of columns to be imported; each value must match the column header in your CSV EXACTLY.
analysis_columns: uid
This is a list of column indices that, when combined, create an ID unique to an “observation” or “case”. This is used to catch duplicate reports. You may specify column headers instead of indices.
analysis_columns: date
The date used for chronological sorting is taken from this column index. You may specify a column header name instead of a column index.
analysis_columns: criteria
These indices are used to “widen” the data (i.e., separate your reports by pass-rate criteria). Generally speaking, these columns are your independent variables.
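One way to picture “widening” is a pivot in which each criteria combination becomes its own charting column. The sketch below assumes rows are lists ordered like the template's columns; `widen` and the "(3/2/10)" label format are hypothetical, not how IQDM Analytics names its columns.

```python
from typing import Dict, List, Sequence, Tuple


def widen(
    rows: List[Sequence[str]],
    columns: Sequence[str],
    uid_idx: Sequence[int],
    criteria_idx: Sequence[int],
    y_idx: int,
) -> Dict[Tuple[str, ...], Dict[str, str]]:
    """Pivot long rows (one per case and criteria set) into one entry per case.

    Each criteria combination becomes its own column, labelled like
    "Gamma Pass Rate(%) (3/2/10)" (a hypothetical naming scheme).
    """
    wide: Dict[Tuple[str, ...], Dict[str, str]] = {}
    for row in rows:
        uid = tuple(row[i] for i in uid_idx)
        criteria = "/".join(str(row[i]) for i in criteria_idx)
        column_name = f"{columns[y_idx]} ({criteria})"
        # A later duplicate simply overwrites an earlier one here; a real importer
        # would apply the duplicate value policy instead
        wide.setdefault(uid, {})[column_name] = row[y_idx]
    return wide
```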
analysis_columns: y
This is a list of dependent variables available for charting. Each item in this list must be a dictionary with the keys index, ucl_limit, and lcl_limit. If the charting variable has no bounded control limits, or you do not know them, set the limit values to null (e.g., "ucl_limit": null). The value for index may also be a column header instead of a column index.
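A sketch of how a template like the one above could be read, resolving every index-or-header reference to a column name. `load_template` and `resolve_column` are hypothetical names, not the `CSVParser` code; accepting a single (non-list) uid value is an assumption.

```python
import json
from typing import List, Union


def resolve_column(ref: Union[int, str], columns: List[str]) -> str:
    """Template entries may be a column index or a column header; return the header."""
    if isinstance(ref, int):
        return columns[ref]
    if ref not in columns:
        raise KeyError(f"{ref!r} is not listed in 'columns'")
    return ref


def load_template(json_text: str) -> dict:
    """Parse CSV-template JSON into resolved column names."""
    template = json.loads(json_text)
    columns = template["columns"]
    analysis = template["analysis_columns"]
    uid = analysis["uid"]
    if not isinstance(uid, list):  # tolerate a single uid column
        uid = [uid]
    return {
        "uid": [resolve_column(c, columns) for c in uid],
        "date": resolve_column(analysis["date"], columns),
        "criteria": [resolve_column(c, columns) for c in analysis["criteria"]],
        # JSON null becomes None, meaning "no bound" for that limit
        "y": {
            resolve_column(item["index"], columns): (item["lcl_limit"], item["ucl_limit"])
            for item in analysis["y"]
        },
    }
```

A template file could be passed in with, e.g., `load_template(Path(path).read_text())`.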
Settings
The Settings window allows you to customize plot visualizations such as colors, widths/sizes, line styles, and transparency (alpha). Additionally, the following options are available:
Control Limit standard deviations
Set the number of standard deviations for UCL/LCL calculations
Duplicate Value Policy
If multiple reports are found for a given patient/date/ID, use either the ‘first’, ‘last’, ‘min’, ‘mean’, or ‘max’ value
If “Enable Duplicate Detection” is unchecked, all reports will be considered unique observations / cases.
Multi-Threading Jobs
IQDM-PDF supports multi-threading; set the number of jobs used for PDF parsing
Analyze .pdf only
IQDM-PDF looks only at .pdf files by default; disable this option to let it try parsing any file
Windows Users
The framework used to build this application (wxPython) leverages your operating system’s web viewer to render web pages (such as the Bokeh visuals in this application). Unfortunately, on Windows this still means Internet Explorer (IE) emulation, so there is no drag functionality (i.e., no pan or zoom). These features can be recovered if you install Microsoft Edge Beta. If it is installed, you should be able to check “Enable Edge WebView Backend” in Settings. Note that it is much slower to initialize, but you can pan, zoom, and show/hide plot components by clicking on legend items.
Alternatively, you can export your chart as HTML or navigate to ~/Apps/iqdm_analytics/temp, where the last chart you generated will live as an HTML file until you render a new one in IQDM Analytics. Then open the file in your browser of choice for full interactive functionality.
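If you open that temp file often, a small helper script can do it for you. This is a hedged sketch using only the standard library; the `*.html` pattern is an assumption because the file name is not documented here, and `newest_chart`/`open_newest_chart` are hypothetical helpers, not part of IQDM Analytics.

```python
import webbrowser
from pathlib import Path
from typing import Optional

TEMP_DIR = Path.home() / "Apps" / "iqdm_analytics" / "temp"


def newest_chart(temp_dir: Path = TEMP_DIR) -> Optional[Path]:
    """Return the most recently modified .html file in temp_dir, or None."""
    if not temp_dir.is_dir():
        return None
    charts = list(temp_dir.glob("*.html"))  # assumed file pattern
    if not charts:
        return None
    return max(charts, key=lambda p: p.stat().st_mtime)


def open_newest_chart(temp_dir: Path = TEMP_DIR) -> bool:
    """Open the newest chart in the default browser; False if nothing was found."""
    chart = newest_chart(temp_dir)
    if chart is None:
        return False
    return webbrowser.open(chart.resolve().as_uri())
```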
Local File Storage
IQDM Analytics will create the directory ~/Apps/iqdm_analytics. Your options are stored here as a hidden file, .options. This directory also contains the csv_templates, logs, and temp directories. The csv_templates directory contains instructions for CSV parsing, stored as JSON files. The logs directory contains an iqdma.log file if any Python errors have been caught; this file will be helpful when reporting issues. The temp directory is currently only used for HTML file storage on Windows.
PyInstaller
The executables for IQDM Analytics are generated with PyInstaller, which packages a full version of Python and the necessary libraries. When you run the executable, it unpacks into a temp directory whose location depends on your OS, but whose name starts with _MEIxxxxxx, where xxxxxx is a random number. If the application crashes or you kill it, note that this folder will not be automatically purged.
IQDM Analytics
Data Table
A class to sync a data object and list_ctrl
- class iqdma.data_table.DataTable(list_ctrl: wx.ListCtrl, data: Optional[dict] = None, columns: Optional[list] = None, widths: Optional[list] = None, formats: Optional[list] = None)
Bases: object
Helper class for wx.ListCtrl
Initialize the DataTable class
- Parameters
list_ctrl (wx.ListCtrl) – the list_ctrl in the GUI to be updated with data in this class
data (dict) – data should be formatted in a dictionary with keys being the column names and values being lists
columns (list) – the keys of the data object to be visible in the list_ctrl
widths (list) – optionally specify the widths of the columns
formats (list) – optionally specify wx Format values (e.g., wx.LIST_FORMAT_LEFT)
- append_row(row: list, layout_only: bool = False)
Add a row of data
- Parameters
row (list) – data ordered by self.columns
layout_only (bool) – If True, only add the row to the GUI
- append_row_to_data(row: list)
Add a row of data to self.data
- Parameters
row (list) – data ordered by self.columns
- property column_count: int
Number of columns
- Returns
Length of columns
- Return type
int
- property data_for_csv: list
Iterate through data to get a list of CSV rows
- Returns
list of rows. Each row is a list of column data
- Return type
list of lists
- data_to_list_of_rows() → list
Convert data into a list of rows as needed for list_ctrl
- Returns
data in the format of list of rows
- Return type
list
- delete_all_rows(layout_only: bool = False, force_delete_data: bool = False)
Clear all data from data and the layout view
- Parameters
layout_only (bool) – If True, do not remove the rows from self.data
force_delete_data (bool) – If True, force deletion even if layout is not set
- get_csv_rows() → list
Convert data to a list of strings for CSV writing
- Returns
Each item is a str for a CSV file
- Return type
list of str
- get_data_in_original_order() → dict
Get data in the order it was originally set
- Returns
keys are column names with values of row data
- Return type
dict
- get_row(row_index: int) → list
Get a row of data from self.data with the given row index
- Parameters
row_index (int) – retrieve all values from row with this index
- Returns
values for the specified row
- Return type
list
- get_value(row_index: int, column_index: int)
Get a specific table value with a row index and a column index
- Parameters
row_index (int) – retrieve value from row with this index
column_index (int) – retrieve value from column with this index
- Returns
value corresponding to provided indices
- Return type
any
- property has_data: bool
Check if there are any rows of data
- Returns
True if row_count > 0
- Return type
bool
- increment_index(evt: Optional[wx.Event] = None, increment: Optional[int] = None)
Increment the ListCtrl selection with an event or fixed increment
- Parameters
evt (wx.Event) – An event with a GetKeyCode method
increment (int) – If no event is passed, use a fixed index increment
- property keys: list
Column names
- Returns
A copy of columns
- Return type
list
- property row_count: int
Number of rows
- Returns
Length of first column in data
- Return type
int
- property selected_row_data: list
Row data from the current selection in wx.ListCtrl
- Returns
row data of the currently selected row in the GUI
- Return type
list
- property selected_row_index: list
Get the indices of selected rows in wx.ListCtrl
- Returns
List of indices
- Return type
list
- set_column_width(index: int, width: int)
Change the column width in the view
- Parameters
index (int) – index of column
width (int) – the specified width
- set_column_widths(auto: bool = False)
Set all widths in layout based on widths
- Parameters
auto (bool) – Use wx.LIST_AUTOSIZE_USEHEADER rather than widths
- set_data(data: dict, columns: list, formats: Optional[list] = None, ignore_layout: bool = False)
Set data and update layout
- Parameters
data (dict) – data should be formatted in a dictionary with keys being the column names and values being lists
columns (list) – the keys of the data object to be visible in the list_ctrl
formats (list) – optionally specify wx Format values (e.g., wx.LIST_FORMAT_LEFT)
ignore_layout (bool) – If True, do not update layout
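DataTable keeps its data as a dict mapping each column name to a list of values. The standalone functions below are a hypothetical, wx-free sketch of that layout; they echo the method descriptions above but are not the class's code, and writing the CSV with Python's `csv` module is an assumption.

```python
import csv
import io
from typing import Any, Dict, List

# Hypothetical helpers illustrating the data format {column name: [values, ...]}


def row_count(data: Dict[str, list], columns: List[str]) -> int:
    """Length of the first visible column (0 if there are no columns)."""
    return len(data[columns[0]]) if columns else 0


def append_row_to_data(data: Dict[str, list], columns: List[str], row: List[Any]) -> None:
    """Append one value to each column list; row is ordered by columns."""
    for column, value in zip(columns, row):
        data[column].append(value)


def data_to_list_of_rows(data: Dict[str, list], columns: List[str]) -> List[List[Any]]:
    """Transpose the column lists into rows, as a list control needs them."""
    return [[data[column][i] for column in columns] for i in range(row_count(data, columns))]


def csv_text(data: Dict[str, list], columns: List[str]) -> str:
    """Header row plus data rows, quoted as needed by the csv module."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(data_to_list_of_rows(data, columns))
    return buffer.getvalue()
```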
Importer
Import output from IQDM-PDF
- class iqdma.importer.CSVParser(json_file_path: str)
Bases: object
Import CSV Template from JSON
Initialization of CSVParser
- Parameters
json_file_path (str) – file path to JSON file containing CSV template info
- class iqdma.importer.ReportImporter(report_file_path: str, parser: str, duplicate_detection: bool)
Bases: object
Class to import IQDM-PDF CSV output
Initialize ReportImporter
- Parameters
report_file_path (str) – File path to CSV output from IQDM-PDF
parser (str) – The parser used to generate the report. One of ‘SNCPatient2020’, ‘SNCPatientCustom’, ‘Delta4’, ‘Verisoft’, or ‘VarianPortalDosimetry’
duplicate_detection (bool) – If True, apply a multi_value policy from options
- property charting_options: list
Column names of y-axis options
- Returns
Column names from analysis_columns['y']
- Return type
list
- property criteria_col: list
Column names of analysis criteria options
- Returns
Column names from analysis_columns['criteria']
- Return type
list
- property lcl
Lower Control Limit minimums
- Returns
keys are column names, values are minimum LCL values (or None)
- Return type
dict
- remove_non_numeric(val: str) → float
Remove all non-numeric characters and convert to float; used to hijack dtype in widen_data
- Parameters
val (str) – Any string
- Returns
val converted into a float
- Return type
float
- property ucl: dict
Upper Control Limit caps
- Returns
keys are column names, values are maximum UCL values (or None)
- Return type
dict
- property uid_col: list
Column names that, when combined, create a UID
- Returns
Column names from analysis_columns['uid']
- Return type
list
- iqdma.importer.copy_default_csv_templates()
Copy default JSON files from resources/csv_templates
- iqdma.importer.create_csv_template(parser: IQDMPDF.parsers.generic.ParserBase)
Write a CSV_TEMPLATE to JSON
- Parameters
parser (ParserBase) – a parser from IQDMPDF
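A possible shape for a non-numeric cleaner, sketched with a regular expression. Which characters count as "numeric" (digits, `.`, `-`) and returning NaN when nothing parseable remains are assumptions, not the documented behavior of `ReportImporter.remove_non_numeric`.

```python
import math
import re

# Keep digits, decimal points, and minus signs; drop everything else (assumption)
_NON_NUMERIC = re.compile(r"[^0-9.\-]")


def remove_non_numeric(val: str) -> float:
    """Strip units/symbols such as '%' or 'mm' and convert the rest to float."""
    cleaned = _NON_NUMERIC.sub("", str(val))
    try:
        return float(cleaned)
    except ValueError:
        # e.g., "N/A" leaves an empty string
        return math.nan
```

Because `csv_to_dict` accepts any callable as `dtype` (see Utilities), a function like this can stand in for `float` so that values such as "98.5%" load as numbers.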
Stats
Modified DVHA-Stats for IQDM-PDF output
- class iqdma.stats.ControlChart(y, std=3, ucl_limit=None, lcl_limit=None, x=None, range=None)
Bases: object
Calculate control limits for a standard univariate Control Chart
- Parameters
y (list, np.ndarray) – Input data (1-D)
std (int, float, optional) – Number of standard deviations used to calculate if a y-value is out-of-control.
ucl_limit (float, optional) – Limit the upper control limit to this value
lcl_limit (float, optional) – Limit the lower control limit to this value
range (tuple, list, ndarray) – 2-item object containing start and end index of y
Initialization of a ControlChart
- property avg_moving_range
Average moving range based on 2 consecutive points
- Returns
Average moving range. Returns NaN if y is empty.
- Return type
np.ndarray, np.nan
- property center_line
Center line of charting data (i.e., mean value)
- Returns
Mean value of y with np.mean() or np.nan if y is empty
- Return type
np.ndarray, np.nan
- property chart_data
JSON-compatible dict for chart generation
- Returns
Data used for Histogram visuals. Keys include ‘x’, ‘y’, ‘out_of_control’, ‘center_line’, ‘lcl’, ‘ucl’
- Return type
dict
- property control_limits
Calculate the lower and upper control limits
- Returns
lcl (float) – Lower Control Limit (LCL)
ucl (float) – Upper Control Limit (UCL)
- property out_of_control
Get the indices of out-of-control observations
- Returns
An array of indices that are not between the lower and upper control limits
- Return type
np.ndarray
- property out_of_control_high
Get the indices of observations > ucl
- Returns
An array of indices that are greater than the upper control limit
- Return type
np.ndarray
- property out_of_control_low
Get the indices of observations < lcl
- Returns
An array of indices that are less than the lower control limit
- Return type
np.ndarray
- property sigma
UCL/LCL = center_line +/- sigma * std
- Returns
sigma or np.nan if arr is empty
- Return type
np.ndarray, np.nan
- property x_ranged: list
Return x within range
- Returns
x data from range[0] to range[1]
- Return type
list
- property y_ranged
Return y within range
- Returns
y data from range[0] to range[1]
- Return type
list
- class iqdma.stats.IQDMStats(report_file_path: str, charting_column: str, multi_val_policy: str, duplicate_detection: bool, parser: str)
Bases: object
Modified DVHAStats class for IQDM-PDF output
Initialize IQDMStats
- Parameters
report_file_path (str) – File path to CSV output from IQDM-PDF
charting_column (str) – Column of y-axis data
multi_val_policy (str) – Duplicate value policy from options
duplicate_detection (bool) – If True, apply a multi_value policy from options
parser (str) – CSV format
- get_index_by_var_name(var_name: str)
Get the variable index by var_name
- Parameters
var_name (int, str) – The name (str) or index (int) of the variable of interest
- Returns
The column index for the given var_name
- Return type
int
- get_index_description() → tuple
Get a dict of data and a list of columns for DataTable
- Returns
dict – Keys are column names with values being a list of values
list – Column names in order to be displayed
- univariate_control_chart(var_name: str, std: float = 3, ucl_limit: Optional[float] = None, lcl_limit: Optional[float] = None, range: Optional[tuple] = None)
Calculate control limits for a standard univariate Control Chart
- Parameters
var_name (str, int) – The name (str) or index (int) of the variable to plot
std (int, float, optional) – Number of standard deviations used to calculate if a y-value is out-of-control
ucl_limit (float, optional) – Limit the upper control limit to this value
lcl_limit (float, optional) – Limit the lower control limit to this value
range (tuple, list, ndarray) – 2-item object containing start and end index of data
- Returns
stats.ControlChart class object
- Return type
ControlChart
- univariate_control_charts(**kwargs)
Calculate control charts for all variables
- Parameters
kwargs (any) – See univariate_control_chart for keyword parameters
- Returns
ControlChart class objects stored in a dictionary with var_names and indices as keys (can use var_name or index)
- Return type
dict
- property variable_count
Number of variables in data
- Returns
Number of columns in data
- Return type
int
- iqdma.stats.avg_moving_range(arr, nan_policy='omit')
Calculate the average moving range (over 2 consecutive points)
- Parameters
arr (array-like (1-D)) – Input array. Must be positive 1-dimensional.
nan_policy (str, optional) – Value must be one of the following: {‘propagate’, ‘raise’, ‘omit’}. Defines how to handle input that contains nan. The following options are available (default is ‘omit’): ‘propagate’: returns nan; ‘raise’: throws an error; ‘omit’: performs the calculations ignoring nan values
- Returns
Average moving range. Returns NaN if arr is empty
- Return type
np.ndarray, np.nan
- iqdma.stats.process_nan_policy(arr, nan_policy)
Process the input array according to nan_policy
- Parameters
arr (array-like (1-D)) – Input array. Must be positive 1-dimensional.
nan_policy (str) – Value must be one of the following: {‘propagate’, ‘raise’, ‘omit’}. Defines how to handle input that contains nan: ‘propagate’: returns nan; ‘raise’: throws an error; ‘omit’: performs the calculations ignoring nan values
- Returns
Input array evaluated per nan_policy
- Return type
np.ndarray, np.nan
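The three nan_policy options can be sketched in pure Python (no NumPy) to show how they change the average moving range. This is an illustration, not the `iqdma.stats` code; in particular, the assumption here is that ‘omit’ drops NaN values before differencing, so the points on either side of a gap become neighbours.

```python
import math
from typing import List, Sequence


def process_nan_policy(arr: Sequence[float], nan_policy: str) -> List[float]:
    """Return arr with NaN handled per nan_policy ('propagate', 'raise', or 'omit')."""
    if nan_policy not in ("propagate", "raise", "omit"):
        raise ValueError("nan_policy must be 'propagate', 'raise', or 'omit'")
    if nan_policy == "omit":
        return [value for value in arr if not math.isnan(value)]
    if nan_policy == "raise" and any(math.isnan(value) for value in arr):
        raise ValueError("Input contains NaN")
    # 'propagate' (or 'raise' with no NaN present): values are left untouched
    return list(arr)


def avg_moving_range(arr: Sequence[float], nan_policy: str = "omit") -> float:
    """Mean absolute difference of consecutive points; NaN if fewer than 2 points remain."""
    values = process_nan_policy(arr, nan_policy)
    if len(values) < 2:
        return math.nan
    ranges = [abs(curr - prev) for prev, curr in zip(values, values[1:])]
    # With 'propagate', any NaN makes the sum (and therefore the result) NaN
    return sum(ranges) / len(ranges)
```

For example, `avg_moving_range([1.0, math.nan, 3.0, 6.0])` omits the NaN and averages the ranges 2 and 3, giving 2.5.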
Utilities ported from DVHA-Stats
Common functions for DVHA-Stats. Copied to limit required libraries.
- iqdma.utilities_dvha_stats.apply_dtype(value, dtype)
Convert value with the provided data type
- Parameters
value (any) – Value to be converted
dtype (function, None) – python reserved types, e.g., int, float, str, etc. However, dtype could be any callable that raises a ValueError on failure.
- Returns
The return of dtype(value) or numpy.nan on ValueError
- Return type
any
- iqdma.utilities_dvha_stats.csv_to_dict(csv_file_path, delimiter=',', dtype=None, header_row=True)
Read in a csv file, return data as a dictionary
- Parameters
csv_file_path (str) – File path to the CSV file to be processed.
delimiter (str) – Specify the delimiter used in the csv file (default = ‘,’)
dtype (callable, type, optional) – Optionally force values to a type (e.g., float, int, str, etc.).
header_row (bool, optional) – If True, the first row is interpreted as column keys, otherwise row indices will be used
- Returns
CSV data as a dict, using the first row values as keys
- Return type
dict
- iqdma.utilities_dvha_stats.dict_to_array(data, key_order=None)
Convert a dict of data to a numpy array
- Parameters
data (dict) – Dictionary of data to be converted to np.array.
key_order (None, list of str) – Optionally specify the order of columns
- Returns
A dictionary with keys of ‘data’ and ‘columns’, pointing to a numpy array and list of str, respectively
- Return type
dict
- iqdma.utilities_dvha_stats.get_sorted_indices(list_data)
Get original indices of a list after sorting
- Parameters
list_data (list) – Any python sortable list
- Returns
list_data indices of sorted(list_data)
- Return type
list
- iqdma.utilities_dvha_stats.import_data(data, var_names=None)
Generalized data importer for np.ndarray, dict, and csv file
- Parameters
data (numpy.array, dict, str) – Input data (2-D) with N rows of observations and p columns of variables. The CSV file must have a header row for column names.
var_names (list of str, optional) – If data is a numpy array, optionally provide the column names.
- Returns
A tuple: data as an array and variable names as a list
- Return type
np.ndarray, list
- iqdma.utilities_dvha_stats.is_numeric(val)
Check if value is numeric (float or int)
- Parameters
val (any) – Any value
- Returns
Returns True if float(val) doesn’t raise a ValueError
- Return type
bool
- iqdma.utilities_dvha_stats.sort_2d_array(arr, index, mode='col')
Sort a 2-D numpy array
- Parameters
arr (np.ndarray) – Input 2-D array to be sorted
index (int, list) – Index of column or row to sort arr. If list, will sort by each index in the order provided.
mode (str) – Either ‘col’ or ‘row’
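Three of the utilities above are small enough to sketch in the standard library alone. These are illustrative re-implementations from the descriptions, not the ported DVHA-Stats source. Returning `value` unchanged when `dtype` is None, and catching `TypeError` in `is_numeric` (e.g., for None), are assumptions that go slightly beyond the documented ValueError behavior.

```python
import math
from typing import Any, Callable, List, Optional, Sequence


def apply_dtype(value: Any, dtype: Optional[Callable[[Any], Any]]) -> Any:
    """Return dtype(value), the value unchanged if dtype is None, or NaN on ValueError."""
    if dtype is None:
        return value
    try:
        return dtype(value)
    except ValueError:
        return math.nan  # a float NaN, like numpy.nan


def is_numeric(val: Any) -> bool:
    """True if float(val) succeeds."""
    try:
        float(val)
    except (TypeError, ValueError):
        return False
    return True


def get_sorted_indices(list_data: Sequence[Any]) -> List[int]:
    """Original indices of list_data, in the order sorted(list_data) would place their values."""
    return [index for index, _ in sorted(enumerate(list_data), key=lambda pair: pair[1])]
```

Because Python's sort is stable, tied values keep their original relative order in `get_sorted_indices`.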
Credits
Development Lead
Dan Cutright, University of Chicago Medicine
Contributors
Marc Chamberland, University of Vermont Health Network
Serpil Kucuker Dogan, Northwestern Medicine
Mahesh Gopalakrishnan, Northwestern Medicine
Aditya Panchal, AMITA Health
Michael Snyder, Beaumont Health
Change Log for IQDM-Analytics
v0.1.9 (2021.05.17)
Clean charting values of non-numerical characters on import
Prevent crash if only one UID column is defined in CSV Template
Added CSV template for IBA MyQA
v0.1.8 (2021.03.28)
Implement CSV parsing with JSON templates, allowing for customization
Option to disable duplicate report detection
v0.1.7 (2021.03.14)
Added Control Chart using all data
MS Edge Support for Windows
IQDM-PDF bump to v0.3.0
v0.1.6 (2021.03.13)
Improved date parsing (IQDM-PDF bump to v0.2.9)
v0.1.5 (2021.03.10)
Last release before Change Log implemented