

Run on image files which are inside a folder to check if they are "clean":

def augment_and_move(origin_folder, target_folder, transformations):
    """
    Args:
        origin_folder: folder with 'virgin' images
        target_folder: folder to drop images after transformations
        transformations: transformations to apply to each image
    Returns:
        technically not a return, but puts augmented images into a new folder
    """

def crop_them_all(origin_folder, target_folder):
    """
    Args:
        origin_folder: folder with 'virgin' images
        target_folder: folder to drop the cropped images into
    Returns:
        technically not a return, but puts cropped images into target folder
    """

def find_by_sample_upper(source_directory, percent_height_of_sample, value_for_line):
    """
    Function that takes the top (upper percent) of images and checks if the average pixel value is above value_for_line
    """

def find_sample_upper_greater_than_lower(source_directory, percent_height_of_sample):
    """
    Function that checks that the upper field (cut on percent_height_of_sample) of an image has a higher pixel value than the lower field (it should in a typical CXR)
    """

def find_outliers_by_total_mean(source_directory, percentage_to_say_outliers):
    """
    Args:
        source_directory: directory with image files (should be more than 20)
        percentage_to_say_outliers: a number which will be the percentage of images contained in the high mean or low mean sets
    Returns:
        lows, highs: images with low mean, images with high mean
    """

def find_outliers_by_mean_to_df(source_directory, percentage_to_say_outliers):
    """
    Important note: approximate, and it can by chance cut the group so that images with the same mean are in and out of normal range if the knife so falls
    Args:
        source_directory: directory with image files (should be more than 20)
        percentage_to_say_outliers: a number which will be the percentage of images contained in the high mean OR low mean sets; note that if you set it to 50, then all images will be high or low
    Returns:
        lows, highs: images with low mean, images with high mean, put into a dataframe
    """

def find_tiny_image_differences(directory, s=5, percentile=8):
    """
    Note: percentile returned is approximate, may be a tad more
    Args:
        directory: directory of all the images you want to compare
    """
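To make the percentile cut behind the mean-based outlier functions concrete, here is a minimal sketch. It is not the library's implementation: `split_by_mean` is a hypothetical helper, it works on in-memory NumPy arrays rather than a folder of image files, and the use of `np.percentile` with strict inequalities is an assumption made for illustration.

```python
import numpy as np


def split_by_mean(images, percentage_to_say_outliers):
    """Hypothetical helper: split named images into low-mean and high-mean sets.

    images: dict mapping an image name to a 2D NumPy array of pixel values.
    percentage_to_say_outliers: percentile used for the low cut; the high cut
    is taken at 100 minus this value (an assumption, not the library's rule).
    """
    means = {name: float(np.mean(img)) for name, img in images.items()}
    values = list(means.values())
    low_cut = np.percentile(values, percentage_to_say_outliers)
    high_cut = np.percentile(values, 100 - percentage_to_say_outliers)
    # Strict comparisons keep images with identical means on the same side.
    lows = [name for name, m in means.items() if m < low_cut]
    highs = [name for name, m in means.items() if m > high_cut]
    return lows, highs


# Ten synthetic 4x4 "images" with mean pixel values 10, 20, ..., 100.
images = {f"img_{i}": np.full((4, 4), 10 * (i + 1), dtype=np.uint8) for i in range(10)}
lows, highs = split_by_mean(images, 10)
print(lows, highs)
```

Because the cut points are interpolated percentiles, the share of images that lands in each set is only approximately the requested percentage, which matches the "approximate" caveat in the docstrings.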

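The upper-versus-lower check described for find_sample_upper_greater_than_lower can be sketched on a single array as follows. This is an illustration under assumptions, not the library's code: `upper_brighter_than_lower` is a hypothetical name, and treating everything below the cut as the "lower field" is a guess about the intended comparison.

```python
import numpy as np


def upper_brighter_than_lower(image, percent_height_of_sample):
    """Hypothetical helper: is the top part of the image brighter than the rest?

    image: 2D NumPy array of pixel values (rows run from top to bottom).
    percent_height_of_sample: percentage of the height used as the upper field.
    """
    rows = image.shape[0]
    cut = int(rows * percent_height_of_sample / 100)
    # Keep at least one row on each side of the cut.
    cut = min(max(cut, 1), rows - 1)
    upper = image[:cut]
    lower = image[cut:]  # assumption: the lower field is everything below the cut
    return float(upper.mean()) > float(lower.mean())


# Synthetic image: bright top half, dark bottom half.
bright_top = np.vstack([np.full((5, 4), 200), np.full((5, 4), 50)])
print(upper_brighter_than_lower(bright_top, 50))
print(upper_brighter_than_lower(bright_top[::-1], 50))
```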
Some other functions are for cleaning datasets, including ones that:

Run on dataframes to make sure there is no image leakage:

def check_paths_for_group_leakage(train_df, test_df, uniqueID):
    """
    Args:
        train_df (dataframe): dataframe describing train dataset
        test_df (dataframe): dataframe describing test dataset
        uniqueID (str): string name of column with image ID, patient IDs or some other unique ID that is in all dfs
    Returns:
        pics_in_both_groups: duplications of any image into both sets as a new dataframe
    """

Crop off excessive black frames (run this on single images), one at a time:

    NB: expanded from simpler crop to handle PIL and non-PIL types of JPEGs

Run on a list to make a prototype tiny X-ray others can be compared to:

def seperate_image_averger(set_of_images, s=5):
    """
    Returns:
        canvas/len(set_of_images): an average tiny image (can feed another function which compares to this mini)
    """
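The idea behind the leakage check in check_paths_for_group_leakage can be illustrated with a small pandas sketch. The helper name `find_shared_ids`, the column names, and the use of an inner merge are all assumptions for illustration; the library's own function may work differently.

```python
import pandas as pd


def find_shared_ids(train_df, test_df, unique_id):
    """Hypothetical helper: return rows whose ID appears in both dataframes."""
    # An inner merge keeps only the IDs present on both sides.
    return pd.merge(
        train_df,
        test_df,
        on=unique_id,
        how="inner",
        suffixes=("_train", "_test"),
    )


# Made-up example data: "c.jpg" appears in both the train and test sets.
train_df = pd.DataFrame({"image_id": ["a.jpg", "b.jpg", "c.jpg"], "label": [0, 1, 0]})
test_df = pd.DataFrame({"image_id": ["c.jpg", "d.jpg"], "label": [0, 1]})

leaked = find_shared_ids(train_df, test_df, "image_id")
print(leaked["image_id"].tolist())
```

An empty result means no ID is shared between the two sets; any rows returned point at images (or patients) that should be removed from one of them.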

Some newly added functions allow for rapid automated data augmentation (in ways that are realistic for X-rays). Some unit tests are available in the test folder. We are adding new functions all the time.
You can get in touch with me by email if you have a legitimate reason to use my library without open-sourcing your code base or following other conditions, and I can make a different license specifically for you.
The GPL license implies you should be open-sourcing your entire code base and sending me modifications.
You are only free to use this library according to the license. If you use the library, please credit me and my collaborators.

Setting up a virtual environment is desirable, but not absolutely necessary.

Documentation is generated automatically as new functions are added. We encourage you to build up-to-date documentation with the command: python setup.py apidoc
