Figure 4. Key genetic features for zoonotic source prediction of Salmonella enterica serotype Typhimurium using a Random Forest classifier. A) Change in out-of-bag prediction error rate with incremental inclusion of top-ranking genetic features for source prediction. Red lines indicate median values; blue boxes indicate interquartile ranges. Upper and lower whiskers indicate maximum and minimum values. Circles indicate outliers. B) Distribution of the top 50 source-predicting features among Salmonella Typhimurium isolates on the basis of their location. Cyan, bovine; yellow, poultry; light green, wild bird; blue, swine; dark green, miscellaneous food; red, human; gray, other sources. The presence of a feature in an isolate is shown as a horizontal line in the corresponding location, with its grayscale intensity representing the MD in prediction accuracy when the values of that feature are randomly permuted. The higher the MD, the more important the feature is for source prediction. MD, mean decrease.
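
The MD in prediction accuracy referred to in the caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the presence/absence matrix, the source labels, and the `toy_predict` function are hypothetical stand-ins for a trained classifier, and accuracy is measured on the full toy dataset rather than on out-of-bag samples, as a Random Forest implementation would typically do.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def mean_decrease_accuracy(predict, X, y, feature, n_repeats=20, seed=0):
    """Average drop in accuracy after randomly permuting one feature column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    drops = []
    for _ in range(n_repeats):
        # Shuffle the values of the chosen feature across isolates.
        column = [row[feature] for row in X]
        rng.shuffle(column)
        permuted = [
            row[:feature] + [value] + row[feature + 1:]
            for row, value in zip(X, column)
        ]
        drops.append(baseline - accuracy(predict, permuted, y))
    return sum(drops) / n_repeats


# Hypothetical presence/absence matrix (rows: isolates, columns: features).
# Feature 0 tracks the source label; feature 1 is unrelated to it.
X = [[1, 0], [1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [0, 1]]
y = ["bovine"] * 4 + ["poultry"] * 4


def toy_predict(row):
    # Hypothetical stand-in for a trained classifier: uses feature 0 only.
    return "bovine" if row[0] == 1 else "poultry"


md_informative = mean_decrease_accuracy(toy_predict, X, y, feature=0)
md_noise = mean_decrease_accuracy(toy_predict, X, y, feature=1)
print(md_informative, md_noise)
```

In this toy setup, permuting the feature the classifier relies on lowers accuracy, whereas permuting the unused feature leaves accuracy unchanged, so the first feature receives the larger MD.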