This website includes the algorithms, demos, and source code implementation of the Adaptive Synthetic Sampling approach (ADASYN) for imbalanced learning, as presented in our original paper [1]. The essential idea of ADASYN is to use a weighted distribution for different minority class examples according to their level of difficulty in learning, where more synthetic data is generated for minority class examples that are harder to learn compared to those minority examples that are easier to learn. As a result, the ADASYN approach improves learning with respect to the data distributions in two ways: (1) reducing the bias introduced by the class imbalance, and (2) adaptively shifting the classification decision boundary toward the difficult examples.

Input

(1) Training data set with $m$ samples. Define $m_s$ and $m_l$ as the number of minority class examples and the number of majority class examples, respectively. Therefore, $m_s \le m_l$ and $m_s + m_l = m$.

Procedure

(1) Calculate the number of synthetic data examples that need to be generated for the minority class:

$$G = (m_l - m_s) \times \beta$$

where $\beta \in [0, 1]$ is a parameter used to specify the desired balance level after generation of the synthetic data. $\beta = 1$ means a fully balanced data set is created after the generation process;

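(2) For each example $x_i$ in the minority class, find its $K$ nearest neighbors based on the Euclidean distance in $n$-dimensional space, and calculate the ratio $r_i$ defined as:

$$r_i = \Delta_i / K, \quad i = 1, \ldots, m_s$$

where $\Delta_i$ is the number of examples in the $K$ nearest neighbors of $x_i$ that belong to the majority class, therefore $r_i \in [0, 1]$;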

(3) Normalize $r_i$ according to $\hat{r}_i = r_i / \sum_{i=1}^{m_s} r_i$, so that $\hat{r}_i$ is a density distribution ($\sum_{i=1}^{m_s} \hat{r}_i = 1$);

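(4) Calculate the number of synthetic data examples that need to be generated for each minority example $x_i$:

$$g_i = \hat{r}_i \times G$$

where $G$ is the total number of synthetic data examples to be generated for the minority class, as calculated in step (1);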

(5) For each minority example $x_i$, generate $g_i$ synthetic data examples according to the following two steps:

(i) Randomly choose one minority data example $x_{zi}$ from the $K$ nearest neighbors for data $x_i$.

(ii) Generate the synthetic data example:

$$s_i = x_i + (x_{zi} - x_i) \times \lambda$$

where $(x_{zi} - x_i)$ is the difference vector in $n$-dimensional space, and $\lambda$ is a random number: $\lambda \in [0, 1]$.
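
The procedure above can be sketched in a few lines of Python with NumPy. This is a minimal illustration and not the source code released with the paper. The function name `adasyn` and the parameters `minority_label` and `seed` are hypothetical. Three details are assumptions that the procedure does not specify: $g_i$ is rounded to the nearest integer (so the total may differ slightly from $G$), a minority example with no minority neighbor among its $K$ nearest neighbors is interpolated with itself, and neighbors are found by a brute-force distance matrix.

```python
import numpy as np


def adasyn(X, y, minority_label=1, beta=1.0, K=5, seed=0):
    """Sketch of steps (1)-(5); returns the synthetic minority examples."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = X.shape[1]
    min_idx = np.flatnonzero(y == minority_label)
    m_s = min_idx.size
    m_l = y.size - m_s

    # Step (1): total number of synthetic examples, G = (m_l - m_s) * beta.
    G = int(round((m_l - m_s) * beta))

    # Step (2): K nearest neighbors of each minority example among all
    # samples (brute force), and the ratio r_i = Delta_i / K.
    X_min = X[min_idx]
    dist = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    dist[np.arange(m_s), min_idx] = np.inf   # a point is not its own neighbor
    nn = np.argsort(dist, axis=1)[:, :K]     # neighbor indices into X
    is_major = y[nn] != minority_label
    r = is_major.sum(axis=1) / K

    # Step (3): normalize r_i so that the values sum to 1.
    if r.sum() == 0:                         # no minority example is "hard"
        return np.empty((0, n))
    r_hat = r / r.sum()

    # Step (4): g_i = r_hat_i * G (rounding is an assumption of this sketch).
    g = np.rint(r_hat * G).astype(int)

    # Step (5): interpolate between x_i and a random minority neighbor x_zi.
    synthetic = []
    for i in range(m_s):
        minority_nn = nn[i][~is_major[i]]
        for _ in range(g[i]):
            if minority_nn.size == 0:
                x_z = X_min[i]               # assumed fallback: reuse x_i
            else:
                x_z = X[rng.choice(minority_nn)]
            lam = rng.random()               # lambda drawn from [0, 1)
            synthetic.append(X_min[i] + (x_z - X_min[i]) * lam)

    if not synthetic:
        return np.empty((0, n))
    return np.vstack(synthetic)
```

Calling `adasyn(X, y, beta=1.0, K=5)` on a data set with 20 majority and 5 minority examples gives $G = 15$, so the function returns roughly 15 synthetic points, concentrated around the minority examples whose neighborhoods contain the most majority examples.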

III. Demos & Source Code:

Demo 1: Given the training data with class labels, generate synthetic minority class data.