I.  ADASYN for Imbalanced Learning:

This website includes the algorithms, demos, and source code implementation of the Adaptive Synthetic Sampling approach (ADASYN) for imbalanced learning, as presented in our original paper [1]. The essential idea of ADASYN is to use a weighted distribution for different minority class examples according to their level of difficulty in learning, where more synthetic data is generated for minority class examples that are harder to learn compared to those minority examples that are easier to learn. As a result, the ADASYN approach improves learning with respect to the data distributions in two ways: (1) reducing the bias introduced by the class imbalance, and (2) adaptively shifting the classification decision boundary toward the difficult examples.



II. ADASYN Algorithm:

Input

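(1) Training data set $D_{tr}$ with $m$ samples $\{x_i, y_i\}$, $i = 1, \ldots, m$, where $x_i$ is an instance in the $n$-dimensional feature space and $y_i$ is its class label. Define $m_s$ and $m_l$ as the number of minority class examples and the number of majority class examples, respectively. Therefore, $m_s \le m_l$ and $m_s + m_l = m$.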
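As a small illustration of these input definitions, the sketch below derives the minority label, $m_s$ and $m_l$ from a list of class labels. It is written in Python, which is an assumption (the demo in Section III uses a MATLAB-style signature), and class_counts is a hypothetical helper, not part of the downloadable package.

```python
from collections import Counter


def class_counts(labels):
    """Hypothetical helper: return (minority_label, m_s, m_l) for two-class labels."""
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    # Sort (label, count) pairs by count so the minority class comes first.
    (min_label, m_s), (_, m_l) = sorted(counts.items(), key=lambda kv: kv[1])
    return min_label, m_s, m_l


labels = [1, -1, -1, -1, 1, -1, -1, -1]
print(class_counts(labels))  # (1, 2, 6), so m_s + m_l = m = 8
```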

Procedure

(1) Calculate the number of synthetic data examples that need to be generated for the minority class:

$$G = (m_l - m_s) \times \beta$$

where $\beta \in [0, 1]$ is a parameter used to specify the desired balance level after generation of the synthetic data. $\beta = 1$ means a fully balanced data set is created after the generation process;

(2) For each minority class example $x_i$, find its $K$ nearest neighbors based on the Euclidean distance in $n$-dimensional space, and calculate the ratio $r_i$ defined as:

$$r_i = \Delta_i / K, \quad i = 1, \ldots, m_s$$

where $\Delta_i$ is the number of examples in the $K$ nearest neighbors of $x_i$ that belong to the majority class; therefore, $r_i \in [0, 1]$;

(3) Normalize $r_i$ according to $\hat{r}_i = r_i / \sum_{i=1}^{m_s} r_i$, so that $\hat{r}_i$ is a density distribution ($\sum_i \hat{r}_i = 1$);

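(4) Calculate the number of synthetic data examples that need to be generated for each minority example $x_i$:

$$g_i = \hat{r}_i \times G$$

where $G$ is the total number of synthetic data examples computed in step (1);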

(5) For each minority example $x_i$, generate $g_i$ synthetic data examples by repeating the following two steps $g_i$ times:

            (i) Randomly choose one minority data example $x_{zi}$ from the $K$ nearest neighbors of $x_i$.

            (ii) Generate the synthetic data example:

$$s_i = x_i + (x_{zi} - x_i) \times \lambda$$

where $(x_{zi} - x_i)$ is the difference vector in $n$-dimensional space, and $\lambda$ is a random number: $\lambda \in [0, 1]$.
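To make steps (1), (3), (4) and (5)(ii) concrete, here is a minimal Python sketch. It assumes the ratios $r_i$ from step (2) are already known (the values below are made up for illustration), rounds $g_i$ to the nearest integer (the procedure does not say how fractional counts are handled), and falls back to uniform weights when every $r_i$ is zero, a case the procedure leaves undefined. The names allocate and interpolate are hypothetical.

```python
import random


def allocate(ratios, m_s, m_l, beta):
    """Steps (1), (3), (4): split G synthetic samples across minority examples."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    G = (m_l - m_s) * beta                      # step (1)
    total = sum(ratios)
    if total == 0:
        # Assumption: uniform weights when no minority example has majority neighbors.
        r_hat = [1.0 / len(ratios)] * len(ratios)
    else:
        r_hat = [r / total for r in ratios]     # step (3): the r_hat values sum to 1
    # Step (4); rounding to the nearest integer is an assumption of this sketch.
    return [int(round(r * G)) for r in r_hat]


def interpolate(x_i, x_zi, lam):
    """Step (5)(ii): s_i = x_i + (x_zi - x_i) * lam, coordinate by coordinate."""
    return tuple(a + (b - a) * lam for a, b in zip(x_i, x_zi))


# m_s = 4 minority examples with hypothetical ratios r_i (K = 4), and m_l = 12.
g = allocate([0.0, 0.5, 1.0, 0.5], m_s=4, m_l=12, beta=1.0)
print(g)  # [0, 2, 4, 2]

rng = random.Random(0)
lam = rng.random()                              # lambda drawn uniformly from [0, 1)
print(interpolate((0.0, 0.0), (2.0, 4.0), lam))  # a point on the segment (0,0)-(2,4)
```

The output shows the adaptive weighting at work: the example with $r_i = 1$, whose neighbors all belong to the majority class, receives the largest share of the $G = 8$ synthetic samples, while the example with $r_i = 0$ receives none.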



III. Demos & Source Code:

Demo 1: Given the training data with class labels, generate synthetic minority class data.

Function:    [AdaSYNData, AdaSYNLabel] = ADASYN(TrainingData, TrainingLabel, beta, kNN)

This function returns the synthetic minority class data (AdaSYNData) and the corresponding class labels (AdaSYNLabel) generated by our ADASYN approach for a specified balance level beta. TrainingData is an $m \times n$ matrix, where $m$ is the total number of training examples and $n$ is the feature dimension. TrainingLabel is a class label vector for TrainingData. kNN is an integer specifying the number of nearest neighbors under consideration.
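For readers without MATLAB, the following is a rough pure-Python sketch that mirrors the inputs of ADASYN(TrainingData, TrainingLabel, beta, kNN). It is not the downloadable source code: the name adasyn, the seed argument and the list-based return values are assumptions of this sketch. It also assumes that, in step (5)(i), the neighbor $x_{zi}$ is drawn from the k nearest minority examples of $x_i$ (as in SMOTE), rounds $g_i$ to the nearest integer, and uses uniform weights if every $r_i$ is zero.

```python
import math
import random
from collections import Counter


def adasyn(X, y, beta=1.0, k=5, seed=None):
    """Hypothetical pure-Python sketch of ADASYN for two-class data.

    X: list of equal-length numeric tuples (one row per example, m x n).
    y: list of labels with exactly two distinct values.
    Returns (synthetic_X, synthetic_y) containing only the new minority samples.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    (min_label, m_s), (_, m_l) = sorted(counts.items(), key=lambda kv: kv[1])
    G = (m_l - m_s) * beta                                  # step (1)

    def knn(i, pool):
        # Indices in pool of the k nearest neighbors of X[i] (Euclidean), excluding i.
        cand = sorted((math.dist(X[i], X[j]), j) for j in pool if j != i)
        return [j for _, j in cand[:k]]

    minority = [i for i in range(len(X)) if y[i] == min_label]
    ratios = [sum(y[j] != min_label for j in knn(i, range(len(X)))) / k
              for i in minority]                            # step (2): r_i = Delta_i / K
    total = sum(ratios)
    if total == 0:
        r_hat = [1.0 / len(minority)] * len(minority)     # assumption: uniform fallback
    else:
        r_hat = [r / total for r in ratios]                 # step (3)

    synth_X = []
    for i, w in zip(minority, r_hat):
        g_i = int(round(w * G))                             # step (4), rounded
        pool = knn(i, minority)                             # assumption: minority-only neighbors
        if not pool:
            continue                                        # a lone minority example has no partner
        for _ in range(g_i):                                # step (5)
            zi = rng.choice(pool)                           # (i) pick a neighbor x_zi
            lam = rng.random()                              # lambda drawn from [0, 1)
            synth_X.append(tuple(a + (b - a) * lam
                                 for a, b in zip(X[i], X[zi])))  # (ii)
    return synth_X, [min_label] * len(synth_X)


# Tiny example: 3 minority (label 1) and 7 majority (label -1) points.
X = [(0, 0), (0, 1), (4, 4), (5, 5), (5, 4), (4, 5), (6, 6), (6, 5), (5, 6), (7, 7)]
y = [1, 1, 1, -1, -1, -1, -1, -1, -1, -1]
new_X, new_y = adasyn(X, y, beta=1.0, k=2, seed=0)
print(len(new_X))  # 4
```

In this example only the minority point (4, 4) has majority neighbors, so all $G = 4$ synthetic samples are generated around it, on segments toward its two minority neighbors. The brute-force neighbor search costs $O(m^2)$ distance computations, which keeps the sketch short but would be replaced by a spatial index for larger data sets.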

Source Code: The source code can be downloaded from this website.


Reference

The software package and examples provided here are associated with the following paper. If you use this algorithm in your research or work, please cite:

[1] H. He, Y. Bai, E. A. Garcia, and S. Li, "ADASYN: Adaptive synthetic sampling approach for imbalanced learning," in Proc. IEEE Int. Joint Conf. Neural Networks (IJCNN'08), pp. 1322-1328, 2008.