I. ADASYN for Imbalanced Learning:
This website includes the algorithms, demos, and source code implementation of the Adaptive Synthetic Sampling approach (ADASYN) for imbalanced learning, as presented in our original paper. The essential idea of ADASYN is to use a weighted distribution over the minority class examples according to their level of difficulty in learning: more synthetic data is generated for minority class examples that are harder to learn than for those that are easier to learn. As a result, the ADASYN approach improves learning with respect to the data distributions in two ways: (1) reducing the bias introduced by the class imbalance, and (2) adaptively shifting the classification decision boundary toward the difficult examples.
Input:
(1) Training data set $D_{tr}$ with $m$ samples $\{x_i, y_i\}$, $i = 1, \ldots, m$. Define $m_s$ and $m_l$ as the number of minority class examples and the number of majority class examples, respectively. Therefore, $m_s \le m_l$ and $m_s + m_l = m$.
Procedure:
(1) Calculate the number of synthetic data examples that need to be generated for the minority class:
$G = (m_l - m_s) \times \beta$
where $\beta \in [0, 1]$ is a parameter used to specify the desired balance level after generation of the synthetic data. $\beta = 1$ means a fully balanced data set is created after the generation process;
(2) For each example $x_i$ in the minority class, find the $K$ nearest neighbors based on the Euclidean distance in $n$ dimensional space, and calculate the ratio $r_i$ defined as:
$r_i = \Delta_i / K, \quad i = 1, \ldots, m_s$
where $\Delta_i$ is the number of examples in the $K$ nearest neighbors of $x_i$ that belong to the majority class, therefore $r_i \in [0, 1]$;
(3) Normalize $r_i$ according to $\hat{r}_i = r_i / \sum_{i=1}^{m_s} r_i$, so that $\hat{r}_i$ is a density distribution ($\sum_{i=1}^{m_s} \hat{r}_i = 1$);
(4) Calculate the number of synthetic data examples that need to be generated for each minority example $x_i$:
$g_i = \hat{r}_i \times G$
(5) For each minority example $x_i$, generate $g_i$ synthetic data examples according to the following two steps:
(i) Randomly choose one minority data example $x_{zi}$ from the $K$ nearest neighbors for data $x_i$.
(ii) Generate the synthetic data example:
$s_i = x_i + (x_{zi} - x_i) \times \lambda$
where $(x_{zi} - x_i)$ is the difference vector in $n$ dimensional space, and $\lambda$ is a random number: $\lambda \in [0, 1]$.
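Steps (1) to (5) above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the source code distributed on this page: the function name `adasyn_sketch`, its arguments, and the `seed` parameter are hypothetical. It also makes three assumptions that the steps above do not specify: the per-example counts are rounded to the nearest integer, a minority example whose K nearest neighbors are all majority examples is interpolated with itself, and the random factor is drawn from [0, 1) rather than the closed interval.

```python
import numpy as np


def adasyn_sketch(X, y, minority_label=1, beta=1.0, k=5, seed=0):
    """Return (synthetic_X, synthetic_y); hypothetical helper, not the packaged code."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    is_min = y == minority_label
    min_idx = np.flatnonzero(is_min)   # positions of minority examples in X
    X_min = X[is_min]
    m_s = len(X_min)                   # number of minority examples
    m_l = len(X) - m_s                 # number of majority examples

    # Step (1): total number of synthetic examples for the minority class.
    G = (m_l - m_s) * beta

    # Step (2): K nearest neighbors (Euclidean) of each minority example,
    # searched in the whole training set and excluding the example itself.
    dists = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    dists[np.arange(m_s), min_idx] = np.inf
    nn = np.argsort(dists, axis=1)[:, :k]
    r = (~is_min[nn]).sum(axis=1) / k  # fraction of majority neighbors

    # Step (3): normalize the ratios so that they sum to one.
    if r.sum() == 0:                   # no minority example has a majority neighbor
        return np.empty((0, X.shape[1])), np.empty(0, dtype=y.dtype)
    r_hat = r / r.sum()

    # Step (4): synthetic examples per minority example (rounding is an assumption).
    g = np.rint(r_hat * G).astype(int)

    # Step (5): interpolate toward a randomly chosen minority neighbor.
    synthetic = []
    for i in range(m_s):
        minority_nbrs = nn[i][is_min[nn[i]]]
        for _ in range(g[i]):
            if minority_nbrs.size == 0:
                x_z = X_min[i]         # assumption: fall back to the example itself
            else:
                x_z = X[rng.choice(minority_nbrs)]
            lam = rng.random()         # random factor in [0, 1)
            synthetic.append(X_min[i] + (x_z - X_min[i]) * lam)

    S = np.array(synthetic).reshape(-1, X.shape[1])
    return S, np.full(len(S), minority_label)
```

Because of the rounding in step (4), the number of rows returned can differ slightly from the total computed in step (1). The sketch also assumes that `k` is smaller than the number of training examples.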
III. Demos & Source Code:
Demo 1: Given the training data with class labels, generate synthetic minority class data.
Function: [AdaSYNData, AdaSYNLabel] = ADASYN(TrainingData, TrainingLabel, beta, kNN)
This function returns the synthetic minority class data generated by our ADASYN approach for a specified balance level beta. TrainingData is an $m \times n$ matrix, where $m$ is the total number of training examples and $n$ is the feature dimension. TrainingLabel is the class label vector for TrainingData. kNN is an integer giving the number of nearest neighbors to consider.
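Assuming the two returned arrays hold only the synthetic examples and their labels, they would normally be appended to the original training set before a classifier is trained. The NumPy sketch below illustrates that row-per-example data layout. It is written in Python for illustration and is not part of the package; `merge_with_synthetic` is a hypothetical helper name.

```python
import numpy as np


def merge_with_synthetic(training_data, training_label, synth_data, synth_label):
    """Stack synthetic examples under the original ones (hypothetical helper)."""
    training_data = np.atleast_2d(np.asarray(training_data, dtype=float))
    synth_data = np.atleast_2d(np.asarray(synth_data, dtype=float))
    # Both matrices must have one row per example and the same feature dimension.
    if synth_data.shape[1] != training_data.shape[1]:
        raise ValueError("feature dimensions differ")
    data = np.vstack([training_data, synth_data])
    labels = np.concatenate(
        [np.asarray(training_label).ravel(), np.asarray(synth_label).ravel()]
    )
    return data, labels
```

The merged matrix keeps the original examples first, so the synthetic rows can still be separated out later by index.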
Source Code: You can download the source code from here
The software package and examples provided here accompany the paper below. If you are considering using this algorithm in your research or work, please cite it:
H. He, Y. Bai, E. A. Garcia, and S. Li, "ADASYN: Adaptive synthetic sampling approach for imbalanced learning," in Proc. IEEE Int. Joint Conf. Neural Networks (IJCNN'08), pp. 1322-1328, 2008.