Keywords: Feature analysis, K-Nearest neighbor, Random Forest, Machine learning
Abstract
The performance of a machine learning algorithm depends not only on the algorithm’s complexity but also on the feature representation of the data. A standard feature representation method relies on expert prior knowledge, transforming the data into a discrete representation in which each attribute encodes information deemed vital for the task. In practice, however, prior knowledge and the model’s actual performance often conflict, so identifying the features that genuinely improve a model is essential in machine learning, data mining and statistical modelling. Specifically, feature selection aims to identify and remove irrelevant or redundant features in the data while retaining those that contribute most to the model’s predictive performance. By removing irrelevant or noisy features, feature analysis can improve a model’s accuracy, generalization ability and robustness. This paper examines the influence of feature selection on the performance of the K-nearest neighbour (KNN) and random forest (RF) algorithms. The change in model accuracy is observed and analysed as the features in the dataset are deleted one by one. The resulting accuracy changes are then compared with the feature importance ranking provided by the random forest algorithm to reveal their correlation and differences in feature selection. The experimental results show that although the two methods differ in how they evaluate features, both can effectively guide feature optimization and improve model performance.
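The leave-one-feature-out procedure described above can be sketched as follows. This is a minimal illustration, not the paper’s actual code: it uses scikit-learn with a synthetic dataset (the paper’s dataset, model hyperparameters, and evaluation protocol are not specified in the abstract), measuring the cross-validated accuracy drop when each feature is removed and comparing it with the random forest’s built-in impurity-based importances.

```python
# Hypothetical sketch of a leave-one-feature-out ablation study.
# A synthetic dataset stands in for the paper's (unspecified) data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           n_redundant=2, random_state=0)

def ablation_drops(model, X, y, cv=5):
    """Accuracy drop (baseline minus ablated score) when each feature
    is deleted in turn; a larger positive drop means the feature helps."""
    baseline = cross_val_score(model, X, y, cv=cv).mean()
    drops = []
    for j in range(X.shape[1]):
        X_minus = np.delete(X, j, axis=1)          # remove feature j
        score = cross_val_score(model, X_minus, y, cv=cv).mean()
        drops.append(baseline - score)
    return baseline, drops

# Ablation ranking for both models.
knn_base, knn_drops = ablation_drops(KNeighborsClassifier(n_neighbors=5), X, y)
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf_base, rf_drops = ablation_drops(rf, X, y)

# The random forest's built-in importance ranking, for comparison
# against the ablation-based ranking.
rf.fit(X, y)
importances = rf.feature_importances_
```

Ranking features by `knn_drops`/`rf_drops` versus by `importances` then allows the two feature-evaluation views to be correlated, as the abstract describes.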