Implementation of C4.5 and K-Nearest Neighbor to Predict Palm Oil Fruit Production on Local Plantations

Abstract: The agricultural sector, especially oil palm plantations, plays an important role in the Indonesian economy. Oil palm farming has increased the welfare of farmers and provided employment. In this study, the C4.5 and K-Nearest Neighbor (KNN) algorithms were applied in the RapidMiner application to analyze oil palm fruit production data after data preprocessing. The C4.5 algorithm produced a decision tree with an accuracy of 94.12%, while the KNN algorithm achieved an accuracy of 82.53%. Based on these results, the C4.5 algorithm classified oil palm fruit production more accurately than KNN using the existing attributes.


Introduction
The agricultural sector is one of the elements given top priority in development activities in Indonesia (Widyasari et al., 2020; Sirait et al., 2019; Fadhil et al., 2021). This is because Indonesia is an agricultural country, which means that agriculture plays an important role in the national economy as a whole (Handayani et al., 2022; Susanti et al., 2018). The role of the agricultural sector in the economy of a country or region can be seen from several aspects: its contribution to the Gross Domestic Product (GDP) or the Gross Regional Domestic Product (GRDP) (Nuraini et al., 2022; Muskitta et al., 2023); its contribution to employment opportunities; its ability to provide a variety of food, which greatly affects consumption patterns and community nutrition; its ability to support the development of upstream and downstream industries (Diffenbaugh et al., 2019); and its exports of agricultural products, which contribute foreign exchange to the country (Diharja et al., 2022).
The agricultural sector is highly strategic: it forms the basis of the people's economy in rural areas, supports the livelihoods of the majority of the population, absorbs more than half of the total workforce, and served as a support during the Indonesian economic crisis (Mustika et al., 2022; Saygılı et al., 2022).
Oil palm is an industrial crop with many uses (Akram et al., 2022; Hutapea et al., 2023). It supplies raw material for cooking oil, industrial oil, margarine, soap, cosmetics, and the pharmaceutical industry. The most popular product, processed palm oil, comes from the fruit of the oil palm (Maluin et al., 2020; Tang et al., 2020). The fleshy part of the fruit yields crude oil, which is then processed into raw material for cooking oil. The processing residue is used as animal feed and as a mixed ingredient that is fermented into compost (Mehraban et al., 2022; Muhsi et al., 2023; Tsujino et al., 2016).
Given these uses, oil palm plantations can generate large profits, and many forests and other plantations have been converted into oil palm plantations. In Indonesia, oil palm is grown on the islands of Sumatra, Kalimantan, Java, Sulawesi, and Papua. East Kotawaringin Regency is the site of the largest oil palm plantation area in Indonesia, owned not only by large companies but also by local farmers (Wahyuningsih et al., 2023; Zannah et al., 2023). Managing oil palm is expected to help regional farmers improve their standard of living, which depends heavily on the income they earn. In reality, however, some of them still have low incomes, which affects their daily lives. Apart from oil and natural gas, the large plantation sub-sector, and oil palm in particular, plays a very important role in Indonesia's economic development. The growth of palm oil products is in line with developments in technology and in the food and non-food industries (Bhuyar et al., 2021).
Palm oil is an important and strategic commodity in East Kotawaringin Regency because of its significant role in driving the people's economy, especially for local plantation farmers. Its status as the leading crop of the East Kotawaringin community is reasonable because the area is suitable for plantation agriculture and has the potential for its development (Findawati et al., 2019). The plantation business remains an alternative way to improve the family economy in East Kotawaringin, so public interest in plantation development is still high.
Farming is a science that studies how to use resources efficiently and effectively in agricultural business to produce maximum results. These resources consist of land, labor, capital, and management. The success of a farming business can be seen from the amount of income earned by farmers in managing their business. Income is calculated as the difference between the value of receipts and the costs incurred in the farming process. Analysis of farm income requires two main components, namely revenue and expenditure during a specified period (Susilawati et al., 2022).
Oil palm farming has increased the welfare of rural farmers. Apart from that, according to Afifuddin, the development of the oil palm sub-sector has provided considerable employment opportunities and has become a source of income for farmers. Palm oil is also a commodity that plays an important role in increasing local revenue (PAD), Gross Domestic Product (GDP), and people's welfare. The market prospects for processed palm oil are promising because demand continues to increase from year to year, not only domestically but also abroad. As a tropical country with large tracts of land, Indonesia has a great opportunity to develop oil palm plantations, both through foreign investment and through smallholder plantations (Nainggolan et al., 2021; Ramadhana et al., 2021).
This research determines the amount of oil palm production, which affects the quality of palm oil production, by applying the decision tree algorithm to "good" and "bad" criteria. The algorithms used are C4.5 and K-Nearest Neighbor, which classify the dataset using the variables land area (ha), temperature (°C), production (tons), and rainfall. The grouping results are obtained, and the accuracy of those results is used for evaluation (Brandão et al., 2021).
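To make the K-Nearest Neighbor idea concrete, the following is a minimal sketch of majority-vote classification by Euclidean distance. It is not the RapidMiner operator used in this study. The function name `knn_predict`, the value `k=3`, and the two-feature data points are hypothetical and chosen only for illustration; the features are assumed to be already scaled to the range 0 to 1 so that no single variable dominates the distance.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Sort the (point, label) pairs by distance to the query and keep the k closest.
    neighbors = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )[:k]
    # Majority vote over the labels of the selected neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical, already-normalized features (e.g. land area, rainfall).
train_points = [
    (0.10, 0.20), (0.15, 0.25), (0.20, 0.10),
    (0.80, 0.90), (0.85, 0.80), (0.90, 0.95),
]
train_labels = ["low", "low", "low", "high", "high", "high"]

prediction_a = knn_predict(train_points, train_labels, (0.12, 0.22))
prediction_b = knn_predict(train_points, train_labels, (0.82, 0.88))
print(prediction_a, prediction_b)
```

The choice of k and of the distance measure both affect the result, so in practice they would be tuned on the actual production data rather than fixed as above.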
Previous research provides information and serves as a reference for the research being carried out. One such study compared the C4.5, KNN, and Naive Bayes algorithms for determining the classification model for the person in charge of the BSI Entrepreneur Center in 2018. That study used primary data of 300 records consisting of 12 attributes and applied the C4.5, KNN, and Naive Bayes algorithms to classify employees according to existing criteria (Fajri et al., 2022; Findawati et al., 2019; Kurnia et al., 2021; Santoso et al., 2021).
In previous work on determining bad loans, the C4.5 algorithm achieved better accuracy than the KNN algorithm, 61.64% against 45.21%, so it was concluded that C4.5 is more accurate for that task (Findawati et al., 2019; Mailana et al., 2021). Elsewhere, the confusion matrix results show that Naive Bayes reached an accuracy of 100.00% and an AUC of 1.000, higher than C4.5 and KNN, so the Naive Bayes algorithm performed better than KNN and C4.5 (Shandley et al., 2018).
Based on the development of oil palm plantations and the increase in oil palm production in East Kotawaringin described above, this research examines the implementation of C4.5 and K-Nearest Neighbor to predict oil palm fruit production on local plantations in East Kotawaringin.

Method
In implementing C4.5 and K-Nearest Neighbor to predict oil palm fruit production on local plantations in East Kotawaringin, a set of research stages was used to give direction in addressing the research problem (Ananda et al., 2023). The research was carried out in East Kotawaringin district. The location was chosen with regard to the accuracy of the data concerning local farmers in East Kotawaringin district. The research was conducted from April 2023 to June 2023. The population in this study was palm oil production in each district in Central Kalimantan. The sample was the oil palm production of several districts in Central Kalimantan: West Kotawaringin, East Kotawaringin, Kapuas, South Barito, North Barito, Sukamara, Lamandau, Seruyan, Katingan, Pulang Pisau, Gunung Mas, East Barito, Murung Raya, and Palangka Raya. The research design is shown in Figure 1.
The selection of an appropriate data analysis technique depends on the purpose of the analysis and the type of data available. A combination of several analytical techniques can also be applied to gain a more comprehensive understanding of oil palm fruit production. One technique that can be used is analysis of variance, which helps to understand the factors that influence variation in oil palm fruit production. Production data are collected along with other variables such as rainfall, temperature, altitude, or soil type. Analysis of variance or multivariate analysis can then be used to determine whether each variable has a significant effect on oil palm fruit production.
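As an illustration of the analysis of variance mentioned above, the following minimal sketch computes the one-way ANOVA F statistic, the ratio of between-group to within-group mean squares. The function name `one_way_anova_f` and the production figures grouped by a hypothetical rainfall category are assumptions made for the example, not data from this study.

```python
def one_way_anova_f(groups):
    """Compute the one-way ANOVA F statistic for a list of numeric groups."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical production values grouped by rainfall category.
production_by_rainfall = [
    [10.0, 12.0, 14.0],   # low rainfall
    [20.0, 22.0, 24.0],   # medium rainfall
    [30.0, 32.0, 34.0],   # high rainfall
]
f_statistic = one_way_anova_f(production_by_rainfall)
print(f_statistic)
```

A large F value suggests that mean production differs between the groups. Judging significance additionally requires comparing F against the F distribution with k - 1 and n - k degrees of freedom.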

Results
This stage consists of dataset preparation, followed by determining the root, determining the rules, and finally producing the classification results. The data presented below were obtained from analysis and information searches on the official website.
Preliminary data collection ensures that land area is measured accurately in hectares (ha) for each entity or unit. Land area values are divided into three categories. First, land area from 0 to 250 hectares is categorized as "small". Second, land area from 251 to 1000 hectares is categorized as "moderate". Finally, land area exceeding 1000 hectares is categorized as "large".
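The land area thresholds above can be sketched as a small labeling function. This is an illustration only, not the preprocessing performed in RapidMiner. The function name `categorize_land_area` and the sample values are hypothetical, the third label is written as `large` as an assumption about the intended category name, and values between 250 and 251 hectares, which the stated ranges do not cover, are assumed to fall into "moderate".

```python
def categorize_land_area(hectares: float) -> str:
    """Map a land area in hectares to one of three category labels."""
    if hectares < 0:
        raise ValueError("land area cannot be negative")
    if hectares <= 250:
        return "small"
    if hectares <= 1000:
        # Assumption: values between 250 and 251 ha are treated as moderate.
        return "moderate"
    # Assumption: the third category is named "large".
    return "large"


# Hypothetical land areas in hectares, for illustration only.
sample_areas = [120.0, 251.0, 1000.0, 4800.5]
land_area_labels = [categorize_land_area(area) for area in sample_areas]
print(land_area_labels)
```

The same pattern, with different thresholds and labels, applies to the other numeric attributes that are converted into categories.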
Labeling: each data point that represents a land area is assigned a category label according to its value within the specified range. The category label improves the organization and understanding of the land area attribute in the dataset. In this preprocessing procedure, the land area attribute is converted into a structured format that simplifies its analysis and interpretation and gives a clearer picture of the size distribution of land area.
Temperature: temperature data points are collected from the dataset, ensuring accurate measurements on the appropriate temperature scale, and the values are divided into three categories. Temperature values below 23.6 are categorized as "low". Temperature values between 23.0 and 27.0 are categorized as "medium". The attribute is thereby converted into a structured format that facilitates analysis and interpretation and contributes to a clearer understanding of temperature variation and its implications.
Production quantity: preprocessing of the production quantity attribute involves collecting production quantity data from the dataset, ensuring accurate measurements in appropriate units (such as hectares). The values are divided into three categories based on defined thresholds. Production quantities from 0 to 500 acres are categorized as "small". Production quantities from 500 to 1000 acres are categorized as "medium". Larger production quantities are categorized as "many". Each production quantity data point is assigned a category label based on its value within the defined range. The attribute is transformed into a structured format that simplifies its analysis and interpretation and contributes to a clearer understanding of production quantity variation and its significance.
Production quantity data are also collected from the dataset in appropriate units and divided into three categories based on defined thresholds. Production numbers below 20.7 are categorized as "low". Production numbers between 20.7 and 40 are categorized as "medium". Production quantities over 40 are categorized as "high". Each production quantity data point is assigned a category label based on its value within the specified range.
The inclusion of category labels improves the organization and clarity of the production quantity attribute in the dataset. It gives a clear picture of production levels and helps in understanding the distribution of production quantities. Such insights can be valuable in a variety of fields, including agriculture, manufacturing, and resource management. At this stage, the C4.5 algorithm is applied in the RapidMiner application to the data that have undergone preprocessing, or data cleaning. The proposed process model based on the oil palm fruit production dataset can be seen in the C4.5 decision tree in Figure 4. Rules can then be drawn from the decision tree that has been built.
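C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by split information. The following minimal sketch shows how a root attribute could be selected on categorical data. It is not RapidMiner's implementation and omits other parts of C4.5 such as continuous attributes, missing values, and pruning. The function names and the six records are hypothetical and serve only as an illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of splitting rows on attribute with respect to target."""
    total = len(rows)
    # Group the target labels by the value of the candidate attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Weighted entropy remaining after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    information_gain = entropy([row[target] for row in rows]) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy([row[attribute] for row in rows])
    return information_gain / split_info if split_info > 0 else 0.0


# Hypothetical preprocessed records, for illustration only.
rows = [
    {"land_area": "small", "temperature": "low", "production": "low"},
    {"land_area": "small", "temperature": "medium", "production": "low"},
    {"land_area": "moderate", "temperature": "low", "production": "medium"},
    {"land_area": "moderate", "temperature": "medium", "production": "medium"},
    {"land_area": "large", "temperature": "low", "production": "high"},
    {"land_area": "large", "temperature": "medium", "production": "high"},
]
candidates = ["land_area", "temperature"]
root = max(candidates, key=lambda a: gain_ratio(rows, a, "production"))
print(root)
```

In this toy dataset the land area attribute separates the production classes perfectly, so it is selected as the root; each branch would then be split again in the same way until the leaves are pure or no attributes remain.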

Figure 1. Research Stages Design

The data collection technique used is documentation, namely collecting, reviewing, and selecting existing data from the Central Bureau of Statistics in East Kotawaringin Regency. The data are adjusted to the variables used. The variables in this study are production as the dependent variable, and land area and number of farmers as independent variables. The details of the research variables are as follows:

Figure 2. Table of the RapidMiner Dataset

Table 1. Research Variables

Table 2. Data Sets

Table 3. Land Area Attribute Preprocessing

Table 5. Production Attribute Preprocessing

Table 6. Preprocessing of Rainfall Attributes

Table 7. Preprocessing Results Data
