
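CCropLand30: High-resolution hybrid cropland maps of China created through the synergy of state-of-the-art remote sensing products and the latest national land survey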

张玲 , 王伟国 , 齐敏 , 应怡 , 惠 , 赵彦博
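Key Laboratory of Remote Sensing of Gansu Province, Heihe Remote Sensing Experimental Research Station, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou 730000, China
School of Geographical Sciences, Shanxi Normal University, Taiyuan 030000, China
College of Resources and Environment, Chengdu University of Information Technology, Chengdu 610225, China
School of Marine Technology and Geomatics, Jiangsu Ocean University, Jiangsu 222005, China
University of Chinese Academy of Sciences, Beijing 100049, China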

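ARTICLE INFO

Keywords:
Hybrid cropland map
Land use and land cover products
Data fusion
National land survey
Spatial pattern efficiency
Food security
China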

ABSTRACT

Accurate information on the extent and dynamics of croplands is crucial to address the twin challenges of meeting growing food needs while reducing the environmental footprint. Despite the availability of numerous global and regional cropland maps, significant uncertainties and discrepancies persist in terms of overall area and spatial distribution. In this study, we developed high-resolution hybrid cropland maps of China (CCropLand30) by integrating state-of-the-art remote sensing land use and land cover products (i.e., GlobeLand30, GLAD, CLUD, CLCD, and CACD) with the latest national land survey (NLDS). To this end, we proposed a cost-effective data-fusion approach, namely the majority voting-fuzzy agreement score (MV-FAS) method. The accuracy of CCropLand30 and the input cropland maps was evaluated using both visually interpreted and third-party samples, along with county-level NLDS data. Assessed with overall accuracy, Kappa coefficient, and F1-score, CCropLand30 attains an F1-score of 0.86 for the year circa 2020. It demonstrates better agreement with the reference points than the input cropland maps, with an enhancement in overall accuracy ranging from 10 % to 30 %. Moreover, CCropLand30 shows superior spatial consistency with the NLDS data, raising the spatial pattern efficiency from an average of 0.64 for the input maps to 0.80. The superiority of CCropLand30 was further confirmed by regional visual comparison of the cropland maps. The CCropLand30 product reveals a clear decreasing trend in China's cropland area from 2000 to 2020. The cropland area decreased significantly in relatively water-abundant regions but expanded notably in arid and semi-arid regions with scarce water resources, raising concerns about the spatial mismatch between water and land resources in China. Our high-resolution hybrid cropland product, CCropLand30, will provide substantial support for cropland monitoring and management, as well as research in diverse fields.

1. Introduction

Food security is a global concern due to population growth, shifting dietary patterns, climate change, and international instability (e.g., the conflict between Russia and Ukraine) (Fritz et al., 2013; Sloat et al., 2020; Zhu et al., 2022). While agricultural expansion and intensification are expedient ways to address food security, they also leave notable environmental footprints and cause adverse effects such as biodiversity loss, land degradation, and water depletion (Potapov et al., 2021; Meng et al., 2023; Xie et al., 2023). To tackle the twin challenges of the twenty-first century, namely meeting humanity's growing demand for food while reducing the environmental impacts of agriculture (Foley et al., 2011; Jägermeyr et al., 2017), it is crucial to have accurate and reliable information on the extent and dynamics of cropland (Waldner et al., 2015; Yang and Huang, 2021).

Satellite observations offer a spatially explicit and cost-effective opportunity for long-term, large-scale cropland mapping (Potapov et al., 2021). Over the past two decades, a large number of global and regional land use/land cover (LULC) products have been developed using medium to coarse resolution remote sensing data (Bicheron et al., 2008; Pittman et al., 2010; Tateishi et al., 2011; Defourny et al., 2012; Teluguntla et al., 2018). However, most remote sensing products do not prioritize cropland mapping and have to balance various LULC classes in one mapping approach (Teluguntla et al., 2018; Van Tricht et al., 2023), which may degrade the accuracy of the resulting cropland maps (Yang et al., 2017). Moreover, such data cannot resolve small field patches because of their medium to coarse spatial resolution (Fritz et al., 2015; Tu et al., 2023).

https://doi.org/10.1016/j.compag.2024.108672

Received 2 October 2023; Received in revised form 11 December 2023; Accepted 23 January 2024

Available online 3 February 2024

0168-1699/© 2024 Elsevier B.V. All rights reserved.

In recent years, the increasing availability of high-resolution remote sensing data (e.g., Landsat and Sentinel-2) and cloud computing platforms has facilitated extensive efforts to create fine-resolution LULC products (Gong et al., 2013; Chen et al., 2015; Yang and Huang, 2021; Zhang et al., 2021b). Meanwhile, many high-resolution thematic cropland products have emerged (Teluguntla et al., 2018; Potapov et al., 2021; Tu et al., 2023). Despite these advancements, considerable uncertainties and inconsistencies persist when cropland maps are compared spatially (Xiong et al., 2017; Becker-Reshef et al., 2019; Zhang et al., 2022a; Tubiello et al., 2023), both in terms of overall area and spatial distribution. These discrepancies arise from the diversity and complexity of agricultural systems, differences in cropland definitions, classification techniques, and source data, and the spectral similarity between cropland and grassland (Xiong et al., 2017; Gao et al., 2020; Lu et al., 2020). Consequently, selecting a suitable cropland map for a specific application poses significant challenges (Li et al., 2023a), potentially bringing large uncertainties to the findings of downstream studies.

To reconcile inconsistencies in cropland maps and enhance cropland mapping accuracy, an effective method is to create hybrid products through data fusion or synergy (Fritz et al., 2013; Fritz et al., 2015). Data synergy leverages complementary information from different data sources while mitigating their respective weaknesses (See et al., 2015; Teluguntla et al., 2018). Extensive efforts have demonstrated the effectiveness of data-synergy techniques in developing hybrid LULC products that are more reliable and accurate than the input maps (Ran et al., 2012; Yu et al., 2013; Kinoshita et al., 2014; Schepaschenko et al., 2015; See et al., 2015; Li et al., 2023a). Nonetheless, most data-synergy methods require considerable reference data for model training and validation (Schepaschenko et al., 2015), and the fusion results can be sensitive to the size and representativeness of the training samples (Zhong et al., 2019). Although crowdsourcing platforms such as GeoWiki and Collect Earth are available for sample collection, ensuring the quality and consistency of the samples is challenging because of the varying domain knowledge of the contributors (See et al., 2015; Bey et al., 2016).

One potential approach to address the issue of training sample availability is to create hybrid cropland maps by integrating multiple satellite-derived cropland products with agricultural statistical and census data (Ramankutty et al., 2008; Fritz et al., 2011; Yu and Lu, 2018; Lu et al., 2020). This integration typically involves generating probabilistic cropland maps initially using cropland products derived from remote sensing data, and subsequently constraining the cropland extent using census or statistical data. The data synergy approach implicitly assumes that the remote sensing-derived cropland area is consistent with agricultural statistical and census data. However, there could be significant systematic biases between agricultural census and remote sensing products due to differences in data sources, cropland definitions, and measurement techniques. Consequently, the constraint of cropland extent directly by using census or statistical data may deteriorate the mapping accuracy (Schepaschenko et al., 2015). Furthermore, remote sensing cropland products may not be entirely independent from each other due to the adoption of automatic sample extraction from existing LULC products (Yang and Huang, 2021; Tu et al., 2023). The close linkage among the candidate maps may lead to unreliable probability maps of cropland, thereby influencing the accuracy of hybrid products. In addition, despite the growing availability of high-resolution LULC products, their integration with agricultural census or statistical data to create more precise hybrid cropland maps has been seldom carried out (Yang et al., 2020).

Reliable and high-resolution cropland maps are of utmost importance to China, as it sustains a large share of the world's population with only 8 % of the global cropland. Over the past decades, China has experienced substantial and complicated changes in its cropland due to rapid socioeconomic development (Yang and Huang, 2021; Chen et al., 2022b; Tu et al., 2023). Given China's increasing food linkages with the rest of the world, accurate spatial information on cropland is crucial not only for agricultural monitoring within China, but also for assessing global food security. Furthermore, an overwhelming proportion of Chinese farms are small and fragmented, with the average crop field size being less than a hectare, necessitating the use of high-resolution cropland products for precise mapping of such fields (Teluguntla et al., 2018).

In this study, we proposed an innovative approach to create high-resolution hybrid cropland maps of China (CCropLand30) at a five-year interval from 2000 to 2020 (Zhang et al., 2023). The major contributions of this study are threefold. First, we introduced a bias-insensitive spatial pattern matching metric to evaluate and rank the accuracy of satellite-derived cropland maps. Second, a majority voting-fuzzy agreement score (MV-FAS) method was proposed to fuse multiple remote sensing cropland maps with cropland survey data. Third, we created high-resolution and reliable hybrid cropland maps for China by synergistically combining state-of-the-art LULC products with China's latest national land survey (NLDS). The latest NLDS provides the most reliable information on cropland in China and, to the best of our knowledge, was used here for the first time to create hybrid cropland maps. Our newly developed hybrid cropland maps will greatly support cropland monitoring and management, as well as research in diverse fields such as agriculture, water resources, and climate change.

2. Materials and methods

The workflow of this study is illustrated in Fig. 1. Our methodology began with the generation of a cropland agreement map by combining the cropland binary maps derived from the selected remote sensing LULC products. Next, the NLDS-reported cropland area was adjusted by incorporating the field ridge area estimates from a random forest model. The purpose of this adjustment was to improve the consistency between the cropland area estimates of the NLDS and the LULC products. We then proceeded with a spatial-pattern-oriented assessment and ranking of the candidate cropland maps by using a bias-insensitive and multicomponent spatial pattern matching metric (i.e., SPAEF). Subsequently, the hybrid cropland maps were created by integrating remote sensing cropland maps with the NLDS data through a novel MV-FAS approach. In the fifth step, we evaluated the accuracy of the hybrid cropland maps and the input maps using both visually interpreted and third-party reference points. Finally, we investigated the spatiotemporal dynamics of croplands in China during the 2000-2020 period using the newly developed cropland maps.

2.1. Datasets

2.1.1. State-of-the-art remote sensing LULC products

As listed in Table 1, we selected five state-of-the-art remote sensing LULC products to create hybrid cropland maps of China. These products include GlobeLand30 (Jun et al., 2014; Chen et al., 2015), the global cropland extent and change dataset (GLAD) (Potapov et al., 2021), China's Land-use/cover data set (CLUD) (Liu et al., 2014; Xu et al., 2018), China's annual land cover dataset (CLCD) (Yang and Huang, 2021), and China's annual cropland dataset (CACD) (Tu et al., 2023). Our selection was mainly based on the following criteria: (1) the products should have been developed within the last decade; (2) the products should have a spatial resolution of about 30 m; and (3) the products should cover multiple time periods spanning from 2000 to 2020. More details about these products are provided in the Supplementary Texts.

Fig. 1. Workflow of this study. Blue numbers denote the major steps involved in this study, while green boxes indicate the analysis tools and processes. The abbreviations LULC, SPAEF, NLDS, FAR, MV, and FAS correspond to land use and land cover, spatial pattern efficiency, national land survey, field ridge area, majority voting, and fuzzy agreement score, respectively.

Table 1
Basic information of the selected land use/land cover (LULC) products and China's national land survey (NLDS).

| Product/survey | Spatial coverage | Temporal coverage/survey period | Mapping method | Cropland definition | Reference |
|---|---|---|---|---|---|
| GlobeLand30 | Global | 2000, 2010, 2020 | Pixel-object-knowledge-based approach | Land used for cultivating crops, including paddy fields, irrigated dryland, rainfed dryland, and land primarily used for growing crops interspersed with fruit trees and other economically valuable trees. Cultivated pasture and shrub-based cash crops (e.g., tea, coffee) are also included. | Chen et al. (2015) |
| GLAD | Global | 2003, 2007, 2011, 2015, 2019 | Random forest | Land used for annual and perennial herbaceous crops for human consumption, forage (including hay), and biofuel. Perennial woody crops, permanent pastures, and shifting cultivation are excluded. | Potapov et al. (2021) |
| CLUD | China | 1980s to 2020 at different intervals | Human-computer interactive interpretation | Land used for cultivating crops, including fallow land and crop rotation fields, land primarily used for growing crops interspersed with fruit trees and other economically valuable trees, and cultivated river banks and coastal areas. | Liu et al. (2014) |
| CLCD | China | 1990 to 2021, annually | Random forest | Similar to CLUD | Yang and Huang (2021) |
| CACD | China | 1986 to 2021, annually | Trajectory-based approach | Piece of land of at least 0.09 ha (minimum width of 30 m) that is sowed/planted and harvestable at least once within the 12 months after the sowing or planting date. | Tu et al. (2023) |
| NLDS | China | 1984-1996, | Image interpretation and land survey | Similar to CLUD | https://www.mnr.gov.cn/ |
While land used for cultivating annual crops is consistently categorized as cropland, there exist discrepancies in cropland definitions across the selected LULC products, as outlined in Table 1. For instance, GlobeLand30 designates cultivated pasture and shrub-based cash crops (e.g., tea and coffee) as cropland, whereas other LULC products exclude them. Cropland in CLUD and CLCD encompasses fallow land, which is excluded by GlobeLand30, GLAD, and CACD. Differences in cropland definitions across products might be an important driver of uncertainties in estimates of cropland area and spatial distribution (Tubiello et al., 2023). In this study, cropland is defined as land used for cultivating crops, including paddy fields, irrigated dryland, rainfed dryland, fallow land and crop rotation fields, land primarily used for growing crops interspersed with fruit trees and other economically valuable trees, as well as cultivated river banks and coastal areas. This definition is consistent with CLUD, CLCD, and the national land survey (see next section).

2.1.2. China's national land survey

Land survey, while time-consuming and costly, plays a crucial role in obtaining accurate and reliable information on land resources (Chen et al., 2022b). To date, China has conducted three national land surveys (NLDS), providing a comprehensive and systematic census of China's land resources and utilization status over different time frames (Zhou et al., 2023). Notably, the third NLDS provides the most reliable results among the three surveys, owing to the use of advanced survey techniques along with a more detailed survey process (Supplementary Texts). However, for an extended period, the land survey maps and results were not made publicly available because of national security concerns. Fortunately, the Ministry of Natural Resources of the People's Republic of China has recently released the county-level survey results of the second and third NLDS, detailing the areas of different land use types (https://www.mnr.gov.cn/). To the best of our knowledge, this study is the first effort to evaluate and develop LULC products by utilizing the third NLDS data.

2.2. Cropland agreement map

The cropland agreement map was generated by combining five cropland binary maps derived from the selected LULC products. Initially, the selected products were unified to geographic coordinates using the WGS84 reference system, with a pixel size equivalent to approximately 30 m at the Equator. Subsequently, the multiple-class LULC products (including GlobeLand30, CLUD, and CLCD) were reclassified into binary cropland maps, with values of 1 indicating cropland and 0 indicating non-cropland. Finally, the five binary cropland maps were summed to produce a cropland agreement map with positive pixel values ranging from 1 to 5. These values represent the number of products that identified the pixel as cropland and indicate different levels of spatial agreement (SA) in cropland identification by the candidate cropland maps (Tubiello et al., 2023).
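The reclassify-and-sum procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' processing chain: the rasters are tiny nested lists, the function names are hypothetical, and the class code passed to `reclassify` is a placeholder, not the actual legend value of any product.

```python
def reclassify(lulc, cropland_codes):
    """Turn a multi-class LULC grid into a binary cropland grid (1 = cropland)."""
    return [[1 if value in cropland_codes else 0 for value in row] for row in lulc]


def agreement_map(binary_maps):
    """Sum co-registered binary grids; each pixel counts the maps voting cropland."""
    rows, cols = len(binary_maps[0]), len(binary_maps[0][0])
    return [[sum(m[r][c] for m in binary_maps) for c in range(cols)]
            for r in range(rows)]


# Hypothetical 2 x 3 example: one multi-class product plus four binary products.
multi_class = [[10, 20, 10], [30, 10, 20]]  # code 10 stands in for cropland
maps = [
    reclassify(multi_class, {10}),
    [[1, 0, 0], [0, 1, 0]],
    [[1, 1, 0], [0, 1, 0]],
    [[1, 0, 0], [0, 1, 1]],
    [[1, 0, 1], [0, 0, 0]],
]
sa = agreement_map(maps)
print(sa)
```

A pixel value of 5 means all five maps agree on cropland, while a value of 0 means none of them does; the intermediate values mark the contested pixels on which the fusion step has to decide.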

2.3. Field ridge area estimation

Despite the "what you see is what you get" survey method of the third NLDS, there are still significant systematic biases between its cropland area estimates and those derived from remote sensing LULC products. The NLDS-estimated cropland area represents the net cropland area, excluding areas such as field ridges, and linear and scattered features like roads, ponds, and houses (Supplementary Figure S1). In contrast, the cropland area derived from remote sensing LULC products represents the gross cropland area, including subpixel non-cropland areas. As a result, the remote sensing-derived cropland area typically exhibits an obvious positive systematic bias compared to the NLDS reports (Zhang et al., 2021a; Zhang et al., 2022b). To enhance the consistency between the NLDS and remote sensing data, we adjusted the original cropland area of the NLDS by including the field ridge area.

The county-level field ridge area was not reported by the third NLDS, but the prefecture-level data was provided by the second NLDS. In this study, we employed a machine learning model (i.e., the random forest algorithm) to estimate the county-level field ridge area in the third NLDS. We first built a random forest model for prefecture-level field ridge coefficients (i.e., the ratio of field ridge area to total cropland area) using elevation, slope, dryland ratio (i.e., the ratio of dryland area to total cropland area), and administrative identity as the explanatory factors, as in Eq. (1).

$$FRC_i = f_{RF}\left(ELE_i, SLP_i, DR_i, AID_i\right) \tag{1}$$

where $FRC$, $ELE$, $SLP$, $DR$, and $AID$ represent the field ridge coefficient, elevation, slope, dryland ratio, and administrative identity, respectively; and $i$ is the prefecture. The RF algorithm was implemented using the MATLAB TreeBagger function, with the hyperparameters (including the number of trees and the minimum number of observations per node) determined through a trial-and-error process. We evaluated the performance of the random forest model via five-fold cross-validation. The model achieved a high coefficient of determination (0.79), indicating favorable performance (Supplementary Figure S2). Next, we used the model to estimate county-level field ridge coefficients by incorporating county-level driving variables. The original estimates were bias-corrected using Eq. (2), assuming that prefecture-level field ridge coefficients remain consistent between the second and third NLDS.

In Eq. (2), the bias-corrected and originally estimated field ridge coefficients refer to a given county within prefecture i; the remaining terms are the cropland area reported by the third NLDS, the field ridge coefficient reported by the second NLDS, and the number of counties in prefecture i. Finally, we adjusted the county-level cropland area of the third NLDS, as outlined in Eq. (3), which yields the adjusted cropland area for each county of prefecture i. The adjusted cropland area of the third NLDS is hereafter referred to as the NLDS-estimated cropland area.
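As a hedged illustration of the bias correction and area adjustment described above, one form that is consistent with the stated definitions is a multiplicative rescaling of the county-level estimates so that their area-weighted mean matches the prefecture-level coefficient of the second NLDS. The symbols are assumptions introduced only for this sketch ($\hat{c}_{i,j}$ and $\tilde{c}_{i,j}$ for the originally estimated and bias-corrected coefficients in county $j$ of prefecture $i$, $A_{i,j}$ for the third-NLDS cropland area, $c_i$ for the second-NLDS prefecture-level coefficient, and $n_i$ for the number of counties), and the published Eqs. (2) and (3) may take a different form.

```latex
\tilde{c}_{i,j} = \hat{c}_{i,j}\,
  \frac{c_i \sum_{k=1}^{n_i} A_{i,k}}{\sum_{k=1}^{n_i} \hat{c}_{i,k}\,A_{i,k}},
\qquad
A^{\mathrm{adj}}_{i,j} = A_{i,j}\,\bigl(1 + \tilde{c}_{i,j}\bigr)
```

With this rescaling, the area-weighted mean of $\tilde{c}_{i,j}$ over the counties of prefecture $i$ equals $c_i$ by construction. The second expression further assumes that the coefficient is expressed relative to the net (ridge-free) cropland area reported by the survey; if it were defined relative to the gross area, the adjustment would instead be $A_{i,j} / (1 - \tilde{c}_{i,j})$.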

2.4. Spatial pattern efficiency metric

Despite the inclusion of the field ridge area, the surveyed cropland area still exhibits systematic biases relative to the estimates of remote sensing products. This is attributed to the difficulty in determining the area covered by linear and scattered features (e.g., roads, ponds, and houses) within the cropland grids of remote sensing products. To account for the remaining systematic biases, we introduced a multicomponent spatial pattern matching metric to assess and rank the accuracy of candidate cropland maps. This metric is known as Spatial Pattern Efficiency (SPAEF) (Demirel et al., 2018; Koch et al., 2018) and is formulated as Eq. (4). SPAEF simultaneously assesses the strength of the monotonic relationship and the relative variability between the cropland area estimates of the NLDS and the remote sensing LULC products, as well as their spatial matching degree at the county scale (Dembélé et al., 2020). Therefore, SPAEF is a bias-insensitive metric that emphasizes the spatial patterns of cropland rather than their magnitudes.

$$SPAEF = 1 - \sqrt{(\alpha - 1)^2 + (\beta - 1)^2 + (\gamma - 1)^2} \tag{4}$$

where $\alpha$ is the Pearson correlation coefficient between the cropland area estimates of the NLDS and the remote sensing products; $\beta$ is calculated as $\beta = \dfrac{\sigma_{RS}/\mu_{RS}}{\sigma_{NLDS}/\mu_{NLDS}}$, where $\sigma$ and $\mu$ indicate the standard deviation and mean values, respectively. $\beta$ denotes the variability of the remote sensing-derived cropland area relative to the estimates of the NLDS; $\gamma$ is the spatial pattern matching term calculated as $\gamma = 1 - \sqrt{\dfrac{1}{n}\sum_{k=1}^{n}\left(Z_{CA_{RS},k} - Z_{CA_{NLDS},k}\right)^2}$, where $n$ is the number of counties, $CA$ represents the cropland area, and $Z$ denotes standardized values (z-scores). The root-mean-square error in $\gamma$ is thus computed between the standardized remote sensing-derived cropland area and the standardized estimates of the NLDS.
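To make the three components concrete, the sketch below computes a SPAEF-style score for two lists of county-level cropland areas. It is an illustration under stated assumptions, not the authors' code: the function name `spaef` is hypothetical, population standard deviations are used, and the pattern-matching term is taken as one minus the RMSE of the z-scores, following the description above.

```python
import math


def spaef(rs_area, nlds_area):
    """SPAEF-style score between remote sensing and survey county areas."""
    n = len(rs_area)
    mean_rs = sum(rs_area) / n
    mean_ref = sum(nlds_area) / n
    std_rs = math.sqrt(sum((v - mean_rs) ** 2 for v in rs_area) / n)
    std_ref = math.sqrt(sum((v - mean_ref) ** 2 for v in nlds_area) / n)

    # Standardized values (z-scores) remove differences in mean and spread.
    z_rs = [(v - mean_rs) / std_rs for v in rs_area]
    z_ref = [(v - mean_ref) / std_ref for v in nlds_area]

    # alpha: Pearson correlation, written as the mean product of z-scores.
    alpha = sum(a * b for a, b in zip(z_rs, z_ref)) / n
    # beta: ratio of the coefficients of variation.
    beta = (std_rs / mean_rs) / (std_ref / mean_ref)
    # gamma: assumed here to be 1 minus the RMSE between the z-scores.
    gamma = 1.0 - math.sqrt(sum((a - b) ** 2 for a, b in zip(z_rs, z_ref)) / n)

    return 1.0 - math.sqrt((alpha - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)


survey = [120.0, 80.0, 200.0, 50.0, 150.0]
biased = [1.2 * v for v in survey]  # uniform 20 % overestimate
print(round(spaef(biased, survey), 6))
```

A map that overestimates every county by the same factor keeps a score of essentially 1, which is the sense in which the metric is insensitive to a multiplicative systematic bias and responds only to differences in spatial pattern.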

2.5. MV-FAS method for creating hybrid cropland maps

We proposed an MV-FAS method to create hybrid cropland maps of China by fusing state-of-the-art remote sensing LULC products with the NLDS (Fig. 2). Because CLUD, CLCD, and CACD are not independent of each other (Supplementary Texts), our data-fusion method began with the evaluation of the consistency between each pair of the three cropland maps for each county. The measure of consistency was the proportion of cropland and non-cropland areas labeled identically by both maps relative to the total county area. The pair of maps with the lowest consistency was selected for a given county, as higher inconsistency indicates more complementary information (Gong et al., 2016).
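The pair-selection step can be sketched as follows. This is a minimal illustration with hypothetical function names and toy data; it treats every pixel as having the same area, so that the share of identically labeled pixels stands in for the share of county area.

```python
from itertools import combinations


def consistency(map_a, map_b):
    """Share of pixels that both maps label identically (cropland or non-cropland)."""
    agree = sum(1 for a, b in zip(map_a, map_b) if a == b)
    return agree / len(map_a)


def least_consistent_pair(county_maps):
    """Return the pair of map names with the lowest pixel-wise consistency."""
    pairs = combinations(sorted(county_maps), 2)
    return min(pairs, key=lambda p: consistency(county_maps[p[0]], county_maps[p[1]]))


# Flattened binary maps of one hypothetical county (1 = cropland).
county = {
    "CLUD": [1, 1, 0, 0, 1, 0, 1, 0],
    "CLCD": [1, 1, 0, 0, 1, 0, 0, 0],
    "CACD": [1, 0, 0, 1, 1, 0, 0, 1],
}
print(least_consistent_pair(county))
```

In this toy county, CLUD and CLCD agree on seven of eight pixels and therefore add little to each other, whereas CLUD and CACD agree on only four and would be retained as the more complementary pair.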

We next estimated the SPAEF of each candidate cropland map for a given county using the cropland area estimates of the NLDS in the neighboring 10 counties. The mean SPAEF of the candidate maps was then compared to the 25th percentile of the mean SPAEF across all counties in China (denoted as q25 and determined through a trial-and-error process). If the mean SPAEF exceeds q25, we argue that the candidate maps have high consistency in mapping cropland for the current county, and the pixels with high cropland agreement have a high probability of being cropland. In other words, the MV method was applied to identify potential cropland pixels in this case. Otherwise, the FAS method was applied to develop the hybrid cropland maps.

Fig. 2. Flowchart of the majority voting-fuzzy agreement score (MV-FAS) method. SA, SPAEF, CA, and THR denote spatial agreement, spatial pattern efficiency, cropland area, and the threshold of FAS, respectively. The term q25 refers to the 25th percentile of the mean SPAEF of the candidate maps across all counties in China.

Regarding the MV method, we assumed that pixels identified as cropland by a majority of the candidates (i.e., SA ≥ 3) had a high likelihood of being cropland, following previous studies (Iwao et al., 2011; Li et al., 2023a). Accordingly, we designated them as the minimum cropland area in the intermediate hybrid map. The cumulative cropland area of the intermediate hybrid map, calculated using an equal-area projection system, was then compared to the NLDS estimates. Considering the positive systematic bias of the remote sensing-derived cropland area relative to the survey data (Supplementary Figure S1), the intermediate hybrid cropland map was accepted as the final hybrid map if its cumulative cropland area exceeded the NLDS-estimated cropland area. Otherwise, we set the initial threshold of FAS to 11 (corresponding to the minimum FAS when SA ≥ 3) and proceeded to the next iteration of the FAS method.

With respect to the FAS method, we first estimated the FAS value for each pixel within a given county based on the performance rank and spatial agreement of the candidate cropland maps. The candidate map with a higher SPAEF value received a superior rank, indicating greater confidence and priority (Fritz et al., 2011). The FAS value of each pixel was calculated according to the FAS table (Supplementary Table S2). FAS values range from 0 to 15, with larger values indicating a higher likelihood of cropland (Lu et al., 2020). Subsequently, we set the FAS threshold (THR) to 15 and designated the pixels with FAS values greater than the threshold as the cropland area in the intermediate hybrid map. If the cumulative cropland area of the intermediate map exceeded the NLDS-estimated cropland area, it was deemed the final hybrid product. Otherwise, we progressively decreased the FAS threshold and repeated the above iterative process until the cumulative cropland area of the intermediate map became greater than the NLDS-estimated cropland area.

With the MV-FAS method, we first generated a hybrid cropland map for the year circa 2020. Subsequently, we preserved the data-synergy rules, including the selected candidate cropland maps and the threshold of FAS, and applied them to create hybrid cropland maps for other time frames (Supplementary Texts). Ultimately, we examined the spatiotemporal changes in cropland in China from 2000 to 2020 using the newly created hybrid cropland maps (Supplementary Texts).
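The iterative thresholding can be sketched as below for a single county. Several points are assumptions made for illustration only. The actual FAS table is in the supplement and is not reproduced here, so `build_fas_table` constructs a hypothetical one that orders the vote combinations of four ranked maps first by the number of agreeing maps and then by the rank of the agreeing maps; this yields scores from 0 to 15 with a minimum of 11 when three maps agree, matching the values quoted above. The threshold is treated as inclusive, pixels are assumed to have equal area, and all names are hypothetical.

```python
from itertools import product


def build_fas_table(n_maps=4):
    """Hypothetical FAS lookup: more agreeing maps first, then higher-ranked maps."""
    combos = sorted(product((0, 1), repeat=n_maps), key=lambda c: (sum(c), c))
    return {combo: score for score, combo in enumerate(combos)}


def fuse_county(ranked_maps, pixel_area, nlds_area, start_thr=15):
    """Lower the FAS threshold until the hybrid area exceeds the survey area.

    ranked_maps: flattened binary maps ordered from highest to lowest SPAEF.
    start_thr: 15 for the FAS branch, 11 when continuing from the MV branch.
    """
    table = build_fas_table(len(ranked_maps))
    fas = [table[votes] for votes in zip(*ranked_maps)]
    hybrid = [0] * len(fas)
    for thr in range(start_thr, 0, -1):
        # Assumption: inclusive threshold, so THR = 15 keeps full-agreement pixels.
        hybrid = [1 if score >= thr else 0 for score in fas]
        if sum(hybrid) * pixel_area > nlds_area:
            return hybrid, thr
    return hybrid, 1  # even the union of all maps stays below the survey area


ranked = [
    [1, 1, 1, 1, 0, 0],  # best-ranked map
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 0],  # lowest-ranked map
]
hybrid, thr = fuse_county(ranked, pixel_area=1.0, nlds_area=2.5)
print(hybrid, thr)
```

Under these assumptions, starting the loop at 11 reproduces the majority-voting map (SA ≥ 3) on its first pass and then relaxes it, which is how the MV branch hands over to the FAS iteration in this sketch.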

2.6. Accuracy assessment

We evaluated the accuracy of CCropLand30 and the input cropland maps using both visually interpreted samples from Google images and third-party samples from existing literature. The third-party samples for the year circa 2015 were collected from Laso Bayas et al. (2017), Congalton et al. (2017), and Potapov et al. (2021) (Supplementary Texts). As depicted in Fig. 3, there are 2,591 third-party samples, predominantly situated in areas consistently identified as either non-cropland or cropland by the candidate cropland maps. It is crucial to include more samples in the regions with low spatial agreement to robustly evaluate the performance of cropland maps. Hence, we additionally obtained 5,433 samples for the year circa 2020 through the visual interpretation of Google images. A portion of these samples is located in pixels with SA values between 1 and 4, and the remainder is located in areas of consistent cropland/non-cropland classification in the input cropland maps.

Fig. 3. Spatial distribution of validation samples. The top panels show the spatial distribution of the third-party (a) and visually interpreted (b) samples. The bottom panels display the numbers and proportions of the third-party (c) and visually interpreted (d) samples in different categories of pixels. SA1-5 indicate different levels of spatial agreement (SA) in cropland identification by the candidate cropland maps.

The performance metrics derived from the confusion matrix, including overall accuracy (OA), Kappa coefficient, F1-score, and producer's accuracy (PA) for the cropland and non-cropland classes (Supplementary Table S1), were adopted to quantitatively assess the accuracy of the cropland maps (Zhang et al., 2022b). Furthermore, the spatial pattern efficiency (SPAEF) of the cropland maps was evaluated using the county-level NLDS data. Lastly, a visual inter-comparison of the cropland maps was conducted at five locations across different regions of China (Supplementary Figure S3).
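The confusion-matrix metrics named above follow their standard definitions and can be computed as in the sketch below. The function name and the toy labels are hypothetical, and the sketch assumes that both classes occur in the reference and predicted labels so that no denominator is zero.

```python
def accuracy_metrics(reference, predicted):
    """OA, Kappa, F1, and producer's accuracies for binary cropland labels."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)
    n = len(pairs)

    oa = (tp + tn) / n
    # Chance agreement expected from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (oa - expected) / (1 - expected)

    precision = tp / (tp + fp)  # user's accuracy for cropland
    recall = tp / (tp + fn)     # producer's accuracy for cropland
    f1 = 2 * precision * recall / (precision + recall)

    return {
        "OA": oa,
        "Kappa": kappa,
        "F1": f1,
        "PA_cropland": recall,
        "PA_non_cropland": tn / (tn + fp),
    }


reference = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
print(accuracy_metrics(reference, predicted))
```

Reporting Kappa next to OA is useful when the two classes are unbalanced, because a map can reach a high OA simply by favoring the dominant class.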

3. Results

3.1. Spatial agreement of the remote sensing cropland maps

Fig. 4 illustrates the spatial agreement (SA) of cropland among the five selected LULC products. The candidate maps show high levels of spatial agreement in cropland identification in the Northeast China Plain (I), the Huang-Huai-Hai Plain (III), the Sichuan basin (VI), and the north arid and semi-arid region (II), where croplands are extensively and continuously distributed. Conversely, on the Qinghai-Tibet Plateau, croplands are sparsely distributed, resulting in the lowest spatial agreement among the candidate maps. The Yunnan-Guizhou Plateau and Southern China, characterized by mountainous landscapes, show a fragmented distribution of croplands. In these two agricultural zones, a comparatively low proportion of croplands is consistently classified by four to five maps (i.e., SA ≥ 4). Overall, a considerable portion of croplands in China has been identified by fewer than two candidate maps. These findings underscore significant disagreements and uncertainties in mapping China's cropland by the LULC products, particularly in regions with fragmented croplands.

3.2. Spatial pattern efficiency of the remote sensing cropland maps

As illustrated in Fig. 5a, all remote sensing products, except GLAD, yield higher estimates of cropland area than the NLDS. The cropland area estimates are more consistent among the three national cropland maps (CLUD, CLCD, and CACD) than among the global maps (GLAD and GlobeLand30). GlobeLand30 has the highest cropland area estimates, mainly because it includes cultivated pasture and shrub-based cash crops such as tea and coffee as cropland, whereas the other products exclude them (see Table 1). Conversely, GLAD provides the lowest cropland area estimates, particularly in regions where croplands are fragmented and complexly distributed, including the Qinghai Tibet Plateau (V), the Sichuan basin (VI), the Yunnan-Guizhou Plateau (VIII), and Southern China (IX). The lower cropland area estimates of GLAD relative to the other products in these regions can be attributed to both its omission errors and its narrow cropland definition (e.g., the exclusion of shifting cultivation from cropland) (Zhang et al., 2022a).

Fig. 4. Spatial agreement of cropland among the five LULC maps for the year circa 2020. The middle panel shows the cropland agreement map, while the left and right panels display the proportions of croplands with varying levels of spatial agreement in China and the subregions.

Fig. 5. Spatial pattern efficiency of the remote sensing cropland maps for the year circa 2020. The top panels compare the cropland area estimates between the NLDS and the candidate cropland maps, while the bottom panels display the spatial pattern efficiency of the cropland maps. The left and right panels show the results for China and the nine agricultural zones, respectively. The agricultural zones from I to IX correspond to the Northeast China Plain, the north arid and semi-arid region, the Huang-Huai-Hai Plain, the Loess Plain, the Qinghai Tibet Plateau, the Sichuan basin and surrounding regions, the Yunnan-Guizhou Plateau, and Southern China (see Fig. 4).

In terms of spatial pattern efficiency, CACD exhibits the highest spatial agreement with the NLDS in China (Fig. 5c), followed by CLCD, GlobeLand30, and GLAD, while CLUD shows the lowest agreement. As depicted in Fig. 5d and Supplementary Figure S4, the candidate cropland maps tend to have higher SPAEF and better spatial agreement with the NLDS in agricultural zones with more extensively distributed croplands, such as Northeast China, the arid and semi-arid region, and the Huang-Huai-Hai Plain. Nevertheless, none of the five candidate products consistently outperforms the others in all agricultural zones, suggesting significant potential in combining them to enhance spatial agreement with the NLDS.

3.3. Accuracy assessment of the hybrid cropland maps

3.3.1. Pixel-scale accuracy

As illustrated in Fig. 6a, the OA, Kappa coefficient, and F1-score of the hybrid cropland map (CCropLand30) are [...], and 0.86, respectively, for the year circa 2020. In contrast, the ranges of these metrics are [...], and [...], respectively, for the input cropland maps (GlobeLand30, GLAD, CLUD, CLCD, and CACD). GLAD exhibits the highest accuracy, while CLUD demonstrates the lowest accuracy among the five input cropland maps. CCropLand30 achieves notable enhancements in OA (16 %), Kappa coefficient (50 %), and F1-score ([...]) compared to the average performance of the input cropland maps. Among the five input cropland maps, GlobeLand30 exhibits a higher producer's accuracy for cropland, while GLAD attains a clearly greater producer's accuracy for non-cropland. The producer's accuracies of CCropLand30 for cropland and non-cropland are close to those of GlobeLand30 and GLAD, respectively. This implies that CCropLand30 has effectively leveraged the strengths of the different input cropland maps while compensating for their respective weaknesses. The evaluation results with the third-party samples for the year circa 2015 are similar to those for the year circa 2020. Notably, the performance of the cropland maps differs less for the year circa 2015 than for 2020. This is because more than [...] of the third-party samples are located in areas consistently identified as either cropland or non-cropland by the candidate cropland maps, as previously mentioned in Section 2.6.
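The pixel-scale metrics reported above (OA, Kappa coefficient, F1-score, and producer's accuracy) can all be derived from a two-class confusion matrix. The following is a minimal sketch using their standard definitions, not the authors' evaluation code; the function name `binary_map_metrics` and the 1/0 coding of cropland and non-cropland are assumptions made for illustration.

```python
import numpy as np

def binary_map_metrics(reference, predicted):
    """Standard accuracy metrics for a cropland (1) / non-cropland (0) map.

    Hypothetical helper: assumes both classes occur in the reference samples
    and in the map, so that no denominator is zero.
    """
    reference = np.asarray(reference, dtype=bool)
    predicted = np.asarray(predicted, dtype=bool)

    # Confusion-matrix counts, with cropland as the positive class
    tp = float(np.sum(reference & predicted))    # cropland mapped as cropland
    tn = float(np.sum(~reference & ~predicted))  # non-cropland mapped as non-cropland
    fp = float(np.sum(~reference & predicted))   # non-cropland mapped as cropland
    fn = float(np.sum(reference & ~predicted))   # cropland mapped as non-cropland
    n = tp + tn + fp + fn

    oa = (tp + tn) / n                           # overall accuracy
    # Agreement expected by chance, from the row and column totals
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (oa - pe) / (1.0 - pe)

    pa_cropland = tp / (tp + fn)                 # producer's accuracy (recall)
    pa_noncropland = tn / (tn + fp)
    ua_cropland = tp / (tp + fp)                 # user's accuracy (precision)
    f1 = 2 * ua_cropland * pa_cropland / (ua_cropland + pa_cropland)

    return {
        "oa": oa,
        "kappa": kappa,
        "f1": f1,
        "pa_cropland": pa_cropland,
        "pa_noncropland": pa_noncropland,
    }

# Toy example with ten validation samples (made-up values)
metrics = binary_map_metrics(
    reference=[1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    predicted=[1, 1, 1, 0, 1, 0, 0, 0, 0, 0],
)
print(metrics)
```

A map with a high producer's accuracy for cropland but a low one for non-cropland tends to overestimate cropland, which is why the two producer's accuracies are reported separately.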

3.3.2. Spatial pattern efficiency

As illustrated in Fig. 7, the distribution of county-level cropland areas estimated by CCropLand30 shows a strong spatial agreement with the NLDS data. The SPAEF of CCropLand30 reaches 0.80 in China, significantly higher than the average SPAEF of the input cropland maps (i.e., 0.64). In all agricultural zones, CCropLand30 consistently outperforms the input cropland maps in terms of SPAEF. Compared to the input cropland maps, CCropLand30 improves the SPAEF by a range from [...] across the different zones. The enhancement is particularly evident in the Loess Plateau (IV), the Qinghai-Tibet Plateau (V), the Yunnan-Guizhou Plateau (VIII), and Southern China (IX).
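For readers unfamiliar with SPAEF, the sketch below computes it for two sets of county-level cropland areas. It assumes the commonly used three-component form of the metric (Pearson correlation, ratio of coefficients of variation, and histogram overlap of z-scores); the paper's own definition lies outside this section, so the exact formulation, the function name `spaef`, and the number of histogram bins are assumptions.

```python
import numpy as np

def spaef(obs, sim, bins=100):
    """Spatial pattern efficiency between observed and simulated values.

    Assumed form: SPAEF = 1 - sqrt((alpha-1)^2 + (beta-1)^2 + (gamma-1)^2).
    obs: e.g., county-level cropland areas from the survey
    sim: e.g., county-level cropland areas aggregated from a cropland map
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)

    # alpha: Pearson correlation between the two spatial patterns
    alpha = np.corrcoef(obs, sim)[0, 1]

    # beta: ratio of the coefficients of variation (spatial variability)
    beta = (np.std(sim) / np.mean(sim)) / (np.std(obs) / np.mean(obs))

    # gamma: overlap of the histograms of z-scores, which removes any
    # difference in mean and spread before comparing the distributions
    z_obs = (obs - obs.mean()) / obs.std()
    z_sim = (sim - sim.mean()) / sim.std()
    lo = min(z_obs.min(), z_sim.min())
    hi = max(z_obs.max(), z_sim.max())
    h_obs, _ = np.histogram(z_obs, bins=bins, range=(lo, hi))
    h_sim, _ = np.histogram(z_sim, bins=bins, range=(lo, hi))
    gamma = np.minimum(h_obs, h_sim).sum() / h_obs.sum()

    return 1.0 - np.sqrt((alpha - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)

# Toy example (made-up areas for six counties)
survey = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(spaef(survey, survey))        # identical patterns
print(spaef(survey, 2 * survey))    # same pattern, doubled area
print(spaef(survey, survey[::-1]))  # reversed pattern
```

Under this formulation, a map that doubles every county's area still scores 1, because all three components ignore a constant multiplicative bias. This is consistent with the paper's description of the metric as bias-insensitive, whereas a reversed spatial pattern is heavily penalized through the correlation term.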

3.3.3. Visual comparison

Fig. 8 shows the regional comparison of CCropLand30 with the input cropland maps. In location A, CCropLand30, GLAD, GlobeLand30, and CLUD display strong agreement with each other, closely matching the actual cropland distribution. However, CLCD and CACD significantly underestimate the cropland extent in this area. For locations B and C, CCropLand30, GLAD, CLCD, and CACD demonstrate higher consistency and better alignment with the actual cropland distribution than the others. In location D, CCropLand30 and CLCD accurately depict the actual cropland extent, while GlobeLand30, CLUD, and CACD tend to overestimate croplands, and GLAD substantially underestimates them. In location E, GlobeLand30 notably overestimates croplands, while GLAD, CLUD, and CACD considerably underestimate them. In contrast, CCropLand30 and CLCD exhibit good agreement with the actual cropland distribution. The regional comparison confirms the superiority of CCropLand30 over the input cropland maps.

Fig. 6. Performance metric values of the hybrid map (CCropLand30) and the input cropland maps (GlobeLand30, GLAD, CLUD, CLCD and CACD). The top and bottom panels display the results for the years circa 2020 (a) and 2015 (b), respectively. OA and PU represent overall accuracy and producer's accuracy, respectively.

Fig. 7. Spatial pattern efficiency of the hybrid cropland map (CCropLand30) and the input cropland maps. The top panels show the distributions of county-level cropland area estimates of the NLDS (a) and CCropLand30 (b), while the bottom panel compares the SPAEF of CCropLand30 against the mean SPAEF of the input cropland maps including GlobeLand30, GLAD, CLUD, CLCD and CACD. The agricultural zones from I to IX are identical to Fig. 5.

3.4. Spatial pattern and temporal changes of croplands

As shown in Fig. 9, China's cropland area decreased significantly from 164 to 161 million hectares (Mha) between 2000 and 2020. Notably, during this period, cropland expanded significantly by [...] (or 3.47 Mha) in the arid and semiarid region (II). The expansion of cropland, attributed to increased agricultural inputs and improvements in irrigation efficiency (Fu et al., 2022), poses a significant challenge to water security in this region (Chen et al., 2022a). The Northeast China Plain (I) and the Qinghai-Tibet Plateau (V) also experienced an increase in cropland. Conversely, in the remaining agricultural zones, cropland consistently exhibited a downward trend from 2000 to 2020. The reduction in cropland in these regions can be explained by socioeconomic factors such as population growth and rapid urbanization (Zhou et al., 2023).

As depicted in Fig. 10, cropland is predominantly located in plains and humid regions. Over the 2000-2020 period, cropland showed a declining trend in all four geomorphic zones, with the plains experiencing a more pronounced decrease than the other zones. Cropland exhibited a downward trend in the humid and semi-humid regions, where water resources are relatively abundant. In contrast, it showed an increasing trend in the arid and semi-arid regions with scarce water resources. This contrasting trend has the potential to exacerbate the spatial mismatch between water and land resources in China.
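The stable, expanded, and reduced cropland classes discussed in this section can be derived by comparing two binary cropland maps pixel by pixel. The sketch below illustrates the idea on a toy grid; it is not the authors' processing chain. The function name `cropland_change` is hypothetical, and the fixed pixel area of 0.09 ha assumes 30-m pixels on an equal-area grid.

```python
import numpy as np

# Area of one 30 m x 30 m pixel in hectares (assumes an equal-area grid)
PIXEL_AREA_HA = 30 * 30 / 10_000  # 0.09 ha

def cropland_change(crop_start, crop_end):
    """Areas (ha) of stable, expanded, and reduced cropland between two dates.

    crop_start, crop_end: binary arrays of equal shape (1 = cropland).
    """
    start = np.asarray(crop_start, dtype=bool)
    end = np.asarray(crop_end, dtype=bool)

    masks = {
        "stable": start & end,      # cropland at both dates
        "expanded": ~start & end,   # new cropland
        "reduced": start & ~end,    # lost cropland
    }
    areas = {name: float(mask.sum()) * PIXEL_AREA_HA for name, mask in masks.items()}
    areas["net_change"] = areas["expanded"] - areas["reduced"]
    return areas

# Toy 2 x 3 grids standing in for the start-year and end-year maps
start_map = [[1, 1, 0],
             [0, 1, 0]]
end_map = [[1, 0, 0],
           [1, 1, 1]]
print(cropland_change(start_map, end_map))
```

Summing the masks per agricultural, geomorphic, or climatic zone (for example with a zone-label raster) would give zone-level changes of the kind summarized in Figs. 9 and 10.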

4. Discussion

4.1. Pros and cons of the MV-FAS method

Fig. 8. Visual comparison of the hybrid cropland map (CCropLand30) with the input cropland maps. Locations of A-E are shown in Supplementary Figure S3. The seven rows from top to bottom correspond to the Google map, the hybrid map, and the five input cropland maps (i.e., GlobeLand30, GLAD, CLUD, CLCD and CACD), respectively.

This study proposed a novel MV-FAS method to develop hybrid cropland maps. The key advantage of our approach is its independence from training samples, distinguishing it from many existing data-synergy methods that rely heavily on extensive and representative training samples (Schepaschenko et al., 2015). Our data-fusion approach also takes into account the systematic biases between remote sensing and survey data through the following strategies: (1) adjusting the surveyed cropland area by incorporating ridge field areas; (2) introducing a bias-insensitive spatial pattern matching metric to assess and rank candidate cropland maps; (3) considering pixels with high cropland agreement as the minimum cropland extent if candidate maps show high consistency in cropland identification; (4) assuming grid-based cropland area estimates to be greater than, rather than close to, the surveyed results. We compared our approach with the traditional FAS method, which does not consider the systematic biases between remote sensing and survey data (Fritz et al., 2011; Lu et al., 2020). Our results reveal that while the traditional FAS method can enhance cropland mapping accuracy compared to the input maps, it performs significantly worse than our MV-FAS method (Supplementary Figure S6).
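To make the allocation idea behind strategies (3) and (4) more concrete, the following heavily simplified sketch selects cropland pixels within a single county. It is an illustration under stated assumptions and not the authors' MV-FAS implementation: the function name `select_cropland`, the rank-based `weights`, the `min_votes` threshold, and the scoring by a weighted sum of votes are all hypothetical choices.

```python
import numpy as np

def select_cropland(maps, weights, target_pixels, min_votes):
    """Pick cropland pixels in one county from several binary candidate maps.

    maps:          array-like of shape (n_maps, n_pixels), 1 = cropland
    weights:       one weight per map, e.g. derived from its ranking
                   against the survey (hypothetical scheme)
    target_pixels: pixel count implied by the (adjusted) surveyed area
    min_votes:     number of agreeing maps that makes a pixel part of the
                   minimum cropland extent
    """
    maps = np.asarray(maps, dtype=bool)
    weights = np.asarray(weights, dtype=float)

    votes = maps.sum(axis=0)               # how many maps call the pixel cropland
    score = weights @ maps.astype(float)   # agreement score weighted by map rank

    # High-agreement pixels are always kept, even if the survey-based target
    # is smaller (the minimum cropland extent of strategy 3)
    cropland = votes >= min_votes

    # Fill up to the target with the best-scoring remaining candidates
    remaining = max(target_pixels - int(cropland.sum()), 0)
    candidates = np.flatnonzero(~cropland & (votes > 0))
    order = candidates[np.argsort(-score[candidates], kind="stable")]
    cropland[order[:remaining]] = True
    return cropland

# Toy county with six pixels and three candidate maps (made-up values)
maps = [[1, 1, 1, 0, 0, 0],
        [1, 1, 0, 1, 0, 0],
        [1, 0, 0, 0, 1, 0]]
print(select_cropland(maps, weights=[3, 2, 1], target_pixels=3, min_votes=3))
```

In this toy setup, the pixel on which all three maps agree is retained regardless of the target, so the mapped area can exceed, but never fall below, the high-agreement extent. This mirrors the assumption that grid-based estimates are greater than the surveyed results.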

Furthermore, our MV-FAS method is not only easy to implement but also computationally efficient. Given the need to process over 15 billion grids in China using [...] remote sensing products, computational efficiency is a crucial consideration for creating hybrid cropland maps. Our method can generate a hybrid cropland map in just two hours on a Lenovo notebook PC equipped with an Intel Core i9980H-CPU, 16 GB of RAM, and 8 processor cores. In addition, our method is flexible enough to create other hybrid maps (e.g., forest and grassland) and to incorporate newly available LULC products. Besides cropland area, the NLDS also reports county-level data on other land use types, enabling the creation of reliable thematic maps for different land use categories. As remote sensing technology and classification algorithms advance, our method can readily incorporate these improvements, further enhancing cropland mapping accuracy in the future.

Nevertheless, our method also has some drawbacks. Firstly, it relies heavily on reliable and detailed (i.e., county-level) cropland survey or census data. Survey or census data at coarser spatial resolutions (e.g., the prefecture and provincial level) or with high uncertainty would significantly degrade the performance of hybrid cropland maps. Secondly, our method is specifically designed for creating hybrid maps of individual land use types, rather than multiple-class products. The strategy of integrating single-class hybrid maps into a multi-class product requires further exploration. Lastly, similar to other data-fusion methods, the performance of our MV-FAS method depends heavily on the accuracy of the candidate maps. In regions where all candidate maps exhibit low classification accuracy, our method cannot achieve significant improvements in mapping accuracy. Despite these limitations, our MV-FAS method makes a valuable contribution to cropland mapping and can serve as a foundation for further research and advancements in data-fusion techniques. Addressing these limitations and exploring potential improvements would enhance the accuracy and applicability of our method in the future.

Fig. 9. Cropland extent and change in China from 2000 to 2020. The middle panel shows the stable (unchanged), expanded and reduced cropland areas. The left and right panels illustrate changes in cropland areas in China and the nine agricultural zones.

Fig. 10. Cropland area in different geomorphic and climatic zones of China from 2000 to 2020. The top panels compare the areas and proportions of croplands in different geomorphic (a) and climatic (b) zones. The bottom panels show the net changes in cropland areas from 2000 to 2020. The spatial distributions of geomorphic and climatic zones are presented in Supplementary Figure S5.

4.2. Limitations of this study

We acknowledge several limitations of this study. First, the definition of cropland is inconsistent across the selected remote sensing LULC products (see Section 2.1.1). Differences in cropland definitions are challenging to harmonize and could affect the accuracy of hybrid cropland maps in this study as well as in other similar studies (Fritz et al., 2011; Yu et al., 2013; Zhong et al., 2019; Lu et al., 2020). Second, the selected 30-m LULC products exhibit significant discrepancies and uncertainties in cropland classification in South China. This is primarily attributed to the "mixed pixel" problem, since many fields in South China are smaller than 900 square meters, the area of a single 30-m pixel. The creation of hybrid cropland maps cannot address the "mixed pixel" problem, so further efforts are needed to tackle this issue, such as generating super-high-resolution cropland maps (Li et al., 2023b). Third, this study considered the positive systematic biases of remote sensing-derived cropland area against survey data, with the former representing gross cropland area and the latter representing net area (Supplementary Figure S1). While this consideration can significantly enhance cropland mapping accuracy (Supplementary Figure S6), it also introduces new uncertainties, because the actual magnitude of the systematic bias between remote sensing products and survey data is difficult to determine. Lastly, the performance of hybrid cropland maps depends heavily on the reliability and temporal consistency of land survey data. This study, for the first time, utilized the most reliable land survey data in China (i.e., the third NLDS) to create hybrid cropland maps. However, the temporal discrepancy between the remote sensing products and the NLDS may introduce additional errors into the mapping results.

5. Conclusions

This paper outlines the development of high-resolution hybrid cropland maps of China (CCropLand30). A majority voting-fuzzy agreement score (MV-FAS) method was proposed to integrate five state-of-the-art land use/cover products (i.e., GlobeLand30, GLAD, CLUD, CLCD, and CACD) with the latest national land survey (NLDS). The accuracy of the hybrid cropland maps and the candidate cropland maps was evaluated using both visually interpreted and third-party samples. Furthermore, the spatial pattern efficiency (SPAEF) of the cropland maps was evaluated using the county-level NLDS data. A visual inter-comparison of the cropland maps was further conducted at five locations across different regions of China. Finally, we examined the spatiotemporal changes in croplands in China from 2000 to 2020 using the hybrid cropland maps.

Results reveal significant disagreements and uncertainties in cropland identification among the candidate products, particularly in regions with fragmented croplands. Furthermore, none of the five candidate products consistently outperforms the others in all agricultural zones when compared with the NLDS using the SPAEF. These findings underscore the necessity and potential for enhancing cropland mapping accuracy through data-fusion techniques. CCropLand30 shows better agreement with the reference points than the input maps, attaining an overall accuracy (OA), Kappa coefficient, and F1-score of [...], and 0.86, respectively, for the year circa 2020. In contrast, the ranges of these metrics for the input cropland maps are [...], and [...], respectively. Moreover, CCropLand30 exhibits better agreement with the NLDS than the input cropland maps, achieving a high SPAEF of 0.80 in China. The visual inter-comparison of the cropland maps further confirms the superiority of CCropLand30 over the input maps. These findings imply that the hybrid map successfully leverages the strengths of different input cropland maps while compensating for their weaknesses.

CCropLand30 reveals a clear decreasing trend in China's cropland area from 2000 to 2020. Notably, during this period, cropland expanded significantly by [...] (or 3.47 Mha) in the arid and semiarid region with scarce water resources. In contrast, cropland significantly decreased in the plains and in the humid and semi-humid regions, such as the Huang-Huai-Hai Plain, the Middle-Lower Yangtze Plain, and Southern China. This contrasting trend has the potential to exacerbate the spatial mismatch between water and land resources in China, presenting significant challenges for sustainable land and water resource management. Our new hybrid cropland maps will greatly support cropland monitoring and management, as well as research in diverse fields such as agriculture, water resources, and climate change.

CRediT authorship contribution statement

Ling Zhang: Conceptualization, Writing - original draft, Methodology, Funding acquisition, Data curation, Investigation, Validation, Writing - review & editing. Weiguo Wang: Investigation, Data curation, Writing - review & editing. Qimin Ma: Investigation, Data curation, Writing - review & editing. Yingyi Hu: Visualization, Writing - review & editing. Hui Ma: Visualization, Writing - review & editing. Yanbo Zhao: Data curation, Methodology, Writing - review & editing.

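Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.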

Data availability

Our hybrid cropland maps (CCropLand30) can be freely accessed at: https://doi.org/10.6084/m9.figshare.23764248.v2 (Zhang et al., 2023).

Acknowledgements

This study was supported by the National Natural Science Foundation of China (42271286 and 41901045) and the Youth Innovation Promotion Association of Chinese Academy of Sciences (2023454). We thank the Ministry of Natural Resources of the People's Republic of China for providing the data.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.compag.2024.108672.

References

Becker-Reshef, I., Barker, B., Humber, M., Puricelli, E., Sanchez, A., Sahajpal, R., McGaughey, K., Justice, C., Baruth, B., Wu, B., Prakash, A., Abdolreza, A., Jarvis, I., 2019. The GEOGLAM crop monitor for AMIS: Assessing crop conditions in the context of global markets. Glob. Food Sec. 23, 173-181. https://doi.org/10.1016/j.gfs.2019.04.010.
Bey, A., Sánchez-Paus Díaz, A., Maniatis, D., Marchi, G., Mollicone, D., Ricci, S., Bastin, J.-F., Moore, R., Federici, S., Rezende, M., Patriarca, C., Turia, R., Gamoga, G., Abe, H., Kaidong, E., Miceli, G., 2016. Collect Earth: Land Use and Land Cover Assessment through Augmented Visual Interpretation. Remote Sens. (Basel). https://doi.org/10.3390/rs8100807.
Bicheron, P., Defourny, P., Brockmann, C., Schouten, L., Vancutsem, C., Huc, M., Bontemps, S., Leroy, M., Achard, F., Herold, M., 2008. Globcover: products description and validation report. ME Noordwijk, The Netherlands.
Chen, J., Chen, J., Liao, A., Cao, X., Chen, L., Chen, X., He, C., Han, G., Peng, S., Lu, M., Zhang, W., Tong, X., Mills, J., 2015. Global land cover mapping at 30 m resolution: A

* Corresponding author.
E-mail address: zhanglingky@lzb.ac.cn (L. Zhang).