Highlights
• A new multi-stage model structure was developed to explain the effluent predictions.
• Multicollinearity reduction and the RReliefF algorithm improved the model performance.
• An ensemble of machine learning algorithms was used for more accurate predictions.
• The average R2, nRMSE, and sMAPE were 95.7%, 0.032, and 3.7%, respectively.
• The model was able to explain a failure instance in the predicted dataset.
Abstract
Machine learning models provide an adaptive tool to predict the performance of treatment reactors under varying operational and influent conditions. Aerobic granular sludge (AGS) is still an emerging technology and does not have a long history of full-scale application. There is, therefore, a scarcity of long-term data in this field, which has hindered the development of data-driven models. In this study, a machine learning model was developed for simulating the AGS process using 475 days of data collected from three lab-based reactors. Inputs were selected based on RReliefF ranking after multicollinearity reduction. A five-stage model structure was adopted in which each parameter was predicted using a separate model, with the preceding parameters as inputs. An ensemble of artificial neural networks, support vector regression and adaptive neuro-fuzzy inference systems was used to improve the models' performance. The developed model was able to predict the MLSS, MLVSS, SVI5, SVI30, granule size, and effluent COD, NH4-N, and PO43− with an average R2, nRMSE and sMAPE of 95.7%, 0.032 and 3.7%, respectively.
1. Introduction
Aerobic granular sludge (AGS) is a promising biological wastewater treatment technology that has shown excellent performance in laboratories for the treatment of domestic and high-strength wastewater and is starting to be applied in full-scale wastewater treatment plants (WWTPs) (Pronk et al., 2015; Zheng et al., 2020). AGS has certain advantages over conventional activated sludge (CAS) in terms of lower reactor footprint, higher capacity for organic loading and simultaneous removal of nutrients and organics (He et al., 2020). The compact structure of the biomass granules gives the reactor higher resilience against shock loads and toxic wastewater and provides better biomass retention due to the enhanced settling properties (Franca et al., 2018; Nancharaiah & Reddy, 2018).
Although AGS has consistently shown promise in terms of performance, the operation of AGS bioreactors is challenging due to the large number of factors affecting the process (Wilén et al., 2018). The characteristics of the influent wastewater, the biomass properties within the reactor, and the operational conditions all play a significant role in the removal efficiency of the reactor. Additionally, these factors are interconnected and have complex nonlinear relationships (Khan et al., 2013). Influent characteristics and the mode of operation of the sequencing batch reactor (SBR) play a significant role in shaping the microbial culture, which in turn affects the integrity of the granule structure and its settling ability. The settling ability of the biomass is also affected by the settling time, volumetric exchange ratio, and discharge time. At the end of the settling time, slow-settling biomass that does not settle below the effluent port gets washed out of the reactor during the decant phase, leaving the faster-settling granules inside the reactor (Qin et al., 2004; Wang et al., 2004). This also affects the concentration of biomass left inside the reactor after every cycle of operation, which provides seed for new granule formation and, therefore, directly affects the level of organics and nutrients removal. A certain aeration time is necessary for aerobic degradation of organics, nitrification and for providing the required shear force that triggers the granulation process in the biomass flocs, the latter being the governing factor (Hamza et al., 2018). Other factors that affect the AGS process include the influent pH, volumetric exchange ratio, hydraulic retention time (HRT), and temperature (Khan et al., 2013). Sudden changes to these factors can lead to the failure of the structural integrity of the granules and the washout of the biomass out of the reactor, resulting in the reactor failing to meet the required effluent quality. Since these factors continue to change, the operation of an AGS system is challenging and requires careful monitoring.
A tool that can simulate AGS reactors considering all previously mentioned factors would help alleviate some of this challenge. Such a tool can provide operators with the ability to predict the reactor performance and adapt as the quality of influent wastewater changes.
There are several studies in the literature that present physical models for AGS reactors (Baeten et al., 2019; Ni & Yu, 2010). Physical AGS models utilize the biofilm model to simulate the diffusion of the substrate into the granules and Activated Sludge Models (ASM) based equations to simulate the kinetics of the biological process (Cui et al., 2020). Many restrictions and assumptions have to be made to keep physical models from becoming overly complicated (Ni & Yu, 2010). The calibrated kinetic and stoichiometric parameters will change with any change in operation or influent wastewater, making it challenging to use physical models for process control (Baeten et al., 2018). Physical models, however, are excellent for understanding the biological processes and conversion rates, and for studying the factors affecting the process performance.
Machine learning provides an excellent alternative to physical models for predicting reactor performance and for process control. Data-driven models can overcome the need for continuous re-calibration of physical models. They are more adaptive and can learn from new data that is collected as the process continues to run (El-Din et al., 2004). The use of machine learning for AGS modelling has not been studied as extensively as physical modelling (Baeten et al., 2019). Artificial neural networks (ANN) were used to simulate the AGS process in a simple model structure where only chemical oxygen demand (COD) and total nitrogen (TN) removals were predicted, with R2 values of 0.90 and 0.81 (Gong et al., 2018). Single hidden layer ANNs were used to predict the effluent COD using six inputs, resulting in an R2 of 0.91 (Mahmod & Wahab, 2017). Another AGS model was developed using single hidden layer feed-forward ANNs with eight inputs to predict the effluent COD, NH4-N, and TN with an R2 of 0.9988, 0.9997, and 0.9991, respectively (Liang et al., 2020). A more comprehensive model structure was developed using feed-forward multi-layer ANNs to simulate the full AGS process, including the prediction of biomass characteristics and the removal rates of COD, ammonia, and phosphates, with a minimum prediction R2 of 99% (Zaghloul et al., 2018). Adaptive neuro-fuzzy inference systems (ANFIS) and support vector regression (SVR) were investigated as alternative algorithms to ANN, concluding that SVR provided comparable results to ANN with a minimum R2 of 0.997, while ANFIS provided lower prediction accuracy than ANN and SVR, with a minimum R2 of 0.815, when simulating AGS reactors (Zaghloul et al., 2020).
Aside from modelling AGS, machine learning has shown excellent performance in simulating other wastewater treatment processes such as CAS, showing the potential application of machine learning in forecasting and process control (Corominas et al., 2018). An ANN model was successfully used for modelling the BOD and TSS removal in a full-scale CAS process using single input-single output models, with an R2 of 0.665 and 0.542, respectively (Hamed et al., 2004). ANN was also used to develop software sensors that predict the effluent TN, TP and COD for a real-time remote monitoring system in another full-scale CAS treatment plant, with an R2 of 0.952, 0.934, and 0.921, respectively (Lee et al., 2008). SVR was used to predict the removal of COD, ammonia, and nitrates in a CAS process using microbial community data, with an R2 of 0.9501, 0.7936, and 0.8916, respectively (Seshan et al., 2014). ANN and SVR were compared for the prediction of effluent TN in a CAS process treating food waste leachate, showing that both algorithms performed similarly (R2 of 0.47 and 0.46, respectively); however, the SVR suffered from overfitting, with a training R2 of 1.00 (Guo et al., 2015). ANFIS was compared to SVR for predicting the removal of TKN in a full-scale BNR plant, where SVR (R2 of 0.91) provided better performance than ANFIS (R2 of 0.85) (Manu & Thalla, 2017).
The studies above concluded that ANN, SVR and ANFIS are capable of simulating various biological treatment processes in WWTPs. It was also observed that while ANN provided reliable results, it required the largest training datasets to provide good-quality modelling. SVR was reported to provide unique solutions to regression problems and is not as likely to get trapped in local error minima during error optimization as ANN, but it is hard to interpret the final model formulation, and the computational requirements increase with larger datasets (Karamizadeh et al., 2014). ANFIS models can be relatively easier to interpret than ANN and SVR, but the number of fuzzy rules increases exponentially with the number of input variables and input membership functions (Stathakis et al., 2006). Ye et al. (2020) detailed the characteristics, advantages and limitations of several algorithms, including the ones used in this study. They showed that: (1) ANNs are accurate but carry a risk of overfitting, and it is harder to find the best architecture; (2) SVR works well with noisy data and does not require as much training data but needs higher computational power than other algorithms; (3) ANFIS can optimally solve nonlinear problems, but it is difficult to find the best model structure.
Machine learning ideally requires large datasets for training the algorithms (Liu et al., 2017). Databases from AGS WWTPs are still not large enough for conventional machine learning simulations. Small datasets are challenging when used for training machine learning models, i.e. the training process becomes highly affected by data quality issues, dimensionality, and multicollinearity (Shaikhina & Khovanova, 2017). Additionally, small datasets with high dimensionality increase the required level of model complexity to achieve reasonable prediction accuracies (Wójcik & Kurdziel, 2019). Data pre-processing and feature selection play an important role in handling outliers and gaps, normalizing features, and reducing dimensionality and multicollinearity in the dataset, which improves the model training and final performance.
This work presents a modelling approach for AGS reactors when only small datasets are available. Data were pre-processed and cleaned, then feature selection was performed using the variance inflation factor (VIF) for reducing multicollinearity and the RReliefF algorithm for ranking inputs. A combination of ANN, SVR and ANFIS algorithms was used via different ensemble techniques, and the best-performing technique was used for the final model. A multi-stage model structure was developed to provide stepwise predictions, where the outputs of each stage are added to the potential pool of inputs for the following stage. The purpose of this model is to provide a tool that can predict the biomass characteristics inside AGS reactors, the effluent characteristics (concentrations of COD, NH4-N, and PO43−), and potential failure to meet user-defined treatment requirements.
2. Methods
2.1. Experimental Setup
Three SBRs were set up and operated to collect the required data for this study. Reactor R1 had a diameter of 89 mm, and a working volume of 4.5 L. Reactors R2 and R3 had a diameter of 150 mm, and a working volume of 19 L. Fig. 1 shows the general setup of the reactors.
The SBR operation was automated with scheduled times for fill, idle, aeration (reaction), settling, and draw (decant). Table 1 shows the cycle times and superficial air velocity used for the duration of the data collection period. Aeration was provided using air compressors and controlled using Cole-Parmer airflow meters and regulators. Air was diffused into the reactor using Paintair fine bubble ceramic diffusers (AS4). Masterflex peristaltic pumps were used for feeding the reactors.
Table 1. Reactor operation parameters.

| Parameter | Reactor R1 | Reactor R2 | Reactor R3 |
| --- | --- | --- | --- |
| Fill Time (min) | 6 – 7 | 6 – 8 | 60 |
| Idle Time (min) | 0 – 5 | 1 – 3 | 2 |
| Aeration Time (min) | 180 – 182 | 180 – 222 | 145 – 172 |
| Settling Time (min) | 3 – 15 | 8 – 30 | 5 – 30 |
| Decanting Time (min) | 1 – 6 | 1 | 1 |
| Superficial Air Velocity (cm/s) | 1.6 – 4 | 2.11 | 3 |
The reactors were operated using synthetic wastewater prepared as detailed in (Tay et al., 2002). The main carbon, nitrogen and phosphorus sources were sodium acetate, ammonium chloride, monopotassium and dipotassium phosphate, respectively. Return activated sludge (RAS) was procured from the Pine Creek wastewater treatment plant for seeding the granulation process. The reactors were run at a stable temperature of 18±2°C. Influent, effluent and biomass samples were collected daily. Mixed liquor suspended solids (MLSS), mixed liquor volatile suspended solids (MLVSS), 5-minute sludge volumetric index (SVI5) and 30-minute SVI (SVI30) were measured according to standard methods (Rice et al., 2017). The United States Environmental Protection Agency (USEPA) reactor digestion method was adopted for the measurement of COD using a HACH DR-2400 spectrophotometer. The salicylate method was used to measure ammonia with TNT 830, 831, 832 and 833 kits. Ion chromatography was used to measure reactive phosphate using a Metrohm Compact IC Flex based on the Standard Methods for the Examination of Water and Wastewater (Rice et al., 2017). Laser particle size analysis was used to measure the granule size (Malvern MasterSizer Series 2000).
2.2. Model Structure
This study adopted a five-stage model structure where each of the stages 2 to 5 is predicted using the preceding stages as potential inputs, as shown in Fig. 2. The multi-stage model structure is designed to simulate the cause-effect process in AGS reactors, where the influent characteristics and operational parameters affect the biomass concentration through the growth and decay of the microbial community. The biomass concentration and the SBR operation directly affect the biomass settling properties, which in turn affect granule growth. All the previous parameters and interactions affect the removal efficiency and the effluent wastewater quality. Each of the parameters in stages 2 to 5 was predicted using a separate model, except for the F/M ratio, which was calculated from the influent organics and biomass concentrations and then added as an input for stages 4 and 5. The multi-stage structure also adds versatility during model development, as it allows different inputs to be used for each output.
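To make the stepwise flow concrete, the sketch below chains per-output regressors so that each stage's predictions join the candidate input pool for later stages. The stage groupings, the `feature_names` attribute on each fitted model, and the F/M computation from OLR and MLVSS are illustrative assumptions rather than the exact implementation used in this study.

```python
# Illustrative sketch of the five-stage prediction flow (assumptions noted above).
import numpy as np

STAGES = {
    2: ["MLSS", "MLVSS"],                                      # biomass concentration
    3: ["SVI5", "SVI30"],                                      # settling properties
    4: ["Granule Size"],                                       # granule growth
    5: ["Effluent COD", "Effluent NH4-N", "Effluent PO43-"],   # effluent quality
}

def predict_multistage(stage1_inputs: dict, models: dict) -> dict:
    """Predict stages 2-5 in order, feeding each prediction forward.

    `stage1_inputs` holds influent/operational measurements; `models[name]` is
    any fitted regressor exposing .predict() and a `feature_names` list
    (a hypothetical attribute used here for clarity).
    """
    features = dict(stage1_inputs)         # growing pool of candidate inputs
    predictions = {}
    for stage in (2, 3, 4, 5):
        if stage == 4:
            # F/M ratio is calculated (not predicted) and added for stages 4-5;
            # OLR/MLVSS is used here as a stand-in for the paper's calculation.
            features["F/M Ratio"] = features["OLR"] / max(features["MLVSS"], 1e-6)
        for target in STAGES[stage]:
            model = models[target]
            x = np.array([[features[f] for f in model.feature_names]])
            predictions[target] = float(model.predict(x)[0])
            features[target] = predictions[target]   # candidate input downstream
    return predictions
```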
In this study, three algorithm alternatives were individually used for simulating the AGS process: ANN, SVR and ANFIS. The outputs of individual models were combined as inputs to ensemble algorithms using five different alternative methods: ANN, SVR, ANFIS, arithmetic mean (E-AVG), and weighted average (E-WAVG). In total, each output was predicted eight times using the individual and ensemble alternatives. The best performing algorithm out of the eight alternatives was chosen for the final model. The ensemble algorithms were denoted with the prefix “E-”. Fig. 3 shows the algorithm choice approach.
The dataset of 475 days was divided into 404 days for developing the models and 71 days for evaluation. The evaluation dataset was completely isolated and was only used after the models were trained and chosen. Fig. 4 shows the data divisions for the algorithms used in this study. The model development data (404 days) was divided according to the requirements of the algorithm being trained. The ANN and E-ANN models had a data division scheme of 70% for training, 15% for test and 15% for validation, which corresponded to 284, 60 and 60 days, respectively. The SVR and E-SVR models utilized the full 404 days for training. The ANFIS and E-ANFIS models used 85% of the data for training and 15% for validation, which corresponded to 344 and 60 days, respectively. The E-AVG and E-WAVG ensembles are not machine learning algorithms; thus, they did not require training and validation.
2.3. Data Pre-Processing
The dataset collected for this study consisted of 475 days. Datasets used for training machine learning algorithms need to undergo a cleaning process that mainly removes outliers, fills missing points, randomizes the dataset, and normalizes all the data features to the same scale. In this study, outliers were removed during data collection, and missing data points were filled using linear regression imputation (Lakshminarayan et al., 1999).
Randomization is done to remove the effect of phased operation and the use of multiple reactors when the dataset is split into training and evaluation datasets. The statistical properties of the training and evaluation datasets must be as close as possible to ensure proper evaluation of the models. Table 2 shows the maximum, minimum, mean and coefficient of variation for the training and evaluation datasets.
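A minimal sketch of the pre-processing steps described above is given below, assuming a pandas DataFrame with one row per day; pandas' linear interpolation is used as a simple stand-in for the linear-regression imputation cited in the text, and the 404/71-day split follows Section 2.2.

```python
# Hedged sketch of imputation, randomization, and the train/evaluation split.
import pandas as pd

def preprocess(df: pd.DataFrame, seed: int = 42):
    # Fill missing points; linear interpolation approximates the
    # linear-regression imputation cited in the paper.
    df = df.interpolate(method="linear", limit_direction="both")

    # Randomize to mix operational phases and reactors before splitting.
    df = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)

    # 404 days for model development, 71 days held out for evaluation.
    train, evaluation = df.iloc[:404], df.iloc[404:475]

    # Compare the statistics of both splits (cf. Table 2).
    summary = pd.concat(
        {"train": train.describe().T[["max", "min", "mean", "std"]],
         "eval": evaluation.describe().T[["max", "min", "mean", "std"]]},
        axis=1,
    )
    return train, evaluation, summary
```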
Table 2. Statistical properties of the training and evaluation datasets.

| Parameter | Max. (Train) | Max. (Eval) | Min. (Train) | Min. (Eval) | Mean (Train) | Mean (Eval) | Coef. of Var. (Train) | Coef. of Var. (Eval) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Influent COD (mg/L) | 8758 | 7445 | 1287 | 1518 | 3352 | 3069 | 0.56 | 0.51 |
| Influent NH4-N (mg/L) | 234 | 201 | 53 | 68 | 129 | 135 | 0.32 | 0.28 |
| Influent PO43− (mg/L) | 124 | 83 | 3 | 6 | 48 | 47 | 0.38 | 0.33 |
| Influent Flowrate (L/d) | 74.81 | 74.81 | 11.03 | 11.03 | 62.63 | 60.95 | 0.28 | 0.31 |
| Volume (L) | 19 | 19 | 4.5 | 4.5 | 17.49 | 17.16 | 0.25 | 0.28 |
| Influent pH | 9.06 | 8.47 | 0.00 | 6.62 | 7.17 | 7.19 | 0.08 | 0.04 |
| OLR (kg COD/m3) | 33.08 | 26.87 | 3.48 | 4.47 | 11.06 | 9.70 | 0.66 | 0.57 |
| HRT (h) | 9.67 | 9.67 | 5.92 | 5.92 | 7.69 | 7.79 | 0.10 | 0.09 |
| Exchange Ratio (%) | 0.56 | 0.56 | 0.35 | 0.35 | 0.50 | 0.50 | 0.09 | 0.09 |
| Superficial Air Vel. (cm/s) | 3 | 3 | 1.56 | 1.56 | 2.51 | 2.53 | 0.18 | 0.18 |
| Temperature (°C) | 24.1 | 23.7 | 12.4 | 14.3 | 20.5 | 20.5 | 0.09 | 0.09 |
| Settling time (min) | 30 | 30 | 3 | 3 | 13.14 | 14.17 | 0.41 | 0.40 |
| Aeration time (min) | 221 | 221 | 163 | 163 | 187 | 185 | 0.13 | 0.13 |
| MLSS (mg/L) | 25157 | 24411 | 779 | 2485 | 7966 | 7446 | 0.66 | 0.61 |
| MLVSS (mg/L) | 19303 | 18675 | 523 | 2015 | 6329 | 6023 | 0.61 | 0.57 |
| SVI5 (mL/g) | 446 | 241 | 20 | 22 | 113 | 117 | 0.56 | 0.49 |
| SVI30 (mL/g) | 278 | 137 | 18 | 21 | 73 | 75 | 0.48 | 0.39 |
| Granule Size (μm) | 952 | 930 | 66 | 76 | 440 | 468 | 0.47 | 0.44 |
| F/M Ratio | 12.14 | 4.21 | 0.54 | 0.92 | 2.07 | 1.89 | 0.51 | 0.34 |
| Effluent COD (mg/L) | 4227 | 2940 | 0 | 9 | 210 | 136 | 3.03 | 3.11 |
| Effluent NH3-N (mg/L) | 116 | 115 | 0 | 0 | 24 | 31 | 1.19 | 1.13 |
| Effluent PO43− (mg/L) | 51 | 27 | 0 | 0 | 8 | 8 | 1.14 | 1.09 |
Following randomization, each feature in the training dataset was normalized to the (0 – 1) scale by dividing the feature by its maximum value. The evaluation dataset was normalized using the training maxima to keep the evaluation dataset unseen during model development; therefore, the normalized values might slightly exceed one if the maximum of the evaluation dataset was higher than that of the training dataset.
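The sketch below shows this max-normalization, with both splits scaled by the training maxima so that the evaluation data remain unseen during development; matching column names between the two DataFrames is assumed.

```python
# Scale both splits by the training maxima (evaluation values may exceed 1.0).
import pandas as pd

def normalize_by_train_max(train: pd.DataFrame, evaluation: pd.DataFrame):
    train_max = train.max()
    return train / train_max, evaluation / train_max
```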
2.4. Feature Selection
The choice of model inputs has a significant effect on model performance. The dataset collected contained input parameters that are correlated to varying degrees and contribute differently to each of the outputs. Each output had a pool of parameters from which its inputs were chosen using feature selection methods, namely multicollinearity reduction and the RReliefF algorithm.
Linearly correlated input parameters reduce the orthogonality of the model, a condition known as multicollinearity (Alin, 2010). Multicollinearity is a source of overfitting during training and results in models with low reliability (Read & Belsley, 1994). The level of multicollinearity in a set of parameters can be measured using the variance inflation factor (VIF), as shown in Eq. (1):

$$VIF_i = \frac{1}{1 - R_i^2} \tag{1}$$

where $R_i^2$ is the coefficient of determination obtained by regressing parameter $i$ on all the other parameters. VIF values of 5 or below are generally considered acceptable for regression problems. If the maximum VIF is larger than 5, the parameter with the highest VIF is removed, and the test is repeated until the maximum VIF is 5 or less.
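A hedged sketch of this iterative VIF screening is shown below, using the variance_inflation_factor routine from statsmodels and the threshold of 5 adopted in this study; the loop drops the worst offender and re-tests, as described above.

```python
# Iterative VIF-based multicollinearity reduction.
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

def reduce_multicollinearity(X: pd.DataFrame, threshold: float = 5.0) -> pd.DataFrame:
    # A constant column can be appended to X beforehand if an intercept is desired.
    X = X.copy()
    while True:
        vifs = pd.Series(
            [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
            index=X.columns,
        )
        if vifs.max() <= threshold:
            return X
        # Drop the single worst offender and re-test, as in the paper.
        X = X.drop(columns=[vifs.idxmax()])
```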
After multicollinearity is reduced and redundant parameters are removed, the remaining parameters were sorted according to a weight calculated for each parameter using the RReliefF algorithm. The RReliefF algorithm is one of the filter methods for feature selection that assigns weights to input parameters based on their effect on the output using the k-nearest neighbours approach for input-output instances, where higher weights correspond to more important inputs (Urbanowicz et al., 2018). Weights are calculated based on three probabilities at the nearest instances: a different input value at nearest outputs, a different output value at nearest inputs, and a different output value when there is a difference in the input value. Detailed mathematical formulation and the algorithm structure can be found in (Robnik-Šikonja & Kononenko, 1997). Inputs that are more consistent with the nearest neighbours in explaining the variation in outputs receive higher weights. The RReliefF algorithm, being one of the filter methods, carries the advantage that it is not affected by the induction algorithms applied to the raw data (data pre-processing) (Urbanowicz et al., 2018). This allows the chosen inputs to be used with different machine learning algorithms with confidence.
The number of nearest neighbours (k) used in this study was determined by calculating the input weights as k was increased from 1 to 500. The weights at k = 200, where the results had stabilized, were used, as shown in Fig. 5.
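For illustration, the following is a simplified, from-scratch RReliefF weight estimator in the spirit of Robnik-Šikonja and Kononenko (1997); it assumes features already scaled to [0, 1] and uses uniform neighbour weighting instead of the original rank-based weighting, so it sketches the idea rather than reproducing the exact algorithm used here.

```python
# Simplified RReliefF: higher weight = input more relevant to the output.
import numpy as np

def rrelieff_weights(X: np.ndarray, y: np.ndarray, k: int = 200) -> np.ndarray:
    """Return one importance weight per column of X (features scaled to [0, 1])."""
    n, p = X.shape
    y_norm = (y - y.min()) / (np.ptp(y) or 1.0)   # diff() on the output
    N_dC = 0.0                                    # prob. of different prediction
    N_dA = np.zeros(p)                            # prob. of different attribute
    N_dCdA = np.zeros(p)                          # joint probability
    for i in range(n):
        # k nearest neighbours of instance i (Euclidean distance, excluding itself)
        dist = np.linalg.norm(X - X[i], axis=1)
        neighbours = np.argsort(dist)[1:k + 1]
        w = 1.0 / len(neighbours)                 # uniform neighbour weighting
        for j in neighbours:
            diff_y = abs(y_norm[i] - y_norm[j])
            diff_a = np.abs(X[i] - X[j])          # per-attribute differences
            N_dC += diff_y * w
            N_dA += diff_a * w
            N_dCdA += diff_y * diff_a * w
    return N_dCdA / N_dC - (N_dA - N_dCdA) / (n - N_dC)
```

Columns would then be ranked by descending weight and added to a model one at a time until its performance stops improving, as described in Section 3.2.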
2.5. Artificial neural networks
Artificial neural networks mimic the way neurons work in the human brain to perform complex operations. The ANN type used in this study, feed-forward neural networks, utilizes an error minimization algorithm to tune the network weights (Fernando & Shamseldin, 2009; Sammut & Webb, 2016). Well-designed and well-trained neural networks can achieve outstanding prediction accuracies; however, this comes at the expense of long training times, as the error minimization functions are generally slow to converge. Additionally, large training datasets are needed to reach high prediction accuracies without overfitting. Neural networks can also overfit if the inputs are not well selected or the layer architecture is not well designed (Lawrence & Giles, 2000; Ye et al., 2020).
In this study, Bayesian Regularization was used as the objective function for error minimization using a linear formulation of squared errors and network weights (Foresee & Hagan, 1997). The network architectures were selected by training all ANN combinations of (1-3) hidden layers and (1-10) hidden nodes in each layer. This approach resulted in the training of 8880 neural networks for the 8 outputs. The best performing network selected for each of the outputs was the one with the most accurate prediction (lowest error) and with similar training and test performance. These conditions ensured that the chosen networks did not overfit or underfit.
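The architecture search can be sketched as an exhaustive loop over every 1-3-layer, 1-10-node combination (1,110 candidates per output). scikit-learn's MLPRegressor with an L2 penalty is used below as a stand-in, since the Bayesian-Regularization training used in this study is not available in that library; the selection rule favouring low test error with similar training/test error mirrors the criteria above.

```python
# Exhaustive architecture search over 1-3 hidden layers with 1-10 nodes each.
from itertools import product
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def search_architecture(X, y, seed: int = 0):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    candidates = []
    for depth in (1, 2, 3):
        candidates += list(product(range(1, 11), repeat=depth))   # 1,110 tuples
    best, best_err = None, float("inf")
    for arch in candidates:
        net = MLPRegressor(hidden_layer_sizes=arch, alpha=1e-2,
                           max_iter=2000, random_state=seed).fit(X_tr, y_tr)
        tr_err = mean_squared_error(y_tr, net.predict(X_tr))
        te_err = mean_squared_error(y_te, net.predict(X_te))
        # Prefer low test error with training/test errors of similar magnitude,
        # to avoid over- or under-fitted networks (1.5x is an illustrative bound).
        if te_err < best_err and te_err < 1.5 * tr_err:
            best, best_err = net, te_err
    return best
```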
2.6. Adaptive neuro-fuzzy inference systems
The adaptive neuro-fuzzy inference system (ANFIS) is used to simulate complex processes with measurement uncertainty. It is a universal approximator that utilizes logical rules to reach an output through human-like reasoning (Jang, 1993). In ANFIS, membership functions are used to map numerical inputs to fuzzy sets. A learning algorithm, similar to the back-propagation algorithm, is used to minimize the errors by optimizing the ANFIS parameters.
Each of the outputs in this study was predicted using a separate ANFIS model. The clustering method used was Grid Partitioning, as it allows for choosing the desired membership functions. Grid partitioning, however, assigns a fuzzy rule for each input-membership function combination, which exponentially increases the number of rules and the computational requirement for training the models.
2.7. Support vector regression
Support vector regression (SVR) is an algorithm, based on statistical learning theory, that uses the structural risk minimization (SRM) method to minimize the modelling error while maintaining low model complexity (Smola & Schölkopf, 2004; Vapnik, 2000). The nonlinear SVR model is shown in Eq. (2) for input vectors $x_i$, outputs $y_i$, and $N$ samples ($i = 1, \dots, N$):

$$f(x) = w^{T}\varphi(x) + b \tag{2}$$

where $w$ is a weight vector, $b$ is a bias, and $\varphi(x)$ is a nonlinear (kernel) mapping function that maps the training data to a higher-dimensional feature space, making it possible to linearize the model (Smola & Schölkopf, 2004). A Gaussian kernel was used, as it is easier to tune than other functions and can handle complex error boundaries (Goyal & Ojha, 2011).
Three hyperparameters need tuning in SVR: the kernel scale (γ), box constraint (C) and the error band (ε). The kernel scale (γ) determines how strongly the kernel function detects variation in the input vectors; it is inversely proportional to the sensitivity of the kernel function to input variation. The SVR model can underfit if the kernel function is not sensitive enough to detect changes in the inputs and can overfit if the kernel function is so sensitive that it reacts to the smallest variation in the inputs. The box constraint (C) is a regularization factor needed by the SRM to control the penalty on large prediction residual errors. It represents the trade-off between the training error and model complexity, where small C values will result in poor predictions, and large values will cause overfitting. Finally, the error band (ε) represents the space around the actual measured values within which predictions can be made; tighter error bands will result in more accurate predictions at the expense of model complexity. More details on the mathematical formulation of SVR can be found in (Awad & Khanna, 2015; Cristianini & Shawe-Taylor, 2000; Smola & Schölkopf, 2004).
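A sketch of RBF-kernel SVR tuning over (C, γ, ε) with 5-fold cross-validation is shown below; randomized search is used as a simple stand-in for the Bayesian optimization applied in this study, and the search ranges are illustrative.

```python
# Tune (C, gamma, epsilon) for an RBF-kernel SVR with 5-fold cross-validation.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import RandomizedSearchCV

def tune_svr(X, y, n_iter: int = 60, seed: int = 0) -> SVR:
    space = {
        "C": np.logspace(-2, 3, 200),         # box constraint
        "gamma": np.logspace(-3, 3, 200),     # kernel (inverse) scale
        "epsilon": np.logspace(-6, -1, 200),  # error band
    }
    search = RandomizedSearchCV(
        SVR(kernel="rbf"), space, n_iter=n_iter, cv=5,
        scoring="neg_root_mean_squared_error", random_state=seed,
    )
    return search.fit(X, y).best_estimator_
```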
3. Results and Discussion
3.1. Aerobic Granular Sludge Performance
The reactors simulated in this study were operated for data collection for a combined total of 475 days. Periods of stable operation were observed along with some disruptions due to biomass washout. The average (± standard deviation) influent COD, NH4+-N, and PO43− concentrations in the reactors were 3309±1838, 130±41, and 48±18 mg/L, respectively. The systems exhibited stable organics and nutrients removal throughout the duration of the experiment, with average COD, NH4+-N, and PO43− removal efficiencies of 96±8, 81±18, and 84±18%, respectively. Aerobic granular sludge has consistently shown good removal of organics and nutrients (de Kreuk et al., 2005; Iorhemen et al., 2020; Nancharaiah & Reddy, 2018). The stratification of aerobic, anoxic, and anaerobic microbial communities has also been observed, resulting in better nitrogen and phosphorus removal (Wang et al., 2008; Yilmaz et al., 2008).
The average MLSS and MLVSS concentrations were 7888±5158 and 6284±3810 mg/L, respectively. The average MLVSS/MLSS ratio was 80%, which is a typical value in aerobic treatment reactors. The average SVI5 and SVI30 were 114±63 and 73±34 mL/g, respectively, demonstrating the fast settling of the granules. Granules are considered to have good settling properties when SVI30 is below 100 mL/g (Hamza et al., 2018; Liu et al., 2007). The average granulation ratio in the AGS reactors, calculated as SVI30/SVI5 (Hamza et al., 2018), was 64%, and the average granule diameter was 445±206 µm, which showed that the biomass was mostly granular. The presence of some floccular biomass and fluctuations in settling properties were expected due to the variation in the F/M ratio (2±1) (Hamza et al., 2018).
3.2. Feature Selection
The ANN, SVR and ANFIS models that were used to predict outputs 1, 2 and 3 (Fig. 3) were developed using the operation and performance dataset collected from the laboratory reactors (Table 2). Feature selection was necessary to overcome the multicollinearity between parameters and to remove the inputs that adversely affect model performance. The feature selection methods used in this study reduced the level of multicollinearity and identified the most effective parameters for each output, which resulted in the elimination of some parameters. Calculating the VIF for the inputs of each stage reduced the number of inputs by 4, 5, 8, and 7 for stages 2, 3, 4, and 5, respectively. The results of the multicollinearity reduction are shown in Table 3. The level of multicollinearity is accepted once the maximum VIF is below 5. The data collection plan was quite thorough in the selection of parameters to measure, which resulted in a high level of multicollinearity in the initial dataset due to the close relationships between parameters, such as OLR and influent COD, or flowrate, reactor volume, and HRT (Price, 1998).
Table 3. Multicollinearity reduction.

| Model stage | Max. VIF before reduction | Max. VIF after reduction | Initial number of inputs | Final number of inputs | Number of inputs removed |
| --- | --- | --- | --- | --- | --- |
| Stage 2 | 990.55 | 4.63 | 13 | 9 | 4 |
| Stage 3 | 1078.9 | 2.49 | 15 | 10 | 5 |
| Stage 4 | 1132.1 | 3.11 | 18 | 10 | 8 |
| Stage 5 | 1141.9 | 4.28 | 19 | 12 | 7 |
Further dimensionality reduction was applied using the RReliefF algorithm to rank the inputs for each output. Unlike the VIF test, which depended only on the inputs, the RReliefF weights depended on the relationship between the inputs and outputs, which resulted in different input weights for each output, even within the same stage. Table 4 shows the final inputs used for the ANN, SVR, and ANFIS models, where ANN is denoted by "N", SVR by "S", and ANFIS by "A". For each output, Table 4 shows the models in which each of the predictors was used; for example, the influent NH4-N, influent PO43−, OLR and HRT were used as inputs for the MLVSS ANFIS model. The ranking produced by the RReliefF algorithm was used for all three modelling algorithms. Inputs were selected from the ranked list while noting the model performance, until the model stopped improving or was unable to complete training due to computational limitations (in the ANFIS models).
Table 4. Algorithms in which each of the inputs was used for each output.
Parameter
Outputs
MLSS
MLVSS
SVI5
SVI30
Granule Size
Effluent COD
Effluent NH4-N
Effluent PO43−
Influent NH4-N (mg/L)
N - S - A
N - S - A
N - S - A
N - S - A
N - S - A
N - S - A
N - S - A
N - S - A
Influent PO43− (mg/L)
N - S - A
N - S - A
N - S
N - S
N - S
N - S - A
N - S - A
N - S - A
Volume (L)
N - S - A
N - S
N - S
N - S - A
N - S - A
Influent pH
N - S - A
N - S
N - S
N - S
N - S
N - S
N - S
OLR (kg COD/m3)
N - S - A
N - S - A
HRT (h)
N - S - A
N - S - A
N - S - A
N - S - A
N - S - A
N - S - A
N - S - A
N - S
Superficial Air Vel. (cm/s)
A
N - S - A
N - S - A
N - S - A
N - S - A
N - S - A
Temperature (°C)
N - S
N - S
N - S
N - S - A
N - S - A
N - S - A
N - S
Settling time (min)
S
N - S - A
N - S - A
N - S
N - S - A
N - S
MLVSS (mg/L)
N - S - A
N - S - A
N - S
N - S - A
N - S - A
SVI5 (mL/g)
N - S - A
SVI30 (mL/g)
N - S - A
Granule Size (μm)
N - S - A
N
N - S - A
F/M Ratio
N - S
N
N - S
The ANFIS models were restricted in the number of inputs, as it was found that adding more inputs would significantly increase the required training time and CPU usage. ANFIS was limited to a maximum of six inputs selected according to the RReliefF ranking, except for the MLVSS, where the best performance was achieved with four inputs, and the effluent ammonia, where seven inputs had to be used to achieve acceptable performance.
Feature selection was found to be the most important step in the model development, as it improved the performance of the ANN, SVR, and ANFIS models by raising the overall average evaluation R2 from 93%, 85%, and 83% to 94.2%, 92.4% and 85.6%, respectively, using the same data and model structure.
3.3. Model Development
The ANN models were trained using the selected inputs in Table 4. The network architectures for each ANN model are shown in Table 5; all ANN models had three hidden layers with different combinations of hidden nodes. Table 5 also shows the tuned SVR hyperparameters. There is a large variation in the values of C, γ and ε between the models. The available literature explains the individual effect of each hyperparameter on the performance of trained SVR models and the risk of overfitting; however, the hyperparameters have a combined effect on the performance. The values of C, γ and ε were optimized within the SVR training process using Bayesian optimization to minimize the 5-fold cross-validation error. The optimization of the hyperparameters results in the best possible prediction accuracy without significant overfitting.

Finally, the ANFIS models were all assigned two input membership functions. It was difficult to increase the number of membership functions due to computational limitations that resulted in failing to train the models. Different membership functions were tested, and the best-performing ones were chosen for the final model. Triangular membership functions were used for the MLSS, MLVSS, SVI30 and effluent NH4-N models (Eq. (3)):

$$f(x; a, b, c) = \max\!\left(\min\!\left(\frac{x-a}{b-a}, \frac{c-x}{c-b}\right), 0\right) \tag{3}$$

where a, b and c are constants determined through the ANFIS training. The rest of the ANFIS models used Gaussian combination membership functions, built from curves of the Gaussian form (Eq. (4)):

$$f(x; \sigma, \mu) = \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \tag{4}$$

where σ is the standard deviation and µ is the mean of the training data.

The average training performance of the ANN model for all outputs was 96%, 0.03, and 3.3% for R2, nRMSE and sMAPE, respectively. The average training performance of the SVR model for all outputs was 95.8%, 0.026, and 1.9%, and that of the ANFIS model was 92.5%, 0.047, and 4.6% for the same metrics. The overall performance of SVR was better than ANN in terms of nRMSE and sMAPE. The ANFIS model was considerably less accurate than the ANN and SVR, which can be attributed to the computational limitation on the number of inputs and membership functions that could be used.
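For reference, the two membership-function shapes in Eqs. (3) and (4) can be written as simple functions; the two-sided "Gaussian combination" form below mirrors the behaviour of MATLAB-style gauss2mf and is an illustration rather than the exact parameterization used in this study.

```python
# Triangular and Gaussian-combination membership functions (Eqs. 3-4).
import numpy as np

def trimf(x, a, b, c):
    """Triangular membership: 0 outside [a, c], peak of 1 at b (assumes a < b < c)."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def gauss2mf(x, sigma1, mu1, sigma2, mu2):
    """Two-sided Gaussian membership: 1 between mu1 and mu2, Gaussian flanks outside."""
    left = np.exp(-((x - mu1) ** 2) / (2 * sigma1 ** 2))
    right = np.exp(-((x - mu2) ** 2) / (2 * sigma2 ** 2))
    return np.where(x < mu1, left, 1.0) * np.where(x > mu2, right, 1.0)
```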
Table 5. ANN architectures, SVR hyperparameters and ANFIS membership functions.

| Output | ANN Architecture | SVR C | SVR γ | SVR ε | ANFIS Membership Function |
| --- | --- | --- | --- | --- | --- |
| MLSS (mg/L) | 6-4-3 | 1.763 | 1.8302 | 1.33E-04 | Triangular |
| MLVSS (mg/L) | 5-4-9 | 244.46 | 8.6594 | 0.0004098 | Triangular |
| SVI5 (mL/g) | 6-3-1 | 49.959 | 4.1236 | 0.0001425 | Gaussian Combination |
| SVI30 (mL/g) | 2-7-4 | 0.59194 | 5.2062 | 0.013026 | Triangular |
| Granule Size (μm) | 2-9-8 | 121.55 | 1.4949 | 0.0003605 | Gaussian Combination |
| Effluent COD (mg/L) | 7-1-1 | 7.9007 | 2.7712 | 7.226E-06 | Gaussian Combination |
| Effluent NH4-N (mg/L) | 6-6-1 | 1.2535 | 1.0669 | 0.0007429 | Triangular |
| Effluent PO43− (mg/L) | 5-1-1 | 17.613 | 2.87 | 0.0002448 | Gaussian Combination |
The outputs of the ANN, SVR and ANFIS models were used as inputs to five ensemble alternatives: E-ANN, E-SVR, E-ANFIS, E-AVG and E-WAVG. The E-AVG was the arithmetic mean of the three inputs, which resulted in an overall average training performance of 97.3%, 0.027, and 2.8% for R2, nRMSE and sMAPE, respectively. The E-WAVG was a weighted average of the three inputs using the training R2 of the ANN, SVR and ANFIS for each output as relative weights. This approach had the advantage of favouring the more accurate inputs, which provided an improvement over E-AVG when there was a large difference in accuracy between the ANN, SVR and ANFIS. The overall average training performance of the E-WAVG was 97.8%, 0.024, and 2.4% for R2, nRMSE and sMAPE, respectively.

The E-ANN, E-SVR, and E-ANFIS are machine learning-based ensembles in which the ANN, SVR and ANFIS outputs were used as inputs. Table 6 shows the architectures, hyperparameters and membership functions of the E-ANN, E-SVR, and E-ANFIS, respectively. These ensembles were much simpler in their development, as they were intended to use three estimates of the true output to make the predictions, and the number of inputs was far smaller than in the original three models. Maintaining a simple model was essential to minimize overfitting, considering the small number of inputs (three) and that the inputs were variants of the same output that were already close to the desired solution. All ensemble neural networks had one hidden layer with a single neuron, providing a training performance of 98.7%, 0.016, and 1.8% for R2, nRMSE and sMAPE, respectively. The E-SVR hyperparameters provide insight into the performance of the algorithm, where the large C values indicate larger penalties on errors in all outputs; however, the γ values were also much higher than in the SVR model, indicating that the kernel function was not as sensitive to variation in the inputs. The overall average training performance of the E-SVR was 98.7%, 0.016, and 1.6% for R2, nRMSE and sMAPE, respectively. The E-ANFIS models had two Gaussian Combination membership functions, with an overall training performance of 94.3%, 0.031, and 1.9% for R2, nRMSE and sMAPE, respectively.
Table 6. E-ANN architectures, E-SVR hyperparameters and E-ANFIS membership functions.

| Output | E-ANN Architecture | E-SVR C | E-SVR γ | E-SVR ε | E-ANFIS Membership Function |
| --- | --- | --- | --- | --- | --- |
| MLSS (mg/L) | 1 | 274.69 | 3.1267 | 0.0001948 | Gaussian Combination |
| MLVSS (mg/L) | 1 | 969.84 | 16.172 | 0.0006375 | Gaussian Combination |
| SVI5 (mL/g) | 1 | 932.21 | 222.88 | 0.0005993 | Gaussian Combination |
| SVI30 (mL/g) | 1 | 286.84 | 38.591 | 0.0071388 | Gaussian Combination |
| Granule Size (μm) | 1 | 892.45 | 377.84 | 0.0002951 | Gaussian Combination |
| Effluent COD (mg/L) | 1 | 998.2 | 115.86 | 0.0001844 | Gaussian Combination |
| Effluent NH4-N (mg/L) | 1 | 886.38 | 329.18 | 0.0002088 | Gaussian Combination |
| Effluent PO43− (mg/L) | 1 | 942.59 | 231.49 | 0.0012618 | Gaussian Combination |
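The two averaging ensembles (E-AVG and E-WAVG) described above reduce to a few lines; the sketch below takes per-sample predictions from the three base models and, for E-WAVG, uses their training R2 values as relative weights, as in this study.

```python
# Arithmetic-mean (E-AVG) and training-R2-weighted (E-WAVG) ensembles.
import numpy as np

def e_avg(pred_ann, pred_svr, pred_anfis):
    return np.mean([pred_ann, pred_svr, pred_anfis], axis=0)

def e_wavg(pred_ann, pred_svr, pred_anfis, r2_ann, r2_svr, r2_anfis):
    weights = np.array([r2_ann, r2_svr, r2_anfis], dtype=float)
    weights /= weights.sum()                      # relative weights from training R2
    preds = np.stack([pred_ann, pred_svr, pred_anfis])
    return np.tensordot(weights, preds, axes=1)   # weighted average per sample
```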
3.4. Model Performance
The developed algorithms in this study (ANN, SVR, ANFIS, E-ANN, E-SVR, E-ANFIS, E-AVG, E-WAVG) were all validated using the evaluation dataset that was isolated before developing the models. Table 7 shows the overall evaluation performance averaged for all outputs for each algorithm. It was found that the E-ANN provided the best performance in terms of R2, nRMSE, and sMAPE. The E-SVR and E-WAVG provided a close performance to the E-ANN, but the E-ANFIS was not able to reach the same level of performance.
Table 7. Overall average evaluation performance.

| Metric | ANN | SVR | ANFIS | E-ANN | E-SVR | E-ANFIS | E-AVG | E-WAVG |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R2 | 94.2% | 92.4% | 85.6% | 95.2% | 94.5% | 80.3% | 94.6% | 95% |
| nRMSE | 0.037 | 0.043 | 0.062 | 0.034 | 0.036 | 0.081 | 0.037 | 0.035 |
| sMAPE | 4.2% | 4.6% | 7.7% | 3.8% | 4% | 6.4% | 4.5% | 4.2% |
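For reference, the three reported metrics can be computed as sketched below; normalizing the RMSE by the observed range and the coefficient-of-determination form of R2 are assumptions, since the section does not restate its exact conventions.

```python
# R2, nRMSE and sMAPE as used in Tables 7-8 (conventions assumed, see note above).
import numpy as np

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def nrmse(y_true, y_pred):
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (y_true.max() - y_true.min())   # normalized by observed range

def smape(y_true, y_pred):
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return 100.0 * np.mean(np.abs(y_true - y_pred) / denom)
```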
Although the E-ANN outperformed the other ensemble algorithms in the overall performance, the E-WAVG provided somewhat better prediction accuracy for the granule size, with an R2 of 89% as opposed to 85% for the E-ANN. The ensemble models did not provide an improvement over the ANN for the prediction of the effluent COD, where the ANN provided an R2 of 99.65% as opposed to the 99.3% R2 of the E-ANN. The best-performing model for each of the outputs was selected for the final model, as shown in Table 8. Fig. 6 shows the diagonal plots of the final selected models.
Table 8. Final model performance using the best performing algorithms.

| Output | Chosen Algorithm | R2 (Train) | R2 (Eval) | nRMSE (Train) | nRMSE (Eval) | sMAPE (Train) | sMAPE (Eval) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MLSS (mg/L) | E-ANN | 97.80% | 96.11% | 0.031 | 0.036 | 3.06% | 4.05% |
| MLVSS (mg/L) | E-ANN | 97.58% | 94.30% | 0.031 | 0.043 | 2.91% | 4.56% |
| SVI5 (mL/g) | E-ANN | 98.53% | 95.89% | 0.017 | 0.028 | 2.29% | 3.16% |
| SVI30 (mL/g) | E-ANN | 95.81% | 92.19% | 0.025 | 0.030 | 2.93% | 3.49% |
| Granule Size (μm) | E-WAVG | 97.06% | 88.75% | 0.038 | 0.073 | 2.54% | 3.78% |
| Effluent COD (mg/L) | ANN | 99.65% | 99.65% | 0.009 | 0.008 | 3.06% | 5.63% |
| Effluent NH4-N (mg/L) | E-ANN | 99.89% | 99.53% | 0.008 | 0.021 | 0.92% | 1.98% |
| Effluent PO43− (mg/L) | E-ANN | 99.72% | 98.96% | 0.010 | 0.018 | 1.85% | 2.88% |
After using the best performing ensemble algorithms, the final model improved the overall prediction accuracy over the ANN model, the best performing single algorithm, by raising the overall average R2 from 94.2% to 95.2% and reducing the overall average nRMSE from 0.037 to 0.032, and the overall average sMAPE from 4.2% to 3.7%. The most significant improvement was observed in the granule size, where the R2 was increased from 81% to 88.2%.
It was found that even though the E-SVR was close to the E-ANN in terms of prediction accuracy on the evaluation data, it suffered from a slightly higher level of overfitting for some of the outputs, where the difference between the training and evaluation predictions was larger than that of the E-ANN. The E-ANFIS did not provide predictions as accurate as the other ensembles, as there was not enough distinction between the input parameters for the ANFIS to develop a reliable rule base.
Table 9 compares the model developed in this study to other machine learning models in the literature using the prediction R2 of the validation datasets as a performance indicator. The comparison shows that the model developed in this study achieves prediction accuracies similar to those of AGS machine learning models trained on much larger datasets (Zaghloul et al., 2020). It can also be observed that the model achieves prediction accuracies higher than those of AGS models developed with small datasets (Gong et al., 2018; Mahmod & Wahab, 2017). Other CAS models with small datasets resulted in predictions that are consistent with AGS models (El-Din et al., 2004; Manu & Thalla, 2017). The small size of test datasets can result in less reliable models that do not provide consistent results, as demonstrated by Mahmod and Wahab (2017), where the training and testing R2 were 78% and 91.17%, respectively. SVR and ANNs were found to perform at the same level of accuracy in other CAS and AGS models, which is consistent with the models developed in this study (Gong et al., 2018; Guo et al., 2015; Mahmod & Wahab, 2017; Seshan et al., 2014).
Table 9. Comparison between the dataset size and prediction R2 (%) of this study and other machine learning models for AGS and CAS.
COD dataset was 205 days, TN dataset was 136 days.
4. Multi-Stage Model Structure
AGS machine learning models in the literature are all single-stage models in which a group of inputs is used to predict the final effluent quality parameters without considering the process sequence (Gong et al., 2018; Mahmod & Wahab, 2017). Two-stage models were successfully designed to improve the model structure (Zaghloul et al., 2020). The multi-stage model structure makes the model developed in this study more versatile than other machine learning models, as it considers the effect of influent characteristics on the biomass properties and the consequent effect on the effluent quality. The multi-stage structure also provides the ability to identify the source of predicted effluent quality issues by analyzing the different biomass properties predicted at the same instance. This mitigates a key disadvantage of black-box models, which can be difficult to use for process interpretation (Newhart et al., 2019).
5. Failure prediction
The model developed in this study can accurately predict the performance of AGS reactors under varying operational and influent conditions. The model provides a tool that can be used to forecast the reactor behaviour during operation, which will guide the operators on experimental design. Fig. 7 shows the predictions made for a portion of the evaluation dataset (chronologically ordered and obtained from the same reactor) plotted with failure thresholds. Operators can utilize such figures to identify sources of process failures and potential causes.
The local treated effluent standards were set as thresholds for this study. The maximum effluent COD, NH4-N, and PO43− were set to 20 mg/L, 10 mg/L, and 0.5 mg/L, respectively. The threshold for the granule size was set to 200 µm, below which the biomass would be considered floccular (Liu et al., 2010), indicating either structural-integrity failure of the granules or failure to achieve a state of granulation. The MLSS threshold was chosen for this study to be 4000 mg/L; the MLSS dropping below this threshold would indicate a washout of the biomass due to poor settling. The settling properties can also be predicted by the SVI values. Additionally, the SVI30/SVI5 ratio indicates the percentage of granulation inside the reactor, as defined by Kocaturk and Erguder (2016). The threshold for the SVI30/SVI5 ratio was set to 50% for this study.
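A minimal sketch of this threshold check over a predicted time series is shown below; the column names and DataFrame layout are assumptions, while the limits are those stated above.

```python
# Flag, per predicted sample, which failure criteria are violated.
import pandas as pd

THRESHOLDS = {
    "Effluent COD": ("max", 20.0),     # mg/L
    "Effluent NH4-N": ("max", 10.0),   # mg/L
    "Effluent PO43-": ("max", 0.5),    # mg/L
    "Granule Size": ("min", 200.0),    # um; below => floccular biomass
    "MLSS": ("min", 4000.0),           # mg/L; below => biomass washout
    "SVI30/SVI5": ("min", 0.5),        # granulation ratio
}

def flag_failures(pred: pd.DataFrame) -> pd.DataFrame:
    pred = pred.copy()
    pred["SVI30/SVI5"] = pred["SVI30"] / pred["SVI5"]
    flags = {}
    for name, (kind, limit) in THRESHOLDS.items():
        flags[name] = pred[name] > limit if kind == "max" else pred[name] < limit
    return pd.DataFrame(flags, index=pred.index)
```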
It can be observed that a failure to remove NH4-N occurred between samples 218 and 232, where the effluent concentration reached 28 mg/L. The MLSS plot shows a rapid decline in the MLSS concentration, which entailed a loss of the slow-growing nitrifying bacteria and, thus, a delayed period of poor NH4-N removal while the biomass recovered. Further analysis of the results shows that there was a drop in the granule size and the granulation ratio inside the reactor, indicating that a partial washout had occurred, followed by rapid growth of heterotrophic biomass in floccular form due to the abundance of organics (the F/M ratio was disturbed due to the loss of biomass).
The influent COD was reduced from around 7000 mg/L to 4500 mg/L after the biomass washout instance. This improved the observed COD removal efficiency, as the new heterotrophic growth was able to handle the influent organics. The effluent COD concentration remained above the allowed threshold because the reactor was being operated to treat high-organic-content wastewater, with the effluent intended to be polished before disposal. The aerobic biological process, although successful in removing most of the organic load, was unable to bring the COD concentrations below the required limits (Hamza et al., 2018).
6. Conclusion
A machine learning model was developed for AGS reactors using a combination of neural networks, support vector regression and adaptive neuro-fuzzy inference systems. Feature selection methods were applied, and a five-stage model structure was adopted. This study shows that proper feature selection and combining multiple machine learning algorithms in an ensemble can improve the performance of data-driven models when the available dataset is small. The two feature selection methods were applied to reduce the dimensionality of the regression problem and the multicollinearity of the input data. Combining multiple algorithms using simple neural networks or weighted-average ensembles reduced the levels of over- and under-fitting of the individual algorithms. The modular nature of the model structure allowed the best-performing model, out of the eight alternatives, to be used for each output. The model developed in this study was able to predict the behaviour of AGS reactors and provide insight into the process by explaining the causes of predicted failures.
Declaration of Competing Interest
None.
Acknowledgements
The authors would like to acknowledge the Natural Sciences and Engineering Research Council of Canada (NSERC) for funding this research.
References
Lee, M.W., Hong, S.H., Choi, H., Kim, J.-H., Lee, D.S., Park, J.M. (2008). Real-time remote monitoring of small-scaled biological wastewater treatment plants by a multivariate statistical process control and neural network-based software sensors.
Liang, J., Wang, Q., Li, Q.X., Jiang, L., Kong, J., Ke, M., Arslan, M., Gamal El-Din, M., Chen, C. (2020). Aerobic sludge granulation in shale gas flowback water treatment: Assessment of the bacterial community dynamics and modeling of bioreactor performance using artificial neural network.
Manu & Thalla (2017). Artificial intelligence models for predicting the performance of biological wastewater treatment plant in the removal of Kjeldahl Nitrogen from wastewater.
Rice, E.W., Baird, R.B., Eaton, A.D. (2017). Standard Methods for the Examination of Water and Wastewater (23rd ed.). American Public Health Association, American Water Works Association, Water Environment Federation.
Sammut, C., Webb, G.I. (2016). Encyclopedia of Machine Learning and Data Mining. Springer US. https://doi.org/10.1007/978-1-4899-7687-1
Zaghloul, M.S., Hamza, R.A., Iorhemen, O.T., Tay, J.H. (2020). Comparison of adaptive neuro-fuzzy inference systems (ANFIS) and support vector regression (SVR) for data-driven modelling of aerobic granular sludge reactors.