Elsevier

Water Research

Volume 189, 1 February 2021, 116657
Development of an ensemble of machine learning algorithms to model aerobic granular sludge reactors

https://doi.org/10.1016/j.watres.2020.116657

Highlights

  • A new multi-stage model structure was developed to explain the effluent predictions.
  • Multicollinearity reduction and the RReliefF algorithm improved the model performance.
  • An ensemble of machine learning algorithms was used for more accurate predictions.
  • The average R2, nRMSE, and sMAPE were 95.7%, 0.032, and 3.7%, respectively.
  • The model was able to explain a failure instance in the predicted dataset.

Abstract

Machine learning models provide an adaptive tool to predict the performance of treatment reactors under varying operational and influent conditions. Aerobic granular sludge (AGS) is still an emerging technology and does not have a long history of full-scale application. There is, therefore, a scarcity of long-term data in this field, which has hindered the development of data-driven models. In this study, a machine learning model was developed for simulating the AGS process using 475 days of data collected from three lab-based reactors. Inputs were selected based on RReliefF ranking after multicollinearity reduction. A five-stage model structure was adopted in which each parameter was predicted using a separate model with the preceding stages' parameters as inputs. An ensemble of artificial neural networks, support vector regression and adaptive neuro-fuzzy inference systems was used to improve the models' performance. The developed model was able to predict the MLSS, MLVSS, SVI5, SVI30, granule size, and effluent COD, NH4-N, and PO43− with average R2, nRMSE and sMAPE of 95.7%, 0.032 and 3.7%, respectively.

Keywords

Machine learning
Artificial neural networks
Adaptive neuro-fuzzy inference systems
Support vector regression
Aerobic granular sludge
Sequencing batch reactors

1. Introduction

Aerobic granular sludge (AGS) is a promising biological wastewater treatment technology that has shown excellent performance in laboratories for the treatment of domestic and high-strength wastewater and is starting to be applied in full-scale wastewater treatment plants (WWTPs) (Pronk et al., 2015; Zheng et al., 2020). AGS has certain advantages over conventional activated sludge (CAS) in terms of lower reactor footprint, higher capacity for organic loading and simultaneous removal of nutrients and organics (He et al., 2020). The compact structure of the biomass granules gives the reactor higher resilience against shock loads and toxic wastewater and provides better biomass retention due to the enhanced settling properties (Franca et al., 2018; Nancharaiah & Reddy, 2018).
Although AGS has consistently shown promise in terms of performance, the operation of AGS bioreactors is challenging due to the large number of factors affecting the process (Wilén et al., 2018). The characteristics of the influent wastewater, the biomass properties within the reactor and the operational conditions all play a significant role in the removal efficiency of the reactor. Additionally, these factors are interconnected and have complex nonlinear relationships (Khan et al., 2013). Influent characteristics and the mode of operation of the sequencing batch reactor (SBR) play a significant role in shaping the microbial culture, which in turn affects the integrity of the granule structure and its settling ability. The settling ability of the biomass is also affected by the settling time, volumetric exchange ratio, and discharge time. At the end of the settling period, slow-settling biomass that does not settle below the effluent port gets washed out of the reactor during the decant phase, leaving the faster-settling granules inside the reactor (Qin et al., 2004; Wang et al., 2004). This also affects the concentration of biomass left inside the reactor after every cycle of operation, which provides seed for new granule formation and, therefore, directly affects the level of organics and nutrients removal. A certain aeration time is necessary for aerobic degradation of organics, for nitrification, and for providing the shear force that triggers granulation in the biomass flocs, the latter being the governing factor (Hamza et al., 2018). Other factors that affect the AGS process include the influent pH, volumetric exchange ratio, hydraulic retention time (HRT), and temperature (Khan et al., 2013). Sudden changes to these factors can lead to the failure of the structural integrity of the granules and the washout of biomass from the reactor, leaving the reactor unable to meet the required effluent quality. Since these factors continually change, the operation of an AGS system is challenging and requires careful monitoring.
A tool that can simulate AGS reactors considering all previously mentioned factors would help alleviate some of this challenge. Such a tool can provide operators with the ability to predict the reactor performance and adapt as the quality of influent wastewater changes.
There are several studies in the literature that present physical models for AGS reactors (Baeten et al., 2019; Ni & Yu, 2010). Physical AGS models utilize the biofilm model to simulate the diffusion of the substrate into the granules and Activated Sludge Models (ASM) based equations to simulate the kinetics of the biological process (Cui et al., 2020). Many restrictions and assumptions have to be made to keep physical models from becoming overly complicated (Ni & Yu, 2010). The calibrated kinetic and stoichiometric parameters will change with any change in operation or influent wastewater, making it challenging to use physical models for process control (Baeten et al., 2018). Physical models, however, are excellent for understanding the biological processes and conversion rates, and for studying the factors affecting the process performance.
Machine learning provides an excellent alternative to physical models for predicting reactor performance and process control. Data-driven models can overcome the need for continuous re-calibration of physical models. They are more adaptive and can learn from new data that is collected as the process continues to run (El-Din et al., 2004). The use of machine learning for AGS modelling has not been studied as extensively as physical modelling (Baeten et al., 2019). Artificial neural networks (ANN) were used to simulate the AGS process in a simple model structure where only chemical oxygen demand (COD) and total nitrogen (TN) removals were predicted, with an R2 of 0.9 and 0.81 (Gong et al., 2018). Single hidden layer ANNs were used to predict the effluent COD using six inputs, resulting in an R2 of 0.91 (Mahmod & Wahab, 2017). Another AGS model was developed using single hidden layer feed-forward ANNs with eight inputs to predict the effluent COD, NH4-N, and TN with an R2 of 0.9988, 0.9997, and 0.9991, respectively (Liang et al., 2020). A more comprehensive model structure was developed using feed-forward multi-layer ANNs to simulate the full AGS process, including the prediction of biomass characteristics and the removal rates of COD, ammonia, and phosphates, with a minimum prediction R2 of 99% (Zaghloul et al., 2018). Adaptive neuro-fuzzy inference systems (ANFIS) and support vector regression (SVR) were investigated as alternative algorithms to ANN, concluding that SVR provided comparable results to ANN with a minimum R2 of 0.997, while ANFIS provided lower prediction accuracy than ANN and SVR with a minimum R2 of 0.815 when simulating AGS reactors (Zaghloul et al., 2020).
Aside from modelling AGS, machine learning has shown excellent performance in simulating other wastewater treatment processes such as CAS, showing the potential application of machine learning in forecasting and process control (Corominas et al., 2018). An ANN model was successfully used for modelling the BOD and TSS removal in a full-scale CAS process using single input-single output models with an R2 of 0.665 and 0.542, respectively (Hamed et al., 2004). ANN was also used for the development of software sensors that predict the effluent TN, TP and COD for a real-time remote monitoring system in another full-scale CAS treatment plant, with an R2 of 0.952, 0.934, and 0.921, respectively (Lee et al., 2008). SVR was used to predict the removal of COD, ammonia, and nitrates in a CAS process using microbial community data with an R2 of 0.9501, 0.7936, and 0.8916, respectively (Seshan et al., 2014). ANN and SVR were compared for the prediction of effluent TN in a CAS process treating food waste leachate, showing that both algorithms performed similarly where the R2 was 0.47 and 0.46 respectively, however, the SVR suffered from overfitting where the training R2 was 1.00 (Guo et al., 2015). ANFIS was compared to SVR for predicting the removal of TKN in a full-scale BNR plant, where SVR provided better performance than ANFIS with R2 values of 0.85 and 0.91, respectively (Manu & Thalla, 2017).
The studies above concluded that ANN, SVR and ANFIS are capable of simulating various biological treatment processes in WWTPs. It was also observed that while ANN provided reliable results, it required the largest training datasets to provide good-quality modelling. SVR was reported to provide unique solutions to regression problems and to be less likely than ANN to get trapped in local error minima during error optimization, but the final model formulation is hard to interpret, and the computational requirements increase with larger datasets (Karamizadeh et al., 2014). ANFIS models can be relatively easier to interpret than ANN and SVR, but the number of fuzzy rules increases exponentially with the number of input variables and input membership functions (Stathakis et al., 2006). Ye et al. (2020) detailed the characteristics, advantages and limitations of several algorithms, including the ones used in this study. They showed that: (1) ANNs are accurate but carry a risk of overfitting, and finding the best architecture is difficult; (2) SVR works well with noisy data and does not require as much training data, but needs more computational power than other algorithms; (3) ANFIS can optimally solve nonlinear problems, but it is difficult to find the best model structure.
Machine learning ideally requires large datasets for training the algorithms (Liu et al., 2017). Databases from AGS WWTPs are still not large enough for conventional machine learning simulations. Small datasets are challenging when used for training machine learning models, i.e. the training process becomes highly affected by data quality issues, dimensionality, and multicollinearity (Shaikhina & Khovanova, 2017). Additionally, small datasets with high dimensionality increase the required level of model complexity to achieve reasonable prediction accuracies (Wójcik & Kurdziel, 2019). Data pre-processing and feature selection play an important role in handling outliers and gaps, normalizing features, and reducing dimensionality and multicollinearity in the dataset, which improves the model training and final performance.
This work presents a modelling approach for AGS reactors when only small datasets are available. Data were pre-processed and cleaned, then feature-selection was performed using the variance inflation factor (VIF) for reducing multicollinearity and the RReliefF algorithm for ranking inputs. A combination of ANN, SVR and ANFIS algorithms was used via different ensemble techniques, and the best performing technique was used for the final model. A multi-stage model structure was developed to provide stepwise predictions where outputs of each stage get added to the potential pool of inputs for the following stage. The purpose of this model is to provide a tool that can predict the biomass characteristics inside AGS reactors, effluent characteristics (concentrations of COD, NH4-N, and PO43−), and potential failure to meet user-defined treatment requirements.

2. Methods

2.1. Experimental Setup

Three SBRs were set up and operated to collect the required data for this study. Reactor R1 had a diameter of 89 mm, and a working volume of 4.5 L. Reactors R2 and R3 had a diameter of 150 mm, and a working volume of 19 L. Fig. 1 shows the general setup of the reactors.
Fig. 1. SBR reactors setup.

The SBR operation was automated with scheduled times for fill, idle, aeration (reaction), settling, and draw (decant). Table 1 shows the cycle times and superficial air velocity used for the duration of the data collection period. Aeration was provided using air compressors and controlled using Cole-Parmer airflow meters and regulators. Air was diffused into the reactor using Paintair fine bubble ceramic diffusers (AS4). Masterflex peristaltic pumps were used for feeding the reactors.

Table 1. Reactor operation parameters.

Parameter                       | Reactor R1 | Reactor R2 | Reactor R3
Fill Time (min)                 | 6 – 7      | 6 – 8      | 60
Idle Time (min)                 | 0 – 5      | 1 – 3      | 2
Aeration Time (min)             | 180 – 182  | 180 – 222  | 145 – 172
Settling Time (min)             | 3 – 15     | 8 – 30     | 5 – 30
Decanting Time (min)            | 1 – 6      | 1          | 1
Superficial Air Velocity (cm/s) | 1.6 – 4    | 2.11       | 3
The reactors were operated using synthetic wastewater prepared as detailed in Tay et al. (2002). The main carbon, nitrogen and phosphorus sources were sodium acetate, ammonium chloride, and monopotassium and dipotassium phosphate, respectively. Return activated sludge (RAS) was procured from the Pine Creek wastewater treatment plant for seeding the granulation process. The reactors were run at a stable temperature of 18±2°C. Influent, effluent and biomass samples were collected daily. Mixed liquor suspended solids (MLSS), mixed liquor volatile suspended solids (MLVSS), 5-minute sludge volumetric index (SVI5) and 30-minute SVI (SVI30) were measured according to standard methods (Rice et al., 2017). The United States Environmental Protection Agency (USEPA) reactor digestion method was adopted for the measurement of COD using a HACH DR-2400 spectrophotometer. The salicylate method was used to measure ammonia with TNT 830, 831, 832 and 833 kits. Ion chromatography was used to measure reactive phosphate using a Metrohm Compact IC Flex based on the Standard Methods for the Examination of Water and Wastewater (Rice et al., 2017). Laser particle size analysis was used to measure the granule size (Malvern MasterSizer Series 2000).

2.2. Model Structure

This study adopted a 5-stage model structure where each of the stages 2–5 is predicted using the preceding stages as potential inputs, as shown in Fig. 2. The multi-stage model structure is designed to simulate the cause-effect process in AGS reactors, where the influent characteristics and operational parameters affect the biomass concentration due to the growth and decay of the microbial community. The biomass concentration and the SBR operation directly affect the biomass settling properties, which in turn affect the granule growth. All the previous parameters and interactions affect the removal efficiency and the effluent wastewater quality. Each of the parameters in stages 2–5 was predicted using a separate model, except for the F/M ratio, which was calculated using the influent organics and biomass concentrations and then added as an input for stages 4 and 5. The multi-stage structure also adds versatility during model development, as it allows using different inputs for each output.
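The stage-by-stage chaining described above can be sketched in code. This is a minimal illustration, not the authors' implementation: the predict_* functions are hypothetical placeholders standing in for the trained per-parameter models, and the F/M calculation assumes the conventional definition (COD load divided by MLVSS inventory).

```python
# Minimal sketch of the multi-stage prediction chain (Fig. 2). The
# predict_* functions are hypothetical placeholders; each stage's output
# joins the input pool for the stages that follow.

def predict_mlss(pool):   # stage 2 (placeholder relation)
    return 700.0 * pool["OLR"]

def predict_mlvss(pool):  # stage 2 (placeholder relation)
    return 0.8 * pool["MLSS"]

def predict_svi30(pool):  # stage 3 (placeholder relation)
    return 60.0 + 10.0 * pool["F/M"]

def run_chain(stage1):
    pool = dict(stage1)                      # stage 1: influent + operation
    pool["MLSS"] = predict_mlss(pool)        # stage 2: biomass concentration
    pool["MLVSS"] = predict_mlvss(pool)
    # F/M is calculated, not modelled, then fed to stages 4 and 5
    pool["F/M"] = pool["COD_in"] * pool["Q"] / (pool["MLVSS"] * pool["V"])
    pool["SVI30"] = predict_svi30(pool)      # stages 3-5 follow the same pattern
    return pool

state = run_chain({"OLR": 11.0, "COD_in": 3352.0, "Q": 62.6, "V": 17.5})
```

The key design point is that the dictionary of predictions grows as the chain advances, so later stages can draw on any earlier output as a candidate input.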
Fig. 2. Multi-stage model structure (stage 1 contains parameters after multicollinearity reduction).

In this study, three algorithm alternatives were individually used for simulating the AGS process: ANN, SVR and ANFIS. The outputs of individual models were combined as inputs to ensemble algorithms using five different alternative methods: ANN, SVR, ANFIS, arithmetic mean (E-AVG), and weighted average (E-WAVG). In total, each output was predicted eight times using the individual and ensemble alternatives. The best performing algorithm out of the eight alternatives was chosen for the final model. The ensemble algorithms were denoted with the prefix “E-”. Fig. 3 shows the algorithm choice approach.
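The two non-trained combiners (E-AVG and E-WAVG) can be sketched as follows; the learned ensembles (E-ANN, E-SVR, E-ANFIS) would instead be trained on the stacked base-model predictions. The weighting scheme shown is an assumption for illustration (weights proportional to some accuracy score, normalized to sum to one).

```python
import numpy as np

def e_avg(preds):
    """E-AVG: arithmetic mean of the base ANN/SVR/ANFIS predictions."""
    return np.mean(preds, axis=0)

def e_wavg(preds, weights):
    """E-WAVG: weighted average; weights (assumed to reflect each base
    model's accuracy) are normalized to sum to one."""
    w = np.asarray(weights, float)
    return np.tensordot(w / w.sum(), np.asarray(preds), axes=1)

base = [np.array([1.0, 2.0]),   # e.g. ANN predictions
        np.array([2.0, 4.0]),   # e.g. SVR predictions
        np.array([3.0, 6.0])]   # e.g. ANFIS predictions
```

For example, `e_avg(base)` returns the element-wise mean of the three prediction vectors, while `e_wavg(base, [1, 1, 2])` doubles the influence of the third model.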
Fig. 3. Algorithm alternatives flowchart: each model output is predicted using eight different algorithms.

The dataset of 475 days was divided into 404 days for developing the models and 71 days for evaluation. The evaluation dataset was completely isolated and was only used after the models were trained and chosen. Fig. 4 shows the data divisions for the algorithms used in this study. The model development data (404 days) was divided according to the requirements of the algorithm being trained. The ANN and E-ANN models had a data division scheme of 70% for training, 15% for test and 15% for validation, which corresponded to 284, 60 and 60 days, respectively. The SVR and E-SVR models utilized the full 404 days for training. The ANFIS and E-ANFIS models used 85% of the data for training and 15% for validation, which corresponded to 344 and 60 days, respectively. The E-AVG and E-WAVG ensembles are not machine learning algorithms; thus, they did not require training and validation.
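The division above reduces to simple index bookkeeping. A sketch (the random seed is arbitrary and only keeps the example reproducible):

```python
import numpy as np

rng = np.random.default_rng(42)      # arbitrary seed for the sketch
days = rng.permutation(475)          # randomized 475-day dataset

dev, evaluation = days[:404], days[404:]               # 404 development / 71 evaluation
train, test, val = dev[:284], dev[284:344], dev[344:]  # ANN: 70/15/15 -> 284/60/60
```

The evaluation indices never appear in any development subset, which mirrors the complete isolation of the 71-day evaluation set described above.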
Fig. 4. Data division for model development and evaluation.

2.3. Data Pre-Processing

The dataset collected for this study consisted of 475 days. Datasets used for training machine learning algorithms need to undergo a cleaning process that mainly removes outliers, fills missing points, randomizes the dataset, and normalizes all the data features to the same scale. In this study, outliers were removed during data collection, and missing data points were filled using linear regression imputation (Lakshminarayan et al., 1999).
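Regression imputation can be sketched as fitting a line between the gappy feature and a correlated, fully observed one. This is a simplified single-predictor version for illustration; the study cites Lakshminarayan et al. (1999) for the general method.

```python
import numpy as np

def impute_linear(x, y):
    """Fill NaNs in y via a least-squares line y ~ a*x + b fitted on the
    observed (x, y) pairs; x must be fully observed."""
    y = np.asarray(y, float).copy()
    obs = ~np.isnan(y)
    a, b = np.polyfit(x[obs], y[obs], 1)   # fit on observed pairs only
    y[~obs] = a * x[~obs] + b              # predict the missing entries
    return y
```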
Randomization is done to remove the effect of phased operation and the use of multiple reactors when the dataset is split into training and evaluation datasets. The statistical properties of the training and evaluation datasets must be as close as possible to ensure proper evaluation of the models. Table 2 shows the maximum, minimum, mean and coefficient of variation for the training and evaluation datasets.

Table 2. Statistical properties of the training and evaluation datasets (Train. / Eval.).

Parameter                   | Max.          | Min.          | Mean          | Coef. of Var.
Influent COD (mg/L)         | 8758 / 7445   | 1287 / 1518   | 3352 / 3069   | 0.56 / 0.51
Influent NH4-N (mg/L)       | 234 / 201     | 53 / 68       | 129 / 135     | 0.32 / 0.28
Influent PO43− (mg/L)       | 124 / 83      | 3 / 6         | 48 / 47       | 0.38 / 0.33
Influent Flowrate (L/d)     | 74.81 / 74.81 | 11.03 / 11.03 | 62.63 / 60.95 | 0.28 / 0.31
Volume (L)                  | 19 / 19       | 4.5 / 4.5     | 17.49 / 17.16 | 0.25 / 0.28
Influent pH                 | 9.06 / 8.47   | 0.00 / 6.62   | 7.17 / 7.19   | 0.08 / 0.04
OLR (kg COD/m3)             | 33.08 / 26.87 | 3.48 / 4.47   | 11.06 / 9.70  | 0.66 / 0.57
HRT (h)                     | 9.67 / 9.67   | 5.92 / 5.92   | 7.69 / 7.79   | 0.10 / 0.09
Exchange Ratio (%)          | 0.56 / 0.56   | 0.35 / 0.35   | 0.50 / 0.50   | 0.09 / 0.09
Superficial Air Vel. (cm/s) | 3 / 3         | 1.56 / 1.56   | 2.51 / 2.53   | 0.18 / 0.18
Temperature (°C)            | 24.1 / 23.7   | 12.4 / 14.3   | 20.5 / 20.5   | 0.09 / 0.09
Settling time (min)         | 30 / 30       | 3 / 3         | 13.14 / 14.17 | 0.41 / 0.40
Aeration time (min)         | 221 / 221     | 163 / 163     | 187 / 185     | 0.13 / 0.13
MLSS (mg/L)                 | 25157 / 24411 | 779 / 2485    | 7966 / 7446   | 0.66 / 0.61
MLVSS (mg/L)                | 19303 / 18675 | 523 / 2015    | 6329 / 6023   | 0.61 / 0.57
SVI5 (mL/g)                 | 446 / 241     | 20 / 22       | 113 / 117     | 0.56 / 0.49
SVI30 (mL/g)                | 278 / 137     | 18 / 21       | 73 / 75       | 0.48 / 0.39
Granule Size (μm)           | 952 / 930     | 66 / 76       | 440 / 468     | 0.47 / 0.44
F/M Ratio                   | 12.14 / 4.21  | 0.54 / 0.92   | 2.07 / 1.89   | 0.51 / 0.34
Effluent COD (mg/L)         | 4227 / 2940   | 0 / 9         | 210 / 136     | 3.03 / 3.11
Effluent NH4-N (mg/L)       | 116 / 115     | 0 / 0         | 24 / 31       | 1.19 / 1.13
Effluent PO43− (mg/L)       | 51 / 27       | 0 / 0         | 8 / 8         | 1.14 / 1.09
Following randomization, each feature in the training dataset is normalized to the scale of (0 - 1) by dividing the feature by its maximum value. The evaluation dataset was normalized using the training maximum to keep the evaluation dataset unseen during the model development; therefore, the normalized values might slightly exceed one if the maximum of the evaluation dataset was higher than that of the training dataset.
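A sketch of this max-normalization, with the evaluation set scaled by the training maxima so that evaluation values can exceed one:

```python
import numpy as np

train = np.array([[200.0, 10.0],
                  [400.0, 40.0]])
eval_ = np.array([[500.0, 20.0]])   # first feature exceeds the training max

train_max = train.max(axis=0)       # per-feature training maxima
train_norm = train / train_max      # scaled to (0 - 1)
eval_norm = eval_ / train_max       # may slightly exceed 1
```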

2.4. Feature Selection

The choice of model inputs has a significant effect on model performance. The dataset collected contained input parameters that are correlated to varying degrees and that contribute differently to each of the outputs. Each output had a pool of candidate parameters from which its inputs were chosen using feature-selection methods: multicollinearity reduction and the RReliefF algorithm.
Linearly correlated input parameters reduce the orthogonality of the model, a condition known as multicollinearity (Alin, 2010). Multicollinearity is a source of overfitting during training and results in models with low reliability (Read & Belsley, 1994). The level of multicollinearity in a set of parameters can be measured using the variance inflation factor (VIF), as shown in Eq. (1), where Ri2 is the coefficient of determination obtained by regressing parameter i on the remaining parameters:

VIF = 1 / (1 − Ri2)    (1)

It is generally accepted that VIF values of 5 and below are acceptable for regression problems. If the maximum VIF is larger than 5, the parameter with the highest VIF is removed, and the test is repeated until the maximum VIF is 5 or less.
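The iterative VIF screening can be sketched with ordinary least squares (a sketch only; the threshold is fixed at 5 as in the text):

```python
import numpy as np

def vif(X):
    """VIF_i = 1/(1 - R_i^2), with R_i^2 from regressing column i on the
    remaining columns (intercept included)."""
    n, p = X.shape
    out = np.empty(p)
    for i in range(p):
        y = X[:, i]
        A = np.column_stack([np.delete(X, i, axis=1), np.ones(n)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        r2 = 1.0 - (y - A @ coef).var() / y.var()
        out[i] = 1.0 / max(1.0 - r2, 1e-12)   # guard against R^2 -> 1
    return out

def reduce_multicollinearity(X, names, threshold=5.0):
    """Iteratively drop the highest-VIF feature until max VIF <= threshold."""
    names = list(names)
    while True:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            return X, names
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
```

With a feature that is a near-exact sum of two others, the loop removes one member of the collinear group and the remaining VIFs drop toward one.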
After multicollinearity is reduced and redundant parameters are removed, the remaining parameters were sorted according to a weight calculated for each parameter using the RReliefF algorithm. The RReliefF algorithm is one of the filter methods for feature selection that assigns weights to input parameters based on their effect on the output using the k-nearest neighbours approach for input-output instances, where higher weights correspond to more important inputs (Urbanowicz et al., 2018). Weights are calculated based on three probabilities at the nearest instances: a different input value at nearest outputs, a different output value at nearest inputs, and a different output value when there is a difference in the input value. Detailed mathematical formulation and the algorithm structure can be found in (Robnik-Šikonja & Kononenko, 1997). Inputs that are more consistent with the nearest neighbours in explaining the variation in outputs receive higher weights. The RReliefF algorithm, being one of the filter methods, carries the advantage that it is not affected by the induction algorithms applied to the raw data (data pre-processing) (Urbanowicz et al., 2018). This allows the chosen inputs to be used with different machine learning algorithms with confidence.
The number of nearest neighbours (k) used in this study was determined by calculating the input weights as k was increased from 1 to 500. Weights were taken at k = 200, where the results had stabilized, as shown in Fig. 5.
Fig. 5. Stability of input scores (y-axes) with varying k values (x-axes).
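A simplified numpy sketch of the RReliefF weighting, following the three-probability formulation of Robnik-Šikonja and Kononenko; the full algorithm additionally weights neighbours by distance rank, which is omitted here for brevity.

```python
import numpy as np

def rrelieff(X, y, k=200):
    """Simplified RReliefF: weight each feature by how well its differences
    track output differences among the k nearest neighbours."""
    m, p = X.shape
    k = min(k, m - 1)
    xr = np.ptp(X, axis=0)            # feature ranges, for [0, 1] diffs
    xr[xr == 0] = 1.0
    yr = np.ptp(y) or 1.0             # output range
    NdC, NdA, NdCdA = 0.0, np.zeros(p), np.zeros(p)
    for i in range(m):
        d = np.linalg.norm(X - X[i], axis=1)
        nn = np.argsort(d)[1:k + 1]              # k nearest, excluding self
        dy = np.abs(y[nn] - y[i]) / yr           # output diffs
        dx = np.abs(X[nn] - X[i]) / xr           # feature diffs
        NdC += dy.sum()                          # P(different output)
        NdA += dx.sum(axis=0)                    # P(different feature)
        NdCdA += (dy[:, None] * dx).sum(axis=0)  # P(both differ)
    return NdCdA / NdC - (NdA - NdCdA) / (m * k - NdC)
```

A feature that drives the output receives a larger weight than an irrelevant one, which is the property used for the input ranking above.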

2.5. Artificial neural networks

Artificial neural networks mimic the way neurons work in the human brain to perform complex operations. The ANN type used in this study, feed-forward neural networks, utilizes an error minimization algorithm to tune the network weights (Fernando & Shamseldin, 2009; Sammut & Webb, 2016). Well-designed and well-trained neural networks can achieve outstanding prediction accuracies; however, this comes at the expense of long training times, as the error minimization functions are generally slow to converge. Additionally, large training datasets are needed to reach high prediction accuracies without overfitting. Neural networks can also overfit if the inputs are not well selected or the layer architecture is poorly designed (Lawrence & Giles, 2000; Ye et al., 2020).
In this study, Bayesian Regularization was used as the objective function for error minimization using a linear formulation of squared errors and network weights (Foresee & Hagan, 1997). The network architectures were selected by training all ANN combinations of (1-3) hidden layers and (1-10) hidden nodes in each layer. This approach resulted in the training of 8880 neural networks for the 8 outputs. The best performing network selected for each of the outputs was the one with the most accurate prediction (lowest error) and with similar training and test performance. These conditions ensured that the chosen networks did not overfit or underfit.
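The architecture search can be sketched by enumerating the candidate layer layouts; 10 + 10² + 10³ = 1110 candidates per output gives the 8880 networks reported for the 8 outputs. Note the fit below is an approximation, not the authors' setup: scikit-learn's MLPRegressor has no Bayesian Regularization, so plain L2 regularization (alpha) stands in for it here.

```python
from itertools import product

import numpy as np
from sklearn.neural_network import MLPRegressor

# All 1-3 hidden-layer layouts with 1-10 nodes per layer:
# 10 + 10**2 + 10**3 = 1110 candidates per output (8880 for 8 outputs).
layouts = [arch for depth in (1, 2, 3)
           for arch in product(range(1, 11), repeat=depth)]

# Fit one candidate on synthetic data; alpha is an L2 stand-in for
# Bayesian Regularization.
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = X @ np.array([1.0, 2.0, 0.5, -1.0])
net = MLPRegressor(hidden_layer_sizes=(8, 8), alpha=1e-3,
                   max_iter=1000, random_state=0).fit(X, y)
```

In a full search, each layout would be trained and the one with the lowest error and similar training/test performance kept, as described above.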

2.6. Adaptive neuro-fuzzy inference systems

Adaptive neuro-fuzzy inference systems (ANFIS) are used to simulate complex processes with measurement uncertainty. ANFIS is a universal approximator that utilizes logical rules to reach an output through human-like reasoning (Jang, 1993). In ANFIS, membership functions are used to map numerical inputs to fuzzy sets. A learning algorithm, similar to the back-propagation algorithm, is used to minimize the errors by optimizing the ANFIS parameters.
Each of the outputs in this study was predicted using a separate ANFIS model. The clustering method used was Grid Partitioning, as it allows for choosing the desired membership functions. Grid partitioning, however, assigns a fuzzy rule for each input-membership function combination, which exponentially increases the number of rules and the computational requirement for training the models.
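The rule growth under grid partitioning is easy to quantify: the rule count equals the number of membership functions per input raised to the number of inputs.

```python
def grid_partition_rules(n_inputs, n_mfs=2):
    """Fuzzy rule count under grid partitioning: one rule per
    input-membership-function combination."""
    return n_mfs ** n_inputs
```

With the two membership functions used here, a six-input model carries 2⁶ = 64 rules, and each additional input doubles the rule base, which is the computational ceiling discussed later in Section 3.2.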

2.7. Support vector regression

Support vector regression (SVR) is an algorithm, based on statistical learning theory, that uses the structural risk minimization (SRM) method to minimize the modelling error while maintaining low model complexity (Smola & Schölkopf, 2004; Vapnik, 2000). The nonlinear SVR model formula is shown in Eq. (2) for an input vector x, output y, and N samples:

y = Σ(i = 1 to N) wi φi(x) + b    (2)

where wi is a weight, b is a bias, and φi is a nonlinear kernel function that maps the training data to a higher-dimensional feature space, making it possible to linearize the model (Smola & Schölkopf, 2004). A Gaussian kernel was used as it is easier to tune than other functions, and it can also handle complex error boundaries (Goyal & Ojha, 2011).
Three hyperparameters need tuning in SVR: the kernel scale (γ), the box constraint (C) and the error band (ε). The kernel scale determines how strongly the kernel function responds to variation in the input vectors; it is inversely proportional to the sensitivity of the kernel function to input variation. The SVR model can underfit if the kernel function is not sensitive enough to detect changes in the inputs, and can overfit if it is so sensitive that it reacts to the smallest variation in the inputs. The box constraint is a regularization factor needed by the SRM to control the penalty on large prediction residuals. It represents the trade-off between the training error and model complexity, where small C values result in poor predictions and large values cause overfitting. Finally, the error band represents the space around the actual measured values within which predictions can be made. Tighter error bands result in more accurate predictions at the expense of model complexity. More details on the mathematical formulation of SVR can be found in Awad and Khanna (2015), Cristianini and Shawe-Taylor (2000), and Smola and Schölkopf (2004).
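A sketch of a Gaussian-kernel SVR with the three hyperparameters tuned by 5-fold cross-validation. A plain grid search is shown here; the study used Bayesian optimization, for which scikit-optimize's BayesSearchCV would be the closer analogue. The data and grid values are illustrative only.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.random((150, 3))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1]   # synthetic target

param_grid = {
    "C": [0.1, 1, 10, 100],       # box constraint (regularization)
    "gamma": [0.01, 0.1, 1],      # kernel scale
    "epsilon": [0.01, 0.1, 0.5],  # error band
}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5).fit(X, y)
model = search.best_estimator_
```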

3. Results and Discussion

3.1. Aerobic Granular Sludge Performance

The reactors simulated in this study were operated for data collection for a collective total of 475 days. Periods of stable operation were observed, along with some disruptions due to biomass washout. The average (± standard deviation) influent COD, NH4+-N, and PO43− concentrations in the reactors were 3309±1838, 130±41, and 48±18 mg/L, respectively. The systems exhibited stable organics and nutrients removal throughout the duration of the experiment, with average COD, NH4+-N, and PO43− removal efficiencies of 96±8, 81±18, and 84±18%, respectively. Aerobic granular sludge has been proven to have consistently good removal of organics and nutrients (de Kreuk et al., 2005; Iorhemen et al., 2020; Nancharaiah & Reddy, 2018). The stratification of aerobic, anoxic, and anaerobic microbial communities has also been observed, resulting in better nitrogen and phosphorus removal (Wang et al., 2008; Yilmaz et al., 2008).
The average MLSS and MLVSS concentrations were 7888±5158 and 6284±3810 mg/L, respectively. The average MLVSS/MLSS ratio was 80%, which is a typical value in aerobic treatment reactors. The average SVI5 and SVI30 were 114±63 and 73±34 mL/g, respectively, demonstrating the fast settling of the granules. Granules are considered to have good settling properties when SVI30 is below 100 mL/g (Hamza et al., 2018; Liu et al., 2007). The average ratio of granulation in AGS reactors, calculated by the SVI30/SVI5 (Hamza et al., 2018), was 64%, and the average granule diameter was 445±206 µm, which showed that the biomass was mostly granular. The presence of some floccular biomass and fluctuation in settling properties were expected due to the variation in F/M ratio (2±1) (Hamza et al., 2018).

3.2. Feature Selection

The ANN, SVR and ANFIS models that were used to predict outputs 1, 2 and 3 (Fig. 3) were developed using the operation and performance dataset, which was collected from the laboratory (Table 2). Feature selection was necessary to overcome the multicollinearity between different parameters and remove the inputs that adversely affect the model performance. The feature selection methods used in this study were able to reduce the level of multicollinearity and identify the most effective parameters for each output, which resulted in the elimination of some parameters. Calculating the VIF for the inputs of each stage resulted in the reduction of inputs by 4, 5, 8, and 7 for stages 2, 3, 4, and 5, respectively. The results of the multicollinearity reduction are shown in Table 3. The level of multicollinearity is accepted once the maximum VIF is below 5. The data collection plan was quite thorough in the selection of parameters to measure. This resulted in a high level of multicollinearity in the initial dataset due to the close relationships between parameters, such as the OLR-COD, or the flowrate-reactor volume-HRT (Price, 1998).

Table 3. Multicollinearity reduction.

Model stage | Max. VIF before reduction | Max. VIF after reduction | Initial number of inputs | Final number of inputs | Number of inputs removed
Stage 2     | 990.55                    | 4.63                     | 13                       | 9                      | 4
Stage 3     | 1078.9                    | 2.49                     | 15                       | 10                     | 5
Stage 4     | 1132.1                    | 3.11                     | 18                       | 10                     | 8
Stage 5     | 1141.9                    | 4.28                     | 19                       | 12                     | 7
Further dimensionality reduction was applied using the RReliefF algorithm to rank the inputs for each output. Unlike the VIF test, which depends only on the inputs, the RReliefF weights depend on the relationship between the inputs and outputs, which resulted in different input weights for each of the outputs even within the same stage. Table 4 shows the final inputs used for the ANN, SVR, and ANFIS models. In Table 4, ANN is denoted by "N", SVR by "S", and ANFIS by "A". Table 4 shows, for each output, the models in which each of the predictors was used; for example, the influent NH4-N, influent PO43−, OLR and HRT were used as inputs for the MLVSS ANFIS model. The ranking produced by the RReliefF algorithm was used for all three modelling algorithms. Inputs were added from the ranked list while noting the model performance, until the model stopped improving or was unable to complete training due to computational limitations (in the ANFIS models).

Table 4. Algorithms in which each of the inputs was used for each output.

| Parameter | Models in which the input was used (N = ANN, S = SVR, A = ANFIS) |
|---|---|
| Influent NH4-N (mg/L) | N-S-A, N-S-A, N-S-A, N-S-A, N-S-A, N-S-A, N-S-A, N-S-A |
| Influent PO43− (mg/L) | N-S-A, N-S-A, N-S, N-S, N-S, N-S-A, N-S-A, N-S-A |
| Volume (L) | N-S-A, N-S, N-S, N-S-A, N-S-A |
| Influent pH | N-S-A, N-S, N-S, N-S, N-S, N-S, N-S |
| OLR (kg COD/m3) | N-S-A, N-S-A |
| HRT (h) | N-S-A, N-S-A, N-S-A, N-S-A, N-S-A, N-S-A, N-S-A, N-S |
| Superficial Air Vel. (cm/s) | A, N-S-A, N-S-A, N-S-A, N-S-A, N-S-A |
| Temperature (°C) | N-S, N-S, N-S, N-S-A, N-S-A, N-S-A, N-S |
| Settling time (min) | S, N-S-A, N-S-A, N-S, N-S-A, N-S |
| MLVSS (mg/L) | N-S-A, N-S-A, N-S, N-S-A, N-S-A |
| SVI5 (mL/g) | N-S-A |
| SVI30 (mL/g) | N-S-A |
| Granule Size (μm) | N-S-A, N, N-S-A |
| F/M Ratio | N-S, N, N-S |
The number of inputs to the ANFIS models was restricted because it was found that adding more inputs significantly increased the required training time and CPU usage. ANFIS was therefore limited to a maximum of six inputs, selected according to the RReliefF ranking, except for the MLVSS, where the best performance was achieved with four inputs, and the effluent ammonia, where seven inputs were needed to achieve acceptable performance.
Feature selection was found to be the most important step in the development, as it improved the performance of the ANN, SVR, and ANFIS models, raising the overall evaluation average R2 from 93%, 85%, and 83% to 94.2%, 92.4%, and 85.6%, respectively, using the same data and model structure.
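The ranking step above can be sketched as follows. This is a simplified RReliefF for a continuous target, using uniform weighting over the k nearest neighbours rather than the distance-rank weighting of the full algorithm of Robnik-Šikonja and Kononenko; it is intended only to illustrate how per-feature relevance weights are obtained, and the parameter `k` is illustrative.

```python
import numpy as np

def rrelieff(X, y, k=10):
    """Simplified RReliefF: relevance weight per feature for a continuous target.

    Accumulates, over each instance's k nearest neighbours, the probability-style
    counts N_dC, N_dA and N_dC&dA, then combines them into the RReliefF weight.
    """
    n, p = X.shape
    # normalise so diff() is comparable across features
    Xn = (X - X.min(axis=0)) / (np.ptp(X, axis=0) + 1e-12)
    yn = (y - y.min()) / (np.ptp(y) + 1e-12)
    n_dc = 0.0
    n_da = np.zeros(p)
    n_dcda = np.zeros(p)
    for i in range(n):
        d = np.abs(Xn - Xn[i]).sum(axis=1)
        neigh = np.argsort(d)[1:k + 1]  # skip the instance itself
        for j in neigh:
            diff_y = abs(yn[i] - yn[j])
            diff_a = np.abs(Xn[i] - Xn[j])
            n_dc += diff_y / k
            n_da += diff_a / k
            n_dcda += diff_y * diff_a / k
    return n_dcda / n_dc - (n_da - n_dcda) / (n - n_dc)
```

Features that change together with the target receive higher weights, so sorting the inputs by this weight reproduces the kind of ranked list used for input selection above.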

3.3. Model Development

The ANN models were trained using the selected inputs in Table 4. The network architectures for each ANN model are shown in Table 5; all ANN models had three hidden layers with different numbers of hidden nodes. Table 5 also shows the tuned SVR hyperparameters. There is a large variation in the values of C, γ and ε between the models. The available literature explains the individual effect of each hyperparameter on the performance of trained SVR models and the risk of overfitting; however, the hyperparameters affect the performance in combination. The values of C, γ and ε were therefore optimized within the SVR training process using Bayesian optimization to minimize the 5-fold cross-validation error, yielding the best possible prediction accuracy without significant overfitting.

Finally, the ANFIS models were all assigned two input membership functions. It was not practical to increase the number of membership functions because of computational limitations that caused training to fail. Different membership function types were tested, and the best-performing ones were chosen for the final model. Triangular membership functions were used for the MLSS, MLVSS, SVI30 and effluent NH4-N (Eq. 3):

$$y = \begin{cases} 0, & x \le a \\ \dfrac{x-a}{b-a}, & a \le x \le b \\ \dfrac{c-x}{c-b}, & b \le x \le c \\ 0, & x \ge c \end{cases} \tag{3}$$

where a, b and c are constants determined through the ANFIS training. The remaining ANFIS models used Gaussian combination membership functions, whose sides follow the Gaussian form (Eq. 4):

$$y = e^{-\frac{(x-\mu)^2}{2\sigma^2}} \tag{4}$$

where σ is the standard deviation and µ is the mean of the training data.

The average training performance of the ANN models across all outputs was 96%, 0.03, and 3.3% for R2, nRMSE and sMAPE, respectively; for the SVR models it was 95.8%, 0.026, and 1.9%; and for the ANFIS models it was 92.5%, 0.047, and 4.6%.
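The two membership-function forms mentioned above can be evaluated directly. This is a minimal NumPy sketch with illustrative function names; the Gaussian form follows the single-Gaussian expression of Eq. 4, whereas the "Gaussian combination" type used in ANFIS toolboxes joins two such curves with different parameters on each side.

```python
import numpy as np

def trimf(x, a, b, c):
    """Triangular membership function (Eq. 3): rises on [a, b], falls on [b, c]."""
    x = np.asarray(x, dtype=float)
    left = (x - a) / (b - a) if b != a else np.ones_like(x)
    right = (c - x) / (c - b) if c != b else np.ones_like(x)
    return np.clip(np.minimum(left, right), 0.0, 1.0)

def gaussmf(x, mu, sigma):
    """Gaussian membership function (Eq. 4), centred at mu with spread sigma."""
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
```

Both functions return membership degrees in [0, 1], peaking at b (triangular) or µ (Gaussian).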
The overall performance of the SVR was better than that of the ANN in terms of nRMSE and sMAPE. The ANFIS model was considerably less accurate than the ANN and SVR, which can be attributed to the computational limits on the number of inputs and membership functions that could be used.
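The hyperparameter search can be reproduced in outline. The study used Bayesian optimization with 5-fold cross-validation; the sketch below substitutes scikit-learn's randomized search over log-spaced grids (Bayesian optimization proper would need an additional package such as scikit-optimize), and the toy data stands in for the selected reactor inputs.

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Toy stand-in data; in the study these would be the RReliefF-selected inputs.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 4))
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=120)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
search = RandomizedSearchCV(
    model,
    # wide, log-spaced candidate grids for C, gamma and epsilon, mirroring
    # the large spread of tuned values reported in Table 5
    param_distributions={
        "svr__C": np.logspace(-2, 3, 50),
        "svr__gamma": np.logspace(-3, 2, 50),
        "svr__epsilon": np.logspace(-6, -1, 50),
    },
    n_iter=40,
    cv=5,  # 5-fold cross-validation, as in the study
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
search.fit(X, y)
best = search.best_params_
```

`best` then holds one tuned (C, γ, ε) triple per output model, analogous to the rows of Table 5.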

Table 5. ANN architectures, SVR hyperparameters and ANFIS membership functions.

| Output | ANN Architecture | SVR C | SVR γ | SVR ε | ANFIS Membership Function |
|---|---|---|---|---|---|
| MLSS (mg/L) | 6-4-3 | 1.763 | 1.8302 | 1.33E-04 | Triangular |
| MLVSS (mg/L) | 5-4-9 | 244.46 | 8.6594 | 0.0004098 | Triangular |
| SVI5 (mL/g) | 6-3-1 | 49.959 | 4.1236 | 0.0001425 | Gaussian Combination |
| SVI30 (mL/g) | 2-7-4 | 0.59194 | 5.2062 | 0.013026 | Triangular |
| Granule Size (μm) | 2-9-8 | 121.55 | 1.4949 | 0.0003605 | Gaussian Combination |
| Effluent COD (mg/L) | 7-1-1 | 7.9007 | 2.7712 | 7.226E-06 | Gaussian Combination |
| Effluent NH4-N (mg/L) | 6-6-1 | 1.2535 | 1.0669 | 0.0007429 | Triangular |
| Effluent PO43− (mg/L) | 5-1-1 | 17.613 | 2.87 | 0.0002448 | Gaussian Combination |
The outputs of the ANN, SVR and ANFIS models were used as inputs to five ensemble alternatives: E-ANN, E-SVR, E-ANFIS, E-AVG and E-WAVG. The E-AVG was the arithmetic mean of the three inputs, which resulted in an overall average training performance of 97.3%, 0.027, and 2.8% for R2, nRMSE and sMAPE, respectively. The E-WAVG was a weighted average of the three inputs, using the training R2 of the ANN, SVR and ANFIS for each output as relative weights. This approach had the advantage of favouring the more accurate inputs, which provided an improvement over the E-AVG when there was a large difference in accuracy between the ANN, SVR and ANFIS. The overall average training performance of the E-WAVG was 97.8%, 0.024, and 2.4% for R2, nRMSE and sMAPE, respectively.

The E-ANN, E-SVR, and E-ANFIS are machine learning-based ensembles in which the ANN, SVR and ANFIS outputs were used as inputs. Table 6 shows the architectures, hyperparameters and membership functions of the E-ANN, E-SVR, and E-ANFIS, respectively. These models were much simpler to develop, as they were intended to combine three estimates of the true output, and their number of inputs was much smaller than that of the original three models. Keeping the models simple was essential to minimize overfitting, considering the small number of inputs (three) and the fact that the inputs are variants of the same parameter that are already close to the desired solution. All of the ensemble neural networks had one hidden layer with a single neuron, providing a training performance of 98.7%, 0.016, and 1.8% for R2, nRMSE and sMAPE, respectively. The E-SVR hyperparameters provide insight into the behaviour of the algorithm: the large C values indicate larger penalties on errors for all outputs, while the γ values were much higher than those of the SVR model, indicating that the kernel function is less sensitive to variation in the inputs.
The overall average training performance of the E-SVR was 98.7%, 0.016, and 1.6% for R2, nRMSE and sMAPE, respectively. The E-ANFIS models had two Gaussian combination membership functions, with an overall training performance of 94.3%, 0.031, and 1.9% for R2, nRMSE and sMAPE, respectively.
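The E-AVG and E-WAVG combinations described above are straightforward to express. In this minimal sketch (array shapes and names are illustrative), the columns of `preds` hold the ANN, SVR and ANFIS predictions for one output.

```python
import numpy as np

def e_avg(preds):
    """E-AVG: arithmetic mean of the base-model predictions (columns of preds)."""
    return np.mean(preds, axis=1)

def e_wavg(preds, train_r2):
    """E-WAVG: average weighted by each base model's training R^2."""
    w = np.asarray(train_r2, dtype=float)
    w = w / w.sum()          # relative weights
    return preds @ w
```

When one base model is clearly more accurate, its larger R2 weight pulls the E-WAVG prediction towards it, which is the advantage over the unweighted E-AVG noted above.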

Table 6. E-ANN architectures, E-SVR hyperparameters and E-ANFIS membership functions.

| Output | E-ANN Architecture | E-SVR C | E-SVR γ | E-SVR ε | E-ANFIS Membership Function |
|---|---|---|---|---|---|
| MLSS (mg/L) | 1 | 274.69 | 3.1267 | 0.0001948 | Gaussian Combination |
| MLVSS (mg/L) | 1 | 969.84 | 16.172 | 0.0006375 | Gaussian Combination |
| SVI5 (mL/g) | 1 | 932.21 | 222.88 | 0.0005993 | Gaussian Combination |
| SVI30 (mL/g) | 1 | 286.84 | 38.591 | 0.0071388 | Gaussian Combination |
| Granule Size (μm) | 1 | 892.45 | 377.84 | 0.0002951 | Gaussian Combination |
| Effluent COD (mg/L) | 1 | 998.21 | 15.86 | 0.0001844 | Gaussian Combination |
| Effluent NH4-N (mg/L) | 1 | 886.38 | 329.18 | 0.0002088 | Gaussian Combination |
| Effluent PO43− (mg/L) | 1 | 942.59 | 231.49 | 0.0012618 | Gaussian Combination |

3.4. Model Performance

The algorithms developed in this study (ANN, SVR, ANFIS, E-ANN, E-SVR, E-ANFIS, E-AVG, E-WAVG) were all validated using the evaluation dataset that was isolated before model development. Table 7 shows the evaluation performance averaged over all outputs for each algorithm. The E-ANN provided the best performance in terms of R2, nRMSE, and sMAPE. The E-SVR and E-WAVG performed comparably to the E-ANN, but the E-ANFIS was not able to reach the same level of performance.

Table 7. Overall average evaluation performance.

| Metric | ANN | SVR | ANFIS | E-ANN | E-SVR | E-ANFIS | E-AVG | E-WAVG |
|---|---|---|---|---|---|---|---|---|
| R2 | 94.2% | 92.4% | 85.6% | 95.2% | 94.5% | 80.3% | 94.6% | 95% |
| nRMSE | 0.037 | 0.043 | 0.062 | 0.034 | 0.036 | 0.081 | 0.037 | 0.035 |
| sMAPE | 4.2% | 4.6% | 7.7% | 3.8% | 4% | 6.4% | 4.5% | 4.2% |
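The evaluation metrics used throughout can be computed as below. The paper does not state which normalization it uses for nRMSE, so normalization by the observed range is an assumption in this sketch; the sMAPE follows the standard symmetric definition.

```python
import numpy as np

def nrmse(y_true, y_pred):
    """RMSE normalised by the observed range (one common convention)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (np.max(y_true) - np.min(y_true))

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(2.0 * np.abs(y_pred - y_true)
                           / (np.abs(y_true) + np.abs(y_pred)))
```

Both metrics are scale-free, which is what allows the averages in Table 7 to be taken across outputs with very different units.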
Although the E-ANN outperformed the other ensemble algorithms overall, the E-WAVG provided somewhat better prediction accuracy for the granule size, with an R2 of 89% as opposed to 85% for the E-ANN. The ensemble models did not improve on the ANN for the prediction of the effluent COD, where the ANN provided an R2 of 99.65% as opposed to 99.3% for the E-ANN. The best-performing model for each output was selected for the final model, as shown in Table 8. Fig. 6 shows the diagonal plots of the final selected models.

Table 8. Final model performance using the best performing algorithms.

| Output | Chosen Algorithm | Training R2 (%) | Evaluation R2 (%) | Training nRMSE | Evaluation nRMSE | Training sMAPE (%) | Evaluation sMAPE (%) |
|---|---|---|---|---|---|---|---|
| MLSS (mg/L) | E-ANN | 97.80% | 96.11% | 0.031 | 0.036 | 3.06% | 4.05% |
| MLVSS (mg/L) | E-ANN | 97.58% | 94.30% | 0.031 | 0.043 | 2.91% | 4.56% |
| SVI5 (mL/g) | E-ANN | 98.53% | 95.89% | 0.017 | 0.028 | 2.29% | 3.16% |
| SVI30 (mL/g) | E-ANN | 95.81% | 92.19% | 0.025 | 0.030 | 2.93% | 3.49% |
| Granule Size (μm) | E-WAVG | 97.06% | 88.75% | 0.038 | 0.073 | 2.54% | 3.78% |
| Effluent COD (mg/L) | ANN | 99.65% | 99.65% | 0.009 | 0.008 | 3.06% | 5.63% |
| Effluent NH4-N (mg/L) | E-ANN | 99.89% | 99.53% | 0.008 | 0.021 | 0.92% | 1.98% |
| Effluent PO43− (mg/L) | E-ANN | 99.72% | 98.96% | 0.010 | 0.018 | 1.85% | 2.88% |

Fig. 6. Diagonal plots of the final model predictions vs. target measured values using the evaluation dataset.

After using the best-performing ensemble algorithms, the final model improved the overall prediction accuracy over the ANN model, the best-performing single algorithm, by raising the overall average R2 from 94.2% to 95.2%, reducing the overall average nRMSE from 0.037 to 0.032, and reducing the overall average sMAPE from 4.2% to 3.7%. The most significant improvement was observed in the granule size, where the R2 increased from 81% to 88.2%.
It was found that even though the E-SVR was close to the E-ANN in terms of prediction accuracy on the evaluation data, it suffered from a slightly higher level of overfitting for some outputs, where the difference between the training and evaluation predictions was larger than for the E-ANN. The E-ANFIS did not provide predictions as accurate as the other ensembles, as there was not enough distinction in the input parameters for the ANFIS to develop a reliable rule base.
Table 9 compares the model developed in this study to other machine learning models in the literature, using the prediction R2 on the validation datasets as the performance indicator. The comparison shows that this study achieves prediction accuracies similar to those of AGS machine learning models trained on much larger datasets (Zaghloul et al., 2020), and higher than those of AGS models developed with small datasets (Gong et al., 2018; Mahmod & Wahab, 2017). Other CAS models with small datasets produced predictions consistent with AGS models (El-Din et al., 2004; Manu & Thalla, 2017). Small test datasets can yield less reliable models that do not provide consistent results, as demonstrated by Mahmod and Wahab (2017), where the training and testing R2 were 78% and 91.17%, respectively. SVR and ANNs were found to perform at the same level of accuracy in other CAS and AGS models, which is consistent with the models developed in this study (Gong et al., 2018; Guo et al., 2015; Mahmod & Wahab, 2017; Seshan et al., 2014).

Table 9. Comparison between the dataset size and prediction R2 (%) of this study and other machine learning models for AGS and CAS.

| Study | Algorithm | Model | Dataset Size (days) | MLSS | MLVSS | SVI5 | SVI30 | Granule Size | COD | NH4-N | TKN | TN | PO43− |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| This study | Ensemble | AGS | 475 | 96.1 | 94.3 | 95.9 | 92.2 | 88.75 | 99.7 | 99.5 | - | - | 99.0 |
| (Zaghloul et al., 2020) | ANFIS | AGS | 2920 | 87.5 | 86.6 | 96.3 | 95.6 | 81.5 | 98.5 | 99.6 | - | - | 86.7 |
| | SVR | AGS | 2920 | 99.9 | 99.9 | 99.9 | 99.8 | 99.8 | 99.9 | 99.9 | - | - | 99.7 |
| (Zaghloul et al., 2018) | ANN | AGS | 2886 | 99.5 | 99.6 | 99.6 | 99.0 | 99.2 | 100.0 | 99.9 | - | - | 99.9 |
| (Gong et al., 2018) | ANN | AGS | 205 (136) | - | - | - | - | - | 90.0 | - | - | 81.0 | - |
| (Mahmod & Wahab, 2017) | ANN | AGS | 21 | - | - | - | - | - | 91.2 | - | - | - | - |
| (Manu & Thalla, 2017) | ANFIS | CAS | 88 | - | - | - | - | - | - | - | 72.0 | - | - |
| | SVR | CAS | 88 | - | - | - | - | - | - | - | 82.5 | - | - |
| (Guo et al., 2015) | ANN | CAS | 305 | - | - | - | - | - | - | - | - | 47.0 | - |
| | SVR | CAS | 305 | - | - | - | - | - | - | - | - | 46.0 | - |

The COD dataset was 205 days; the TN dataset was 136 days.

4. Multi-Stage Model Structure

AGS machine learning models in the literature are all single-stage models, in which a group of inputs is used to predict the final effluent quality parameters without considering the process sequence (Gong et al., 2018; Mahmod & Wahab, 2017). Two-stage models have been designed to improve on this structure, with success (Zaghloul et al., 2020). The multi-stage model structure makes the model developed in this study more versatile than other machine learning models, as it captures the effect of the influent characteristics on the biomass properties and their consequent effect on the effluent quality. The multi-stage structure also makes it possible to identify the source of predicted effluent quality issues, because the biomass properties predicted at the same instance can be analyzed alongside the effluent predictions. This mitigates a key disadvantage of black-box models, which can be difficult to use for process interpretation (Newhart et al., 2019).
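A multi-stage chain of this kind can be sketched as follows, using a generic regressor in place of the paper's ensembles; the stage grouping and names are illustrative. The key idea is that each stage's prediction is appended to the feature matrix before the next stage is fitted, so later outputs see the predicted biomass properties as inputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_multistage(X, stage_targets):
    """Fit a chain of stage models in process order.

    stage_targets: list of (name, y) pairs, e.g. biomass properties first,
    then settleability, then granule size, then effluent quality.
    """
    models, Z = [], X.copy()
    for name, y in stage_targets:
        m = RandomForestRegressor(n_estimators=50, random_state=0).fit(Z, y)
        models.append((name, m))
        Z = np.column_stack([Z, m.predict(Z)])  # feed prediction forward
    return models

def predict_multistage(models, X):
    """Run the chain on new data, returning a dict of per-stage predictions."""
    out, Z = {}, X.copy()
    for name, m in models:
        p = m.predict(Z)
        out[name] = p
        Z = np.column_stack([Z, p])
    return out
```

Because every intermediate prediction is retained, an anomalous effluent prediction can be traced back through the chain, which is the interpretability advantage described above.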

5. Failure prediction

The model developed in this study can accurately predict the performance of AGS reactors under varying operational and influent conditions. It provides a tool for forecasting reactor behaviour during operation, which can guide operators in experimental design. Fig. 7 shows the predictions made for a portion of the evaluation dataset (chronologically ordered and obtained from the same reactor), plotted together with the failure thresholds. Operators can use such figures to identify process failures and their potential causes.

Fig. 7. Measured vs predicted values with the local treated effluent regulations.

The local treated-effluent standards were set as thresholds for this study. The maximum effluent COD, NH4-N, and PO43− were set to 20 mg/L, 10 mg/L, and 0.5 mg/L, respectively. The threshold for the granule size was set to 200 µm, below which the biomass would be considered floccular (Liu et al., 2010), indicating either a failure of the structural integrity of the granules or a failure to achieve granulation. The MLSS threshold was set at 4000 mg/L for this study; an MLSS concentration falling below this threshold would indicate a washout of the biomass due to poor settling. The settling properties can also be predicted from the SVI values. Additionally, the SVI30/SVI5 ratio indicates the degree of granulation inside the reactor, as defined by Kocaturk and Erguder (2016); its threshold was set to 50% for this study.
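A threshold check of this kind is simple to automate. The sketch below mirrors the limits quoted above; the dictionary keys and function names are illustrative, not from the paper.

```python
# Hypothetical threshold table mirroring the limits quoted above.
THRESHOLDS = {
    "effluent_cod": ("max", 20.0),    # mg/L
    "effluent_nh4": ("max", 10.0),    # mg/L
    "effluent_po4": ("max", 0.5),     # mg/L
    "granule_size": ("min", 200.0),   # um; below -> floccular biomass
    "mlss":         ("min", 4000.0),  # mg/L; below -> biomass washout
    "svi30_svi5":   ("min", 0.5),     # granulation ratio
}

def flag_failures(sample):
    """Return the list of predicted parameters that violate their threshold."""
    flags = []
    for key, (kind, limit) in THRESHOLDS.items():
        v = sample.get(key)
        if v is None:
            continue  # parameter not predicted for this sample
        if (kind == "max" and v > limit) or (kind == "min" and v < limit):
            flags.append(key)
    return flags
```

Applying such a check to each predicted sample flags the same kind of failure window that is analysed below for the NH4-N excursion.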
It can be observed that a failure to remove NH4-N occurred between samples 218 and 232, where the effluent concentration reached 28 mg/L. The MLSS plot shows a rapid decline in concentration, which entailed a loss of the slow-growing nitrifying bacteria and thus a delayed period of poor NH4-N removal while the biomass recovered. Further analysis of the results shows a drop in the granule size and the granulation ratio inside the reactor, indicating that a partial washout had occurred, followed by rapid growth of heterotrophic biomass in floccular form due to the abundance of organics (the F/M ratio was disturbed by the loss of biomass).
The influent COD was reduced from around 7000 mg/L to 4500 mg/L after the biomass washout event. This improved the observed COD removal efficiency, as the new heterotrophic growth was able to handle the influent organics. The effluent COD concentration remained above the allowed threshold because the reactor was being operated to treat high-organic-content wastewater, whose effluent was to be polished further before disposal. The aerobic biological process, although successful in removing most of the organic load, was unable to bring the COD concentrations below the required limits (Hamza et al., 2018).

6. Conclusion

A machine learning model was developed for AGS reactors using a combination of neural networks, support vector regression and adaptive neuro-fuzzy inference systems. Feature selection methods were applied, and a five-stage model structure was adopted. This study shows that proper feature selection and combining multiple machine learning algorithms in an ensemble can improve the performance of data-driven models when the available dataset is small. Two feature selection methods were applied to reduce the dimensionality of the regression problem and the multicollinearity of the input data. Combining multiple algorithms using simple neural networks or weighted-average ensembles reduced the levels of over- and under-fitting of the individual algorithms. The modular nature of the model structure allowed the best-performing model, out of the eight alternatives, to be used for each output. The model developed in this study was able to predict the behaviour of AGS reactors and provide insight into the process by explaining the causes of predicted failures.

Declaration of Competing Interest

None.

Acknowledgements

The authors would like to acknowledge the Natural Sciences and Engineering Research Council of Canada for funding this research.
