Abstract
The increasing penetration of renewable energy resources and reduced system inertia pose risks to frequency security of power systems, necessitating the development of fast frequency regulation (FFR) methods using flexible resources. However, developing effective FFR policies is challenging because different power system operating conditions require distinct regulation logics. Traditional fixed-coefficient linear droop-based control methods are suboptimal for managing the diverse conditions encountered. This paper proposes a dynamic nonlinear P-f droop-based FFR method using a newly established meta-reinforcement learning (meta-RL) approach to enhance control adaptability while ensuring grid stability. First, we model the optimal FFR problem under various operating conditions as a set of Markov decision processes and accordingly formulate the frequency stability-constrained meta-RL problem. To address this, we then construct a novel hierarchical neural network (HNN) structure that incorporates a theoretical frequency stability guarantee, thereby converting the constrained meta-RL problem into a more tractable form. Finally, we propose a two-stage algorithm that leverages the inherent characteristics of the problem, achieving enhanced optimality in solving the HNN-based meta-RL problem. Simulations validate that the proposed FFR method shows superior adaptability across different operating conditions, and achieves better trade-offs between regulation performance and cost than benchmarks.
WITH the rapid advancement of the global power system transformation, the traditional synchronous generators in power systems are gradually being replaced by renewable energy resources such as solar and wind energy. This shift results in lower system inertia and reduced primary frequency regulation (PFR) reserves, which threaten power system frequency security [
Due to their mechanical characteristics, synchronous generators primarily achieve PFR through fixed-coefficient linear droop control. In contrast, flexible resources, connected to the grid via inverters, offer faster and more precise frequency response [
The above-mentioned FFR services all adopt static control laws with fixed droop curves, which lack adaptability to varying operating conditions. Considering the superior control flexibility of new resources, some dynamic FFR strategies have been proposed to enhance transient frequency dynamics and improve the cost-efficiency of frequency regulation. An asymmetric droop coefficient optimization method is proposed in [
Some existing studies leverage reinforcement learning (RL) methods to develop dynamic FFR policies for flexible resources. Well-trained RL controllers can avoid online optimization and reduce the computational burden during practical implementation. Reference [
Existing RL-based FFR methods typically assume that system frequency dynamics can be modeled as a single Markov decision process (MDP). However, these dynamics actually vary significantly with the size of load disturbances. Given the randomness and diversity of load disturbances in actual power systems, it is more appropriate to treat the optimal FFR problem as achieving fast adaptation to any MDP sampled from a distribution. Unfortunately, traditional RL algorithms often solve each MDP independently and can hardly realize the rapid adaptation required in the FFR context. Meta-reinforcement learning (meta-RL) is a promising method to solve this problem, whose core idea is to learn data-efficient RL algorithms capable of producing policies that adapt well to various MDPs with minimal data [
In summary, research gaps can be summarized as follows. Firstly, existing FFR methods are predominantly based on linear static droop control schemes or dynamic approaches burdened by heavy computation or communication demands. These methods fail to fully utilize the potential of flexible resources and lack adaptability to varying sizes of random load disturbances. Secondly, while RL methods offer potential for adaptive FFR with low computational burden during implementation, their effectiveness is limited by imperfect problem formulations in existing literature and concerns about stability guarantees. To address these gaps, this paper develops a dynamic nonlinear P-f droop-based FFR method using a newly established meta-RL approach to ensure both adaptability and stability. The proposed FFR method is applicable to various flexible resources integrated into power systems through power electronic inverters, presenting a possible solution for enhancing frequency stability in future power systems with high penetration of inverter-based generation. The main contributions can be summarized as follows.
1) The dynamic nonlinear FFR optimization problem is formulated as a frequency stability-constrained meta-RL problem, which leverages flexible resources to achieve stable FFR with fast adaptation to randomly varying load disturbances.
2) A hierarchical neural network (HNN) structure is proposed to parameterize dynamic nonlinear droop-based FFR policies with a theoretical frequency stability guarantee, converting the proposed meta-RL problem into a more tractable form.
3) A two-stage algorithm is specifically designed to solve the HNN-based meta-RL problem with enhanced optimality.
4) Simulations demonstrate that the proposed method provides FFR policies with superior adaptability, achieving a better balance between frequency quality and regulation cost compared with benchmark methods.
The rest of this paper is organized as follows. Section II describes the system models used for controller optimization and simulation and for theoretical analysis. Section III first models the optimal FFR as a stochastic optimization and then reformulates it into a constrained meta-RL problem. The HNN architecture is proposed in Section IV, and Section V presents the two-stage algorithm to solve the HNN-based meta-RL problem. Numerical simulation results are presented in Section VI. Finally, conclusions are drawn in Section VII.
Considering that a control area may contain numerous flexible resources, this paper adopts the centralized optimization and distributed execution scheme for convenience of application and supervision in practical power systems. During the optimization stage, we design an aggregated FFR controller based on the system frequency response (SFR) model of the target control area, as illustrated in Fig. 1.

Fig. 1 Block diagram of target control area.
All variables in Fig. 1 represent deviations from their nominal values.
The system dynamics can be represented as a set of state-space functions as:
(1a)
(1b)
(1c)
(1d)
(1e)
(1f)
(1g)
where is the state vector; and is the deadband width for generators.
In this paper, the aggregated FFR controller designed in subsequent sections takes only locally available information as inputs. During application, the aggregated controller is decomposed into distributed controllers by multiplying different participation factors depending on the regulation capacity of each flexible resource. Distributed controllers work with the locally measured frequency, which can differ from the CoI frequency considered in the SFR model. Consequently, the transient frequency stability analysis should consider the specific network structure and frequency differences across the target control area, such that frequency stability is guaranteed during practical operation.
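The capacity-proportional decomposition described above can be sketched as follows. The proportional rule and the function names are illustrative assumptions, since the paper does not fix a specific formula for the participation factors here.

```python
def participation_factors(capacities):
    """Capacity-proportional participation factors summing to 1 (assumed rule)."""
    total = sum(capacities)
    return [c / total for c in capacities]

def decompose(u_agg, capacities):
    """Split the aggregated FFR signal among flexible resources."""
    return [c * u_agg for c in participation_factors(capacities)]
```

For example, an aggregated signal of 0.06 p.u. split over resources with capacities 10, 20, and 30 yields signals proportional to those capacities.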
We denote the target control area by an undirected connected graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$, where $\mathcal{V}$ is the set of lossless buses indexed by $i$ or $j$, and $\mathcal{E}$ is the set of transmission lines indexed by $ij$. Each bus is equipped with an equivalent generator and an equivalent flexible resource unit aggregated from the connected resources. The system dynamics model in [
$\dot{\theta}_i = \omega_i, \quad i \in \mathcal{V}$ (2a)
$M_i \dot{\omega}_i = u_i - d_i - D_i \omega_i - \frac{1}{R_i}\omega_i - \sum_{j:ij \in \mathcal{E}} B_{ij}(\theta_i - \theta_j), \quad i \in \mathcal{V}$ (2b)
where $\omega_i$, $\theta_i$, $u_i$, $d_i$, $M_i$, $D_i$, and $R_i$ are the local frequency, phase angle, distributed FFR control signal, net load disturbance, system inertia, load-damping coefficient, and droop coefficient of the synchronous generator of bus $i$, respectively; and $B_{ij}$ is the susceptance of line $ij$. All variables in (2) represent deviations from their nominal values. Note that the AGC is omitted in (2) because it operates at a slower pace in practical power systems and therefore has limited effect on transient frequency stability. The generator dynamics are simplified as the classical second-order model widely used in existing literature. The inverter dynamics are omitted because their time constant is much smaller than that of the generators.
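To make the per-bus swing model concrete, it can be simulated numerically. The following minimal sketch uses forward-Euler integration with a static linear droop for the flexible resources; the three-bus topology, all parameter values, and the droop gain are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def simulate_swing(T=10.0, dt=0.01):
    """Forward-Euler simulation of a 3-bus swing model with static droop.

    All numbers below are illustrative assumptions for demonstration.
    """
    n = 3
    M = np.array([9.0, 8.0, 10.0])           # inertia constants
    D = np.array([1.0, 1.2, 0.9])            # load-damping coefficients
    R = np.array([0.05, 0.05, 0.04])         # generator droop values
    B = 5.0 * (np.ones((n, n)) - np.eye(n))  # line susceptances (fully meshed)
    k = 15.0                                 # droop gain of flexible resources
    d = np.array([0.06, 0.0, 0.0])           # step load disturbance at bus 1 (p.u.)

    theta = np.zeros(n)                      # phase-angle deviations
    omega = np.zeros(n)                      # frequency deviations
    traj = []
    for _ in range(int(T / dt)):
        u = -k * omega                       # flexible resources oppose the deviation
        tie = (B * (theta[:, None] - theta[None, :])).sum(axis=1)
        domega = (u - d - D * omega - omega / R - tie) / M
        theta = theta + dt * omega
        omega = omega + dt * domega
        traj.append(omega.copy())
    return np.array(traj)
```

Under a positive load step, all bus frequencies settle at a small negative deviation, consistent with droop-arrested under-frequency behavior.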
A static droop controller for flexible resources without linearity requirement can be denoted as , taking only local frequency measurement as input. Theorem 1 gives a sufficient condition for the frequency stability of system (2) under , which will be applied in the subsequent dynamic controller optimization.
Theorem 1 [
Proofs can be found in [
In this section, we first describe the optimal FFR problem under random load disturbances from the perspective of stochastic optimization in Section III-A. Then, we show that this classical formulation is difficult to solve when the control logic is complex. To address this, we reformulate the problem as a set of MDPs in Section III-B. Finally, in Section III-C, we formulate a frequency stability-constrained meta-RL problem to solve these MDPs.
In this subsection, we formulate the optimal FFR problem as a stochastic optimization. To be specific, the frequency quality and regulation cost are balanced through a weighted sum type objective function, and the controller is defined as a function of local measurements, including the system frequency, to facilitate distributed execution:
(3) |
where is the objective consisting of three terms , , and , which denote the control cost, the summed square error of CoI frequency deviations, and the CoI frequency nadir (or peak), respectively; , , and are the weight coefficients; is the expectation taken with respect to the random variable , and follows a distribution ; is the duration when the frequency is outside the frequency deadband after each disturbance; is the index of timesteps with small intervals such as 0.1 s; and and are the total upward and downward regulation capacities of flexible resources in the target control area, respectively.
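As a sketch of how the weighted objective could be evaluated on a discretized trajectory, the snippet below combines the three terms named above. The quadratic form of the control cost and the weight values are assumptions for illustration; the exact expressions are those of (3).

```python
def ffr_objective(freq_dev, u, dt=0.1, w_cost=0.1, w_sse=1.0, w_nadir=10.0):
    """Weighted objective in the spirit of (3): control cost, squared CoI
    frequency error, and frequency nadir/peak.  Weights and the quadratic
    cost form are illustrative assumptions."""
    cost = sum(ui * ui for ui in u) * dt                  # assumed quadratic regulation cost
    sse = sum(f * f for f in freq_dev) * dt               # summed square frequency error
    nadir = max(abs(min(freq_dev)), abs(max(freq_dev)))   # worst frequency excursion
    return -(w_cost * cost + w_sse * sse + w_nadir * nadir)
```

Larger deviations or control effort make the (negative) objective smaller, matching the maximization convention used later in the paper.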
This optimization formulation casts the optimal FFR problem as an infinite-dimensional optimization, making it challenging to solve. Traditional linear droop control methods simplify the problem by assuming that is a linear function of the system frequency, i.e., , where a single coefficient is tuned to handle all scenarios. This reduction transforms the infinite-dimensional problem into a one-dimensional problem. However, this simplification leads to suboptimal performance for the following reasons. First, the linearity specification restricts the control flexibility. Flexible resources can provide nonlinear frequency responses, which have been shown in [
To address the above concerns, this paper removes the static linear type restriction and instead optimizes dynamic nonlinear controllers that can adapt rapidly to each specific disturbance event encountered during operation, although the disturbance sizes cannot be directly observed. To manage the infinite-dimensional challenge, we first reformulate the FFR optimization as a set of MDPs.
For any fixed load disturbance , the FFR process can be formulated as an MDP denoted as a 5-tuple [
(4) |
The FFR controller can be denoted as a policy , which maps states to action probabilities. We consider policies parameterized by neural network parameters . A policy can interact with the MDP and collect episodes of length . This paper defines an episode as a duration that starts when a load disturbance occurs and the system frequency deviates from a specific deadband, i.e., 0.015 Hz, and ends when the frequency is restored within the deadband.
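The episode definition above reduces to a boundary check on a sampled frequency-deviation series; this sketch assumes the series is already sampled at the control interval.

```python
DEADBAND = 0.015  # Hz, as in the paper's episode definition

def episode_bounds(freq_dev):
    """Return (start, end) indices of the first FFR episode in a
    frequency-deviation series, or None if the deadband is never left.

    The episode starts when |deviation| first exceeds the deadband and
    ends when it is first restored within the deadband.
    """
    start = next((i for i, f in enumerate(freq_dev) if abs(f) > DEADBAND), None)
    if start is None:
        return None
    end = next((i for i in range(start + 1, len(freq_dev))
                if abs(freq_dev[i]) <= DEADBAND), len(freq_dev))
    return start, end
```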
Considering the stochastic load disturbances, the FFR optimization problem is actually a set of MDPs. Assume that the load disturbance occurring in different episodes follows a distribution . Then, during each episode, the controller encounters an MDP sampled from a distribution with shared , but with different dynamics .
RL algorithms are widely used to find an optimal policy for an MDP, which maximizes the expected accumulated return within an episode based on the collected episodes. An RL algorithm can be defined as a function (5) [
(5) |
In traditional RL algorithms, is typically chosen as a classical RL algorithm, such as deep Q-learning (DQN) [
To achieve fast adaptation to each disturbance event without destabilizing the system, we formulate a frequency stability-constrained meta-RL problem. Instead of a static policy , we optimize a parameterized RL algorithm that can quickly learn the optimal for each MDP sampled from the distribution , which lasts for only one episode. With the objective of maximizing the expected return during the whole life of the dynamic policy , the stability-constrained meta-RL model can be formulated as (6), which includes two simultaneous learning loops.
(6) |
where denotes the expectation taken with respect to ; and is an RL algorithm parameterized by . The outer loop learns , while the inner loop, which shares a similar mechanism with traditional RL algorithms, applies the algorithm to dynamically update the control policy based on the interacting experience with MDPs. An update at timestep of an episode can be expressed as:
(7) |
where the dataset is collected within the current episode under , and it is reset at the beginning of a new episode. An ideal must be data-efficient to enable effective adaptation within each episode.
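A minimal illustration of the inner-loop update (7): here the adaptable policy parameter is reduced to a scalar droop index and the learned algorithm to a single gain, so the update grows the index as the observed deviation worsens. This hand-made rule is only a stand-in for the learned update optimized in the outer loop.

```python
def inner_update(theta, episode_data, phi):
    """One inner-loop step, theta_next = f_phi(D_t, theta_t), as in (7).

    theta: scalar policy index; episode_data: frequency deviations seen so
    far in the episode; phi: a single learned gain.  A larger observed
    deviation yields a larger (more aggressive) policy index.  This is an
    illustrative simplification, not the paper's RNN-based update.
    """
    worst_dev = max(abs(f) for f in episode_data)
    return theta + phi * worst_dev
```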
Based on this meta-RL framework, we introduce nonlinearity through the neural network-based inner-loop policy and achieve dynamic control logic adjustment with the outer-loop RL algorithm , which is capable of rapid adaptation.
Due to the frequency stability constraint in the stability-constrained meta-RL model (6), existing approaches, such as those in [
In (6), each MDP differs in load disturbance , leading to different dynamics . However, different dynamics also share many similarities, such as the generator and inverter dynamics, indicating that the optimal policies of different MDPs may also share common features. Accordingly, we divide the policy parameters into fixed network parameters and variable external parameters . Specifically, we model the common parts of different policies with the bottom neural network parameterized by , and represent an RL algorithm with another top neural network, whose output adapts as a variable input of the policy network. The two parts form an HNN structure, as illustrated in Fig. 2.

Fig. 2 HNN structure with stability guarantee.
The bottom neural network named executor can be expressed as , which takes the frequency as input and produces the aggregated FFR signal . As common parameters of all policies, is optimized during training and then fixed during implementation, while is always updated by the top neural network during both stages. The executor is designed as an unconstrained monotonic neural network (UMNN) [
(8) |
where is a neural network with the input and parameters .
First, the partial derivative of w.r.t. , which is a scalar function, is parameterized as the neural network , whose output is forced to be positive through the exponential linear unit (ELU) increased by 1. The output control signal is then calculated as the integral of the positive partial derivative. In this way, the parameterized policy is always monotonically increasing w.r.t. the system frequency . Namely, the executor can be considered as a cluster of monotonic droop controllers indexed by with zero output at . Note that the network constraint (8) poses no limitation on the structure of the bottom neural network with parameters , which can be arbitrarily complex, as long as we set a positive activation function for the final layer and add an integral layer after that.
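The UMNN construction in (8) can be sketched as follows: a small network parameterizes the derivative of the droop curve, the ELU-plus-one activation forces it positive, and numerical integration from zero yields a control signal that is monotonically increasing in the frequency deviation with zero output at zero. The toy two-layer network and trapezoidal integration below are illustrative choices, not the paper's architecture.

```python
import numpy as np

def elu(x):
    """Exponential linear unit."""
    return np.where(x > 0, x, np.exp(x) - 1.0)

def monotone_droop(freq_dev, beta, weights, n_grid=101):
    """UMNN-style monotonic droop curve in the spirit of (8).

    The derivative of the output w.r.t. frequency is a small network made
    strictly positive via ELU(.)+1; the control signal is its integral
    from 0 to freq_dev, conditioned on the selector output beta.
    """
    W1, b1, W2, b2 = weights
    xs = np.linspace(0.0, freq_dev, n_grid)               # integration grid
    inp = np.stack([xs, np.full_like(xs, beta)], axis=1)  # condition on beta
    deriv = elu(np.tanh(inp @ W1 + b1) @ W2 + b2).ravel() + 1.0  # strictly positive
    # Trapezoidal rule; a descending grid (freq_dev < 0) gives a negative integral.
    return float(np.sum((deriv[1:] + deriv[:-1]) * np.diff(xs)) / 2.0)
```

By construction the output is zero at zero deviation and increases monotonically with the frequency deviation, regardless of the weight values.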
Once the top neural network updates its output, the bottom neural network executes a different monotonic droop curve indexed by the new . Therefore, the top neural network is named the selector. While the executor updates its output at each timestep , the selector works in an event-triggered mode, with the timestep of the trigger denoted as . The detailed explanation is deferred to Section IV-B. The input of the selector is an observation of the system states at timestep , which is chosen as . The top neural network is designed as a recurrent neural network (RNN). The first layer comprises gated recurrent units (GRUs) [
Constrained by (8), if we fix the output of the top neural network, the proposed HNN degenerates to a static monotonic controller. Based on this characteristic, we set the selector to work in an event-triggered mode with the following triggering condition:
(9) |
That is to say, the selector is triggered if and only if the frequency deviation gets worse.
Under the triggering condition (9), the selector dynamically adjusts the droop curve selection according to its observations during the frequency arrest stage. Then, the bottom neural network keeps executing the selected static droop curve until the frequency is settled and recovered, or another disturbance occurs, inducing a larger frequency deviation and triggering the selector to update . In any case, the whole network stays static and monotonic after the system frequency reaches the nadir or peak, which satisfies the sufficient condition for frequency stability described in Theorem 1.
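One plausible reading of the trigger condition (9) — fire only while the frequency deviation keeps worsening — can be implemented as a stateful check; the exact inequality in (9) may differ from this sketch.

```python
def make_trigger():
    """Event trigger in the spirit of (9): fire only when the absolute
    frequency deviation exceeds its worst value so far in the episode,
    i.e., when the deviation gets worse."""
    worst = 0.0
    def triggered(freq_dev):
        nonlocal worst
        if abs(freq_dev) > worst:
            worst = abs(freq_dev)
            return True
        return False
    return triggered
```

Once the nadir (or peak) is passed, the trigger never fires again, so the HNN stays a static monotonic controller for the rest of the episode, matching the stability argument above.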
The unrolled structure of the proposed HNN is given in Fig. 3.

Fig. 3 Unrolled structure of proposed HNN.
At each evenly-spaced timestep , is measured, and the action , i.e., the control signal , is updated by the executor based on provided by the selector. A reward for the single timestep is then obtained from the environment.
As for the selector, Fig. 4 compares the control logic of the proposed method with those of the two benchmark methods.

Fig. 4 Control logic comparison of different methods. (a) Method 1. (b) Method 2. (c) Proposed method.
The foregoing analysis indicates that the network constraint (8) and the trigger condition (9) constitute a sufficient but not necessary condition for frequency stability. Consequently, the stability-constrained meta-RL problem (6) can be conservatively reformulated as follows.
(10a) |
(10b) |
Compared with (6), the stability constraint is replaced by network shape and trigger condition constraints that are much easier to handle.
The HNN-based meta-RL model (10) enables the optimization of a dynamic droop-based controller with a stability guarantee. Next, the goal is to solve the proposed HNN-based meta-RL problem. Inspired by [
We view the interaction process from different perspectives and reuse the experience collected by the HNN-based controller. From the view of the selector , the executor actions and rewards can be considered as a part of the environment dynamics. The training data collected during an episode for updating include the selector’s observation, action, and the reward for each trigger , which can be denoted as , where is the total trigger number of the selector within an episode. Then, from the view of the executor, the decision process of the selector can be treated as environment transitions. The system frequency and the selector’s action constitute the executor’s observation . The training data for the executor can be expressed as . After collecting the interaction experience of multiple episodes, any off-the-shelf RL algorithm can be used to train the network by mapping the experience buffers and to new parameters and , respectively. However, we observed that simultaneously training the selector and executor from randomly initialized and leads to poor performance.
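The dual-view experience reuse can be sketched as follows: one episode trajectory is split into a per-timestep executor buffer and a per-trigger selector buffer. The credit assignment that sums rewards between consecutive triggers is an assumption for illustration, as the paper does not spell out this detail here.

```python
def split_experience(trajectory):
    """Build executor and selector buffers from one episode trajectory.

    trajectory: list of dicts with keys 'freq', 'beta', 'u', 'reward',
    and 'triggered' (whether the selector fired at this timestep).
    The reward credited to a selector decision is the sum of per-timestep
    rewards until its next trigger (an assumed credit assignment).
    """
    d_exe, d_sel = [], []
    for step in trajectory:
        d_exe.append(((step['freq'], step['beta']), step['u'], step['reward']))
    trigger_ids = [i for i, s in enumerate(trajectory) if s['triggered']]
    for j, i in enumerate(trigger_ids):
        nxt = trigger_ids[j + 1] if j + 1 < len(trigger_ids) else len(trajectory)
        r = sum(s['reward'] for s in trajectory[i:nxt])
        d_sel.append((trajectory[i]['freq'], trajectory[i]['beta'], r))
    return d_exe, d_sel
```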
To optimize the training process and achieve high performance, we propose a two-stage algorithm, which is summarized in Algorithm 1.
Algorithm 1: HNN-based meta-RL for optimal FFR

Initialize: ,
Executor training:
for do
    Initialize an empty executor experience buffer
    for do
        Sample an MDP , and fix
        Collect timesteps of experience using
    end for
    Update based on
end for
United training:
for do
    Initialize an empty executor experience buffer
    Initialize an empty selector experience buffer
    for do
        Sample an MDP
        Collect T timesteps of experience using and
    end for
    Update based on , and update based on
end for
Implementation:
if then
    Begin an FFR episode, and initialize
    for timestep do
        Get an observation
        if then
            Break
        else
            if condition (9) is satisfied then
                Select
            end if
            Execute
        end if
    end for
end if
1) Executor training stage
At the first stage, only the executor is trained to get a cluster of diversified droop curves. Since the load disturbance is a key parameter for distinguishing different MDPs, we block the selector and set the selection to be . Note that although the disturbance cannot be measured during the application, it is available during training and is exclusively used at the executor training stage. Only executor experience is collected at this stage, based on which is iteratively updated.
2) United training stage
The selector network is activated at this stage, generating as the input of the executor trained at the first stage. The whole HNN interacts with the environment. The experience collected at this stage is reused to generate both and , and parameters and are simultaneously updated.
3) Implementation
The implementation part in Algorithm 1 summarizes how the trained HNN controller operates online during each FFR episode.
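The implementation loop of Algorithm 1 can be sketched as below, with the deadband check ending the episode and a worst-deviation-so-far test standing in for trigger condition (9). The measurement and actuation interfaces, and the selector observing only the frequency, are simplifications for illustration.

```python
DEADBAND = 0.015  # Hz

def run_episode(measure_freq, selector, executor, apply_u, max_steps=600):
    """Online FFR loop mirroring the implementation part of Algorithm 1.

    measure_freq/apply_u are stand-ins for the measurement and actuation
    interfaces; selector(obs) returns a new policy index beta, and
    executor(freq, beta) returns the FFR signal.
    """
    beta, worst = 0.0, 0.0
    for _ in range(max_steps):
        f = measure_freq()
        if abs(f) <= DEADBAND:
            break                      # frequency restored: episode ends
        if abs(f) > worst:             # stand-in for trigger (9): deviation got worse
            worst = abs(f)
            beta = selector(f)
        apply_u(executor(f, beta))
```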
The executor training stage before the united training has been empirically validated to improve the final performance significantly.
The effectiveness of the proposed HNN-based meta-RL model and the solution algorithm is validated via numerical simulations. The block diagram of the simulation system is shown in Fig. 1.
Parameter | Value | Parameter | Value | Parameter | Value |
---|---|---|---|---|---|
9.2 s | 2.0 p.u. | 0.1 | |||
12 s | 0.3 s | 0.2 | |||
0.07 | 0.2 s | 0.15 | |||
0.015 | 0.5 | 0.5 | |||
0.03 | 24 |
The control interval of the optimized FFR controller is set to 0.1 s. For more realistic simulations of practical systems, the AGC loop in Fig. 1 is included in the simulations.
The time required for the executor training and united training stages is 2 hours and 10 hours on average, respectively. During the implementation stage, the calculation time for the selector and the executor is 0.3 ms and 0.7 ms on average, respectively, which is fast enough for practical online applications.
Time-domain simulations on the system illustrated in Fig. 1 are conducted to verify the adaptability of the proposed method. The dynamics of FFR signals and frequencies under step load disturbances of different sizes are shown in Fig. 5.

Fig. 5 Dynamics of FFR signals and frequencies under step load disturbances of different sizes. (a) FFR signals. (b) Frequencies.
To further show the adaptability of the proposed method, it is tested under consecutive step disturbances. Specifically, a 0.04 p.u. load disturbance and a 0.06 p.u. load disturbance occur at and s, respectively. The dynamics of FFR signals and frequencies under the consecutive step disturbances are shown in Fig. 6.

Fig. 6 Dynamics of FFR signals and frequencies under consecutive step disturbances. (a) FFR signals. (b) Frequencies.
The curves in Fig. 6 show that the proposed controller promptly adjusts its FFR signal when the second disturbance arrives, arresting the additional frequency drop.
This subsection compares the performance of the proposed method with the two benchmark FFR methods. Method 1 is static linear droop control with a typical droop value of 1%, whose droop curve is shown in Fig. 7(a). Method 2 is a static nonlinear droop controller optimized via PPO, whose droop curve is shown in Fig. 7(b).

Fig. 7 Droop curves of two benchmark FFR methods for flexible resources. (a) Droop curve of method 1. (b) Droop curve of method 2.
The optimal control objective value in (3) and the proportion of the control cost term under various step load disturbances are listed in the following table. The performance metric used for comparison, which normalizes each objective value against that of method 1, is defined as:
(11) |
| Load disturbance (p.u.) | Method 1: objective | Method 1: cost (%) | Method 2: objective | Method 2: cost (%) | Proposed: objective | Proposed: cost (%) |
|---|---|---|---|---|---|---|
| 0.01 | -0.22 | 78 | -0.22 | 78 | -0.15 | 40 |
| 0.02 | -0.49 | 68 | -0.48 | 67 | -0.42 | 39 |
| 0.03 | -0.80 | 60 | -0.80 | 60 | -0.76 | 41 |
| 0.04 | -1.16 | 53 | -1.16 | 53 | -1.15 | 43 |
| 0.05 | -1.58 | 48 | -1.57 | 48 | -1.58 | 45 |
| 0.06 | -2.06 | 44 | -2.04 | 44 | -2.04 | 46 |
| 0.07 | -2.61 | 40 | -2.56 | 41 | -2.54 | 46 |
| 0.08 | -3.25 | 36 | -3.16 | 38 | -3.08 | 46 |
| 0.09 | -3.98 | 33 | -3.83 | 35 | -3.67 | 45 |
| 0.10 | -4.80 | 31 | -4.58 | 33 | -4.31 | 44 |
where is the objective value of method 1. The numerator is an absolute value because the objective values are all negative. The performance of different methods under various load disturbances is plotted in Fig. 8.

Fig. 8 Performance of different methods under various load disturbances.
From Fig. 8, the proposed method achieves the best objective values over nearly the whole range of disturbance sizes, with the advantage most pronounced under small and large disturbances.
Compared with the other methods, the proportion of the control cost term obtained by the proposed method is higher under larger disturbances and lower under smaller disturbances, as shown in the table above.
The proposed algorithm has an executor training stage before the united training. To validate the effectiveness of the proposed algorithm, this subsection compares the performance of the proposed algorithm with another algorithm performing united training only (denoted as algorithm 2). The performance comparison of different algorithms is shown in Fig. 9.

Fig. 9 Performance comparison of different algorithms.
It can be observed from Fig. 9 that the proposed two-stage algorithm converges to a noticeably higher return than algorithm 2.
The objective of the optimal control problem is formulated as the weighted sum of different terms in (3) to balance the control cost and frequency deviations. Different values of the weight coefficients , , and in (3) result in different trade-offs. This subsection takes the coefficient as an example to show the impact of weight coefficients on the optimization results of the proposed method. The value of is set to 0.4, 0.1, and 0.025, respectively. The dynamics of frequencies and FFR signals after step load disturbances with size p.u. and 0.05 p.u. are plotted in Fig. 10.

Fig. 10 Dynamics of frequencies and FFR signals after step load disturbances with size p.u. and 0.05 p.u.. (a) Dynamics of frequencies with p.u.. (b) Dynamics of FFR signals with p.u.. (c) Dynamics of frequencies with p.u.. (d) Dynamics of FFR signals with p.u..
A larger value denotes a higher cost of flexible resource-based FFR service. As shown in Fig. 10, a larger weight leads the controller to output smaller FFR signals and tolerate deeper frequency excursions, reflecting the intended trade-off between regulation cost and frequency quality.
Although the SFR model depicted in Fig. 1 aggregates the generators into a single equivalent unit, practical systems contain both reheat and non-reheat generators. To examine the robustness of the proposed method against such model differences, a modified SFR model including both generator types is constructed, whose block diagrams are shown in Fig. 11.

Fig. 11 Block diagram of reheat and non-reheat generators.
We also compare the proposed method with the two benchmark methods as detailed in Section VI-C. Method 1 maintains its typical droop value of 1%. Method 2 and the proposed method undergo training using PPO and the proposed algorithm, respectively, under the modified SFR model. The control objective values under various load disturbances are presented in the following table and Fig. 12.
| Load disturbance (p.u.) | Method 1 | Method 2 | Proposed |
|---|---|---|---|
| 0.01 | -0.28 | -0.23 | -0.11 |
| 0.02 | -0.57 | -0.49 | -0.33 |
| 0.03 | -0.89 | -0.79 | -0.64 |
| 0.04 | -1.25 | -1.15 | -1.03 |
| 0.05 | -1.66 | -1.56 | -1.47 |
| 0.06 | -2.11 | -2.02 | -1.97 |
| 0.07 | -2.61 | -2.53 | -2.51 |
| 0.08 | -3.15 | -3.10 | -3.09 |
| 0.09 | -3.73 | -3.82 | -3.72 |
| 0.10 | -4.41 | -4.69 | -4.43 |

Fig. 12 Performance comparisons of different methods in modified SFR model.
This paper investigates the flexible resource-based FFR optimization problem considering the guarantee of system frequency stability. A new meta-RL approach is proposed to realize dynamic nonlinear P-f droop-based FFR with rapid adaptability to different operating conditions.
We first formulate a frequency stability-constrained meta-RL problem, then reformulate it into a more tractable HNN-based form with the well-designed network constraint and trigger condition. A two-stage algorithm is proposed to enhance the optimality in solving the HNN-based meta-RL problem. Simulation results validate that the proposed method can adapt rapidly to different operating conditions with the system frequency stability guaranteed. Compared with benchmarks including static linear control and static nonlinear control methods, the proposed method achieves better trade-offs between frequency quality and regulation cost. Future research directions include the coordinated FFR optimization of multiple interconnected control areas and the differentiated utilization of heterogeneous flexible resources in FFR.
References
R. W. Kenyon, M. Bossart, M. Marković et al., “Stability and control of power systems with high penetrations of inverter-based resources: an accessible review of current knowledge and open questions,” Solar Energy, vol. 210, pp. 149-168, Nov. 2020.
J. Boyle, T. Littler, S. M. Muyeen et al., “An alternative frequency-droop scheme for wind turbines that provide primary frequency regulation via rotor speed control,” International Journal of Electrical Power & Energy Systems, vol. 133, p. 107219, Dec. 2021.
F. Sattar, S. Ghosh, Y. J. Isbeih et al., “A predictive tool for power system operators to ensure frequency stability for power grids with renewable energy integration,” Applied Energy, vol. 353, p. 122226, Jan. 2024.
M. H. Marzebali, M. Mazidi, and M. Mohiti, “An adaptive droop-based control strategy for fuel cell-battery hybrid energy storage system to support primary frequency in stand-alone microgrids,” Journal of Energy Storage, vol. 27, p. 101127, Feb. 2020.
M. Mousavizade, F. Bai, R. Garmabdari et al., “Adaptive control of V2Gs in islanded microgrids incorporating EV owner expectations,” Applied Energy, vol. 341, p. 121118, Jul. 2023.
C. Christiansen and N. Hillmann. (2017, May). Feasibility of fast frequency response obligations of new generators. [Online]. Available: https://www.aemc.gov.au/sites/default/files/content/661d5402-3ce5-4775-bb8a-9965f6d93a94/AECOM-Report-Feasibility-of-FFR-Obligations-of-New-Generators.pdf
L. Meng, J. Zafar, S. K. Khadem et al., “Fast frequency response from energy storage systems – a review of grid standards, projects and technical issues,” IEEE Transactions on Smart Grid, vol. 11, no. 2, pp. 1566-1581, Mar. 2020.
National Grid Group. (2016, Mar.). Enhanced frequency response: frequently asked questions. [Online]. Available: https://www.nationalgrid.com/sites/default/files/documents/Enhanced%20Frequency%20Response%20FAQs%20v5.0_.pdf
P. Du, N. V. Mago, W. Li et al., “New ancillary service market for ERCOT,” IEEE Access, vol. 8, pp. 178391-178401, Sept. 2020.
Y. Yuan, Y. Zhang, J. Wang et al., “Enhanced frequency-constrained unit commitment considering variable-droop frequency control from converter-based generator,” IEEE Transactions on Power Systems, vol. 38, no. 2, pp. 1094-1110, Mar. 2023.
M. F. M. Arani and Y. A.-R. I. Mohamed, “Cooperative control of wind power generator and electric vehicles for microgrid primary frequency regulation,” IEEE Transactions on Smart Grid, vol. 9, no. 6, pp. 5677-5686, Nov. 2018.
W. Cui, Y. Jiang, and B. Zhang, “Reinforcement learning for optimal primary frequency control: a Lyapunov approach,” IEEE Transactions on Power Systems, vol. 38, no. 2, pp. 1676-1688, Mar. 2023.
C. Zhao, U. Topcu, N. Li et al., “Design and stability of load-side primary frequency control in power systems,” IEEE Transactions on Automatic Control, vol. 59, no. 5, pp. 1177-1189, May 2014.
Y. Liu, Y. Song, Z. Wang et al., “Optimal emergency frequency control based on coordinated droop in multi-infeed hybrid AC-DC system,” IEEE Transactions on Power Systems, vol. 36, no. 4, pp. 3305-3316, Jul. 2021.
Z. Ding, K. Yuan, J. Qi et al., “Robust and cost-efficient coordinated primary frequency control of wind power and demand response based on their complementary regulation characteristics,” IEEE Transactions on Smart Grid, vol. 13, no. 6, pp. 4436-4448, Nov. 2022.
E. Ekomwenrenren, J. W. Simpson-Porco, E. Farantatos et al. (2022, Aug.). Data-driven fast frequency control using inverter-based resources. [Online]. Available: https://arxiv.org/abs/2208.01761
E. Ekomwenrenren, Z. Tang, J. W. Simpson-Porco et al., “Hierarchical coordinated fast frequency control using inverter-based resources,” IEEE Transactions on Power Systems, vol. 36, no. 6, pp. 4992-5005, Nov. 2021.
R. Chakraborty, A. Chakrabortty, E. Farantatos et al., “Hierarchical frequency control in multi-area power systems with prioritized utilization of inverter based resources,” in Proceedings of 2020 IEEE PES General Meeting, Montreal, Canada, Aug. 2020, pp. 1-5.
Q. Yang, L. Yan, X. Chen et al., “A distributed dynamic inertia-droop control strategy based on multi-agent deep reinforcement learning for multiple paralleled VSGs,” IEEE Transactions on Power Systems, vol. 38, no. 6, pp. 5598-5612, Nov. 2023.
Z. Yan, Y. Xu, Y. Wang et al., “Deep reinforcement learning-based optimal data-driven control of battery energy storage for power system frequency support,” IET Generation, Transmission & Distribution, vol. 14, no. 25, pp. 6071-6078, Dec. 2020.
J. Beck, R. Vuorio, E. Z. Liu et al. (2023, Jan.). A survey of meta-reinforcement learning. [Online]. Available: https://arxiv.org/abs/2301.08028
C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in Proceedings of International Conference on Machine Learning, Sydney, Australia, Aug. 2017, pp. 1126-1135.
Y. Duan, J. Schulman, X. Chen et al. (2016, Nov.). R
J. Li, T. Zhou, K. He et al., “Distributed quantum multiagent deep meta reinforcement learning for area autonomy energy management of a multiarea microgrid,” Applied Energy, vol. 343, p. 121181, Aug. 2023.
R. Huang, Y. Chen, T. Yin et al., “Learning and fast adaptation for grid emergency control via deep meta reinforcement learning,” IEEE Transactions on Power Systems, vol. 37, no. 6, pp. 4168-4178, Nov. 2022.
Q. Shi, F. Li, and H. Cui, “Analytical method to aggregate multi-machine SFR model with applications in power system dynamic studies,” IEEE Transactions on Power Systems, vol. 33, no. 6, pp. 6355-6367, Nov. 2018.
D. L. Poole and A. K. Mackworth, Artificial Intelligence. Cambridge, UK: Cambridge University Press, 2010.
V. Mnih, K. Kavukcuoglu, D. Silver et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, pp. 529-533, Feb. 2015.
T. P. Lillicrap, J. J. Hunt, A. Pritzel et al. (2015, Sept.). Continuous control with deep reinforcement learning. [Online]. Available: https://arxiv.org/abs/1509.02971
J. Schulman, F. Wolski, P. Dhariwal et al. (2017, Jul.). Proximal policy optimization algorithms. [Online]. Available: https://arxiv.org/abs/1707.06347
A. Wehenkel and G. Louppe, “Unconstrained monotonic neural networks,” in Proceedings of 33rd Conference on Neural Information Processing Systems, Vancouver, Canada, Jun. 2019, pp. 1545-1555.
K. Cho, B. van Merriënboer, D. Bahdanau et al. (2014, Sept.). On the properties of neural machine translation: encoder-decoder approaches. [Online]. Available: https://arxiv.org/abs/1409.1259
K. Frans, J. Ho, and X. Chen. (2017, Oct.). Meta learning shared hierarchies. [Online]. Available: https://arxiv.org/abs/1710.09767