Abstract
Peer-to-peer (P2P) energy trading in active distribution networks (ADNs) plays a pivotal role in promoting the efficient consumption of renewable energy sources. However, it is challenging to effectively coordinate the power dispatch of ADNs and P2P energy trading while preserving the privacy of the different stakeholders. Hence, this paper proposes a soft actor-critic algorithm incorporating distributed trading control (SAC-DTC) to tackle the optimal power dispatch of ADNs and P2P energy trading while preserving the privacy of prosumers. First, the soft actor-critic (SAC) algorithm is used to optimize the control strategy of devices in ADNs to minimize the operation cost, and the primary environmental information of the ADN at that point is published to prosumers. Then, a distributed generalized fast dual ascent method is used to iterate the trading process of prosumers and maximize their revenues. Subsequently, the trading results are encrypted using the differential privacy technique and returned to the ADN. Finally, the social welfare value, consisting of ADN operation cost and P2P market revenue, is used as the reward to update the network parameters and control strategies of the deep reinforcement learning agent. Simulation results show that the proposed SAC-DTC algorithm reduces the ADN operation cost, boosts the P2P market revenue, maximizes the social welfare, and exhibits high computational accuracy, demonstrating its practical applicability to the operation of power systems and power markets.
With the increasing penetration of distributed energy resources (DERs), battery energy storage (BES), and adjustable loads, distribution networks face operational problems such as overloading, voltage overruns, and network losses. Under the unified management of the distribution system operator (DSO), the active distribution network (ADN) [
Consequently, the optimal power dispatch problems for ADNs are usually formulated as mixed-integer nonlinear models [
Some devices such as DER and BES in ADNs may belong to independent individuals with different interest claims [
The effectiveness of P2P markets has been extensively studied and validated [
In the centralized scheme, a central entity (such as P2P operator or DSO) is responsible for coordinating energy trading and benefit distribution, with the advantage of maximizing the social welfare [
In recent years, there has been an increase in research on P2P markets. At the level of information interaction and market operation, most studies have primarily employed block chain [
Furthermore, the P2P markets encompass energy trading at the information layer, which requires secure transmission at the physical layer of the distribution network. A fully decentralized two-loop algorithm is proposed in [
However, the existing studies generally do not consider the control of the devices governed by DSOs. The lack of transparency regarding the respective behaviors of the DSO and prosumers may result in problems such as voltage overruns and increased network losses in the distribution network [
Although these studies provide valuable insights, they are constrained by several limitations, such as difficulties in privacy protection, ignoring distribution network constraints, and an insufficient consideration of the control of devices in ADN, as shown in
Reference | Privacy protection | Distribution network constraints | Control of devices |
---|---|---|---|
[ | |||
[ | |||
[ | |||
[ | |||
[ | |||
This paper |
Note: the symbol represents that the corresponding factor is considered; and the symbol represents that the corresponding factor is not considered.
In view of the above, this paper establishes a soft actor-critic algorithm incorporating distributed trading control (SAC-DTC) that combines a data-driven component (a deep reinforcement learning (DRL) algorithm) with physical modeling (an information-driven distributed algorithm) [
1) The coordinated optimization for the power dispatch of ADN and P2P energy trading is constructed as a Markov decision process (MDP) and formulated as a social welfare maximization problem. The agent can explore the dispatch strategy that minimizes the ADN operation cost and creates an environment conducive to conducting P2P energy trading under the stochastic and uncertain conditions.
2) This paper proposes an SAC-DTC algorithm based on data-driven and physical modeling to solve the above problems. This proposed SAC-DTC algorithm utilizes differential privacy noise to protect users’ information and price signals to effectively guide users’ behavior, thus coupling the coordinated optimization process of ADN and P2P markets, and ultimately reducing the ADN operation cost and increasing the P2P market revenue.
3) The proposed SAC-DTC algorithm is superior in real-time optimization and operation processes of power systems because of its fast computation speed and small node voltage error of the obtained results.
The remainder of this paper is organized as follows. Section II introduces the framework of distribution network that contains both ADN and P2P markets. Section III formulates the optimal power dispatch model of ADN and P2P energy trading model. The proposed SAC-DTC algorithm based on data-driven and physical modeling is presented in Section IV to coordinate the ADN and local P2P market. Section V conducts empirical case studies to evaluate the effectiveness of the proposed SAC-DTC algorithm. Finally, Section VI concludes this paper.
As shown in

Fig. 1 Proposed framework applied to distribution network. (a) Overall framework. (b) Control areas of DSO and prosumers.
1) DSO: as shown in the red part of
2) Prosumers: as shown in the blue part of
The behavior of both the DSO and prosumers causes changes in the network losses and node voltages of the ADN. Therefore, an efficient coordination and control process between the DSO and prosumers is required to avoid problems such as over-regulation. The optimization process of the whole system seeks to minimize the ADN operation cost and maximize the profits of all individuals in the P2P market, which is ultimately regarded as a social welfare maximization problem.
In a radial ADN connected to the external grid, the DSO is responsible for regulating the devices in the ADN to ensure that the ADN meets the needs of all users while maintaining a safe and stable operating condition. All two-way users constitute a local P2P energy trading market, where each user can trade electricity and transmit it through the distribution network subject to safety constraints.
1) The objective function for the optimal power dispatch of ADN is to minimize the regulation costs of OLTC, CB, and BES, costs of network losses, and cost of wind power and PV power curtailment, which is formulated as:
(1) |
(2) |
where is the total ADN operation cost at time t; is the unit cost of wind power and PV power curtailment; and are the unit regulation costs of CB and OLTC, respectively; is the grid electricity price; is the unit loss cost of BES; and are the tap positions of CB and OLTC at node i at time t, respectively; and are the switching losses of CB and OLTC at node i at time t, respectively [
2) The following constraints must be included in the optimization model to ensure the safe operation of the ADN with the P2P energy trading process.
(3) |
(4) |
(5) |
(6) |
where and are the inflow active and reactive power of branch b at time t, respectively; and are the active and reactive power losses from branches b to at time t, respectively; and are the active and reactive power of prosumers at node i at time t, respectively; , , and are the reactive power of CB, SVG, and DER at node i at time t, respectively; and are the active and reactive power of conventional loads at node i at time t, respectively; is the voltage amplitude at node i at time t; and are the minimum and maximum voltage levels of ADN, respectively; and are the resistance and reactance of branch b, respectively; and is the voltage reference value.
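As a rough illustration of how branch power flows couple nodal injections to node voltages, the sketch below applies a linearized voltage-drop recursion to a single radial feeder. The linearization (losses neglected, drop divided by a reference voltage), the function names `feeder_voltages` and `count_violations`, the 0.95/1.05 p.u. limits, and all numerical values are assumptions made for illustration; they are not the constraints (3)-(6) themselves.

```python
def feeder_voltages(v_source, loads_p, loads_q, r, x, v_ref=1.0):
    """Node voltages (p.u.) along one radial feeder under a linearized model.

    loads_p[k], loads_q[k]: net active/reactive load at node k+1 (p.u.).
    r[k], x[k]: resistance/reactance of the branch feeding node k+1 (p.u.).
    Losses are neglected, so each branch carries the sum of downstream loads.
    """
    voltages = []
    v_prev = v_source
    for k in range(len(loads_p)):
        p_flow = sum(loads_p[k:])  # active power entering branch k
        q_flow = sum(loads_q[k:])  # reactive power entering branch k
        v_k = v_prev - (r[k] * p_flow + x[k] * q_flow) / v_ref
        voltages.append(v_k)
        v_prev = v_k
    return voltages


def count_violations(voltages, v_min=0.95, v_max=1.05):
    """Number of nodes whose voltage lies outside [v_min, v_max]."""
    return sum(1 for v in voltages if v < v_min or v > v_max)


# Hypothetical three-node feeder with identical loads and branches.
volts = feeder_voltages(1.0, [0.2, 0.2, 0.2], [0.1, 0.1, 0.1],
                        r=[0.05, 0.05, 0.05], x=[0.05, 0.05, 0.05])
violations = count_violations(volts)
```

In this toy feeder the voltage falls monotonically toward the feeder end, which is why trading decisions at remote nodes can push the voltage past its lower limit even when every prosumer acts within its own power limits.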
The OLTC, CB, SVG, DER, and ESS have their own constraints, which are depicted as (7)-(15), among which (7)-(9) are the operational constraints for the OLTC and CB; (10) and (11) are the operational constraints for the SVG and DER, respectively; and (12)-(15) are the constraints for the ESS.
(7) |
(8) |
(9) |
(10) |
(11) |
(12) |
(13) |
(14) |
(15) |
where is the base voltage of OLTC; is the voltage change per tap of OLTC; is the maximum number of OLTC operations; is the tap position of OLTC at time t, and and are its lower and upper bounds, respectively; is the reactive power change per tap of CB; is the maximum number of CB operations; is the tap position of CB at node i at time t, and and are its lower and upper bounds, respectively; and are the minimum and maximum reactive power of SVG, respectively; and are the maximum active and reactive power of DER at node i at time t, respectively; is the capacity of BES at node i at time t; is the charging/discharging efficiency; and are the charging and discharging power of BES at node i at time t, respectively, and and are their Boolean variables; and are the upper and lower bounds of the capacity of BES at node i at time t, respectively; and and are the maximum charging and discharging power of BES at node i, respectively.
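The storage constraints can be illustrated with a minimal sketch of a one-step energy update. The update rule used here (charging scaled by the efficiency, discharging divided by it), the mutual exclusion of charging and discharging implied by the Boolean variables, and the names `bes_step` and `bes_feasible` are assumptions for illustration rather than a transcription of (12)-(15).

```python
def bes_step(energy, p_charge, p_discharge, eta, dt=1.0):
    """Stored energy after one time step of length dt (hours).

    Assumed model: charging adds eta * p_charge * dt, discharging removes
    p_discharge * dt / eta. Simultaneous charging and discharging is rejected.
    """
    if p_charge > 0 and p_discharge > 0:
        raise ValueError("BES cannot charge and discharge at the same time")
    return energy + (eta * p_charge - p_discharge / eta) * dt


def bes_feasible(energy_next, p_charge, p_discharge,
                 e_min, e_max, p_ch_max, p_dis_max):
    """Check capacity and power limits for a candidate action."""
    return (e_min <= energy_next <= e_max
            and 0 <= p_charge <= p_ch_max
            and 0 <= p_discharge <= p_dis_max)


# Hypothetical unit: 1.0 MWh stored, efficiency 0.9, 1-hour step.
e_after_charge = bes_step(1.0, p_charge=0.5, p_discharge=0.0, eta=0.9)
e_after_discharge = bes_step(1.0, p_charge=0.0, p_discharge=0.45, eta=0.9)
ok = bes_feasible(e_after_charge, 0.5, 0.0,
                  e_min=0.2, e_max=2.0, p_ch_max=0.5, p_dis_max=0.5)
```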
P2P energy trading entities need an internal model for maximizing revenue. Prosumers have increasing marginal costs of electricity generation when they act as producers and decreasing marginal benefits of electricity use when they act as consumers. Therefore, the behaviors of producers and consumers can be characterized by a quadratic function [
(16) |
(17) |
where is the total revenue of prosumer at node i at time t; is the function of power utility benefits of prosumer at node i at time t; , , and are the power utility parameters of prosumers, which are private information; and and are the marginal tariffs for active and reactive power at node i at time t, respectively.
In addition, the trading results need to satisfy the ADN security constraints as well as the market supply and demand balance constraints, which are shown as:
(18) |
(19) |
(20) |
(21) |
where is the active power of the prosumer’s own BES; and are the network active and reactive power losses from branches b to at time t caused by the P2P energy trading, respectively; is the amount of voltage amplitude change caused by the P2P energy trading; and are the upper and lower limits of active power regulation for prosumers, respectively; and and are the upper and lower limits of reactive power regulation for prosumers, respectively.
The ADN cannot access the specific power consumption information of prosumers for privacy protection and market fairness. Therefore, we decompose the original problem into multiple subproblems, thus facilitating the subsequent solution using a distributed approach.
The changes in active and reactive power for each prosumer impact the network losses and nodal voltages. Consequently, we incorporate all constraints into the electricity efficiency function for prosumer and differentiate it to determine the marginal tariffs for active and reactive power [
(22) |
(23) |
where and are the dual variables corresponding to the upper and lower voltage constraints at node i at time t, respectively; and are the dual variables corresponding to the active and reactive power balance constraints at node i at time t, respectively; and are the dual variables corresponding to the upper and lower active power constraints at node i at time t, respectively; and and are the dual variables corresponding to the upper and lower reactive power constraints at node i at time t, respectively.
If ADN dispatching and P2P energy trading are carried out without considering their impact on the system, the resulting trading and control decisions may violate the system operation constraints, ultimately leading to device failure or system instability. Therefore, we propose the SAC-DTC algorithm to coordinate the optimization process between the ADN and the P2P market and to reach the global optimum within a solution space that keeps voltage levels within safe limits. The objective is to minimize the ADN operation cost (including the regulation costs of devices and the costs of network losses) and maximize the P2P market revenue, while ensuring the safe operation of the system.
The proposed SAC-DTC algorithm combines a DRL algorithm with distributed control computation. The structure of the proposed SAC-DTC algorithm is shown in

Fig. 2 Structure of proposed SAC-DTC algorithm.
The optimization process of ADN and P2P market can be modeled using the MDP, as shown in

Fig. 3 Optimization process of ADN and P2P market using MDP.
First, the agent gives the optimal action of each device in the ADN based on the local state . Then, it calculates the network loss and node voltage in the ADN and issues the information to the P2P market. Subsequently, the prosumers adjust the output according to their interests and return the profits to ADN after differential privacy encryption processing. Finally, ADN calculates the reward value R based on (1) and (16), and then puts the data into the experience buffer pool to update the network parameters.
(24) |
The MDP consists of five key elements: state space s, action space a, state transfer probability , reward function , and discount factor , represented by .
For the reinforcement learning in continuous-discrete hybrid action space, assuming that there are n discrete devices, each with mn actions, the output action dimension of the state-action value function Q will be . The action dimension will grow exponentially as the number of devices n increases. If a separate Q value is estimated for each possible combination of actions, the data required to be calculated and stored will grow rapidly and fall into a curse of dimensionality. Therefore, inspired by [
(25) |
where is the Q value of device i; and and are the shared base value and the state parameters of device i, respectively.
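The growth of the joint action space, and the saving obtained from per-device Q values, can be made concrete with a small sketch. The additive form used below (a shared base value plus a device-specific term per action) is one assumed reading of the decomposition, and the device counts and numbers are hypothetical.

```python
from math import prod


def joint_action_count(actions_per_device):
    """Q outputs needed if every combination of discrete actions is enumerated."""
    return prod(actions_per_device)


def branched_output_count(actions_per_device):
    """Q outputs needed if each device has its own branch of Q values."""
    return sum(actions_per_device)


def branched_q_values(base_value, device_terms):
    """Per-device Q values: shared base value plus a device-specific term."""
    return [[base_value + t for t in terms] for terms in device_terms]


def greedy_joint_action(q_per_device):
    """Pick the best action index independently for each device."""
    return [max(range(len(q)), key=q.__getitem__) for q in q_per_device]


# Hypothetical setup: two CBs with 5 positions each and one OLTC with 11 taps.
devices = [5, 5, 11]
n_joint = joint_action_count(devices)        # grows multiplicatively
n_branched = branched_output_count(devices)  # grows additively

q = branched_q_values(1.0, [[0.1, 0.3], [0.5, -0.2, 0.0]])
best = greedy_joint_action(q)
```

With the branched form, adding a device increases the number of outputs by that device's action count instead of multiplying the total, at the price of assuming that the per-device values can be maximized independently.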
During the training process, the formula for calculating the network target value is:
(26) |
where is the new state; is the temperature parameter used to control the contribution of entropy in the policy update; and are the continuous and discrete actions of the new state, respectively; and are the state-action value functions; and and are the strategy functions.
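A minimal sketch of an entropy-regularized target value is given below. It follows the standard SAC form (minimum of two target critics minus the temperature times the log-probability of the next action); treating the log-probability of the hybrid action as the sum of its continuous and discrete parts is an assumption, and the function name and numbers are illustrative only.

```python
def soft_target(reward, gamma, alpha, q1_next, q2_next,
                log_prob_cont, log_prob_disc, done=False):
    """Standard SAC-style target: r + gamma * (min Q' - alpha * log pi').

    log_prob_cont and log_prob_disc are the log-probabilities of the sampled
    continuous and discrete next actions; their sum is used as the joint
    log-probability (an assumption about how the hybrid policy factorizes).
    """
    if done:
        return reward
    log_prob_next = log_prob_cont + log_prob_disc
    soft_value = min(q1_next, q2_next) - alpha * log_prob_next
    return reward + gamma * soft_value


# Illustrative numbers; gamma = 0.99 matches the discount factor in Table II.
y = soft_target(reward=1.0, gamma=0.99, alpha=0.2,
                q1_next=10.0, q2_next=9.5,
                log_prob_cont=-0.6, log_prob_disc=-0.4)
```

Taking the minimum of the two critics counteracts overestimation, while the entropy term raises the target for next actions that the policy still samples with low probability.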
The parameters of the critic network are updated by minimizing the mean square error between the predicted Q value of the critic network and the target value y. Then, the parameters of actor network are updated by minimizing the loss function :
(27) |
(28) |
where represents the probability of taking action given state under the policy parameterized by .
Finally, the training network is slowly tracked by a soft update method:
(29) |
where or is the training network parameter; is the target network parameter; and is the soft update rate.
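The soft update can be sketched as a convex combination of the two parameter sets. The flat list representation of the parameters is a simplification for illustration; the update rate of 0.005 is the target network update rate listed in the hyperparameter tables.

```python
def soft_update(target_params, online_params, tau=0.005):
    """Move each target parameter a fraction tau toward the training network."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]


# One update step from an illustrative starting point.
new_target = soft_update([0.0, 1.0], [1.0, 1.0])

# Repeated updates let the target network track the training network slowly.
tracked = [0.0]
for _ in range(2000):
    tracked = soft_update(tracked, [1.0])
```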
For the prosumers at each node, adjusting the active and reactive power during the energy trading process will bring changes to their benefits or costs as well as the node voltage and network loss. Therefore, in this paper, based on the dual ascent method of sensitivity calculation [
(30) |
(31) |
(32) |
(33) |
(34) |
(35) |
where is the state change matrix function; , , and are the linear mapping functions for the node voltage, active power loss of ADN, and reactive power loss of ADN, respectively; and and are the vectors of active power and reactive power adjustments during energy trading for the prosumers, respectively.
The mapping function can be fitted based on a neural network, but this requires a separate neural network for each variable and constraint, which will also fall into the curse of dimensionality. Therefore, in this paper, we utilize the sensitivity matrix as an equivalent alternative to the mapping function and validate the accuracy of the solution. The original problem (16)-(23) in the P2P market is transformed into a quadratic programming problem as:
(36) |
(37) |
where the matrix parameters , , , and are extracted from the objective function for prosumers shown in (17); and are the upper and lower matrices of node voltages, respectively; and are the matrices of upper and lower active power for prosumers, respectively; and and are the matrices of upper and lower reactive power for prosumers, respectively.
The dual function is:
(38) |
The infimum of this problem is attained at . By disregarding the constant term and changing the sign of the objective function, the maximization problem is transformed into a minimization problem to obtain the dual problem as:
(39) |
where is the vector of Lagrangian multipliers associated with the constraints.
When the original problem is convex, we can find the gradient of A for its dual problem and obtain:
(40) |
(41) |
For any two points in the dual function d, the value of function d is at least the linear approximation minus a quadratic term, which depends on the distance between the two points and Lipschitz constant. Lipschitz constant should be the largest eigenvalue of . This is because in the quadratic functions, the largest eigenvalue of the matrix determines the maximum curvature. According to [
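The role of the Lipschitz constant as a step-size bound can be illustrated with a toy accelerated dual ascent on a price signal. This is a simplified stand-in, not the generalized method used in the paper: each prosumer has an assumed quadratic cost `a*p**2 + b*p` with box limits, the only coupling constraint is the market balance (net injections sum to zero), and with a single constraint the largest eigenvalue reduces to the scalar `sum(1/(2*a_i))`. All names and numbers are hypothetical.

```python
def best_response(price, a, b, lo, hi):
    """Prosumer's optimal net injection for a given price (a, b stay private)."""
    p = (price - b) / (2.0 * a)
    return min(max(p, lo), hi)


def fast_dual_ascent(a, b, lo, hi, iters=200, tol=1e-9):
    """Accelerated dual ascent on the balance price with step 1/L."""
    lipschitz = sum(1.0 / (2.0 * ai) for ai in a)  # curvature bound of the dual
    step = 1.0 / lipschitz
    price = price_prev = 0.0
    for k in range(1, iters + 1):
        momentum = (k - 1) / (k + 2)               # Nesterov-style extrapolation
        y = price + momentum * (price - price_prev)
        p = [best_response(y, ai, bi, l, h)
             for ai, bi, l, h in zip(a, b, lo, hi)]
        imbalance = sum(p)                         # oversupply lowers the price
        price_prev, price = price, y - step * imbalance
        if abs(imbalance) < tol:
            break
    p = [best_response(price, ai, bi, l, h)
         for ai, bi, l, h in zip(a, b, lo, hi)]
    return price, p


# Three hypothetical prosumers, each limited to [-2, 2].
price, injections = fast_dual_ascent(a=[0.5, 0.5, 1.0], b=[1.0, 3.0, 2.0],
                                     lo=[-2.0] * 3, hi=[2.0] * 3)
```

Only the price and the aggregate imbalance are exchanged in this sketch; the cost parameters of each prosumer never leave its own best-response computation, which is the property that makes a dual method attractive for privacy.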
After solving the dual problem (40), the optimal power for each prosumer is obtained, which is then substituted into (16) to obtain the maximum welfare for each prosumer . To protect the privacy of prosumers, a differential privacy technique is used. This involves adding random noise to the data through Laplace-distributed sampling , as expressed in (42). The perturbed welfare is then returned to the agent for learning as part of the reward.
(42) |
where is the sampling sensitivity, representing the maximum variation that may experience; and is the privacy strength parameter, whose value is smaller for stronger privacy protection.
Since the noise is random and its mathematical expectation is 0, the effects of the noise are canceled when aggregating large amounts of data. This ensures that the statistical estimation of total P2P market revenues remains accurate.
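A minimal sketch of the Laplace mechanism is shown below, assuming the usual scale of sensitivity divided by the privacy parameter. The Laplace sample is drawn as the difference of two exponential samples, which is a standard identity; the function names, the seed, and the welfare values are illustrative.

```python
import random


def laplace_noise(sensitivity, epsilon, rng):
    """Zero-mean Laplace sample with scale sensitivity / epsilon.

    The difference of two independent exponential samples with the same
    rate is Laplace distributed with the corresponding scale.
    """
    scale = sensitivity / epsilon
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(welfares, sensitivity, epsilon, rng):
    """Add independent Laplace noise to each reported welfare value."""
    return [w + laplace_noise(sensitivity, epsilon, rng) for w in welfares]


rng = random.Random(0)
true_welfare = [100.0] * 20000            # illustrative reports
noisy = privatize(true_welfare, sensitivity=1.0, epsilon=0.5, rng=rng)

mean_error = sum(noisy) / len(noisy) - 100.0
max_single_error = max(abs(v - 100.0) for v in noisy)
```

A smaller privacy parameter widens the noise on each individual report, while the average over many reports stays close to the true value because the noise has zero mean.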
The detailed calculation procedure of the proposed SAC-DTC algorithm is explained in
Algorithm 1 : detailed calculation procedure of proposed SAC-DTC algorithm |
---|
S1:Initialize , , , , , time step hour, and the maximum time step hours |
S2: Repeat |
S3: for do |
S4: ~ |
S5: Calculate power flow |
S6: Release , , , , and locational marginal price (LMP) to prosumers |
S7: Solve (16) for each prosumer |
S8: Update and LMP |
S9: Update , and , and store in |
S10: end for |
S11: Update and using (25)-(27) |
S12: Update using (28) |
S13: end |
This paper evaluates the proposed SAC-DTC algorithm using the IEEE 33-node system. We assume that five prosumers participate in the P2P energy trading, and the basic parameters of the utility function can be found in [
Three operation models are set up to compare the effectiveness in reducing the ADN operation cost and improving the P2P market revenue.
Model 1: without considering voltage constraints, the ADN operation cost is minimized as the objective function for optimization, and the P2P market is optimized with the objective function of maximizing the operation revenue.
Model 2: based on Model 1, the system voltage constraints are further considered, and the P2P market is optimized for operation based on the method in [
Model 3: as illustrated in Section III, the voltage constraints are considered and the total social welfare, which combines the P2P market revenue and the ADN operation cost, is taken as the objective function. The joint optimization is run using the proposed SAC-DTC algorithm.
There have been several studies applying DRL algorithms to the power system domain. In this subsection, we focus on comparing the SAC algorithm with the widely-used DDPG and PPO algorithms. All the three DRL algorithms utilize an actor-critic architecture. The DDPG algorithm employs a deterministic strategy network (actors) to directly predict actions and evaluates the expected returns of these actions through a value network (critics). In contrast, the PPO algorithm ensures the stability and convergence of policy updates by introducing a clip loss function that limits the magnitude of these updates, while the SAC algorithm encourages broader exploration by increasing policy entropy. The hyperparameters are shown in Tables II-IV. The Ornstein-Uhlenbeck noises are provided in [
Hyperparameter | Value | Hyperparameter | Value |
---|---|---|---|
Architecture of actor and critic networks | [256, 256] | Activation function | ReLU |
Optimizer | Adam | Discount factor | 0.99 |
Actor learning rate | 1×1 | Maximum time step | 24 hours |
Critic learning rate | 5×1 | Time step | 1 hour |
Minibatch size | 64 | Evaluation frequency | 3 |

Fig. 4 Training performance using SAC-DTC, DDPG-DTC, and PPO-DTC algorithms in IEEE 33-node system. (a) Total reward. (b) ADN operation cost. (c) P2P market revenue.
Hyperparameter | SAC algorithm | DDPG algorithm |
---|---|---|
Target network update rate | 0.005 | 0.005 |
Replay buffer size | 5×1 | 5×1 |
Entropy coefficient | Auto | |
Noise type | | Ornstein-Uhlenbeck |
Hyperparameter | Value |
---|---|
Value function coefficient | 0.5 |
Generalized advantage estimation Lambda | 0.95 |
Clip ratio | 0.2 |
Number of epochs | 3 |
Gradient clipping | 0.1 |
The results of the three operation models are presented in
Model | ADN operation cost (CNY) | P2P market revenue (CNY) | Number of voltage violations | The maximum voltage difference (p.u.) |
---|---|---|---|---|
Model 1 | 1054 | 6491 | 188 | 0.12350 |
Model 2 | 1615 | 5561 | 0 | 0.07934 |
Model 3 | 1481 | 6283 | 0 | 0.07254 |

Fig. 5 Node voltage comparison of three operation models. (a) Model 1. (b) Model 2. (c) Model 3.

Fig. 6 Comparison of ADN operation costs and P2P market revenues. (a) ADN operation cost. (b) P2P market revenue.
From a system security perspective, during hours 8-20, Model 1 exhibits the largest voltage deviation, with several node voltages falling below the lower limit at various time points. During other periods, the system does not experience voltage violations. Models 2 and 3 operate safely throughout all periods because the voltage constraints are considered in the optimization process of the ADN and P2P markets. In Model 2, the optimization processes of the ADN and P2P markets operate independently, and the lower bound of system voltage is generally higher than that in Model 3, but the maximum voltage variation is greater.
Additionally,

Fig. 7 Comparison results of LMPs. (a) PLMP in Model 1. (b) QLMP in Model 1. (c) PLMP in Model 2. (d) QLMP in Model 2. (e) PLMP in Model 3. (f) QLMP in Model 3.
From Figs.
As shown in
In Model 3, based on the proposed SAC-DTC algorithm, the encrypted information can be shared between the ADN and the prosumers. The system security regulation cost can be effectively shared with the ADN and each prosumer. As can be observed in
Overall, the joint optimization of ADN and P2P markets can reduce the feeder voltage drop and avoid violating the voltage constraints. Meanwhile, the economic cost paid by the market members to ensure system security in Model 3 is much smaller than that in Model 2 and close to that in Model 1. For all members in the ADN, system security should be the primary concern. Therefore, this paper concludes that accepting a small economic cost in exchange for safer system operation is reasonable.
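The comparison among the three operation models can be reproduced arithmetically from the reported costs and revenues. In the sketch below, social welfare is taken as P2P market revenue minus ADN operation cost, which is an assumed reading of the reward definition; the input numbers are those of the results table, and the variable names are illustrative.

```python
results = {
    "Model 1": {"adn_cost": 1054, "p2p_revenue": 6491},
    "Model 2": {"adn_cost": 1615, "p2p_revenue": 5561},
    "Model 3": {"adn_cost": 1481, "p2p_revenue": 6283},
}


def social_welfare(entry):
    """Assumed definition: market revenue minus network operation cost (CNY)."""
    return entry["p2p_revenue"] - entry["adn_cost"]


def percent_change(new, old):
    """Relative change of new with respect to old, in percent."""
    return 100.0 * (new - old) / old


welfare = {name: social_welfare(entry) for name, entry in results.items()}

# Model 3 (joint optimization) relative to Model 2 (independent optimization).
cost_change = percent_change(results["Model 3"]["adn_cost"],
                             results["Model 2"]["adn_cost"])
revenue_change = percent_change(results["Model 3"]["p2p_revenue"],
                                results["Model 2"]["p2p_revenue"])
```

Under this definition, Model 3 recovers most of the welfare that Model 2 gives up to satisfy the voltage constraints, while Model 1 attains its higher welfare only by violating them.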
In order to verify the accuracy and scalability of the proposed SAC-DTC algorithm, its computational results are compared with those of the mixed-integer second-order cone programming (MISOCP) based centralized algorithm in IEEE 33-, 69-, and 136-node systems, with the specific settings shown in
System | Number of prosumers | Number of CBs | Number of SVGs | Number of DERs | Number of ESSs |
---|---|---|---|---|---|
IEEE 33-node | 5 | 2 | 2 | 2 | 1 |
IEEE 69-node | 28 | 4 | 5 | 2 | 1 |
IEEE 136-node | 40 | 6 | 8 | 8 | 2 |
Algorithm | System | ADN operation cost (CNY) | P2P market revenue (CNY) | Computation time (s) |
---|---|---|---|---|
MISOCP-based centralized algorithm | IEEE 33-node | 5752 | 24556 | 20.90 |
MISOCP-based centralized algorithm | IEEE 69-node | 26248 | 73784 | 143.00 |
MISOCP-based centralized algorithm | IEEE 136-node | 45380 | 207560 | 501.00 |
Proposed SAC-DTC algorithm | IEEE 33-node | 6072 | 24508 | 4.27 |
Proposed SAC-DTC algorithm | IEEE 69-node | 27108 | 73743 | 14.50 |
Proposed SAC-DTC algorithm | IEEE 136-node | 47937 | 207440 | 33.80 |
In the IEEE 33-, 69-, and 136-node systems, the ADN operation costs obtained by the proposed SAC-DTC algorithm are 3.3% to 5.6% higher than those obtained by the MISOCP-based centralized algorithm, while the P2P market revenues are almost the same (within 0.2%). Owing to its distributed computation, the proposed SAC-DTC algorithm can effectively protect private information, and its computation speed is 4.9, 9.8, and 14.8 times faster than that of the MISOCP-based centralized algorithm in the IEEE 33-, 69-, and 136-node systems, respectively.
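The cost gaps and speed-up ratios follow directly from the tabulated results, as the short calculation below shows. The dictionary layout and names are illustrative; the numbers are copied from the comparison table.

```python
# (centralized MISOCP, proposed SAC-DTC) per test system.
adn_cost = {"IEEE 33-node": (5752, 6072),
            "IEEE 69-node": (26248, 27108),
            "IEEE 136-node": (45380, 47937)}
time_s = {"IEEE 33-node": (20.90, 4.27),
          "IEEE 69-node": (143.00, 14.50),
          "IEEE 136-node": (501.00, 33.80)}

# Extra ADN operation cost of the distributed result, in percent.
cost_gap = {name: 100.0 * (dist - cent) / cent
            for name, (cent, dist) in adn_cost.items()}

# Ratio of centralized to distributed computation time.
speedup = {name: cent / dist for name, (cent, dist) in time_s.items()}

for name in adn_cost:
    print(f"{name}: cost gap {cost_gap[name]:.2f}%, speed-up {speedup[name]:.2f}x")
```

The speed-up grows with system size because the centralized mixed-integer problem scales poorly, whereas the distributed trading step is solved per prosumer.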
In addition, the linearization of voltage mapping in the proposed SAC-DTC algorithm may introduce some errors in the final results. Therefore, we perform power flow calculations using the proposed SAC-DTC algorithm and MISOCP-based centralized algorithm, and compare the node voltages. As shown in
System | Maximum error in voltage magnitude (%) | Minimum error in voltage magnitude (%) | Average error in voltage magnitude (%) |
---|---|---|---|
IEEE 33-node | 0.249 | 1.10×1 | 0.0281 |
IEEE 69-node | 0.269 | 1.60×1 | 0.0528 |
IEEE 136-node | 0.258 | 1.51×1 | 0.0735 |

Fig. 8 Error in voltage magnitude. (a) IEEE 33-node system. (b) IEEE 69-node system. (c) IEEE 136-node system.
Therefore, the proposed SAC-DTC algorithm is more suitable for the fast-changing operation of ADN and P2P markets to meet the real-time demand.
In this paper, an SAC-DTC algorithm based on data-driven and physical modeling is proposed to tackle the coordinated optimization problem of ADN and P2P energy trading, which is analyzed via simulation based on the real-world dataset. The results show that the proposed SAC-DTC algorithm can effectively reduce the ADN operation cost and increase the P2P market revenue under the network security constraints. Specifically, the conclusions can be summarized as follows.
1) Compared with mainstream DDPG algorithms with the same network structure, the agents trained by the proposed SAC-DTC algorithm perform better in terms of the training speed and convergence results.
2) Considering the network security constraints, the proposed SAC-DTC algorithm for coordinated optimization can reduce the ADN operation cost by 8.3% and increase the P2P market revenue by 12.9% on average.
3) In the IEEE 33-, 69-, and 136-node systems, the proposed SAC-DTC algorithm effectively protects the privacy of prosumers although the ADN operation cost is slightly higher compared with the traditional MISOCP-based centralized algorithm. The computation speed is 4.9, 9.8, and 14.8 times faster, and the voltage magnitude error is no more than 0.08% on average.
Future work will investigate additional scenarios, including the integration of electrical, thermal, and cooling energy systems for consumers. Moreover, efforts will be made to deploy larger-scale networks utilizing multiple agents to manage complex coordination tasks involving both discrete and continuous actions. Additionally, there will be a focus on optimizing the linearization process to further enhance accuracy.
References
K. H. M. Azmi, N. A. M. Radzi, N. A. Azhar et al., “Active electric distribution network: applications, challenges, and opportunities,” IEEE Access, vol. 10, pp. 134655-134689, Dec. 2022. [Baidu Scholar]
Z. Yang, H. Li, and H. Zhang, “Dynamic collaborative pricing for managing refueling demand of hydrogen fuel cell vehicles,” IEEE Transactions on Transportation Electrification, vol. PP, no. 99, pp. 1-1, Mar. 2024. [Baidu Scholar]
S. Gorbachev, A. Mani, L. Li et al., “Distributed energy resources based two-layer delay-independent voltage coordinated control in active distribution network,” IEEE Transactions on Industrial Informatics, vol. 20, no. 2, pp. 1220-1230, Feb. 2024. [Baidu Scholar]
Z. Deng, M. Liu, H. Chen et al., “Optimal scheduling of active distribution networks with limited switching operations using mixed-integer dynamic optimization,” IEEE Transactions on Smart Grid, vol. 10, no. 4, pp. 4221-4234, Jul. 2019. [Baidu Scholar]
H. Zhu and H. Liu, “Fast local voltage control under limited reactive power: optimality and stability analysis,” IEEE Transactions on Power Systems, vol. 31, no. 5, pp. 3794-3803, Sept. 2016. [Baidu Scholar]
H. Liu and W. Wu, “Online multi-agent reinforcement learning for decentralized inverter-based volt-var control,” IEEE Transactions on Smart Grid, vol. 12, no. 4, pp. 2980-2990, Jul. 2021. [Baidu Scholar]
Q. Yang, G. Wang, A. Sadeghi et al., “Two-timescale voltage control in distribution grids using deep reinforcement learning,” IEEE Transactions on Smart Grid, vol. 11, no. 3, pp. 2313-2323, May 2020. [Baidu Scholar]
W. Shi, D. Zhang, X. Han et al., “Coordinated operation of active distribution network, networked microgrids, and electric vehicle: a multi-agent PPO optimization method,” CSEE Journal of Power and Energy Systems, doi: 10.17775/CSEEJPES.2022.05640 [Baidu Scholar]
M. Mansourlakouraj, M. Gautam, H. Livani et al., “Multi-stage volt/var support in distribution grids: risk-aware scheduling with real-time reinforcement learning control,” IEEE Access, vol. 11, pp. 54822-54838, May 2023. [Baidu Scholar]
A. R. Sayed, C. Wang, H. I. Anis et al., “Feasibility constrained online calculation for real-time optimal power flow: a convex constrained deep reinforcement learning approach,” IEEE Transactions on Power Systems, vol. 38, no. 6, pp. 5215-5227, Nov. 2023. [Baidu Scholar]
D. Cao, W. Hu, X. Xu et al., “Deep reinforcement learning based approach for optimal power flow of distribution networks embedded with renewable energy and storage devices,” Journal of Modern Power Systems and Clean Energy, vol. 9, no. 5, pp. 1101-1110, Sept. 2021. [Baidu Scholar]
H. Liu, W. Wu, and Y. Wang, “Bi-level off-policy reinforcement learning for two-timescale volt/var control in active distribution networks,” IEEE Transactions on Power Systems, vol. 38, no. 1, pp. 385-395, Jan. 2023.
K. Schmitt, R. Bhatta, M. Chamana et al., “A review on active customers participation in smart grids,” Journal of Modern Power Systems and Clean Energy, vol. 11, no. 1, pp. 3-16, Jan. 2023.
W. Tushar, T. K. Saha, C. Yuen et al., “Peer-to-peer trading in electricity networks: an overview,” IEEE Transactions on Smart Grid, vol. 11, no. 4, pp. 3185-3200, Jul. 2020.
W. Tushar, C. Yuen, T. K. Saha et al., “Peer-to-peer energy systems for connected communities: a review of recent advances and emerging challenges,” Applied Energy, vol. 282, p. 116131, Jan. 2021.
Y. Zou, Y. Xu, X. Feng et al., “Transactive energy systems in active distribution networks: a comprehensive review,” CSEE Journal of Power and Energy Systems, vol. 8, no. 5, pp. 1302-1317, Sept. 2022.
D. Han, L. Wu, X. Ren et al., “Calculation model and allocation strategy of network usage charge for peer-to-peer and community-based energy transaction market,” Journal of Modern Power Systems and Clean Energy, vol. 11, no. 1, pp. 144-155, Jan. 2023.
T. AlSkaif, J. L. Crespo-Vazquez, M. Sekuloski et al., “Blockchain-based fully peer-to-peer energy trading strategies for residential energy systems,” IEEE Transactions on Industrial Informatics, vol. 18, no. 1, pp. 231-241, Jan. 2022.
F. Luo, Z. Y. Dong, G. Liang et al., “A distributed electricity trading system in active distribution networks based on multi-agent coalition and blockchain,” IEEE Transactions on Power Systems, vol. 34, no. 5, pp. 4097-4108, Sept. 2019.
X. Yang, G. Wang, H. He et al., “Automated demand response framework in ELNs: decentralized scheduling and smart contract,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 50, no. 1, pp. 58-72, Jan. 2020.
J. Zheng, Z. Liang, Y. Li et al., “Multi-agent reinforcement learning with privacy preservation for continuous double auction-based P2P energy trading,” IEEE Transactions on Industrial Informatics, vol. 20, no. 4, pp. 6582-6590, Apr. 2024.
L. Chen, N. Liu, and J. Wang, “Peer-to-peer energy sharing in distribution networks with multiple sharing regions,” IEEE Transactions on Industrial Informatics, vol. 16, no. 11, pp. 6760-6771, Nov. 2020.
L. Wang, Y. Zhang, W. Song et al., “Stochastic cooperative bidding strategy for multiple microgrids with peer-to-peer energy trading,” IEEE Transactions on Industrial Informatics, vol. 18, no. 3, pp. 1447-1457, Mar. 2022.
J. Li, C. Zhang, Z. Xu et al., “Distributed transactive energy trading framework in distribution networks,” IEEE Transactions on Power Systems, vol. 33, no. 6, pp. 7215-7227, Nov. 2018.
W. Tushar, B. Chai, C. Yuen et al., “Energy storage sharing in smart grid: a modified auction-based approach,” IEEE Transactions on Smart Grid, vol. 7, no. 3, pp. 1462-1475, May 2016.
W. Lee, L. Xiang, R. Schober et al., “Direct electricity trading in smart grid: a coalitional game analysis,” IEEE Journal on Selected Areas in Communications, vol. 32, no. 7, pp. 1398-1411, Jul. 2014.
N. Liu, X. Yu, C. Wang et al., “Energy sharing management for microgrids with PV prosumers: a Stackelberg game approach,” IEEE Transactions on Industrial Informatics, vol. 13, no. 3, pp. 1088-1098, Jun. 2017.
Y. Liu, C. Sun, A. Paudel et al., “Fully decentralized P2P energy trading in active distribution networks with voltage regulation,” IEEE Transactions on Smart Grid, vol. 14, no. 2, pp. 1466-1481, Mar. 2023.
Y. Jia, C. Wan, and B. Li, “Strategic peer-to-peer energy trading framework considering distribution network constraints,” Journal of Modern Power Systems and Clean Energy, vol. 11, no. 3, pp. 770-780, May 2023.
Y. Zhou, B. Zhang, C. Xu et al., “A data-driven method for fast AC optimal power flow solutions via deep reinforcement learning,” Journal of Modern Power Systems and Clean Energy, vol. 8, no. 6, pp. 1128-1139, Nov. 2020.
D. Cao, J. Zhao, W. Hu et al., “Data-driven multi-agent deep reinforcement learning for distribution system decentralized voltage control with high penetration of PVs,” IEEE Transactions on Smart Grid, vol. 12, no. 5, pp. 4137-4150, Sept. 2021.
P. Giselsson, “Improved dual decomposition for distributed model predictive control,” IFAC Proceedings Volumes, vol. 47, no. 3, pp. 1203-1209, Oct. 2014.
C. Feng, B. Liang, Z. Li et al., “Peer-to-peer energy trading under network constraints based on generalized fast dual ascent,” IEEE Transactions on Smart Grid, vol. 14, no. 2, pp. 1441-1453, Mar. 2023.
A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183-202, Jan. 2009.
Y. Zhang and Z. Ren, “Optimal reactive power dispatch considering costs of adjusting the control devices,” IEEE Transactions on Power Systems, vol. 20, no. 3, pp. 1349-1356, Aug. 2005.
Z. Li, L. Wu, and Y. Xu, “Risk-averse coordinated operation of a multi-energy microgrid considering voltage/var control and thermal flow: an adaptive stochastic approach,” IEEE Transactions on Smart Grid, vol. 12, no. 5, pp. 3914-3927, Sept. 2021.
X. Chang, Y. Xu, H. Sun et al., “Privacy-preserving distributed energy transaction in active distribution networks,” IEEE Transactions on Power Systems, vol. 38, no. 4, pp. 3413-3426, Jul. 2023.
P. Sunehag, G. Lever, A. Gruslys et al. (2017, Jun.). Value-decomposition networks for cooperative multi-agent learning. [Online]. Available: https://arxiv.org/abs/1706.05296
Z. Zhang, C. Dou, D. Yue et al., “Regional coordinated voltage regulation in active distribution networks with PV-BESS,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 70, no. 2, pp. 596-600, Feb. 2023.
Y. Zhang, Y. Han, D. Liu et al., “Low-carbon economic dispatch of electricity-heat-gas integrated energy systems based on deep reinforcement learning,” Journal of Modern Power Systems and Clean Energy, vol. 11, no. 6, pp. 1827-1841, Nov. 2023.
R. S. Sutton and A. G. Barto. (2024, Apr.). Reinforcement learning: an introduction. [Online]. Available: https://books.google.com/books?hl=en&lr=&id=uWV0DwAAQBAJ&oi=fnd&pg=PR7&dq=info:t8N5xiW9 bXoJ:scholar.google.com&ots=mjoHs_Z0k1&sig=CKvWTrZ0FoBPRCmO4-Yoo4uv5z0