1 Introduction

China is one of the world’s pioneering countries in promoting the development and acceptance of electric vehicles (EVs), owing to the unprecedented challenges of severe air pollution and large CO2 emissions countrywide. In 2013, large-scale smog afflicted more than 100 large cities spread across 30 provinces. The dramatic growth in the number of motor vehicles over the past decade partly contributes to the heavy haze in many metropolitan areas [1].

Both pure battery EVs and plug-in hybrid EVs are very promising technical alternatives to fossil fuel powered vehicles, and they play an important role in decarbonizing the transportation sector and in significantly reducing exhaust gas emissions in urban areas. However, forming an EV-oriented ecosystem for future transportation is a totally new and complicated decision process, necessitating a large amount of trial and error. Governments are putting forward incentive policies to support the development of EVs, yet these are based on very limited information about the future behavior of customers, and are further complicated by the uncertainty about the ultimate shape of the EV systems. The large difference between the government’s plans and the reality is discussed in recent reports, and the goal of having 0.5 million EVs sold and in use by 2015 is facing a big challenge [2]. In this highly connected world, the future EV systems will no doubt have significant impacts on the future power systems as well [3].

The modelling and analysis of human behaviors is still one of the most challenging issues. Recent economic research suggests that actual human decisions are based on very limited local information, hence the optimal solutions primarily exist in very confined spaces of limited dimensions. Individual preferences and mental rules dictate the decision boundary, and thus the decision results. Forming such decision boundaries therefore becomes very important.

Many local city councils have put in place incentive policies to promote the development of the EV industry. To avoid blindness in this process, it is imperative to quantitatively assess customers’ responses to these policies beforehand, and to adequately handle any challenges that may arise when a huge number of EVs are integrated into the power systems [3]. This requires the collection of a massive amount of data, which are then used for modelling and analysis in a bid to comprehend the whole picture, including how the EV-involved systems are operated, how the policies can be implemented and what their consequences are, what the customers’ preferences in purchasing and using EVs are, and how the EVs’ prospects and the participants’ willingness are interrelated.

By adopting the simulation-based optimization and decision-making concepts of natural science research, experimental economics (EE) methods [4, 5] present themselves as an attractive methodology for the aforementioned analysis work. A questionnaire based survey [6, 7] is a simple yet effective EE method to investigate the subjective preferences of human participants. It is popular in marketing, social science and economics studies, and is suitable for large sample analysis.

Data for existing ecosystems, such as fossil fuel-powered vehicles, can be collected from various sources, like transaction records, videos, GPS trackers, ultrasonic detection and mobile phone location data, and these data can be used to analyze and understand purchasing and driving behaviors [8–11]. However, the data for ecosystems that do not yet exist, such as those for EVs, can only be obtained through human participants, and questionnaire based surveys are one option.

Researchers often need to extract and utilize as much knowledge as possible from the survey data. However, the information that can be extracted from surveys is often limited, and there is no direct path from survey based knowledge deduction to simulation based analysis, even though only model-based simulations or calculations can support optimization in a complicated system. On the other hand, it is extremely costly to train and deploy a large number of human participants for a substantial period of time in an EE-based simulation environment. Further, the consistency of participants’ behaviors across different simulation trials of the same scenario can hardly be guaranteed. Combining well designed multi-agents, which represent the majority of people, with human participants acting as the minority in an EE-based simulation is a way to handle the difficulties arising from both sides. In this paper, a software platform has been developed to carry out such simulations, with the primary aim of serving as an effective quantitative analysis tool for social scientific problems [12–14].

In order to implement the concept, it is essential to ensure that the statistical results of the multi-agents conform to those of the human participants. Further, difficulties with regard to data mining, information extraction and multi-agent modeling have to be overcome in order to achieve these goals.

Current research on the agent method mainly focuses on the development of the logic inference ability of agents [15, 16], by utilizing the decision optimization theories or expert-knowledge systems. However, the behaviors of “well-developed” multi-agents are often statistically far different from those of the experimenters.

In our previous work [17], multi-layer correlation information among different factors was extracted from questionnaires on the willingness of customers to buy EVs. A stochastic multi-agent model was then created and validated by fitting it to the probabilistic distributions that describe the full-dimensional information related to human behaviors.

In this paper, the previous work in [17] is first revisited, and then sensitivity analyses are conducted by using the multi-agent model to investigate the impact of a certain factor on the purchasing ratio. Finally, the influences of customers’ preferences on the purchasing ratio are discussed.

2 Complete behavioral information extracted from key dimensions

2.1 Initial choice of key factors

The factors influencing EV purchasers are almost countless, as is often the case when a decision is made in the real world. Therefore, the key dimensions (factors) of the whole decision space should be identified.

Based on worldwide investigations of customers’ willingness to purchase EVs, Deloitte’s Global Manufacturing Industry Group published a report [18] indicating that the top five factors considered by an EV purchaser are the maximum range, the minimum charging time, the price difference compared with fossil fuel powered vehicles of the same class, the purchase price and the fuel price. The report presents the statistical result for each factor, indicating the psychological threshold at which a given percentage of people would be willing to buy EVs, but it does not reveal the correlations among the factors. Owing to this lack of information, the joint distribution of the five factors is unattainable, although it is indispensable for a complete understanding of individuals’ purchasing willingness. The reason is that people only consider buying an EV if their minimum requirements on all the factors are met: satisfying any single criterion is a necessary but not a sufficient condition. Besides, the report does not include the factor of charging conditions, which is regarded as a vital reason for the unpopularity of EVs [19]. The charging problems and their influences will be studied in our future work.

In order to acquire such correlations among the factors influencing the willingness to purchase EVs, the authors conducted a questionnaire based survey (as shown in Appendix A), mainly among young Chinese people aged between 25 and 35. Most respondents were college graduates who had worked in large cities of mainland China for 2-6 years. Participants were asked to state their minimum acceptable psychological threshold for each given factor when choosing an EV. The survey collected 200 valid questionnaires for developing a new method of modeling human behaviors, although the sample is perhaps not sufficiently random or representative. Nevertheless, Tables A1 and A2 in the Appendix compare the authors’ results with Deloitte’s results for China, and show very similar distributions.

2.2 Importance ranking of factors

Even if only the key factors are selected and surveyed in a limited number of questions, a complete comprehension of people’s preferences on these key factors still proves to be difficult, especially when the number of questionnaires is limited. Although each question in the questionnaire relates to only one factor, the whole answer sheet still reveals a person’s deterministic view of the relations among all factors. Thus, the joint probability information of the psychological thresholds of all five factors must be embedded in the complete set of questionnaires collected.

If a multi-agent model is built to replace a group of respondents, the threshold value of each factor for each agent must be tuned according to the high-dimensional joint probability distribution that reflects the purchasing willingness of the survey respondents. However, 200 samples are far from enough to build an accurate joint probability map of the five factors by any conventional method. To overcome this difficulty, one more question is included in the questionnaire, which asks each respondent to sort the factors by importance. The proportions of respondents choosing each factor as the most important one are, in descending order: purchase price (37.0%), range (36.5%), charge time (11.5%), price difference (10.5%), and fuel price (4.5%). The proportions of these factors being selected as the second most important one are: purchase price (12.0%), range (28.5%), charge time (37.5%), price difference (12.5%), and fuel price (9.5%), as shown in Appendix B. However, these marginal proportions alone do not give the probability that a respondent takes the range as the second most important factor given that the purchase price is the most important one. This question is answered neither by this table nor by the results shown in [18]. The actual conditional probability is 16.4%, which differs from the marginal probability of 28.5%. In order to reproduce the joint probabilistic distribution of such a group of factors after bulk sampling, the agents have to be sampled from the joint probabilistic distribution.
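The gap between a marginal and a conditional proportion can be illustrated with a minimal sketch. The rankings below are made-up toy data, not the survey results, and the function names are hypothetical.

```python
# Toy importance rankings (most important factor first).
# Illustrative data only; these are NOT the 200 survey answers.
rankings = [
    ("purchase price", "range", "charge time"),
    ("purchase price", "charge time", "range"),
    ("purchase price", "charge time", "range"),
    ("purchase price", "charge time", "range"),
    ("range", "purchase price", "charge time"),
    ("range", "charge time", "purchase price"),
    ("charge time", "range", "purchase price"),
    ("charge time", "range", "purchase price"),
]

def marginal_second(rankings, factor):
    """P(factor is ranked second), ignoring which factor is ranked first."""
    return sum(r[1] == factor for r in rankings) / len(rankings)

def conditional_second(rankings, factor, first):
    """P(factor is ranked second | `first` is ranked first)."""
    subset = [r for r in rankings if r[0] == first]
    return sum(r[1] == factor for r in subset) / len(subset)

p_marginal = marginal_second(rankings, "range")
p_conditional = conditional_second(rankings, "range", "purchase price")
print(p_marginal, p_conditional)
```

In this toy set the two values differ, which is the same effect as the 28.5% versus 16.4% reported above: sampling each layer from its marginal distribution would not reproduce the joint behavior of the respondents.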

In order to store the joint probabilistic data effectively, a data structure with multiple layers representing conditional probabilities is adopted. The concept of a “sorted importance layer” is defined because the ordering of factors in the sampling process influences how the information in the questionnaires is utilized. In the data structure, layer i holds the frequency distribution of the i-th most important factor over all n − i + 1 possibilities, together with the joint frequency distribution between layer i and the n − i possibilities in layer i + 1, where n is the total number of factors. The latter distribution depends not only on the sampling in the current layer, but also on previous sampling results. The sampling order follows the importance ranking of the factors, rather than the reverse. The joint probability of a layer with smaller i implies larger entropy, and therefore requires a sufficiently large number of samples to obtain reliable statistics. The entropy of layers with larger i is relatively smaller, hence approximate statistical results can be used.
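The layered structure can be read as the usual chain-rule factorization of a joint distribution. As a sketch, writing \( s_i \) for the factor placed at importance layer i (a symbol introduced here only for illustration):

```latex
\[
P(s_1, s_2, \dots, s_n) \;=\; P(s_1)\,\prod_{i=2}^{n} P\!\left(s_i \mid s_1, \dots, s_{i-1}\right)
\]
```

Each conditional term on the right is estimated from the respondents who share the same choices in the preceding layers, so the number of samples available to a term shrinks as i grows.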

Given the above discussion, Fig. 1 illustrates a tree-shaped correlation data structure among adjacent layers for sorting factor importance, reflecting the correlations of frequency distributions. Since there is no uncertainty associated with the final layer, the “tree” structure has 4 layers to represent the correlations among the five factors. Here, \( a_k \) (k = 1, 2, 3, 4, 5) represents the five features respectively, namely range, charge time, price difference, purchase price and fuel price.

Fig. 1 Multi-layer frequency distributions

2.3 Rules for data reconciliation for insufficient number of samples

The joint frequencies on the correlation “tree” are counted layer by layer from the top to the bottom. However, insufficient samples often occur at the bottom layers of the “tree”. If the frequency is too low for a certain node, the information provided to the next layers might be meaningless, and approximate distributions should therefore be used to replace the original ones in order to compensate for the missing information. The rules designed to control the replacement are as follows:

Rule 1: If the number of samples in the selected layer is sufficient, the frequency counting is strictly conducted on the corresponding data collected from the questionnaires.

Rule 2: If the number of samples in the selected layer is insufficient, i.e. the number is less than a threshold value α (α is set to 8 in Fig. 1), the correlation between factors is ignored and the independent distribution of the corresponding factor is used directly.

Rule 2 is adopted only in cases of very low frequencies, so its influence on the accuracy is limited, which is confirmed through a number of simulation studies.
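Rules 1 and 2 can be sketched as a single fallback when counting the next layer of the importance “tree”. This is a minimal illustration under stated assumptions: the function name, the tuple layout of a ranking, and the choice to restrict the fallback distribution to factors not yet used are all assumptions of the sketch, and the rankings are toy data.

```python
from collections import Counter

ALPHA = 8  # sample-count threshold of Rule 2 (the value used in Fig. 1)

def next_layer_distribution(rankings, prefix, alpha=ALPHA):
    """Distribution of the factor ranked immediately after `prefix`.

    Rule 1: if at least `alpha` respondents share `prefix`, count the
            conditional frequencies among those respondents only.
    Rule 2: otherwise ignore the correlation and use the independent
            distribution of that rank over all respondents (restricted to
            factors not already in `prefix`, an assumption of this sketch).
    """
    depth = len(prefix)
    matching = [r for r in rankings if tuple(r[:depth]) == tuple(prefix)]
    source = matching if len(matching) >= alpha else rankings
    counts = Counter(r[depth] for r in source if r[depth] not in prefix)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no usable samples for this node")
    return {factor: c / total for factor, c in counts.items()}

# Toy rankings over three hypothetical factors A, B, C.
rankings = ([("A", "B", "C")] * 9 + [("A", "C", "B")] * 3
            + [("B", "C", "A")] * 1 + [("C", "A", "B")] * 3)

dist_after_a = next_layer_distribution(rankings, ("A",))  # 12 samples: Rule 1
dist_after_b = next_layer_distribution(rankings, ("B",))  # 1 sample: Rule 2
print(dist_after_a, dist_after_b)
```

With these toy data, the node “A first” has 12 samples and is counted directly, while the node “B first” has a single sample and falls back to the pooled second-rank distribution.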

2.4 Information extraction for different psychological thresholds of factors

Figure 2 shows the upper-triangular correlation matrix reflecting the joint probability between the 1st sorted importance layer and the 2nd layer (Table B2 in Appendix B shows the independent distribution of each single factor’s psychological threshold). Including repeated ones, the matrix contains n × n sub-matrices, where n is the number of factors. Here, the sub-matrix (i, j) records the frequency if factor j is drawn in the 2nd layer while factor i is selected in the 1st layer. For example, the sub-matrix (2, 4) records the frequency if the charge time is placed at the 1st importance layer and the purchase price is placed at the 2nd layer.

Fig. 2 Joint distribution of psychological thresholds of factors in a two layer structure

The dimension of each sub-matrix is decided by the number of the thresholds of the corresponding factor, which is equal to 5 in Fig. 2. Every row represents a threshold value \( d_{1.g_{1}} \) of the 1st important factor, while every column represents a certain threshold value \( d_{2.g_{2}} \) of the 2nd important factor. The value of a sub-matrix’s element records the joint frequency with which thresholds \( d_{1.g_{1}} \) and \( d_{2.g_{2}} \) are both selected by the participants of the questionnaire based survey. Figure 3 is the flow chart to compute the complete joint frequency \( d_{1.g_{1}} d_{2.g_{2}} d_{3.g_{3}} d_{4.g_{4}} d_{5.g_{5}} \) among all factors.

Fig. 3 Sorting algorithm based on psychological thresholds of factors

Since each factor offers several possible psychological thresholds, the samples are spread more thinly, and Rule 3 is introduced to fully utilize the information from the answer sheets and to reduce the approximation error.

Rule 3: If the number of samples is less than a threshold value β in a group where the participants choose exactly the given psychological threshold, then a new group is used for counting, in which participants choose a value equal to or larger than the given psychological threshold.
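Rule 3 amounts to widening the counting group when the exact-match group is too small. The following is a minimal sketch; the dictionary layout of a sample and the function name are hypothetical, and the data are toy values.

```python
BETA = 40  # sample-count threshold of Rule 3 (the value used for Fig. 7b)

def select_group(samples, factor, threshold, beta=BETA):
    """Return the respondents used for counting at a given threshold.

    `samples` is a list of dicts mapping factor name -> chosen threshold
    (a hypothetical layout). If fewer than `beta` respondents chose exactly
    `threshold`, the group is widened to everyone who chose a value equal
    to or larger than it.
    """
    exact = [s for s in samples if s[factor] == threshold]
    if len(exact) >= beta:
        return exact
    return [s for s in samples if s[factor] >= threshold]

# Toy data: minimum acceptable range in km for ten respondents.
samples = ([{"range_km": 80}] * 5 + [{"range_km": 160}] * 3
           + [{"range_km": 320}] * 2)

group_80 = select_group(samples, "range_km", 80, beta=4)    # exact group kept
group_160 = select_group(samples, "range_km", 160, beta=4)  # widened group
print(len(group_80), len(group_160))
```

In the toy run, the 160 km group has only three exact matches, so it is widened to include the two respondents who chose 320 km.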

3 Multi-agent modeling with full set of behaviors

The first half of Fig. 4 shows the algorithm to build multi-agents reflecting customers’ willingness to buy EVs based on the multi-dimensional information embedded in the collected questionnaires. The key part is the extraction of joint probabilistic distributions from the sorted importance data of different factors, and of the distribution of psychological thresholds of these factors. The second half of Fig. 4 uses these distributions to generate as many individual agents as needed.

Fig. 4 Flow chart for generating individual agents
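The agent generation step can be sketched as layer-by-layer sampling: first an importance ordering, then one threshold per factor in that order. This is only an illustration under stated assumptions and not the flow chart of Fig. 4 itself; `layer_dist` and `threshold_dist` are hypothetical callables standing in for the distributions extracted from the questionnaires, and the toy distributions below are uniform placeholders.

```python
import random

def sample_from(dist, rng):
    """Draw one key from a {value: probability} dictionary."""
    values = list(dist)
    return rng.choices(values, weights=[dist[v] for v in values], k=1)[0]

def generate_agent(factors, layer_dist, threshold_dist, rng):
    """Generate one agent: an importance ordering plus a threshold per factor.

    layer_dist(prefix) -> {factor: prob} for the next most important factor.
    threshold_dist(factor, chosen) -> {threshold: prob}, given the thresholds
    already drawn for more important factors.
    """
    order = []
    while len(order) < len(factors) - 1:
        order.append(sample_from(layer_dist(tuple(order)), rng))
    # The last layer carries no uncertainty: only one factor remains.
    order += [f for f in factors if f not in order]
    thresholds = {}
    for factor in order:
        dist = threshold_dist(factor, dict(thresholds))
        thresholds[factor] = sample_from(dist, rng)
    return {"order": order, "thresholds": thresholds}

# Toy stand-ins for the extracted distributions (assumptions of this sketch).
FACTORS = ["range", "charge time", "purchase price"]

def uniform_layer(prefix):
    remaining = [f for f in FACTORS if f not in prefix]
    return {f: 1.0 / len(remaining) for f in remaining}

def toy_thresholds(factor, chosen):
    return {"low": 0.5, "high": 0.5}

rng = random.Random(0)
agents = [generate_agent(FACTORS, uniform_layer, toy_thresholds, rng)
          for _ in range(1000)]
print(agents[0])
```

Passing the distributions as callables keeps the sampling loop independent of how Rules 1 to 3 are applied when the distributions are built.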

4 Verification of multi-agent simulation results

In the verification, different target EV types are tested with all factors being randomly selected, and the ratio of questionnaires in which all the thresholds are reached is used as the benchmark reflecting respondents’ willingness to buy EVs. Meanwhile, different sets of Monte-Carlo simulations, with 100,000 agents in each set, are generated using the above algorithm to acquire the ratio of potential buyers who are satisfied with this EV type (referred to below as the purchase ratio), and the errors relative to the benchmark results are recorded. The statistics of simulation errors from a large number of trials with different EV types confirm the effectiveness of the rules adopted above for extracting the joint distribution statistics and of the multi-agent models developed.
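The comparison between the benchmark and the simulated purchase ratio can be sketched as follows. The factor names, the direction of each threshold and all numbers are assumptions made for illustration; only three factors are used to keep the example short.

```python
# Direction of each threshold: +1 means the vehicle value must be at least the
# agent's threshold, -1 means it must be at most the threshold.
# Factor names, directions and values are assumptions of this sketch.
DIRECTION = {"range_km": +1, "charge_h": -1, "price_k": -1}

def satisfied(thresholds, vehicle):
    """True only if every factor of the vehicle meets the threshold."""
    return all(DIRECTION[f] * (vehicle[f] - thresholds[f]) >= 0
               for f in DIRECTION)

def purchase_ratio(population, vehicle):
    """Share of the population whose thresholds are all reached."""
    return sum(satisfied(p, vehicle) for p in population) / len(population)

vehicle = {"range_km": 200, "charge_h": 2, "price_k": 150}

# Toy "questionnaires" (benchmark) and toy "agents" (simulation).
respondents = [
    {"range_km": 160, "charge_h": 4, "price_k": 200},  # buys
    {"range_km": 320, "charge_h": 2, "price_k": 200},  # range too short
    {"range_km": 160, "charge_h": 1, "price_k": 200},  # charging too slow
    {"range_km": 80, "charge_h": 8, "price_k": 150},   # buys
]
agents = [
    {"range_km": 160, "charge_h": 4, "price_k": 200},  # buys
    {"range_km": 80, "charge_h": 2, "price_k": 150},   # buys
    {"range_km": 200, "charge_h": 8, "price_k": 300},  # buys
    {"range_km": 320, "charge_h": 4, "price_k": 200},  # range too short
    {"range_km": 160, "charge_h": 4, "price_k": 100},  # too expensive
]

benchmark = purchase_ratio(respondents, vehicle)
simulated = purchase_ratio(agents, vehicle)
relative_error = abs(simulated - benchmark) / benchmark
print(benchmark, simulated, relative_error)
```

In an actual verification run, `agents` would be the large Monte-Carlo population generated from the extracted distributions rather than a hand-written list.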

For a given subset size m, m answer sheets are randomly selected from the total for building the statistical multi-agent model. The purchase ratio is obtained from simulations with EV type 2 (as shown in Appendix C) as the target vehicle, and presented as a point in Fig. 5. Repeating the procedure with m from 80 to 200, in steps of 10, yields a dashed curve. The whole process is then repeated with new random selections to obtain the 10 dashed curves in the figure.

Fig. 5 Comparison between agent-based simulation results and data taken from questionnaires

The solid line with circles shows the benchmark, where the value with respect to m = 200 is the actual purchase ratio. As the value of m increases, all individual curves converge to the benchmark.

A curve marked by n in Fig. 6 is the average of n dashed curves in Fig. 5. Figure 6 shows that the curve with a large enough n, e.g. 10, closely matches the benchmark.

Fig. 6 Comparison between average agent-based simulation results and data taken from questionnaires

Figure 7 shows the influence of the rules on the simulation error. The horizontal axis represents the number of questionnaires used for extracting the distributions of willingness to buy EVs, and the vertical axis shows the relative error. Otherwise, the symbols and legends are the same as in Fig. 6. Figure 7a shows the results where the thresholds α and β, which control the switches to Rule 2 and Rule 3 respectively, are both set to 0, so that these two rules are never used. In this case, the relative error can be confined to a level below 10% only when the number of questionnaires is equal to or greater than 150. This indicates that the degree of information loss has a significant impact on the simulation precision.

Fig. 7 Impact of the number of questionnaires on the simulation error

In Fig. 7b, α and β are set to 8 and 40 respectively, and the relative error clearly drops. The comparison confirms that the algorithm proposed in this paper remains valid for a smaller number of questionnaires.

5 Simulation analysis of group behaviors

5.1 Hybrid EE-based simulation using both multi-agents with uncertain behaviors and human participants

In this paper, it has been shown that the multi-agent system should be modeled with as much deep information extracted from the questionnaires as possible, especially the joint probability information of important factors, in order to accurately reflect the statistical regularities of the respondents’ decisions. With the multi-agent system, a hybrid simulation study can be performed that includes a large number of individuals controlled either by human experimenters or by the probabilistic multi-agent model.

The following two experiments are our recent attempts along this research direction. The first experiment tests the agent model in different scenarios of vehicle types, in order to extract more information from the questionnaires and help EV producers to evaluate the popularity of their vehicle types; the second one extrapolates the possible future morphology of the EV market based on the spatio-temporal information taken from the current questionnaires by partially customizing agents’ preferences.

5.2 Influence of vehicle parameters on EV purchasers’ willingness

This simulation studies the correlation of vehicle parameters and the influence of different psychological thresholds of factors on the EV purchasers’ willingness. The 3 test scenarios are listed in Appendix C.

As shown in Fig. 8a, the ratio of willingness to buy EVs increases slightly when the range varies from 0 to 80 km, rises quickly when the range is within 80–320 km, and grows more slowly again when the range is greater than 320 km. Figure 8b shows a linear relation between the charge time and the ratio. The relations between the price difference and the ratio, and between the purchase price and the ratio, are exponential to different degrees, as shown in Fig. 8c, d respectively. The relation between the fuel price and the ratio has two linear parts divided by a discontinuity point, above which customers are more sensitive to variations of the parameter, as shown in Fig. 8e.

Fig. 8 Influence of EV parameters on willingness to buy EVs
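A one-factor-at-a-time sensitivity sweep of this kind can be sketched as below. The purchase rule, the agent fields and all values are toy assumptions chosen for illustration; they do not reproduce the scenarios of Appendix C or the curves of Fig. 8.

```python
def buys(agent, vehicle):
    """Toy rule: the range must reach the agent's minimum and the price
    must not exceed the agent's maximum (field names are hypothetical)."""
    return (vehicle["range_km"] >= agent["min_range_km"]
            and vehicle["price_k"] <= agent["max_price_k"])

def sweep_range(agents, base_vehicle, ranges_km):
    """Vary only the range of the target vehicle and record the purchase ratio."""
    curve = []
    for r in ranges_km:
        vehicle = dict(base_vehicle, range_km=r)  # other parameters held fixed
        ratio = sum(buys(a, vehicle) for a in agents) / len(agents)
        curve.append((r, ratio))
    return curve

# Toy agent population.
agents = [
    {"min_range_km": 80, "max_price_k": 200},
    {"min_range_km": 160, "max_price_k": 200},
    {"min_range_km": 160, "max_price_k": 100},
    {"min_range_km": 320, "max_price_k": 300},
]

curve = sweep_range(agents, {"range_km": 0, "price_k": 150}, [80, 160, 320])
print(curve)
```

Because every threshold must be met, the ratio saturates below 1 here: the third toy agent never buys at the fixed price, however large the range becomes.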

5.3 Influence of customers’ preference variation on purchase ratio

Customers’ preferences vary across places and over time [18]. The influence of such variations on the willingness to buy EVs can be studied by altering the probability distribution used in the multi-agent models, or even by turning it into a time-varying one. In Fig. 9, the preference for the range is studied, as customers are more sensitive to this parameter. Curve 0 is the result obtained from the original questionnaires. For Curve 1, the psychological thresholds of all the customers are set higher. Compared with Curve 0, Curve 1 shows the most significant drop of the purchase ratio at a range of 320 km (the median of the range’s possible variation). For Curve 2, all the customers’ psychological thresholds are distributed uniformly. The simulation result indicates that the market for low-end EVs improves, while the market for high-end EVs gets worse.

Fig. 9 Influence of preferences to the range on the ratio of willingness to buy EVs

6 Conclusions

The hybrid EE-based simulation techniques combining multi-agents and human participants strike a balance between two computational considerations: the strong subjective willingness of participants and the large number of simulated individuals. In this paper, a model is first developed to describe the uncertain psychological thresholds for different characteristic factors. For research on people’s willingness to buy EVs, the joint probability distributions can be extracted by this model from a limited number of relevant questionnaires collected from participants. A probabilistic multi-agent model is then constructed to reproduce the statistical distribution of respondents’ willingness in response to different characteristic factors. Case studies with various target EV types in this paper confirm the validity of the model. Further, by tuning the parameters of the probabilistic agent model, the influence of preference variations on the purchase ratio can be investigated. The method proposed in this paper helps to increase the simulation scale, maintain comparability among repeated trials, and reflect the effect of the online attendance of human participants. It further offers a modeling approach for human behaviors, and the statistical rules proposed can help to deal effectively with difficulties arising from an insufficient number of samples. In summary, the proposed method provides a powerful simulation platform for analyzing the influences of various factors on the development of the EV industry.