Journal of Systems Engineering and Electronics ›› 2022, Vol. 33 ›› Issue (6): 1159-1175. doi: 10.23919/JSEE.2022.000140
Peng LIU, Boyuan XIA, Zhiwei YANG, Jichao LI, Yuejin TAN
Received: 2021-01-11
Online: 2022-12-18
Published: 2022-12-24
Contact: Jichao LI
E-mail: liupeng81@nudt.edu.cn; xiaboyuan11@nudt.edu.cn; zhwyang88@126.com; ljcnudt@hotmail.com; yjtan@nudt.edu.cn
Peng LIU, Boyuan XIA, Zhiwei YANG, Jichao LI, Yuejin TAN. A deep reinforcement learning method for multi-stage equipment development planning in uncertain environments[J]. Journal of Systems Engineering and Electronics, 2022, 33(6): 1159-1175.
"
| Number | Variable | Symbol definition | 
| 1 | Number of equipment to be developed |   |  
| 2 | Equipment set to be developed |   |  
| 3 | Cost of equipment to be developed |   |  
| 4 | Expected number of years for equipment development |   |  
| 5 | Number of years the equipment has been developed |   |  
| 6 | Whether the equipment has been successfully developed |   |  
| 7 | Number of capabilities of concern |   |  
| 8 | Set of capabilities of concern |   |  
| 9 | Expected capabilities of the equipment to be developed |   |  
| 10 | Capabilities after multi-stage development |   |  
| 11 | Final capability requirement |   |  
| 12 | Number of stages |   |  
| 13 | Current stage |   |  
| 14 | Coefficient to transform year to stage |   |  
| 14 | The investment budget for each stage |   |  
| 15 | Development scheme |   |  
| 16 | Overall capability index |   |  
"
| Element | Data type | Normalization method | Normalized vector dimension |
|---|---|---|---|
| Current state of equipment development | Category | One-hot | |
| Number of years taken to develop the equipment | Scale | Divided by the maximum value | |
| Current stage | Category | One-hot | 1 |
| Investment amount at the current stage | Scale | Divided by the maximum value | 1 |
| Capability requirement | Scale | Divided by the maximum value | |
| Equipment to be developed | Cost (×1000 $) | Expected years of development |
|---|---|---|
| w1: Digital signal processor | 31 | 4 |
| w2: Digital image processor | 40 | 1 |
| w3: Speech synthesizer | 66 | 3 |
| w4: Low-voltage computer chip | 59 | 4 |
| w5: High-efficiency solar cell | 51 | 1 |
| w6: Digital-to-analog converter | 42 | 4 |
| w7: Analog-to-digital converter | 50 | 4 |
| w8: Frequency converter module | 60 | 1 |
| w9: Conformal phased-array antenna | 65 | 2 |
| w10: Radiofrequency mixer | 75 | 3 |
| Equipment | s1 | a1 | s2 | a2 | s3 | a3 | s4 | a4 | s5 | a5 | s6 | a6 | s7 | a7 | s8 | a8 | s9 | a9 | s10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Equipment 1 | 0 | ● | 1 | ○ | 1 | ● | 1 | ● | 1 | ● | 2 | ○ | 2 | ● | 2 | ○ | 2 | ● | 2 |
| Equipment 2 | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ● | 2 | ○ | 2 | ○ | 2 | ○ | 2 | ○ | 2 | ○ | 2 |
| Equipment 3 | 0 | ● | 1 | ○ | 1 | ● | 1 | ○ | 1 | ○ | 1 | ○ | 1 | ● | 2 | ○ | 2 | ● | 2 |
| Equipment 4 | 0 | ● | 1 | ● | 1 | ● | 1 | ○ | 1 | ○ | 1 | ○ | 1 | ● | 2 | | 2 | ● | 2 |
| Equipment 5 | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ● | 2 | ○ | 2 | ○ | 2 | ○ | 2 |
| Equipment 6 | 0 | ● | 1 | ○ | 1 | ● | 1 | ○ | 1 | ● | 1 | ○ | 2 | ○ | 2 | ○ | 2 | ○ | 2 |
| Equipment 7 | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 |
| Equipment 8 | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ● | 2 | ○ | 2 |
| Equipment 9 | 0 | ○ | 0 | ● | 1 | ○ | 1 | ○ | 1 | ● | 2 | ○ | 2 | ○ | 2 | ○ | 2 | ○ | 2 |
| Equipment 10 | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 |
| Development cost | | 55.00 | | 47.25 | | 55.00 | | 47.75 | | 50.75 | | 51.00 | | 44.50 | | 30.00 | | 44.50 | End |
| Equipment | s1 | a1 | s2 | a2 | s3 | a3 | s4 | a4 | s5 | a5 | s6 | a6 | s7 | a7 | s8 | a8 | s9 | a9 | s10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Equipment 1 | 0 | ● | 1 | ○ | 1 | ● | 1 | ● | 1 | ● | 2 | ○ | 2 | ● | 2 | ○ | 2 | ● | 2 |
| Equipment 2 | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ● | 2 | ○ | 2 | ○ | 2 | ○ | 2 | ○ | 2 | ○ | 2 |
| Equipment 3 | 0 | ● | 1 | ○ | 1 | ● | 1 | ○ | 1 | ○ | 1 | ○ | 1 | ● | 2 | ○ | 2 | ● | 2 |
| Equipment 4 | 0 | ● | 1 | ● | 1 | ● | 1 | ○ | 1 | ○ | 1 | ○ | 1 | ● | 2 | | 2 | ● | 2 |
| Equipment 5 | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ● | 2 | ○ | 2 | ○ | 2 | ○ | 2 |
| Equipment 6 | 0 | ● | 1 | ○ | 1 | ● | 1 | ○ | 1 | ● | 1 | ○ | 1 | ○ | 1 | ○ | 1 | ○ | 1 |
| Equipment 7 | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 |
| Equipment 8 | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ● | 2 | ○ | 2 |
| Equipment 9 | 0 | ○ | 0 | ● | 1 | ○ | 1 | ○ | 1 | ● | 2 | ○ | 2 | ○ | 2 | ○ | 2 | ○ | 2 |
| Equipment 10 | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 |
| Development cost | | 55.00 | | 47.25 | | 55.00 | | 47.75 | | 50.75 | | 51.00 | | 44.50 | | 60.00 | | 44.50 | End |
| Equipment | s1 | a1 | s2 | a2 | s3 | a3 | s4 | a4 | s5 | a5 | s6 | a6 | s7 | a7 | s8 | a8 | s9 | a9 | s10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Equipment 1 | 0 | ● | 1 | ● | 1 | ○ | 1 | ○ | 1 | ○ | 1 | ○ | 1 | ● | 1 | ● | 2 | ○ | 2 |
| Equipment 2 | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ● | 2 | ○ | 2 | ○ | 2 | ○ | 2 | ○ | 2 | ○ | 2 |
| Equipment 3 | 0 | ● | 1 | ● | 1 | ○ | 1 | ○ | 1 | ○ | 1 | ○ | 1 | ○ | 1 | ● | 2 | ○ | 2 |
| Equipment 4 | 0 | ● | 1 | ● | 1 | ● | 1 | ○ | 1 | ○ | 1 | ○ | 1 | ○ | 1 | ● | 2 | ○ | 2 |
| Equipment 5 | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ● | 2 | ○ | 2 | ○ | 2 | ○ | 2 | ○ | 2 |
| Equipment 6 | 0 | ● | 1 | ○ | 1 | ● | 1 | ● | 1 | ○ | 1 | ○ | 1 | ● | 2 | ○ | 2 | ○ | 2 |
| Equipment 7 | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 |
| Equipment 8 | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ● | 2 | ○ | 2 | ○ | 2 | ● | 2 |
| Equipment 9 | 0 | ○ | 0 | ○ | 0 | ● | 1 | ○ | 1 | ○ | 1 | ○ | 1 | ● | 2 | ○ | 2 | ○ | 2 |
| Equipment 10 | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 | ○ | 0 |
| Development cost | | 55.00 | | 44.50 | | 57.75 | | 50.50 | | 51.00 | | 60.00 | | 50.75 | | 44.50 | | 60.00 | End |
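The per-stage development costs in these scheme tables appear to equal the sum of annualized costs (equipment cost divided by its expected years of development) over the equipment invested in (marked ●) at that stage; for example, stage 1 invests in w1, w3, w4, and w6, giving 31/4 + 66/3 + 59/4 + 42/4 = 55.0. A minimal sketch assuming that interpretation:

```python
# Costs (×1000 $) and expected development years from the equipment table.
costs = {"w1": 31, "w2": 40, "w3": 66, "w4": 59, "w5": 51,
         "w6": 42, "w7": 50, "w8": 60, "w9": 65, "w10": 75}
years = {"w1": 4, "w2": 1, "w3": 3, "w4": 4, "w5": 1,
         "w6": 4, "w7": 4, "w8": 1, "w9": 2, "w10": 3}

def stage_cost(invested) -> float:
    """Cost charged in one stage: sum of annualized costs of the
    equipment invested in (marked with a filled circle) at that stage."""
    return sum(costs[w] / years[w] for w in invested)

# Stage 1 of the first scheme invests in equipment 1, 3, 4, and 6:
print(stage_cost({"w1", "w3", "w4", "w6"}))   # 55.0, matching the table
# Stage 2 invests in equipment 4 and 9:
print(stage_cost({"w4", "w9"}))               # 47.25
```

The same rule reproduces most other entries (e.g., stage 6 of the first scheme invests only in w5, giving 51/1 = 51.00), which supports reading ● as "invest in this stage" and ○ as "do not invest".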