moptipy.evaluation package¶
Components for parsing and evaluating log files generated by experiments.

Via the moptipy.api, it is possible to log the progress or end results of optimization algorithm runs in text-based log files. With the methods in this package here, you can load and evaluate them. This usually follows a multi-step approach: For example, you can first extract the end results from several algorithms and instances into a single file via the EndResult records. This could then be processed to per-algorithm or per-instance statistics using EndStatistics.
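A minimal sketch of this two-step workflow is shown below. It assumes that the log-parsing entry point is exposed as EndResult.from_logs and the aggregation as EndStatistics.from_end_results, as suggested by the module descriptions further down; depending on the moptipy version these may instead be module-level functions, so treat the exact call sites as illustrative.

from moptipy.evaluation.end_results import EndResult
from moptipy.evaluation.end_statistics import EndStatistics

# step 1: collect one EndResult record per run from a directory of log files
end_results: list[EndResult] = []
EndResult.from_logs("results", end_results.append)  # the consumer is the append method of a list

# step 2: aggregate the runs into per-algorithm/per-instance summary statistics
end_stats: list[EndStatistics] = []
EndStatistics.from_end_results(end_results, end_stats.append)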
A utility to specify axis ranges. Bases: An object for simplifying axis range computations. Create a default axis ranger based on the axis type. The axis ranger will use the minimal values and log scaling options that usually make sense for the dimension, unless overridden by the optional arguments. name ( chosen_min ( chosen_max ( use_data_min ( use_data_max ( log_scale ( the AxisRanger Generate a function that provides the default per-axis ranger. chosen_min ( chosen_max ( use_data_min ( use_data_max ( log_scale ( a function in the shape of Get a reasonable positive finite value that can replace 0. a reasonable finite value that can be used to replace 0 Get a reasonable finite value that can replace positive infinity. a reasonable finite value that can be used to replace positive infinity Add some padding to the current detected range. This function increases the current detected or chosen maximum value and/or decreases the current detected minimum by a small amount. This can be useful when we want to plot stuff that otherwise would become invisible because it would be directly located at the boundary of a plot. This function works by computing a slightly smaller/larger value than the current detected minimum/maximum and then passing it to This method should be called only once and only after all data has been registered (via ValueError – if this axis ranger is not configured to use a detected minimum/maximum or does not have a detected minimum/maximum or any other invalid situation occurs

Some internal helper functions and base classes. a description of the algorithm field a description of the encoding field a description of the instance field a description of the objective function field Bases: A base class for all the data classes in this module. The name of the normalized objective values data. The name of the raw objective values data. The name of the scaled objective values data. a key for the objective function name Bases: A multi-run data based on one time and one objective dimension. Bases: A class that represents statistics over a set of runs. If one algorithm*instance is used, then algorithm and instance are defined. Otherwise, only the parameter which is the same over all recorded runs is defined. Bases: An immutable record of information over a single run. The unit of the time axis if time is measured in FEs. The unit of the time axis if time is measured in milliseconds. Check whether an objective value name is valid. f_name ( the name of the objective function dimension Check that the time unit is OK. Get the algorithm of a given object. obj ( the algorithm string, or None if no algorithm is specified Get the instance of a given object. obj ( the instance string, or None if no instance is specified Print the standard csv footer for moptipy. the iterable with the footer comments Get the default sort key for the given object. The sort key is a tuple with well-defined field elements that should allow for a default and consistent sorting over many different elements of the experiment evaluation data API. Sorting should work also for lists containing elements of different classes. obj ( the sort key

Approximate the ECDF to reach certain goals. The empirical cumulative distribution function (ECDF for short) illustrates the fraction of runs that have reached a certain goal over time. Let’s say that you have performed 10 runs of a certain algorithm on a certain problem. As goal quality, you could define the globally optimal solution quality.
For any point in time, the ECDF then shows how many of these runs have solved the problem to this goal, to optimality. Let’s say the first run solves the problem after 100 FEs. Then the ECDF is 0 until 99 FEs and at 100 FEs, it becomes 1/10. The second-fastest run solves the problem after 200 FEs. The ECDF thus stays 0.1 until 199 FEs and at 200 FEs, it jumps to 0.2. And so on. This means that the value of the ECDF is always between 0 and 1. Nikolaus Hansen, Anne Auger, Steffen Finck, Raymond Ros. Real-Parameter Black-Box Optimization Benchmarking 2010: Experimental Setup. Research Report RR-7215, INRIA. 2010. inria-00462481. https://hal.inria.fr/inria-00462481/document/ Dave Andrew Douglas Tompkins and Holger H. Hoos. UBCSAT: An Implementation and Experimentation Environment for SLS Algorithms for SAT and MAX-SAT. In Revised Selected Papers from the Seventh International Conference on Theory and Applications of Satisfiability Testing (SAT’04), May 10-13, 2004, Vancouver, BC, Canada, pages 306-320. Lecture Notes in Computer Science (LNCS), volume 3542. Berlin, Germany: Springer-Verlag GmbH. ISBN: 3-540-27829-X. doi: https://doi.org/10.1007/11527695_24. Holger H. Hoos and Thomas Stützle. Evaluating Las Vegas Algorithms - Pitfalls and Remedies. In Gregory F. Cooper and Serafín Moral, editors, Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI’98), July 24-26, 1998, Madison, WI, USA, pages 238-245. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc. ISBN: 1-55860-555-X. Bases: The ECDF data. Create one single Ecdf record from an iterable of Progress records. the Ecdf record Compute one or multiple ECDFs from a stream of end results. f_goal ( consumer ( join_all_algorithms ( join_all_objectives ( join_all_encodings ( Record for EndResult as well as parsing, serialization, and parsing. When doing experiments with moptipy, you apply algorithm setups to problem instances. For each setup x instance combination, you may conduct a series of repetitions (so-called runs) with different random seeds. Each single run of an algorithm setup on a problem instances can produce a separate log file. From each log file, we can load a Bases: A csv parser for end results. Bases: A class for CSV writing of the description of best-F the description of the goal objective value the description of the last improvement FE the description of the last improvement time milliseconds a description of the budget as the maximum objective function evaluation a description of the budget in terms of maximum runtime a description of the random seed the description of the total FEs the total consumed time in milliseconds Bases: An immutable end result record of one run of one algorithm on one problem. This record provides the information of the outcome of one application of one algorithm to one problem instance in an immutable way. Get the index of the function evaluation when best_f was reached. the index of the function evaluation when best_f was reached Get the milliseconds when best_f was reached. the milliseconds when best_f was reached Get the total number of performed FEs. the total number of performed FEs Get the total time consumed by the run. the total time consumed by the run Parse a given CSV file to get Parse a given path and pass all end results found to the consumer. If path identifies a file with suffix .txt, then this file is parsed. 
Via the parameters max_fes, max_time_millis, and goal_f, you can set virtual limits for the objective function evaluations, the maximum runtime, and the objective value. There is one caveat when specifying max_time_millis: Let’s say that the log files only log improvements. Then you might have a log point for 7000 FEs, 1000ms, and f=100. The next log point could be 8000 FEs, 1200ms, and f=90. Now if your specified time limit is 1100ms, we know that the end result is f=100 (because f=90 was reached too late) and that the total runtime is 1100ms, as this is the limit you specified and it was also reached. But we do not know the number of consumed FEs. We know you consumed at least 7000 FEs, but you did not consume 8000 FEs. It would be wrong to claim that 7000 FEs were consumed, since it could have been more. We therefore set a virtual end point at 7999 FEs. In terms of performance metrics such as the path ( max_fes ( max_time_millis ( goal_f ( path_filter ( Produce a function that obtains the given dimension from EndResults. The following dimensions are supported: lastImprovementFE: totalFEs: totalTimeMillis: goalF: plainF, bestF: maxFEs: maxTimeMillis: fesPerTimeMilli:

SampleStatistics aggregated over multiple instances of EndResult. The Bases: A csv parser for end results. Bases: A class for CSV writing of Render a single end result record to a CSV row. data ( the row strings Set up this csv writer based on existing data. data ( this writer Bases: Statistics over end results of one or multiple algorithm*instance setups. If one algorithm*instance is used, then algorithm and instance are defined. Otherwise, only the parameter which is the same over all recorded runs is defined. The statistics about the best encountered result. best_f / goal_f if goal_f is consistently defined and always positive. The ERT in FEs, which is inf if n_success=0, None if goal_f is None, and finite otherwise. The ERT in milliseconds, which is inf if n_success=0, None if goal_f is None, and finite otherwise. Get the statistics about the best objective value reached. the statistics about the best objective value reached Get the statistics about the scaled best objective value. the statistics about the scaled best objective value Get the statistics about the goal objective value. the statistics about the goal objective value Get the statistics about the last improvement FE. the statistics about the last improvement FE Get the statistics about the last improvement time millis. the statistics about the last improvement time millis Get the statistics about the maximum permitted FEs. the statistics about the maximum permitted FEs Get the statistics about the maximum permitted runtime in ms. the statistics about the maximum permitted runtime in ms Get the statistics about the FEs until success of the successful runs. the statistics about the FEs until success of the successful runs Get the statistics about the ms until success of the successful runs. the statistics about the ms until success of the successful runs Get the statistics about the total FEs. the statistics about the total FEs Get the statistics about the total time millis. the statistics about the total time millis The goal objective value. The statistics about the last improvement FE. The statistics about the last improvement time. The budget in FEs, if every run had one; None otherwise. The budget in milliseconds, if every run had one; None otherwise. The FEs to success, if n_success > 0, None otherwise.
The time to success, if n_success > 0, None otherwise. The statistics about the total number of FEs. The statistics about the total time. The key for the best F. The key for the ERT in milliseconds. The key for the number of successful runs. The key for the success FEs. The key for the success time millis. Aggregate a stream of data into groups based on a parameter. param_value ( join_all_algorithms ( join_all_instances ( join_all_objectives ( join_all_encodings ( Create an EndStatistics Record from an Iterable of EndResult. the statistics Parse a CSV file and collect all encountered file ( the iterator with the results Aggregate statistics over a stream of end results. consumer ( join_all_algorithms ( join_all_instances ( join_all_objectives ( join_all_encodings ( Create a function that obtains the given dimension from EndStatistics. Store a set of data ( file ( the path to the generated CSV file Approximate the expected running time to reach certain goals. The (empirically estimated) Expected Running Time (ERT) tries to give an impression of how long an algorithm needs to reach a certain solution quality. The ERT for a problem instance is estimated as the ratio of the sum of all FEs that all the runs consumed until they either have discovered a solution of a given goal quality or exhausted their budget, divided by the number of runs that discovered a solution of the goal quality. The ERT is the mean expect runtime under the assumption of independent restarts after failed runs, which then may either succeed (consuming the mean runtime of the successful runs) or fail again (with the observed failure probability, after consuming the available budget). The ERT itself can be considered as a function that associates the estimated runtime given above to all possible solution qualities that can be attained by an algorithm for a give problem. For qualities/goals that an algorithm did not attain in any run, the ERT becomes infinite. Kenneth V. Price. Differential Evolution vs. The Functions of the 2nd ICEO. In Russ Eberhart, Peter Angeline, Thomas Back, Zbigniew Michalewicz, and Xin Yao, editors, IEEE International Conference on Evolutionary Computation, April 13-16, 1997, Indianapolis, IN, USA, pages 153-157. IEEE Computational Intelligence Society. ISBN: 0-7803-3949-5. doi: https://doi.org/10.1109/ICEC.1997.592287 Nikolaus Hansen, Anne Auger, Steffen Finck, Raymond Ros. Real-Parameter Black-Box Optimization Benchmarking 2010: Experimental Setup. Research Report RR-7215, INRIA. 2010. inria-00462481. https://hal.inria.fr/inria-00462481/document/ Bases: Estimate the Expected Running Time (ERT). Compute a single ERT. The ERT is the sum of the time that the runs spend with a best-so-far quality greater or equal than goal_f divided by the number of runs that reached goal_f. The idea is that the unsuccessful runs spent their complete computational budget and once they have terminated, we would immediately start a new, independent run. Warning: source must only contain progress objects that contain monotonously improving points. It must not contain runs that may get worse over time. the ERT Create one single Ert record from an iterable of Progress records. the Ert record Compute one or multiple ERTs from a stream of end results. f_lower_bound ( use_default_lower_bounds ( consumer ( join_all_algorithms ( join_all_instances ( join_all_objectives ( join_all_encodings ( Approximate the ECDF over the ERT to reach certain goals. 
The empirical cumulative distribution function (ECDF, see Now in the ERT-ECDF we combine both concepts to join several different optimization problems or problem instances into one plot. The goal becomes “solving the problem”. For each problem instance, we compute the ERT, i.e., estimate how long a given algorithm will need to reach the goal. This becomes the time axis. Over this time axis, the ERT-ECDF displays the fraction of instances that were solved. Thomas Weise, Zhize Wu, Xinlu Li, and Yan Chen. Frequency Fitness Assignment: Making Optimization Algorithms Invariant under Bijective Transformations of the Objective Function Value. IEEE Transactions on Evolutionary Computation 25(2):307-319. April 2021. Preprint available at arXiv:2001.01416v5 [cs.NE] 15 Oct 2020. http://arxiv.org/abs/2001.01416. doi: https://doi.org/10.1109/TEVC.2020.3032090 Bases: The ERT-ECDF. Create one single Ert-Ecdf record from an iterable of Progress records. the Ert-Ecdf record Compute one or multiple Ert-ECDFs from a stream of end results. f_goal ( consumer ( join_all_algorithms ( join_all_objectives ( join_all_encodings ( Load the encounter frequencies or the set of different objective values. This tool can load the different objective values that exist or are encountered by optimization processes. This may be useful for statistical evaluations or fitness landscape analyses. This tool is based on code developed by Mr. Tianyu LIANG (梁天宇), MSc student at the Institute of Applied Optimization (IAO, 应用优化研究所) of the School of Artificial Intelligence and Big Data (人工智能与大数据学院) of Hefei University (合肥大学). Parse a path, aggregate all discovered objective values to a consumer. A version of path ( consumer ( per_instance ( per_algorithm_instance ( report_progress ( report_lower_bound ( report_upper_bound ( report_h ( report_goal_f ( per_instance_known ( Parse a path, pass all discovered objective values per-run to a consumer. This function parses the log files in a directory recursively. For each log file, it produces a Counter filled with all encountered objective values and their “pseudo” encounter frequencies. “pseudo” because the values returned depend very much on how the function is configured. First, if all other parameters are set to False, the function passes a Counter to the consumer where the best encountered objective value has frequency 1 and no other data is present. If report_progress is True, then each time any objective value is encountered in the PROGRESS section, its counter is incremented by 1. If the PROGRESS section is present, that it. The best encountered objective value will have a count of at least one either way. If report_goal_f, report_lower_bound, or report_upper_bound are True, then it is ensured that the goal objective value of the optimization process, the lower bound of the objective function, or the upper bound of the objective function will have a corresponding count of at least 1 if they are present in the log files (in the SETUP section). If report_h is True, then a frequency fitness assignment H section is parsed, if present (see Generally, if we want the actual encounter frequencies of objective values, we could log all FEs to the log files and set report_progress to True and everything else to False. Then we get correct encounter frequencies. 
Alternatively, if we have a purly FFA-based algorithm (see, again, path ( consumer ( report_progress ( report_lower_bound ( report_upper_bound ( report_h ( report_goal_f ( per_instance_known ( Print the number of unique objective values to a CSV file. A version of input_dir ( output_file ( per_instance ( per_algorithm_instance ( report_lower_bound ( report_upper_bound ( report_goal_f ( per_instance_known ( Convert moptipy data to IOHanalyzer data. The IOHanalyzer (https://iohanalyzer.liacs.nl/) is a tool that can analyze the performance of iterative optimization heuristics in a wide variety of ways. It is available both for local installation as well as online for direct and free use (see, again, https://iohanalyzer.liacs.nl/). The IOHanalyzer supports many of the diagrams that our evaluation utilities provide - and several more. Here we provide the function Notice that we here have implemented the meta data format version “0.3.2 and below”, as described at https://iohprofiler.github.io/IOHanalyzer/data/#iohexperimenter-version-032-and-below. Carola Doerr, Furong Ye, Naama Horesh, Hao Wang, Ofer M. Shir, and Thomas Bäck. Benchmarking Discrete Optimization Heuristics with IOHprofiler. Applied Soft Computing 88(106027):1-21. March 2020. doi: https://doi.org/10.1016/j.asoc.2019.106027}, Carola Doerr, Hao Wang, Furong Ye, Sander van Rijn, and Thomas Bäck. IOHprofiler: A Benchmarking and Profiling Tool for Iterative Optimization Heuristics. October 15, 2018. New York, NY, USA: Cornell University, Cornell Tech. arXiv:1810.05281v1 [cs.NE] 11 Oct 2018. https://arxiv.org/pdf/1810.05281.pdf Hao Wang, Diederick Vermetten, Furong Ye, Carola Doerr, and Thomas Bäck. IOHanalyzer: Detailed Performance Analyses for Iterative Optimization Heuristics. ACM Transactions on Evolutionary Learning and Optimization 2(1)[3]:1-29. March 2022.doi: https://doi.org/10.1145/3510426. Jacob de Nobel and Furong Ye and Diederick Vermetten and Hao Wang and Carola Doerr and Thomas Bäck. IOHexperimenter: Benchmarking Platform for Iterative Optimization Heuristics. November 2021. New York, NY, USA: Cornell University, Cornell Tech. arXiv:2111.04077v2 [cs.NE] 17 Apr 2022. https://arxiv.org/pdf/2111.04077.pdf Data Format: Iterative Optimization Heuristics Profiler. https://iohprofiler.github.io/IOHanalyzer/data/ Convert moptipy log data to IOHanalyzer log data. results_dir ( dest_dir ( inst_name_to_func_id ( inst_name_to_dimension ( inst_name_to_inst_id ( suite ( f_name ( f_standard ( Parsers for structured log data produced by the moptipy experiment API. The moptipy Here we provide a skeleton for parsing such log files in form of the class For example in module Bases: A log parser following our pre-defined experiment structure. Decide whether to start parsing a file and setup meta-data. path ( True if the file should be parsed, False if it should be skipped (and Bases: A log parser can parse a log file and separate the sections. The log parser is designed to load data from text files generated by Enter a directory to parse all files inside. This method is called by End a file. This method is invoked when we have reached the end of the current file. Its return value, True or False, will then be returned by the return value to be returned by Consume all the lines from a section. This method receives the complete text of a section, where all lines are separated and put into one list lines. Each line is stripped from whitespace and comments, empty lines are omitted. 
If this method returns True, we will continue parsing the file and move to the next section, if any, or directly to the end of the file parsing process (and call True if further parsing is necessary and the next section should be fed to Parse either a directory or a file. If path identifies a file, path ( the result of the appropriate parsing routing ValueError – if path does not identify a directory or file Recursively parse the given directory. path ( True either if Parse the contents of a file. This method first calls the function This method can either be called directly or is called by path ( the return value received from invoking Enter a directory to parse all files inside. This method is called by Decide whether to start parsing a file. This method is called by path ( True if the file should be parsed, False if it should be skipped (and Start a section. If this method returns True, then all the lines of text of the section title will be read and together passed to title ( True if the section data should be loaded and passed to Bases: A log parser which loads and processes the basic data from the logs. This parser processes the SETUP and STATE sections of a log file and stores the performance-related information in member variables. Finalize parsing a file and invoke the This method invokes the True if parsing should be continued, False otherwise the objective function evaluation when the last improvement happened, in milliseconds the time step when the last improvement happened, in milliseconds Process the lines loaded from a section. If you process more sections, you should override this method. Your overridden method then can parse the data if you are in the right section. It should end with return super().lines(lines). Check whether we need to process more lines. You can overwrite this method if your parser parses additional log sections. Your overwritten method should return True if more sections except STATE and SETUP still need to be parsed and return super().needs_more_lines() otherwise. True if more data needs to be processed, False otherwise Process the result of the log parsing. This function is invoked by Plot a set of ECDF or ERT-ECDF objects into one figure. The empirical cumulative distribution function (ECDF, see Nikolaus Hansen, Anne Auger, Steffen Finck, Raymond Ros. Real-Parameter Black-Box Optimization Benchmarking 2010: Experimental Setup. Research Report RR-7215, INRIA. 2010. inria-00462481. https://hal.inria.fr/inria-00462481/document/ Dave Andrew Douglas Tompkins and Holger H. Hoos. UBCSAT: An Implementation and Experimentation Environment for SLS Algorithms for SAT and MAX-SAT. In Revised Selected Papers from the Seventh International Conference on Theory and Applications of Satisfiability Testing (SAT’04), May 10-13, 2004, Vancouver, BC, Canada, pages 306-320. Lecture Notes in Computer Science (LNCS), volume 3542. Berlin, Germany: Springer-Verlag GmbH. ISBN: 3-540-27829-X. doi: https://doi.org/10.1007/11527695_24. Holger H. Hoos and Thomas Stützle. Evaluating Las Vegas Algorithms - Pitfalls and Remedies. In Gregory F. Cooper and Serafín Moral, editors, Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI’98), July 24-26, 1998, Madison, WI, USA, pages 238-245. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc. ISBN: 1-55860-555-X. Plot a set of ECDF functions into one chart. 
x_axis ( y_axis ( legend ( distinct_colors_func ( distinct_line_dashes_func ( importance_to_line_width_func ( importance_to_alpha_func ( importance_to_font_size_func ( x_grid ( y_grid ( x_label ( x_label_inside ( y_label ( y_label_inside ( algorithm_priority ( goal_priority ( algorithm_namer ( color_algorithms_as_fallback_group ( algorithm_sort_key ( goal_sort_key ( the axes object to allow you to add further plot elements Violin plots for end results. Plot a set of end result boxes/violins functions into one chart. In this plot, we combine two visualizations of data distributions: box plots in the foreground and violin plots in the background. The box plots show you the median, the 25% and 75% quantiles, the 95% confidence interval around the median (as notches), the 5% and 95% quantiles (as whiskers), the arithmetic mean (as triangle), and the outliers on both ends of the spectrum. This allows you also to compare data from different distributions rather comfortably, as you can, e.g., see whether the confidence intervals overlap. The violin plots in the background are something like smoothed-out, vertical, and mirror-symmetric histograms. They give you a better impression about shape and modality of the distribution of the results. end_results ( dimension ( y_axis ( distinct_colors_func ( importance_to_line_width_func ( importance_to_font_size_func ( y_grid ( x_grid ( x_label ( x_label_inside ( x_label_location ( y_label ( y_label_inside ( y_label_location ( legend_pos ( instance_sort_key ( algorithm_sort_key ( instance_namer ( algorithm_namer ( the axes object to allow you to add further plot elements Plot the end results over a parameter. Plot a series of end result statistics over a parameter. data ( x_getter ( y_dim ( algorithm_getter ( instance_getter ( x_axis ( y_axis ( legend ( legend_pos ( distinct_colors_func ( distinct_line_dashes_func ( importance_to_line_width_func ( importance_to_font_size_func ( x_grid ( y_grid ( x_label ( x_label_inside ( x_label_location ( y_label ( y_label_inside ( y_label_location ( instance_priority ( algorithm_priority ( stat_priority ( instance_sort_key ( algorithm_sort_key ( instance_namer ( algorithm_namer ( stat_sort_key ( color_algorithms_as_fallback_group ( the axes object to allow you to add further plot elements Plot a set of The (empirically estimated) Expected Running Time (ERT, see Kenneth V. Price. Differential Evolution vs. The Functions of the 2nd ICEO. In Russ Eberhart, Peter Angeline, Thomas Back, Zbigniew Michalewicz, and Xin Yao, editors, IEEE International Conference on Evolutionary Computation, April 13-16, 1997, Indianapolis, IN, USA, pages 153-157. IEEE Computational Intelligence Society. ISBN: 0-7803-3949-5. doi: https://doi.org/10.1109/ICEC.1997.592287 Nikolaus Hansen, Anne Auger, Steffen Finck, Raymond Ros. Real-Parameter Black-Box Optimization Benchmarking 2010: Experimental Setup. Research Report RR-7215, INRIA. 2010. inria-00462481. https://hal.inria.fr/inria-00462481/document/ Plot a set of Ert functions into one chart. x_axis ( y_axis ( legend ( distinct_colors_func ( distinct_line_dashes_func ( importance_to_line_width_func ( importance_to_alpha_func ( importance_to_font_size_func ( x_grid ( y_grid ( x_label ( x_label_inside ( y_label ( y_label_inside ( instance_sort_key ( algorithm_sort_key ( instance_namer ( algorithm_namer ( instance_priority ( algorithm_priority ( the axes object to allow you to add further plot elements Plot a set of Progress or StatRun objects into one figure. 
Plot a set of progress or statistical run lines into one chart. progresses ( x_axis ( y_axis ( legend ( distinct_colors_func ( distinct_line_dashes_func ( importance_to_line_width_func ( importance_to_alpha_func ( importance_to_font_size_func ( x_grid ( y_grid ( x_label ( x_label_inside ( x_label_location ( y_label ( y_label_inside ( y_label_location ( instance_priority ( algorithm_priority ( stat_priority ( instance_sort_key ( algorithm_sort_key ( stat_sort_key ( color_algorithms_as_fallback_group ( instance_namer ( algorithm_namer ( the axes object to allow you to add further plot elements Objects embodying the progress of a run over time. An instance of Bases: An immutable record of progress information over a single run. Parse a given path and pass all progress data found to the consumer. If path identifies a file with suffix .txt, then this file is parsed. The appropriate path ( consumer ( time_unit ( f_name ( f_standard ( only_improvements ( A tool for selecting a consistent subset of data from partial experiments. When we have partial experimental data, maybe collected from experiments that are still ongoing, we want to still evaluate them in some consistent way. The right method for doing this could be to select a subset of that data that is consistent, i.e., a subset where the algorithms have the same number of runs on the instances using the same seeds. The function The current method to select the data is rather heuristic. It always begins with the full set of data and aims to delete the element that will cause the least other deletions down the road, until we arrive in a consistent state. I strongly suspect that doing this perfectly would be NP-hard, so we cannot implement this. Instead, we use different heuristics and then pick the best result. the type variable for the selector routine alias of TypeVar(‘T’, bound= Select data such that the numbers of runs are consistent. The input is a set of data items which represent some records over the runs of algorithms on instances. It may be that not all algorithms have been applied to all instances. Maybe the number of runs is inconsistent over the algorithm-instance combinations, too. Maybe some algorithms have more runs on some instances. Maybe the runs are even different, it could be that some algorithms have runs for seed A, B, and C on instance I, while others have runs for seed C and D. This function is designed to retain only the runs with seed C in such a case. It may discard algorithms or instances or algorithm-instance-seed combinations in order to obtain a selection of data where all algorithms have been applied to all instances as same as often and using the same seeds. Now there are different ways to select such consistent subsets of a dataset. Of course, we want to select the data such that as much as possible of the data is retained and as little as possible is discarded. This may be a hard optimization problem in itself. Here, we offer a heuristic solution. Basically, we step-by-step try to cut away the setups that are covered by the least amount of runs. We keep repeating this until we arrive in a situation where all setups have the same amount of runs. We then check if there were some strange symmetries that still make the data inconsistent and, if we found some, try to delete one run to break the symmetries and then repeat the cleaning-up process. In the end, we should get a list of overall consistent data elements that can be used during a normal experiment evaluation procedure. 
This iterative process may be rather slow on larger datasets, but it is maybe the best approximation we can offer to retain as much data as possible. data ( log ( thorough ( a list with the selected data

Statistic runs are time-dependent statistics over several runs. The statistics key for the arithmetic mean. The statistics key for the geometric mean. The key for the arithmetic mean minus the standard deviation. The key for the arithmetic mean plus the standard deviation. The key for the 15.9% quantile. In a normal distribution, this quantile is where “mean - standard deviation” is located. The key for the 84.1% quantile. In a normal distribution, this quantile is where “mean + standard deviation” is located. The statistics key for the standard deviation. Bases: A time-value statistic over a set of runs. Compute statistics from an iterable of Progress objects. Aggregate statistic runs over a stream of progress data. statistics ( consumer ( join_all_algorithms ( join_all_instances ( join_all_objectives ( join_all_encodings ( Get the statistic of a given object. obj ( the statistic string, or None if no statistic is specified

Styler allows discovering groups of data and associating styles with them. Bases: A class for determining groups of elements and styling them.

Provides function the default algorithm-instance statistics the default algorithm summary statistics Get the command-based names for columns, but in command format. This function returns LaTeX-style commands for the column headers. col ( put_dollars ( summary_name ( setup_name ( the column name Get a function to compute the best value in a column. The returned function can compute the best value in a column. If no value is best, it should return nan. Get the default name for columns. Get the number renderer for the specified column. Time columns are rendered with less precision. col ( the number renderer Tabulate the statistics about the end results of an experiment. A two-part table is produced. In the first part, it presents summary statistics about each instance-algorithm combination, sorted by instance. In the second part, it presents summary statistics of the algorithms over all instances.
The following default columns are provided. For the first part:
I: the instance name
lb(f): the lower bound of the objective value of the instance
setup: the name of the algorithm or algorithm setup
best: the best objective value reached by any run on that instance
mean: the arithmetic mean of the best objective values reached over all runs
sd: the standard deviation of the best objective values reached over all runs
mean1: the arithmetic mean of the best objective values reached over all runs, divided by the lower bound (or goal objective value)
gmean(FE/ms): the geometric mean of objective function evaluations performed per millisecond, over all runs
mean(t): the arithmetic mean of the time in milliseconds when the last improving move of a run was applied, over all runs

For the second part:
setup: the name of the algorithm or algorithm setup
best1: the minimum of the best objective values reached, divided by the lower bound (or goal objective value), over all runs
worst1: the maximum of the best objective values reached, divided by the lower bound (or goal objective value), over all runs
gmean(FE/ms): the geometric mean of objective function evaluations performed per millisecond, over all runs
gmean(t): the geometric mean of the time in milliseconds when the last improving move of a run was applied, over all runs

You can freely configure which columns you want for each part and whether you want to have the second part included. Also, for each group of values, the best one is marked in bold face. Depending on the parameter text_format_driver, the tables can be rendered in different formats, such as file_name ( dir_name ( algorithm_instance_statistics ( algorithm_summary_statistics ( text_format_driver ( algorithm_sort_key ( instance_sort_key ( col_namer ( col_best ( col_renderer ( put_lower_bound ( lower_bound_getter ( lower_bound_name ( use_lang ( instance_namer ( algorithm_namer ( the path to the file with the tabulated end results

Make a LaTeX end-statistics table with column wrapping. Make a table of end statistics that can wrap multiple pages, if need be. data ( dest ( n_wrap ( max_rows ( stats ( instance_get ( instance_name ( instance_sort_key ( algorithm_get ( algorithm_name ( algorithm_sort_key ( instance_cols ( best_format ( instance_header ( best_count_header (

Provides The function Daniel F. Bauer. Constructing Confidence Sets Using Rank Statistics. In Journal of the American Statistical Association. 67(339):687-690. September 1972. doi: https://doi.org/10.1080/01621459.1972.10481279. Sidney Siegel and N. John Castellan Jr. Nonparametric Statistics for The Behavioral Sciences. 1988. In the Humanities/Social Sciences/Languages series. New York, NY, USA: McGraw-Hill. ISBN: 0-07-057357-3. Myles Hollander and Douglas Alan Wolfe. Nonparametric Statistical Methods. 1973. New York, NY, USA: John Wiley and Sons Ltd. ISBN: 047140635X. Olive Jean Dunn. Multiple Comparisons Among Means. In Journal of the American Statistical Association. 56(293):52-64. March 1961. doi: https://doi.org/10.1080/01621459.1961.10482090. Tabulate the results of statistical comparisons of end result qualities. end_results contains a sequence of If p is sufficiently small, this means that it is unlikely that the difference in performance of the two compared algorithms that was observed stems from randomness. But what does “sufficiently small” mean? As parameter, this function accepts a significance threshold 0<alpha<0.5.
alpha is, so to say, the upper limit of the “probability to be wrong” that we are going to accept if we claim something like “algorithm A is better than algorithm B”. In other words, if the table says that algorithm A is better than algorithm B, the chance that this is wrong is not more than alpha. However, if we do many such tests, our chance to make at least one mistake grows. If we do n_tests tests, then the chance that at least one of them is “wrong” would be 1-[(1-alpha)^n_tests]. Since we are going to do multiple tests, the Bonferroni correction is therefore applied and alpha’=alpha/n_tests is computed. Then, the chance to have at least one of the n_tests test results be wrong is not higher than alpha. The test results are presented as follows: The first column of the generated table denotes the problem instances. Each of the other columns represents a pair of algorithms. In each cell, the pair is compared based on the results on the instance of the row. The cell then holds the p-value of the two-tailed Mann-Whitney U test. If the first algorithm is significantly better (at p<alpha’) than the second algorithm, then the cell is marked with <. If the first algorithm is significantly worse (at p<alpha’) than the second algorithm, then the cell is marked with >. If the observed differences are not significant (p>=alpha’), then the cell is marked with ?. However, there could also be a situation where a statistical comparison makes no sense as no difference could reliably be detected anyway. For example, if one algorithm has a smaller median result but a larger mean result, or if the medians are the same, or if the means are the same. Regardless of what outcome a test would have, we could not really claim that any of the algorithms was better or worse. In such cases, no test is performed and - is printed instead (signified by &mdash; in the markdown format). Finally, the bottom row sums up the numbers of <, ?, and > outcomes for each algorithm pair. Depending on the parameter text_format_driver, the tables can be rendered in different formats, such as file_name ( dir_name ( alpha ( text_format_driver ( algorithm_sort_key ( instance_sort_key ( instance_namer ( algorithm_namer ( use_lang ( p_renderer ( value_getter ( the path to the file with the tabulated test results
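The Bonferroni-corrected, two-tailed Mann-Whitney U test described above can be illustrated with a few lines of SciPy. This sketch is not part of moptipy and uses made-up samples; the "<"/">" decision is simplified to a comparison of medians, whereas the table generator also considers the cases listed above in which no test is performed.

from itertools import combinations
from statistics import median
from scipy.stats import mannwhitneyu

# hypothetical best objective values of five runs per algorithm on one instance
results = {"a1": [105, 101, 99, 104, 102],
           "a2": [98, 97, 95, 96, 99],
           "a3": [103, 100, 102, 101, 105]}

alpha = 0.02
n_tests = len(results) * (len(results) - 1) // 2  # one test per algorithm pair
alpha_prime = alpha / n_tests                     # Bonferroni correction

for (a, xa), (b, xb) in combinations(results.items(), 2):
    p = mannwhitneyu(xa, xb, alternative="two-sided").pvalue
    if p >= alpha_prime:
        mark = "?"  # no significant difference at the corrected threshold
    else:
        mark = "<" if median(xa) < median(xb) else ">"
    print(f"{a} vs {b}: p={p:.4f} {mark}")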
Submodules¶
moptipy.evaluation.axis_ranger module¶
Bases: object

Arguments of the axis-ranger factory ("Create a default axis ranger based on the axis type"):
name (str) – the axis type name, supporting "ms", "FEs", "plainF", "scaledF", and "normalizedF"
chosen_min (float | None, default: None) – the chosen minimum
chosen_max (float | None, default: None) – the chosen maximum
use_data_min (bool | None, default: None) – should the data minimum be used
use_data_max (bool | None, default: None) – should the data maximum be used
log_scale (bool | None, default: None) – the log scale indicator
Returns: the AxisRanger

Arguments of the per-axis default generator ("Generate a function that provides the default per-axis ranger"):
chosen_min (float | None, default: None) – the chosen minimum
chosen_max (float | None, default: None) – the chosen maximum
use_data_min (bool | None, default: None) – should the data minimum be used
use_data_max (bool | None, default: None) – should the data maximum be used
log_scale (bool | None, default: None) – the log scale indicator
Returns: a function in the shape of for_axis() with the provided defaults

The range-padding function passes the padded value to register_value(). It can only work if the end(s) chosen for padding are in "detect" mode and the other end is either in "detect" or "chosen" mode. It should be called only once and only after all data has been registered (via register_value() or register_array()) and before calling apply().
moptipy.evaluation.base module¶
object
MultiRunData
>>> p = MultiRun2DData("a", "i", "f", None, 3,
... TIME_UNIT_FES, F_NAME_SCALED)
>>> p.instance
'i'
>>> p.algorithm
'a'
>>> p.objective
'f'
>>> print(p.encoding)
None
>>> p.n
3
>>> print(p.time_unit)
FEs
>>> print(p.f_name)
scaledF
>>> try:
... MultiRun2DData("a", "i", "f", None, 3,
... 3, F_NAME_SCALED)
... except TypeError as te:
... print(te)
time_unit should be an instance of str but is int, namely 3.
>>> try:
... MultiRun2DData("a", "i", "f", None, 3,
... "sdfjsdf", F_NAME_SCALED)
... except ValueError as ve:
... print(ve)
Invalid time unit 'sdfjsdf', only 'FEs' and 'ms' are permitted.
>>> try:
... MultiRun2DData("a", "i", "f", None, 3,
... TIME_UNIT_FES, True)
... except TypeError as te:
... print(te)
f_name should be an instance of str but is bool, namely True.
>>> try:
... MultiRun2DData("a", "i", "f", None, 3,
... TIME_UNIT_FES, "blablue")
... except ValueError as ve:
... print(ve)
Invalid f name 'blablue', only 'plainF', 'scaledF', and 'normalizedF' are permitted.
EvaluationDataElement
>>> p = MultiRunData("a", "i", "f", None, 3)
>>> p.instance
'i'
>>> p.algorithm
'a'
>>> p.objective
'f'
>>> print(p.encoding)
None
>>> p.n
3
>>> p = MultiRunData(None, None, None, "x", 3)
>>> print(p.instance)
None
>>> print(p.algorithm)
None
>>> print(p.objective)
None
>>> p.encoding
'x'
>>> p.n
3
>>> try:
... MultiRunData(1, "i", "f", "e", 234)
... except TypeError as te:
... print(te)
algorithm name should be an instance of any in {None, str} but is int, namely 1.
>>> try:
... MultiRunData("x x", "i", "f", "e", 234)
... except ValueError as ve:
... print(ve)
Invalid algorithm name 'x x'.
>>> try:
... MultiRunData("a", 5.5, "f", "e", 234)
... except TypeError as te:
... print(te)
instance name should be an instance of any in {None, str} but is float, namely 5.5.
>>> try:
... MultiRunData("x", "a-i", "f", "e", 234)
... except ValueError as ve:
... print(ve)
Invalid instance name 'a-i'.
>>> try:
... MultiRunData("a", "i", True, "e", 234)
... except TypeError as te:
... print(te)
objective name should be an instance of any in {None, str} but is bool, namely True.
>>> try:
... MultiRunData("xx", "i", "d'@f", "e", 234)
... except ValueError as ve:
... print(ve)
Invalid objective name "d'@f".
>>> try:
... MultiRunData("yy", "i", "f", -9.4, 234)
... except TypeError as te:
... print(te)
encoding name should be an instance of any in {None, str} but is float, namely -9.4.
>>> try:
... MultiRunData("xx", "i", "f", "e-{a", 234)
... except ValueError as ve:
... print(ve)
Invalid encoding name 'e-{a'.
>>> try:
... MultiRunData("x", "i", "f", "e", -1.234)
... except TypeError as te:
... print(te)
n should be an instance of int but is float, namely -1.234.
>>> try:
... MultiRunData("xx", "i", "f", "e", 1_000_000_000_000_000_000_000)
... except ValueError as ve:
... print(ve)
n=1000000000000000000000 is invalid, must be in 1..1000000000000000.
EvaluationDataElement
>>> p = PerRunData("a", "i", "f", None, 234)
>>> p.instance
'i'
>>> p.algorithm
'a'
>>> p.objective
'f'
>>> print(p.encoding)
None
>>> p.rand_seed
234
>>> p = PerRunData("a", "i", "f", "e", 234)
>>> p.instance
'i'
>>> p.algorithm
'a'
>>> p.objective
'f'
>>> p.encoding
'e'
>>> p.rand_seed
234
>>> try:
... PerRunData(3, "i", "f", "e", 234)
... except TypeError as te:
... print(te)
algorithm name should be an instance of str but is int, namely 3.
>>> try:
... PerRunData("@1 2", "i", "f", "e", 234)
... except ValueError as ve:
... print(ve)
Invalid algorithm name '@1 2'.
>>> try:
... PerRunData("x", 3.2, "f", "e", 234)
... except TypeError as te:
... print(te)
instance name should be an instance of str but is float, namely 3.2.
>>> try:
... PerRunData("x", "sdf i", "f", "e", 234)
... except ValueError as ve:
... print(ve)
Invalid instance name 'sdf i'.
>>> try:
... PerRunData("a", "i", True, "e", 234)
... except TypeError as te:
... print(te)
objective name should be an instance of str but is bool, namely True.
>>> try:
... PerRunData("x", "i", "d-f", "e", 234)
... except ValueError as ve:
... print(ve)
Invalid objective name 'd-f'.
>>> try:
... PerRunData("x", "i", "f", 54.2, 234)
... except TypeError as te:
... print(te)
encoding name should be an instance of any in {None, str} but is float, namely 54.2.
>>> try:
... PerRunData("y", "i", "f", "x x", 234)
... except ValueError as ve:
... print(ve)
Invalid encoding name 'x x'.
>>> try:
... PerRunData("x", "i", "f", "e", 3.3)
... except TypeError as te:
... print(te)
rand_seed should be an instance of int but is float, namely 3.3.
>>> try:
... PerRunData("x", "i", "f", "e", -234)
... except ValueError as ve:
... print(ve)
rand_seed=-234 is invalid, must be in 0..18446744073709551615.
Any
) – the name of the objective function dimension
>>> check_f_name("plainF")
'plainF'
>>> check_f_name("scaledF")
'scaledF'
>>> check_f_name("normalizedF")
'normalizedF'
>>> try:
... check_f_name(1.0)
... except TypeError as te:
... print(te)
f_name should be an instance of str but is float, namely 1.0.
>>> try:
... check_f_name("oops")
... except ValueError as ve:
... print(ve)
Invalid f name 'oops', only 'plainF', 'scaledF', and 'normalizedF' are permitted.
>>> check_time_unit("FEs")
'FEs'
>>> check_time_unit("ms")
'ms'
>>> try:
... check_time_unit(1)
... except TypeError as te:
... print(te)
time_unit should be an instance of str but is int, namely 1.
>>> try:
... check_time_unit("blabedibla")
... except ValueError as ve:
... print(ve)
Invalid time unit 'blabedibla', only 'FEs' and 'ms' are permitted.
PerRunData
| MultiRunData
) – the object
>>> p1 = MultiRunData("a1", "i1", "f", "y", 3)
>>> get_algorithm(p1)
'a1'
>>> p2 = PerRunData("a2", "i2", "y", None, 31)
>>> get_algorithm(p2)
'a2'
PerRunData
| MultiRunData
) – the object
>>> p1 = MultiRunData("a", "i1", None, "x", 3)
>>> get_instance(p1)
'i1'
>>> p2 = PerRunData("a", "i2", "f", "x", 31)
>>> get_instance(p2)
'i2'
>>> for s in motipy_footer_bottom_comments(None, "bla"):
... print(s[:49])
This data has been generated with moptipy version
bla
You can find moptipy at https://thomasweise.githu
>>> for s in motipy_footer_bottom_comments(None, None):
... print(s[:49])
This data has been generated with moptipy version
You can find moptipy at https://thomasweise.githu
PerRunData
| MultiRunData
) – the object
>>> p1 = MultiRunData("a1", "i1", "f", None, 3)
>>> p2 = PerRunData("a2", "i2", "f", None, 31)
>>> sort_key(p1) < sort_key(p2)
True
>>> sort_key(p1) >= sort_key(p2)
False
>>> p3 = MultiRun2DData("a", "i", "f", None, 3,
... TIME_UNIT_FES, F_NAME_SCALED)
>>> sort_key(p3) < sort_key(p1)
True
>>> sort_key(p3) >= sort_key(p1)
False
moptipy.evaluation.ecdf module¶
Ecdf
to reach certain goals.MultiRun2DData
Union
[int
, float
, Callable
[[str
], int
| float
], Iterable
[Union
[int
, float
, Callable
]], None
], default: None
) – one or multiple goal valuesCallable
[[Ecdf
], Any
]) – the destination to which the new records will be passed, can be the append method of a list
bool
, default: False
) – should the Ecdf be aggregated over all algorithmsbool
, default: False
) – should the Ecdf be aggregated over all objective functionsbool
, default: False
) – should the Ecdf be aggregated over all encodings
moptipy.evaluation.end_results module¶
EndResult
instance, which represents, well, the end result of the run, i.e., information such as the best solution quality reached, when it was reached, and the termination criterion. These end result records then can be the basis for, e.g., computing summary statistics via end_statistics
or for plotting the end result distribution via plot_end_results
.object
object
EndResult
.PerRunData
EndResult
Records.EndResult
is created and appended to the collector. If path identifies a directory, then this directory is parsed recursively for each log file found, one record is passed to the consumer. As consumer, you could pass any callable that accepts instances of EndResult
, e.g., the append method of a list
.EndResult
records will then not represent the actual final state of the runs but be synthesized from the logged progress information. This, of course, requires such information to be present. It will also raise a ValueError if the goals are invalid, e.g., if a runtime limit is specified that is before the first logged points.ert
, this would be the most conservative choice in that it does not over-estimate the speed of the algorithm. It can, however, lead to very big deviations from the actual values. For example, if your algorithm quickly converged to a local optimum and there simply is no log point that exceeds the virtual time limit but the original run had a huge FE-based budget while your virtual time limit was small, this could lead to an estimate of millions of FEs taking part within seconds…str
) – the path to parseUnion
[int
, None
, Callable
[[str
, str
], int
| None
]], default: None
) – the maximum FEs, a callable to compute the maximum FEs from the algorithm and instance name, or None if unspecifiedUnion
[int
, None
, Callable
[[str
, str
], int
| None
]], default: None
) – the maximum runtime in milliseconds, a callable to compute the maximum runtime from the algorithm and instance name, or None if unspecifiedUnion
[int
, float
, None
, Callable
[[str
, str
], int
| float
| None
]], default: None
) – the goal objective value, a callable to compute the goal objective value from the algorithm and instance name, or None if unspecifiedOptional
[Callable
[[Path
], bool
]], default: None
) – a filter allowing us to skip paths or files. If this Callable
returns True, the file or directory is considered for parsing. If it returns False, it is skipped.last_improvement_fe
total_fes
total_time_millis
goal_f
best_f
max_fes
max_time_millis
total_fes
/total_time_millis
moptipy.evaluation.end_statistics module¶
end_results
records hold the final result of a run of an optimization algorithm on a problem instance. Often, we do not want to compare these single results directly, but instead analyze summary statistics, such as the mean best objective value found. For this purpose, EndStatistics
exists. It summarizes the singular results from the runs into a record with the most important statistics.object
object
EndStatistics
.EndStatistics
) – the end result recordIterable
[EndStatistics
]) – the data to setup withMultiRunData
SampleStatistics
¶SampleStatistics
| None
¶int
| float
| None
¶int
| float
| None
¶SampleStatistics
| int
| float
| None
SampleStatistics
| int
| None
SampleStatistics
| int
| None
SampleStatistics
| int
| float
| None
¶SampleStatistics
¶SampleStatistics
¶SampleStatistics
| int
| None
¶SampleStatistics
| int
| None
¶SampleStatistics
| None
¶SampleStatistics
| None
¶SampleStatistics
¶SampleStatistics
¶Callable
[[EndResult
], int
| float
]) – the function obtaining a parameter valuebool
, default: False
) – should the statistics be aggregated over all algorithmsbool
, default: False
) – should the statistics be aggregated over all algorithmsbool
, default: False
) – should the statistics be aggregated over all objectives?bool
, default: False
) – should statistics be aggregated over all encodingstuple
[Callable
[[EndStatistics
], int
| float
], Iterable
[EndStatistics
]]EndStatistics
.str
) – the file to parseCallable
[[EndStatistics
], Any
]) – the destination to which the new records will be sent, can be the append method of a list
bool
, default: False
) – should the statistics be aggregated over all algorithmsbool
, default: False
) – should the statistics be aggregated over all algorithmsbool
, default: False
) – should the statistics be aggregated over all objectives?bool
, default: False
) – should statistics be aggregated over all encodingsEndStatistics
in a CSV file.Union
[EndStatistics
, Iterable
[EndStatistics
]]) – the data to storestr
) – the file to generate
moptipy.evaluation.ert module¶
MultiRun2DData
>>> from moptipy.evaluation.progress import Progress as Pr
>>> from numpy import array as a
>>> f = "plainF"
>>> t = "FEs"
>>> r = [Pr("a", "i", "f", "e", 1, a([1, 4, 8]), t, a([10, 8, 5]), f),
... Pr("a", "i", "f", "e", 2, a([1, 3, 6]), t, a([9, 7, 4]), f),
... Pr("a", "i", "f", "e", 3, a([1, 2, 7, 9]), t, a([8, 7, 6, 3]), f),
... Pr("a", "i", "f", "e", 4, a([1, 12]), t, a([9, 3]), f)]
>>> print(compute_single_ert(r, 11))
1.0
>>> print(compute_single_ert(r, 10))
1.0
>>> print(compute_single_ert(r, 9.5)) # (4 + 1 + 1 + 1) / 4 = 1.75
1.75
>>> print(compute_single_ert(r, 9)) # (4 + 1 + 1 + 1) / 4 = 1.75
1.75
>>> print(compute_single_ert(r, 8.5)) # (4 + 3 + 1 + 12) / 4 = 5
5.0
>>> print(compute_single_ert(r, 8)) # (4 + 3 + 1 + 12) / 4 = 5
5.0
>>> print(compute_single_ert(r, 7.3)) # (8 + 3 + 2 + 12) / 4 = 6.25
6.25
>>> print(compute_single_ert(r, 7)) # (8 + 3 + 2 + 12) / 4 = 6.25
6.25
>>> print(compute_single_ert(r, 6.1)) # (8 + 6 + 7 + 12) / 4 = 8.25
8.25
>>> print(compute_single_ert(r, 6)) # (8 + 6 + 7 + 12) / 4 = 8.25
8.25
>>> print(compute_single_ert(r, 5.7)) # (8 + 6 + 9 + 12) / 4 = 8.75
8.75
>>> print(compute_single_ert(r, 5)) # (8 + 6 + 9 + 12) / 4 = 8.75
8.75
>>> print(compute_single_ert(r, 4.2)) # (8 + 6 + 9 + 12) / 3 = 11.666...
11.666666666666666
>>> print(compute_single_ert(r, 4)) # (8 + 6 + 9 + 12) / 3 = 11.666...
11.666666666666666
>>> print(compute_single_ert(r, 3.8)) # (8 + 6 + 9 + 12) / 2 = 17.5
17.5
>>> print(compute_single_ert(r, 3)) # (8 + 6 + 9 + 12) / 2 = 17.5
17.5
>>> print(compute_single_ert(r, 2.9))
inf
>>> print(compute_single_ert(r, 2))
inf
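The numbers in the doctest above follow directly from the definition given in the module description: sum up the time (here: FEs) that every run spent before reaching the goal (unsuccessful runs contribute their whole budget) and divide by the number of successful runs. The helper below is only an illustration of this estimator, not the actual implementation; for the goal value 4, run 1 exhausts its 8 logged FEs without succeeding, while runs 2, 3, and 4 succeed after 6, 9, and 12 FEs.

def ert(consumed_time: list[int], reached_goal: list[bool]) -> float:
    # ERT = (total time consumed by all runs) / (number of successful runs)
    successes = sum(reached_goal)
    return sum(consumed_time) / successes if successes > 0 else float("inf")

print(ert([8, 6, 9, 12], [False, True, True, True]))  # 11.666..., as above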
float
| None
, default: None
) – the lower bound for the objective valuebool
, default: True
) – should we use the default lower boundsCallable
[[Ert
], Any
]) – the destination to which the new records will be passed, can be the append method of a list
bool
, default: False
) – should the Ert be aggregated over all algorithmsbool
, default: False
) – should the Ert be aggregated over all algorithmsbool
, default: False
) – should the statistics be aggregated over all objective functions?bool
, default: False
) – should the statistics be aggregated over all encodings?
moptipy.evaluation.ertecdf module¶
ecdf
) is a function that shows the fraction of runs that were successful in attaining a certain goal objective value over the time. The (empirically estimated) Expected Running Time (ERT, see ert
) is a function that tries to give an estimate how long a given algorithm setup will need (y-axis) to achieve given solution qualities (x-axis). It uses a set of runs of the algorithm on the problem to make this estimate under the assumption of independent restarts.Ecdf
Union
[int
, float
, Callable
[[str
], int
| float
], Iterable
[Union
[int
, float
, Callable
]], None
], default: None
) – one or multiple goal valuesCallable
[[Ecdf
], Any
]) – the destination to which the new records will be passed, can be the append method of a list
bool
, default: False
) – should the Ecdf be aggregated over all algorithmsbool
, default: False
) – should the Ecdf be aggregated over all objective functionsbool
, default: False
) – should the Ecdf be aggregated over all encodings
moptipy.evaluation.frequency module¶
moptipy.evaluation.frequency module¶
A variant of from_logs() that aggregates results on a per-instance and/or per-algorithm-instance combination. The basic process of loading the data is described in from_logs().
(str) – the path to parse
(Callable[[MultiRunData, Counter[int | float]], Any]) – the consumer receiving the aggregated results
(bool, default: True) – pass results to the consumer that are aggregated over all algorithms and setups and runs for a given instance
(bool, default: True) – pass results to the consumer that are aggregated over all runs and setups for a given algorithm-instance combination
(bool, default: True) – see from_logs()
(bool, default: False) – see from_logs()
(bool, default: False) – see from_logs()
(bool, default: True) – see from_logs()
(bool, default: False) – see from_logs()
(Callable[[str], Iterable[int | float]], default: <lambda>) – see from_logs()
If report_h is True, the values of the H section are reported, which is written by algorithms such as fea1plus1. Such a section contains tuples of objective values and encounter frequencies. These encounter frequencies are added to the counter. This means that if you set both report_progress and report_h to True, you will get frequencies that are too high. Finally, the function per_instance_known may return a set of known objective values for a given instance (based on its parameter, the instance name). Each such objective value will have a frequency of at least 1.
If an algorithm does not write an H section (unlike, e.g., fea1plus1), then we can set report_progress to True and everything else to False to get a similar result, but the encounter frequencies then depend on the selection scheme. Alternatively, if we only care about whether an objective value was encountered or not, we can simply set both to True. Finally, if we want to get all possible objective values, then we may also set report_goal_f, report_lower_bound, or report_upper_bound to True if we are sure that the corresponding objective values do actually exist (and are not just bounds that can never be reached).
(str) – the path to parse
(Callable[[PerRunData, Counter[int | float]], Any]) – the consumer receiving, for each log file, an instance of PerRunData identifying the run and a dictionary with the objective values and lower bounds of their existence or encounter frequency. Warning: The dictionary will be cleared and re-used for all files.
(bool, default: True) – should all values in the PROGRESS section be reported, if such a section exists?
(bool, default: False) – should the lower bound be reported, if any lower bound for the objective function is listed?
(bool, default: False) – should the upper bound be reported, if any upper bound for the objective function is listed?
(bool, default: True) – should all values in the H section be reported, if such a section exists?
(bool, default: False) – should we report the goal objective value, if it is specified?
(Callable[[str], Iterable[int | float]], default: <lambda>) – a function that returns a set of known objective values per instance
A variant of aggregate_from_logs() that collects the existing objective values and prints an overview to a file.
(str) – the path to parse
(str) – the output file to generate
(bool, default: True) – pass results to the consumer that are aggregated over all algorithms and setups and runs for a given instance
(bool, default: True) – pass results to the consumer that are aggregated over all runs and setups for a given algorithm-instance combination
(bool, default: False) – see from_logs()
(bool, default: False) – see from_logs()
(bool, default: False) – see from_logs()
(Callable[[str], Iterable[int | float]], default: <lambda>) – see from_logs()
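A hedged usage sketch of the per-run variant described above: the keyword names report_progress, report_h, and per_instance_known are taken from the text, while the import locations, the positional path argument, and the consumer signature are assumptions that should be verified against the actual module.

# Illustrative only; verify the exact signature in moptipy.evaluation.frequency.
from collections import Counter
from moptipy.evaluation.base import PerRunData
from moptipy.evaluation.frequency import from_logs  # assumed import path

def consumer(run: PerRunData, frequencies: Counter) -> None:
    # The Counter is re-used across files, so copy anything you want to keep.
    print(run.algorithm, run.instance, len(frequencies), "distinct objective values")

from_logs(
    "results",              # assumed: the path to parse
    consumer,               # assumed: the consumer callable
    report_progress=True,   # take values from the PROGRESS section
    report_h=False,         # avoid double-counting if PROGRESS is already reported
    per_instance_known=lambda inst: (),  # no externally known objective values
)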
moptipy.evaluation.ioh_analyzer module¶
This module provides the function moptipy_to_ioh_analyzer(), which converts the data generated by the moptipy experimentation function run_experiment() to the format that the IOHanalyzer understands, as documented at https://iohprofiler.github.io/IOHanalyzer/data/.
(str) – the directory where we can find the results in moptipy format
(str) – the directory where we would write the IOHanalyzer-style data
(Callable[[str], str], default: __prefix) – convert the instance name to a function ID
(Callable[[str], int], default: __int_suffix) – convert an instance name to a function dimension
(Callable[[str], int], default: <lambda>) – convert the instance name to an instance ID, which must be a positive integer number
(str, default: 'moptipy') – the suite name
(str, default: 'plainF') – the objective name
(dict[str, int | float] | None, default: None) – a dictionary mapping instances to standard values
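A minimal usage sketch: based on the description above, the first two arguments are the moptipy results directory and the destination directory for the IOHanalyzer-style data. The import path and the purely positional call are assumptions; the remaining parameters keep their defaults.

# Illustrative only; verify the exact signature in moptipy.evaluation.ioh_analyzer.
from moptipy.evaluation.ioh_analyzer import moptipy_to_ioh_analyzer  # assumed import path

# Convert a moptipy results directory into the IOHanalyzer data layout,
# which follows https://iohprofiler.github.io/IOHanalyzer/data/.
moptipy_to_ioh_analyzer("results", "ioh_data")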
moptipy.evaluation.log_parser module¶
The Execution and experiment-running facility (run_experiment()) uses the class Logger from module logger to produce log files complying with https://thomasweise.github.io/moptipy/#log-files.
LogParser. It works similar to SAX-XML parsing in that the data is read from files and methods that consume the data are invoked. By overwriting these methods, we can do useful things with the data.
For example, in module end_results, the method from_logs() can load EndResult records from the logs, and the method from_logs() in module progress reads the whole Progress that the algorithms make over time.
LogParser
(Path) – the file path
parse_file() should return True).
object
FileLogger. It can also recursively parse directories.
parse_dir(). If it returns True, every sub-directory inside of it will be passed to start_dir() and every file will be passed to start_file().
parse_file(), which is the entry point for the file parsing process.
parse_file()
end_file() if there is no more section in the file.
start_section(), False if the parsing process can be terminated, in which case we will fast-forward to end_file()
If path identifies a file, then parse_file() is invoked and its result is returned. If path identifies a directory, then parse_dir() is invoked and its result is returned.
(str) – a path identifying either a directory or a file
(str) – the path to the directory
start_dir() returned False or end_dir() returned True, False otherwise
start_file() to see whether the file should be parsed. If start_file() returns True, then the file is parsed. If start_file() returns False, then this method returns False directly. If the file is parsed, then start_section() will be invoked for each section (until the parsing is finished) and lines() for each section content (if requested). At the end, end_file() is invoked.
parse_dir(). In the latter case, if parse_file() returned True, the next file in the current directory will be parsed. If it returns False, then no file located in the current directory will be parsed, while other directories and/or sub-directories will still be processed.
(str) – the file to parse
end_file()
parse_dir(). If it returns True, every sub-directory inside of it will be passed to start_dir() and every file will be passed to start_file(). Only if True is returned will end_dir() be invoked, and its return value will be the return value of parse_dir(). If False is returned, then parse_dir() will return immediately and return True.
parse_file(). If it returns True, then we will open and parse the file. If it returns False, then the file will not be parsed and parse_file() will return True immediately.
(Path) – the file path
parse_file() should return True).
lines(). If this method returns False, then the section will be skipped and we fast-forward to the next section, if any, or to the call of end_file().
(str) – the section title
lines(), False if the section can be skipped. In that case, we will fast-forward to the next start_section().
ExperimentParser
process() method.
process() method to process the parsed data.
int | None¶
int | None¶
end_file() is invoked if the end of the parsing process is reached. By now, all the data should have been loaded and it can be passed on to wherever it should be passed to.
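To illustrate the SAX-like flow described above, here is a hedged sketch of a parser subclass. The method names start_file, start_section, and end_file appear in the text, but their exact signatures and the constructor of the base class are assumptions that should be checked against LogParser before use.

# Illustrative sketch only; verify against moptipy.evaluation.log_parser.LogParser.
from moptipy.evaluation.log_parser import LogParser  # assumed import path

class SectionCounter(LogParser):
    """Count how many sections appear in the parsed log files."""

    def __init__(self) -> None:
        super().__init__()  # assuming the base constructor needs no required arguments
        self.sections = 0

    def start_file(self, path) -> bool:
        # Returning True means the file should actually be parsed.
        return True

    def start_section(self, title: str) -> bool:
        # Returning True would request the section body via lines(); we only count.
        self.sections += 1
        return False

    def end_file(self) -> bool:
        # Returning True continues with the next file in the directory.
        return True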
moptipy.evaluation.plot_ecdf module¶
The empirical cumulative distribution function (ECDF, see ecdf) is a function that shows the fraction of runs that were successful in attaining a certain goal objective value over time. The combination of ERT and ECDF is discussed in ertecdf.
(Union[AxisRanger, Callable[[str], AxisRanger]], default: AxisRanger.for_axis) – the x_axis ranger
(Union[AxisRanger, Callable[[str], AxisRanger]], default: AxisRanger.for_axis) – the y_axis ranger
(bool, default: True) – should we plot the legend?
(Callable[[int], Any], default: distinct_colors) – the function returning the palette
(Callable[[int], Any], default: distinct_line_dashes) – the function returning the line styles
(Callable[[int], float], default: importance_to_line_width) – the function converting importance values to line widths
(Callable[[int], float], default: importance_to_alpha) – the function converting importance values to alphas
(Callable[[int], float], default: importance_to_font_size) – the function converting importance values to font sizes
(bool, default: True) – should we have a grid along the x-axis?
(bool, default: True) – should we have a grid along the y-axis?
(Union[None, str, Callable[[str], str]], default: <lambda>) – a callable returning the label for the x-axis, a label string, or None if no label should be put
(bool, default: True) – put the x-axis label inside the plot (so that it does not consume additional vertical space)
(Union[None, str, Callable[[str], str]], default: Lang.translate_func) – a callable returning the label for the y-axis, a label string, or None if no label should be put
(bool, default: True) – put the y-axis label inside the plot (so that it does not consume additional horizontal space)
(float, default: 5.0) – the style priority for algorithms
(float, default: 0.333) – the style priority for goal values
(Callable[[str], str], default: <lambda>) – the name function for algorithms; receives an algorithm ID and returns an algorithm name; default=identity function
(bool, default: True) – if only a single group of data was found, use algorithms as group and put them in the legend
(Callable[[str], Any], default: <lambda>) – the sort key function for algorithms
(Callable[[str], Any], default: <lambda>) – the sort key function for goals
moptipy.evaluation.plot_end_results module¶
(Iterable[EndResult]) – the iterable of end results
(str, default: 'scaledF') – the dimension to display
(Union[AxisRanger, Callable[[str], AxisRanger]], default: AxisRanger.for_axis) – the y_axis ranger
(Callable[[int], Any], default: distinct_colors) – the function returning the palette
(Callable[[int], float], default: importance_to_line_width) – the function converting importance values to line widths
(Callable[[int], float], default: importance_to_font_size) – the function converting importance values to font sizes
(bool, default: True) – should we have a grid along the y-axis?
(bool, default: True) – should we have a grid along the x-axis?
(Union[None, str, Callable[[str], str]], default: Lang.translate) – a callable returning the label for the x-axis, a label string, or None if no label should be put
(bool, default: True) – put the x-axis label inside the plot (so that it does not consume additional vertical space)
(float, default: 1.0) – the location of the x-label
(Union[None, str, Callable[[str], str]], default: Lang.translate) – a callable returning the label for the y-axis, a label string, or None if no label should be put
(bool, default: True) – put the y-axis label inside the plot (so that it does not consume additional horizontal space)
(float, default: 0.5) – the location of the y-label
(str, default: 'best') – the legend position
(Callable[[str], Any], default: <lambda>) – the sort key function for instances
(Callable[[str], Any], default: <lambda>) – the sort key function for algorithms
(Callable[[str], str], default: <lambda>) – the name function for instances; receives an instance ID and returns an instance name; default=identity function
(Callable[[str], str], default: <lambda>) – the name function for algorithms; receives an algorithm ID and returns an algorithm name; default=identity function
moptipy.evaluation.plot_end_statistics_over_parameter module¶
(Iterable[EndStatistics]) – the iterable of EndStatistics
(Callable[[EndStatistics], int | float]) – the function computing the x-value for each statistics object
(str, default: 'scaledF.geom') – the dimension to be plotted along the y-axis
(Callable[[EndStatistics], str | None], default: <lambda>) – the algorithm getter
(Callable[[EndStatistics], str | None], default: <lambda>) – the instance getter
(Union[AxisRanger, Callable[[], AxisRanger]], default: AxisRanger) – the x_axis ranger
(Union[AxisRanger, Callable[[str], AxisRanger]], default: __make_y_axis) – the y_axis ranger
(bool, default: True) – should we plot the legend?
(str, default: 'upper right') – the legend position
(Callable[[int], Any], default: distinct_colors) – the function returning the palette
(Callable[[int], Any], default: distinct_line_dashes) – the function returning the line styles
(Callable[[int], float], default: importance_to_line_width) – the function converting importance values to line widths
(Callable[[int], float], default: importance_to_font_size) – the function converting importance values to font sizes
(bool, default: True) – should we have a grid along the x-axis?
(bool, default: True) – should we have a grid along the y-axis?
(str | None, default: None) – the label for the x-axis or None if no label should be put
(bool, default: True) – put the x-axis label inside the plot (so that it does not consume additional vertical space)
(float, default: 0.5) – the location of the x-axis label
(Union[None, str, Callable[[str], str]], default: __make_y_label) – a callable returning the label for the y-axis, a label string, or None if no label should be put
(bool, default: True) – put the y-axis label inside the plot (so that it does not consume additional horizontal space)
(float, default: 1.0) – the location of the y-axis label
(float, default: 0.666) – the style priority for instances
(float, default: 0.333) – the style priority for algorithms
(float, default: 0.0) – the style priority for statistics
(Callable[[str], Any], default: <lambda>) – the sort key function for instances
(Callable[[str], Any], default: <lambda>) – the sort key function for algorithms
(Callable[[str], str], default: <lambda>) – the name function for instances; receives an instance ID and returns an instance name; default=identity function
(Callable[[str], str], default: <lambda>) – the name function for algorithms; receives an algorithm ID and returns an algorithm name; default=identity function
(Callable[[str], str], default: <lambda>) – the sort key function for statistics
(bool, default: True) – if only a single group of data was found, use algorithms as group and put them in the legend
moptipy.evaluation.plot_ert module¶
This module plots Ert objects into one figure. The (empirically estimated) Expected Running Time (ERT, see ert) is a function that tries to give an estimate of how long a given algorithm setup will need (y-axis) to achieve given solution qualities (x-axis). It uses a set of runs of the algorithm on the problem to make this estimate under the assumption of independent restarts.
(Union[AxisRanger, Callable[[str], AxisRanger]], default: AxisRanger.for_axis) – the x_axis ranger
(Union[AxisRanger, Callable[[str], AxisRanger]], default: AxisRanger.for_axis) – the y_axis ranger
(bool, default: True) – should we plot the legend?
(Callable[[int], Any], default: distinct_colors) – the function returning the palette
(Callable[[int], Any], default: distinct_line_dashes) – the function returning the line styles
(Callable[[int], float], default: importance_to_line_width) – the function converting importance values to line widths
(Callable[[int], float], default: importance_to_alpha) – the function converting importance values to alphas
(Callable[[int], float], default: importance_to_font_size) – the function converting importance values to font sizes
(bool, default: True) – should we have a grid along the x-axis?
(bool, default: True) – should we have a grid along the y-axis?
(Union[None, str, Callable[[str], str]], default: Lang.translate) – a callable returning the label for the x-axis, a label string, or None if no label should be put
(bool, default: True) – put the x-axis label inside the plot (so that it does not consume additional vertical space)
(Union[None, str, Callable[[str], str]], default: Lang.translate_func) – a callable returning the label for the y-axis, a label string, or None if no label should be put
(bool, default: True) – put the y-axis label inside the plot (so that it does not consume additional horizontal space)
(Callable[[str], Any], default: <lambda>) – the sort key function for instances
(Callable[[str], Any], default: <lambda>) – the sort key function for algorithms
(Callable[[str], str], default: <lambda>) – the name function for instances; receives an instance ID and returns an instance name; default=identity function
(Callable[[str], str], default: <lambda>) – the name function for algorithms; receives an algorithm ID and returns an algorithm name; default=identity function
(float, default: 0.666) – the style priority for instances
(float, default: 0.333) – the style priority for algorithms
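Independently of the styling machinery listed above, the underlying picture is simple: estimated expected running times on the y-axis over goal objective values on the x-axis, usually drawn as a step curve on a logarithmic time scale. The following matplotlib sketch uses made-up data and only standard matplotlib calls; it is not the plotting function of this module.

import matplotlib.pyplot as plt

# Made-up ERT data: goal objective values (x) and the estimated expected
# running time in FEs needed to reach them (y); harder goals cost more time.
goals = [100, 50, 20, 10, 5]
erts = [1e2, 5e2, 4e3, 6e4, 2e6]

fig, ax = plt.subplots()
ax.step(goals, erts, where="post", label="algorithm A on instance I")
ax.set_xlabel("goal objective value f")
ax.set_ylabel("ERT [FEs]")
ax.set_yscale("log")
ax.legend()
fig.savefig("ert_sketch.svg")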
moptipy.evaluation.plot_progress module¶
(Iterable[Progress | StatRun]) – the iterable of progresses and statistical runs
(Union[AxisRanger, Callable[[str], AxisRanger]], default: AxisRanger.for_axis) – the x_axis ranger
(Union[AxisRanger, Callable[[str], AxisRanger]], default: AxisRanger.for_axis) – the y_axis ranger
(bool, default: True) – should we plot the legend?
(Callable[[int], Any], default: distinct_colors) – the function returning the palette
(Callable[[int], Any], default: distinct_line_dashes) – the function returning the line styles
(Callable[[int], float], default: importance_to_line_width) – the function converting importance values to line widths
(Callable[[int], float], default: importance_to_alpha) – the function converting importance values to alphas
(Callable[[int], float], default: importance_to_font_size) – the function converting importance values to font sizes
(bool, default: True) – should we have a grid along the x-axis?
(bool, default: True) – should we have a grid along the y-axis?
(Union[None, str, Callable[[str], str]], default: Lang.translate) – a callable returning the label for the x-axis, a label string, or None if no label should be put
(bool, default: True) – put the x-axis label inside the plot (so that it does not consume additional vertical space)
(float, default: 0.5) – the location of the x-axis label
(Union[None, str, Callable[[str], str]], default: Lang.translate) – a callable returning the label for the y-axis, a label string, or None if no label should be put
(bool, default: True) – put the y-axis label inside the plot (so that it does not consume additional horizontal space)
(float, default: 1.0) – the location of the y-axis label
(float, default: 0.666) – the style priority for instances
(float, default: 0.333) – the style priority for algorithms
(float, default: 0.0) – the style priority for statistics
(Callable[[str], Any], default: <lambda>) – the sort key function for instances
(Callable[[str], Any], default: <lambda>) – the sort key function for algorithms
(Callable[[str], Any], default: <lambda>) – the sort key function for statistics
(bool, default: True) – if only a single group of data was found, use algorithms as group and put them in the legend
(Callable[[str], str], default: <lambda>) – the name function for instances; receives an instance ID and returns an instance name; default=identity function
(Callable[[str], str], default: <lambda>) – the name function for algorithms; receives an algorithm ID and returns an algorithm name; default=identity function
moptipy.evaluation.progress module¶
Progress holds one time vector and an objective value (f) vector. The time dimension (stored in time_unit) can either be in FEs or in milliseconds, and the objective value dimension (stored in f_name) can be raw objective values, standardized objective values, or normalized objective values. The two vectors together thus describe how a run of an optimization algorithm improves the objective value over time.
PerRunData
For each log file parsed, one Progress is created and appended to the collector. If path identifies a directory, then this directory is parsed recursively; for each log file found, one record is passed to the consumer. The consumer is simply a callable function. You could pass in the append method of a list.
(str) – the path to parse
(Callable[[Progress], Any]) – the consumer, can be the append method of a list
(str, default: 'FEs') – the time unit
(str, default: 'plainF') – the objective name
(dict[str, int | float] | None, default: None) – a dictionary mapping instances to standard values
(bool, default: True) – enforce that f-values should be improving and time values increasing
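A hedged usage sketch of the collection pattern described above, with the consumer being simply the append method of a list. The import path, whether from_logs is exposed as a classmethod of Progress, and the positional argument order (path first, consumer second) are assumptions to be verified.

# Illustrative only; verify the signature in moptipy.evaluation.progress.
from moptipy.evaluation.progress import Progress  # assumed import path

progresses: list[Progress] = []
# Parse all log files under "results" recursively; every parsed run becomes
# one Progress record handed to the consumer, here list.append.
Progress.from_logs("results", progresses.append)
print(len(progresses), "runs loaded")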
moptipy.evaluation.selector module¶
The function select_consistent() offered by this module provides the functionality to make such a selection. It may be a bit slow, but hopefully it will pick the largest possible consistent sub-selection or, at least, get close to it.
PerRunData
(Iterable[TypeVar(T, bound=PerRunData)]) – the source data
(bool, default: True) – shall we log the progress?
(bool, default: True) – use the slower method which may give us more data
list[TypeVar(T, bound=PerRunData)]
>>> def __p(x) -> str:
... return (f"{x.algorithm}/{x.instance}/{x.objective}/{x.encoding}/"
... f"{x.rand_seed}")
>>> a1i1o1e1s1 = PerRunData("a1", "i1", "o1", "e1", 1)
>>> a1i1o1e1s2 = PerRunData("a1", "i1", "o1", "e1", 2)
>>> a1i1o1e1s3 = PerRunData("a1", "i1", "o1", "e1", 3)
>>> a1i2o1e1s1 = PerRunData("a1", "i2", "o1", "e1", 1)
>>> a1i2o1e1s2 = PerRunData("a1", "i2", "o1", "e1", 2)
>>> a1i2o1e1s3 = PerRunData("a1", "i2", "o1", "e1", 3)
>>> a2i1o1e1s1 = PerRunData("a2", "i1", "o1", "e1", 1)
>>> a2i1o1e1s2 = PerRunData("a2", "i1", "o1", "e1", 2)
>>> a2i1o1e1s3 = PerRunData("a2", "i1", "o1", "e1", 3)
>>> a2i2o1e1s1 = PerRunData("a1", "i2", "o1", "e1", 1)
>>> a2i2o1e1s2 = PerRunData("a2", "i2", "o1", "e1", 2)
>>> a2i2o1e1s3 = PerRunData("a2", "i2", "o1", "e1", 3)
>>> list(map(__p, select_consistent((
... a1i1o1e1s1, a1i1o1e1s2, a1i1o1e1s3,
... a1i2o1e1s1, a1i2o1e1s2, a1i2o1e1s3,
... a2i1o1e1s1, a2i1o1e1s2,
... a2i2o1e1s2, a2i2o1e1s3))))
['a1/i1/o1/e1/1', 'a1/i1/o1/e1/2', 'a1/i2/o1/e1/2', 'a1/i2/o1/e1/3', 'a2/i1/o1/e1/1', 'a2/i1/o1/e1/2', 'a2/i2/o1/e1/2', 'a2/i2/o1/e1/3']
>>> list(map(__p, select_consistent((
... a1i1o1e1s2, a1i1o1e1s3,
... a1i2o1e1s1, a1i2o1e1s2, a1i2o1e1s3,
... a2i1o1e1s1, a2i1o1e1s2,
... a2i2o1e1s2, a2i2o1e1s3))))
['a1/i2/o1/e1/2', 'a1/i2/o1/e1/3', 'a2/i2/o1/e1/2', 'a2/i2/o1/e1/3']
>>> list(map(__p, select_consistent((
... a1i1o1e1s2, a1i1o1e1s3,
... a1i2o1e1s1, a1i2o1e1s2, a1i2o1e1s3,
... a2i1o1e1s1, a2i1o1e1s2,
... a2i2o1e1s2))))
['a1/i1/o1/e1/2', 'a1/i2/o1/e1/2', 'a2/i1/o1/e1/2', 'a2/i2/o1/e1/2']
>>> list(map(__p, select_consistent((
... a1i1o1e1s1, a1i1o1e1s2, a1i1o1e1s3,
... a2i2o1e1s1, a2i2o1e1s2, a2i2o1e1s3))))
['a1/i1/o1/e1/1', 'a1/i1/o1/e1/2', 'a1/i1/o1/e1/3']
>>> list(map(__p, select_consistent((
... a1i1o1e1s1, a1i1o1e1s2, a1i1o1e1s3,
... a2i1o1e1s1, a2i1o1e1s2, a2i1o1e1s3))))
['a1/i1/o1/e1/1', 'a1/i1/o1/e1/2', 'a1/i1/o1/e1/3', 'a2/i1/o1/e1/1', 'a2/i1/o1/e1/2', 'a2/i1/o1/e1/3']
>>> list(map(__p, select_consistent((
... a1i1o1e1s1, a1i1o1e1s2, a1i2o1e1s2, a1i2o1e1s3))))
['a1/i1/o1/e1/1', 'a1/i1/o1/e1/2', 'a1/i2/o1/e1/2', 'a1/i2/o1/e1/3']
>>> list(map(__p, select_consistent((
... a1i1o1e1s1, a1i1o1e1s2, a1i2o1e1s2))))
['a1/i1/o1/e1/1', 'a1/i1/o1/e1/2']
>>> list(map(__p, select_consistent((
... a1i1o1e1s1, a2i1o1e1s2))))
['a1/i1/o1/e1/1']
>>> list(map(__p, select_consistent((
... a1i1o1e1s1, a2i1o1e1s2, a2i1o1e1s3))))
['a2/i1/o1/e1/2', 'a2/i1/o1/e1/3']
>>> try:
... select_consistent((a1i1o1e1s1, a1i1o1e1s2, a1i2o1e1s2, a1i2o1e1s2))
... except ValueError as ve:
... print(ve)
Found 4 records but only 3 different keys!
>>> try:
... select_consistent(1)
... except TypeError as te:
... print(te)
data should be an instance of typing.Iterable but is int, namely 1.
>>> try:
... select_consistent((a2i1o1e1s2, a2i1o1e1s3), 3)
... except TypeError as te:
... print(te)
log should be an instance of bool but is int, namely 3.
>>> try:
... select_consistent({234})
... except TypeError as te:
... print(te)
dataElement should be an instance of moptipy.evaluation.base.PerRunData but is int, namely 234.
>>> try:
... select_consistent((a2i1o1e1s2, a2i1o1e1s3), True, 4)
... except TypeError as te:
... print(te)
thorough should be an instance of bool but is int, namely 4.
moptipy.evaluation.stat_run module¶
MultiRun2DData
(Union[str, Iterable[str]]) – the statistics that should be computed per group
(Callable[[StatRun], Any]) – the destination to which the new stat runs will be passed, can be the append method of a list
(bool, default: False) – should the statistics be aggregated over all algorithms?
(bool, default: False) – should the statistics be aggregated over all instances?
(bool, default: False) – should the statistics be aggregated over all objective functions?
(bool, default: False) – should the statistics be aggregated over all encodings?
(PerRunData | MultiRunData) – the object
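Conceptually, a statistics run condenses several progress curves into one curve per group, for example the arithmetic mean of the best-so-far objective value at selected points in time. The doctest-style sketch below illustrates that idea on made-up step data; it is not the StatRun implementation of this module.
>>> times = [1, 2, 4, 8, 16]            # common time grid (e.g., FEs)
>>> run_a = [90, 70, 55, 40, 40]        # best-so-far f of run A at those times
>>> run_b = [95, 80, 60, 40, 35]        # best-so-far f of run B at those times
>>> [(a + b) / 2 for a, b in zip(run_a, run_b)]  # the "arithmetic mean" curve
[92.5, 75.0, 57.5, 40.0, 37.5]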
moptipy.evaluation.styler module¶
object
moptipy.evaluation.tabulate_end_results module¶
This module provides tabulate_end_results() to tabulate end results.
(str) – the column identifier
(bool, default: True) – surround the command with $
(Callable[[bool], str], default: <lambda>) – the name function for the key "summary"
(Callable[[bool], str], default: <lambda>) – the name function for the key KEY_ALGORITHM
(str) – the column name
Tables can be rendered as Markdown, LaTeX, and HTML.
(str, default: 'table') – the base file name
(str, default: '.') – the base directory
(Iterable[str], default: ('bestF.min', 'bestF.mean', 'bestF.sd', 'bestFscaled.mean', 'lastImprovementFE.mean', 'lastImprovementTimeMillis.mean')) – the statistics to print
(Optional[Iterable[str | None]], default: ('bestFscaled.min', 'bestFscaled.geom', 'bestFscaled.max', 'bestFscaled.sd', 'lastImprovementFE.mean', 'lastImprovementTimeMillis.mean')) – the summary statistics to print per algorithm
(Union[TextFormatDriver, Callable[[], TextFormatDriver]], default: Markdown.instance) – the text format driver
(Callable[[str], Any], default: <lambda>) – a function returning sort keys for algorithms
(Callable[[str], Any], default: <lambda>) – a function returning sort keys for instances
(Callable[[str], str], default: default_column_namer) – the column namer function
(Callable[[str], Callable[[Iterable[int | float | None]], int | float]], default: default_column_best) – the column-best getter function
(Callable[[str], NumberRenderer], default: default_number_renderer) – the number renderer for the column
(bool, default: True) – should we put the lower bound or goal objective value?
(Optional[Callable[[EndStatistics], int | float | None]], default: an internal getter) – the getter for the lower bound
(str | None, default: 'lower_bound') – the name key for the lower bound to be passed to col_namer
(bool, default: True) – should we use the language to define the filename?
(Callable[[str], str], default: <lambda>) – the name function for instances; receives an instance ID and returns an instance name; default=identity function
(Callable[[str], str], default: <lambda>) – the name function for algorithms; receives an algorithm ID and returns an algorithm name; default=identity function
moptipy.evaluation.tabulate_end_stats module¶
(Iterable[EndStatistics]) – the source data
(Callable[[int], TextIO | TextIOBase]) – the destination generator
(int, default: 3) – the number of times we can wrap a table
(int, default: 50) – the maximum rows per destination
(Iterable[tuple[Callable[[EndStatistics], int | float | None], str, bool, Callable[[int | float], str]]], default: ((getter.__combo_no_sd, '\\bestFmean', True, <lambda>),)) – the set of statistics: tuples of statistic, title, whether minimization or maximization, and a to-string converter
(Callable[[EndStatistics], str], default: <lambda>) – get the instance identifier
(Callable[[str], str], default: <lambda>) – get the instance name, as it should be printed
(Callable[[str], Any], default: <lambda>) – get the sort key for the instance
(Callable[[EndStatistics], str], default: <lambda>) – get the algorithm identifier
(Callable[[str], str], default: <lambda>) – get the algorithm name, as it should be printed
(Callable[[str], Any], default: <lambda>) – get the sort key for the algorithm
(Iterable[tuple[str, Callable[[str], str]]], default: ()) – the fixed instance columns
(Callable[[str], str], default: <lambda>) – format the best value
(str, default: 'instance') – the header for the instance
(str | None, default: '\\nBest') – the header for the best count
moptipy.evaluation.tabulate_result_tests module¶
This module provides the function tabulate_result_tests() for creating statistical comparison tables. tabulate_result_tests() can compare two or more algorithms on multiple problem instances by using the Mann-Whitney U test [1-3] with the Bonferroni correction [4].
The input are EndResult records, each of which represents the result of one run of one algorithm on one instance. This function performs a two-tailed Mann-Whitney U test for each algorithm pair on each problem instance to see if the performances are statistically significantly different. The results of these tests are tabulated, together with their p-values, i.e., the probabilities that the observed differences would occur if the two algorithms would perform the same.
Tables can be rendered as Markdown, LaTeX, and HTML.
(str, default: 'tests') – the base file name
(str, default: '.') – the base directory
(float, default: 0.02) – the threshold at which the two-tailed test result is accepted
(Union[TextFormatDriver, Callable[[], TextFormatDriver]], default: Markdown.instance) – the text format driver
(Callable[[str], Any], default: <lambda>) – a function returning sort keys for algorithms
(Callable[[str], Any], default: <lambda>) – a function returning sort keys for instances
(Callable[[str], str], default: <lambda>) – the name function for instances; receives an instance ID and returns an instance name; default=identity function
(Callable[[str], str], default: <lambda>) – the name function for algorithms; receives an algorithm ID and returns an algorithm name; default=identity function
(bool, default: False) – should we use the language to define the filename?
(NumberRenderer, default: a shared NumberRenderer instance) – the renderer for all probabilities
(Callable[[EndResult], int | float], default: EndResult.get_best_f) – the getter for the values that should be compared. By default, the best obtained objective values are compared. However, if you let the runs continue until they reach a certain goal quality, then you may want to compare the runtimes consumed until that quality is reached. Basically, you can use any of the getters provided by moptipy.evaluation.end_results.getter(), but you must take care that the comparison makes sense, i.e., compare qualities under fixed-budget scenarios (the default behavior) or compare runtimes under scenarios with goal qualities - but do not mix up the scenarios.
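The statistical machinery behind such a table can be reproduced with scipy: a two-tailed Mann-Whitney U test per algorithm pair, with the significance threshold divided by the number of tests (Bonferroni correction). The sketch below uses made-up end-result samples and scipy.stats.mannwhitneyu; it only illustrates the procedure, not the formatting done by tabulate_result_tests().

from itertools import combinations
from scipy.stats import mannwhitneyu

# Made-up best objective values of three algorithms on one instance (10 runs each).
results = {
    "rls": [12, 14, 11, 13, 15, 12, 14, 13, 12, 16],
    "ea": [10, 11, 10, 12, 11, 10, 13, 11, 10, 12],
    "rs": [20, 22, 19, 25, 21, 23, 20, 24, 22, 21],
}

alpha = 0.02
pairs = list(combinations(results, 2))
corrected_alpha = alpha / len(pairs)  # Bonferroni correction over all pairwise tests

for a, b in pairs:
    stat, p = mannwhitneyu(results[a], results[b], alternative="two-sided")
    verdict = "significant" if p < corrected_alpha else "not significant"
    print(f"{a} vs {b}: U={stat}, p={p:.4g} -> {verdict} at alpha={corrected_alpha:.4g}")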