Selecting input probability distributions

17 important questions on Selecting input probability distributions

What happens when the input distribution is inappropriate?

It leads to incorrect simulation output and, in turn, to bad decisions.

What are the approaches to use data to specify a distribution?

1. Trace-driven simulation: use the data themselves directly in the simulation.
2. Empirical distribution: define a distribution directly from the data (histogram).
3. Theoretical distribution: fit a standard distribution family to the data.
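
The three approaches above can be sketched side by side. This is a minimal illustration, not a definitive implementation: the `observed` values are made up, the empirical distribution is built as a piecewise-linear CDF between order statistics (one possible construction), and the exponential family is assumed for the fitted case purely as an example.

```python
import random
import statistics

# Hypothetical observed service times (minutes); illustrative values only.
observed = [2.1, 0.7, 3.4, 1.2, 0.4, 5.0, 1.8, 0.9, 2.6, 1.1]

def trace_driven(data):
    """Approach 1: replay the recorded values in their original order."""
    yield from data

def empirical_sample(data, rng):
    """Approach 2: draw from a piecewise-linear empirical CDF of the data."""
    xs = sorted(data)
    n = len(xs)
    # Pick a position along the n-1 segments between order statistics.
    pos = rng.random() * (n - 1)
    i = int(pos)
    return xs[i] + (pos - i) * (xs[i + 1] - xs[i])

def fitted_sample(data, rng):
    """Approach 3: fit an exponential distribution (assumed family) and sample."""
    rate = 1.0 / statistics.fmean(data)  # MLE of the exponential rate
    return rng.expovariate(rate)

rng = random.Random(42)
trace = list(trace_driven(observed))
emp = [empirical_sample(observed, rng) for _ in range(1000)]
fit = [fitted_sample(observed, rng) for _ in range(1000)]

print("trace max:", max(trace))
print("empirical max:", max(emp))   # never exceeds the largest observation
print("fitted max:", max(fit))      # can exceed the largest observation
```

The trace and the empirical samples stay inside the observed range, while the fitted distribution can produce values beyond it, which is the "fills in holes" property listed for theoretical distributions.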

What are the (dis)advantages of fitting a theoretical distribution?

+ Generalizable, fills in holes in data.
+ Scalable.
- May not be valid.
- Difficult when the data are a mixture of multiple distributions.

What are the (dis)advantages of using an empirical distribution?

- Irregularities.
+ Valid, observations representative.

What are the (dis)advantages of using trace-driven simulation?

- Simulation can only reproduce what happened historically.
- There is seldom enough data to make all desired simulation runs.
+ Valid w.r.t. real world.

By which parameters are theoretical distributions characterized?

- Location parameter: shifts the distribution along the axis.
- Scale parameter: compresses or expands the distribution.
- Shape parameter: determines, distinct from location and scale, the basic form of a distribution within the general family of distributions of interest.
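
The roles of the three parameter types can be seen numerically. The sketch below assumes a Weibull family as the example and uses Python's `random.weibullvariate(alpha, beta)`, where `alpha` is the scale and `beta` the shape; the helper `draws` and all parameter values are hypothetical.

```python
import random

def draws(location, scale, shape, n=5, seed=1):
    """Return n values of location + scale * Z, with Z ~ Weibull(scale=1, shape)."""
    rng = random.Random(seed)  # same seed, so every call uses the same random numbers
    return [location + scale * rng.weibullvariate(1.0, shape) for _ in range(n)]

base = draws(location=0.0, scale=1.0, shape=2.0)
shifted = draws(location=3.0, scale=1.0, shape=2.0)    # every value moves by +3
stretched = draws(location=0.0, scale=2.0, shape=2.0)  # every value is doubled
reshaped = draws(location=0.0, scale=1.0, shape=0.5)   # form of the distribution changes

for name, values in [("base", base), ("shifted", shifted),
                     ("stretched", stretched), ("reshaped", reshaped)]:
    print(name, [round(v, 3) for v in values])
```

Changing the location or scale is a linear transformation of the same draws; changing the shape is not, which is why the shape parameter is treated separately.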

What are the steps in the selection of theoretical distributions?

1. Hypothesize family of distributions.
2. Estimation of parameters.
3. Goodness-of-fit-tests.

What continuous distributions can we use? When can we use them?

- Uniform: quantity with bounds known.
- Triangular: rough model in absence of data.
- Exponential: interarrival times, failure times.
- Gamma: processing, repair times.
- Weibull: processing, repair times.
- Normal: errors, changes in stock price, sums of a large number of quantities.
- Lognormal: processing, repair times, products of a large number of quantities.
- Beta: rough model in absence of data, random fractions (defectives).
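
Each of the continuous families above has a generator in Python's standard `random` module. The sketch below draws one value from each; every parameter value is an arbitrary assumption chosen only for illustration.

```python
import random

rng = random.Random(7)

# Parameter values are arbitrary and for illustration only.
samples = {
    "uniform": rng.uniform(2.0, 5.0),             # lower and upper bound
    "triangular": rng.triangular(1.0, 4.0, 2.0),  # low, high, mode
    "exponential": rng.expovariate(0.5),          # rate 0.5, i.e. mean 2
    "gamma": rng.gammavariate(2.0, 1.5),          # shape 2, scale 1.5
    "weibull": rng.weibullvariate(1.5, 2.0),      # scale 1.5, shape 2
    "normal": rng.normalvariate(0.0, 1.0),        # mean 0, standard deviation 1
    "lognormal": rng.lognormvariate(0.0, 0.5),    # parameters of the underlying normal
    "beta": rng.betavariate(2.0, 5.0),            # two shape parameters, value in [0, 1]
}

for name, value in samples.items():
    print(f"{name:12s} {value:.4f}")
```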

What discrete distributions can we use? When can we use them?

- Bernoulli: flip of a coin, success versus failure.
- Uniform: quantity with bounds known.
- Binomial: number of defectives in batches, demand/batch sizes.
- Geometric: number of failures before first success, demand/batch sizes.
- Negative binomial: number of failures before nth success, demand/batch sizes.
- Poisson: number of events in a time interval, demand/batch sizes.
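
The descriptions above translate almost directly into generators built from Bernoulli trials and exponential interarrival times. This is a sketch for illustration, not an efficient implementation; all function names and parameter values are hypothetical.

```python
import random
import statistics

def bernoulli(rng, p):
    """1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if rng.random() < p else 0

def discrete_uniform(rng, low, high):
    """Integer between known bounds low and high, inclusive."""
    return rng.randint(low, high)

def binomial(rng, t, p):
    """Number of successes (e.g. defectives) in t independent trials."""
    return sum(bernoulli(rng, p) for _ in range(t))

def geometric(rng, p):
    """Number of failures before the first success."""
    failures = 0
    while not bernoulli(rng, p):
        failures += 1
    return failures

def negative_binomial(rng, s, p):
    """Number of failures before the s-th success."""
    return sum(geometric(rng, p) for _ in range(s))

def poisson(rng, rate):
    """Number of events in a unit time interval with exponential interarrival times."""
    count, clock = 0, rng.expovariate(rate)
    while clock <= 1.0:
        count += 1
        clock += rng.expovariate(rate)
    return count

rng = random.Random(11)
n = 5000
mean_binomial = statistics.fmean(binomial(rng, 10, 0.25) for _ in range(n))
mean_geometric = statistics.fmean(geometric(rng, 0.25) for _ in range(n))
mean_negbin = statistics.fmean(negative_binomial(rng, 3, 0.25) for _ in range(n))
mean_poisson = statistics.fmean(poisson(rng, 4.0) for _ in range(n))

# Theoretical means: t*p = 2.5, (1-p)/p = 3, s*(1-p)/p = 9, rate = 4.
print(mean_binomial, mean_geometric, mean_negbin, mean_poisson)
```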

What graphical techniques can we use to check the assumption of independent observations?

- Correlation plot.
- Scatter diagram.

How can we see from the scatter diagram whether there is independence?

- Independence if the points are scattered randomly throughout the first quadrant.
- Dependence if the points lie along a line with positive or negative slope in the first quadrant.
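
Both checks can be computed before plotting. The sketch below assumes the scatter diagram plots consecutive pairs (X_i, X_{i+1}) and the correlation plot shows the sample correlation at each lag; the estimator used is one common choice, and the two data series are synthetic.

```python
import random
import statistics

def lag_correlation(xs, lag):
    """Sample correlation between X_i and X_{i+lag} (one common estimator)."""
    n = len(xs)
    mean = statistics.fmean(xs)
    var = sum((x - mean) ** 2 for x in xs) / n
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag)) / (n - lag)
    return cov / var

def scatter_pairs(xs):
    """Points (X_i, X_{i+1}) for a scatter diagram."""
    return list(zip(xs, xs[1:]))

rng = random.Random(3)
independent = [rng.expovariate(1.0) for _ in range(2000)]

# Synthetic dependent series: each value carries over 80% of the previous one.
dependent = [independent[0]]
for e in independent[1:]:
    dependent.append(0.8 * dependent[-1] + 0.2 * e)

print("lag-1 correlation, independent:", round(lag_correlation(independent, 1), 3))
print("lag-1 correlation, dependent:  ", round(lag_correlation(dependent, 1), 3))
print("first scatter points:", scatter_pairs(independent)[:3])
```

Correlations near zero at every lag are consistent with independence; the dependent series shows a clearly positive lag-1 correlation, and its scatter points would cluster along an upward-sloping line.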

In what ways can we hypothesize families of distributions?

- Prior knowledge.
- Summary statistics (mean, median, coefficient of variation, skewness, etc.)
- Histograms.
- Quantile summaries.
- Box plots.
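
The summary statistics above are cheap to compute and already narrow the choice of family. A minimal sketch, assuming synthetic exponential data; the function name `summary` is hypothetical, and the skewness is the simple moment-based estimator.

```python
import random
import statistics

def summary(xs):
    """Mean, median, coefficient of variation, and moment-based skewness."""
    n = len(xs)
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return {
        "mean": mean,
        "median": statistics.median(xs),
        "cv": statistics.stdev(xs) / mean,  # standard deviation divided by mean
        "skewness": m3 / m2 ** 1.5,
    }

rng = random.Random(5)
data = [rng.expovariate(0.5) for _ in range(5000)]  # synthetic data, true mean 2
stats = summary(data)
for name, value in stats.items():
    print(f"{name:9s} {value:.3f}")
```

For these data the coefficient of variation is close to 1 and the mean exceeds the median with positive skewness, a pattern consistent with an exponential distribution and inconsistent with a symmetric family such as the normal.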

What are the advantages of maximum likelihood estimation?

+ Unique for most common distributions.
+ Asymptotically unbiased.
+ Invariant under transformation.
+ Asymptotically normally distributed.
+ Strongly consistent.
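
As a concrete case, the maximum likelihood estimator of the mean of an exponential distribution is the sample mean. The sketch below, on synthetic data, checks numerically that the log-likelihood is no higher at nearby values and illustrates invariance: the MLE of the rate is the reciprocal of the MLE of the mean. The function names are hypothetical.

```python
import math
import random
import statistics

def exp_log_likelihood(mean, xs):
    """Log-likelihood of an exponential distribution with the given mean."""
    return -len(xs) * math.log(mean) - sum(xs) / mean

def exp_mle_mean(xs):
    """MLE of the exponential mean: the sample mean."""
    return statistics.fmean(xs)

rng = random.Random(9)
data = [rng.expovariate(0.5) for _ in range(5000)]  # synthetic data, true mean 2

mean_hat = exp_mle_mean(data)
rate_hat = 1.0 / mean_hat  # invariance: transform the estimate, not the data

print("estimated mean:", round(mean_hat, 3))
print("estimated rate:", round(rate_hat, 3))
for candidate in (0.9 * mean_hat, mean_hat, 1.1 * mean_hat):
    print(round(candidate, 3), round(exp_log_likelihood(candidate, data), 2))
```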

What are the heuristic procedures of goodness-of-fit tests?

- Density-histogram plots.
- Frequency comparisons.
- Distribution-function-differences plot.
- Quantile-Quantile plot (amplifies differences in tails).
- Probability-probability plot (amplifies differences in the middle).
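
The points for the last two plots can be computed as follows. This sketch assumes a fitted exponential distribution, uses the plotting positions q_i = (i - 0.5)/n, and puts the model value on the horizontal axis; axis conventions differ between texts, and the function names are hypothetical.

```python
import math
import random
import statistics

def exp_cdf(x, mean):
    return 1.0 - math.exp(-x / mean)

def exp_quantile(q, mean):
    return -mean * math.log(1.0 - q)

def qq_pp_points(data):
    """Return (Q-Q points, P-P points) against an exponential fitted by MLE."""
    xs = sorted(data)
    n = len(xs)
    mean = statistics.fmean(xs)
    qq, pp = [], []
    for i, x in enumerate(xs, start=1):
        q = (i - 0.5) / n
        qq.append((exp_quantile(q, mean), x))  # model quantile vs. sample quantile
        pp.append((exp_cdf(x, mean), q))       # model probability vs. sample probability
    return qq, pp

rng = random.Random(13)
data = [rng.expovariate(1.0) for _ in range(2000)]  # synthetic exponential data
qq, pp = qq_pp_points(data)

print("largest P-P deviation:", round(max(abs(m - s) for m, s in pp), 4))
print("largest Q-Q deviation:", round(max(abs(m - s) for m, s in qq), 4))
```

With a good fit both sets of points lie close to the line y = x. The Q-Q deviations are typically largest for the extreme order statistics, while P-P values are bounded in [0, 1] and show discrepancies mainly in the middle of the distribution.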

What are the steps for a chi-square test?

- Divide the range of the fitted distribution into k adjacent intervals.
- Tally $N_j$: the number of $X_i$'s in the jth interval.
- Compute the expected proportion $p_j$ of the $X_i$'s that would fall in the jth interval if we were sampling from the fitted distribution.
- Compute the statistic $\chi^2 = \sum_{j=1}^{k} (N_j - n p_j)^2 / (n p_j)$ and reject the fitted distribution if $\chi^2 > \chi^2_{k-1,\,1-\alpha}$.
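
The steps above can be sketched for a fitted exponential distribution. Assumptions in this example: k = 10 equiprobable intervals (so every p_j = 1/k), alpha = 0.05, synthetic data, and a critical value of 16.919 taken from a chi-square table for 9 degrees of freedom. The function name is hypothetical.

```python
import math
import random
import statistics

CRITICAL_9_DF_95 = 16.919  # tabulated chi-square quantile, 9 degrees of freedom, level 0.95

def chi_square_exponential(data, k=10):
    """Chi-square statistic for an exponential fit with k equiprobable intervals."""
    n = len(data)
    mean = statistics.fmean(data)  # MLE of the exponential mean
    counts = [0] * k               # N_j for each interval
    for x in data:
        f = 1.0 - math.exp(-x / mean)  # fitted CDF value at x
        j = min(int(k * f), k - 1)     # interval j covers CDF values [j/k, (j+1)/k)
        counts[j] += 1
    expected = n / k                   # n * p_j with p_j = 1/k
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, counts

rng = random.Random(21)
exp_data = [rng.expovariate(1.0) for _ in range(1000)]
uni_data = [rng.uniform(0.0, 2.0) for _ in range(1000)]  # clearly not exponential

for name, data in [("exponential data", exp_data), ("uniform data", uni_data)]:
    stat, counts = chi_square_exponential(data)
    print(name, round(stat, 2), "reject" if stat > CRITICAL_9_DF_95 else "do not reject")
```

Using k - 1 degrees of freedom follows the rule stated above; when parameters are estimated from the same data, this choice is conservative.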

For what is the ExpertFit software needed?

It automatically and accurately determines which of 40 probability distributions best represents a data set.

What are the modules (steps) in ExpertFit?

- Data: summary statistics, histograms, correlation plots, scatter diagrams, test on homogeneity.
- Models: fits distribution (MLE), ranks on quality of fit, determines whether best fit is good enough (otherwise, recommends empirical distributions).
- Comparisons: further investigates quality of fit (plots, tests).
- Applications: computes characteristics of the fitted distribution and puts the selected distribution into the proper format for the chosen simulation package.
