The Simulation Generate (Sim Gen) node provides an easy way to generate simulated data—either from scratch using user specified statistical distributions or automatically using the distributions obtained from running a Simulation Fitting (Sim Fit) node on existing historical data. This is useful when you want to evaluate the outcome of a predictive model in the presence of uncertainty in the model inputs.
simgennode properties |
Data type | Property description |
---|---|---|
fields |
Structured property | See example |
correlations |
Structured property | See example |
keep_min_max_setting |
boolean | |
refit_correlations |
boolean | |
max_cases |
integer | Minimum value is 1000, maximum value is 2,147,483,647 |
create_iteration_field |
boolean | |
iteration_field_name |
string | |
replicate_results |
boolean | |
random_seed |
integer | |
parameter_xml |
string | Returns the parameter Xml as a string |
fields example
This is a structured slot parameter with the following syntax:
simgennode.setPropertyValue("fields", [
[field1, storage, locked, [distribution1], min, max],
[field2, storage, locked, [distribution2], min, max],
[field3, storage, locked, [distribution3], min, max]
])
distribution
is a declaration of the distribution name followed by a list
containing pairs of attribute names and values. Each distribution is defined in the following
way:
[distributionname, [[par1], [par2], [par3]]]
simgennode = modeler.script.stream().createAt("simgen", u"Sim Gen", 726, 322)
simgennode.setPropertyValue("fields", [["Age", "integer", False, ["Uniform",[["min","1"],["max","2"]]], "", ""]])
For example, to create a node that generates a single field with a Binomial distribution, you might use the following script:
simgen_node1 = modeler.script.stream().createAt("simgen", u"Sim Gen", 200, 200)
simgen_node1.setPropertyValue("fields", [["Education", "Real", False, ["Binomial", [["n", 32],
["prob", 0.7]]], "", ""]])
The Binomial distribution takes 2 parameters: n
and prob
. Since
Binomial does not support minimum and maximum values, these are supplied as an empty string.
distribution
directly; you use it in conjunction with the
fields
property. The following examples show all the possible distribution types. Note that the threshold is
entered as thresh
in both NegativeBinomialFailures
and
NegativeBinomialTrial
.
stream = modeler.script.stream()
simgennode = stream.createAt("simgen", u"Sim Gen", 200, 200)
beta_dist = ["Field1", "Real", False, ["Beta",[["shape1","1"],["shape2","2"]]], "", ""]
binomial_dist = ["Field2", "Real", False, ["Binomial",[["n" ,"1"],["prob","1"]]], "", ""]
categorical_dist = ["Field3", "String", False, ["Categorical", [["A",0.3],["B",0.5],["C",0.2]]], "", ""]
dice_dist = ["Field4", "Real", False, ["Dice", [["1" ,"0.5"],["2","0.5"]]], "", ""]
exponential_dist = ["Field5", "Real", False, ["Exponential", [["scale","1"]]], "", ""]
fixed_dist = ["Field6", "Real", False, ["Fixed", [["value","1" ]]], "", ""]
gamma_dist = ["Field7", "Real", False, ["Gamma", [["scale","1"],["shape"," 1"]]], "", ""]
lognormal_dist = ["Field8", "Real", False, ["Lognormal", [["a","1"],["b","1" ]]], "", ""]
negbinomialfailures_dist = ["Field9", "Real", False, ["NegativeBinomialFailures",[["prob","0.5"],["thresh","1"]]], "", ""]
negbinomialtrial_dist = ["Field10", "Real", False, ["NegativeBinomialTrials",[["prob","0.2"],["thresh","1"]]], "", ""]
normal_dist = ["Field11", "Real", False, ["Normal", [["mean","1"] ,["stddev","2"]]], "", ""]
poisson_dist = ["Field12", "Real", False, ["Poisson", [["mean","1"]]], "", ""]
range_dist = ["Field13", "Real", False, ["Range", [["BEGIN","[1,3]"] ,["END","[2,4]"],["PROB","[[0.5],[0.5]]"]]], "", ""]
triangular_dist = ["Field14", "Real", False, ["Triangular", [["min","0"],["max","1"],["mode","1"]]], "", ""]
uniform_dist = ["Field15", "Real", False, ["Uniform", [["min","1"],["max","2"]]], "", ""]
weibull_dist = ["Field16", "Real", False, ["Weibull", [["a","0"],["b","1 "],["c","1"]]], "", ""]
simgennode.setPropertyValue("fields", [\
beta_dist, \
binomial_dist, \
categorical_dist, \
dice_dist, \
exponential_dist, \
fixed_dist, \
gamma_dist, \
lognormal_dist, \
negbinomialfailures_dist, \
negbinomialtrial_dist, \
normal_dist, \
poisson_dist, \
range_dist, \
triangular_dist, \
uniform_dist, \
weibull_dist
])
correlations example
This is a structured slot parameter with the following syntax:
simgennode.setPropertyValue("correlations", [
[field1, field2, correlation],
[field1, field3, correlation],
[field2, field3, correlation]
])
Correlation can be any number between +1
and -1
. You can
specify as many or as few correlations as you like. Any unspecified correlations are set to zero. If
any fields are unknown, the correlation value should be set on the correlation matrix (or table).
When there are unknown fields, it's not possible to run the node.