Code documentation¶
- class rulexai.explainer.RuleExplainer(model, X: DataFrame, y: Union[DataFrame, Series], type: str = 'classification')¶
- Parameters
model (Model = Union[RuleClassifier, RuleRegressor, SurvivalRules, CN2UnorderedClassifier, CN2SDUnorderedClassifier, DecisionTreeClassifier, DecisionTreeRegressor, SurvivalTree, List[str]]) –
- Model to be analyzed. RuleXai supports the following Rule models:
RuleKit(https://adaa-polsl.github.io/RuleKit-python/): RuleClassifier, RuleRegressor, SurvivalRules
Orange (https://orangedatamining.com/): CN2UnorderedClassifier, CN2SDUnorderedClassifier
- It can also extract rules from decision trees:
scikit-learn (https://scikit-learn.org/stable/): DecisionTreeClassifier, DecisionTreeRegressor
scikit-survival (https://scikit-survival.readthedocs.io/en/stable/): SurvivalTree
- Or you can provide a list of rules as:
- classification:
IF attribute1 = (-inf, value) AND … AND attribute2 = <value1, value2) THEN label_attribute = {class_name}
- regression:
IF attribute1 = (-inf, value) AND … AND attribute2 = <value1, value2) THEN target_attribute = {value}
- survival:
IF attribute1 = (-inf, value) AND … AND attribute2 = <value1, value2) THEN survival_status_attribute = {survival_status}
X (pd.DataFrame) – The training dataset used to train the provided model
y (Union[pd.DataFrame, pd.Series]) – The target values (class labels, real numbers, survival statuses) used to train the provided model
type (str) –
- The type of problem that the provided model solves. You can choose between:
“classification”
“regression”
“survival”
default: “classification”
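For instance, a list of classification rules in the format described above could look as follows (attribute names, cut points and class names are purely illustrative):

```python
# A list of rules following the classification rule format.
# Attribute names, cut points and class names are made up for illustration.
rules = [
    "IF age = (-inf, 30) THEN class = {young}",
    "IF age = <30, 60) AND bmi = <18.5, 25) THEN class = {adult_normal}",
    "IF age = <60, inf) THEN class = {senior}",
]
```

Such a list can be passed directly as the model argument.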
- condition_importances_¶
Computed condition importances
- Type
pd.DataFrame
- feature_importances_¶
Feature importances computed based on condition importances
- Type
pd.DataFrame
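The relationship between the two attributes above can be sketched with plain pandas: feature importances are obtained by aggregating condition importances per attribute. The exact aggregation RuleXai uses is not shown here; summing, as below, is only one plausible choice, and all names and values are made up:

```python
import pandas as pd

# Hypothetical condition importances, shaped like condition_importances_
condition_importances = pd.DataFrame({
    "condition": ["age < 30", "age >= 60", "bmi in <18.5, 25)"],
    "attribute": ["age", "age", "bmi"],
    "importance": [0.40, 0.25, 0.35],
})

# Aggregate per attribute (summation is an assumption, for illustration only)
feature_importances = (
    condition_importances.groupby("attribute")["importance"]
    .sum()
    .sort_values(ascending=False)
)
print(feature_importances)
```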
- explain(measure: str = 'C2', basic_conditions: bool = False)¶
Computes condition importances. The importances of the conditions are computed based on:
Marek Sikora: Redefinition of Decision Rules Based on the Importance of Elementary Conditions Evaluation. Fundam. Informaticae 123(2): 171-197 (2013)
https://dblp.org/rec/journals/fuin/Sikora13.html
- Parameters
measure (str) – Specifies the measure used to evaluate the quality of the rules. Possible measures for classification and regression problems are: C2, Lift, Correlation. Default: C2. It is not possible to select a measure for the survival problem; the LogRank test is used by default
basic_conditions (bool) – Specifies whether to evaluate the conditions contained in the input rules, or to break the conditions in the rules into base conditions so that individual conditions do not overlap
- Returns
self – Fitted explainer with computed condition importances
- Return type
RuleExplainer
- fit_transform(X: DataFrame, selector=None, y=None, POS=None) DataFrame ¶
Creates a dataset in which the examples, instead of being described by the original attributes, are described by the specified conditions: the result is a set of binary attributes, each indicating whether a given example meets a given condition. This can be viewed as a kind of dummification. It also lets you discretize data and get rid of missing values, so it can serve as a preprocessing step for other algorithms.
- Parameters
X (pd.DataFrame) – The input samples from which to create the binary dataset. Should have the same columns, in the same order, as the X specified when creating the Explainer
selector (string/float) – Specifies how to select the conditions from the rules that will be included as attributes in the transformed set. If None, all conditions are included. If a number between 0 and 1, that fraction of the most important conditions is selected based on the condition importance ranking. If “reduct”, a reduct of the condition set is selected. The fraction-based option is generally preferable.
y (Union[pd.DataFrame, pd.Series]) – Only if selector = “reduct”. The target values for the input samples, used to determine the reduct
POS (float) – Only if selector = “reduct”. Target reduct POS
- Returns
X_transformed – Transformed dataset
- Return type
pd.DataFrame
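The shape of the transformed output can be pictured with plain pandas. This is only a conceptual sketch, not the rulexai implementation, and the conditions below are made up:

```python
import pandas as pd

X = pd.DataFrame({"age": [25, 42, 67], "bmi": [21.0, 27.5, 24.0]})

# Hypothetical conditions taken from rules, as (column name, predicate) pairs
conditions = [
    ("age = (-inf, 30)", lambda df: df["age"] < 30),
    ("age = <30, inf)",  lambda df: df["age"] >= 30),
    ("bmi = <18.5, 25)", lambda df: (df["bmi"] >= 18.5) & (df["bmi"] < 25.0)),
]

# Each example becomes a row of binary flags: does it satisfy each condition?
X_transformed = pd.DataFrame({name: pred(X).astype(int) for name, pred in conditions})
print(X_transformed)
```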
- get_rules()¶
Returns the rules from the model
- Returns
rules – Rules from the model
- Return type
List[str]
- get_rules_covering_example(x: DataFrame, y: Union[DataFrame, Series]) List[str] ¶
Returns the rules that cover the given example
- Parameters
x (pd.DataFrame) – The input sample.
y (Union[pd.DataFrame, pd.Series]) – The target values for input sample.
- Returns
rules – Rules that cover the given example
- Return type
List[str]
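The notion of a rule covering an example can be sketched as follows; the premise representation and all names here are hypothetical, not the rulexai internals:

```python
import math

def covers(premise, example):
    """premise: {attribute: (low, high)}, meaning low <= value < high."""
    return all(lo <= example[attr] < hi for attr, (lo, hi) in premise.items())

rules = [
    {"age": (-math.inf, 30.0)},                      # IF age = (-inf, 30) THEN ...
    {"age": (30.0, math.inf), "bmi": (18.5, 25.0)},  # IF age = <30, inf) AND bmi = <18.5, 25) THEN ...
]
example = {"age": 42.0, "bmi": 21.0}

covering = [i for i, premise in enumerate(rules) if covers(premise, example)]
print(covering)  # only the second rule covers the example
```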
- get_rules_with_basic_conditions()¶
Returns the rules from the model with conditions broken down into base conditions so that individual conditions do not overlap
- Returns
rules – Rules from the model containing the base conditions
- Return type
List[str]
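The idea of breaking overlapping conditions into non-overlapping base conditions can be sketched for a single numeric attribute (a simplified illustration, not the rulexai algorithm):

```python
import math

def base_intervals(conditions):
    """conditions: half-open intervals <low, high) on one attribute.
    Returns the non-overlapping intervals induced by all cut points."""
    cuts = sorted({c for lo, hi in conditions for c in (lo, hi)})
    return list(zip(cuts, cuts[1:]))

# Two overlapping conditions on "age": age < 60 and age >= 30
overlapping = [(-math.inf, 60.0), (30.0, math.inf)]
print(base_intervals(overlapping))
# three disjoint intervals: (-inf, 30), <30, 60), <60, inf)
```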
- local_explainability(x: DataFrame, y: Union[DataFrame, Series], plot: bool = False)¶
Displays the local explanation of the example: the rules that cover the given example and the importances of the conditions contained in these rules
- Parameters
x (pd.DataFrame) – The input sample.
y (Union[pd.DataFrame, pd.Series]) – The target values for input sample.
plot (bool) – If True the importance of the conditions will also be shown in the chart. Default: False
- plot_importances(importances: DataFrame)¶
Plots the given importances
- Parameters
importances (pd.DataFrame) – Feature/Condition importances to plot.
- transform(X: DataFrame) DataFrame ¶
Creates a dataset based on given dataset in which the examples, instead of being described by the original attributes, will be described with the specified conditions - it will be a set with binary attributes determining whether a given example meets a given condition. It can be considered as kind of dummification. Thanks to this function you can discretize data and get rid of missing values. It can be used as prestep for others algorithms.
- Parameters
X (pd.DataFrame) – The input samples from which to create the binary dataset. Should have the same columns, in the same order, as the X given to fit_transform
- Returns
X_transformed – Transformed dataset
- Return type
pd.DataFrame
- class rulexai.explainer.Explainer(X: DataFrame, model_predictions: Union[DataFrame, Series], type: str = 'classification')¶
- Parameters
X (pd.DataFrame) – The training dataset used to train the provided model
model_predictions (Union[pd.DataFrame, pd.Series]) – The predictions of the provided model on the training dataset
type (str) –
- The type of problem that the provided model solves. You can choose between:
“classification”
“regression”
default: “classification”
- condition_importances_¶
Computed condition importances on the given dataset
- Type
pd.DataFrame
- feature_importances_¶
Feature importances computed based on condition importances
- Type
pd.DataFrame
- explain(measure: str = 'C2', basic_conditions: bool = False, X_org=None)¶
Computes condition importances. The importances of the conditions are computed based on:
Marek Sikora: Redefinition of Decision Rules Based on the Importance of Elementary Conditions Evaluation. Fundam. Informaticae 123(2): 171-197 (2013)
https://dblp.org/rec/journals/fuin/Sikora13.html
- Parameters
measure (str) – Specifies the measure used to evaluate the quality of the rules. Possible measures for classification and regression problems are: C2, Lift, Correlation. Default: C2. It is not possible to select a measure for the survival problem; the LogRank test is used by default
basic_conditions (bool) – Specifies whether to evaluate the conditions contained in the input rules, or to break the conditions in the rules into base conditions so that individual conditions do not overlap
X_org – The dataset on which the rule-based model should be built. It can be the set the black-box model was trained on, or that set before preprocessing (imputation of missing values, dummification, scaling), because the rule model can handle such a set
- Returns
self – Fitted explainer with computed condition importances
- Return type
Explainer
- fit_transform(X: DataFrame, selector=None, y=None, POS=None) DataFrame ¶
Creates a dataset in which the examples, instead of being described by the original attributes, are described by the specified conditions: the result is a set of binary attributes, each indicating whether a given example meets a given condition. This can be viewed as a kind of dummification. It also lets you discretize data and get rid of missing values, so it can serve as a preprocessing step for other algorithms.
- Parameters
X (pd.DataFrame) – The input samples from which to create the binary dataset. Should have the same columns, in the same order, as the X specified when creating the Explainer
selector (string/float) – Specifies how to select the conditions from the rules that will be included as attributes in the transformed set. If None, all conditions are included. If a number between 0 and 1, that fraction of the most important conditions is selected based on the condition importance ranking. If “reduct”, a reduct of the condition set is selected. The fraction-based option is generally preferable.
y (Union[pd.DataFrame, pd.Series]) – Only if selector = “reduct”. The target values for the input samples, used to determine the reduct
POS (float) – Only if selector = “reduct”. Target reduct POS
- Returns
X_transformed – Transformed dataset
- Return type
pd.DataFrame
- get_rules()¶
Returns the rules from the model
- Returns
rules – Rules from the model
- Return type
List[str]
- get_rules_covering_example(x: DataFrame, y: Union[DataFrame, Series]) List[str] ¶
Returns the rules that cover the given example
- Parameters
x (pd.DataFrame) – The input sample.
y (Union[pd.DataFrame, pd.Series]) – The target values for input sample.
- Returns
rules – Rules that cover the given example
- Return type
List[str]
- get_rules_with_basic_conditions()¶
Returns the rules from the model with conditions broken down into base conditions so that individual conditions do not overlap
- Returns
rules – Rules from the model containing the base conditions
- Return type
List[str]
- local_explainability(x: DataFrame, y: Union[DataFrame, Series], plot: bool = False)¶
Displays the local explanation of the example: the rules that cover the given example and the importances of the conditions contained in these rules
- Parameters
x (pd.DataFrame) – The input sample.
y (Union[pd.DataFrame, pd.Series]) – The target values for input sample.
plot (bool) – If True the importance of the conditions will also be shown in the chart. Default: False
- plot_importances(importances: DataFrame)¶
Plots the given importances
- Parameters
importances (pd.DataFrame) – Feature/Condition importances to plot.
- transform(X: DataFrame) DataFrame ¶
Creates a dataset in which the examples, instead of being described by the original attributes, are described by the specified conditions: the result is a set of binary attributes, each indicating whether a given example meets a given condition. This can be viewed as a kind of dummification. It also lets you discretize data and get rid of missing values, so it can serve as a preprocessing step for other algorithms.
- Parameters
X (pd.DataFrame) – The input samples from which to create the binary dataset. Should have the same columns, in the same order, as the X given to fit_transform
- Returns
X_transformed – Transformed dataset
- Return type
pd.DataFrame