Association Rules Mining in Business - Leveraging the PRISM algorithm

The PRISM algorithm, proposed by Cendrowska in 1987, is one of the early algorithms for discovering rules in data. It is used in the field of artificial intelligence to automatically induce IF-THEN rules that link attribute values to an outcome, which is how this article uses it to discover association rules. As the field has developed, many improved algorithms have emerged that are more efficient and precise, as well as hybrid algorithms that combine the features of different algorithms for even more effective discovery of association rules.

Association rules describe dependencies between elements of a data set that often occur together, typically in the form "if A, then B". The PRISM Cendrowska algorithm discovers such dependencies automatically by analyzing the data and identifying attribute values that reliably predict an outcome.
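To make "often occur together" concrete, the sketch below counts how many baskets contain each pair of products. The basket contents and the `pair_counts` helper are made up for illustration; this is plain co-occurrence counting, not PRISM itself.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count in how many baskets each unordered pair of products appears."""
    counts = Counter()
    for basket in baskets:
        # set() ignores repeated products, sorted() gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical shopping baskets, invented for this example.
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["pasta", "tomato sauce"],
    ["bread", "jam"],
]

counts = pair_counts(baskets)
print(counts[("bread", "butter")])  # 2
```

Pairs that appear in many baskets are natural candidates for rules such as "customers who buy bread also buy butter".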

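The PRISM Cendrowska algorithm consists of three main steps. In the first step, the data is prepared as records of attribute-value pairs, and one attribute is chosen as the target. In the second step, the algorithm builds rules for each target class one at a time: it repeatedly adds the attribute-value condition that best predicts the class until the rule covers only records of that class, then removes the covered records and starts the next rule. Unlike decision-tree learners, PRISM produces these rules directly, without building a tree. In the third step, the resulting rules are applied to classify new data and identify dependencies between elements.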

Use case

Assume that we own a grocery store and want to find out which products our customers most often buy together. For this purpose, we can apply the PRISM Cendrowska algorithm to discover association rules and identify the products that most often appear together in shopping baskets.

First, we need to collect data, i.e., information about the products that have been purchased in our store. Then, using the PRISM Cendrowska algorithm, we can analyze this data and discover, for example, that customers who buy bread also often buy butter and jam. We may also discover that customers who buy pasta also often buy tomato sauce and parmesan.
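Rule learners of this kind work on records of attribute values rather than raw baskets, so the collected data needs a tabular encoding first. The following is a minimal sketch, assuming each product becomes a yes/no attribute and one product (here butter) is treated as the target. The `encode_baskets` and `rule_precision` helpers and the sample baskets are hypothetical.

```python
def encode_baskets(baskets, products):
    """Turn each basket into a record with one yes/no attribute per product."""
    return [
        {product: ("yes" if product in basket else "no") for product in products}
        for basket in baskets
    ]


def rule_precision(records, condition, target, target_value="yes"):
    """Fraction of records matching `condition` that also have the target value."""
    matching = [r for r in records if all(r[a] == v for a, v in condition.items())]
    if not matching:
        return 0.0
    hits = sum(1 for r in matching if r[target] == target_value)
    return hits / len(matching)


# Hypothetical baskets, invented for this example.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"pasta", "tomato sauce", "parmesan"},
]
products = ["bread", "butter", "jam", "pasta", "tomato sauce", "parmesan"]
records = encode_baskets(baskets, products)

# How reliable is the rule "bread = yes -> butter = yes" on this data?
print(rule_precision(records, {"bread": "yes"}, "butter"))
```

In this toy data, two of the three baskets with bread also contain butter, so the rule holds for about two thirds of the matching baskets.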

With this information, we can improve our marketing strategy, e.g., by placing products that often occur together next to each other on store shelves or by offering promotions that encourage customers to buy these products in a bundle. In this way, the PRISM Cendrowska algorithm can help us improve sales efficiency and tailor the store's offer to the needs of customers.

Example in Python

The choice of an appropriate algorithm for discovering association rules depends on many factors, such as the size of the data set, the complexity of the rules, and the availability of tools and computational resources. Below are some sample data and Python code that applies a simplified version of the PRISM algorithm.

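def prism(dataset, target_attribute):
    """Induce IF-THEN rules for each class with a PRISM-style covering loop."""
    # Sort the classes so the rule order is reproducible between runs.
    classes = sorted(set(item[target_attribute] for item in dataset))
    rules = []

    for target_class in classes:
        # Every class starts again from the full dataset.
        remaining = dataset[:]
        while any(item[target_attribute] == target_class for item in remaining):
            rule = {"conditions": [], "prediction": target_class}
            covered = remaining[:]  # records matched by the rule so far
            used_attributes = set()

            # Add conditions until the rule covers only the target class.
            while any(item[target_attribute] != target_class for item in covered):
                conditions = []
                for attribute in covered[0].keys():
                    if attribute == target_attribute or attribute in used_attributes:
                        continue
                    for value in sorted(set(item[attribute] for item in covered)):
                        score = compute_score(covered, attribute, value, target_attribute, target_class)
                        conditions.append({"attribute": attribute, "value": value, "score": score})

                best_condition = max(conditions, key=lambda condition: condition["score"], default=None)
                if best_condition is None:
                    break  # every attribute is already used; the data is inconsistent here

                used_attributes.add(best_condition["attribute"])
                rule["conditions"].append({best_condition["attribute"]: best_condition["value"]})
                # Narrow the rule: keep only the records that satisfy the new condition.
                covered = [item for item in covered if item[best_condition["attribute"]] == best_condition["value"]]

            if rule["conditions"]:
                rules.append(rule)
            # Remove the records this rule covers before inducing the next rule.
            remaining = [item for item in remaining if item not in covered]

    return rules


def compute_score(data, attribute, value, target_attribute, target_class):
    """Return the fraction of records with attribute == value that belong to target_class."""
    matching_items = [item for item in data if item[attribute] == value]
    correct_predictions = [item for item in matching_items if item[target_attribute] == target_class]

    return len(correct_predictions) / len(matching_items) if matching_items else 0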


dataset = [
    {"weather": "rainy", "temp": "hot", "play": "yes"},
    {"weather": "rainy", "temp": "cool", "play": "no"},
    {"weather": "sunny", "temp": "mild", "play": "yes"},
    # etc
]

print(prism(dataset, "play"))

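The function returns a list of rules, each with its conditions and the predicted class. The exact rules depend on the full dataset; the result has the following form: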

[
    {
        'conditions': [
            {'temp': 'cool'}
        ],
        'prediction': 'no'
    },
    {
        'conditions': [
            {'weather': 'sunny'},
            {'temp': 'hot'}
        ],
        'prediction': 'yes'
    }
]
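
Once rules are available, they can be used to classify a new record: the first rule whose conditions all match supplies the prediction. The sketch below assumes the rule format shown in the output above. The `classify` helper and its `default` argument are hypothetical and not part of the original code.

```python
def classify(rules, record, default=None):
    """Return the prediction of the first rule whose conditions all hold."""
    for rule in rules:
        # Each condition is a one-entry dict such as {"temp": "cool"}.
        if all(
            record.get(attribute) == value
            for condition in rule["conditions"]
            for attribute, value in condition.items()
        ):
            return rule["prediction"]
    return default  # no rule fired


rules = [
    {"conditions": [{"temp": "cool"}], "prediction": "no"},
    {"conditions": [{"weather": "sunny"}, {"temp": "hot"}], "prediction": "yes"},
]

print(classify(rules, {"weather": "sunny", "temp": "hot"}))   # yes
print(classify(rules, {"weather": "rainy", "temp": "cool"}))  # no
print(classify(rules, {"weather": "rainy", "temp": "mild"}))  # None
```

Returning a default when no rule fires is one possible design choice; another is to fall back to the most frequent class in the training data.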