library(arules)
data(Groceries)      # load the data set
inspect(Groceries)   # view the transactions
freq = eclat(Groceries, parameter = list(support = 0.05, maxlen = 10))
inspect(freq)   # view the frequent itemsets
     items                          support
[1]  {whole milk,yogurt}            0.05602440
[2]  {whole milk,rolls/buns}        0.05663447
[3]  {other vegetables,whole milk}  0.07483477
[4]  {whole milk}                   0.25551601
[5]  {other vegetables}             0.19349263
[6]  {rolls/buns}                   0.18393493
[7]  {yogurt}                       0.13950178
[8]  {soda}                         0.17437722
[9]  {root vegetables}              0.10899847
[10] {tropical fruit}               0.10493137
[11] {bottled water}                0.11052364
[12] {sausage}                      0.09395018
[13] {shopping bags}                0.09852567
[14] {citrus fruit}                 0.08276563
[15] {pastry}                       0.08896797
[16] {pip fruit}                    0.07564820
[17] {whipped/sour cream}           0.07168277
[18] {fruit/vegetable juice}        0.07229283
[19] {domestic eggs}                0.06344687
[20] {newspapers}                   0.07981698
[21] {butter}                       0.05541434
[22] {margarine}                    0.05856634
[23] {brown bread}                  0.06487036
[24] {bottled beer}                 0.08052872
[25] {frankfurter}                  0.05897306
[26] {pork}                         0.05765125
[27] {napkins}                      0.05236401
[28] {curd}                         0.05327911
[29] {beef}                         0.05246568
[30] {coffee}                       0.05805796
[31] {canned beer}                  0.07768175

The result contains 31 frequent itemsets in total, but 28 of them hold only a single item, so the minimum support of 0.05 is probably too high.
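To make the point about single-item itemsets concrete, here is a minimal sketch that separates the multi-item itemsets from the rest and then reruns eclat with a lower threshold. The value 0.01 and the names `multi` and `freq_low` are illustrative assumptions, not taken from the original post.

```r
library(arules)
data(Groceries)

# Same run as above: minimum support 0.05
freq <- eclat(Groceries,
              parameter = list(support = 0.05, maxlen = 10),
              control = list(verbose = FALSE))

# Keep only the itemsets that contain more than one item
multi <- freq[size(freq) > 1]
inspect(multi)

# Rerun with a lower minimum support (0.01 is an assumed, illustrative value)
freq_low <- eclat(Groceries,
                  parameter = list(support = 0.01, maxlen = 10),
                  control = list(verbose = FALSE))

# Number of frequent itemsets per itemset size
table(size(freq_low))
```

Lowering the threshold trades a longer result list for more multi-item combinations, so the size table is a quick way to judge whether the new threshold is useful.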
set of 15 rules

rule length distribution (lhs + rhs):sizes
 3
15

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      3       3       3       3       3       3

summary of quality measures:
    support           confidence          lift
 Min.   :0.01007   Min.   :0.5000   Min.   :1.984
 1st Qu.:0.01174   1st Qu.:0.5151   1st Qu.:2.036
 Median :0.01230   Median :0.5245   Median :2.203
 Mean   :0.01316   Mean   :0.5411   Mean   :2.299
 3rd Qu.:0.01403   3rd Qu.:0.5718   3rd Qu.:2.432
 Max.   :0.02227   Max.   :0.5862   Max.   :3.030

mining info:
      data ntransactions support confidence
 Groceries          9835    0.01        0.5

Next, look at the individual rules.
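The call that produced this summary does not appear in the text. Judging by the mining info (support 0.01, confidence 0.5) and the object name `model` used in the later `subset` call, it was probably an `apriori` run along these lines. Treat this as a reconstruction under those assumptions, not the author's exact code.

```r
library(arules)
data(Groceries)

# Mine association rules; thresholds are taken from the "mining info" block
model <- apriori(Groceries,
                 parameter = list(support = 0.01, confidence = 0.5),
                 control = list(verbose = FALSE))

summary(model)   # rule counts, length distribution, quality measures
inspect(model)   # list every rule with support, confidence and lift
```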
     lhs                                     rhs                 support    confidence lift
[1]  {curd,yogurt}                        => {whole milk}        0.01006609 0.5823529  2.279125
[2]  {other vegetables,butter}            => {whole milk}        0.01148958 0.5736041  2.244885
[3]  {other vegetables,domestic eggs}     => {whole milk}        0.01230300 0.5525114  2.162336
[4]  {yogurt,whipped/sour cream}          => {whole milk}        0.01087951 0.5245098  2.052747
[5]  {other vegetables,whipped/sour cream} => {whole milk}       0.01464159 0.5070423  1.984385
[6]  {pip fruit,other vegetables}         => {whole milk}        0.01352313 0.5175097  2.025351
[7]  {citrus fruit,root vegetables}       => {other vegetables}  0.01037112 0.5862069  3.029608
[8]  {tropical fruit,root vegetables}     => {other vegetables}  0.01230300 0.5845411  3.020999
[9]  {tropical fruit,root vegetables}     => {whole milk}        0.01199797 0.5700483  2.230969
[10] {tropical fruit,yogurt}              => {whole milk}        0.01514997 0.5173611  2.024770
[11] {root vegetables,yogurt}             => {other vegetables}  0.01291307 0.5000000  2.584078
[12] {root vegetables,yogurt}             => {whole milk}        0.01453991 0.5629921  2.203354
[13] {root vegetables,rolls/buns}         => {other vegetables}  0.01220132 0.5020921  2.594890
[14] {root vegetables,rolls/buns}         => {whole milk}        0.01270971 0.5230126  2.046888
[15] {other vegetables,yogurt}            => {whole milk}        0.02226741 0.5128806  2.007235

We can also rank the rules by support before inspecting them.
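The ranking step itself is not shown in the text. One way to get the ten rules with the highest support is `sort()` with `by = "support"`. This is a sketch that assumes `model` was mined with support 0.01 and confidence 0.5; the name `top10` is hypothetical.

```r
library(arules)
data(Groceries)

# Assumed mining step, matching the thresholds reported in the summary
model <- apriori(Groceries,
                 parameter = list(support = 0.01, confidence = 0.5),
                 control = list(verbose = FALSE))

# sort() orders rules in decreasing order of the chosen measure by default
top10 <- sort(model, by = "support")[1:10]
inspect(top10)
```

Passing `by = "lift"` or `by = "confidence"` ranks by those measures instead.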
     lhs                                     rhs                 support    confidence lift
[1]  {other vegetables,yogurt}            => {whole milk}        0.02226741 0.5128806  2.007235
[2]  {tropical fruit,yogurt}              => {whole milk}        0.01514997 0.5173611  2.024770
[3]  {other vegetables,whipped/sour cream} => {whole milk}       0.01464159 0.5070423  1.984385
[4]  {root vegetables,yogurt}             => {whole milk}        0.01453991 0.5629921  2.203354
[5]  {pip fruit,other vegetables}         => {whole milk}        0.01352313 0.5175097  2.025351
[6]  {root vegetables,yogurt}             => {other vegetables}  0.01291307 0.5000000  2.584078
[7]  {root vegetables,rolls/buns}         => {whole milk}        0.01270971 0.5230126  2.046888
[8]  {other vegetables,domestic eggs}     => {whole milk}        0.01230300 0.5525114  2.162336
[9]  {tropical fruit,root vegetables}     => {other vegetables}  0.01230300 0.5845411  3.020999
[10] {root vegetables,rolls/buns}         => {other vegetables}  0.01220132 0.5020921  2.594890

The rule with the highest support is {other vegetables,yogurt} => {whole milk}: baskets that contain other vegetables, yogurt and whole milk together make up about 2.2% of all transactions (support 0.0223).
For example, suppose we want only the rules whose consequent (rhs) is whole milk and whose lift is at least 2.2:
inspect(subset(model, subset = rhs %in% "whole milk" & lift >= 2.2))
    lhs                                 rhs           support    confidence lift
[1] {curd,yogurt}                    => {whole milk}  0.01006609 0.5823529  2.279125
[2] {other vegetables,butter}        => {whole milk}  0.01148958 0.5736041  2.244885
[3] {tropical fruit,root vegetables} => {whole milk}  0.01199797 0.5700483  2.230969
[4] {root vegetables,yogurt}         => {whole milk}  0.01453991 0.5629921  2.203354

Only four rules with relatively high lift remain in the result.
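lift = P(L,R) / (P(L)P(R)) is an indicator similar to a correlation coefficient. lift = 1 means that L and R are independent. The further the value rises above 1, the less likely it is that L and R appear in the same basket by coincidence.
The matching operators used in these filter conditions are explained below: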
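Before turning to the matching operators, the lift formula can be checked by hand against the numbers printed earlier. The sketch below uses the rule {other vegetables,yogurt} => {whole milk}. The support and confidence come from the rule listing and P(R) comes from the eclat output; the variable names are illustrative.

```r
# Values copied from the outputs shown in this post
support_LR <- 0.02226741   # P(L,R): other vegetables, yogurt and whole milk together
confidence <- 0.5128806    # P(R | L) for the rule
support_R  <- 0.25551601   # P(R): support of {whole milk} in the eclat output

# confidence = P(L,R) / P(L), so P(L) can be recovered from the two
support_L <- support_LR / confidence

# lift = P(L,R) / (P(L) * P(R))
lift <- support_LR / (support_L * support_R)
round(lift, 4)
```

The result should land very close to the lift of 2.007235 reported for this rule. Because P(L) cancels out, lift is also simply confidence divided by P(R).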
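%pin% is partial matching: an itemset matches if any of its item labels contains the given string, roughly item like '%A%' or item like '%B%' in SQL terms.
%ain% is complete matching: the itemset must contain every listed item, that is, itemset has 'A' and itemset has 'B'.
The logical operators (&, |, !) can combine these with filter conditions on support, confidence and lift.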
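The operators described above can be tried on the same rule set. This is a minimal sketch that assumes `model` was mined with support 0.01 and confidence 0.5; the result names and the threshold 0.55 are illustrative choices.

```r
library(arules)
data(Groceries)

# Assumed mining step, matching the thresholds reported in the summary
model <- apriori(Groceries,
                 parameter = list(support = 0.01, confidence = 0.5),
                 control = list(verbose = FALSE))

# %pin%: lhs contains any item whose label includes "vegetables"
veg_rules <- subset(model, subset = lhs %pin% "vegetables")

# %ain%: lhs contains both "root vegetables" and "yogurt"
both_rules <- subset(model, subset = lhs %ain% c("root vegetables", "yogurt"))

# Combined filter: rhs is whole milk, confidence above 0.55, no yogurt item in lhs
strong_rules <- subset(model,
                       subset = rhs %in% "whole milk" &
                                confidence > 0.55 &
                                !(lhs %pin% "yogurt"))

inspect(veg_rules)
inspect(both_rules)
inspect(strong_rules)
```

Note that `%pin%` matches on substrings of the label, so "vegetables" picks up both "other vegetables" and "root vegetables", while `%in%` and `%ain%` need exact item labels.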