String - How to implement the Jaccard coefficient in Java?
Assume I have two strings like this:
query1: "ideas of march"
query2: "ceaser died in march"
function(j) = |query1 ∩ query2| / |query1 ∪ query2|
I am looking at the similarity with respect to the number of tokens (words), irrespective of their position.
query1 intersection query2 = {march}, so the intersection count is 1
query1 union query2 = {ideas, of, march, ceaser, died, in}, so the union count is 6
In this case, function(j) should return 1/6.
Is there any way to find the intersection count and the union count of two sentences? For example, here:
    public double calculateSimilarity(String oneContent, String otherContent) {
        Set<String> numerator = intersection(oneContent, otherContent);
        Set<String> denominator = union(oneContent, otherContent);
        return denominator.size() > 0
                ? (double) numerator.size() / denominator.size()
                : 0;
    }
Are there built-in Java functions for the intersection count and the union count, without using external libraries such as Google Guava?
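Using only the JDK, java.util.Set already provides both operations: retainAll keeps only the elements that are also in another collection (intersection), and addAll adds the ones that are missing (union). Below is a minimal sketch, assuming whitespace tokenization. The class name JaccardSets and the helpers tokens, intersection and union are hypothetical. They are filled in only to make the calculateSimilarity method from the question runnable:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class JaccardSets {
    // Hypothetical helper: split a sentence on whitespace into a set of tokens.
    public static Set<String> tokens(String s) {
        return new HashSet<>(Arrays.asList(s.trim().split("\\s+")));
    }

    // Hypothetical helper: tokens present in both strings (Set.retainAll).
    public static Set<String> intersection(String a, String b) {
        Set<String> result = tokens(a);
        result.retainAll(tokens(b));
        return result;
    }

    // Hypothetical helper: tokens present in either string (Set.addAll).
    public static Set<String> union(String a, String b) {
        Set<String> result = tokens(a);
        result.addAll(tokens(b));
        return result;
    }

    // The method from the question, now with concrete helpers.
    public static double calculateSimilarity(String oneContent, String otherContent) {
        Set<String> numerator = intersection(oneContent, otherContent);
        Set<String> denominator = union(oneContent, otherContent);
        return denominator.size() > 0
                ? (double) numerator.size() / denominator.size()
                : 0;
    }

    public static void main(String[] args) {
        String q1 = "ideas of march";
        String q2 = "ceaser died in march";
        System.out.println(intersection(q1, q2).size()); // 1
        System.out.println(union(q1, q2).size());        // 6
        System.out.println(calculateSimilarity(q1, q2)); // 0.1666...
    }
}
```

Each helper copies the token set before modifying it, because retainAll and addAll mutate the set they are called on.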
Since you are only interested in the sizes of the union and the intersection, you can calculate them without building the union and intersection sets at all. Because union(a, b).size() == a.size() + b.size() - intersection(a, b).size(), only the intersection size is required.
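Below is a minimal sketch of that size-only approach, assuming the sentences have already been split into sets of tokens. The class JaccardCounts and the method jaccard are hypothetical names. The sketch counts the intersection by probing the larger set with each element of the smaller one. It then derives the union size from the identity, and it returns 0 for two empty sets, as the method in the question does:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class JaccardCounts {
    // Only the intersection is counted; |A ∪ B| = |A| + |B| - |A ∩ B|,
    // so no union set is ever built.
    public static <T> double jaccard(Set<T> a, Set<T> b) {
        Set<T> smaller = a.size() <= b.size() ? a : b;
        Set<T> larger = smaller == a ? b : a;
        long intersection = smaller.stream().filter(larger::contains).count();
        long union = (long) a.size() + b.size() - intersection;
        // Two empty sets: treat the similarity as 0, as in the question.
        return union == 0 ? 0 : (double) intersection / union;
    }

    public static void main(String[] args) {
        Set<String> a = new HashSet<>(Arrays.asList("ideas of march".split("\\s+")));
        Set<String> b = new HashSet<>(Arrays.asList("ceaser died in march".split("\\s+")));
        System.out.println(jaccard(a, b)); // 0.1666...
    }
}
```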
    import java.util.Set;
    import java.util.regex.Pattern;
    import java.util.stream.Collectors;

    public class Jaccard {
        // Returned when both sets are empty (Jaccard is 0/0 there);
        // 1.0 is an assumed convention that treats two empty sets as identical.
        static final double EMPTY_JACCARD_SIMILARITY_COEFFICIENT = 1d;

        public static void main(String[] args) {
            final String a = "ideas of march";
            final String b = "ceaser died in march";
            final Pattern p = Pattern.compile("\\s+");
            final double similarity = similarity(
                    p.splitAsStream(a).collect(Collectors.toSet()),
                    p.splitAsStream(b).collect(Collectors.toSet()));
            assert similarity == 1d / 6;
            System.out.println(similarity); // 0.1666...
        }

        public static double similarity(Set<?> left, Set<?> right) {
            final int sa = left.size();
            final int sb = right.size();
            // (sa - 1 | sb - 1) < 0 means at least one of the sets is empty
            if ((sa - 1 | sb - 1) < 0)
                return (sa | sb) == 0 ? EMPTY_JACCARD_SIMILARITY_COEFFICIENT : 0;
            // iterate over the smaller set and probe the larger one
            final Set<?> smaller = sa <= sb ? left : right;
            final Set<?> larger = sa <= sb ? right : left;
            int intersection = 0;
            for (final Object element : smaller) {
                try {
                    if (larger.contains(element)) intersection++;
                } catch (final ClassCastException | NullPointerException e) {
                    // some Set implementations reject foreign types or null: treat as not contained
                }
            }
            // size() saturates at Integer.MAX_VALUE, so such sets are counted explicitly
            final long sum = (sa + 1 > 0 ? sa : left.stream().count())
                    + (sb + 1 > 0 ? sb : right.stream().count());
            // |A ∪ B| = |A| + |B| - |A ∩ B|
            return 1d / (sum - intersection) * intersection;
        }
    }