string - How to implement the Jaccard coefficient in Java?


Assume I have two strings like this.

query1: "ideas of march"

query2: "ceaser died in march"

function(j) = (query1 intersection query2)/ (query1 union query2)

I am looking at accuracy with respect to the number of tokens (words), irrespective of position.

query1 intersection query2 = 1 {march}

query1 union query2 = 6 {ideas, of, march, ceaser, died, in}

In this context, function(j) should return 1/6.

Is there any way I can find the intersection count and union count of two sentences? For example, here:

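    public double calculateSimilarity(String oneContent, String otherContent) {
        // intersection(...) and union(...) are the helpers being asked about;
        // they are assumed to return the sets of shared and combined tokens.
        Set<String> numerator   = intersection(oneContent, otherContent);
        Set<String> denominator = union(oneContent, otherContent);

        return denominator.size() > 0
            ? (double) numerator.size() / (double) denominator.size()
            : 0;
    }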

Are there functions available in Java for the intersection count and union count, without using external libraries such as Google Guava?

As you are only interested in the size of the union and the intersection, you can calculate both sizes without creating the union or intersection set: union(a, b).size() equals a.size() + b.size() - intersection(a, b).size(), so only the intersection size is required.
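For the two example queries, the arithmetic works out as follows. The symbols A and B are introduced here only for illustration and stand for the token sets of query1 and query2:

```latex
\begin{align*}
|A \cup B| &= |A| + |B| - |A \cap B| = 3 + 4 - 1 = 6, \\
J(A, B) &= \frac{|A \cap B|}{|A \cup B|} = \frac{1}{6}.
\end{align*}
```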

    import java.util.Set;

    public static void main(String[] args) {
        final String a = "ideas of march";
        final String b = "ceaser died in march";
        // Split on runs of whitespace and collect the distinct tokens.
        final java.util.regex.Pattern p
            = java.util.regex.Pattern.compile("\\s+");
        final double similarity = similarity(
                p.splitAsStream(a).collect(java.util.stream.Collectors.toSet()),
                p.splitAsStream(b).collect(java.util.stream.Collectors.toSet()));
        assert similarity == 1d / 6;
        System.out.println(similarity); // 0.1666...
    }

    public static double similarity(Set<?> left, Set<?> right) {
        final int sa = left.size();
        final int sb = right.size();
        // At least one set is empty. emptyJaccardSimilarityCoefficient is a
        // constant that is not shown here (the value to use for two empty sets).
        if ((sa - 1 | sb - 1) < 0)
            return (sa | sb) == 0 ? emptyJaccardSimilarityCoefficient : 0;
        // Both sizes are Integer.MAX_VALUE, so both sets may be larger than an
        // int can report. parallelSimilarity is not shown here.
        if ((sa + 1 & sb + 1) < 0)
            return parallelSimilarity(left, right);
        // Iterate over the smaller set and probe the larger one.
        final Set<?> smaller = sa <= sb ? left : right;
        final Set<?> larger  = sa <= sb ? right : left;
        int intersection = 0;
        for (final Object element : smaller) try {
            if (larger.contains(element))
                intersection++;
        } catch (final ClassCastException | NullPointerException e) {}
        // If size() is saturated at Integer.MAX_VALUE, count the elements instead.
        final long sum = (sa + 1 > 0 ? sa : left.stream().count())
                       + (sb + 1 > 0 ? sb : right.stream().count());
        // Union size is sum - intersection.
        return 1d / (sum - intersection) * intersection;
    }
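If the sets are small and building them is not a concern, the helpers from the question can also be written with plain java.util collections, since retainAll and addAll on a HashSet give the intersection and the union. The following is a minimal sketch under those assumptions, not the code from the answer above: the class name JaccardSketch and the methods tokens and jaccard are hypothetical, and returning 0 for two empty inputs is a convention chosen here.

```java
import java.util.HashSet;
import java.util.Set;

class JaccardSketch {

    // Split on runs of whitespace and keep the distinct, non-empty tokens.
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    // Jaccard coefficient of the token sets of two strings.
    static double jaccard(String a, String b) {
        Set<String> left = tokens(a);
        Set<String> right = tokens(b);

        // Copy before mutating: retainAll and addAll modify the receiver.
        Set<String> intersection = new HashSet<>(left);
        intersection.retainAll(right);

        Set<String> union = new HashSet<>(left);
        union.addAll(right);

        // Assumption: two empty inputs yield 0 rather than 1 or NaN.
        return union.isEmpty()
                ? 0.0
                : (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // One shared token out of six distinct tokens, so this prints 1/6 as a double.
        System.out.println(jaccard("ideas of march", "ceaser died in march"));
    }
}
```

This version allocates two extra sets per call, which is the cost the counting approach above avoids. It may still be the easier one to read when the inputs are short sentences.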
