pandas - Python interaction in dataframe -


I have the following dataframe:

  exam_id student  semester
0      01       a         1
1      02       b         2
2      03       c         3
3      01       d         1
4      02       e         2
5      03       f         3
6      01       g         1

I want to create a new dataframe containing 4 columns: "student", "shared exam with", "semester", "number of shared exams".

  student shared_exam_with  semester  number_of_shared_exam
0       a                d         1                      1
1       a                g         1                      1
2       b                e         2                      1
3       c                f         3                      1
4       d                a         1                      1
5       d                g         1                      1
6       e                b         2                      1
7       f                c         3                      1
8       g                a         1                      1
9       g                d         1                      1

Any suggestions?

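import numpy as np

idx_cols = ['exam_id', 'semester']
std_cols = ['student_x', 'student_y']

# self merge: pair every student with everyone in the same exam and semester
d1 = df.merge(df, on=idx_cols)
# drop rows where a student is paired with themselves;
# .copy() avoids assigning into a view of d1 in the next step
d2 = d1.loc[d1.student_x != d1.student_y, idx_cols + std_cols].copy()

# sort each pair so that (d, a) and (a, d) become identical rows
d2.loc[:, std_cols] = np.sort(d2.loc[:, std_cols], axis=1)

d3 = d2.drop_duplicates().groupby(
    std_cols + ['semester']).size().reset_index(name='count')

print(d3)

  student_x student_y  semester  count
0         a         d         1      1
1         a         g         1      1
2         b         e         2      1
3         c         f         3      1
4         d         g         1      1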

How it works

  • self merge on semester and exam_id
  • get rid of self sharing (rows where a student is paired with themselves)
  • sort each row of student pairs so we can see duplicate combinations
  • drop duplicates
  • group by the student pair (include semester to see it in the result) and count the rows
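The result above lists each pair once, while the table asked for in the question lists every pair in both directions. One way to get that layout is to follow the same steps but skip the row-wise sort. The sketch below is an illustration only: the helper name shared_exams is made up, exam_id is assumed to be stored as strings, and the first student label "a" is an assumption because it is blank in the post.

```python
import pandas as pd

# Data rebuilt from the question. exam_id is kept as strings to preserve "01".
# Assumption: the first student is "a" (the label is blank in the post).
df = pd.DataFrame({
    "exam_id": ["01", "02", "03", "01", "02", "03", "01"],
    "student": ["a", "b", "c", "d", "e", "f", "g"],
    "semester": [1, 2, 3, 1, 2, 3, 1],
})


def shared_exams(frame):
    """Hypothetical helper: count shared exams per ordered student pair."""
    # Self merge: pair every student with everyone in the same exam and semester.
    pairs = frame.merge(frame, on=["exam_id", "semester"])
    # Get rid of self sharing.
    pairs = pairs[pairs["student_x"] != pairs["student_y"]]
    # No row-wise sort here, so (a, d) and (d, a) are kept as separate rows.
    counts = (
        pairs.drop_duplicates()
        .groupby(["student_x", "student_y", "semester"])
        .size()
        .reset_index(name="number_of_shared_exam")
        .rename(columns={"student_x": "student",
                         "student_y": "shared_exam_with"})
    )
    return counts[["student", "shared_exam_with",
                   "semester", "number_of_shared_exam"]]


result = shared_exams(df)
print(result)
```

With the sample data this gives ten rows, one per ordered pair, each with a count of 1. Keeping the sort step instead halves the rows by treating (a, d) and (d, a) as the same pair.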
