pandas - Python: interaction between rows in a dataframe
I have the following dataframe:
  exam_id student  semester
0      01       a         1
1      02       b         2
2      03       c         3
3      01       d         1
4      02       e         2
5      03       f         3
6      01       g         1
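If you want to reproduce this, here is a minimal sketch that rebuilds the same frame. Keeping exam_id as strings is an assumption, since the leading zeros in "01" suggest they are labels rather than numbers. The name df matches the one used in the answer.

```python
import pandas as pd

# Sample data from the question. Assumption: exam_id is stored as a string
# so the leading zero survives; semester is an integer.
df = pd.DataFrame({
    "exam_id": ["01", "02", "03", "01", "02", "03", "01"],
    "student": ["a", "b", "c", "d", "e", "f", "g"],
    "semester": [1, 2, 3, 1, 2, 3, 1],
})

print(df)
```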
I want to create a new dataframe containing 4 columns: "student", "shared exam with", "semester", "number of shared exams":
  student shared_exam_with  semester  number_of_shared_exam
0       a                d         1                      1
1       a                g         1                      1
2       b                e         2                      1
3       c                f         3                      1
4       d                a         1                      1
5       d                g         1                      1
6       e                b         2                      1
7       f                c         3                      1
8       g                a         1                      1
9       g                d         1                      1
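The answer below collapses each pair into a single row. If you want exactly the layout above, with every pair listed from both sides, one possible sketch is to self-merge, drop the self pairs, and count without sorting. It assumes one row per student per exam; the suffix name "_other" is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({
    "exam_id": ["01", "02", "03", "01", "02", "03", "01"],
    "student": ["a", "b", "c", "d", "e", "f", "g"],
    "semester": [1, 2, 3, 1, 2, 3, 1],
})

# Self-merge on the exam (and semester) so each student is paired with
# every student who sat the same exam, including themselves.
pairs = df.merge(df, on=["exam_id", "semester"], suffixes=("", "_other"))

# Drop the rows where a student is paired with themselves.
pairs = pairs[pairs["student"] != pairs["student_other"]]

# Count shared exams per (student, partner, semester); both directions stay.
result = (
    pairs.groupby(["student", "student_other", "semester"])
    .size()
    .reset_index(name="number_of_shared_exam")
    .rename(columns={"student_other": "shared_exam_with"})
)
print(result)
```

Because nothing is sorted or deduplicated, (a, d) and (d, a) both appear, and a pair that shares several exams in the same semester gets a count above 1.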
Any suggestions?
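import numpy as np
import pandas as pd

idx_cols = ['exam_id', 'semester']
std_cols = ['student_x', 'student_y']

# self-merge: pair every student with everyone who sat the same exam
d1 = df.merge(df, on=idx_cols)
# drop self pairs; .copy() avoids a SettingWithCopyWarning on the next line
d2 = d1.loc[d1.student_x != d1.student_y, idx_cols + std_cols].copy()
# sort each pair so (d, a) and (a, d) become identical
d2.loc[:, std_cols] = np.sort(d2.loc[:, std_cols].to_numpy(), axis=1)
# count the distinct exams each unordered pair shares
d3 = d2.drop_duplicates().groupby(
    std_cols + ['semester']).size().reset_index(name='count')
print(d3)

  student_x student_y  semester  count
0         a         d         1      1
1         a         g         1      1
2         b         e         2      1
3         c         f         3      1
4         d         g         1      1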
How it works:
- self-merge on exam_id and semester
- get rid of self-sharing (a student paired with themselves)
- sort each row of student pairs so duplicate combinations such as (a, d) and (d, a) look the same
- drop duplicates
- group by the student pair and count (include semester so it shows up in the result)
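To make the steps above concrete, here is a sketch of the same pipeline that prints the row count after each step on the sample data (exam_id stored as strings is an assumption). Exam 01 has three students and exams 02 and 03 have two each, so the self-merge yields 9 + 4 + 4 = 17 rows; removing the 7 self pairs leaves 10, and deduplicating the sorted pairs leaves 5.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "exam_id": ["01", "02", "03", "01", "02", "03", "01"],
    "student": ["a", "b", "c", "d", "e", "f", "g"],
    "semester": [1, 2, 3, 1, 2, 3, 1],
})

idx_cols = ["exam_id", "semester"]
std_cols = ["student_x", "student_y"]

# Step 1: self-merge on exam_id and semester.
d1 = df.merge(df, on=idx_cols)
print("after merge:", len(d1))            # 17 rows

# Step 2: drop self-sharing rows.
d2 = d1.loc[d1.student_x != d1.student_y, idx_cols + std_cols].copy()
print("without self pairs:", len(d2))     # 10 rows

# Step 3: sort each pair alphabetically so (d, a) becomes (a, d).
d2[std_cols] = np.sort(d2[std_cols].to_numpy(), axis=1)

# Step 4: drop duplicate (exam, semester, pair) rows.
d2 = d2.drop_duplicates()
print("unique pairs per exam:", len(d2))  # 5 rows

# Step 5: group by pair and semester, count the shared exams.
d3 = d2.groupby(std_cols + ["semester"]).size().reset_index(name="count")
print(d3)
```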