Quick way to find all permutations of a pandas DataFrame that preserves a sort?
By : user3575333
Date : March 29 2020, 07:55 AM
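Since you're grouping by age, let's do that: build all the permutations of each age group's index, then take their product (using permutations and product from itertools, i.e. from itertools import permutations, product). Because groupby sorts the groups by age, chaining one permutation from each group always yields an age-sorted row order: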
In [11]: age = df.groupby("age")
In [12]: age.get_group(21)
Out[12]:
age name
2 21 Chris
4 21 Evan
In [13]: list(permutations(age.get_group(21).index))
Out[13]: [(2, 4), (4, 2)]
In [14]: [df.loc[list(p)] for p in permutations(age.get_group(21).index)]
Out[14]:
[ age name
2 21 Chris
4 21 Evan, age name
4 21 Evan
2 21 Chris]
In [21]: [list(permutations(grp.index)) for (name, grp) in age]
Out[21]: [[(1,)], [(2, 4), (4, 2)], [(3,)], [(0,)]]
In [22]: list(product(*[(permutations(grp.index)) for (name, grp) in age]))
Out[22]: [((1,), (2, 4), (3,), (0,)), ((1,), (4, 2), (3,), (0,))]
In [23]: [sum(tups, ()) for tups in product(*[(permutations(grp.index)) for (name, grp) in age])]
Out[23]: [(1, 2, 4, 3, 0), (1, 4, 2, 3, 0)]
In [24]: [df.loc[list(sum(tups, ()))] for tups in product(*[list(permutations(grp.index)) for (name, grp) in age])]
Out[24]:
[ age name
1 20 Bob
2 21 Chris
4 21 Evan
3 22 David
0 28 Abe, age name
1 20 Bob
4 21 Evan
2 21 Chris
3 22 David
0 28 Abe]
In [25]: [list(df.loc[list(sum(tups, ())), "name"]) for tups in product(*[(permutations(grp.index)) for (name, grp) in age])]
Out[25]:
[['Bob', 'Chris', 'Evan', 'David', 'Abe'],
['Bob', 'Evan', 'Chris', 'David', 'Abe']]
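The session above can be collected into one self-contained sketch. The frame is rebuilt from the rows printed above; the helper name sorted_permutations is hypothetical. It flattens each combination with a nested comprehension instead of sum(tups, ()), which avoids building intermediate tuples. Note that the number of results is the product of the factorials of the group sizes, so it grows very quickly when many rows share a key.

```python
from itertools import permutations, product

import pandas as pd


def sorted_permutations(df, key):
    """Yield every row ordering of df that stays sorted by `key`.

    Rows sharing the same `key` value may appear in any order among themselves.
    """
    # groupby sorts group keys by default, so groups arrive in ascending key order.
    groups = [list(permutations(grp.index)) for _, grp in df.groupby(key)]
    # Pick one permutation per group and chain them into a full index order.
    for combo in product(*groups):
        order = [label for perm in combo for label in perm]
        yield df.loc[order]


# Frame reconstructed from the outputs shown above.
df = pd.DataFrame(
    {"age": [28, 20, 21, 22, 21], "name": ["Abe", "Bob", "Chris", "David", "Evan"]}
)

for frame in sorted_permutations(df, "age"):
    print(list(frame["name"]))
```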

quantile normalization on pandas dataframe
By : Sarah Yam
Date : March 29 2020, 07:55 AM
OK, I implemented the method myself with relatively high efficiency. Once it was finished, the logic seemed fairly simple, but I decided to post it here anyway for anyone who is as confused as I was when I couldn't find existing code by googling.
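The poster's code is not included here, so what follows is only a minimal sketch of standard quantile normalization, not their implementation. It assumes numeric columns with no NaN, and it gives tied values the reference value of their lowest rank (rank method="min"); other conventions average the tied reference values instead. The helper name quantile_normalize and the example matrix are hypothetical.

```python
import numpy as np
import pandas as pd


def quantile_normalize(df):
    """Quantile-normalize the columns of df so they share one distribution.

    Assumptions: numeric columns, no NaN; ties use rank method="min".
    """
    # Reference distribution: sort each column independently, then average row-wise.
    reference = np.sort(df.to_numpy(), axis=0).mean(axis=1)
    # Index the reference by rank (1 = smallest) so ranks can be looked up directly.
    reference = pd.Series(reference, index=np.arange(1, len(df) + 1))
    # Rank within each column and replace every rank with its reference value.
    ranks = df.rank(method="min").astype(int)
    return ranks.apply(lambda col: col.map(reference))


# Hypothetical example: three samples (columns) of four measurements each.
df = pd.DataFrame({"A": [5, 2, 3, 4], "B": [4, 1, 4, 2], "C": [3, 4, 6, 8]})
print(quantile_normalize(df))
```

After normalization, columns without ties (A and C here) contain exactly the same set of values, only in different orders.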

Rank Pandas dataframe by quantile
By : Cindy Xu
Date : March 29 2020, 07:55 AM
Method 1: mul and np.ceil. You were quite close with rank. Just multiply the percentage rank by 5 with .mul to get the desired quantile, and round up with np.ceil:
import numpy as np
np.ceil(df.rank(axis=1, pct=True).mul(5))
AC BO C CCM CL CRD CT DA GC GF
20100119 5.0 2.0 2.0 4.0 1.0 1.0 3.0 4.0 5.0 3.0
20100120 2.0 2.0 5.0 1.0 1.0 3.0 4.0 5.0 3.0 4.0
20100121 5.0 2.0 2.0 4.0 1.0 1.0 3.0 4.0 5.0 3.0
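To get integers instead of floats, cast the result:
np.ceil(df.rank(axis=1, pct=True).mul(5)).astype(int)
or, if the frame can contain NaN (which astype(int) cannot represent), use the nullable Int64 dtype:
np.ceil(df.rank(axis=1, pct=True).mul(5)).astype('Int64')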
AC BO C CCM CL CRD CT DA GC GF
20100119 5 2 2 4 1 1 3 4 5 3
20100120 2 2 5 1 1 3 4 5 3 4
20100121 5 2 2 4 1 1 3 4 5 3
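Method 2: scipy.stats.percentileofscore. Compute each value's percentile rank (0 to 100) within its row, multiply by 0.05 to scale it to 0 to 5, and round up:
from scipy import stats
d = df.apply(lambda x: [np.ceil(stats.percentileofscore(x, a, 'rank')*0.05) for a in x], axis=1).values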
pd.DataFrame(data=np.concatenate(d).reshape(d.shape[0], len(d[0])),
columns=df.columns,
dtype='int',
index=df.index)
AC BO C CCM CL CRD CT DA GC GF
20100119 5 2 2 4 1 1 3 4 5 3
20100120 2 2 5 1 1 3 4 5 3 4
20100121 5 2 2 4 1 1 3 4 5 3
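Method 1 can be wrapped in a small reusable function. This is a sketch: the helper name row_quantile_bins and the example frame are hypothetical, and q is generalized from the hard-coded 5. Ties follow rank's default method="average"; pass a different method to rank if you want another tie rule.

```python
import numpy as np
import pandas as pd


def row_quantile_bins(df, q=5):
    """Assign each value a 1..q quantile bucket within its own row (Method 1)."""
    # pct=True rescales each row's ranks into (0, 1]; multiplying by q and
    # rounding up maps them onto the integers 1..q.
    binned = np.ceil(df.rank(axis=1, pct=True).mul(q))
    # The nullable Int64 dtype keeps missing inputs as <NA> instead of failing the cast.
    return binned.astype("Int64")


# Hypothetical frame: two rows, five columns, one missing value.
df = pd.DataFrame(
    {"A": [50, 1], "B": [10, np.nan], "C": [40, 2], "D": [20, 3], "E": [30, 4]},
    index=["r1", "r2"],
)
print(row_quantile_bins(df))
```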

Quick way to find previous instance of a value in a pandas Dataframe or numpy array?
By : user3286220
Date : March 29 2020, 07:55 AM
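I have a large data set (millions of rows) which I read into a pandas DataFrame called datafile. This is faster: within each OrderId group, shift moves Price down one row, so every row gets the Price of the previous row with the same OrderId, and fill_value=0 fills the first occurrence of each order: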
datafile['Prev_Price'] = datafile.groupby('OrderId')['Price'].shift(fill_value=0)
Price Qty OrderId Prev_Price
0 26690 3000 1213772 0
1 26700 3000 1215673 0
2 26705 6000 1216656 0
3 26700 3000 1213772 26690
4 26710 3000 1215673 26700
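The title also asks about NumPy arrays. Without pandas, the same "previous value with the same key" lookup can be sketched with a stable argsort; this is a swapped-in technique, not part of the answer above, and prev_in_group is a hypothetical helper name. It assumes the keys are sortable and costs one O(n log n) sort.

```python
import numpy as np


def prev_in_group(keys, values, fill=0):
    """For each position, return the value at the previous position with the same key.

    Positions with no earlier occurrence of their key get `fill`.
    """
    keys = np.asarray(keys)
    values = np.asarray(values)
    # A stable sort brings equal keys together while keeping their original order.
    order = np.argsort(keys, kind="stable")
    sorted_keys = keys[order]
    sorted_vals = values[order]
    # Inside each run of equal keys, an element's predecessor sits just before it.
    prev_sorted = np.full_like(sorted_vals, fill)
    same = sorted_keys[1:] == sorted_keys[:-1]
    prev_sorted[1:][same] = sorted_vals[:-1][same]
    # Scatter the results back to the original row positions.
    out = np.empty_like(prev_sorted)
    out[order] = prev_sorted
    return out


# Columns taken from the sample output above.
order_ids = np.array([1213772, 1215673, 1216656, 1213772, 1215673])
prices = np.array([26690, 26700, 26705, 26700, 26710])
print(prev_in_group(order_ids, prices))
```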

How to Use Groupby Quantile with Pandas Dataframe
By : user3522291
Date : March 29 2020, 07:55 AM
If I understand you correctly, you want GroupBy with pd.qcut to get the quantiles and then take the rows in the highest quantile:
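import pandas as pd
# transform returns a result aligned to df's index, so it can be used as a mask
# (groupby.apply may prepend the group keys to the index in newer pandas versions)
quantiles = (
    df.groupby(['Name', 'Date'])['Value'].transform(lambda x: pd.qcut(x, 4, labels=False))
)
# labels=False numbers the quartiles 0..3, so 3 is the highest quartile
top_quantile_df = df[quantiles.eq(3)]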
    Name     Date Item  Quantity  Unit Cost  Value
0   Alex  2018 Q1   AA         9       8.97  80.73
5   Alex  2018 Q2   AA         4       7.00  28.00
8    Ray  2018 Q1   AA         8       5.30  42.40
11   Ray  2018 Q2   DD         4       8.00  32.00
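An alternative, not part of the answer above, is to compare each value with its group's 75th percentile directly. pd.qcut's bins are right-closed and its edges come from the same linear-interpolation quantiles, so values strictly above the 75th percentile are the ones qcut places in its top bin when the edges are distinct; unlike qcut, this does not raise when a group has too many repeated values to form unique bin edges. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical data: two (Name, Date) groups with four rows each.
df = pd.DataFrame({
    "Name": ["Alex"] * 4 + ["Ray"] * 4,
    "Date": ["2018 Q1"] * 8,
    "Value": [10.0, 20.0, 30.0, 40.0, 5.0, 15.0, 25.0, 35.0],
})

# Each row's group 75th percentile, broadcast back onto the original index.
q75 = df.groupby(["Name", "Date"])["Value"].transform(lambda x: x.quantile(0.75))
# Keep rows strictly above their group's 75th percentile (the top quartile).
top_quantile_df = df[df["Value"] > q75]
print(top_quantile_df)
```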

