Find rows from the same dataframe based on condition


By : user3042609
Date : November 28 2020, 12:01 PM
I think the answer to your question is "yes", but the scenario you describe feels rather abstract. I am providing a similarly abstract example that illustrates some possibilities, and I hope you can see how it applies to your situation.
Depending on what "similar" means for your data, change the mask definition inside the function.
code :
import pandas as pd
import numpy as np

# make example repeatable
np.random.seed(0)

# make dummy data
N = 100
df = pd.DataFrame(data=np.random.choice(range(5), size=(N, 8)))
df.columns = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
def similar_rows(idx, row, df):
    mask = np.logical_and.reduce([
        df['a'] == row['a'],
        abs(df['b'] - row['b']) <= 1,
        df['h'] == (3 - row['h'])
    ])
    df_tmp = df.loc[mask, :]
    df_tmp.insert(0, 'original_index', idx)
    return df_tmp

# create result
df_new = pd.concat([similar_rows(idx, row, df) for idx, row in df.iterrows()])
df_new.reset_index(inplace=True)
df_new.rename({'index': 'similar_index'}, axis=1, inplace=True)
print(df_new.head(10))
   similar_index  original_index  a  b  c  d  e  f  g  h
0              1               0  4  0  0  4  2  1  0  1
1             88               0  4  1  4  0  0  2  3  1
2              0               1  4  0  3  3  3  1  3  2
3             59               1  4  1  4  1  4  1  2  2
4             82               1  4  0  2  3  4  3  0  2
5              4               2  1  1  1  0  2  4  3  3
6              7               2  1  1  3  3  2  3  0  3
7             37               2  1  0  2  4  4  2  4  3
8             14               3  2  3  1  2  1  4  2  3
9             16               3  2  3  0  4  0  0  2  3
# get row at random
row = df.loc[np.random.choice(N), :]
print('Randomly Selected Row:')
print(pd.DataFrame(row).T)

# create and apply a mask for arbitrarily similar rows
mask = np.logical_and.reduce([
    df['a'] == row['a'],
    abs(df['b'] - row['b']) <= 1,
    df['h'] == (3 - row['h'])
])

print('"Similar" Results:')
df_filtered = df.loc[mask, :]
print(df_filtered)
Randomly Selected Row:
    a  b  c  d  e  f  g  h
23  3  2  4  3  3  0  3  0
"Similar" Results:
    a  b  c  d  e  f  g  h
26  3  2  2  4  3  1  2  3
60  3  1  2  2  4  2  2  3
86  3  2  4  1  3  0  4  3
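Since the answer suggests changing the mask definition to match your own notion of "similar", here is a minimal sketch of one way to make that definition a parameter. The helper name `similar_to` and its `exact` / `within` arguments are hypothetical and not part of the original answer; the small dataframe is made up for illustration.

```python
import pandas as pd

def similar_to(df, row, exact=(), within=None):
    """Return rows of df that match `row` under a configurable rule.

    exact  : columns that must be equal to the reference row
    within : dict mapping column -> maximum absolute difference allowed
    (hypothetical helper, for illustration only)
    """
    within = within or {}
    mask = pd.Series(True, index=df.index)
    for col in exact:
        mask &= df[col] == row[col]
    for col, tol in within.items():
        mask &= (df[col] - row[col]).abs() <= tol
    return df[mask]

# made-up data
df = pd.DataFrame({"a": [1, 1, 1, 2], "b": [5, 6, 9, 5]})

# rows with the same 'a' as row 0 and 'b' within 1 of row 0
matches = similar_to(df, df.loc[0], exact=("a",), within={"b": 1})
print(matches.index.tolist())  # [0, 1]
```

Note that with this rule the reference row always matches itself; drop it with `matches.drop(index=row.name)` if that is not wanted.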


Sorting rows of a dataframe by condition and splitting it into arrays based on another condition


By : poosingh
Date : March 29 2020, 07:55 AM
I think you need cut for binning, then groupby on the bins and on column Category2, aggregating with count, and finally reindex to add the missing combinations:
code :
bins = [-np.inf, 5, 10, 15, 20, 25, np.inf]
bins = pd.cut(df['Category1'], bins=bins)

mux = pd.MultiIndex.from_product([bins.unique(), df['Category2'].unique()])
a = df.groupby([bins, df['Category2']])['Category3'].count().reindex(mux).unstack(0)
print (a)
   (-inf, 5]  (5, 10]  (10, 15]  (15, 20]  (20, 25]
0        2.0      1.0       1.0       1.0       NaN
1        NaN      NaN       2.0       NaN       1.0

#select by categories of column Category2
print (a.loc[0].values)
[  2.   1.   1.   1.  nan]

print (a.loc[1].values)
[ nan  nan   2.  nan   1.]
# same idea, but fill the missing combinations with 0 instead of NaN
mux = pd.MultiIndex.from_product([bins.unique(), df['Category2'].unique()])
a = (df.groupby([bins, df['Category2']])['Category3'].count()
       .reindex(mux, fill_value=0)
       .unstack(0))
print (a)
   (-inf, 5]  (5, 10]  (10, 15]  (15, 20]  (20, 25]
0          2        1         1         1         0
1          0        0         2         0         1

print (a.loc[0].values)
[2 1 1 1 0]

print (a.loc[1].values)
[0 0 2 0 1]
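The snippets above assume a dataframe `df` that is not shown. As a self-contained sketch of the same bin-then-count idea, the example below uses made-up data and a slightly different route: string bin labels with `pd.crosstab`, then `reindex` to add empty bins as zero counts. The label names and values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# made-up data standing in for the question's dataframe
df = pd.DataFrame({
    "Category1": [1, 3, 7, 12, 18, 11, 14, 22],
    "Category2": [0, 0, 0, 0, 0, 1, 1, 1],
})

edges = [-np.inf, 5, 10, 15, 20, 25, np.inf]
labels = ["<=5", "5-10", "10-15", "15-20", "20-25", ">25"]  # hypothetical labels

# bin Category1 (right-closed intervals) and keep plain string labels
binned = pd.cut(df["Category1"], bins=edges, labels=labels).astype(str)

# count rows per (Category2, bin); reindex adds bins that never occur
counts = pd.crosstab(df["Category2"], binned).reindex(columns=labels, fill_value=0)

print(counts.loc[0].tolist())  # [2, 1, 1, 1, 0, 0]
print(counts.loc[1].tolist())  # [0, 0, 2, 0, 1, 0]
```

Each row of `counts` is then one array per value of Category2, which is the split the question asks for.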
Swapping rows between columns within dataframe for multiple dataframe based on condition


By : Jobayel Hossain
Date : March 29 2020, 07:55 AM
I think a simpler approach is:
code :
foo[['start','end']] = foo[['start','end']].apply(np.sort, axis=1)
print (foo)
    CHR  start  end Strand  Peak Ratio Annotation
0  chr1      1    2      +  0.10    NA       TSS1
1  chr2      3    4      -  0.03    NA       TSS2
2  chr3      6    7      +  0.70    NA       TSS3
df1 = foo[['start','end']]
foo['start'] = df1.min(axis=1)
foo['end'] =   df1.max(axis=1)
print (foo)
    CHR  start  end Strand  Peak Ratio Annotation
0  chr1      1    2      +  0.10    NA       TSS1
1  chr2      3    4      -  0.03    NA       TSS2
2  chr3      6    7      +  0.70    NA       TSS3
b = foo['start'] < foo['end']
foo[['start','end']] = np.where(np.column_stack([b,b]),
                                foo[['start','end']],
                                foo[['end','start']])
print (foo)
    CHR  start  end Strand  Peak Ratio Annotation
0  chr1      1    2      +  0.10    NA       TSS1
1  chr2      3    4      -  0.03    NA       TSS2
2  chr3      6    7      +  0.70    NA       TSS3
def fun(foo):
    b = foo['start'] < foo['end']
    foo[['start','end']] = np.where(np.column_stack([b,b]), 
                                    foo[['start','end']], 
                                    foo[['end','start']])
    return foo

print (fun(foo))
    CHR  start  end Strand  Peak Ratio Annotation
0  chr1      1    2      +  0.10    NA       TSS1
1  chr2      3    4      -  0.03    NA       TSS2
2  chr3      6    7      +  0.70    NA       TSS3
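All the variants above print the already-ordered result, so the swap itself is not visible. The sketch below uses a made-up `foo` in which one row has start greater than end, and orders each pair by sorting the underlying NumPy array row-wise. This is a variation on the first variant, not the answer's exact code.

```python
import numpy as np
import pandas as pd

# made-up frame; row 1 has start > end and should be swapped
foo = pd.DataFrame({
    "CHR": ["chr1", "chr2", "chr3"],
    "start": [1, 4, 6],
    "end": [2, 3, 7],
})

# sort each (start, end) pair so the smaller value lands in 'start'
foo[["start", "end"]] = np.sort(foo[["start", "end"]].to_numpy(), axis=1)

print(foo)
```

To apply this to several dataframes, the same line can be wrapped in a function and called on each one, as in the `fun(foo)` variant.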
Filter rows dataframe based on condition in different dataframe using dplyr


By : user1983124
Date : March 29 2020, 07:55 AM
I have two dataframes: df1 and df2. df1 contains a numeric start and end value per character id. df2 contains multiple events per character id, including a numeric time value.
What about something like this, with dplyr:
code :
  df1 %>% 
  left_join(df2) %>%                       # join to get a single dataset
  filter(time <= end, time >= start) %>%   # keep events inside [start, end]; use < and > for strict bounds
  select(-c(2,3))                          # drop columns 2 and 3 if they are no longer needed

# A tibble: 4 x 3
  id     time keep 
  <chr> <dbl> <lgl>
1 A         3 TRUE 
2 A         5 TRUE 
3 B         3 TRUE 
4 B         4 TRUE 
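For readers working in pandas rather than dplyr, the same join-then-filter idea can be sketched as below. The data is made up to resemble the description (a start and end per id in `df1`, several timed events per id in `df2`), and it is an assumption that the two frames share only the `id` column.

```python
import pandas as pd

# made-up data: one interval per id, several events per id
df1 = pd.DataFrame({"id": ["A", "B"], "start": [2, 3], "end": [5, 4]})
df2 = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B", "B"],
    "time": [1, 3, 5, 2, 3, 4],
})

# join the interval onto every event of the same id
merged = df1.merge(df2, on="id", how="left")

# keep events whose time lies inside [start, end] (both bounds inclusive)
inside = merged[merged["time"].between(merged["start"], merged["end"])]

result = inside[["id", "time"]].reset_index(drop=True)
print(result)
```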
How to push rows of an existing dataframe to a new dataframe based on a condition?


By : user2992278
Date : March 29 2020, 07:55 AM
Use the bitwise AND operator & to combine conditions, and because of operator precedence wrap each condition in () when chaining boolean masks:
code :
minor = data[data.NAVD88 <= 5]
moderate = data[(data.NAVD88 > 5) & (data.NAVD88 < 7)]
major = data[data.NAVD88 >= 7]
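A small self-contained run of the same three filters, on made-up NAVD88 values, shows that each row ends up in exactly one of the new dataframes. Without the parentheses, `&` would bind more tightly than the comparisons and the expression would not mean what is intended.

```python
import pandas as pd

# made-up water levels
data = pd.DataFrame({"NAVD88": [3.2, 5.0, 5.5, 6.9, 7.0, 8.1]})

minor = data[data["NAVD88"] <= 5]
# each comparison is parenthesised before combining with &
moderate = data[(data["NAVD88"] > 5) & (data["NAVD88"] < 7)]
major = data[data["NAVD88"] >= 7]

print(len(minor), len(moderate), len(major))  # 2 2 2
```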
Pandas dataframe apply lambda to selected rows only (based on a condition) within the dataframe


By : Rex
Date : March 29 2020, 07:55 AM
Update: This solution works:
df['GlobalName'] = np.where(df['GlobalName']=='', df['IsPerson'].apply(lambda x: x if x==True else ''), df['GlobalName'])
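Another common way to apply a lambda only to rows that meet a condition is a boolean mask with `.loc`, sketched below. The column names `name` and `is_person` and the data are hypothetical, and this is an alternative pattern rather than the answer's own code.

```python
import pandas as pd

# made-up data
df = pd.DataFrame({
    "name": ["alice", "BOB", "carol"],
    "is_person": [True, False, True],
})

mask = df["is_person"]

# the lambda runs only on the selected rows; other rows are left untouched
df.loc[mask, "name"] = df.loc[mask, "name"].apply(lambda s: s.title())

print(df["name"].tolist())  # ['Alice', 'BOB', 'Carol']
```

Because the assignment is aligned on the index, rows outside the mask keep their original values.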