
calculate number and names of similar sounding words from a data frame



By : user3100431
Date : January 12 2021, 07:00 PM
If I understood correctly, you are looking for the Levenshtein distance between your subject words. The adist function in the utils package can calculate it for you. It returns a matrix with the number of substitutions/insertions/deletions needed to get from the i-th word to the j-th word. Words at a distance of exactly 1 are then listed as links and counted.
code :
dist <- utils::adist(Word)
dist
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
 [1,]    0    1    2    4    1    1    1    4    2     2     2
 [2,]    1    0    1    4    2    2    1    4    2     2     1
 [3,]    2    1    0    4    2    3    2    4    3     2     2
 [4,]    4    4    4    0    4    4    4    2    3     4     4
 [5,]    1    2    2    4    0    2    2    3    3     1     2
 [6,]    1    2    3    4    2    0    2    4    3     3     2
 [7,]    1    1    2    4    2    2    0    4    2     2     2
 [8,]    4    4    4    2    3    4    4    0    2     3     3
 [9,]    2    2    3    3    3    3    2    2    0     3     3
[10,]    2    2    2    4    1    3    2    3    3     0     2
[11,]    2    1    2    4    2    2    2    3    3     2     0
links <- apply(dist, 1, function(d) {
  paste0(Word[d == 1], collapse = ", ")
})
cbind.data.frame(Word, links)
   Word              links
1   bat cat, ban, bait, at
2   cat bat, cab, at, cant
3   cab                cat
4  some                   
5   ban           bat, ran
6  bait                bat
7    at           bat, cat
8  done                   
9   dot                   
10  ran                ban
11 cant                cat
counts <- apply(dist, 1, function(d){sum(d == 1)})
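
For comparison, here is a minimal Python sketch of the same idea. It is not part of the original answer: the levenshtein helper is written by hand for illustration, and the word list is copied from the output table above because the original Word vector is not shown.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]

# Word list taken from the output table above (assumption: same order as Word).
words = ["bat", "cat", "cab", "some", "ban", "bait", "at", "done", "dot", "ran", "cant"]

# For every word, keep the other words that are exactly one edit away.
links = {w: [v for v in words if levenshtein(w, v) == 1] for w in words}
counts = {w: len(vs) for w, vs in links.items()}

for w in words:
    print(f"{w:5} {counts[w]}  {', '.join(links[w])}")
```

The comparison with 1 plays the same role as d == 1 in the R code; loosening it to a larger threshold would widen what counts as similar.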


Detect similar sounding words in Ruby



By : Matteo
Date : March 29 2020, 07:55 AM
I think you're describing Levenshtein distance. And yes, there are gems for that. If you're into pure Ruby, go for the text gem.
code :
$ gem install text
require 'text'

Text::Levenshtein.distance('test', 'test')    # => 0
Text::Levenshtein.distance('test', 'tent')    # => 1
$ gem install levenshtein
# Extend String with a similarity check based on edit distance
String.module_eval do
  def similar?(other, threshold = 2)
    distance = Text::Levenshtein.distance(self, other)
    distance <= threshold
  end
end
Detect similar sounding words in Java


By : svirpav
Date : March 29 2020, 07:55 AM
There are several algorithms for comparing words by how they sound. The most basic one is Soundex, and Apache Commons Codec provides an implementation of it here:
http://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/language/Soundex.html
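
To give a feel for what Soundex does, here is a minimal Python sketch of the standard American Soundex rules. It is an illustration written for this page, not the Apache Commons Codec code, and edge-case handling may differ from that library.

```python
# Consonant groups that share a Soundex digit.
_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        _CODES[ch] = digit

def soundex(word: str) -> str:
    """Return the 4-character Soundex code (first letter plus three digits)."""
    word = "".join(ch for ch in word.upper() if ch.isalpha())
    if not word:
        return ""
    result = word[0]
    prev = _CODES.get(word[0], "")      # code of the previous letter
    for ch in word[1:]:
        if ch in "HW":
            continue                    # H and W do not separate equal codes
        digit = _CODES.get(ch, "")      # vowels (and Y) have no code
        if digit and digit != prev:
            result += digit             # skip repeats of the same code
        prev = digit                    # a vowel resets prev, allowing a repeat
    return (result + "000")[:4]         # pad with zeros or truncate

print(soundex("Robert"), soundex("Rupert"))   # both R163
```

Two words are treated as similar sounding when their codes are equal, which is why spellings as different as Robert and Rupert end up together.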
Fetching similar sounding names from a table



By : Janak
Date : March 29 2020, 07:55 AM
You could use the MySQL SOUNDEX function and keep the rows whose code equals that of the search name (STRCMP returns 0 for equal strings):
code :
SELECT * FROM `stu_table` WHERE STRCMP(SOUNDEX(`stu_name`), SOUNDEX('Mrinmoy')) = 0;
How to get the similar-sounding words together



By : user2573844
Date : March 29 2020, 07:55 AM
First, you need a suitable way to identify similar-sounding words, i.e. a string-similarity measure. I would suggest:
Using jellyfish:
code :
from jellyfish import soundex

print(soundex("two"))
print(soundex("to"))
T000
T000
def getSoundexList(dList):
    res = [soundex(x) for x in dList]   # iterate over each elem in the dataList
    # print(res)     # ['T000', 'F630', 'F630', 'D263', 'T000', 'D263']
    return res

dataList = ['two','fourth','forth','dessert','to','desert']
print(sorted(getSoundexList(dataList)))
['D263', 'D263', 'F630', 'F630', 'T000', 'T000']
Using fuzzy:
code :
import fuzzy
soundex = fuzzy.Soundex(4)

print(soundex("to"))
print(soundex("two"))
T000
T000
from itertools import groupby

def getSoundexList(dList):
    return sorted([soundex(x) for x in dList])

dataList = ['two','fourth','forth','dessert','to','desert']    
print([list(g) for _, g in groupby(getSoundexList(dataList), lambda x: x)])
[['D263', 'D263'], ['F630', 'F630'], ['T000', 'T000']]
from operator import itemgetter

def getSoundexDict(dList):
    # map each word to its Soundex code, then sort the (word, code) pairs by code
    dict_ = dict(zip(dList, [soundex(x) for x in dList]))
    return sorted(dict_.items(), key=itemgetter(1))

dataList = ['two','fourth','forth','dessert','to','desert']

print([list(g) for _, g in groupby(getSoundexDict(dataList), lambda x: x[1])])
[[('dessert', 'D263'), ('desert', 'D263')], [('fourth', 'F630'), ('forth', 'F630')], [('two', 'T000'), ('to', 'T000')]]
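
If only the grouped words are needed, sorting is not strictly required. The sketch below is an alternative, not part of the original answer: group_by_code is a hypothetical helper, and the lookup table of codes is copied from the output above so the example runs without jellyfish or fuzzy installed.

```python
from collections import defaultdict

def group_by_code(words, encode):
    """Collect words that share the same phonetic code, keeping input order."""
    groups = defaultdict(list)
    for word in words:
        groups[encode(word)].append(word)
    return dict(groups)

# Codes copied from the output above; in practice, pass a real encoder
# such as jellyfish.soundex instead of this lookup table.
codes = {'two': 'T000', 'fourth': 'F630', 'forth': 'F630',
         'dessert': 'D263', 'to': 'T000', 'desert': 'D263'}

dataList = ['two', 'fourth', 'forth', 'dessert', 'to', 'desert']
groups = group_by_code(dataList, codes.get)
print(list(groups.values()))
# [['two', 'to'], ['fourth', 'forth'], ['dessert', 'desert']]
```

Unlike groupby, which only merges adjacent items and therefore needs sorted input, the dictionary collects matching words wherever they appear in the list.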
Calculate number and names of similar sounding words from two different data frames



By : sanjeev K
Date : October 04 2020, 10:00 PM
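One option is agrep which, according to its documentation, searches for approximate matches to a pattern using the generalized Levenshtein edit distance. For each word in df1, the code below collects the approximate matches found in df2 and counts them: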
code :
cbind(df1['Word1'], do.call(rbind, lapply(df1$Word1, function(x) {
  i1 <- agrep(x, df2$Word2)
  data.frame(links = toString(df2$Word2[i1]), counts = length(i1))
})))
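
A rough Python counterpart for two word lists can be sketched with difflib.get_close_matches from the standard library. Note that this swaps in a different technique: difflib scores similarity with a ratio of matching characters, not with agrep's edit distance, so results will not match agrep exactly. The two word lists and the close_links helper are hypothetical, since the original data frames are not shown.

```python
from difflib import get_close_matches

def close_links(words1, words2, cutoff=0.6):
    """For each word in words1, list and count its close matches in words2."""
    rows = []
    for w in words1:
        # n=len(words2) asks for every match above the cutoff, not just the top 3
        matches = sorted(get_close_matches(w, words2, n=len(words2), cutoff=cutoff))
        rows.append({"word": w, "links": ", ".join(matches), "counts": len(matches)})
    return rows

# Hypothetical stand-ins for df1$Word1 and df2$Word2.
word1 = ["bat", "some"]
word2 = ["cat", "ban", "done", "dot"]

rows = close_links(word1, word2)
for row in rows:
    print(row)
```

Raising cutoff makes matching stricter and lowering it admits looser matches, so the value would need tuning against real data.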