Python script to deduplicate lines in multiple files



By : user3100420
Date : January 12 2021, 07:00 PM
Yes, you should add the relative path (the path to the directory) when opening the file, like this:
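code :
import os

# open each file relative to its directory, not the current working directory
cur_file = open(os.path.join(path, f), 'r')
# write the collected lines back out, one per line
output_file.write("\n".join(seen_lines))

# full loop, deduplicating each file in place
# (`path` is the directory, `myfiles` the list of file names in it)
for f in myfiles:
    with open(os.path.join(path, f), 'r') as cur_file:
        # splitlines() drops the trailing newlines so they are not doubled on write
        lines = set(cur_file.read().splitlines())
    with open(os.path.join(path, f), 'w') as of:
        # note: a set does not preserve the original line order
        of.writelines(line + "\n" for line in lines)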
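If the goal is one merged output with duplicates removed across all files, rather than per file, something like the following might work. It is only a sketch: the function name `dedupe_across_files` and the `output_name` parameter are made up for illustration, and it keeps the first occurrence of each line in the order it was read.

```python
import os


def dedupe_across_files(path, filenames, output_name):
    """Write each distinct line once, in first-seen order, to output_name inside path."""
    seen = set()       # fast membership test
    seen_lines = []    # keeps the order in which lines first appeared
    for name in filenames:
        with open(os.path.join(path, name), "r") as cur_file:
            for line in cur_file.read().splitlines():
                if line not in seen:
                    seen.add(line)
                    seen_lines.append(line)
    with open(os.path.join(path, output_name), "w") as output_file:
        output_file.writelines(line + "\n" for line in seen_lines)
    return seen_lines
```

Using a list next to the set is what preserves the order; a plain set on its own would shuffle the lines.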


Perl script to search and replace multiple lines in multiple html files



By : Jalal
Date : March 29 2020, 07:55 AM
To fix this issue:
code :
perl -0777 -i.withdiv -pe 's{]+?id="user-info"[^>]*>.*?
}{}gsmi;' test.html
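-0777 means split on nothing, so the whole file is slurped in at once (instead of line by line, the default for -p). -i.withdiv edits the file in place and keeps the original as a backup with the .withdiv suffix.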
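For comparison, the same slurp-then-substitute idea could be sketched in Python. The pattern below is hypothetical: it assumes the block to remove is a `<div>` whose id is "user-info" and that it contains no nested `<div>`. The function names and the `backup_suffix` parameter are also made up for this example.

```python
import re

# Hypothetical pattern: assumes a <div ... id="user-info" ...> ... </div> block
# with no nested <div> inside it. Adjust to the real markup.
USER_INFO_BLOCK = re.compile(
    r'<div[^>]+?id="user-info"[^>]*>.*?</div>\s*',
    re.DOTALL | re.IGNORECASE,  # DOTALL lets .*? run across line breaks
)


def strip_user_info(html):
    """Return html with every matching block removed."""
    return USER_INFO_BLOCK.sub("", html)


def strip_file(path, backup_suffix=".withdiv"):
    """Read the whole file at once, keep a backup copy, rewrite it without the block."""
    with open(path, "r") as fh:
        html = fh.read()  # whole file in one string, similar to perl -0777
    with open(path + backup_suffix, "w") as fh:
        fh.write(html)
    with open(path, "w") as fh:
        fh.write(strip_user_info(html))
```

Reading the file into one string is what allows the match to span several lines; a line-by-line loop would never see the opening and closing tags together.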
Shell script copying lines from multiple files



By : Goncho
Date : March 29 2020, 07:55 AM
I believe the ordering of the files is also important, to make sure you get the output in the desired sequence.
Consider this script:
code :
n=8
while read f; do
   sed $n'q;d' "$f" >> output.txt
   ((n+=8))
done < <(printf "%s\n" values_*.txt|sort -t_ -nk2,2)
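The script above takes line 8 from the first file, line 16 from the second, and so on, with the files ordered numerically by the number after the underscore. A rough Python equivalent might look like this; the function names and the `step` and `pattern` parameters are assumptions for illustration, and the lines are returned rather than appended to output.txt.

```python
import glob
import os
import re


def numeric_suffix(filename):
    """Sort key: the integer after the underscore in names like values_12.txt."""
    match = re.search(r"_(\d+)", os.path.basename(filename))
    return int(match.group(1)) if match else 0


def collect_stepped_lines(directory, step=8, pattern="values_*.txt"):
    """From the k-th file (numeric order, k starting at 1) take line k*step, if present."""
    files = sorted(glob.glob(os.path.join(directory, pattern)), key=numeric_suffix)
    picked = []
    for k, filename in enumerate(files, start=1):
        target = k * step
        with open(filename, "r") as fh:
            for lineno, line in enumerate(fh, start=1):
                if lineno == target:
                    picked.append(line.rstrip("\n"))
                    break  # stop reading once the wanted line is found
    return picked
```

Sorting with a numeric key matters for the same reason as `sort -n` in the shell version: a plain string sort would put values_10.txt before values_2.txt.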
Passing individual lines from files into a python script using a bash script



By : Rich
Date : March 29 2020, 07:55 AM
This might be a simple question, but I am new to bash scripting and have spent quite a bit of time on this with no luck; I hope I can get an answer here.
You probably meant:
code :
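while read line; do
    VALUE=$line   ## No spaces allowed
    python fits_edit_head.py "$line" "$line" NEW_PARA 5  ## Quote properly to isolate arguments well
    echo "$VALUE+huh"  ## You don't expand without $
done < infile.txt

## Alternative: read from file descriptor 4, so commands inside the loop cannot consume the input on stdin
while read -u 4 line; do
     ...
done 4< infile.txt

## Alternative (bash 4+): load all lines into an array first
readarray -t lines < infile.txt
for line in "${lines[@]}"; do
    ...
done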
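If it is easier to drive the loop from Python instead of bash, a sketch along these lines might do. The helper names `build_command` and `run_for_each_line` are made up; `fits_edit_head.py`, `NEW_PARA` and `5` are taken from the loop above. Unlike the bash loop, this version skips blank lines.

```python
import subprocess
import sys


def build_command(line, script="fits_edit_head.py"):
    """Argument list for one input line; each list item reaches the script as one argument."""
    value = line.rstrip("\n")
    return [sys.executable, script, value, value, "NEW_PARA", "5"]


def run_for_each_line(infile="infile.txt"):
    """Run the script once per non-empty line of infile."""
    with open(infile, "r") as fh:
        for line in fh:
            if line.strip():  # assumption: blank lines are not wanted
                subprocess.run(build_command(line), check=True)
```

Passing a list to `subprocess.run` plays the same role as the quoting in the bash version: a value with spaces stays a single argument.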
Python script counting lines in multiple text files in one directory and producing simple report



By : nxdprinciple
Date : March 29 2020, 07:55 AM
I need a Python script that counts the lines in all text files in one directory and produces a general report on the number of files with n lines.
Your names dict looks like this:
code :
{
    'file1.txt': 30,
    'file2.txt': 26,
    'file3.txt': 19,
    'file4.txt': 19
}
Count how many files share each line count:
code :
from collections import defaultdict

lines = defaultdict(int)
for val in names.values():
    lines[val] += 1

# sort by line count so the report order is stable
for k, v in sorted(lines.items()):
    print("Files with {} lines: {}".format(k, v))
Output:
Files with 19 lines: 2
Files with 26 lines: 1
Files with 30 lines: 1
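The answer starts from an existing names dict. One possible way to build it from a directory and produce the same kind of report is sketched below; the function names and the `extension` parameter are assumptions, and `Counter` is swapped in for the `defaultdict` used above.

```python
import os
from collections import Counter


def count_lines_per_file(directory, extension=".txt"):
    """Map each matching file name in directory to its number of lines."""
    names = {}
    for entry in sorted(os.listdir(directory)):
        full = os.path.join(directory, entry)
        if os.path.isfile(full) and entry.lower().endswith(extension):
            with open(full, "r") as fh:
                names[entry] = sum(1 for _ in fh)  # count without loading the file
    return names


def report(names):
    """Return one report line per distinct line count, smallest count first."""
    histogram = Counter(names.values())
    return ["Files with {} lines: {}".format(n, c) for n, c in sorted(histogram.items())]
```

Calling `print("\n".join(report(count_lines_per_file("."))))` would print the report for the current directory.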
deduplicate records in multiple CSV files with varying columns



By : Sarvesh Kaushik
Date : March 29 2020, 07:55 AM
I do not know how to make it simpler. I have a similar script that I use for some data of mine. It runs in two passes: the first determines the min / max number of columns across all documents, and the second rewrites the CSV files into a new folder, to keep the original data.
I am just using the csv library from Python. https://docs.python.org/2/library/csv.html
code :
import os
import csv

mincols = 0xffffffff
maxcols = 0

srcdir = '/tmp/csv/'
dstdir = '/tmp/csv2/'

for dirName, subdirList, fileList in os.walk(srcdir):
    for fname in fileList:
        if fname[-4:].lower() == '.csv':
            with open(os.path.join(dirName, fname)) as csvfile:
                reader = csv.reader(csvfile, delimiter=',', quotechar='"')
                for row in reader:
                    if mincols > len(row):
                        mincols = len(row)
                    if maxcols < len(row):
                        maxcols = len(row)

print(mincols, maxcols)

for dirName, subdirList, fileList in os.walk(srcdir):
    for fname in fileList:
        if fname[-4:].lower() == '.csv':
            fullpath = os.path.join(dirName, fname)    
            newfile = os.path.join(dstdir, fullpath[len(srcdir):])

            if not os.path.exists(os.path.dirname(newfile)):
                os.makedirs(os.path.dirname(newfile))
            with open(fullpath) as csvfile:
                reader = csv.reader(csvfile, delimiter=',', quotechar='"')
                with open(newfile, 'w') as dstfile:
                    writer = csv.writer(dstfile, delimiter=',', quotechar='"',
                        quoting=csv.QUOTE_MINIMAL)
                    for row in reader:
                        #You can deduplicate here 
                        writer.writerow(row[:mincols])                  
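The "You can deduplicate here" step is left open in the script. One way it might be filled in, assuming Python 3 and that two rows count as duplicates when their first `ncols` columns are equal, is sketched below. The names `dedupe_rows` and `dedupe_csv_text` are made up for illustration.

```python
import csv
import io


def dedupe_rows(rows, ncols):
    """Yield each row truncated to ncols, skipping rows already seen after truncation."""
    seen = set()
    for row in rows:
        key = tuple(row[:ncols])  # lists are not hashable, tuples are
        if key not in seen:
            seen.add(key)
            yield list(key)


def dedupe_csv_text(text, ncols):
    """Small in-memory demo: parse CSV text, dedupe, and return the new CSV text."""
    reader = csv.reader(io.StringIO(text), delimiter=',', quotechar='"')
    out = io.StringIO(newline='')
    writer = csv.writer(out, delimiter=',', quotechar='"',
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(dedupe_rows(reader, ncols))
    return out.getvalue()
```

In the script above, the inner loop could then become `writer.writerows(dedupe_rows(reader, mincols))`. That removes duplicates within each file; to remove them across files, the `seen` set would have to be shared between calls.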