Filtering a large (> 2 GB) CSV file?

Are you filtering a large CSV file, typically > 2 GB? Let's say you have a CSV file larger than 2 GB and you want to keep only the matching rows. The first option that comes to mind is a shell script. It's simple and fast. Alright, let's do it.

Scope

- I have a CSV file, input.csv, with 10.1 million rows
- The file size is 2.1 GB
- I need to search for any of the 200 words I have in my terms.txt file
- I want the matching rows in output.csv, containing any of the 200 search terms

```shell
#!/bin/bash
> output.csv  # clear or create the output file

# Read each search string from terms.txt
while IFS= read -r search; do
  # Loop through all CSV files inside the folders
  find . -type f -name "*.csv" | while IFS= read -r csv; do
    # Search for the string in the CSV file and append matching rows
    grep -iF -- "$search" "$csv" >> output.csv
  done
done < terms.txt
```

👏 Cool. It completed in 5:49 mins. Problem solved! Wait. 🤔 I'm supposed to get only 1...
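As an aside, the loop above rescans the CSV once per search term, so a 2 GB file is read 200 times. GNU grep can instead read all patterns from a file with `-f` and match them in a single pass. Here is a minimal sketch of that idea; the file names `input.csv` and `terms.txt` are the ones from the post, but the sample contents are made up stand-ins for illustration:

```shell
#!/bin/bash
# Tiny stand-ins for the real 2.1 GB input and the 200-term list
printf 'id,name\n1,apple pie\n2,banana\n3,cherry tart\n' > input.csv
printf 'apple\ncherry\n' > terms.txt

# One pass over input.csv: -f loads every pattern from terms.txt,
# -F treats them as fixed strings, -i makes the match case-insensitive
grep -iFf terms.txt input.csv > output.csv

cat output.csv
# -> 1,apple pie
# -> 3,cherry tart
```

Because grep compiles all the fixed-string patterns up front, this also avoids re-matching rows already written to `output.csv`, which the `find . -name "*.csv"` loop would pick up on a second term.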