How to Merge Multiple CSV Files Into One
How to Merge Multiple CSV Files Into One
Meta description: Need to combine several CSV files into one? Here are 4 methods to merge CSV files — from simple copy-paste to Python scripts that handle edge cases.
You exported 12 monthly reports. Or your team sent you five different spreadsheets. Or your API dumps data into a new CSV every day. Now you need all of it in one file.
Merging CSV files is one of those tasks that sounds trivial until you actually do it and discover your headers are duplicated, your encodings don't match, or half your columns are misaligned.
Here's how to do it right.
Before You Merge: Check Compatibility
Open a couple of your files in a CSV viewer and verify:
- Same columns? Headers should match across all files (same names, same order ideally)
- Same delimiter? All commas, or all semicolons — not a mix
- Same encoding? All UTF-8, or all Latin-1 — not a mix
- Same data formats? Dates in the same format, numbers with the same decimal separator
If any of these differ, fix them before merging. Garbage in, garbage out.
Method 1: Copy-Paste (2-3 Small Files)
The brute-force approach. Works when you have a handful of small files.
- Open all files in a text editor
- Copy everything from file 2 except the header row
- Paste it at the end of file 1
- Repeat for each additional file
- Save
Pros: No tools needed.
Cons: Tedious beyond 3 files. Easy to accidentally include duplicate headers.
Method 2: Command Line (Fast, Any Number of Files)
The fastest method for combining many files with identical structures.
macOS / Linux
bash
Keep header from first file, then data from all files
head -1 file1.csv > merged.csv
tail -n +2 -q *.csv >> merged.csv
Windows (PowerShell)
powershell
Get header from first file
Get-Content (Get-ChildItem *.csv | Select-Object -First 1) |
Select-Object -First 1 | Out-File merged.csv -Encoding UTF8
Append data (skip headers) from all files
Get-ChildItem *.csv | ForEach-Object {
Get-Content $_ | Select-Object -Skip 1
} | Add-Content merged.csv -Encoding UTF8
Pros: Handles hundreds of files in seconds.
Cons: No validation — if columns don't match, you won't know until later.
Method 3: Python with Pandas (Recommended)
The most reliable method. Handles mismatched columns, different encodings, and deduplication.
Basic Merge
python
import pandas as pd
import glob
files = glob.glob('data/*.csv')
dfs = [pd.read_csv(f) for f in files]
merged = pd.concat(dfs, ignore_index=True)
merged.to_csv('merged.csv', index=False)
print(f"Merged {len(files)} files — {len(merged)} total rows")
With Source Tracking
Add a column so you know which file each row came from:
python
dfs = []
for f in files:
df = pd.read_csv(f)
df['source_file'] = f
dfs.append(df)
merged = pd.concat(dfs, ignore_index=True)
With Deduplication
Remove duplicate rows that might exist across files:
python
merged = pd.concat(dfs, ignore_index=True)
before = len(merged)
merged = merged.drop_duplicates()
print(f"Removed {before - len(merged)} duplicate rows")
Handling Different Column Orders
Pandas matches columns by name, not position. If file A has [name, email, city] and file B has [city, name, email], pd.concat handles it correctly.
Handling Different Column Sets
If files have different columns, pd.concat fills missing values with NaN:
python
File 1: name, email
File 2: name, email, phone
Result: name, email, phone (phone is NaN for file 1 rows)
merged = pd.concat(dfs, ignore_index=True)
To keep only columns that exist in ALL files:
python
common_cols = set.intersection(*[set(df.columns) for df in dfs])
merged = pd.concat([df[list(commoncols)] for df in dfs], ignoreindex=True)
Method 4: Google Sheets
For people who prefer a visual approach:
- Import each CSV into separate sheets (tabs)
- In a new sheet, use
IMPORTRANGEor copy-paste data from each tab
- Export the combined sheet as CSV
Works fine for small files but hits Google Sheets' 10 million cell limit quickly.
Common Pitfalls
Duplicate Headers
The most common mistake. If you concatenate files without removing headers, you end up with header rows scattered through your data:
name,email,city
Alice,alice@example.com,Paris
name,email,city ← this shouldn't be here
Bob,bob@example.com,London
All methods above handle this except raw cat concatenation. Always use tail -n +2 or equivalent to skip headers.
Mixed Encodings
One file in UTF-8, another in Latin-1. The merged result will have broken characters somewhere.
Fix: Convert all files to UTF-8 before merging:
bash
Convert Latin-1 to UTF-8
iconv -f ISO-8859-1 -t UTF-8 oldfile.csv > oldfile_utf8.csv
Or in Python:
python
df = pd.read_csv(f, encoding='latin-1') # read with original encoding
pandas will write UTF-8 by default
Mixed Delimiters
Some files use commas, others use semicolons. Pandas detects this automatically in most cases, but you can be explicit:
python
df = pd.read_csv(f, sep=';') # for semicolon-separated files
Overlapping Date Ranges
If you're merging monthly exports and the date ranges overlap (e.g., end of January and start of February both appear in two files), you'll get duplicate rows.
python
merged = merged.drop_duplicates(subset=['id', 'date'])
Which Method Should You Use?
| Scenario | Best Method |
|----------|-------------|
| 2-3 small identical files | Copy-paste or command line |
| Many identical files (same columns) | Command line |
| Files with different columns | Python (Pandas) |
| Need deduplication | Python (Pandas) |
| Recurring task (monthly merge) | Python script saved for reuse |
| Quick structure check before merging | CSV Viewer Online |
Automating Recurring Merges
If you merge files regularly, save a reusable script:
python
#!/usr/bin/env python3
"""Merge all CSV files in a directory."""
import pandas as pd
import glob
import sys
directory = sys.argv[1] if len(sys.argv) > 1 else '.'
output = sys.argv[2] if len(sys.argv) > 2 else 'merged.csv'
files = sorted(glob.glob(f'{directory}/*.csv'))
if not files:
print(f"No CSV files found in {directory}")
sys.exit(1)
dfs = []
for f in files:
df = pd.read_csv(f)
df['_source'] = f
dfs.append(df)
print(f" Read {f}: {len(df)} rows")
merged = pd.concat(dfs, ignore_index=True)
merged = merged.dropduplicates(subset=[c for c in merged.columns if c != 'source'])
merged.to_csv(output, index=False)
print(f"\nMerged {len(files)} files into {output}: {len(merged)} rows")
Usage: python mergecsvs.py ./monthlyexports/ combined_2024.csv
Merging CSVs is a solved problem. Pick the method that matches your comfort level and file complexity, verify the result in a viewer, and you're done.