A Python script to remove unwanted columns from user export CSV (or anything else)

Comment · June 17, 2025

Hi Atlassian Community!

Recently, I came across a discussion about managing large Jira user export files and the complications of importing the data, for example, into Google Sheets.
Usually, the approach to workaround this is to manually modify the CSV files, and this can be time-consuming depending on the situation.

I'm sharing a helpful Python script designed to clean up large CSV files by removing unwanted columns. This script addresses the challenge of handling extensive exports, such as those from Jira's user management. These exports often contain many columns and can be difficult to import or process in standard spreadsheet applications like Google Sheets due to file size or column limitations.

It's perfect for streamlining your data for analysis, making files manageable for specific tools, and focusing only on the information you need.

The solution

This script provides an interactive way to select and remove columns from any CSV file. It first lists all CSV files in the current directory, prompts you to choose one, and then displays all columns within that file by number. You can then specify which columns to remove, and the script will generate a new, cleaned CSV file, leaving your original file untouched.

Here's what the script focuses on:

Interactive CSV selection
Easily pick the CSV file you want to clean from a list of available files in your directory
Column selection by number
View a numbered list of all columns in your chosen CSV and simply enter the numbers of the columns you wish to remove
Clear confirmation
Before making any changes, the script will show you a summary of which columns will be removed and which will remain, and ask for your confirmation
New, cleaned CSV file
The script outputs a new/separate CSV file (e.g., your_file_cleaned.csv) with the specified columns removed, preserving your original data in the original file

Key features

User-friendly interface for selecting files and columns
Provides a clear overview of columns to be removed and those to remain
Generates a new CSV file, ensuring your original data is safe
Includes robust error handling for file operations and user input

Preparation

1) Ensure that the CSV file you wish to clean is in the same directory as the Python script
2) Install the necessary Python libraries by running:

pip install pandas

3) Run the script from your terminal: python your_script_name.py

The script

import pandas as pd
import os
import sys

def get_csv_files():
    """Lists all CSV files in the current directory."""
    csv_files = [f for f in os.listdir('.') if f.lower().endswith('.csv')]
    return csv_files

def select_csv_file(csv_files):
    """Allows the user to select a CSV file from a list."""
    if not csv_files:
        print("No CSV files found in the current directory.")
        sys.exit()

    print("\nAvailable CSV files:")
    for i, file_name in enumerate(csv_files):
        print(f"{i + 1}. {file_name}")

    while True:
        try:
            choice = input("Enter the number of the CSV file to clean: ")
            index = int(choice) - 1
            if 0 <= index < len(csv_files):
                return csv_files[index]
            else:
                print("Invalid number. Please try again.")
        except ValueError:
            print("Invalid input. Please enter a number.")

def get_columns_to_remove(df_columns):
    """Prompts the user to enter column numbers to remove and converts them to names."""
    print("\nColumns in the selected CSV file:")
    for i, col_name in enumerate(df_columns.tolist()):
        print(f"{i + 1} - {col_name}")

    while True:
        columns_input = input(
            "\nEnter the column NUMBERS to remove (comma-separated), or type 'list' to see columns again: "
        ).strip()

        if columns_input.lower() == 'list':
            print("\nColumns in the selected CSV file:")
            for i, col_name in enumerate(df_columns.tolist()):
                print(f"{i + 1} - {col_name}")
            continue

        if not columns_input:
            print("No column numbers entered. Please enter numbers or 'list'.")
            continue

        # Process the input numbers
        selected_indices = []
        invalid_inputs = []
        col_names_to_remove = []

        raw_numbers = [num.strip() for num in columns_input.split(',')]

        for num_str in raw_numbers:
            try:
                col_index = int(num_str) - 1 # Convert to 0-based index
                if 0 <= col_index < len(df_columns):
                    selected_indices.append(col_index)
                else:
                    invalid_inputs.append(num_str)
            except ValueError:
                invalid_inputs.append(num_str)

        if invalid_inputs:
            print(f"Warning: The following inputs were invalid (not numbers or out of range): {', '.join(invalid_inputs)}")
            print("Please re-enter valid column numbers.")
            continue # Ask for input again

        if not selected_indices:
            print("No valid column numbers selected. Please try again.")
            continue

        # Convert valid indices to column names
        for index in selected_indices:
            col_names_to_remove.append(df_columns[index])

        # Remove duplicates while preserving order for consistency
        col_names_to_remove = list(dict.fromkeys(col_names_to_remove))

        return col_names_to_remove

def confirm_changes(original_columns, columns_to_remove):
    """Presents a summary of changes and asks for user confirmation."""
    df_columns_list = original_columns.tolist()
    # Ensure columns_to_remove only contains actual existing columns
    valid_cols_to_remove = [col for col in columns_to_remove if col in df_columns_list]
    invalid_cols_entered = [col for col in columns_to_remove if col not in df_columns_list]

    # Calculate remaining columns
    remaining_columns = [col for col in df_columns_list if col not in valid_cols_to_remove]

    print("\n--- Proposed Changes Summary ---")

    print(f"**Original Columns ({len(original_columns)}):**")
    if original_columns.empty:
        print("  (No columns found)")
    else:
        for col in df_columns_list:
            print(f"  - {col}")

    print(f"\n**Columns to be REMOVED ({len(valid_cols_to_remove)}):**")
    if not valid_cols_to_remove:
        print("  (None selected for removal or no valid columns found for removal)")
    else:
        for col in valid_cols_to_remove:
            print(f"  - {col}")

    if invalid_cols_entered:
        print(f"\n**WARNING: The following columns were entered (by number) but NOT found in the CSV:**")
        for col in invalid_cols_entered:
            print(f"  - {col}")

    print(f"\n**Columns that will REMAIN ({len(remaining_columns)}):**")
    if not remaining_columns:
        print("  (No columns will remain after removal)")
    else:
        for col in remaining_columns:
            print(f"  - {col}")
    print("------------------------------")

    while True:
        action = input(
            "Confirm changes? (yes/no/update): "
        ).strip().lower()
        if action in ['yes', 'y']:
            return 'confirm'
        elif action in ['no', 'n']:
            return 'decline'
        elif action in ['update', 'u']:
            return 'update'
        else:
            print("Invalid input. Please type 'yes', 'no', or 'update'.")

def main():
    print("CSV Cleanup Script")
    print("------------------")

    # Step 1: List CSV files
    csv_files = get_csv_files()
    if not csv_files:
        print("No CSV files found in the current directory. Please place CSV files here.")
        sys.exit()

    # Step 2: User selects CSV
    selected_file = select_csv_file(csv_files)
    print(f"\nLoading '{selected_file}'...")

    try:
        df = pd.read_csv(selected_file)
    except Exception as e:
        print(f"Error reading CSV file: {e}")
        sys.exit()

    original_columns = df.columns

    while True:
        # Step 3 & 4: Get columns to remove (now by number)
        columns_to_remove = get_columns_to_remove(original_columns)

        # Step 5: Summary and Confirmation Loop
        action = confirm_changes(original_columns, columns_to_remove)

        if action == 'confirm':
            break  # Exit the loop and proceed to clean
        elif action == 'decline':
            print("Operation cancelled. Exiting.")
            sys.exit()
        elif action == 'update':
            print("Updating column selection...\n")
            continue # Go back to column selection

    # Identify actual columns to drop (already done by get_columns_to_remove logic)
    cols_to_drop_actual = [col for col in columns_to_remove if col in original_columns]

    if not cols_to_drop_actual:
        print("No valid columns selected for removal. No changes will be made.")
        sys.exit()

    # Step 6: Create cleaned CSV
    print(f"\nDropping columns: {', '.join(cols_to_drop_actual)}")
    cleaned_df = df.drop(columns=cols_to_drop_actual, inplace=False)

    # Construct output filename
    base_name, ext = os.path.splitext(selected_file)
    output_filename = f"{base_name}_cleaned{ext}"

    try:
        cleaned_df.to_csv(output_filename, index=False)
        print(f"\nSuccessfully cleaned CSV and saved to '{output_filename}'")
    except Exception as e:
        print(f"Error saving cleaned CSV: {e}")
        sys.exit()

if __name__ == "__main__":
    main()

Disclaimer:

While this script is designed to facilitate certain data manipulation as a convenience, it is essential to understand that its functionality is subject to change due to updates in underlying libraries or other conditions that could affect its operation.

Please note that this script is provided on an "as is" and "as available" basis without any warranties of any kind. This script is not officially supported or endorsed by Atlassian, and its use is at your own discretion and risk.

Cheers!

Forums

Product Q&A

Community resources

Support

Top groups

Community resources

Support

Learn

Community resources

Support

Events

Community resources

Support

A Python script to remove unwanted columns from user export CSV (or anything else)

Hi Atlassian Community!

The solution

Key features

Preparation

The script

1 comment

Comment

Was this helpful?

Thanks!

About this author

TAGS

Atlassian Community Events