Excessively Adequate

Zeroing a Column of a CSV File With PowerShell

Earlier today at work, a coworker called in with a little data munging task for me to solve: a column of a CSV file supplied by a third party on a regular basis needed to be redacted (for technical rather than sneaky reasons) before the file was passed along to another third party.

For more complex operations on tabular data (transposing rows/columns or, say, inflicting math upon them), we usually rely on Python scripts (…anything’s better than Excel macros). But this “problem” seemed simple enough to make the overhead of setting up a virtual environment plus the usual CSV input/output boilerplate seem like shooting at sparrows with cannons – so I ignored my reflexive aversion1 to Microsoft technology and looked into how to get this done in PowerShell, which is conveniently built into our task automation software2 of choice.

Importing a CSV file into a PowerShell session

The CSV files in question come in a format that has its roots in ComTrader, a popular frontend to the European Energy Exchange, making it a quasi-standard for exchanging power trading data within parts of Germany’s energy industry.

Area;Type;B/S;Accnt;Product;Ctrct;Qty;Prc;BG;Txt;PQty;ValRes;ValDate;ExeRes
TNG;REG;S;P;XBID_Quarter_Hour_Power;23Q1;0.5;-500;Standard;Comment;;GTD;04.07.2024 21:50:00;NON
AMP;REG;S;P;XBID_Quarter_Hour_Power;23Q1;1.5;-500;Standard;Comment;;GTD;04.07.2024 21:50:00;NON

Reading a semicolon-separated CSV file into a PowerShell variable can be a accomplished using the Import-Csv cmdlet:

$csv = Import-Csv -Delimiter ';' 'C:\path\to\file.csv'

This’ll yield, as described in the documentation, a “table-like custom object from the items in the CSV file. Each column in the CSV file becomes a property of the custom object and the items in rows become the property values.”

Zeroing a column

The cool thing about these table-like objects appears to be that they’re query-able by, among3 others, the rather powerful Select-Object cmdlet. You could, for example, extract the columns4 Qty (power quantity if you’re curious) and Prc (price) like this:

$csv | Select-Object 'Qty','Prc'

That’d yield a two-column table, ready for re-serialization or further processing. Relatedly, here’s how to extract all columns except Prc:

$csv | Select-Object * -ExcludeProperty 'Prc'

You can also use Select-Object, albeit in a slightly-more-syntactically-convoluted manner, to generate new columns based on values of existing columns. If you wanted to, say, append a column HalfPrc containing the values of the Prc column divided by 2, you could run this command:

$csv | Select-Object *, @{Name='HalfPrc'; Expression={.5*[float]$_.prc}

Similarly, to empty the Prc column (which is what my colleague was after), I arrived at the following command which, as above, selects everything except the preexisting Prc column, then adds a new Prc column containing only empty strings, storing the result in a variable $csv_fixed:

$csv_fixed = $csv | Select-Object -ExcludeProperty 'Prc' *, @{Name='Prc'; Expression={''}}

Note that new columns created this way are appended to the table – so the new Prc column ends up, visually speaking, at the right end of the table, not in the same location as the previous Prc column.

Writing the result out

Knowing that there’s an Import-Csv cmdlet, you won’t have trouble guessing how to export $csv_fixed back into a CSV file:

$csv_fixed | Export-Csv -Delimiter ';' -NoTypeInformation 'C:\path\to\output_file.csv'

By default, PowerShell adds a type information header like #TYPE Selected.System.Management.Automation.PSCustomObject to the generated CSV file, which can (and probably should) be suppressed via the -NoTypeInformation switch.

The resulting CSV file, then, looks like this:

"Area";"Type";"B/S";"Accnt";"Product";"Ctrct";"Qty";"Prc";"BG";"Txt";"PQty";"ValRes";"ValDate";"ExeRes"
"TNG";"REG";"S";"P";"XBID_Quarter_Hour_Power";"23Q1";"0.5";"";"Standard";"Comment";"";"GTD";"04.07.2024 21:50:00";"NON"
"AMP";"REG";"S";"P";"XBID_Quarter_Hour_Power";"23Q1";"1.5";"";"Standard";"Comment";"";"GTD";"04.07.2024 21:50:00";"NON"

Notice that PowerShell wraps each value in quotes, which, while unnecessary in this case, is good practice and won’t confuse standards-compliant CSV consumers. (If you’re running PowerShell 7 or later, you can add -UseQuotes AsNeeded, which is delightfully self-explanatory, to the Export-Csv call.)

  1. Although I just learned that PowerShell is also available on macOS and Linux – and there’s some nifty stuff you can do with it. Hmm! 

  2. Another cmdlet worth looking at is Group-Object. But wait, Select-...? Group-...? That almost sounds like SQL

  3. What I call “columns” are really properties in PowerShell, but I’ll be sticking to CSV terminology.