Zeroing a Column of a CSV File With PowerShell
Earlier today at work, a coworker called in with a little data munging task for me to solve: a column of a CSV file supplied by a third party on a regular basis needed to be redacted (for technical rather than sneaky reasons) before the file was passed along to another third party.
For more complex operations on tabular data (transposing rows/columns or, say, inflicting math upon them), we usually rely on Python scripts (…anything’s better than Excel macros). But this “problem” seemed simple enough to make the overhead of setting up a virtual environment plus the usual CSV input/output boilerplate seem like shooting at sparrows with cannons – so I ignored my reflexive aversion1 to Microsoft technology and looked into how to get this done in PowerShell, which is conveniently built into our task automation software2 of choice.
Importing a CSV file into a PowerShell session
The CSV files in question come in a format that has its roots in ComTrader, a popular frontend to the European Energy Exchange, making it a quasi-standard for exchanging power trading data within parts of Germany’s energy industry.
Area;Type;B/S;Accnt;Product;Ctrct;Qty;Prc;BG;Txt;PQty;ValRes;ValDate;ExeRes
TNG;REG;S;P;XBID_Quarter_Hour_Power;23Q1;0.5;-500;Standard;Comment;;GTD;04.07.2024 21:50:00;NON
AMP;REG;S;P;XBID_Quarter_Hour_Power;23Q1;1.5;-500;Standard;Comment;;GTD;04.07.2024 21:50:00;NON
Reading a semicolon-separated CSV file into a PowerShell variable can be a accomplished using the Import-Csv
cmdlet:
$csv = Import-Csv -Delimiter ';' 'C:\path\to\file.csv'
This’ll yield, as described in the documentation, a “table-like custom object from the items in the CSV file. Each column in the CSV file becomes a property of the custom object and the items in rows become the property values.”
Zeroing a column
The cool thing about these table-like objects appears to be that they’re query-able by, among3 others, the rather powerful Select-Object
cmdlet. You could, for example, extract the columns4 Qty
(power quantity if you’re curious) and Prc
(price) like this:
$csv | Select-Object 'Qty','Prc'
That’d yield a two-column table, ready for re-serialization or further processing. Relatedly, here’s how to extract all columns except Prc
:
$csv | Select-Object * -ExcludeProperty 'Prc'
You can also use Select-Object
, albeit in a slightly-more-syntactically-convoluted manner, to generate new columns based on values of existing columns. If you wanted to, say, append a column HalfPrc
containing the values of the Prc
column divided by 2, you could run this command:
$csv | Select-Object *, @{Name='HalfPrc'; Expression={.5*[float]$_.prc}
Similarly, to empty the Prc
column (which is what my colleague was after), I arrived at the following command which, as above, selects everything except the preexisting Prc
column, then adds a new Prc
column containing only empty strings, storing the result in a variable $csv_fixed
:
$csv_fixed = $csv | Select-Object -ExcludeProperty 'Prc' *, @{Name='Prc'; Expression={''}}
Note that new columns created this way are appended to the table – so the new Prc
column ends up, visually speaking, at the right end of the table, not in the same location as the previous Prc
column.
Writing the result out
Knowing that there’s an Import-Csv
cmdlet, you won’t have trouble guessing how to export $csv_fixed
back into a CSV file:
$csv_fixed | Export-Csv -Delimiter ';' -NoTypeInformation 'C:\path\to\output_file.csv'
By default, PowerShell adds a type information header like #TYPE Selected.System.Management.Automation.PSCustomObject
to the generated CSV file, which can (and probably should) be suppressed via the -NoTypeInformation
switch.
The resulting CSV file, then, looks like this:
"Area";"Type";"B/S";"Accnt";"Product";"Ctrct";"Qty";"Prc";"BG";"Txt";"PQty";"ValRes";"ValDate";"ExeRes"
"TNG";"REG";"S";"P";"XBID_Quarter_Hour_Power";"23Q1";"0.5";"";"Standard";"Comment";"";"GTD";"04.07.2024 21:50:00";"NON"
"AMP";"REG";"S";"P";"XBID_Quarter_Hour_Power";"23Q1";"1.5";"";"Standard";"Comment";"";"GTD";"04.07.2024 21:50:00";"NON"
Notice that PowerShell wraps each value in quotes, which, while unnecessary in this case, is good practice and won’t confuse standards-compliant CSV consumers. (If you’re running PowerShell 7 or later, you can add -UseQuotes AsNeeded
, which is delightfully self-explanatory, to the Export-Csv
call.)
-
Although I just learned that PowerShell is also available on macOS and Linux – and there’s some nifty stuff you can do with it. Hmm! ↩
-
Whose name I shan’t utter for opsec reasons. ↩
-
Another cmdlet worth looking at is
Group-Object
. But wait,Select-...
?Group-...
? That almost sounds like SQL! ↩ -
What I call “columns” are really properties in PowerShell, but I’ll be sticking to CSV terminology. ↩