Mashing CSVs around using PowerShell

image for 'Mashing CSVs around using PowerShell' | CAT.Models.Category

Since I spend most of my day in the console, PowerShell also serves as my Excel. So, continuing my recent trend of PowerShell related posts, let's dig into a quick and easy way to parse up CSV files (or most any type of file) by creating objects!

We, of course, need a few rows of example data. Let's use this pseudo student roster.

Example data:

Student,Code,Product,IUID,TSSOC,Date
123456,e11234,Reading,jsmith,0:18,1/4/2012
123456,e11234,Reading,jsmith,1:04,1/4/2012
123456,e11234,Reading,jsmith,0:27,1/5/2012
123456,e11234,Reading,jsmith,0:19,1/7/2012
123456,e11235,Math,jsmith,0:14,1/7/2012


Now, for reporting, I want my Minutes to be a calculation of the TSSOC column (hours:minutes). Easy, we have PowerShell—it can split AND multiple!

The code:

Begin by creating an empty array to hold our output, importing our data into thepipe, and opening up an iteration (for each) function. The
final $out is our return value—calling our array so we can see our results.

$out = @()
import-csv data_example.csv |
   % {

   }
$out


Next, let's add in our logic to split out the hours and minutes. We have full access to the .NET string methods in PowerShell, which includes .Split().  

.Split() returns an array, so since we have HH:MM, our first number is our hours and our second number is our minutes. Hours then need to be multiplied by 60 to return the minutes per hour.

You'll also notice the [int] casting—this ensures we can properly multiply. Give it a whirl without and you‘ll get 60 0’s or 1's back (it multiples the string).

$out = @()
import-csv data_example.csv |
   % {
    $hours = [int]$_.TSSOC.Split(':')[0] * 60
    $minutes = [int]$_.TSSOC.Split(':')[1]
   }
$out


The next step is to create a new object to contain our return values. We can use the new PowerShell v2.0 syntax to create a quick hashtable of our properties and values. Once we have our item, add it to our $out array.

$out = @()
import-csv data_example.csv |
   % {
    $hours = [int]$_.TSSOC.Split(':')[0] * 60
    $minutes = [int]$_.TSSOC.Split(':')[1]
        $item = new-object PSObject -Property @{
            Date=$_.Date;
            Minutes=($hours %2B $minutes);
            UserId=$_.IUID;
            StudentId=$_.Student;
            Code=$_.Code;
            Course=$_.Product
        }
    $out = $out %2B $item
   }


With that, we're done, we can pipe it to an orderby for a bit of sorting, grouping, table formatting, or even export it BACK out as another CSV.

$out = @()
import-csv data_example.csv |
   % {
    $hours = [int]$_.TSSOC.Split(':')[0] * 60
    $minutes = [int]$_.TSSOC.Split(':')[1]
        $item = new-object PSObject -Property @{
            Date=$_.Date;
            Minutes=($hours %2B $minutes);
            UserId=$_.IUID;
            StudentId=$_.Student;
            Code=$_.Code;
            Course=$_.Product
        }
    $out = $out %2B $item
   } | sortby Date, Code
$out | ft -a


Quick and dirty CSV manipulation—all without opening anything but the command prompt!

UPDATE: Matt has an excellent point in the comments below. PowerShell isn't thegolden hammer for every task and finding the right tool for the job.  We're a mixed environment (Windows, Solaris, RHEL, Ubuntu), so PowerShell only applies to our Windows boxes. However, as a .net developer, I spend 80-90% of my time on those Windows boxes. So let‘s say it’s a silver hammer. :)

Now, the code in this post looks pretty long… and hopping back and forth between notepad, the CLI, and your CSV is tiresome. I bounce back and forth between the CLI and notepad2 with the ed and ex functions (these commands areborrowed from Oracle PL/SQL). More information here.

So how would I type this if my boss ran into my cube with a CSV and needed a count of Minutes?

$out=@();Import-Csv data_example.csv|%{$out+=(new-object psobject -prop @{ Date=$_.Date;Minutes=[int]$_.TSSOC.Split(':')[1]+([int]$_.TSSOC.Split(':')[0]*60);UserId=$_.IUID;StudentId=$_.Student;Code=$_.Code;Course=$_.Product }) };$out|ft -a

Now, that's quicker to type, but a LOT harder to explain! I'm sure this can be simplified down—any suggestions? If you could do automatic/implied property names, that'd REALLY cut it down.

original post http://tiredblogger.wordpress.com/2012/02/15/mashing-csvs-around-using-powershell/

  • powershell
  • automation
author photo - David Longnecker by David Longnecker

Comments

Stay Connected