Home > import > Import[ ] drops leading zeros from data when importing a CSV file

Import[ ] drops leading zeros from data when importing a CSV file

October 22Hits:4
Advertisement

I need to load data from the CSV file shown below.

Import[ ] drops leading zeros from data when importing a CSV file

Import[ ] drops the leading zeros in column 3 when it imports the CSV file, but they are security identifiers, and I need them. ReadList[ ] keeps the leading zeros, but I can't get it to break the columns into separate elements. Each record ends up as long string. The options: Word, WordSeparators, ect. don't do what they are supposed to do. I could probably save the file as a TSV, or go into the CSV file and prepend a "'" and the appropriate number of zeros, or use a function to count the number of characters in column three and prepend the missing zeros, but that's just klugy(sp). What is the right way to ReadList[ ] the file?

Here are the first few records in the file (copied by hand). The leading zeros in column three don't appear when the CSV file is loaded in Excel, but they do show up when the file is loaded in Notes++.

date,secid,cus_sed,tick,repno,dis_to_df,impl_rate,prob_df
01/31/2004,753,01381710,AA,0323N,3.20387,BBB,0.00068
01/31/2004,756,00431F10,AABC,34341,2.88794,BB,0.00194
01/31/2004,757,03237110,ANCPA,0591N,3.18587,BBB,0.00072
01/31/2004,759,00440310,AACE,A02B8,3.23057,BBB,0.00062 01/31/2004,767,45814710,FILM,A06F0,2.04099,B-,0.02063

Answers

ReadList

Assuming that you want to read all fields as strings, you could set up ReadList like this:

ReadList["sample.csv", Word,
 WordSeparators -> {",", " "},
 RecordSeparators -> {"\n"},
 RecordLists -> True
]

Import[ ] drops leading zeros from data when importing a CSV file

Import

ReadList is usually quite a bit faster, but if speed is not paramount you may find Import more convenient. It attempts to auto-detect the content of each field. Since you state that your cus_sed field will always be eight digits we can assume that any number less than eight digits has had zeros stripped from the left side, and pad it accordingly.

pad = IntegerString[#, 10, 8] &;

dat = Import["sample.csv"];

dat[[All, 3]] = dat[[All, 3]] /. n_Integer :> [email protected];

dat // Grid

Import[ ] drops leading zeros from data when importing a CSV file

The issue here is that data types are not consistent within a single column, but depending on your needs that may not matter. Here is the same as above but shown in InputForm to reveal the actual data:

Import[ ] drops leading zeros from data when importing a CSV file

Convoluted ReadList

I held off posting this because I don't remember handling commas in ReadList being this ugly. Perhaps I'm still missing an option. Anyway, you can get ReadList to read each field as a different type. Here is a rather badly convoluted example where I read a comma as Character and then drop it at the end with Part.

stream = OpenRead["sample.csv"];  (* open the file as a stream *)

Skip[stream, Record] (* skip the first line of the file *)

dat =
 ReadList[stream,
   {Record, Character, Number, Record, Record, Record, Character,
    Number, Character, Record, Character, Number},
   RecordSeparators -> {"\n", ",", " "}
 ];

dat = dat[[All, {1, 3, 4, 5, 6, 8, 10, 12}]];  (* remove commas *)

Types are now uniform in each column:

Grid[dat]

Import[ ] drops leading zeros from data when importing a CSV file

We can use the "Numeric" option to have Import interpret all data as strings:

Import["data.csv", "CSV", "Numeric" -> False]

The resulting table will only contain strings.

If desired, we could post-process the imported strings by applying a different parsing function to each input column (ignoring the header line for simplicity):

Inner[
  #2@#&
, Import["data.csv", "CSV", "Numeric" -> False, "HeaderLines" -> 1]
, {DateList, FromDigits, Identity, Identity, Identity, ToExpression, Identity, ToExpression}
, List
]

A bit more detailed solution for mixed data. First the CSV data is loaded as a list of records by ReadList (like Mr.Wizard's answer) and then the appropriate columns are converted to expressions, except for the first line that is expected to provide the column labels.

columnNumber = 8;
expressionColumns = {2, 6, 8};

data = ReadList["sample.csv", Table[Record, {columnNumber}], RecordSeparators -> {",", " ", "\n"}];

[email protected][[email protected]@data,
  Transpose@
   MapAt[ToExpression, [email protected]@data, List /@ expressionColumns]]

Import[ ] drops leading zeros from data when importing a CSV file

Import["temp.csv","Table"] should do what you want.

For my test .csv file I had:

1,0022002,3
0,0003101,0

and the following doesn't strip the leading zeros as desired:

Import["temp.csv","Table"]
(* {{"1,0022002,3"}, {"0,0003101,0"}} *)

Related Articles

Copyright (C) 2017 ceus-now.com, All Rights Reserved. webmaster#ceus-now.com 14 q. 0.751 s.