Correctly handle CSV files with a single separator throughout

Better auto-detection of the CSV delimiter:
- files with a `.tsv` extension are automatically detected as tab-delimited
- other files parsed as CSV go through the following steps:
  - if the first line contains at least 3 of the same separator, that separator is used as the delimiter
  - if the first line contains only one supported separator character, that separator is used as the delimiter
  - otherwise, it falls back to treating all supported delimiters as delimiters

Supported delimiters, in precedence order:
- comma `,`
- semicolon `;`
- tab `\t`
- pipe `|`
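
The detection steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the code from this commit: the function name `detect_delimiters` and the constant `SUPPORTED_DELIMITERS` are hypothetical, the count is a naive character count that ignores quoting, and "only one supported separator character" is interpreted here as "only one kind of supported separator appears on the first line".

```python
import os

# Supported delimiters, in the precedence order listed in the commit message.
SUPPORTED_DELIMITERS = [",", ";", "\t", "|"]


def detect_delimiters(filename: str, first_line: str) -> str:
    """Return the character(s) to treat as delimiters (hypothetical helper)."""
    # Files with a tsv extension are always tab-delimited.
    # Assumption: the extension check is case-insensitive.
    if os.path.splitext(filename)[1].lower() == ".tsv":
        return "\t"

    # Naive count of each separator on the first line (quoting is ignored).
    counts = {d: first_line.count(d) for d in SUPPORTED_DELIMITERS}

    # Step 1: the first separator, in precedence order, seen at least 3 times.
    for delimiter in SUPPORTED_DELIMITERS:
        if counts[delimiter] >= 3:
            return delimiter

    # Step 2: exactly one kind of separator is present on the first line.
    present = [d for d in SUPPORTED_DELIMITERS if counts[d] > 0]
    if len(present) == 1:
        return present[0]

    # Step 3: fall back to treating every supported delimiter as a delimiter.
    return "".join(SUPPORTED_DELIMITERS)
```

Under these assumptions, `detect_delimiters("test.csv", "foo|bar|baz")` picks the pipe, because it is the only separator on the line, while a first line that mixes separators without any of them occurring three times falls through to the full set.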

Author: Keith Hall
Date: 2021-12-13 23:29:44 +02:00
Committed by: Keith Hall
Parent: ac40f7cfd8
Commit: 512bfde7ce
14 changed files with 419 additions and 38 deletions

@@ -0,0 +1,3 @@
foo,bar,baz,this|that,test,colors,cycle
1.2,1.7,2.5,blah;cool,test,colors,cycle
1 foo bar baz this|that test colors cycle
2 1.2 1.7 2.5 blah;cool test colors cycle

@@ -1,7 +1,7 @@
-first,last,address,city,zip
-John,Doe,120 any st.,"Anytown, WW",08123
-a,b
-1,"ha 
+first,last,address,city,zip
+John,Doe,120 any st.,"Anytown, WW",08123
+a,b
+1,"ha 
 ""ha"" 
-ha",120 any st.,"Anytown, WW",08123
-3,4,120 any st.,"Anytown, WW",08123
+ha",120 any st.,"Anytown, WW",08123
+3,4,120 any st.,"Anytown, WW",08123

@@ -0,0 +1,3 @@
foo|bar|baz
1,2|1,7|2,7
1,5|8,5|-5,5
1 [3 38 2 253 151 31mfoo[38 2 253 151 31m|[38 2 102 217 239mbar[38 2 253 151 31m|[38 2 190 132 255mbaz
2 [3 38 2 253 151 31m1,2[38 2 253 151 31m|[38 2 102 217 239m1,7[38 2 253 151 31m|[38 2 190 132 255m2,7
3 [3 38 2 253 151 31m1,5[38 2 253 151 31m|[38 2 102 217 239m8,5[38 2 253 151 31m|[38 2 190 132 255m-5,5

@@ -0,0 +1,3 @@
foo;bar;baz
1,2;1,7;2,7
1,5;8,5;-5,5
1 [3 38 2 253 151 31mfoo[38 2 253 151 31m [38 2 102 217 239mbar[38 2 253 151 31m [38 2 190 132 255mbaz
2 [3 38 2 253 151 31m1,2[38 2 253 151 31m [38 2 102 217 239m1,7[38 2 253 151 31m [38 2 190 132 255m2,7
3 [3 38 2 253 151 31m1,5[38 2 253 151 31m [38 2 102 217 239m8,5[38 2 253 151 31m [38 2 190 132 255m-5,5

@@ -0,0 +1,3 @@
foo bar baz|;, test hello world tsv
1,2 1,7 2,7 a b c "hello again" tsv
";|," ;|, baz test "hello world" tsv

@@ -0,0 +1,3 @@
foo,bar,baz,this|that,test,colors,cycle
1.2,1.7,2.5,blah;cool,test,colors,cycle
1 foo bar baz this|that test colors cycle
2 1.2 1.7 2.5 blah;cool test colors cycle

@@ -0,0 +1,3 @@
foo|bar|baz
1,2|1,7|2,7
1,5|8,5|-5,5
1 foo bar baz
2 1,2 1,7 2,7
3 1,5 8,5 -5,5

@@ -0,0 +1,3 @@
foo;bar;baz
1,2;1,7;2,7
1,5;8,5;-5,5
1 foo bar baz
2 1,2 1,7 2,7
3 1,5 8,5 -5,5

@@ -0,0 +1,3 @@
foo bar baz|;, test hello world tsv
1,2 1,7 2,7 a b c "hello again" tsv
";|," ;|, baz test "hello world" tsv
1 foo bar baz|;, test hello world tsv
2 1,2 1,7 2,7 a b c hello again tsv
3 ;|, ;|, baz test hello world tsv