Correctly handle CSV files with a single separator throughout

better auto-detection of CSV delimiter
- files with a tsv extension are automatically detected as tab delimited
- other files parsed as CSV go through the following steps:
  - if the first line contains at least 3 of the same separator, it uses that separator as a delimiter
  - if the first line contains only one supported separator character, it uses that separator as a delimiter
  - otherwise it falls back to treating every supported delimiter as a delimiter

supported delimiters, in precedence order:
- comma `,`
- semicolon `;`
- tab `\t`
- pipe `|`
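
The detection steps above can be sketched roughly as follows. This is only an illustration in Python, not the code from this commit: the function name `detect_delimiters` and the file names are hypothetical, quoting is ignored, and the "only one supported separator character" rule is read here as "exactly one kind of supported separator appears on the first line", which is an assumption.

```python
from pathlib import Path
from typing import List

# Supported delimiters in precedence order, as listed in the commit message.
SUPPORTED_DELIMITERS = [",", ";", "\t", "|"]


def detect_delimiters(filename: str, first_line: str) -> List[str]:
    """Return the delimiter(s) to split on (hypothetical helper, not the commit's code)."""
    # Files with a tsv extension are always treated as tab delimited.
    if Path(filename).suffix.lower() == ".tsv":
        return ["\t"]

    # Step 1: a separator occurring at least 3 times on the first line wins;
    # checking in precedence order resolves ties between candidates.
    for delimiter in SUPPORTED_DELIMITERS:
        if first_line.count(delimiter) >= 3:
            return [delimiter]

    # Step 2 (assumed reading): exactly one kind of supported separator is present.
    present = [d for d in SUPPORTED_DELIMITERS if d in first_line]
    if len(present) == 1:
        return present

    # Step 3: fall back to treating every supported delimiter as a delimiter.
    return list(SUPPORTED_DELIMITERS)


# First lines taken from the fixture files added in this commit;
# the file names are made up for the example.
print(detect_delimiters("a.csv", "foo,bar,baz,this|that,test,colors,cycle"))  # [',']
print(detect_delimiters("b.csv", "foo|bar|baz"))  # ['|']
print(detect_delimiters("c.csv", "foo;bar;baz"))  # [';']
```

Under this reading, the pipe and semicolon fixtures are resolved by the second step, since their header lines contain only two separators each.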
Keith Hall
2021-12-13 23:29:44 +02:00
committed by Keith Hall
parent ac40f7cfd8
commit 512bfde7ce
14 changed files with 419 additions and 38 deletions

@@ -0,0 +1,3 @@
foo,bar,baz,this|that,test,colors,cycle
1.2,1.7,2.5,blah;cool,test,colors,cycle

@@ -0,0 +1,3 @@
foo|bar|baz
1,2|1,7|2,7
1,5|8,5|-5,5

@@ -0,0 +1,3 @@
foo;bar;baz
1,2;1,7;2,7
1,5;8,5;-5,5

@@ -0,0 +1,3 @@
foo bar baz|;, test hello world tsv
1,2 1,7 2,7 a b c "hello again" tsv
";|," ;|, baz test "hello world" tsv