# Description
There are times where explicitly specifying a schema for a dataframe is
needed such as:
- Opening CSV and JSON lines files and needing provide more information
to polars to keep it from failing or in a desire to override default
type conversion
- When converting a nushell value to a dataframe and wanting to override
the default conversion behaviors.
This pull requests provides:
- A flag to allow specifying a schema when using dfr into-df
- A flag to allow specifying a schema when using dfr open that works for
CSV and JSON types
- A new command `dfr schema` which displays schema information and will
allow display support schema dtypes
Schema is specified creating a record that has the key value and the
dtype. Examples usages:
```
{a:1, b:{a:2}} | dfr into-df -s {a: u8, b: {a: i32}} | dfr schema
{a: 1, b: {a: [1 2 3]}, c: [a b c]} | dfr into-df -s {a: u8, b: {a: list<u64>}, c: list<str>} | dfr schema
dfr open -s {pid: i32, ppid: i32, name: str, status: str, cpu: f64, mem: i64, virtual: i64} /tmp/ps.jsonl | dfr schema
```
Supported dtypes:
null
bool
u8
u16
u32
u64
i8
i16
i32
i64
f32
f64
str
binary
date
datetime[time_unit: (ms, us, ns) timezone (optional)]
duration[time_unit: (ms, us, ns)]
time
object
unknown
list[dtype]
structs are also supported but are specified via another record:
{a: u8, b: {d: str}}
Another feature with the dfr schema command is that it returns the data
back in a format that can be passed to provide a valid schema that can
be passed in as schema argument:
<img width="638" alt="Screenshot 2024-01-29 at 10 23 58"
src="https://github.com/nushell/nushell/assets/56345/b49c3bff-5cda-4c86-975a-dfd91d991373">
---------
Co-authored-by: Jack Wright <jack.wright@disqo.com>
All of the dataframe commands ported over with no issues...
### 11 tests are commented out (for now)
So 100 of the original 111 tests are passing with only 11 tests being
ignored for now..
As per our conversation in the core team meeting on Wednesday
I took @jntrnr suggestion and just commented out the tests dealing
with
[IntoDatetime](https://github.com/nushell/nushell/blob/main/crates/nu-command/src/conversions/into/mod.rs)
Later on we can move this functionality out of nu-command if we decide
it makes sense...
### The following tests were ignored...
```rust
modified: crates/nu-cmd-dataframe/src/dataframe/series/date/get_day.rs
modified: crates/nu-cmd-dataframe/src/dataframe/series/date/get_hour.rs
modified: crates/nu-cmd-dataframe/src/dataframe/series/date/get_minute.rs
modified: crates/nu-cmd-dataframe/src/dataframe/series/date/get_month.rs
modified: crates/nu-cmd-dataframe/src/dataframe/series/date/get_nanosecond.rs
modified: crates/nu-cmd-dataframe/src/dataframe/series/date/get_ordinal.rs
modified: crates/nu-cmd-dataframe/src/dataframe/series/date/get_second.rs
modified: crates/nu-cmd-dataframe/src/dataframe/series/date/get_week.rs
modified: crates/nu-cmd-dataframe/src/dataframe/series/date/get_weekday.rs
modified: crates/nu-cmd-dataframe/src/dataframe/series/date/get_year.rs
modified: crates/nu-cmd-dataframe/src/dataframe/series/string/strftime.rs
```