* llama/ggml: add LLM training support
more compact progress bar
llama_save_model_to_file
llama_opt_param_filter
ggml_graph_dup force_grads
refactor ggml_opt, fix test-opt
* remove logits_all
* refactor CUDA implementation for ACC
* reset graph at beginning of opt period
* ggml: new optimization interface
remove test2.c, test3.c
store adamw params in tensor
move grads from tensor to graph
* avoid segfault upon API misuse
* add ggml-opt.h to public headers
* remove dependence of ggml-opt.cpp on ggml-cpu.h