================================================================
PUNCHDOWN LIST

----------------------------------------------------------------

* document cloudthings, e.g.
  o go.yml
  o codespell.yml
    - codespell --check-filenames --skip *.csv,*.dkvp,*.txt,*.js,*.html,*.map,./tags,./test/cases --ignore-words-list denom,inTerm,inout,iput,nd,nin,numer,Wit,te,wee
  o readthedocs triggers
* sync-print option; or (yuck) another xprint variant; or ...; emph dump/eprint
? trace-mode ?

* doc
  o new-in-miller-6: missing:
    - dump syntax -- ?
    - emittable constraints -- ?
  o wut h1 spacing before/after ...
  o shell-commands: while-read example from issues
  E reference-dsl-user-defined-functions: UDSes -> non-factorial example -- maybe some useful aggregator
  o reference-main-arithmetic: ? test stats1/step -F flag
  o reference-dsl-control-structures:
    e while (NR < 10) will never terminate as NR is only incremented between
      records -> and each expression is invoked once per record so once for NR=1,
      once for NR=2, etc.
  o C-style triple-for loops: loop to NR -> NO!!!
  o Since uninitialized out-of-stream variables default to 0 for
    addition/subtraction and 1 for multiplication when they appear on expression
    right-hand sides (not quite as in awk, where they'd default to 0 either way)
    <-> xlink to other page
  r fzf-ish w/ head -n 4, --from, up-arrow & append verb, then cat -- find & update the existing section
  ! https://github.com/johnkerl/miller/issues/653 -- stats1 w/ empties? check stats2
    - needs UTs as well
  o while-read example from issues

* release ordering?
  conda
  brew macports chocolatey
  ubuntu debian fedora gentoo prolinux archlinux
  netbsd freebsd

* post-release:
  w installing-miller.md.in

================================================================
NON-BLOCKERS

* profile sort

* RT ngrams.sh -v -o 1 one-word-list.txt

* contact re https://jsonlines.org/on_the_web/

* Better functions for values manipulation, e.g. easier conversion of strings like "$1,234.56" into numeric values

* IP addresses and ranges as a datatype so one could do membership tests like "if 10.10.10.10 in 10.0.0.0/8".

* Ability to specify some formats that are fixed. Like we can process
  "5d18h53m20s" format in *dhms* commands, but what about "5-18:53:20"? This is
  a common format used by the SLURM resource manager.

* csvstat / Panda's describe

* cmp for array & map

* r-strings/implicit-r/297: double-check end of reference-main-data-types.md.in

* sysdate, sysdate_local; datediff ...

* JSON perf -- try alternate packages to encoding/json

* strptime w/ ...00.Z -> error

* pos/neg 0x/0b/0o UTs

* 0o into BNF

? BIFs as FCFs?

* pv: 'mlr --prepipex pv --gzin tail -n 10 ~/tmp/zhuge.gz' needs --gzin & --prepipex both
  o plus corresponding docwork

* JIT follow-ons:
  o UT:
    - JSON I/O
    - mlrval_cmp.go
    - mv from-array/from-map w/ copy & mutate orig & check new -- & vice versa
    - dash-O and octal infer
    - populate each bifs/X_test.go for each bifs/X.go etc etc
  o neatens
    - carefully read all $mlv files
    - check $types & $bifs as well
    - proofread all mlrval_cmp.go dispo mxes
    - update rmds x several
  o misc
    - grep tmiller map put ref vs map put copy
    - krepl miller6 new doclink-comments
    - git rm cmd/mprof{n}

* https://staticcheck.io/docs
  o lots of nice little things to clean up -- no bugs per se, all stylistic i *think* ...

* xtab splitter UT; nidx too

* integrate:
  o https://www.libhunt.com/r/miller
  o https://repology.org/project/miller/information

* verslink old relnotes

* more perf?
  - batchify source-quench -- experiment 1st
  - further channelize (CSV-first focus) mlrval infer vs record-put ?
  ? coalesce errchan & done-writing w/ Err to RAC, and close-chan *and* EOSMarker -- ?

* []Mlrval -> []*Mlrval ?

* funcptr away the ifs/ifsregex check in record-readers

* handling for --cpuprofile not at args[1] slot

* try simpler-than-regex-split-string for repeated-single -- especially for XTAB reader

* UT-per-se of XTAB channelizedStanzaScanner

* main-level (verb-level?) flag for "," -> X in verbs -- in case commas in field names
* golinter

* single UT, hard to invoke w/ new full go.mod path
  go test $(ls internal/pkg/lib/*.go|grep -v test) internal/pkg/lib/unbackslash_test.go
  etc

* file-formats: NIDX link to headerless CSV

* fmtnum(98, "%3d%%") -- ? workaround: fmtnum(98, "%3d") . "%"

* link to SE table ...
  https://github.com/johnkerl/miller/discussions/609#discussioncomment-1115715

* single-line JSON for DKVP/CSV/etc ...
  o mlr --j2x --no-auto-flatten cat $mlg/regtest/input/flatten-input-2.json
    - code: make sure this does single-line json ...
  o mlr --j2c --no-auto-flatten cat $mlg/regtest/input/flatten-input-2.json
    - code: this is ok ... maybe prefer single-line -- ?

* hofs section on typedecls
  o hofs+typedecls RT cases
* glossary re natural ordering
  o separate webdoc section ... somewhere ...
  o hofs.md.in link
  o numerics < bool < string

* nr, nf, keys
t* ranspose ...
* -colname syntax ... -x colname maybe into more verbs ...

* consider expanding '(error)' to have more useful error-text

* mlr -k
  o c,go
  o various test cases
  o OLH re limitations
  o check JSON-parser x 2 -- is there really a 'restart'?
    - infinite-loop avoidance for sure

* -S/-A/-O page with examples -- ?

* sysdate & sysdate_local

* broadly rethink os.Exit, especially as affecting mlr repl

* mlr stdlib -- ? how to deliver?
* some support for UDF help-strings -- ?

* separate -S/-A/-O inference page w/ examples of whywhens

* mlr -k
* transpose
* print w/ #{...}; defer variadic printf
* meta: nf,nr,keys?

* mlr -f {arg}, mlr -F {arg}, etc

* non-streaming DSL-enabled cut
  https://github.com/johnkerl/miller/discussions/613

* single cheatsheet page -- put out RFH?
  https://twitter.com/icymi_py/status/1426622817785765898/photo/1

* mlrval_json -- get file/line in internal-coding-error detected

? $0 as raw-record string -- ? would make mlr grep simpler and more natural ...

* IIFEs: 'func f = (func(){ return udf})()'
* BIFs in non-sigil context (UDFs already are)
* non-top-level func defs

* non-lite DKVP reader/writer

* precedence for `:` in slicing syntax

* full format-string parser for corner cases like "X%08lldX"

* more of:
  o colored-shapes.dkvp -> csv; also mkdat2
  o data/small -> csv throughout. and/or just use example.csv

* json-triple-quote -- what can be done here?

* godoc neatens at func/const/etc level

* unarrayify function

* XYZWasSpecified -> XYZ = "default" w/ check-after -- ?

* parquet -- ?

* case auxfiles: cat them too

* uniqify-field-names in record-readers -- which issue?

* non-blocker: commenting passes ...

* non-blocker: array and string slices on LHS of assignments

* non-blocker: feature/shorthand for repl newline before prompt

* non-blocker: new functions:
  o new columns-to-arrays and arrays-to-columns for stan format

? gzout, bz2out -- ? make sure this works through tee? bleah ...
? zip -- but that is an archive case too not just an encoding case
  ? miller support for archive-traversal; directory-traversal even w/o zip -- ?
  ? as 6.1 follow-on work -- ?

* more about HTTP logs in miller -- doc and encapsulate:
  mlr --icsv --implicit-csv-header --ifs space --oxtab --from elb-log put -q 'print $27'
* PR-template etc checklists

* clean up TODO/xxx in internal/pkg/platform
* mlr regtest doc -- focus on either go/regtest or internal/pkg/auxents/regtest, one linking to the other

* also: write up how git status after test should show any missed extra-outs

* help-refactor:
  o audit for DEFAULT_FOOs @
  o audit for '-z {zzz}'
  o audit for consistent usage style

* new columns-to-arrays and arrays-to-columns for stan format

* https://segment.com/blog/allocation-efficiency-in-high-performance-go-services/

* c/go both:
  o https://brandur.org/logfmt is simply DKVP w/ IFS = space (need dquot though)
  o https://docs.fluentbit.io/manual/pipeline/parsers/ltsv is just DKVP with IFS tab and IPS colon
* do some profiling every so often

* UDF nexts:
  o more functions (see below)
  o strmatch https://github.com/johnkerl/miller/issues/77#issuecomment-538790927

* bash completion script https://github.com/johnkerl/miller/issues/77#issuecomment-308247402
  https://iridakos.com/programming/2018/03/01/bash-programmable-completion-tutorial#:~:text=Bash%20completion%20is%20a%20functionality,key%20while%20typing%20a%20command.

* sliding-window averages into mapper step (C + Go)
* stats1 rank

* double-check rand-seeding
  o all rand invocations should go through the seeder for UT/other determinism

! quoted NIDX
  - how with whitespace regex -- ?
! quoted DKVP
  - what about csvlite-style -- ? needs a --dkvplite ?
! pprint emit on schema change, not all-at-end.

* widen DSL coverage
  o err-return for array/map get/put if incorrect types ... currently go-void ...
    ! the DSL needs a full, written-down-and-published spell-out of error-eval semantics
  o profile mand.mlr & check for need for idx-assign
    -> most definitely needed
  o multiple-valued return/assign -- ?
    - array destructure at LHS for multi-retval assign (maps too?)

* UT per se for lrec ops

* libify errors.New callsites for DSL/CST
* record-readers are fully in-channel/loop; record-writers are multi with in-channel/loop being
  done by ChannelWriter, which is very small. opportunity to refactor.
* address all manner of xxx and TODO comments
* empty csv ... reminder ...

* godoc notes:
  o go get golang.org/x/tools/cmd/godoc
  o dev mode:
    godoc -http=:6060 -goroot .
  o publish:
    godoc -http=:6060 -goroot .
    cd ~/tmp/bar
    wget -p -k http://localhost:6060/pkg
    mv localhost:6060 miller6
    file:///Users/kerl/tmp/bar/miller6/pkg
    maybe publish to ISP space

* het ifmt-reading
  - separate out InputFormat into per-file (or tbd) & have autodetect on file endings -- ?
  - maybe a TBD reader -- ?
  - InputFormat into Context
  - TBD writer -- defer factory until first context?
  - deeper refactor pulling format out of reader/writer options entirely -- ?

================================================================
MAYBES

* dotted-syntax support in verbs?

* repl as verb -- ? 'put --repl' maybe

* json-triple-quote -- what can be done here?

* non-blocker: _ variable feature?

* headerful/headerless mix -- ?
  TOptions as list, not single -- ?

* miller extensibility re golang plugins -- ?!?
  ? verbs ?
  ? DSL functions ?

* pkg graph:
  go get github.com/kisielk/godepgraph
  godepgraph miller | dot -Tpng -o ~/Desktop/mlrdeps.png
  flamegraph etc double-check

* more data formats:
  https://indico.cern.ch/event/613842/contributions/2585787/attachments/1463230/2260889/pivarski-data-formats.pdf

----------------------------------------------------------------
DEFER:

* once on go 1.16: get around ioutil.ReadFile depcreation
  o build-dsl hand-edit / sedder
  o io/ioutil -> os
  o ioutil.ReadFile -> os.ReadFile on internal/pkg/parsing/lexer/lexer.go

* parser-fu:
  o iterative LALR grok
- jackson notes
- gocc .txt/.go for simple grammars
o find/bookmark/grok rob's lexer slides
o iterate on a parser-generator with JSON config file
no need to bootstrap a parser for the parser-generator language

----------------------------------------------------------------
INFO

i https://github.com/all-contributors/all-contributors
  yarn all-contributors add namegoeshere ideas

i go tool nm -size mlr | sort -nrk 2

----------------------------------------------------------------
GOCC UPSTREAMS:

? support "abc" (not just 'a' 'b' 'c') in the lexer part

----------------------------------------------------------------
TBF:
* go 1.16 at some point
* tools/perf:
  o https://eng.uber.com/pprof-go-profiler/
  o profile mlr --j2x cat mappings.json
  o golang static-analysis tool -- ?
* iconv note
* AST insertions: make a simple NodeFromToken & have all interface{} be *ASTNode, not *token.Token
* cst printer with reflect.TypeOf -- ?
? makefile for build-dsl: if $bnf newer than productionstable.go
* I/O perf delta between C & Go is smaller for CSV, middle for DKVP, large for JSON -- debug
* neaten/error-proof:
  o mlrmapEntry -> own keys/mlrvals -- keep the kcopy/vcopy & be very clear,
    or remove. (keeping pointers allows nil-check which is good.)
  o inrec *types.Mlrmap is good for default no-copy across channels ... needs
    a big red flag though for things like the repeat verb (maybe *only* that one ...)
! clean up the AST API. ish! :^/
* json:
  d thorough UT for json mlrval-parser including various expect-fail error cases
  d doc re no jlistwrap on input if they want get streaming input
  d UT JSON-to-JSON cat-mapping should be identical
  d JSON-like accessor syntax in the grammar: $field[3]["bar"]
  d flatten/unflatten for non-JSON I/O formats -- maybe just double-quoted JSON strings -- ?
    - make a force-single-line writer
    - make a jsonparse DSL function -- ?
  d other formats: use JSON marshaler for collection types, maybe double-quoted
  o research gocc support
  o maybe a case for hand-roll
? dsl/ast.go -> parsing/ast.go? then, put new-ast ctor -> parsing package
  o if so, update r.mds
* relnotes: label b,i,x vs x,i,b change
* double-check dump CR-terminators depending on expression type
* good example of wording for why/when to make a breaking release:
  https://webpack.js.org/blog/2020-10-10-webpack-5-release/
* unset, unassign, remove -- too many different names. also assign/put ... maybe stick w/ 2?
* huge commenting pass
* profile mlr sort
* go exe 17MB, wut. try to discover. (gocc presumably but verify.)
* fill-down make columns required. also, --all.
* check triple-dash at mlr fill-down-h ; check others
* clean up unused exitCode arg in sort/put usage.
  o also document pre/post conditions for flag and non-flag usages x all mappers
? emit @x or emit x -- should make k/v pairs w/ "x" & value -- ? check against C impl
i emitp/emitf -- note for-loops didn't appear until 4.1.0 & emits are much older (emitp 3.5.0).
  if i were starting clean-slate, i'd have had just a single `emit`.
* asserting_{type}: os.Exit(1) -> return nil, err flow?
* test put/filter w/ various combinations of -s/-e/-f
* mt_void keep-or-not .......
  o check dispo matrices
  o if keep, need careful MT_VOID at from-string constructor -- ? or not ?
  o comment clearly regardless
* bitwise_and_dispositions et al should not have _absn for collections -- _erro instead
* ast-parex separate mlr auxents entrypoint?
* port u/window*.mlr from mlrc to mlrgo (actually, fix mlrgo of course)
* line/column caret at parse-error messages -- would require some GOCC refactoring
  in order to get the full DSL string and the line/number info into the same method
* csvlite rd/wr: comment for USV/ASV too. no need for escaping then.
* comment schema-change supported only in csvlite reader, not csv reader
* for-multi: C semantics 'k1: data k2: a:x v:1', sigh ...
* neaten mlr gap -g (default) print
! write out thorough min/max/cmp cases for all orderings by type
* silent zero-pass for-loops on non-collections:
  o intended as a heterogenity feature ...
  o consider a --errors or --strict mode; something
* note about non-determinism for DSL print/dump vs record output-stream now ...
* put/filter updates:
* [[...]] / [[[...]]]:
  o put '$array = [1,2,3,[4,5]]' is a syntax error unfortunately; need '$array = [1,2,3,[4,5] ]'
i https://en.wikipedia.org/wiki/Delimiter#Delimiter_collision
* reorder locations of get/put/remove methods in mlrval/mlrmap
* grep out all error message from regtest outputs & doc them all & make sure index-searchable at readthedocs
* short 'asserting' functions (absent/error); and/or put --strict or somesuch
* function metadata: auto-sort on mlr -f?
* --x2b @ help-doc .go; etc
* os.Args[0] etc -> "mlr" throughout the codebase
* emitx later: 'emit([a,b,c],d,e,f)' for SR-conflict issues

* genmds multi-line something something for autogen of repl examples -- ?

* maybe split Context into varying & non-varying -- separate structs entirely

* idea: records as mlrmap -> mlrval?
  o reduce $* copy ...
  o opens the door to some (verb-subset) truly arbitrary-JSON processing ...

* mlr --opprint put $[[1]] = $[[2]]; unset $["a"] ./regtest/input/abixy
  o squint at pointer-handling
  o output varied after flatten-mods

* join
  > clean up VERBOSE in joiner-files
  > joinBucketKeeper & joinBucket need to be privatized
  > rewrite join-bucket-keeper.go entirely
  > also needs UT per se (not just regression)
* cli-doc --no-auto-flatten and --no-auto-unflatten
* note (fix? doc?) flatten of '$x={}' expands to nothing. not invertible.
* parex print regtest -- what about new ast-node types?
* all case-files could use top-notes
* dev-note on why `int` not `int64` -- processor-arch & those who most need it get it
* doc auto-flatten/auto-unflatten -- incl narrative from mlrcli_parse.go
* doc6: default flatsep is now "." not ":" in keeping with JSON culture
? allow [[...]] / [[[...]]] at assignment LHS

* readeropts/writeropts/readerwriteropts -> cliutil funcs
  o then put into join.go, put.go, & repl
* mlr inp parse error failstring retback?
* https://blog.golang.org/go1.13-errors
* split REPL lines on ';' -- ?
* tilde-expand for REPL load/open: if '~' is at the start of the string, run it though 'sh -c echo'
* doc shift/unshift as using [2:] and append
? ctx invars -> ptr w/ cmt
? string/array slices on assignment LHS -- ?
* beyond:
  o support 'x[1]["a"]' etc notation in various verbs?
  o sort within nested data structures?
  o array-sort, map-key sort, map-value sort in the DSL?
  o closures for sorting and more -- ?!?
  o or maybe just use UDFs ...
* optimize MlrvalLessThanForSort
  o mlr --cpuprofile cpu.pprof --from ~tmp/big sort -f a -nr x then nothing
  o GOGC=1000 mlr --cpuprofile cpu.pprof --from /Users/kerl/tmp/huge sort -f a -nr x then nothing
  o wc -l ~/tmp/big
    1000000 /Users/kerl/tmp/big
  o wc -l ~/tmp/huge
    10000000 /Users/kerl/tmp/huge
* optimize MlrvalGetMeanEB et al.
* data-copy reduction wup:
  o literal-type nodes -- now zero-copy
  x modify Evaluate to return pointer -- too much copying
  o wup for it was the binary-operator node, w/ the '*', that broke w/ no-output-copy & fibo UT
  o bonus: return MlrvalSqrt(MlrvalDivide(input1, input2))
  o type-gated mv -- should use passed-in storage slot -- ?
  o nice narrative write-up w/ the C stack-allocator problem, Go non-solution,
    profilng methods, GC readings/findings, before-and-after CST data structures,
    final perf results.
  o next round of data-copy reduction:
    - $z = min($x, $y) -- needs to return pointer to x or y
    o $z = $x + $y -- needs to have space for sum, and return pointer to it
    o therefore type BinaryFunc func(input1, input2 *Mlrval) *types.Mlrval
      > have the function z-allocate outputs when needed
      > the outputs must be on the stack, not statically allocated, to make them re-entrant
        and OK for recursive functions
      > var output types.Mlrval w/ field-setters, rather than return &Mlrval{... all of them ...}
    - then IEvaluable: Evaluate(state *runtime.State) *types.Mlrval
    - invalidate CopyFrom
    - check for under/over copy at Assign
    - global *ERROR / *ABSENT / etc
* for i, e in range c optimization -- always *copies* e
  o try and benchmark/compare ...
  o lots of array-of-pointer stuff, this is totally fine
  o take care w/ copying (non-pointer) mlrvals though
* more copy-on-retain for concurrent pointer-mods
  o make a thorough audit, and warn everywhere
  o either do copy for all retainers, or treat inrecs as immutable ...
  o 'this.recordsAndContexts.PushBack(inrecAndContext)' idiom needs copy everywhere ...
* consider -w/-W to stderr not stdout -- ?
* doc6 re warnings
* -W to mlr main also -- so it can be used with .mlrrc
* push/pop/shift/unshift functions
* 0035.cmd
  begin{@x=1} func f(x) { dump; print "hello"; tee  > ENV["outdir"]."/udf-x", $* } $o=f($i)

* zlib: n.b. brew install pigz, then pigz -z

* regex-capture follow-on: https://github.com/johnkerl/miller/issues/388 is much cleaner
  o keep current syntax for backward compatibility
  o but encourage use of this

* put -T -- ?

----------------------------------------------------------------
DOC6:

* mlrdoc false && 4, true || 4 because of short-circuiting requirement
* error if UDF has same name as built-in
* more text examples in mlr-put doc
* window.mlr, window2.mlr -> doc somewhere
* doc: substr in inferred-numeric fields: https://github.com/johnkerl/miller/issues/290.
  o xref to 1-up note.
* back-incompat:
  mlr -n put $vflag '@x=1; dump > stdout, @x'
  mlr -n put $vflag '@x=1; dump > stdout @x'

* document tee -p

* why no flagSet:

  can't be supported everywhere, so don't confuse the user by supporting it
  some places and not others.

	Unlike other transformers, we can't use flagSet here. The syntax of 'mlr
	sort' is it needs to take things like 'mlr sort -f a -n b -n c', i.e.
	first sort lexically on field a, then numerically on field b, then
	lexically on field c. The flagSet API would let the '-f c' clobber the
	'-f a', while we want both.

	Unlike other transformers, we can't use flagSet here. The syntax of 'mlr put'
	and 'mlr filter' is they need to be able to take -f and/or -e more than
	once, and Go flags can't handle that.

* sec2gmt --millis/--micros/--nanos doc
* sort-within-records --recursive doc

* docs nest simplers now that we have getoptish
* mongo examples to doc
* doclink re https://readthedocs.org/projects/miller/ & https://github.com/johnkerl/miller/settings/hooks
* dotted-map doc ...
  o $*.foo["bar"] = NR b04k b/c precedence :(
  o change precedence?

? * would LOVE to have small prev-page/next-page links at the *top* not bottom ...
  https://squidfunk.github.io/mkdocs-material/customization/#extending-the-theme
