@document.meta
title: roadmap
description: Status and tasking
authors: mchristiani
categories: public
created: 2024-07-11T00:05:54-0500
updated: 2024-10-04T14:26:27-0500
version: 1.1.1
@end
* Codebase
** Functionality
*** (!) Test data generation @status=IP
Need to bring together a proper, formal way to test things; this is high priority and in progress.
See the hermetic build tasks.
*** ( ) finite difference
*** ( ) In general, if running in pure inference mode, perhaps defer to zml, but this might require a translation layer, so it may actually just be a model conversion
*** ( ) MKL to {https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html}[oneMKL] {https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-c/2024-2/overview.html}[oneMKL C Ref]
*** ( ) oneMKL to oneDNN {https://www.intel.com/content/www/us/en/developer/tools/oneapi/onednn.html}[oneDNN]
** ( ) Arch. + Perf.
*** (x) should be a backend abstraction
Comptime switching on the BLAS impl is fine for now. More specific tasks to expand this further will be added later.
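A rough sketch of the comptime-switching idea (hypothetical names, not the current code): the backend is a comptime constant, so dispatch is resolved at compile time and the unused path is never built.
@code zig
const std = @import("std");

const Backend = enum { naive, mkl };
// In practice this would come from a build option; hardcoded for the sketch.
const backend: Backend = .naive;

/// y := a*x + y, dispatched on the comptime-known backend.
fn axpy(a: f32, x: []const f32, y: []f32) void {
    switch (backend) {
        .naive => {
            for (x, y) |xi, *yi| yi.* += a * xi;
        },
        .mkl => @panic("MKL path not wired up in this sketch (would call cblas_saxpy)"),
    }
}

pub fn main() void {
    const x = [_]f32{ 1, 2, 3 };
    var y = [_]f32{ 10, 20, 30 };
    axpy(2.0, &x, &y);
    std.debug.print("{any}\n", .{y}); // { 12, 24, 36 }
}
@end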
*** (_) planning for bringing in an array dep. after I prove this point to myself...
Sticking with mine indefinitely.
*** ( ) custom pool allocator (low priority)
*** ( ) Op fusion: `@mulAdd`, matmulacc, matvecacc, SIMD, etc.
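For context, roughly what a fused accumulate kernel looks like with the builtins named above (a sketch, not the project's kernels): `@mulAdd` on SIMD vectors folds the multiply and the accumulate into one op.
@code zig
const std = @import("std");

/// Accumulating dot product using 4-wide SIMD and fused multiply-add.
fn dotAcc(a: []const f32, b: []const f32, acc0: f32) f32 {
    const V = @Vector(4, f32);
    var vacc: V = @splat(0.0);
    var i: usize = 0;
    while (i + 4 <= a.len) : (i += 4) {
        const va: V = a[i..][0..4].*;
        const vb: V = b[i..][0..4].*;
        vacc = @mulAdd(V, va, vb, vacc); // fused: vacc += va * vb
    }
    var acc = acc0 + @reduce(.Add, vacc);
    while (i < a.len) : (i += 1) acc = @mulAdd(f32, a[i], b[i], acc); // scalar tail
    return acc;
}

pub fn main() void {
    const a = [_]f32{ 1, 2, 3, 4, 5 };
    const b = [_]f32{ 5, 4, 3, 2, 1 };
    std.debug.print("{d}\n", .{dotAcc(&a, &b, 0)}); // 35
}
@end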
*** (x) arch organization first pass {:architecture:} @status=IP
Taking this further than expected; need to properly organize to keep it manageable. Good time to consider abstractions.
*** ( ) Decoupled backward graph. This also allows us to compile the graphs separately, which will be important, and to separate data versioning and other logic.
`const mkl = @cImport(@cInclude("mkl/mkl_.h"));`
** (-) Refactoring
*** ( ) Rename tests https://ziglang.org/documentation/0.12.1/#Doctests
*** ( ) Naming conventions for inplace, elementwise, and scalar operations
*** (x) Children should be a slice, if not a different data type
*** ( ) Prefer pass-by-value when possible and rely on the compiler more
** (-) Features
*** ( ) Multithreading
*** ( ) GPU
*** (=) Pull in the View abstraction; since this requires a custom backward, it is likely dependent on the `Module`/`Function` abstraction
*** ( ) Activation functions; might be the same as above
*** (-) ZArray
**** (-) slice ops, potentially via View
**** ( ) Infer dim; a good idea is to use a tuple with comptime type inspection instead of trying to deal with a signed slice for shape/indices
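A sketch of what the tuple-based approach could look like (hypothetical helper, not the current API): tuple elements stay comptime-known, so a `-1` marker can be found with comptime inspection and the missing dimension derived from the total element count, with no signed slices involved.
@code zig
const std = @import("std");

fn resolveShape(total: usize, shape: anytype) [std.meta.fields(@TypeOf(shape)).len]usize {
    const fields = std.meta.fields(@TypeOf(shape));
    var out: [fields.len]usize = undefined;
    var known: usize = 1;
    var infer_at: ?usize = null;
    inline for (fields, 0..) |f, i| {
        const v = @field(shape, f.name);
        if (v == -1) {
            infer_at = i; // assume at most one inferred dim
        } else {
            out[i] = v;
            known *= out[i];
        }
    }
    if (infer_at) |i| out[i] = total / known;
    return out;
}

pub fn main() void {
    const shape = resolveShape(24, .{ 2, -1, 3 });
    std.debug.print("{any}\n", .{shape}); // { 2, 4, 3 }
}
@end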
**** ( ) reshape tests
**** ( ) transpose could be better
**** ( ) bmm acc
**** ( ) shape to use an fba
*** ( ) detach
** ( ) Future/Speculative Features
*** ( ) code gen for elementwise ops
This could make primitives like ndarray/ndtensor more maintainable; many operations can be grouped into patterns to facilitate this (binop, unop, scalar-op, elementwise-op, maybe even heap-alloc vs not).
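Even before full codegen, the binop pattern can be captured by one comptime-generic helper (a sketch with made-up names, not the current code):
@code zig
const std = @import("std");

/// One generic loop covers the elementwise binop pattern; each new op is just
/// a one-line scalar function passed in at comptime.
fn elementwise(comptime op: fn (f32, f32) f32, out: []f32, a: []const f32, b: []const f32) void {
    for (out, a, b) |*o, x, y| o.* = op(x, y);
}

fn add(x: f32, y: f32) f32 {
    return x + y;
}
fn mul(x: f32, y: f32) f32 {
    return x * y;
}

pub fn main() void {
    var out: [3]f32 = undefined;
    elementwise(add, &out, &[_]f32{ 1, 2, 3 }, &[_]f32{ 4, 5, 6 });
    std.debug.print("{any}\n", .{out}); // { 5, 7, 9 }
    elementwise(mul, &out, &[_]f32{ 1, 2, 3 }, &[_]f32{ 4, 5, 6 });
    std.debug.print("{any}\n", .{out}); // { 4, 10, 18 }
}
@end
A codegen pass would then stamp out the named ops from a table of such patterns instead of hand-writing each wrapper.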
*** ( ) lazy op fusion (sort of part of a compiler epic)
*** ( ) python bindings
** ( ) Build system
*** (x) Restructure project for proper exporting
*** (x) Overall it's hacked together for testing; unhack it
Goes together with arch organization and test data generation; progress made.
*** ( ) refine lib testing https://ziggit.dev/t/how-to-test-modules/735/5
*** ( ) Support real OSes, but Zig's build system changes every 5 minutes
*** ( ) Hermetic build system with polyglot support for torch purposes and Python bindings
1. nix+docker
2. bazel with zig rules
3. bazel with build.zig
4. buck2 with build.zig (makes sense over bazel in this case)
** (-) Known issues
*** (x) Global mutable config context. Update: enabled comptime settings and runtime grad mode; it's pretty good now.
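Roughly the shape of that split (hypothetical names, not the actual config): build-time options stay behind comptime, while grad mode is a plain runtime flag saved and restored around no-grad sections.
@code zig
const std = @import("std");

pub const settings = struct {
    pub const dtype = f32; // fixed at compile time
    pub const max_dims = 8;
};

pub var grad_enabled: bool = true;

/// Disable grad tracking and return the previous value so the caller can restore it.
pub fn disableGrad() bool {
    const prev = grad_enabled;
    grad_enabled = false;
    return prev;
}

pub fn main() void {
    const prev = disableGrad();
    defer grad_enabled = prev;
    std.debug.print("grad enabled in scope: {}\n", .{grad_enabled}); // false
}
@end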
*** (x) In the interest of iteration time, used arena allocators in the tests... actually check management, especially for `Shape`
- No. This has a performance penalty; we are sticking to pools.
*** ( ) lack of errdefer in a few places to avoid leaks
*** ( ) shape should use an fba for performance
** Distributed Inference
*** ( ) Distributed comms for GPU (NCCL) and CPU (Gloo)
** Distributed Training
** ML Graph Compiler
*** ( ) compile the backprop graph
* ML
** (x) CV
*** (x) Conv
*** (x) MNIST Demo
*** ( ) MaxPool needs a rework at some point
*** ( ) rewrite conv ops with new packing scheme and no im2col/col2im.
** (-) RL
Roughly ordered by dependency.
This is really just unstable at this point; not sure what the problems are right now.
*** (x) Replay buffer
*** (x) Global nograd context
*** (x) Gather()
**** (x) forward
**** (x) backward
*** (x) argmax and max over dim
*** (-) DQN
*** (-) exploration schedule {https://github.com/DLR-RM/stable-baselines3/blob/1a69fc831414626cbbcf13343c6e78d9accb9104/stable_baselines3/common/utils.py#L100}[ref]
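The referenced helper is essentially a linear interpolation over training progress; a Zig version would look something like this (a sketch, not the current code):
@code zig
const std = @import("std");

/// Linearly decay a value from `start` to `end` over the first `end_fraction`
/// of training progress (progress in [0, 1]), then hold at `end`.
fn linearSchedule(start: f32, end: f32, end_fraction: f32, progress: f32) f32 {
    if (progress >= end_fraction) return end;
    return start + (end - start) * (progress / end_fraction);
}

pub fn main() void {
    // e.g. epsilon from 1.0 down to 0.05 over the first 10% of training
    std.debug.print("{d}\n", .{linearSchedule(1.0, 0.05, 0.1, 0.05)}); // 0.525
}
@end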
* Demos
** (x) cartpole sim
** (x) cartpole rendering (sadly adds a dep; making this optional complicates the build, so TBD what to do here)
** (_) mujoco (works, but quite the build process; I don't want to add that complexity to main right now)
** ( ) LLM (could migrate the sharded gpt2 impl)