TODO


tweak of batcher yield
pack.Payload reuse memory, json.NewEncoder(os.Stdout)
metrics isolation by cluster
participant starts slowly (30s between "starting" and "started" in the log below)
- [06/06/17 15:06:11 CST] [TRAC] ( engine.go:281) engine starting...
- [06/06/17 15:06:11 CST] [TRAC] ( engine.go:343) [10.9.1.1:9877] participant starting...
- [06/06/17 15:06:41 CST] [INFO] ( engine.go:349) [10.9.1.1:9877] participant started
cluster
- monitor resources cost and rebalance
- support multiple projects
resource group
FIXME access denied leads to orphan resource
myslave should have no checkpoint, placed in Input
enhance Decision.Equals to avoid thundering herd
myslave server_id must be unique across the cluster
add Operator for Filter
- count, filter, regex, sort, split, rename
RowsEvent avro
inc binlog replication recv buffer size
alert mysql binlog lags
dbc participants -i // show internal buffers
model.RowsEvent add dbus timestamp
HY000 auto heal
multiversion config in zk
controller
- a participant is electing, then shutdown takes a long time (blocked by CreateLiveNode)
- 2 phase rebalance: close participants then notify new resources
- what if RPC fails
- leader.onBecomingLeader is parallel: should be sequential
- hot reload raises cluster herd: participant changes too much
- when the leader makes a decision, it persists it to zk before the RPC, for leader failover
- owner of resource
- leader RPC has epoch info
- if Ack fails (zk crash), resort to local disk (load on startup)
- engine shutdown, controller still sends RPCs
- test cases
  - sharded resources
  - split brain
  - zk dies or kill -9, use cache to continue work
  - kill -9 participant/leader, and reschedule
  - cluster chaos monkey
kafka producer qos
batcher only retries after full batch ack, add timer?
KafkaConsumer might not be able to Stop
kguard integration
router finding matcher is slow
hot reload on config file changed
each Input has its own recycle chan, so one blocked Input will not block the others
when Input stops, Output might still need its OnAck
KafkaInput plugin
use scheme to distinguish type of DSN
plugins Run has no way of panic
(replication.go:117) [zabbix] invalid table id 2968, no correspond table map event
make canal, high cpu usage
- because CAS backoff 1us, cpu busy
ugly design of Input/Output ack mechanism
- we might learn from storm bolt ack
some goroutine leakage
telemetry mysql.binlog.lag/tps tag name should be input name
pipeline
- 1 input, multiple output
- filter to dispatch dbs of a single binlog to different output
kill Packet.input field
visualized flow throughput like nifi
- dump uses dag pkg
router metrics
dbusd api server
logging
share zkzone instance
presence and standby mode
graceful shutdown
master must drain before leave cluster
KafkaOutput metrics
- binlog tps
- kafka tps
- lag
hub is shared, what if a plugin blocks others
- currently, I have no idea how to solve this issue
Batcher padding
shutdown kafka
zk checkpoint vs kafka checkpoint
kafka follower stops replication
can a mysql instance with multiple databases have multiple Log/Position?
kafka sync produce in batch
DDL binlog
- drop table y;
trace async producer Successes channel and mark as processed
metrics
telemetry and alert
what if replication conn broken
position will be stored in zk
play with binlog_row_image
project feature for multi-tenant
bug fix
- kill dbusd, dbusd-slave did not leave cluster
- next log position leads to failure after resume
- KafkaOutput only supports 1 partition topic for MysqlbinlogInput
- table id issue
- what if invalid position
- router stat wrong Total:142,535,625 0.00B speed:22,671/s 0.00B/s max: 0.00B/0.00B
- ffjson marshalled bytes have a newline before the closing bracket
test cases
- restart mysql master
- mysql kill process
- race detection
- tc drop network packets and high latency
- mysql binlog zk session expire
- reset binlog pos, and check kafka did not recv dup events
- MysqlbinlogInput max_event_length
- min.insync.replicas=2, shutdown 1 kafka broker then start
GTID
- place config to central zk znode and watch changes
Known issues
- Binlog Dump thread not close github/gh-ost#292
Roadmap
- pubsub audit reporter
- universal kafka listener and outputter

Issues

a big DELETE statement might kill dbusd
- It might exceed the max event size (1MB); mysql seems to auto-chunk a big event into several smaller events
- It might allocate a very large amount of memory in the RowsEvent struct
- mysql packet max payload len = (1<<24 -1)
OSC tools make 'ALTER' very complex, so dbusd is not able to clear the table columns cache
- use SQL comment to solve it
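
To make the packet limit above concrete: in the MySQL client/server protocol a payload of 2^24-1 bytes or more is split across packets, and a payload that is an exact multiple of that size is followed by one more (empty) packet. A small sketch of the arithmetic follows. `packetCount` is a hypothetical helper, not part of dbus.

```go
package main

import "fmt"

// maxPayloadLen is the largest payload a single MySQL packet can carry.
const maxPayloadLen = 1<<24 - 1

// packetCount returns how many wire packets are needed for a payload.
// The sequence always ends with a packet shorter than maxPayloadLen
// (possibly empty), hence the unconditional +1.
func packetCount(payloadLen int) int {
	return payloadLen/maxPayloadLen + 1
}

func main() {
	for _, n := range []int{1 << 20, maxPayloadLen - 1, maxPayloadLen, 64 << 20} {
		fmt.Printf("payload=%d bytes -> %d packet(s)\n", n, packetCount(n))
	}
}
```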

Memo

mysqlbinlog input peak with mock output
- 140k events per second
- 30k row events per second
- 260Mb network bandwidth
- KafkaOutput 35K msgs per second
- it takes 2h25m to reach zero lag for a platform with 2d of lag
dryrun MockInput -> MockOutput
- 2.1M packets/s
