-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Discuss: Plug-in hooks and building a plug-in culture #2225
Comments
Example: Tab-rejiggering, trimming space before/after It's been requested that we strip whitespace from the beginning and ending of code blocks. Also if someone pastes code that has 5-7 tab stops for ALL of it you could imagine perhaps removing 5 tab stops therefor scooting that content over to the left. These kind of things could trivially be handled by an input plugin. Handed the raw code, they would tweak the spacing, and then pass that code along to be highlighted. |
Example: shebang detection, This could be implemented with an auto-detect post or pre filter... it would analyze the first line of code (even before highlighting) and then could "focus" the highlighter to a single language or be used to "bump" a certain languages rank so that it would have a "bonus" in the auto-detection game. |
Example: Highlight duplicate HTML attributes This could be easily done post-parse by walking the parse tree and dynamically flagging certain nodes to be highlighted with a different class, etc. |
Example: Relevancy boosting for people who typically only use one type of code A post-autodetect plugin could slightly favor a language the content author knew they ALMOST ALWAYS used, thereby preventing auto-detect from doing the wrong thing. Say if you only blog about Pascal you could give it a 10% relevancy boost. |
Example: Replacing the parser entirely with experimental parsers An input plugin could do it's own parsing and if it returned a parseTree (rather than just code) then Hightlight.js would simply skip it's own parsing and jump straight to output processing of the parseTree into HTML. |
Nice idea! |
You don’t really need a per line hook if you can hook the whole output. Do you? Split, add stuff. Join. |
Well, you'll have to deal with html tags spanning on several rows. To simplify this task, In per-line mode we probably can close opened tags on the end of line and then re-open them on the beginning of the next line. |
Oh, I see. Yeah that'd be a whole other thing. I think there are MUCH easier ways to add line numbers than wrapping every line in a span though, but that's my opinion. :-) |
Like what? |
Like using grid or flex box and just having the line numbers in their own |
Not really opposed to that kind of hook (if someone else wants to write a plugin), but seems like a lot of needless complexity when there are easier ways to solve the problem. :-) |
Reference regarding one idea for sharing options: #1969 Not sure it's the best approach, but I wanted to save it for future reference when we come up with more of a plan for plugins. |
Related: |
When adding plugin support I started with the easiest piece And I think the sublanguage thing is a bit nastier to understand, and I'm leaning towards another set of callbacks so that plugin authors don't have to concern themselves with sublanguage UNLESS they really want to:
So someone just wanting to hook top-level Perhaps there are some cases where plugin authors don't care and want EVERY highlight call treated the same (you're always trying to do some transform on a particular language, for example). But I think those cases would be fewer and it's easy enough to just add the two extra hooks... while simplifying things for a lot of other plugin authors. With regard to autoDetect the complexity bothers me a little less. I guess I'm assuming many plugins will target explicit highlights or auto-highlights, not both. If that were to be the case then those authors wouldn't even notice. However someone who wanted to target auto and highlight would find their callbacks fired like this:
Is it just a plugin authors responsibility to keep this in mind? I'm thinking perhaps it we provided some meta-state that would be enough: // say I only care about explicit highlights, not auto-detect attempts
hljs.addPlugin( {
'before:highlight': ({block, data}) => {
if (data.flags.autodetect) return;
// continue with normal processing
}) |
Is there a reason one can care about explicit highlights but not about auto-highlights? |
Well off the top of my head you have different categories of plugins. Or really maybe the difference isn't between auto-and-not, it's between "just one language" vs "ALWAYS". IE: Just one language I have a kick-ass plugin that does impossible language analysis and really cool stuff with C++ code (or perhaps even something trivially simple)... I don't care about auto-highlighting (in general). If someone says the code is "C++" then I run OR if auto-detect decides the code is C++, then I run. But in general when the 283 other languages are tested/highlighted I don't give one flip - I only care about C++. All languages I want to strip all whitespace from before and after code blocks because whatever tool is building the HTML couldn't be bothered to do that properly - and for my use cases whitespace never matters. In this case I want my plugin to ALWAYS run before, and modify the code by stripping whitespace... I don't care if we know the language, I don't care if it's auto-detect, etc... I just want to strip space. I probably don't want to strip space from sublanguages though. What I want to avoid is a world where someone wants to write a "simple" plugin like one of the above but to do it has to hook 8 different hooks, and track whether highlight is being called initially, inside of auto-detect, inside of sublanguage, inside of a nested sublanguage, etc... IE the closer the plug-in psuedo-code is to this, the better: before("cpp", doAwesomeAnalysis)
after("cpp", applyAwesomeAnalysis)
beforeAll(stripWhitespace); |
There is kind of an implied split here we don't currently even have... "after:autodetect" and "before:highlight"... since the auto-detect is the highlight... it's possible my plugin would only want to act on C++ code once it had been determined it was actually C++ code, not just when the highlighter is GUESSING. |
I try to avoid designing this stuff abstractly, so the current focus is #2391. The idea being "detect frontmatter" and if present, possibly route to a particular grammar or else the plugin would take over parsing duties itself, calling two particular grammars internally. And already you have a case where a plugin might want to recursively call |
Looks very complicated... In plugin you'd probably want to set boundaries of languages and let the core do the actual highlighting itself. |
That's what I'd do here. The plugin would split the code into two pieces in a ...but the plugin author would be free to choose what works best. Perhaps they want to add support for a language that's really impossible to parse with pure regex but easy with a few lines of JS. In that case they wouldn't need to call
I don't know what you mean here but it sounds like the opposite of a plugin, sounds like adding features back to the core library.
We already have before/after for highlightBlock... technically it'd be possible to write this plugin today if it only needed to work with highlightBlock, but highlightBlock is browser-only... and that's pretty limiting. Most plugins are going to want to hook the actual library calls so that they would work inside NodeJS, etc. not just the browser environment. |
But since you brought up highlightBlock right now it's kind of a "catch-all"... the plugin author would have to figure out if a language was specified or going to be auto-detected - they also have the power to change or control this behavior... they could run their own auto-detect for example... or flag anything containing a magic string as a given language, etc. IE, my |
I think we want to hide/abstract the internals as much as we can so that our hooks all make sense but don't seem to be rooted too much in implementation details. Say highlight (for some funny reason) calls itself 20 times just because of some quirk (even in simple cases). Someone hooking before/after highlight would NOT expect their hook to be called 21 times... we'd want to abstract that to deal with only the external call to So that's why I think we need to have separate hooks for You could also make the same argument about auto-detect... TODAY it calls highlight a zillion times... but that's an implementation detail. Other libraries have really good results with machine learning for language recognition (and we've already talked about plugins that could copt recognition). So in that case if I hooked |
So I think that leaves us with more focusing on what is happening rather than internal method dispatch:
So now you have clear boundaries for functionality. If you wanted to ALWAYS touch say "pascal" you'd need to hook The last tricky party is when you mix sublanguage and autodetect... Is a |
Actually I don't think autodetect should trigger 284 times... that seems weird and very, very internal. Should plugins need to care about every single time attempt the auto-detect makes at detection? I need to think about this over coffee sometime. Seems so much of this is easy to bend to be very implementation specific if we don't think about it right. |
Say I've been given a snippet of text to auto-highlight. More abstractly if you think it terms of events:
So I count 4 events there, and 3 types ( If I hooked into BEFORE auto-detection I could write my own auto-detect, or influence the built in method (whatever it was). If I hooked AFTER I could say look at the list of potential matches and make my own choice (so perhaps augmenting the built-in support). I don't see a case there to call any hook 284 times. That seems to be an implementation detail of how we currently do auto-detection. That makes a LOT more sense to me.
Leading to:
Updated: Added autodetect for sublanguage. |
I do agree with your thought process. A thought: do we really need separate hooks for autodetection? We can have something like "choose language" with a list of alternatives. If there is just one, it corresponds to |
I don't follow what you mean. A plugin like
|
I mean you can treat anything like autodetection. But sometimes it can choose from exactly one predefined option. This can help with
|
Actually I think that would work just like the parent... the question is would you need a "sublanguage/autodect" hook... IE, |
I still don't follow you might have to give a very specific example of what you had in mind. |
Nevermind. Your event system makes sense. So, for a
Do I get it right? |
Or decide they are all rubbish and do something else entirely. :)
Well, honestly it'll help if you stop thinking about it that way. :-) Instead think: Auto-detection is done. One of the side-effects is that we also get the result of each potential highlight run, which may or may not always be useful. And may or may not be how it's implemented in the future. So, we run the auto-detection... and (assuming another plugin hasn't provided it's own output) then we decide the language is say UNLESS the plugin makes changes to |
I'll probably start next with just |
A bigger question is plugin vs plugin interaction. For example: I was just thinking about the API and how a plugin like IE, header = highlight(yaml, {skipPlugins: true})
body = highlight(markdown, {skipPlugins: true}) But then do I really want to skip ALL plugins, or do I really want to skip ONLY myself? Perhaps it's not too much to expect early plugins to deal with cruft like this and then after we see what the REAL problems are we circle back and try and fix them with plugins v2. :-) IE, for now if you were using the external API to call highlight from within a hook you'd have to be careful to avoid infinite recursion and that's the plugin authors responsibility. |
Learn it from the native event listener. We could define context. function c() {}
function d() {}
a.addEventListener('b', c);
a.addEventListener('b', d);
a.removeEventListener('b', c); // remove `c` only So for function foo() {}
function bar() {}
{skipPlugins: true} // ignore all
{skipPlugins: [foo, bar]} // ignore `foo` and `bar` only Here’s mine, for inspiration: https://github.com/taufik-nurrohman/color-picker/blob/bb7051afbf5f61ac9b691d28fb899e06b4fadffb/color-picker.js#L282-L322 var $ = this, // the class instance scope
hooks = {};
function hookLet(name, fn) {}
function hookSet(name, fn) {}
$.hooks = hooks;
$.off = hookLet;
$.on = hookSet; Usage: hljs.on('foo', bar);
hljs.on('foo', function() {});
hljs.off('foo', bar); // remove `bar` only |
Yes I understand how it might work conceptually. I'd like to see more plugins deal this themselves before we consider adding anything to core. Right now it should be pretty easy for a single plugin to prevent re-entering ITSELF. I'm not sure plugins should "be aware" of other plugins or be able to easily turn them on and off. You should be able to add multiple plugins and trust that they all "just work" unless they do mutually exclusive things. I don't want plugin A saying it doesn't work with plugin B so it just turns it on and off, etc... |
Closing and referencing at #2762. |
Related: #2174
This thread might get long. It'll try to summarize a list of hook points here and keep it up-to-date. The overall idea is to encourage a plugin culture - rather than adding the 100 tiny ideas people approach us with as options and spread all over the code I'd rather have a plug-in ecosystem that make it easy to write tiny plugins to enhance Highlight.js in various ways. Deciding WHERE the code for those plugins lives would be a separate consideration. Perhaps really useful/popular ones would exist in core, perhaps others would live elsewhere.
Worth discussing how these might work with auto-highlight also... which is really a whole bunch of different highlights happening sequentially and then one "winning"...
Basically a plugin would register itself with Highlight.js and then it would be called at various hook points and thru these well defined hook points would be able to do various things. Examples to follow.
It's also worth considering if individual languages could also act as their own plugins... ie, allowing them to hook into all the same places - except only when they are being used to highlight the code.
Hook Ideas
Post auto-detect
Called after auto-detection is complete and all registered languages have been "scored". Could change the relevancy of any of the results, thereby deciding which language is detected.
Input Hook
Would have access to the incoming state... such as relevancy, language, code, etc... could change any of these... increase relevancy, pre-filter the content, etc.
Output Hook
Called just prior to returning data... again could also any of the data in the return state...
Post-parse hook
Called after the parse tree is generated. The plugin could alter the parse tree. Or access any of the other normal state information (relevancy score, etc).
The text was updated successfully, but these errors were encountered: