Saluations ! :) #34
I have been digging more into Haskell since yesterday, and I think my objective is very similar to its typeclass/monad system. The principle is to have generic code that only requires the variables/objects it manipulates to be convertible, through monomorphized functions, to the type the code needs. The node system I've been working on does the same with C code: it allows writing generic C that doesn't need to know the specialized type of the objects it manipulates, and with the reference counter there is no pointer ownership, so nodes are automatically freed when the last reference to them is released.

I've been looking at the code in your Haskell-Webservice-Framework repository, and the way it does server-side page generation is very similar to my script actually lol: https://github.com/keean/Haskell-Webservice-Framework/blob/master/Modules/Admin/App.hs

`handle req response = do ...`

My script :) https://github.com/NodixBlockchain/nodix/blob/master/export/web/nodix.site
Almost exactly the same keywords and overall structure :) But here is where I want to get to, also from reading the discussions on this GitHub (and the concept is very similar to what I can see of this Haskell webserver framework): for the moment, I have a system for defining a blockchain p2p service such that the node first reads the packet header, then a function in the protocol module associates a dictionary with each type of message, identified by its header signature; the node can then deserialize the binary data into this runtime-defined dictionary and call the scripted handler routine with a reference to the deserialized object. The code for this looks like this: https://github.com/NodixBlockchain/nodix/blob/master/protocol_adx/protocol.c#L1429
The structure of the message is constant and known at compile time for the moment, but it could be defined in a script as a dictionary associating a message header signature with a type definition. If the size of every leaf member type is known, the object can be deserialized automatically from the binary data, and the instance sent to the message handler associated with the message header dictionary. That would allow a high-level definition of a network protocol service: a dictionary of serialized objects associated with protocol messages, each associated with a handler function taking that object type as input. It fits most binary protocols well, because most of them have a header signature and a sort of first application layer in the network protocol describing the packet data, so it would allow quickly defining scripted handlers for binary protocols.

The webservice side is different: HTTP needs special handling to make it easy to write page-generation scripts or an RPC server, and to handle query variables, POST data, cookies/headers and all the web-related things. But I guess something could still be thought up to get better synergy between the web server and CGI scripts than the very low coupling of Apache + PHP. The web server could have a conception of the application it's running, and more easily allow persistent data and sessions, somewhat like Tomcat but without Java or a virtual machine. That would allow 'smart' generic functions from the web server based on the script definition, like Tomcat's encapsulation system for servlets, with built-in handling of HTTP protocol requests in the script definition of the application, and not necessarily only at function-level granularity. For example, it would be easy to generate a sitemap and other things automatically from the web server based on the application definition, and potentially to generate jQuery plugin script files automatically from module definitions with RPC bindings.

But for the moment I would be interested in finding the best formulation for defining objects or typeclasses to handle binary network protocol requests: a dictionary of objects associated with the protocol message header, instantiated automatically/blindly by the node framework with generic (typeclass-like) code, associated with a handler routine that takes this object as parameter, and eventually associated with a template system to display the object data in HTML, with specialized functions for each type of object or array of objects. Network services could then be defined as aggregations of such objects, potentially defined at runtime from dictionaries and typeclass-like generic code. Integrity can still be checked at both ends: the node can check that the binary data fits the dictionary via the message header and size, and the script handler knows the type of its argument and can test at runtime for the presence of the members/properties it needs. I will probably look more into Haskell; it seems to have a lot of interesting ideas.
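To make the deserialize-and-dispatch idea concrete, here is a minimal C sketch (all names and types are hypothetical, not taken from the nodix code): a table maps each message header signature to a field dictionary and a handler; the node validates the payload size against the dictionary before handing the typed view to the handler.

```c
#include <stddef.h>
#include <stdint.h>

/* Field types a message dictionary can declare (all leaf sizes known). */
typedef enum { F_U32, F_U64, F_HASH32 } field_type;
typedef struct { const char *name; field_type type; } field_def;

/* A deserialized message: a field dictionary plus the raw payload. */
typedef struct {
    const field_def *dict;
    size_t           nfields;
    const uint8_t   *payload;
} message;

typedef int (*msg_handler)(const message *msg);

/* Protocol table entry: header signature -> dictionary + handler. */
typedef struct {
    uint32_t         signature;
    const field_def *dict;
    size_t           nfields;
    msg_handler      handler;
} proto_entry;

static size_t field_size(field_type t) {
    switch (t) {
    case F_U32: return 4;
    case F_U64: return 8;
    default:    return 32; /* F_HASH32 */
    }
}

/* Total size implied by a dictionary, so the node can validate the
   payload length before any handler runs. */
static size_t dict_size(const field_def *d, size_t n) {
    size_t sz = 0;
    for (size_t i = 0; i < n; i++) sz += field_size(d[i].type);
    return sz;
}

/* Generic dispatch: match the signature, check the size against the
   dictionary, then hand the typed view to the (possibly scripted)
   handler. Returns 1 on success, 0 on failure. */
static int dispatch(const proto_entry *table, size_t n, uint32_t sig,
                    const uint8_t *payload, size_t len) {
    for (size_t i = 0; i < n; i++) {
        if (table[i].signature != sig) continue;
        if (dict_size(table[i].dict, table[i].nfields) != len) return 0;
        message msg = { table[i].dict, table[i].nfields, payload };
        return table[i].handler(&msg);
    }
    return 0; /* unknown message type */
}
```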
I'm only mildly interested in using it, because I'm always a bit wary with a new language like this: when I need crypto, a database system, and a lot of underlying functions, I'm never sure the whole package doesn't contain bugs or broken parts. Sometimes there are bugs in their modules and they don't even really say it, and you need to dig through some git issue and then find out you're stuck for six months until they solve it lol. But I think my objective is very similar in principle, except all the core is made of C code and portable position-independent binary modules, and I still want to retain the possibility of easy bare-metal booting on ARM/Pi =)

Scripts can call the modules' exported functions with pointers to the references to the nodes/objects as parameters, as long as the function in the C module takes generic reference pointers as parameters. The presence of an exported function can be detected at runtime. It still lacks metadata to define the number of arguments, but all the arguments are identical from the C compiler's point of view, and it would not be too hard to add a manual definition in the module giving the number of arguments of the exported functions, to detect problems at runtime.
As you know I have been very busy on other matters, so I have not been able to fully digest your posts yet. Just glancing at your OP, I am not sure you tied this in sufficiently with @keean's Zenscript PL design goals. Perhaps you need to explain more the relevancy or parallels between your experimentation and @keean's. But as I said, I have not yet read your posts carefully. I am rushed. Also I just learned yesterday that @keean is in fact very busy with his software company's daily workload. I think he is probably most motivated by discussions which further his aims as expressed in the other discussion-focused Issue threads.
I have highlighted the parallels more in the second post :) Yeah, no rush. I posted this now while I'm into it; he will answer when he has time :) For the moment I'm taking a bit of a step back from core coding and will get more into the website / docs / explanations, etc. I have a bit more time for chatting in the coming days, less hard coding normally :) Some conceptual thinking, a bit :) Not necessarily for the short term, but if I find a good way to achieve what I explain in the 2nd post, it's something that would be desirable to integrate.
@keean replied to me in private and wrote only one sentence, that he will take a look when he has time. I know he was on a business trip this week also.
I will try to answer some points in the other issues too, about things I saw that I think I solved :) My solution is a bit Gordian-knot solving, but normally it should still have a good level of security, etc. I think the main difference in approach is that I don't even attempt to have the compiler check anything; everything is (or can be) checked at runtime. The actual types are only resolved at runtime and converted live to the required type from the node instance. All functions that access nodes/monads have a success/failure return state, so every access can be tested for success; if it returns failure, the output value is not altered and should not be used before being initialized. And I limit multi-thread interaction to simple cases with an asynchronous event framework, which can use green threads or heavy threads with a lockless message list. But yeah, no rush :) I will try to make the points I see as relevant in the other issues; I posted in the concurrency thread about the issue you mentioned in PM on BCT.
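A minimal C sketch of that success/failure access convention (names hypothetical): the accessor returns 1 on success and 0 on failure, and on failure the output is left untouched, so the caller can always test the access before using the value.

```c
#include <stdio.h>

/* A dynamically typed leaf node (simplified): a tagged value. */
typedef enum { T_INT, T_STR } node_type;
typedef struct {
    node_type type;
    union { long i; const char *s; } v;
} node;

/* Monomorphized accessor: converts the node's runtime value to the type
   the calling code requires. Returns 1 on success, 0 on failure; on
   failure *out is not written, so the caller must not use it. */
static int tree_node_get_int(const node *n, long *out) {
    if (!n) return 0;
    switch (n->type) {
    case T_INT: *out = n->v.i; return 1;
    case T_STR: return sscanf(n->v.s, "%ld", out) == 1;
    default:    return 0;
    }
}

int main(void) {
    node height = { .type = T_STR, .v.s = "42" };
    long h = 0;
    if (tree_node_get_int(&height, &h))
        printf("height = %ld\n", h);
    else
        printf("height is not convertible to an integer\n");
    return 0;
}
```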
Hi NodixBlockchain, you are taking a very different approach to types than I am. What I am interested in is proofs about programs, and generic algorithms. These roughly correspond, in maths, to proofs about algebra and to algebra itself, respectively. The idea with static types is that if you can prove a variable 'x' only ever has an integer type assigned to it or read from it, then you can omit all the runtime checks, making the program faster. Obviously we would like to go on to prove other things about programs, to allow more runtime operations to be omitted. Dynamic languages like JavaScript and Python do exactly what you suggest and defer all typing and type checking to runtime. You may also be interested to know that a JavaScript Promise is a monad (with bind = then and unit = Promise.resolve). Promises not only allow chaining asynchronous operations, but also have a success and failure return state (actually a continuation, but with the same effect). This works well with an event model for asynchronous IO.
Unlike JavaScript, there is still a concept of typed objects; it can distinguish different types of objects in an array, so it's not completely type-free like JS. And there is no garbage collector, but manual reference counting, so memory always remains clean (and strictly bounded); it's not focused only on green threading like JS, and it doesn't require a virtual machine, as it's compiled to a binary executable. I think I'm more interested in emergent properties and in allowing more flexibility in writing programs and object interactions, even if the actual outcome can't be predicted :)
JavaScript has "TypedArrays" to allow fast unboxed types in arrays. TypeScript can restrict the types in normal JS arrays too.
Reference counting is a form of garbage collection (http://onlinelibrary.wiley.com/doi/10.1002/spe.4380140602/abstract). The kind of multi-generation mark-sweep GC used in JavaScript is faster than a reference-counting garbage collector.
Yes, GC is faster, but it tends to use more memory and the memory bounds are very loose; a JS VM can quickly eat up a lot of RAM and there is not much you can do about it. Here the memory is available again as soon as it's no longer referenced. I agree GCs are faster and all, but they also tend not to actually free memory very often, especially in complex dynamic applications. For me GCs are optimal where there is a sweet spot for flushing the GC, like in a video game when a level is unloaded, or in a browser when the page is closed. But for applications like servers that need to run 24/7, there is no especially obvious sweet spot for flushing the GC. And I don't like the idea of the Java GC filling up memory before it starts collecting garbage; it's a bit dangerous, and it often leads to app crashes on Android phones without much memory, where you then need to clean memory manually by clicking the clean-up button because it's full. If the idea is to fill up memory until it's full and then the application crashes, yeah, OK, I can easily see why it's faster than reference counting :p Especially on a system without virtual memory, where there is no easy sweeping under the carpet by swapping out garbage memory and keeping 2 GB of virtual memory in the swap.
Also, key members of objects can be hard-typed, not only arrays. Essentially hashmaps are the same as arrays in my system, all entries can be typed, and there are automatic safe serialization routines. And serialization/hashing of complex objects (blocks → txs → inputs/outputs → script addresses, etc.) is kind of the heart of blockchain operations :) Having safe serialization/hashing functions for objects is a must, as is building merkle trees out of object arrays, and that's not so easy to get with JS. In this model you can, for example, operate on an anonymous object to fetch any member that is a message list, and then operate on that message list, without knowing anything else about the object at all. Any object can be treated as an array of typed keys, and all of an object's and array's keys can be accessed by type, etc.
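Here is a small self-contained C sketch of that "object as an array of typed keys" view (field names and layout invented for illustration): since every leaf size is known, one generic routine can serialize any object into the flat byte stream used as a hash preimage or merkle leaf.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* An object is just an array of typed keys; hashmaps and arrays share
   this shape, so one serializer covers both. */
typedef enum { K_U32, K_U64, K_BYTES32 } key_type;
typedef struct {
    const char *name;
    key_type    type;
    union { uint32_t u32; uint64_t u64; uint8_t b[32]; } v;
} typed_key;

/* Serialize every key of an object into a flat buffer, e.g. as the
   preimage for a tx/block hash or a merkle leaf. Returns the number of
   bytes written, or 0 if the buffer is too small. */
static size_t serialize(const typed_key *keys, size_t n,
                        uint8_t *buf, size_t cap) {
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        size_t sz = keys[i].type == K_U32 ? 4 :
                    keys[i].type == K_U64 ? 8 : 32;
        if (off + sz > cap) return 0;
        memcpy(buf + off, &keys[i].v, sz); /* assumes little-endian host */
        off += sz;
    }
    return off;
}

int main(void) {
    typed_key tx[] = {
        { "version",  K_U32, { .u32 = 1 } },
        { "locktime", K_U32, { .u32 = 0 } },
    };
    uint8_t buf[64];
    printf("serialized %zu bytes\n", serialize(tx, 2, buf, sizeof buf));
    return 0;
}
```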
The GC in V8 JavaScript can mark-and-sweep incrementally to avoid pauses. Using unboxed
This sounds like a security hole. Static typing is important for security. Designing and implementing a dynamic “framework” (as I hesitate to call it a language) is, I think, more expedient than designing and implementing a statically typed language with sufficient higher-order polymorphism.
Firefox's GC does not seem to be so clever, I get frequent pauses if I do not try really hard to reduce the amount of garbage generated.
Are you sure? It sounds a lot like existential types to me, where we know we have a list of objects that all implement a common interface. The Haskell syntax:

```haskell
data MessageList = MessageList (forall A . Message(A) => [A])
```

Our syntax (not agreed):

```
data MessageList = messageList(forall A . List[A] requires Message[A])
```
I was referring to Chrome’s V8. I presume you can saturate it of course. I presume it tries to do incremental if that is plausible. But yeah I presume we need to minimize the load which is why I wrote:
One can even compile C code employing Emscripten which
It depends on whether that blackbox we are calling has been constrained to not access APIs which we do not want it to. Without static compilation, we do not have a fine-grained sandbox (as I proposed, by limiting which imports we give the compiled code). So 100% dynamic defaults to the overall host-language sandbox, which is far too permissive to do any capabilities security.
GC like this can work well when memory is compartmentalized and there is a good sense of pointer ownership. With JS it's easy, because you can easily know when all the variables from a script are going to be unused, and there is no threading. When objects can be passed around and referenced in different threads without a sense of strong ownership, I'm not sure this kind of GC is that efficient.
Static typing is important for security if memory access depends on that static typing. As all memory accesses here are made knowing the dynamic type, there can be no security hole at all. For me the idea is a bit similar to the principle of Gödel incompleteness: you can never have a single language that is sound and consistent by itself. The consistency of the program comes from the mind of the programmer, not from the compiler. If programmers want to screw up memory and write crap programs that are inefficient and crash, they can always do it in any language.

The thing is, I'm not really sure it's supposed to be considered a high-level language. That it's made in C is irrelevant, because the C compiler is not supposed to really understand the whole high-level logic of the program. It's somewhere in the middle between low level and high level: low-level code that implements high-level constructs. The only function of the low-level code here is to provide abstractions for memory, threading, I/O, object hierarchy, and atomic operations on objects in a parallel multi-threaded system. Until a few weeks ago I didn't really try to build a high-level language to represent those high-level concepts, but the code is already very layered, and the C compiler can't really understand much of what is going on at the top application level; it's mostly message handlers and dynamic object manipulation, and it relies on abstractions of high-level concepts. Even though it's C code, the whole logic of the program doesn't rely only on C's abstraction level.

It doesn't use one bit of the C runtime. For the moment it just uses stdio on Linux, because I'm too lazy to make the code call the kernel-level unistd functions directly, but that's about it. On Windows it uses the kernel API directly (CreateFile, etc.), and I use kernel-level APIs for sockets on both Linux and Windows. And that's the very low-level part; 95% of the program will never see a file handle or anything system-specific, and it doesn't use any libc/runtime function anywhere. Anything beyond libcon and the launcher relies entirely on high-level abstractions, even though they are written in C, because I also want to keep the low-level CPU/memory part fully in check for safe multi-threading and certain things that are better done in assembler (even if it's 1% of the program, and the end user never has to deal directly with it at all).

For me, most of what high-level languages do today is restrict program expression to simple cases that avoid dealing with complex issues, and they are not even that good at it. And in my idea I really want a monad-like concept, with objects and operations that do not depend on context, where operations can be done on any object regardless of who allocated it or who is using it, and where all objects are exactly identical to each other regardless of their high-level definition. The goal is not necessarily to encourage 'bad design'; rather, if people want to do good design they can take the trouble to study all the variables and sharing to optimize threading and parallelism, but if someone wants to be lazy and just scale functions over some shared objects, they don't have to bother about it, even if there can be a performance downside. If they want maximum performance, they can always inline SIMD assembler or whatever and deal with all the memory themselves. It's still C in the end. If they want lazy programming, they can just use the high-level concept of nodes and objects and not bother about memory or threading at all (the only thing that can affect the thread is an atomic operation about one instruction wide).
With my system of binary modules, all the imports can be checked, and a module can only import symbols exported by other modules. There is no way a binary module can get a direct pointer to a system function or to a function exported by some DLL anywhere on the system. And it wouldn't be too hard to add restrictions on the functions a module can import, to get sandboxing at the binary level; a sketch of this follows below.

For the scripts, everything is sandboxed: a script can only access variables declared in its globals or in the local function. There is just a reference to the node object added to all scripts, to access node variables directly and inject them as JS vars into the HTML page, which can save some RPC/AJAX calls to get dynamic data from the node into the page. Every script variable access can be controlled, including calls to the module APIs. Capability-based security can be expressed with the script language abstraction, and having dynamic types doesn't mean you can't specify types at compile time: if you create a node with a compile-time type, it will act like a compile-time type and the reference will always have this type. Constants can be added at compile time, or in variables outside the script scope that can be checked by the underlying functions. Scripts can't be modified at runtime, and a script can only access variables defined in its own file. With the tree system, it's very easy to limit the scope of access of a script function to only the children of a particular object or node.

From the C code, I guess a screw-up can happen, but then it would be like running a production Apache server with binaries found on some alternative website: if there are glitches in the binary, it will screw up anyway. If you want security, you can only run trusted binaries, but that would be the case for any compiled language. With my system, the same binary module can be used on Linux/Windows, so it's easy to just copy binaries from a trusted source and check they are the right modules. Clearly, if you run any binary module you find on the internet, it can probably screw things up, but I think my system would still be globally safer than the DLL system, because at least modules can't directly import any function from outside the module system, and only libcon contains access to the system. So just by preventing a module from importing functions from the system, it should be totally sandboxed from the system, and with position-independent code all memory locations can easily be randomized.

But in the end, most of the time, the point of capability permissions is checking the authority to modify some data, and if that data is to be stored on a blockchain at the end of the day, the integrity of, and authority over, the information can be checked via the blockchain protocol, rather than relying only on capability checking at program-execution level. I described a capability-based permission system quickly on the devos forum some years ago, but that was before I got into blockchain; with blockchain I think this issue of data-access security can be simplified a lot in the broad picture.
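A hedged C sketch of that import-whitelist idea (symbol names hypothetical, not from the actual loader): at link time, every symbol in a module's import table must resolve against the exports of already-trusted modules, otherwise loading is refused.

```c
#include <stdio.h>
#include <string.h>

/* Exports of trusted, already-loaded modules. A candidate module can
   only link against these -- never straight to the system or a DLL. */
static const char *trusted_exports[] = {
    "tree_node_get_int", "tree_node_set_int", "node_acquire", "node_release",
};
static const size_t n_exports =
    sizeof trusted_exports / sizeof trusted_exports[0];

static int import_allowed(const char *symbol) {
    for (size_t i = 0; i < n_exports; i++)
        if (strcmp(trusted_exports[i], symbol) == 0) return 1;
    return 0;
}

/* Refuse to load a module if any import falls outside the whitelist.
   Returns 1 if every import resolved against trusted exports only. */
static int link_module(const char *const *imports, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (!import_allowed(imports[i])) {
            fprintf(stderr, "refused import: %s\n", imports[i]);
            return 0;
        }
    }
    return 1;
}
```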
And for me, static typing for code executed as a binary is very weak regarding security. At CPU level there is only process-level granularity: any binary code loaded into the process memory can access all the memory of the process, no matter whether it's defined as constant, static, or private; the CPU doesn't care. And if you use only static typing compiled as a DLL, it will always be loaded at the same virtual address, and all the static variables will always have exactly the same memory location. It's very easy to inject a dependency into the process to make the system load some binary code into the process space, and then this binary code can access all the variables, classes and types at their static locations. I've done this with many commercial apps; most of them use C or C++, hence static types, and it's very easy to inject a DLL into the process and access all the variables live (see the Diablo hack for an example).

All the high-level definitions are only useful for checking the security of programs made with them. From the moment you compile and run it as binary code, the whole memory-space management is handled by the kernel, all the memory structure for the statically typed static variables is defined at compile time, and it will always be loaded at the same virtual address. The high-level definitions can check nothing about programs that are not made with them. If you want some kind of RPC and expose an interface to client applications, the high-level language can check nothing about the client code, or the format of the parameters it will share with the host program. If you want to mix the compiled application with untrusted binaries, the high-level definitions don't matter at all; all those abstractions are gone once the code is compiled to assembler, and the only granularity you have at binary level is the process.

Saying that dynamic types are a weakness for security is like saying that mutable variables are weak for security. Having dynamic variables in a program doesn't mean all data has to be dynamic, and the same goes for types. If you run only trusted code manipulating an object as a specific type, the object will never have another type. If you want to run untrusted binaries in the same address space, then with static typing it's virtually impossible to prevent that binary code from accessing anything in the process: even without giving it the class definition or a pointer to anything, all the static variables are at static locations, with static types. And it's very easy to know the effect of overflowing this or that buffer, because all the variables are at static locations in memory, so overflowing a buffer at a static location will always affect the same variable at the static adjacent position.

With the system of multi-threaded double-buffered monads, dynamic types, and position-independent code, it's already much harder to figure out the location of any variable at runtime, even without explicit randomization of memory accesses. As all the loading of binary code as position-independent code (relocation/export/import, etc.) is done manually, it would be very easy to randomize the location of every single variable in the whole program at runtime. If you use dynamic arrays or complex hierarchies of dynamically typed objects, it's very unlikely a variable will be loaded at the same location every time, which makes buffer overflows much harder, added to the fact that all memory accesses are programmed using the tree system on dynamic types even at a very low level. So code using this system is already fairly resistant to buffer overflows, and it would be very hard to figure out which variable sits next to which in this kind of context, even without explicit memory randomization. Actually, static types are always the security issue: if everything is dynamic, there is never any overflow, because nothing is assumed about the data at runtime; all operations, even on objects, are strictly memory-bound, whether they allocate the data, read it, or write it. An access to a dynamic variable will never overflow into another variable.
@NodixBlockchain wrote:
I do not want to get off on a long tangential discussion right now, but efficiency comes in many flavors. GC is more efficient at aiding rapid programming (not talking about runtime performance). Generally anything that escapes the generational GC is less efficient than had it not. RAII stack-frame allocation and deallocation is probably more efficient than generational GC. But in general reference counting is not more efficient in every way than GC, and it still suffers domino effects causing high-latency stalls. However, reference-counting deallocation is prompt and more deterministic than GC mark-and-sweep (although I rebutted there, “But please note that reference counting can’t break cyclical references but GC can. And reference counting can cause a cascade/domino effect pause that would not be limited to a maximum pause as hypothetically V8’s incremental mark-and-sweep GC.”). Incremental GC schemes, when not overloaded with allocation that escapes the generational GC, decrease high-latency stalls. For 100% real-time performance (i.e. no stalls), hand-tuning of memory allocation and deallocation is probably needed. Memory leaks can occur with any of those techniques, but GC eliminates some cases of memory leaks. Here are other posts that discussed reference counting:

Afaik, threading has nothing to do with making reference ownership (for the purposes of deallocation, not controlling shared-access restriction) less deterministic. Perhaps you are thinking about, for example, browser integration with Java and Flash applets, which may not be well integrated with a single GC instance.
@NodixBlockchain wrote:
You appear not even to be considering, for example, that security includes restricting access to certain APIs, as I pointed out: @shelby3 wrote:
Assuming our dynamic language prevents access to global variables, we could restrict APIs by passing them as input arguments. But we then have no static checking on what those input arguments of the caller contain. We end up instead with some dynamic soup that can only be checked with unit tests. Unit tests are not security.
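For illustration, a small C sketch of the capability-as-argument pattern under discussion (all names hypothetical). In C the struct's contents are statically typed; the objection above is that a fully dynamic language loses exactly that check on what the caller actually put into the capability argument.

```c
/* Capability-style API restriction (a sketch): the host exposes no
   globals; an untrusted handler receives only the functions we choose
   to pass in, so its authority is exactly the struct's contents. */
typedef struct {
    /* read capability only -- no write/delete capability is handed out */
    int (*read_record)(int id, char *buf, int cap);
} read_only_caps;

typedef int (*handler_fn)(const read_only_caps *caps, int record_id);

/* The handler can only reach what `caps` grants it. In a dynamic
   language nothing statically verifies the contents of `caps`; here the
   compiler checks the struct type at every call site. */
static int run_untrusted(handler_fn h, const read_only_caps *caps, int id) {
    return h(caps, id);
}
```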
Is that like being only a little bit pregnant?
A sandbox can be much higher-level than that.
Sometimes a static type checker does get in the way of expressing complex algorithms. @keean is quite knowledgeable on PL theory and has a lot of experience implementing algorithms in different languages. We have had in-depth discussions about typeclasses, HRT, HKT, and modules, for example. I have learned a lot and contributed my slant/insights as I learn.
I will not speak for @keean's opinion, but I know he and I have mentioned several times our agreement with the general principle of trying not to have multiple paradigms in the same language, i.e. not multiple ways to do the same thing, if possible to avoid. Because readability of open source is a very high priority these days, and the complexity budget is finite.
I don't see why dynamic typing prevents restricting access to any API. You could define the methods or modules a certain script can call in the definition of the node, and then restrict script execution to those. Same for variables. If the API needs to check credentials, there are in-browser signatures or cookie-based sessions, and then access to certain functions is restricted based on checking credentials. A function can check whether the input data contains the data it needs, with the types it needs; it doesn't remove any security compared to static typing.

I think you are quite confused between the concepts of data format (like a network protocol), type, and interfaces/APIs =) The goal of interfaces/APIs is to provide methods to access conceptual properties of an object without knowing its type or internal data format. The goal of a network protocol / data format, as in serialization, is to ensure objects can be transmitted over a network protocol between two programs, even if their representations of the object differ, or they don't even use the same fields of the network data. The goal of objects is to provide an abstraction for data localization in a computer program. Data from the network gets serialized into objects that are exposed via interfaces. The implementer side of the interface can check whether the parameters fit what it expects, but ultimately what matters for the high-level concept of security is not the format of the data but the information it contains; that is why interfaces are useful for abstracting the data format away from the conceptual type manipulated in the program.

With JSON-RPC, all objects are already dynamic from JavaScript anyway, so there is necessarily a step of checking the type of the input data. With anything connected to the internet, you can never really assume much about any type or data coming from the network; everything needs to be checked anyway. That's especially true with a blockchain, where every packet could be anything from spam/DoS to a valid block, with everything in between (orphan blocks and all the mess), but dynamic types can allow more flexibility in what can be safely accepted, without removing any security. PHP uses 100% dynamic typing; I don't see why that makes any security problem.

To break it down at the fundamental level, credential checking is a three-item thing: the admin's definition of what is allowed or not, a script that matches an input credential against that definition, and the input credential itself. When you think about it, that's exactly how bitcore script transaction checking works. So you could easily store data on the blockchain describing what a particular object is supposed to be able to do, or some kind of permission template, then a bitcore-like script to match against an input credential, and the script can return true or false depending on whether the user has the required access with the provided credentials.

The only reason you would want capability-based things, to me, is to be able to run untrusted code, like an operating-system-level mechanism to control access to local resources in an environment where untrusted code can run on the system. In the trusted-code scenario it's very simple: if you don't want an API to have a certain access, you don't program that function into it, period =) If you don't want the server to be able to do a certain kind of action, don't expose the interface to the objects that do it; that's an admin job. It's the same with any server-side software: if you want to keep it safe, only install code on it that exposes the functions you want to allow. If you want to install any kind of untrusted code on the server within the same domain, you're going to end up in trouble.

It's the same with scripts: if the script is trusted code, all the type manipulation can be made static too, and in that case most of the types are defined at compile time; if the JSON-like string used to create the object is a constant in the C program or the script, the object will always have the same type. And static typing doesn't change anything here: in the context of a web server, it's not your program that generates the requests, and they will most likely be created by languages that already have dynamic typing. The only substantial difference between dynamic and static typing is that with static typing, if the input doesn't exactly match the object definition, it fails; with dynamic typing, it will try to instantiate the whole thing no matter what (until it runs out of memory) and then check whether it fits what the inputs need.

The protocol module in my system is what creates the object template from the definition of the network message, and the object is instantiated from the data and passed to the message handler (whether in C or script). The base layer can only check that the size of the data matches the object instance size, and higher layers can check that the object's data matches what they need. It allows unordered named parameters like AS3/JS and JSON-RPC. And the handler only needs to check that the object has the right named properties, convertible to the types it needs for its operation (in this I think it's close to the Haskell principle with monomorphized functions).
Normally they should only be doing things with the node/monad or the framework system, but the C is already there and well documented, so it can be used too. In the future, though, most of the high-level things should be done via the script / high-level language, and then everything is unified around the principles of message lists / event handlers and dynamic objects. Script variable resolution is always limited to a particular root node exclusive to the script, where all the script's global variables live, and to the variables in the function definition (i.e. mostly the input parameters, or the output buffer and HTTP info for a page script). All the variables a script can access need to be children of the script root node or of the function definition node.
Allocation/deallocation is (non-atomic) shared access to the memory pool / heap, hence why you often need to be careful with allocation in interrupts and the like: it can trigger deadlocks. As I want to keep my system as lockless as possible, it means a GC could only free memory allocated in the thread it's running in. In a multi-threaded environment, a GC flush would probably need synchronization primitives across all threads when it flushes all the references and memory, checking in real time which memory is used or not. It's much simpler with a JavaScript-like single-threaded language, because you know the whole application is stopped while the GC is flushing, and all memory unreferenced at that particular point can be freed safely, since no other code can be manipulating those references at the same time.

The tradeoff in my system is that it's based on a lockless internal allocator with a free-area stack, so allocating and freeing memory is very fast, and all references can be shared between threads with lockless atomic reference counting. The only thing that can't currently be done without a lock is a thread freeing memory allocated by another thread. I have the code for doing this; there are comments in the memory allocation code where acquire/release semaphore primitives belong, so in theory I just need to uncomment those, or replace them with synchronization primitives like semaphores, and memory allocation becomes thread-safe (with a semaphore). Other than that, allocating and freeing memory is lightning fast and completely lockless. There is only one call to system memory allocation per thread, at initialization, and that's it; afterwards all allocation is lockless, based on the free stack, and super fast. It could even do live memory defragmentation: as applications only manipulate references, all instances can be relocated transparently to the application. And there is a very useful feature for tracking memory leaks, such as displaying all memory allocated since some point in time; it can track all objects and memory newly allocated, so when an object leaks you can know which object it is and the data it contains, which helps track down most memory leaks in minutes.

But I think there is still something similar in principle to the mark-and-sweep thing. I use it for multi-threaded message passing: as threads can't free each other's memory, a thread can't just push a message to a list and forget about it. When threads are done processing a message they set it as 'done', and the thread that pushes messages into the queue periodically flushes the list of messages with the done flag; if it holds the last reference to the message, it is freed locklessly by the thread that created it. So I guess in principle it's not far from mark-and-sweep. It's also used in block synchronization: block packets can arrive in any order, and sometimes a block can't be processed before later blocks arrive (in the wrong order compared to blockchain order), so there is a system to keep such messages in the processing list, and all messages beyond a certain age are wiped from the list regularly. In the absolute, it's not hard to implement mark-and-sweep-like algorithms with the tree system.
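A rough C11 sketch of that done-flag scheme (simplified, names hypothetical): consumers mark messages done and drop their reference; only the producing thread sweeps the list and frees, so no cross-thread free is ever needed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* A message shared across threads. Only the producing thread ever frees
   it; consumers just mark it done and drop their reference. */
typedef struct msg {
    struct msg  *next;
    atomic_int   refs;  /* lockless atomic reference count */
    atomic_bool  done;  /* set by a consumer when processing finishes */
    void        *payload;
} msg;

/* Consumer side: finish processing, then release the reference. */
static void msg_release(msg *m) {
    atomic_store(&m->done, true);
    atomic_fetch_sub(&m->refs, 1);
}

/* Producer side: periodically sweep the queue, unlinking and freeing
   messages that are done and unreferenced -- mark-and-sweep-like, but
   per list, and completely lockless. */
static msg *flush_done(msg *head) {
    msg **pp = &head;
    while (*pp) {
        msg *m = *pp;
        if (atomic_load(&m->done) && atomic_load(&m->refs) == 0) {
            *pp = m->next;
            free(m->payload);
            free(m);
        } else {
            pp = &m->next;
        }
    }
    return head;
}
```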
It is as if you did not comprehend what I wrote about unit tests. Dynamic typing makes no assurances until runtime. And due to the Halting theorem, we can not prove all runtime scenarios have been accounted for.
Nope.
👎 You do not seem to understand that modularity is all about static types. But I am not going to have this sort of religious discussion. You are free to believe what you want to believe. You may be conflating encapsulation and typing, which are separate concerns.
Secure deserialization of typed objects is not a valid argument against static typing. The static types are enforced by the deserialization, which enables guarantees that are not plausible with unit tests.
Irrelevant to the context we were discussing.
If you are discussing why threading affects how a GC can free memory or not, then it's relevant.
They can make the assurance that the program will not crash and always returns a meaningful result based on dynamic data. All the polymorphic scenarios are, at the end of the day, monomorphized functions known at compile time, so the whole code to access a dynamic type could be inlined in the program, making sure the conditions are met by the input data. If the input data is invalid, it returns an error; there is not much else to do. If the "monomorphized interface" to the input data succeeds at getting the value in the type the code requires, then the input data is valid and the operation is made. If the monomorphized function fails to convert the input data to the type the code requires, the function fails and returns an error. I don't see what kind of assurance you can have at compile time about the data contained in packets coming from the internet.
You don't seem to understand that having dynamic typing doesn't mean you can't have static typing. It's like saying that having non-constant variables in a program makes it insecure because you can't know all the variables' values at compile time. If the type is defined statically at compile time, then it's static and known at compile time. If an object is allocated from a runtime type definition, then yes, the type definition can't be known at compile time, which is the case for all JSON-RPC-based requests. If you want conditions that can be checked at compile time, then don't allocate objects whose type uses a runtime definition. But then you can't parse JSON-RPC requests based on JavaScript dynamic objects, since you don't know the definition of those objects at compile time: they are generated at runtime by JavaScript, or Python, or AS3, which use dynamic types. And in that case, static typing is just a burden from the perspective of the client language, which already formats its messages based on dynamic object definitions.
The only real use case I see for 100% runtime-defined dynamic types, outside JSON-RPC communication with dynamically typed languages, is, for example, designing a network protocol quickly from a script definition. While designing the protocol, you just update the type definition in the script, and the whole serialization/deserialization process, as well as the script code, is based on this type definition, so everything can still be checked for consistency without unit tests or anything. The idea is to be able to edit the type definition from the script without recompiling anything: you just copy this definition to all nodes, and the whole network protocol is updated without recompiling a thing. But the point remains being able to check that the code based on the type definition is sound. Even if the actual object will be created from a runtime variable, the definition of the object can still be known before it runs, and all accesses to object members can be matched against the type definition. For objects manipulated by trusted code, i.e. via generic code using the script definition, it can check at each point that the data fits the expected definition and that the accesses to the object in the script code correspond to the type. From the C point of view it's dynamic typing, but from the script/node high-level point of view it's static typing, as the type definition won't change for the whole lifetime of the application, nor will the serialization/deserialization routines or the code manipulating those objects. Yet the whole definition can be changed by simple text editing, then translated to JSON format for the web, manipulated as a tree of dynamically typed objects from C, and serialized to a binary format according to the network protocol specification if the serialization is more complex than just concatenating all the object members into a binary stream.
What you are describing is a parser. However, we can statically type the output of a parser by requiring it to conform to a known interface. As such, parsers produce existentially quantified types as their output.
Already I think we need to distinguish types that are 'leaves', i.e. nodes containing actual data usable in an algorithm, from nodes that contain no data but a list of named/typed child nodes holding the actual data. It's similar in concept to XML nodes, which can contain either text or other children, except the data can be typed and not only text.

For a leaf node, the type shouldn't matter, as the functions to access leaf node data are already monomorphized based on the type that needs to be used in the code. So the compiler doesn't need to know the type of the leaf node to know the code will be valid: no matter what the node type is, it can output the data as integer, string, float, hash, or whatever the algorithm needs. The type needed in a particular function is always known at compile time, so the particular accessor function for the leaf node can automatically convert the data to the required type. It just adds one possible extra state for a variable access, like JavaScript's 'undefined', to indicate the variable name cannot resolve to an existing object or leaf node; that state must be taken into account by code manipulating objects if the code is to be considered 100% safe using dynamic types. All functions accessing node data/children return 0 or 1; returning 0 means failure, and whatever value they were supposed to return is left uninitialized. This is also why I don't like C++ operators too much: they can't really be checked for failure via a return state, only via the exception mechanism, and they can't easily deal with being called on uninitialized/unallocated instance pointers.

Object instances are mostly collections of leaf nodes, and their types are mostly existential: they exist so that certain functions can be executed, or certain objects filtered in a list, based on their type. They are mostly existential types for messages or objects contained in a list that need to be passed automatically to the right handler, which knows what to do with the whole object. As I made a system to evaluate a node based on its child members' values, like eval(myObject,"height<8"), I can easily have syntax to register a handler on a message list that is triggered for any message whose selection expression evaluates to true. I didn't write the operator to evaluate the type of the object, but it should not be hard to do; I already have the operator to evaluate object length if it's an array. So I can have a map-like syntax to process a list of objects, with the function selected by dynamic expression evaluation (do that with C++ :D)
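To illustrate, a toy C version of that eval(myObject,"height<8") selection (the real system's expression grammar is richer; this sketch parses only `member<number` / `member>number`, and all names are invented):

```c
#include <stdio.h>
#include <string.h>

/* A dynamic object reduced to a flat list of named integer members. */
typedef struct { const char *name; long value; } member;
typedef struct { const member *members; size_t n; } object;

static int get_member(const object *o, const char *name, long *out) {
    for (size_t i = 0; i < o->n; i++)
        if (strcmp(o->members[i].name, name) == 0) {
            *out = o->members[i].value;
            return 1;
        }
    return 0;
}

/* eval(myObject, "height<8") -- returns 1 if the expression holds,
   0 if it is false or the member does not exist. */
static int eval(const object *o, const char *expr) {
    char name[32]; char op; long rhs, lhs;
    if (sscanf(expr, "%31[a-zA-Z]%c%ld", name, &op, &rhs) != 3) return 0;
    if (!get_member(o, name, &lhs)) return 0;
    return op == '<' ? lhs < rhs : op == '>' ? lhs > rhs : 0;
}

int main(void) {
    member m[] = { { "height", 5 } };
    object obj = { m, 1 };
    printf("height<8 -> %d\n", eval(&obj, "height<8")); /* prints 1 */
    return 0;
}
```

A message-list handler registration would then just store such an expression alongside the handler and run eval() on each incoming message.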
But not all values are valid. Take Unicode strings: some values are not allowed. You may also want a non-zero number (to avoid division by zero). In the general case you have to check every value, and the relationships between them, when they enter the system from an untrusted source. My choice is to always treat communication with other processes as a system edge, and parse the data.
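For example, a small C sketch of parsing at such an edge (a simplified UTF-8 well-formedness check plus a non-zero check; a production validator would also reject overlong encodings and surrogates):

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal UTF-8 validity check: rejects bad lead bytes, bad continuation
   bytes, and truncated sequences. */
static int utf8_valid(const uint8_t *s, size_t n) {
    for (size_t i = 0; i < n; ) {
        size_t len = s[i] < 0x80            ? 1 :
                     (s[i] & 0xE0) == 0xC0  ? 2 :
                     (s[i] & 0xF0) == 0xE0  ? 3 :
                     (s[i] & 0xF8) == 0xF0  ? 4 : 0;
        if (len == 0 || i + len > n) return 0;
        for (size_t j = 1; j < len; j++)
            if ((s[i + j] & 0xC0) != 0x80) return 0;
        i += len;
    }
    return 1;
}

/* Non-zero check done once at the edge, so interior code can divide
   freely without re-checking. */
static int parse_divisor(long raw, long *out) {
    if (raw == 0) return 0;
    *out = raw;
    return 1;
}
```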
There is already UTF-8 decoding in the JSON parser. But the thing is, as it was originally made with the idea of running on a bare-metal micro-kernel, the global architecture is still oriented around the concept of 'rings', with different modules conceptually belonging to a ring, and data needing to be checked before it crosses a ring boundary. I had this kind of discussion on devos: there is always a point where code has to assume the input data is in a certain format (input to kernel modules, for example); you can't have data/type checking in every single function call, so functions inside a given ring are already supposed to check the data they send to other functions in a lower or equal ring.

In this idea there are more or less three levels. The libcon is equivalent to kernel level; those functions don't check anything on their own, but they are not supposed to be called from a high-level interface without parameter checks. The application modules are equivalent to system functions; they make the calls to libcon and check the parameters they send to libcon / the system. The top-level modules are the ones exposed to the HTTP interface for RPC/CGI; they check the user input data and then call application modules with the checked input, and the application modules call libcon / the system, checking their input parameters.

There is no direct strict rule to enforce this; it's more conceptual. As far as I know, there isn't really a way to express this concept of rings, and of modules belonging to a certain ring, so as to know where data checking must be done when a module in one ring calls a module in a lower ring (with more permissions). I guess some kind of automatic filtering of function parameters, or of any data a function is supposed to operate on, could be done based on the type definitions of the object and the function, but it seems a bit tedious to have to define this for every function. This is where compile-time checks could be useful: automatically detecting whether certain function code has implicit restrictions on type or range due to the kind of operations it performs, and automatically inserting the checking/filtering at ring boundaries based on this API definition. But in general, if I know the code can receive values that could trigger exceptions or the like, I will do the check statically at function level, unless it's an expensive test that you only want to do once; subsequent calls must then only be made with valid data, and modules beyond a certain ring will only expect already-valid data.
Yes, so you need a parser / runtime type check whenever you cross a ring boundary (which includes sending data to a different computer). All other type checking can be done statically. To facilitate this, inter-ring communication can take place over 'channels' that implement 'protocols'. The type system needed to correctly type a protocol is much more sophisticated than most languages' type systems: effectively you have to specify the types in every message (a data struct), and then specify which messages can be sent in which states of the protocol state machine. There is a separate state machine for each computer taking part in the protocol, and we need to cope with them going out of sync, missing messages, etc.
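A minimal C sketch of typing messages per protocol state (states and messages invented for illustration): the transition table makes explicit which messages are legal in which state, which is the kind of property a protocol type system would check statically.

```c
/* States and messages of a toy handshake protocol. */
typedef enum { ST_HANDSHAKE, ST_READY, ST_ERROR } proto_state;
typedef enum { MSG_VERSION, MSG_VERACK, MSG_BLOCK, MSG_TX } msg_kind;

/* transition[state][message]: next state, or ST_ERROR if the message is
   illegal in that state. A real machine also needs timeouts and
   out-of-sync recovery, as noted above. */
static const proto_state transition[2][4] = {
    /* HANDSHAKE: only version/verack are legal */
    { ST_HANDSHAKE, ST_READY, ST_ERROR, ST_ERROR },
    /* READY: blocks and txs are legal, renegotiation is not */
    { ST_ERROR, ST_ERROR, ST_READY, ST_READY },
};

static proto_state step(proto_state s, msg_kind m) {
    return s == ST_ERROR ? ST_ERROR : transition[s][m];
}
```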
@NodixBlockchain wrote:
@keean wrote:
You may want to relax that requirement to communication with untrusted processes. Static typing will work across communication channels without runtime checks if all processes are trusted. Note that such checks of untrusted processes could possibly become a constant-time proof with SNARKs or STARKs, so that static typing would still be favorable to runtime evaluation (even though the untrusted process has to do excess computation to form a proof, this reduces the (cost of the transaction fee that must be burned to fully eliminate the) DoS attack vulnerability). I am trying to convince you that work on blockchains and cryptography is dovetailing into computer science (even programming language theory) as we move into the decentralization era of computing.
We both know this is dependent typing and can not cope with unbounded indeterminism, i.e. can not work with unbounded, permissionless participation of processes. The constant-time cryptographic proofs (as static types that are enforced at runtime) seem like a potentially more viable option than dependent typing.
@NodixBlockchain wrote:
Actually that is not quite an entirely accurate statement. Dynamic (i.e. uni-) typing in a What Color is Your Function? infection reduces static typing to the least common denominator of the single type
Your response is inapplicable because of my use of the word “modularity”. I should emphasize I mean provable modularity, since that is the only form of modularity that scales.* My point is that runtime checks which are not enforcing static types are thus unprovable at any time (unit tests are not provably thorough). Static typing (even if it must be enforced between untrusted processes) is provable at compile time.

* Note static typing does not scale infinitely, as the degrees of freedom in typings are finite. But unprovable modularity does not scale at all, because the unprovable interactions of modules can readily devolve into a clusterfuck of hidden bugs.
@shelby3 I agree about the crypto crossover: you can have a proof of security (that the program is type safe) as a certificate based on a hash of the code. This way the proof certificate is valid if the code has not been tampered with. Of course hash collisions are really bad in this kind of use, because they let an attacker change the code and keep the same hash, so it needs to be a good hash, plus of course some way to know the hash itself has not been modified. Traditionally this involves trusting Apple or Microsoft to be the gatekeeper, but with a block-chain we can let authors publish their own hashes; however, this still requires you to trust the author. In the end the only way to be sure is to type-check the code before running it, which is what Java's validator does.
@keean wrote:
Cryptographic hashes are vetted by researchers for collision attacks. Given the absence of a viable cryptanalysis attack to lower the tractability from a brute force search, they are thus presumed to be mathematically intractable.
Such a vulnerability does not apply to SNARKs and STARKs.
Incorrect, or perhaps you just misunderstand what I proposed or are unfamiliar with SNARKs. The proof proves that the code (i.e. the algebraic circuit) was executed verbatim. As I wrote previously, the code for the process can thus be type checked at compile-time. My point was that whatever runtime check a process would need to perform to insure that incoming data from an untrusted process conforms to the static types of the API of the first said process, the untrusted caller can do those checks instead and pass a SNARK or STARK cryptographic proof that it faithfully executed the code that performs those checks. There is no trust involved at all. It is not more efficient overall (i.e. more overall computation is done), if we ignore the huge DoS attack hole opened otherwise.
Collisions have been found in MD5 and SHA1, so a hash we think is safe today can be unsafe tomorrow.
But how do you prove the proof applies to the program as submitted? Let's take a really simple example: I have a program that adds two integers to produce an integer result. Now I construct a proof for a program that adds two floats together and produces a float result, and package this with the compiled output of the integer program. In other words, there is no proof that the untrusted caller actually ran the checks on the binary it has sent you.
The vetting process has been much more thorough on the SHA2 and SHA3 process. Because vulnerabilities were found in prior work. The stakes are much higher by now and thus more resources have been allocated by society. Afaik, SHA1 collision attacks are coming more than 20 years after the hash was designed and well beyond its projected security half-life. So I do not think it is accurate to argue that SHA1 was cracked during its intended lifespan. Indeed all crypto can break at some time and then the Internet and civilization will grind to a halt. As well, your compiler can have a Trusting Trust attack in it and deceive you as well. (And without cryptographic hashes we can not really defend well against Trusting Trust attacks) Shall we come back to our senses and talk about living in the world as it is? If we can not trust cryptographic hash functions, our modern Internet-age civilization collapses.
Study SNARKs. It is really amazing, but it can be done. I will be adding an introductory summary explanation of the technology, with links to resources for learning more about this, to the blog I am composing.
Some attempt to make a tentative judgment on the merits of @NodixBlockchain's dynamically typed design for a scripting language.
@shelby3 I am aware of SNARKs (at least the concept, before it was named a SNARK). There are problems, and a lot depends on the details of the implementation. The 'extractable' property means the verifier does have to inspect the incoming code, and the restriction on this is that it must be P (polynomial time). The problem is that type checking and verification take place on the source code, but what we receive as a module is the binary. We cannot observe and extract the data needed from the source if we do not have it. There is also no way to ensure the binary code really is the compiler output for any given source code. So to get security we need to send the source code and compile locally, to do which we need to know the types, and hence type-check it. The SNARK gains us nothing. The only way a SNARK would be useful is if our type system and proofs applied to the machine code. So we would need a type system advanced enough to type imperative machine code, and the functional high-level language in the same type system; then proofs can be carried through every compilation step and transformed along with the code.
After reading the Emerald language papers many times, I understand better what I'm into :) The global idea is basically that types can be values, assigned to variables. And just like with variables, it's possible to write algorithms and expressions using some logic or arithmetic over those types, and then typically to invoke the corresponding constructor and store a reference to the object in a generic object variable. It's very similar to Emerald, where all variables are generic objects with a manifest abstract type, but the type is only defined at runtime by a constructor which has no special relation to the abstract type or the object variable.

And just like with ordinary value expressions, sometimes the compiler can determine the value at compile time; sometimes it needs to explore code paths, or do complex analysis, to detect which constant value can be assigned to a variable at runtime, and it can then deduce the value, or generate the correct code, based on static analysis of the type value at compile time. And just like with regular value expressions, sometimes the value depends on runtime state and cannot be determined at compile time, in which case the same algorithm is compiled for runtime execution instead of the value being hard-coded at compile time.

With the concept of abstract types, the compiler can make type-consistency checks even without knowing the definition of the type. It can only check the object accesses in the functions implementing an operation on it, which are supposed to have access to the definition of the type, or to the constructor used to create the object, for the compiler to check the code at compile time. In a distributed system, sometimes the definition and instance of the object exist only on a remote node and only the abstract type is known to the compiler; the abstract type at least allows checking conformity between the abstract type of the interface method definition and the type of the object. A node implementing an operation on an abstract type also needs to host the definition of the constructor (or the definition of the 'concrete' type), for the compiler to check the code for type conformity.
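A rough C sketch of the types-as-values idea (all names hypothetical): the type descriptor is an ordinary value that runtime logic can select before invoking its constructor, while the variable holding the object stays generic.

```c
#include <stdio.h>
#include <stdlib.h>

/* A type is itself a first-class descriptor value. */
typedef struct {
    const char *name;
    size_t      size;
} type_desc;

/* A generic object variable: the type is bound at construction,
   not at compile time. */
typedef struct {
    const type_desc *type;
    void            *data;
} object;

/* "Constructor": allocate an instance of whatever type value we hold. */
static object make(const type_desc *t) {
    object o = { t, calloc(1, t->size) };
    return o;
}

static const type_desc UTXO  = { "utxo",  40 };
static const type_desc BLOCK = { "block", 80 };

int main(void) {
    /* The type is the result of an expression, like any other value. */
    const type_desc *t = rand() % 2 ? &UTXO : &BLOCK;
    object o = make(t);
    printf("constructed a %s (%zu bytes)\n", o.type->name, o.type->size);
    free(o.data);
    return 0;
}
```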
@NodixBlockchain wrote:
I have mentioned bounded polymorphic types to you more than once before. The bound is still statically checked, and then yes, the RTTI tag dispatches at runtime to the correct dictionary (if there is an interface bound, e.g. typeclass or subclassing) or case-logic code path.
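As a rough illustration of how an interface bound dispatches through a dictionary at runtime, here is a minimal C sketch (all names hypothetical): the statically known part is the shape of the dictionary (the bound), while the concrete type behind each reference is only known when it is constructed.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical sketch: a runtime-chosen dictionary of operations,
 * analogous to a typeclass dictionary. The bound is the fixed set of
 * operations in `show_dict`; only values paired with such a dictionary
 * may be stored in `any_t`. */

typedef struct {
    void (*show)(const void *self);   /* the single operation in the bound */
} show_dict;

typedef struct {
    const show_dict *dict;  /* dictionary chosen at construction time */
    const void      *data;  /* concrete value, opaque to generic code */
} any_t;

static void show_int(const void *self) { printf("%d\n", *(const int *)self); }
static void show_str(const void *self) { printf("%s\n", (const char *)self); }

static const show_dict int_dict = { show_int };
static const show_dict str_dict = { show_str };

/* Generic code: statically knows only the bound (the dictionary shape). */
static void show_all(const any_t *xs, size_t n) {
    for (size_t i = 0; i < n; i++)
        xs[i].dict->show(xs[i].data);  /* runtime dispatch via dictionary */
}

int main(void) {
    int   x = 42;
    any_t xs[] = { { &int_dict, &x }, { &str_dict, "hello" } };
    show_all(xs, 2);
    return 0;
}
```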
@keean wrote:
What does receiving a module in binary have to do with the use case we were discussing? I do not see how this applies. I thought we were discussing that we want to allow an untrusted caller to call an API without forcing the callee to do runtime parsing of the inputs (as this can be a DDoS attack hole). Instead we have a portable (not “binary”) arithmetic circuit for the runtime parsing, which the caller computes a proof of and sends to the callee. In this way, the callee can check that the inputs are valid with a constant time/cost verification. So then the callee knows what constant cost to charge the caller, such as a proof-of-work or ecash micropayment, in order to disincentivize a DDoS attack (as more robust than IP-based throttling). We might even go further and have the caller run the entire code for the API and provide a proof to the callee that it was run and the outputs obtained.
The callee has already pre-compiled to portable arithmetic circuits and thus knows which proofs to expect.
Incorrect. SNARKs can recursively aggregate proofs of other SNARK proofs of other arithmetic circuits.
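To make the intended flow concrete, here is a hypothetical sketch of the callee side: `snark_verify`, the key and proof types, and the request handler are all placeholders (not a real SNARK library API); the point is only that verification is a constant-cost gate before any parsing or execution happens.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { unsigned char bytes[32];  } verifying_key; /* placeholder */
typedef struct { unsigned char bytes[192]; } snark_proof;   /* placeholder */

/* Stub standing in for a real constant-time SNARK verifier. */
static bool snark_verify(const verifying_key *vk, const snark_proof *proof,
                         const unsigned char *public_inputs, size_t len) {
    (void)vk; (void)proof; (void)public_inputs; (void)len;
    return false; /* a real implementation checks the proof here */
}

static int handle_request(const verifying_key *vk, const snark_proof *proof,
                          const unsigned char *inputs, size_t len) {
    if (!snark_verify(vk, proof, inputs, len))
        return -1;  /* reject cheaply: no runtime parsing was performed */
    /* Inputs are now known well-formed; charge the fixed verification cost
     * (proof-of-work or micropayment) and dispatch to the API handler. */
    return 0;
}
```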
Could this risk be reduced by hashing the data plus the module size, or hashing different parts of the file together?
What I see with this is that it's impossible anyway to prove a program has no bugs at compile time, otherwise there would be no bugs at all. The notion of a bug or error can only come from the programmer's mind, from his understanding of the program and his own goal. The amount of help the compiler gives for this is still very limited. Especially since there will always be portions of application code escaping the compiler: the runtime, system libraries, the kernel, inter-process call routines, etc. Lots of things cannot be "enforced" by the compiler because they are not compiled by it. You have to live in a world where not all code is compiled by the same compiler with the source code available.
One way I see to help with this is to have all distributed modules associated with an application distributor. The application distributor can insert a signature of the application entry in the blockchain, and modules are added as entries signed with the same key. Nodes hosting these modules also need to be manually validated by the application distributor. This way a client can check that the application entry corresponds to a trusted application distributor, that the module belongs to this application, and that the node is certified for this application by the distributor. But it needs a trusted point with the application distributor.
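A minimal sketch of that trust chain, assuming libsodium's detached Ed25519 signatures and hypothetical entry layouts: the client accepts a module only when both the application entry and the module entry verify under the same distributor key found on the blockchain (call `sodium_init()` before using libsodium).

```c
#include <sodium.h>
#include <stdbool.h>

typedef struct {
    const unsigned char *bytes;                  /* serialized entry        */
    unsigned long long   len;
    unsigned char        sig[crypto_sign_BYTES]; /* distributor's signature */
} signed_entry;

/* Returns true when both the application entry and the module entry are
 * signed by the same distributor key found on the blockchain. */
static bool module_trusted(
        const unsigned char distributor_pk[crypto_sign_PUBLICKEYBYTES],
        const signed_entry *app, const signed_entry *module)
{
    if (crypto_sign_verify_detached(app->sig, app->bytes, app->len,
                                    distributor_pk) != 0)
        return false;  /* application entry not from this distributor */
    if (crypto_sign_verify_detached(module->sig, module->bytes, module->len,
                                    distributor_pk) != 0)
        return false;  /* module not published under the same key */
    return true;
}
```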
The type "polymorphic boundaries" cannot always be known at compile time. And maybe it doesnt have to even be checked at run-time if the variable is just passed between (not yet existing) modules by the code being compiled. It's for this case that abstract type can make sense. I think smalltalk allow this with a more loose concept of type & methods. Methods invocation can trigger runtime errors if the message type doesnt correspond. But with the emerald language, there is more opportunity for the compiler to catch this sort of problem in case of local invocation. Or if all the object definition and interface implementation is known at compile time. But still allow condition where the conqurete type of the object can only be known at run-time. Or where object with only known abstract type can be passed as argument to an interface method where only the abstract type is (and have to) be known at compile time. Only the implementation code, potentially in another module, need to know the conqurete type. It's like when making query to a remote db, you can only assume the table are formatted as defined in the documentation and that it's effectively the good database running on this service etc. There is no way to enforce the type of the result at compile time. It would require to compile statically the database engine with static definition of the tables along side with each application. And yet im not sure current high level compiler would check much, without a whole layer to deal with the database itself, and do the détection at run-time to enforce the type or fail. |
Then it is not a type, in the sense that no operations can be done on the type at compile time. Unbounded polymorphism is useful in a module that is consumed at compile time, giving it some bound at compile time. RTTI is not typing. All dynamic typing is uni-typing, which is to say not typing. It makes no sense to associate typing with something that cannot prove any invariants. As @keean and I have explained to you, unit tests are not provably exhaustive, because of the Halting problem.
A bounded polymorphic type is an abstract data type. Note my understanding (definition) is that the bound could be a union of types (or constructor tags) or alternatively a typeclass bound. The former provides a compile-time bound for runtime case-logic (which selects operations) and the latter provides operations. Hypothetically both could even be monomorphised by the caller at compile time for rank-0 (i.e. if not callbacks) when the caller selects the concrete type for the bounded polymorphic type(s).
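For the first kind of bound, a closed union of constructor tags in C amounts to a tagged union plus case-logic; a minimal sketch (names are illustrative) follows, to contrast with the dictionary-passing sketch above, which models the typeclass bound.

```c
#include <stdio.h>

typedef enum { TAG_INT, TAG_FLOAT } tag_t;   /* the compile-time bound */

typedef struct {
    tag_t tag;
    union { int i; float f; } as;
} num_t;

/* Case-logic selected by the runtime tag; adding a new tag to the enum
 * is a compile-visible change, unlike open-ended RTTI. */
static void print_num(num_t n) {
    switch (n.tag) {
    case TAG_INT:   printf("%d\n", n.as.i); break;
    case TAG_FLOAT: printf("%f\n", n.as.f); break;
    }
}

int main(void) {
    print_num((num_t){ TAG_INT,   { .i = 3 } });
    print_num((num_t){ TAG_FLOAT, { .f = 2.5f } });
    return 0;
}
```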
Typing is about that which can be proven. That you cannot prove everything is not a refutation of the value of typing. If you prefer uni-typing, RTTI, and unit tests, then go ahead; nobody is stopping you. The redundant argumentation is getting too tedious now. I am going to put you on ignore.
So throw away C and types such as |
Unless you want to design a programming language, you do not need to learn it now, although knowing what we are talking about will make you a better programmer. However, it would be best if you waited until we reach some conclusions and then start to write some guides on our new language, explaining it to average programmers in a way they can readily wrap their minds around and put into use. By then, much of what we were talking about will start to make more sense.

The discussion with @IadixDev (aka @NodixBlockchain) has been about the merits of compile-time (static) typing. @IadixDev tries to claim there is little to no benefit in having a compiler (presumably because he is not that knowledgeable about compilers and is proud of his dynamic framework, which does not employ a compiler). In the Concurrency thread he also tried to downplay the importance of eliminating opportunities for creating contention amongst multiple threads, but that is an orthogonal topic.

@keean and I think there is some value to compilers and are trying to nail down which compiler abstractions are best to focus on. @keean seems to want a kitchen sink of all possible abstractions. I am more cautious, but we are trying to reach some clarity about which things the compiler should prove and abstract and which it should not. I am trying today to wrap my mind around it more holistically and see if I can make any deep insights. I should clarify that @keean does want to find unification of abstractions (to avoid a kitchen sink and to avoid having more than one way to code the same algorithm), but I have argued that in some cases the proposed unification (e.g. merging modules, products, and co-products) creates a kitchen sink of possibilities that is not desirable (in fact today I want to review our modules vs. typeclasses discussion).

When I use the word ‘abstraction’, that probably does not register in your mind. What we mean is any concept that the compiler creates above the level of the raw machine. So which concepts are important? We are trying to decide.
American slang jargon meaning throwing everything in because you can, not necessarily because it is wise to do so.
Anyway I don't think I will be posting much here. I mostly get called irrelevant while getting little useful information toward where I want to get, and I already solved all the problems you seem to struggle with, and you make more strawmen than anything relevant to my design, code, or objective. I get more useful info searching Google for theses from experts who have the exact same approach as me than trying to argue with someone who obviously doesn't understand many of the problematics involved in what I want to do, and only gives solutions that are worse on all fronts for what I want to do, and that's confirmed by 100% of the expert literature I could find on the topic, and who is not even really into discussing but hammering his mantra like a robot.
Really, your problems so far seem to be application level. If you can implement your solution in 'C' you should be able to implement it in this language. If we can make it easier to implement than in 'C', allowing you better structure/abstraction, more readable/maintainable code, and stronger guarantees about safety, lack of bugs, and lack of exploits, we will have succeeded in what we are trying to do.
I don't think you really understand what we are suggesting as a solution. When you see the Haskell code I wrote, you say it looks like your solution, yet that is entirely statically typed, and in fact what we are proposing here has more dynamic typing features than Haskell. I agree with a lot of the final conclusions of the thesis you posted, although the authors didn't have a clear understanding in the beginning and tried some obviously wrong things (like implementing polymorphism without type variables). I can only conclude that you either don't understand what they wrote or are not expressing yourself very well, because I find myself disagreeing with various points you make, yet I agree with the authors of the paper. If we both agree that the solution offered in the paper looks reasonable, then we could instead discuss how it would integrate with this language's other features, which, although they do not directly support what you want, provide facilities that we want. I want to remind you that this issues section is for discussing this language, not your project. You should set up your own github repo and create an issues section for discussions about your solutions, and any potential language design you want to implement coming out of that. If you want to help us make sure this language can do the sort of things you want, then I would be interested in continuing the conversation.
@keean wrote:
I suggested @NodixBlockchain discuss his ideas with you, so I am responsible for him coming here. I am happy he came here, and I am not dissatisfied with the outcome. I just wanted to see a shift in his understanding and communication by now; I had hoped a better synergy could be developed by now. The discussion was quite useful and appreciated up to a point, but by now it seems you (@NodixBlockchain) are not understanding us. Btw, I told @keean in private when you started posting that I had difficulty with your communication but that you were obviously very productive. I do not think either of us is claiming that you are not prolific in coding.
Agreed. If @NodixBlockchain can comprehend static typing and start to explain specifically (not dubious/incorrect generalized objections repeated over and over as a form of handwaving) how static typing is totally useless and harmful for what he needs (not arguing that not everything can be typed, which is not an argument against static typing), then that becomes a useful discussion. From there, he will learn better how static typing fits his use cases. Discussing specific use cases and specific code is useful. Who knows, we might even learn something also; it is much more likely that @NodixBlockchain learns than you do (because @keean is already quite expert and experienced), yet it is possible.

The ability to communicate specifics concisely with cogent code samples is one of the indicators of a high IQ and encourages other smart people to want to be involved in the discussion. I realize English is not @NodixBlockchain’s native language, and I have forgotten all my French (and Latin) from high school (I was nearly marginally fluent at that time, both verbally and in writing). I think all smart people tend to really appreciate those who can teach them something new, especially when the overhead of getting to that learning experience is not too noisy. We also appreciate doing the teaching, when there is progress. When learning stops and everything becomes redundant mistakes, then teachers can start to ponder the opportunity costs.

I guesstimate the cognitive dissonance here is partially because @NodixBlockchain has a steep learning curve on static typing, so for the moment he is in Dunning-Kruger mode (at least on the holistic aspect of deep understanding of type theory and its applications). He is obviously knowledgeable about some low-level (closer to the hardware) details of computer science. I suspect he is just beginning to get exposure to the sort of topics discussed on Lambda-the-Ultimate; I remember when I first landed on that site in 2009, I got chewed out for not understanding that subclassing (OOP) is an anti-pattern. I had so much to learn, and I still do not really have a good formal education in programming language type theory.

A valid argument against static typing, which I have made on these issues threads, is that no static abstraction can be 100% leak-free. Meaning that eventually we always end up having to violate the invariants of the abstraction, or moreover it becomes very convoluted to maintain type safety and have complete degrees-of-freedom of expression. So then static typing becomes a negative factor for productivity instead of a positive one. This is why selecting which abstractions to enforce with a compiler is an engineering design challenge (arguably even perhaps an art) of matching to use cases. @NodixBlockchain wrote:
You have not solved the goals that justify employing static typing (“better structure/abstraction, have more readable/maintainable code, and giving stronger guarantees about safety, lack of bugs, and lack of exploits”). You have not solved the reduction of contention in concurrency (i.e. increasing parallelism). I cannot claim your framework is useless though. I am guessing the other reason for the discord is @NodixBlockchain’s incorrect overconfidence that he is correct. Again this ostensibly stems from a lack of knowledge about static typing theory, combined with an attitude that oneself must be correct and the other guys must be wrong.
I suggest @NodixBlockchain go post on LtU and then he will see how they roast his butt with snide remarks (as they did to me). We are at least trying to teach and be nice. But there has to be progress. We do not desire to admonish you, @NodixBlockchain. We simply want progress. Of course @keean and I want more collaborators (I can infer his desire from his prior post). But we need synergy also. Otherwise the Mythical Man Month communication overload and discord take over.
You have yet to show us such an expert. Your incorrect overconfidence is your weakness. You would be wise to learn more and lower the estimation of your skill level. I wish you would be more amicable about acknowledging the learning value you received from our effort and discussion. A win-win result would be more inspiring. But if you continue to insist that you have solved everything and that our work is silly, then I suppose this will end less inspiringly.
@keean wrote:
What @keean means here is that your use case is not a refutation of static typing. Like all application-level programming, your use case is amenable to some benefits from static typing; it is not a valid refutation of static typing in general. You might be able to formulate valid arguments for and against certain static typing abstractions for your use case (though this requires a good base of knowledge about static typing paradigms) and in that way further the design considerations for our language design process.
I have not had time to read that paper he linked. Is any tl;dr possible? What relevance do you see to our past discussions? |
Unlike most programmers, I understand CPUs & electronics better than high-level languages ;) I never have problems understanding 5000 lines of assembler; even debugging bare metal with machine code doesn't disturb me too much lol

I will write on the site more than the git. But I'm more in brainstorming mode rather than yet having definitive choices. So for the moment I'm collecting information and trying to find systems that are close enough. But I can't find a lot of things on distributed objects, especially taking blockchain into account, which solves lots of problematics that seemed insolvable before.

My plan originally was something like micro kernel + runtime & maybe drivers and some other low-level things in C, and then a high-level language or script above. With blockchain it changed the perspective a bit, and it adds a lot to the problematics of distributed systems, but I can't find good analysis of this. The closest I could find is the paper on a boosted transactional database + Emerald, and some stuff also posted by shelby and some others on BCT relative to game theory, blockchain economics & general logic, which leads to the blockchain security level with PoW depending highly on the coin's value on markets. I could find some papers suggesting boosting Emerald with transactional objects, but it doesn't seem it has been done or pursued seriously. Even the guy who invented C++ says something along these lines in "Masterminds of Programming", citing Emerald for solving the problematics of distributed objects, and says himself C++ is not necessarily made with distributed systems in mind. That's one example of an expert saying this same thing, among dozens of other experts and papers that are tl;dr.

Rigid static typing based on compiler/language native types is the enemy of distributed systems. Emerald is the mother of all languages/libs/runtimes dealing with distributed systems. But I'm on a good track; I have all the good base elements with modules/interfaces & script + RPC to get where I want :) I'm filling my brain with everything I can read on RMI/RPC & high-level OO languages.

My idea with distributed systems & parallel computing is also twofold: there is the side accessing remote resources localized in one place with limited access, which was more the objective originally, but blockchain offers many more possibilities than this, and the parallel processing is not necessarily the main point. RPC/RMI has the same problematics as asynchronous parallel computing from the language perspective, but with the added difficulty that the call has to go through a network layer and a transparent cross-language interface, hence requiring dynamic transtyping, as in DCOM or RPC.

I can't find a single example of something akin to DCOM/RPC based on 100% static typing, and the most used ones are probably using PHP/WSDL & JS, i.e. 100% dynamic typing. The distributed systems based on languages with (pseudo) static types like Java or C++ are only mildly used. One could easily conclude that all attempts to make distributed systems based on 100% statically typed languages like C++ and Java have not been very successful, even with the added runtimes & libs to extend the language, which add lots of complexity to the code and make it barely usable. Distributed systems based on dynamic typing have been much more successful.

And it's not so much about "my project" as about finding the best solution to this problematic. I'm reading everything I can find on high-level language design and distributed systems, until I get the eureka moment where everything falls into place.
@IadixDev wrote:
Some evidence of that. P.S. @keean has a degree in Electrical Engineering and I started out in that major as well. I also did a lot of tinkering with electronics (and other forms of non-software hacking) in my youth through my 20s, but gradually moved more exclusively towards software. |
I have zero degrees :) I spent two weeks at university, most depressing time ever! The teacher explains a+b; me, I'm like, when do we get into OpenGL, kernel hacking? And all the students in the class, pfiuuu, it's too hard. I finished the week's homework before the end of the course while the teacher explained a+b lol. No, no way I'd stay there lol. I infiltrated the 3rd-year course: everyone is sleeping, nobody cares, I'm the only one following lol. I said bye to university lol

Then I moved to Paris and made a startup with a video-streaming plugin, got 150k of funding from the USA :) Met some very smart guys from MIT or Wall Street, and learned COM, ATL, hacked the Windows kernel to the bone lol. It's why I know a lot about what's under the hood, even better than most engineers :) Most people don't even look into DLLs, application runtimes, COM/ATL, and lots of things I debugged at the assembly level. It's why I know this stuff :)

But I'm staying aloof from social networking; each time it turns out to be a waste of time, arguing with formatted drones lol. Once I was trying to understand OpenGL drivers, disassembled some DLLs, and went to the DRI channel to ask questions about certain operations; the DRI dev on Linux was like, how can you know this about Windows drivers when even I don't know it lol. I was like, I read the DLL source code, man :) All the times I tried social networking, it's often a waste of time and people rehashing information I can already find easily on Google, thinking it's the alpha and omega of computer science lol

Even the struggle to get across this idea of cross-platform binaries: I have a 3-page thread on devos where everyone argues in chorus that it's not possible, even with all the source code and a working example in front of their eyes lol. That quite looks like brainwashed drones lol. It's why now I see quickly whether people are really bringing me new info, or rehashing stuff I already hacked to the bone inside and out 15 years ago lol

I'm rather used to figuring stuff out on my own, and not seeking validation from academicians, because they live in an abstract bureaucratic world of certitude based only on conventions between themselves. I already know where I want to get, and I already read all the source code of the Linux kernel, X11, gfx drivers, everything lol. If you want to post the Linus Torvalds thing, OK, but then be ready for an in-depth discussion after, because I know this stuff like my pocket; I made a bootable OpenGL Linux in 4 MB 10 years ago. If you're just going to post this and then say tl;dr after, it represents zero interest for me.

And I'm not in this for the fame, or to make friends, or to get popular among recognized academic figures :) I'm more like Cloud in FFVII: cut the crap about your crew and your beautiful ideals and your model, let's just blow the mako reactor down, and then the next one, until Shinra is done :) And I don't care how impossible you think it is :) If you can't understand my motivation, I don't either :) It's much more intricate and personal / PTSD than this :) But I'm not stopping until I get the stuff done :)
@NodixBlockchain I used to think the same way as you, and I took electronics engineering because I thought I already knew everything about programming and wanted to do something related but different. Years later, Haskell got me into type theory, and the idea that code is both a program and implied logic statements that can be statically proved. My conclusion is that people who think they know everything in fact know nothing, and that working code is much more important than talk. There is no point discussing things here that don't directly lead to code changes in this project. The most constructive thing you could do is try to understand how the language and type system we are proposing work, and how they could be used to solve the problems you are concerned with. We can then discuss how we could modify the language or type system to handle your use case better, or whether that is something outside the scope of this language.
@NodixBlockchain wrote:
I differentiate sincere, well-focused collaboration, such as is ongoing here between myself and @keean, from the trolls in more populated forums. And for the latter, I have proposed a decentralized solution.

I also coded the first two versions of my WordUp software (several 720K floppy diskettes of code) in 68000 assembly in the late 1980s. But then I switched to C for version 3 and my productivity and clarity of expression soared by orders-of-magnitude. I hand-optimized many of the low-level Painter natural-media simulation algorithms in 1993. The UK version of WordUp is pictured below:

In my 20s, I had the eyesight and patience to deal with all that tedious noise of assembly code, but it certainly was not maximizing my productivity as a creator (it might maximize cracking and reverse engineering, which is a youthful ego/curiosity phenomenon though). One wise old fart with a backhoe can dig canals faster than 100 naive (boastful/overconfident) young guys with spoons. Here the backhoe is the HLL and the spoon is assembly code. Your energy level and LLL knowledge could potentially be valuable in our work if you can learn to apply them efficiently (i.e. more cogent and less rambling verbiage) within more efficient HLL paradigms. I tend to think that most people do not reinvent themselves much, although I do challenge myself to do so and have done so to some extent throughout my life (some core personality attributes are impossible to change because they are what make me who I am, e.g. my tenacity and willingness for confrontation/extroversion, although I can teach myself that I am more efficient when I am more selective about when I engage versus introverted focus on tasks).

@keean wrote:
You are much more erudite in this area than me. |
@shelby3 wrote:
https://steemit.com/bitcoin/@qed/bitcoin-startup-interview-questions-from-steemians Maybe in the future I will have more time to look into what is being discussed at the blog. Edit: also sourced from @keean: http://lambda-the-ultimate.org/node/5003#comment-94645
Hello :)
I met shelby on the forum bitcointalk. I'm developing a framework and script engine to build distributed applications based on blockchain and HTML5/JS. He talked about this github and zenscript, and I've been reading it for a while; there are lots of interesting discussions related to where I want to get :)
To explain shortly, my idea stems from developing web browser plugins, and learning ATL/COM/ActiveX, XPCOM, and the DOM: ATL/XPCOM use IDL files to define an interface that can be compiled to different languages, and objects in the DOM tree are defined as components implementing the interface defined in the IDL and bound to html entities as objects in javascript, so that you can 'embed' an ATL component in the html document and call the functions of the C++ interface from javascript via the browser engine.
What I found extremely frustrating with those systems is that despite the whole high-level definition of interfaces, there is not really much cross-platform binary compatibility, due to a combination of proprietary things for COM, and an ideological obsession with open source and not caring about binary compatibility with the C runtime etc. on unix for XPCOM; there is not really any cross-platform binary-compatible version of such components.
And there is always the problem with vanilla C/C++ that pointers are very crude: you can't get any meta-information from them, whether they come from the heap or the stack, what size is allocated after them, or what kind of data they point to (if you don't know the type from the compile-time definition), which can be very bothersome on many levels.
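As a rough illustration of the missing meta-information, here is a minimal C sketch (hypothetical layout, alignment handling omitted for brevity) of a header placed before every allocation, recording size, type tag, and reference count, so generic code can interrogate any reference it is handed.

```c
#include <stdlib.h>
#include <stdint.h>

typedef struct {
    uint32_t type;      /* runtime type tag of the payload  */
    uint32_t size;      /* bytes allocated for the payload  */
    uint32_t refcount;  /* shared-ownership reference count */
} obj_header;

/* Allocate a payload preceded by its header; return the payload pointer.
 * (A real allocator would also pad the header for payload alignment.) */
static void *obj_alloc(uint32_t type, uint32_t size) {
    obj_header *h = malloc(sizeof *h + size);
    if (!h) return NULL;
    h->type = type;
    h->size = size;
    h->refcount = 1;
    return h + 1;
}

/* Recover the header from any payload pointer produced by obj_alloc. */
static obj_header *obj_info(void *payload) {
    return (obj_header *)payload - 1;
}
```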
Charm ( https://en.wikipedia.org/wiki/Charm_(programming_language) ) is a good example of what I want to get at, with this concept of objects encapsulated as modules and importable from each other, with a simple syntax almost close to BASIC, but still easily compiled to assembler.
Node.js has similar objectives, but to me it uses too many resources, is not good enough with binary data and/or linear algebra, SIMD, and threading, and the garbage collector is also annoying to me.
The other model is languages such as AS3, or Flex Builder / the Android SDK based on Eclipse, with XML UI/application/service definitions; I find this kind of application definition, based on XML entities mapped to objects dynamically at runtime, very useful and clean.
My original idea was to develop a framework or script engine that is 100% event-oriented, if possible stackless, based on an asynchronous message queue, so that each function can be called regardless of context whenever a certain request or event needs to be processed, avoiding a linear execution flow based on a main loop & stack, and instead posting requests and dispatching the processing and result/error handling asynchronously, like AS3 with green-threaded event listeners that force asynchronicity on everything.
Even the Android SDK tends to work like this, and javascript too, forcing a certain number of things to run as background tasks; in AS3 it's mostly forced because all functions are asynchronous, which keeps the main UI processing always low-latency.
And it's one of the ideas I want to get at: keeping a low-latency main loop dispatching events, posting asynchronous requests using dynamic types with objects instantiable at runtime from JSON or XML, while still staying close to the CPU and compiling to dynamically linked binary executables.
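A minimal sketch of that event-oriented shape (hypothetical API, fixed-size ring buffer for brevity): handlers are registered per event type, requests are posted to a queue, and the main loop only dequeues and dispatches, so nothing blocks it.

```c
#include <stddef.h>

#define MAX_EVENTS 256
#define MAX_TYPES  64

typedef void (*handler_fn)(void *payload);

typedef struct { int type; void *payload; } event_t;

static handler_fn handlers[MAX_TYPES];  /* one handler per event type */
static event_t    queue[MAX_EVENTS];
static size_t     head, tail;

static void on_event(int type, handler_fn fn) {
    if (type >= 0 && type < MAX_TYPES)
        handlers[type] = fn;
}

static int post_event(int type, void *payload) {
    size_t next = (tail + 1) % MAX_EVENTS;
    if (next == head) return -1;        /* queue full: caller decides */
    queue[tail] = (event_t){ type, payload };
    tail = next;
    return 0;
}

/* One main-loop iteration: constant work per event, nothing blocks. */
static void dispatch_pending(void) {
    while (head != tail) {
        event_t e = queue[head];
        head = (head + 1) % MAX_EVENTS;
        if (e.type >= 0 && e.type < MAX_TYPES && handlers[e.type])
            handlers[e.type](e.payload);
    }
}
```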
Along the way I developed my own ABI with position-independent code and support for dynamic linking, made a tool to convert .so & .dll files to this format, complemented it with the dynamic typing system, and banned all calls to the compiler-specific C runtime & libc. This allows operating-system-agnostic binary modules that export APIs as functions taking reference pointers to dynamic objects as arguments, which makes perfectly portable APIs; and with native support for JSON object definitions, it makes it easy to implement JSON-RPC interfaces usable from javascript, which is also useful for programming blockchain nodes.
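A minimal sketch of that uniform export shape (types are hypothetical, not the actual ABI): every exported function takes and returns opaque references to dynamic objects, so no compiler-specific C runtime type ever crosses the module boundary.

```c
typedef struct node_ref node_ref;  /* opaque reference to a dynamic object */

/* Every exported API function has the same shape: dynamic objects in,
 * dynamic objects out. The binary interface stays OS/compiler agnostic
 * because only opaque references cross it. Returns 0 on success. */
typedef int (*module_export_fn)(node_ref *params, node_ref *result);

/* A module's export table: a name bound to the uniform signature. */
typedef struct {
    const char       *name;
    module_export_fn  fn;
} module_export;
```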
To summarize, my plan is to have:

- Portable binary modules that export interfaces using dynamic types as arguments; these dynamic objects can be instantiated from a JSON definition, and a JSON definition can be produced from them, as well as other forms of serialization.
- Loopless/lockless/stackless function definitions encapsulated as binary module exports or script routines.
- Safe memory with an internal allocator, lockless reference counting, memory-leak detection, explicit dynamic typing with runtime access checks, etc.
- Transparent lockless multi-threading as much as possible (the techniques are explained on the 1024cores site shelby posted here before; I took most of the design for the lockless list of object references from there).
- Network protocol message handlers, defined and integrated into the event-based framework as component definitions (see the sketch after this list).
- If possible, usable to boot a bare-metal micro kernel, to be used on Pi/ARM devices.
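As an illustration of the protocol-handler item above, a minimal sketch (hypothetical names and signatures): each message type is identified by a header signature, and a small table maps signatures to handlers that receive the already-deserialized dynamic object.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct node node;               /* opaque dynamic object */

typedef struct {
    uint32_t signature;                 /* message header signature */
    int    (*handle)(node *msg);        /* handler for that message */
} msg_handler;

static int handle_ping(node *msg)  { (void)msg; return 0; }
static int handle_block(node *msg) { (void)msg; return 0; }

static const msg_handler handlers[] = {
    { 0x50494E47u /* "PING" */, handle_ping  },
    { 0x424C4F43u /* "BLOC" */, handle_block },
};

/* Dispatch a deserialized message to the handler whose signature matches. */
static int dispatch_message(uint32_t signature, node *msg) {
    for (size_t i = 0; i < sizeof handlers / sizeof handlers[0]; i++)
        if (handlers[i].signature == signature)
            return handlers[i].handle(msg);
    return -1;                          /* unknown message type */
}
```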
The problem is that I didn't really find a language that could allow all this for the moment, so the first part was doing the low-level ugly work in C. I took C because most kernels, drivers and operating systems are made in C anyway, so any language that has to use kernel APIs, hardware, interrupts or the like needs some kind of glue code in C; so I started from there and developed the system of dynamic objects.
I'm not very familiar with haskell (I have tried to get into it a bit but I don't really get it yet), but from what I can understand, I think my idea is close to the idea of a monad in haskell, which is the base placeholder for what I call a 'node': something that can be assigned a type, a name and data, plus a list of children that are pointers to references of this same node with type/name/data.
All accesses to the node/monad's data from C are 'semi-monomorphized' (semi, because only the type of the variable that needs to be read or written is monomorphized), and the tree system can already convert most simple non-composed types (strs/int/hashes/float/vec3/mat3 etc.) to each other transparently, and to JSON. Nodes/monads can be created with a specific composed type if they contain a certain predefined collection of child nodes.
From the C compiler's standpoint, the type of these 'monads/nodes' is completely opaque (the compiler just manipulates reference pointers), but all the 'leaf data' of nodes is associated with an explicit type, and there are monomorphized functions to read/write their value as a desired type, with automatic conversion from the stored type to the destination type (again, only for simple non-composed types).
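A minimal sketch of those accessors (names are hypothetical; the actual interface is in the tree.h linked below): the node is opaque, each leaf carries a type tag, and one monomorphized accessor per destination type converts the stored representation on the fly.

```c
#include <stdint.h>
#include <stdio.h>

enum leaf_type { LEAF_INT, LEAF_FLOAT };

typedef struct {
    enum leaf_type type;                 /* explicit runtime type tag */
    union { int64_t i; double f; } value;
} leaf;

/* Monomorphized read for int64: converts from whatever is stored. */
static int64_t leaf_get_int(const leaf *l) {
    return l->type == LEAF_INT ? l->value.i : (int64_t)l->value.f;
}

/* Monomorphized read for double: same leaf, different destination type. */
static double leaf_get_float(const leaf *l) {
    return l->type == LEAF_FLOAT ? l->value.f : (double)l->value.i;
}

int main(void) {
    leaf l = { LEAF_FLOAT, { .f = 2.75 } };
    printf("%lld %f\n", (long long)leaf_get_int(&l), leaf_get_float(&l));
    return 0;
}
```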
The interface for this tree is defined here:
https://github.com/NodixBlockchain/nodix/blob/master/libbase/include/tree.h
And this kind of 'monad tree' can be instantiated from JSON; I added the possibility of attaching an explicit type to a JSON object or key definition, which will be used by the node/monad instantiated from it.
The script language looks like this for the moment:
https://github.com/NodixBlockchain/nodix/blob/master/export/nodix.node
I'm also using the script to define a 'website' as a collection of methods that can be called via 'http://xx.com/script.site/method/param1/param2' à la CodeIgniter, which can be used to generate html and embed javascript variables into the page from the dynamically typed nodes/monads.
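A minimal sketch of that routing scheme (the method table and names are hypothetical): split the path into site/method/params and dispatch to the named site method.

```c
#include <stdio.h>
#include <string.h>

typedef int (*site_method)(int argc, char **argv);

static int method_news(int argc, char **argv) {
    (void)argv;
    printf("news called with %d params\n", argc);
    return 0;
}

static const struct { const char *name; site_method fn; } methods[] = {
    { "news", method_news },
};

/* Route "/site/method/param1/param2"; the path is tokenized in place. */
static int route(char *path) {
    char *argv[8];
    int   argc = 0;
    char *site   = strtok(path, "/");   /* e.g. "nodix.site" */
    char *method = strtok(NULL, "/");   /* e.g. "news"       */
    char *tok;
    if (!site || !method) return -1;
    while (argc < 8 && (tok = strtok(NULL, "/")) != NULL)
        argv[argc++] = tok;             /* remaining path segments */
    for (size_t i = 0; i < sizeof methods / sizeof methods[0]; i++)
        if (strcmp(methods[i].name, method) == 0)
            return methods[i].fn(argc, argv);
    return -1;                          /* unknown method */
}

int main(void) {
    char path[] = "/nodix.site/news/2017/08";
    return route(path);
}
```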
The script to generate a webpage looks like this:
https://github.com/NodixBlockchain/nodix/blob/master/export/web/nodix.site
The objective this way is to have a standalone portable binary that replaces a whole stack of software such as apache/php/mysql/blockchain nodes/node.js with a single node that can generate dynamic pages from blockchain data, use javascript crypto & signatures via the web browser, and also implement a JSON-RPC API for more complex interactive HTML5 applications.
The script language is completely stackless: parameters are instantiated as a local object associated with the function, and it's meant to define endpoints for event handling on dynamically typed objects, whether they come from the binary P2P blockchain protocol or from HTTP/JSON requests.
Well, this is just to introduce 'quickly' where I want to get at, but I see that the discussions on this git often revolve around the same kind of issues I'm trying to solve too: dynamic typing, cross-platform / cross-language module definitions, good support for scaling, and still full integration with the DOM/JS.
I'm still quite early in the design (well, I already have lots of stuff working and well developed on the low-level side), but it's the discussion with shelby on btctalk and reading stuff here that sparked me to get started on the script engine itself, as it's also a much better introduction to the framework than low-level C code.
The high-level language is still very simple and lacks a lot of things, but I don't have that much experience with high-level languages like haskell or rust, or with all the problematics involved in component/module interface definitions based on dynamic typing, or how to schedule execution flow based on event handlers, etc.
Well, I'm still crunching stuff and debugging for the moment, and I'm also making a website to explain things a bit and give more documentation and news. I should come up with it in the next week, let's say.
Well, I hope it's not too long :D