[Proposal] "In" keyword relationship operator #62

Open
xieguigang opened this issue Apr 20, 2017 · 14 comments
Labels
LDM Reviewed: No plans (LDM has reviewed and this feature is unlikely to move forward in the foreseeable future), Proposal

Comments

@xieguigang

xieguigang commented Apr 20, 2017

Generally, the word In makes sense in these situations:

  • Dictionary has the Key?
  • Collection contains the element?
  • Number in a specific range?
  • File exists in a directory?
  • Resource exists in a specific source?

So by introducing the existing In keyword as a new operator, we can make VB code simpler and closer to natural human language:

Public Class TableType

    Dim table As New Dictionary(Of String, Integer)

    Public Shared Operator In(key$, table As TableType) As Boolean
        Return table.table.ContainsKey(key) 
    End Operator

End Class

Dim c As New TableType
Dim x$

If x In c Then
End If

Public Class CollectionType

    Dim list As New List(Of Integer)

    Public Shared Operator In(x%, list As CollectionType) As Boolean
        Return list.list.IndexOf(x) > -1
    End Operator

End Class

Dim c As New CollectionType
Dim x%

If x In c Then
End If

Public Class IntRange
    Dim range%()

    Public Shared Widening Operator CType(l%()) As IntRange
        Return New IntRange With {
            .range = l
        }
    End Operator

    Public Shared Operator In(x%, r As IntRange) As Boolean
        Return Array.IndexOf(r.Range, x) > -1
    End Operator
End Class

' If [1: 10] means integer range 1 to 10, then we can
Dim x%

If x In [1:10] Then
End If 

This new In operator can also be combined with the Linux bash-style file search API syntax in Visual Basic:

https://github.com/xieguigang/GCModeller/blob/master/src/GCModeller/CLI_tools/eggHTS/CLI/2.%20DEP.vb#L196

to make the VB style read more like natural language:

Public Class Directory
    Dim DIR$

    Public Shared Widening Operator CType(DIR$) As Directory
        Return New Directory With { 
            .DIR = DIR
        }
    End Operator

    Public Shared Operator In(file$, DIR As Directory) As Boolean
        Return (ls -r <= DIR).IndexOf(file.BaseName) > -1
    End Operator
End Class

' In another style
Public Class Directory
    Dim files As CollectionType(Of String)

    Public Shared Widening Operator CType(DIR$) As Directory
        Return New Directory With { 
            .files = (ls -l -r -"*.*" <= DIR)
        }
    End Operator

    Public Shared Operator In(file$, DIR As Directory) As Boolean
        Return file$ In DIR.files
    End Operator
End Class

Dim file$
Dim DIR As Directory = "C:\test"

If file In DIR Then
End If

If file$ In (ls -l -r -"*.xml" <= "D:\data\") Then
End If

' or even more simple
Public Class File
    Dim file$

    Public Shared Operator In (file As File, DIR$) As Boolean
        With file
            Return (ls -l -r - .File.ExtensionName <= DIR).IndexOf(.File) > -1
        End With
    End Operator
End Class

Dim file As File

If file In "C:\test\" Then
End If

Compared with List.IndexOf or Dictionary.ContainsKey, the In operator takes just two characters in our code. It is simple and it makes sense.

@AnthonyDGreen
Contributor

Like it!

@franzalex

franzalex commented May 1, 2017

Much as I appreciate the thought that went into this proposal, wouldn't an extension method written for those specific cases serve the same purpose?

Here is a kludge that I just spent a couple of minutes to put together:

Imports System.Runtime.CompilerServices

Module InExtensions

    <Extension>
    Public Function [In](Of T)(value As T, seq As IEnumerable(Of T)) As Boolean
        Return seq.Contains(value)  ' utilizes LINQ's extension methods
    End Function

    <Extension>
    Public Function [In](Of TKey, TValue)(key As TKey,
                                          dict As IDictionary(Of TKey, TValue)) As Boolean
        Return dict.ContainsKey(key)
    End Function

    <Extension>
    Public Function [In](itemName As String, directory As IO.DirectoryInfo) As Boolean
        ' you can choose the depth of traversal to locate an item
        For Each item In directory.EnumerateFileSystemInfos("*", 
                                                            IO.SearchOption.AllDirectories)
            If item.Name.Equals(itemName, StringComparison.InvariantCultureIgnoreCase) Then
                Return True
            End If
        Next

        Return False
    End Function
End Module

With a set of extension methods like these in place, one can simply do those things you're mentioning. It also allows flexibility of implementation on the programmer's end as (s)he can choose how to go about checking the inclusion of any item within a larger set.
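As a rough usage sketch of the generic overload, assuming it is marked with the Extension attribute: the module name InSketch and the sample list are illustrative and not from the thread.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Runtime.CompilerServices

Module InSketch

    ' Hypothetical extension-method form of the generic [In] helper.
    <Extension>
    Public Function [In](Of T)(value As T, seq As IEnumerable(Of T)) As Boolean
        Return seq.Contains(value)  ' Enumerable.Contains from System.Linq
    End Function

    Sub Main()
        Dim primes As New List(Of Integer) From {2, 3, 5, 7}
        Dim x As Integer = 5

        ' A keyword may follow the member-access dot, so no brackets are needed here.
        If x.In(primes) Then
            Console.WriteLine("5 is in the list")
        End If
    End Sub

End Module
```

The call site x.In(primes) reads close to the proposed x In primes, at the cost of importing the namespace that holds the module.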

@Bill-McC

Bill-McC commented May 1, 2017

@franzalex I agree but the In extensions you show would actually make more sense if they were named Contains.
And I think that highlights an issue with the In operator suggestion: for it to work it could only be applied to types that support a Contains method. And if that Contains method is actually supplied by an extension such as LINQ, then you'd be left scratching your head a lot when "If value In seq" failed because an import was missing.
And if it was only on types that have a defined operator, its real-world usage would be unworthy of a language addition.
So although the In operator might look nice, I think the Contains methods where they exist do the job. Hard to justify spending on

@franzalex

@Bill-McC You're right! I've updated the signatures of the extension methods so they really reflect the name In.

The sample for IEnumerable(Of T) is a quick implementation I wrote. One could eliminate the need for LINQ by using a For Each or using the GetEnumerator() method of seq to enumerate the items and check for equality.
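A minimal sketch of the For Each variant, assuming the default equality comparer for T is an acceptable definition of equality. The module name InWithoutLinq and the sample data are illustrative only.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Runtime.CompilerServices

Module InWithoutLinq

    ' Hypothetical LINQ-free variant: walks the sequence and compares each item
    ' with the default equality comparer for T.
    <Extension>
    Public Function [In](Of T)(value As T, seq As IEnumerable(Of T)) As Boolean
        Dim comparer As EqualityComparer(Of T) = EqualityComparer(Of T).Default
        For Each item As T In seq
            If comparer.Equals(item, value) Then Return True
        Next
        Return False
    End Function

    Sub Main()
        Dim names As String() = {"alpha", "beta"}
        Console.WriteLine("beta".In(names))   ' True
        Console.WriteLine("gamma".In(names))  ' False
    End Sub

End Module
```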

On the following points, we seem to agree; I do not see the usefulness of the In operator if its function is already implemented by the Contains method or can be easily achieved via a trivial custom-made extension method.

@xieguigang
Author

xieguigang commented May 1, 2017

Hi, @franzalex , @Bill-McC

Taken literally, the Contains function just returns a single Boolean value, so implementing this through a Contains extension method would limit what the In operator can do. Unlike a Contains extension method, an operator's return value is not limited to the Boolean type. Take the VB-exclusive Like operator as an example: it can return not only Boolean but any .NET type. Here is an example that returns a Double similarity measure:

    Public Structure Foo

        Dim s$

        Public Shared Operator Like(f As Foo, s$) As Double
            Return Levenshtein.ComputeDistance(f.s, s).MatchSimilarity
        End Operator
    End Structure

In my opinion, the word In carries not only the meaning of Contains but also the literal meaning of Exists, so the return value of In should not be limited to a single Boolean. For example, if I want vector-style programming that finds a collection of resource indices, like the Which operation, then obviously the LINQ Contains function will not work:

Public Structure StringVector
    Dim value$()

    ' An operator's return type is not limited to a single Boolean value
    Public Shared Operator In(s$, strings As StringVector) As Boolean()
        Return strings.value.Select(Function(q) InStr(s, q) > 0).ToArray
    End Operator
End Structure

Dim s$
Dim V As StringVector

' The Contains function interface cannot satisfy the requirements of this operation:
Dim all_exists? = (s In V).All(Function(b) True = b)

' Gets the indices of all resources in which the target s does not exist
Dim not_exists_indices%() = Which.IsFalse(s In V)

' Gets the indices of all resources in which the target s exists
Dim exists_resource%() = Which(s In V)

Using an extension method also has a disadvantage:

  • If we use an operator, we only need to import one namespace: the one for the implementing type.
  • But if we use an extension method, we must import at least two namespaces: one for the implementing type and another for the extension method.

@Bill-McC

Bill-McC commented May 1, 2017

@xieguigang The Like operator has a special meaning within the language, specifically for string pattern matching. It dates from pre-.NET VB. Combined with LINQ, it actually provides really clear, elegant solutions to some of the original examples you posted:
eg From f in files Select f.Name Like "*pattern.xls"
etc.
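As a small runnable sketch of the Like plus LINQ combination described here: the function name MatchNames and the sample file names are assumptions made for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module LikeDemo

    ' Returns the names that match a Like pattern (* matches zero or more characters).
    Public Function MatchNames(names As IEnumerable(Of String), pattern As String) As List(Of String)
        Return (From n In names Where n Like pattern Select n).ToList()
    End Function

    Sub Main()
        Dim files As String() = {"q1_pattern.xls", "notes.txt", "pattern.xls"}

        ' Prints q1_pattern.xls and pattern.xls
        For Each name As String In MatchNames(files, "*pattern.xls")
            Console.WriteLine(name)
        Next
    End Sub

End Module
```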

The thing about Like, though, is that it really is peculiar to Strings. Sure, it can be overloaded, but if you go through all the .NET libraries and VB samples out there from the last fifteen years, how many classes or structures can you find that define their own Like operator? If anything, that tells us that requiring classes to implement their own In operator would be a wasteland.

When LINQ was introduced, it was based heavily on extensions rather than requiring all existing types to be rewritten. And the worst part about custom operators, especially if they are obscure and defined in only a couple of classes, is that if they return different results, the code has not become clearer. It has become one of those bits of code people have to pause on to check its meaning. IOW, you'd have lost the clarity that was sought by introducing the operator.

The only way for it to be clear is to give it a fixed meaning and have it implemented similarly to LINQ. But having read some of your examples, I question whether it would end up doing what you want if it must be an exact match versus a substring match, or vice versa.

@zspitz

zspitz commented May 25, 2017

I think that having a dedicated overloadable operator is overkill. However, I like the idea of an In keyword compiling to a .Contains call with a matching signature and reversed arguments, similar to LINQ's mapping of From x In lst Select to lst.Select(...), where Select can be either an instance method or an extension method.

Module InExtensions
    'Collection types are implemented using Enumerable.Contains method
    'Range can be implemented using Enumerable.Range and Enumerable.Contains

    'Dictionary types
    <Extension> Public Function Contains(Of TKey, TValue)(dictionary As Dictionary(Of TKey, TValue), key As TKey) As Boolean
        Return dictionary.ContainsKey(key)
    End Function
    
    'Filename in DirectoryInfo
    <Extension> Public Function Contains(di As DirectoryInfo, fileName As String) As Boolean
        Return fileName In di.EnumerateFiles(SearchOption.AllDirectories)
        'Return fileName In Directory.EnumerateFiles(di.FullName, SearchOption.AllDirectories)
    End Function
End Module

Then, the following will compile to the appropriate calls to Contains:

'compiles to dict.Contains(key)
'if the above extension method is in scope, it will be called
Dim dict As New Dictionary(Of String,Integer)
Dim key As String
If key In dict Then
End If

'leverages List.Contains, an instance method
Dim lst As New List(Of Integer)
If 5 In lst Then
End If

'leverages Enumerable.Contains, an external extension method
Dim rng = Enumerable.Range(1,5)
If 4 in rng Then
End If

'leverages the above extension method on DirectoryInfo
'Using the strings directly wouldn't work, because how could the compiler differentiate between finding a filename in a folder, and finding a substring within a string?
If "test.txt" In New DirectoryInfo("C:\test\") Then
End If

@Bill-McC

then you'd be left scratching your head a lot when "If value In seq" failed because an import was missing.

Doesn't that happen anyway when trying to write a LINQ query without the appropriate Imports System.Linq?

@AnthonyDGreen
Contributor

@zspitz because System.Linq is imported by default at the project level it's pretty hard to get into that situation.

@zspitz

zspitz commented May 26, 2017

@AnthonyDGreen

because System.Linq is imported by default at the project level it's pretty hard to get into that situation.

If In were to leverage Enumerable.Contains for IEnumerable<T>, System.Linq would cover the overwhelming majority of the use cases for In. Only someone who wants a custom implementation for a specific type would have to worry about namespaces.

@xieguigang
Author

I still cannot agree with the idea of using extension methods.

  • An operator overload can return any type, but the extension method returns only a single Boolean.
  • The extension approach is also less clear than an operator, and sometimes it can clutter our code:

For example, if we want to determine whether each string element in a vector s is in another string vector V, an operator overload gives us a very simple expression:

' An operator can return a Boolean vector in one step
Dim b() = s In V
' But because the LINQ extension function returns only a single Boolean per call,
' the expression becomes much more complicated
Dim b() = s.Select(Function(t) t In V).ToArray

Although the LINQ extensions are designed for processing data collections in a functional style from any .NET language:

  • As the example shows, when we wrap Contains in an In operator, we write more code once and then write less, clearer code at every call site, which makes VB elegant.
  • But with the LINQ extension approach, each call is much more complicated than using the operator: we write a little code once and then more code on every call.

Another situation: we usually use private fields for data storage in a user type, so we probably have a definition like:

Public Class Table

    Dim innerIndex As Index(Of String)
    Dim handles As Index(Of Long)
    
End Class

Because an external extension method cannot directly access these private fields, we have to add more code to expose them publicly, like:

' To avoid unexpected modification of this reference-type property from outside,
' I need to clone it on each property access
Public ReadOnly Property NameIndex As Index(Of String)
    Get
        Return New Index(Of String)(innerIndex)
    End Get
End Property

<Extension>
Public Function Contains(index As Table, name$) As Boolean
    Return index.NameIndex(name) > -1
End Function

Furthermore, if I want to write an extension method for another user type that uses the handles field, I need to add yet another public property or function as the data source:

Public ReadOnly Property UidIndex As Index(Of Long)
    Get
        Return New Index(Of Long)(handles)
    End Get
End Property

<Extension>
Public Function Contains(index As Table, uid&) As Boolean
    Return index.UidIndex(uid) > -1
End Function

In such cases, the user class becomes heavier and heavier. The method is also defined in an external source file, so it takes longer to navigate to the definition and is harder to use with IntelliSense. Using the operator avoids this situation entirely.

@zspitz

zspitz commented May 26, 2017

@xieguigang

RE: private members -- You could write a Contains instance method on Table, which would have access to all the private fields.

Class Table
    ...
    Public Function Contains(name$) As Boolean
        Return innerIndex(name) > -1
    End Function
    Public Function Contains(uid&) As Boolean
        Return handles(uid) > -1
    End Function
End Class

Then, the following code:

Dim name = "abc"
If name In table Then
End If

Dim uid = 5
If uid In table Then
End If

would compile (transpile?) to calls to Contains, preferring the instance methods over any available extension methods (the compiler always prefers instance methods, all else being equal):

Dim name = "abc"
If table.Contains(name) Then
End If

Dim uid = 5
If table.Contains(uid) Then
End If

without a need for any additional members on the Table class.

As I said, the compiler uses the same mechanism to resolve LINQ keyword syntax to the underlying method calls.
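The preference for instance methods can be checked with an ordinary method call today. This is only a sketch: the Table class, its names field and the TableExtensions module are stand-ins, and the extension deliberately returns False so it is obvious which method was bound.

```vbnet
Imports System
Imports System.Runtime.CompilerServices

Public Class Table
    ' Private state that only an instance method can read directly.
    Private ReadOnly names As String() = {"abc", "def"}

    Public Function Contains(name As String) As Boolean
        Return Array.IndexOf(names, name) > -1
    End Function
End Class

Public Module TableExtensions
    ' Same call shape as the instance method; never chosen while the
    ' instance method is applicable.
    <Extension>
    Public Function Contains(t As Table, name As String) As Boolean
        Return False
    End Function
End Module

Module Program
    Sub Main()
        Dim t As New Table()
        Console.WriteLine(t.Contains("abc"))  ' True: the instance method is chosen
    End Sub
End Module
```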


RE: In returning non-boolean

I think the strongest use case for In is in a conditional context:

If uid In table Then
End If

But where the type of the result of the In operation is not a boolean, it cannot be used this way. If ... Then only accepts a Boolean or something which can be converted to Boolean, and the following code wouldn't compile:

's In V returns a Boolean(), as above
If s In V Then
End If

I'm not sure the additional cognitive burden -- sometimes the result of In can be used in an If ... Then statement, and sometimes not -- is worth the relatively minor use case where returning a non-Boolean would be useful.
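To illustrate the point with current VB: a Boolean() result has to be collapsed into a single Boolean before it can drive a condition. A minimal sketch follows; the Describe function and the sample flags are hypothetical.

```vbnet
Imports System
Imports System.Linq

Module ConditionDemo

    ' Collapses a Boolean vector into something usable in If ... Then.
    Public Function Describe(flags As Boolean()) As String
        ' "If flags Then" would not compile: Boolean() does not convert to Boolean.
        If flags.All(Function(b) b) Then
            Return "all"
        ElseIf flags.Any(Function(b) b) Then
            Return "some"
        Else
            Return "none"
        End If
    End Function

    Sub Main()
        Dim flags As Boolean() = {True, False, True}
        Console.WriteLine(Describe(flags))  ' some
    End Sub

End Module
```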


method is defined at external source file

As I said above, if you have control of the class, you could write an instance method and not rely on an extension method.

and requires longer time to navigate to the definition and harder on using the IntelliSense.

I'm not sure why you think the IDE has to work harder in order to process extension methods. My guess is that all the available extension methods (marked with the ExtensionAttribute attribute) are stored in memory somehow, and the compiler makes use of this information when resolving method calls. Once the call has been resolved to a specific method, I don't think the actual navigation would take more time / CPU / memory. Same for Intellisense.

Even if this is true, I don't think the additional strain being placed on the CPU or memory by the IDE is a language design consideration. If it benefits the language, and it conforms to the standards of the language, then the IDE should simply have to adapt.

@jrmoreno1

jrmoreno1 commented Apr 23, 2018

I agree that this could in the vast majority of the cases be done with an extension method (I have written just such an extension method for several projects).

Where extension methods wouldn't work, a better replacement than In would be select case expressions:

 a = Select Case Input
          case b: value_for_b
          case c: value_for_c
          case d: value_for_d
          case else: not_found
       end Select 

Now, obviously that is a bit heavy for including in an IF/Then condition, but you could assign it to a variable before the condition.

@KathleenDollard
Contributor

This is reasonable, but has some things to consider:

  • Precedence
  • Currently VB (and C#) don't know about hashsets, so this wouldn't work where it is most logical. For that we'd need to either have the language know about hashsets (an implementation detail) or have the BCL know about the In operator, which means we'd need to add it to C# too.
  • The above issue exists for many different set implementations.

Because the BCL is written in C#, this issue would have to start in C#. Feel free to propose this in https://github.com/dotnet/csharplang. It's possible that this will result from work on Range.

KathleenDollard added the LDM Reviewed: No plans label Jun 13, 2018
@zspitz

zspitz commented Jun 13, 2018

@KathleenDollard

Currently VB (and C#) don't know about hashsets

What do you mean when you say VB doesn't know about hashsets? Do you mean that if there was such a dedicated operator, it would have to be specially defined for the HashSet(Of T) type?
