Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Search document operations #10568

Merged
merged 6 commits into from
Mar 13, 2020
Merged

Adding Search document operations #10568

merged 6 commits into from
Mar 13, 2020

Conversation

tg-msft
Copy link
Member

@tg-msft tg-msft commented Mar 12, 2020

This adds Search, Suggest, Autocomplete, GetDocument, and IndexDocuments. It also adds custom user schema serialization/deserialization.

If you're in a hurry, please prioritize these:

Apologies for the massive PR. I've sliced it up to be easier to review commit-by-commit:

  1. Implementation
  2. Generated code
  3. Readme/samples
  4. Tests (mostly skim infrastructure)
  5. Recordings/swagger

Please specifically call out anything you consider a release blocker. I'll be filing tracking bugs for all my TODO: XXXXX comments and updating the PR with bug numbers later today.

/// and debug/re-run at the point of failure without re-running
/// potentially lengthy live tests. This should never be checked in.
/// </summary>
public bool DEBUG_ONLY_SaveRecordingsOnFailure { get; set; } = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we instead flow them into the TestLogger?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That wouldn't save them though, right? I want them saved so I can do a dotnet msbuild /t:UpdateSessionRecords and keep re-running the test while I figure out why the damn thing won't parse.

A very common scenario with this PR was letting a test run for a minute plus and return successfully from the service only to fail on the final parse in some uniquely odd way. Being able to replay them even if the test didn't pass was super useful. If we really don't want to check this in, I can just add it back manually whenever I need it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about #if DEBUG around this property to make it truly debug only?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good suggestion. I'll make the name slightly less visually gruesome then since the CI will catch this for us if we slip.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@@ -75,6 +75,10 @@ public void Intercept(IInvocation invocation)

try
{
if (methodInfo.ContainsGenericParameters)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added and commented better

Diagnostics.LoggedHeaderNames.Add("return-client-request-id");
Diagnostics.LoggedHeaderNames.Add("throttle-reason");
Diagnostics.LoggedHeaderNames.Add("User-Agent");
Diagnostics.LoggedHeaderNames.Add("x-ms-client-request-id");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't some of these in default options?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably - I'll go double check. Might be worth adding a unit test that these are always unique too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed and added a unit test

/// <param name="reader">The JSON reader.</param>
/// <param name="expected">The expected token type.</param>
public static void Expects(
this ref Utf8JsonReader reader,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use in when reader is not mutated in a method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...though it would pass by value. If the object is larger than a ref (I don't know if it is), is that really what we want for perf?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

{
// Holds document content, clear it before returning it.
rented.AsSpan(0, written).Clear();
ArrayPool<byte>.Shared.Return(rented);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Return has bool clear parameter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - I thought the same thing when I saw that at first. It uses Return to clear earlier in the code, but here it only clears however much it's used in case you only used a couple of bytes in a giant buffer.

Copy link
Member

@ahsonkhan ahsonkhan Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Return has bool clear parameter.

That would clear all of rented which might be up to 2x larger than needed (in the worst case).

/// </typeparam>
/// <param name="json">A JSON stream.</param>
/// <returns>A deserialized object.</returns>
public static T Deserialize<T>(this Stream json)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you CopyTo a memory stream here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I borrowed this code at Ahson's suggestion when we were talking about doing a CopyTo MemoryStream.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really wonder if this amount of complexity is warranted here and if it's worth the support cost later...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fair question. I'm going to leave it today and have that conversation with you and Ahson in the near future when I pull this into Shared Source since anyone else supporting custom user schemas is going to need it.

[CodeGenClient("Documents")]
internal partial class DocumentsClient { }

// Work-around the generator not enjoying mixing model types between the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome - added a comment to help track

directive:
- from: swagger-document
where: $.definitions.QueryType['x-ms-enum']
transform: $.name = "SearchQueryType";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you rename here because you didn't want to copy all the values? [CodeGenSchema] should be supported on enums now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No - I just couldn't get it working with CodeGenSchema when I tried. Let me check again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

/// <param name="message">The error message.</param>
/// <param name="errorCode">The error code.</param>
/// <param name="additionalInfo">Additional error details.</param>
partial void ExtractFailureContent(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This signature looks very unusual, should we make this method virtual?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking for something with zero cost to everyone else who didn't need it and trying to be minimally invasive to code I don't own. I'm happy to change this to whatever you'd prefer it to be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets do virtual, I think JIT can optimize this call away.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We chatted offline about this and decided to stay with what I have here for the short term. It's currently a sealed type and we'll rethink the approach for customizing error messages as the distributed tracing plans evolve.

@@ -45,7 +46,8 @@ public static bool TryGetTextEncoding(string contentType, out Encoding encoding)
if (contentType.StartsWith(textContentTypePrefix, StringComparison.OrdinalIgnoreCase) ||
contentType.EndsWith(jsonSuffix, StringComparison.OrdinalIgnoreCase) ||
contentType.EndsWith(xmlSuffix, StringComparison.OrdinalIgnoreCase) ||
contentType.EndsWith(urlEncodedSuffix, StringComparison.OrdinalIgnoreCase))
contentType.EndsWith(urlEncodedSuffix, StringComparison.OrdinalIgnoreCase) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a couple of OData variants

Copy link
Member

@heaths heaths left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not actually done, but GH's PR experience is slogging my browser. Going to pick this back up in Code with the SearchIndexClient.

/// <summary>
/// Flag you can (temporarily) enable to save failed test recordings
/// and debug/re-run at the point of failure without re-running
/// potentially lengthy live tests. This should never be checked in.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not define an environment variable, then? They only make sense for and impact recordings anyway, so no impact to production. Or was it to avoid restarting VS?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partly to avoid restarting VS but also partly to make it clear when you're doing something dangerous. I'd worry about leaving that env var on by accident.

// Make sure Response<T> is a concrete type
if (returnType.IsGenericType &&
returnType.GetGenericTypeDefinition() == typeof(Response<>) &&
returnType.ContainsGenericParameters)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really just more of a question, but why is this last check necessary? I can't imagine the code would compile otherwise. A generic type constraint is only valid in a type expression like you use in the line above and can't be returned except as a Type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall correctly, this was the situation where we've got something like SearchAsync which returns a Response<...> and we're trying to make sure when we flip from SearchAsync we don't grab Search and try to use its return type without making it specific to Hotel.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more descriptive comments to all the bizarre changes I made in this file.

@@ -1,5 +1,6 @@
# Release History

## 1.0.0-preview.1 (2020)
## 11.0.0-preview.1 (2020-03)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supposed to be "(Unreleased)" until released, in which case it needs to be yyyy-MM-dd. See release guidelines for details.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't we supposed to change that in the PR before we release? This is that PR.

https://azure.github.io/azure-sdk/policies_releases.html has an example of yyyy-MM-dd but it doesn't say that explicitly and a bunch of this wave's releases were just doing 2020-03.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use the full date on this auspicious Friday the 13th since that's the example in the policy doc.

adding a rich search experience over private, heterogeneous content in web,
mobile, and enterprise applications.

The **Azure Cognitive Search service** is well suited for the following
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's with the bold type here and below on names?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to make it easier to skim. This part is telling you about the service, the next part is telling you about the client, you can skip one or both parts if you just want to see the code.

- [Declare custom synonym maps to expand or rewrite queries](https://docs.microsoft.com/rest/api/searchservice/synonym-map-operations)
- Most of the `SearchServiceClient` functionality is not yet available in our current preview

- `SearchIndexClient` helps with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'd put this above SearchServiceClient since,

  1. People will interact with it more.
  2. It's currently available (as you called out).
  3. It's lexigraphically first.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call - will change.

sdk/search/Azure.Search/src/SearchFilter.cs Show resolved Hide resolved
StringBuilder x => Quote(x.ToString()),

// Everything else
object x => throw new ArgumentException(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use _ instead for the fallback case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using x in the exception though.

foreach (char ch in text)
{
builder.Append(ch);
if (ch == '\'')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to worry about double quotes as well, or is that not possible in an OData filter?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only single quotes are allowed as string delimiters in OData expressions.

@@ -198,7 +205,8 @@ public class SearchIndexClient
Debug.Assert(!string.IsNullOrEmpty(indexName));
Debug.Assert(pipeline != null);
Debug.Assert(diagnostics != null);
Debug.Assert(SearchClientOptions.ServiceVersion.V2019_05_06 <= version &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this have been "asserted" (with exceptions) higher up the callstack already?

Copy link
Member Author

@tg-msft tg-msft Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I (try to) use assertions at public facing boundaries and Debug at private boundaries. This is just a sanity check we should never hit. It's verified with an exception everywhere a user passes one.


#region GetDocumentCount
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO: 🤮

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The region or the name or ...? I definitely do the region thing because a lot of the Storage files were MASSIVE.

Comment on lines +24 to +25
writer.WritePropertyName("message");
writer.WriteStringValue(Message);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing the name and value together is faster. Across the board, minimize independent/standalone calls to WritePropertyName wherever possible.

Suggested change
writer.WritePropertyName("message");
writer.WriteStringValue(Message);
writer.WriteString("message", Message);

And also, just to re-iterate, use statically created JsonEncodedText for constant names.

I understand this is auto-generated, so probably the fix needs to go in the generator.

Comment on lines +28 to +29
writer.WritePropertyName("details");
writer.WriteStartArray();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here/elsewhere. Will stop mentioning it.

Suggested change
writer.WritePropertyName("details");
writer.WriteStartArray();
writer.WriteStartArray("details");

void IUtf8JsonSerializable.Write(Utf8JsonWriter writer)
{
writer.WriteStartObject();
if (Code != null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the null check on Code but not on Message? Is writing {"code": null, ...} invalid?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is generated from the swagger specification that the service provides as a contract. "code" will often be null (today for Search - they're working on changing that) but "message" never will.

SearchError result = new SearchError();
foreach (var property in element.EnumerateObject())
{
if (property.NameEquals("code"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, use UTF-8 encoded bytes for comparison here for these string literals. Presumably this is already being tracked somewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 66 to 67
writer.WritePropertyName("@search.action");
writer.WriteStringValue(ActionType.ToSerialString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
writer.WritePropertyName("@search.action");
writer.WriteStringValue(ActionType.ToSerialString());
writer.WriteString("@search.action", ActionType.ToSerialString());

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed - all the other uses in the handwritten code are for writing a property before starting an array or nested object.

Copy link
Member

@ahsonkhan ahsonkhan Mar 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those too can and should be fixed.

For example, in Models/IndexDocumentsBatch{T}.cs, this:

writer.WritePropertyName("value");
writer.WriteStartArray();

Should be re-written as:

writer.WriteStartArray("value");

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change. Are these recommendations written down anywhere, btw? It'd be good to link to them from the C# guidelines if at all possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have this, but it doesn't focus too heavily on performance/"best practices". We could add some explicit guidelines on how to use the APIs more optimally:
https://docs.microsoft.com/en-us/dotnet/standard/serialization/system-text-json-migrate-from-newtonsoft-how-to#utf8jsonwriter-compared-to-jsontextwriter

Comment on lines +133 to +141
// The built in converters for JsonSerializer are a little more
// helpful than we want right now and will do things like turn "1"
// to the integer 1 instead of a string. The number of special
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment true? I am not following the intent here, but I don't think JsonSerializer does any type coercion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw that happening... but maybe it was something screwy I did?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can chat offline so I can understand what you were seeing and what you intended.

object value = ReadSearchDocObject(ref reader);
list.Add(value);
}
return list.ToArray();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just return the list or do we need to call ToArray()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Track 1 was really aggressive about arrays here and have a lot of specific customer scenarios that matter. I know I'm already not meeting as many of them as I should by always returning object[] for this first preview.

Constants.NegativeInfValue => double.NegativeInfinity,
string text =>
// JsonReader's TryGetDateTimeOffset doesn't play
// nicely with time zones so we'll do our own parse
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify on that please. What type of payload strings fail? TryGetDateTimeOffset follows the ISO spec.
https://docs.microsoft.com/en-us/dotnet/standard/datetime/system-text-json-support

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the test cases was failing at this point - I'll dig into this as part of the cleaning up the JSON handling follow up work.

{
if (reader.TokenType == JsonTokenType.EndObject) { break; }
string property = reader.ExpectsPropertyName();
object value = ReadObject(ref reader);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we OK with recursion here? Could it be unbounded?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The service currently limits the nesting of objects in documents to 10 levels deep.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a bad question, but the JSON document would have to be continuous/unending, right? Seems unnecessary to check depth unless we wanted to allow customers to not drag down perf with a lot of deeply nested docs. Seems the kind of thing we guage user feedback over time.

Copy link
Member

@ahsonkhan ahsonkhan Mar 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern was less to do with perf and more to do with avoiding unnecessary/unrecoverable stackoverflow with a potential payload like. Probably at 100-1k depth, the app would crash.
"{\"a\":{\"b\":{....}}"

If the service already has a limit and there is no way for someone to inject "bad" payload in (either maliciously or accidentally), then recursion is fine, since there is an implied bound :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added it out of an abundance of caution, but I agree it's pretty unlikely.

{
return null;
}
reader.Expects(JsonTokenType.Number);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the purpose of this check to change the exception message and type of exception thrown? GetDouble already checks/throws.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - to change the exception message, but I'm going to remove these when we clean up the JSON handling in the near future.

public override JsonConverter CreateConverter(Type typeToConvert, JsonSerializerOptions options)
{
Debug.Assert(CanConvert(typeToConvert));
Type modelType = typeToConvert.GetGenericArguments()[0];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this guaranteed to not fail with IndexOutOfRangeException?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the way it's used today. I'll add a Debug.Assert to make sure it stays that way.

switch (reader.GetString())
{
case Constants.InfValue:
return double.PositiveInfinity;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious to learn about why this special handling is required. Is this defined in the Search rest api somewhere? Can you share a link if you have it. Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The service implements an OData-compliant REST API and uses the Entity Data Model representations for data types. The representations for positive and negative infinity for type Edm.Double are "INF" and "-INF", respectively.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -0,0 +1,2481 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are some large JSON files. Do we generally check these into the repo (especially PagingStaticDocumentsAsync.json/PagingStaticDocuments.json/etc.)?

As an aside:
Are the contents of the async/non-async json files identical?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The json files are test recordings and yes, we typically check them into the repo.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the async/non-async are often identical but not always so.

```

The request will succeed even if any of the individual actions fails and
return an `IndexDocumentsResult` for inspection. There's also a `ThrowOnAnyError`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding ThrowOnAnyError -- I'm not sure the explanation makes sense, I guess because this didn't used to be optional. The idea behind throwing IndexBatchException on 207 was to force partial failures to be handled; Otherwise there could effectively be data loss if any failed updates are ignored.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're going to have a lot of fun digging into this one in more detail. We've been trying really hard not to add other types of exceptions so that using Track 2 means there's one thing you've got to worry about catching. Without a custom exception, we wouldn't be giving people the information to process/recover. This was debated a lot during Storage batching and we landed on this pattern.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The long-term solution is our "smart batching" idea; That makes this problem go away.

You're right that throwing on 207 without having extra info in the exception is an incomplete scenario, which raises the question -- Why have ThrowOnAnyError at all if we're never going to have IndexBatchException?

Copy link
Member

@heaths heaths left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I'm done (browser wasn't keeping up as I scrolled through more and more files), changing response. Overall, nothing I would hold this very early preview back for, but some API issues that need to be addressed (some in the custom type definitions, but much more in the autorest.csharp project (not that I called those out; for example, collection properties should be publicly read-only) and never null (use LazyInitializer when it makes sense to).

/// not include any facet expressions via
/// <see cref="SearchOptions.Facets"/>.
/// </summary>
public IDictionary<string, IList<FacetResult>> Facets => _results.Facets;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be mutable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto on comments above about tracking the generator on these.

{
if (reader.TokenType == JsonTokenType.EndObject) { break; }
string property = reader.ExpectsPropertyName();
object value = ReadObject(ref reader);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a bad question, but the JSON document would have to be continuous/unending, right? Seems unnecessary to check depth unless we wanted to allow customers to not drag down perf with a lot of deeply nested docs. Seems the kind of thing we guage user feedback over time.

/// <param name="reader">The JSON reader.</param>
/// <param name="expected">The expected token type.</param>
public static void Expects(
this ref Utf8JsonReader reader,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...though it would pass by value. If the object is larger than a ref (I don't know if it is), is that really what we want for perf?

this Stream json,
CancellationToken cancellationToken)
{
if (json == null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Stream is abstract, I recommend json is null in case - though unlikely - someone overrides equality. In general, is null is preferred for reference types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

/// </summary>
internal static class Constants
{
// TODO: XXXXX - Switch constants to use JsonEncodedText
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tip: You might keep the consts for equality, but then add JsonEncodedText static readonlys for writing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can use the encoded text for equality as well, but I'll be following up separately on all of that.

@@ -35,11 +35,11 @@ directive:

// Document operations
"/docs/$count": $["/docs/$count"],
"/docs/search": $["/docs/search.post.search"],
"/docs/search.post.search": $["/docs/search.post.search"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious about these changes since I'll be updating this block to add in indexing. What didn't work the way you had it before (which we discussed, so I thought I understood what you were doing)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are actually live endpoints. Bruce would know more about why they have both, but I decided not change them when I realized it was intentional.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all comes down to OData-isms that actually help avoid common pitfalls of REST API design. We model the POST version of query APIs as OData "actions", which can have namespaces. In this case, the namespace is search.post and the action name is search. We support both the qualified and unqualified names for each action mainly for demo-ability and brevity.

People first evaluating an API tend to balk at things like indexes('myindex')/docs/search.post.search but are comfortable with indexes/myindex/docs/search. However, the issue with indexes/myindex/docs/search is that somebody could index a document whose key is search. In this case, we can disambiguate based on HTTP verb to tell whether it's a Search or a Lookup request, but in general the name-clashing problem is real. By always using the fully-qualified OData syntax in the client libraries, we avoid these problems completely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so this was to basically eliminate the potential for name collisions in our libraries (i.e. this exact change; not the content itself)?

/// Join a collection of strings into a single comma separated string.
/// If the collection is null or empty, a null string will be returned.
/// </summary>
/// <param name="items">The items to join.</param>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could actually take an IEnumerable<T> and be more permissive. Just use Count() instead (it's optimized for ICollection/ICollection<T>).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only users are internal IList<T> so I'm not going to bother yet. If anyone else needs this, then yes, let's use Enumerable's Count which does those type specific optimizations.

/// <returns>A collection of individual values.</returns>
public static IList<string> CommaSplit(string value) =>
string.IsNullOrEmpty(value) ?
new List<string>() :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to possibly be mutable? In other projects, I keep a type-cached array (it's what Enumerable.Empty does internally, but exposes as an IEnumerable<T> - I think Array has something though) for this purpose if immutable. Less memory overhead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - we can give it back to users, they can add some values, then send it back again.

sdk/search/Azure.Search/src/Utilities/SearchExtensions.cs Outdated Show resolved Hide resolved
[Test]
public void ManyArguments()
{
Assert.AreEqual("Foo eq 2 and Bar eq 3",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit here and below: a purist would say "one assert per unit test". While I think that's overkill, what might be better to consider is using a parameterized test with these inputs and outputs as parameters. When a particular tuple asserts, NUnit (via VS, AzPipelines, etc.) will show a much more helpful message at a glance what failed. I have lots of examples in Key Vault if you want to see, but basically you could do something like this:

[TestCaseSource(nameof(ManyArgumentsData))]
public void ManyArguments(FormattableString actual, string expected)
{
    Assert.AreEqual(expected, SearchFilter.Create(actual));
}

private static IEnumerable ManyArgumentsData() => new []
{
    new object[] { $"Foo eq {2} and Bar eq {3}", "Foo eq 2 and Bar eq 3"},
    // ...
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where you can use constants, just use [TestCase("foo", 1)] for example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of that was not being an expert in the interpolation compiler decision making about when it turns things into constants strings, etc. But this whole area needs a lot more love and I'd like to clean the tests up when I do that.

{
SearchRequest result = new SearchRequest();
SearchOptions result = new SearchOptions();
Copy link
Member

@ahsonkhan ahsonkhan Mar 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Use of var in foreach should be changed to be explicit (auto-gen issue).

{
writer.WritePropertyName("details");
writer.WriteStartArray();
foreach (var item in Details)
Copy link
Member

@ahsonkhan ahsonkhan Mar 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Change var usage.

/// <param name="documents">
/// The collection of documents to index.
/// </param>
internal IndexDocumentsBatch(IndexActionType type, IEnumerable<T> documents)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a problem that some customers run into when building batches, and I'm not sure how far we want to go to try to solve it (not in preview 1 for sure, but something to think about for the future).

Some customers don't realize that T is supposed to be a document type, and instead pass a JSON string for the documents parameter. This compiles because T can be char, but the failure mode is really unhelpful. We might want to consider doing some client-side validation for cases like this.

/// </returns>
public static IndexDocumentsBatch<SearchDocument> Delete(string keyName, IEnumerable<string> keyValues)
{
IndexDocumentsBatch<SearchDocument> batch = new IndexDocumentsBatch<SearchDocument>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tg-msft If you don't like var, you really wouldn't like let 😉

/// found, 409 for a version conflict, 422 when the index is
/// temporarily unavailable, or 503 for when the service is too busy.
/// </summary>
[CodeGenSchemaMember("statusCode")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's anything we can do in the Swagger to reduce the number of customizations like this that you need to do, feel free to file an issue in azure-rest-api-specs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to align with Response.Status - I'm not sure other languages will be making this choice. That said, yes, we should sit down across and compare deltas across languages sometime early next preview.

string name = clone.ExpectsPropertyName();
if (name == Constants.SearchScoreKey)
{
propertiesNeeded--;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The service won't return duplicate JSON property names in any circumstance I can think of.

propertiesNeeded--;
ReadHighlights(ref clone, result);
}
else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heads up -- We will be adding more properties here over time. One should be arriving hopefully before the final preview release.

// Clone the reader so we can get the search text property without
// advancing the reader over any properties needed to deserialize
// the user's model type.
Utf8JsonReader clone = reader;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reader is a struct I assume?

Copy link
Member

@ahsonkhan ahsonkhan Mar 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes (ref struct).

/// Set this to true if you're not inspecting the results of the Index
/// Documents action.
/// </summary>
public bool ThrowOnAnyError { get; set; } = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I agree with false being the default here. If nothing else, it will probably catch a lot of users who migrate from Track 1 off guard.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the pattern we've had for Track 2 batching so far as discussed above. Since the exception doesn't have failure details, we don't want folks to do that by default.


// Ignore OData properties - we don't expose those on custom
// user schemas
if (!propertyName.StartsWith(Constants.ODataKeyPrefix, StringComparison.OrdinalIgnoreCase))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tg-msft Although we don't have such annotations now, in the future we may add property-scope annotations to the response payload. For example:

{
    "@search.score": 3.5,
    "myfield": "Hello",
    "myfield@search.someInterestingStatistic": 6
}

We wouldn't do this without a Swagger change, so you'd know about it in advance. Just a heads up since this code would have to change to handle that case.

switch (reader.GetString())
{
case Constants.InfValue:
return double.PositiveInfinity;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

["address"] = Address?.AsDocument(),
// With no elements to infer the type during deserialization, we must assume object[].
["rooms"] = Rooms?.Select(r => r.AsDocument())?.ToArray() ?? new object[0]
};
}

[SerializePropertyNamesAsCamelCase]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This attribute still exists? Can we make it work in Track 2 with System.Text.Json?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No - that's using the Track 1 library in the tests do our index creation. We will definitely be coming up with some answer here in the near future that allows using naming policies.

@ahsonkhan
Copy link
Member

Just a heads up because I noticed it a few times in this PR. Try to avoid responding to comments/threads as part of a "bulk" review (and consider adding those standalone comment replies separately). Otherwise, your comments show up twice, both in the review feedback and the comment thread you are responding on, making it difficult to track what exactly the comment is responding to.

@tg-msft
Copy link
Member Author

tg-msft commented Mar 13, 2020

@tg-msft tg-msft merged commit d783d90 into Azure:master Mar 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants