-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding Search document operations #10568
Conversation
/// and debug/re-run at the point of failure without re-running | ||
/// potentially lengthy live tests. This should never be checked in. | ||
/// </summary> | ||
public bool DEBUG_ONLY_SaveRecordingsOnFailure { get; set; } = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we instead flow them into the TestLogger?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That wouldn't save them though, right? I want them saved so I can do a dotnet msbuild /t:UpdateSessionRecords
and keep re-running the test while I figure out why the damn thing won't parse.
A very common scenario with this PR was letting a test run for a minute plus and return successfully from the service only to fail on the final parse in some uniquely odd way. Being able to replay them even if the test didn't pass was super useful. If we really don't want to check this in, I can just add it back manually whenever I need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about #if DEBUG
around this property to make it truly debug only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good suggestion. I'll make the name slightly less visually gruesome then since the CI will catch this for us if we slip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
@@ -75,6 +75,10 @@ public void Intercept(IInvocation invocation) | |||
|
|||
try | |||
{ | |||
if (methodInfo.ContainsGenericParameters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make sure to add tests for new features to https://github.com/Azure/azure-sdk-for-net/blob/master/sdk/core/Azure.Core/tests/ClientTestBaseTests.cs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added and commented better
Diagnostics.LoggedHeaderNames.Add("return-client-request-id"); | ||
Diagnostics.LoggedHeaderNames.Add("throttle-reason"); | ||
Diagnostics.LoggedHeaderNames.Add("User-Agent"); | ||
Diagnostics.LoggedHeaderNames.Add("x-ms-client-request-id"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't some of these in default options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably - I'll go double check. Might be worth adding a unit test that these are always unique too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed and added a unit test
/// <param name="reader">The JSON reader.</param> | ||
/// <param name="expected">The expected token type.</param> | ||
public static void Expects( | ||
this ref Utf8JsonReader reader, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use in
when reader is not mutated in a method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...though it would pass by value. If the object is larger than a ref (I don't know if it is), is that really what we want for perf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
{ | ||
// Holds document content, clear it before returning it. | ||
rented.AsSpan(0, written).Clear(); | ||
ArrayPool<byte>.Shared.Return(rented); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Return has bool clear parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - I thought the same thing when I saw that at first. It uses Return to clear earlier in the code, but here it only clears however much it's used in case you only used a couple of bytes in a giant buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Return has bool clear parameter.
That would clear all of rented
which might be up to 2x larger than needed (in the worst case).
/// </typeparam> | ||
/// <param name="json">A JSON stream.</param> | ||
/// <returns>A deserialized object.</returns> | ||
public static T Deserialize<T>(this Stream json) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you CopyTo
a memory stream here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I borrowed this code at Ahson's suggestion when we were talking about doing a CopyTo MemoryStream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really wonder if this amount of complexity is warranted here and if it's worth the support cost later...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fair question. I'm going to leave it today and have that conversation with you and Ahson in the near future when I pull this into Shared Source since anyone else supporting custom user schemas is going to need it.
[CodeGenClient("Documents")] | ||
internal partial class DocumentsClient { } | ||
|
||
// Work-around the generator not enjoying mixing model types between the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome - added a comment to help track
directive: | ||
- from: swagger-document | ||
where: $.definitions.QueryType['x-ms-enum'] | ||
transform: $.name = "SearchQueryType"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you rename here because you didn't want to copy all the values? [CodeGenSchema] should be supported on enums now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No - I just couldn't get it working with CodeGenSchema
when I tried. Let me check again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
/// <param name="message">The error message.</param> | ||
/// <param name="errorCode">The error code.</param> | ||
/// <param name="additionalInfo">Additional error details.</param> | ||
partial void ExtractFailureContent( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This signature looks very unusual, should we make this method virtual?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking for something with zero cost to everyone else who didn't need it and trying to be minimally invasive to code I don't own. I'm happy to change this to whatever you'd prefer it to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets do virtual, I think JIT can optimize this call away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We chatted offline about this and decided to stay with what I have here for the short term. It's currently a sealed type and we'll rethink the approach for customizing error messages as the distributed tracing plans evolve.
@@ -45,7 +46,8 @@ public static bool TryGetTextEncoding(string contentType, out Encoding encoding) | |||
if (contentType.StartsWith(textContentTypePrefix, StringComparison.OrdinalIgnoreCase) || | |||
contentType.EndsWith(jsonSuffix, StringComparison.OrdinalIgnoreCase) || | |||
contentType.EndsWith(xmlSuffix, StringComparison.OrdinalIgnoreCase) || | |||
contentType.EndsWith(urlEncodedSuffix, StringComparison.OrdinalIgnoreCase)) | |||
contentType.EndsWith(urlEncodedSuffix, StringComparison.OrdinalIgnoreCase) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a couple of OData variants
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not actually done, but GH's PR experience is slogging my browser. Going to pick this back up in Code with the SearchIndexClient
.
/// <summary> | ||
/// Flag you can (temporarily) enable to save failed test recordings | ||
/// and debug/re-run at the point of failure without re-running | ||
/// potentially lengthy live tests. This should never be checked in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not define an environment variable, then? They only make sense for and impact recordings anyway, so no impact to production. Or was it to avoid restarting VS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partly to avoid restarting VS but also partly to make it clear when you're doing something dangerous. I'd worry about leaving that env var on by accident.
// Make sure Response<T> is a concrete type | ||
if (returnType.IsGenericType && | ||
returnType.GetGenericTypeDefinition() == typeof(Response<>) && | ||
returnType.ContainsGenericParameters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really just more of a question, but why is this last check necessary? I can't imagine the code would compile otherwise. A generic type constraint is only valid in a type expression like you use in the line above and can't be returned except as a Type
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I recall correctly, this was the situation where we've got something like SearchAsync which returns a Response<...> and we're trying to make sure when we flip from SearchAsync we don't grab Search and try to use its return type without making it specific to Hotel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more descriptive comments to all the bizarre changes I made in this file.
sdk/search/Azure.Search/CHANGELOG.md
Outdated
@@ -1,5 +1,6 @@ | |||
# Release History | |||
|
|||
## 1.0.0-preview.1 (2020) | |||
## 11.0.0-preview.1 (2020-03) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Supposed to be "(Unreleased)" until released, in which case it needs to be yyyy-MM-dd. See release guidelines for details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't we supposed to change that in the PR before we release? This is that PR.
https://azure.github.io/azure-sdk/policies_releases.html has an example of yyyy-MM-dd but it doesn't say that explicitly and a bunch of this wave's releases were just doing 2020-03.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to use the full date on this auspicious Friday the 13th since that's the example in the policy doc.
adding a rich search experience over private, heterogeneous content in web, | ||
mobile, and enterprise applications. | ||
|
||
The **Azure Cognitive Search service** is well suited for the following |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's with the bold type here and below on names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to make it easier to skim. This part is telling you about the service, the next part is telling you about the client, you can skip one or both parts if you just want to see the code.
sdk/search/Azure.Search/README.md
Outdated
- [Declare custom synonym maps to expand or rewrite queries](https://docs.microsoft.com/rest/api/searchservice/synonym-map-operations) | ||
- Most of the `SearchServiceClient` functionality is not yet available in our current preview | ||
|
||
- `SearchIndexClient` helps with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I'd put this above SearchServiceClient
since,
- People will interact with it more.
- It's currently available (as you called out).
- It's lexigraphically first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call - will change.
StringBuilder x => Quote(x.ToString()), | ||
|
||
// Everything else | ||
object x => throw new ArgumentException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use _
instead for the fallback case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using x
in the exception though.
foreach (char ch in text) | ||
{ | ||
builder.Append(ch); | ||
if (ch == '\'') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to worry about double quotes as well, or is that not possible in an OData filter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only single quotes are allowed as string delimiters in OData expressions.
@@ -198,7 +205,8 @@ public class SearchIndexClient | |||
Debug.Assert(!string.IsNullOrEmpty(indexName)); | |||
Debug.Assert(pipeline != null); | |||
Debug.Assert(diagnostics != null); | |||
Debug.Assert(SearchClientOptions.ServiceVersion.V2019_05_06 <= version && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this have been "asserted" (with exceptions) higher up the callstack already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I (try to) use assertions at public facing boundaries and Debug at private boundaries. This is just a sanity check we should never hit. It's verified with an exception everywhere a user passes one.
|
||
#region GetDocumentCount |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO: 🤮
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The region
or the name or ...? I definitely do the region
thing because a lot of the Storage files were MASSIVE.
writer.WritePropertyName("message"); | ||
writer.WriteStringValue(Message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writing the name and value together is faster. Across the board, minimize independent/standalone calls to WritePropertyName
wherever possible.
writer.WritePropertyName("message"); | |
writer.WriteStringValue(Message); | |
writer.WriteString("message", Message); |
And also, just to re-iterate, use statically created JsonEncodedText
for constant names.
I understand this is auto-generated, so probably the fix needs to go in the generator.
writer.WritePropertyName("details"); | ||
writer.WriteStartArray(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here/elsewhere. Will stop mentioning it.
writer.WritePropertyName("details"); | |
writer.WriteStartArray(); | |
writer.WriteStartArray("details"); |
void IUtf8JsonSerializable.Write(Utf8JsonWriter writer) | ||
{ | ||
writer.WriteStartObject(); | ||
if (Code != null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the null check on Code
but not on Message
? Is writing {"code": null, ...}
invalid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is generated from the swagger specification that the service provides as a contract. "code"
will often be null (today for Search - they're working on changing that) but "message"
never will.
SearchError result = new SearchError(); | ||
foreach (var property in element.EnumerateObject()) | ||
{ | ||
if (property.NameEquals("code")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, use UTF-8 encoded bytes for comparison here for these string literals. Presumably this is already being tracked somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
writer.WritePropertyName("@search.action"); | ||
writer.WriteStringValue(ActionType.ToSerialString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
writer.WritePropertyName("@search.action"); | |
writer.WriteStringValue(ActionType.ToSerialString()); | |
writer.WriteString("@search.action", ActionType.ToSerialString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed - all the other uses in the handwritten code are for writing a property before starting an array or nested object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those too can and should be fixed.
For example, in Models/IndexDocumentsBatch{T}.cs
, this:
writer.WritePropertyName("value");
writer.WriteStartArray();
Should be re-written as:
writer.WriteStartArray("value");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will change. Are these recommendations written down anywhere, btw? It'd be good to link to them from the C# guidelines if at all possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have this, but it doesn't focus too heavily on performance/"best practices". We could add some explicit guidelines on how to use the APIs more optimally:
https://docs.microsoft.com/en-us/dotnet/standard/serialization/system-text-json-migrate-from-newtonsoft-how-to#utf8jsonwriter-compared-to-jsontextwriter
// The built in converters for JsonSerializer are a little more | ||
// helpful than we want right now and will do things like turn "1" | ||
// to the integer 1 instead of a string. The number of special |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment true? I am not following the intent here, but I don't think JsonSerializer
does any type coercion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw that happening... but maybe it was something screwy I did?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can chat offline so I can understand what you were seeing and what you intended.
object value = ReadSearchDocObject(ref reader); | ||
list.Add(value); | ||
} | ||
return list.ToArray(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just return the list
or do we need to call ToArray()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Track 1 was really aggressive about arrays here and have a lot of specific customer scenarios that matter. I know I'm already not meeting as many of them as I should by always returning object[]
for this first preview.
Constants.NegativeInfValue => double.NegativeInfinity, | ||
string text => | ||
// JsonReader's TryGetDateTimeOffset doesn't play | ||
// nicely with time zones so we'll do our own parse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify on that please. What type of payload strings fail? TryGetDateTimeOffset
follows the ISO spec.
https://docs.microsoft.com/en-us/dotnet/standard/datetime/system-text-json-support
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the test cases was failing at this point - I'll dig into this as part of the cleaning up the JSON handling follow up work.
{ | ||
if (reader.TokenType == JsonTokenType.EndObject) { break; } | ||
string property = reader.ExpectsPropertyName(); | ||
object value = ReadObject(ref reader); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we OK with recursion here? Could it be unbounded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The service currently limits the nesting of objects in documents to 10 levels deep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a bad question, but the JSON document would have to be continuous/unending, right? Seems unnecessary to check depth unless we wanted to allow customers to not drag down perf with a lot of deeply nested docs. Seems the kind of thing we guage user feedback over time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern was less to do with perf and more to do with avoiding unnecessary/unrecoverable stackoverflow with a potential payload like. Probably at 100-1k depth, the app would crash.
"{\"a\":{\"b\":{....}}"
If the service already has a limit and there is no way for someone to inject "bad" payload in (either maliciously or accidentally), then recursion is fine, since there is an implied bound :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added it out of an abundance of caution, but I agree it's pretty unlikely.
{ | ||
return null; | ||
} | ||
reader.Expects(JsonTokenType.Number); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the purpose of this check to change the exception message and type of exception thrown? GetDouble
already checks/throws.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - to change the exception message, but I'm going to remove these when we clean up the JSON handling in the near future.
public override JsonConverter CreateConverter(Type typeToConvert, JsonSerializerOptions options) | ||
{ | ||
Debug.Assert(CanConvert(typeToConvert)); | ||
Type modelType = typeToConvert.GetGenericArguments()[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this guaranteed to not fail with IndexOutOfRangeException?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the way it's used today. I'll add a Debug.Assert
to make sure it stays that way.
switch (reader.GetString()) | ||
{ | ||
case Constants.InfValue: | ||
return double.PositiveInfinity; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious to learn about why this special handling is required. Is this defined in the Search rest api somewhere? Can you share a link if you have it. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The service implements an OData-compliant REST API and uses the Entity Data Model representations for data types. The representations for positive and negative infinity for type Edm.Double
are "INF" and "-INF", respectively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a link to the relevant part of the OData spec: http://docs.oasis-open.org/odata/odata-json-format/v4.01/odata-json-format-v4.01.html#sec_PrimitiveValue
@@ -0,0 +1,2481 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are some large JSON files. Do we generally check these into the repo (especially PagingStaticDocumentsAsync.json/PagingStaticDocuments.json/etc.)?
As an aside:
Are the contents of the async/non-async json files identical?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The json files are test recordings and yes, we typically check them into the repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the async/non-async are often identical but not always so.
``` | ||
|
||
The request will succeed even if any of the individual actions fails and | ||
return an `IndexDocumentsResult` for inspection. There's also a `ThrowOnAnyError` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding ThrowOnAnyError
-- I'm not sure the explanation makes sense, I guess because this didn't used to be optional. The idea behind throwing IndexBatchException
on 207 was to force partial failures to be handled; Otherwise there could effectively be data loss if any failed updates are ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're going to have a lot of fun digging into this one in more detail. We've been trying really hard not to add other types of exceptions so that using Track 2 means there's one thing you've got to worry about catching. Without a custom exception, we wouldn't be giving people the information to process/recover. This was debated a lot during Storage batching and we landed on this pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The long-term solution is our "smart batching" idea; That makes this problem go away.
You're right that throwing on 207 without having extra info in the exception is an incomplete scenario, which raises the question -- Why have ThrowOnAnyError
at all if we're never going to have IndexBatchException
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I'm done (browser wasn't keeping up as I scrolled through more and more files), changing response. Overall, nothing I would hold this very early preview back for, but some API issues that need to be addressed (some in the custom type definitions, but much more in the autorest.csharp project (not that I called those out; for example, collection properties should be publicly read-only) and never null (use LazyInitializer
when it makes sense to).
/// not include any facet expressions via | ||
/// <see cref="SearchOptions.Facets"/>. | ||
/// </summary> | ||
public IDictionary<string, IList<FacetResult>> Facets => _results.Facets; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be mutable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto on comments above about tracking the generator on these.
{ | ||
if (reader.TokenType == JsonTokenType.EndObject) { break; } | ||
string property = reader.ExpectsPropertyName(); | ||
object value = ReadObject(ref reader); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a bad question, but the JSON document would have to be continuous/unending, right? Seems unnecessary to check depth unless we wanted to allow customers to not drag down perf with a lot of deeply nested docs. Seems the kind of thing we guage user feedback over time.
/// <param name="reader">The JSON reader.</param> | ||
/// <param name="expected">The expected token type.</param> | ||
public static void Expects( | ||
this ref Utf8JsonReader reader, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...though it would pass by value. If the object is larger than a ref (I don't know if it is), is that really what we want for perf?
this Stream json, | ||
CancellationToken cancellationToken) | ||
{ | ||
if (json == null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Stream
is abstract, I recommend json is null
in case - though unlikely - someone overrides equality. In general, is null
is preferred for reference types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
/// </summary> | ||
internal static class Constants | ||
{ | ||
// TODO: XXXXX - Switch constants to use JsonEncodedText |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tip: You might keep the consts for equality, but then add JsonEncodedText
static readonlys for writing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we can use the encoded text for equality as well, but I'll be following up separately on all of that.
@@ -35,11 +35,11 @@ directive: | |||
|
|||
// Document operations | |||
"/docs/$count": $["/docs/$count"], | |||
"/docs/search": $["/docs/search.post.search"], | |||
"/docs/search.post.search": $["/docs/search.post.search"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious about these changes since I'll be updating this block to add in indexing. What didn't work the way you had it before (which we discussed, so I thought I understood what you were doing)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are actually live endpoints. Bruce would know more about why they have both, but I decided not change them when I realized it was intentional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all comes down to OData-isms that actually help avoid common pitfalls of REST API design. We model the POST version of query APIs as OData "actions", which can have namespaces. In this case, the namespace is search.post
and the action name is search
. We support both the qualified and unqualified names for each action mainly for demo-ability and brevity.
People first evaluating an API tend to balk at things like indexes('myindex')/docs/search.post.search
but are comfortable with indexes/myindex/docs/search
. However, the issue with indexes/myindex/docs/search
is that somebody could index a document whose key is search
. In this case, we can disambiguate based on HTTP verb to tell whether it's a Search or a Lookup request, but in general the name-clashing problem is real. By always using the fully-qualified OData syntax in the client libraries, we avoid these problems completely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so this was to basically eliminate the potential for name collisions in our libraries (i.e. this exact change; not the content itself)?
/// Join a collection of strings into a single comma separated string. | ||
/// If the collection is null or empty, a null string will be returned. | ||
/// </summary> | ||
/// <param name="items">The items to join.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could actually take an IEnumerable<T>
and be more permissive. Just use Count()
instead (it's optimized for ICollection/ICollection<T>
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only users are internal IList<T>
so I'm not going to bother yet. If anyone else needs this, then yes, let's use Enumerable's Count which does those type specific optimizations.
/// <returns>A collection of individual values.</returns> | ||
public static IList<string> CommaSplit(string value) => | ||
string.IsNullOrEmpty(value) ? | ||
new List<string>() : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to possibly be mutable? In other projects, I keep a type-cached array (it's what Enumerable.Empty
does internally, but exposes as an IEnumerable<T>
- I think Array
has something though) for this purpose if immutable. Less memory overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - we can give it back to users, they can add some values, then send it back again.
[Test] | ||
public void ManyArguments() | ||
{ | ||
Assert.AreEqual("Foo eq 2 and Bar eq 3", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit here and below: a purist would say "one assert per unit test". While I think that's overkill, what might be better to consider is using a parameterized test with these inputs and outputs as parameters. When a particular tuple asserts, NUnit (via VS, AzPipelines, etc.) will show a much more helpful message at a glance what failed. I have lots of examples in Key Vault if you want to see, but basically you could do something like this:
[TestCaseSource(nameof(ManyArgumentsData))]
public void ManyArguments(FormattableString actual, string expected)
{
Assert.AreEqual(expected, SearchFilter.Create(actual));
}
private static IEnumerable ManyArgumentsData() => new []
{
new object[] { $"Foo eq {2} and Bar eq {3}", "Foo eq 2 and Bar eq 3"},
// ...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where you can use constants, just use [TestCase("foo", 1)]
for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of that was not being an expert in the interpolation compiler decision making about when it turns things into constants strings, etc. But this whole area needs a lot more love and I'd like to clean the tests up when I do that.
{ | ||
SearchRequest result = new SearchRequest(); | ||
SearchOptions result = new SearchOptions(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Use of var
in foreach should be changed to be explicit (auto-gen issue).
{ | ||
writer.WritePropertyName("details"); | ||
writer.WriteStartArray(); | ||
foreach (var item in Details) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Change var
usage.
/// <param name="documents"> | ||
/// The collection of documents to index. | ||
/// </param> | ||
internal IndexDocumentsBatch(IndexActionType type, IEnumerable<T> documents) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a problem that some customers run into when building batches, and I'm not sure how far we want to go to try to solve it (not in preview 1 for sure, but something to think about for the future).
Some customers don't realize that T
is supposed to be a document type, and instead pass a JSON string for the documents
parameter. This compiles because T
can be char
, but the failure mode is really unhelpful. We might want to consider doing some client-side validation for cases like this.
/// </returns> | ||
public static IndexDocumentsBatch<SearchDocument> Delete(string keyName, IEnumerable<string> keyValues) | ||
{ | ||
IndexDocumentsBatch<SearchDocument> batch = new IndexDocumentsBatch<SearchDocument>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tg-msft If you don't like var
, you really wouldn't like let
😉
/// found, 409 for a version conflict, 422 when the index is | ||
/// temporarily unavailable, or 503 for when the service is too busy. | ||
/// </summary> | ||
[CodeGenSchemaMember("statusCode")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's anything we can do in the Swagger to reduce the number of customizations like this that you need to do, feel free to file an issue in azure-rest-api-specs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to align with Response.Status
- I'm not sure other languages will be making this choice. That said, yes, we should sit down across and compare deltas across languages sometime early next preview.
string name = clone.ExpectsPropertyName(); | ||
if (name == Constants.SearchScoreKey) | ||
{ | ||
propertiesNeeded--; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The service won't return duplicate JSON property names in any circumstance I can think of.
propertiesNeeded--; | ||
ReadHighlights(ref clone, result); | ||
} | ||
else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heads up -- We will be adding more properties here over time. One should be arriving hopefully before the final preview release.
// Clone the reader so we can get the search text property without | ||
// advancing the reader over any properties needed to deserialize | ||
// the user's model type. | ||
Utf8JsonReader clone = reader; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reader is a struct I assume?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes (ref struct).
/// Set this to true if you're not inspecting the results of the Index | ||
/// Documents action. | ||
/// </summary> | ||
public bool ThrowOnAnyError { get; set; } = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I agree with false
being the default here. If nothing else, it will probably catch a lot of users who migrate from Track 1 off guard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the pattern we've had for Track 2 batching so far as discussed above. Since the exception doesn't have failure details, we don't want folks to do that by default.
|
||
// Ignore OData properties - we don't expose those on custom | ||
// user schemas | ||
if (!propertyName.StartsWith(Constants.ODataKeyPrefix, StringComparison.OrdinalIgnoreCase)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tg-msft Although we don't have such annotations now, in the future we may add property-scope annotations to the response payload. For example:
{
"@search.score": 3.5,
"myfield": "Hello",
"myfield@search.someInterestingStatistic": 6
}
We wouldn't do this without a Swagger change, so you'd know about it in advance. Just a heads up since this code would have to change to handle that case.
switch (reader.GetString()) | ||
{ | ||
case Constants.InfValue: | ||
return double.PositiveInfinity; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a link to the relevant part of the OData spec: http://docs.oasis-open.org/odata/odata-json-format/v4.01/odata-json-format-v4.01.html#sec_PrimitiveValue
["address"] = Address?.AsDocument(), | ||
// With no elements to infer the type during deserialization, we must assume object[]. | ||
["rooms"] = Rooms?.Select(r => r.AsDocument())?.ToArray() ?? new object[0] | ||
}; | ||
} | ||
|
||
[SerializePropertyNamesAsCamelCase] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This attribute still exists? Can we make it work in Track 2 with System.Text.Json
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No - that's using the Track 1 library in the tests do our index creation. We will definitely be coming up with some answer here in the near future that allows using naming policies.
Just a heads up because I noticed it a few times in this PR. Try to avoid responding to comments/threads as part of a "bulk" review (and consider adding those standalone comment replies separately). Otherwise, your comments show up twice, both in the review feedback and the comment thread you are responding on, making it difficult to track what exactly the comment is responding to. |
I've filed many tracking bugs at https://github.com/Azure/azure-sdk-for-net/issues?q=assignee%3Atg-msft+is%3Aopen+label%3ASearch |
This adds Search, Suggest, Autocomplete, GetDocument, and IndexDocuments. It also adds custom user schema serialization/deserialization.
If you're in a hurry, please prioritize these:
Apologies for the massive PR. I've sliced it up to be easier to review commit-by-commit:
Generated codeRecordings/swaggerPlease specifically call out anything you consider a release blocker. I'll be filing tracking bugs for all my
TODO: XXXXX
comments and updating the PR with bug numbers later today.