Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add basic C API for PluralRules #552

Merged
merged 12 commits into from
Mar 19, 2021
Merged

Conversation

Manishearth
Copy link
Member

@Manishearth Manishearth commented Mar 16, 2021

This PR adds an icu_capi crate that contains a C API, as well as a simple C test file that can be built and run that is able to call PluralRules::select with an fs data provider.

This does not:

  • Have full support for all kinds of data providers
  • Use any automated tooling
  • Do any fancy cfg() stuff to make it possible to pull in selective APIs

These are all things we can add later as we explore writing this C API.

Some things I still need to do:

  • Add comments everywhere
  • Make Locale's constructor fallible
  • Actually run the "test" on CI and make it more of an actual test

But it's ready for preliminary review.

@Manishearth Manishearth requested a review from a team as a code owner March 16, 2021 04:11
@Manishearth Manishearth changed the title Add sample C API for PluralRules Add basic C API for PluralRules Mar 16, 2021
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Just some assorted comments.

components/capi/Cargo.toml Outdated Show resolved Hide resolved
components/capi/include/locale.h Outdated Show resolved Hide resolved
Comment on lines 20 to 23
typedef enum {
PluralRuleType_Cardinal,
PluralRuleType_Ordinal
} PluralRuleType;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
typedef enum {
PluralRuleType_Cardinal,
PluralRuleType_Ordinal
} PluralRuleType;
typedef enum {
ICU4X_PluralRuleType_Cardinal,
ICU4X_PluralRuleType_Ordinal
} ICU4X_PluralRuleType;

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convention for c enums that I'm used to is all caps, but that can be hard on the eyes. Maybe Icu4x_ as a prefix?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe Icu4x_ as a prefix?

Mmmm, I don't really like the aesthetics of mixed-case on "ICU4X", especially since the first character "I" is confusable with lowercase letter "l". I'd rather it be all uppercase or all lowercase.



CreatePluralRulesResult plural_rules_create(Locale* locale, ErasedDataProvider* provider, PluralRuleType ty);
PluralCategory plural_rules_select(const PluralRules* rules, PluralOperands op);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've written PluralOperands as a data bag (which I agree with, at least for now). We definitely want it in the header file so it can be directly constructed from C, but in the function signature, do we want to pass by value, or should we pass by pointer, just like object types?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think pointer, to me this seems like idiomatic C:

  PluralOperands p;
  PluralCategory cat = plural_rules_select(rules, &p);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it doesn't really matter all that much, and I typically try to pass Copy types by value unless they're huge (this one isn't huge)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's that I don't think of PluralOperands as a value type; I think of it as a bag of values, and I expect a bag of values to be passed by reference/pointer.

We probably don't need PluralOperands to implement Copy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using it as a Copy type within the bindings though I could call .clone(). It's just that .clone() is typically considered to be a heavy operation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, good point. PluralOperands can implement Copy but still be passed by reference, right?

I guess it comes down to a personal preference, unless there are performance impacts (i.e., I would guess that passing a 64-bit pointer is faster than Copying a struct that is a bit larger than that)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they are passed by reference

components/capi/include/pluralrules.h Outdated Show resolved Hide resolved
pub extern "C" fn plural_rules_create(
locale: &ICULocale,
provider: &mut ErasedDataProvider,
ty: PluralRuleType,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an example of where we should think about the implications for data slicing. There are two data keys for PluralRules: CARDINAL and ORDINAL. In Rust, you can pass in the type, and code slicing will remove the unused data key for you. However, in C, you get the whole function, so both data keys are compiled, even if you need only one.

We could:

  1. Manually make custom entrypoints for each Rust API that have one function per data key, like icu4x_plural_rules_open_cardinal() and icu4x_plural_rules_open_ordinal()
  2. Do something more along the lines of what I suggested in wrapper_layer.md, where the C functions wrap Rust one layer below the ergonomic Rust API.
  3. Ignore the problem 😢.
  4. Other?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think (1.) is fine. I wouldn't include it in the API just yet but we can as needed I think.

A problem will be if we wish to support non-erased data providers, we'll have multiple _create functions from the angle of the data providers. So I'm wary of a multiplicative blowup in constructors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was originally thinking that we would do something more along the lines of:

  • The entrypoint APIs like icu4x_plural_rules_create() take a data struct, not a data provider
  • The code to resolve from data provider to data struct is written in the wrapper layer

Since the data provider may be async, this approach is cleaner since it doesn't require that icu4x_plural_rules_create() also be async.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about if we did something along the lines of

ICU4X_Provider_PluralRulesV1* data = icu4x_provider_load_plural_rules_cardinal(provider, ...);
ICU4X_PluralRules* rules = icu4x_plural_rules_create(data, ...);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... this does make it easier for async and stuff.

Do we have any cases where the ICU4X object (like PluralRules) would need to care about the data provider after it has been constructed?

I think we should land the current code as is and file a followup for improving the data provider FFI story, because first off we'd have to add internal APIs for this everywhere.

Comment on lines 57 to 65
#[repr(C)]
pub struct PluralOperands {
pub i: u64,
pub v: usize,
pub w: usize,
pub f: u64,
pub t: u64,
pub c: usize,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you duplicating the enums only temporarily, or do you intend that we should always duplicate them instead of adding #[repr(C)] to the main types?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer permanent duplication, the conversions below will stop compiling if you make incompatible changes, and this allows us to potentially change the Rust side while keeping the C side compatible.

My design goal here is that everything relevant to FFI is in this crate itself so people outside this crate don't need to care about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Longer-term, do you envision auto-generating the FFI enums from Rust enums, or is it by design that you need to update it in both places (to make FFI changes more explicit)?

How do we deal with API docs if we have duplicated structs? If the Rust struct has API docs, what happens with the FFI struct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Longer-term, do you envision auto-generating the FFI enums from Rust enums, or is it by design that you need to update it in both places (to make FFI changes more explicit)?

By design, IMO. I talk about this in the "bridge blocks are self-contained" section of the diplomat doc. cxx has this restriction and I think it's a good thing, requiring your tooling to understand your crate graph can be fraught and subject to things moving out from underneath you.

The API docs are a good point. I don't know what to do here, we could have manual comments asking people to update things cross file, but that's not great. We could potentially teach tooling to scrape docs (and only docs) from other parts of the codebase, but I'm super wary of scraping anything other than docs.

Copy link
Member

@sffc sffc Mar 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should come up with a story for docs soon, but we don't need it right now. Lack of documentation is a key pain point for ICU and ICU wrappers like PyICU.

Maybe something along the lines of #[doc(include = "...")] could help?

https://doc.rust-lang.org/unstable-book/language-features/external-doc.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should at least have a hyperlink back to the main Rust type from the docstring.

Really, the place where we need the docs is in the .h files, and we can't use rustdoc there anyway. We'll need to come up with a solution for documenting the C header files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I would just hyperlink back tbh

I think some of the automated tools can copy docs over as comments (perhaps in ways pandoc will like?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little nervous about letting the C representation drift too far from the Rust structures. There is a Rust library in Firefox with a heavyweight FFI layer that maps between internal Rust types and FFI Rust types that are exposed to C. That has been difficult to test properly, and I think we've had more bugs in that FFI layer than in the Rust or C++ code on either side of it.

Copy link
Member Author

@Manishearth Manishearth Mar 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't drift unless intentionally; due to the From implementation we are essentially checking that they always have identical fields. Any change will cause a compile error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was the intentional drift that I was worried about :)

components/capi/src/provider.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@dminor dminor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good!

components/capi/ctest/test.c Outdated Show resolved Hide resolved
components/capi/ctest/test.c Outdated Show resolved Hide resolved
components/capi/ctest/test.c Outdated Show resolved Hide resolved
components/capi/include/locale.h Outdated Show resolved Hide resolved
Comment on lines 20 to 23
typedef enum {
PluralRuleType_Cardinal,
PluralRuleType_Ordinal
} PluralRuleType;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convention for c enums that I'm used to is all caps, but that can be hard on the eyes. Maybe Icu4x_ as a prefix?

components/capi/src/provider.rs Outdated Show resolved Hide resolved
components/capi/include/pluralrules.h Outdated Show resolved Hide resolved
uint64_t f;
uint64_t t;
size_t c;
} PluralOperands;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the number of operands we have I think this is a good approach. The only other thing that comes to mind would be an enum for the possible operand types, and then a struct with the operand type and a union (yucky...) to hold possible values. This would allow callers to pass in an array of values. That might be something to consider in the future if we expose an API with a large number of options.


typedef struct {
PluralRules* rules;
bool success;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in this case, a nullptr for rules could indicate failure, so success is redundant. If results are always going to be ok or error, we could just use the pointer to indicate this. If not, then maybe we should have an enum of possible results, which could just be Ok/Error for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking of adding a proper error enum to this later, the success is just a placeholder.

@@ -0,0 +1,25 @@
# This file is part of ICU4X. For terms of use, please see the file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're eventually going to have a lot of examples for the C API, I'm wondering if using make is going to cause us unpleasant experiences in the future. I realize it is generic and widely available, but it isn't that nice to work with. On the other hand, if we choose something else, then it is an extra dependency, and something we need to get working in CI. Offhand, I was thinking of something like meson as an alternative build tool.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like make very much either, but it is the de-facto C build system so I figured we should use it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use CMake so that the MSVC build files are auto-generated for Windows.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not using Make is an option we're willing to consider, we should probably have a discussion on C build systems before we chose one off hand. I've used CMake in the past and while it is definitely nicer to work with than Make, it still has some oddities. Other tools, like Meson, can also generate msvc build files, so it would be nice to determine our requirements and see what is out there that would be usable. I'd like to see us pick something as user friendly as possible for example code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be great; but I don't think we necessarily need to block on this just yet -- this is just a quick and dirty test for the API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure, don't block on this.

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • components/capi/ctest/test.c is different
  • components/capi/include/pluralrules.h is different
  • components/capi/src/pluralrules.rs is different
  • components/capi/src/provider.rs is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • components/capi/Cargo.toml is different
  • components/capi/ctest/test.c is different
  • components/capi/include/locale.h is different
  • components/capi/include/pluralrules.h is different
  • components/capi/include/provider.h is different
  • components/capi/src/locale.rs is different
  • components/capi/src/pluralrules.rs is different
  • components/capi/src/provider.rs is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@Manishearth
Copy link
Member Author

I renamed:

  • all methods to icu4x_methodname
  • all types to ICU4XType
  • enum variants to ICU4XType_Variant on the C side (just Variant on the Rust side)

I decided not to use underscores in the type name just to use consistent style with Rust, since C doesn't quite have any one well-accepted style.

@coveralls
Copy link

coveralls commented Mar 16, 2021

Pull Request Test Coverage Report for Build 409f2aa5703e52f3c693b9d50215dc0c8c2baaf7-PR-552

  • 1 of 86 (1.16%) changed or added relevant lines in 4 files are covered.
  • 50 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-1.4%) to 71.464%

Changes Missing Coverage Covered Lines Changed/Added Lines %
components/capi/src/locale.rs 0 8 0.0%
components/capi/src/provider.rs 0 28 0.0%
components/capi/src/pluralrules.rs 0 49 0.0%
Files with Coverage Reduction New Missed Lines %
components/provider/src/erased.rs 50 26.55%
Totals Coverage Status
Change from base Build b6ed6f058a0c3b6566eac78a58f47303bf48830f: -1.4%
Covered Lines: 4946
Relevant Lines: 6921

💛 - Coveralls

components/capi/src/provider.rs Outdated Show resolved Hide resolved
components/capi/src/provider.rs Outdated Show resolved Hide resolved
components/capi/src/pluralrules.rs Outdated Show resolved Hide resolved
components/capi/include/pluralrules.h Outdated Show resolved Hide resolved
}
}

pub fn from_boxed(x: Box<dyn ErasedDataProvider<'static>>) -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why you did this mem::transmute(x) thing for the ErasedDataProvider trait object instead of Box::into_raw like you did for PluralRules?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Box::into_raw is T: Sized 😄

there is currently no stable "raw" representation of trait objects. There's a pointer metadata API in the works that will give us this, in the meantime this is safe because we only care about getting the layout right, and mem::transmute errors if you get the layout wrong.

@codecov-io
Copy link

codecov-io commented Mar 17, 2021

Codecov Report

❗ No coverage uploaded for pull request base (main@2d6664a). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #552   +/-   ##
=======================================
  Coverage        ?   72.57%           
=======================================
  Files           ?      123           
  Lines           ?     6644           
  Branches        ?        0           
=======================================
  Hits            ?     4822           
  Misses          ?     1822           
  Partials        ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2d6664a...48e12ce. Read the comment docs.

@Manishearth
Copy link
Member Author

Added lots of comments

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • components/capi/src/pluralrules.rs is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@sffc sffc requested review from sffc and dminor March 18, 2021 17:14
dminor
dminor previously approved these changes Mar 18, 2021
Copy link
Contributor

@dminor dminor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm! I think it makes sense to land this now and we can iterate on this in follow up work.

That said, I'd like to see #pragma once replaced with include guards before this lands unless someone has a strong reason to keep them.

@Manishearth
Copy link
Member Author

I can make that change

@Manishearth
Copy link
Member Author

pragmas removed!

@Manishearth Manishearth merged commit 91dab1c into unicode-org:main Mar 19, 2021
@Manishearth
Copy link
Member Author

Thanks! Going to file some followups

@Manishearth
Copy link
Member Author

Followups:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants