solr data source, parse multiValued fields #5

Closed
atomotic opened this issue May 8, 2017 · 5 comments

Comments

@atomotic
atomotic commented May 8, 2017

a very common use case is to populate a solr index from a csv file, which is fairly straightforward:

solr create -c reconcile
post -c reconcile data.csv

the default "schemaless" configuration defines all fields as multiValued.
for example, a field (csv column) label_en is created with no explicit "multiValued":false:

http://localhost:8983/solr/reconcile/schema/fields/label_en

{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "field":{
    "name":"label_en",
    "type":"strings"
    }}
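The field response above carries no `multiValued` key because the property is inherited from the field type. A minimal Python sketch of that lookup follows. The `strings` type definition used here is an assumption, not something shown in this thread, so check it against your own schema; `is_multivalued` is a hypothetical helper name.

```python
import json


def is_multivalued(field: dict, field_type: dict) -> bool:
    """Field-level setting wins; otherwise fall back to the field type default."""
    if "multiValued" in field:
        return bool(field["multiValued"])
    return bool(field_type.get("multiValued", False))


# Field definition as returned in the response above.
field = json.loads('{"name": "label_en", "type": "strings"}')

# ASSUMPTION: definition of the "strings" field type, not taken from the thread.
strings_type = {"name": "strings", "class": "solr.StrField", "multiValued": True}

print(is_multivalued(field, strings_type))
```

Declaring `"multiValued": false` on the field itself would override the type default, which is the schema change the question below tries to avoid.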

the query will result in:

<doc>
    <arr name="label_en">
      <str>forgery, falsification and theft of artworks</str>
    </arr>
   ....
</doc>
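To make the parsing question concrete, here is a small Python sketch that reads a `<doc>` element like the one above and returns a field's values as a list, whether Solr wrapped them in `<arr>` or not. It illustrates the idea only and is not how conciliator does it; `field_values` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List

SAMPLE = """<doc>
    <arr name="label_en">
      <str>forgery, falsification and theft of artworks</str>
    </arr>
</doc>"""


def field_values(doc: ET.Element, name: str) -> List[str]:
    """Return all values of a field, whether stored as <arr> or as a single element."""
    for child in doc:
        if child.get("name") != name:
            continue
        if child.tag == "arr":
            # Multivalued field: one child element per value.
            return [(item.text or "") for item in child]
        # Single-valued field: the element holds the value directly.
        return [child.text or ""]
    return []


doc = ET.fromstring(SAMPLE)
values = field_values(doc, "label_en")
print(values)
```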

would it be easy to implement parsing of this result, rather than modifying the solr schema?
thanks

@codeforkjeff
Owner

Hi, thanks for opening an issue!

The tricky part is determining which value(s) should be returned to OpenRefine for a multivalued field. Should only the first value be used? Should all the values be concatenated together, separated by some character? Or should there be a way to add logic to determine which value is most relevant to the query performed?

@atomotic
Author

atomotic commented May 9, 2017

for the basic csv use case i think it is enough to use the first value.
more complex indexes may require some logic.
this is my first time using solr, and i'm still wrapping my head around its schema model.

thanks

@codeforkjeff
Owner

codeforkjeff commented May 12, 2017

I've created a v2.3.0 pre-release version that parses multivalued fields. You can download it here. Could you please try it out when you get a chance?

The default behavior is to concatenate all the values. If you want only the first value, set datasource.solr.field.name.multivalue.strategy to first in the properties file.
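The two behaviors described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's actual code: the function name `resolve_multivalue`, the strategy label `"concat"`, and the `", "` delimiter are all hypothetical. Only the `first` value comes from this thread.

```python
from typing import List


def resolve_multivalue(values: List[str], strategy: str = "concat",
                       delimiter: str = ", ") -> str:
    """Collapse a multivalued field into the single string returned to the client."""
    if not values:
        return ""
    if strategy == "first":
        # Keep only the first value, as suggested for the basic csv use case.
        return values[0]
    if strategy == "concat":
        # ASSUMPTION: label and delimiter are illustrative, not the real defaults.
        return delimiter.join(values)
    raise ValueError(f"unknown strategy: {strategy!r}")


labels = ["forgery", "falsification", "theft of artworks"]
print(resolve_multivalue(labels))
print(resolve_multivalue(labels, strategy="first"))
```

For a field holding a single value, such as the csv example, both strategies return the same string.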

@atomotic
Author

well done! it works, thanks.
i may write a small tutorial on how to set up solr and populate an index from csv + conciliator
(actually i was thinking of a docker image too)

thanks again

@codeforkjeff
Owner

Glad to hear it! Thanks for suggesting this improvement, and please do let me know if you end up writing a tutorial or docker image.
