Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generify DruidDimensionLoader #449

Merged
merged 7 commits into from
Aug 16, 2017

Conversation

kevinhinterlong
Copy link
Member

The DruidDimensionLoader now only tries to load dimensions on datasources that contain that dimension.

@@ -130,11 +127,9 @@ public DruidDimensionsLoader(
// time if `dimensions` were a flat list instead of a list of singleton lists.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't update this comment

@kevinhinterlong
Copy link
Member Author

Not sure why I couldn't review this part, but it appears I missed this as well TimeUnit.MILLISECONDS.toMillis(60000)

Copy link
Contributor

@michael-mclawhorn michael-mclawhorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this code were more load bearing I'd ask for more tests on some of the inner details (making sure that columns are loaded on all table that they're on)

But it's not and I don't care that much about that optimization.

@yahoo yahoo deleted a comment Jul 28, 2017
@kevinhinterlong kevinhinterlong force-pushed the cleanupDruidDimensionLoader branch from 079ae99 to ae84d62 Compare August 2, 2017 15:09
@kevinhinterlong
Copy link
Member Author

kevinhinterlong commented Aug 2, 2017

@michael-mclawhorn This is now more of a dramatic change involding the DimensionRowProvider strategy you mentioned so you'll need to re-review.

Basically what changed is

  • DruidDimensionLoader -> DimensionLoader
  • DimensionLoader now takes a collection of AbstractDimensionRowProvider in the constructor
  • The DruidDimensionLoader druid querying logic has been moved to DruidDimensionRowProvider

@yahoo yahoo deleted a comment Aug 2, 2017
@kevinhinterlong kevinhinterlong changed the title Cleanup Druid Dimension Loader Generify DruidDimensionLoader Aug 2, 2017
@kevinhinterlong
Copy link
Member Author

It also might be interesting to add a NoOpDimensionRowProvider which just updates all the dimensions without loading any DimensionRows

Copy link
Collaborator

@garyluoex garyluoex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most comments are opinions, feel free to discuss if you think otherwise. Besides the opinions, it is goo.

*/
protected DruidDimensionsLoader buildDruidDimensionsLoader(
protected DimensionLoader buildDruidDimensionsLoader(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need more than one dimension loader per application this point right? @michael-mclawhorn .Would be nice if we can deprecate this method in a backward compatible way so that we can change the name of this method also.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could mark both buildDruidDimensionsLoader(...) and setupDruidDimensionsLoader(...) as deprecated and make a method to provide the collection of AbstractDimensionRowProvider?

*
* @param healthCheckRegistry The health check registry to register Dimension lookup health checks
* @param dataDruidDimensionsLoader The DruidDimensionLoader used for monitoring and health checks
* @param dataDimensionLoader The DruidDimensionLoader used for monitoring and health checks
*/
protected final void setupDruidDimensionsLoader(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to keep two separate method for dimension loader setup, would be nice to change this name also.

@@ -879,7 +879,7 @@ protected final ConfigurationLoader getConfigurationLoader() {
* @return A configurationLoader instance
*/
protected ConfigurationLoader buildConfigurationLoader(
DimensionLoader dimensionLoader,
com.yahoo.bard.webservice.data.config.dimension.DimensionLoader dimensionLoader,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should try to rename things anyways, so try to get rid of the conflict by renaming.

/**
* The AbstractDimensionRowProvider provides a way to load dimension rows onto dimensions.
*/
public abstract class AbstractDimensionRowProvider {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DimensionRow or DimensionValue? depends on you, both are fine but personally I think value feels better, the concept of a dimension table is not obvious in fili.

private final List<Dimension> dimensions;
private final List<DataSource> dataSources;

private HttpErrorCallback errorCallback;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class assumes dimensions are provided through http request? Not sure if we can be a bit more generic or not... I think I would prefer a generic retrieval method ignorant class for this class. @michael-mclawhorn

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea it's kind of weird to put it here I agree. I thought it would be easier for the DimensionRowProvider implementation to only use it where it's applicable.

* @param dimension The dimension to load.
* @param dataSource The datasource to query values for.
*/
protected abstract void query(Dimension dimension, DataSource dataSource);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put this to the top of method list so that it is more obvious to find.

private HttpErrorCallback errorCallback;
private FailureCallback failureCallback;
private BiConsumer<Dimension, DimensionRow> dimensionRowConsumer;
private Consumer<Dimension> dimensionLoadedConsumer;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not to store these two consumers, but instead pass in to function that uses consumer. It feels weird that a provider stores the consumer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively I could just pull the two methods that are actually used from DimensionLoader into AbstractDimensionRowProvider since they're unlikely to change

@Override
public void accept(Dimension dimension, DimensionRow dimensionRow) {
    dimension.addDimensionRow(dimensionRow);
}

@Override
public void accept(Dimension dimension) {
    // Tell the dimension it's been updated
    dimension.setLastUpdated(DateTime.now());
}

*/
@Singleton
public class DimensionLoader extends Loader<Boolean>
implements BiConsumer<Dimension, DimensionRow>, Consumer<Dimension> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if implementing consumer is useful here, it is unclear that a loader is a consumer.

@yahoo yahoo deleted a comment Aug 15, 2017
@yahoo yahoo deleted a comment Aug 15, 2017
@kevinhinterlong kevinhinterlong force-pushed the cleanupDruidDimensionLoader branch from 3f4a01a to f379906 Compare August 15, 2017 14:55
@yahoo yahoo deleted a comment Aug 15, 2017
@yahoo yahoo deleted a comment Aug 15, 2017
@yahoo yahoo deleted a comment Aug 15, 2017
@yahoo yahoo deleted a comment Aug 15, 2017
@yahoo yahoo deleted a comment Aug 15, 2017
@yahoo yahoo deleted a comment Aug 15, 2017
@yahoo yahoo deleted a comment Aug 15, 2017
@yahoo yahoo deleted a comment Aug 15, 2017
@yahoo yahoo deleted a comment Aug 15, 2017
@kevinhinterlong kevinhinterlong force-pushed the cleanupDruidDimensionLoader branch from f379906 to 380b146 Compare August 15, 2017 15:10
Copy link
Contributor

@michael-mclawhorn michael-mclawhorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Provider should probably be a Loader.

Suggested fix:
Change interface Loader to LoadTask
Change DimensionValueLoader to DimensionValueLoadTask
Change DimensionValueProvider to DimensionValueLoader

) {
this.dimensions = dimensionsToLoad.stream()
.map(dimensionDictionary::findByApiName)
.collect(Collectors.toList());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be better as a Set?

private final Collection<AbstractDimensionValueProvider> dimensionRowProviders;

/**
* DimensionLoader fetches data from Druid and adds it to the dimension cache.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Druid?


/**
* DimensionLoader sends requests to the druid search query interface to get a list of dimension
* values to add to the dimension cache.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Druid?

SYSTEM_CONFIG.getPackageVariableName("druid_dim_loader_timer_duration");
public static final String DRUID_DIM_LOADER_TIMER_DELAY_KEY =
SYSTEM_CONFIG.getPackageVariableName("druid_dim_loader_timer_delay");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Druid in local constant name?

Changing the configuration parameter will require a deprecation.


private static final long TEN_YEARS_MILLIS = 10 * TimeUnit.DAYS.toMillis(365);
private static final Duration DURATION = new Duration(TEN_YEARS_MILLIS);
private static final Interval INTERVAL = new Interval(DURATION, DateTime.now());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* @param dimensionDictionary The dimension dictionary to load dimensions from.
* @param dimensionsToLoad The dimensions to use.
*/
public AbstractDimensionValueProvider(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See if you can create an interface for this

@michael-mclawhorn
Copy link
Contributor

Resolve conflicts, Add the interface name change to the CHANGE LOG, rebase and then someone can merge at will.

- This fixed the naming schema by moving DimensionRow -> DimensionValue
- Removed the abstraction of consumers for loading dimensions
- Loader -> LoadTask
	- Lots of other refactoring got dragged in
	- DimensionValueLoader -> DimensionValueLoadTask
- DimensionValueProvider -> DimensionValueLoader (interface)
- Fix tests
@kevinhinterlong kevinhinterlong force-pushed the cleanupDruidDimensionLoader branch from 839ee60 to cbcc50c Compare August 15, 2017 23:49
@garyluoex garyluoex merged commit 5de3050 into yahoo:master Aug 16, 2017
@kevinhinterlong kevinhinterlong deleted the cleanupDruidDimensionLoader branch August 16, 2017 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants