
[random] support random fill #5913

Merged: 3 commits merged into apache:master from the non_empty branch on Aug 17, 2020

Conversation

FrozenGene (Member):

This PR supports allocating non-empty values for an NDArray, which solves the problem of incorrect AutoTVM measurements (see #5200). It is one standalone part of Ansor (#5883).

@kevinthesun @merrymercy @minminsun @jcf94

Review threads (outdated, resolved): golang/Makefile, src/runtime/ndarray.cc, python/tvm/runtime/ndarray.py
tqchen (Member) commented Jun 24, 2020

See the comments: we don't need to expose the non_empty function as a primitive function, given that it is not a NumPy function and is only used in AutoTVM. Instead, we can achieve the same goal with:

x = nd.empty(..)
random_fill = get_packed_func("contrib.random.random_fill")
random_fill(x)

Notably, the above solution is better because:

  • Minimum interface exposure (non-AutoTVM devs don't need to be aware of the change)
  • Random initialization is performed on the device; the current implementation actually generates random numbers on the host and then transfers them to the device.
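
For reference, a minimal runnable sketch of this pattern follows. The registered name "tvm.contrib.random.random_fill" and the use of tvm.get_global_func for the lookup are assumptions about how the merged code is invoked, not a verbatim part of this PR:

import tvm

# Allocate an uninitialized NDArray on the target device.
ctx = tvm.cpu(0)
x = tvm.nd.empty((3, 4), dtype="float32", ctx=ctx)

# Look up the packed function registered by the random contrib module
# (name assumed) and fill the buffer in place with random values.
random_fill = tvm.get_global_func("tvm.contrib.random.random_fill")
random_fill(x)

# The buffer now holds generated values rather than uninitialized memory.
print(x.asnumpy())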

@tqchen tqchen self-assigned this Jun 24, 2020
@tqchen tqchen added the status: need update label Jun 24, 2020
junrushao (Member):

I agree with TQ's point. Also, I am a bit concerned about whether we really need a C API for this.

FrozenGene (Member, Author):

I will handle it per the comments.

tqchen (Member) commented Jul 10, 2020

@FrozenGene please follow up

FrozenGene (Member, Author) commented Jul 10, 2020 via email

FrozenGene (Member, Author) commented Jul 17, 2020

> See the comments: we don't need to expose the non_empty function as a primitive function ... instead we can achieve the same goal with nd.empty plus the contrib.random.random_fill packed function.

@tqchen I have restarted this work. This way we can perform the random initialization on the device (i.e., AllocWorkspace uses the correct device API). But as far as I know, we still cannot avoid generating random numbers on the host unless we call device-specific APIs such as cuRAND (for NVIDIA GPUs); that is, we still have to generate random numbers and copy them to the device. However, we do accomplish the goal of generating random numbers on the remote CPU (in RPC mode) and copying the data from the remote CPU (e.g., ARM) to the remote GPU (e.g., Mali) directly, instead of the path x86 CPU -> ARM CPU -> Mali GPU.
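
To illustrate the RPC path just described, here is a hedged sketch; the host address, port, and device are placeholders, and remote.get_function plus the registered name are assumptions about how the RPC session exposes the function:

import tvm
from tvm import rpc

# Connect to the remote RPC server (address/port are placeholders).
remote = rpc.connect("192.168.0.100", 9090)

# Allocate the buffer directly on the remote GPU (e.g., Mali via OpenCL).
ctx = remote.cl(0)
x = tvm.nd.empty((1024,), dtype="float32", ctx=ctx)

# The packed function runs on the remote server: random numbers are
# generated on the remote CPU and copied remote CPU -> remote GPU, so no
# tensor data crosses the RPC link from the local host.
random_fill = remote.get_function("tvm.contrib.random.random_fill")
random_fill(x)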

merrymercy (Member) commented Jul 27, 2020

@FrozenGene Please follow up. It is okay to do the path CPU@remote_device -> GPU@remote_device for now, as long as there is no RPC communication cost (i.e., no local_device -> remote_device transfer).
I remember we tried to do this in our internal repo but failed. What was the problem at that time?

tqchen (Member) commented Aug 6, 2020

Ping

FrozenGene (Member, Author):

> @FrozenGene Please follow up. It is okay to do the path CPU@remote_device -> GPU@remote_device for now, as long as there is no RPC communication cost (i.e., no local_device -> remote_device transfer). I remember we tried to do this in our internal repo but failed. What was the problem at that time?

@merrymercy Our current method introduces a dummy CPU context on the remote side and passes the data to the remote target (like OpenCL or CUDA). What we tried before was to generate non-empty data on the remote target directly, and that failed.

@tqchen's suggestion is that we can leverage the empty interface and fill data into the allocated tensor, which avoids introducing a new non_empty API in the C / NDArray interface and generates random data directly on the remote device. My previous comment was just to note that we may still have to go through the CPU, as in our current approach.

I will follow up on my PR: move our implementation to contrib/random/random.cc and always enable it, since our auto-scheduler's local builder / local runner also relies on it (not just RPC).
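
Since the function lives in the contrib module, callers may want to probe for it rather than assume it exists. A small sketch, assuming the allow_missing flag of tvm.get_global_func and the registered name used above:

import tvm

# allow_missing=True returns None instead of raising when the packed
# function is not registered (e.g., a TVM build without the random contrib).
random_fill = tvm.get_global_func("tvm.contrib.random.random_fill",
                                  allow_missing=True)
if random_fill is None:
    raise RuntimeError("tvm.contrib.random.random_fill is not available; "
                       "rebuild TVM with the random contrib module enabled")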

@FrozenGene FrozenGene force-pushed the non_empty branch 2 times, most recently from de37081 to 696fdd9, August 10, 2020 09:44
@FrozenGene FrozenGene changed the title from "[ndarray][autotvm] support ndarray.non_empty" to "[random] support random fill" Aug 10, 2020
FrozenGene (Member, Author) commented Aug 11, 2020

@merrymercy @tqchen I have updated the code and verified it on a remote CPU / remote Mali GPU. We can do CPU@remote_device -> GPU@remote_device directly, not CPU@host -> CPU@remote_device -> GPU@remote_device.

@FrozenGene FrozenGene requested a review from tqchen August 12, 2020 06:07
jcf94 (Contributor) left a comment:

Looks good.
So we will have a later PR to add clflush & random_fill to AutoTVM & Auto_Scheduler?

FrozenGene (Member, Author):

> Looks good. So we will have a later PR to add clflush & random_fill to AutoTVM & Auto_Scheduler?

Yes.

FrozenGene (Member, Author):

@tqchen @merrymercy gentle ping. The code has been updated.

FrozenGene (Member, Author):

@tqchen @merrymercy @comaniac do you have other comments?

@tqchen tqchen merged commit f731652 into apache:master Aug 17, 2020
tqchen (Member) commented Aug 17, 2020

Thanks @FrozenGene @comaniac @merrymercy

@tqchen tqchen added the status: accepted label and removed the status: need update label Aug 17, 2020
@FrozenGene FrozenGene deleted the non_empty branch August 18, 2020 04:16
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020
merrymercy (Member) commented Aug 27, 2020

@FrozenGene Can you send the follow-up PRs to enable this in Ansor and AutoTVM?

FrozenGene (Member, Author):

Thanks for the reminder, @merrymercy. My agenda is completely full tomorrow and over the weekend; I can do this next week.
