
Blobs are N-D arrays (for N not necessarily equals 4) #1486

Closed
wants to merge 65 commits from the tensor-blob branch

Conversation

jeffdonahue
Contributor

This PR gives Blobs a vector<int> of dimensions, rather than the old num, channels, height, width. The first commit is all the changes needed to get Caffe to compile and everything to run as before with the new vector of tensor dimensions. The remaining commits generalize some existing classes to use the new tensor dimensions (but they are not necessary to make it run, as it's still fine to just use all 4-D blobs with extra singleton dimensions where needed).

Currently I think the only problem is that in the InnerProductLayer the weight blobs won't be compatible with existing saved nets (since the weights are now 2D tensors and biases are 1D tensors), so I need to add something to handle that case for backwards compatibility.
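For illustration, a minimal sketch of the new interface, assuming the vector-shape Reshape and the shape()/num_axes() accessors added in the first commit (names here are illustrative, not necessarily the final API):

vector<int> weight_shape(2);
weight_shape[0] = num_output;  // e.g. the output dimension of an InnerProductLayer
weight_shape[1] = num_input;   // the input dimension
Blob<float> weight;
weight.Reshape(weight_shape);  // a 2-D weight blob, no singleton padding
CHECK_EQ(weight.num_axes(), 2);
CHECK_EQ(weight.shape(0), num_output);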

@bhack
Contributor

bhack commented Nov 26, 2014

Has anyone analyzed how this will cross-impact all the other PRs (some of which are aging in the queue)?

@jeffdonahue
Contributor Author

It should have little to no impact technically; everything works fine with just my first commit (and no changes whatsoever to existing layers). As a result of this PR, we might ask some pending PRs for minimal changes so they don't assume 4-D dimensions before we merge them into the official repo, but they will work fine without those changes.

@longjon
Contributor

longjon commented Nov 26, 2014

I took a really rushed pass over the first commit; it looks pretty good. I'm glad we can do this so quickly and fairly noninvasively!

@jeffdonahue
Contributor Author

Thanks for the feedback @longjon!

@sirotenko

Upd.: already done.
How about keeping width, height, num, and channels as shortcuts for the first 4 dimensions?
When I implemented similar functionality several years ago in the cudacnn library, I found it convenient to use width and height instead of shape[0], shape[1].

@shelhamer
Member

@sirotenko yeah, that is why @jeffdonahue kept those properties for shorthand: jeffdonahue@3715eab#diff-5c854864685133b02ed80f33ba8ad535R73
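Roughly, those shorthands just index into the shape vector with legacy bounds checks; a sketch of the idea (the commit's actual helper may differ):

// Legacy accessors reduce to indexing the shape vector (illustrative sketch).
inline int num() const { return LegacyShape(0); }       // shape(0)
inline int channels() const { return LegacyShape(1); }  // shape(1)
inline int height() const { return LegacyShape(2); }    // shape(2)
inline int width() const { return LegacyShape(3); }     // shape(3)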

@shelhamer
Member

Hey Jeff, this looks sweet! I'm pulling this for review on my flight along
with JL's latest PRs so we can warm up the merge machine and come out of
the deadline quiet.
On Mon, Dec 1, 2014 at 11:30 souzou notifications@github.com wrote:

Hello,
I made all these changes and compiled, but there were many errors; I tried to correct them:

  1. In solver.hpp, we should add this line to the protected section:
    int current_step_;
  2. In io.cpp, the definition of the function CVMatToDatum should come
    before its first use (see the note after this list)
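(For instance, a forward declaration near the top of io.cpp also resolves the ordering issue without moving the definition:)

void CVMatToDatum(const cv::Mat& cv_img, Datum* datum);  // declare before first use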

#include <fcntl.h>
#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl.h>
#include <google/protobuf/text_format.h>
#include <hdf5.h>
#include <hdf5_hl.h>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <stdint.h>
#include <fstream>  // NOLINT(readability/streams)
#include <string>
#include <vector>
#include "caffe/common.hpp"
#include "caffe/proto/caffe.pb.h"
#include "caffe/util/io.hpp"
namespace caffe {
using google::protobuf::io::FileInputStream;
using google::protobuf::io::FileOutputStream;
using google::protobuf::io::ZeroCopyInputStream;
using google::protobuf::io::CodedInputStream;
using google::protobuf::io::ZeroCopyOutputStream;
using google::protobuf::io::CodedOutputStream;
using google::protobuf::Message;
bool ReadProtoFromTextFile(const char* filename, Message* proto) {
  int fd = open(filename, O_RDONLY);
  CHECK_NE(fd, -1) << "File not found: " << filename;
  FileInputStream* input = new FileInputStream(fd);
  bool success = google::protobuf::TextFormat::Parse(input, proto);
  delete input;
  close(fd);
  return success;
}

void WriteProtoToTextFile(const Message& proto, const char* filename) {
  int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  FileOutputStream* output = new FileOutputStream(fd);
  CHECK(google::protobuf::TextFormat::Print(proto, output));
  delete output;
  close(fd);
}

bool ReadProtoFromBinaryFile(const char* filename, Message* proto) {
  int fd = open(filename, O_RDONLY);
  CHECK_NE(fd, -1) << "File not found: " << filename;
  ZeroCopyInputStream* raw_input = new FileInputStream(fd);
  CodedInputStream* coded_input = new CodedInputStream(raw_input);
  coded_input->SetTotalBytesLimit(1073741824, 536870912);
  bool success = proto->ParseFromCodedStream(coded_input);
  delete coded_input;
  delete raw_input;
  close(fd);
  return success;
}

void CVMatToDatum(const cv::Mat& cv_img, Datum* datum) {
  CHECK(cv_img.depth() == CV_8U) << "Image data type must be unsigned byte";
  datum->set_channels(cv_img.channels());
  datum->set_height(cv_img.rows);
  datum->set_width(cv_img.cols);
  datum->clear_data();
  datum->clear_float_data();
  datum->set_encoded(false);
  int datum_channels = datum->channels();
  int datum_height = datum->height();
  int datum_width = datum->width();
  int datum_size = datum_channels * datum_height * datum_width;
  std::string buffer(datum_size, ' ');
  for (int h = 0; h < datum_height; ++h) {
    const uchar* ptr = cv_img.ptr(h);
    int img_index = 0;
    for (int w = 0; w < datum_width; ++w) {
      for (int c = 0; c < datum_channels; ++c) {
        // Convert from OpenCV's interleaved HWC layout to Datum's planar CHW.
        int datum_index = (c * datum_height + h) * datum_width + w;
        buffer[datum_index] = static_cast<char>(ptr[img_index++]);
      }
    }
  }
  datum->set_data(buffer);
}

void WriteProtoToBinaryFile(const Message& proto, const char* filename) {
  std::fstream output(filename,
      std::ios::out | std::ios::trunc | std::ios::binary);
  CHECK(proto.SerializeToOstream(&output));
}

cv::Mat ReadImageToCVMat(const string& filename,
    const int height, const int width, const bool is_color) {
  cv::Mat cv_img;
  int cv_read_flag = (is_color ? CV_LOAD_IMAGE_COLOR :
      CV_LOAD_IMAGE_GRAYSCALE);
  cv::Mat cv_img_origin = cv::imread(filename, cv_read_flag);
  if (!cv_img_origin.data) {
    LOG(ERROR) << "Could not open or find file " << filename;
    return cv_img_origin;
  }
  if (height > 0 && width > 0) {
    cv::resize(cv_img_origin, cv_img, cv::Size(width, height));
  } else {
    cv_img = cv_img_origin;
  }
  return cv_img;
}
bool ReadImageToDatum(const string& filename, const int label,
    const int height, const int width, const bool is_color, Datum* datum) {
  cv::Mat cv_img = ReadImageToCVMat(filename, height, width, is_color);
  if (cv_img.data) {
    CVMatToDatum(cv_img, datum);
    datum->set_label(label);
    return true;
  } else {
    return false;
  }
}

bool ReadFileToDatum(const string& filename, const int label,
    Datum* datum) {
  std::streampos size;
  std::fstream file(filename.c_str(),
      std::ios::in | std::ios::binary | std::ios::ate);
  if (file.is_open()) {
    size = file.tellg();
    std::string buffer(size, ' ');
    file.seekg(0, std::ios::beg);
    file.read(&buffer[0], size);
    file.close();
    datum->set_data(buffer);
    datum->set_label(label);
    datum->set_encoded(true);
    return true;
  } else {
    return false;
  }
}
cv::Mat DecodeDatumToCVMat(const Datum& datum,
    const int height, const int width, const bool is_color) {
  cv::Mat cv_img;
  CHECK(datum.encoded()) << "Datum not encoded";
  int cv_read_flag = (is_color ? CV_LOAD_IMAGE_COLOR :
      CV_LOAD_IMAGE_GRAYSCALE);
  const string& data = datum.data();
  std::vector<char> vec_data(data.c_str(), data.c_str() + data.size());
  if (height > 0 && width > 0) {
    cv::Mat cv_img_origin = cv::imdecode(cv::Mat(vec_data), cv_read_flag);
    cv::resize(cv_img_origin, cv_img, cv::Size(width, height));
  } else {
    cv_img = cv::imdecode(vec_data, cv_read_flag);
  }
  if (!cv_img.data) {
    LOG(ERROR) << "Could not decode datum ";
  }
  return cv_img;
}

// If the Datum is encoded, decode it using DecodeDatumToCVMat and
// CVMatToDatum; if height and width are set, it will also resize the image.
// If the Datum is not encoded, do nothing.
bool DecodeDatum(const int height, const int width, const bool is_color,
    Datum* datum) {
  if (datum->encoded()) {
    cv::Mat cv_img = DecodeDatumToCVMat((*datum), height, width, is_color);
    CVMatToDatum(cv_img, datum);
    return true;
  } else {
    return false;
  }
}

// Verifies the format of data stored in an HDF5 file and reshapes the blob
// accordingly.
template <typename Dtype>
void hdf5_load_nd_dataset_helper(
    hid_t file_id, const char* dataset_name_, int min_dim, int max_dim,
    Blob<Dtype>* blob) {
  // Verify that the dataset exists.
  CHECK(H5LTfind_dataset(file_id, dataset_name_))
      << "Failed to find HDF5 dataset " << dataset_name_;
  // Verify that the number of dimensions is in the accepted range.
  herr_t status;
  int ndims;
  status = H5LTget_dataset_ndims(file_id, dataset_name_, &ndims);
  CHECK_GE(status, 0) << "Failed to get dataset ndims for " << dataset_name_;
  CHECK_GE(ndims, min_dim);
  CHECK_LE(ndims, max_dim);
  // Verify that the data format is what we expect: float or double.
  std::vector<hsize_t> dims(ndims);
  H5T_class_t class_;
  status = H5LTget_dataset_info(
      file_id, dataset_name_, dims.data(), &class_, NULL);
  CHECK_GE(status, 0) << "Failed to get dataset info for " << dataset_name_;
  CHECK_EQ(class_, H5T_FLOAT) << "Expected float or double data";
  vector<int> blob_dims(dims.size());
  for (int i = 0; i < dims.size(); ++i) {
    blob_dims[i] = dims[i];
  }
  blob->Reshape(blob_dims);
}
template <>
void hdf5_load_nd_dataset<float>(hid_t file_id, const char* dataset_name_,
    int min_dim, int max_dim, Blob<float>* blob) {
  hdf5_load_nd_dataset_helper(file_id, dataset_name_, min_dim, max_dim, blob);
  herr_t status = H5LTread_dataset_float(
      file_id, dataset_name_, blob->mutable_cpu_data());
  CHECK_GE(status, 0) << "Failed to read float dataset " << dataset_name_;
}

template <>
void hdf5_load_nd_dataset<double>(hid_t file_id, const char* dataset_name_,
    int min_dim, int max_dim, Blob<double>* blob) {
  hdf5_load_nd_dataset_helper(file_id, dataset_name_, min_dim, max_dim, blob);
  herr_t status = H5LTread_dataset_double(
      file_id, dataset_name_, blob->mutable_cpu_data());
  CHECK_GE(status, 0) << "Failed to read double dataset " << dataset_name_;
}
template <>
void hdf5_save_nd_dataset<float>(
    const hid_t file_id, const string dataset_name, const Blob<float>& blob) {
  hsize_t dims[HDF5_NUM_DIMS];
  dims[0] = blob.num();
  dims[1] = blob.channels();
  dims[2] = blob.height();
  dims[3] = blob.width();
  herr_t status = H5LTmake_dataset_float(
      file_id, dataset_name.c_str(), HDF5_NUM_DIMS, dims, blob.cpu_data());
  CHECK_GE(status, 0) << "Failed to make float dataset " << dataset_name;
}

template <>
void hdf5_save_nd_dataset<double>(
    const hid_t file_id, const string dataset_name, const Blob<double>& blob) {
  hsize_t dims[HDF5_NUM_DIMS];
  dims[0] = blob.num();
  dims[1] = blob.channels();
  dims[2] = blob.height();
  dims[3] = blob.width();
  herr_t status = H5LTmake_dataset_double(
      file_id, dataset_name.c_str(), HDF5_NUM_DIMS, dims, blob.cpu_data());
  CHECK_GE(status, 0) << "Failed to make double dataset " << dataset_name;
}
} // namespace caffe
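Note that the two save specializations above still hard-code the four legacy axes. A minimal sketch of an N-D-aware save using the blob's full shape vector (a hypothetical helper, not code from this PR):

void hdf5_save_nd_dataset_float_nd(
    const hid_t file_id, const string& dataset_name, const Blob<float>& blob) {
  // Use the blob's full shape vector instead of fixed num/channels/height/width.
  std::vector<hsize_t> dims(blob.num_axes());
  for (int i = 0; i < blob.num_axes(); ++i) {
    dims[i] = blob.shape(i);
  }
  herr_t status = H5LTmake_dataset_float(
      file_id, dataset_name.c_str(), blob.num_axes(), dims.data(),
      blob.cpu_data());
  CHECK_GE(status, 0) << "Failed to make float dataset " << dataset_name;
}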

@souzou

souzou commented Dec 3, 2014

Hello ,
I downloaded the caffe-tensor-blob repository written by jeffdonahue (I tried to make these changes against the caffe-master repository, but got many compilation errors), so I compiled it with:

make all ---> success
make test ---> success
make runtest ---> failed

AdaGradSolverTest/2.TestAdaGradLeastSquaresUpdateWithWeightDecay
F1203 17:56:11.812132 3091 euclidean_loss_layer.cpp:14] Check failed: bottom[0]->shape() == bottom[1]->shape()
*** Check failure stack trace: ***
@ 0x2ba445ceddaa (unknown)
@ 0x2ba445cedce4 (unknown)
@ 0x2ba445ced6e6 (unknown)
@ 0x2ba445cf0687 (unknown)
@ 0x7b7284 caffe::EuclideanLossLayer<>::Reshape()
@ 0x7e6731 caffe::Net<>::Init()
@ 0x7e839e caffe::Net<>::Net()
@ 0x75c550 caffe::Solver<>::InitTrainNet()
@ 0x75d806 caffe::Solver<>::Init()
@ 0x75d966 caffe::Solver<>::Solver()
@ 0x571726 caffe::AdaGradSolverTest<>::InitSolver()
@ 0x571b6b caffe::GradientBasedSolverTest<>::InitSolverFromProtoString()
@ 0x566a49 caffe::GradientBasedSolverTest<>::RunLeastSquaresSolver()
@ 0x56bbd6 caffe::AdaGradSolverTest_TestAdaGradLeastSquaresUpdateWithWeightDecay_Test<>::TestBody()
@ 0x6fafd3 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x6f1cb7 testing::Test::Run()
@ 0x6f1d5e testing::TestInfo::Run()
@ 0x6f1e65 testing::TestCase::Run()
@ 0x6f51a8 testing::internal::UnitTestImpl::RunAllTests()
@ 0x6f5437 testing::UnitTest::Run()
@ 0x42186a main
@ 0x2ba448b0eec5 (unknown)
@ 0x42a4de (unknown)
@ (nil) (unknown)
Aborted
make: *** [runtest] Erreur 134

If anyone has an explanation, please share.

Thanks,

@souzou

souzou commented Dec 3, 2014

Is there any documentation on how to use the tensor blob?
How do I create the data layer?
Is there any specification of the text file that contains the image paths and the labels?

@jeffdonahue jeffdonahue force-pushed the tensor-blob branch 3 times, most recently from 73e15bc to 12e8c11 Compare December 3, 2014 23:47
@souzou

souzou commented Dec 15, 2014

Caffe can read 3 formats (as far as I know):

  • leveldb
  • lmdb
  • hdf5

The easiest way is to store the images somewhere as JPGs (or maybe another image format) and then create two text files (trainingset.txt, testset.txt) in the following format:

PATH_TO_RGB_IMAGE PATH_TO_Gray_IMAGE LABEL
PATH_TO_RGB_IMAGE PATH_TO_Gray_IMAGE LABEL
PATH_TO_RGB_IMAGE PATH_TO_Gray_IMAGE LABEL
...

where LABEL is a number. For me, only a relative path from the solver file to the image file worked for PATH_TO_IMAGE.

I tried to use the convert_imageset tool (in the tools directory) to create a leveldb or an lmdb, but these db files don't support the N-D blobs that you have to describe in your net layout file.
My question is: what changes should I make to create an HDF5 file, since only this format supports N-D blobs? And is there any documentation about HDF5?

Thanks!
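There isn't much documentation yet, but roughly: the HDF5 data layer reads datasets (conventionally named "data" and "label") from the .h5 files listed in its source text file. A minimal sketch of writing such a file with the plain HDF5 C API follows; the file name, shapes, and contents are just placeholders:

#include <hdf5.h>
#include <hdf5_hl.h>
#include <vector>

int main() {
  // Example only: write num samples of shape 3 x 32 x 32 plus 1-D labels
  // into train.h5, then list "train.h5" in the text file that the HDF5
  // data layer's source parameter points to.
  const int num = 10, channels = 3, height = 32, width = 32;
  std::vector<float> data(num * channels * height * width, 0.0f);
  std::vector<float> label(num, 0.0f);
  hid_t file_id = H5Fcreate("train.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
  hsize_t data_dims[4] = {num, channels, height, width};
  hsize_t label_dims[1] = {num};
  H5LTmake_dataset_float(file_id, "data", 4, data_dims, data.data());
  H5LTmake_dataset_float(file_id, "label", 1, label_dims, label.data());
  H5Fclose(file_id);
  return 0;
}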

num/channels/height/width indexing is valid.
from saved NetParameter

Want to keep the param Blob shape the layer has set, and not necessarily
adopt the one from the saved net (e.g. want to keep new 1D bias shape,
rather than take the (1 x 1 x 1 x D) shape from a legacy net).
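That is, when a legacy (1 x 1 x 1 x D) param is loaded into a layer that now expects a 1-D blob, it is enough that the element counts match; a sketch of the idea using a hypothetical helper (not the PR's actual code):

// Sketch of the compatibility idea: accept a saved param whose element
// count matches, but keep the shape the layer set (e.g. the new 1-D bias
// shape) rather than the legacy 1 x 1 x 1 x D shape.
template <typename Dtype>
void CopyParamKeepingShape(const BlobProto& saved, Blob<Dtype>* param) {
  const int saved_count =
      saved.num() * saved.channels() * saved.height() * saved.width();
  CHECK_EQ(saved_count, param->count()) << "Param counts must match";
  // Copy the raw data; the param keeps the shape the layer set.
  for (int i = 0; i < param->count(); ++i) {
    param->mutable_cpu_data()[i] = saved.data(i);
  }
}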
@dzhwinter

@jeffdonahue @longjon how can I do convolution on an N-D array with more than 4 axes?
