support for Stata 14 and 15 #2301 #4708

Merged
merged 55 commits into from
Jul 10, 2018
9a923f5
assert that Stata 13 ingests but 14 and 15 do not #2301
pdurbin May 23, 2018
079a59a
add highly experimental Stata 14 support #2301
pdurbin May 25, 2018
2999548
don't execute ingest API tests on build #2301
pdurbin May 25, 2018
254047b
handle Stata 14 files with labels #2301
pdurbin May 25, 2018
82275f4
add simple Stata 14 (format 118) for testing #2301
pdurbin May 29, 2018
cb26fea
add preliminary support for Stata 15 #2301
pdurbin May 29, 2018
7333dbc
add JUnit tests for Stata 14 (format 118) #2301
pdurbin May 30, 2018
10b5594
add more JUnit tests for various Stata formats #2301
pdurbin May 30, 2018
95207d1
Added basic Stata 13 strl test #2301
matthew-a-dunlap May 30, 2018
4145d10
typo fix #2301
matthew-a-dunlap May 30, 2018
9db1031
Added stata 14 strl test (currently broken and ignored) #2301
matthew-a-dunlap May 30, 2018
1994e8f
add TODO for small Stata 13 file, reformat #2301
pdurbin May 31, 2018
97a7853
replace non-zero offset hack with less terrible hack #2301
pdurbin May 31, 2018
06ef690
add tests for Stata 14 files from Brooke #2301
pdurbin May 31, 2018
b64d51e
StrLs parse correctly and tab file is valid #2301
matthew-a-dunlap May 31, 2018
baa3c7f
add TODO for "Checking section tags across byte buffers" #2301
pdurbin Jun 1, 2018
cedba6c
report on failed ingest for Stata 13 #2301
pdurbin Jun 1, 2018
6d8a125
Merge branch 'develop' into 2301-stata #2301
pdurbin Jun 4, 2018
ce37d68
A quick fix for the unsorted val. label offset table. (#2301)
landreev Jun 5, 2018
5250ed8
update Stata 13 offset set following ce37d68 #2301
pdurbin Jun 5, 2018
1d86783
Merge branch 'develop' into 2301-stata #2301
pdurbin Jun 5, 2018
4ef92fb
Stata 14 unsorted val. label offset fix port #2301
matthew-a-dunlap Jun 5, 2018
e377f19
Added Stata 14 bulk file IT Test iterator #2301
matthew-a-dunlap Jun 5, 2018
fdc6a95
Removal of unused bulk test lines #2301
matthew-a-dunlap Jun 5, 2018
3acfdb3
Fixed attempt to read 0 bytes from Stata value_label_table #2301
matthew-a-dunlap Jun 5, 2018
fb36139
Changes to the Stata 13 File reader.
benjamin-martinez Jun 5, 2018
8cda4bb
dta117 test class
benjamin-martinez Jun 5, 2018
c5bb1d6
Merge branch '2301-stata' of github.com:IQSS/dataverse into 2301-stata
benjamin-martinez Jun 5, 2018
03c1f4d
added the 2 lines that keep the buffer byte offset accurate before bu…
landreev Jun 6, 2018
a53048f
Merge branch '2301-stata' of https://github.com/IQSS/dataverse into 2…
landreev Jun 6, 2018
a772d65
get Stata 13 tests passing, add TODO #2301
pdurbin Jun 6, 2018
1a71a5a
better name for characteristics test, added TODO #2301
pdurbin Jun 6, 2018
d82d634
silenced the debugging messages in bufferMoreBytes() and checkTag() (…
landreev Jun 6, 2018
db0aa79
Merge branch '2301-stata' of https://github.com/IQSS/dataverse into 2…
landreev Jun 6, 2018
645fc11
Removed DTA117-119 replaced with NewDTA which does the job of all 3
oscardssmith Jun 8, 2018
ac9e372
note regression for testCharacteristics, reformat #2301
pdurbin Jun 8, 2018
ea6fd1b
incorperated ben"s fixes to read 0
oscardssmith Jun 8, 2018
80af14c
fixed merge
oscardssmith Jun 8, 2018
9e7c40b
Stata 14 Test File
benjamin-martinez Jun 8, 2018
601ce7f
deleted vestigal files
oscardssmith Jun 8, 2018
8bbb354
fixed the test file, oops
oscardssmith Jun 8, 2018
3c01b8a
Merge branch '2301-stata' of github.com:IQSS/dataverse into 2301-stata
benjamin-martinez Jun 8, 2018
2d2f8c1
logging back to fine, fixed fractional millis
oscardssmith Jun 11, 2018
08b72e1
fractional dates now working
oscardssmith Jun 11, 2018
a921577
minor cosmetic fixes to NewDTAFileReader, minor functional fixes to D…
oscardssmith Jun 12, 2018
5ca4ee7
changes to IngestServiceBean to prevent opening useless StorageIO (al…
oscardssmith Jun 18, 2018
84c0afc
code cleanup (mainly involving number parsing)
oscardssmith Jun 19, 2018
e36f105
cleaned up tests
oscardssmith Jun 20, 2018
bfeb54a
Removed and/or cleaned up lots of old comments; multiple TODOs that a…
landreev Jun 22, 2018
5e06ac0
put the context root path back into glassfish-web.xml
landreev Jun 25, 2018
03cb71b
fixed readUInt, fixed induced error where the incorrect behavior was …
oscardssmith Jun 25, 2018
bb976c9
Merge branch '2301-stata' of https://github.com/IQSS/dataverse into 2…
oscardssmith Jun 25, 2018
daf9b46
removed unused code and added a test to DataReader
oscardssmith Jun 25, 2018
730d74a
Added one last clarifying comment, about an empty string in a string …
landreev Jun 26, 2018
1b222ef
Merge branch 'develop' into 2301-stata #2301
pdurbin Jun 28, 2018
Binary file added downloads/stata-13-test-files/Stata14TestFile.dta
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added scripts/search/data/tabular/stata13-auto.dta
Binary file not shown.
Binary file not shown.
2 changes: 2 additions & 0 deletions src/main/java/MimeTypeDisplay.properties
@@ -18,6 +18,8 @@ application/x-R-2=R Binary
application/x-stata=Stata Binary
application/x-stata-6=Stata Binary
application/x-stata-13=Stata 13 Binary
application/x-stata-14=Stata 14 Binary
application/x-stata-15=Stata 15 Binary
text/x-stata-syntax=Stata Syntax
application/x-spss-por=SPSS Portable
application/x-spss-sav=SPSS SAV
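A quick way to sanity-check entries like these is to load them as key/value pairs and confirm that each Stata MIME type resolves to its own display name. The sketch below assumes the file is read with `java.util.Properties` semantics, where a repeated key silently keeps only the last value. The class name `MimeTypeDisplayCheck` and its `load` helper are hypothetical and not part of this PR.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration only; not part of the PR.
final class MimeTypeDisplayCheck {

    // Parses properties-formatted text into key/value pairs.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load(
                "application/x-stata-13=Stata 13 Binary\n"
              + "application/x-stata-14=Stata 14 Binary\n"
              + "application/x-stata-15=Stata 15 Binary\n");
        // Print the display name that each Stata MIME type resolves to.
        for (String version : new String[] {"13", "14", "15"}) {
            String key = "application/x-stata-" + version;
            System.out.println(key + " -> " + props.getProperty(key));
        }
    }
}
```

Because a duplicated key overrides the earlier entry without any warning, a check of this kind is one way to catch a mistyped MIME type before it reaches the UI.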
2 changes: 2 additions & 0 deletions src/main/java/MimeTypeFacets.properties
@@ -21,6 +21,8 @@ application/x-R-2=data
application/x-stata=data
application/x-stata-6=data
application/x-stata-13=data
application/x-stata-14=data
application/x-stata-15=data
text/x-stata-syntax=data
application/x-spss-por=data
application/x-spss-sav=data
5 changes: 4 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/api/TestIngest.java
@@ -8,6 +8,7 @@

import edu.harvard.iq.dataverse.DataFile;
import edu.harvard.iq.dataverse.DataTable;
import edu.harvard.iq.dataverse.Dataset;
import edu.harvard.iq.dataverse.DatasetServiceBean;
import edu.harvard.iq.dataverse.FileMetadata;
import edu.harvard.iq.dataverse.ingest.IngestServiceBean;
@@ -101,7 +102,7 @@ public String datafile(@QueryParam("fileName") String fileName, @QueryParam("fil
try {
tabDataIngest = ingestPlugin.read(fileInputStream, null);
} catch (IOException ingestEx) {
-output = output.concat("Caught an exception trying to ingest file "+fileName+".");
+output = output.concat("Caught an exception trying to ingest file " + fileName + ": " + ingestEx.getLocalizedMessage());
return output;
}

@@ -121,6 +122,8 @@ public String datafile(@QueryParam("fileName") String fileName, @QueryParam("fil

DataFile dataFile = new DataFile();
dataFile.setStorageIdentifier(tabFilename);
Dataset dataset = new Dataset();
dataFile.setOwner(dataset);

FileMetadata fileMetadata = new FileMetadata();
fileMetadata.setLabel(fileName);
@@ -106,7 +106,7 @@ private static String generateOriginalExtension(String fileType) {
return ".sav";
} else if (fileType.equalsIgnoreCase("application/x-spss-por")) {
return ".por";
-} else if (fileType.equalsIgnoreCase("application/x-stata") || fileType.equalsIgnoreCase("application/x-stata-13")) {
+} else if (fileType.equalsIgnoreCase("application/x-stata") || fileType.equalsIgnoreCase("application/x-stata-13") || fileType.equalsIgnoreCase("application/x-stata-14") || fileType.equalsIgnoreCase("application/x-stata-15")) {
return ".dta";
} else if (fileType.equalsIgnoreCase("application/x-dvn-csvspss-zip")) {
return ".zip";
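As the chain of `equalsIgnoreCase` calls grows with each Stata release, one alternative is a set lookup. The following is a minimal sketch of that idea, not the code in this PR; the class `StataMimeTypes` and the method `isDta` are hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of the PR.
final class StataMimeTypes {

    // The four MIME types that the condition in the diff maps to ".dta".
    private static final Set<String> DTA_TYPES = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList(
                    "application/x-stata",
                    "application/x-stata-13",
                    "application/x-stata-14",
                    "application/x-stata-15")));

    // Case-insensitive membership test, mirroring equalsIgnoreCase in the diff.
    static boolean isDta(String fileType) {
        return fileType != null
                && DTA_TYPES.contains(fileType.toLowerCase(Locale.ROOT));
    }
}
```

With such a helper, supporting a future release would mean adding one string to the set rather than extending the condition.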
@@ -80,7 +80,7 @@ public void onMessage(Message message) {
//Thread.sleep(10000);
logger.fine("Finished ingest job;");
} else {
-logger.warning("Error occurred during ingest job!");
+logger.warning("Error occurred during ingest job for file id " + datafile_id + "!");
}
} catch (Exception ex) {
//ex.printStackTrace();
@@ -55,7 +55,7 @@
import edu.harvard.iq.dataverse.ingest.tabulardata.TabularDataFileReader;
import edu.harvard.iq.dataverse.ingest.tabulardata.TabularDataIngest;
import edu.harvard.iq.dataverse.ingest.tabulardata.impl.plugins.dta.DTAFileReader;
-import edu.harvard.iq.dataverse.ingest.tabulardata.impl.plugins.dta.DTA117FileReader;
+import edu.harvard.iq.dataverse.ingest.tabulardata.impl.plugins.dta.NewDTAFileReader;
import edu.harvard.iq.dataverse.ingest.tabulardata.impl.plugins.dta.DTAFileReaderSpi;
import edu.harvard.iq.dataverse.ingest.tabulardata.impl.plugins.rdata.RDATAFileReader;
import edu.harvard.iq.dataverse.ingest.tabulardata.impl.plugins.rdata.RDATAFileReaderSpi;
@@ -548,9 +548,6 @@ public void produceContinuousSummaryStatistics(DataFile dataFile, File generated
if (dataFile.getDataTable().getDataVariables().get(i).isIntervalContinuous()) {
logger.fine("subsetting continuous vector");

-StorageIO<DataFile> storageIO = dataFile.getStorageIO();
-storageIO.open();
-
if ("float".equals(dataFile.getDataTable().getDataVariables().get(i).getFormat())) {
Float[] variableVector = TabularSubsetGenerator.subsetFloatVector(new FileInputStream(generatedTabularFile), i, dataFile.getDataTable().getCaseQuantity().intValue());
logger.fine("Calculating summary statistics on a Float vector;");
@@ -582,9 +579,6 @@ public void produceDiscreteNumericSummaryStatistics(DataFile dataFile, File gene
&& dataFile.getDataTable().getDataVariables().get(i).isTypeNumeric()) {
logger.fine("subsetting discrete-numeric vector");

-StorageIO<DataFile> storageIO = dataFile.getStorageIO();
-storageIO.open();
-
Long[] variableVector = TabularSubsetGenerator.subsetLongVector(new FileInputStream(generatedTabularFile), i, dataFile.getDataTable().getCaseQuantity().intValue());
// We are discussing calculating the same summary stats for
// all numerics (the same kind of sumstats that we've been calculating
@@ -618,9 +612,6 @@ public void produceCharacterSummaryStatistics(DataFile dataFile, File generatedT
for (int i = 0; i < dataFile.getDataTable().getVarQuantity(); i++) {
if (dataFile.getDataTable().getDataVariables().get(i).isTypeCharacter()) {

-StorageIO<DataFile> storageIO = dataFile.getStorageIO();
-storageIO.open();
-
logger.fine("subsetting character vector");
String[] variableVector = TabularSubsetGenerator.subsetStringVector(new FileInputStream(generatedTabularFile), i, dataFile.getDataTable().getCaseQuantity().intValue());
//calculateCharacterSummaryStatistics(dataFile, i, variableVector);
@@ -678,6 +669,7 @@ public boolean ingestAsTabular(Long datafile_id) { //DataFile dataFile) throws I
// it up with the Ingest Service Provider Registry:
String fileName = dataFile.getFileMetadata().getLabel();
TabularDataFileReader ingestPlugin = getTabDataReaderByMimeType(dataFile.getContentType());
// ingestPlugin may still be null here (the null check follows), so guard the log call.
logger.fine("Using ingest plugin " + (ingestPlugin == null ? "none" : ingestPlugin.getClass()));

if (ingestPlugin == null) {
dataFile.SetIngestProblem();
@@ -742,7 +734,7 @@ public boolean ingestAsTabular(Long datafile_id) { //DataFile dataFile) throws I
dataFile = fileService.save(dataFile);

dataFile = fileService.save(dataFile);
-logger.fine("Ingest failure (IO Exception): "+ingestEx.getMessage()+ ".");
+logger.warning("Ingest failure (IO Exception): " + ingestEx.getMessage() + ".");
return false;
} catch (Exception unknownEx) {
// this is a bit of a kludge, to make sure no unknown exceptions are
@@ -804,6 +796,7 @@ public boolean ingestAsTabular(Long datafile_id) { //DataFile dataFile) throws I
}

if (!postIngestTasksSuccessful) {
logger.warning("Ingest failure (!postIngestTasksSuccessful).");
return false;
}

@@ -850,6 +843,7 @@ public boolean ingestAsTabular(Long datafile_id) { //DataFile dataFile) throws I
}

if (!databaseSaveSuccessful) {
logger.warning("Ingest failure (!databaseSaveSuccessful).");
return false;
}

@@ -900,6 +894,7 @@ public boolean ingestAsTabular(Long datafile_id) { //DataFile dataFile) throws I
logger.warning("Ingest failed to produce data object.");
}

logger.fine("Returning ingestSuccessful: " + ingestSuccessful);
return ingestSuccessful;
}

@@ -952,7 +947,11 @@ public static TabularDataFileReader getTabDataReaderByMimeType(String mimeType)
if (mimeType.equals(FileUtil.MIME_TYPE_STATA)) {
ingestPlugin = new DTAFileReader(new DTAFileReaderSpi());
} else if (mimeType.equals(FileUtil.MIME_TYPE_STATA13)) {
-ingestPlugin = new DTA117FileReader(new DTAFileReaderSpi());
+ingestPlugin = new NewDTAFileReader(new DTAFileReaderSpi(), 117);
+} else if (mimeType.equals(FileUtil.MIME_TYPE_STATA14)) {
+ingestPlugin = new NewDTAFileReader(new DTAFileReaderSpi(), 118);
+} else if (mimeType.equals(FileUtil.MIME_TYPE_STATA15)) {
+ingestPlugin = new NewDTAFileReader(new DTAFileReaderSpi(), 119);
} else if (mimeType.equals(FileUtil.MIME_TYPE_RDATA)) {
ingestPlugin = new RDATAFileReader(new RDATAFileReaderSpi());
} else if (mimeType.equals(FileUtil.MIME_TYPE_CSV) || mimeType.equals(FileUtil.MIME_TYPE_CSV_ALT)) {
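The three Stata branches differ only in the DTA release number passed to `NewDTAFileReader` (117, 118, 119). A table-driven lookup is one way to express that mapping. The sketch below is an illustration rather than the PR's code: the class `StataReleaseLookup` and method `releaseFor` are hypothetical, and it assumes the `FileUtil.MIME_TYPE_STATA13/14/15` constants hold the `application/x-stata-13/14/15` strings seen elsewhere in this diff.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical helper for illustration only; not part of the PR.
final class StataReleaseLookup {

    // MIME type -> DTA release number, as used in the branches of the diff.
    // The string values are assumed to match the FileUtil constants.
    private static final Map<String, Integer> RELEASE_BY_MIME_TYPE = new HashMap<>();

    static {
        RELEASE_BY_MIME_TYPE.put("application/x-stata-13", 117);
        RELEASE_BY_MIME_TYPE.put("application/x-stata-14", 118);
        RELEASE_BY_MIME_TYPE.put("application/x-stata-15", 119);
    }

    // Returns the DTA release for a "new format" Stata MIME type, or empty otherwise.
    static OptionalInt releaseFor(String mimeType) {
        Integer release = RELEASE_BY_MIME_TYPE.get(mimeType);
        return release == null ? OptionalInt.empty() : OptionalInt.of(release);
    }
}
```

A caller could then construct the reader once, when `releaseFor(mimeType)` is present, and fall through to the other plugins otherwise.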
@@ -54,6 +54,8 @@ public class IngestableDataChecker implements java.io.Serializable {
// Map that returns a Stata Release number
private static Map<Byte, String> stataReleaseNumber = new HashMap<Byte, String>();
public static String STATA_13_HEADER = "<stata_dta><header><release>117</release>";
public static String STATA_14_HEADER = "<stata_dta><header><release>118</release>";
public static String STATA_15_HEADER = "<stata_dta><header><release>119</release>";
// Map that returns a reader-implemented mime-type
private static Set<String> readableFileTypes = new HashSet<String>();
private static Map<String, Method> testMethods = new HashMap<String, Method>();
@@ -91,6 +93,8 @@ public class IngestableDataChecker implements java.io.Serializable {
readableFileTypes.add("application/x-spss-por");
readableFileTypes.add("application/x-rlang-transport");
readableFileTypes.add("application/x-stata-13");
readableFileTypes.add("application/x-stata-14");
readableFileTypes.add("application/x-stata-15");

Pattern p = Pattern.compile(regex);
ptn = Pattern.compile(rdargx);
@@ -259,7 +263,45 @@ public String testDTAformat(MappedByteBuffer buff) {
}

}


if ((result == null) && (buff.capacity() >= STATA_14_HEADER.length())) {
// Let's see if it's a "new" STATA (v.14+) format:
buff.rewind();
byte[] headerBuffer = null;
String headerString = null;
try {
headerBuffer = new byte[STATA_14_HEADER.length()];
buff.get(headerBuffer, 0, STATA_14_HEADER.length());
headerString = new String(headerBuffer, "US-ASCII");
} catch (Exception ex) {
// probably a buffer underflow exception;
// we don't have to do anything... null will
// be returned, below.
}
if (STATA_14_HEADER.equals(headerString)) {
result = "application/x-stata-14";
}
}

if ((result == null) && (buff.capacity() >= STATA_15_HEADER.length())) {
// Let's see if it's a Stata 15 (release 119) format:
buff.rewind();
byte[] headerBuffer = null;
String headerString = null;
try {
headerBuffer = new byte[STATA_15_HEADER.length()];
buff.get(headerBuffer, 0, STATA_15_HEADER.length());
headerString = new String(headerBuffer, "US-ASCII");
} catch (Exception ex) {
// probably a buffer underflow exception;
// we don't have to do anything... null will
// be returned, below.
}
if (STATA_15_HEADER.equals(headerString)) {
result = "application/x-stata-15";
}
}

return result;
}
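The Stata 14 and 15 checks above repeat the same steps: rewind, read a fixed-length header, and compare it with a release-specific string. Since the three headers differ only in the release number, the comparison can be sketched as a single loop. This is a hedged illustration, not the PR's implementation; `StataHeaderSniffer` and `detectMimeType` are hypothetical names, and the sketch works on a duplicate of the buffer so the caller's position is left alone.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the PR.
final class StataHeaderSniffer {

    private static final String PREFIX = "<stata_dta><header><release>";
    private static final String SUFFIX = "</release>";

    // Returns e.g. "application/x-stata-14" for release 118,
    // or null when the buffer does not start with a 117-119 header.
    static String detectMimeType(ByteBuffer buff) {
        int headerLength = PREFIX.length() + 3 + SUFFIX.length();
        if (buff.capacity() < headerLength) {
            return null; // too short to hold any of the headers
        }
        ByteBuffer view = buff.duplicate();
        view.clear(); // resets position and limit only; the data is untouched
        byte[] headerBytes = new byte[headerLength];
        view.get(headerBytes, 0, headerLength);
        String header = new String(headerBytes, StandardCharsets.US_ASCII);
        for (int release = 117; release <= 119; release++) {
            if (header.equals(PREFIX + release + SUFFIX)) {
                // Releases 117, 118, 119 correspond to Stata 13, 14, 15.
                return "application/x-stata-" + (release - 104);
            }
        }
        return null;
    }
}
```

Checking the capacity up front avoids relying on a caught buffer underflow exception to signal a file that is too short.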
