Verify checksum after download #73

Closed
havardgulldahl opened this issue Jan 17, 2016 · 8 comments

Comments

@havardgulldahl
Owner

Via private email, the suggestion was raised that our tools could automatically verify the content after download, by comparing the md5 checksum from jottacloud and that of the newly downloaded file.

Sounds like a nice command line option for download() in cli.py.

This would be a nice way for beginners to get to know the codebase.
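
A minimal sketch of the idea, not jottalib's actual code: hash the downloaded file in fixed-size chunks with the standard library's hashlib, then compare the result to the checksum the server reports. The helper names file_md5 and matches_remote are hypothetical and exist only for illustration.

```python
import hashlib


def file_md5(path, chunk_size=64 * 1024):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    # Binary mode: the hash must see the exact bytes that are on disk.
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_remote(path, remote_md5):
    """True if the local file's MD5 equals the digest reported remotely."""
    return file_md5(path) == remote_md5.strip().lower()
```

Reading in chunks keeps memory use flat for large downloads, and the `b''` sentinel stops the loop at end of file.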

@antonhagg

Without previous knowledge of Python and without a working installation this is what I came up with.

def download(argv=None):
    if argv is None:
        argv = sys.argv[1:]
    parser = argparse.ArgumentParser(description='Download a file from Jottacloud.')
    parser.add_argument('remotefile', help='The path to the file that you want to download')
    parser.add_argument('-l', '--loglevel', help='Logging level. Default: %(default)s.',
        choices=('debug', 'info', 'warning', 'error'), default='warning')
    parser.add_argument('-c', '--checksum', help='Verify checksum of file after download')
    args = parse_args_and_apply_logging_level(parser, argv)
    jfs = JFS.JFS()
    root_folder = get_root_dir(jfs)
    path_to_object = posixpath.join(root_folder.path, args.remotefile)
    remote_file = jfs.getObject(path_to_object)
    total_size = remote_file.size
    with open(remote_file.name, 'wb') as fh:
        bytes_read = 0
        with ProgressBar(expected_size=total_size) as bar:
            for chunk_num, chunk in enumerate(remote_file.stream()):
                fh.write(chunk)
                bytes_read += len(chunk)
                bar.show(bytes_read)
    if args.checksum:
        md5 = JFS.calculate_md5(data)
        if md5 != JFSFile.md5:
            print('''MD5 hashes don't match!''')
            answer = input('Continue: [y/n]')
            if not answer or answer[0].lower() != 'y':
                print('%s was not downloaded successfully' % args.remotefile)
                exit(1)
    print('%s downloaded successfully' % args.remotefile)

@havardgulldahl
Owner Author

That's not that bad for something you wrote without knowing the language.

But you'll see some issues once you get your installation running (get it straight from github). So get that going, and then keep on coding :)

Here are some things I see immediately:

  1. You need to calculate the md5 of the file, not of data.
  2. You don't want to end up with an input() prompt. A red error message (look at clint) and exit(1) is enough.
  3. Take a look at the argparse.ArgumentParser docs and see how you can use store_true to get True or False for free from argparse.
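
Points 2 and 3 can be sketched roughly as follows. This is an illustration under assumptions, not the code that ended up in cli.py: fail_on_mismatch is a hypothetical helper, and it writes a plain message to stderr where the real tool could use clint for red text.

```python
import argparse
import sys

parser = argparse.ArgumentParser(description='Download a file from Jottacloud.')
parser.add_argument('remotefile', help='The path to the file that you want to download')
# store_true makes args.checksum a real boolean: True when -c is given, else False.
parser.add_argument('-c', '--checksum', action='store_true',
                    help='Verify checksum of file after download')


def fail_on_mismatch(local_md5, remote_md5, name):
    """Hypothetical helper: report a mismatch and exit non-zero, no prompt."""
    if local_md5 != remote_md5:
        sys.stderr.write('%s: MD5 mismatch (local %s, remote %s)\n'
                         % (name, local_md5, remote_md5))
        sys.exit(1)


args = parser.parse_args(['-c', 'jottacloud.pdf'])
print(args.checksum)
```

Without `action='store_true'`, `-c` would expect a value after it, which is why the flag in the first draft could not be used on its own.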

@havardgulldahl havardgulldahl added this to the 0.5 milestone Jan 22, 2016
@antonhagg

So there is some progress, but I got stuck on an error that I couldn't figure out how to fix (commenting out the for loop in calculate_md5 removes the error).

WARNING:py.warnings:c:\python27\lib\site-packages\jottalib\JFS.py:92: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  for data in iter(lambda: fileobject.read(size), u''):

Traceback (most recent call last):
  File "C:\Python27\Scripts\jotta-download-script.py", line 9, in <module>
    load_entry_point('jottalib==0.4.post1', 'console_scripts', 'jotta-download')()
  File "c:\python27\lib\site-packages\jottalib\cli.py", line 240, in download
    md5_lf = JFS.calculate_md5(lf)
  File "c:\python27\lib\site-packages\jottalib\JFS.py", line 93, in calculate_md5
    md5.update(data.encode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 10: ordinal not in range(128)

I'm not sure that the way I am accessing the local file is the best. I'm also struggling to get the md5 property from the remote file. A hint in the right direction would be nice. =)

if args.checksum:
   with open(remote_file.name) as lf:
       md5_lf = JFS.calculate_md5(lf)
       md5_jf = JFS.JFSFile.md5
   if md5_lf != md5_jf:
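
The traceback above is consistent with bytes being treated as text: in Python 2, calling .encode('utf-8') on a byte string first decodes it implicitly as ASCII, which fails on any byte above 0x7f. A small sketch of that effect, written for Python 3 with an explicit decode; the sample bytes are made up to resemble a PDF header and are an assumption, not the actual file.

```python
import hashlib

# Illustrative bytes only: a PDF-like header with non-ASCII bytes after it.
raw = b'%PDF-1.5\n%\xe2\xe3\xcf\xd3\n'

# Hashing raw bytes needs no encoding step at all.
digest = hashlib.md5(raw).hexdigest()

# Interpreting the same bytes as ASCII text is the step that fails.
try:
    raw.decode('ascii')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(decode_failed)
```

The practical consequence is to open the local file in binary mode and pass the bytes straight to the hash, without any encode or decode call.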

@antonhagg

I think the first error is related to issue #79. I will see if the fix there resolves it.
Can anyone give me a hand with getting the md5 from the remote file?

@havardgulldahl
Owner Author

After you call jfs.getObject('/path/to/file') you get a JFSFile object; look at its md5 attribute. In this case remote_file is already there, so:

md5_lf = JFS.calculate_md5(open(remote_file.name)) # because we've downloaded the file to remote_file.name
md5_jf = remote_file.md5

And take it from there. 👍

@antonhagg

I've been trying to get it to work, but the checksum doesn't seem to be correct. I'm not sure whether this is related to running it under Windows. Anyway, below is the code in cli.py.

    with open(remote_file.name, 'wb') as fh:
        bytes_read = 0
        with ProgressBar(expected_size=total_size) as bar:
            for chunk_num, chunk in enumerate(remote_file.stream()):
                fh.write(chunk)
                bytes_read += len(chunk)
                bar.show(bytes_read)
        #if args.checksum:
        md5_lf = JFS.calculate_md5(open(remote_file.name, 'rb')) #opening in binary mode
        md5_jf = remote_file.md5
        print md5_lf
        print md5_jf
    print('%s downloaded successfully' % args.remotefile) 

The checksums I get are:

C:\Users\XX>jotta-download jottacloud.pdf
[################################] 219340/219340 - 00:00:00
f8ceede2a2ac0c52f3e3bbeb25d3fa68
9fff650be9fd5a05d531730e4350af51
jottacloud.pdf downloaded successfully

Checking the file in an external md5 checker (http://onlinemd5.com/) gives the value of:
9FFF650BE9FD5A05D531730E4350AF51
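
The external checker prints its digest in uppercase, while the script prints lowercase. Hex digests are case-insensitive, so a comparison should normalize both sides first. A short sketch using the three values quoted above; same_digest is a hypothetical helper.

```python
def same_digest(a, b):
    """Compare two hex digests, ignoring case and surrounding whitespace."""
    return a.strip().lower() == b.strip().lower()


remote_md5 = '9fff650be9fd5a05d531730e4350af51'    # remote_file.md5 as printed
external_md5 = '9FFF650BE9FD5A05D531730E4350AF51'  # value from the online checker
local_md5 = 'f8ceede2a2ac0c52f3e3bbeb25d3fa68'     # computed by the script

print(same_digest(remote_md5, external_md5))  # True
print(same_digest(remote_md5, local_md5))     # False
```

The remote value agreeing with the external checker suggests the file on disk was fine and the problem lay in how the script computed the local hash.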

Also, adding a print of data in JFS.py shows that the last rows are missing. I have tried to figure out why, but haven't found anything.

File content when opened in notepad:

obj
<</Length 3911/Subtype/XML/Type/Metadata>>stream
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.4-c005 78.147326, 2012/08/23-13:03:03        ">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:xmp="http://ns.adobe.com/xap/1.0/"
            xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
            xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/">
         <xmp:ModifyDate>2013-03-28T12:13:18+01:00</xmp:ModifyDate>
         <xmp:CreateDate>2013-03-28T12:13:17+01:00</xmp:CreateDate>
         <xmp:MetadataDate>2013-03-28T12:13:18+01:00</xmp:MetadataDate>
         <xmp:CreatorTool>Acrobat PDFMaker 11 for Word</xmp:CreatorTool>
         <xmpMM:DocumentID>uuid:b8e0d258-8375-49f3-8e23-f7de68210a4d</xmpMM:DocumentID>
         <xmpMM:InstanceID>uuid:9625165b-c271-4ea6-9002-fea7e8500cf4</xmpMM:InstanceID>
         <xmpMM:subject>
            <rdf:Seq>
               <rdf:li>50</rdf:li>
            </rdf:Seq>
         </xmpMM:subject>
         <dc:format>application/pdf</dc:format>
         <dc:title>
            <rdf:Alt>
               <rdf:li xml:lang="x-default"/>
            </rdf:Alt>
         </dc:title>
         <dc:description>
            <rdf:Alt>
               <rdf:li xml:lang="x-default"/>
            </rdf:Alt>
         </dc:description>
         <dc:creator>
            <rdf:Seq>
               <rdf:li>roland</rdf:li>
            </rdf:Seq>
         </dc:creator>
         <pdf:Producer>Adobe PDF Library 11.0</pdf:Producer>
         <pdf:Keywords/>
         <pdfx:SourceModified>D:20130328111211</pdfx:SourceModified>
         <pdfx:Company/>
         <pdfx:Comments/>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>





















<?xpacket end="w"?>
endstream
endobj
20 0 obj
<</Filter/FlateDecode/First 6/Length 58/N 1/Type/ObjStm>>stream
[binary stream data, not shown]
endstream
endobj
21 0 obj
<</Filter/FlateDecode/First 6/Length 184/N 1/Type/ObjStm>>stream
[binary stream data, not shown]
endstream
endobj
22 0 obj
<</DecodeParms<</Columns 5/Predictor 12>>/Filter/FlateDecode/ID[<D73E8F7CBFE3364DAC1DA07F06F81058><465B159AE6F857409B69D6D4AB883CAB>]/Info 104 0 R/Length 119/Root 106 0 R/Size 105/Type/XRef/W[1 3 1]>>stream
[binary stream data, not shown]
endstream
endobj
startxref
116
%%EOF

File content when doing print data in calculate_md5 in the JFS.py file:

obj
<</Length 3911/Subtype/XML/Type/Metadata>>stream
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.4-c005 78.147326, 2012/08/23-13:03:03        ">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:xmp="http://ns.adobe.com/xap/1.0/"
            xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
            xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/">
         <xmp:ModifyDate>2013-03-28T12:13:18+01:00</xmp:ModifyDate>
         <xmp:CreateDate>2013-03-28T12:13:17+01:00</xmp:CreateDate>
         <xmp:MetadataDate>2013-03-28T12:13:18+01:00</xmp:MetadataDate>
         <xmp:CreatorTool>Acrobat PDFMaker 11 for Word</xmp:CreatorTool>
         <xmpMM:DocumentID>uuid:b8e0d258-8375-49f3-8e23-f7de68210a4d</xmpMM:DocumentID>
         <xmpMM:InstanceID>uuid:9625165b-c271-4ea6-9002-fea7e8500cf4</xmpMM:InstanceID>
         <xmpMM:subject>
            <rdf:Seq>
               <rdf:li>50</rdf:li>
            </rdf:Seq>
         </xmpMM:subject>
         <dc:format>application/pdf</dc:format>
         <dc:title>
            <rdf:Alt>
               <rdf:li xml:lang="x-default"/>
            </rdf:Alt>
         </dc:title>
         <dc:description>
            <rdf:Alt>
               <rdf:li xml:lang="x-default"/>
            </rdf:Alt>
         </dc:description>
         <dc:creator>
            <rdf:Seq>
               <rdf:li>roland</rdf:li>
            </rdf:Seq>
         </dc:creator>
         <pdf:Producer>Adobe PDF Library 11.0</pdf:Producer>
         <pdf:Keywords/>
         <pdfx:SourceModified>D:20130328111211</pdfx:SourceModified>
         <pdfx:Company/>
         <pdfx:Comments/>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>






















@antonhagg

Sorry, got it to work!
I had one indent too many, so it was missing the last chunk. =)
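
A sketch of why the extra indent matters, assuming CPython's default buffered file objects: while the `with open(..., 'wb')` block is still active, the most recent writes may sit in the write buffer and not yet be on disk, so a hash taken inside the block can miss the tail of the file. The md5_of_file helper and the payload are made up for the demonstration.

```python
import hashlib
import os
import tempfile


def md5_of_file(path):
    """Hash whatever bytes are currently on disk at `path`."""
    with open(path, 'rb') as fh:
        return hashlib.md5(fh.read()).hexdigest()


payload = b'last chunk of the download'
path = os.path.join(tempfile.mkdtemp(), 'demo.bin')

with open(path, 'wb') as fh:
    fh.write(payload)
    # Too early: the payload is typically still in fh's buffer, not on disk.
    md5_inside = md5_of_file(path)

# After the with block the file is flushed and closed, so all bytes are there.
md5_outside = md5_of_file(path)
expected = hashlib.md5(payload).hexdigest()

print(md5_inside == expected)
print(md5_outside == expected)
```

Moving the checksum code one level out, so it runs after the write handle is closed, is the same fix as removing the extra indent.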

How do I go forward and suggest the new code (first time I use github)?

@antonhagg antonhagg mentioned this issue Jan 29, 2016
@havardgulldahl havardgulldahl modified the milestones: 0.5, 0.6 Jul 2, 2016
@antonhagg

I think I need to rewrite some of the code proposed in the version I submitted, since there have been quite a lot of changes and fixes since I first wrote it. Any help is appreciated.
