#Quick python test
Please use python 3+
Implementation can be found inwordcounter.py
For example, run by invoking
python wordcounter.py -s wordcounter_test.txt
where source (-s) is file from which to read words.
python wordcounter.py --h
gives you stratup options. Together with -s
parameter it is possible to define -o
parameter to specify file to which result will be written, for example python wordcounter.py -s wordcounter_test.txt -o out.txt
.
Reading of unicode files and words is not properly tested and probably should be improved.
In terms of separation of concerns wordcounter.py could be improved so that reading words is separated from words processing. This would make more testable code, which could then plug-in different data source. But this would mean looping once more through list. Since for this specifical case input is expected to be file, I've chosen to optimise for speed.
Implementation can be found inpcleaner.py
It is implemented as specified: clens only paragraph containing only white spaces.
Paragraphs with attributres (ex: <p style="color: red">) containing only white spaces are cleared also.
See comments in method clean() for different cleaning policies (if you want to clean paragraphs containing tabs and newlines also, for example)For example, run by invoking
python pcleaner.py -s pcleaner_test.html
where source (-s) can be file or http / https URL .
python pcleaner.py --h
gives you stratup options. Together with -s
parameter it is possible to define -o
parameter to specify file to which result will be written, for example python pcleaner.py -s http://www.google.com -o out.html
.
test_pcleaner.py
contains some common test cases
Handling of source(file, URL) data coding can be improved further. It can give codec errors if data coding is not specified in response header (if datacoding header is not specified expects utf-8 content as input).
Implementation can be found incache_decorator.py
Cache expiry time and number of hits are hardcoded as requested by task, but decorator can be changed in order to pass those two parameters as arguments.
Time triggered eviction is done by one timer thread for every cached item, what simplifies the code, but may not be appropriate for large number of cached items. Anyway, evicting items after timer expires releases memory in case when some cached items are rarely hit. For real world scenario time based cache eviction policy can be rewritten differently.
To test eviction based on number of times function was invoked, run test test_cache_decorator.py
For time based eviction test scenarios, run test test_cache_decorator_time.py
. The reason behind separating time based eviction to different test case is that it takes 10 minutes to run.
If you want to see more detailed log messages, change log configuration in cache_decorator.py
to logging.basicConfig(level=logging.DEBUG)