SQML-alpha repository
SQML is a small Python module that helps you parse HTML data.
You can do that without regular expressions or other mind-bending tricks.
Only SQL-like syntax. Nothing more ;)
It is very simple! Just look at the test.py file ;)
Okay, let's start.
First, you need to import the html_parser module and create a parser:
>>>import html_parser
>>>A = html_parser.Parser()
>>>
Next, fetch your web page or read your HTML file:
>>>import urllib
>>>page = urllib.urlopen('http://habrahabr.ru/').read()
>>>
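The call above uses the Python 2 form of urllib. On Python 3 the same step could look like the sketch below. It only covers fetching the page; whether html_parser itself runs on Python 3 is not stated here, and fetch_page is a hypothetical helper name, not part of SQML.

```python
import urllib.request


def fetch_page(url, encoding="utf-8", timeout=10):
    """Return the body of `url` as text (hypothetical helper, not part of SQML)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Prefer the charset declared by the server; fall back to `encoding`.
        charset = response.headers.get_content_charset() or encoding
        return response.read().decode(charset, errors="replace")
```

With this helper, the fetch step would become page = fetch_page('http://habrahabr.ru/').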
Then load it into the parser:
>>>A.load(page)
>>>
OK! Now you can run SQL-like queries against this source to parse it:
>>>A.query("SELECT @class FROM a WHERE @class=='hub '")
>>>A.value
{'class': ['hub ']}
>>>A.value['class']
['hub ']
>>>A.result
[['hub '], ['hub '], ['hub '], ['hub '], ['hub '], ['hub '], ['hub '], ['hub '], ['hub '], ['hub '], ['hub '], ['hub '],
['hub '], ['hub '], ['hub '], ['hub '], ['hub '], ['hub '], ['hub '], ['hub '], ['hub '], ['hub ']]
>>>
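To make clear what such a query selects, here is a rough equivalent of SELECT @class FROM a WHERE @class=='hub ' written with the standard library only. It is an illustration, not SQML's implementation; AttrCollector, select_attr and the sample markup are made up for this example.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect one attribute from every matching tag (illustrative, not SQML code)."""

    def __init__(self, tag, attr, predicate=lambda value: True):
        super().__init__()
        self.tag = tag
        self.attr = attr
        self.predicate = predicate
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        value = dict(attrs).get(self.attr)
        # Keep the row only when the attribute exists and passes the filter.
        if value is not None and self.predicate(value):
            self.rows.append([value])


def select_attr(html, tag, attr, predicate=lambda value: True):
    """Return one single-item row per matching tag, similar in shape to A.result."""
    collector = AttrCollector(tag, attr, predicate)
    collector.feed(html)
    collector.close()
    return collector.rows


# Made-up sample markup, standing in for a downloaded page.
sample = '<a class="hub " href="/x">X</a><a class="other">Y</a><a class="hub ">Z</a>'
rows = select_attr(sample, "a", "class", lambda value: value == "hub ")
print(rows)  # [['hub '], ['hub ']]
```

The query form says the same thing in one line, which is the whole point of the module.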
More query examples:
SELECT @@content FROM title // Get the text of the "title" tag
SELECT @href FROM a // Get the "href" value of "a" tags
SELECT @@content, @href FROM a WHERE @class=='test-class' // Get the text and "href" value of "a" tags whose "class" value is "test-class"
SELECT @id FROM input WHERE @maxlength>5 // Get the "id" value of "input" tags whose "maxlength" value is greater than 5
SELECT @class FROM div WHERE len(@id)!=0 // Yes, you can use Python string functions in conditions
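Conditions such as len(@id)!=0 read like ordinary Python expressions over a tag's attributes. One way such a condition could be evaluated is sketched below. This is a guess for illustration only, not a description of how SQML works internally; matches is a hypothetical function, and eval should never be given untrusted input.

```python
import re


def matches(condition, attrs):
    """Evaluate a WHERE-style condition against one tag's attributes.

    Hypothetical sketch, not SQML code. Attribute values are strings.
    """
    # Rewrite each @name into a dictionary lookup: @id -> attrs.get('id', '')
    expression = re.sub(r"@(\w+)", r"attrs.get('\1', '')", condition)
    # Expose only a small whitelist of names to the expression.
    allowed = {"len": len, "int": int, "attrs": attrs}
    return bool(eval(expression, {"__builtins__": {}}, allowed))


print(matches("len(@id)!=0", {"id": "main"}))                    # True
print(matches("len(@id)!=0", {"class": "box"}))                  # False
print(matches("@class=='test-class'", {"class": "test-class"}))  # True
```

In this sketch a numeric comparison needs an explicit conversion, for example int(@maxlength)>5, because attribute values are kept as strings.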
You can ask me or cyberguru007 anything about this project!