Messed up Debugger using Scrapy #238

Closed
oli06 opened this issue May 12, 2020 · 3 comments
Labels
bug Something isn't working

Comments


oli06 commented May 12, 2020

Environment data

  • VS Code version: 1.45.0
  • Extension version (available under the Extensions sidebar): 2020.4.76186
  • OS and version: macOS Catalina 10.15.4
  • Python version (& distribution if applicable, e.g. Anaconda): 3.7.6
  • Type of virtual environment used (N/A | venv | virtualenv | conda | ...): N/A
  • Relevant/affected Python packages and their versions: Scrapy 1.6.0
  • Relevant/affected Python-related VS Code extensions and their versions: XXX
  • Jedi or Language Server? (i.e. what is "python.jediEnabled" set to; more info How to update the language server to the latest stable version vscode-python#3977): XXX
  • Value of the python.languageServer setting: Microsoft Python Language Server version 0.5.45.0

Expected behaviour

Debug as usual.

Actual behaviour

The debugger is messed up: lines of code appear to run one or two steps later than they should, or the Python interpreter crashes with errors that make no sense at all.

The interesting thing is that if I add a print("foo"), the debugger works fine again. If I remove the print(), the debugger gets confused again.
I'm not 100 percent sure that this issue is VS Code related; it may also be caused by a mistake in my own code.

Steps to reproduce:

  1. Create a new Scrapy spider (a minimal skeleton is sketched after this list)
  2. Add 'https://www.tagesschau.de/wirtschaft/coronavirus-fleischbetrieb-103.html' to the urls list.
  3. Add code below to the parse method
def parse(self, response):
    url = response.request.url

    named_references = {}
    text = ""

    # `content` was undefined in the original snippet; `response` is presumably meant here
    text_div = response.css('div.storywrapper div.sectionZ div.con div.modCon div.modParagraph')
    for tag in text_div:
        nodes = tag.xpath('./node()')
        nodes_name = nodes.xpath('name()').get()

        for node in nodes:
            child_nodes = node.xpath('./node()')
            for c in child_nodes:
                # print('debugger works again, if I uncomment this')
                # most of the time node_name is executed one line later than it should be
                node_name = c.xpath('name()').get()
                if node_name == 'a':
                    href = c.xpath('@href').get()  # relative href
                    # if the debugger / Python interpreter crashes completely, it happens here
                    # save the link in the dict, keyed by the link text (if it exists)
                    link_text = c.xpath('text()').get()
                    named_references[link_text.strip('\n') if link_text is not None else 'unknown'] = href
                    yield response.follow(c, callback=self.parseArticle)
                elif node_name is None:
                    text += c.get().strip('\n')

    yield {'url': url, 'named_references': named_references, 'text': text}
  4. Run the spider with scrapy crawl __spidername__
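
For completeness, a minimal spider along the following lines should be enough for the repro. This is only a sketch: the class name, the spider name, and the parseArticle stub are assumptions for illustration, not taken from the original report.

import scrapy


class TagesschauSpider(scrapy.Spider):
    # hypothetical name; substitute it for __spidername__ in `scrapy crawl`
    name = "tagesschau"
    start_urls = [
        'https://www.tagesschau.de/wirtschaft/coronavirus-fleischbetrieb-103.html',
    ]

    def parse(self, response):
        # paste the parse() body from step 3 here
        ...

    def parseArticle(self, response):
        # callback used by response.follow() in parse()
        ...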

Logs

Here is an example of the messed-up debugger behaviour:

for x in article_item['named_references']: #line 113, setting x to a key
    ref_url = article_item['named_references'][x]            
    yield response.follow(article_item['named_references'][x], callback=self.parseArticle) #this is line 115

File ".../spiders/tagesschau_spider.py", line 115, in parseArticle yield response.follow(article_item['named_references'][x], callback=self.parseArticle) UnboundLocalError: local variable 'x' referenced before assignment

karthiknadig transferred this issue from microsoft/vscode-python May 12, 2020

int19h commented Jun 16, 2020

Where does the second code snippet come from? (or where is it supposed to be placed for the repro?)

int19h added the bug label Jun 16, 2020

oli06 commented Jun 18, 2020

It was written by me.
named_references is a dictionary containing hyperlinks from articles:

named_references = {
    "The german football Bundesliga": "https://www.bundesliga.com/de/bundesliga",
    "German champion 2020 is Bayern Munich": "https://www.faz.net/aktuell/sport/fussball/bundesliga/fussball-bundesliga-bayern-muenchen-ist-deutscher-meister-2020-16809640.html",
}

I then iterate over each key-value pair and crawl the URL stored in each value.
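
For reference, here is a sketch of how the second snippet presumably sits inside the parseArticle callback. Everything except the loop itself (and the example dictionary entry above) is an assumption filled in for illustration.

def parseArticle(self, response):
    # hypothetical reconstruction: in the real spider, article_item is built from the
    # followed article page, the same way parse() builds its item
    article_item = {
        'url': response.request.url,
        'named_references': {
            # example entry taken from the comment above
            "The german football Bundesliga": "https://www.bundesliga.com/de/bundesliga",
        },
        'text': '',
    }

    for x in article_item['named_references']:  # line 113 in the report: x is set to a key
        ref_url = article_item['named_references'][x]
        # line 115 in the report, where the spurious UnboundLocalError is raised
        yield response.follow(ref_url, callback=self.parseArticle)

    yield article_item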


int19h commented Sep 23, 2020

I believe this was the same as #348, and is now fixed. Please re-open if it still repros with the most recent version of the debugger.

int19h closed this as completed Sep 23, 2020