
Automatic text summarization of results #915

Open
colinmegill opened this issue Mar 20, 2021 · 11 comments

@colinmegill (Member) commented Mar 20, 2021

Thanks to @micahstubbs and @amyxzhang for spurring this.

Interpretability of Polis results has been, and continues to be, a critical issue for Polis as a platform, and a challenge to the usage of the method by various stakeholders and user archetypes. Interpretability is a hard and foundational problem that will require ongoing development: as more advanced analytic methods are added to the system, interpretability by those without backgrounds in data science and statistical methods will suffer, and the burden of producing interpretable results will fall increasingly on those running conversations.

In a sense, the summary the platform already has is the visualization: "here are the groups, what differentiates them, and what unifies them". But the visualization is necessarily limited to a handful of comments, and how those comments are chosen is opaque. We have always thought about interpretability in ‘tiers’ with varying levels of human-in-the-loop involvement: a simple list, the visualization, the procedurally generated report, human-written reports, news articles summarizing the results, and so on. Consider the following examples.

Biodiversity in NZ

  1. The comments chosen procedurally for the biodiversity visualization: https://www.scoop.co.nz/stories/HL1908/S00014/scoop-hivemind-protecting-and-restoring-biodiversity.htm
  2. The procedurally generated report https://pol.is/report/r3epuappndxdy7dwtvwpb
  3. And what PEP ultimately delivered to the government (PDF): https://www.scoop.co.nz/stories/PO1911/S00063/biodiversity-hivemind-report-plenty-of-common-ground.htm
  4. Direct link to PDF above: https://img.scoop.co.nz/media/pdfs/1911/Biodiversity_HiveMind_Final_Report_Scoop.pdf
  5. The debrief https://pep.org.nz/2020/12/01/doc-tries-to-restore-e-democracy/

Bowling Green Civic Assembly

  1. The comments chosen procedurally for the list, an option distinct from the visualization: https://pol.is/9wtchdmmun
  2. The report https://pol.is/report/r2xcn2cdbmrzjmmuuytdk
  3. The final report written by Columbia University & the University of Kentucky: http://www.civic-assembly.org/bowling-green-report/
  4. The news article from The Bowling Green Daily News https://www.bgdailynews.com/news/first-ever-civic-assembly-gives-residents-chance-to-be-heard/article_0a17254e-a8bb-5f4f-884f-9d0617ab9c08.html

The following image is from a town hall event in Bowling Green, KY, where the entire report was shared with citizens during the town hall, posted online, and printed and distributed:

[image]

DEMOS

  1. https://demos.co.uk/project/polis-and-the-political-process/
  2. Direct link to PDF: https://demos.co.uk/wp-content/uploads/2020/08/Polis-the-Political-Process-NEW.pdf

Engage Britain

https://engagebritain.org/your-opinion/results/

These examples provide reasonable references for what has been accomplished with human-in-the-loop summarization and the system as it exists at present.

In each of the cases above, dozens of hours were spent by multiple highly trained facilitators, statisticians, journalists, academics and/or data scientists to derive meaning and make comprehensible ‘what happened.’ Polis as a system produces and surfaces a latent space, but what it produces is an intermediate representation that really serves as an input to subsequent tasks and decision making. It is worthwhile to continue to bridge ‘the whole picture of what happened’, which Polis definitely assumes as an output, and ‘what will be communicated to busy and/or non-technical people who need a quick takeaway’, because doing so democratizes the method as a whole and demonstrates an enhanced ability to make meaning of the space. This will be greatly aided, we anticipate, by #217.

It’s possible that Polis will only ever be able to get so far, as it is effectively "platform-itized data science". Any learning in this direction, however, should serve to illuminate the boundaries of what is possible, and may generate ideas for future methods.

It is a worthwhile goal to bring interpretability down to the individual citizen, in the way a newspaper column might report on a sports match: this makes the entire exercise more accessible, reduces the burden of reporting out what happened, and increases confidence.

There is also potential benefit in 'automated conversations': if there is no human in the loop, conversations can be triggered procedurally, say with 3,000 randomly selected citizens, five years after a law was passed, to assess whether or not the law had an impact, should be revisited, etc. (cc https://twitter.com/marcidale); or news stories about a conversation that itself arose from a news story could be generated automatically (cc https://twitter.com/chrismoranuk).

NLP

A notable strength of the system, which has facilitated its spread around the world, has been the complete rejection of natural language processing. cc @ceteri

It seems attractive to consider ‘assembling’ narratives from data and translatable building blocks. If the building blocks are sufficiently atomic, perhaps they could be assembled and displayed given any detected browser language string.
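
As a minimal sketch of that assembly idea (not an existing Polis feature): pre-translated sentence fragments keyed by locale are filled with conversation statistics and joined, with the locale chosen from the browser's Accept-Language string. The fragment keys, statistics, and translations below are illustrative placeholders.

```python
# Hypothetical illustration: assemble a narrative from atomic, pre-translated
# building blocks, chosen by the browser's Accept-Language string.
# Fragment keys and statistics are made up for the example.

FRAGMENTS = {
    "en": {
        "groups": "Participants sorted into {n_groups} opinion groups.",
        "consensus": "{pct:.0%} of participants agreed that \"{statement}\".",
        "divisive": "The most divisive statement was \"{statement}\".",
    },
    "fr": {
        "groups": "Les participants se sont répartis en {n_groups} groupes d'opinion.",
        "consensus": "{pct:.0%} des participants étaient d'accord que « {statement} ».",
        "divisive": "La déclaration la plus clivante était « {statement} ».",
    },
}

def pick_locale(accept_language: str) -> str:
    """Pick the first supported language from an Accept-Language header."""
    for part in accept_language.split(","):
        code = part.split(";")[0].strip().split("-")[0].lower()
        if code in FRAGMENTS:
            return code
    return "en"

def assemble_summary(stats: dict, accept_language: str) -> str:
    """Fill each atomic fragment with conversation statistics and join them."""
    frags = FRAGMENTS[pick_locale(accept_language)]
    return " ".join([
        frags["groups"].format(n_groups=stats["n_groups"]),
        frags["consensus"].format(pct=stats["consensus_pct"],
                                  statement=stats["consensus_statement"]),
        frags["divisive"].format(statement=stats["divisive_statement"]),
    ])

print(assemble_summary(
    {"n_groups": 3, "consensus_pct": 0.87,
     "consensus_statement": "We need more public transit",
     "divisive_statement": "Parking should be free downtown"},
    "fr-FR,fr;q=0.9,en;q=0.8",
))
```

Because each fragment is atomic and purely data-driven, supporting a new language means translating the fragments once, with no NLP in the loop.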

@jucor (Contributor) commented Mar 20, 2021

A good starting point would be the natural-language generation work behind "The Automatic Statistician": https://link.springer.com/chapter/10.1007/978-3-030-05318-5_9
I know the authors; I'm sure they'll be happy to have a chat.
This predates large language models (LLMs), which might seem "dated" but is in my opinion a very good thing: the biases in LLMs (as highlighted by the parrots paper) are too high a risk for something as crucial as the takeaway from a public consultation.
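
For flavour, a toy sketch in that pre-LLM spirit (an illustration only, not the Automatic Statistician's actual method): deterministic, auditable rules inspect vote statistics and emit only the sentences whose patterns are actually present. The thresholds and field names are assumptions for the example.

```python
# Toy rule-based report generation: no learned language model, just explicit,
# auditable rules that turn vote statistics into sentences.
# All thresholds and field names here are illustrative assumptions.

def describe_statement(stats: dict) -> list[str]:
    """Return zero or more sentences describing one statement's voting pattern."""
    sentences = []
    agree, disagree, by_group = stats["agree"], stats["disagree"], stats["group_agree"]
    if agree >= 0.7:
        sentences.append(f'Broad agreement ({agree:.0%}) on "{stats["text"]}".')
    if disagree >= 0.7:
        sentences.append(f'Broad disagreement ({disagree:.0%}) with "{stats["text"]}".')
    spread = max(by_group.values()) - min(by_group.values())
    if spread >= 0.5:
        hi = max(by_group, key=by_group.get)
        lo = min(by_group, key=by_group.get)
        sentences.append(
            f'"{stats["text"]}" splits the groups: {hi} agrees at {by_group[hi]:.0%}, '
            f'{lo} at only {by_group[lo]:.0%}.'
        )
    return sentences

example = {
    "text": "Pesticide use should be phased out",
    "agree": 0.55, "disagree": 0.30,
    "group_agree": {"Group A": 0.85, "Group B": 0.25},
}
for line in describe_statement(example):
    print(line)
```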

@colinmegill (Member, Author) commented:

Generally, I love this and am printing it to read; thanks for sharing. Obvious point to restate, but this is in line with previous choices to build on PCA over more black-box options.

@jucor (Contributor) commented Mar 20, 2021 via email

@ThenWho commented Mar 20, 2021

A solid and worthwhile goal 💯. I'm skeptical as to how close to the target we can shoot, but this should be the aspiration, 100% yes 👍

I wouldn't dismiss further visualizations as intermediate waypoints towards this goal, though. Visualization is a huge strength of Polis, after all. For example, keeping snapshots of the conversation as it unfolds and visualizing a meaningful subset of them at the end. Or calculating and visualizing refinement-type relationships between statements, i.e. "statement A is a refinement/rephrasing/evolution of statement B", using well-understood, explainable methods such as Levenshtein distance or similar ( #913 (reply in thread) ).

These kinds of historic/timeline data could later be used for textual summaries too.
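
A minimal sketch of that kind of refinement detection, assuming the python-Levenshtein package; the similarity threshold and example statements are illustrative, not Polis defaults.

```python
# Flag pairs of statements that look like refinements/rephrasings of each other,
# using plain, explainable edit distance (normalized Levenshtein similarity).
# The 0.75 threshold is an illustrative assumption, not a Polis constant.
import Levenshtein  # pip install python-Levenshtein

def refinement_pairs(statements: list[str], threshold: float = 0.75):
    """Yield (i, j, similarity) for statement pairs above the similarity threshold."""
    for i in range(len(statements)):
        for j in range(i + 1, len(statements)):
            sim = Levenshtein.ratio(statements[i].lower(), statements[j].lower())
            if sim >= threshold:
                yield i, j, sim

statements = [
    "We should plant more native trees in the city",
    "We should plant more native trees in our city centre",
    "Dog parks need better lighting at night",
]
for i, j, sim in refinement_pairs(statements):
    print(f"{sim:.2f}: statement {i} <-> statement {j}")
```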

@amyxzhang commented Mar 22, 2021

Thanks for writing this up @colinmegill! Given my orientation as an HCI researcher, I would first want to understand the goals and experiences of the people who are doing the work to create a report. In my previous work, I've interviewed people who generate reports and make decisions from deliberative processes, specifically Wikipedia RfC closers (written up in my thesis starting around page 94) and town hall meeting reporters (in this CSCW paper). There are a number of considerations people balance and trade off as they make editorial decisions while writing, including transparency, inclusivity, and brevity, as well as different audiences they write for. I suspect there are parts of the work that are tedious and highly automatable, and parts that must or should be conducted by humans, whether that's a centralized report author or additional collective signaling by participants.

@ceteri commented Apr 5, 2021

I'd like to help. From what I've seen, you've got relatively brief, semi-conversational snippets of text, which are obtained from comment threads. Is that roughly correct as a description? From that, I don't quite see where text summarization comes in.

OTOH, if you had annotations for these comments, then it makes sense to generate some text-ish report/narrative describing the aggregates, segmentation, trends, and so on. That would entail a different kind of tooling. Definitely, reaching a "well annotated" state is expensive, and fleeting :) Some of the HITL approaches for active learning and weak supervision can help cut the costs dramatically, and there can be ways to leverage self-supervised learning to make this less expensive too.
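
To make "active learning" concrete, here is a minimal uncertainty-sampling sketch using scikit-learn (an illustration only; the topic labels, comments, and selection size are placeholders rather than any Polis annotation schema):

```python
# Minimal active-learning loop sketch: label a few comments, train a cheap model,
# and ask a human to annotate only the comments the model is least sure about.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labeled = [("We need more bike lanes", "transport"),
           ("Protect the wetlands from development", "environment"),
           ("Buses should run later at night", "transport"),
           ("Plant native species along the river", "environment")]
unlabeled = ["Extend the tram line to the suburbs",
             "Restore the old-growth forest reserve",
             "Lower fares for students"]

texts, labels = zip(*labeled)
vec = TfidfVectorizer().fit(list(texts) + unlabeled)
clf = LogisticRegression().fit(vec.transform(texts), labels)

# Uncertainty = 1 - max predicted class probability; annotate the top-k uncertain.
proba = clf.predict_proba(vec.transform(unlabeled))
uncertainty = 1.0 - proba.max(axis=1)
for idx in np.argsort(-uncertainty)[:2]:
    print(f"please annotate: {unlabeled[idx]!r} (uncertainty {uncertainty[idx]:.2f})")
```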

For the articles listed, is the intent to work with comments from them? Are you parsing these articles, representing them into some larger structure (e.g., entity linking)?

Summarizing a collection of articles makes sense. For an excellent example, see these fully generated COVID-19 reports https://covid19primer.com/dashboard

Otherwise, while I do understand natural language work, modeling the semantics of parsed text, summarization, and other areas of language generation, I'm not quite understanding what the ask is here. I guess what I'm asking is: from a "product" perspective, how are the comments and articles intended to generate some result? What's the use case definition, other than "do stuff with this"? :) That seems to be missing above, or perhaps I've misunderstood much of it.

One point to consider is that there are a couple of categories of natural language work referenced above:

  • parsing/summarizing articles and managing the result (our team and our partners do lots of this)
  • understanding conversational threads (check with RASA, etc.)

FWIW, it's good to be cautious about mixing and matching approaches. The semantics of these categories of language have vastly different properties, and their rhetorical structure is also generally quite different. That can lead to trouble, although there can be ways around it. Also, it's perhaps wise to be a bit skeptical about promises made in research papers; those are mostly about beating benchmarks to get the authors published in NeurIPS, ACL, etc., while the likelihood of running their published code tends to be quite low.

Also, NLP-progress is a good source for checking SOTA in related research, such as question answering http://nlpprogress.com/english/question_answering.html

There are also tools that consider some intersection of the above. For instance, a new API from IBM for identifying claims and their support from within a corpus – or within conversational snippets of text – then generating narratives from that: https://early-access-program.debater.res.ibm.com/terms Think of automating debate, based on a corpus of discussion about topics.

@NewJerseyStyle (Contributor) commented:

@colinmegill I love the idea of automated conversations

(Quotes the 'automated conversations' paragraph from the issue description above.)

@NewJerseyStyle (Contributor) commented:

(Quotes @ceteri's comment above in full.)

@ceteri How is the progress there? I am doing a demo on summarization with the open data collected from Polis conversations. I saw this issue and wondered whether you have already made some progress, say a question-answering model, so that I could ask a question about a topic and figure out what is going on without reading all the comments, or ask the model how people think about the topic?
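
As a rough sketch of what that could look like with an off-the-shelf extractive question-answering model from the Hugging Face transformers library (the model choice and comments are illustrative, and extractive QA only returns a single supporting span, so this is at best a starting point):

```python
# Extractive question answering over a batch of comments: concatenate the comments
# into a context and let a pretrained reader pull out a supporting span.
# Model name and comments are illustrative; any extractive QA checkpoint would do.
from transformers import pipeline

comments = [
    "Pesticides are wiping out native insects and should be phased out.",
    "Farmers need support to transition away from intensive pesticide use.",
    "Predator control is more urgent than pesticide rules for protecting birds.",
]
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
answer = qa(question="What do participants say about pesticides?",
            context=" ".join(comments))
print(answer["answer"], answer["score"])
```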

@compdemocracy deleted a comment from @patcon on Dec 16, 2022
@colinmegill (Member, Author) commented:

https://www.openrightsgroup.org/publications/democratic-innovations-polis-and-the-political-process/
