Codeparrot/githubpairs & co #819

Open
wants to merge 37 commits into
base: eval-hackathon
37 commits
619daa1
Merge pull request #4 from bigscience-workshop/eval-hackathon
Muennighoff Jul 19, 2022
a3ed6a3
Add cp complexity
Muennighoff Jul 22, 2022
0b99615
Adapt names
Muennighoff Jul 22, 2022
ae7f431
complexity -> time complexity
Muennighoff Jul 28, 2022
27f6a43
Add clue @07e1c20b2f4e8ac1af98d9a7a9cf3d05f007f36d
Muennighoff Jul 30, 2022
cd718e0
Add file
Muennighoff Aug 24, 2022
577d6e6
Add code
Muennighoff Aug 24, 2022
4dd7870
Rmv unrelated
Muennighoff Aug 24, 2022
3255421
Fixes
Muennighoff Aug 24, 2022
9b2f3a5
fix id
Muennighoff Aug 24, 2022
8f5f343
fix subsets
Muennighoff Aug 24, 2022
47c4c2f
Fix conflicts
Muennighoff Aug 24, 2022
dbd42f4
Fix
Muennighoff Aug 24, 2022
f31ce9b
Fix
Muennighoff Aug 24, 2022
a189e43
Fix
Muennighoff Aug 24, 2022
04d78a3
Add prompts
Muennighoff Aug 24, 2022
8a5208f
Add prompts
Muennighoff Aug 24, 2022
e78317b
Add xlcost
Muennighoff Aug 24, 2022
4a0bb91
Add
Muennighoff Aug 24, 2022
79373d1
Add
Muennighoff Aug 24, 2022
575f025
Fix
Muennighoff Aug 25, 2022
88cde9f
Fix
Muennighoff Aug 25, 2022
46070a0
Fix
Muennighoff Aug 25, 2022
ab848d3
Fix
Muennighoff Aug 25, 2022
fa971b4
Fix
Muennighoff Aug 25, 2022
04f4a91
Fix
Muennighoff Aug 25, 2022
cebbe61
Fix
Muennighoff Aug 25, 2022
1239612
Fix
Muennighoff Aug 25, 2022
8f6468b
Fix
Muennighoff Aug 25, 2022
4d3d840
Fix
Muennighoff Aug 25, 2022
1805c10
Fix
Muennighoff Aug 25, 2022
c80a1a3
Fix'
Muennighoff Aug 25, 2022
cfe4070
Fix'
Muennighoff Aug 25, 2022
c2d3c7d
Fix
Muennighoff Aug 25, 2022
5cde25b
fixes
Muennighoff Aug 25, 2022
f52ea34
Add Ps
Muennighoff Aug 25, 2022
5654fb0
Add strip conn option
Muennighoff Aug 27, 2022
39 changes: 35 additions & 4 deletions promptsource/templates.py
@@ -27,7 +27,25 @@
 # These are users whose datasets should be included in the results returned by
 # filter_english_datasets (regardless of their metadata)

-INCLUDED_USERS = {"Zaid", "craffel", "GEM", "aps", "khalidalt", "shanya", "rbawden", "BigScienceBiasEval", "gsarti"}
+INCLUDED_USERS = {
+    "Zaid",
+    "craffel",
+    "GEM",
+    "aps",
+    "khalidalt",
+    "shanya",
+    "rbawden",
+    "BigScienceBiasEval",
+    "gsarti",
+    "Helsinki-NLP",
+    "Muennighoff",
+    "facebook",
+    "codeparrot",
+    "pasinit",
+    "Fraser",
+    "allenai",
+    "teven",
+}

 # These are the metrics with which templates can be tagged
 METRICS = {
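The comment on `INCLUDED_USERS` says these namespaces are included regardless of metadata. A minimal sketch of such a check follows, assuming dataset names take the `user/dataset` form used by the YAML files in this PR. The helper name `is_from_included_user` is hypothetical; the actual logic in `filter_english_datasets` may differ.

```python
# Hypothetical helper for illustration only; not taken from promptsource.
INCLUDED_USERS = {
    "Zaid", "craffel", "GEM", "aps", "khalidalt", "shanya", "rbawden",
    "BigScienceBiasEval", "gsarti", "Helsinki-NLP", "Muennighoff",
    "facebook", "codeparrot", "pasinit", "Fraser", "allenai", "teven",
}


def is_from_included_user(dataset_name: str) -> bool:
    """Return True if the dataset lives under an allow-listed user namespace."""
    user, sep, _ = dataset_name.partition("/")
    # Names without a "/" have no user namespace, so they never match.
    return bool(sep) and user in INCLUDED_USERS


print(is_from_included_user("codeparrot/apps"))  # True
print(is_from_included_user("squad"))            # False
```

With the extended set, datasets such as `Fraser/python-state-changes` and `codeparrot/xlcost-text-to-code` would pass this check.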
@@ -360,12 +378,13 @@ def get_fixed_answer_choices_list(self):
         else:
             return None

-    def apply(self, example, truncate=True, highlight_variables=False) -> Tuple[str, List[str]]:
+    def apply(self, example, truncate=True, strip_connection=True, highlight_variables=False) -> Tuple[str, List[str]]:
         """
         Creates a prompt by applying this template to an example

         :param example: the dataset example to create a prompt for
         :param truncate: if True, example fields will be truncated to TEXT_VAR_LENGTH chars
+        :param strip_connection: if True, strips the whitespace at the connection between input & target
         :param highlight_variables: highlight the added variables
         :return: tuple of a string and a list of strings, for input and targets
         """
@@ -396,15 +415,27 @@ def apply(self, example, truncate=True, highlight_variables=False) -> Tuple[str,

         # Splits on the separator, and then replaces back any occurrences of the
         # separator in the original example
-        parts = [self._unescape_pipe(part).strip() for part in rendered_example.split("|||")]
+        if strip_connection:
+            parts = [self._unescape_pipe(part).strip() for part in rendered_example.split("|||")]
+        else:
+            parts = [self._unescape_pipe(part) for part in rendered_example.split("|||")]
         if parts == [""]:
             # Handles the case of blank results
             # Example: `tydiqa` where prompts are conditioned on the language and thus most of the time will return a blank result
             return parts
         if len(parts) < 2:
             raise ValueError("Prompt did not produce an input and at least one target.")

-        return parts[0], parts[1:]
+        if strip_connection:
+            return parts[0], parts[1:]
+        else:
+            # Remove double whitespace (endswith/startswith are safe on empty strings)
+            if parts[0].endswith(" ") and all(p.startswith(" ") for p in parts[1:]):
+                parts[0] = parts[0][:-1]
+            # Leave the connection between input & target unstripped
+            return parts[0].lstrip(), [p.rstrip() for p in parts[1:]]
+
+

     pipe_protector = "3ed2dface8203c4c9dfb1a5dc58e41e0"
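To make the effect of `strip_connection` concrete, here is a standalone sketch of the splitting step. It assumes the rendered prompt is already a plain string; the function name `split_rendered` is hypothetical, and the pipe unescaping and the blank-result case from `apply` are left out for brevity.

```python
from typing import List, Tuple

SEPARATOR = "|||"


def split_rendered(rendered: str, strip_connection: bool = True) -> Tuple[str, List[str]]:
    """Split a rendered prompt into (input, targets), mirroring the diff's logic."""
    if strip_connection:
        parts = [part.strip() for part in rendered.split(SEPARATOR)]
    else:
        parts = rendered.split(SEPARATOR)
    if len(parts) < 2:
        raise ValueError("Prompt did not produce an input and at least one target.")
    if strip_connection:
        return parts[0], parts[1:]
    # Drop one trailing space from the input if every target starts with a space,
    # so the joined text does not contain a double space.
    if parts[0].endswith(" ") and all(p.startswith(" ") for p in parts[1:]):
        parts[0] = parts[0][:-1]
    # Only the outer edges are stripped; the input/target boundary is kept.
    return parts[0].lstrip(), [p.rstrip() for p in parts[1:]]


rendered = "Needed code: ||| a += 1"
print(split_rendered(rendered))                          # ('Needed code:', ['a += 1'])
print(split_rendered(rendered, strip_connection=False))  # ('Needed code:', [' a += 1'])
```

With `strip_connection=False` the target keeps its leading space, so concatenating input and target reproduces the spacing the template author wrote.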

@@ -0,0 +1,87 @@
dataset: Fraser/python-state-changes
subset: default
templates:
  2b358b1c-7514-488f-99ed-3ca5da70e103: !Template
    answer_choices: null
    id: 2b358b1c-7514-488f-99ed-3ca5da70e103
    jinja: 'Starting variables:

      {{ start }}

      Applied code:

      {{code}}

      Ending variables:

      |||
      {{ end }}'
    metadata: !TemplateMetadata
      choices_in_prompt: false
      languages:
      - en
      metrics:
      - Other
      original_task: true
    name: startend
    reference: ''
  1b218b2c-8514-488f-99ed-3ca5da70e103: !Template
    answer_choices: null
    id: 1b218b2c-8514-488f-99ed-3ca5da70e103
    jinja: 'I applied "{{code}}" given "{{ start }}".

      What are the new values of the variables now?

      |||

      {{ end }}'
    metadata: !TemplateMetadata
      choices_in_prompt: false
      languages:
      - en
      metrics:
      - Other
      original_task: true
    name: newval
    reference: ''
  5f318b2c-7514-488f-99ed-3ca5da70e103: !Template
    answer_choices: null
    id: 5f318b2c-7514-488f-99ed-3ca5da70e103
    jinja: 'The final variables are:

      {{ end }}

      We know that the code "{{code}}" was applied.

      What were the variables at the beginning?

      |||
      {{ start }}'
    metadata: !TemplateMetadata
      choices_in_prompt: false
      languages:
      - en
      metrics:
      - Other
      original_task: true
    name: varbeg
    reference: ''
  5b918b2c-7514-488f-99ed-3ca5da70e103: !Template
    answer_choices: null
    id: 5b918b2c-7514-488f-99ed-3ca5da70e103
    jinja: 'What code do I need to apply to get from start to end?

      Start: {{ start }}

      End: {{ end }}

      Needed code: ||| {{ code }}'
    metadata: !TemplateMetadata
      choices_in_prompt: false
      languages:
      - en
      metrics:
      - Other
      original_task: true
    name: needcode
    reference: ''
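As an illustration of how a template like `startend` turns one example into an input/target pair, the sketch below fills the placeholders and splits on `|||`. It is a simplified stand-in, not Jinja: the `render` and `to_input_and_targets` helpers are hypothetical, the example values are invented, and the template string reflects one reading of how the YAML scalar folds its line breaks.

```python
import re

# Value of pipe_protector from the templates.py diff in this PR.
PIPE_PROTECTOR = "3ed2dface8203c4c9dfb1a5dc58e41e0"
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, example: dict) -> str:
    """Fill {{ field }} placeholders (regex substitution, not real Jinja)."""
    def fill(match):
        value = str(example[match.group(1)])
        # Protect any separator that occurs inside the data itself.
        return value.replace("|||", PIPE_PROTECTOR)
    return PLACEHOLDER.sub(fill, template)


def to_input_and_targets(rendered: str):
    """Split on the separator, then restore protected separators."""
    parts = [p.replace(PIPE_PROTECTOR, "|||").strip() for p in rendered.split("|||")]
    return parts[0], parts[1:]


STARTEND = (
    "Starting variables:\n{{ start }}\nApplied code:\n{{code}}\n"
    "Ending variables:\n||| {{ end }}"
)
# Invented example row with the three fields the templates reference.
example = {"start": "a = 1; b = 2", "code": "a += b", "end": "a = 3; b = 2"}

prompt, targets = to_input_and_targets(render(STARTEND, example))
print(prompt)
print(targets)  # ['a = 3; b = 2']
```

The protector substitution matters only when a field value itself contains `|||`; without it, such a value would be split into a spurious extra target.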
65 changes: 65 additions & 0 deletions promptsource/templates/codeparrot/apps/all/templates.yaml
@@ -0,0 +1,65 @@
dataset: codeparrot/apps
subset: all
templates:
  5b318b1c-7514-488f-99ed-3ca5da70e103: !Template
    answer_choices: null
    id: 5b318b1c-7514-488f-99ed-3ca5da70e103
    jinja: 'Solve in Python:

      {{ question }}

      |||

      {{ solution }}'
    metadata: !TemplateMetadata
      choices_in_prompt: false
      languages:
      - en
      metrics:
      - Other
      original_task: true
    name: qsol
    reference: ''
  5b218b3c-8514-488f-99ed-3ca5da70e103: !Template
    answer_choices: null
    id: 5b218b3c-8514-488f-99ed-3ca5da70e103
    jinja: '{{ question }}


      Can you solve the above problem using Python?

      |||

      {{ solution }}'
    metadata: !TemplateMetadata
      choices_in_prompt: false
      languages:
      - en
      metrics:
      - Other
      original_task: true
    name: abovesol
    reference: ''
  5b318b2c-7514-488f-99ed-3ca5da70e103: !Template
    answer_choices: null
    id: 5b318b2c-7514-488f-99ed-3ca5da70e103
    jinja: 'I found an interesting problem on {{url}}:

      {{ question }}


      I tried it in Python, but could not do it. Can you solve it?


      |||

      {{ solution }}'
    metadata: !TemplateMetadata
      choices_in_prompt: false
      languages:
      - en
      metrics:
      - Other
      original_task: true
    name: abovesol
    reference: ''
@@ -0,0 +1,45 @@
dataset: codeparrot/codecomplex
subset: codeparrot--codecomplex
templates:
  5b108b1c-7514-488f-99ed-3ca5da70e103: !Template
    answer_choices: null
    id: 5b108b1c-7514-488f-99ed-3ca5da70e103
    jinja: '{{ code }}
      What is the time complexity of the previous code?
      |||
      {{ complexity }}'
    metadata: !TemplateMetadata
      choices_in_prompt: false
      languages:
      - en
      metrics:
      - Other
      original_task: true
    name: whatcomplexity
    reference: ''
  1d85c898-70fe-4a51-be37-5111be357762: !Template
    answer_choices: null
    id: 1d85c898-70fe-4a51-be37-5111be357762
    jinja: "Identify the time complexity of the following code as constant, linear, quadratic, cubic, log(n), nlog(n) or NP-hard. {{ code }} Complexity: |||{{ complexity }}"
    metadata: !TemplateMetadata
      choices_in_prompt: false
      languages:
      - en
      metrics:
      - Other
      original_task: false
    name: identifycomplexity
    reference: ''
  5d85c898-70fe-4a51-be37-5111be357762: !Template
    answer_choices: null
    id: 5d85c898-70fe-4a51-be37-5111be357762
    jinja: "{{ code }} Which one is the correct time complexity of the code snippet: constant, linear, quadratic, cubic, log(n), nlog(n) or NP-hard? |||{{ complexity }}"
    metadata: !TemplateMetadata
      choices_in_prompt: false
      languages:
      - en
      metrics:
      - Other
      original_task: false
    name: whichcomplexity
    reference: ''
@@ -0,0 +1,77 @@
dataset: codeparrot/github-jupyter-text-code-pairs
subset:
templates:
  5b718b1c-7514-488f-99ed-3ca5da70e103: !Template
    answer_choices: null
    id: 5b718b1c-7514-488f-99ed-3ca5da70e103
    jinja: '"{{ markdown }}"

      Please write code following the instructions in jupyter notebook style.

      |||

      {{ code }}'
    metadata: !TemplateMetadata
      choices_in_prompt: false
      languages:
      - en
      metrics:
      - Other
      original_task: true
    name: code
    reference: ''
  5b218b2e-7514-488f-99ed-3ca5da70e103: !Template
    answer_choices: null
    id: 5b218b2e-7514-488f-99ed-3ca5da70e103
    jinja: 'I am working on the file "{{ path }}".

      The first task is:

      {{ markdown }}

      Can you write Python code for it?

      |||

      {{ code }}'
    metadata: !TemplateMetadata
      choices_in_prompt: false
      languages:
      - en
      metrics:
      - Other
      original_task: true
    name: taskcode
    reference: ''
  4d85c898-70fe-4a51-be37-5111be357762: !Template
    answer_choices: null
    id: 4d85c898-70fe-4a51-be37-5111be357762
    jinja: "{{ markdown }}\n|||{{ code }}"
    metadata: !TemplateMetadata
      choices_in_prompt: false
      languages:
      - en
      metrics:
      - Other
      original_task: false
    name: markdowncode
    reference: ''
  8d85c898-70fe-4a51-be37-5111be357762: !Template
    answer_choices: null
    id: 8d85c898-70fe-4a51-be37-5111be357762
    jinja: '{{ code }}

      Given the above code, generate some markdown instructions for it.

      |||

      {{ markdown }}'
    metadata: !TemplateMetadata
      choices_in_prompt: false
      languages:
      - en
      metrics:
      - Other
      original_task: false
    name: genmarkdown
    reference: ''
@@ -0,0 +1,39 @@
dataset: codeparrot/xlcost-text-to-code
subset: C++-program-level
templates:
  5f718b2d-7514-488f-99ed-3ca5da70e103: !Template
    answer_choices: null
    id: 5f718b2d-7514-488f-99ed-3ca5da70e103
    jinja: '"{{ text }}"

      Solution in C++:

      |||
      {{ code_clean }}'
    metadata: !TemplateMetadata
      choices_in_prompt: false
      languages:
      - en
      metrics:
      - Other
      original_task: true
    name: solcpp
    reference: ''
  5b218b2e-7525-488f-99ed-3ca5da70e103: !Template
    answer_choices: null
    id: 5b218b2e-7525-488f-99ed-3ca5da70e103
    jinja: '"{{ text }}"

      How can the above be solved in C++?

      |||
      {{ code_clean }}'
    metadata: !TemplateMetadata
      choices_in_prompt: false
      languages:
      - en
      metrics:
      - Other
      original_task: true
    name: abovecpp
    reference: ''