Graded Assignment -4 (Jan Term 2023):- Redesigning The Hindu Data Point Stories #8

Open
Jimmi-Kr opened this issue Mar 10, 2023 · 26 comments

@Jimmi-Kr
Collaborator

For this assignment, we'll use data stories from The Hindu Data Point. Use what you have learned in Week 4 & Week 5 for doing this assignment.

Select a story that you like, study it carefully, and redesign it. Specifically, we want you to focus on understanding the data that powers the story, and how it is visually encoded to tell the intended story. Document your design process, capturing the following:

  • What is the story the author is trying to tell?
  • What data is he/she using to tell the story? Describe its details -- type of data, extent of the data, dimensions of the data, gaps in the data, what data is essential and what is irrelevant.
  • How is it encoded, what problems are with it, and how have you attempted to improve it?

You may choose to expand or curtail the scope of the data used in the story or add an additional dataset to tell the story better. But do not deviate from the main intent of the original story. In other words, it is a redesign exercise, and hence I do not want you to tell a different, unrelated story.

While you should provide a link to the original story, it might be useful to capture and display inline, appropriate parts of the original visualization, and your own design iterations to produce coherent documentation.

For reference, take a look at what the previous batches (2019, 2020, 2021, 2022) did with this assignment.

@ShagunDwivedi

ShagunDwivedi commented Mar 16, 2023

Post-COVID-19 math skills of students in southern and Western States dipped the most

Shagun Dwivedi
21F1001731

Here's the article!
Authors: Vighnesh Radhakrishnan, Rebecca Rose Varghese
January 24, 2023

The Story Authors Narrate:
Using the recently released Annual Status of Education Report (Rural) 2022, the authors show that the ability of schoolchildren to carry out simple arithmetic calculations was poor in most of the southern, central and western States compared to the children in many northern and eastern States. The adverse impact of the pandemic on arithmetic ability was more pronounced among boys.
The Data: The Annual Status of Education Report (Rural) 2022 is a ‘floor test’ focusing on basic reading and arithmetic, rather than grade-level competencies. Testing is conducted at home, rather than in schools, so as to include out of school children and children attending different types of schools.
Extent and Dimension: All children in the 5-16 age group in a sampled household are tested using the same tools, irrespective of age, grade, or schooling status. ASER testing process assesses each child on reading, arithmetic and English. The author focuses on arithmetic abilities only.
The sampling strategy used in ASER is designed to generate a representative picture of each district. The sample is obtained by selecting 30 villages per district and 20 households per village.
Gaps in Data: Since the data only corresponds to rural India, it does not take into account the urban population, hence it's not a perfect representative for the country as a whole.
The Relevant and the Irrelevant: We ignore the Reading, Tuition, Private Vs Government School Factors, and solely take into account the change in arithmetic skills and gender division.
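As a quick check on the sampling design described above, the per-district sample size follows directly from the two figures quoted (30 villages per district, 20 households per village). This is a minimal sketch; the function name and the optional district count are illustrative and not taken from the ASER report.

```python
# Figures quoted in the write-up above.
VILLAGES_PER_DISTRICT = 30
HOUSEHOLDS_PER_VILLAGE = 20


def households_sampled(num_districts: int = 1) -> int:
    """Households surveyed across `num_districts` districts under this design."""
    if num_districts < 0:
        raise ValueError("num_districts must be non-negative")
    return num_districts * VILLAGES_PER_DISTRICT * HOUSEHOLDS_PER_VILLAGE


print(households_sampled())  # 600 households per district
```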

Data as Encoded by the Authors:
Chart 1 shows the share of students in Classes V and VIII who could carry out all four arithmetic tasks successfully.
Screenshot 2023-03-16 162127
Chart 2 shows the share of students in Classes V and VIII who could carry out all four arithmetic tasks successfully for the year 2018.
Screenshot 2023-03-16 162502
Chart 3 shows the percentage of Class VIII children who completed all four tasks in 2022 and the change in percentage points from 2018.
Screenshot 2023-03-16 173427
Chart 4 shows the change in the share of Class VIII students who could complete division in 2022 compared to 2018 (in percentage points).
Screenshot 2023-03-16 173451
Problems:

  • The story is about the dip in abilities, but the first two charts show 5th and 8th graders for 2018 and 2022 separately. A single chart should have been used instead of the first two.
  • The third chart gives no relevant information at first glance; the shifted X-axis creates this confusion.
  • The fourth chart uses a conflicting color scheme for the bars. Readers decipher charts faster when they use the stereotypical colors, but this chart uses the opposite notation, which creates confusion. Alternatively, we could use different colors altogether.
  • Also, even though the percentage of students has reduced (negative change), the chart makes it look like an increase in the percentage of students with arithmetic skills, which goes against the primary narrative.
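Charts 3 and 4 report change in percentage points, which is easy to confuse with a relative (percent) change. The sketch below shows the difference; the two shares are hypothetical placeholders, not ASER figures, and the function names are my own.

```python
def percentage_point_change(share_before: float, share_after: float) -> float:
    """Absolute difference between two shares that are already in percent."""
    return share_after - share_before


def relative_change_percent(share_before: float, share_after: float) -> float:
    """Change expressed as a percentage of the starting share."""
    if share_before == 0:
        raise ValueError("relative change is undefined for a starting share of 0")
    return (share_after - share_before) / share_before * 100


# Hypothetical shares, NOT taken from ASER: 50% of students in 2018, 40% in 2022.
before, after = 50.0, 40.0
print(percentage_point_change(before, after))            # -10.0 percentage points
print(round(relative_change_percent(before, after), 1))  # -20.0 percent
```

A negative value in either measure is a dip, so a redesign should draw it visibly downward (or leftward) rather than as a bar that looks like growth.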

Collected data from ASER, also included data for Class III students, and region-coded it.
Screenshot 2023-03-16 162353
Created a scatter plot similar to the original chart, categorised by grade and with a time series. However, it is hard to read.
Screenshot 2023-03-16 163900
Violin plots divided by region: useful if the goal was to convey the fact that 8th graders are the most arithmetically skilled.
Screenshot 2023-03-16 165808
Violin plots divided by grades: somewhat useful in communicating the difference in arithmetic skills in each region, but they do not communicate the change.
Screenshot 2023-03-16 171912
Area plot: good at visualising the dip in arithmetic skills, but makes no reference to the regional divide.
Screenshot 2023-03-16 170316
Radar chart: makes a comparison between the two years but not between the regions.
Screenshot 2023-03-16 171359
Slope chart seems to do well with comparison between the years and representing regions, divided by classes.
Screenshot 2023-03-16 153013
Slope chart based on percentage change, would be useful if the aim was to compare the level of improvement in each state/region.
Screenshot 2023-03-16 172628

Final Visualisations
Chart 1:
Screenshot 2023-03-16 165509

What it does better:

  • The slope chart visualises the dips in arithmetic skills, the focus of the story, that the original visualisations somewhat missed out on.
  • Matching Y-axes across the grids helps us compare the performance of different grades at once.
  • Color Coded Regions for reference.
  • The title, description, legends and source citations make it useful as a standalone visualisation as well.

Chart 2:
Screenshot 2023-03-17 000113

What it does better:

  • Added all states for which data was available and added stacks for comparison in change over the years. Since the bars for the recent year are mostly shorter, one cannot misunderstand the change to be a positive one.
  • Uses different colors for gender-data, removing confusion as well as any space for stereotype endorsement.

Thanks.

@Chaitanya-Kumaria

Terror Attacks in Pakistan: a data Story

Name : Chaitanya Kumaria

Roll No. : 21f1000479

Article Link: Terror Attacks in Pakistan : A data Story

Context

The article is part of the series Week in 5 Charts; I have taken story number 4, Peshawar Bombings, for the assignment. The author starts with the news of the Peshawar mosque bombing, which killed over 100 people and left 200+ others injured. The author then delves into the history of terror attacks and suicide bombings in Pakistan over the past two decades and discusses the reasons for them.

Data Charts Provided by the Author

Chart 1

image
This chart deals with zone-wise suicide bombing incidents and the number of related deaths in the country over the past 2 decades.

Issues

  • Terror-related activity tied to a geographical region is best visualized through choropleths, as they help highlight regions of concern.
  • There are no axis labels on the graph.
  • Values on the labels are not easily highlighted.

Chart 2

image
This is a stacked bar chart which the author uses to depict the fatalities among different classes of individuals in terror attacks in Pakistan across the past 2 decades.

Issues

  • The choice of chart leaves a lot to be desired: stacked bar charts do not aid comparison of the individual groups.
  • The number on top of each bar is the total number of deaths, but showing the value of each sub-part inside the bar would be useful.

Approach

For my analysis, I have used raw data from the South Asian Terrorism Portal website.
Based on the data, I thought of the following visualizations.

Chart 1

image
This chart helps in understanding the number and severity of terror-related incidents in Pakistan in the last 20 years.
As is evident from the chart, the number of terror attacks has remained more or less constant throughout the time frame, but their severity peaked in 2009. It is also interesting to note that the first 2 months of 2023 have led to more deaths than the last 2 years combined.

Chart 2

image
This chart is my version of representing the number of deaths among different types of individuals. A grouped bar chart helps in giving group-wise comparisons and also gives better context for evaluation.

Chart 3

image
This chart highlights zone-wise suicide bombing killings. The visualization used is a choropleth, which helps visualize areas of concern. For instance, the border provinces of Khyber Pakhtunkhwa (Afghan border) and Punjab (Indian border) have been the worst affected by suicide bombings.

Final Version

image

This basically combines all the visualizations and adds required labels to better help understand the whole story in one shot.

@aryab-sudo

aryab-sudo commented Mar 18, 2023

Personal Details

Name: Arya Bhattacharyya
Roll Number: 21f2000436

Article Chosen

Friction over revenue sharing formula: Why some States get more money from Centre (https://www.thehindu.com/data/data-friction-over-revenue-sharing-formula-why-some-states-get-more-money-from-centre/article66625863.ece)

The Narrative

The article discusses the recent point of contention between the centre and various states. Overall, it tries to present the viewpoints of various stakeholders from different government institutions (at the state and central level) and uses data visualizations to drive home specific points, as mentioned in the corresponding charts.

Charts By The Author v/s My Visualizations

Chart 1 (By the Author)

image

Here the author is trying to show the returns that various states receive for every rupee that they contribute to the centre's tax collection. In line with the overall theme of the article, the contention is over lower returns for southern states and higher returns for northern states.

Data
2-Dimensional data of a select few states, wherein one dimension is the state's name while the other is the returns it gets for every rupee it contributes. There are a few observable gaps, especially in the case of states from whom the data is not available (like most of the northeastern states, Delhi, etc). The source of this actual data is not explicitly stated and is difficult to obtain. The data is relevant in terms of showcasing the divide that exists in terms of returns.

Cons in Encoding

  1. No axes and their corresponding title.
  2. Since the contention is categorized by geographical location, the bar chart makes it difficult to understand the divide, especially for people who are not familiar with the geography of India.
  3. The title says, "each State" but clearly, quite a few states are missing and are not addressed. Hence the title is somewhat inappropriate.

Chart 1 (By me)

wHHNK-returns-of-each-state (2)

(Interactive Version: https://datawrapper.dwcdn.net/wHHNK/6/)

Approach

  1. Since the central theme is the geographic categorization and the consequent divide of the returns, this visualization in my opinion helps the user navigate the idea better.
  2. A central point of 1 rupee has been chosen in the colour gradation to ensure that there is an easier comparison of returns and an explicit indication of the visual encoding of a 100% return.
  3. Footnote is added in the visualization to explicitly state the fact that data for some states are not available and therefore are greyed out. This gives more justification to the title.
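The idea in point 2 (a colour gradation centred on 1 rupee) can be sketched as a diverging scale. This is only an illustration of the technique, not how Datawrapper implements it; the value range (0 to 5 rupees) and the three RGB colours are assumptions chosen for the example.

```python
# Assumed end and mid colours (RGB); any diverging palette would work.
LOW_COLOUR = (178, 24, 43)     # returns well below 1 rupee
MID_COLOUR = (247, 247, 247)   # exactly 1 rupee back per rupee contributed
HIGH_COLOUR = (33, 102, 172)   # returns well above 1 rupee


def diverging_position(value: float, vmin: float, vmid: float, vmax: float) -> float:
    """Map `value` to [-1, 1] so that `vmid` lands exactly on 0."""
    if not vmin < vmid < vmax:
        raise ValueError("expected vmin < vmid < vmax")
    clamped = min(max(value, vmin), vmax)
    if clamped >= vmid:
        return (clamped - vmid) / (vmax - vmid)
    return (clamped - vmid) / (vmid - vmin)


def colour_for(value: float, vmin: float = 0.0, vmid: float = 1.0, vmax: float = 5.0) -> tuple:
    """Blend from the mid colour towards the low or high colour."""
    t = diverging_position(value, vmin, vmid, vmax)
    end = HIGH_COLOUR if t >= 0 else LOW_COLOUR
    weight = abs(t)
    return tuple(round(m + (e - m) * weight) for m, e in zip(MID_COLOUR, end))


print(colour_for(1.0))  # the neutral mid colour marks a 100% return
```

Scaling each side of the midpoint separately keeps 1 rupee on the neutral colour even when the data range is lopsided.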

Chart 2 (By the Author)

image

Here the author tries to illustrate the idea that the share of the southern states over the divisible pool of taxes has "consistently" declined over the years. Here again, the focus is on the geographical location of the states and how it correlates with the share of the taxes that they have received.

Data
2-Dimensional data (across different sheets of Microsoft Excel, each pertaining to a different state; overall 3-Dimensional Data) of a select few states, wherein one dimension is the finance commission(under consideration) while the other is the %share the state gets. The source of this actual data is not explicitly stated and is difficult to obtain. The data is relevant in terms of showcasing the declining share of southern states.

Pros

  1. Multiple graphs make it easier to compare instead of plotting everything on the same graph.

Cons in Encoding

  1. There is no straightforward way to obtain the particular region that a state belongs to, from the given plots. For example, as someone who might not know the geography of India, I may not know that Karnataka is a southern state, while Bihar is a northern state.
  2. Since one of the attributes (the finance commission under consideration) is qualitative in nature, it is inappropriate to use a line graph because the same can be used only when both attributes are quantitative and continuous. This is because there is no physical reality or significance of a value like the "10.234th" Finance Commission. And therefore a line depicting a regression in between the values is not appropriate.
  3. Axes titles are missing.

Chart 2 (By me)

Frame 1 (1)

Approach

  1. I have retained the multiple graph approach as used by the author because of its clutter free nature and ease of comparison.
  2. Since the central theme is the geographic categorization and the consequent divide of the share, I decided to colour code the region of the different states and club them together on the same row, so that it is easier for the reader to compare trends across geographies and regions.
  3. Because of the colour coding, it is now also possible to easily know which state belongs to which region, reducing the requirement for an unaware reader to consult secondary sources for the same.
  4. Since the trend is more important than the actual values, some liberty has been taken to not have standardized start and end values for the axis, but rather have values which drive the point home easily.
  5. Scatter plots have been used instead of line chart because of the presence of a qualitative variable.

Chart 3 (By the Author)

image

Here the author tries to explain how the states with higher shares (like Bihar and UP as seen earlier) have higher fertility rates and hence effectively the center is not able to reward states with lower fertility values. This is presented to support one of the opinions stated by Tamil Nadu's finance minister (written in the article above the figure).

Data
2-Dimensional data (across different sheets of Microsoft Excel, each pertaining to a different state; overall 3-Dimensional Data) of a select few states, wherein one dimension is the health survey(under consideration) while the other is the fertility rate of the state. The source of this actual data is not explicitly stated and is difficult to obtain. The data is relevant in terms of showcasing the contrast in efficiency of the different states for reducing fertility rates and population.

Cons in Encoding

  1. No titles for the axes.
  2. The colour scheme is not clear at all. As per convention, states like Haryana and Rajasthan are northern states; however, they are given a different colour from UP and Bihar. If we were to assume that the scheme is with respect to the final fertility rates (as per NFHS-5), that would still be incorrect because Maharashtra should then be in the purple shade. Hence, the colour scheme is not at all interpretable.
  3. With so many different lines of the same colour and no labelling on them, it is impossible to discern (in a static version), which line corresponds to which state.
  4. The same error as in Chart 2, the variable of the Survey under consideration, which are the different NFHS's is a qualitative variable and not a quantitative one. Hence the use of a line chart is not justified here. (by a similar reasoning as chart 2, con number 2)
  5. Since the colour coding is of no use, the same issue of an unaware reader needing to spend extra effort to figure the regions of different states arises.
  6. Since there are so many lines (with quite a few not even needed to drive the point home), the chart looks messy and difficult to navigate for a reader.

Chart 3 (By me)

Frame 2 (1)

Approach

  1. I have used the same template as for my version of Chart 2. This comes with the benefit of having the region encoded in the colour of the state and scatter dots.
  2. The colour scheme not only reflects the region information, but also is the same as used in the previous chart, so that there is consistency of representation and ideas.
  3. Since the states are clubbed together on the basis of region and also colour coded, this would ideally convey more information and lead to less confusion for the reader.
  4. The axis start and end values are kept the same across different subplots to ensure that the reader can easily observe the disparity between the values amongst the northern and southern states.
  5. Since not all states and their data is necessary to drive the point home, I have limited it to the same set of states (and their corresponding region categorization) used in the previous chart, to continue the line of thought for the reader, and make it easier for the reader to relate to the different factors.
  6. The chart is broken into subplots for the reader to easily compare values across regions, and it consequently makes it easier for the reader to navigate.
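Point 4 above (identical axis start and end values across subplots) amounts to computing one pair of limits from all panels together and applying it to each. A minimal sketch follows; the panel names and values are hypothetical, and the `padding` parameter is my own addition.

```python
def shared_axis_limits(series_by_panel: dict, padding: float = 0.05) -> tuple:
    """Return one (low, high) pair that covers every value in every panel."""
    values = [v for series in series_by_panel.values() for v in series]
    if not values:
        raise ValueError("no data to compute limits from")
    low, high = min(values), max(values)
    margin = (high - low) * padding  # small headroom so points do not sit on the frame
    return low - margin, high + margin


# Hypothetical fertility-rate series for two panels, NOT taken from NFHS.
panels = {"State A": [1.0, 2.0], "State B": [3.0, 5.0]}
print(shared_axis_limits(panels, padding=0.0))  # (1.0, 5.0)
```

With shared limits, a visibly higher dot in one subplot really does mean a higher value than a lower dot in another.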

Chart 4 (By the Author)

image

Here the author tries to illustrate the point that was made by Tamil Nadu's finance minister regarding the lack of development to the poorer states (which as per my understanding refer to states which receive higher returns for their contribution to the tax, effectively meaning the northern states). As an indicator for development, the author uses the HDI values over the years to show the change and the actual divide again between the northern and southern states in terms of their HDI values.

Data
2-Dimensional data (across different sheets of Microsoft Excel, each pertaining to a different state; overall 3-Dimensional Data) of a select few states, wherein one dimension is the period(under consideration) while the other is the HDI value of the state. The exact source of this actual data is not explicitly stated and is difficult to obtain. The data is relevant in terms of showcasing the contrast in efficiency of the different states in achieving higher HDI values.

Pros

  1. The line chart is appropriate in this case because it is a time series(quantitative) data and continuous in nature.

Cons in encoding

  1. Missing axis titles.
  2. No clarity on the colour coding scheme and the reason on why the same was chosen.
  3. No legend, making it difficult to identify the different states (in the static version of the graph)
  4. Too many lines, which also intersect and overlap with each other on various occasions, leading to difficulty in analysis and observing patterns/trends.
  5. Similar/same colours used for multiple lines, even though they technically represent different states. No rationale is apparent from the given graph on why this was done.

Chart 4(By me)

Frame 3

Approach

  1. I have retained the idea of using line charts for this illustration.
  2. The template is similar to the previous scatter plots I had done, and, as in chart 3, this chart has the same scale on the y-axis across the different subplots because, in addition to the trend (the main focus), the values matter as well to some extent.
  3. Colour coding scheme is consistent with the previous scatter charts.
  4. The subplots are clubbed in the same order as previously for the respective states, leading to consistency and easier thought linking for the reader.
  5. Plot was limited to only the small set of states present in the previous scatter plots in order to keep the chart clutter free and easy to comprehend.

Chart 5(By the Author)

image

Here the author is using the NSDP metric to illustrate the same point as in the previous chart, which was to compare the growth of the states with their shares of taxes. The point they are trying to make is that growth stagnates for the states with higher returns while growth is much better for states with lower returns.

Data
2-Dimensional data (across different sheets of Microsoft Excel, each pertaining to a different state; overall 3-Dimensional Data) of a select few states, wherein one dimension is the period(under consideration) while the other is the NSDP value of the state. The exact source of this actual data is not explicitly stated and is difficult to obtain. The data is relevant in terms of showcasing the contrast in efficiency of the different states in achieving higher NSDP values.

Pros

  1. Line Chart is appropriate because both variables to be plotted are quantitative and continuous in nature.

Cons

  1. Missing axis titles.
  2. Messy plot with too many lines that overlap/interact.
  3. Colour scheme is based on the returns that the state gets from the center and whether it is high, medium, or low. However, this does away with the theme of geographical categorization and divide, and is thus not consistent in my opinion.

Chart 5 (By me)

NSDP (2)

Approach

  1. Since I wanted to plot all the information on the same graph to have an easier comparison of the NSDP values, I decided not to use my previous template which was used in scatter plots and the previous line chart.
  2. In order to drive the point home without losing any important consequential information, I decided to derive the mean value of the NSDP values for the northern, southern and the western states. Please note that the categorization was the same as previously done wherein northern states were Rajasthan, Bihar and Uttar Pradesh, southern states were Karnataka, Kerala and Tamil Nadu and finally western states were Maharashtra and Gujarat.
  3. With regard to point 2, an added benefit of this approach was that, since the trend (growth v/s stagnation) was more important than the actual individual values, this graph did a better job than the previous one in a much cleaner and neater way.
  4. A line chart was chosen as it was appropriate for quantitative datatypes (both of the variables under consideration).
  5. Colour scheme is chosen to be consistent with the colours of the different regions across the scatter plots and line plot.
    PS: A title was added to the chart, but for some unknown reason exporting it from Flourish Studio removes the title. Apologies for the inconvenience.
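The regional averaging described in point 2 can be sketched as follows. The state-to-region grouping is the one stated above; the period labels and NSDP numbers are hypothetical placeholders, not the article's data, and the function name is my own.

```python
from statistics import mean

# Grouping as stated in point 2 of the approach.
REGIONS = {
    "Northern": ["Rajasthan", "Bihar", "Uttar Pradesh"],
    "Southern": ["Karnataka", "Kerala", "Tamil Nadu"],
    "Western": ["Maharashtra", "Gujarat"],
}


def regional_means(nsdp_by_state: dict) -> dict:
    """For each region and period, average the NSDP values of its member states."""
    result = {}
    for region, states in REGIONS.items():
        periods = list(nsdp_by_state[states[0]].keys())
        result[region] = {
            period: mean(nsdp_by_state[state][period] for state in states)
            for period in periods
        }
    return result


# Hypothetical values only; substitute the real NSDP series before plotting.
placeholder = {
    "Rajasthan": {"t1": 10.0, "t2": 11.0},
    "Bihar": {"t1": 20.0, "t2": 22.0},
    "Uttar Pradesh": {"t1": 30.0, "t2": 33.0},
    "Karnataka": {"t1": 40.0, "t2": 50.0},
    "Kerala": {"t1": 50.0, "t2": 60.0},
    "Tamil Nadu": {"t1": 60.0, "t2": 70.0},
    "Maharashtra": {"t1": 70.0, "t2": 90.0},
    "Gujarat": {"t1": 90.0, "t2": 110.0},
}
print(regional_means(placeholder)["Northern"])  # {'t1': 20.0, 't2': 22.0}
```

Note that an unweighted mean treats every state equally regardless of population or economy size, which is a simplification worth stating alongside the chart.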

Final Output

Frame 4 (1)

@kun101

kun101 commented Mar 18, 2023

Kunal Chaturvedi

21f1003353

Working on

Russia's Invasion of Ukraine
https://www.thehindu.com/data/in-charts-a-year-of-russias-invasion-of-ukraine/article66545197.ece

The author is trying to narrate the impacts of the Russian invasion of Ukraine from February 2022 to February 2023 and how it has changed the political, social and economic landscape for the people of Ukraine and people of countries around the world.

The author looks at the following aspects of the invasion through visualizations:

  • Current Situation of the Invasion, which parts of Ukraine are in whose control.
  • Military Casualties
  • Refugee Situation
  • Financial Aid to Ukraine

Redesigning the Current Situation of the Invasion

The author wants to talk about the areas under capture or under active war in Ukraine till 20th February 2023, as shown below.

By the author:
Type of Data Used : GeoJSON
Extent and Dimension of Data: Map data showing focus areas for the invasion
Gaps in Data and Relevant Data to be shown: The data is presented in a very crowded visualization, which doesn’t point towards the purpose of the data, that is showing which areas are troubled in Ukraine. Some extra data points are marked on the map, with added news pieces attached which isn’t related to the purpose of the visualization.

image4

Here is how I redesigned it.

Removed irrelevant news pieces to the map.
Colour coded the whole map to further amplify the gravity of the situation.
Took extra data from Wikipedia to encode all districts of Ukraine : https://en.wikipedia.org/wiki/Territorial_control_during_the_Russo-Ukrainian_War
Made the data interactive using Flourish.
Only kept the relevant news pieces to only denote areas under localized battles.

image2
https://public.flourish.studio/visualisation/13116877/

Redesigning Military Casualties

The author draws a comparison between military losses/casualties between Russia and Ukraine.
By the author:
Type of Data Used : Numerical data across categories of losses
Extent and Dimension of Data: Numerical data implying which country is suffering more from the war, by suffering more losses.
Gaps in Data and Relevant Data to be shown: The data fails to clearly indicate a comparison between the two countries. Prima facie, the chart is trying to show that Russia is suffering more from the war, but it isn’t able to convey this well.

image1

How I redesigned it:

Introduced category-wise pie-charts to represent the comparisons clearly.
Avoided using icons as above, instead used numbers as labels to clearly display the units lost.
Added casualties for lives as well, with personnel killed already mentioned, to provide an all-in-one view of the situation.

image7
https://public.flourish.studio/visualisation/13120006/

Redesigning the Refugee Situation

The author lists the countries in which Ukrainians sought refuge.

By the author:
Type of Data Used : Geographical data, with the refugee numbers as numerical data
Extent and Dimension of Data: Map data depicting the countries Ukrainians travelled to when seeking refuge.
Gaps in Data and Relevant Data to be shown: The map presents a lot of irrelevant data: showing countries that are not needed in the map or have no refugees is redundant.

image6

How I redesigned it:

Decided to use a bar chart instead of a map, to show only the relevant countries, which are a handful in number in comparison to all the countries shown on the map.
Extracted relevant data from the UNHCR website, containing only the countries having more than 0 refugees.
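The filtering step described above (keep only countries with more than 0 refugees, then plot them as bars) can be sketched like this. The country names and counts are hypothetical placeholders, not UNHCR figures, and sorting largest-first is my own assumption about the bar order.

```python
def countries_to_plot(refugees_by_country: dict) -> list:
    """Keep countries with at least one recorded refugee, largest first."""
    kept = [(country, count) for country, count in refugees_by_country.items() if count > 0]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical counts only; replace with the UNHCR extract.
sample = {"Country A": 1200, "Country B": 0, "Country C": 4500}
print(countries_to_plot(sample))  # [('Country C', 4500), ('Country A', 1200)]
```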

image8
https://public.flourish.studio/visualisation/13120271/

Redesigning the Financial Aid Visualization

The author aims to provide an idea of which countries provided how much aid to Ukraine in the past year after the invasion.

By the author:
Type of Data Used : Numerical Data
Extent and Dimension of Data: Packed Circle chart representing the aids categorized as Military, Financial and Humanitarian.
Gaps in Data and Relevant Data to be shown: The chart inherently uses a lot of space, which makes navigating through it on a web page difficult. Further, the packed circle structure can make it difficult to understand the proportions of the circles at first glance.

image5

How I redesigned the chart:

Decided to utilize the Treemap hierarchical structure for optimal space utilization
Restructured the data from the Kiel Institute to fit the data format of Flourish
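The restructuring step above is essentially a wide-to-long reshape: one row per donor and aid category, with a value column that sets the rectangle size. The sketch below assumes that a long layout with nesting columns plus a size column is what the treemap template expects; the column names, donor names and amounts are hypothetical, while the three categories are those named in the original chart.

```python
# Aid categories as named in the original packed circle chart.
CATEGORIES = ("Military", "Financial", "Humanitarian")


def to_long_format(aid_by_donor: dict) -> list:
    """Turn {donor: {category: amount}} into one row per donor and category."""
    rows = []
    for donor, amounts in aid_by_donor.items():
        for category in CATEGORIES:
            value = amounts.get(category, 0.0)
            if value > 0:  # zero-sized rectangles add nothing to a treemap
                rows.append({"Donor": donor, "Category": category, "Value": value})
    return rows


# Hypothetical amounts only, NOT Kiel Institute figures.
wide = {
    "Donor A": {"Military": 5.0, "Financial": 2.0, "Humanitarian": 1.0},
    "Donor B": {"Financial": 3.0},
}
for row in to_long_format(wide):
    print(row)
```

The resulting rows can be written out with `csv.DictWriter` and uploaded, with the donor and category columns used as nesting levels.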

image3
https://public.flourish.studio/visualisation/13120979/

@nithish050497

nithish050497 commented Mar 18, 2023

Data | Layoffs by IT firms in the U.S. will greatly impact H-1B workers from India

Nithish Arram
21f1003498

Here's the article!
Authors: REBECCA ROSE VARGHESE
December 02, 2022

What is the story the author is trying to tell?

The story "Layoffs by IT firms in the U.S. will greatly impact H-1B workers from India" focuses on the impact of layoffs in the IT sector in the United States on workers from India with H-1B visas. The article uses data from the U.S. Department of Labor's Office of Foreign Labor Certification (OFLC) and other sources to provide insights into the number of H-1B visa holders affected by these layoffs and the potential impact on the Indian IT workforce.

What data is the author using to tell the story?

The author uses data from the OFLC to show that there were 415,920 H-1B visa holders in the U.S. as of September 2021, with over 70% of them being from India. The article also mentions that layoffs in the IT sector have been happening for several years, and that the COVID-19 pandemic has exacerbated the situation, leading to more job losses.

Screenshot 2023-03-18 194829

The author highlights the concerns of Indian IT workers who fear losing their jobs and being forced to return to India. The article also discusses the potential impact on the Indian IT industry, which has been heavily reliant on the H-1B program to send workers to the U.S.

Screenshot 2023-03-18 195125

The story relies heavily on data from the OFLC and other sources to support its claims about the number of H-1B visa holders in the U.S., the percentage of Indian workers, and the impact of layoffs on the Indian IT workforce. However, the data is not visually encoded, and there is no visualization provided in the article to help readers understand the data better.

Screenshot 2023-03-18 195223

According to the article, IT firms in the US are facing a challenging business environment due to the COVID-19 pandemic and increasing automation, which has led to job cuts. Many of these job cuts have affected H-1B workers from India, who may face difficulties in finding new employment due to the restrictive nature of the visa program.

Screenshot 2023-03-18 195248

The article also highlights the fact that the H-1B visa program has been the subject of much debate in the US, with some critics arguing that it takes away jobs from American workers. As a result, the Trump administration had tightened the rules around the H-1B visa program, making it more difficult for companies to hire foreign workers.

Screenshot 2023-03-18 195313

Screenshot 2023-03-18 195336

How is it encoded, what problems are with it, and how have you attempted to improve it?

To improve the visual encoding of this story, I would suggest creating a chart or graph to display the number of H-1B visa holders by country of origin. This would make it easier for readers to understand the dominance of Indian workers in the program. Additionally, a bar chart or line graph could be used to show the trend in H-1B visa approvals and rejections over time, highlighting any changes in policy or trends in the industry.

Another useful visualization could be a map of the U.S. showing the states with the highest numbers of H-1B visa holders and the industries where they are employed. This could help readers understand which states and industries are most affected by layoffs and potential job losses.

Overall, while the story provides valuable insights into the impact of layoffs on H-1B visa holders from India, there is room for improvement in the visual encoding of the data to make it easier for readers to understand and interpret the information. The article paints a bleak picture for H-1B workers from India who may be impacted by layoffs in the US IT sector. It underscores the need for better policies to protect the interests of foreign workers and ensure that they are not unfairly targeted in times of economic hardship.

We could also consider adding some data and visualizations. Here are some suggestions:

  • Add data on the number of H-1B workers from India who have been impacted by recent layoffs. This would provide a better understanding of the scale of the problem.

  • Visualize the trend in the H-1B denial rate over the years. This would help readers understand the recent changes in US immigration policies and the impact on Indian IT workers.

  • Add data on the number of jobs that have been moved offshore by US companies. This would provide a more comprehensive understanding of the job market and the reasons behind the layoffs.

  • Visualize the average salary of H-1B workers compared to American workers in the same field. This would provide insights into the criticism of the H-1B visa program for depressing wages.

@21f1003953

Europe picks up more arms even as global weapon imports drop

ROLL NUMBER - 21F1003953

NAME - PARAM CHORDIYA

CONTEXT -

The article, dated March 18, 2023, presents data on arms supplies to European countries. It highlights the rise in arms imports by European countries, particularly those in the NATO alliance, due to increased tensions with Russia after its invasion of Ukraine in 2014. Despite a global decline in arms transfers, Europe's share has increased significantly, with many countries importing their highest volume of arms in the latest five-year period. Ukraine's arms imports have also seen a sharp increase, making it the world's third-largest importer of arms in 2022. The US dominates global arms exports, with a 14% increase in exports between 2013-2017 and 2018-2022, while Russia's arms exports have fallen by 31% in the same period due to the invasion of Ukraine and trade sanctions. SIPRI's Trend Indicator Values measure the volume of international transfers of major arms and provide a common unit to measure trends in the flow of arms to particular countries and regions over time.


Data Charts Provided by the Author -

Chart 1 : Shows arms imports of select European nations using SIPRI’s Trend Indicator Values (TIVs) expressed in millions. It shows the import data for five time periods: 1998-2002, 2003-2007, 2008-2012, 2013-2017 and 2018-2022.

image

Issues -

  1. Each country's graph has a different y-axis scale, which makes it difficult to compare countries at a glance.
  2. Showing each country's data in a separate panel makes cross-country comparison harder.

Chart 2 : Shows the region-wise share of arms imports in the last two five-year periods.

image

Issues -

  1. It can be difficult to accurately compare the individual components of each bar.
  2. This is because the height of each bar represents the total of all components, rather than the individual values of each component.

Chart 3 : Country wise domination of arms exporters

image

Issues -

  1. The author has not labelled some bars with the country name, making them difficult for readers to identify.
  2. Bar charts make it difficult for readers to compare the change.

My Visualizations -

Chart 1:
I have used a line chart with the five-year periods on the x-axis, based on the data given in the original article, so that each country's trend is visible and easy to compare with the other nations' data.

Country Arms Imports

Chart 2:
I have used bubble charts to display each region's share of imports, so that the comparison between regions is evident.

Percent share of arms imports in 2018-2022 -
1

Percent share of arms imports in 2013-2017 -
2

Chart 3:
I have converted the given data, based on estimations, into a radar chart which shows the trend across the periods for each NATO country.

c3


Conclusion -

I have tried to redesign the charts provided in the article in a way that addresses the flaws in the original charts.

@Pramoth-SK

Pramoth-SK commented Mar 18, 2023

Data | Indian startups founded in 2021 took only 28 months to go from seed stage to Series A funding: RBI paper

Link to the original post

Name: S.K.Pramoth
ROLL NO: 21f1005796

What is the story the author is trying to tell?

The author of this article tries to show how the startup environment in India has changed over the past 10 years. He points out that the number of months taken by startups in India to climb up the funding ladder has declined drastically in recent years. According to DPIIT (Department for Promotion of Industry and Internal Trade), India had recognized 87,988 startups as of January 12, 2023, which makes it the 3rd largest startup ecosystem in the world.

Charts used to tell the story

...
Chart 1:
Chart 1 shows the number of unicorns created in India each year. There is a huge bump in 2021, which is due to a post-pandemic effect: after 2020, most government restrictions were lifted and many startups received more funding from investors to grow, which increased their valuation.

ezgif com-gif-maker2

Chart 2:
Chart 2 shows each startup's place of origin. From the chart we can see that the majority of the startups are registered in Bengaluru, followed by Gurugram and Mumbai (this data is as of January 2023).
ezgif com-gif-maker1

Chart 3:
Chart 3 shows us how the funding progression has changed from 2014 to 2021. The chart shows that the average time between the funding stages is reducing year by year.

ezgif com-gif-maker3

Chart 4:
Chart 4 shows the distribution of the total number of funding rounds received by companies at each stage. From this chart we can see that the number of funding rounds in the initial stages of a company has become smaller over the years, and the number of funding rounds after Series C is much larger in recent years than in 2008 or 2011.

ezgif com-gif-maker

Problems with the visualization and my attempt to improve it

...
Problems with the chart:
Chart 1:
• There are no exact numbers displayed on the chart.
Chart 3:
• It is not clear at first glance what we should compare in this chart.
• The author expects us to compare the numbers, which is not obvious at first.
• The columns are not aligned, so the height of each segment (e.g. Series A to B) can't be compared, and neither can the width or the area of the segments.
Chart 4:
• The cumulative money raised by companies differs from year to year, which is not mentioned in the chart.

My solution:

(The following data is extracted from official RBI paper “What Drives Startup Fundraising in India?”)

I think that to fully understand the changes in the startup environment, we should look at the relationship between the total funding received by companies and the number of rounds it took to acquire it.

Chart 5:
graph1

From this chart we can see that funding for new startups in India is increasing at a healthy rate while the RHS series is coming down, so we can infer that investors are happy to invest more in the early stages of a company, over fewer rounds of funding.

Chart 6:
Screenshot 2023-03-19 214542

To further support my point, the chart above shows the funding raised in each round. Comparing it with Chart 4, we see fewer rounds and more funding in recent years. For example, in 2008, 40% of total rounds (counting up to Series B, per Chart 4) fetched 80% of total funding (Chart 6), but in 2021 only 15% of total rounds fetched 80% of the total funding.
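The comparison above rests on two cumulative shares per year: the share of rounds and the share of money raised up to a given stage. Here is a minimal sketch of that calculation. The per-stage figures are made up for illustration (they are not taken from the RBI paper), and the function name `cumulative_share` is hypothetical.

```python
from itertools import accumulate


def cumulative_share(values):
    """Return the running share (in %) of the total after each stage."""
    total = sum(values)
    return [round(100 * running / total, 1) for running in accumulate(values)]


# Made-up figures per stage, ordered from earliest to latest stage.
stages = ["Seed", "Series A", "Series B", "Series C+"]
rounds = [10, 20, 10, 60]    # number of funding rounds (illustrative only)
funding = [10, 30, 40, 20]   # money raised, arbitrary units (illustrative only)

round_share = cumulative_share(rounds)
funding_share = cumulative_share(funding)

# Print both cumulative shares side by side for each stage.
for stage, r, f in zip(stages, round_share, funding_share):
    print(f"up to {stage}: {r}% of rounds, {f}% of funding")
```

With these made-up inputs, the line for Series B reads as 40% of rounds and 80% of funding, which is the kind of statement made for 2008 above. Repeating the calculation per year would show how that ratio shifts.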

I think the above two charts sufficiently explain how the startup environment has changed over time.

@mukeshonlinesiitm

mukeshonlinesiitm commented Mar 19, 2023

Topic: Relatively fewer tobacco users in the southern States

Name: Mukesh Kumar Singh
ROLL NO: 21f1000350

Article link: - https://www.thehindu.com/data/data-relatively-fewer-tobacco-users-in-the-southern-states/article66568119.ece

In this article, the author compares tobacco use across different States in India and shows how the southern States consume less tobacco than other States. Based on the data, the author finds that in the north-eastern States of India, consumption among men of both smokable and chewable forms was higher than in the rest of India in 2019-21. If only the smokable forms were considered, the share was higher in the northern States of Himachal Pradesh, Uttarakhand, Haryana, J&K U.T. and the eastern State of West Bengal. If only the chewable forms were considered, the share was higher in the east (Jharkhand, Bihar and Odisha) and in Uttar Pradesh, Madhya Pradesh and Gujarat.
In the southern States, the share was relatively low with regard to both forms of tobacco consumption. However, among those who smoked, the share of those who consumed more than five sticks a day was much higher in many southern States. So, while smokers were fewer in the south, those who smoked did so heavily.

The authors have created several maps to compare the data.

image

image

Map-1 shows the percentage of all men aged 15-49 who smoked cigarettes and/or bidis and/or cigars and/or pipe and/or hookah in 2019-21. The share should be read with caution as those who smoked cigarettes could also be bidi smokers, which means they were counted twice. The share was much higher among some northern and all north-eastern States except Assam. While the share of smokers was low in the south, it was even lower in the western States of Gujarat and Maharashtra.

My observation: The authors have used a choropleth map to show tobacco use in the different States of India. Shades of red are used for the highest use and light blue for lower use. However, the colour coding does not present the data well: for example, Mizoram has the highest value (84.7) whereas Manipur has 40.2, but this difference is difficult to see with the original colour scheme. I therefore used a different colour scheme to present the same map, in which it is clearly visible that Mizoram is higher than the other States, including Manipur.

(Map 1, reproduced as a choropleth with a different colour scheme; my graph)
Tobacco user (2)
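
Whether two States look different on a choropleth depends on how values are grouped into colour classes. Here is a minimal sketch, assuming equal-interval class breaks on a 0-100% scale. The breaks and the function name `colour_class` are assumptions for illustration, not the scheme used in the article or in the redesigned map.

```python
from bisect import bisect_right


def colour_class(value, breaks):
    """Return the index of the colour class that `value` falls into.

    `breaks` are ascending class boundaries; a value equal to a
    boundary is placed in the class above it.
    """
    return bisect_right(breaks, value)


# Assumed equal-interval breaks giving five classes: 0-20, 20-40, ..., 80-100.
breaks = [20, 40, 60, 80]

# Values quoted in the observation above.
mizoram = colour_class(84.7, breaks)
manipur = colour_class(40.2, breaks)

print(mizoram, manipur)  # 4 2
```

With these breaks, the two States land two classes apart, so they would get visibly different shades. A scheme with too few classes, or a single wide top class, would merge them.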

In addition, the author compares only men's tobacco use, but given the topic of tobacco users I feel that we need to include women in the analysis. So I found some data at “https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7942198/” and plotted a 2D clustered column chart to compare men and women in the different States. From the graph it is clearly visible that Mizoram is highest for men as well as women, whereas in Kerala and Punjab far fewer women than men use tobacco. This graph allows comparison by State, but it cannot show differences between geographic regions of India, such as whether the northern States consume more or less than the north-eastern States.
image

Another observation the author makes: if only cigarette smokers were considered, Mizoram (62.4% smoke cigarettes), Meghalaya (49.6%), Manipur (36.2%) and Arunachal Pradesh (31.7%) were the top four. In West Bengal, 24.3% smoked bidis, the highest share in India. In Haryana, 9.9% smoked hookah, the highest share by a high margin.

image
image

Map 2 shows the percentage of all men aged 15-49 who chewed gutka with tobacco and/or paan masala with tobacco and/or paan with tobacco and/or khaini and/or other forms of tobacco in 2019-21. The share was much higher in the northeastern, eastern, and some central, western and northern States. All the southern States and some northern States have a relatively low share. The usage of khaini was over 35% in Bihar and Jharkhand. These two States led by a wide margin. In Gujarat, over 33% of men chewed gutka/paan masala with tobacco, the second highest share, followed by Odisha (31%), M.P. (29.6%) and U.P. (27.6%).

My observation: The authors have used a choropleth map to show the percentage of all men aged 15-49 who chewed gutka with tobacco and/or paan masala with tobacco and/or paan with tobacco and/or khaini and/or other forms of tobacco in 2019-21 in the different States of India. Shades of red are used for the highest use of chewed tobacco and light blue for lower use. However, the colour coding does not present the data well: for example, Manipur has the highest value (92.9) whereas Tripura has 45.2, but this difference is difficult to see with the original colour scheme. I therefore used a different colour scheme to present the same map, in which it is clearly visible that Manipur is higher than the other States, including Tripura.

(Map 2, reproduced as a choropleth with a different colour scheme; my graph)
Tobacco user (1)

image

image

Map 3 shows the share of male smokers who smoked more than five sticks a day in 2019-21 in India. The share in all the southern States, some northern States and some north-eastern States was higher than the rest of India.

Map 3: My Observation: The authors' map of the share of male smokers who smoked more than five sticks a day in 2019-21 looks incorrect, or its description is unclear. For “Andhra Pradesh”, the percentage of tobacco users in Map-1 is 19.1, whereas the percentage in Map-3 is 44.9. The issue is that it is not clear whether this 44.9% is a share of the total population or of the smoker population.
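
One way to check the two readings of the Andhra Pradesh figures above is with simple arithmetic. Heavy smokers are a subset of smokers, so their share of all men cannot exceed 19.1%. That suggests 44.9% is a share of smokers, as the Map 3 description says. The sketch below converts it to a share of all men under that assumption; the function name `share_of_all_men` is hypothetical.

```python
def share_of_all_men(smoker_pct, heavy_among_smokers_pct):
    """Convert a share of smokers (in %) into a share of all men (in %)."""
    return smoker_pct * heavy_among_smokers_pct / 100


# Andhra Pradesh values quoted above: 19.1% of men smoke (Map 1), and
# 44.9% is the Map 3 value, assumed here to be a share of smokers.
heavy_pct_of_all_men = share_of_all_men(19.1, 44.9)
print(round(heavy_pct_of_all_men, 1))  # 8.6
```

Under this reading, roughly 8.6% of all men aged 15-49 in the State smoked more than five sticks a day. This is consistent with Map 1, subject to the double-counting caveat noted for that map.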

Table Data:
Overall, in India, the share of cigarette/bidi smokers was coming down. Compared to 2005-06, the share of smokers came down by over 10% points in 2019-21.

image
Also, the rural-urban gap became negligible by 2019-21. However, among those who chewed tobacco, there was no change among rural users whereas among urban users there was a decline, although to a smaller extent compared to the drop in smokers.
The author displays data from the different NFHS surveys, with a correction graph, to show the decline in tobacco use among the urban and rural populations. But the presentation does not convey the decline at a glance.

My Observation: I plotted the data as a 2D line graph, where it is clearly visible that smoked tobacco use has declined in both urban and rural areas, whereas chewed tobacco use has not declined in rural areas but has declined in urban areas.

image

Also, among smokers, the share of those who smoked more than nine sticks a day reduced significantly and those who smoked less than five has increased (Chart 5) .

image

@imaadansari1

imaadansari1 commented Mar 19, 2023

Imaad Ansari - 21f1004808


Shortfall of surgeons, gynaecologists and paediatricians in rural India was 80% in 2022

  • This story is about the acute shortage of surgeons, gynaecologists and paediatricians in Indian States.
  • The story also tries to convey the fact that over the course of 2012 – 2022 the supply of surgeons, gynaecologists, paediatricians and CHCs (Community Health Centres) has decreased in some states, while it has increased in the others.
  • The author tries to convey the data about shortages about health specialists and CHCs in the charts given below:

Charts By Author:

Chart 1
The chart compares the shortage of medical professionals in 2012 and 2022.

image

Problems:

  • The chart uses a centred stacked bar chart to compare the shortfall, which makes it difficult to compare the ratios. It would have been visually easier to compare with a vertical bar chart.
  • The order of markers in the legend is not the same as the order of the data in the chart. The legend should be in the order 2012, 2022, not 2022, 2012.
  • The % symbol is not mentioned in the chart which might confuse some readers about the units.

Chart 2
It shows the shortfall percentage and surplus percentage of CHCs in each State as of March 2022.

image

Problems:

  • A scatter plot is used, which makes it visually harder to compare the points. A column chart would have been more apt.
  • There are no labels on the points, which makes it hard to determine which point represents which state.
  • The chart has no title or additional information to help determine what it is about.

Chart 3
This chart shows the shortfall percentage and surplus percentage of CHCs in each State in 2012.

image

Problems:

  • The problems are same as chart 2.
  • Additionally, Chart 2 and Chart 3 are related. Having two different charts to represent related data is confusing for the reader. They should have been combined into one.

Chart 4
This chart shows the shortfall percentage of specialists in 2022 only among the States that had a shortfall of CHCs in 2022.

image

Problems:

  • A scatter plot is used, which makes it visually harder to compare the points. A column chart would have been more apt.
  • Different states are clubbed together in a single zone, and without labels it is not possible to determine which point represents which state.

My Approach:

Chart 1
The chart compares the shortage of medical professionals in 2012 and 2022.

image

What it does better:

  • The chart is not centered and there are different bars representing the shortfall percentage in 2012 and 2022.
  • This makes it easier for the reader to compare the numbers from both the years as the bars start from the same level.

Chart 2 & Chart 3
This chart shows the shortfall percentage and surplus percentage of CHCs in each State in 2012 and 2022.

image

What it does better:

  • Since the data in Chart 2 and Chart 3 is related, I have combined both the charts.
  • Charts 2 and 3 represent the data from 2022 and 2012 respectively. Having a single chart for both of them makes it easier for the reader to compare them and to relate the data with each other.
  • Having a column chart instead of a scatter plot makes it visually easier for the readers to comprehend the data and compare different data values.

Chart 4
This chart shows the shortfall percentage of specialists in 2022 only among the States that had a shortfall of CHCs in 2022.

image

What it does better:

  • This chart shows the shortfall percentage of specialists in 2022 only among the States that had a shortfall of CHCs in 2022.
  • Instead of a scatter plot, a column chart is used to represent the data, making it visually easier to comprehend and compare.
  • The axis baseline is set to 0 and the bars extend downwards to show that the values are negative.
  • The color of the bars is set to orange to indicate that the values are negative.

Thanks!

@jemma-mg

jemma-mg commented Mar 19, 2023

Name: Jemma Mariya George
Roll no: 21f1001937


Data | Law to raise marital age is not enough as child marriages rarely get reported

Over 60% Indian women aged 25-29 in 2019-21 married before their 21st birthday

Link to Data Story: https://www.thehindu.com/data/data-law-to-raise-marital-age-is-not-enough-as-child-marriages-rarely-get-reported/article66536045.ece
Published on: February 22, 2023 11:31 am | Updated 11:47 am IST
Authors: VIGNESH RADHAKRISHNAN, REBECCA ROSE VARGHESE
Source: National Family Health Survey-5, Analytical Paper Series UNFPA
Additional Resources: National Family Health Survey-3, Analytical Paper Series UNFPA, National Family Health Survey-4

What is the story the author is trying to tell?
The author is discussing the issue of child marriage in India and the proposed amendment to increase the minimum age of marriage for women from 18 to 21 years. The author highlights the concerns raised by MPs and the Supreme Court's decision to leave the power to amend the law to Parliament. The author also points out the high prevalence of underage marriage in certain states of India and the challenges in enforcing the existing law.

What data he/she is using to tell the story?
Describe its details -- type of data, extent of the data, dimensions of the data, gaps in the data, what data is essential and what is irrelevant.
The author is using data from the National Crime Records Bureau to show the low number of reported cases of underage marriage and data from the National Family Health Survey to show the prevalence of underage marriage in different states of India. The author also uses data from the same survey to compare the median age of marriage for women in different age groups.

  • Type of data: The data used in the above case is primarily quantitative data, including statistics and figures related to the age of marriage for women in India, the percentage of women marrying before the age of 18, the number of cases registered under the Prohibition of Child Marriage Act, and the share of women who married before turning 21.
  • Extent of the data: The data used in the case is limited to the context of India and the status of women's age of marriage and its enforcement in the country.
  • Dimensions of the data: The data includes different dimensions such as state-wise variations in the age of marriage and its enforcement, age-wise distribution of women marrying before the legal age of marriage, and the impact of education on the age of marriage.
  • Gaps in the data: One of the significant gaps in the data is the lack of comprehensive reporting of underage marriages leading to limited enforcement of the law. Another gap is the absence of qualitative data related to the social, economic, and cultural factors influencing the age of marriage.
  • What data is essential: The essential data includes statistics related to the age of marriage for women, the percentage of women marrying before the legal age of marriage, and state-wise variations in the age of marriage and its enforcement.
  • What is irrelevant: The irrelevant data includes any information not directly related to the age of marriage for women in India, such as details about the Bill's passage in Parliament, MPs' opinions, and the Supreme Court's judgment on the petition.

How is it encoded, what problems are with it?
The data is encoded in the form of percentages and charts to show the prevalence of underage marriage in different states and age groups. One problem with the data is that it does not capture the nuances of the reasons behind underage marriage, such as cultural practices and economic factors. To improve the data, one could include qualitative data from surveys or interviews with individuals who have experienced or witnessed underage marriage.


State wise distribution of percentage of women in the age 20-24, married before the legal age of 18:
Map 1
Despite the legal age of marriage for women being 18 years, almost 23% of women who were aged between 20 and 24 years in 2019-21 married before their 18th birthday.

  • In fact, in the eastern States of Bihar and West Bengal, the share was over 40%
  • In Assam, Andhra Pradesh and Rajasthan, the share was over 25%.
  • The share was below 10% in Kerala, Himachal Pradesh, Punjab and Uttarakhand, among other States.
  • Only 1,050 cases were registered under The Prohibition of Child Marriage Act in 2021, according to the National Crime Records Bureau. Reporting of underage marriages is negligible, resulting in limited enforcement of the law.

State wise distribution of percentage of women in the age 25-29, married before the age of 21:
Map 2
The Bill proposes to raise the legal age from 18 to 21. In India, over 60% of women who were aged between 25 and 29 in 2019-21 married before their 21st birthday.

  • In the eastern States of Bihar and West Bengal, the share was over 70%
  • In Andhra Pradesh, Madhya Pradesh, Rajasthan, Telangana and Tripura, it was more than 65%.
  • Even in Goa, the State with the least share of such women, one in five women aged between 25 and 29 in 2019-21 married before turning 21.

Share of women aged 20-24 and 45-49 in 2019-21 who married before their 18th birthday:
chart
In all the States, except Assam, Meghalaya and Manipur, the share of such women in the 20-24 age group was much lower than the share in the 45-49 age group.


How have you attempted to improve it?

  • The original article does not give insights into trends over time or variations from past surveys, so I have incorporated the reports of National Family Health Survey-3 and NFHS-4 along with NFHS-5.

Comparison

  • The legend added to the map helps in understanding the visualization in a better manner.
  • From the combined analysis there has been a positive change in the trends except in some states.
  • According to NFHS-5, West Bengal, Bihar and Tripura top the list with more than 40% of women aged 20-24 years married below 18.

Background

  • This visual covers place of residence, education and economic factors and their effects on underage marriage in India. It also compares this data across the three surveys.
  • Better-educated women have had more control over when they should get married for decades now.
  • As per NFHS-5, 48% of girls with no education were married below 18 years of age as compared to only 4% among those who attained higher education.
  • Another striking feature is the variations in child marriage by the household wealth index. NFHS-5 shows that a staggering 40% of the girls from the lowest income group were married before they turned 18 years of age. In a clear-cut contrast to this, only 8% of girls from the highest income group got married before 18.

@jaidevd

jaidevd commented Mar 19, 2023

Roll No: 21F1003751
Name   : Jaidev Deshpande

Disclaimer: The story uses data from NFHS-4 and NFHS-5. In order to come up with my suggestions of how data could be better visualized in this particular story, I have scraped the data from the original web page itself, since searching for the original data and preprocessing it was far too time-consuming.


Women from Tamil Nadu are at a high risk of anaemia

The original story points out that the number of women consuming dark green, leafy vegetables has drastically declined in Tamil Nadu. They further point out that this decline is indeed significant, compared to that in the other states. The lack of an iron-rich diet puts any demographic under a severe risk for anaemia. While this is bad generally for any population, the story does an exceptional job of pointing out that the deficiency is highly localized to southern states, especially Tamil Nadu. In other words, the situation is generally alarming, but given that a particular region or a state fares very poorly, points to a bigger underlying problem.

Data and Charts

The story uses NFHS-4 and NFHS-5 as their primary data sources (both surveys had independent questionnaires for men and women). Using this data, they have come up with four basic metrics for each state:

  1. The percentage of women who include dark green, leafy vegetables in their diet (as of NFHS-5).
  2. The change in the percentage of women who include dark green, leafy vegetables in their diet from NFHS-4 to NFHS-5.
  3. The percentage of men who include dark green, leafy vegetables in their diet (as of NFHS-5).
  4. The change in the percentage of men who include dark green, leafy vegetables in their diet from NFHS-4 to NFHS-5.

Thus, there are essentially two metrics, for each gender. Accordingly, the story features four charts, each of which describes these metrics across all Indian states. Here is a screenshot of the chart representing the first metric:

image

As can be seen, Tamil Nadu is at the leftmost position along the X-axis. This alone does meet the purpose of convincing the reader that the situation in Tamil Nadu is indeed dire. However, the original chart does not include the popup around the marker which represents Tamil Nadu - one has to click each circular marker to see which state it belongs to. Thus, there is no way to pre-attentively understand the point of the graphic. Additionally, the colors used for the markers do not have any special semantics - they exist simply to differentiate between geographical zones (i.e. there is no particular reason to color central Indian states in black and eastern states in grey). Finally, the axis labels are too dim.

The only visual scale that actually matters in this chart is the X-axis. It is clear that a state to the left is worse than a state to the right. Every other visual encoding - the vertical grouping of states into zones, the color encodings for the zone, and even the size of the circular markers - is completely arbitrary. These decisions do not help us exploit the natural, innate conventions that humans understand.

For example, see the second chart:

image

Since Tamil Nadu is at the leftmost position again, all we understand is that it is in the worst possible position relative to the other states. The other visual encodings contribute little or nothing to better understanding of that data.


Improvements

It is reasonable to assume that anyone who is interested enough to read the original story would know a little about the geography of India. Therefore, I propose that this data is better represented with choropleths. Choropleths also have some additional advantages:

  1. People naturally understand directions, and therefore, it is easier to locate geographical zones like "South India" or "North East India" on a map than on the Y-axis of the original plots.
  2. The original charts do not immediately show the names of the states. To see them, one has to click on each bubble independently. Thus, while it's easy to see that Tamil Nadu is in a bad position, it is time-consuming to figure out where the other states lie in relation to Tamil Nadu (we would have to click on each bubble independently, multiple times). A choropleth solves this problem by allowing the user to compare the colors of different regions.

Here, then, is the dataset represented as four choropleths, one for each metric:

image

As can be seen, each choropleth makes the point of the original article perfectly:

  1. In the top-left, Tamil Nadu appears as the darkest (reddest) region - indicating that the proportion of women eating an iron-rich diet is the lowest.
  2. In the top-right, Tamil Nadu again appears as the darkest region - indicating that the metric has massively dropped in the last few years since NFHS-4.
  3. In the bottom-left, TN is not the darkest state - meaning that men are doing better than women.
  4. In the bottom-right, TN is again the darkest state, along with Assam - which means that even the male population eating an iron-rich diet is declining, although not as badly as the female population.

Thus, we have used colors, orientation, and the viewer's (presumably sufficient) knowledge of Indian geography a lot more effectively to improve the narrative.

@Chirag-Goel-17

AISHE higher education survey

Name: Chirag Goel
Roll No: 21f2000540

Article Chosen:
https://www.thehindu.com/data/data-latest-aishe-higher-education-survey-has-many-discrepancies/article66480748.ece

Data

The article reports that the recently released All-India Survey on Higher Education (AISHE) 2020-21 report revised the Gross Enrolment Ratio (GER) in higher education retrospectively for the previous four years, by recalculating it based on population projections as per the 2011 Census. Previous reports had used projections based on the 2001 Census.

Charts in Data

image

Issues:

  • The chart is not very clear, and neither is its title.
  • Legend is missing.

image

Issues:

  • The chart is not very clear, and the y-axis data is unclear.
  • It can be presented better.

(Chart by me)

image

@21f1004861-HarshithaSrikanth

Name: Harshitha Srikanth
Roll number: 21f1004861

Title: Indians are spending more but eating less in the post-pandemic period
Link to original article: https://www.thehindu.com/data/data-indians-are-spending-more-but-eating-less-in-the-post-pandemic-period/article61445859.ece
Data source used by article: The Hunger Watch Report

Story author is trying to tell:
The author examines the relationship between food-buying patterns before and after COVID, taking price inflation and reduced income as the primary parameters of analysis. The analysis of retail prices of 21 essential items across India shows that to buy 1 kg of each, a consumer had to spend ₹500 more in June 2021 than the average cost between 2016 and 2019 in the same period. This expense comes at a time when income levels have decreased due to stringent lockdowns.
The Hunger Watch Report (primary data source) found that 27% of the respondents lost their incomes due to the national lockdown and 24% saw their incomes reduce by half by October 2020 compared to pre-lockdown levels.

The data is presented directly as tables, with colour encodings added.
There are three tabulations made:
1) Price inflation:
img1

2) Income drop:
img2

3) Change in consumption patterns:
img3

Data encodings:
Data types:
Price inflation table:
A simple comparison of the cost of 21 common goods before and after COVID. The "before" price is the average cost of the product over 2016-2019, and the post-COVID cost is the product's price in 2021.
Income drop table:
Percentage data on how many Indians had no income, had their income reduced by half, and so on.
Change in consumption table:
Four major commodities are listed, with the change in their consumption after the pandemic given in percentage terms.
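
The comparison behind the price table can be sketched in Python as below. This is only an illustration under assumptions: the item names and prices are placeholders rather than the article's figures, and `basket_increase` is a hypothetical helper. It averages each item's 2016-2019 price, subtracts that baseline from the 2021 price, and sums the differences across the basket.

```python
from statistics import mean

# Placeholder retail prices (rupees per kg, same month each year).
# These are NOT the article's figures; they only illustrate the arithmetic.
prices = {
    "Item A": {2016: 40.0, 2017: 42.0, 2018: 44.0, 2019: 46.0, 2021: 60.0},
    "Item B": {2016: 100.0, 2017: 100.0, 2018: 110.0, 2019: 110.0, 2021: 150.0},
}
BASELINE_YEARS = (2016, 2017, 2018, 2019)


def basket_increase(prices, baseline_years=BASELINE_YEARS, current_year=2021):
    """Return (per-item increase, total increase) over the baseline average."""
    per_item = {}
    for item, by_year in prices.items():
        baseline = mean(by_year[y] for y in baseline_years)
        per_item[item] = by_year[current_year] - baseline
    return per_item, sum(per_item.values())


per_item, total = basket_increase(prices)
for item, extra in per_item.items():
    print(f"{item}: {extra:+.2f} rupees per kg versus the 2016-2019 average")
print(f"Extra cost for 1 kg of each item: {total:.2f} rupees")
```

With the real 21-item table in place of the placeholders, the total would correspond to the kind of headline figure the article quotes.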

Problems in encoding:
The data is not visualized at all. The tabular format has been directly presented with colour encodings that don't signify much.
The data does not tell a story at first glance; it takes considerable time to understand each table and infer how they connect.
The colour encodings do not add any semantics to the visualization; rather, they only end up distracting the reader.

Solving the issues:
I propose simple, easily understandable visualizations of this data that reinforce the article's claim clearly and concisely.

Visualizations:
Visual 1:
image
Link:https://public.flourish.studio/visualisation/13119416/

Visual 2:
image
Link:https://public.flourish.studio/visualisation/13119893/

Visual 3:
image
Link:https://public.flourish.studio/visualisation/13119920/

Problems I've tried to address:
Made simple visualizations for all the data.
Used simple colour schemes so as not to distract the reader.
Used annotations to help readers understand the data.
Made sure the visualizations strengthen the article's inference.
Tried to convey a story with the visuals.

@ayushpatidar14

Name: Ayush Patidar
Roll: 21F1004981

Risky diet: Which Indian States top fried food and aerated drink consumption

@SamandeepSinghTomar

SamandeepSinghTomar commented Mar 19, 2023

Name: Samandeep Singh Tomar
Roll No: 21f1001112

Original Story: "30 crore missing voters in India: mostly young, urban, or migrants"
Link: https://www.thehindu.com/data/data-30-crore-missing-voters-in-india-mostly-young-urban-or-migrants/article66485421.ece

Central Message:
India has experienced a significant increase in electors, but nearly one-third of them did not vote in the last Lok Sabha election. The Election Commission of India (ECI) is focusing on urban populations, young voters, and migrants to increase voter turnout.

Data Used in the Story:

  • Elector count and voter turnout from past 15 Lok Sabha elections
  • Parliamentary constituencies with the lowest voter turnout in select states (2019 general elections)
  • Registered electors for recent parliamentary/presidential elections in select countries
  • Voter turnout comparison among 162 countries

Encoding in the Original Story and Proposed Improvements:

  1. Bar Chart: Number of electors and voter turnout in past 15 Lok Sabha elections

image

Improvement: None

  2. Table: Lowest voter turnout constituencies in select states (2019 general elections)

image

Improvement: Sorting States Alphabetically.

image

  3. Bar Chart: Registered electors in top 10 countries

image

Improvement: Converted to a world map.

image

  4. Bar Chart: Voter turnout comparison among 162 countries

image

Improvement: None

Adjustments to Dataset and Scope:
The focus remains on India's missing voters, highlighting their impact on the democratic process. No additional datasets have been incorporated or any alterations made to the data's scope.

By redesigning the visualizations, the story becomes more engaging and informative, helping users grasp the magnitude of missing voters in India and the efforts being made by the ECI to increase voter turnout.

@kevin-IITM

kevin-IITM commented Mar 19, 2023

Name: Kevin Mathew Varghese
Roll No: 21f1004582

Original Story: "Where does your State stand on the India Innovation Index?"
Link: https://www.thehindu.com/data/india-innovation-index-where-does-your-state-stand/article29776807.ece

What is the story the author is trying to tell?

The author of the article is discussing India's position in the Global Innovation Index 2019 and NITI Aayog's India Innovation Index report 2019. The author highlights that India ranks 52nd out of 129 countries in the Global Innovation Index, but it has consistently improved its position in recent years.

The article also compares India's position with other BRICS nations in the Global Innovation Index, where India ranks third but is far behind China, which is in the top position. The author further highlights that the India Innovation Index report classifies States into three categories, major States, hill & northeast States, and Union Territories & smaller States, and provides the innovation index of each State.

According to the report, three of the five major States with the highest innovation index were in the south of India, while Sikkim topped the northeastern States. The article also mentions that the innovation framework of the India Innovation Index report has two dimensions, enablers (human capital, investment, business environment) and performance (knowledge output, knowledge diffusion), and provides examples of indicators for each dimension.

Finally, the author discusses the input-outcome gap in the India Innovation Index report, where the scores of the enabler metrics are higher than the performance scores in 29 out of 36 States and UTs, indicating an outcome gap relative to inputs. The author highlights that the gap is highest in Punjab and Gujarat among major States.

What data he/she is using to tell the story? Describe its details -- type of data, extent of the data, dimensions of the data, gaps in the data, what data is essential and what is irrelevant.

The author of the article is using two main sources of data to tell the story:

  1. Global Innovation Index 2019: The first source of data used by the author is the Global Innovation Index 2019, which ranks 129 countries on their innovation capabilities and outcomes. The data provides information about India's position in the global innovation landscape and its performance compared to other countries. The author mentions India's ranking in the index and its consistent improvement in recent years. The author also compares India's position with other BRICS nations and provides their scores.

  2. NITI Aayog’s India Innovation Index report 2019: The second source of data used by the author is the India Innovation Index report 2019, which provides an innovation index for each State in India. The data is based on a framework that has two dimensions: enablers (human capital, investment, business environment) and performance (knowledge output, knowledge diffusion). The report provides scores for each indicator for each State and categorizes the States into three categories.

The data used in the article includes both quantitative and qualitative information. The quantitative information includes scores and rankings, while the qualitative information includes descriptions of the enablers and performance indicators, the classification of States, and examples of the indicators.

The extent of the data used in the article is limited to India and other BRICS nations, as the focus is on India's position in the global innovation landscape and the innovation index of each State in India. The article does not provide information about other countries outside of the BRICS nations or compare India's position with them.

The dimensions of the data used in the article include India's position in the global innovation landscape, the innovation index of each State in India, and the enablers and performance indicators used in the India Innovation Index report.

There are some gaps in the data used in the article. For example, the article does not provide information about the methodology used in the Global Innovation Index or the India Innovation Index report. The article also does not provide information about the sample sizes used in the indices. Additionally, the article does not discuss any limitations or weaknesses in the data used.

The data that is essential for the story includes India's ranking in the Global Innovation Index, the innovation index of each State in India, the enablers and performance indicators used in the India Innovation Index report, and the input-outcome gap in the report. The data that is irrelevant to the story includes any information that does not relate to India's position in the global innovation landscape or the innovation index of each State in India.

How is it encoded, what problems are with it, and how have you attempted to improve it?

The Author's Chart:

image

What to improve:
The horizontal bar chart the author uses to compare India's innovation ranking with other BRICS nations is a simple and effective way to convey the idea. However, a stacked bar chart could instead show each BRICS nation's score broken down by the different indicators used in the Global Innovation Index. This would give a more detailed and nuanced picture of each nation's innovation capabilities and allow a more meaningful comparison.

My Improvement:
Note: This is based on the same source, compiled from different metrics. Hence the data differs from the author's but conveys the same meaning while giving the reader more information.

Chart 1

In the above chart, we aggregate the scores of three innovation measures chosen and outlined by the Global Innovation Index. This makes it clearer what lies behind the final ranking.

@deysanjeeb

deysanjeeb commented Mar 19, 2023

Name: Sanjeeb Dey
Roll Number: 21f1002729

Article: Younger diabetes patients on the rise in most Indian States
Authors: VIGNESH RADHAKRISHNAN, REBECCA ROSE VARGHESE

The story the authors are trying to convey: In more than 20 of the 29 states analysed, the percentage of young people with high glucose levels has increased over the last 5 years.

The Data: The data used by the authors is from the National Family Health Survey-5 (NFHS-5), which was conducted between 2019-2021, and the National Family Health Survey-4 (NFHS-4), which was conducted between 2015-2016.

Data Charts provided in the article

Chart 1a

image
This chart shows the percentage of men below the age of 35 with high glucose levels during 2019-2021 and the percentage change from 2015-2016.

Chart 1b

image
This chart shows the percentage of women below the age of 35 with high glucose levels during 2019-2021 and the percentage change from 2015-2016.

Chart 2a

image
This chart shows the percentage of males between the ages of 15-54 with high glucose levels during 2019-2021 and the percentage change from 2015-2016.

Chart 2b

image
This chart shows the percentage of females between the ages of 15-54 with high glucose levels during 2019-2021 and the percentage change from 2015-2016.

Issues:

  • It is not easy to identify how many states saw a reduction in high glucose levels
  • It is difficult to compare female and male levels

My Approach:
image
image

As the male and female graphs have been placed side by side with their scales adjusted, it is much easier to compare the two visually. A vertical line at 0 on the x-axis makes it easier to distinguish states with positive growth from those with negative growth.
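
The zero-line idea can be sketched in plain Python as follows. This is a minimal illustration under assumptions: the state names and change values are placeholders, not NFHS figures, and `split_by_sign` and `text_diverging_bars` are hypothetical helpers. The first answers "how many states saw a reduction?" directly; the second draws a crude text chart with bars diverging from a vertical line at zero.

```python
# Placeholder change values (percentage points); NOT actual NFHS data.
change_by_state = {
    "State A": 1.8,
    "State B": -0.6,
    "State C": 0.4,
    "State D": -1.1,
}


def split_by_sign(changes):
    """Return (increased, decreased) state names, largest change first in each."""
    increased = sorted((s for s, c in changes.items() if c > 0),
                       key=lambda s: -changes[s])
    decreased = sorted((s for s, c in changes.items() if c < 0),
                       key=lambda s: changes[s])
    return increased, decreased


def text_diverging_bars(changes, scale=5):
    """Render one line per state, with bars growing left or right of a '|' at zero."""
    width = max(round(abs(c) * scale) for c in changes.values())
    lines = []
    for state, c in sorted(changes.items(), key=lambda kv: kv[1], reverse=True):
        n = round(abs(c) * scale)
        left = ("#" * n).rjust(width) if c < 0 else " " * width
        right = "#" * n if c > 0 else ""
        lines.append(f"{state:<8} {left}|{right}")
    return "\n".join(lines)


increased, decreased = split_by_sign(change_by_state)
print(len(increased), "states increased;", len(decreased), "states decreased")
print(text_diverging_bars(change_by_state))
```

Running the same two helpers on the male and female series, with a shared `scale`, would give two panels that can be read against each other.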

@AliPhaeez

Another government survey debunks Swachh Bharat’s 100% ODF claim, count increases to four

by VIGNESH RADHAKRISHNAN

Faiz Ali

21f1006793


STORY

The story that the author is trying to tell is that the Indian government's claims of achieving 100% open defecation-free (ODF) status in various regions across the country through its Swachh Bharat Abhiyan program are misleading and inaccurate. The article cites data from the National Annual Rural Sanitation Survey (NARSS) conducted by the government itself, which indicates that only four out of 28 states and union territories in India can claim to be ODF.

What data is he/she using to tell the story?

The author is using data from the National Annual Rural Sanitation Survey (NARSS) conducted by the Ministry of Jal Shakti. The survey collects data from rural areas in India to determine the ODF status of different states. The survey data covers the time period between November 2019 and February 2020.

The data contains information on the ODF status of different states, including the number of villages and households surveyed, the number of toilets present, and the percentage of households with access to toilets. The data also includes information on the discrepancies found in the ODF status claimed by the government.

The data is essential in determining the accuracy of the government's claim of achieving 100% ODF status in various states. However, there are gaps in the data since the survey was conducted before the COVID-19 pandemic, which may have affected the progress made in sanitation.

Furthermore, the author used data from two more surveys:
1) The National Statistical Office (NSO) survey from October 2018
2) The National Family Health Survey-5 (NFHS-5), 2019-21
to question the Swachh Bharat Abhiyan's claim of achieving 100% ODF status in all rural areas of India.

How is it encoded, what problems are with it, and what improvements could be done?

The gaps in the data are not explicitly mentioned in the article, but it is likely that there are some limitations to the surveys in terms of sample size, representativeness, and accuracy of data collection. The data that is essential to the story is the comparison between the claimed ODF status and the actual availability and usage of sanitation facilities in rural areas, as this is the main point of contention.

In total, four charts (geo maps/choropleths) have been used in the article.

Chart 1

image

COLOR CONTRAST
This is a common problem with such maps, and it can make it difficult for readers to distinguish between different states or regions.

To address this issue, one possible solution would be to use a color scheme that has a higher contrast and is easier to distinguish between different states. For example, instead of using shades of blue and green for different levels of ODF status, we could use a color scheme that uses distinct colors for each level, such as green for ODF, yellow for partially ODF, and red for not ODF.

Another option would be to use patterns or textures to differentiate between different levels of ODF status. For example, we could use diagonal lines for ODF, horizontal lines for partially ODF, and no pattern for not ODF. This would make it easier to distinguish between different states, even for readers who are colorblind.

Finally, we could also consider using a combination of color and texture to create a more effective and accessible visualization. By using a color scheme with high contrast and patterns or textures to differentiate between different levels of ODF status, we could create a visualization that is both informative and easy to read.

Chart 2 and 3

image


image
Both charts tell the same story on different timelines. Quoted from the article: "Map 2 shows the share of such ODF-plus villages as of April 1, 2022. Overall in India, only 8% of villages attained this ODF-plus status back then. Tamil Nadu’s share was over 91%. Interestingly, just a year ago, according to the MIS survey, only 72.4% of rural households in Tamil Nadu had access to some form of toilet."
Here the author asks readers to look at chart 3 and compare it with chart 2, but since we have to scroll down to see chart 3, comparing them is difficult.

I would suggest combining Chart 2 and Chart 3 into a single chart to make it easier for readers to compare the changes in ODF status over time.

One possible solution could be to use a single choropleth map that shows the ODF status for each state in both 2022 and 2023. The map could use different colors or patterns to differentiate between the two time periods, and it could also include a legend or a caption to explain the changes over time.

image
image

Another option could be to use a small multiples approach, where multiple smaller maps are used to show the ODF status for each state in both years side by side. This would allow readers to compare the changes more easily without having to scroll down the page.

Regardless of the approach chosen, it's important to make sure that the chart is clearly labeled and easy to read, with a clear title and axes, and a well-designed legend or key. This will help readers to quickly understand the information presented and to draw meaningful insights from the data.

Chart 4

Quoted from the article: "While the SBMG dashboard does not track toilet access separately, the Swachh Survekshan Grameen survey (December 2021-April 2022) lists the percentage of households with access to toilets (chart 4). It concluded that in 28 States, the share of such households crossed 90% and India’s average was 95%, wildly different from the figures of the MIS survey completed six months earlier."

image

To improve the clarity of the story and better convey the difference between the MIS survey and the Swachh Survekshan Grameen survey, one possible approach would be to redesign Chart 4 to show a clear comparison between the two data sources.

For example, instead of just showing the percentage of households with access to toilets for each state, the chart could include two separate bars or columns for each state, one showing the percentage from the Swachh Survekshan Grameen survey and the other showing the percentage from the MIS survey. This would allow readers to easily see the difference between the two sources and understand the reasons behind the disparity in the results.
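
The paired comparison described above can be sketched in Python as follows. It is only an illustration under assumptions: the state names and percentages are placeholders, not figures from either survey, and `paired_rows` is a hypothetical helper. It pairs the two sources per state and sorts by the gap between them, which is the order a grouped bar chart could use to put the largest disagreements first.

```python
# Placeholder figures (percent of households with toilet access).
# These are NOT values from the MIS or Swachh Survekshan Grameen surveys.
mis_survey = {"State A": 72.0, "State B": 88.5, "State C": 95.0}
ssg_survey = {"State A": 96.0, "State B": 93.5, "State C": 97.0}


def paired_rows(first, second):
    """Pair both sources per state and return rows sorted by gap, widest first."""
    common = first.keys() & second.keys()  # only states present in both sources
    rows = [(s, first[s], second[s], round(second[s] - first[s], 1)) for s in common]
    return sorted(rows, key=lambda r: r[3], reverse=True)


rows = paired_rows(mis_survey, ssg_survey)
print(f"{'State':<10}{'MIS':>8}{'SSG':>8}{'Gap':>8}")
for state, mis, ssg, gap in rows:
    print(f"{state:<10}{mis:>8.1f}{ssg:>8.1f}{gap:>+8.1f}")
```

Restricting the rows to states present in both sources avoids drawing a bar against a missing value.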

Overall, the goal of the redesign should be to help readers better understand the nuances and complexities of the data behind the story, and to provide a clear and accurate representation of the differences between the various surveys and data sources.

Additionally

Achieving ODF status is not just a matter of building more toilets. In addition to access to toilets, several other factors can affect the overall success of a sanitation program, such as access to clean water, proper drainage systems, and public awareness campaigns to promote good hygiene practices.

For example, even if toilets are built, they may not be used if there is not enough water to flush them (I have seen this situation in some villages in Rajasthan), or if the waste is not properly managed and disposed of. Similarly, if people are not aware of the importance of using toilets and practicing good hygiene, they may continue to defecate in the open or use shared toilets, which can contribute to the spread of diseases.

Therefore, achieving and maintaining ODF status requires a multi-pronged approach that addresses not only the physical infrastructure but also the underlying social, cultural, and economic factors that affect people's behavior and attitudes towards sanitation.

Some possible strategies could include:

Promoting awareness and education campaigns to increase understanding of the benefits of using toilets and practicing good hygiene.

Building and maintaining proper sanitation infrastructure, including toilets, drainage systems, and waste management facilities.

Encouraging community participation and ownership of sanitation programs, which can help increase their sustainability and effectiveness.

Addressing issues related to water scarcity and access, which can affect the usage and maintenance of toilets.

By taking a comprehensive approach that addresses these various factors, it may be possible to achieve and maintain ODF status in a sustainable and effective way.

@21f1003692

21f1003692 commented Mar 19, 2023

Name: Rudraraj Dasgupta
Roll Number: 21f1003692

Article: The religious states of America, in 22 maps
Author: Niraj Chokshi

What is the story the author is trying to tell?
The author is trying to convey, in a state-wise visualization, how religious beliefs are distributed among the various ethnic groups in the United States of America.

What data he/she is using to tell the story? Describe its details -- type of data, extent of the data, dimensions of the data, gaps in the data, what data is essential and what is irrelevant.

From the following visualization we can conclude that the southern states are more religious compared to the northern states. The data was gathered via interviews on attendance of weekly religious services.

image

Catholics are the largest religious group in America. The author breaks Catholics down by ethnicity. Although Catholic Americans were present in small numbers early in United States history, both in Maryland and in the former French and Spanish colonies that were eventually absorbed into the United States, the vast majority of Catholics in the United States today derive from unprecedented waves of immigration from primarily Catholic countries and regions during the mid-to-late 19th and 20th centuries (Ireland was still part of the United Kingdom until 1921, and German unification did not officially occur until 1871).

image

We can observe that Catholics are fairly evenly distributed throughout the country, with the exception of some states, predominantly in the south-east.

Catholics by ethnicity

image

image

image

Unaffiliated is the second largest group. This group does not belong to any religion. The unaffiliated, or irreligious, group appears to be concentrated on the west coast. Irreligion is the active rejection of religion in general, or any of its more specific organized forms, as distinct from absence of religion.

image

Evangelicals are the third largest group in the survey. In the United States, evangelicalism is a movement among Protestant Christians who believe in the necessity of being born again, emphasize the importance of evangelism, and affirm traditional Protestant teachings on the authority as well as the historicity of the Bible.

Evangelical Protestants

image

Mainline Protestants are the fourth largest group in the survey. Protestantism emphasizes the Christian believer's justification by God in faith alone (sola fide) rather than by a combination of faith with good works as in Catholicism. They are broken down by ethnicity just like the Catholics.

image

image

image

image

The author has also provided visualizations for other religious groups such as Mormons, Jews, Jehovah's Witness, Orthodox Christians, Muslims, Buddhists and Hindus.

How is it encoded, what problems are with it, and how have you attempted to improve it?

While the author has provided visualizations for the religious groups (including by ethnicity), there is no context for the relative population of the states. Without that context, conclusions cannot be drawn: some states have higher populations than others, and some have higher population densities. There is also no timeline showing how the distribution of religions among the states has changed through history. The United States of America is highly diverse in religion and ethnicity. We are given the current distribution but no historical context, so we cannot draw conclusions about the trends and patterns that such data might reveal. Although the author gives the percentage of the population for some religious groups, this is not done consistently throughout the article.

The Wikipedia article "Religion in the United States" provides more clarity.

image

The above line graph shows us the trends among religious groups in the US.

image

The stacked bar graph shows us the trends, for example the increase in unaffiliated group and the decrease in Protestantism.

@NinoLeenus

NinoLeenus commented Mar 19, 2023

Data | Latest AISHE higher education survey has many discrepancies

Name: Nino Leenus
Roll Number: 21f1001786

Article Link:
https://www.thehindu.com/data/data-latest-aishe-higher-education-survey-has-many-discrepancies/article66480748.ece

The Story Authors Narrate:
The aim of the story is to highlight the discrepancies in population projections and Gross Enrolment Ratios calculated using 2001 and 2011 census data.
The author is trying to highlight the discrepancies in the recently released All-India Survey on Higher Education (AISHE) 2020-21 report, particularly regarding the Gross Enrolment Ratio (GER) in higher education, which has been revised retrospectively for the previous four years by recalculating it based on population projections as per the 2011 Census. The author focuses on Tamil Nadu and other states to show how the use of different population projections has led to variations in the GER calculation, which, in turn, has affected the state's education policies and achievements. The story also highlights how the data used in the report is not consistent and that even if the projected population figures given in AISHE 2020-21 were used to calculate GER, the numbers did not match with the ones stated in the report.

Data:
The recently released AISHE 2020-2021 report used 2011 census data, whereas previous reports used 2001 census data.
Identifying the Data: The author is using data from two primary sources - the All-India Survey on Higher Education (AISHE) 2020-21 report and the Census of India (CoI)’s Report by a Technical Group on Population Projection, released in July 2020. The data used in the AISHE report includes enrolment figures in higher education courses and projected population figures in the 18-23 age group for the years 2016 to 2020. The data also includes the GER in higher education for Tamil Nadu and other states. The data from the CoI report includes the projected population figures in the 18-23 age group for the years 2016 to 2020.
The type of data used is primarily quantitative data, including population figures, enrolment figures, and GER. The extent of the data covers Tamil Nadu and other states in India. The dimensions of the data include time (years) and geography (states). Gaps in the data include variations in the population projections and GER calculations in different reports. The essential data is the enrolment figures, projected population figures, and GER for each state for each year. The irrelevant data is any other data that does not relate to enrolment figures, population figures, or GER.
The data is visually encoded using line charts, bar charts, tables and scatter plots.

Design Process:
The author is trying to show the discrepancies that the retrospective revision introduces into population projections and Gross Enrolment Ratios because of differences in the raw data used.
Encoding the Data: The author has used several charts and tables to encode the data and tell the story. The charts and tables include:
• Chart 1: This chart shows the population projection for Tamil Nadu in the AISHE 2020-21 report and the CoI report for the years 2016 to 2020. The problem with this chart is that the two projections show wide variations, and it is challenging to compare them visually.

image

• Chart 2: This chart shows the GER for Tamil Nadu in the AISHE 2020-21 report based on the 2011 Census and the AISHE 2019-20 report based on the 2001 Census. The problem with this chart is that it is challenging to compare the two reports visually as they use different Censuses.

image

• Table 3: This table calculates the GER using the population projections and enrolment figures taken from the AISHE 2020-21 report for Tamil Nadu. The problem with this table is that the calculated figures do not match with the ones stated in the report.

image

• Chart 4A and 4B: These charts show the difference in population numbers and GER calculations between the latest AISHE figures and the CoI’s numbers for different states. The problem with these charts is that it is challenging to compare the two sets of data visually.

image
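
The check behind Table 3 can be sketched in Python as below. It assumes the usual definition of GER as enrolment in higher education divided by the projected population aged 18-23, expressed as a percentage. The numbers are placeholders, not AISHE figures, and `check_reported` with its `tolerance` parameter is a hypothetical helper for flagging a mismatch between a recomputed and a reported value.

```python
def gross_enrolment_ratio(enrolment, population_18_23):
    """GER in percent: enrolment divided by the 18-23 population, times 100."""
    if population_18_23 <= 0:
        raise ValueError("population must be positive")
    return 100.0 * enrolment / population_18_23


def check_reported(enrolment, population_18_23, reported_ger, tolerance=0.1):
    """Recompute GER and say whether it matches the reported value within tolerance."""
    computed = gross_enrolment_ratio(enrolment, population_18_23)
    return computed, abs(computed - reported_ger) <= tolerance


# Placeholder inputs for one state and year; NOT figures from the AISHE report.
computed, matches = check_reported(
    enrolment=3_000_000,
    population_18_23=6_500_000,
    reported_ger=47.0,
)
print(f"computed GER = {computed:.2f}, matches reported value: {matches}")
```

Running the same check with the population taken from the CoI projection instead would show how much of a change in GER comes from the denominator alone.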

Improving the Encoding: To improve the encoding, I would suggest the following:

Clarify the significance of changes in GER: While the story mentions that the GER has gone up or down in most states, it does not provide an explanation of why this matters. The story could benefit from providing context on the importance of GER as a metric for assessing educational development and the potential impact of changes in GER on various stakeholders.

Include a comparison of state-wise GER rankings: The story talks about changes in GER for individual states, but it would be useful to also provide a ranking of states based on their GER in the latest AISHE report. This would help readers understand where each state stands in terms of educational development and how it compares to other states.

Provide more information on the methodology: While the story briefly explains how GER is calculated, it could benefit from providing more details on the methodology used in the AISHE report. This would help readers understand how the GER figures were arrived at and the limitations of the methodology.

Include quotes from experts: To provide more depth to the story, it could benefit from including quotes from experts in the education sector. These experts could provide insights on the significance of the changes in GER and the implications for policy-making.

Use interactive visualizations: While the scatterplot and radar graph used in the story are useful in showing changes in GER over time and across states, interactive visualizations could be more engaging for readers. For example, an interactive map that allows readers to compare GER across different states or a tool that enables readers to explore trends in GER over time could be more effective in conveying the story's key message.
Graph 4 uses a scatter plot to show the extent of deviation. Although the spread along the x-axis clearly shows the magnitude of deviation in GER values, it is not clear what the y-axis represents.
Ideally, a radar graph or a violin plot would have been better.

difference

radar graph

In a radar chart we can observe changes in two dimensions: year and population. The difference in projected population for each state can be explored using a data slicer, which enables interactive visualization beyond a static representation.

@arvindsankariitm

Name: Arvind Sankar
Roll No.: 21f1002061

Chosen Article: Data | Justice Chandrachud to begin longest tenure for a CJI in a while

The article broadly discusses the appointment of Justice DY Chandrachud as the Chief Justice of India. The authors try to provide a background surrounding his appointment in numbers, and compare how his tenure may be similar or different from his predecessors.
On a side note, Justice DY Chandrachud has been a very popular judge in the legal community, particularly with law students and young legal professionals. He has been known for his assertive and progressive decisions on a wide variety of issues including privacy, homosexuality, free speech, personal liberty, gender justice and religious freedoms, among others. I had the pleasure of having him as a guest at my convocation, where he received a very long applause before and after his commencement speech. Given this background, his 'long' tenure, as the author tries to convey, is most likely to be a celebrated one.

What are the authors trying to convey?

The author discusses how Justice DY Chandrachud will have one of the longest tenures as the CJI in recent history. The length of the tenure may have implications for how much, and what kind of, impact the CJI can have on the judicial system and, more broadly, on the country. Unlike other Justices who had much shorter tenures (especially in the last decade), Justice Chandrachud may have the opportunity to make systemic and much more meaningful changes to the judicial system.
The author also presents the distribution of judges by their birth state and the university from which they graduated.

What do the authors miss?

On Length of Tenure

The authors do not seem to provide many graphical visualizations to support their narrative, and instead simply present a table listing the CJIs and their tenures in decreasing order.
Pasted image 20230316183713
However, and more importantly, the authors missed the opportunity to meaningfully convey why Justice Chandrachud's tenure is likely to be a statistical anomaly. They simply take the length of each tenure into consideration without factoring in the process by which Chief Justices are selected and appointed.
Further, the authors also provide a table indicating the age of Justices when they were appointed as the CJI. However, that table essentially provides the same information as table 1, given that the position of CJI is time-bound, as elaborated in the subsequent section.

Some legal context
As per the constitution, the President appoints the next Chief Justice, generally on the recommendation of the outgoing Chief Justice. The succeeding Justice serves as the Chief Justice of the Supreme Court until he/she reaches the age of 65, i.e. until they reach their age of retirement. By convention, the senior most judge (by length of tenure at the Supreme Court) on the day of retirement of the outgoing Chief Justice is chosen as the next CJ. The convention has only been broken on two politically motivated occasions, which otherwise also do not hold much significance to our analysis.

Given this context, it is important to be mindful that certain Justices at the SC never get to be CJs (i.e. if an SC Judge turns 65 and retires before they become the senior most). This largely depends on how young the Judge was when they were elevated to the SC.
Furthermore, it is important to note that the Constitution originally envisaged the Supreme Court having only 8 Justices, implying the SC would have 8 sitting Justices to hear cases. Over time, due to the pendency of cases, this number has risen from 8 in 1950 to 11 in 1956, 14 in 1960, 18 in 1978, 26 in 1986, 31 in 2009 and 34 in 2019 (current strength). Coupled with the fact that the average age of judges on appointment to the SC has not changed, CJIs appointed today are expected to have much shorter tenures than those appointed at the inception of the SC.

The authors do not take the above into consideration, and simply compare the tenure of DY Chandrachud with the absolute length of the tenures of his predecessors. While it is true that Justice Chandrachud's tenure would be the second longest in the past decade, there may be more to why the length of his tenure is something we may not see that often.
An appropriate metric for detecting such an anomaly would be as follows:

A better visualization for length of tenures

In order to account for the increase in the strength of judges at the SC, data was obtained from Agami, an open-source legal data and legal tech initiative. Specifically, data was obtained from JusticeHub.in, where the dataset was partially developed by my alma mater. Although the dataset includes data only up to 2019, details of the 3 CJIs appointed since then, and of 6 other Judges who may be appointed as the CJI after Justice Chandrachud's retirement, have been appended manually.
my_plot

The above boxplots show the distribution of the length of tenure of all past CJIs, with each boxplot corresponding to an approved strength of the SC. As shown, the average length of tenure decreases as the strength of the Supreme Court increases, exactly as expected. The only exception is the boxplot representing the tenures of past CJs who were appointed when the SC strength was 18. Note that this period was relatively short, and the data comprises the tenures of only two judges, including Justice YV Chandrachud, who holds the honour of serving as the CJI for the longest duration. Justice YV Chandrachud also happens to be the father of Justice DY Chandrachud.
Given this observation, one can clearly appreciate that although Justice DY Chandrachud's tenure is only the 14th longest overall, it sits right at the outlier whisker of the last boxplot. Such a visualization may better support the argument that Justice Chandrachud's tenure may be an anomaly that we may not see again at the SC, at least until a change is introduced in the retirement age or the strength of the court.
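
The grouping behind such boxplots can be sketched roughly as follows. This is only an illustration of the idea, not the code used for the plot above: the function names and the `(appointment_year, tenure_days)` record layout are assumptions, and matching by calendar year is a simplification. The strength timeline is the one listed in the legal context section.

```python
from bisect import bisect_right

# Sanctioned strength of the Supreme Court by the year it took effect,
# as listed in the legal context section of this comment.
STRENGTH_CHANGES = [(1950, 8), (1956, 11), (1960, 14), (1978, 18),
                    (1986, 26), (2009, 31), (2019, 34)]
YEARS = [year for year, _ in STRENGTH_CHANGES]


def sanctioned_strength(appointment_year):
    """Return the sanctioned strength in force in the given year."""
    if appointment_year < YEARS[0]:
        raise ValueError("the timeline starts in 1950")
    # Index of the last strength change on or before the given year.
    idx = bisect_right(YEARS, appointment_year) - 1
    return STRENGTH_CHANGES[idx][1]


def group_tenures(records):
    """Group tenure lengths by sanctioned strength at appointment.

    `records` is an iterable of (appointment_year, tenure_days) pairs
    (a hypothetical layout, not the JusticeHub schema).
    """
    groups = {}
    for year, tenure_days in records:
        groups.setdefault(sanctioned_strength(year), []).append(tenure_days)
    return groups
```

Each list in the returned dictionary would then feed one boxplot, so a tenure is judged against others from the same court strength rather than against all tenures since 1950.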

Analysis of proposed visualization

While a more coherent graph, appropriate to the message that needs to be conveyed and with suitable pop-outs, has been used, the visualization could be further improved in the following aspects:

  • Boxplot colors do not provide any additional information. A legend could be provided to show what they indicate or could be represented in uniform colors.
  • Elements of a boxplot may not be understandable for people without a background in statistics.

On Distribution of Birth State

Pasted image 20230319232032

The authors go on to provide the above treemap to indicate the distribution of the birth states and universities that the appointed CJIs belong to. The former has come into discussion in recent years, with questions being raised about the diversity of judicial appointments in the country. While the appointment of the CJ is based on seniority, appointments of Judges to the SC are determined by the collegium (which comprises 5 of the senior most judges of the SC). Such a subjective system for the appointment of Judges may subsequently be reflected in the birth state distribution of the Judges.

In order to better visualize the diversity, or lack thereof, of the birth states of past CJIs, it would be more effective to provide a choropleth map instead of a treemap. A choropleth map would allow for a more intuitive and geographically accurate representation of the birth state distribution of the CJIs, which would enable a more meaningful analysis of the diversity in judicial appointments.

A better visualization for birth states

The following chart is proposed for representing the same:
birth-state-distribution

It is clear that Maharashtra, West Bengal and Uttar Pradesh account for a significant proportion of the birth state distribution, indicating that the CJI appointments have not been diverse. One possible reason could be that West Bengal and Maharashtra have two of the oldest common law courts in the country. Since these states have some of the oldest institutions for legal practice, it is likely that their residents have had more opportunities to pursue a legal career and are thus more likely to enter the judiciary and be appointed as judges, including as CJIs. Moreover, some of the oldest and most traditional educational institutions were established in these states, allowing their residents to obtain a quality legal education more easily.

Analysis of the proposed visualization
  • There are many more factors and variables associated with diversity (such as caste, religion, race, gender, etc.) which are not adequately reflected on the map.
  • The visualization, while accounting for geographical region/distribution, does not take into consideration the population in each of these states. To remedy the same, a form of cartogram could be considered.

@codeswapnadeep

codeswapnadeep commented Mar 19, 2023

Name: Swapnadeep Pradhan
Roll Number: 21f1002240
Article: Education, more than wealth, determines women’s marital age
Authors: Vignesh Radhakrishnan, Rebecca Rose Varghese

What is the story the author is trying to tell?

The authors of the article discuss how education plays a more important role than wealth in determining when women get married in India. The article cites data from the National Family Health Survey (NFHS) showing that women who have completed more than 11 years of schooling tend to marry later than those who have less than five years of schooling. This trend has been consistent for decades, regardless of the current age group of women.

The article also reveals that wealth has only recently become a relevant factor in influencing women's marital age. Among older generations, even women from richer households married at a younger age than those from poorer households. However, among younger generations, women from wealthier households tend to marry later than those from poorer households.

The article further explores how caste and location also affect women's marital age. Women from SC/ST/OBC communities tend to marry earlier than those from non-SC/ST/OBC communities, even among younger generations. Similarly, women from rural areas tend to marry earlier than those from urban areas.

What data he/she is using to tell the story? Describe its details -- type of data, extent of the data, dimensions of the data, gaps in the data, what data is essential and what is irrelevant.

The article uses data from the National Family Health Survey (NFHS) 2019-21 to analyze how education, wealth, caste and location affect the age at which women get married in India. The NFHS is a large-scale survey conducted by the Ministry of Health and Family Welfare that covers various aspects of the health and well-being of women, men and children. The survey collects information from a nationally representative sample of households using face-to-face interviews.

The article focuses on one indicator from NFHS-5: the median age at first marriage among women aged 25-49 years. This indicator reflects the prevalence of child marriage and early marriage among women, which have implications for their health, education, empowerment and rights. The article compares the median age at first marriage across different groups of women based on their wealth quintiles (from poorest to richest), years of schooling completed (from none to more than 11 years), caste categories (SC/ST/OBC/other) and location (rural/urban).

The article uses tables and charts to present the data. It also provides some context and explanation for the patterns observed in the data.

Some possible sources of error or bias could be:

  • Non-response or under-reporting by some respondents due to social stigma or fear of legal consequences
  • Inaccurate recall or estimation of dates by some respondents due to lack of documentation or memory lapses
  • Sampling errors or design effects due to complex survey design and weighting procedures
  • Measurement errors or inconsistencies due to different definitions or interpretations of terms such as marriage, first marriage, etc.

The data used in the article is essential for understanding how social factors affect women's choices and outcomes regarding marriage. However, some data that could be irrelevant or less important for this analysis are:

  • Data on men's marital age

How is it encoded, what problems are with it, and how have you attempted to improve it?

This article does not have any graphs. Only tables are shown viz.

Chart 1:
Screenshot 2023-03-19 at 23-06-12 Education more than wealth determines women’s marital age

It would have been better served if this table was supplemented by some graphs viz.

marriage-wealth

marriage-school

Another table given is this

Chart 2:
Screenshot 2023-03-19 at 23-53-05 Education more than wealth determines women’s marital age

This also could have been supplemented by line charts viz.

marriage-caste

marriage-location

Based on these charts, we can get a clearer picture of the societal factors which affect the age of marriage in women. Among these, the factor of education has the greatest correlation and this has been true across multiple generations. Thus increasing access to education for women must be a priority to discourage girl-child marriage.

Apart from these this article also has two charts elucidating the same data for men. This is irrelevant as the article purports to focus on women's marital age.

@cheriangeorge

Cherian George
21F1002142

Title : Data | From 5% to 15%, China’s share in India’s imports tripled in last two decades

Article by JASMIN NIHALANI
Date of Publication : January 27, 2023 07:49 am | Updated January 28, 2023 12:06 am IST

Story

The author uses India's trade data (import-export by value) from 2002 to 2022 to highlight the increase in imports from China to India over two decades. The author also highlights the decline in India's exports to China from 2020 (6.9%) to 2022 (3.4%). However, during the same period (2002 to 2022), India's exports to China do not show a consistent trend as in the case of imports. The author uses two charts to show the trade deficit between the two countries widening between 2002 and 2022.

The story also looks at the change in imports from China for 9 other countries, including India, between 2011 and 2021. A slope chart is used to illustrate this difference. The chart indicates an increase in imports from China for 8 countries, while the US remains constant.

The story also explores the types of commodities imported by India from China and exported from India to China. While non-value-added raw materials constitute most of India's exports to China, India's imports primarily consist of finished electronic goods and machinery with a high degree of value addition. The author uses 2 tree-map charts to represent this. Line charts have also been used to capture the dependency of India on China for 5 types of goods/commodities from 2012 to 2021.

Data Sources used in the story

The original data used for the story was downloaded from the above 2 sources. India's net import / export values for the years 2002 - 2022 were also taken. This is not represented in the original story.

Type of data : International Trade data from 2002 to 2022
Extent of data : Yearly aggregates of imports and exports between countries. Net trade data and commodity wise break up are available
Dimensions of data : 2 dimensional in most cases - value of trade and year are the dimensions available in the data. In one of the data visualisations a geographical dimension has also been used. This can be seen in the third figure below.
Gaps in the Data : The original story misses out India's net exports and imports by year and considers only the imports and exports between India and China. A line chart indicating the change in percentage of imports and exports between India and China from 2002 to 2022 has been represented in the chart.
Essential Data : India-China trade by value, commodity wise data
Irrelevant Data : Item-wise dependency data does not really contribute to the narrative put forth.

Visualisation 1
Imports from China to India
Original Visualisation Modified Visualisation
Screenshot from 2023-03-14 23-19-30 Imports from China to India
  • Click to enlarge above visualisations

The original visualisation uses import data (China to India) in $ billion from 2002 to 2022 to display a bar-chart. The line in the chart represents the change in percentage of imports from China relative to net imports year on year. The axis for this line is on the right side of the chart and it ranges from 4 to 17. Spanning this range across the entire vertical extent of the chart causes small changes in percentages to be amplified. The axis is also cut off at the horizontal axis. In the absence of net import data in the visualisation, it is hard to perceive the linkage between the change in percentage of Chinese imports and net imports in the given time period.

To modify this visualisation net import data was taken from the provided source and added to the visualisation. This is represented as a grey coloured bar. The red bar (semantically encoded) in the grey bar indicates the portion of Chinese imports. The blue line in the chart represents the percentage of net imports that constitute imports from China. The blue line spans 0-20% and occupies the lower half of the chart giving a clearer picture of reality without amplifying small changes in percentages.
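
The amplification effect described above can be checked with a little arithmetic. The sketch below is illustrative only: the helper names are made up, and the 5% and 15% endpoints are taken from the article's headline. The axis ranges are the ones quoted above (4 to 17 in the original, 0 to 20 in the redesign).

```python
def share_percent(partner_value, total_value):
    """Percentage of total trade value accounted for by one partner."""
    if total_value <= 0:
        raise ValueError("total_value must be positive")
    return 100.0 * partner_value / total_value


def axis_fraction(value, axis_min, axis_max):
    """Fraction of a linear axis's length at which `value` is drawn."""
    return (value - axis_min) / (axis_max - axis_min)


# Rise from 5% to 15% on the original axis (4 to 17): about 0.77 of the axis.
original_rise = axis_fraction(15, 4, 17) - axis_fraction(5, 4, 17)

# Same rise on the redesigned axis (0 to 20): 0.5 of the axis.
redesigned_rise = axis_fraction(15, 0, 20) - axis_fraction(5, 0, 20)
```

Because the redesigned axis occupies only the lower half of the chart, the same rise covers about a quarter of the chart height instead of roughly three quarters.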

Visualisation 2
Exports from India to China
Original Visualisation Modified Visualisation
Screenshot from 2023-03-14 23-19-47 India's Exports to China
  • Click to enlarge above visualisations

The original visualisation uses export data (India to China) in $ billion from 2002 to 2022 to display a bar-chart. The line in the chart represents the change in percentage of exports to China relative to net exports year on year. The axis for this line is on the right side of the chart and it ranges from 3 to 8. Spanning this range across the entire vertical extent of the chart causes small changes in percentages to be amplified. The axis is also cut off at the horizontal axis. In the absence of net export data in the visualisation, it is hard to perceive the linkage between the change in percentage of exports to China and net exports in the given time period.

To modify this visualisation net export data was taken from the provided source and added to the visualisation. This is represented as a grey coloured bar. The red bar (semantically encoded) in the grey bar indicates the portion of exports to China. The blue line in the chart represents the percentage of net exports that constitute exports to China. The blue line spans 0-10% and occupies the lower half of the chart giving a clearer picture of reality without amplifying small changes in percentages.

Visualisation 3
Imports from China to nine other countries including India and some of its neighbours
Original Visualisation Modified Visualisation
Screenshot from 2023-03-14 23-21-18 China_Radar
  • Click to enlarge above visualisations

The original visualisation uses data of nine countries - their total imports from China in 2011 and 2021. The data is visualised using a slope chart which highlights the general increase in imports from China. The United States seems to have maintained stability by constraining imports from China to ~18.4% of its total imports over the 10 year period. This is represented by the nearly straight line parallel to the horizontal axis for USA. This chart could be quite hard to understand for someone who has not seen slope charts before, and in the original visualisation the names of 2 countries are missing.

The visualisation has been modified to represent the Chinese import data of the same countries on a radar plot. This has been complemented with 2 Choropleth maps (2011 and 2021) to capture the change in percentage of imports from China. Semantic Encoding - A deeper shade of red indicates a higher percentage of imports from China and a lighter shade indicates a lower percentage. Both the Choropleth maps have been colour adjusted so that USA has the same colour in both maps since its percentage remains the same.

Other Visualisations Present in the Story
Most Exported Goods Most Imported Items Item-wise Dependency
Screenshot from 2023-03-14 23-20-31 Screenshot from 2023-03-14 23-20-58 Screenshot from 2023-03-14 23-21-38

The three visualisations above represent and depict the data quite well in line with the narrative presented in the story and were not modified.

@successmanraedx

successmanraedx commented Mar 21, 2023

Rajan Kumar
21F1006139

Title : A year of Russia’s invasion of Ukraine
(https://www.thehindu.com/data/in-charts-a-year-of-russias-invasion-of-ukraine/article66545197.ece)

Article by THE HINDU BUREAU
Date of Publication : February 24, 2023

Story

The article is a collection of charts and data that illustrate the events surrounding Russia's invasion of Ukraine over the past year. It covers topics such as the number of people affected by the conflict, the amount of military spending by Ukraine and Russia, and the impact of economic sanctions on Russia. The charts provide a visual representation of the conflict and the article provides a concise summary of the key points.

On February 24, 2022, Russia launched an attack on Ukraine, following weeks of military build-up along the border of its neighbour. The conflict dates back to 2014, when Russia annexed Crimea. As of February 21, 2023, deaths of Russian and Ukrainian military personnel amounted to 180,000 and 100,000, respectively, while there were 16,150 civilian casualties in Ukraine, since February 24, 2022. Nearly one-third of the population of Ukraine remains forcibly displaced from their homes, making it one of the largest displacement crises in the world today, according to the United Nations High Commissioner for Refugees. The conflict has had a devastating effect on the lives of many.

Data Sources used in the story

  1. Institute for the study of war, Reuters
  2. New York Times
  3. UNHCR Operational Data Portal

Visualisation 1:

There is a huge difference in the data values, and the comparison is made on many parameters, so I decided to go with a doughnut chart.

Screenshot 2023-03-21 at 1 50 53 PM

After Modification:
Military Losses & Casualties
Screenshot 2023-03-21 at 6 07 08 PM

Visualisation:
Screenshot 2023-03-21 at 6 31 41 PM

After Modification
Used Hierarchy Chart:

Screenshot 2023-03-21 at 6 32 18 PM

Problems with the visualisation and my attempt to improve it
...
Problems with the chart:
Chart 1:
• Unnecessary distraction due to more imagery and graphics
Chart 2:
At first glance, it's unclear what to compare from this chart. The author's intention is for us to compare the numbers, which is not immediately evident.

@kumar-cmd

kumar-cmd commented Mar 21, 2023

Kumar Chandan : 21F1004845

Title : Data | Food( Fish, Chicken, Egg) habit of Indians state-wise
https://www.thehindu.com/data/data-how-many-indians-eat-meat/article65299234.ece

Story by Author: The author of the article uses several graphs to illustrate the data on meat consumption in India.
This graph allows readers to quickly see which states have the highest and lowest percentage of meat-eaters, including chicken, mutton, fish, and eggs. This graph helps readers understand which types of meat are most popular in India.
Overall, the author uses these graphs to visually represent the data and make it easier for readers to understand the trends and patterns in meat consumption in India.

This article provides statistical information on the percentage of Indians who consume meat.
Type of Data:
The data presented in the article is quantitative data, as it consists of numerical values representing the percentage of meat-eating Indians in different states and regions of India.
Extent of the Data:
The data presented in the article is based on a survey conducted by the Indian Council of Medical Research (ICMR) between 2019 and 2020. The survey covered 30 States/ UTs of India, representing over 70% of the country's population.
Dimensions of the Data:
The data presented in the article has several dimensions, including:
State/region: The article presents data for 30 States/ UTs of India.
Type of meat: The article provides information on the percentage of people who consume different types of meat, including chicken, mutton, fish, and eggs.
Demographic factors: The article also breaks down the data by demographic factors such as gender, age, and education level. For example, it provides information on the percentage of men and women who eat meat and the percentage of meat-eaters in different age groups.
Gaps in the Data:
One potential gap in the data is the absolute number of people who consume meat in each state, and the percentage of people consuming meat by age group.
Essential Data:
The essential data provided in the article includes the percentage of Indians who consume different types of meat, the breakdown of meat consumption by state and region, and demographic factors such as gender and age.
Irrelevant Data:
There is no irrelevant data presented in the article, as all of the information provided is relevant to understanding meat consumption patterns in India.

Page 8

Since the source of the data used in the graphs is not clearly mentioned, I scraped the data from the graphs and put it in a CSV file to analyse it better before redrawing the graph. I also added one extra column for population to see the total consumption state-wise, which is given below.

Page 10

Even though the author tries to explain the data in 4 different graphs, a lot of information is still missing:

  1. The colouring of the states is not appropriate: states are given the same colour if they fall in the same band (e.g. 75%-90%), and only 5 colours are used to represent the whole country.
  2. Three different graphs are used to show the three food items, which makes it hard to compare the relative ratio in which the three food items are consumed within the same state.
  3. The figures are given as percentages, so it is hard to tell how much meat is actually consumed in each state.

Final Graph:
Page 9

I tried to supply the information that was missing in the given graphs:

  1. In the final graph, the colouring of the states is continuous rather than discrete: I used a gradient to show the actual colour based on the %.
  2. Instead of using three different graphs, I used only one, and on every state I placed a pie chart to display the relative % of the different foods.
  3. Size of Pie Chart: The size of each pie chart displays the total consumption by number of people; the largest pie corresponds to the state where the most people consume non-veg food.
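
The sizing rule in point 3 can be sketched as below. This is a hedged illustration in Python rather than the D3.js code actually used: the function names, the `max_radius` parameter and the input layout are hypothetical, and scaling the pie's area (rather than its radius) to the number of consumers is an assumption about how the size was encoded.

```python
import math


def consumers(population, percent_non_veg):
    """Estimated number of people who eat non-vegetarian food."""
    return population * percent_non_veg / 100.0


def pie_radii(states, max_radius=40.0):
    """Map each state to a pie radius whose area is proportional to consumers.

    `states` maps a state name to (population, percent_non_veg).
    The state with the most consumers gets `max_radius`.
    """
    counts = {name: consumers(pop, pct) for name, (pop, pct) in states.items()}
    largest = max(counts.values())
    if largest <= 0:
        raise ValueError("at least one state must have consumers")
    # Area grows with radius squared, so take the square root of the ratio.
    return {name: max_radius * math.sqrt(count / largest)
            for name, count in counts.items()}
```

With this rule, a populous state with a modest percentage of non-vegetarians can still get a larger pie than a smaller state with a high percentage.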

Interesting Fact: In the graph we can clearly see that only 55% of people in UP eat non-veg, a lower share than in Tamil Nadu, but the consumption of meat/fish/egg is much larger in UP.

IMP Points:

Since the chart and the graph are generated programmatically, they are highly dynamic in nature.
The graph presented here addresses only food (fish/meat/egg), but the same approach can be applied to any state-wise comparison within a country, such as exports and imports, the different types of plants grown by a state, and many more.

Technology and data :

  1. For drawing the map, I used GeoJSON and TopoJSON data for India with states
  2. gadm.org
  3. geoBoundaries-IND-ADM3.topojson
  4. For drawing the pie charts at each geolocation, I used D3.geo
  5. For projecting the GeoJSON data onto the map, I used the geoAlbers projection method
  6. For drawing and colouring, I mainly used vanilla JavaScript, SVG and D3.js

@Ajay-Kumar-1998

Ajay-Kumar-1998 commented Mar 21, 2023

Lolla Ajay Kumar : 21f1000200
Title: Analysing the post-pandemic Indian economy and its recovery
NEWS ARTICLE LINK

The news article I chose is an analysis report on the Indian economy and the various parameters affecting it in the post-pandemic world. The writer of the article has taken into consideration 4 major parameters that affect the GDP growth rate.
The four parameters considered are-
- Private Final Consumption Expenditure (PFCE)
- Government Final Consumption Expenditure (GFCE)
- Gross Fixed Capital Formation (GFCF)
- Net Exports
- GDP (y-o-y) growth in %

These 4 parameters are crucial to the GDP growth rate of any country, including India. The article is all about showing how effective the 4 parameters are.

In this article of THE HINDU, 5 charts have been used to depict various trends.

CHART-1

image

NEGATIVE POINTS-

  • Although the intention of the chart was to show the contribution of the 4 parameters to GDP growth, it is very tough to compare the relative share of the 4 parameters.
  • The increase/decrease of any parameter's contribution is also not clearly visible in the chart.
  • The most important quarter (x-axis) was Q1, but it is not mentioned or highlighted at all in the chart.

Keeping in mind all the above drawbacks, I came up with a new approach.

MY APPROACH- CHART-1
Firstly the whole intention of Chart-1 is to show the relative contribution of the 4 parameters towards the change in GDP growth. And in the above chart, we can clearly see through the increase/decrease in Area of the colours.
All the data points have been preprocessed and scaled to sum to unity, and then plotted.
The two rectangles are to highlight the high points in the two financial years which is Q1.
The Chart was completely built from scratch, as there was no such tool available.
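
The "scaled to sum to unity" step can be sketched as follows. This is a minimal illustration, not the preprocessing actually used: the function name is hypothetical, and it assumes non-negative inputs, since the comment does not say how negative contributions (for example, negative net exports) were handled.

```python
def scale_to_unity(values):
    """Scale non-negative values so that they sum to 1."""
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    total = sum(values)
    if total == 0:
        raise ValueError("at least one value must be positive")
    return [v / total for v in values]
```

Applied to the four contributions of each quarter in turn, this gives the relative shares whose areas can be stacked to a constant height across the x-axis.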

CHART-1

CHART-2

image
NEGATIVE POINTS-
Here two charts have been used to show the comparison of trendlines between GDP growth rate, PFCE, and GFCE.
Having two separate charts looks redundant.

MY APPROACH- CHART-2
I have merged both the charts and come up with a single one.
The Visual appearance also is better than the original design.

CHART-2

CHART-3
image

NEGATIVE POINTS-
The legends on the x-axis are not clear.
The visual appearance of the chart is also not aesthetic.

MY APPROACH -CHART-3

I have introduced a new line that depicts the lag between the REPO rate and FED rate. It clearly shows which happens first and which happens later.
chart-3
