Redesigning The Hindu Data Point Stories (2019) #1
Roger Federer - Aged Like Fine Wine?
For this re-design, I picked an article titled 'Roger Federer's career shows he's ageing like fine wine', written on the 8th of March, 2019 (right after he won his 100th ATP title). It was a visualization of Roger Federer's career performance in terms of titles won, when compared to the 15 men's tennis players with the most titles. The original article can be found here. The article claimed that Federer has been performing better as his career has progressed, relative to the other players in the top 15. It did so by comparing a single metric - Titles won/Tournaments entered - for each player across three stages of their career. The information was presented through simple dot plots made in Flourish. The plot for Stage I (Years 1 to 7) looked like this - The percentage of wins was encoded in triplicate - On the Y Axis, in the size of the dot, and the shade of the dot as well. In terms of interactivity, readers could view the position of one player at a time using a drop down menu in the top left. By hovering on a dot, readers could see information about the player, number of wins, and win percentage. Having followed tennis fairly religiously since I was 7, I felt that the approach towards player performance was overly simplistic at best, and at times very misleading. Just to clear things, my mood in life is directly proportional to the extent of Federer's successes at any given point of time. That being said, I don't think he has gotten better over time. What's amazing about his career is how he has bounced back from multiple almost-career-ending spells, and is still performing consistently at the highest level of the game. The biggest problem I had with the visualisation was how the longevity of a player, and consistency over time, was not being adequately expressed. 
In fact, in the third graph, I felt it went ahead and displayed the opposite insight, by making it seem that Djokovic and Nadal had reached the level of Federer, but with much greater pace (Image below). This is most clearly not the case. And so I started off by collecting and compiling more detailed statistics for each of the 15 players in this visualisation. I managed to find the number of tournaments entered and titles won in each year of each of their careers, and put it all together in a spreadsheet that looked something like this - Of the 15 players included in the original title, I excluded three - Rod Laver, Ilie Nastase and Guillermo Vilas. These three all played some portion of their careers before the Open Era, and a number of tournaments that they played in at the start of the Open Era are no longer recognised by the ATP, and I wasn't able to find accurate statistics for them. Even a cursory glance at the image of the spreadsheet above depicts how it doesn't make sense to compare all players across the three stages of their career. Federer has been active in the tennis circuit for 22 years now. Only four players of the remaining 12 are active, and considering Andy Murray's recent debilitating history with injuries, it seems unlikely that his third act will produce many more titles. Among the players who have retired, only Jimmy Connors, John McEnroe and Andre Agassi have had career lengths comparable to Federer. Initial explorations with graphing all 12 players together led to some very noisy and unreadable graphs. For the purpose of this redesign, I decided to pick -
I started by plotting the same metric used by the original article (titles won/tournaments entered) for each of these six players across the three stages of their careers. This is depicted below - These graphs did make the distinction a little clearer. One can easily see how Federer's efficiency metric has been much higher than any other player consistently over stage 2 and stage 3. But this still felt inaccessible to people without a clear understanding about how the tennis season is structured. I then tried just representing wins and losses at tournaments across Federer's career through a stacked bar graph - This to me seemed like a nicer way of depicting things. It gives readers a visual sense of the total number of tournaments played at each step of a players career, and how many of those tournaments were converted into titles. I then made a continuous graph (instead of bars, for purely aesthetic reasons) for each of the six players in consideration. I also figured it would be nice to have player graphs correspond to certain known characteristics about them.
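As a rough illustration of the titles won/tournaments entered metric, the per-stage calculation could be sketched in Python as below. The `Season` record, the function names and any numbers fed into it are hypothetical. Stage I as years 1 to 7 comes from the original article; treating stage II as years 8 to 14 and stage III as everything after that is an assumption made only for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Season:
    """One year of a player's career (hypothetical record layout)."""
    year_of_career: int  # 1 = first year on tour
    entered: int         # tournaments entered that year
    titles: int          # titles won that year


def stage_of(year_of_career, stage_length=7):
    """Map a career year to stage 1, 2 or 3.

    Assumption: stages are equal blocks of `stage_length` years,
    and every year past the second block counts as stage 3.
    """
    return min((year_of_career - 1) // stage_length + 1, 3)


def stage_efficiency(seasons, stage_length=7):
    """Return {stage: titles won / tournaments entered} for stages 1-3.

    A stage with no tournaments entered maps to None rather than 0,
    so that "did not play" is not confused with "played and never won".
    """
    totals = {1: [0, 0], 2: [0, 0], 3: [0, 0]}  # stage -> [titles, entered]
    for season in seasons:
        stage = stage_of(season.year_of_career, stage_length)
        totals[stage][0] += season.titles
        totals[stage][1] += season.entered
    return {
        stage: (titles / entered if entered else None)
        for stage, (titles, entered) in totals.items()
    }
```

Summing titles and entries per stage before dividing, rather than averaging yearly ratios, keeps a year with very few tournaments from dominating the stage figure.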
The initial iteration of the article that was discussed in class looked like this - The career stage graphs on the right are all aligned to the start of each player's career, and divided by guide lines into the three stages considered by the original article. Textual information was included for the number of tournaments entered and titles won for each career stage. Another graph, depicting the cumulative titles won by each player over the course of their career, was included as well. There were a few issues with how the information was presented in this version. It was unclear whether players were retired or currently active, as all graphs were brought to zero at the end. A number of the labels were not very readable either. These have been fixed, and the current version of the article looks like this -
Addition 1
Changes to the article:
The newer version of the article is included below -
Thanks to Venkat Sir for pointing these mistakes out, and for the overall feedback as well.
Further Responses
Order of players
On the whole, the order of the 6 players in the main article was determined as follows -
The meaning of 'Consistency', in the context of this analysis
My interpretation of consistency was informed by the broader analysis of 12 Open Era players. The expanded version of the visualization is included below. The original 6 are ordered as is, and the next 6 are in order of decreasing career length. The drop-off in the case of the next 6 players is very clear when presented visually. They have all played far fewer tournaments in the third stage of their careers, and have won even fewer titles than those in the initial visualization. Consistency in this case is a subjective, visually judged measure. I think it would be possible to quantify it in a better way; I will have to think about it. But if I had to split the measure, the three parts to it would be (obviously) -
For example, Thomas Muster played a staggering number of tournaments in his stage II (182), but really didn't win that high a percentage of them. Soon after, in stage III, his output and efficiency quickly decreased even further. Hence between stages II and III, Muster was neither consistent nor efficient. Now going back to the initial 6, Federer's 'consistency' comes from the fact that he has played many more tournaments in the third stage of his career than anyone with the exception of Jimmy Connors. However, Connors' win percentage was terribly low throughout his stage III, while Federer at this point in time has the highest win percentage of any player, even in the longlist. This combination of longevity and high win percentage makes his trajectory look more consistent over the 3 stages. The same can be said for his contemporaries, Nadal and Djokovic, who have maintained similar win percentages in their third stages, and show no signs of dipping in the near future. The dominance of Federer, Nadal and Djokovic over the last two decades has been unprecedented, and on an individual level, their output has far exceeded that of any tennis player before them. While they haven't played for as long as Connors, they have managed to keep winning the biggest tournaments of the sport at a point in their careers where Connors was merely participating. So in a way, the article could just as easily have been about the great performances of Nadal and Djokovic. Maybe some other author will write it for them. I respect what they have achieved in the sport, but this is me doing my bit and preaching the gospel of Federer. That's all for now, |
Good document! Some quick comments:
|
What is the status of Smart City projects in India?
Original article link: click here
The story they are telling through the article
In this article, the author tried to give a status report of the Smart City Mission.
Major insights of this article
The data they are using to tell the story
The data used to make this data story is probably from the Smart City websites. I tried looking for the data but could not find any related reports on the Smart City website. I did find another report from MoHUA stating figures and data related to the Smart City Mission (link, page no. 189).
How is the data encoded, and problems with the encoding
They used a 3D doughnut pie chart to visualize the data, but mentioned neither the source of the data nor a legend for the chart.
Complexity in this visualization: The cost of the projects and the number of completed projects are two independent variables. Comparing both in a single graph with an additional third parameter (the number of smart cities in each State) is difficult for common readers to understand.
I started by reading the article and finding the important information in it. Some of the important statistical figures about the Smart City Mission were only written as text, and need to be highlighted and presented in a better way.
-Identified information-
The above information gives insight into the number of projects initiated under the Smart City Mission. The budgetary information needs to be visualized so that the budget allocated can be compared with the budget spent on projects. To simplify the visualization, I separated the "number of smart cities in the state" from the comparison of "average number of completed projects per city" vs "average cost of completed project per city". |
|
Here is the original Hindu article.
Data Details
Gaps in the data: The rate of success for the different kinds of moon missions (Lander/Rover/Orbiter/Sample return) is given in percentages and in the same chart, even though the data comes from small samples. This complicated the calculation of the exact number of successful and unsuccessful missions of each kind, which is why the story does not sell. The reader gets confused by the tables and percentage values when all s/he is looking for is proof that Moon lander missions are more difficult. The message is delivered not through the data but only through the text. The charts have confusing titles which increase cognitive load, as the reader has to reverse calculate the figures.
The Infographic
The feedback session helped me realize that the concentric circle representation is not the best way to compare lengths and hence values. I should have removed the percentage values and consistently used the number of missions in each category, as the numbers were quite small in my data set.
Feedback Incorporation: |
A World within a Country
Here is the original article. The article's aim is to show how big the Indian electorate is and compare it to some of the democracies of the world. It starts off by showing the growth in the electorate since 1952. It also mentions the percentage of voter turnout over the years starting from 1962. These visualisations needn't have been separate and could have been combined into one, letting the user explore correlations between the number of voters and turnout. Also, we don't get a sense of how many voters are actually voting with these two separate visualisations. There's also a section about the number of candidates contesting, with a line graph to represent the percentage of women candidates. While this was an interesting graph, with a lot of areas that could be improved, I felt it did not add to the story of showing how big India's electorate is, so I decided to omit it from the final visualisation. The final visualisation of the article shows how the electorates of the various Indian states compare with other democracies of the world. The visualisation gives an approximation of which countries could best replace the Indian states if only the electorates were considered. So, for example, Uttar Pradesh has an electorate that is roughly the same size as that of Brazil. The problem with this is that it doesn't tell the reader anything. To start with, I have no idea about the size of Uttar Pradesh's electorate, and I am being told that this is the same size as the electorate of Brazil. Also, the comparisons are being made with countries that don't have any relation to the state in question. Jammu & Kashmir is compared with Madagascar, which doesn't make much sense. I tried to remove this notion by getting rid of the choropleth and using a Sankey instead. This ensured that the only comparison would be between the electorates of the states and countries, without adding confusion.
In order to make the Sankey, I needed to find the numbers that make up the electorate. For the Indian states, thankfully, Wikipedia had neat documentation in its article on the Indian elections. For finding the electorates of other countries, I had to look a bit before I found this wonderful website. The ordering of the Sankey is arbitrary, as are the colour choices. This was a limitation of the software I was using at the time. I have also combined the growth in the size of the electorate and voter turnout into one visualisation to allow for comparison. There were quite a few areas to improve in this visualisation. To start with, the percentage of voters line could have been more exaggerated, as done in the original visualisation, to highlight the ups and downs in voter turnout. The Sankey was too small, and for the given space there were other, better ways of visualising the data while retaining the choropleth characteristic used in the original visualisation. The Sankey was too arbitrary and could have done with some amount of logic in the ordering and use of colours. Also, the smaller states, such as Nagaland and Tripura, are hardly visible. So on to fixing this... Now I really wanted to fix everything, really. But some things require greater attention than initially thought. I decided to focus on fixing the Sankey. I wanted to make the representation more relevant. While the Sankey showed how the electorates translated across to different countries, it was still flawed in terms of the arbitrariness of the flows. Also, there was no inherent order in which either the states or the countries appeared. The other pressing point was the lack of relatability of some of the lesser known countries visualised. Just as the Indian electorate's choices are represented in the Lok Sabha, I decided to represent the relation of the states to the countries through their representation in the Lok Sabha.
At the same time, I decided to fix the issue of having to look up countries such as Timor-Leste. I changed the data set to 10 well known democracies that the average Indian reader would've come across. I took the size of the electorates of each of these countries, and normalised them to get the magical number of 543 seats. I then visualised the Indian states' allotted constituencies on one side. I arranged them alphabetically, and split the legend on either side of the chart depending on its proximity to the location of the state's dot. This might be slightly contentious, but 31 different colours would've been quite a task. Right below this, I placed the representation that the countries would have if they had to make up the Indian Lok Sabha. So that, I think, would be my final visualisation of translating the Data Point article. Cheers, |
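The post above does not say how the electorates were normalised to 543 seats. One way to do it, shown here only as a sketch, is the largest remainder method, which guarantees that the rounded seat counts add up to exactly 543. The function name and the country labels in the usage note are hypothetical, and no real electorate figures are used.

```python
def apportion_seats(electorates, total_seats=543):
    """Scale electorate sizes to whole seats that sum to `total_seats`.

    Uses the largest remainder method (an assumption; the original
    post does not name its method): every entry first gets the floor
    of its proportional quota, then the seats still unassigned go to
    the entries with the largest fractional remainders.
    """
    total = sum(electorates.values())
    quotas = {name: size * total_seats / total
              for name, size in electorates.items()}
    seats = {name: int(quota) for name, quota in quotas.items()}

    leftover = total_seats - sum(seats.values())
    by_remainder = sorted(quotas,
                          key=lambda name: quotas[name] - seats[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        seats[name] += 1
    return seats
```

For example, `apportion_seats({"A": 500, "B": 300, "C": 200})` with placeholder sizes gives quotas of 271.5, 162.9 and 108.6. The floors add up to 541, so the two remaining seats go to "B" and "C", which have the largest remainders. Plain rounding of each quota can overshoot or undershoot 543, which is why the remainders are handled explicitly.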
How Much Mobile data do Indians use in a Month?
The original article can be read on this link. The article tells the story of the trend in mobile data usage in India over the past four years, using data mined from the recently published TRAI report titled Wireless Data Services in India. The author emphasises how much mobile data an individual Indian uses in a month and how these numbers changed from 2014 to 2018. The author tries to articulate the story using three graphical visualizations. First, to show the change in the number of mobile data subscribers by connectivity technology (CDMA, 2G, 3G, 4G) over the last five years. Second, to show the effect of the change in cost on mobile data consumption per month per subscriber. And third, to show the number of data subscribers and average data consumption per month in different service areas. For the first visualization, an area chart is used to show the number of mobile data subscribers for four types of connectivity technologies, i.e., CDMA, 2G, 3G & 4G. The area chart is inherently difficult to comprehend when the intent is to show the change in numbers and not just to show that something is changing over time. To show how the number of subscribers changed over time, I redesigned the visualization as a line chart, with the same dataset. The lines are encoded with different colors for different types of connectivity, and a legend is provided to explain the encoding. With these changes, it is now easy to understand the trend in mobile data use across these connectivity technologies for each year from 2014 to 2018. The second graph, which shows the trend in the amount of mobile data used with respect to the decrease in the cost of mobile data per GB, uses a line graph on top of a bar graph. Although the trend shown in this graph is easy to understand, the graph itself is not necessary here to understand this trend.
It is obvious that decreasing rates result in an increase in mobile data consumption. For the third visualization, a scatter plot is used to show the amount of data used per month vs. the number of subscribers in each service area. From the scatter plot, it is difficult to read the two attributes (amount of data used per month and total number of subscribers) of each service area. I transformed the scatter plot into a bar graph with two bars (red and green) on the x-axis for each service area: red for the number of data subscribers in millions, and green for the amount of mobile data consumed by individuals per month in GB. The two y-axes show the values of the two bars, encoded with the respective colors.
The First Submission
Feedback
Iteration
|
What the Parliament has been DiscussingThe original Hindu article can be found here
What I do have, though, is a PDF version I'd used for printing, and the valuable insights gained from discussion in the class.
In any case, I would need to make the infographic again.
Another article, instead?
Since I'm working from scratch here anyway, I thought I could try out another article from Data Point which seemed pretty interesting from the data perspective but, alas for readers of The Hindu, was not visualized particularly well. Here's the 'new' article, which looks at crude oil, the recent price spike (an all-time high in the last 19 years), and India's oil imports. The particular graph which had source data available deals with India's oil imports and looks somewhat like this:
India's Oil Imports
This totals 85.18%, leaving us with the assumption that the remaining 14.82% could be attributed to 'others'. The fact that these are components of a whole pointed towards using a pie chart or tree chart. As pie charts are more widely understood, I chose to represent the data in this form. I had originally intended to make the pie chart colourful, with the colours of the flags of the different countries from which India sources its crude oil. But on reflection, I decided not to go ahead with it, as the important content here is the chunk Saudi Arabia contributes. I decided to give greater importance to this key point rather than equal importance to all source countries, since a colourful graphic would add clutter to the intended data story. Using semantic colours to relate to a barrel of 'black gold', i.e. crude oil, and newsprint-like body copy as well as typography, the overall news piece looked somewhat like this: I tried to add some more meaning by giving a brief summary of the story the body copy told, and added this as a 'tldr' (too long, didn't read 😛 ) type of encapsulation to cater to readers who would like to know the story in short without having to delve into the verbose part of the content. This was done as a series of steps which looked at events one after the other. After these additions, I made corrections for better visual clarity, legibility and contrast while maintaining the original semantic colour scheme. The final news piece looks like this: Although I call it the final news piece, I'm looking forward to actionable feedback and constructive suggestions to make it better and iterate further, given time. Made with love and data, |
You can find the original Hindu article here Data: It also gives the data about the trends in percentages of the 'couple only', 'single mother' and 'single father' families between the years 1983 and 2015. Problems with the original visualization:
Feedback on the above visualization: Submission 03: |
I chose to redesign the visualization of the article titled 'More Indians have access to drinking water, basic sanitation'.
The current data visualization is as follows- availability of drinking water-
The issues with this data visualization; |
Is Kashmir Underdeveloped
This is a redesign of the story 'Is Kashmir underdeveloped as stated by Amit Shah?'.
For the data in its current form
Cons:
(The images in the iteration 1 and 3 are not complete, the decision to discard the direction was made after a quick wireframe to get an idea of what the entire article could look like) Direction 1 Direction 2 Post this I realised that I had to bring in the interval information i.e. the average (which was getting hidden in the text). I retained the use of an X axis to show the relative position of each state which is a simple and easily comprehensible representation. Direction 3 Thanks! |
The selected Data Point article was 'How fast does traffic move in your city?' The article cites information from a research paper published by researchers from various universities in the US. The paper evaluates 154 cities in India based on two indices, the Mobility Index and the Congestion Factor. The Mobility Index incorporates the speed of motorised vehicles in a given city, whereas the Congestion Factor measures the traffic density of the city using the number of registered vehicles in the city, the population of the city, and the average time delay to cover some distance. With the provided chart, most of the important information is concealed. Moreover, the chart itself is difficult to interpret. The data set of all 154 cities is found neither in the article nor in the research paper. Another issue with the research is the comparison of cities like Dhanbad and Mumbai, where the population of Mumbai is about 15x that of Dhanbad. Hence, it was decided to work on major metro cities. Other data sets, like kilometers of road, traffic density and average speed in the city, were taken from different sources which, though published in different years, were consistent across cities. The graphic treatment was chosen to evoke a feeling of congestion, hence elements were tightly packed. Encoding was such: Bottom: the Mobility Index to Congestion Factor chart. This provides a better comparison between the cities. According to the article, a low Mobility Index with a higher Congestion Factor means the worst traffic experienced by a city, and a higher Mobility Index with a lower Congestion Factor means better-performing traffic. Apart from the 7 metro cities, other cities were handpicked, along with the best performing city, Srinagar. Issues: Based on the feedback, I tried to incorporate the changes in the data point story.
The double bar charts were again difficult to comprehend. Upon discussing it with classmates, I realised that there were multiple areas of comprehension, which couldn't show a clear distinction while stitching the information together. Also, the infographic had drifted far away from the core element of the data story, and had been drawing information from rather inconsistent sources. One distinct example is the length of roads. Some cities have fairly large lengths of arterial roads that may not contribute to the traffic, which reduces the road density and paints a picture that contrasts with the data point story. The final iteration came about by stacking the two values of Mobility Index and Congestion Factor, which together convey how congested the roads could be. Compared against the previous iteration, this gave a much better idea of the congestion, and clear distinctions between the conditions of the roads could be perceived. It was easier to distinguish between the better and poorer performing cities. I am unsure about the appropriateness of the encoding for the given data, though. Alas, it works. Thanks. |
Datapoint link: How many students in rural districts can perform division? In as many as 443 of 586 rural districts, less than 50% of students in Grades VI-VIII knew how to carry out basic division, the Annual Status of Education Report, 2018 revealed. The article gives two visualizations to put forth its point. One is a scatter plot which shows the percentage of students who could read as well as do the math. The other is a table which gives the number of districts with poor reading skills and poor maths skills in each state of India. I believe that the data provided in the Data Point did not fairly explain how only 41% of students living in rural districts could perform division. Also, from the data provided in the table, it seemed that there could be some relationship between the level of maths skills and the reading skills. So in order to understand and visualize the relationship, I tried to plot a bar graph of the values for each state in decreasing order. (Graph 2) Interpretations:
Problems faced: The data is inadequate to ground the claim. Even after looking for data related to this particular Data Point, I could not find anything that could help to create a better data visualization. Note: In retrospect, going by the feedback given to me, I strongly believe that I did not do a good job of showing the intended relationship between the poor reading skills and the maths skills of the districts of India, and I should rework this assignment to find a better story and relationship in the data provided. |
Here is the original article.
How many women MLAs in your State?
The idea was to simplify what seems to be a very confusing scatter plot of the percentage of women MLAs before 2000 and after 2000. The article referred to the increase and decrease of women MLAs in some states. However, inferring and comparing information was difficult in this plot. During the feedback session I realized that I had no idea why the percentages were chunked as pre and post 2000. I tried finding out if any major changes had happened around that time (such as election seat quotas, or something related to constituencies) but realized that none had happened in India. |
State of Migration
https://www.thehindu.com/data/india-migration-patterns-2011-census/article28620772.ece
The Datapoint article on migration used visualizations to indicate three things:
States which host the highest number of migrants For the assignment I chose to work on the third set of visualisations as I felt that the data could tell more about the paths taken by people. I had to first retrieve the original census 2011 data from its website. The data visualisation I made represents three sets of data:
As population sizes of states vary a lot, I chose to work with the data of the 15 largest migrant populations by state. The states are arranged by decreasing size of migrant population. Blue represents the local population, while red from above represents the influx of migrant population. A dashed line represents the national average of migrant population.
Version 1:
The second iteration of the visualisation was made to quickly incorporate feedback on the intuitiveness of the graph. The layout of the article was changed to portrait to maintain proximity between the heading and the visuals in the graph. Colours were changed to highlight only the migrant population in shades of red. A final attempt was made to include data on all states. States were split into two sets based on the 50th percentile of migrants when states were arranged in ascending order. |
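The split at the 50th percentile described in the migration redesign above can be sketched in Python as follows. The function name and the state labels in the usage note are hypothetical, no census figures are reproduced, and placing states that sit exactly on the median in the lower set is an assumption.

```python
from statistics import median


def split_at_median(migrants_by_state):
    """Split states into two sets around the median migrant count.

    States are first arranged in ascending order of migrant
    population. Those at or below the median (the 50th percentile)
    form the lower set, the rest form the upper set.
    """
    ordered = sorted(migrants_by_state.items(), key=lambda item: item[1])
    cutoff = median(count for _, count in ordered)
    lower = [state for state, count in ordered if count <= cutoff]
    upper = [state for state, count in ordered if count > cutoff]
    return lower, upper
```

For example, with placeholder counts `{"P": 10, "Q": 40, "R": 20, "S": 30}` the median is 25, so the lower set is `["P", "R"]` and the upper set is `["S", "Q"]`. Each set can then be drawn on its own scale, so that the smaller states stay readable.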
Why do Indians migrate?
The Data Point article I picked for the redesign is 'Reasons why men and women migrate in India'. The data is presented as a table which lists the percentage of men and women migrating for work, marriage and education.
Critique of the original visualisation
Data viz direction The first attempt was to 'make sense' of the data using a plotting technique. A connected dot plot seemed appropriate for 1 ~ shows connection between the genders, 2 ~ distance they travelled, 3 ~ pattern recognition is easier. Final layout Some advantages of this visualisation
|
Progress report on Ganga cleaning mission
As part of the assignment I picked up this article. The article is a progress report on a government-commissioned independent study of 97 towns along the Ganga, which shows that 39% of these towns in five States are in need of overall improvement in cleanliness, solid waste management and a change in how nullahs (drains) are handled. The article talks about the division of towns along the ghats of the river Ganga. It contains 2 parts, which talk about the grading of these towns among the five States: Uttarakhand, Uttar Pradesh, Bihar, Jharkhand and West Bengal. These States share the largest part of the river. Among these, Uttar Pradesh is the most populated State in India. The article reveals interesting data about the number of towns which need cleanliness management. Jharkhand is a State which shares a smaller area of the river.
1st part - Grading of towns
The first part talks about how many towns are graded A, B or C. These grades define how much cleanliness work is required for the percentage of towns mentioned.
Following is the bar graph used to visualise this data:
2nd part - Different types of river dumps
The second part of the article talks about the river dumps by visualising the number of nullahs (drains) that flow into the river among the same States. It is common for nullahs to drain into the Ganga across towns in all the States. In Bihar, the towns had dumpsites along the river as well. The visualisation below shows the percentage of towns in each State that:
How I re-visualised it
In the article, the 1st part, which covers the percentage of towns spread along the river banks, is presented as bar charts with percentages. I thought it would be better to give readers an idea of the size of each State. Both parts of the article are visualised linearly, even though their explanations are connected. The State of Jharkhand shares a smaller area of the river and had 100% Grade B (partial cleanliness around the ghats) towns. The same State had no towns with solid waste floating on the surface. So my first approach was a visualisation with a geographical illustration of the river and the States, with the percentages in line graphs near the States themselves.
Feedback after discussing with the class:
Version 2
After the discussion with the class I noticed that I was using too little canvas area for the visualisation. So I rethought how I could show the sheer area of the States and the effect of population on the pollution. |
For this assignment, we'll use data stories from The Hindu Data Point.
Select a story that you like, study it carefully and redesign it. Specifically I want you to focus on understanding the data that powers the story, and how it is visually encoded to tell the intended story. Document your design process, capturing the following:
You may choose to expand or curtail the scope of the data used in the story, or add an additional dataset to tell the story better. But do not deviate from the main intent of the original story. In other words, it is a redesign exercise, and hence I do not want you to tell a different, unrelated story.
While you should provide a link to the original story, it might be useful to capture and display inline, appropriate parts of the original visualization, and your own design iterations to produce a coherent documentation.