Instances showing 502s #94

Closed
crarugal opened this issue Jul 22, 2022 · 9 comments
Labels: bug

Comments

@crarugal

This relates to www.mind.org.uk captures: https://www.webarchive.org.uk/act/wayback/archive/*/http://www.mind.org.uk/

image

anjackson added a commit that referenced this issue Jul 22, 2022
@anjackson
Contributor

Okay, so this was quite nasty.

I looked up the most recent copy in the CDX index, and grabbed the WARC records from the WARC Server (this should probably be a helper Jupyter notebook, as it's pretty straightforward calls to internal APIs!).

CDX: Visit http://cdx.api.wa.bl.uk/data-heritrix?url=https%3A%2F%2Fwww.mind.org.uk&sort=reverse&limit=1

uk,org,mind)/ 20220721105603 https://www.mind.org.uk/ text/html 200 JVFSHQYYUHI3TW77S6ALQK4ZGEDL3Z2O - - 23663 55649806 /heritrix/output/frequent-npld/20220606093552/warcs/BL-NPLD-20220721103716543-20118-80~npld-heritrix3-worker-1~8443.warc.gz

WARC: Take the filename and the offset/length (55649806/23663). Convert these to a start-end range (55649806 to 55649806+23663-1, i.e. 55649806-55673468). Run a range request, like this:

curl -r 55649806-55673468 http://warc-server.api.wa.bl.uk/webhdfs/v1/by-filename/BL-NPLD-20220721103716543-20118-80~npld-heritrix3-worker-1~8443.warc.gz | gunzip - | head -50
WARC/1.0
WARC-Type: response
WARC-Target-URI: https://www.mind.org.uk/
WARC-Date: 2022-07-21T10:56:03Z
WARC-Payload-Digest: sha1:JVFSHQYYUHI3TW77S6ALQK4ZGEDL3Z2O
WARC-IP-Address: 104.16.26.83
WARC-Record-ID: <urn:uuid:97078652-9bf1-4a1e-8358-5c974b6d72fd>
Content-Type: application/http; msgtype=response
Content-Length: 92761

HTTP/1.1 200 OK
Date: Thu, 21 Jul 2022 10:56:03 GMT
Content-Type: text/html; charset=utf-8
Connection: close
Access-Control-Expose-Headers: Request-Context
...
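
As a starting point for the helper notebook mentioned above, the CDX-to-range step could be sketched in Python roughly as follows. This is only a sketch: the function names are hypothetical, and the field positions (length in the 9th field, offset in the 10th, filename in the 11th) are inferred from the single sample CDX line above rather than from any API documentation.

```python
# Assumed base URL, taken from the curl command above.
WARC_SERVER = "http://warc-server.api.wa.bl.uk/webhdfs/v1/by-filename/"


def parse_cdx_line(line):
    """Split one 11-field CDX line into the values needed for a range request.

    The field order is an assumption based on the sample line in this issue.
    """
    fields = line.split()
    if len(fields) != 11:
        raise ValueError(f"expected 11 CDX fields, got {len(fields)}")
    return {
        "timestamp": fields[1],
        "original": fields[2],
        "status": fields[4],
        "length": int(fields[8]),   # compressed record length in bytes
        "offset": int(fields[9]),   # byte offset of the record in the WARC
        "filename": fields[10],
    }


def byte_range(offset, length):
    """Return the inclusive (start, end) byte range for one WARC record."""
    return offset, offset + length - 1


def range_request(record):
    """Build the URL and Range header equivalent to `curl -r start-end URL`."""
    start, end = byte_range(record["offset"], record["length"])
    # The WARC server is addressed by bare filename, not by the full path.
    basename = record["filename"].rsplit("/", 1)[-1]
    return WARC_SERVER + basename, {"Range": f"bytes={start}-{end}"}
```

The returned URL and header can be passed to any HTTP client; the response body is a single gzip member, which is why the curl pipeline above can hand it straight to gunzip.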

That showed that we do have proper 200 content for that URL. It also indicated that there was an extremely large cookie in the original response.

Set-Cookie: personalisationGroupsNumberOfVisits="5863,5862,5862,11421,12761,1082,11343,11198,11725,24127,12003,14649,17178,20871,22039,22040,22041,22341,22342,22343,19220,20716,30600,30603,30604,30605,30606,30609,30610,14505,8519,17210,30550,12611,11719,30612,30615,30618,4112,11714,11717,11282,11283,12147,12150,28461,11746,11190,11233,23773,25407,11347,12584,12620,24411,12865,10928,30641,30642,28478,30637,12588,6931,30645,30645,11969,16229,19238,30620,11984,30644,30646,30623,30622,17559,6783,22250,30654,28285,28293,13605,14164,20459,14163,30648,30649,30650,30651,15666,15768,6872,30659,30571,30636,18094,12265,12385,11704,24684,21948,12997,30669,20525,30667,30668,30672,11415,16261,17684,25254,25256,25256,30621,7493,18069,11280,11759,13652,25351,30676,30677,30678,30624,30625,30626,21514,19382,16281,16280,16568,11261,22011,12148,11712,24060,30555,30680,30681,22109,21601,7582,7710,21952,21970,21972,12286,28122,30695,30684,30685,30686,30687,30688,30689,30632,15828,10745,15668,6782,8420,7009,21122,30729,30694,30708,30709,30710,30727,30728,30730,18932,17671,17672,11194,11200,11398,11973,11978,11979,11980,11981,11982,12171,12170,12172,12173,12174,12176,12177,12178,12179,12180,12182,15957,15968,28193,30731,30732,30733,30734,30735,30736,30744,6932,28436,30590,30750,30753,16236,8447,15640,28464,21902,21906,25034,30758,30536,30537,11748,11320,30781,11202,11205,30717,30718,30724,30719,30720,19283,22235,22359,28260,30582,30756,15825,15819,16569,18516,18518,11279,11281,11284,30822,30823,30825,30826,30828,30829,30830,30824,11412,11413,11414,11416,11417,11418,11419,30856,30849,30850,30851,30852,30872,30876,30878,16251,30745,30879,30885,30739,30742,30743,30538,30755,30821,30835,30883,30886,30887,14656,30930,30931,7611,30937,20736,23859,25428,25560,28082,30935,30950,30953,30954,30957,30958,30959,15679,22628,22628,22651,22651,16230,30977,18101,18102,18104,17594,17595,14659,14650,19478,20889,20945,22689,30978,30981,18289,12509,15696,17541,19056,7515,7515,25251,30873,7923,17179,19179,1601
3,19370,23927,30986,31001,20539,23700,24388,24500,24709,24929,25057,28509,30543,30979,30992,28377,30939,31002,24128,20892,15661,15661,19172,11185,11382,12127,13045,13047,30969,30991,31012,31013,31019,31021,31022,31023,31024,22107,14992,11383,12254,14651,21867,30881,31027,11236,11384,25755,11331,11385,11386,11387,31046,31035,31036,31037,31039,31041,31051,31053,31054,20606,17533,20453,20453,31045,15791,15794,15802,15810,14801,8638,8640,8641,8644,8646,8653,8657,8660,8661,8662,8664,8666,8667,8669,8682,8683,8685,8691,8698,8699,8706,8707,8709,8710,8717,8719,8723,8738,8744,8745,8747,8749,8752,8764,8773,8776,8782,8783,8785,8788,8793,8794,8795,8797,8804,8807,8809,8810,8812,8813,8814,8815,8817,8818,8820,8826,8827,8829,8864,8865,8958,8960,8966,8969,9532,9537,9538,9542,9546,9549,9554,9738,9802,9804,9834,9837,9840,9845,9849,9850,9851,9981,9994,10009,10035,10038,10042,10062,10092,10095,10098,10102,10110,10113,10175,10192,10197,10219,10223,10226,10253,10262,10272,10283,10314,10319,10323,10361,10369,10374,10398,10454,10471,10473,10494,10504,10510,10515,10519,10529,10653,10667,10700,10764,10808,10823,10827,10839,10859,10915,10975,11023,11052,11058,11087,11098,11143,11442,11444,11452,11475,11501,11528,11533,11540,11545,11563,11618,11621,11637,11639,11644,11682,12650,12654,12656,12665,12676,12682,12684,13064,13069,13076,13102,13127,13130,13138,13252,13276,13494,13730,13734,13740,13746,13754,13757,13796,13800,13803,13804,13810,14064,14069,14071,14075,14086,14092,14097,14101,14104,14158,14160,14162,14166,14170,14181,14188,14190,14192,14199,14201,14203,14205,14207,14257,14259,14264,14266,14269,14282,14285,14286,14288,14302,14353,14393,14400,14418,14425,31062,31063,31065,8861,8642,8521,7645,7430,7181,6994,6992,6991,16978,17054,17055,17056,17058,21922,21922,12614,7258,15550,12278,12512,11736,15244,8483,15662,7683,11342,15261,31028,8068,31052,28191,31066,31074,11329,11330,11332,11333,11334,11335,11336,11337,11338,11339,11340,30521,31075,31076,31077,31078,31080,31082,31086,31087,31092,31095,
17593,21107,11708,24406,24415,11709,16607,21202,31097,31088,8021,31079,31085,11223,11242,11301,11358,11393,15541,15541,15543,15542,15544,15545,15546,15547,15548,17063,15812,11603,28262,28262,31110,31112,31113,31114,31115,31116,31117,11260,11260,22065,18106,20590,20048,14661,14652,14657,14653,14654,16677,18281,19208,19879,14742,15290,15291,11285,16233,11697,11699,11971,11711,11293,23895,20537,25442,31103,11974,17596,6808,17162,22701,23654,24718,24749,25310,31125,31128,31151,17685,13041,12506,28345,5467,20317,31126,8042,31179,31235,24100,24700,31162,31171,31174,31186,31187,31194,31192,17602,5356,7761,8299,21364,21532,19837,20481,20483,25371,25587,25618,27786,5419,7504,7723,7978,7734,7734,7917,8774,11213,11968,12825,20146,20348,21373,20152,20364,23449,25358,25362,7990,27808,27812,27812,27813,28048,27814,27803,27815,28057,27804,27994,28065,5449,7656,8076,8372,14059,15305,15305,15404,15488,15622,16277,16675,16887,16887,16888,16888,21883,21883,21884,21884,22156,22156,22157,22157,22249,23531,23707,23707,23708,24033,24034,18502,18514,18515,18517,18519,18520,7937,8616,17643,19216,22243,22291,7970,31081,12572,8632,8633,8635,8636,8637,8639,8647,8650,8654,8659,8663,8665,8671,8673,8684,8686,8687,8689,8690,8692,8701,8702,8704,8705,8708,8711,8712,8714,8716,8718,8721,8724,8725,8726,8727,8728,8729,8730,8737,8739,8740,8741,8751,8753,8765,8766,8771,8775,8778,8779,8792,8800,8802,8803,8816,8821,8823,8825,8828,8831,8833,8862,8863,8962,8963,8965,8967,9533,9535,9541,9544,9548,9555,9558,9559,9562,9736,9737,9795,9798,9805,9829,9830,9832,9835,9846,9984,9985,9987,9988,9990,9992,9993,9996,9998,9999,10001,10004,10006,10011,10013,10015,10016,10021,10022,10032,10034,10040,10041,10048,10052,10054,10058,10059,10063,10066,10069,10070,10073,10076,10077,10078,10082,10084,10087,10096,10101,10104,10105,10106,10108,10109,10111,10115,10117,10150,10151,10158,10159,10161,10163,10170,10173,10178,10180,10181,10184,10203,10208,10210,10214,10230,10233,10235,10239,10251,10254,10258,10260,10263,10264,10267,10270,1
0271,10273,10275,10277,10280,10291,10305,10316,10321,10324,10328,10332,10336,10338,10339,10342,10345,10348,10351,10352,10354,10358,10360,10363,10366,10370,10376,10379,10381,10384,10387,10388,10390,10392,10402,10407,10408,10416,10417,10420,10424,10426,10428,10429,10433,10437,10438,10440,10445,10449,10453,10458,10463,10465,10479,10480,10482,10483,10486,10487,10490,10496,10500,10507,10518,10525,10528,10535,10543,10630,10636,10638,10640,10645,10647,10651,10656,10657,10661,10662,10663,10665,10669,10671,10675,10676,10678,10680,10682,10687,10690,10693,10702,10705,10738,10742,10744,10772,10774,10777,10779,10785,10786,10790,10794,10796,10798,10799,10809,10811,10813,10817,10819,10820,10826,10830,10832,10834,10836,10849,10894,10898,10899,10901,10905,10913,10917,10918,10924,10926,10933,10934,10936,10942,10945,10947,10948,10958,10977,10979,10983,10985,10987,10995,10997,10999,11004,11005,11011,11026,11033,11038,11062,11064,11072,11077,11090,11092,11094,11159,11436,11439,11448,11450,11454,11457,11458,11466,11467,11470,11477,11479,11480,11482,11497,11498,11505,11506,11509,11511,11520,11536,11537,11542,11549,11551,11553,11566,11569,11571,11572,11574,11577,11578,11579,11582,11584,11586,11589,11590,11592,11594,11596,11598,11600,11604,11609,11613,11619,11623,11624,11627,11631,11633,11634,11646,11648,11660,11662,11664,11668,11670,11671,11676,11678,11689,11692,11695,12209,12216,12625,12630,12640,12644,12647,12652,12659,12664,12667,12669,12673,12680,12696,12698,12701,12704,12708,12709,12712,12833,12834,12839,12841,12843,12861,13051,13055,13056,13059,13071,13072,13092,13111,13113,13115,13124,13125,13142,13143,13145,13274,13279,13386,13497,13658,13660,13661,13664,13665,13667,13669,13670,13673,13676,13678,13681,13684,13686,13690,13692,13694,13696,13698,13700,13702,13704,13706,13708,13710,13714,13717,13719,13723,13727,13729,13732,13742,13745,13752,13759,13798,13808,13815,13823,13833,13845,13923,14062,14067,14073,14088,14090,14099,14103,14106,14108,14116,14118,14120,14122,14124,14126,14128,141
30,14132,14134,14136,14138,14140,14142,14145,14146,14149,14150,14152,14155,14156,14168,14171,14172,14177,14179,14186,14194,14196,14210,14212,14214,14217,14219,14255,14261,14263,14268,14294,14298,14305,14310,14333,14337,14344,14350,14357,14361,14363,14396,14402,14405,14407,14412,14414,14429,14431,14433,14435,14437,14439,14441,14453,14455,14457,14461,14467,14470,14494,14495,14508,14509,14510,14521,14536,14575,14766,14767,14796,14868,14870,14946,14949,14951,14953,14983,15224,15241,15242,15243,15250,15417,15418,15424,15443,15487,15490,15602,15709,15820,15907,15932,15939,15951,16004,16125,16499,16540,16550,16551,16553,16556,16557,16562,16637,16674,16676,16708,16737,16738,16740,16741,16742,16764,16769,16804,16837,16841,16842,16884,16885,16886,17046,17047,17072,17121,17186,17209,17346,17472,17532,17780,17848,17914,18290,18406,18825,19022,19023,19031,19132,19134,19144,19198,19252,19253,19314,19356,19372,19375,19392,19405,19440,19457,19541,19542,19588,19611,19614,19674,19683,19774,19797,19835,19842,19892,19909,19929,19958,20084,20089,20148,20259,20276,20278,20342,20347,20380,20447,20468,20497,20529,20538,20570,20634,20642,20676,15656,20830,20853,20857,20919,20948,21118,21121,21165,21179,21204,21243,21308,21343,21439,21452,21458,21462,21481,21504,21512,21639,21691,21738,21863,21872,21877,21915,21920,21987,22070,22113,22593,22614,23307,23599,23606,23620,23693,23727,23795,23860,23933,24041,24102,24125,24130,24173,24201,24207,24238,24248,24291,24337,24410,24502,24538,24554,24556,24717,24809,24930,24969,24975,24981,25002,25105,25112,25149,25201,25244,25261,25430,25472,25475,25671,25743,25756,27784,28308,28309,28323,30546,30565,10247,10045,10044,10024,9995,9977,9954,9949,9823,9820,9724,9212,8970,8763,8735,8697,8618,8618,8609,8608,8607,8606,8605,8604,8603,8601,8600,8599,8598,8597,8596,8595,8594,8593,8592,8591,8590,8589,8588,8587,8586,8585,8584,8583,8580,8579,8578,8577,8576,8575,8572,8569,8568,8565,8564,8563,8562,8561,8559,8558,8557,8556,8555,8554,8553,8552,8551,8550,8549,8548,8547,
8546,8545,8544,8543,8542,8541,8540,8536,8533,8531,8530,8527,8526,8525,8524,8523,8522,8518,8517,8516,8515,8514,8513,8512,8511,8510,8509,8508,8507,8504,8501,8499,8497,8496,8495,8494,8493,8492,8491,8490,8489,8487,8486,8485,8484,8482,8481,8478,8477,8476,8475,8474,8472,8471,8470,8469,8468,8467,8466,8465,8464,8463,8462,8461,8460,8459,8458,8457,8455,8454,8453,8452,8451,8450,8449,8448,8442,8441,8440,8439,8436,8435,8434,8433,8432,8431,8430,8429,8428,8427,8425,8424,8422,8421,8419,8417,8416,8409,8408,8400,8399,8398,8367,8365,8364,8363,8362,8361,8360,8359,8357,8334,8333,8332,8331,8330,8324,8323,8322,8320,8192,8191,8190,8160,8154,8150,8149,8147,8146,8145,8144,8142,8141,8140,8139,8055,8054,8053,8051,8050,8049,8046,8039,8038,7976,7975,7973,7967,7966,7964,7963,7961,7936,7934,7933,7932,7924,7922,7921,7918,7916,7915,7914,7913,7911,7910,7907,7906,7715,7709,7708,7671,7670,7669,7668,7667,7666,7665,7664,7662,7661,7653,7639,7626,7625,7624,7623,7622,7621,7620,7619,7618,7615,7614,7613,7612,7610,7606,7605,7603,7600,7597,7591,7584,7581,7580,7579,7578,7577,7576,7575,7574,7573,7572,7571,7570,7569,7568,7567,7566,7565,7564,7563,7560,7559,7558,7554,7553,7552,7549,7548,7547,7543,7541,7539,7538,7535,7530,7528,7527,7526,7525,7524,7523,7522,7520,7517,7516,7513,7502,7500,7499,7496,7494,7492,7487,7486,7485,7484,7468,7466,7464,7461,7458,7454,7450,7448,7447,7446,7445,7444,7443,7442,7440,7439,7438,7437,7436,7434,7433,7432,7429,7428,7427,7426,7425,7424,7423,7422,7420,7419,7418,7417,7416,7415,7414,7413,7412,7411,7410,7409,7408,7407,7406,7405,7404,7403,7402,7401,7400,7395,7394,7393,7392,7391,7390,7389,7388,7387,7386,7385,7384,7383,7382,7381,7380,7378,7377,7376,7375,7357,7354,7353,7351,7350,7349,7348,7347,7346,7345,7344,7343,7342,7341,7340,7339,7338,7337,7336,7335,7334,7333,7331,7330,7329,7309,7308,7307,7306,7305,7304,7303,7302,7301,7300,7299,7298,7297,7274,7271,7270,7269,7268,7267,7265,7264,7261,7260,7259,7257,7255,7254,7253,7252,7251,7250,7249,7242,7241,7240,7238,7237,7236,7235,7234,7233,7228,7227,7226,7225,
7224,7223,7221,7219,7216,7211,7209,7208,7207,7206,7205,7204,7203,7202,7200,7199,7198,7197,7194,7193,7192,7191,7190,7189,7187,7186,7185,7184,7183,7182,7178,7177,7174,7173,7172,7171,7164,7163,7162,7161,7160,7159,7158,7157,7156,7155,7148,7147,7146,7145,7144,7143,7142,7141,7135,7133,7132,7053,7052,7051,7050,7049,7048,7047,7046,7045,7044,7043,7042,7041,7040,7039,7038,7035,7034,7033,7032,7031,7030,7028,7027,7026,7025,7024,7023,7022,7021,7020,7019,7018,7017,7016,7015,7013,7008,7007,7006,7005,7004,7003,7002,7001,7000,6999,6998,6996,6995,6993,10266,10278,10404,10502,10506,10683,10684,10685,10783,10887,10889,10950,10986,11151,11512,11602,12166,12255,12624,12786,12794,12829,12835,13103,13109,13110,13128,13522,13547,14068,14076,14182,14254,14423,14523,14526,14534,14538,14642,14692,14747,14815,14914,14981,15419,15515,15516,15694,15778,16838,17032,17194,17657,17699,18297,18537,18998,19083,19149,19243,19246,19433,19439,19442,19549,19666,19762,19901,19937,19940,20087,20318,20444,20464,20629,20630,20707,20723,20822,20823,20826,20953,20971,21100,21102,21111,21113,21309,21466,21633,21735,21742,21870,21933,21951,21978,21979,21980,22012,22069,22074,22128,22146,22147,22162,22168,22612,23173,23559,23616,23656,23709,24025,24026,24040,24065,24115,24137,24217,24254,24314,24404,24607,24615,24708,24710,24752,24902,24998,25007,25060,25225,25243,25321,25411,25467,25537,25584,25602,25711,25714,25762,28058,28059,28137,28140,28198,28199,28344,28484,28485,28493,30492,30526,30589,19890,19891,19911,20307,20507,20510,20511,20514,20515,20624,20627,20648,20715,21289,21313,21328,21372,21442,21467,21470,21549,21551,21574,21603,21678,21707,21719,21741,21746,21813,21873,21874,21895,21899,21900,21928,21929,21932,21947,21991,22004,22017,22020,22021,22022,22023,22133,22134,22210,22211,22214,22215,22217,22218,22219,22220,22221,22225,22226,22228,22229,22233,22238,22241,22299,22354,22355,22356,22357,22358,22360,22590,22594,22617,22618,23459,23462,23463,23534,23535,23545,23605,23623,23625,23627,23635,23639,23640,23
643,23651,23653,23663,23670,23679,23680,23682,23683,23688,23689,23697,23698,23699,23704,23760,23775,23816,23846,23854,23861,23862,23873,23904,23905,23938,23943,23944,23945,23951,23955,23980,23986,23999,24000,24014,24015,24024,24054,24068,24069,24070,24076,24080,24084,24086,24088,24089,24116,24121,24147,24148,24150,24151,24152,24153,24161,24163,24166,24168,24174,24178,24185,24187,24190,24191,24192,24206,24223,24235,24236,24241,24242,24243,24244,24246,24259,24293,24294,24295,24320,24335,24336,24341,24343,24351,24352,24354,24394,24395,24396,24398,24399,24400,24401,24402,24403,24414,24421,24431,24433,24434,24435,24436,24440,24499,24512,24529,24531,24533,24542,24546,24577,24592,24593,24594,24595,24597,24598,24690,24692,24697,24698,24701,24702,24712,24714,24716,24721,24722,24723,24724,24733,24735,24737,24740,24746,24748,24788,24792,24825,24888,24889,24897,24899,24900,24904,24908,24923,24928,24971,24973,24978,24980,24982,24985,24986,24989,24990,25006,25009,25022,25023,25026,25029,25030,25031,25032,25035,25053,25054,25058,25068,25070,25094,25098,25099,25100,25114,25141,25152,25153,25154,25158,25159,25160,25161,25174,25178,25180,25181,25191,25200,25221,25222,25223,25224,25234,25237,25238,25240,25257,25274,25282,25315,25359,25360,25372,25378,25391,25393,25410,25413,25415,25440,25441,25458,25459,25468,25469,25473,25498,25506,25539,25561,25576,25582,25606,25607,25636,25682,25683,25697,25698,25699,25706,25707,25708,25709,25710,25719,25721,25744,25760,25761,27783,27785,27787,28036,28037,28038,28041,28074,28079,28080,28081,28083,28084,28098,28111,28128,28149,28156,28183,28184,28192,28194,28201,28225,28227,28229,28231,28247,28255,28256,28257,28265,28266,28268,28294,28298,28302,28316,28317,28318,28319,28320,28321,28324,28325,28326,28327,28328,28329,28330,28383,28432,28437,28444,28453,28465,28476,28477,28513,28515,28516,28517,28518,28519,28520,30481,30482,30484,30485,30486,30488,30489,30496,30498,30501,30507,30520,30525,30539,30540,30569,30570,30572,30577,30583,30593,11287,11764,1260
9,25192,20604,20530,20951,16916,18839,20964,17037,17062,15262,15263,15264,15265,15266,15267,31508,31523,31532,31541,31541,20610,16501,31034,16449,31502,31503,31504,31505,31506,31507,31514,31515,16855,11317,16481,16480,16482,16483,16484,16485,16486,16487,16462,16462,16464,16465,16466,16467,16468,16469,15809,16438,6784,7220,6807,24800,31564,31598,31559,31049,31520,31522,31553,31555,31556,31557,31560,31561,31562,31565,31566,31567,31568,31572,31584,31587,15432,16117,15649,31574,31607,31614,31601,31602,31193,31575,31617,31618,31620,31630,31631,31626,31133,31067,31577,31621,11762,15651,7699,17682,31600,31629,31659,22662,31656,31656,17619,17619,31552,17127,15789,15803,31610,7716,14799,14533,16099,15657,15658,15754,15755,15761,17057,16419,16420,16421,16422,16424,16100,11288,11234,31683,12618,31665,31668,31669,19692,11394,17991,17996,17992,17993,31676,25101,25109,25113,25117,31725,31727,31729,31734,17159,31563,31660,31664,31724,31731,31732,31733,31736,31737,31738,31741,31742,31743,31744,31056,31709,31721,31682,24992,7495,31776,12160,31761,31763,31775,16311,16319,16358,31740,31801,31830,31770,31772,31688,31693,31706,23580,24810,25059,25501,25504,25603,28131,31814,31815,31816,31819,31834,31844,31856,31857,31858,22025,12125,31837,31867,31871,31873,31876,31825,31832,31833,20862,31874,31875,31880,31881,31885,31886,31887,31888,25512,15805,20138,28346,28362,28365,28370,28372,28386,28393,28398,28404,28408,28411,28419,28424,28426,28259,28261,28263,28272,31509,11408,31903,31904,17148,17150,17151,20334,21335,17167,17168,17169,17170,17171,17172,17173,31818,21644,24095,17644,17918,18635,19057,19057,23729,23729,31869,31870,23776,24101,31517,31905,31910,31912,31919,31920,31921,31923,11191,11367,18336,19098,17147,17166,17152,24743,20332,24753,15804,21420,19598,19781,20154,31925,31927,31929,31945,31947,31948,31949,31950,31957,31962,31963,31963,31964,31965,31966,31967,31968,31969,31970,31983,31971,31978,32002,31902,31936,31937,32004,31981,23796,23796,15767,12901,6871,31942,32020,32021,16500,1
2130,32046,32003,32030,32032,32034,32037,32040,32041,32042,32047,32050,32051,32052,32054,15406,15665,32024,32027,32063,32065,19165,23798,11176,12256,11713,32026,32029,32031,32033,32035,32057,32078,32080,32081,12889,12914,12915,12916,12917,12918,12920,13080,32090,32039,32094,32095,31999,32005,32059,12606,32067,32068,32069,32071,32072,32073,32074,15269,15269,15270,15271,15272,15273,15274,15673,12600,24557,28491,32175,32176,32177,32178,32179,12124,31951,32342,32343,32180,32181,32182,32350,32351,17185,11707,17331,7696,13016,13018,32058,16443,31930,31931,31932,31933,31934,32353,32354,32355,32356,30633,13096,32357,32384,32385,32386,7700,32395,32383,32387,11349,32473,32425,32426,16435,32427,32440,19199,19199,17751,23617,23617,11230,11178,11179,11181,11186,11187,11250,11252,11255,11254,11257,32007,32525,11313,20601,20596,20597,20598,20599,20600,20602,12760,12760,32496,12161,32480,16231,16235,21624,17929,31014,32410,11235,11237,11239,11361,16423,19306,32542,32544,22232,25233,32559,32560,32561,32562,32563,32564,32568,32569,7698,32399,32430,32481,32401,32402,32441,32447,7704,32406,32408,32572,32573,25246,15775,15635,9623,32434,32435,32436,32552,32575,32576,32577,32533,32590,32591,32605,32606,32607,32608,32609,11375,17725,17748,17750,22086,22112,21911,21912,22213,22234,22297,23585,23958,23971,24082,24120,24149,24164,24169,24170,24181,24183,24184,24193,24198,24199,24200,24237,24239,24258,24322,24324,24325,24331,24340,24342,24345,24346,24347,24348,24397,24439,24490,24491,24492,24494,24497,24516,24517,24518,24519,24521,24522,24523,24524,24525,24526,24535,24585,24599,24605,24682,24695,24704,24711,24713,24715,24756,24778,24793,24797,24801,24803,24807,24813,24866,24874,24924,24977,24983,24988,24991,24993,24995,24999,25033,25090,25102,25116,25131,25151,25170,25179,25182,25185,25190,25202,25203,25204,25218,25219,25242,25263,25264,25265,25311,25316,25337,25349,25361,25388,25417,25424,25425,25427,25429,25453,25481,25502,25503,25507,25508,25509,25532,25535,25540,25574,25600,25601,25609,25
674,25686,25691,25712,25715,25716,25722,25763,27802,28042,28091,28097,28100,28102,28109,28110,28112,28126,28148,28150,28155,28187,28190,28232,28245,28271,28275,28278,28295,28297,28305,28492,30483,30522,30523,30533,30562,30566,30567,30568,30586,30587,30588,31168,31500,31657,31802,31826,31835,31868,31914,32091,32611,32614,32617,32618,32619,17163,17164,17203,20722,31845,32089,12009,12010,12011,12929,12900,13000,13002,13003,12904,13019,13020,13021,12907,13042,13043,13044,23878,32620,32622,32623,32624,32625,32626,32629,32630,32631,32632,32633,12258,12305,17165,18835,18836,32621,32628,32635,15772,31820,32360,15785,15787,6786,7212,17071,32638,32640,7215,7314,16433,16606,32644,8128,32643,7705,11401,11238,12589,12597,12598,12599,32677"; expires=Wed, 19-Oct-2022 10:56:03 GMT; path=/; secure; HttpOnly

So, the suspicion fell on such large headers breaking some buffers or limits. But where? These requests go through a few proxies....
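
To make the size of the problem concrete, a rough Python sketch like the one below can report how many bytes each header occupies in a raw HTTP response. It assumes the WARC record headers have already been stripped, so the input starts at the HTTP status line, and that lines end in CRLF. The function names and the threshold argument are illustrative only.

```python
def header_sizes(raw_response):
    """Return (total_header_bytes, {header_name: bytes}) for a raw HTTP response.

    Sizes include the trailing CRLF of each header line. Repeated headers
    (for example several Set-Cookie lines) are summed under one name.
    """
    head, sep, _body = raw_response.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line terminating the header block")
    sizes = {}
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, _value = line.partition(b":")
        key = name.decode("latin-1").strip().lower()
        sizes[key] = sizes.get(key, 0) + len(line) + 2
    return len(head) + len(sep), sizes


def oversized(sizes, threshold):
    """List (name, bytes) pairs above the threshold, largest first."""
    big = [(name, size) for name, size in sizes.items() if size > threshold]
    return sorted(big, key=lambda item: -item[1])
```

Running this over the response above would put the personalisationGroupsNumberOfVisits cookie at the top of the list, which gives a concrete number to compare against the buffer limits in each proxy layer.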

The 'raw' service is available at port 7171 on the prod1 server, where we can talk directly to the uwsgi server that is running pywb. Running the request against that service showed an OSError: write error, and the server returned an empty response.

Running ukwa-pywb without uwsgi seemed to work fine.

cd github/ukwa-pywb
source venv/bin/activate
export UKWA_INDEX=cdx+http://cdx.api.wa.bl.uk/data-heritrix
export UKWA_ARCHIVE=http://warc-server.api.wa.bl.uk/webhdfs/v1/by-filename/
ukwa_pywb -p 8089
...

After quite a lot of guesswork and experimentation, during which simply raising the overall buffer size did not help, I discovered an (AFAICT) undocumented uwsgi configuration parameter, response-headers-limit. Its default value seemed to be low enough to balk at that long header, because once the service was reconfigured with response-headers-limit = 262144 it started working.

However, the layers of proxies were then still throwing errors. In this case, it seems the initial proxy needs:

            uwsgi_buffers 16 256k;
            uwsgi_buffer_size 256k;
            uwsgi_busy_buffers_size 512k;

and the front-end NGINX proxy needed to match this (128k was not enough!):

        proxy_buffer_size  256k;
        proxy_buffers   4 256k;
        proxy_busy_buffers_size   256k;

So these same changes need rolling out, along with a new ukwa/ukwa-pywb:2.6.7.2 that contains the improved uwsgi configuration.
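
When rolling the same settings out to other deployments, a small check like the following Python sketch could flag any layer whose header buffer is smaller than a measured header size. It rests on a simplifying assumption: that the response headers must fit within uwsgi's response-headers-limit and within nginx's uwsgi_buffer_size and proxy_buffer_size. The helper names and the example header size are hypothetical.

```python
def parse_size(value):
    """Parse a size such as '256k', '1m' or '262144' into bytes."""
    value = value.strip().lower()
    units = {"k": 1024, "m": 1024 ** 2}
    if value and value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)


def layers_too_small(header_bytes, limits):
    """Return the names of layers whose limit is below header_bytes."""
    return [name for name, limit in limits.items()
            if parse_size(limit) < header_bytes]


# Values taken from the configuration described in this issue.
CONFIGURED = {
    "uwsgi response-headers-limit": "262144",
    "nginx uwsgi_buffer_size": "256k",
    "nginx proxy_buffer_size": "256k",
}
```

For example, `layers_too_small(200000, CONFIGURED)` returns an empty list, whereas swapping the front-end value for "128k" would flag that layer.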

This was such a nightmare that I think we should open an issue in pywb to see if very large headers should be handled differently, e.g. dropped.

@anjackson
Contributor

anjackson commented Jul 22, 2022

Note that, having rolled out these changes on the DEV system, this now works: https://dev.webarchive.org.uk/wayback/archive/20220720112219/https://www.mind.org.uk/

But at the time of writing, prod is broken: https://www.webarchive.org.uk/act/wayback/archive/20220720112219/https:/www.mind.org.uk/

@crarugal
Author

Thanks for finding the solution, Andy, and thank you for the detailed breakdown. These steps are really helpful, and give me another thing I can try when investigating issues.

anjackson pushed a commit to ukwa/ukwa-services that referenced this issue Jul 22, 2022
@anjackson
Contributor

I was hoping you'd find that useful!

Under ukwa/ukwa-services#100 I've worked through better tracing of these kinds of things, and BETA now works too. Still need to roll out to PROD.

@anjackson
Contributor

Rolled out now.

@anjackson
Contributor

Gah, of course, need to do the same for QA Wayback.... e.g. https://www.webarchive.org.uk/act/wayback/archive/20220723104224/https://www.mind.org.uk/

@anjackson anjackson reopened this Jul 26, 2022
anjackson added a commit to ukwa/ukwa-services that referenced this issue Jul 26, 2022
@anjackson
Contributor

anjackson commented Jul 26, 2022

Changes in be607d4 mean https://dev.webarchive.org.uk/act/wayback/archive/20220723104224/https://www.mind.org.uk/ now works. This needs rolling out by @GilHoggarth to BETA and then PROD.

@GilHoggarth

Rolled out ukwa-services/ingest/w3act master onto beta swarm.

@GilHoggarth

Tagged the code in ingest/w3act and released onto production.
