feat: optimise AND queries #679

Merged
merged 9 commits into from
Jan 13, 2025

Computerdores
Collaborator

@Computerdores Computerdores commented Jan 2, 2025

Changes the way that ANDs are converted to SQL. The generated query now uses fewer subqueries, which has led to a decrease in search time when searching in TS.

My benchmark search previously took ~1min in TS and ~46s when running the generated SQL in an external tool.
With these improvements the benchmark search now takes ~36s in TS and ~1s in the external tool. This means further performance gains are possible by optimising other parts of the search code.

Note:

  • Currently the tag ids are hardcoded into the main query. I also tried querying them directly in the main query instead, but this only led to a ~6s improvement while breaking some tests, so I will not pursue it for now (should anyone be interested in how I did that, check out commit 2615e7d from feat: implement parent tag search #673, which was reverted there)

@Computerdores Computerdores added Type: Enhancement New feature or request Status: Review Needed A review of this is needed TagStudio: Search The TagStudio search engine labels Jan 2, 2025
@Computerdores
Collaborator Author

Computerdores commented Jan 2, 2025

I think if the search_library method only returned the ids of the Entries, this would probably cut the search time for my benchmark example from the remaining ~36s down to ~17s, but it would be a larger endeavor since it would need substantial changes throughout the codebase.
@CyanVoxel If I understand correctly what you are doing in #655 at the moment, you are moving parts of the Qt frontend towards only keeping the Entry ids around and not the full entries. Do you think that would make this change more feasible?

@CyanVoxel
Member

As of my latest commit in #655 (f59b84b), I believe that this would be okay, as the main call to search_library() now converts those results to a list of ID ints immediately anyway. The only other call is in dupe_files.py, which I think can be refactored to work with this. So if returning IDs offers a performance improvement (and one THAT big), I think we should do it
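
A minimal sketch of the "return only IDs" idea, written against the standard-library sqlite3 module rather than TagStudio's actual SQLAlchemy code. The function names search_entries and search_entry_ids and the toy schema are hypothetical and only serve to contrast fetching full rows with fetching a single integer column:

```python
import sqlite3


def make_toy_library() -> sqlite3.Connection:
    """Build a tiny in-memory stand-in for a library (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries ("
        "id INTEGER PRIMARY KEY, folder_id INTEGER, path TEXT, suffix TEXT)"
    )
    rows = [(1, f"{i}.txt", "txt") for i in range(5)] + [(1, "meta.json", "json")]
    conn.executemany(
        "INSERT INTO entries (folder_id, path, suffix) VALUES (?, ?, ?)", rows
    )
    return conn


def search_entries(conn: sqlite3.Connection) -> list[tuple]:
    """Return full rows, similar in spirit to returning whole Entry objects."""
    query = (
        "SELECT id, folder_id, path, suffix FROM entries "
        "WHERE suffix NOT IN ('json') ORDER BY id"
    )
    return conn.execute(query).fetchall()


def search_entry_ids(conn: sqlite3.Connection) -> list[int]:
    """Return only the matching ids; the caller loads full entries on demand."""
    query = "SELECT id FROM entries WHERE suffix NOT IN ('json') ORDER BY id"
    return [row[0] for row in conn.execute(query)]


conn = make_toy_library()
full_rows = search_entries(conn)
entry_ids = search_entry_ids(conn)
print(entry_ids)
```

Whether the saving in TS is as large as estimated above depends on how much time is spent building Entry objects, which this toy example does not measure.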

@CyanVoxel CyanVoxel added the Priority: High An important issue requiring attention label Jan 7, 2025
@CyanVoxel CyanVoxel added this to the Alpha v9.5 (Post-SQL) milestone Jan 7, 2025
@CyanVoxel
Member

Sorry for taking a while to get around to thoroughly testing this out. I've got some, let's say, interesting results for how this performs on the previously discussed mock library benchmarks compared to some real-world benchmarks. The "SQL Expression Builder finished" step took between 0.01 and 0.03 seconds across all of these results.

main (Mock Library: show dolor)

SQLite CLI: Run Time: real 7.780 user 7.647456 sys 0.130087
TagStudio: 10505 Results Found (12.94 seconds)
TagStudio (move to page 2): 10505 Results Found (12.19 seconds)

Full SQL Query
-- main (Mock Library: show dolor)
SELECT DISTINCT entries.id, entries.folder_id, entries.path, entries.suffix 
FROM entries LEFT OUTER JOIN tag_box_fields ON entries.id = tag_box_fields.entry_id 
WHERE entries.id IN (SELECT entries.id 
FROM entries LEFT OUTER JOIN tag_box_fields ON entries.id = tag_box_fields.entry_id LEFT OUTER JOIN tag_fields ON tag_box_fields.id = tag_fields.field_id 
WHERE EXISTS (SELECT 1 
FROM tags, tag_fields 
WHERE tag_box_fields.id = tag_fields.field_id AND tags.id = tag_fields.tag_id AND tags.id IN (1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023, 1024, 1025, 1026, 1027, 1028, 1029, 1030, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1043, 1044, 1045, 1046, 1047, 1048, 1049, 1050, 1051, 1052, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1063, 1064, 1065, 1066, 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1079, 1080, 1081, 1082, 1083, 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091, 1092, 1093, 1094, 1095, 1096, 1097, 1098, 1099, 1100, 1101, 1102, 1103, 1104, 1105, 1106, 1107, 1108, 1109, 1110, 1111, 1112, 1113, 1114, 1115, 1116, 1117, 1118, 1119, 1120, 1121, 1122, 1123, 1124, 1125, 1126, 1127, 1128, 1129, 1130, 1131, 1132, 1133, 1134, 1135, 1136, 1137, 1138, 1139, 1140, 1141, 1142, 1143, 1144, 1145, 1146, 1147, 1148, 1149, 1150, 1151, 1152, 1153, 1154, 1155, 1156, 1157, 1158, 1159, 1160, 1161, 1162, 1163, 1164, 1165, 1166, 1167, 1168, 1169, 1170, 1171, 1172, 1173, 1174, 1175, 1176, 1177, 1178, 1179, 1180, 1181, 1182, 1183, 1184, 1185, 1186, 1187, 1188, 1189, 1190, 1191, 1192, 1193, 1194, 1195, 1196, 1197, 1198, 1199, 1200, 1201, 1202, 1203, 1204, 1205, 1206, 1207, 1208, 1209, 1210, 1211, 1212, 1213, 1214, 1215, 1216, 1217, 1218, 1219, 1220, 1221, 1222, 1223, 1224, 1225, 1226, 1227, 1228, 1229, 1230, 1231, 1232, 1233, 1234, 1235, 1236, 1237, 1238, 1239, 1240, 1241, 1242, 1243, 1244, 1245, 1246, 1247, 1248, 1249, 1250, 1251, 1252, 1253, 1254, 1255, 1256, 1257, 1258, 1259, 1260, 1261, 1262, 1263, 1264, 1265, 1266, 1267, 1268, 1269, 1270, 1271, 1272, 1273, 1274, 1275, 1276, 1277, 1278, 1279, 1280, 1281, 1282, 1283, 1284, 1285, 1286, 1287, 1288, 1289, 1290, 1291, 1292, 1293, 1294, 1295, 1296, 1297, 1298, 1299, 1300, 1301, 1302, 1303, 1304, 1305, 1306, 1307, 1308, 1309, 1310, 1311, 1312, 1313, 1314, 1315, 1316, 
1317, 1318, 1319, 1320, 1321, 1322, 1323, 1324, 1325, 1326, 1327, 1328, 1329, 1330, 1331, 1332, 1333, 1334, 1335, 1336, 1337, 1338, 1339, 1340, 1341, 1342, 1343, 1344, 1345, 1346, 1347, 1348, 1349, 1350, 1351, 1352, 1353, 1354, 1355, 1356, 1357, 1358, 1359, 1360, 1361, 1362, 1363, 1364, 1365, 1366, 1367, 1368, 1369, 1370, 1371, 1372, 1373, 1374, 1375, 1376, 1377, 1378, 1379, 1380, 1381, 1382, 1383, 1384, 1385, 1386, 1387, 1388, 1389, 1390, 1391, 1392, 1393, 1394, 1395, 1396, 1397, 1398, 1399, 1400, 1401, 1402, 1403, 1404, 1405, 1406, 1407, 1408, 1409, 1410, 1411, 1412, 1413, 1414, 1415, 1416, 1417, 1418, 1419, 1420, 1421, 1422, 1423, 1424, 1425, 1426, 1427, 1428, 1429, 1430, 1431, 1432, 1433, 1434, 1435, 1436, 1437, 1438, 1439, 1440, 1441, 1442, 1443, 1444, 1445, 1446, 1447, 1448, 1449, 1450, 1451, 1452, 1453, 1454, 1455, 1456, 1457, 1458, 1459, 1460, 1461, 1462, 1463, 1464, 1465, 1466, 1467, 1468, 1469, 1470, 1471, 1472, 1473, 1474, 1475, 1476, 1477, 1478, 1479, 1480, 1481, 1482, 1483, 1484, 1485, 1486, 1487, 1488, 1489, 1490, 1491, 1492, 1493, 1494, 1495, 1496, 1497, 1498, 1499, 1500, 1501, 1502, 1503, 1504, 1505, 1506, 1507, 1508, 1509, 1510, 1511, 1512, 1513, 1514, 1515, 1516, 1517, 1518, 1519, 1520, 1521, 1522, 1523, 1524, 1525, 1526, 1527, 1528, 1529, 1530, 1531, 1532, 1533, 1534, 1535, 1536, 1537, 1538, 1539, 1540, 1541, 1542, 1543, 1544, 1545, 1546, 1547, 1548, 1549, 1550, 1551, 1552, 1553, 1554, 1555, 1556, 1557, 1558, 1559, 1560, 1561, 1562, 1563, 1564, 1565, 1566, 1567, 1568, 1569, 1570, 1571, 1572, 1573, 1574, 1575, 1576, 1577, 1578, 1579, 1580, 1581, 1582, 1583, 1584, 1585, 1586, 1587, 1588, 1589, 1590, 1591, 1592, 1593, 1594, 1595, 1596, 1597, 1598, 1599, 1600, 1601, 1602, 1603, 1604, 1605, 1606, 1607, 1608, 1609, 1610, 1611, 1612, 1613, 1614, 1615, 1616, 1617, 1618, 1619, 1620, 1621, 1622, 1623, 1624, 1625, 1626, 1627, 1628, 1629, 1630, 1631, 1632, 1633, 1634, 1635, 1636, 1637, 1638, 1639, 1640, 1641, 1642, 1643, 1644, 1645, 1646, 1647, 1648, 1649, 
1650, 1651, 1652, 1653, 1654, 1655, 1656, 1657, 1658, 1659, 1660, 1661, 1662, 1663, 1664, 1665, 1666, 1667, 1668, 1669, 1670, 1671, 1672, 1673, 1674, 1675, 1676, 1677, 1678, 1679, 1680, 1681, 1682, 1683, 1684, 1685, 1686, 1687, 1688, 1689, 1690, 1691, 1692, 1693, 1694, 1695, 1696, 1697, 1698, 1699, 1700, 1701, 1702, 1703, 1704, 1705, 1706, 1707, 1708, 1709, 1710, 1711, 1712, 1713, 1714, 1715, 1716, 1717, 1718, 1719, 1720, 1721, 1722, 1723, 1724, 1725, 1726, 1727, 1728, 1729, 1730, 1731, 1732, 1733, 1734, 1735, 1736, 1737, 1738, 1739, 1740, 1741, 1742, 1743, 1744, 1745, 1746, 1747, 1748, 1749, 1750, 1751, 1752, 1753, 1754, 1755, 1756, 1757, 1758, 1759, 1760, 1761, 1762, 1763, 1764, 1765, 1766, 1767, 1768, 1769, 1770, 1771, 1772, 1773, 1774, 1775, 1776, 1777, 1778, 1779, 1780, 1781, 1782, 1783, 1784, 1785, 1786, 1787, 1788, 1789, 1790, 1791, 1792, 1793, 1794, 1795, 1796, 1797, 1798, 1799, 1800, 1801, 1802, 1803, 1804, 1805, 1806, 1807, 1808, 1809, 1810, 1811, 1812, 1813, 1814, 1815, 1816, 1817, 1818, 1819, 1820, 1821, 1822, 1823, 1824, 1825, 1826, 1827, 1828, 1829, 1830, 1831, 1832, 1833, 1834, 1835, 1836, 1837, 1838, 1839, 1840, 1841, 1842, 1843, 1844, 1845, 1846, 1847, 1848, 1849, 1850, 1851, 1852, 1853, 1854, 1855, 1856, 1857, 1858, 1859, 1860, 1861, 1862, 1863, 1864, 1865, 1866, 1867, 1868, 1869, 1870, 1871, 1872, 1873, 1874, 1875, 1876, 1877, 1878, 1879, 1880, 1881, 1882, 1883, 1884, 1885, 1886, 1887, 1888, 1889, 1890, 1891, 1892, 1893, 1894, 1895, 1896, 1897, 1898, 1899, 1900, 1901, 1902, 1903, 1904, 1905, 1906, 1907, 1908, 1909, 1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 
1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001))) AND entries.id IN (SELECT entries.id 
FROM entries LEFT OUTER JOIN tag_box_fields ON entries.id = tag_box_fields.entry_id LEFT OUTER JOIN tag_fields ON tag_box_fields.id = tag_fields.field_id 
WHERE EXISTS (SELECT 1 
FROM tags, tag_fields 
WHERE tag_box_fields.id = tag_fields.field_id AND tags.id = tag_fields.tag_id AND tags.id IN (1069, 1148, 1190, 1252, 1309, 1314, 1336, 1497, 1542, 1582, 1653, 1672, 1764, 1787, 1891, 1901, 1921, 1951))) AND (entries.suffix NOT IN ('json', 'xmp', 'aae'))
 LIMIT 500 OFFSET 0

#679 (Mock Library: show dolor)

SQLite CLI: Run Time: real 0.773 user 0.761808 sys 0.011079 <- Improvement
TagStudio: 10505 Results Found (5.66 seconds) <- Improvement
TagStudio (move to page 2): 10505 Results Found (5.84 seconds) <- Improvement

Full SQL Query
-- #679 (Mock Library: show dolor)
SELECT DISTINCT entries.id, entries.folder_id, entries.path, entries.suffix 
FROM entries LEFT OUTER JOIN tag_box_fields ON entries.id = tag_box_fields.entry_id 
WHERE (EXISTS (SELECT 1 
FROM tag_fields 
WHERE tag_fields.field_id = tag_box_fields.id AND tag_fields.tag_id IN (1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023, 1024, 1025, 1026, 1027, 1028, 1029, 1030, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1043, 1044, 1045, 1046, 1047, 1048, 1049, 1050, 1051, 1052, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1063, 1064, 1065, 1066, 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1079, 1080, 1081, 1082, 1083, 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091, 1092, 1093, 1094, 1095, 1096, 1097, 1098, 1099, 1100, 1101, 1102, 1103, 1104, 1105, 1106, 1107, 1108, 1109, 1110, 1111, 1112, 1113, 1114, 1115, 1116, 1117, 1118, 1119, 1120, 1121, 1122, 1123, 1124, 1125, 1126, 1127, 1128, 1129, 1130, 1131, 1132, 1133, 1134, 1135, 1136, 1137, 1138, 1139, 1140, 1141, 1142, 1143, 1144, 1145, 1146, 1147, 1148, 1149, 1150, 1151, 1152, 1153, 1154, 1155, 1156, 1157, 1158, 1159, 1160, 1161, 1162, 1163, 1164, 1165, 1166, 1167, 1168, 1169, 1170, 1171, 1172, 1173, 1174, 1175, 1176, 1177, 1178, 1179, 1180, 1181, 1182, 1183, 1184, 1185, 1186, 1187, 1188, 1189, 1190, 1191, 1192, 1193, 1194, 1195, 1196, 1197, 1198, 1199, 1200, 1201, 1202, 1203, 1204, 1205, 1206, 1207, 1208, 1209, 1210, 1211, 1212, 1213, 1214, 1215, 1216, 1217, 1218, 1219, 1220, 1221, 1222, 1223, 1224, 1225, 1226, 1227, 1228, 1229, 1230, 1231, 1232, 1233, 1234, 1235, 1236, 1237, 1238, 1239, 1240, 1241, 1242, 1243, 1244, 1245, 1246, 1247, 1248, 1249, 1250, 1251, 1252, 1253, 1254, 1255, 1256, 1257, 1258, 1259, 1260, 1261, 1262, 1263, 1264, 1265, 1266, 1267, 1268, 1269, 1270, 1271, 1272, 1273, 1274, 1275, 1276, 1277, 1278, 1279, 1280, 1281, 1282, 1283, 1284, 1285, 1286, 1287, 1288, 1289, 1290, 1291, 1292, 1293, 1294, 1295, 1296, 1297, 1298, 1299, 1300, 1301, 1302, 1303, 1304, 1305, 1306, 1307, 1308, 1309, 1310, 1311, 1312, 1313, 1314, 1315, 1316, 1317, 1318, 1319, 1320, 
1321, 1322, 1323, 1324, 1325, 1326, 1327, 1328, 1329, 1330, 1331, 1332, 1333, 1334, 1335, 1336, 1337, 1338, 1339, 1340, 1341, 1342, 1343, 1344, 1345, 1346, 1347, 1348, 1349, 1350, 1351, 1352, 1353, 1354, 1355, 1356, 1357, 1358, 1359, 1360, 1361, 1362, 1363, 1364, 1365, 1366, 1367, 1368, 1369, 1370, 1371, 1372, 1373, 1374, 1375, 1376, 1377, 1378, 1379, 1380, 1381, 1382, 1383, 1384, 1385, 1386, 1387, 1388, 1389, 1390, 1391, 1392, 1393, 1394, 1395, 1396, 1397, 1398, 1399, 1400, 1401, 1402, 1403, 1404, 1405, 1406, 1407, 1408, 1409, 1410, 1411, 1412, 1413, 1414, 1415, 1416, 1417, 1418, 1419, 1420, 1421, 1422, 1423, 1424, 1425, 1426, 1427, 1428, 1429, 1430, 1431, 1432, 1433, 1434, 1435, 1436, 1437, 1438, 1439, 1440, 1441, 1442, 1443, 1444, 1445, 1446, 1447, 1448, 1449, 1450, 1451, 1452, 1453, 1454, 1455, 1456, 1457, 1458, 1459, 1460, 1461, 1462, 1463, 1464, 1465, 1466, 1467, 1468, 1469, 1470, 1471, 1472, 1473, 1474, 1475, 1476, 1477, 1478, 1479, 1480, 1481, 1482, 1483, 1484, 1485, 1486, 1487, 1488, 1489, 1490, 1491, 1492, 1493, 1494, 1495, 1496, 1497, 1498, 1499, 1500, 1501, 1502, 1503, 1504, 1505, 1506, 1507, 1508, 1509, 1510, 1511, 1512, 1513, 1514, 1515, 1516, 1517, 1518, 1519, 1520, 1521, 1522, 1523, 1524, 1525, 1526, 1527, 1528, 1529, 1530, 1531, 1532, 1533, 1534, 1535, 1536, 1537, 1538, 1539, 1540, 1541, 1542, 1543, 1544, 1545, 1546, 1547, 1548, 1549, 1550, 1551, 1552, 1553, 1554, 1555, 1556, 1557, 1558, 1559, 1560, 1561, 1562, 1563, 1564, 1565, 1566, 1567, 1568, 1569, 1570, 1571, 1572, 1573, 1574, 1575, 1576, 1577, 1578, 1579, 1580, 1581, 1582, 1583, 1584, 1585, 1586, 1587, 1588, 1589, 1590, 1591, 1592, 1593, 1594, 1595, 1596, 1597, 1598, 1599, 1600, 1601, 1602, 1603, 1604, 1605, 1606, 1607, 1608, 1609, 1610, 1611, 1612, 1613, 1614, 1615, 1616, 1617, 1618, 1619, 1620, 1621, 1622, 1623, 1624, 1625, 1626, 1627, 1628, 1629, 1630, 1631, 1632, 1633, 1634, 1635, 1636, 1637, 1638, 1639, 1640, 1641, 1642, 1643, 1644, 1645, 1646, 1647, 1648, 1649, 1650, 1651, 1652, 1653, 
1654, 1655, 1656, 1657, 1658, 1659, 1660, 1661, 1662, 1663, 1664, 1665, 1666, 1667, 1668, 1669, 1670, 1671, 1672, 1673, 1674, 1675, 1676, 1677, 1678, 1679, 1680, 1681, 1682, 1683, 1684, 1685, 1686, 1687, 1688, 1689, 1690, 1691, 1692, 1693, 1694, 1695, 1696, 1697, 1698, 1699, 1700, 1701, 1702, 1703, 1704, 1705, 1706, 1707, 1708, 1709, 1710, 1711, 1712, 1713, 1714, 1715, 1716, 1717, 1718, 1719, 1720, 1721, 1722, 1723, 1724, 1725, 1726, 1727, 1728, 1729, 1730, 1731, 1732, 1733, 1734, 1735, 1736, 1737, 1738, 1739, 1740, 1741, 1742, 1743, 1744, 1745, 1746, 1747, 1748, 1749, 1750, 1751, 1752, 1753, 1754, 1755, 1756, 1757, 1758, 1759, 1760, 1761, 1762, 1763, 1764, 1765, 1766, 1767, 1768, 1769, 1770, 1771, 1772, 1773, 1774, 1775, 1776, 1777, 1778, 1779, 1780, 1781, 1782, 1783, 1784, 1785, 1786, 1787, 1788, 1789, 1790, 1791, 1792, 1793, 1794, 1795, 1796, 1797, 1798, 1799, 1800, 1801, 1802, 1803, 1804, 1805, 1806, 1807, 1808, 1809, 1810, 1811, 1812, 1813, 1814, 1815, 1816, 1817, 1818, 1819, 1820, 1821, 1822, 1823, 1824, 1825, 1826, 1827, 1828, 1829, 1830, 1831, 1832, 1833, 1834, 1835, 1836, 1837, 1838, 1839, 1840, 1841, 1842, 1843, 1844, 1845, 1846, 1847, 1848, 1849, 1850, 1851, 1852, 1853, 1854, 1855, 1856, 1857, 1858, 1859, 1860, 1861, 1862, 1863, 1864, 1865, 1866, 1867, 1868, 1869, 1870, 1871, 1872, 1873, 1874, 1875, 1876, 1877, 1878, 1879, 1880, 1881, 1882, 1883, 1884, 1885, 1886, 1887, 1888, 1889, 1890, 1891, 1892, 1893, 1894, 1895, 1896, 1897, 1898, 1899, 1900, 1901, 1902, 1903, 1904, 1905, 1906, 1907, 1908, 1909, 1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 
1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001))) AND (EXISTS (SELECT 1 
FROM tag_fields 
WHERE tag_fields.field_id = tag_box_fields.id AND tag_fields.tag_id IN (1069, 1148, 1190, 1252, 1309, 1314, 1336, 1497, 1542, 1582, 1653, 1672, 1764, 1787, 1891, 1901, 1921, 1951))) AND (entries.suffix NOT IN ('json', 'xmp', 'aae'))
 LIMIT 500 OFFSET 0
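
To make the difference between the two query shapes easier to see, here is a small sketch that runs both against a toy in-memory database. The table and column names follow the queries above, but the schema is heavily simplified and the data (three entries, tags 10 and 20) is made up for illustration. Each toy entry has exactly one tag box, which is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entries (id INTEGER PRIMARY KEY, path TEXT, suffix TEXT);
    CREATE TABLE tag_box_fields (id INTEGER PRIMARY KEY, entry_id INTEGER);
    CREATE TABLE tag_fields (field_id INTEGER, tag_id INTEGER);
    """
)
# Entry 1 has tags 10 and 20, entry 2 only tag 10, entry 3 only tag 20.
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [(1, "a.txt", "txt"), (2, "b.txt", "txt"), (3, "c.txt", "txt")],
)
conn.executemany("INSERT INTO tag_box_fields VALUES (?, ?)", [(1, 1), (2, 2), (3, 3)])
conn.executemany(
    "INSERT INTO tag_fields VALUES (?, ?)", [(1, 10), (1, 20), (2, 10), (3, 20)]
)

# Shape used on main: one "entries.id IN (subquery with joins)" per AND operand.
OLD_SHAPE = """
SELECT DISTINCT entries.id
FROM entries LEFT OUTER JOIN tag_box_fields ON entries.id = tag_box_fields.entry_id
WHERE entries.id IN (
    SELECT entries.id
    FROM entries
    LEFT OUTER JOIN tag_box_fields ON entries.id = tag_box_fields.entry_id
    LEFT OUTER JOIN tag_fields ON tag_box_fields.id = tag_fields.field_id
    WHERE tag_fields.tag_id IN (10))
AND entries.id IN (
    SELECT entries.id
    FROM entries
    LEFT OUTER JOIN tag_box_fields ON entries.id = tag_box_fields.entry_id
    LEFT OUTER JOIN tag_fields ON tag_box_fields.id = tag_fields.field_id
    WHERE tag_fields.tag_id IN (20))
ORDER BY entries.id
"""

# Shape used in this PR: one correlated EXISTS per AND operand, no extra joins.
NEW_SHAPE = """
SELECT DISTINCT entries.id
FROM entries LEFT OUTER JOIN tag_box_fields ON entries.id = tag_box_fields.entry_id
WHERE EXISTS (
    SELECT 1 FROM tag_fields
    WHERE tag_fields.field_id = tag_box_fields.id AND tag_fields.tag_id IN (10))
AND EXISTS (
    SELECT 1 FROM tag_fields
    WHERE tag_fields.field_id = tag_box_fields.id AND tag_fields.tag_id IN (20))
ORDER BY entries.id
"""

old_ids = [row[0] for row in conn.execute(OLD_SHAPE)]
new_ids = [row[0] for row in conn.execute(NEW_SHAPE)]
print(old_ids, new_ids)
```

On this toy data both shapes select only entry 1. The sketch says nothing about timing; it only shows that the EXISTS form avoids re-joining entries, tag_box_fields and tag_fields once per operand.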

main (Other Library: Benchmark x and y Query)

SQLite CLI: Run Time: real 3.337 user 3.236075 sys 0.094545
TagStudio: 1185 Results Found (5.45 seconds)
TagStudio (move to page 2): 1185 Results Found (5.27 seconds)

Full SQL Query
-- main (Other Library: `Benchmark x and y Query`)
SELECT DISTINCT entries.id, entries.folder_id, entries.path, entries.suffix 
FROM entries LEFT OUTER JOIN tag_box_fields ON entries.id = tag_box_fields.entry_id 
WHERE entries.id IN (SELECT entries.id 
FROM entries LEFT OUTER JOIN tag_box_fields ON entries.id = tag_box_fields.entry_id LEFT OUTER JOIN tag_fields ON tag_box_fields.id = tag_fields.field_id 
WHERE EXISTS (SELECT 1 
FROM tags, tag_fields 
WHERE tag_box_fields.id = tag_fields.field_id AND tags.id = tag_fields.tag_id AND tags.id IN (2471, 1101, 1103, 1142, 1143, 1144, 1145, 1146, 1147, 1148, 1149, 1150, 1151, 1152, 1153, 1154, 1352, 1376, 1378, 1384, 1539, 1683, 1828, 1915, 2005, 2049, 2054, 2060, 2062, 2079, 2084, 2087, 2088, 2092, 2093, 2096, 2098, 2099, 2104, 2105, 2106, 2108, 2112, 2549, 2801, 2802, 2892, 3157, 3288, 3290, 3291, 3554, 3704, 3706, 3716, 3728, 3729, 3750, 3772, 3784, 3793, 3801, 3819, 3820, 3826, 3827, 3828, 3829, 3833, 3835, 3839, 3840, 3841, 3842, 3843, 3844, 3845, 3846, 3885, 2801, 2891, 1153, 1352, 1374, 1376, 2053, 2070, 2078, 2079, 2088, 2095, 2109, 3705, 3706, 3716, 3728, 3729, 3750, 3772, 3794, 3795, 3817, 3818, 3821, 3822, 3823, 3824, 3836, 3838, 1101, 1102, 1146, 1149, 2039, 2050, 2054, 2056, 2071, 2073, 2088, 2089, 2091, 2099, 2769, 3704, 3807, 3815, 3832, 3837, 1539, 1101, 1142, 1143, 1144, 1145, 1146, 1147, 1148, 1149, 1150, 1151, 1152, 2071, 2073, 2091, 2099, 3750, 3832, 3834, 2770, 2771, 2789, 2790, 3089, 3691))) AND entries.id IN (SELECT entries.id 
FROM entries LEFT OUTER JOIN tag_box_fields ON entries.id = tag_box_fields.entry_id LEFT OUTER JOIN tag_fields ON tag_box_fields.id = tag_fields.field_id 
WHERE tag_fields.tag_id = 1) AND (entries.suffix NOT IN ('json', 'xmp', 'aae', 'xml'))
 LIMIT 500 OFFSET 0

#679 (Other Library: Benchmark x and y Query)

SQLite CLI: Run Time: real 0.216 user 0.182426 sys 0.032897 <- Improvement
TagStudio: 1185 Results Found (30.46 Seconds) <- Slowdown?
TagStudio (move to page 2): 1185 Results Found (38.77 seconds) <- Slowdown?

Full SQL Query
-- #679 (Other Library: `Benchmark x and y Query`)
SELECT DISTINCT entries.id, entries.folder_id, entries.path, entries.suffix 
FROM entries LEFT OUTER JOIN tag_box_fields ON entries.id = tag_box_fields.entry_id 
WHERE (EXISTS (SELECT 1 
FROM tag_fields 
WHERE tag_fields.field_id = tag_box_fields.id AND tag_fields.tag_id IN (2471, 1101, 1103, 1142, 1143, 1144, 1145, 1146, 1147, 1148, 1149, 1150, 1151, 1152, 1153, 1154, 1352, 1376, 1378, 1384, 1539, 1683, 1828, 1915, 2005, 2049, 2054, 2060, 2062, 2079, 2084, 2087, 2088, 2092, 2093, 2096, 2098, 2099, 2104, 2105, 2106, 2108, 2112, 2549, 2801, 2802, 2892, 3157, 3288, 3290, 3291, 3554, 3704, 3706, 3716, 3728, 3729, 3750, 3772, 3784, 3793, 3801, 3819, 3820, 3826, 3827, 3828, 3829, 3833, 3835, 3839, 3840, 3841, 3842, 3843, 3844, 3845, 3846, 3885, 2801, 2891, 1153, 1352, 1374, 1376, 2053, 2070, 2078, 2079, 2088, 2095, 2109, 3705, 3706, 3716, 3728, 3729, 3750, 3772, 3794, 3795, 3817, 3818, 3821, 3822, 3823, 3824, 3836, 3838, 1101, 1102, 1146, 1149, 2039, 2050, 2054, 2056, 2071, 2073, 2088, 2089, 2091, 2099, 2769, 3704, 3807, 3815, 3832, 3837, 1539, 1101, 1142, 1143, 1144, 1145, 1146, 1147, 1148, 1149, 1150, 1151, 1152, 2071, 2073, 2091, 2099, 3750, 3832, 3834, 2770, 2771, 2789, 2790, 3089, 3691))) AND entries.id IN (SELECT entries.id 
FROM entries LEFT OUTER JOIN tag_box_fields ON entries.id = tag_box_fields.entry_id LEFT OUTER JOIN tag_fields ON tag_box_fields.id = tag_fields.field_id 
WHERE tag_fields.tag_id = 1) AND (entries.suffix NOT IN ('json', 'xmp', 'aae', 'xml'))
 LIMIT 500 OFFSET 0

So in this other personal library I've tested with, the query runs much quicker inside the CLI but for some reason is significantly dragged down inside of TS. I wonder if whatever was causing the TS queries to take a bit longer is going full force here; I'm just having trouble narrowing down what might be going on.
There does seem to be a significant gap between the expression builder finishing and the actual start of the library searching:

2025-01-08 22:55:08 [info     ] SQL Expression Builder finished (0.03 seconds)
2025-01-08 22:55:33 [info     ] searching library              filter=...
2025-01-08 22:55:42 [info     ] SQL Execution finished (9.25 seconds)

Upon some debugging, it seems that the count_all: int = session.execute(query_count).scalar() call on line 557 of library.py is responsible for the slowdown seen in the log, but that still doesn't account for the full ~30 second search slowdown.
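
One way to narrow this down is to time the count query and the page query separately. The following is a minimal sketch using the standard-library sqlite3 module, not TagStudio's SQLAlchemy session; the timed helper, the toy table and the base query are assumptions made up for this example:

```python
import sqlite3
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, timings: dict):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, suffix TEXT)")
conn.executemany("INSERT INTO entries (suffix) VALUES (?)", [("txt",)] * 1000)

# Hypothetical stand-in for the generated search query.
BASE_QUERY = "SELECT id FROM entries WHERE suffix NOT IN ('json', 'xmp', 'aae')"
timings: dict = {}

# Phase 1: count every match (no LIMIT, so the whole result set is evaluated).
with timed("count", timings):
    count_all = conn.execute(f"SELECT COUNT(*) FROM ({BASE_QUERY})").fetchone()[0]

# Phase 2: fetch only the first page of results.
with timed("page", timings):
    page = conn.execute(f"{BASE_QUERY} LIMIT 500 OFFSET 0").fetchall()

for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f} seconds")
```

The count has to evaluate the full result set while the page query can stop after 500 rows, which is one plausible reason for the two phases to differ in cost.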

@Computerdores
Collaborator Author

Can you run that again now that I have added a log statement for the time required to count the entries in the result?

@Computerdores
Collaborator Author

Computerdores commented Jan 9, 2025

Small note: c1d6a11 introduces a small bug. However, this bug will disappear once #655 is merged, and it won't even reach main if this PR is merged after #655.
(I also added a comment about this in the meantime)

@CyanVoxel
Member

Can you run that again now that I have added a log statement for the time required to count the entries in the result?

Running on c1d6a11:

Mock library "show dolor"

2025-01-09 09:27:37 [info     ] SQL Expression Builder finished (0.02 seconds)
2025-01-09 09:27:42 [info     ] finished counting (4.69 seconds)
2025-01-09 09:27:42 [info     ] searching library              filter=...
2025-01-09 09:27:42 [info     ] SQL Execution finished (0.58 seconds)

Other library "X AND Y"

2025-01-09 09:24:38 [info     ] SQL Expression Builder finished (0.01 seconds)
2025-01-09 09:25:00 [info     ] finished counting (22.04 seconds)
2025-01-09 09:25:00 [info     ] searching library              filter=...
2025-01-09 09:25:09 [info     ] SQL Execution finished (8.95 seconds)

@Computerdores
Collaborator Author

Can you also run those on main and post the relevant log lines? (I know the counting time won't be in there, but it would still be useful)

@CyanVoxel
Member

Can you also run those on main and post the relevant log lines? (I know the counting time won't be in there, but it would still be useful)

I went ahead and added the timers on top of the main code (no other changes):

Mock library "show dolor"

2025-01-09 10:41:09 [info     ] SQL Expression Builder finished (0.02 seconds)
2025-01-09 10:41:15 [info     ] finished counting (6.29 seconds)
2025-01-09 10:41:15 [info     ] searching library              filter=...
2025-01-09 10:41:22 [info     ] SQL Execution finished (6.49 seconds)

Other library "X AND Y"

2025-01-09 10:42:11 [info     ] SQL Expression Builder finished (0.04 seconds)
2025-01-09 10:42:14 [info     ] finished counting (3.05 seconds)
2025-01-09 10:42:14 [info     ] searching library              filter=...
2025-01-09 10:42:16 [info     ] SQL Execution finished (2.04 seconds)
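
For a side-by-side comparison, the counting and execution times from the log excerpts posted in this thread can be summed per run. The sketch below parses the log lines exactly as posted; the run labels, the regular expression and the helper name phase_seconds are assumptions introduced for this example.

```python
import re

# Log excerpts as posted in this thread: PR branch at c1d6a11 vs. main with timers.
LOGS = {
    ("c1d6a11", "show dolor"): """
2025-01-09 09:27:37 [info     ] SQL Expression Builder finished (0.02 seconds)
2025-01-09 09:27:42 [info     ] finished counting (4.69 seconds)
2025-01-09 09:27:42 [info     ] searching library              filter=...
2025-01-09 09:27:42 [info     ] SQL Execution finished (0.58 seconds)
""",
    ("c1d6a11", "X AND Y"): """
2025-01-09 09:24:38 [info     ] SQL Expression Builder finished (0.01 seconds)
2025-01-09 09:25:00 [info     ] finished counting (22.04 seconds)
2025-01-09 09:25:00 [info     ] searching library              filter=...
2025-01-09 09:25:09 [info     ] SQL Execution finished (8.95 seconds)
""",
    ("main", "show dolor"): """
2025-01-09 10:41:09 [info     ] SQL Expression Builder finished (0.02 seconds)
2025-01-09 10:41:15 [info     ] finished counting (6.29 seconds)
2025-01-09 10:41:15 [info     ] searching library              filter=...
2025-01-09 10:41:22 [info     ] SQL Execution finished (6.49 seconds)
""",
    ("main", "X AND Y"): """
2025-01-09 10:42:11 [info     ] SQL Expression Builder finished (0.04 seconds)
2025-01-09 10:42:14 [info     ] finished counting (3.05 seconds)
2025-01-09 10:42:14 [info     ] searching library              filter=...
2025-01-09 10:42:16 [info     ] SQL Execution finished (2.04 seconds)
""",
}

# Matches "<message> (<seconds> seconds)" after the log level bracket.
PATTERN = re.compile(r"\] (.+?) \((\d+\.\d+) seconds\)")


def phase_seconds(log: str) -> dict:
    """Map each timed log message to its duration in seconds."""
    return {label: float(seconds) for label, seconds in PATTERN.findall(log)}


totals = {}
for run, log in LOGS.items():
    phases = phase_seconds(log)
    # Total of the two database phases; the expression builder time is negligible.
    totals[run] = phases["finished counting"] + phases["SQL Execution finished"]
    print(f"{run[0]:8} {run[1]:11} count+execute = {totals[run]:.2f} s")
```

Summed this way, the mock library run is faster on c1d6a11 than on main, while the "X AND Y" run is slower, and in both c1d6a11 runs the counting step is the larger of the two phases.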

@Computerdores
Collaborator Author

Computerdores commented Jan 11, 2025

So it turns out search performance on large libs has gotten quite a lot worse on main...
This is how long my benchmark query is taking after merging with main:
main: 5min 28s
PR: 1min 27s

@python357-1
Collaborator

What query are you running?

@Computerdores
Collaborator Author

What query are you running?

The same one I was using in the description. Looks like this on my lib: T T-1. If you want to try it on the same lib, follow these steps to create it:

  • Open TS and create a new library in an empty directory
  • Run this script in the directory:
Python Script
import random
import sqlite3
from pathlib import Path

INSERT_ENTRY = "INSERT INTO entries (folder_id, path, suffix) VALUES (?, ?, ?);"
INSERT_TAG = "INSERT INTO tags (name, color, is_category) VALUES (?, ?, false)"
INSERT_PARENT = "INSERT INTO tag_parents (parent_id, child_id) VALUES (?, ?)"
INSERT_TAG_FIELD = "INSERT INTO tag_entries (entry_id, tag_id) VALUES (?, ?)"

# Library database, relative to the library directory (works on any OS)
PATH = Path(".TagStudio") / "ts_library.sqlite"

# ENTRY_BASE ^ 2 entries will be created
ENTRY_BASE = 500

# Tags are nested TAG_EXPONENT levels deep with TAG_BASE children per tag,
# so the deepest level alone holds TAG_BASE ^ TAG_EXPONENT tags
TAG_BASE = 100
TAG_EXPONENT = 2

# TAG_COUNT random tags will be added to each entry
TAG_COUNT = 50

conn = sqlite3.connect(PATH)

c = conn.cursor()


entries = [(1, f"{a}/{b}.txt", "txt") for b in range(ENTRY_BASE) for a in range(ENTRY_BASE)]
c.executemany(INSERT_ENTRY, entries)

print("Created Entries")

entry_ids = c.execute("SELECT id FROM entries;").fetchall()

COLORS = ["YELLOW", "RED_ORANGE", "RED", "LIGHT_BLUE", "BLUE", "GRAY"]
def insert_tag(base_name: str, parent: int, depth: int):
    for i in range(TAG_BASE):
        name = f"{base_name}-{i}"
        new_parent = c.execute(INSERT_TAG, (name, COLORS[depth])).lastrowid
        if new_parent is None:
            raise RuntimeError
        c.execute(INSERT_PARENT, (new_parent, parent))
        if depth > 0:
            insert_tag(f"{name}-{i}", new_parent, depth-1)

parent = c.execute(INSERT_TAG, ("T", "GREEN")).lastrowid
if parent is None:
    raise RuntimeError
insert_tag("T", parent, TAG_EXPONENT-1)

tag_ids = c.execute("SELECT id FROM tags;").fetchall()

print("Created Tags")

for entry in entry_ids:
    c.executemany(INSERT_TAG_FIELD, [(entry[0], tag[0]) for tag in random.sample(tag_ids, TAG_COUNT)])
        
print("Added Tags")

c.close()
conn.commit()
conn.close()
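
As a rough sanity check on the size of the resulting benchmark library, the row counts implied by the constants can be worked out directly. This is a small sketch based on reading the script above as written; it assumes the library is otherwise empty and that one root tag "T" is created in addition to the nested levels.

```python
# Constants copied from the script above.
ENTRY_BASE = 500
TAG_BASE = 100
TAG_EXPONENT = 2
TAG_COUNT = 50

# The entries list comprehension iterates ENTRY_BASE x ENTRY_BASE times.
entry_count = ENTRY_BASE**2

# One root tag "T", plus TAG_BASE ** level tags on each nesting level.
tag_count = 1 + sum(TAG_BASE**level for level in range(1, TAG_EXPONENT + 1))

# Every entry receives TAG_COUNT randomly sampled tags.
tag_entry_rows = entry_count * TAG_COUNT

print(f"entries:     {entry_count:,}")
print(f"tags:        {tag_count:,}")
print(f"tag_entries: {tag_entry_rows:,}")
```

So the script yields 250,000 entries, 10,101 tags and 12.5 million tag_entries rows, which gives a sense of why a search on this library takes seconds rather than milliseconds.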

@CyanVoxel
Member

I'm also experiencing a huge slowdown with the "show dolor" test, now taking around 53 seconds on main and 18 seconds here, with most of the work here being done on the counting step

@Computerdores
Collaborator Author

Computerdores commented Jan 11, 2025

can you check the other one as well? Because that was previously the one that worsened

@CyanVoxel
Member

That one appears to have sped up on main (down to 3 seconds) and especially here (down to 0.1 seconds) - I'm now skeptical if I'm even using the same query as I did before, despite having written it down... In any case, everything I run here now is quicker than anything I run on main 🙃

@Computerdores
Collaborator Author

Hmm... now I am confused whether I should be confused or not...
It being slower in the first place was confusing and that it isn't anymore now is also confusing... Does confusingness cancel out here or does it multiply? I am so confused

@CyanVoxel
Member

I can definitely feel it multiply in this case...

@CyanVoxel CyanVoxel added Status: Mergeable The code is ready to be merged and removed Status: Review Needed A review of this is needed labels Jan 11, 2025
@CyanVoxel CyanVoxel merged commit 5c8f2c5 into TagStudioDev:main Jan 13, 2025
5 checks passed
@CyanVoxel CyanVoxel removed the Status: Mergeable The code is ready to be merged label Jan 13, 2025
@CyanVoxel
Member

Thank you for your work on this!

Labels
Priority: High An important issue requiring attention TagStudio: Search The TagStudio search engine Type: Enhancement New feature or request
Projects
Status: ✅ Done
3 participants