1BRC in Elixir #93

IceDragon200 · 2024-01-04T18:53:11Z

IceDragon200
Jan 4, 2024

Implementations

https://github.com/IceDragon200/1brc_erl_ex_test

Discussion	Contributor	Link	Language
#93 (comment)	Kartstig	https://github.com/Kartstig/1brc_erl	Erlang
#93 (comment)	jesperes	https://github.com/jesperes/erlang_1brc	Erlang
#93 (comment)	garazdawi	https://github.com/garazdawi/erlang_1brc	Erlang
#93 (comment)	onno-vos-dev	https://github.com/onno-vos-dev/1brc	Erlang
#93	IceDragon200	https://github.com/IceDragon200/1brc_ex	Elixir
#93 (comment)	mnfloresv	https://github.com/mnfloresv/1brc-elixir	Elixir
#93 (reply in thread)	rparcus	https://github.com/rparcus/ex_1brc	Elixir
#93 (reply in thread)	rrcook	https://github.com/rrcook/brc	Elixir
#93 (comment)	andypho	https://github.com/andypho/1brc	Elixir
#93 (comment)	stevensonmt	https://github.com/stevensonmt/brc	Elixir
IceDragon200/1brc_erl_ex_test#5	mneumann	https://github.com/mneumann/1brc-elixir	Elixir

AMD Ryzen 5 2600 / 32Gb DDR4 IceDragon200

Standard Data Set (420 weather stations)

50 Million

Contributor & Commit	Time (50M)	CPU% (50M)	Mem kb (50M)	Comments
jesperes	00:01.06	922%	973988
rparcus-with_explorer	00:02.08	810%	1670636
rparcus-runtime_compiled	00:02.11	523%	757776	Won the RNG Lottery (#93 (comment))
garazdawi	00:02.39	1070%	340048
IceDragon200	00:03.17	908%	833112
mneumann	00:03.55	840%	318888
onno-vos-dev	00:03.37	345%	1208524
rparcus-better_file_reader	00:06.76	1038%	816060
rrcook	00:09.06	1094%	238652
Kartstig	00:19.91	360%	757724
rparcus-just_elixir	00:25.89	289%	193716
andypho	00:27.19	357%	140848
stevensonmt	00:27.23	363%	194028
mnfloresv	00:30.52	348%	189292

1 Billion

Contributor & Commit	Time (1B)	CPU% (1B)	Mem kb (1B)	Comments
rparcus-runtime_compiled	00:14.57	1064%	1272896	Won the RNG Lottery (#93 (comment))
jesperes	00:15.42	1139%	1040872
onno-vos-dev	00:19.53	1007%	1451976
rparcus-with_explorer	00:26.64	1065%	25459792
garazdawi	00:42.57	1176%	350848
IceDragon200	00:46.44	1167%	938872
mneumann	00:49.41	1166%	318016
rparcus-better_file_reader	02:08.83	1160%	4494896
rrcook	02:56.21	1123%	245964
Kartstig	05:56.68	393%	13546576
rparcus-just_elixir	08:23.19	283%	203932
andypho	08:29.06	357%	161452
stevensonmt	08:31.09	373%	211704
mnfloresv	09:54.67	348%	189292

10k City Data Set

50 Million

Contributor & Commit	Time (50M)	CPU% (50M)	Mem kb (50M)	Comments
rparcus-with_explorer	00:02.44	774%	1647112
jesperes	00:02.50	650%	821316
mneumann	00:03.91	863%	360824
onno-vos-dev	00:05.15	464%	1179844
garazdawi	00:06.80	1023%	607792
IceDragon200	00:07.53	1022%	1249700
rparcus-better_file_reader	00:09.10	1056%	990444
rparcus-runtime_compiled	00:23.41	187%	1003712	Won the RNG Lottery (#93 (comment))
rrcook	00:26.59	999%	1322332
rparcus-just_elixir	00:31.38	288%	213396
stevensonmt	00:32.11	405%	1379696
andypho	00:36.45	380%	205496
Kartstig	00:56.50	431%	1014856
mnfloresv	02:17.39	234%	225332

1 Billion

Contributor & Commit	Time (1B)	CPU% (1B)	Mem kb (1B)	Comments
rparcus-with_explorer	00:36.28	1058%	28558884
onno-vos-dev	00:38.87	1081%	1398632
jesperes	00:41.55	1155%	1084288
mneumann	00:53.63	1161%	400416
rparcus-runtime_compiled	01:00.50	707%	1055176	Won the RNG Lottery (#93 (comment))
garazdawi	02:00.66	1137%	617936
IceDragon200	02:12.07	1168%	1686932
rparcus-better_file_reader	02:43.36	1167%	4514804
andypho	09:23.23	379%	220544
stevensonmt	09:40.13	415%	1171440
rrcook	10:22.71	1106%	16270312
rparcus-just_elixir	10:52.72	280%	227824
Kartstig	18:13.26	447%	16842868
mnfloresv	45:49.63	232%	232408

RETIRED: Intel i7-2710QE @ 2.10Ghz / 16Gb DDR3 IceDragon200

This has been retired, if your implementation is not listed here, it's because I've stopped testing on my poor laptop, check the Ryzen section for your updated figure.

Dirty System - While under normal usage

Commit	Time (50M)	Mem kb (50M)	Time (1B)	Mem kb (1B)	Comments
Kartstig/1brc_erl@`b0a42d6`)	00:44.05	736568	OOM	N/A	High-Memory
jesperes/erlang_1brc@`09b5ffa`)	Pending	N/A	Pending	N/A	Pending
IceDragon200/1brc_ex@`2931163`)	00:21.94	726124	07:23.93	N/A	High-CPU, Moderate-Memory
mnfloresv/1brc-elixir@`b0a81e5`)	00:57.48	128796	18:44.84	N/A	Low-CPU, Low-Memory
rparcus/ex_1brc@`a61f018`)	00:04.85	1454096	OOM	N/A	Very-Fast, High-Memory, Has-Dependencies
rrcook/brc@`4b19ecb`)	00:39.00	482720	OOM	N/A	High-Memory, Moderate-CPU
andypho/1brc@`6a6da4f`	00:56.39	109984	19:48.36	118852	Low-CPU, Low-Memory
stevensonmt/brc@`cd1f939`	See Clean	-	-	-	See clean, no longer testing on the i7

Clean System - Nothing else running with it

Commit	Time (50M)	Mem kb (50M)	Time (1B)	Mem kb (1B)	Comments
Kartstig/1brc_erl@`b0a42d6`)	00:41.79	736496	15:08.69	13503400	High-Memory, Moderate-CPU
jesperes/erlang_1brc@`09b5ffa`)	Pending	N/A	Pending	N/A	Pending
IceDragon200/1brc_ex@`2931163`)	00:20.15	760956	07:45.96	856184	Moderate-Memory, High-CPU
mnfloresv/1brc-elixir@`b0a81e5`)	00:56.35	135656	18:14.55	143636	Low-Memory, Low-CPU
rparcus/ex_1brc@`a61f018`)	00:04.50	1565412	01:53.38	14787216	High-Memory, Moderate-CPU
rrcook/brc@`4b19ecb`)	00:37.46	499284	19:00.64	11378640	High-Memory, Moderate-CPU
andypho/1brc@`6a6da4f`	See Dirty	-	-	-	See Dirty, performance would be the same
stevensonmt/brc@`cd1f939`	01:18.39	131592	25:28.49	186184	Low-CPU, Low-Memory

AMD Threadripper 1950X @ ?Ghz / 94.2Gb DDR4 Kartstig

Commit	Time (50M)	Mem kb (50M)	Time (1B)	Mem kb (1B)	Comments
Kartstig/1brc_erl@`b0a42d6`	Pending	N/A	01:30.40	13561004	Pending
jesperes/erlang_1brc@`09b5ffa`	Pending	N/A	Pending	N/A	Pending
IceDragon200/1brc_ex@`2931163`	Pending	N/A	00:44.40	2782672	Pending
mnfloresv/1brc-elixir@`b0a81e5`	Pending	N/A	10:49.12	275052	Pending
rparcus/ex_1brc@`a61f018`	Pending	N/A	Pending	N/A	#93 (reply in thread)
rrcook/brc@`4b19ecb`	Pending	N/A	01:46.60	365120	Pending
andypho/1brc@`6a6da4f`	Pending	N/A	07:51.37	222400	Pending
stevensonmt/brc@`cd1f939`	Pending	N/A	13:29.34	256144	Pending

Wait a minute, how did you get these numbers?

tl;dr i7-2710QE capped at 2.10Ghz, and 16Gb of DDR3 RAM, only 4Gb is readily available for benching on a Lenovo E520. The hardware is 10+ years old.

All implementations were ran on a dirty system (that is, it's actively doing other work, mostly just firefox running so I can check this discussion), I'm currently listening to music which should be a low cpu and low disk job.

Hardware is a 10+ year old laptop CPU, more specifically a i7-2710QE running at a max clock of 2.10Ghz, turbo is disable to avoid overheating since the Lenovo E520 it's currently sitting in was not designed for it.

They are all ran with:

time your_main_entry_point

I always run the 50M test first, so your code will at least compile during that window, it will be ran twice, since all currently listed implementations can finish in under a minute

OOM?

Out-Of-Memory, I have 16Gb of memory, only 4~9Gb is usually free, so that means all of these implementations needed to run under 4Gb at worst, if your implementation OOMs, I will mark it as such no further comment.

Okay, so you sabotaged it for your own implementation!

No, I am aware of my system's constraints and appropriately reduced the amount of work it had to do in order to work within those constraints, relying on the disk and hammering the CPU instead of the memory, all implementations are ran with the same constraints, I try not to use the computer while it's benching, outside of listening to music to pass the time (come on, 18 minutes is a lot of nothing to do you know).

How do I get my implementation listed?

Just comment in this discussion with a link to it, I'll check and try to run it, just be aware of what I'm benchmarking it on.

Okay, I have an implementation!

Great, if it doesn't process 50M rows in under 2 minutes I'm not doing the 1B row test, I'm not wasting time or compute on that (and to be fair, I treat all of my implementations the same)

Okay, I made changes, can I get a re-test?

Sure, just comment that you did and I'll re-test it.

Current Status

After much trial and error and gathering knowledge from everyone's methods, I finally have a fast-enough implementation (there may be more room for improvement):

https://github.com/IceDragon200/1brc_ex/blob/master/src/1brc.workers.blob.maps.chunk_to_worker.exs

This completes in under 7 minutes on an i7-2710QE @ 2.10Ghz (turbo was disabled to maintain a safe operating temp, this laptop was not designed for the CPU after all).

Originally the code was idiomatic elixir, but there are a few things:

File vs :prim_file - if you run the original code under :eperf, one would notice the countless file related modules being called through standard Elixir.File or erlang's :file, while I'm not entirely sure why it has so many processes involved, it likely has to do with concurrent access. The short solution is to cut out all of it and drop down to :prim_file it is likely not concurrently safe, but chances are you're already chewing through the file on a single process as fast as you can, this removes around 2 to 3 layers of indirection and sends
:binary.split/2 - is major the bottleneck, it's unclear if there is a way to make it faster at this point
Float.parse/1 vs :erlang.binary_to_float/1 vs just doing it yourself -
- Float.parse was much slower than expected, until one looks into it's implementation, since Float.parse needs to handle's elixir's niceities, it introduced additional parsing overhead
- :erlang.binary_to_float/1, is much faster, if you don't need all the nice elixir things (e.g. underscores)
- But in this scenario you can do one better and prefer fixed-point math, since all decimals in the source files only have one place, you can extract each
Reading lines, reading chunks or Streams? -
- Reading each line going through IO.read_line/1 is a terrible idea, it is wasteful if one understands how reading line works internally, at the very bottom of it all is :prim_file, a read_line must read a chunk into a prim_buffer and then each character scanned to look for the newline, the result is then returned, this is great, if you didn't have 3+ function calls between each line read and the same scanning over and over and over again.
- Chunks are the way to go, read as much of the file into memory at once and then treat it as one massive binary that can be reduced upon with :binary.split, binaries are pretty optimized and fast
- Streams + read_ahead are pretty good, but you could do more closer to the metal (i.e. :prim_file) without the need for all the overhead, still pretty though

No longer relevant original post

Not exactly the fastest, erlang/elixir has trouble with heavy IO applications in general.
I only tested it against 50 million rows vs the 1 billion, the former completes in around a minute (specifically the 1brc.workers.stream.exs others are 2x to 8x slower), since it pretty much scales linearly from what I've seen, I estimate it would take 20 minutes to complete on my hardware (i.e. Intel Core i7-2710QE).

As of this writing my poor laptop's fan is screaming at me while I run the 1 billion line test.

I tried four different implementations, anything with the reduce suffix is 4x slower than its stream counterpart.

The fastest implementation I have so far is the 1brc.workers.stream.exs on my system it uses 32 workers/erlang-processes (that is 8 logical processors x 4) by taking batches of 100'000 rows at a time for each worker it keeps them quite busy anything less and they end up idling.

Each implementation is mostly self contained outside of the output module which is shared across all four for writing the final result to console/STDIO

If anyone with slightly more recent hardware would be willing to provide some better numbers that would be swell.

jesperes · 2024-01-07T11:18:03Z

jesperes
Jan 7, 2024

I was planning to do it in Erlang just for the fun of it, I'll keep you posted on my progress. :)

0 replies

onno-vos-dev · 2024-01-08T15:13:29Z

onno-vos-dev
Jan 8, 2024

@IceDragon200 Did you get your 1brc challenge completed? I managed in 141 seconds on Apple M1 Pro with OTP 26 using plain Erlang. I'll try and push a solution later tonight after some cleaning up :-)

I know that I still have a bottleneck at a specific place, I just don't know how to fix it just yet 😅

3 replies

IceDragon200 Jan 8, 2024
Author

No, it overheated, this is 10+ year old hardware fyi, I haven't had the time to look into it to see where I'm losing most of my cycles yet

IceDragon200 Jan 16, 2024
Author

@onno-vos-dev Did you ever get your implementation uploaded?

onno-vos-dev Jan 16, 2024

@IceDragon200 I was stuck in life and work and never pushed to GitHub 😥 Now I'm on vacation so it'll have to wait until Sunday/Monday depending on your timezone 👍 Will update here once I do 👌

rparcus · 2024-01-10T09:44:59Z

rparcus
Jan 10, 2024

I'm trying something with Flow as this is almost the same scenario from their docs.

defmodule VanillaFlow do
  def run() do
    filename = "./data/measurements.txt"
    parent = self()

    File.stream!(filename, :line)
    |> Flow.from_enumerable()
    |> Flow.map(fn line ->
      [ws, temp] = :binary.split(line, ";")
      temp = :binary.split(temp, "\n") |> List.first() |> :erlang.binary_to_float()
      [ws, temp]
    end)
    |> Flow.partition()
    |> Flow.reduce(fn -> :ets.new(:words, []) end, fn [ws, temp], ets -> # ETS

      case :ets.lookup(ets, ws) do
        [] ->
          :ets.insert(ets, {ws, {temp, temp, temp}})

        [{_, {current_min, current_mean, current_max}}] ->
          :ets.insert(ets, {ws, {min(current_min, temp), (current_mean + temp) / 2, max(current_max, temp)}})
      end
      ets
    end)
    |> Flow.on_trigger(fn ets ->
      :ets.give_away(ets, parent, [])

      {[ets], :new_reduce_state_which_wont_be_used} # Emit the ETS
    end)
    |> Enum.to_list()
    # then tab2list...
  end
end

2 replies

stevensonmt Jan 16, 2024

Love this. Did you see the docs suggestion to avoid a single large file as a stream?

streams = for file <- File.ls!("dir/with/files") do
  File.stream!("dir/with/files/#{file}", read_ahead: 100_000)
end

streams
|> Flow.from_enumerables()

might lead to some small gains.

rparcus Jan 17, 2024

The single file source is indeed a huge bottleneck when using File.stream!. In real life, it would make sense to split the file in smaller parts but this challenge was about a single file. I got some interesting results following other implementations here using the low level :prim_file API.

rparcus · 2024-01-11T10:29:11Z

rparcus
Jan 11, 2024

Found an old post from 2007 about file processing in erlang. They were also referencing some 1 million (not billion) rows problem which seemed to take 6-7 seconds on their hardware.
The comments section get's pretty hot, with mentions of "language wars", Java, Ruby, etc.

How far we've come :)

1 reply

jesperes Jan 14, 2024

I contributed an Erlang benchmark to https://github.com/PlummersSoftwareLLC/Primes, mostly for fun, and when I posted about it in the Erlanger slack describing the dismal performance (compared to Rust), people were almost arguing that I shouldn't have done it because that is not what Erlang is good for. Like I had shown Erlang in bad light, or something.

jesperes · 2024-01-14T15:40:02Z

jesperes
Jan 14, 2024

Pure Erlang solution which runs in 130s on the full 1B input: https://github.com/jesperes/erlang_1brc/blob/main/src/aggregate.erl. Almost all the time is spent inside binary:split/3.

5 replies

IceDragon200 Jan 14, 2024
Author

It's always nice to see erlang so I can learn about modules I wouldn't otherwise use and ~~steal~~ to improve my own implementations, jokes aside, I didn't even think of writing a specialized float parser, I switched from Float.parse/1 to :erlang.binary_to_float/1.

Unfortunately on my hardware, I'm already losing 30s on just reading the file with no processing, could you share your hardware specs for comparison?

rparcus Jan 14, 2024

Cool!

Almost all the time is spent inside binary:split/3

The parsing can probably be faster with some pattern matching. In elixir, it could look like this:

defmodule Parsers do
  # Generates a function that takes binary strings representing numbers from -99 to 99
  # and returns the corresponding integer. The numbers may be followed by more binary information such as
  # a newline character that will be ignored.
  @compile {:inline, temp2float: 1, split: 1}

  for int <- 0..99 do
    for dec <- 0..9 do
      str_int = Integer.to_string(int)
      str_dec = Integer.to_string(dec)
      the_match = "#{str_int}.#{str_dec}"
      float_val = String.to_float("#{str_int}.#{str_dec}")
      def temp2float(<<"-", unquote(the_match), _>>), do: -unquote(float_val)
      def temp2float(<<unquote(the_match), _>>), do: unquote(float_val)
      def temp2float(<<"-", unquote(the_match)>>), do: -unquote(float_val)
      def temp2float(unquote(the_match)), do: unquote(float_val)
    end
  end

  # Generates a function that takes binary strings and splits them in 2 parts:
  # - the first part can have max length of 100 bytes and is delimited by a semicolon.
  # - the second part is a float number that may be followed by a newline character.
  for x <- 1..100 do
    ws_length_in_bits = x * 8

    def split(<<ws::bitstring-size(unquote(ws_length_in_bits)), ";", temp::binary>>),
      do: {ws, temp}
  end
end

IceDragon200 Jan 14, 2024
Author

Or you could just do fixed-point integers (just like jesperen):

  defp binary_to_fixed_point(<<?-, d2, d1, ?., d01>>) do
    -(char_to_num(d2) * 100 + char_to_num(d1) * 10 + char_to_num(d01))
  end

  defp binary_to_fixed_point(<<?-, d1, ?., d01>>) do
    -(char_to_num(d1) * 10 + char_to_num(d01))
  end

  defp binary_to_fixed_point(<<d2, d1, ?., d01>>) do
    char_to_num(d2) * 100 + char_to_num(d1) * 10 + char_to_num(d01)
  end

  defp binary_to_fixed_point(<<d1, ?., d01>>) do
    char_to_num(d1) * 10 + char_to_num(d01)
  end

On the topic of :binary.split, it's unfortunately unavoidable, I ran the eperf on my own code, and :binary.split is indeed eating up most of the time, we aren't getting that any faster since it's already an nif.

jesperes Jan 14, 2024

I'm going to take a look inside binary:split/3 to try to understand how it interacts with binaries. I suspect it is going to be difficult to squeeze much more performance out of my machine, but who knows.

jesperes Jan 14, 2024

The 130s figure is on a i7-10610U CPU, so it is pretty spiffy but no monster.

rparcus · 2024-01-14T16:38:11Z

rparcus
Jan 14, 2024

We should be able to go faster if we could read the file concurrently. That's what the Java implementations are doing after all.

Does anyone know if we could use Erlang's pread/2 to achieve that?

3 replies

IceDragon200 Jan 14, 2024
Author

pread won't work unless you know the position of chunks/lines ahead of time I think

You can drop down to :prim_file like with jesperen's code to shed some of the file overhead: about 33% in my tests (overall still too slow on my hardware, at this point I think it's just a hardware bottleneck)

jesperes Jan 14, 2024

I'm using prim_file:read/2 in my solution. It works fine, but you need to compensate for the last line which will usually be cut into two pieces. I do essentially this:

loop(FD) -> loop(FD, <<>>).
loop(FD, PrevBuf) ->
    {ok, NewBuf} = prim_file:read(FD, ?BUFSIZE),
    [First, Rest] = binary:split(NewBuf, <<"\n">>),
    Chunk = <<PrevBuf/binary, First/binary>>,
    process_chunk(Chunk), %% Chunk is now a binary with a whole number of lines
    loop(FD, Rest).

jesperes Jan 14, 2024

I've tried both prim_file:read/2 and prim_file:pread, and haven't seen any big difference.

onno-vos-dev · 2024-01-14T17:40:03Z

onno-vos-dev
Jan 14, 2024

I use pread in my approach and chop off the last bit of the binary that doesn't form a full line and append it to the next chunk. This allows me to decouple reading and processing and do both in parallel.

"Unfortunately" I went on vacation before I could clean it up and push my solution so it'll have to wait a week.

In essence though the flow is as followed:
Start a reader process which uses prim_file:pread to read chunks. Hand a chunk over to annother process to find a set of complete lines. Hand that over to a worker that processes it and accumulates state in the "chunkier" process. This way I can keep all cores on an M1 Mac busy all the time.

Most time was spent in binary_to_float/1 so curious how much improvements @jesperes float parsing brings to my approach. Will try that next week when I'm back at home 👍

3 replies

jesperes Jan 14, 2024

I'm doing something similar. The original caller process starts N "pipelines", and then goes on reading the input file in chunks, and sending the chunks to one of the N pipelines in a round-robin fashion. Each of the processing pipelines consists of two processes, one which calls binary:split/3 and another one which parses each line.

rrcook Jan 14, 2024

How do you account for a partial line at the end of a chunk going to one pipeline and the rest of that line going to the next pipeline?

IceDragon200 Jan 15, 2024
Author

Read a chunk (N bytes) and then immediately read a line afterwards, append the result to your chunk before sending it to the worker (that's what I did at least)

{:ok, blob} = IO.read(file, count)
{:ok, line} = IO.read_line(file)
blob = <<blob::binary, line::binary>>

You will of course have to check if you've reached the end of the file in both cases, but that's the gist of it

IceDragon200 · 2024-01-14T17:55:54Z

IceDragon200
Jan 14, 2024
Author

Well, finally finished my first run: 17 minutes, I think I can do better even with my potato hardware, I'll have new results in... 20 minutes:

Good news: down to 8 minutes, https://github.com/IceDragon200/1brc_ex/blob/master/src/1brc.workers.blob.maps.chunk_to_worker.exs on my hardware

8 replies

jesperes Jan 15, 2024

How did you find out about prim_file? I did a search trying to find it in Erlang docs but they're so confusing I couldn't find how to use it, I had to use your code as a template.

Well, I knew about the prim_file module from reading source code at work, and looked up its usage by reading the source code. :)

IceDragon200 Jan 15, 2024
Author

I found out about prim_file because of jesperes, I tried looking it up from the docs as well but it was nowhere to be found, but I do have elixir and erlang's sources locally available so I usually just read them there

rrcook Jan 16, 2024

I'm running your (IceDragon200) version on my machine, and it's taking about half the time as mine. I don't know how much more I'll work on it because each change basically turns it into your version.
However I will point out that you're incorrect in implementing the mean. My numbers are matching the java implementation, you're aren't. It looks like you're adding and dividing by two as you go, where the mean is usually being calculated by keeping a count of how many lines are done for each city, and a running total on the temperatures, then total/count at the end.

IceDragon200 Jan 16, 2024
Author

Ah, whoops, how off was it? decimals or whole numbers off?

EDIT: Fixing it made it a tad bit faster now, since I don't do a rolling average (which doesn't work with multiple processes, so I'll keep that in mind next time)

jesperes Jan 16, 2024

Yeah, it is worth keeping in mind that you can process the lines in any order you want, and then do the final computation of the mean at the end.

jesperes · 2024-01-15T07:51:52Z

jesperes
Jan 15, 2024

How do you account for a partial line at the end of a chunk going to one pipeline and the rest of that line going to the next pipeline?

For chunks C1, C2, C3, ... Cn, I split C2 once at \n into {C2a, C2b}, and process C1+C2a together, then C2b + C3a, then C3b + C4a, etc.

0 replies

jesperes · 2024-01-15T07:55:14Z

jesperes
Jan 15, 2024

One problem with binary:split/3 is that it needs to construct a list holding all the sub-binaries; there is no way to process them streaming. I'd like to have lazily evaluated iterator like you can easily do in Rust:

input.split('\n').fold(...)

so that you can fold over the binaries without having to construct a (huge) list first.

2 replies

rparcus Jan 15, 2024

Could you try String.splitter for that? It should behave as you describe. No idea if it will be faster though.

jesperes Jan 15, 2024

Well, no, because String.splitter is Elixir and I am, unfortunately, an Erlang-person. :)

(I mean, I could write my own fold-method which splits on line at a time, which I presume is what String.splitter does, but presumably the Elixir function is also implemented in beam-space and not as a nif, so it is likely not much faster.)

rparcus · 2024-01-15T09:56:44Z

rparcus
Jan 15, 2024

Even though the challenge is about using "just java" or Elixir/Erlang in this case, this Explorer solution seemed quite interesting to me.

defmodule WithExplorer do
  # Results
  # [
    # 1_000_000_000: 675483.000ms,
    #   500_000_000: 58244.713ms,
    #   100_000_000: 10321.046ms,
    #    50_000_000: 5104.949ms,
  # ]
  require Explorer.DataFrame
  alias Explorer.{DataFrame, Series}

  @filename "./data/measurements.txt"

  def run() do
    parent = self()

    results = @filename
    |> DataFrame.from_csv!(header: false, delimiter: ";", eol_delimiter: "\n")
    |> DataFrame.group_by("column_1")
    |> DataFrame.summarise(min: Series.min(column_2), mean: Series.mean(column_2), max: Series.max(column_2))
    |> DataFrame.arrange(column_1)
    
    # This could be improved... Would be nice to do it inside Explorer
    for idx <- 0..(results["column_1"] |> Series.to_list() |> length() |> Kernel.-(1)) do
      "#{results["column_1"][idx]}=#{results["min"][idx]}/#{:erlang.float_to_binary(results["mean"][idx], decimals: 2)}/#{results["max"][idx]}"
    end
  end
end

The eager backend scaled linearly up to about 7GB (500 mil lines) using all my cores. Something happens with larger files that will make my machine do a lot of disk IO, and the 1 billion version performs ~6x slower.
Also, moving the position of DataFrame.arrange up one line will massively impact performance.

0 replies

mnfloresv · 2024-01-16T13:14:38Z

mnfloresv
Jan 16, 2024

Another Elixir implementation, which has been my first program in this language. It takes about 12 minutes with 1B rows on an ultra-low-voltage processor that is 7 years old.
https://github.com/mnfloresv/1brc-elixir

1 reply

IceDragon200 Jan 16, 2024
Author

I love the implementation, but unfortunately it's not fast... very concise though.
Thanks for sharing! I learned something from it

Here are the stats on my hardware:
i7-2710QE @ 2.10Ghz:

./calculate_average.exs  4075.50s user 324.30s system 391% cpu 18:44.84 total

Kartstig · 2024-01-16T14:39:49Z

Kartstig
Jan 16, 2024

Another Erlang implementation:

https://github.com/Kartstig/1brc_erl

Takes ~80s on my Threadripper 1950X

However, I haven't utilized prim_file or memory mapping like these top contenders. Instead I'm reading the file in a single process upfront, and then distributing the binary chunks. I think there's some room for improvement

9 replies

IceDragon200 Jan 16, 2024
Author

So good news: I think it completed on my system, I had to kill everything not needed outside of my desktop environment (i3 btw), I haven't checked the stats yet though, I'll have an update in an hour or so

IceDragon200 Jan 16, 2024
Author

@Kartstig Good news, I have the test repo up https://github.com/IceDragon200/1brc_erl_ex_test/tree/master

Bad news: I deleted the results (while fixing up the repo) before updating the table, I'm stupid, so we'll have to wait another hour or so for the results, again...

Kartstig Jan 19, 2024

@IceDragon200 I tried running it. Got through all the submodues, but I had some issue with the symlink:

I ran them one by one moving the file into each directory manually live a caveman:

@andypho - Slow, but barely noticed any memory usage. Was very light on my cores

	Command being timed: "./run.exs"
	User time (seconds): 1546.34
	System time (seconds): 103.24
	Percent of CPU this job got: 349%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 7:51.37
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 222400
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 2595
	Minor (reclaiming a frame) page faults: 45784842
	Voluntary context switches: 2237266
	Involuntary context switches: 40443
	Swaps: 0
	File system inputs: 0
	File system outputs: 24
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

@IceDragon200 - Quick. Used all my cores and a good chunk of memory

	Command being timed: "./src/1brc.workers.blob.maps.chunk_to_worker.exs"
	User time (seconds): 1329.40
	System time (seconds): 12.61
	Percent of CPU this job got: 3022%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:44.40
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 2782672
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 2444
	Minor (reclaiming a frame) page faults: 4103827
	Voluntary context switches: 16165
	Involuntary context switches: 150534
	Swaps: 0
	File system inputs: 0
	File system outputs: 24
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

@Kartstig - Lots of memory usage :). Probably too much core activity

	Command being timed: "./run.sh ./measurements.txt"
	User time (seconds): 2549.36
	System time (seconds): 8.99
	Percent of CPU this job got: 2829%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 1:30.40
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 13561004
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 5847
	Minor (reclaiming a frame) page faults: 3421897
	Voluntary context switches: 7770
	Involuntary context switches: 207806
	Swaps: 0
	File system inputs: 48
	File system outputs: 96
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

@rparcus
I got an error:

* (HEAD detached at 2b83bd1) 2b83bd1 Try solutions with pure erlang/elixir
  main                       2b83bd1 Try solutions with pure erlang/elixir
    warning: Flow.from_enumerable/1 is undefined (module Flow is not available or is yet to be defined)
    │
  9 │     |> Flow.from_enumerable()
    │             ~
    │
    └─ lib/entries/6_normal_parse.ex:9:13: NormalParser.run/0

    warning: Flow.map/2 is undefined (module Flow is not available or is yet to be defined)
    │
 10 │     |> Flow.map(fn str ->
    │             ~
    │
    └─ lib/entries/6_normal_parse.ex:10:13: NormalParser.run/0

    warning: Flow.on_trigger/2 is undefined (module Flow is not available or is yet to be defined)
    │
 17 │     |> Flow.on_trigger(fn ets ->
    │             ~
    │
    └─ lib/entries/6_normal_parse.ex:17:13: NormalParser.run/0

    warning: Flow.partition/2 is undefined (module Flow is not available or is yet to be defined)
    │
 14 │     |> Flow.partition(key: fn {ws, _} -> ws end, window: Flow.Window.global())
    │             ~
    │
    └─ lib/entries/6_normal_parse.ex:14:13: NormalParser.run/0

    warning: Flow.reduce/3 is undefined (module Flow is not available or is yet to be defined)
    │
 16 │     |> Flow.reduce(fn -> :ets.new(NormalParser, []) end, &reducer/2)
    │             ~
    │
    └─ lib/entries/6_normal_parse.ex:16:13: NormalParser.run/0

    warning: Flow.Window.global/0 is undefined (module Flow.Window is not available or is yet to be defined)
    │
 14 │     |> Flow.partition(key: fn {ws, _} -> ws end, window: Flow.Window.global())
    │                                                                      ~
    │
    └─ lib/entries/6_normal_parse.ex:14:70: NormalParser.run/0

** (FunctionClauseError) no function clause matching in BetterFileReader."-inlined-int2atom/1-"/1    
    
    The following arguments were given to BetterFileReader."-inlined-int2atom/1-"/1:
    
        # 1
        193
    
    (ex_1brc 0.1.0) lib/entries/13_better_file_reader.ex:176: BetterFileReader."-inlined-int2atom/1-"/1
    (elixir 1.16.0) lib/enum.ex:4368: Enum.map_range/4
    (elixir 1.16.0) lib/enum.ex:4368: Enum.map_range/4
    (elixir 1.16.0) lib/enum.ex:4368: Enum.map/2
    (ex_1brc 0.1.0) lib/entries/13_better_file_reader.ex:12: BetterFileReader.run/0
    run-better_file_reader.exs:2: (file)
Command exited with non-zero status 1
	Command being timed: "./run-better_file_reader.exs"
	User time (seconds): 0.50
	System time (seconds): 0.16
	Percent of CPU this job got: 134%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.49
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 94052
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 2880
	Minor (reclaiming a frame) page faults: 19593
	Voluntary context switches: 4201
	Involuntary context switches: 338
	Swaps: 0
	File system inputs: 8
	File system outputs: 8
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 1

@rrcook - Moderate Core usage.

	Command being timed: "./brc ./measurements.txt"
	User time (seconds): 1682.81
	System time (seconds): 157.63
	Percent of CPU this job got: 1726%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 1:46.60
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 365120
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 1318
	Minor (reclaiming a frame) page faults: 50632251
	Voluntary context switches: 3151267
	Involuntary context switches: 175299
	Swaps: 0
	File system inputs: 48520
	File system outputs: 32
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

@stevensonmt - Light core usage

	Command being timed: "./run-1B.exs "
	User time (seconds): 3020.96
	System time (seconds): 117.49
	Percent of CPU this job got: 387%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 13:29.34
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 256144
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 2779
	Minor (reclaiming a frame) page faults: 41150466
	Voluntary context switches: 1858647
	Involuntary context switches: 98143
	Swaps: 0
	File system inputs: 96
	File system outputs: 24
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

@mnfloresv - Light core usage

	Command being timed: "mix run --no-mix-exs ./calculate_average.exs"
	User time (seconds): 1755.07
	System time (seconds): 103.98
	Percent of CPU this job got: 286%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 10:49.12
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 275052
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 2489
	Minor (reclaiming a frame) page faults: 37453537
	Voluntary context switches: 2562679
	Involuntary context switches: 18938
	Swaps: 0
	File system inputs: 0
	File system outputs: 24
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

stevensonmt Jan 19, 2024

Cool, thanks for running those. I'm clearly not concurrent enough but have several ideas for how to improve it. I assumed Task.async_stream with max_concurrency of the number of cores would max out the system without any other manual manipulation but I guess it does not.

IceDragon200 Jan 19, 2024
Author

Updated the table

And that's weird with the symlinks, worked fine on both of my machines are you running this on linux?

rparcus has multiple implementations, the one I usually pull the stats from is the with_explorer one, the other two were attempts without it, he last said to disregard that specific implementation due to a race condition though.

Did I miss anything else?

IceDragon200 · 2024-01-16T15:28:57Z

IceDragon200
Jan 16, 2024
Author

@jesperes Could you provide a bash script to run your code, I'm putting together a table with the current times using my hardware for comparison I just have no idea how to run yours

3 replies

jesperes Jan 24, 2024

I've done some improvements and incorporated some fixes from @garazdawi. There is a script which can be run using run.sh [filename] which will print the elapsed time on stdout.

Feel free to update the table at the top when/if you get around to rerunning it.

jesperes Jan 24, 2024

Oh, I didn't realize that there was a scaffolding project. I'll make a PR to it to update the submodule for my implementation.

IceDragon200 Jan 24, 2024
Author

@jesperes re-ran the benchmark, but the numbers are about the same

IceDragon200 · 2024-01-16T16:11:20Z

IceDragon200
Jan 16, 2024
Author

Okay, everyone I've tried gathering some concrete numbers from my hardware (and almost killed it in the process) running everyone's implementation.

I think we can all appreciate a single piece of hardware running it and giving somewhat consistent numbers yes?

You can check the primary comment for your implementation, some times and my comments about it, for everyone who's new, you can link your implementation by simply commenting and I'll get it up within an hour of seeing it, usually.

4 replies

rparcus Jan 16, 2024

Could you check this one also? https://github.com/rparcus/ex_1brc/blob/main/lib/entries/10_with_explorer.ex
You probably should turn on :lazy when by uncommenting lines 30 and 45.

Related chat: https://elixirforum.com/t/performance-with-explorer-scales-linearly-with-size-and-then-suddenly-degrades-on-large-file/60987

IceDragon200 Jan 16, 2024
Author

Sure, give me however long it takes to run, and I'll probably need to enable :lazy yeah

EDIT.1: Very promising, 4 seconds on 50M rows, but it doesn't output the result so I can't verify if it's correct
EDIT.2: Enabled lazy, got error

** (RuntimeError) the function `pull/2` is not available for the Explorer.PolarsBackend.LazyFrame backend. Please use Explorer.DataFrame.collect/1 and then call this function upon the resultant
dataframe
    (explorer 0.7.2) lib/explorer/polars_backend/lazy_frame.ex:505: Explorer.PolarsBackend.LazyFrame.pull/2
    (explorer 0.7.2) lib/explorer/data_frame.ex:3737: Explorer.DataFrame.pull_existing/2
    (explorer 0.7.2) lib/explorer/data_frame.ex:372: Explorer.DataFrame.fetch/2
    (elixir 1.16.0) lib/access.ex:309: Access.get/3
    (ex_1brc 0.1.0) lib/entries/10_with_explorer.ex:47: WithExplorer.run/0
    run.exs:2: (file)

EDIT.3: One moment, enabled collect, trying again, modified output:

    IO.write "{"
    for idx <- 0..(results["column_1"] |> Series.to_list() |> length() |> Kernel.-(1)) do
      min = :erlang.float_to_binary(results["min"][idx], decimals: 2)
      mean = :erlang.float_to_binary(results["mean"][idx], decimals: 2)
      max = :erlang.float_to_binary(results["max"][idx], decimals: 2)
      "#{results["column_1"][idx]}=#{min}/#{mean}/#{max}"
    end
    |> Enum.intersperse(", ")
    |> IO.write()
    IO.write "}"

EDIT.4: 4.7s on 50M rows, OOM on 1B rows
EDIT.5: Will try to rerun after freeing up more memory (will require that I rebench everyone to be fair though)

rrcook Jan 16, 2024

GIve mine a try please, https://github.com/rrcook/brc

After you git clone, 'mix deps.get', 'mix escript.build' then './brc measurements.txt'. I have timing built in, it will tell you the number of milliseconds after the output.

IceDragon200 Jan 16, 2024
Author

Sure thing, I'm waiting on the others to finish, then I'll add yours to the list, had to change computers since I need all my resources.

EDIT: Added your repo to the tests, need to rerun everything because I deleted it in my infinite wisdom

IceDragon200 · 2024-01-17T19:38:41Z

IceDragon200
Jan 17, 2024
Author

@rparcus Broke one of your new implementations:

** (FunctionClauseError) no function clause matching in BetterFileReader."-inlined-int2atom/1-"/1    
    
    The following arguments were given to BetterFileReader."-inlined-int2atom/1-"/1:
    
        # 1
        193
    
    (ex_1brc 0.1.0) lib/entries/13_better_file_reader.ex:176: BetterFileReader."-inlined-int2atom/1-"/1
    (elixir 1.16.0) lib/enum.ex:4368: Enum.map_range/4
    (elixir 1.16.0) lib/enum.ex:4368: Enum.map_range/4
    (elixir 1.16.0) lib/enum.ex:4368: Enum.map/2
    (ex_1brc 0.1.0) lib/entries/13_better_file_reader.ex:12: BetterFileReader.run/0
    run-better_file_reader.exs:2: (file)

4 replies

rparcus Jan 18, 2024

Wow your new system has more cores than mine^^.

At the bottom of the file, where that function is generated, there is a comment about this. Could you bump the number up to be equal or greater than the number of cores you have?

IceDragon200 Jan 18, 2024
Author

Strange, I only have 6 cores / 12 threads, you'd think that would fit in the range you have there

IceDragon200 Jan 18, 2024
Author

iex(2)> :erlang.system_info(:logical_processors)
32

Okay, I have more than I thought I did, what is going on here

rparcus Jan 18, 2024

I thought a bit about it and I think that there is a race condition issue with the implementation :( Please disregard :(

IceDragon200 · 2024-01-17T20:39:09Z

IceDragon200
Jan 17, 2024
Author

Everyone, the Ryzen numbers are up... and while I want to chalk it up to bias hardware being bias... look at the numbers yourself (gosh I look good up there)

The most surprising is @rparcus was able to claim the 50M spot, but his implementation seems to have fallen apart at the 1B test.

I have no idea why, considering the intel test smoked everyone else, so I'd like to believe it has something to do with the Intel vs AMD optimizations in the nifs themselves.

Other implementations did, as expected with faster hardware, same thing, just faster.

3 replies

stevensonmt Jan 17, 2024

Any idea what's going on with the memory use here:

https://github.com/mnfloresv/1brc-elixir/commit/b0a81e55c2950f45a75660a93c947ce7a609f6d6)	00:33.16	192140	11:42.23	184936

with the 1B run using LESS memory than the 50M?

rrcook Jan 17, 2024

2nd place for the non-Explorer implementations, I'll take it.
I also see that @IceDragon200 's 1B line run uses 10x the memory of mine, that leads me to think I could tune with bigger buffers passed to the workers. That and the fact that my memory doesn't go up from 50M lines to 1B lines.

IceDragon200 Jan 18, 2024
Author

@rrcook My implementation scales by the number of logical processors, which I just learned was not 12 but 32, so it's actually 128 workers instead of 48 that chewed up chunks of 16Mb, so it likely under-performed (which is scary to think about)
EDIT: Just confirmed it, I removed the 4 multiplier on the worker_count, and doubled the chunk size: it's 3s faster and half the memory usage, but I'll leave the table as is

@stevensonmt It's a fairly stable implementation, it could just be BEAM being BEAM, these are just rough estimates I feel
50M

	Command being timed: "mix run --no-mix-exs ./calculate_average.exs"
	User time (seconds): 100.05
	System time (seconds): 14.00
	Percent of CPU this job got: 343%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:33.16
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 192140
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 2790992
	Voluntary context switches: 119221
	Involuntary context switches: 40379
	Swaps: 0
	File system inputs: 0
	File system outputs: 24
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

1B

	Command being timed: "mix run --no-mix-exs ./calculate_average.exs"
	User time (seconds): 2141.35
	System time (seconds): 264.97
	Percent of CPU this job got: 342%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 11:42.23
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 184936
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 50202419
	Voluntary context switches: 2016894
	Involuntary context switches: 61158
	Swaps: 0
	File system inputs: 0
	File system outputs: 24
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

IceDragon200 · 2024-01-19T14:08:22Z

IceDragon200
Jan 19, 2024
Author

@garazdawi Results are up, considering your low CPU and low memory footprint, I really wish I could have ran jesperen's original version as well for comparison, that is some amazing performance, makes me think I could squeeze some more out of my own implementation right now.

Bonus points: the 50M completed in under 6s on my intel cpu when I was testing the test scripts, the only other implementation that did that was @rparcus explorer

6 replies

IceDragon200 Jan 19, 2024
Author

Ah, so I'll just use your repo for his results then, give me a moment, could you shed some light on why the implementation is this fast though?

garazdawi Jan 19, 2024

My implementation is faster than Jesper's because I use bit syntax instead of binary:split. Using bit syntax does not build the huge list that is otherwise generated. I also avoid a copy of the binary by not joining the chunk binaries before sending them to the process.

I haven't looked at the other implementations here, so can't say what makes the difference there.

IceDragon200 Jan 19, 2024
Author

Ah, I see it now, it's faster because all it has to do is scan the string and then count how many bits/bytes it needs to extract before the semicolon, completely avoiding the list construction from binary.split

You can safely slice the string without creating a new copy, brilliant, I'll try it in my implementation and see if that improves performance on my intel system

stevensonmt Jan 19, 2024

@garazdawi, I'm having a hard time following your code (b/c I'm pretty unfamiliar with Erlang mostly) but I think I'm doing something similar in my newer approach with macros in Elixir. Is this what you mean by "use bit syntax instead of binary:split"?

@compile {:inline, parse_line: 1}
  for x <- 1..100 do
    station_len = x * 8

    for i <- 0..99 do
      temp_int = Integer.to_string(i)

      for d <- 0..9 do
        temp_dec = <<d + ?0>>
        temp_str = "#{temp_int}.#{temp_dec}"

        def parse_line(
              <<station::bitstring-size(unquote(station_len)), ";", unquote(temp_str),
                _rest::binary>>
            ),
            do: {station, unquote(i) + unquote(d) / 10}

        def parse_line(
              <<station::bitstring-size(unquote(station_len)), ";-", unquote(temp_str),
                _rest::binary>>
            ),
            do: {station, -1 * (unquote(i) + unquote(d) / 10)}
      end
    end
  end

garazdawi Jan 19, 2024

My first version did something similar to that, but I realised that I could do it simpler, so the latest version is just a simple loop.

IceDragon200 · 2024-01-19T16:14:04Z

IceDragon200
Jan 19, 2024
Author

Updated Ryzen times for: @IceDragon200 @garazdawi and @jesperes

Also rerunning the entire suite to see if the figures change drastically, I think garazdawi's code wasn't JIT-ed at the time of testing hence the slower time

10 replies

IceDragon200 Jan 20, 2024
Author

@stevensonmt Ran and updated (figures are about the same), you accidentally committed a massive log file though, not sure if that was intended

stevensonmt Jan 20, 2024

That's bizarre. I'm consistently getting 30 seconds on my 4 cores for 50M. Can't imagine what's happening to make it slower with more cores. I will try to see what's going on with the log file. I think lexical crashed and that's probably the log file.

So yeah lexical crashed and kept trying to restart. Thanks for pointing that out.
I cloned your repo to try and figure out why I'm getting such different results. I ran your data generation script and it produces temps with two decimals like so "Cairo;23.4.1". I had changed it to fix that in my own data generation. How are you generating the data when you run them?

Also I see the difference in our timing methods now, which explains why I'm so far off in my estimates. I was going off running :timer.tc(fn -> run_file(file) end) from within the algorithm. With your method of time ./my_impl_binary /path/to/measurements.txt I get

It took 27981.012 milliseconds # this is the final line of the output from my implementation
./brc lib/measurements.txt 87.16s user 6.14s system 329% cpu 28.290 total

I'm not sure what accounts for the huge discrepancy. Starting the VM and cleaning up the tables are the only things not captured in the run_file/1 call, and those should probably not take 50 seconds.

stevensonmt Jan 21, 2024

I just hand timed a run and the wall-clock time it takes is apparently equivalent to what time reports in the last total which is similar to what the :timer.tc output is.

It took 33055.563 milliseconds
./brc lib/measurements.txt 121.72s user 3.84s system 376% cpu 33.357 total
stopwatch: 36s

I feel like I'm taking crazy pills over here. That 121.72 is somewhat close to the number of cores * the 33.357 total. Is there some bug in time that has if reporting the cpu total time labelled as the user time and the user time labelled as the cpu total time?

These numbers are all for a 50M row data file.

I ran @IceDragon200's implementation and got

elixir src/1brc.workers.blob.maps.chunk_to_worker.exs 26.14s user 0.44s system 368% cpu 7.214 total

while my most recent iteration is getting

It took 22607.198 milliseconds
./brc lib/measurements.txt 82.35s user 0.52s system 361% cpu 22.914 total

so I would expect mine to run at maybe 10-15 seconds on the machine that's getting 3s for his implementation. This is really puzzling to me.

IceDragon200 Jan 21, 2024
Author

Let's unfold it:

It took 33055.563 milliseconds # assuming this is from your :timer.tc/2 call

./brc lib/measurements.txt 
121.72s user (how much cpu time the process spent, this can exceed the wall clock or total based on the number of cores if you run it concurrently, makes sense)
3.84s system (how much time was spent on kernel level calls)
376% cpu (how much cpu overall was used)
33.357 total (this is the wall clock time)

stopwatch: 36s # if you timed this by hand your reaction time may have been a bit slouchy

Honestly I think it's fine, btw I used GNU time for the tests

stevensonmt Jan 21, 2024

Thanks for the explanation, but what seems crazy to me is that you got a slower time on better hardware (I assume your table is reporting wall-clock time). I also used GNU time when I ran them.

Name : time
Version : 1.9-4
Description : Utility for monitoring a program's use of system resources
Architecture : x86_64
URL : https://www.gnu.org/software/time/

And yes, my reaction time is not great on the stopwatch, lol.

BTW, I don't expect you to troubleshoot this or anything. I just don't like not understanding what could be responsible for the differences.

rparcus · 2024-01-21T16:00:44Z

rparcus
Jan 21, 2024

Even if :prim_file is not thread safe, one can use a semaphore to regulate access and let each worker read its own chunk with :prim_file.

A worker process could:

claim access to the file,
read a chunk,
inform the scheduler process that the file now can be used again,
proceed to parse the chunk and calculate partial results while other processes also read their own chunks.

I thought it would be faster because we don't need to pass partial chunks of binaries around between process, but turns out that it's about 30% slower than something similar to what @IceDragon200's implementation does, where one process reads and all other workers parse.

My lesson learned here is:

don't try to reinvent the BEAM's scheduler
next weekend I'm going to spend some time studying the implementation of message-passing because I'm sure to find some really smart things there to make it so good.

0 replies

onno-vos-dev · 2024-01-22T15:11:04Z

onno-vos-dev
Jan 22, 2024

Managed to tidy up my implementation a bit and push it (1brc) and create a PR to add my implementation @IceDragon200 👍 See: IceDragon200/1brc_erl_ex_test#1

It's not the cleanest code I ever wrote but I don't believe that was part of the challenge here 😅

I'm using @jesperes Float parsing but otherwise the structure is fairly similar to what the others have tried. Interestingly enough my approach beats @garazdawi (by a hair 😅) by using plain binary:split/3.

Here are the results from my Mac M1 Pro:

Versions used for testing:
Erlang: 26.0
Elixir: 1.16

50M Results:

Contributor	Time (50M)	CPU% (50M)	Mem kb (50M)	Comments
stevensonmt	0:00.42	190%	71472	CRASHED - INVALID
rparcus-with_explorer	0:01.15	576%	1274192
garazdawi	0:01.63	797%	143696
IceDragon200	0:02.04	647%	734752
jesperes	0:02.07	699%	4179072
onno-vos-dev	0:02.54	440%	1653024
rrcook	0:06.55	902%	171280
rparcus-better_file_reader	0:09.14	704%	813664
rparcus-just_elixir	0:10.59	250%	163616
andypho	0:12.71	323%	119776
mnfloresv	0:13.00	347%	130768
Kartstig	0:13.43	348%	739920

1B Results:

Contributor	Time (1B)	CPU% (1B)	Mem kb (1B)	Comments
stevensonmt	0:00.60	127%	72480	CRASHED - INVALID
rparcus-with_explorer	0:15.29	806%	25399216
onno-vos-dev	0:27.37	831%	9739296
IceDragon200	0:27.95	879%	1764624
garazdawi	0:29.15	893%	155584
jesperes	1:06.61	549%	21963216
rrcook	2:01.35	884%	797136
rparcus-better_file_reader	3:04.08	742%	1614800
rparcus-just_elixir	3:45.43	228%	153376
andypho	3:54.88	340%	112576
Kartstig	4:04.60	390%	13540320
mnfloresv	4:07.58	359%	134992

13 replies

onno-vos-dev Jan 22, 2024

Oh wow! Interesting to see it perform that much slower on Ryzen HW compared to my M1 😱 Guess I got some more work to do to steal that first place 😅

stevensonmt Jan 22, 2024

What was the error? Just thinking about this the problem might be that my repo includes the executable compiled on my machine which would be incompatible with your Mac. Did you run the test with MIX_ENV=prod mix escript.build to generate an executable in the root directory on your machine before running ./brc /path/to/measurements.txt?
Just ran it on an old AMD athlon 4 core machine and had no problem. Roughly same time as the i5.

IceDragon200 Jan 22, 2024
Author

Yeah I was surprised too, I ran it three times to be sure, it's pretty much the same figure, heck I ran my implementation first to ensure the disk was warmed up (clocked in at 2 minutes instead of 46 seconds like it normally does), but your implementation always runs at around 1 minute, so not sure what's going on there

IceDragon200 Jan 24, 2024
Author

@stevensonmt You changed the main function right? So it no longer takes the second parameter, the test scaffold needed updating so I'll fix that

EDIT: Okay your new results are up

stevensonmt Jan 24, 2024

Thanks! Still a little disappointing but at least it's close to what I'm getting for the 50M on four cores. Thanks again for doing these runs! I'm curious if I'm interpreting the results correctly. Since it's maxing out at <400% CPU usage on a 6-core CPU, I'm guessing either something in my code limits the concurrency or the concurrent tasks complete so quickly that it's never using all cores to the max. Is that accurate? Just curious if I'm doing something to limit it to only 4 cores by accident and I don't catch it b/c all my machines are only 4 cores since the CPU usage is almost the same:

./brc lib/measurements.txt 79.58s user 0.45s system 382% cpu 20.928 total

I noticed I was using a chunk size about 5x more than what @IceDragon200 uses. I reduced it from 80M bytes to 10M and my times are now consistently under 18 seconds for 50M lines on my 4 core CPU. Not sure how to predict the optimal chunk size baed on available hardware.

jesperes · 2024-01-26T14:12:59Z

jesperes
Jan 26, 2024

I have pushed a PR with two optimizations:

Ensure that the match context is reused when performing successive matches over a binary.
Use a pre-computed "compressed" key to store station names. This speeds up lookups and is a prerequisite for (1) to work (at least in my case).

6 replies

jesperes Jan 26, 2024

I'd be happy if you'd ping me whenever you rerun this, but no hurry. :)

IceDragon200 Jan 26, 2024
Author

@jesperes Wish fulfilled

jesperes Jan 26, 2024

Awesome, thanks!

stevensonmt Jan 26, 2024

This is seriously impressive

jesperes Jan 27, 2024

Fun thing: I ran this on my Windows gaming rig (i5-12400F 32GB), and got the blazingly fast runtime of 13.5s (!).

onno-vos-dev · 2024-01-29T14:21:31Z

onno-vos-dev
Jan 29, 2024

@IceDragon200 just merged my latest version. See: IceDragon200/1brc_erl_ex_test#4

Results on my Mac M1 Pro compared to the fastest previous solution from Jesperes. a 17.2% improvement on the 1B results. For some reason its a slower on the 50M results. Something that I'm looking into tonight 👍

50M results:

Contributor	Time (50M)	CPU% (50M)	Mem kb (50M)
jesperes	0:01.00	533%	141216
onno-vos-dev	0:01.61	247%	549280

1B results:

Contributor	Time (1B)	CPU% (1B)	Mem kb (1B)
onno-vos-dev	0:09.44	793%	623056
jesperes	0:11.40	880%	145216

Very excited to see how this'll do on Ryzen and if we can get an "official" sub 10 second time on the board 😅

12 replies

mneumann Jan 30, 2024

Great! With 10k entries, the Erlang process dictionary seem to be the bottle neck. ETS scales better in that regard.

onno-vos-dev Jan 30, 2024

@IceDragon200 Created IceDragon200/1brc_erl_ex_test#9 in order to address my crash 👍

@jesperes You're getting a name_clash crash as your hashing function apparently isn't unique enough (something you mentioned to me before) 😞

@mneumann I ran mine through the 1B 10k test and process dictionary still outperforms ETS by about 3x (3.23x on my M1 Pro). To be clear, I don't think we should start using process dictionary instead of ETS in production systems (there has to be strong reasons to rely on process dict instead of some other storage) but for the sake of the 1brc it still performs well and while being 64% slower, it still handles it well.

Contributor	Time (1B)	CPU% (1B)	Mem kb (1B)	Comments
onno-vos-dev	00:15.51	749%	654672
mneumann	00:50.13	865%	463232

For shits and giggles I also tried a 1B with 41.342 weather stations (all names from the csv present in this repo) to see if process dictionary finally caves in. 🔨

Generated 41 342 station names with length from 2 to 49

While both of are are definitely slower, process dictionary still takes the crown 👑 While process dictionary is 2.28x slower with that many more keys, ETS is only 1.27x slower so ETS takes the crown on that front 🤴

Contributor	Time (1B)	CPU% (1B)	Mem kb (1B)	Comments
onno-vos-dev	00:35.34	785%	3538144
mneumann	01:03.42	882%	494240

Combined set of default 1B, 1B with 10k keys and 1B with 41.342 keys sorted by time.

Funny to see how much memory spiked with 41.342 weather stations on my solution 😅 And how stable yours is 🎉 🥇

Weather Stations	Contributor	Time (1B)	CPU% (1B)	Mem kb (1B)	Mem gb (1B)
413	onno-vos-dev	00:10.47	715%	793344	0.76
10k	onno-vos-dev	00:15.51	749%	654672	0.62
41.342	onno-vos-dev	00:35.34	785%	3538144	3.37
413	mneumann	00:47.14	868%	383936	0.37
10k	mneumann	00:50.13	865%	463232	0.44
41.342	mneumann	01:03.42	882%	494240	0.47

(Disclaimer: Each of these sets was run while I had other windows open such as Slack, Spotify and Chrome which eat a little bit of CPU despite me being hands-off from my keyboard this means slightly higher times than on idle)

mneumann Jan 30, 2024

@onno-vos-dev Are you saying that Process dictionary is always faster than ETS? On my machine, it's 2x slower in the 10k weather stations case. Maybe I am doing something wrong :). Btw, I created an adaptive solution that can start using a Process dict and then upgrades on-the-fly to a ETS when a certain threshold (number of keys) is reached. As it is super generic, it is slightly slower than the specialized version, but not significantly :). I now also support monadic-style Map storage backends :).

onno-vos-dev Jan 30, 2024

@mneumann I don't think Process dictionary is always faster per sé. It might be that if I tweak my solution to use ETS that on the 10k approach, I get better times on ETS. I'll try that out tonight 👍 I think process dictionary probably suffers when dealing with larger process dictionaries especially since they're updated as often as they are 👍

I should have been more clear on this in my above statements. I'll get back to you tonight 👍

How does a map perform? I have memories and experiments of Advent of Code experiments where maps performed poorly on larger number of keys. Would be curious to see how it performs here.

garazdawi Jan 30, 2024

When thinking about this problem early on I thought that maybe using persistent_term + counter would be an option. Though I don't know how to make max or min work with those, but doing the average should be easy enough.

IceDragon200 · 2024-01-30T01:55:01Z

IceDragon200
Jan 30, 2024
Author

Upon request I have ran the 10k weather station test, results are under the default 420 (heh) weather station tests, since this test takes much longer to run, I won't be updating it frequently

0 replies

garazdawi · 2024-01-30T08:36:49Z

garazdawi
Jan 30, 2024

Here are the 1B Results on 224 core Intel Xeon:

Contributor	Time (1B)	CPU% (1B)	Mem kb (1B)	Comments
rparcus-just_elixir	00:04.44	6817%	401820	Exit status: 1
jesperes	00:06.59	3530%	795412
garazdawi	00:07.58	6989%	1007004
rparcus-better_file_reader	00:07.91	3874%	433164	Exit status: 1
rparcus-with_explorer	00:10.68	2945%	423096	Exit status: 1
IceDragon200	00:19.05	2840%	4644476
mneumann	00:30.56	19497%	3899480
onno-vos-dev	00:35.47	1494%	7546968
rrcook	01:53.42	3061%	1050764
Kartstig	05:59.84	395%	13658180
andypho	08:10.22	445%	760772
mnfloresv	08:37.70	494%	832216
stevensonmt	10:02.69	479%	860824

The machine is behind an http proxy that denied access to some packages that rparcus uses which is why those did not work. This is run using Erlang 26.2.1 and latest main Elixir.

4 replies

onno-vos-dev Jan 30, 2024

@garazdawi Wow, I'm surprised mine is at 35 seconds 😱 Are you sure it's using the latest version of the submodule? 🤔 It should be on par with Jesperes and yours 🤔 I have a hard time understanding why it's so much slower 🐌

And thank you for running these on this sort of hardware! 🙇‍♂️

garazdawi Jan 30, 2024

I was surprised as well. I ran 2bfe0d7 from your repo. The machine is not completely idle, so something could have interfered. I'll do another run to see if the results are the same.

onno-vos-dev Jan 30, 2024

Interesting as that commit should be fine. It's funny how it was 2x slower than on the Ryzen hardware 😅 I'll have to do some thinking on this one if the 2nd run is similar 😄

garazdawi Jan 30, 2024

Here are the 1B Results:

Contributor	Time (1B)	CPU% (1B)	Mem kb (1B)	Comments
jesperes	00:06.12	3978%	798728
garazdawi	00:07.54	7175%	1026796
rparcus-just_elixir	00:08.13	3609%	455348	Exit status: 1
rparcus-with_explorer	00:08.75	3343%	435572	Exit status: 1
rparcus-better_file_reader	00:08.93	3245%	463164	Exit status: 1
IceDragon200	00:18.69	2900%	4893844
mneumann	00:30.05	19588%	3935204
onno-vos-dev	00:33.39	1604%	7383496
rrcook	02:02.45	2839%	1035208
andypho	08:40.99	442%	774624
stevensonmt	09:37.82	483%	841048
Kartstig	10:25.33	385%	13657772
mnfloresv	11:47.20	466%	835788

roughly the same on this run.

jesperes · 2024-01-31T21:38:50Z

jesperes
Jan 31, 2024

Updated my solution now, and replaced some shitty home-grown hash function with Fnv32 (thanks @onno-vos-dev) and stole some spawn options from him too. :)

4 replies

IceDragon200 Jan 31, 2024
Author

Results updated for both standard and 10k dataset (congrats on having the fastest standard implementation, kind of hard to beat an NIF in the 10k test)

mneumann Feb 1, 2024

@jesperes Did the spawn options make any difference in performance? I quickly tried some of these out, but did not see a signification change. Or are they only useful when the solution uses a lot memory to disable GC?

jesperes Feb 1, 2024

Yes, the spawn options actually do seem to make a difference, at least in my case. I saw 37s -> 31s, or something in that range.

jesperes Feb 1, 2024

The off_heap thing also makes a couple of seconds difference on the 1B dataset.

onno-vos-dev · 2024-01-31T22:23:42Z

onno-vos-dev
Jan 31, 2024

In case anyone is interested, feel free to rip out my (half-baked) PropEr test (PR: onno-vos-dev/1brc#4) and throw it at your solution 😄

It found some bugs for me and some really interesting ones... Apparently the FNV32a hash for <<"næðl">> and <<"JiÔk">> is the same 🤣 Who would have thought... My half-baked home-grown hash is probably not perfect still but it passes PropEr now for the last few runs I did so it gets the job done for now.

2 replies

IceDragon200 Jan 31, 2024
Author

Updated results, looks like you lost some performance generally, but it's still fast even with 10k cities

onno-vos-dev Jan 31, 2024

I would have expected to have lost a bit of performance but really wanted to get PropEr to not have to do any special things. Now it's my turn again to try and speed things back up while maintaining correctness through the support of PropEr 😅 😄

Thank you for running the results again! ❤️

onno-vos-dev · 2024-02-04T13:59:08Z

onno-vos-dev
Feb 4, 2024

After a huge amount of fighting trying to come up with a variant of the FNV32 Hash which was both fast and produced unique results, I finally managed! 🎉 See details in: onno-vos-dev/1brc#5 which includes a list of 128k unique cities (those with a population over 1000 inhabitants) with a unit test to ensure uniqueness across all of them.

With a runtime of Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.99 on my Mac M1 Pro I did not seem to have lost any time, possibly even gained some time compared to my original solution.

PR: IceDragon200/1brc_erl_ex_test#12

2 replies

IceDragon200 Feb 4, 2024
Author

Updated standard and 10k results, an improvement in times, but positions haven't really changed

onno-vos-dev Feb 4, 2024

Thank you! It's a start 😄 At least I'm confident at my PropEr test as well as the unit test and I can change things now and ensure things don't break going forward ❤️

Thanks for re-running and updating the results!

IceDragon200 · 2024-02-05T15:28:51Z

IceDragon200
Feb 5, 2024
Author

@mneumann Results updated with mneumann/1brc-elixir@efb7aae

0 replies

rparcus · 2024-02-08T16:38:41Z

rparcus
Feb 8, 2024

Hi everyone, new idea on top of what already worked well:
This is an elixir version of a few concepts shared here by Jesperes and Onno-vos + a cursed way to avoid hashing...
Also, I was hoping to see some performance improvements but process_flags on wsl2 didn't help at all so I kept them out.

rparcus/ex_1brc@198debe

It does pretty well on my machine for the 420 test, and should be close to the top solutions without using dependencies. Didn't test for 10k...
Whenever you have time, could you run RuntimeCompiled as well?

14 replies

rparcus Feb 8, 2024

@rparcus how does the generated code look like? (I can't run the code generator at the moment.)

Can share tmr, in about a few hours. I have printed and formatted one but am already in bed now.

IceDragon200 Feb 8, 2024
Author

Okay all tests completed for the new implementation, you've somehow won the lottery in all cases, so congrats

onno-vos-dev Feb 8, 2024

Wow @rparcus I tried doing some runtime code generation but it definitely wasn't faster so I scrapped that idea! Cool to see you managed to make it work! Will have to see how you did it 😄

onno-vos-dev Feb 8, 2024

@jesperes Excerpt of how the generated code looks like. Surprisingly similar to my approach which I now have to dust off again... 😅

def process_lines(<<\"Whitehorse\", ?;, rest::binary>>) do
  get_temp(rest, 411)
end

def process_lines(<<\"Ouagadougou\", ?;, rest::binary>>) do
  get_temp(rest, 412)
end

  def process_lines(<<>>), do: :ok
  def idx(0), do: \"Austin\"
def idx(1), do: \"Baghdad\"
def idx(2), do: \"Kano\"
def idx(3), do: \"Dhaka\"
def idx(4), do: \"Burnie\"

onno-vos-dev Feb 8, 2024

Great work @rparcus! 🎉 👍 🥇

Couldn't resist and pick up my generator again and figure out why it was so slow. Looking at your approach @rparcus the big difference was your usage of Code.compile_string/1 which doesn't exist in the Erlang stdlib so I resorted to writing the Erlang file and compiling that one which was super slow and hence I gave up on the idea 😢

A bit of trickery and cleaning up later and rolling my own Code.compile_string/1 and it's running fast. PR: onno-vos-dev/1brc#6

Sadly, still not faster than @jesperes approach on my laptop at least 😢 For some reason @jesperes is able to squeeze out an additional 150% CPU...

generate_and_compile(LkupTable) ->
  Start = init_module(),
  ProcessLines = process_lines(LkupTable),
  End = remaining_parts_of_module(),
  Full = <<Start/binary, ProcessLines/binary, End/binary>>,
  {ok, Tokens, _} = erl_scan:string(binary_to_list(Full)),
  {_, ParsedForms} = lists:foldl(fun(Token, {TokenAcc, FormsAcc}) ->
                                   case Token of
                                     {dot,_} ->
                                       FormTokens = lists:reverse([Token | TokenAcc]),
                                       {ok, Forms} = erl_parse:parse_form(FormTokens),
                                       {[], [Forms | FormsAcc]};
                                     Token ->
                                       {[Token | TokenAcc], FormsAcc}
                                   end
                                 end, {[], []}, Tokens),
  {ok, brc_workers, Bin} = compile:forms(lists:reverse(ParsedForms)),
  {module, brc_workers} = code:load_binary(brc_workers, "nofile", Bin).

Results on my Mac M1 Pro:

50M Results:

Contributor	Time (50M)	CPU% (50M)	Mem kb (50M)
jesperes	00:00.55	679%	397232
rparcus-with_explorer	00:01.10	594%	1275984
rparcus-runtime_compiled	00:01.12	420%	609072
garazdawi	00:01.61	814%	150736
onno-vos-dev	00:01.82	208%	522032
IceDragon200	00:01.99	660%	755280
mneumann	00:02.71	768%	337120
rrcook	00:06.46	830%	152624
rparcus-better_file_reader	00:08.95	720%	801392
rparcus-just_elixir	00:09.62	276%	156080
stevensonmt	00:10.89	298%	144624
andypho	00:12.76	306%	111504
mnfloresv	00:12.78	345%	134976
Kartstig	00:13.85	343%	743776

1B Results:

Contributor	Time (1B)	CPU% (1B)	Mem kb (1B)
jesperes	00:07.72	901%	426160
onno-vos-dev	00:08.75	757%	649600
rparcus-runtime_compiled	00:09.67	762%	1059200
rparcus-with_explorer	00:19.83	650%	19607376
IceDragon200	00:27.65	878%	1687376
garazdawi	00:29.15	893%	158272
mneumann	00:45.32	890%	406688
rrcook	02:21.86	939%	2487712
rparcus-better_file_reader	02:52.04	754%	1660384
rparcus-just_elixir	03:05.18	287%	161936
stevensonmt	03:18.61	319%	150128
andypho	03:59.03	324%	112560
Kartstig	04:01.46	391%	13533616
mnfloresv	04:24.53	325%	138544

1BRC in Elixir #93

Implementations

AMD Ryzen 5 2600 / 32Gb DDR4 IceDragon200

Standard Data Set (420 weather stations)

50 Million

1 Billion

10k City Data Set

50 Million

1 Billion

RETIRED: Intel i7-2710QE @ 2.10Ghz / 16Gb DDR3 IceDragon200

Dirty System - While under normal usage

Clean System - Nothing else running with it

AMD Threadripper 1950X @ ?Ghz / 94.2Gb DDR4 Kartstig

Related or Good reads

Current Status

No longer relevant original post

Replies: 36 comments · 145 replies

IceDragon200 Jan 8, 2024 Author

IceDragon200 Jan 16, 2024 Author

IceDragon200 Jan 14, 2024 Author

IceDragon200 Jan 14, 2024 Author

IceDragon200 Jan 14, 2024 Author

IceDragon200 Jan 15, 2024 Author

IceDragon200 Jan 14, 2024 Author

IceDragon200 Jan 15, 2024 Author

IceDragon200 Jan 16, 2024 Author

IceDragon200 Jan 16, 2024 Author

IceDragon200 Jan 16, 2024 Author

IceDragon200 Jan 16, 2024 Author

IceDragon200 Jan 19, 2024 Author

IceDragon200 Jan 16, 2024 Author

IceDragon200 Jan 24, 2024 Author

IceDragon200 Jan 16, 2024 Author

IceDragon200 Jan 16, 2024 Author

IceDragon200 Jan 16, 2024 Author

Replies: 36 comments 145 replies

IceDragon200 Jan 8, 2024
Author

IceDragon200 Jan 16, 2024
Author

IceDragon200 Jan 14, 2024
Author

IceDragon200 Jan 14, 2024
Author

IceDragon200 Jan 14, 2024
Author

IceDragon200 Jan 15, 2024
Author

IceDragon200
Jan 14, 2024
Author

IceDragon200 Jan 15, 2024
Author

IceDragon200 Jan 16, 2024
Author

IceDragon200 Jan 16, 2024
Author

IceDragon200 Jan 16, 2024
Author

IceDragon200 Jan 16, 2024
Author

IceDragon200 Jan 19, 2024
Author

IceDragon200
Jan 16, 2024
Author

IceDragon200 Jan 24, 2024
Author

IceDragon200
Jan 16, 2024
Author

IceDragon200 Jan 16, 2024
Author

IceDragon200 Jan 16, 2024
Author