2025-04-30
TL;DR: I resurrected an old project that pulls data from the National Weather Service, parses it, and produces a precipitation nowcast. While the existing code did work, it was a bit unfocused and needed to do “one thing well” instead of several things decently. The “one thing” I chose is to parse the data format that the NWS uses and convert it to common vector GIS formats. Other GIS tools can then continue processing the data for analysis or display. The code and more formal documentation are available on GitHub and crates.io.
threecast, the Original Project

I’ve always had background levels of weather nerdiness. When I was a kid, I’d sometimes sit in front of the TV and copy down the information on The Weather Channel in a little handwritten table or try to classify different cloud types as they passed overhead. My interest in meteorology never really became a true hobby or serious career path, but it’s been a consistent interest of mine for years.
Sometime in late 2019 I became interested in precipitation nowcasting, which is a niche meteorological discipline where the goal is to predict when and where rain is going to fall. The nowcasting part indicates that the time and length scales of interest are much smaller than typical regional forecasts. Nowcasting systems are usually designed to predict precipitation over the next 60 minutes with resolution on the order of one kilometer. It’s the kind of prediction that you could use to make sure you won’t get rained on during a walk, rather than deciding which day to go to the park.
Like many weather nerds of that time, I was an active user of DarkSky, a web/mobile nowcasting program. The user experience and prediction accuracy were probably the best that was available to the public for this problem at that time, and I used it more than once to make minor plan changes and avoid the rain (or get ready to observe an approaching thunderstorm). I knew that the company behind DarkSky must have been getting their data from the National Weather Service (NWS), which provides more or less all meteorological data for public consumption in the United States. This is almost always the case with weather companies, since running a nationwide network of independent weather stations isn’t reasonable when high-quality data is available for free. In the particular case of nowcasting, operating the high-power radar equipment that NWS uses to generate the relevant products is probably illegal for a private company anyway.
Then I learned that Apple had acquired DarkSky and was going to shut it down for everyone except iPhone users in a few months. This motivated me to figure out where the data was coming from and how to use it in my own software. Originally my goal was to build a free software replacement of DarkSky, but this never happened—although I guess I could still try it in the future. I quickly figured out that the radar product I needed was number 176 in the NWS system, labelled Digital Instantaneous Precipitation Rate (DIPR), which is certainly a descriptive name. Even though I knew what data I wanted, it took me a long time to figure out where to actually get it. Eventually, after a lot of forum and GitHub issue searching, I found the data on an unstyled web server that presents it in a directory tree sorted by product type and radar station. As far as I know, this is a legitimate way to get the data, and it may be the only way to get it short of a formal agreement with the NWS.
I was pretty sure I’d found the data I wanted, but I couldn’t be certain until I saw it rendered on a map. One relatively easy way to do this was to download and run the NOAA Weather and Climate Toolkit (WCT), which is a Java application that reads all kinds of NOAA data. Loading up a file in WCT confirmed that I’d found the right data source, but I wanted to find a software library that could do the decoding in a way that I could integrate into a program. The WCT software was also slow to load the products (a few seconds) and a little buggy here and there.
I found MetPy, which supports DIPR and many other NEXRAD Level III products. It was still a bit slow, though. At one point I’d written enough code to pull down the latest data from every radar station on the server, parse it, and produce a suitable raster image for eventual display in a web app. The code I wrote—which I think was mostly sane?—could just barely keep up with the data stream when parallelized across eight cores. Each radar station publishes a new data file every five minutes or so, which for about 150 radar stations gives roughly one new file every two seconds. I guess that means I was seeing about 16 cpu-seconds per file, including parsing and whatever else I was trying to do. I don’t remember how much of this was parsing time, but I think it was between 10% and 40%.
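Spelling out that back-of-the-envelope arithmetic:

$$
\frac{150\ \text{files}}{5 \times 60\ \text{s}} = 0.5\ \text{files/s}
\;\Longrightarrow\; \text{one new file every } 2\ \text{s},
\qquad
8\ \text{cores} \times 2\ \tfrac{\text{s}}{\text{file}} = 16\ \text{cpu-s per file}.
$$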
It became clear that the only practical way to improve the speed of the parsing step would be to write it myself. I was only just beginning to learn Rust at that time, but I was able to get something working based on the specification document. It was substantially faster than MetPy or WCT, so I built out additional tooling to rasterize the result into a GeoTIFF and automatically pull down the data. I even wrote a very simple optical flow algorithm to perform basic nowcasting over the coming 60 minutes, although this was only partly useful. I abandoned my idea of creating a proper replacement for DarkSky for reasons that are lost to time, but I suspect that once I solved the parsing problem I simply lost interest in the remaining web development work. I gave this iteration of the project the name threecast, since it produced forecasts on a smaller scale.
I wrote threecast before starting this blog and adopting my principle of finishing projects. My typical approach to hacking on side projects at that time was to “follow my nose” and do whatever interested me. This is how I had operated since my earliest days of programming as a kid. A serious problem with this approach—for me, at least—is that my interest alone is never enough to get me to actually finish a project and ship it. Interest and excitement are abundant at the beginning of a project, but as the project becomes real I uncover its difficult and often frustrating challenges. This is where I would usually switch to the next project on my list, having used up the fun in the current one. The result was that I would have a collection of decent ideas and the beginnings of their realization, but none of them were actually done.
threecast is arguably an exception to this rule, which I attribute to its ability to hold my interest long enough to get me to actually build something that I could call complete. Still, my later decision to deliberately do one project at a time, truly finish it by some metric, and then write it up has more or less completely resolved this issue for me. Somehow the pain in the second half of a project is balanced by the desire to share a finished output online. A neat side effect of this system is that I’m more willing to accept failed or altered projects as “done”. Nobody is keeping score here, so I’ve learned that it’s okay to change the target output of a project if reality insists on it. In that way my projects have a slightly artistic aspect, although I shy away from that idea for the most part because it seems a little self-absorbed.
Since threecast was among my best efforts from the “before times” as I’ve described them, I thought it would be nice to pull it forward into this new system and give it a proper place among my other projects. This has amounted to a lot of cleanup, removal of unnecessary features, and application of about four more years of Rust experience.
The first improvement I made to the project was to remove everything that wasn’t the core parser. I had built separate modules for looking up radar stations by latitude/longitude coordinates, downloading data, and rasterizing the parsed data to GeoTIFF, but all of these can be done more flexibly with other tools. In particular, downloading data from the NWS web server is embarrassingly easy with curl or wget and doesn’t need to be a part of the threecast tool. Rasterization to almost any target format—including GeoTIFF—is also pretty easy with gdal_rasterize after composing the correct incantation. Even the task of finding the closest radar station to a given set of coordinates, which can’t be accomplished with any single tool as far as I know, still belongs in its own script.
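As an illustration of the kind of small standalone script I mean, here is a rough nearest-station lookup in Rust. The Station type and its fields are hypothetical; the real NWS station identifiers and coordinates would come from a table you maintain yourself.

```rust
/// Hypothetical radar station record; a real list would come from an
/// NWS station table, not from the parser crate.
struct Station {
    id: &'static str,
    lat: f64,
    lon: f64,
}

/// Great-circle distance in kilometers using the haversine formula.
fn haversine_km(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    let (lat1, lon1, lat2, lon2) = (
        lat1.to_radians(),
        lon1.to_radians(),
        lat2.to_radians(),
        lon2.to_radians(),
    );
    let dlat = lat2 - lat1;
    let dlon = lon2 - lon1;
    let a = (dlat / 2.0).sin().powi(2)
        + lat1.cos() * lat2.cos() * (dlon / 2.0).sin().powi(2);
    2.0 * 6371.0 * a.sqrt().asin()
}

/// Return the station closest to the given point.
fn closest_station<'a>(stations: &'a [Station], lat: f64, lon: f64) -> &'a Station {
    stations
        .iter()
        .min_by(|a, b| {
            let da = haversine_km(lat, lon, a.lat, a.lon);
            let db = haversine_km(lat, lon, b.lat, b.lon);
            da.partial_cmp(&db).unwrap()
        })
        .expect("station list is not empty")
}
```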
The parser itself was in decent shape and seemed to work on current files from the NWS web server.1 My only gripe with it was that it passed most arguments by value, which resulted in a lot of cloning. Even as a novice Rustacean I could tell that this was suboptimal, since the input bytes should never need to be modified or copied, only read for the purpose of parsing the data structure. At the time, I was able to accept a minor performance and style deficiency in exchange for getting something working, which I think was the right choice and still proved faster than WCT. Now, after a few years of Rust experience, it’s pretty easy for me to find the right arrangement of ampersands to pass everything down by reference. I haven’t taken any performance measurements to compare the two approaches, but my guess is that the new minimal-allocation version is negligibly faster. It’s mostly a style improvement.
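The shape of the change is roughly the following. The function and struct names here are made-up stand-ins, not dipr’s actual API, and the byte offsets are arbitrary.

```rust
/// Illustrative stand-in for one of the parsed header structures.
struct ProductHeader {
    code: u16,
    length: u32,
}

// Before: taking an owned Vec<u8> forces the caller to clone the buffer
// for every helper that needs to look at it.
fn parse_header_owned(bytes: Vec<u8>) -> ProductHeader {
    ProductHeader {
        code: u16::from_be_bytes([bytes[0], bytes[1]]),
        length: u32::from_be_bytes([bytes[2], bytes[3], bytes[4], bytes[5]]),
    }
}

// After: borrowing a byte slice lets every level of the parser read the
// same buffer with no copies at all.
fn parse_header(bytes: &[u8]) -> ProductHeader {
    ProductHeader {
        code: u16::from_be_bytes([bytes[0], bytes[1]]),
        length: u32::from_be_bytes([bytes[2], bytes[3], bytes[4], bytes[5]]),
    }
}
```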
For some reason I chose to return large tuples from several inner parsing functions rather than structs, so I fixed that. These internal structs aren’t exposed in the public API, but they don’t introduce any substantial overhead due to Rust’s design and the nature of this problem, so it’s purely an improvement in code semantics. That is, a struct with a name of its own and several named fields is much more clear than a tuple with some apparently random types, some of which may be identical. Destructuring the returned values at the call site is also much clearer.
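Something like the sketch below, with hypothetical field names and a made-up byte layout; the point is only that the named struct documents itself in a way that a bare tuple can’t.

```rust
/// Illustrative internal struct; the real field names and layout in the
/// crate differ.
struct RadialHeader {
    /// Angle at which this radial starts, in degrees.
    azimuth_start: f32,
    /// Angular width of the radial, in degrees.
    azimuth_delta: f32,
    /// Number of range bins along the radial.
    num_bins: u16,
}

// The old equivalent returned something like `(f32, f32, u16)`, which
// tells the caller nothing about which float is which. With the struct,
// the call site can destructure by name:
// let RadialHeader { azimuth_start, num_bins, .. } = parse_radial_header(&buf);
fn parse_radial_header(bytes: &[u8]) -> RadialHeader {
    RadialHeader {
        azimuth_start: f32::from(u16::from_be_bytes([bytes[0], bytes[1]])) / 10.0,
        azimuth_delta: f32::from(u16::from_be_bytes([bytes[2], bytes[3]])) / 10.0,
        num_bins: u16::from_be_bytes([bytes[4], bytes[5]]),
    }
}
```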
I also wasn’t handling errors very well in the old code, but now there’s a crate-specific error enum that plays nicely with the mechanisms and conventions in the language—at least, as far as I know.
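The error type ends up looking something along these lines; the variants here are invented for illustration, not the actual ones in dipr, but the pattern of implementing Display and std::error::Error on a crate-specific enum is the standard one.

```rust
use std::fmt;

/// Illustrative crate-level error type; the real variants differ.
#[derive(Debug)]
pub enum ParseError {
    /// The input ended before a complete structure could be read.
    UnexpectedEof,
    /// A field held a value that the specification does not allow.
    InvalidField { name: &'static str, value: i64 },
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::UnexpectedEof => write!(f, "input ended unexpectedly"),
            ParseError::InvalidField { name, value } => {
                write!(f, "field `{name}` has invalid value {value}")
            }
        }
    }
}

// Implementing std::error::Error lets the type work with `?` and box
// cleanly into `Box<dyn Error>` for callers that don't care about details.
impl std::error::Error for ParseError {}
```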
threecast included a rasterization routine that produced GeoTIFF files. I’m sure I had some kind of reasoning for this decision at the time, but looking back on it now it seems like a suboptimal choice. Only supporting raster output with a fixed pixel size imposes artificial limitations on downstream processing because data is lost in the conversion from vector to raster representations. It’s better to keep the data in its native vector format and allow subsequent routines to decide whether rasterization is necessary and select the best parameters for the situation.
The only arguable benefit that I can see is that GeoTIFF is a single-file binary format, while Shapefile and GeoJSON—the two most common vector GIS formats—each fall short of one of those properties. These are desirable properties because having one file per logical data unit seems more sensible, and binary formats are more compact and can usually be processed faster. As far as I can tell, there’s no popular format that has all three properties. (If you know of one, please email me. I’m still relatively new to GIS and may just be ignorant.)
| Format    | Vector or Raster? | Text or Binary? | Single-File? |
|-----------|-------------------|-----------------|--------------|
| GeoTIFF   | Raster            | Binary          | Yes          |
| Shapefile | Vector            | Binary          | No           |
| GeoJSON   | Vector            | Text            | Yes          |
In short: vector, binary, or single-file; choose two.
Actually implementing conversion to Shapefile and GeoJSON was easy because of the work that others have done to support these formats in other crates. The only substantial work I had to do was to convert the internal Radial data structure, which holds the precipitation data for bins along a given azimuth, into geo polygons with associated floats. After that, conversion from geo to the target types was more or less trivial. It seems to be a common pattern in the Rust community that a given niche will have a central crate that defines basic types and traits for that problem space, and then more specific crates use these items to promote interoperation.
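The gist of that conversion looks something like the sketch below, assuming a recent version of the geo crate. The Radial fields, the bin spacing, and the crude projection helper are simplified stand-ins for what the crate actually does; the idea is just that each range bin becomes a small quadrilateral bounded by two azimuths and two ranges, paired with its precipitation value.

```rust
use geo::{Coord, LineString, Polygon};

/// Simplified stand-in for the crate's internal radial data.
struct Radial {
    /// Starting azimuth of this radial, in degrees clockwise from north.
    azimuth_start: f64,
    /// Angular width of the radial, in degrees.
    azimuth_delta: f64,
    /// Precipitation rate for each range bin along the radial.
    bins: Vec<f32>,
}

/// Very rough equirectangular offset of (bearing, range) from the radar
/// site to lon/lat degrees; the real code uses a proper great-circle
/// `destination` calculation.
fn offset(site: (f64, f64), bearing_deg: f64, range_km: f64) -> Coord<f64> {
    let bearing = bearing_deg.to_radians();
    let dlat = range_km * bearing.cos() / 111.0;
    let dlon = range_km * bearing.sin() / (111.0 * site.1.to_radians().cos());
    Coord { x: site.0 + dlon, y: site.1 + dlat }
}

/// Turn one radial into (polygon, precipitation rate) pairs, one per bin.
fn radial_to_polygons(
    site: (f64, f64),
    radial: &Radial,
    bin_km: f64,
) -> Vec<(Polygon<f64>, f32)> {
    let az0 = radial.azimuth_start;
    let az1 = radial.azimuth_start + radial.azimuth_delta;
    radial
        .bins
        .iter()
        .enumerate()
        .map(|(i, &rate)| {
            let (r0, r1) = (i as f64 * bin_km, (i as f64 + 1.0) * bin_km);
            let ring = LineString::from(vec![
                offset(site, az0, r0),
                offset(site, az0, r1),
                offset(site, az1, r1),
                offset(site, az1, r0),
                offset(site, az0, r0), // close the ring
            ]);
            (Polygon::new(ring, vec![]), rate)
        })
        .collect()
}
```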
The crates.io Process

Having completed the improvements I described above, I called the new iteration of the project simply dipr, created a CLI tool, and went through the process of releasing it on crates.io. This was my first time releasing a crate, and I was a little apprehensive because of the write-only nature of crates.io (published versions can’t be deleted, only yanked), but I see now that it’s not that scary. The documentation for the process is lovely, as usual, and the API guidelines checklist was especially helpful because it alerted me to a few small changes that made the library nicer to use. A small example of this is C-COMMON-TRAITS, which encourages the implementation of several useful traits that the library user might want to use on public items. It’s important to do this as the library author because the Rust language doesn’t allow the user to implement (or derive) foreign traits on foreign types. This was a trivial change for me that I wouldn’t have considered otherwise. It’s obvious that the publishing process is a well-worn path, and many people have collected their knowledge in these docs.
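In practice the C-COMMON-TRAITS change mostly amounts to adding derives like these to the public types; the struct below is a made-up example, not one of dipr’s actual types.

```rust
/// A made-up public type standing in for the crate's exported structs.
/// Deriving the common traits up front means downstream users can
/// debug-print the type, compare it in tests, keep copies of it, and so
/// on, without needing impls they're not allowed to write themselves
/// because of the orphan rule.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd, Default)]
pub struct PrecipRate {
    /// Precipitation rate in inches per hour.
    pub inches_per_hour: f32,
}
```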
dipr is pretty fast. I’m sure it’s not as fast as it could be, but I’ve applied a handful of little tricks to get it to a respectable speed. On my machine, conversion to Shapefile for a reasonably large input file (~100 kB) takes about 130 ms with --skip-zeros and about 290 ms without. GeoJSON conversion takes about 380 ms and 850 ms, respectively. (These figures include disk I/O.) I don’t put much faith in these figures in general because my test setup isn’t well-controlled, and the --skip-zeros cases in particular are sensitive to the amount of precipitation in the input file, but the bottom line is that this tool is decently fast. If I’d had this tool back when I was trying to create a replacement for DarkSky, I could have run the entire system on a single-core machine and easily kept up with the data streams for all active radar stations.
Other than the option to skip bins with zero precipitation, I used two other tricks to get more speed. One was a suggestion from my friend Brady to return iterators from the conversion functions instead of vectors. This allows the caller to ask for a single element and pass it to the I/O routine right away instead of building up the entire vector first and only then moving on to I/O. Brady described this as making some of the I/O time overlap with the compute time, which reduces the overall duration. Implementing this required standing up a separate thread for writing, but Rust’s strong support for threads and message passing made this easy. Notably, this only resulted in an improvement for Shapefile conversion. Using a separate thread for writing GeoJSON actually made it slower. Maybe I should try buffering the print! calls, rather than calling print! for every incoming message or only once at the very end.
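A stripped-down version of that pattern looks like the following: the conversion produces an iterator, and a dedicated writer thread drains a channel so output can start before conversion finishes. The Feature type and the println! stand in for the real Shapefile records and writer.

```rust
use std::sync::mpsc;
use std::thread;

/// Stand-in for a converted feature ready to be written.
struct Feature {
    id: usize,
}

/// Lazily produce features instead of collecting them into a Vec first.
fn convert(count: usize) -> impl Iterator<Item = Feature> {
    (0..count).map(|id| Feature { id })
}

fn main() {
    let (tx, rx) = mpsc::channel::<Feature>();

    // Writer thread: starts doing I/O as soon as the first feature
    // arrives, so writing overlaps with the remaining conversion work.
    let writer = thread::spawn(move || {
        for feature in rx {
            // Real code would append to the Shapefile here.
            println!("wrote feature {}", feature.id);
        }
    });

    // Producer: send each feature as soon as it's converted.
    for feature in convert(10) {
        tx.send(feature).expect("writer thread hung up");
    }
    drop(tx); // closing the channel lets the writer loop finish

    writer.join().expect("writer thread panicked");
}
```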
The other performance trick I used was to approximate atan2 for small arguments. I noticed in the flamegraph output that the program was spending a lot of time computing atan2 during calls to destination, which figures out the latitude and longitude of a target point given a starting point, bearing, and distance. Since most of the distances are small fractions of the Earth’s circumference, I was able to pull this code out of the library and modify it to give an approximation for small inputs. I chose a threshold argument such that the maximum error should always be less than one part in 10,000. This improved the performance a lot but didn’t noticeably change the output. I could improve this further by doing a proper precision analysis and figuring out the best threshold value.
I might build on this project some more in the future. I don’t think dipr needs much more than it already has—although adding time information to the converted output might be nice, and making it faster could be fun—so any new work would probably use dipr as a dependency. I could always come back to building a DarkSky clone, maybe as a web frontend with WebAssembly or a mobile app. In any case, this was a nice reminder of my interest in meteorology and GIS, and it was gratifying to see how I’ve improved as an engineer and Rust user by improving my own code and finally getting it shipped.
1. One benefit of the government’s generally glacial adoption of new tech is that nothing that worked yesterday is likely to break tomorrow due to careless breaking changes. Maybe this is just another instance of government serving its core purpose.