2025-04-30
TL;DR: I resurrected an old project that pulls data from the National Weather Service, parses it, and produces a precipitation nowcast. While the existing code did work, it was a bit unfocused and needed to do “one thing well” instead of several things decently. The “one thing” I chose is to parse the data format that the NWS uses and convert it to common vector GIS formats. Other GIS tools can then continue processing the data for analysis or display. The code and more formal documentation are available on GitHub and crates.io.
threecast, the Original Project

I’ve always had background levels of weather nerdiness. When I was a kid, I’d sometimes sit in front of the TV and copy down the information on The Weather Channel in a little handwritten table or try to classify different cloud types as they passed overhead. My interest in meteorology never really became a true hobby or serious career path, but it’s been a consistent interest of mine for years.
Sometime in late 2019 I became interested in precipitation nowcasting, which is a niche meteorological discipline where the goal is to predict when and where rain is going to fall. The nowcasting part indicates that the time and length scales of interest are much smaller than typical regional forecasts. Nowcasting systems are usually designed to predict precipitation over the next 60 minutes with resolution on the order of one kilometer. It’s the kind of prediction that you could use to make sure you won’t get rained on during a walk, rather than deciding which day to go to the park.
Like many weather nerds of that time, I was an active user of DarkSky, a web/mobile nowcasting program. The user experience and prediction accuracy were probably the best that was available to the public for this problem at that time, and I used it more than once to make minor plan changes and avoid the rain (or get ready to observe an approaching thunderstorm). I knew that the company behind DarkSky must have been getting their data from the National Weather Service (NWS), which provides more or less all meteorological data for public consumption in the United States. This is almost always the case with weather companies, since running a nationwide network of independent weather stations isn’t reasonable when high-quality data is available for free. In the particular case of nowcasting, operating the high-power radar equipment that NWS uses to generate the relevant products is probably illegal for a private company anyway.
Then I learned that Apple had acquired DarkSky and was going to shut it down for everyone except iPhone users in a few months. This motivated me to figure out where the data was coming from and how to use it in my own software. Originally my goal was to build a free software replacement of DarkSky, but this never happened—although I guess I could still try it in the future. I quickly figured out that the radar product I needed was number 176 in the NWS system, labelled Digital Instantaneous Precipitation Rate (DIPR), which is certainly a descriptive name. Even though I knew what data I wanted, it took me a long time to figure out where to actually get it. Eventually, after a lot of forum and GitHub issue searching, I found the data on an unstyled web server that presents it in a directory tree sorted by product type and radar station. As far as I know, this is a legitimate way to get the data, and it may be the only way to get it short of a formal agreement with the NWS.
I was pretty sure I’d found the data I wanted, but I couldn’t be certain until I saw it rendered on a map. One relatively easy way to do this was to download and run the NOAA Weather and Climate Toolkit (WCT), which is a Java application that reads all kinds of NOAA data. Loading up a file in WCT confirmed that I’d found the right data source, but I wanted to find a software library that could do the decoding in a way that I could integrate into a program. The WCT software was also slow to load the products (a few seconds) and a little buggy here and there.
I found MetPy, which supports DIPR and many other NEXRAD Level III products. It was still a bit slow, though. At one point I’d written enough code to pull down the latest data from every radar station on the server, parse it, and produce a suitable raster image for eventual display in a web app. The code I wrote—which I think was mostly sane?—could just barely keep up with the data stream when parallelized across eight cores. Each radar station publishes a new data file every five minutes or so, which for about 150 radar stations gives roughly one new file every two seconds. I guess that means I was seeing about 16 cpu-seconds per file, including parsing and whatever else I was trying to do. I don’t remember how much of this was parsing time, but I think it was between 10% and 40%.
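Spelling out that back-of-the-envelope arithmetic:

$$
\frac{150\ \text{files}}{5 \times 60\ \text{s}} = 0.5\ \text{files/s}
\;\Longrightarrow\; \text{one new file every } 2\ \text{s},
\qquad
8\ \text{cores} \times 2\ \tfrac{\text{s}}{\text{file}} = 16\ \text{cpu-s per file}.
$$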
It became clear that the only practical way to improve the speed of the parsing step would be to write it myself. I was only just beginning to learn Rust at that time, but I was able to get something working based on the specification document. It was substantially faster than MetPy or WCT, so I built out additional tooling to rasterize the result into a GeoTIFF and automatically pull down the data. I even wrote a very simple optical flow algorithm to perform basic nowcasting over the coming 60 minutes, although this was only partly useful. I abandoned my idea of creating a proper replacement for DarkSky for reasons that are lost to time, but I suspect that once I solved the parsing problem I simply lost interest in the remaining web development work. I gave this iteration of the project the name threecast, since it produced forecasts on a smaller scale.
I wrote threecast before starting this blog and adopting my principle of finishing projects. My typical approach to hacking on side projects at that time was to “follow my nose” and do whatever interested me. This is how I had operated since my earliest days of programming as a kid. A serious problem with this approach—for me, at least—is that my interest alone is never enough to get me to actually finish a project and ship it. Interest and excitement are abundant at the beginning of a project, but as the project becomes real I uncover its difficult and often frustrating challenges. This is where I would usually switch to the next project on my list, having used up the fun in the current one. The result was that I would have a collection of decent ideas and the beginnings of their realization, but none of them were actually done.
threecast is arguably an exception to this rule, which I attribute to its ability to hold my interest long enough to get me to actually build something that I could call complete. Still, my later decision to deliberately do one project at a time, truly finish it by some metric, and then write it up has more or less completely resolved this issue for me. Somehow the pain in the second half of a project is balanced by the desire to share a finished output online. A neat side effect of this system is that I’m more willing to accept failed or altered projects as “done”. Nobody is keeping score here, so I’ve learned that it’s okay to change the target output of a project if reality insists on it. In that way my projects have a slightly artistic aspect, although I shy away from that idea for the most part because it seems a little self-absorbed.
Since threecast was among my best efforts from the “before times” as I’ve described them, I thought it would be nice to pull it forward into this new system and give it a proper place among my other projects. This has amounted to a lot of cleanup, removal of unnecessary features, and application of about four more years of Rust experience.
The first improvement I made to the project was to remove everything that wasn’t the core parser. I had built separate modules for looking up radar stations by latitude/longitude coordinates, downloading data, and rasterizing the parsed data to GeoTIFF, but all of these can be done more flexibly with other tools. In particular, downloading data from the NWS web server is embarrassingly easy with curl or wget and doesn’t need to be a part of the threecast tool. Rasterization to almost any target format—including GeoTIFF—is also pretty easy with gdal_rasterize after composing the correct incantation. Even the task of finding the closest radar station to a given set of coordinates, which can’t be accomplished with any single tool as far as I know, still belongs in its own script.
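As an illustration of the kind of small standalone script I mean, here is a rough nearest-station lookup in Rust. The Station type and its fields are hypothetical; the real NWS station identifiers and coordinates would come from a table you maintain yourself.

```rust
/// Hypothetical radar station record; a real list would come from an
/// NWS station table, not from the parser crate.
struct Station {
    id: &'static str,
    lat: f64,
    lon: f64,
}

/// Great-circle distance in kilometers using the haversine formula.
fn haversine_km(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    let (lat1, lon1, lat2, lon2) = (
        lat1.to_radians(),
        lon1.to_radians(),
        lat2.to_radians(),
        lon2.to_radians(),
    );
    let dlat = lat2 - lat1;
    let dlon = lon2 - lon1;
    let a = (dlat / 2.0).sin().powi(2)
        + lat1.cos() * lat2.cos() * (dlon / 2.0).sin().powi(2);
    2.0 * 6371.0 * a.sqrt().asin()
}

/// Return the station closest to the given point.
fn closest_station<'a>(stations: &'a [Station], lat: f64, lon: f64) -> &'a Station {
    stations
        .iter()
        .min_by(|a, b| {
            let da = haversine_km(lat, lon, a.lat, a.lon);
            let db = haversine_km(lat, lon, b.lat, b.lon);
            da.partial_cmp(&db).unwrap()
        })
        .expect("station list is not empty")
}
```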
The parser itself was in decent shape and seemed to work on current files from the NWS web server.1 My only gripe with it was that it passed most arguments by value, which resulted in a lot of cloning. Even as a novice Rustacean I could tell that this was suboptimal, since the input bytes should never need to be modified or copied, only read for the purpose of parsing the data structure. At the time, I was able to accept a minor performance and style deficiency in exchange for getting something working, which I think was the right choice and still proved faster than WCT. Now, after a few years of Rust experience, it’s pretty easy for me to find the right arrangement of ampersands to pass everything down by reference. I haven’t taken any performance measurements to compare the two approaches, but my guess is that the new minimal-allocation version is negligibly faster. It’s mostly a style improvement.
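The shape of the change is roughly the following. The function and struct names here are made-up stand-ins, not dipr’s actual API, and the byte offsets are arbitrary.

```rust
/// Illustrative stand-in for one of the parsed header structures.
struct ProductHeader {
    code: u16,
    length: u32,
}

// Before: taking an owned Vec<u8> forces the caller to clone the buffer
// for every helper that needs to look at it.
fn parse_header_owned(bytes: Vec<u8>) -> ProductHeader {
    ProductHeader {
        code: u16::from_be_bytes([bytes[0], bytes[1]]),
        length: u32::from_be_bytes([bytes[2], bytes[3], bytes[4], bytes[5]]),
    }
}

// After: borrowing a byte slice lets every level of the parser read the
// same buffer with no copies at all.
fn parse_header(bytes: &[u8]) -> ProductHeader {
    ProductHeader {
        code: u16::from_be_bytes([bytes[0], bytes[1]]),
        length: u32::from_be_bytes([bytes[2], bytes[3], bytes[4], bytes[5]]),
    }
}
```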
For some reason I chose to return large tuples from several inner parsing functions rather than structs, so I fixed that. These internal structs aren’t exposed in the public API, but they don’t introduce any substantial overhead due to Rust’s design and the nature of this problem, so it’s purely an improvement in code semantics. That is, a struct with a name of its own and several named fields is much more clear than a tuple with some apparently random types, some of which may be identical. Destructuring the returned values at the call site is also much clearer.
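Something like the sketch below, with hypothetical field names and a made-up byte layout; the point is only that the named struct documents itself in a way that a bare tuple can’t.

```rust
/// Illustrative internal struct; the real field names and layout in the
/// crate differ.
struct RadialHeader {
    /// Angle at which this radial starts, in degrees.
    azimuth_start: f32,
    /// Angular width of the radial, in degrees.
    azimuth_delta: f32,
    /// Number of range bins along the radial.
    num_bins: u16,
}

// The old equivalent returned something like `(f32, f32, u16)`, which
// tells the caller nothing about which float is which. With the struct,
// the call site can destructure by name:
// let RadialHeader { azimuth_start, num_bins, .. } = parse_radial_header(&buf);
fn parse_radial_header(bytes: &[u8]) -> RadialHeader {
    RadialHeader {
        azimuth_start: f32::from(u16::from_be_bytes([bytes[0], bytes[1]])) / 10.0,
        azimuth_delta: f32::from(u16::from_be_bytes([bytes[2], bytes[3]])) / 10.0,
        num_bins: u16::from_be_bytes([bytes[4], bytes[5]]),
    }
}
```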
I also wasn’t handling errors very well in the old code, but now there’s a crate-specific error enum that plays nicely with the mechanisms and conventions in the language—at least, as far as I know.
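The error type ends up looking something along these lines; the variants here are invented for illustration, not the actual ones in dipr, but the pattern of implementing Display and std::error::Error on a crate-specific enum is the standard one.

```rust
use std::fmt;

/// Illustrative crate-level error type; the real variants differ.
#[derive(Debug)]
pub enum ParseError {
    /// The input ended before a complete structure could be read.
    UnexpectedEof,
    /// A field held a value that the specification does not allow.
    InvalidField { name: &'static str, value: i64 },
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::UnexpectedEof => write!(f, "input ended unexpectedly"),
            ParseError::InvalidField { name, value } => {
                write!(f, "field `{name}` has invalid value {value}")
            }
        }
    }
}

// Implementing std::error::Error lets the type work with `?` and box
// cleanly into `Box<dyn Error>` for callers that don't care about details.
impl std::error::Error for ParseError {}
```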
threecast included a rasterization routine that produced GeoTIFF files. I’m sure I had some kind of reasoning for this decision at the time, but looking back on it now it seems like a suboptimal choice. Only supporting raster output with a fixed pixel size imposes artificial limitations on downstream processing because data is lost in the conversion from vector to raster representations. It’s better to keep the data in its native vector format and allow subsequent routines to decide whether rasterization is necessary and select the best parameters for the situation.
The only arguable benefit that I can see is that GeoTIFF is a single-file binary format, while Shapefile and GeoJSON—the two most common vector GIS formats—each fall short of one of those properties. These are desirable properties because having one file per logical data unit seems more sensible, and binary formats are more compact and can usually be processed faster. As far as I can tell, there’s no popular format that has all three properties. (If you know of one, please email me. I’m still relatively new to GIS and may just be ignorant.)
| Format    | Vector or Raster? | Text or Binary? | Single-File? |
|-----------|-------------------|-----------------|--------------|
| GeoTIFF   | Raster            | Binary          | Yes          |
| Shapefile | Vector            | Binary          | No           |
| GeoJSON   | Vector            | Text            | Yes          |
In short: vector, binary, or single-file; choose two.
Actually implementing conversion to Shapefile and GeoJSON was easy because of the work that others have done to support these formats in other crates. The only substantial work I had to do was to convert the internal Radial data structure, which holds the precipitation data for bins along a given azimuth, into geo polygons with associated floats. After that, conversion from geo to the target types was more or less trivial. It seems to be a common pattern in the Rust community that a given niche will have a central crate that defines basic types and traits for that problem space, and then more specific crates use these items to promote interoperation.
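The gist of that conversion looks something like the sketch below, assuming a recent version of the geo crate. The Radial fields, the bin spacing, and the crude projection helper are simplified stand-ins for what the crate actually does; the idea is just that each range bin becomes a small quadrilateral bounded by two azimuths and two ranges, paired with its precipitation value.

```rust
use geo::{Coord, LineString, Polygon};

/// Simplified stand-in for the crate's internal radial data.
struct Radial {
    /// Starting azimuth of this radial, in degrees clockwise from north.
    azimuth_start: f64,
    /// Angular width of the radial, in degrees.
    azimuth_delta: f64,
    /// Precipitation rate for each range bin along the radial.
    bins: Vec<f32>,
}

/// Very rough equirectangular offset of (bearing, range) from the radar
/// site to lon/lat degrees; the real code uses a proper great-circle
/// `destination` calculation.
fn offset(site: (f64, f64), bearing_deg: f64, range_km: f64) -> Coord<f64> {
    let bearing = bearing_deg.to_radians();
    let dlat = range_km * bearing.cos() / 111.0;
    let dlon = range_km * bearing.sin() / (111.0 * site.1.to_radians().cos());
    Coord { x: site.0 + dlon, y: site.1 + dlat }
}

/// Turn one radial into (polygon, precipitation rate) pairs, one per bin.
fn radial_to_polygons(
    site: (f64, f64),
    radial: &Radial,
    bin_km: f64,
) -> Vec<(Polygon<f64>, f32)> {
    let az0 = radial.azimuth_start;
    let az1 = radial.azimuth_start + radial.azimuth_delta;
    radial
        .bins
        .iter()
        .enumerate()
        .map(|(i, &rate)| {
            let (r0, r1) = (i as f64 * bin_km, (i as f64 + 1.0) * bin_km);
            let ring = LineString::from(vec![
                offset(site, az0, r0),
                offset(site, az0, r1),
                offset(site, az1, r1),
                offset(site, az1, r0),
                offset(site, az0, r0), // close the ring
            ]);
            (Polygon::new(ring, vec![]), rate)
        })
        .collect()
}
```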
The crates.io Process

Having completed the improvements I described above, I called the new iteration of the project simply dipr, created a CLI tool, and went through the process of releasing it on crates.io. This was my first time releasing a crate, and I was a little apprehensive because of the write-only nature of crates.io (published versions can’t be deleted, only yanked), but I see now that it’s not that scary. The documentation for the process is lovely, as usual, and the API guidelines checklist was especially helpful because it alerted me to a few small changes that made the library nicer to use. A small example of this is C-COMMON-TRAITS, which encourages the implementation of several useful traits that the library user might want to use on public items. It’s important to do this as the library author because the Rust language doesn’t allow the user to implement (or derive) foreign traits on foreign types. This was a trivial change for me that I wouldn’t have considered otherwise. It’s obvious that the publishing process is a well-worn path, and many people have collected their knowledge in these docs.
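In practice the C-COMMON-TRAITS change mostly amounts to adding derives like these to the public types; the struct below is a made-up example, not one of dipr’s actual types.

```rust
/// A made-up public type standing in for the crate's exported structs.
/// Deriving the common traits up front means downstream users can
/// debug-print the type, compare it in tests, keep copies of it, and so
/// on, without needing impls they're not allowed to write themselves
/// because of the orphan rule.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd, Default)]
pub struct PrecipRate {
    /// Precipitation rate in inches per hour.
    pub inches_per_hour: f32,
}
```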
dipr is pretty fast. I’m sure it’s not as fast as it could be, but I’ve applied a handful of little tricks to get it to a respectable speed. On my machine, conversion to Shapefile for a reasonably large input file (~100 kB) takes about 130 ms with --skip-zeros and about 290 ms without. GeoJSON conversion takes about 380 ms and 850 ms, respectively. (These figures include disk I/O.) I don’t put much faith in these figures in general because my test setup isn’t well-controlled, and the --skip-zeros cases in particular are sensitive to the amount of precipitation in the input file, but the bottom line is that this tool is decently fast. If I’d had this tool back when I was trying to create a replacement for DarkSky, I could have run the entire system on a single-core machine and easily kept up with the data streams for all active radar stations.
Other than the option to skip bins with zero precipitation, I used two other tricks to get more speed. One was a suggestion from my friend Brady to return iterators from the conversion functions instead of vectors. This allows the caller to ask for a single element and pass it to the I/O routine right away instead of building up the entire vector first and only then moving on to I/O. Brady described this as making some of the I/O time overlap with the compute time, which reduces the overall duration. Implementing this required standing up a separate thread for writing, but Rust’s strong support for threads and message passing made this easy. Notably, this only resulted in an improvement for Shapefile conversion. Using a separate thread for writing GeoJSON actually made it slower. Maybe I should try buffering the print! calls, rather than calling print! for every incoming message or only once at the very end.
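A stripped-down version of that pattern looks like the following: the conversion produces an iterator, and a dedicated writer thread drains a channel so output can start before conversion finishes. The Feature type and the println! stand in for the real Shapefile records and writer.

```rust
use std::sync::mpsc;
use std::thread;

/// Stand-in for a converted feature ready to be written.
struct Feature {
    id: usize,
}

/// Lazily produce features instead of collecting them into a Vec first.
fn convert(count: usize) -> impl Iterator<Item = Feature> {
    (0..count).map(|id| Feature { id })
}

fn main() {
    let (tx, rx) = mpsc::channel::<Feature>();

    // Writer thread: starts doing I/O as soon as the first feature
    // arrives, so writing overlaps with the remaining conversion work.
    let writer = thread::spawn(move || {
        for feature in rx {
            // Real code would append to the Shapefile here.
            println!("wrote feature {}", feature.id);
        }
    });

    // Producer: send each feature as soon as it's converted.
    for feature in convert(10) {
        tx.send(feature).expect("writer thread hung up");
    }
    drop(tx); // closing the channel lets the writer loop finish

    writer.join().expect("writer thread panicked");
}
```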
The other performance trick I used was to approximate atan2 for small arguments. I noticed in the flamegraph output that the program was spending a lot of time computing atan2 during calls to destination, which figures out the latitude and longitude of a target point given a starting point, bearing, and distance. Since most of the distances are small fractions of the Earth’s circumference, I was able to pull this code out of the library and modify it to give an approximation for small inputs. I chose a threshold argument such that the maximum error should always be less than one part in 10,000. This improved the performance a lot but didn’t noticeably change the output. I could improve this further by doing a proper precision analysis and figuring out the best threshold value.
I might build on this project some more in the future. I don’t think dipr needs much more than it already has—although adding time information to the converted output might be nice, and making it faster could be fun—so any new work would probably use dipr as a dependency. I could always come back to building a DarkSky clone, maybe as a web frontend with WebAssembly or a mobile app. In any case, this was a nice reminder of my interest in meteorology and GIS, and it was gratifying to see how I’ve improved as an engineer and Rust user by improving my own code and finally getting it shipped.
1. One benefit of the government’s generally glacial adoption of new tech is that nothing that worked yesterday is likely to break tomorrow due to careless breaking changes. Maybe this is just another instance of government serving its core purpose.