Machine learning with Steffen Mæland

The term ‘machine learning’ is on everyone’s lips these days. We talked to Young Professionals member Steffen Mæland, a researcher at NORSAR working on machine learning. Mæland emphasizes the benefits of machine learning for seismic monitoring and CTBTO-related issues. In the video and interview below, Mæland explains what machine learning is, why it is useful, and how you can get started with it.

Q: What is machine learning, exactly?

Machine learning is, in short, the collection of algorithms that learn from data. It's a subfield of artificial intelligence, the somewhat hyped buzzword for decision-making computer programs. The decisions are necessarily based on some set of parameters, and if these parameters are determined from data, it's machine learning.

In that sense, good old linear regression (y = ax + b) is a type of machine learning: the parameters a and b are estimated from data. By today's standards, however, the term machine learning typically implies methods with a large number of parameters. Recent image recognition models easily have tens of millions of parameters.
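To make the linear-regression example concrete, here is a minimal sketch that "learns" the two parameters a and b from noisy data with NumPy's least-squares solver. The data are synthetic, and the function name `fit_line` is purely illustrative.

```python
import numpy as np

def fit_line(x, y):
    """Estimate a and b in y = a*x + b by ordinary least squares."""
    # Design matrix: one column for the slope, one column of ones for the intercept.
    A = np.column_stack([x, np.ones_like(x)])
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)  # synthetic data, true a = 2, b = 1
a, b = fit_line(x, y)
print(f"a = {a:.2f}, b = {b:.2f}")  # estimates close to the true values
```

Two parameters are "learned" here. A modern image recognition network does the same kind of fitting, only with millions of parameters and a far more flexible model.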

Q: How is machine learning useful for seismic monitoring?

There are two main benefits: first, improved precision, and second, less work.

Improved detection capability is obviously a huge benefit for monitoring. There has been a lot of work on machine learning (ML) for phase picking in recent years, and the performance on noisy data is pretty impressive. ML detection fits very nicely in the gap between power-based detection, like the classic STA/LTA (short-term average over long-term average), which is insensitive to the type of event, and correlation-based detection, which is sensitive only to a very specific event.
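For readers who haven't met it, the sketch below shows the idea behind a power-based STA/LTA detector: a trigger fires when the average energy in a short window becomes much larger than in a long window. It is a minimal trailing-window variant on a synthetic trace; the window lengths and the threshold of 4.0 are arbitrary assumptions, and libraries such as ObsPy ship optimized implementations (e.g. `obspy.signal.trigger.classic_sta_lta`).

```python
import numpy as np

def sta_lta(trace, n_sta, n_lta):
    """Ratio of short-term to long-term average signal energy.

    Both windows end at the same sample (a trailing, non-centred variant).
    The first n_lta - 1 samples, where the long window is incomplete, are set to 0.
    """
    energy = np.asarray(trace, dtype=float) ** 2
    csum = np.concatenate([[0.0], np.cumsum(energy)])  # csum[i] = sum of energy[:i]
    ratio = np.zeros(energy.size)
    end = np.arange(n_lta, energy.size + 1)            # exclusive window ends with a full LTA window
    sta = (csum[end] - csum[end - n_sta]) / n_sta
    lta = (csum[end] - csum[end - n_lta]) / n_lta
    ratio[end - 1] = sta / np.maximum(lta, 1e-12)
    return ratio

fs = 100.0  # assumed sampling rate in Hz
rng = np.random.default_rng(1)
trace = rng.normal(scale=1.0, size=6000)
# Synthetic 5 Hz "event" starting at sample 3000.
trace[3000:3200] += 8.0 * np.sin(2 * np.pi * 5.0 * np.arange(200) / fs)

ratio = sta_lta(trace, n_sta=int(0.5 * fs), n_lta=int(10 * fs))
triggers = np.flatnonzero(ratio > 4.0)
print(triggers[0] if triggers.size else "no trigger")
```

The detector reacts to any burst of energy, whatever its waveform. That is exactly the insensitivity to event type described above, and it is where ML pickers and correlation detectors add value.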

However, recording more events comes with its own problems, since the workload goes up for the analysts doing the manual review. At NORSAR we've implemented an ML-based ranking system in the analysis software, which allows sorting and filtering the automatic events by their predicted importance. We still need the analysts for higher-level processing, but this way they can get to the important events more quickly and avoid spending too much time on noise.
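The interview doesn't describe how NORSAR's ranking system works internally. As a generic illustration only, the sketch below ranks events by a classifier's predicted probability of being a real event. The three features, the labels and the 0.1 cut-off are hypothetical, and scikit-learn's `LogisticRegression` stands in for whatever model a production system uses.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Hypothetical training data: one row of features per past automatic event,
# label 1 = analyst kept it as a real event, 0 = rejected as noise.
X_train = rng.normal(size=(500, 3))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] > 0).astype(int)

model = LogisticRegression().fit(X_train, y_train)

# New automatic events waiting for review.
X_new = rng.normal(size=(10, 3))
importance = model.predict_proba(X_new)[:, 1]  # probability of class 1 ("real event")

# Sorting: most important events first.
order = np.argsort(importance)[::-1]
for rank, idx in enumerate(order, start=1):
    print(f"{rank:2d}. event {idx}  p = {importance[idx]:.2f}")

# Filtering: hide events the model considers very likely to be noise.
review_queue = order[importance[order] >= 0.1]
```

Nothing is discarded automatically in this design. The analyst still sees every event above the cut-off, just in a better order.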

Q: What else do you use machine learning for at NORSAR?

One interesting case is the continuous monitoring of the Åknes rock slope, a huge hillside of fractured and loose rock that might someday slide into the sea and create a massive tsunami. It's monitored by a geophone network (in addition to GPS, lasers, and every other imaginable technology), where triggered events are classified in real time by a method called random forest, operating on various features of the data. Examples of such features are signal duration, dominant frequency, signal-to-noise ratio, and so on. You can have a look at recent events at https://www.norsar.no/r-d/safe-society/aknes/.
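A random forest on such features can be set up in a few lines with scikit-learn. The sketch below is not the Åknes classifier. It trains on a synthetic feature table whose two classes ("noise" and "slope event") and their feature distributions are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 600

# Hypothetical feature table: duration (s), dominant frequency (Hz), SNR (dB).
# Synthetic classes: 0 = noise, 1 = slope event, drawn from different distributions.
labels = rng.integers(0, 2, size=n)
duration = np.where(labels == 1, rng.normal(4.0, 1.0, n), rng.normal(1.0, 0.5, n))
dom_freq = np.where(labels == 1, rng.normal(15.0, 3.0, n), rng.normal(40.0, 8.0, n))
snr_db = np.where(labels == 1, rng.normal(12.0, 3.0, n), rng.normal(4.0, 2.0, n))
X = np.column_stack([duration, dom_freq, snr_db])

X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
# Which features the forest relied on most (importances sum to 1).
print(dict(zip(["duration", "dom_freq", "snr_db"], clf.feature_importances_.round(2))))
```

Each tree votes on a class, and the forest's answer is the majority vote. The feature importances give a first, rough look at what the model is using.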

Another case, closer to the CTBTO work, is our monitoring of explosions from a few selected mines near our station in Northern Norway. Some of these mines are located too close to each other to be separated by the estimated epicenter alone, so we needed something that looks at waveform similarity. But finding a good single template event is hard when a mine has several different shafts or uses ripple-fired explosions. Convolutional neural networks are great here, since they can generalize over a large number of templates and effectively aggregate similarity over many small segments of the signal, which makes the method independent of the length of the event.
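The network used at NORSAR isn't described here, so the following only sketches the mechanism behind the length independence. Short filters slide over the signal, and averaging each filter's response over time ("global pooling") gives a fixed-size vector for traces of any length. The filters are random and untrained, which is an assumption made for illustration. In a real model they would be learned, e.g. with Keras `Conv1D` and `GlobalAveragePooling1D` layers, and the similarity value printed here is therefore meaningless.

```python
import numpy as np

def conv_features(trace, filters):
    """Cross-correlate the trace with each short filter, apply ReLU, then global-average-pool.

    Returns one value per filter, whatever the length of the trace.
    """
    trace = np.asarray(trace, dtype=float)
    pooled = []
    for f in filters:
        response = np.correlate(trace, f, mode="valid")  # like a Conv1D layer without bias
        pooled.append(np.maximum(response, 0.0).mean())  # ReLU, then average over time
    return np.array(pooled)

def cosine_similarity(a, b):
    """Similarity of two pooled feature vectors, in [-1, 1]."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(4)
filters = rng.normal(size=(8, 16))  # 8 hypothetical, untrained filters of 16 samples each

short_event = rng.normal(size=300)  # synthetic stand-ins for two recorded events
long_event = rng.normal(size=2000)
feats_short = conv_features(short_event, filters)
feats_long = conv_features(long_event, filters)
print(feats_short.shape, feats_long.shape)  # (8,) (8,)
print(round(cosine_similarity(feats_short, feats_long), 3))
```

Because each entry is an average over many short segments, a 3-second and a 20-second recording can be compared directly.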

Q: People talk about ML methods as 'black boxes'. How do I know what goes on inside?

Modern ML methods are definitely complex, but at the same time, they're not magic. For detection tasks, a good approach is to use convolutional neural networks, as I mentioned, where one constructs many short digital filters and optimizes the response of these filters to detect the correct signal. Those who took courses on signal processing might recall that convolution is closely related to cross-correlation (cross-correlation is convolution with a time-reversed filter), which is a standard tool for recognizing repeating seismic signals. So, the internals of such neural networks have a clear physical meaning and can be plotted and investigated.
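The link between the two operations is easy to check numerically. The sketch below first confirms that NumPy's cross-correlation equals convolution with the reversed filter. It then uses normalized cross-correlation, the classic template-matching tool, to find a repeat of a "master event" buried in noise. All data are synthetic, and `normalized_xcorr` is an illustrative helper, not a library function.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=200)  # a noisy "trace"
k = rng.normal(size=20)   # a short filter

# Cross-correlation equals convolution with the time-reversed filter.
corr = np.correlate(x, k, mode="valid")
conv = np.convolve(x, k[::-1], mode="valid")
print(np.allclose(corr, conv))  # True

def normalized_xcorr(trace, template):
    """Pearson correlation between the template and every same-length window of the trace."""
    m = template.size
    t = (template - template.mean()) / template.std()
    windows = np.lib.stride_tricks.sliding_window_view(trace, m)
    centered = windows - windows.mean(axis=1, keepdims=True)
    return (centered @ t) / (m * np.maximum(windows.std(axis=1), 1e-12))

template = rng.normal(size=50)  # hypothetical "master event" waveform
trace = rng.normal(scale=0.3, size=1000)
trace[600:650] += template      # bury a repeat of the master event at sample 600
cc = normalized_xcorr(trace, template)
print(int(np.argmax(cc)))       # peak at (or right next to) the insertion point, 600
```

A trained convolutional filter can be inspected the same way. Plot its weights and you are looking at a short waveform the network has learned to correlate against.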

Granted, once we start doing convolutions on top of convolutions, and have a stack of a million parameters, things won't be as clear anymore. But lots of different methods exist for visualizing and investigating complicated models, and this has really become a field of study by itself. Some cool interactive examples can be seen at https://lrpserver.hhi.fraunhofer.de/.

Q: How do I get started with machine learning?

The explosion of ML tools over recent years has significantly lowered the threshold for getting started. The first step is to choose a framework that implements the algorithms and the data-wrangling tools you need. If you happen to be programming in Python, as the majority of ML researchers now do, most things are covered by the scikit-learn library. It also serves as a great educational tool, thanks to its well-written user guide. For more neural network-oriented programming, the Keras library, built on top of Google's TensorFlow, offers a quick yet flexible interface for building a huge variety of models. There are of course ML libraries in other languages too; MATLAB, R, and Julia all have decent implementations.
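As a taste of the scikit-learn workflow, the sketch below chains feature scaling and a classifier into one pipeline and estimates its accuracy with 5-fold cross-validation. The data come from scikit-learn's synthetic `make_classification` generator, and the choice of an RBF support vector machine is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a feature table (rows = events, columns = features).
X, y = make_classification(n_samples=400, n_features=6, n_informative=4, random_state=0)

# Chaining the scaler and the classifier means the scaler is fitted on the
# training folds only, so no information leaks from the validation folds.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(model, X, y, cv=5)
print(f"5-fold accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```

Swapping in a different model is a one-line change (e.g. a `RandomForestClassifier`), which makes this kind of pipeline convenient for quick experiments.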

Of course, we shouldn't forget the more time-consuming part: learning how the various ML methods work. Believe me, modern ML can fail in the most infuriating ways. There are plenty of good books on the subject, many of them also available for free, such as the 800-page behemoth Deep Learning by Goodfellow, Bengio, and Courville. For an easier read, I can recommend Deep Learning with Python by F. Chollet, the guy who wrote Keras.

Another tip: if you want to get started on a very specific task that's been covered in a recent paper, make sure to check out the authors' GitHub accounts. In the spirit of reproducibility and open science, many, if not most, ML researchers publish their code freely.

Q: What kind of data do I need, and how much of it?

Seismology is actually a great place to do ML, because event catalogs span decades of recordings. The more training data, the better, and the IDC bulletins can be an excellent starting point. There is no general rule for how many data samples are required, but the complexity of the model should match the size of the data set, to avoid the typical problems of an overcomplicated model that just memorizes its inputs (overfitting) or an overly simplistic model that can't learn the patterns (underfitting). Finding the optimal balance is typically a matter of experimenting.
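A small experiment shows the balance between model complexity and data. In the sketch below, polynomials of increasing degree are fitted to 30 noisy synthetic samples, and 10 held-out samples are used for validation. Polynomials stand in for "models of increasing complexity", and the data-generating curve is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.sort(rng.uniform(-1.0, 1.0, 40))
y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=x.size)  # synthetic data

# Hold out every fourth sample for validation.
val = np.zeros(x.size, dtype=bool)
val[::4] = True
x_tr, y_tr, x_va, y_va = x[~val], y[~val], x[val], y[val]

def mse(coeffs, xs, ys):
    """Mean squared error of a fitted polynomial on the given samples."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

results = {}
for degree in (1, 3, 12):
    coeffs = np.polyfit(x_tr, y_tr, degree)
    results[degree] = (mse(coeffs, x_tr, y_tr), mse(coeffs, x_va, y_va))
    train_err, val_err = results[degree]
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, validation MSE {val_err:.3f}")
```

The training error can only go down as the degree grows, because each polynomial family contains the smaller ones. The validation error is the number to watch. Typically it is high for the too-simple model, lowest somewhere in between, and rises again once the model starts memorizing noise.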

The type of data you have at hand determines which method to choose. The convolutional networks we have been talking about require ordered data such as time series or images, while decision trees, for instance, require tabular data, i.e., data in rows and columns. Tabular data can be obtained from time series by computing features such as the signal-to-noise ratio, as I mentioned for the Åknes case, and this can often be a good way of reducing the complexity of the data.
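Turning a waveform into a table row can be as simple as the sketch below, which computes three of the features mentioned for Åknes. The exact definitions are hypothetical choices, not the ones used operationally. Duration is measured above a noise-based amplitude threshold, dominant frequency is taken from the peak of the amplitude spectrum, and SNR is an RMS ratio in dB. The function also assumes the first samples of the trace are pre-event noise.

```python
import numpy as np

def extract_features(trace, fs, noise_samples, threshold_factor=5.0):
    """Turn one waveform into a row of tabular features (a hypothetical feature set).

    trace:            1-D array; the first `noise_samples` samples are assumed to be pre-event noise
    fs:               sampling rate in Hz
    threshold_factor: amplitude threshold for the duration, in multiples of the noise RMS
    """
    trace = np.asarray(trace, dtype=float)
    noise, signal = trace[:noise_samples], trace[noise_samples:]
    noise_rms = np.sqrt(np.mean(noise ** 2))
    signal_rms = np.sqrt(np.mean(signal ** 2))

    # Duration: time between the first and last sample exceeding the threshold.
    above = np.flatnonzero(np.abs(signal) > threshold_factor * noise_rms)
    duration = (above[-1] - above[0]) / fs if above.size else 0.0

    # Dominant frequency: peak of the amplitude spectrum, skipping the zero-frequency bin.
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    dominant_freq = freqs[np.argmax(spectrum[1:]) + 1]

    snr_db = 20.0 * np.log10(signal_rms / noise_rms)
    return {"duration_s": duration, "dominant_freq_hz": dominant_freq, "snr_db": snr_db}

fs = 100.0  # assumed sampling rate in Hz
rng = np.random.default_rng(7)
t = np.arange(1000) / fs
trace = rng.normal(scale=0.1, size=t.size)
# Synthetic 8 Hz event with a smooth (Hann) envelope, samples 500 to 799.
trace[500:800] += np.sin(2 * np.pi * 8.0 * t[500:800]) * np.hanning(300)
features = extract_features(trace, fs, noise_samples=400)
print(features)
```

Applying this to every event gives a table with one row per event and one column per feature. That is the input a random forest or other decision-tree method expects.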

Q: What hardware is required?

To get started, or even to do reasonably advanced stuff, you technically don't need any hardware of your own: the Google Colab service (https://colab.research.google.com/) gives you free access to a GPU-accelerated server, within limits of course. Colab provides so-called notebooks, webpages where one writes code and runs it in a nice, unified interface, which is perfect for experimenting. Given that it's free of charge and online-only, you probably don't want to upload confidential data, though.

The other budget option is to just use your laptop, which is actually my tool of choice for prototyping since there is no transfer time to upload data, or queueing for time on a cluster, and so on. It's going to be too slow for large models and large datasets, however, which brings us to the actual question – do I need a GPU?

Graphics Processing Units (GPUs) are computing devices that excel at parallel computation, such as matrix multiplication. And since training neural networks involves a lot of matrix multiplication, GPUs can speed up training considerably. In particular, if you're training convolutional networks rather than working on plain tabular data, the difference is huge, and a GPU will pay for itself pretty soon. Just make sure to get one with enough memory for your model and dataset. For one-off projects, it might be cheaper to rent a cloud solution rather than buying your own equipment, and this also requires a lot less maintenance.
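For a first feel of how much GPU memory a model needs, a back-of-the-envelope estimate helps. The sketch below counts the weights, their gradients and the per-parameter optimizer state, assuming float32 values and an Adam-style optimizer with two extra buffers per parameter. It deliberately ignores activations and the data batch, which can dominate for convolutional networks, so treat the result as a floor rather than a requirement.

```python
def training_memory_gb(n_params, bytes_per_value=4, optimizer_slots=2):
    """Rough lower bound on GPU memory needed for training, in gigabytes.

    Counts weights + gradients + per-parameter optimizer state
    (Adam keeps two such buffers), all float32 by default.
    Activations and the data batch are NOT included.
    """
    copies = 1 + 1 + optimizer_slots  # weights, gradients, optimizer state
    return n_params * bytes_per_value * copies / 1e9

# A model with "tens of millions of parameters", e.g. 25 million:
print(f"{training_memory_gb(25_000_000):.1f} GB")  # 0.4 GB before activations
```

The parameters themselves are rarely the bottleneck. Batch size and input length drive the activation memory, so in practice you often tune the batch size until training fits on the card.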

Suggested readings

Mousavi, S. M., Sheng, Y., Zhu, W. & Beroza, G. C. STanford EArthquake Dataset (STEAD): A Global Data Set of Seismic Signals for AI. IEEE Access 7, 13 (2019).

Kong, Q. et al. Machine Learning in Seismology: Turning Data into Insights. Seismological Research Letters 90, 3–14 (2019).

Ross, Z. E., Meier, M.-A. & Hauksson, E. P Wave Arrival Picking and First-Motion Polarity Determination With Deep Learning. J. Geophys. Res. Solid Earth 123, 5120–5129 (2018).

Meier, M. et al. Reliable Real‐Time Seismic Signal/Noise Discrimination With Machine Learning. J. Geophys. Res. Solid Earth 124, 788–800 (2019).

Dickey, J., Borghetti, B., Junek, W. & Martin, R. Beyond Correlation: A Path‐Invariant Measure for Seismogram Similarity. Seismological Research Letters 91, 356–369 (2020).