Adding tops to a pandas DataFrame

When working with well log data in a pandas DataFrame, it is very likely that you’ll want to explore your data in the context of geologic zones. By adding zone labels to each row of your DataFrame, it is possible to use some of the fun and powerful features of pandas, like groupby() for stats aggregations. And while the process for adding tops to a DataFrame is not obvious, it is simple.

In this approach, I add a zone label to each row in the DataFrame by using the cut() function from the pandas library.

First we’ll load our log data using lasio and subsequently create a pandas DataFrame object from the LAS data section.

import lasio
import pandas as pd
import numpy as np

las = lasio.read('example.LAS')

This LAS file happens to have an extra header section ~T that lists tops. This is not a defined section in the LAS 2.0 standard, but nonetheless it is common to find tops information stored in a header section like this.

lasio does not parse the ~T section automatically, but it does load that section into the LASFile object and store it as a newline-delimited string in the sections dictionary.

print(repr(las.sections['Tops']))
Out:
'STE_GEN     2879.3700\nST_LOUIS    3027.6700\nSALEM       3262.4400'

For this technique of adding tops to a DataFrame, we need a list of the top names and a list of the depths of each top. We’ll just use some simple string split()s to get the lists we need. Note that the depth values for the tops are stored in the LAS file as strings, so we’ll have to convert them to the float object type.

The depths list will need to include an arbitrarily large value as well so that the base of the deepest zone is defined. In this case we’ll just use 99,999 ft.

raw_tops_section = las.sections['Tops']


tops_names = [i.split()[0] for i in raw_tops_section.split('\n')]
tops_depths = [float(i.split()[1]) for i in raw_tops_section.split('\n')] + [99999]

print(tops_names, tops_depths)
Out:
(['STE_GEN', 'ST_LOUIS', 'SALEM'], [2879.37, 3027.67, 3262.44, 99999])
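The list comprehensions above assume every line in the section has exactly a name and a depth. As a slightly more defensive sketch, the parsing can be wrapped in a small helper that skips blank or malformed lines. The function name parse_tops and its base_depth parameter are my own illustration, not part of lasio, and the sample string is copied from the output shown earlier with a trailing newline added. It still assumes top names contain no spaces.

```python
# Sample text copied from the ~T section output shown earlier,
# with a trailing newline added to mimic a less tidy file.
raw_tops_section = (
    "STE_GEN     2879.3700\n"
    "ST_LOUIS    3027.6700\n"
    "SALEM       3262.4400\n"
)

def parse_tops(raw, base_depth=99999.0):
    """Hypothetical helper: return (names, depths) from a tops section string.

    Assumes one top per line, name first, depth second, no spaces in names.
    """
    names, depths = [], []
    for line in raw.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or incomplete lines
        names.append(parts[0])
        depths.append(float(parts[1]))
    # Close off the deepest zone with an arbitrarily large base depth.
    depths.append(base_depth)
    return names, depths

tops_names, tops_depths = parse_tops(raw_tops_section)
print(tops_names, tops_depths)
```

Using splitlines() instead of split('\n') avoids an IndexError when the section ends with a newline.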

The lasio library has a convenient built-in method for creating a DataFrame object.

df = las.df()

By default that DataFrame uses the log depth as the index, which is fine for this exercise.

Now we create a label for each DataFrame row based on our tops and store it as a new column called ZONE. We can use the pandas cut() function to do so by passing our top depths and top names to the bins and labels keyword arguments, respectively.

df['ZONE'] = pd.cut(df.index, bins=tops_depths, labels=tops_names)
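It is worth checking how cut() treats samples that sit exactly on a top, or above the shallowest top. The sketch below uses a made-up depth index and made-up GR values. The demo frame and the ZONE_LEFT column are illustrative names, not part of the workflow above. With the default right=True, each bin is open on its shallow side, so a sample exactly at a top depth falls in the zone above it. Passing right=False makes the top depth itself the first sample of its zone. In both cases, rows shallower than the first top get NaN.

```python
import numpy as np
import pandas as pd

tops_names = ['STE_GEN', 'ST_LOUIS', 'SALEM']
tops_depths = [2879.37, 3027.67, 3262.44, 99999]

# Hypothetical depth index: one sample above the first top, two exactly on tops.
depths = np.array([2800.0, 2879.37, 2900.0, 3027.67, 3100.0, 3300.0])
demo = pd.DataFrame({'GR': [50.0, 30.0, 32.0, 40.0, 41.0, 25.0]}, index=depths)

# Default behaviour: bins are (top, next_top], so a sample on a top
# belongs to the interval that ends at that depth.
demo['ZONE'] = pd.cut(demo.index, bins=tops_depths, labels=tops_names)

# Alternative: bins are [top, next_top), so a sample on a top
# belongs to the zone that starts there.
demo['ZONE_LEFT'] = pd.cut(demo.index, bins=tops_depths,
                           labels=tops_names, right=False)

print(demo)
```

Which convention is right depends on how your tops were picked, so treat right=False as an option to consider rather than a correction.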

Now we can use pandas groupby() to do some powerful magic. In this example I’ll group by ZONE and aggregate the statistics for the GR, DPHI and ILD curves.

df.groupby('ZONE')[['GR', 'DPHI', 'ILD']].agg(['mean', 'std']).round(2)
Out:
             GR          DPHI          ILD
           mean    std   mean   std    mean     std
ZONE
STE_GEN   31.11  10.23   0.02  0.01  109.27  181.24
ST_LOUIS  39.96  10.55   0.02  0.01  167.73  169.24
SALEM     25.81   8.47   0.03  0.01  128.91  136.58
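Because cut() returns a categorical column, groupby() has an observed argument that controls whether zones with no rows still appear in the result. Here is a small self-contained sketch on made-up values. The demo frame and the numbers in it are hypothetical, and I pass observed=True explicitly since the default has been changing across pandas versions.

```python
import pandas as pd

zones = ['STE_GEN', 'ST_LOUIS', 'SALEM']

# Hypothetical data: two GR samples per zone.
demo = pd.DataFrame({
    'ZONE': pd.Categorical(
        ['STE_GEN', 'STE_GEN', 'ST_LOUIS', 'ST_LOUIS', 'SALEM', 'SALEM'],
        categories=zones, ordered=True),
    'GR': [30.0, 32.0, 40.0, 42.0, 25.0, 27.0],
})

# observed=True keeps only zones that actually have rows;
# 'count' shows how many samples back each statistic.
summary = (demo.groupby('ZONE', observed=True)['GR']
               .agg(['mean', 'std', 'count'])
               .round(2))
print(summary)
```

Adding a count next to the mean and standard deviation is a cheap way to spot thin zones where the statistics rest on only a few samples.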

That’s one way to get tops into a DataFrame. I’d love to hear from you if you have other tricks for working with tops in a pandas DataFrame!