When working with well log data in a DataFrame, it is very likely that you’ll want to explore your data in the context of geologic zones. By adding zone labels to each row of your DataFrame, it is possible to use some of the fun and powerful features of groupby() for stats aggregations. And while the process for adding tops to a DataFrame is not obvious, it is simple.
In this approach, I add a zone label to each row in the DataFrame by using the cut() method from the pandas library.
First we’ll load our log data using lasio and subsequently create a DataFrame object from the LAS data section.
import lasio
import pandas as pd
import numpy as np

las = lasio.read('example.LAS')
This LAS file happens to have an extra header section, ~T, that lists tops. This is not a defined section in the LAS 2.0 standard, but it is nonetheless common to find tops information stored in a header section like this. lasio does not parse the ~T section automatically, but it does load that section into the LASFile object and store it as a newline-delimited string in the sections dictionary, here under the key 'Tops'.
las.sections['Tops']
Out: 'STE_GEN 2879.3700\nST_LOUIS 3027.6700\nSALEM 3262.4400'
For this technique of adding tops to a DataFrame, we need a list of the top names and a list of the depths of each top. We’ll just use some simple string split()s to get the lists we need. Note that the depth values for the tops are stored in the LAS file as strings, so we’ll have to convert them to float.
The depths list also needs to end with an arbitrarily large value so that the base of the deepest zone is defined. In this case we’ll just use 99,999 ft.
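raw_tops_section = las.sections['Tops']
# Each line looks like 'NAME DEPTH': the first token is the top name, the second is its depth
tops_names = [i.split()[0] for i in raw_tops_section.split('\n')]
# Append an arbitrarily deep value to close the deepest zone
tops_depths = [float(i.split()[1]) for i in raw_tops_section.split('\n')] + [99999]
print(tops_names, tops_depths)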
Out: ['STE_GEN', 'ST_LOUIS', 'SALEM'] [2879.37, 3027.67, 3262.44, 99999]
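If the tops section is messier than this one, the same parsing can be wrapped in a small helper. This is only a sketch under my own assumptions: `parse_tops` and `base_depth` are hypothetical names (not part of lasio), each line is assumed to be a name followed by a depth, and the raw string is hard-coded here so the example runs without a LAS file.

```python
def parse_tops(raw_tops, base_depth=99999.0):
    """Split a newline-delimited 'NAME DEPTH' string into names and bin edges.

    parse_tops and base_depth are hypothetical names used for this sketch.
    """
    names, depths = [], []
    for line in raw_tops.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        names.append(parts[0])
        depths.append(float(parts[1]))
    # pd.cut rejects bin edges that are not in increasing order, so fail early here
    if depths != sorted(depths):
        raise ValueError("tops must be listed in order of increasing depth")
    # Append the arbitrary base so the deepest zone has a lower edge
    return names, depths + [base_depth]


raw = 'STE_GEN 2879.3700\nST_LOUIS 3027.6700\nSALEM 3262.4400'
tops_names, tops_depths = parse_tops(raw)
print(tops_names)
print(tops_depths)
```

The order check is there because the bin edges passed to cut() later need to increase with depth.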
The lasio library has a convenient built-in method for creating a DataFrame.
df = las.df()
By default that DataFrame uses the log depth as the index, which is fine for this exercise.
Now we create a label for each DataFrame row based on our tops and store it as a new column called ZONE. We can use the cut() method to do so by providing our top names and depths to the labels and bins parameters, respectively.
df['ZONE'] = pd.cut(df.index, bins=tops_depths, labels=tops_names)
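To see how cut() treats samples near the zone boundaries, here is a small sketch on made-up data. The depths and GR values are hypothetical, chosen so that one sample sits above the first top and one sits exactly on a top. It also shows the `right=False` option of `pd.cut`, which is a variation on the call above rather than part of it.

```python
import pandas as pd

tops_names = ['STE_GEN', 'ST_LOUIS', 'SALEM']
tops_depths = [2879.37, 3027.67, 3262.44, 99999]

# Hypothetical depths: one above the first top, one exactly on the ST_LOUIS top
depths = pd.Index([2800.0, 2900.0, 3027.67, 3100.0, 3300.0], name='DEPT')
demo = pd.DataFrame({'GR': [40.0, 55.0, 60.0, 35.0, 80.0]}, index=depths)

# Default right=True: bins are (top, next_top], so a sample exactly on a top
# is labeled with the zone above it
demo['ZONE'] = pd.cut(demo.index, bins=tops_depths, labels=tops_names)

# right=False: bins are [top, next_top), so a sample exactly on a top
# is labeled with the zone that starts there
demo['ZONE_LEFT_CLOSED'] = pd.cut(
    demo.index, bins=tops_depths, labels=tops_names, right=False
)

print(demo)
```

Rows shallower than the first top fall outside every bin and get NaN for the zone. Whether a sample exactly on a top belongs to the upper or lower zone is a judgment call, and for most log sampling intervals it affects at most one row per top.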
Now we can use groupby() to do powerful things. In this example I’ll group by ZONE and aggregate the mean and standard deviation of a few curves.
df.groupby('ZONE')[['GR', 'DPHI', 'ILD']].agg(['mean', 'std']).round(2)
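Here is the same kind of aggregation on a tiny made-up table, so the shape of the result is easy to check by hand. The curve values are hypothetical, and `observed=True` is my addition: because ZONE is a categorical column, it keeps zones with no samples (SALEM here) out of the result.

```python
import pandas as pd

# Hypothetical log values that have already been labeled by zone
demo = pd.DataFrame({
    'ZONE': pd.Categorical(
        ['STE_GEN', 'STE_GEN', 'ST_LOUIS', 'ST_LOUIS'],
        categories=['STE_GEN', 'ST_LOUIS', 'SALEM'],
    ),
    'GR': [40.0, 60.0, 30.0, 50.0],
    'DPHI': [0.10, 0.14, 0.06, 0.08],
})

# One row per zone, one (curve, statistic) column pair per aggregation
stats = (
    demo.groupby('ZONE', observed=True)[['GR', 'DPHI']]
    .agg(['mean', 'std', 'count'])
    .round(2)
)
print(stats)
```

Adding 'count' alongside the other statistics is a cheap way to spot thin zones where the mean and standard deviation rest on only a handful of samples.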
That’s one way to get tops into a DataFrame. I’d love to hear from you if you have other tricks for working with tops in a DataFrame.