WRF FAQ

Latest revision as of 21:44, 27 July 2021


Tips to reduce WRF output size

  • netCDF4 compression: you are probably using netCDF v4.0 or newer. In that case, you can enable compression directly in the WRF code so that the output is compressed as it is written. If you ever need an output file in classic format, you can then set the namelist option use_netcdf_classic=.true. in the &time_control section. The netCDF format is chosen at the configuration step, so if you have already compiled the code, first clean it up with:

./clean -a

then follow these steps:

    • Define the NETCDF4 environment variable:

for csh:

setenv NETCDF4 1

for bash:

export NETCDF4=1

    • Configure and compile as usual.
  • output a subset of variables: WRF comes with a built-in mechanism to choose which variables are written to the output files. You therefore do not have to output all the default variables, and you can add some non-default ones. The details can be found in the WRF User's Guide.
  • clean up the output file: after creation, it can be useful to clean up the output file. For example, every variable has a Time dimension even if it is constant in time (latitudes and longitudes are usually constant unless you use a moving nest), so removing this dimension can save storage space. With NCO, for example:

ncwa -a Time -v XLONG wrfout_d01_2000-01-24_12\:00\:00 time0.nc
ncks -x -v XLONG wrfout_d01_2000-01-24_12\:00\:00 no_longitude.nc
ncks -A -v XLONG time0.nc no_longitude.nc
mv no_longitude.nc wrfout_d01_2000-01-24_12\:00\:00

You can list more than one variable at once in these commands by separating the names with commas (for example -v XLONG,XLAT).
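
The NCO steps above can be wrapped in a small shell function so that they are easy to repeat for several variables. This is only a sketch: the function name strip_time_dim, the DRY_RUN switch and the temporary file name no_time_vars.nc are made up for illustration, and the function assumes the listed variables really are constant in time.

```shell
# Hypothetical helper: remove the Time dimension from time-constant variables.
# $1 = WRF output file, $2 = comma-separated variable list (e.g. XLONG,XLAT).
# Set DRY_RUN=1 to print the NCO commands instead of running them.
strip_time_dim() {
    file=$1
    vars=$2
    run=""
    [ "${DRY_RUN:-0}" = "1" ] && run="echo"
    # Same four steps as in the FAQ: average over Time, drop the original
    # variables, append the time-free copies, then replace the original file.
    $run ncwa -a Time -v "$vars" "$file" time0.nc &&
    $run ncks -x -v "$vars" "$file" no_time_vars.nc &&
    $run ncks -A -v "$vars" time0.nc no_time_vars.nc &&
    $run mv no_time_vars.nc "$file"
}

# Dry run: only prints the four commands, nothing is modified.
DRY_RUN=1 strip_time_dim wrfout_d01_2000-01-24_12:00:00 XLONG,XLAT
```

Running the function without DRY_RUN=1 executes the commands and overwrites the original file, so it is safer to try it on a copy first.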

What to do if WRF does not compile on Raijin?

Make sure that you are using the WRF version that is stored on /projects/WRF on Raijin. Then first try to compile with the dmpar option: clean (./clean -a), run configure again and choose option #3 (dmpar). The code should then compile without problems. WRF has not yet been successfully compiled on Raijin using the other options.

Which processor crashed in my WRF simulation?

WRF writes its standard output to rsl.out files and its errors to rsl.error files. There is one pair of files for each processor (for example rsl.out.0000 and rsl.error.0000). If you are running on a large number of processors, checking each file is very time-consuming. The first thing to do when your simulation stops is to check the output file of your job script. If the script that launches WRF is called script.pbs, PBS creates a file script.pbs.o1234567 at the end of the job, where "1234567" stands for your job ID. Open this file and check the end. If you see a message like:

--------------------------------------------------------------------------
mpirun has exited due to process rank 50 with PID 32659 on
node r2059 exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

Then it means the simulation did not finish normally. The first line of the message gives the number of the processor that finished abnormally: "due to process rank 50". In this case, the problem happened on processor 50, so the error message should be in rsl.error.0050. You might want to check rsl.out.0050 as well, in case a message was written to the standard output first.
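
The lookup described above can be scripted. The following is a minimal sketch, not part of WRF or PBS: the function name failed_rsl_files is hypothetical, and it assumes that the mpirun message has the same wording as the example shown here and that the rsl file suffix is the rank padded to four digits.

```shell
# Hypothetical helper: read a PBS job output file on stdin and print the
# names of the rsl files of the rank that mpirun reports as exiting improperly.
failed_rsl_files() {
    # Keep only the first "process rank N" match and extract N.
    rank=$(grep -o 'process rank [0-9][0-9]*' | head -n 1 | awk '{print $3}')
    # No match: the job output does not contain the mpirun message.
    [ -n "$rank" ] || return 1
    # The rsl file suffix is the rank padded to four digits.
    printf 'rsl.error.%04d\nrsl.out.%04d\n' "$rank" "$rank"
}

# Demo with the first lines of the message shown above.
failed_rsl_files <<'EOF'
mpirun has exited due to process rank 50 with PID 32659 on
node r2059 exiting improperly. There are two reasons this could occur:
EOF
# prints:
# rsl.error.0050
# rsl.out.0050
```

In a real run directory the function would be fed the job output file, for example failed_rsl_files < script.pbs.o1234567.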

What to do if I get a segmentation fault error with WRF?

If you get a message like this one in an rsl.error file:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
wrf.exe 00000000017F59A1 Unknown Unknown Unknown
wrf.exe 00000000017F3655 Unknown Unknown Unknown
wrf.exe 0000000001ACFA07 Unknown Unknown Unknown
wrf.exe 0000000001387C8A Unknown Unknown Unknown
wrf.exe 0000000000EC4CDD Unknown Unknown Unknown
wrf.exe 0000000000DCCAB7 Unknown Unknown Unknown

you have a segmentation fault error. These can have multiple causes. The best way to find out what is causing the error is to recompile WRF using the debugging options and re-run.

  1. Clean the previous compilation in WRF/ with ./clean -a
  2. Re-run the compilation with the -d option: ./run_compile -d
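
Before recompiling, it can help to see which processors reported the fault. The sketch below lists the rsl.error files that contain a SIGSEGV message. The function name find_segfaults is made up for illustration, and the search string assumes the message format shown above.

```shell
# Hypothetical helper: list the rsl.error files in a run directory that
# contain a SIGSEGV message. $1 = run directory (defaults to the current one).
find_segfaults() {
    dir=${1:-.}
    # grep -l prints only the names of the files that match.
    grep -l 'SIGSEGV' "$dir"/rsl.error.* 2>/dev/null
}

# Report the affected files, or say so when none is found.
find_segfaults . || echo "No SIGSEGV message found in rsl.error files"
```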