Author: Hugues Levasseur
Date:
To: Patrice Karatchentzeff
CC: GUILDE
Subject: Re: Load average problem on Debian
Patrice,
The Apache log is attached.
The memcheck (not sure I fully understood what you wanted):
11:56:20 #>valgrind --tool=memcheck /usr/sbin/apache2
==26185== Memcheck, a memory error detector
==26185== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==26185== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==26185== Command: /usr/sbin/apache2
==26185==
[Fri Jul 26 11:56:34.504742 2019] [core:warn] [pid 26185] AH00111: Config
variable ${APACHE_RUN_DIR} is not defined
apache2: Syntax error on line 80 of /etc/apache2/apache2.conf: DefaultRuntimeDir
must be a valid directory, absolute or relative to ServerRoot
==26185==
==26185== HEAP SUMMARY:
==26185== in use at exit: 4,293 bytes in 10 blocks
==26185== total heap usage: 28 allocs, 18 frees, 17,698 bytes allocated
==26185==
==26185== LEAK SUMMARY:
==26185== definitely lost: 0 bytes in 0 blocks
==26185== indirectly lost: 0 bytes in 0 blocks
==26185== possibly lost: 0 bytes in 0 blocks
==26185== still reachable: 4,293 bytes in 10 blocks
==26185== suppressed: 0 bytes in 0 blocks
==26185== Rerun with --leak-check=full to see details of leaked memory
==26185==
==26185== For counts of detected and suppressed errors, rerun with: -v
==26185== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Thanks

On 26/07/2019 11:05, Patrice Karatchentzeff wrote:
> Hi
>
> Post the corresponding Apache logs.
>
> This kind of kernel message smells pretty bad. If you can, run a
> memcheck pass on the memory modules.
>
>
> On Fri, 26 Jul 2019 at 11:00, Hugues Levasseur
> <hugues.levasseur@???> wrote:
>> Hi Guilde,
>>
>>
>> I have a mystery ... a truly mysterious one* on a Debian server.
>>
>> If anyone has an idea that would help me understand the problem, I'm all ears.
>>
>>
>> Thanks in advance
>>
>> Hugues
>>
>> * Mainly because I'm a developer, not a system administrator :-)
>>
>> ---------------------------------------------------
>>
>> *Context:*
>>
>> 2 dedicated servers (Debian 9) running Apache / PHP / MariaDB
>> applications
>>
>> One is hosted at OVH, the other at Online.
>>
>> They are kept in sync by GlusterFS for a shared mount point and by
>> MariaDB's Galera plugin for the databases
>>
>> Everything has been running smoothly for more than a year
>>
>> The servers are fully up to date (stretch main repositories)
>>
>>
>> *Symptoms:*
>>
>> For the past 3 days, at 3 a.m., server B ... goes completely haywire.
>>
>> Nagios monitoring (on a third machine) raises LOAD AVERAGE CRITICAL
>> alarms: 15.02,15.06,15.00
>>
>> And, of course, the applications become almost unusable.
>>
>> Each time, a reboot fixes the problem ... until the next time
>>
>>
>> *Analysis attempts:*
>>
>> - No cron job starts at 3 a.m. (some run every hour, but none
>> specifically at 3 a.m.)
>>
>> - htop shows the load average, but not the processes responsible
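A note on the htop point: on Linux the load average counts tasks in uninterruptible sleep (state D, usually waiting on disk or network-filesystem I/O) as well as runnable ones, so a load of 15 with no busy process in htop may simply mean about 15 blocked tasks. Below is a minimal Python sketch that lists them by reading /proc. The function names are mine, and whether the GlusterFS mount is what such tasks would be waiting on is only a guess, not something the logs establish.

```python
import os

def parse_stat(stat_text):
    """Return (pid, command, state) from the contents of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces or
    # parentheses, so split on the first "(" and the last ")".
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left])
    command = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, command, state

def blocked_tasks(proc_root="/proc"):
    """List (pid, command) for every process currently in state D."""
    found = []
    for entry in sorted(os.listdir(proc_root)):
        if not entry.isdigit():
            continue  # /proc also holds non-process entries such as "self"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, command, state = parse_stat(handle.read())
        except (OSError, ValueError):
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, command))
    return found
```

Calling `blocked_tasks()` while the load is high (for instance from an hourly cron job that appends to a file) would show whether the load comes from blocked tasks and which commands they belong to.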
>>
>> To try to understand what happens at 3 a.m.:
>>
>> - cat /var/log/syslog.1 |grep "Jul 25 03:" > syslog.txt
>>
>> What I understand is that Apache starts restarting in a loop (lines 5 &
>> 127 of the attachment)
>>
>>
>> I am also attaching all the /var/log/* files that have "something at 3 a.m.":
>>
>> - cat /var/log/messages |grep "Jul 25 03:" > message.txt
>>
>> - cat /var/log/kern.log |grep "Jul 25 03:" > kern.log.txt
>>
>> - cat /var/log/daemon.log |grep "Jul 25 03:" > daemon.log.txt
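The per-file greps above can also be done in one pass. This is a small Python sketch, assuming the classic syslog timestamp at the start of each line (the "Jul 25 03:" prefix used in those commands). The function name is illustrative, and unlike grep it only matches the prefix at the beginning of a line.

```python
def lines_in_window(paths, prefix):
    """Yield (path, line) for each log line that starts with prefix."""
    for path in paths:
        try:
            with open(path, errors="replace") as handle:
                for line in handle:
                    if line.startswith(prefix):
                        yield path, line.rstrip("\n")
        except OSError:
            continue  # missing or unreadable log file: skip it
```

For example, `for path, line in lines_in_window(["/var/log/syslog.1", "/var/log/kern.log"], "Jul 25 03:"): print(path, line)` prints the 3 a.m. lines of both files with their source.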
>>
>>
>>
[Thu Jul 25 06:41:39.663607 2019] [http2:warn] [pid 5701] AH10034: The mpm module (prefork.c) is not supported by mod_http2. The mpm determines how things are processed in your server. HTTP/2 has more demands in this regard and the currently selected mpm will just not do. This is an advisory warning. Your server will continue to work, but the HTTP/2 protocol will be inactive.
[Thu Jul 25 06:41:39.733975 2019] [mpm_prefork:notice] [pid 5701] AH00163: Apache/2.4.25 (Debian) OpenSSL/1.0.2r configured -- resuming normal operations
[Thu Jul 25 06:41:39.733990 2019] [core:notice] [pid 5701] AH00094: Command line: '/usr/sbin/apache2'
[Thu Jul 25 08:42:50.174696 2019] [mpm_prefork:notice] [pid 5701] AH00169: caught SIGTERM, shutting down
[Thu Jul 25 08:45:37.521277 2019] [http2:warn] [pid 1802] AH10034: The mpm module (prefork.c) is not supported by mod_http2. The mpm determines how things are processed in your server. HTTP/2 has more demands in this regard and the currently selected mpm will just not do. This is an advisory warning. Your server will continue to work, but the HTTP/2 protocol will be inactive.
[Thu Jul 25 08:45:37.695388 2019] [mpm_prefork:notice] [pid 1802] AH00163: Apache/2.4.25 (Debian) OpenSSL/1.0.2r configured -- resuming normal operations
[Thu Jul 25 08:45:37.695425 2019] [core:notice] [pid 1802] AH00094: Command line: '/usr/sbin/apache2'
[Thu Jul 25 17:56:16.544048 2019] [mpm_prefork:notice] [pid 1802] AH00169: caught SIGTERM, shutting down
[Thu Jul 25 17:58:23.267488 2019] [http2:warn] [pid 1681] AH10034: The mpm module (prefork.c) is not supported by mod_http2. The mpm determines how things are processed in your server. HTTP/2 has more demands in this regard and the currently selected mpm will just not do. This is an advisory warning. Your server will continue to work, but the HTTP/2 protocol will be inactive.
[Thu Jul 25 17:58:23.451271 2019] [mpm_prefork:notice] [pid 1681] AH00163: Apache/2.4.25 (Debian) OpenSSL/1.0.2r configured -- resuming normal operations
[Thu Jul 25 17:58:23.451309 2019] [core:notice] [pid 1681] AH00094: Command line: '/usr/sbin/apache2'
[Fri Jul 26 03:00:05.034942 2019] [core:warn] [pid 1681] AH00045: child process 1682 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:05.035029 2019] [core:warn] [pid 1681] AH00045: child process 1683 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:05.035044 2019] [core:warn] [pid 1681] AH00045: child process 1686 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:05.035057 2019] [core:warn] [pid 1681] AH00045: child process 1693 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:05.035070 2019] [core:warn] [pid 1681] AH00045: child process 1698 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:05.035084 2019] [core:warn] [pid 1681] AH00045: child process 2163 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:05.035097 2019] [core:warn] [pid 1681] AH00045: child process 2561 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:05.035110 2019] [core:warn] [pid 1681] AH00045: child process 2835 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:05.035123 2019] [core:warn] [pid 1681] AH00045: child process 3036 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:05.035136 2019] [core:warn] [pid 1681] AH00045: child process 3051 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:05.035149 2019] [core:warn] [pid 1681] AH00045: child process 3263 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:05.035162 2019] [core:warn] [pid 1681] AH00045: child process 3279 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:05.035176 2019] [core:warn] [pid 1681] AH00045: child process 3813 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:05.035189 2019] [core:warn] [pid 1681] AH00045: child process 4347 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:05.035202 2019] [core:warn] [pid 1681] AH00045: child process 4362 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:07.037482 2019] [core:warn] [pid 1681] AH00045: child process 1682 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:07.037536 2019] [core:warn] [pid 1681] AH00045: child process 1683 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:07.037575 2019] [core:warn] [pid 1681] AH00045: child process 1686 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:07.037588 2019] [core:warn] [pid 1681] AH00045: child process 1693 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:07.037600 2019] [core:warn] [pid 1681] AH00045: child process 1698 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:07.037612 2019] [core:warn] [pid 1681] AH00045: child process 2163 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:07.037625 2019] [core:warn] [pid 1681] AH00045: child process 2561 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:07.037637 2019] [core:warn] [pid 1681] AH00045: child process 2835 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:07.037649 2019] [core:warn] [pid 1681] AH00045: child process 3036 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:07.037661 2019] [core:warn] [pid 1681] AH00045: child process 3051 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:07.037673 2019] [core:warn] [pid 1681] AH00045: child process 3263 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:07.037685 2019] [core:warn] [pid 1681] AH00045: child process 3279 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:07.037698 2019] [core:warn] [pid 1681] AH00045: child process 3813 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:07.037710 2019] [core:warn] [pid 1681] AH00045: child process 4347 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:07.037735 2019] [core:warn] [pid 1681] AH00045: child process 4362 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:09.039989 2019] [core:warn] [pid 1681] AH00045: child process 1682 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:09.040034 2019] [core:warn] [pid 1681] AH00045: child process 1683 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:09.040047 2019] [core:warn] [pid 1681] AH00045: child process 1686 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:09.040059 2019] [core:warn] [pid 1681] AH00045: child process 1693 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:09.040071 2019] [core:warn] [pid 1681] AH00045: child process 1698 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:09.040083 2019] [core:warn] [pid 1681] AH00045: child process 2163 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:09.040094 2019] [core:warn] [pid 1681] AH00045: child process 2561 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:09.040106 2019] [core:warn] [pid 1681] AH00045: child process 2835 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:09.040118 2019] [core:warn] [pid 1681] AH00045: child process 3036 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:09.040129 2019] [core:warn] [pid 1681] AH00045: child process 3051 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:09.040141 2019] [core:warn] [pid 1681] AH00045: child process 3263 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:09.040153 2019] [core:warn] [pid 1681] AH00045: child process 3279 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:09.040165 2019] [core:warn] [pid 1681] AH00045: child process 3813 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:09.040177 2019] [core:warn] [pid 1681] AH00045: child process 4347 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:09.040188 2019] [core:warn] [pid 1681] AH00045: child process 4362 still did not exit, sending a SIGTERM
[Fri Jul 26 03:00:11.042425 2019] [core:error] [pid 1681] AH00046: child process 1682 still did not exit, sending a SIGKILL
[Fri Jul 26 03:00:11.042487 2019] [core:error] [pid 1681] AH00046: child process 1683 still did not exit, sending a SIGKILL
[Fri Jul 26 03:00:11.042530 2019] [core:error] [pid 1681] AH00046: child process 1686 still did not exit, sending a SIGKILL
[Fri Jul 26 03:00:11.042586 2019] [core:error] [pid 1681] AH00046: child process 1693 still did not exit, sending a SIGKILL
[Fri Jul 26 03:00:11.042623 2019] [core:error] [pid 1681] AH00046: child process 1698 still did not exit, sending a SIGKILL
[Fri Jul 26 03:00:11.042665 2019] [core:error] [pid 1681] AH00046: child process 2163 still did not exit, sending a SIGKILL
[Fri Jul 26 03:00:11.042695 2019] [core:error] [pid 1681] AH00046: child process 2561 still did not exit, sending a SIGKILL
[Fri Jul 26 03:00:11.042729 2019] [core:error] [pid 1681] AH00046: child process 2835 still did not exit, sending a SIGKILL
[Fri Jul 26 03:00:11.042762 2019] [core:error] [pid 1681] AH00046: child process 3036 still did not exit, sending a SIGKILL
[Fri Jul 26 03:00:11.042798 2019] [core:error] [pid 1681] AH00046: child process 3051 still did not exit, sending a SIGKILL
[Fri Jul 26 03:00:11.042825 2019] [core:error] [pid 1681] AH00046: child process 3263 still did not exit, sending a SIGKILL
[Fri Jul 26 03:00:11.042853 2019] [core:error] [pid 1681] AH00046: child process 3279 still did not exit, sending a SIGKILL
[Fri Jul 26 03:00:11.042887 2019] [core:error] [pid 1681] AH00046: child process 3813 still did not exit, sending a SIGKILL
[Fri Jul 26 03:00:11.042920 2019] [core:error] [pid 1681] AH00046: child process 4347 still did not exit, sending a SIGKILL
[Fri Jul 26 03:00:11.042951 2019] [core:error] [pid 1681] AH00046: child process 4362 still did not exit, sending a SIGKILL
[Fri Jul 26 03:00:12.044102 2019] [core:error] [pid 1681] AH00047: could not make child process 1682 exit, attempting to continue anyway
[Fri Jul 26 03:00:12.044145 2019] [core:error] [pid 1681] AH00047: could not make child process 1683 exit, attempting to continue anyway
[Fri Jul 26 03:00:12.044156 2019] [core:error] [pid 1681] AH00047: could not make child process 1686 exit, attempting to continue anyway
[Fri Jul 26 03:00:12.044166 2019] [core:error] [pid 1681] AH00047: could not make child process 1693 exit, attempting to continue anyway
[Fri Jul 26 03:00:12.044177 2019] [core:error] [pid 1681] AH00047: could not make child process 1698 exit, attempting to continue anyway
[Fri Jul 26 03:00:12.044199 2019] [core:error] [pid 1681] AH00047: could not make child process 2163 exit, attempting to continue anyway
[Fri Jul 26 03:00:12.044210 2019] [core:error] [pid 1681] AH00047: could not make child process 2561 exit, attempting to continue anyway
[Fri Jul 26 03:00:12.044220 2019] [core:error] [pid 1681] AH00047: could not make child process 2835 exit, attempting to continue anyway
[Fri Jul 26 03:00:12.044230 2019] [core:error] [pid 1681] AH00047: could not make child process 3036 exit, attempting to continue anyway
[Fri Jul 26 03:00:12.044240 2019] [core:error] [pid 1681] AH00047: could not make child process 3051 exit, attempting to continue anyway
[Fri Jul 26 03:00:12.044251 2019] [core:error] [pid 1681] AH00047: could not make child process 3263 exit, attempting to continue anyway
[Fri Jul 26 03:00:12.044261 2019] [core:error] [pid 1681] AH00047: could not make child process 3279 exit, attempting to continue anyway
[Fri Jul 26 03:00:12.044271 2019] [core:error] [pid 1681] AH00047: could not make child process 3813 exit, attempting to continue anyway
[Fri Jul 26 03:00:12.044281 2019] [core:error] [pid 1681] AH00047: could not make child process 4347 exit, attempting to continue anyway
[Fri Jul 26 03:00:12.044292 2019] [core:error] [pid 1681] AH00047: could not make child process 4362 exit, attempting to continue anyway
[Fri Jul 26 03:00:12.044353 2019] [mpm_prefork:notice] [pid 1681] AH00169: caught SIGTERM, shutting down
[Fri Jul 26 03:03:12.868910 2019] [http2:warn] [pid 6384] AH10034: The mpm module (prefork.c) is not supported by mod_http2. The mpm determines how things are processed in your server. HTTP/2 has more demands in this regard and the currently selected mpm will just not do. This is an advisory warning. Your server will continue to work, but the HTTP/2 protocol will be inactive.
[Fri Jul 26 03:03:12.947996 2019] [mpm_prefork:notice] [pid 6384] AH00163: Apache/2.4.25 (Debian) OpenSSL/1.0.2r configured -- resuming normal operations
[Fri Jul 26 03:03:12.948037 2019] [core:notice] [pid 6384] AH00094: Command line: '/usr/sbin/apache2'
[Fri Jul 26 06:27:15.979141 2019] [mpm_prefork:notice] [pid 6384] AH00171: Graceful restart requested, doing restart
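
To make the 03:00 burst in this log easier to read, here is a short Python sketch that groups the AH00045 / AH00046 / AH00047 lines by child PID. The regular expression is written against the line format shown above, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches e.g. "AH00046: child process 1682 still did not exit, ..."
# and "AH00047: could not make child process 1682 exit, ..."
CHILD_LINE = re.compile(r"\b(AH0004[567]): .*?child process (\d+)\b")

def stuck_children(lines):
    """Map each child PID to a {message code: count} dictionary."""
    summary = defaultdict(lambda: defaultdict(int))
    for line in lines:
        match = CHILD_LINE.search(line)
        if match:
            code, pid = match.groups()
            summary[int(pid)][code] += 1
    return {pid: dict(codes) for pid, codes in summary.items()}
```

Applied to the log above, this reports 15 children (1682 through 4362), each with three SIGTERM warnings (AH00045), one SIGKILL (AH00046) and one "could not make child process exit" (AH00047). Fifteen children that survive SIGKILL is close to the reported load average of 15; one possible reading, offered as a hypothesis only, is that they were blocked in uninterruptible sleep.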