ReplayGain, SoundCheck and metadata

Posted in Uncategorized on June 7th, 2011 by saintdev

A couple of days ago I finally got fed up with parts of my library being at different volumes. I had previously run mp3gain on some of my mp3s, and aacgain on most of my aac files, so I had a strange mishmash of normalized and non-normalized files in my library. I decided, screw it, I’m just going to normalize everything. So I called up our good friend find to help me out.

find music/ -iname '*.mp3' -print0 | xargs -0 mp3gain -r -k

There, that should have taken care of the mp3s, except it didn’t (and this won’t work if you want album gain, since xargs groups files arbitrarily rather than per album). It seems that if mp3gain encounters an error in a batch of arguments, it just quits instead of continuing to the next file. So it would get 50-60 files into a batch of 1000+, hit an error, and xargs would move on to the next batch. Some searching turned up this post, which has the fix for both issues (assuming your albums are each in their own subdirectory).

find music/ -iname '*.mp3' -execdir mp3gain -r -k "{}" +

If you run this without the -r switch, it will calculate both track and album gain and only write the tags to the mp3. With the -r switch it will modify the files permanently. Yay!

Now I did the same with my AAC files, which presented more issues. First, it seems aacgain 1.8 has a problem with some tags iTunes writes, and will render the metadata unreadable by some tag readers. All the metadata is still there and some applications can read it, but others can’t and don’t give any errors. So I had to revert those changes (thanks Unison). I ended up compiling aacgain 1.9, which uses an updated copy of mp4v2 for its tag writing (building aacgain was a PITA, by the way). This seems to have fixed the incompatibility.

Now, the only real issue left is that the iTunes SoundCheck tags no longer match what they should be, so in iTunes my volumes are still going to be all over the place. This is unacceptable to me, so I went in search of something that could remedy it. I knew about iVolume, and had played with the trial once. It’s actually an excellent program and I would highly recommend it, except that $30 is, in my opinion, a little too much for something like this, and it doesn’t work on Linux. Eventually I came across Richard van den Berg’s rg2sc script. It only works with mp3s, which was ok with me, as I have far more of those than AAC files. I tried it out on one file, just to check before using it on my whole library. I checked the tags just to be sure it worked… and now there is only an iTunNORM tag on the file! For some reason it removes all the other ID3 tags from a file when it writes the new one. Looking at the script, it seems to be doing what it should, but I don’t know Perl all that well.

Another issue I didn’t know about, but which is mentioned on Richard’s rg2sc page, is that by default mp3gain creates APEv2 tags. I don’t like ID3 and APE tags in the same file. I found Rasher’s script that converts APEv2 ReplayGain tags to ID3v2 tags.

I’ve been meaning to learn Python a little better for quite a while, and I know there’s a high-quality tagging library for Python – Mutagen. This seemed like a good chance to learn a little more (and I could add support for MP4 while I’m at it).

My result is available on github. It handles both mp3 and mp4 files. As an added bonus, it doesn’t erase all existing tags in a file! Please yell at me for anything you find wrong with it. This was really more of a fun project to learn python, so I probably did a lot of stupid things.
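The core of the job is converting a ReplayGain dB value into iTunes’ SoundCheck scale. As a rough sketch (this is my own illustration, not the actual code from my script), the commonly reverse-engineered iTunNORM format holds ten space-separated 8-digit hex fields, with the first two pairs being linear volume values relative to 1/1000 and 1/2500 of full scale:

```python
def replaygain_to_soundcheck(gain_db):
    """Convert a ReplayGain value (in dB) to an iTunNORM tag string.

    The iTunNORM format is reverse-engineered, not official: ten
    space-separated 8-digit hex fields.  Fields 1-2 and 3-4 are linear
    volume adjustments relative to 1/1000 and 1/2500 of full scale.
    """
    # ReplayGain is a dB offset; SoundCheck wants a linear multiplier.
    # A positive gain (quiet track) yields a smaller SoundCheck value.
    linear = 10 ** (-gain_db / 10.0)
    v1000 = min(65534, int(round(linear * 1000)))
    v2500 = min(65534, int(round(linear * 2500)))
    # The remaining fields (peak and statistics info) are left as zeros
    # in this sketch; iTunes still accepts the volume fields.
    fields = [v1000, v1000, v2500, v2500, 0, 0, 0, 0, 0, 0]
    return " " + " ".join("%08X" % f for f in fields)
```

For an mp3 the resulting string goes into an ID3 COMM (comment) frame with the description “iTunNORM”, which is where iTunes looks for it; with Mutagen that’s a one-liner, and crucially it leaves the rest of the tags alone.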

So finally, what I should have been using above can now be simplified to:

find music/ -iname '*.mp3' -execdir mp3gain -s i -r -k "{}" +
find music/ -iname '*.m4a' -execdir aacgain -r -k "{}" +
find music/ -iname '*.mp3' -print0 -or -iname '*.m4a' -print0 | xargs -0

The -s i switch to mp3gain tells it to write ID3 tags instead of APE tags.

Genetic Assembly Programming

Posted in genetic algorithms, x264 on November 24th, 2010 by saintdev

A few weeks ago there was a lot of talk about the Using genetic algorithms to find Starcraft 2 build orders article. This reminded me of Jason’s attempt to use an evolutionary algorithm to find an SSE 8×8 zigzag for x264. He ended up giving up because the solution space was just too large. I thought this might be a fun way to learn about genetic algorithms. So I set my sights on a (relatively) easy goal: to get a working 4×4 zigzag. The 4×4 case is much easier than the 8×8; with only 16 coefficients, even a human can keep track of it easily.
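For reference, the target of the search — a plain 4×4 zigzag scan — is trivial to express in scalar code (my own illustration, not x264’s table), which is exactly what makes it a good fitness oracle for evolved candidates:

```python
def zigzag_order(n=4):
    """Return the zigzag scan order for an n x n block as raster indices.

    Cells are visited along anti-diagonals (s = row + col); even
    diagonals run bottom-left to top-right, odd ones the other way,
    which reproduces the standard 4x4 frame zigzag pattern.
    """
    order = []
    for s in range(2 * n - 1):
        cells = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            cells.reverse()  # even diagonals: row index descending
        order.extend(i * n + j for i, j in cells)
    return order
```

A candidate program’s fitness can then simply be how many of these 16 output positions it gets right.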

I started by looking at genetic algorithms in general. I already knew the basics: you take a population of individuals, select two of them, mate them by crossing over part of their genomes, apply random mutations, and reintroduce the children into the population. In my case a gene would be one instruction, and a genome/individual a complete program. Most of what I found referred to this as genetic programming. Most of the resources used a tree-based representation, because it makes crossover much easier (you just exchange pointers in the two parents, though you have to exchange complete sub-trees), and mutations equally easy (select a random node in the tree and change it). Inserting or deleting nodes is a little more difficult to handle. However, this just didn’t feel to me like it had a very good chance of finding good solutions. By definition, in a tree representation every node affects the output, so any mutation is likely to be destructive. You don’t carry what I now know are called introns (non-coding areas of a genome that sit in the middle of a coding section of a gene, aka junk DNA).

I finally came across this (short) article on Wikipedia about Linear Genetic Programming. This sounded more like what I was thinking of when I thought of genetic assembly programming. After another short search I came across the book with the title Linear Genetic Programming (oddly enough!). Just reading the introduction, this is exactly what I was thinking of. Programs are represented as (surprise!) a sequence of instructions that can be easily modified. For creating children, what the authors call a 2-point crossover is used. This is actually a very simple concept: you select a random point and a random segment length in each parent, then swap the two segments. Because the two segments need not be the same length, this is the primary method for varying the size of programs. For mutation they use what they call micro-mutations, which randomly modify one part of an instruction (either an operation, an operand, or a constant). These are not the only ideas the authors try in the book, and it is heavily experimental.
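The two operators, as I understood them from the book, can be sketched roughly like this (the `(op, dest, src1, src2)` tuple encoding is a toy of mine, not the book’s or my actual code’s representation):

```python
import random

def two_point_crossover(parent_a, parent_b, rng=random):
    """Swap a randomly-placed, randomly-sized segment between two programs.

    The two segments need not be the same length, so this is also the
    main way program size varies between generations.
    """
    a, b = list(parent_a), list(parent_b)
    pa, pb = rng.randrange(len(a)), rng.randrange(len(b))
    la = rng.randint(1, len(a) - pa)
    lb = rng.randint(1, len(b) - pb)
    child_a = a[:pa] + b[pb:pb + lb] + a[pa + la:]
    child_b = b[:pb] + a[pa:pa + la] + b[pb + lb:]
    return child_a, child_b

def micro_mutate(program, n_ops, n_regs, rng=random):
    """Randomly change one field (op, dest, or a source) of one instruction."""
    prog = list(program)
    idx = rng.randrange(len(prog))
    op, dst, s1, s2 = prog[idx]
    field = rng.randrange(4)
    if field == 0:
        op = rng.randrange(n_ops)
    elif field == 1:
        dst = rng.randrange(n_regs)
    elif field == 2:
        s1 = rng.randrange(n_regs)
    else:
        s2 = rng.randrange(n_regs)
    prog[idx] = (op, dst, s1, s2)
    return prog
```

Note that crossover conserves total instruction count between the two children while letting each individual grow or shrink.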

One of the ideas the book introduced me to, which I hadn’t thought of, is only emulating effective code. The algorithm for this is very simple. Starting at the end of the program, you look backwards for the first instruction that writes to a register you are looking for (initially the program’s destination register). Mark that instruction, remove its destination register from the list of registers you’re looking for, then add its source registers to the list. This greatly speeds up emulation, because you’re not executing instructions that have no effect on the program output.
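In the same toy `(op, dest, src1, src2)` encoding as above (my illustration, not the book’s exact representation), the effective-code pass looks like:

```python
def mark_effective(program, output_regs):
    """Return the subset of instructions that can influence the output.

    Scans backwards through the program: an instruction is effective
    only if its destination register is still needed; its source
    registers then become needed in turn.  Everything else is an
    intron and can be skipped during emulation.
    """
    needed = set(output_regs)
    effective = []
    for op, dst, s1, s2 in reversed(program):
        if dst in needed:
            effective.append((op, dst, s1, s2))
            needed.discard(dst)      # this write satisfies the demand...
            needed.update((s1, s2))  # ...but its inputs are now needed
    effective.reverse()
    return effective
```

The discard-then-add order matters: an instruction like `r0 = r0 + r1` removes `r0` from the needed set and immediately re-adds it, so earlier writes to `r0` stay live.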

Overall, this was a fun experiment. I haven’t yet got a working 4×4 zigzag, but I still have a few things on my TODO list that should help. A couple of the easier ideas that led to better results for the authors were limiting the difference in crossover points to within a few instructions of each other, and limiting the length of crossover segments to less than 5-10 instructions (which helped to a lesser degree). Also, multithreading with a distinct population per thread, exchanging the best individuals every few generations, can lead to better genetic diversity. Emulation could be sped up by using intrinsics (or even the assembly instructions themselves) for the instruction emulation; a JIT-compiler-style technique could possibly speed this up as well. I don’t plan on working on this for a little while, while I get back to ff-aac.

The code was available temporarily here. UPDATE: Repo now available on my github.

FFmpeg AAC encoder improvements.

Posted in aac, audio, ffmpeg, lame on August 14th, 2010 by saintdev

A few weeks ago I decided to start on a new project: improving the FFmpeg AAC encoder. Currently it is in a very sorry state. Alex Converse has done quite a bit of work to get it usable, but it still has lots of issues, the largest being anything with more than one channel. Most of these I am not very confident I can currently handle.

When I started this project I knew absolutely nothing about AAC, audio encoding, or psychoacoustics. So I decided the best place to start was with something proven. LAME is known to be one of the best mp3 encoders, so why not start porting its features over? The LAME psymodel seemed as good a place as any to start. I’ve mostly completed porting attack detection over, and it is quite a bit better than the current 3GPP-inspired implementation. You can see the results yourself in the images below.

Click to embiggen.

Current (3GPP) attack detection.

The same frame as encoded by iTunes

FFmpeg with LAME inspired attack detection.

This example is just one frame from the “castanets” sample. As you can see, the current attack detection completely misses the attack; iTunes gets it just right, and now, so does FFmpeg! On the samples I have been testing with, we are almost matching iTunes for window decisions (grouping is another matter), while the 3GPP implementation mostly gets things wrong. I don’t think this is a problem with the 3GPP model itself; more than likely it is just the FFmpeg implementation. Measuring visually, the LAME model appears to be accurate down to a third of a short block (~42 samples).
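The general idea behind this kind of attack detection (a heavily simplified illustration of mine, not the actual LAME or FFmpeg code) is to split each frame into sub-blocks and flag any sharp jump in energy between neighbours, which is the cue to switch to short windows:

```python
def detect_attack(samples, n_sub=8, ratio=10.0):
    """Flag an attack if any sub-block's energy jumps sharply.

    Splits the frame into n_sub sub-blocks and compares each block's
    energy to its predecessor; a ratio above the threshold marks an
    attack.  Real psymodels high-pass filter the signal and weight the
    ratios first -- this only shows the skeleton of the idea.
    """
    size = len(samples) // n_sub
    energies = [
        sum(s * s for s in samples[i * size:(i + 1) * size])
        for i in range(n_sub)
    ]
    for prev, cur in zip(energies, energies[1:]):
        if cur > ratio * max(prev, 1e-10):
            return True  # sudden energy rise: use short windows here
    return False
```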

For those of you curious why I did not include Nero, here is why: Nero’s attack detection kind of sucks. It quite often misjudges which block to switch on, causing it to enter a short sequence too early. This happens even on ‘easy’ sources like castanets. Also, if you look at the groupings it uses, often where it detected the attack is not where it actually occurs (this is not shown in the sample below; it gets it correct in this case). If you look at the two following frames, which also have attacks, it groups them completely wrong.

Castanets as encoded by Nero. Note the attack is at the end of the window. Nero has a delay that is not an even multiple of the block-size.

This is not yet committed to FFmpeg; it is currently hacked into the 3GPP psymodel for testing. I am now working on the analysis portion of the LAME psymodel. I would like to keep the two models separate (and maybe fix up the 3GPP implementation at some point). Thanks to Alex and Kostya for all the stupid questions they have answered, and all the help they have given me so far.

UPDATE: Now committed! Please play around with it, and let me know your thoughts.

PS: If you are curious about the screenshots, they were generated with Alex’ just-released AACX.