[PATCH] Proof-of-concept of a dual PHP–Python stack
Here’s a demonstration of a Python web stack running next to the current PHP stack, such that it’s invisible to the client.

This approach aims at providing a way to migrate the PHP code base to Python, one endpoint after another, with as little glue as possible to have the two backends collaborate. Since they are both mounted on the web root, Python could implement, say, /packages/ and leave the rest to PHP. As soon as PHP yields it by returning 404 on this URL, Python will take over automatically.

To run it, you need python-flask and nginx. Then, you need to start PHP, Flask, and nginx, in whatever order:

    $ cd path/to/aurweb
    $ AUR_CONFIG="$PWD/conf/config" php -S 127.0.0.1:8080 -t web/html
    $ FLASK_APP=aurweb.wsgi flask run
    $ nginx -p . -c conf/nginx.conf

You may then open http://localhost:8081/ and http://localhost:8081/hello to check that the former URL goes to PHP and the latter to Flask.

The key concept is nginx’s proxy_next_upstream feature. We set Flask as a fallback backend, and tell nginx to use the fallback only on 404 from PHP. See
http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_next_upstream

The main limitation of this approach is that PHP and Python need to use the same gateway protocol, probably FastCGI or HTTP.

A minor caveat with this system is that the body of the 404 returned by PHP is lost, though it could contain useful information like “Package X doesn’t exist”, rather than a generic “Page not found”. Luckily, all the 404 cases are handled by 404.php, so we could port its logic to Flask and preserve the current behavior.
---
 aurweb/wsgi.py  | 15 +++++++++++++++
 conf/nginx.conf | 23 +++++++++++++++++++++++
 2 files changed, 38 insertions(+)
 create mode 100644 aurweb/wsgi.py
 create mode 100644 conf/nginx.conf

diff --git a/aurweb/wsgi.py b/aurweb/wsgi.py
new file mode 100644
index 00000000..fd6b67d3
--- /dev/null
+++ b/aurweb/wsgi.py
@@ -0,0 +1,15 @@
+from flask import Flask, request
+
+
+def create_app():
+    app = Flask(__name__)
+
+    @app.route('/hello', methods=['GET', 'POST'])
+    def hello():
+        return (
+            f"{request.method} {request.url}\n"
+            f"{request.headers}"
+            f"{request.get_data(as_text=True)}\n"
+        ), {'Content-Type': 'text/plain'}
+
+    return app
diff --git a/conf/nginx.conf b/conf/nginx.conf
new file mode 100644
index 00000000..8e6e4edb
--- /dev/null
+++ b/conf/nginx.conf
@@ -0,0 +1,23 @@
+events {
+}
+
+daemon off;
+error_log /dev/stderr info;
+pid nginx.pid;
+
+http {
+    access_log /dev/stdout;
+
+    upstream aurweb {
+        server [::1]:8080 max_fails=0;
+        server 127.0.0.1:5000 backup max_fails=0;
+    }
+
+    server {
+        listen 8081;
+        location / {
+            proxy_pass http://aurweb;
+            proxy_next_upstream http_404 non_idempotent;
+        }
+    }
+}
--
2.25.1
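To illustrate the last caveat: porting the 404 logic could eventually look something like the sketch below, using the same app-factory shape as wsgi.py above (the /hello route is omitted for brevity). This is only a hypothetical outline; the handler body is a placeholder, not the actual logic of 404.php.

    from flask import Flask, request


    def create_app():
        app = Flask(__name__)

        @app.errorhandler(404)
        def not_found(error):
            # Placeholder: 404.php tailors its message to the requested path
            # (e.g. "Package X doesn't exist"); that logic would go here.
            return (f"Page not found: {request.path}\n", 404,
                    {'Content-Type': 'text/plain'})

        return app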
On Sat, 07 Mar 2020 at 16:01:52, Frédéric Mangano-Tarumi wrote:
Here’s a demonstration of a Python web stack running next to the current PHP stack, such that it’s invisible to the client.
This approach aims at providing a way to migrate the PHP code base to Python, one endpoint after another, with as little glue as possible to have the two backends collaborate. Since they are both mounted on the web root, Python could implement, say, /packages/ and leave the rest to PHP. As soon as PHP yields it by returning 404 on this URL, Python will take over automatically.
[...]
---
 aurweb/wsgi.py  | 15 +++++++++++++++
 conf/nginx.conf | 23 +++++++++++++++++++++++
 2 files changed, 38 insertions(+)
 create mode 100644 aurweb/wsgi.py
 create mode 100644 conf/nginx.conf
Thanks! I like the approach. I wonder what the performance impact of always querying the Python backend first would be, though, especially at the beginning when most requests are expected to yield a 404.

Alternatively, would it make sense to use multiple location blocks and use the right upstream based on matching the path against a predefined set of patterns? It would add some additional maintenance work but since the overall plan is to migrate everything to Python eventually, it would exist only temporarily. (A rough sketch of what this could look like follows at the end of this message.)

I guess we could use a similar approach if we ever wanted to decouple certain endpoints completely and make them a separate app (optionally sharing some code with the "main" backend).

For an actual first patch to be merged, I suggest porting the RPC interface which is rather small and largely independent from other parts of the code. This patch should also add instructions to the documentation: both INSTALL and doc/maintenance.txt need to be updated. Maybe also add a note to README.md.
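To make the per-path alternative concrete, the routing could look roughly like the following, inside the http block of the proof-of-concept nginx.conf. This is only a sketch: the upstream names are made up, and /hello stands in for whichever endpoints have already been ported.

    # Hypothetical upstream names; the ports match the proof of concept above.
    upstream aurweb_php {
        server [::1]:8080;
    }

    upstream aurweb_python {
        server 127.0.0.1:5000;
    }

    server {
        listen 8081;

        # Endpoints already ported to Flask.
        location /hello {
            proxy_pass http://aurweb_python;
        }

        # Everything else stays with PHP until it is ported.
        location / {
            proxy_pass http://aurweb_php;
        }
    }

Every newly ported endpoint would then mean adding or updating a location block, which is the maintenance cost mentioned above.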
Lukas Fleischer [2020-03-11 19:44:37 -0400]
Thanks! I like the approach. I wonder what the performance impact of always querying the Python backend first would be, though, especially at the beginning when most requests are expected to yield a 404.
One way or the other, I don’t think it’s worth worrying about, since handling a 404 mostly amounts to accessing a local socket and exchanging a few KB. Hardly any disk or database access is performed. The best way to be sure is to measure it under load, though. Are the AUR servers often overloaded?
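For a first rough idea (nowhere near a real load test), something like the following could compare a request served by PHP directly with one that falls through to Flask, assuming the proof-of-concept stack from the patch is running locally; the URLs and request count are arbitrary.

    import time
    import urllib.error
    import urllib.request


    def average_latency(url, n=200):
        """Average wall-clock time of n sequential GET requests to url."""
        start = time.perf_counter()
        for _ in range(n):
            try:
                urllib.request.urlopen(url).read()
            except urllib.error.HTTPError:
                pass  # a 404 response still counts as a completed round trip
        return (time.perf_counter() - start) / n


    # / is answered by PHP directly; /hello only exists in Flask, so nginx
    # first gets a 404 from PHP and then retries against Flask.
    print("PHP direct    :", average_latency("http://localhost:8081/"))
    print("Flask fallback:", average_latency("http://localhost:8081/hello"))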
Alternatively, would it make sense to use multiple location blocks and use the right upstream based on matching the path against a predefined set of patterns? It would add some additional maintenance work but since the overall plan is to migrate everything to Python eventually, it would exist only temporarily.
I couldn’t find a smart way to do it without turning it into a maintenance burden. Besides, I can’t see any advantage over the fallback approach, except a performance speedup which I believe would be negligible.
For an actual first patch to be merged, I suggest porting the RPC interface which is rather small and largely independent from other parts of the code.
Sure! But first, are there other approaches you would like to try out before we begin the serious work? Also, I’d like to make a proposal about regression testing, to limit as best we can the potential bugs introduced by the rewrite.
On Fri, 13 Mar 2020 at 14:13:56, Frédéric Mangano-Tarumi wrote:
Lukas Fleischer [2020-03-11 19:44:37 -0400]
Thanks! I like the approach. I wonder what the performance impact of always querying the Python backend first would be, though, especially at the beginning when most requests are expected to yield a 404.
One way or the other, I don’t think it’s worth worrying about, since handling a 404 mostly amounts to accessing a local socket and exchanging a few KB. Hardly any disk or database access is performed. The best way to be sure is to measure it under load, though. Are the AUR servers often overloaded?
Being overloaded is a relative term. Yes, the AUR servers are often under heavy load, with millions of requests every day.
Alternatively, would it make sense to use multiple location blocks and use the right upstream based on matching the path against a predefined set of patterns? It would add some additional maintenance work but since the overall plan is to migrate everything to Python eventually, it would exist only temporarily.
I couldn’t find a smart way to do it without turning it into a maintenance burden. Besides, I can’t see any advantage over the fallback approach, except a performance speedup which I believe would be negligible.
Fair enough. We can always keep it in mind as an alternative solution in case there are any issues with the fallback approach.

I also wonder whether we should ever use a version with two backends in production. It might make sense to only switch over once the port has been completed.
For an actual first patch to be merged, I suggest porting the RPC interface which is rather small and largely independent from other parts of the code.
Sure! But first, are there other approaches you would like to try out before we begin the serious work?
Your proposal makes the rewrite relatively easy, has low overhead and at least one other person actively working on the rewrite (I briefly discussed it with Filipe) likes it. Unless somebody else wants to suggest an alternative approach here, I think we're good to go!
Also, I’d like to make a proposal about regression testing, to limit as best we can the potential bugs introduced by the rewrite.
Great!
Lukas Fleischer [2020-03-15 08:25:31 -0400]
Being overloaded is a relative term. Yes, the AUR servers are often under heavy load, with millions of requests every day.
To be more concrete: are the AUR servers sometimes at 100% CPU capacity, or do they hardly ever reach the point of saturation? In other words, can we afford a 1% slowdown? What about 10%?
It might make sense to only switch over once the port has been completed.
I strongly recommend against that.

First, not deploying the Python backend implies we keep developing the PHP stack too, which in turn means we either need to stop developing new features, or develop them twice.

Second, deploying a wholly different codebase at once is dreadful for an actively used website. All the bugs introduced by the rewrite would pop up simultaneously. This is all the more risky if we decide to adjust features as we rewrite them. Debugging may also become harder if we can’t narrow down the commits based on the date the bug appeared. By the way, I think we should for that reason accelerate the release cycle when we start porting code.
Your proposal makes the rewrite relatively easy, has low overhead and at least one other person actively working on the rewrite (I briefly discussed it with Filipe) likes it. Unless somebody else wants to suggest an alternative approach here, I think we're good to go!
All right!
On Sun, 2020-03-15 at 14:16 +0100, Frédéric Mangano-Tarumi wrote:
It might make sense to only switch over once the port has been completed.
I strongly recommend against that.
First, not deploying the Python backend implies we keep developing the PHP stack too, which in turn means we either need to stop developing new features, or develop them twice.
Not really, we just put it in maintenance mode -- no more features, just bugfixes.
Second, deploying a wholly different codebase at once is dreadful for an actively used website. All the bugs introduced by the rewrite would pop up simultaneously. This is all the more risky if we decide to adjust features as we rewrite them. Debugging may also become harder if we can’t narrow down the commits based on the date the bug appeared. By the way, I think we should for that reason accelerate the release cycle when we start porting code.
We can deploy it to aur-dev.archlinux.org and have users test it before we deploy it to the real website.

The reason I don't want to deploy it to the real installation right away is mainly because we are changing the database structure. The plan is to move to SQLAlchemy (you already have a patch for this) and then start implementing the Flask app. If we mess something up in the database backend and it does not become apparent immediately, we are screwing up the production database.

Filipe Laíns
Filipe Laíns [2020-03-15 13:50:39 +0000]
Not really, we just put it in maintenance mode -- no more features, just bugfixes.
That won’t benefit the end user, but if we manage to port the code fast enough, I guess it’s an acceptable compromise. There’s still the second point, though: stability.
We can deploy it to aur-dev.archlinux.org and have users test it before we deploy it to the real website.
Staging deployments help detect the bigger bugs, but we should still expect a small portion of them to be discovered only in production. How about a 2-month release cycle where aur-dev is one release ahead of production?
The reason I don't want to deploy it to the real installation right away is mainly because we are changing the database structure. The plan is to move to SQLAlchemy (you already have a patch for this) and then start implementing the Flask app. If we mess something up in the database backend and it does not become apparent immediately, we are screwing up the production database.
So far aurweb only uses SQLAlchemy Core, which is a neutral Pythonic wrapper over SQL. Unlike SQLAlchemy ORM, it does not make decisions on the structure of the database or the operations to perform. For that reason, I doubt SQLAlchemy will ever be the cause of a screw-up, but if we’re uncertain we can pass it the raw SQL PHP currently uses, and modernize it later on.

When I made the SQLAlchemy schema, I double-checked to make sure SQLAlchemy uses the exact same structure as before, though I must admit I can’t guarantee the production database is perfectly identical to my local deployment. I can check it if you send me a structure dump of the production database. In any case, we won’t run initdb on the production server, and even if the new schema were to differ, at worst we’d get an SQL error. Databases are good at detecting that.

Alembic is probably a much bigger risk factor, since database migrations do alter the structure. However, regardless of when we deploy it, every single new migration is equally risky. Please also note that Alembic uses the SQLAlchemy schema only to assist migration generation, and that the migration of our Python code to SQLAlchemy is completely independent from the use of Alembic as far as production is concerned.

More generally, I think deploying everything at once or incrementally won’t affect the possibility of data screw-ups. It’s mostly a matter of how thoroughly each individual piece of code is tested. Deploying incrementally helps focus on specific parts at a time, which I believe is an advantage over having users test everything without a plan. If our testers are overwhelmed by changes, it’s going to be harder for them to notice oddities.
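To illustrate the raw SQL option: SQLAlchemy Core can execute the existing queries verbatim and only handle connections and parameter binding. A minimal sketch, where the connection URL, table, and query are placeholders rather than what aurweb actually runs:

    from sqlalchemy import create_engine, text

    # Placeholder connection URL; in aurweb this would come from the config.
    engine = create_engine("mysql+mysqldb://aur:aur@localhost/AUR")

    with engine.connect() as conn:
        # The SQL string currently used by the PHP code can be reused as-is;
        # Core only binds the parameters, it does not rewrite the query.
        result = conn.execute(
            text("SELECT ID, Name FROM Packages WHERE Name = :name"),
            {"name": "linux"},
        )
        for row in result:
            print(row.ID, row.Name)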
On Sun, 15 Mar 2020 at 09:16:52, Frédéric Mangano-Tarumi wrote:
Lukas Fleischer [2020-03-15 08:25:31 -0400]
Being overloaded is a relative term. Yes, the AUR servers are often under heavy load, with millions of requests every day.
To be more concrete: are the AUR servers sometimes at 100% CPU capacity, or do they hardly ever reach the point of saturation? In other words, can we afford a 1% slowdown? What about 10%?
We are in the process of getting a new machine for the AUR and will be scaling up whenever we reach resource limits. So, practically, we will never reach full utilization unless we're running out of money. However, it might be better to think of optimizations as tradeoffs between total time invested and the optimization benefits (reduction in running expenses, knowledge acquisition of the person implementing the change, ...).

As I mentioned before, I am fine with trying your fallback approach first and possibly optimizing later. Your explanation of the performance hit being relatively small sounds reasonable, but a few simple experiments would be even better.
It might make sense to only switch over once the port has been completed.
I strongly recommend against that.
First, not deploying the Python backend implies we keep developing the PHP stack too, which in turn means we either need to stop developing new features, or develop them twice.
It also depends on how long the port will take. We should certainly prioritize the port and try to defer any feature additions for the time being.
Second, deploying a wholly different codebase at once is dreadful for an actively used website. All the bugs introduced by the rewrite would pop up simultaneously. This is all the more risky if we decide to adjust features as we rewrite them. Debugging may also become harder if we can’t narrow down the commits based on the date the bug appeared. By the way, I think we should for that reason accelerate the release cycle when we start porting code.
That's a good point. I agree that introducing the rewrite gradually is a good idea.